Essprog

May 7, 2021.


Essprog is an unfinished programming language and compiler I worked on from 2017 to 2020.


My goal was to create a language that would improve the programming experience (more productivity and enjoyment in programming) while addressing certain issues I had with existing languages. It started out as an interpreted scripting language that resembled Python and eventually grew to be a more complex compiled language that aimed to be an improvement over languages like Java and C.


It was a very ambitious project for a solo developer with no compiler/language design experience to take on. Despite this, I am happy with the time I spent on Essprog. Even though the project was never completed, I learned a lot and improved myself as a result. I also just enjoyed the design process.


I stopped working on this project because further progress required more experience than I had at the time and it was consuming far too much of my limited time. I also discovered that there is already someone with much more experience and knowledge working towards a similar (but better-informed and more thought-out) goal for a new programming language: Jonathan Blow, the creator of Braid and The Witness, with his working-title language, "Jai".


Looking back on the design of my language now that I have halted the project, I realize that I may have fallen into certain design traps that -- despite my good intentions in creating the language -- could have made it just another boring "new language" that would fail to bring any new, beneficial changes to the programming community. For example, I was much too focused on the syntax of the language, which distracted me from implementing actual features. Also, when I first began, Java was my primary language and I had relatively little experience in other languages. That severely limited my mindset for designing a new language. Consequently, if you browse the Essprog documentation you will find that the language largely resembles Java with a more minimalist syntax.


Since this project, my views on language design have changed considerably. I have more experience with other languages and other projects that have influenced these ideas. So, take these language design notes simply as records of what I tried and not necessarily what I would recommend today.


Design Overview

Essprog is an acronym for "(e)fficient, (s)imple, (s)afe (prog)ramming."


So as one would expect, Essprog is designed with efficiency, simplicity, and safety at its core:

• The compiler frontend is currently written in Java and uses LLVM as the compiler backend for machine code optimization and generation. Essprog code compiles directly to the desired architecture's machine code. Code performance should be comparable with that of C++.
• The syntax and abstractions are recognizable, simple (but not overly simplified), and easy to understand. Code maintainability and readability are also taken into account. Essprog also provides integration support for C/C++, meaning existing C/C++ libraries can be used in Essprog code.
• Design choices related to typing, null values, exception handling, multi-threading, and more reduce the possibility of bugs and unintended errors.
• The language is statically-typed and general purpose.


I began building Essprog in late 2017. When I began this project I knew virtually nothing about how compilers worked, so my first year of "building" Essprog was actually me learning about compilers, language design, and how to put it all together and make it work. Because of this, I completely rewrote and redesigned the compiler and rules of Essprog three times. Essprog grew from jottings on scratch-paper to being a toy interpreter to being nearly an "actual" compiler.


Example

declare com.ak.example.Main

import com.ak.example.Tests
import com.ak.example.old.Main as Main_old

int VAL = Main_old:VAL + 1
int(int) operation = @fib(int)
string name = "Andrew Klinge"
byte[][] map = {{0, 0, 1}, {0, 1, 0}, {0, 1, 1}}

// fibonacci series up to n
int fib(int n)
  int a = 0
  int b = 1
  while a < n
    print(a + " ")
    int temp = a
    a = b
    b += temp
  ;
  return a
;

string? who_am_i(byte code)
  switch code
    case 0 -> return "Andrew"
    case 1 -> return "Jon"
    case 2
      if VAL > 3
        return "Bob"
      else
        return "Alice"
      ;
    ;
  ;
  return null
;


[ Compiler source code here ]


Details

Each code file in Essprog would be given its own "namespace" (a container/module that can be referenced by name and contains all data and functions in the file). This provides simple encapsulation and avoids the issues with name conflicts and naming conventions found in C.

Code is compiled line-by-line, where each statement is separated by a new line, not a semicolon. The idea was that automatic code formatting would take care of issues such as line wrapping, but this is debatable, and it was a somewhat arbitrary decision on my part. I now think semicolons are still the better way to end statements, since they support different code styles and help readability.

All other whitespace is ignored. I don't like dealing with the mess caused by enforced indentation and whitespace rules, as in Python.

The new usage of the semicolon was to close code blocks. See the example above. I think this works well and eliminates the need to deal with brackets. You can also quickly "chain" statements, such as "if"/"else", by simply moving the semicolon. Again, this is an entirely cosmetic change without much actual value.

File scope in code can only contain immutable data (constants, functions, type definitions, etc.). In hindsight, this was a bad decision, since anything variable would have to reside on the stack inside a function.

Variables, functions, and type definitions can be given visibility modifiers to affect how they can be accessed from other scopes. However, to some extent, visibility modifiers just cause headaches when editing code.

Essprog has single-line comments, multi-line comments, and mega-comments. Mega-comments allow for commenting out code that contains multi-line comments. They start with "#IGNORE" and are closed with "#!IGNORE". A better way to have done this would be to simply allow nesting of multi-line comments.
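As a rough illustration of the nesting alternative, here is a minimal Python sketch (not Essprog's actual lexer) that strips block comments using a depth counter. The "/*" and "*/" delimiters are an assumption, since the multi-line comment syntax is not shown here, and the sketch ignores string literals for brevity.

```python
def strip_nested_comments(source: str, open_tok: str = "/*", close_tok: str = "*/") -> str:
    """Remove block comments from source, allowing comments to nest.

    The delimiters are assumed for illustration; string literals are not handled.
    """
    out = []
    depth = 0  # how many comment openers are currently unclosed
    i = 0
    while i < len(source):
        if source.startswith(open_tok, i):
            depth += 1
            i += len(open_tok)
        elif depth > 0 and source.startswith(close_tok, i):
            depth -= 1
            i += len(close_tok)
        else:
            if depth == 0:
                out.append(source[i])  # keep only text outside all comments
            i += 1
    if depth != 0:
        raise ValueError("unterminated block comment")
    return "".join(out)
```

With a depth counter, commenting out a region that already contains a block comment needs no separate "mega-comment" construct: the outer comment only ends when every inner one has been closed.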

Imported files can be given name aliases to resolve naming conflicts.

Operators are similar to those in Java. One addition is the power operator, which raises a number to some power (e.g. 2 ** 3 equals 8). There are no increment/decrement operators (++, --), only assignment operators (including +=, -=). Additionally, assignment to a variable must be its own statement.

The "super" keyword can be used to access objects from the next-higher scope, which mitigates variable shadowing. It is not to be confused with Java's "super" and "this" keywords. It can also be stacked to access multiple scopes up (e.g. "super.super.x").
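To make the idea concrete, here is a small Python sketch of how a scope chain could resolve such names. The Scope class and resolve function are hypothetical names for illustration, not part of the Essprog compiler, and the sketch assumes that after stepping up, lookup continues outward through enclosing scopes as usual.

```python
class Scope:
    """A lexical scope with a link to its enclosing (next-higher) scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def define(self, name, value):
        self.names[name] = value

    def lookup(self, name):
        # Ordinary lookup: innermost definition wins (this is where shadowing happens).
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(name)


def resolve(scope, path):
    """Resolve a name such as "x", "super.x", or "super.super.x"."""
    *prefixes, name = path.split(".")
    for part in prefixes:
        if part != "super":
            raise ValueError(f"unexpected qualifier: {part}")
        if scope.parent is None:
            raise NameError("no enclosing scope for 'super'")
        scope = scope.parent  # each "super" skips one scope level
    return scope.lookup(name)
```

For example, if three nested scopes each define "x", then "x" resolves to the innermost value, "super.x" to the middle one, and "super.super.x" to the outermost.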

The type system is similar to that of Java. It includes void, bool, byte, short, int, long, float, double, obj, and string as primitive types. Additional "compound" primitive types include function reference types in the form "ret_type(param_type1, param_type2, ...)" and array types in the form "type[][]" with a "[]" per dimension of the array.

Additionally, all object types are by default not allowed to have null value. To specify that a value of a given type can be null, put a "?" after the type name (e.g. "string?").
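The type notation described above can be illustrated with a small Python parser. This is only a sketch under assumptions: the exact grammar is not given here, so it assumes the order "name, optional parameter list, zero or more [], optional ?", and TypeRef and parse_type are hypothetical names. It does not handle a function reference type whose return type is itself a function reference.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TypeRef:
    base: str                                        # e.g. "int", "string"
    params: Optional[Tuple["TypeRef", ...]] = None   # set only for function references
    dims: int = 0                                    # number of "[]" suffixes
    nullable: bool = False                           # True if the type ends with "?"


def _split_params(text):
    """Split a parameter list on commas that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return [p.strip() for p in parts if p.strip()]


def parse_type(text):
    text = text.strip()
    nullable = text.endswith("?")
    if nullable:
        text = text[:-1].rstrip()
    dims = 0
    while text.endswith("[]"):
        dims += 1
        text = text[:-2].rstrip()
    params = None
    if text.endswith(")") and "(" in text:
        open_idx = text.index("(")
        params = tuple(parse_type(p) for p in _split_params(text[open_idx + 1:-1]))
        text = text[:open_idx].rstrip()
    if not text.isidentifier():
        raise ValueError(f"cannot parse type: {text!r}")
    return TypeRef(text, params, dims, nullable)
```

A checker built on this could reject assigning null to any type whose nullable flag is False, which is the non-null-by-default rule in miniature.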

Memory management was originally going to be done semi-automatically with reference counting. It seemed to be a nice balance between garbage collection as in Java and manual memory management as in C (gives the user more control over memory than Java, but is not fully managed). However, this is still a deal-breaker in high-performance applications, where the programmer needs to have full control over the program's behavior.
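For readers unfamiliar with the technique, here is a minimal Python sketch of reference counting in general. It does not reflect how Essprog would have implemented it (that was never finalized), and RefCounted, retain, and release are hypothetical names.

```python
class RefCounted:
    """An object that is freed as soon as its last reference is released."""

    def __init__(self, name, on_free):
        self.name = name
        self.count = 1            # the creator holds the first reference
        self._on_free = on_free   # callback standing in for actual deallocation

    def retain(self):
        if self.count == 0:
            raise RuntimeError("retain after free")
        self.count += 1
        return self

    def release(self):
        if self.count == 0:
            raise RuntimeError("release after free")
        self.count -= 1
        if self.count == 0:
            self._on_free(self.name)  # freed at a deterministic point, no collector pass
```

The object is freed exactly when the count reaches zero, which is more predictable than a tracing garbage collector, but every retain and release is extra bookkeeping that the programmer cannot opt out of.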


