28 August 2019 - 7 minute read

There's been something of a blank period here. I haven't been away; I just haven't really had any ideas of things to put here. Over the last few weeks, I've been making some big progress towards O - my programming language brain-child. I've redesigned the O website, which you can ogle at here, and I've also formalised the parser spec and put it up on the website. This is a really big step forward for the project: having the formal spec for the syntax written out neatly not only conveys much more information to everyone, but also means I can now really start working on a complete lexer and parser pair.


Docs-wise, we're looking really good. The lexer spec and parser spec are both more or less done. I'm going to remove the "output format" section from the lexer spec for reasons I'll go over further down, and the opening paragraphs of these pages will be modified for the same reasons, but the bulk of the information on both pages is done. Code-wise, there's plenty of work to do. I was experimenting with git branches and had a branch on each project for O and for the MiniO precursor language that I was toying with but have since largely abandoned. This has left some of the projects with a few holes in them, but all the business logic is still there - it just needs stitching back together. This can all get cleaned up and refactored in the coming weeks.


Along with all this, I'm going to reorganise things. Originally, each compiler component would be its own program entirely, reading from stdin, outputting through stdout, and logging to stderr. These programs would then be stitched together by a wrapper program that managed everything. The advantage is that each compiler component can be broken off from the rest and used in isolation very easily, which makes things like incremental compilation and parallel processing of files straightforward - all great things to be able to do. The downside is that I'd need to come up with a binary format for every single compiler stage, which adds needless extra work: not only would I need to convert my nice data structures into crude binary data, but each program would also need to start with a section that reads this binary back into a nice data structure.
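As a rough sketch of that original design, here's what one stage might look like as a standalone program. The toy lexer and the JSON serialisation are stand-ins of my own for illustration, not the real O components or their binary format:

```python
# Sketch of the per-stage-program design: source arrives on stdin,
# serialised tokens leave through stdout, diagnostics go to stderr.
# JSON stands in for the per-stage binary format the design would need.
import json
import sys


def lex(source: str) -> list[dict]:
    """Toy lexer: split lines into word/number tokens with positions."""
    tokens = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for word in line.split():
            kind = "number" if word.isdigit() else "word"
            tokens.append({"kind": kind, "text": word, "line": lineno})
    return tokens


def main() -> None:
    source = sys.stdin.read()
    tokens = lex(source)
    print(f"lexed {len(tokens)} tokens", file=sys.stderr)  # log to stderr
    json.dump(tokens, sys.stdout)  # serialised output for the next stage


if __name__ == "__main__":
    main()
```

The next stage in the chain would then have to open with the mirror image of that last line: deserialising the stream back into a token list before doing any real work.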

The solution I'm going with is a lot more ordinary. I have two new projects in the O GitLab group: 'lib' and 'o'. 'lib' is going to be a generic libraries project containing all the class definitions, arrays of bulk data, and so on. Most projects will use it, preventing the excessive code repetition between projects that the current setup suffers from. 'o' is going to be the wrapper program around all the smaller systems, each of which will now be written in a library style rather than a program style. The wrapper then orchestrates them just as the original wrapper concept would, only all in one process rather than kicking off lots of others. Some things like parallel compilation will be more challenging, as they'll require setting up and maintaining threading (whereas previously it would be as simple as "program &"), but overall it'll be much more structured.
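A minimal sketch of the new shape, with hypothetical stand-ins for the stage libraries: each stage is a plain function call, the wrapper passes data structures between them directly (no binary format at the boundaries), and a thread pool takes the place of launching extra processes:

```python
# Sketch of the single-process design: stages as libraries, one wrapper.
from concurrent.futures import ThreadPoolExecutor


def lex(source: str) -> list[str]:
    """Stand-in for the lexer library."""
    return source.split()


def parse(tokens: list[str]) -> dict:
    """Stand-in for the parser library."""
    return {"kind": "file", "tokens": tokens}


def compile_file(source: str) -> dict:
    # The wrapper hands data structures straight from stage to stage --
    # no serialising to binary and re-reading it at each boundary.
    return parse(lex(source))


def compile_project(sources: list[str]) -> list[dict]:
    # Parallel compilation: one worker per file, all in one process,
    # where the old design would have backgrounded separate programs.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(compile_file, sources))
```

The thread pool does need setting up and tearing down, which is the extra maintenance mentioned above, but a context manager keeps even that fairly contained.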

My first idea was to approach this with git submodules, seeing as they were designed specifically for the task of having one git repository depend on another. After looking into it and talking to a few other people about it, though, the easier method is to just use symlinks and describe the required sibling projects in the README.
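The symlink setup might look something like this - the directory names are hypothetical stand-ins for the real repositories, which would each be cloned as siblings:

```shell
# Stand-in checkouts sitting side by side, as the README would describe.
mkdir -p o-demo/lib/src o-demo/o/src o-demo/lexer/src

# Each component points at the shared 'lib' checkout next to it.
ln -sfn ../lib o-demo/o/lib
ln -sfn ../lib o-demo/lexer/lib

readlink o-demo/o/lib   # -> ../lib
```

The relative targets mean the whole group of checkouts can be moved around together without the links breaking.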


The next steps aren't tremendously interesting - it's the steps after them that get really interesting. The first order of business is to get the main 'o' project going, plan out a few commands for its command-line syntax (which at the moment I'm thinking of basing somewhat on that of pacman, Arch's package manager), and skeleton out sections for lexing and parsing files. I can then sort out my 'lib' project and tie everything together. Once all this is done, I should be able to use the wrapper program to lex and parse individual files and - depending on how productive I can be - possibly also automatically find a project file and use its path to lex and parse a whole project tree.
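A skeleton of what that command line might look like, sketched with Python's argparse - the subcommand names and flags here are placeholders of mine, since the real syntax is still being planned:

```python
# Hypothetical skeleton for the 'o' wrapper's command line.
import argparse


def build_cli() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="o")
    sub = parser.add_subparsers(dest="command", required=True)
    # One subcommand per skeleton stage for now; more stages slot in later.
    for name in ("lex", "parse"):
        cmd = sub.add_parser(name, help=f"run the {name} stage")
        cmd.add_argument("files", nargs="+", help="source files to process")
        cmd.add_argument("--verbose", action="store_true",
                         help="log stage progress")
    return parser
```

Usage would then be along the lines of "o parse main.o --verbose", with each subcommand dispatching into the matching library from 'lib'.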

Plenty of interesting things open up after that.

The next compiler stage will be the semantic processor. This takes the AST and is the first place where the data no longer directly corresponds to source code: it converts the code into a more abstract representation of its actual meaning, rather than thinking in terms of syntax. For instance, it'll take the AST section for, say, a class definition and break it down into an object holding an array of type arguments, an array of implemented interfaces, and then an array of objects describing attributes, methods, and so on. That output then goes on to subsequent stages like the resolver, which will pick out all the namespaces and resolve every type, method, and variable to its complete "path" through the namespaces.
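As an illustration of the kind of structure the semantic processor might produce for a class definition - every type and field name here is a hypothetical example, not part of the real spec:

```python
# Sketch of a semantic-processor output record for a class definition.
from dataclasses import dataclass, field


@dataclass
class Method:
    name: str
    param_types: list[str]
    return_type: str


@dataclass
class ClassInfo:
    name: str
    type_args: list[str] = field(default_factory=list)    # e.g. T, U
    interfaces: list[str] = field(default_factory=list)   # implemented interfaces
    attributes: dict[str, str] = field(default_factory=dict)  # name -> type
    methods: list[Method] = field(default_factory=list)


# A hypothetical generic list class, as the semantic processor might
# describe it. The resolver would later rewrite each bare type name
# ("Int", "Iterable") into its full path through the namespaces.
example = ClassInfo(
    name="List",
    type_args=["T"],
    interfaces=["Iterable"],
    attributes={"length": "Int"},
    methods=[Method("get", ["Int"], "T")],
)
```

Nothing in this record says anything about braces, keywords, or token positions any more - which is exactly the point of the stage.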

Breaking the compiler chain into so many small components makes the mammoth task of writing a compiler feel much more feasible, as each "next step" isn't tremendously complicated. Even at this stage, this has been by far the most rewarding project I've worked on, and I've learned a tremendous amount about all sorts of things, from parsing algorithms to the structure of ELF binaries to the lower-level workings of OOP. Here's to the rest of the project being just as great, and to an eventual v1!


O-lang - Programming


Copyright Oliver Ayre 2019. Site licensed under the GNU Affero General Public Licence version 3 (AGPLv3).