28 August 2019 - 7 minute read
There's been something of a blank period here. I haven't been away; I just
haven't really had any ideas for things to put here. Over the last few weeks,
I've been making some big progress on O - my programming-language
brain-child. I've redesigned the O website, which you can ogle at here, and also
formalised the parser spec and put it up on the website. This is a really big
step forward for the project: having the formal spec for the syntax written
out neatly not only conveys much more information to everyone, it also means I
can now really start working on getting a complete lexer and parser pair done.


Docs-wise, we're looking really good: the lexer spec and the parser spec are
more-or-less done. I'm going to remove the "output format" section from the
lexer spec for reasons I'll go over further down, and the paragraphs at the
start of these pages will be modified for the same reasons, but the bulk of the
information on both pages is done. Code-wise, there is plenty of work to do. I
was experimenting with git branches and had a branch on each project for O and
for MiniO, the precursor language I was toying with but have since more or less
abandoned. This has left some of the projects with a few holes in them, but all
the business logic is still there; it just needs stitching back together. This
can all get cleaned up and refactored in the coming weeks.


Along with all this, I'm going to reorganise things. Originally, each compiler
component was to be its own program entirely, reading from stdin, outputting
through stdout, and logging to stderr. These programs would then be stitched
together by a wrapper program that would manage everything. The advantage of
this is that each compiler component can be completely broken off from the rest
and used in isolation, which makes things like incremental compilation and
parallel processing of files easy - both great things to be able to do. The
downside is that I'd need to come up with a binary format for every single
compiler stage, which adds needless extra work: not only would I need to
convert my nice data structures into crude binary data, I'd also need to start
each program with a section that reads this binary back into a nice data
structure.
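Just to sketch what I mean (the token shape and function names here are made up for illustration, not the real O formats): in the pipe design, every stage ends by flattening its structures into a stream, and the next stage begins by parsing them straight back.

```python
import json

# Hypothetical token shape; the real O lexer's structures differ.
def lex(source: str) -> list[dict]:
    return [{"kind": "word", "text": w} for w in source.split()]

def write_tokens(tokens: list[dict], stream) -> None:
    # Every stage must serialise its nice structures to a byte/text stream...
    stream.write(json.dumps(tokens))

def read_tokens(stream) -> list[dict]:
    # ...and the next stage must start by rebuilding them from that stream.
    return json.loads(stream.read())
```

All that round-tripping is pure overhead: the data is identical on both sides of the pipe.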

The solution I'm going with is a lot more ordinary. I have two new projects in
the O GitLab group: 'lib' and 'o'. 'lib' is going to be a generic libraries
project that'll include all the class definitions, arrays of bulk data, and so
on. It will be used by most projects to prevent the excessive code repetition
between projects that the current setup suffers from. 'o' is going to be the
wrapper program around all the smaller systems, each of which will now be
written in a library style rather than a program style. The wrapper program
then orchestrates them just as the original wrapper concept would, only all in
one process rather than kicking off lots of others. Some things like parallel
compilation will be more challenging, as they'll require setting up and
maintaining threading (where in the previous case it would be as simple as
'program &'), but overall it'll be much more structured.
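As a rough sketch of the library-style arrangement (the stage functions here are stand-ins I've made up, not the real O interfaces): the wrapper just chains the stages in-process, and parallel compilation becomes a thread pool rather than a shell job.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in stages; the real lexer/parser libraries will have their own APIs.
def lex(source: str) -> list[str]:
    return source.split()

def parse(tokens: list[str]) -> dict:
    return {"kind": "file", "children": tokens}

def compile_file(source: str) -> dict:
    # The 'o' wrapper simply chains the library stages in one process.
    return parse(lex(source))

def compile_project(sources: list[str]) -> list[dict]:
    # Parallelism now means managing threads ourselves,
    # instead of the shell's 'program &'.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(compile_file, sources))
```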

My first idea was to approach this with git submodules, seeing as they were
designed specifically for the task of having one git repository depend on
another. After looking into it and talking to a few other people about it,
though, the easier method is to just use symlinks and describe the required
sibling projects in the README.
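On disk that looks something like this (the directory layout is made up for illustration):

```python
import os
import tempfile

# Made-up layout: a 'lexer' project depending on a sibling 'lib' checkout.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "lib"))
os.makedirs(os.path.join(root, "lexer"))

# Instead of a git submodule, just symlink the sibling project in
# and note the dependency in the README.
os.symlink(os.path.join("..", "lib"), os.path.join(root, "lexer", "lib"))
```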

### NEXT STEPS ###

The next steps aren't tremendously interesting - it's the steps after them that
get really interesting. The first order of business is going to be getting the
main 'o' project going, planning out a few commands for its syntax (which at
the moment I'm thinking of basing somewhat on the syntax of pacman, Arch's
package manager), and skeletoning out sections for lexing and parsing files. I
can then sort out my 'lib' project and tie everything together. Once all this
is done, I should be able to use my wrapper program to lex and parse individual
files and - depending on how productive I can be - possibly also automatically
find a project file and use its path to lex and parse a whole project tree.
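To give a feel for the pacman-flavoured idea (the flags below are purely hypothetical - the real 'o' commands aren't decided yet): single-letter operation flags selecting a compiler stage, followed by the files to run it on.

```python
import argparse

# Purely hypothetical command syntax, loosely pacman-flavoured
# (single-letter operation flags); not the real 'o' interface.
def build_cli() -> argparse.ArgumentParser:
    cli = argparse.ArgumentParser(prog="o")
    op = cli.add_mutually_exclusive_group(required=True)
    op.add_argument("-L", "--lex", action="store_true",
                    help="lex the given files only")
    op.add_argument("-P", "--parse", action="store_true",
                    help="lex and parse the given files")
    cli.add_argument("files", nargs="+", help="source files to process")
    return cli
```

So something like 'o -P main.o' would lex and parse a single file, in this imagined syntax.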

Some interesting things can be done after that. The next compiler stage will be
the semantic processor. This is what will take the AST, and it's the first
place where the data no longer corresponds directly to source code. It will
convert the code into a more abstract representation of its actual meaning,
rather than thinking in terms of syntax. For instance, it'll take the AST
section for, say, a class definition and break it down into an object with an
array of type arguments, an array of implemented interfaces, and then an array
of objects describing attributes, methods, and so on. The output of that will
then go on to later stages like the resolver, which will pick out all the
namespaces and resolve every type, method, and variable to its complete "path"
through namespaces and so on.
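The class-definition example might come out shaped something like this (the field names and AST layout are invented for the sketch; the real O representation will differ in detail):

```python
from dataclasses import dataclass, field

# Invented shapes for the semantic processor's output.
@dataclass
class Member:
    name: str
    kind: str  # "attribute" or "method"

@dataclass
class ClassInfo:
    name: str
    type_args: list[str] = field(default_factory=list)
    interfaces: list[str] = field(default_factory=list)
    members: list[Member] = field(default_factory=list)

def process_class(ast: dict) -> ClassInfo:
    # Walk a (hypothetical) class-definition AST node and keep only
    # its meaning, discarding the syntactic detail.
    return ClassInfo(
        name=ast["name"],
        type_args=[t["name"] for t in ast.get("type_args", [])],
        interfaces=[i["name"] for i in ast.get("implements", [])],
        members=[Member(m["name"], m["kind"]) for m in ast.get("body", [])],
    )
```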

Breaking the compiler chain up into so many small components makes the mammoth
task of writing a compiler feel much more feasible, as each "next step" isn't
tremendously complicated. Even just at this stage, this has been by far the
most rewarding project I've worked on, and I've learned a tremendous amount
about all sorts of things, from parsing algorithms to the structure of ELF
binaries to the lower-level workings of OOP. Here's to the rest of the project
being just as great, and to an eventual v1!


Olang - Programming
Copyright Oliver Ayre 2019. Site licensed under the GNU Affero General Public
Licence version 3 (AGPLv3).