

O SHIT THAT'S A LOT OF CHANGES

26 November 2019 - 45 minute read


Recently, I started to feel a bit "on the fence" about O. I still have no doubt I'll eventually create a v1 compiler for it, but my uncertainty was more about language design. I can kind of break my idea of "ideal language" into 2 competing sides:

Full OOP, where you have classes with methods, and the only things outside classes are extension methods that are obviously still tied to a class. I know some people don't like full OOP languages, but considering it's been what I've used the most since I started programming, it's where I'm the most comfortable. Being able to create data structures that match what I need and then connect actions to those structures, but also use interfaces and polymorphism to create abstractions across multiple data structures so I can work with lots of different things in the same way. All fun stuff.

Semi-functional, which is sort of a comfortable mid-point between imperative and functional. You're still working with individual actions and statements executed in an order, but applying a functional style of programming to them leads to some very satisfying-looking code. Here, data structures are just data, and the functions sit separate to them, just taking in some set of structures and producing an output based on the inputs.

Where they clash is in program structure. I feel the semi-functional approach works best with a more "structs and references" mindset, which full OOP simply doesn't play nicely with unless you write some horrendous code with all your methods in static classes. Going with full OOP, it's possible to do semi-functional to an extent, but not really in the same way. The functional style happens more at the statement level rather than the program level.

At the time, I was also thinking about error systems. I was originally going to just cop out and use exceptions, but then I started to see how other languages do things. I know V isn't considered that great a language and that the creator could have done much better, but I was still able to take some inspiration from it. I was also recommended this great article (note the site has no SSL certificate) that goes into great detail on different error systems and that talks a lot about contract programming. I already had designs for something that resembled contracts, but they weren't intended to work in the way the article described. I took great inspiration from this article when revamping all sorts.

So, let's go over all the changes and tweaks I'm making to the designs, hopefully bringing me closer to something I can stick with until I have a functioning compiler.


ERROR SYSTEM

I strongly recommend you read the article I linked above, because it describes a lot more than I could here. To summarise it all very briefly: the conclusion of the article was that there's an important distinction to be made between bugs and exceptions. Bugs shouldn't be handle-able at run time because they're bugs. They're not something you can just let happen and somehow deal with in spite of your program having a bug - your program has a bug, so you fix it. These errors are normally called "panics" or "abandonments". Exceptions are for what their name suggests, not what you might be accustomed to: some issue that is expected as part of normal functionality, such as a dropped network packet or invalid user input. If your program just killed itself the moment a packet got dropped, it would be considered a broken program. It's here where you want exceptions.

The key contrast to be made is that panics cannot be handled. If a method causes a panic, the whole thread is torn down and ended on the spot. In a synchronous program (as O will be to start with), that means the whole program goes. In a heavily asynchronous program, it just means the smaller "worker" dies, and the core of your program - acting more as a process manager than the program itself - would react to the thread dying and, for instance, start a new one. This is the approach that the article was describing.

There's also another key thing with exceptions to think about. In most languages with exceptions (I'll use C# as an example), you can't know whether or not a method will throw an exception, but because exceptions are everywhere, you'll often want to just assume it does, so this pattern of wrapping large blocks of code in try catch starts emerging and uglifying code. This is only made worse by the overwhelming quantity of exceptions being thrown everywhere.

The suggested solution is to require that a method that throws be labelled as such. Whether or not an exception may be thrown is now part of the type of that method. I describe the syntax being used further down. You then couple this with a requirement to deal with exceptions on the spot rather than in large try catch blocks. Here are 2 methods being called. The first doesn't throw, the second is labelled to throw.

doesntthrow() ; // simple call as you'd expect
// exception must be caught and handled on the spot
try maythrow() catch { /* handle */ } ;

This new try catch syntax comprises a primary expression, meaning it still fits into usual this.that syntax without having to make any parenthesis soup. Because exceptions are designed to be thrown much more infrequently, you're only going to have a few types of them, and in most cases you're only throwing one kind of exception, so you can just stick with a normal exception type. Whilst thinking about other parts of syntax - namely types of block - I also decided to treat a method that breaks as a case that should be labelled and dealt with in exactly the same way as exceptions - because in a way it is one.


CONTRACTS

O already had designs in place for a kind of contract programming. It looked something like this:

int add_positive_ints( int a , int b ) where ( a >= 0 && b >= 0 )
	return a + b ;

The original idea was that if the conditions weren't met, an error would be passed that would be optionally dealt with. I was thinking about the right things, just in a not-so-right way. The article also talked about postconditions, which was something else to think about. The resulting syntax looks like this:

int add_positive_ints( int a , int b )
        requires ( a >= 0 , b >= 0 )
        ensures ( return >= 0 ) {
	a + b
}

Preconditions are marked with requires and are simply a list of expressions in parentheses, then postconditions are marked with ensures and are provided in the same way. The ensures section has access to the implicit return variable, which (as you might expect) gets you the value ultimately returned by the method.

Unlike before, where a failed precondition would pass an error and no-op, a breached contract is now fatal.

The huge advantage this brings is that testing and QA become massively simpler. Rather than have to add in lots of extra code to try and cope with undesired input and then write test cases presenting strange or extreme input to try and catch bugs, you just add a contract to your method. If any input is passed in that is not considered valid by the contracts you write, your program dies and you know immediately that there's a problem. The article describes how this ended up saving tremendous amounts of time, and you can absolutely expect that.
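
To make the semantics concrete, here is a rough analogy in Python rather than O. It is only a sketch under assumptions: the contract decorator and the ContractBreach class are hypothetical names invented for this illustration, and deriving from BaseException merely approximates "fatal", since nothing in Python truly prevents a handler from catching it.

```python
import functools


class ContractBreach(BaseException):
    """Signals a bug. Derives from BaseException so that an ordinary
    `except Exception` handler does not swallow it (a loose panic analogy)."""


def contract(requires=(), ensures=()):
    """Attach preconditions (checked on the arguments) and postconditions
    (checked on the return value) to a function."""
    def wrap(fn):
        @functools.wraps(fn)
        def checked(*args):
            for pre in requires:
                if not pre(*args):
                    raise ContractBreach(f"precondition failed: {fn.__name__}{args}")
            result = fn(*args)
            for post in ensures:
                if not post(result):
                    raise ContractBreach(f"postcondition failed: {fn.__name__} -> {result!r}")
            return result
        return checked
    return wrap


@contract(requires=(lambda a, b: a >= 0, lambda a, b: b >= 0),
          ensures=(lambda ret: ret >= 0,))
def add_positive_ints(a, b):
    return a + b


print(add_positive_ints(2, 3))  # 5
```

Calling add_positive_ints(-1, 3) raises ContractBreach before the body runs, so the invalid input is reported at the call that caused it rather than somewhere downstream.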

It's also here where throws are marked. They look like this:

int mymethod() throws {}
int othermethod() throws ( ex1 , ex2 ) {}

When throws is provided with no type arguments, it is assumed it just throws a plain exception and would be caught with a plain catch. When you provide type arguments, you're specifying exactly what things may be thrown, which can be caught as one with a plain catch or can be caught separately by putting type arguments on catch. If your method breaks, you mark it as such with breaks. This however doesn't take type arguments.

There are other things you can do with this small try catch syntax though. If you don't want to do anything complicated on an exception but instead just want to default to a different value, you can use else like int x = try maythrow() else 3 ;. On methods that might throw multiple things, you can also use else as a default for all exception types that aren't explicitly caught. The only thing you're not allowed to do is not handle some kind of exception. If a method is marked to throw, you need to handle it. If you want your exception to propagate, you need to mark the calling method with throws too, catch your exception, then rethrow it in the handler.
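
For readers who think in other languages, the else form behaves roughly like the Python sketch below. This is an analogy, not O: MayThrow, maythrow and try_else are hypothetical names made up for the illustration. Python also cannot express the important part, which is that O refuses to compile a call to a throws method that isn't handled.

```python
class MayThrow(Exception):
    """Stand-in for a plain O exception."""


def maythrow(fail):
    # An expected, recoverable failure rather than a bug.
    if fail:
        raise MayThrow("expected failure, e.g. invalid input")
    return 7


def try_else(call, default):
    """Evaluate call(); if it throws MayThrow, evaluate to default instead."""
    try:
        return call()
    except MayThrow:
        return default


x = try_else(lambda: maythrow(fail=True), 3)   # like: try maythrow() else 3
y = try_else(lambda: maythrow(fail=False), 3)
print(x, y)  # 3 7
```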


BLOCKS

There are 2 kinds of body block in O: block<T> and fnbody<T>. A block is the kind you see in if else statements. It can define its own locals, but has access to everything the parent scope has access to. If you call return or throw then that happens in the outer scope (e.g. a whole method). Blocks notably don't take parameters. The other kind is the fnbody, which does take parameters. This is the type of the body of any method you define. If you call return from one of these, that's what the method returns; the return doesn't propagate to an outer scope.

The key issue is ambiguity. As both appear the same in source, it's possible for there to be some confusion without looking at the type. I'm still not completely certain how these could be made clearer. I'm going to keep them as they are for now, and if that ends up causing problems then I can deal with it then.

All body blocks that have a single statement have an implicit return/give and an optional semicolon to encourage more functional-style code. Both are independently optional though, so you can have the semicolon if you want but no return, or have the return but no semicolon, or you can have neither, or both. This has the amusing side effect that - when using inline blocks in expressions - the braces can essentially act as parentheses. These block rules allow writing functions in the way the one demonstrating contracts is written.

With these modifications also comes another key change: braces on body blocks with single statements are no longer optional - they're now required. As a compromise, I'm considering putting this kind of pattern into the O style guide:

if ( some condition ) {
	statement ;
	other statement ;
}

if ( some condition )
	{ statement }

If you think this looks absolutely horrendous then to be honest I don't blame you. I'm pretty on-the-fence about it, because it saves space and is structured very similarly to the original pattern with no braces. The bit that's questionable is the indented open brace. However, consider that the alternative looks like this:

if ( some condition ) {
	statement }

And I think - given those 2 options - the former would be preferred for readability's sake. Of course, there's nothing stopping either of these:

if ( some condition ) {
	statement
}

if ( some condition ) {
	statement ;
}

As hinted at earlier, blocks may be placed inline in expressions to provide more complicated values directly. These blocks implicitly expand to their value anywhere that is not asking for something of type block<T>. To have a block evaluate to some value, have it give a value in the same way a fnbody<T> would return a value.

int y = 3 ;
int x = {
	if ( y >= 3 ) { 3 }
	else { 4 }
} ; // x == 3

ACCESS

O's member access originally looked like this:

public // publicly readable, publicly writeable
restricted // publicly readable, privately writeable (default)
private // privately readable, privately writeable
public const // publicly readable, immutable
private const // privately readable, immutable

This seemed ok, but with influences from V and a little from languages like Rust, the idea of enforcing immutability as much as possible makes a lot of sense. Member access will now look more like this (which is - until further notice - just ripped from V):

// privately readable, immutable (default)
mut // privately readable, privately writeable
pub // publicly readable, immutable
pub mut // publicly readable, privately writeable
global // publicly readable, publicly writeable

You'll also notice I'm liking the abbreviated look. Here's a somewhat-esoteric but surprisingly informative alternative:

r--- // privately readable, immutable (default)
rw-- // privately readable, privately writeable
r-r- // publicly readable, immutable
rwr- // publicly readable, privately writeable
rwrw // publicly readable, publicly writeable

These can be abbreviated to r, rw, rr, rwr, and rwrw respectively. They're not tremendously readable, but I think they look cool. Again, these are rather esoteric, but they have that kind of effect where it's a joke, but still feels like a good idea. The only real problem I have with the other option is that pub mut seems ambiguous and could fairly easily be misread to mean it's also publicly writeable. For now though, it seems like a workable solution.


PASS BY REFERENCE

In OOP languages such as C#, class instances are passed around by reference, while core types such as int (and other value types) are passed by value. This is normally expected, but it's inconsistent. Instead, O will pass everything by reference unless marked with dup. Note that dup only applies to types that have the relevant operator defined - which by default only includes the core types.

This dup operator system also means that the idea of the flat class is no longer really necessary. For a class to be usable in the way a flat class would, you simply need to write a definition of the dup operator for it and you can then pass it by value.

Both mut and dup have to be provided on both the method definition and method invocations to make it clear to the reader what is happening when a method is called, so they know immediately that a given parameter may be modified, for instance. This is the same principle behind having to call throws methods with try.
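
As a loose analogy in Python (not O), passing by reference with an opt-in duplicate looks something like the sketch below. Counter and bump are hypothetical names for the illustration, and Python's __copy__ hook is only standing in for a class defining the dup operator.

```python
import copy


class Counter:
    def __init__(self, n):
        self.n = n

    def __copy__(self):
        # Rough analogue of a class defining its own dup operator.
        return Counter(self.n)


def bump(c):
    # Mutates whatever object it was handed.
    c.n += 1
    return c.n


shared = Counter(1)
bump(shared)             # passed by reference: the caller sees the change
bump(copy.copy(shared))  # duplicated at the call site: the caller is unaffected
print(shared.n)  # 2
```

The copy being spelled out at the call site is the point: a reader can tell from the invocation alone whether the callee could have touched the original.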


TYPES

These changes to method definitions and the introduction of contracts means that some changes happen to types. Now, method parameter names are part of the type's value, but not signature. This means you can't define 2 methods that are identical apart from parameter names, but you can use parameter names in type aliases. All the contracts are part of the value, but again not the signature. This means you can't define multiple versions of the same method to cover different contracts. The point of contracts is not to differentiate methods but instead to enforce that they are used correctly.

Whilst working on a simple exercise in implementing an _enum<T> interface and related foreach, I found that I needed to use the same method type in 2 places not accessible with the same type argument. Type aliases were an easy thing to plan in, and they get around all kinds of possible issues where the same type is being used in multiple places and you want to make sure they're always the same - especially when it is a complicated type. Here's my foreach implementation showing how the type alias was helpful:

pub iface _enum<T> {
	pub mut T current ;
	pub bool shift() ;
}

// the alias stores a complicated method type
alias Tp<I , O> = O ( I each ) breaks ;

class foreach_<I , O> implements ( _enum<O> ) {
	_enum<I> source ;
	// aliased type used here
	mut Tp<I , O> pred ;
	pub mut O current ;
	pub bool shift() {
		if ( source.shift() ) {
			current = try pred( source.current )
				onbreak { return false } ;
			return true ;
		}
		return false ;
	}
	pub this( this.source , this.pred ) ;
}

// aliased type used here for the parameter
pub _enum<I>.foreach<I , O>( body Tp<I , O> pred.body ) {
	// explicitly enumerate every item and return an enumerator for the
	// result to produce expected behaviour
	// for lazy evaluation, a function like 'map' would be used
	new foreach_<I , O>( this , pred ).enumerate().get_enumerator()
}
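
For anyone who would rather trace the behaviour than read the types, here is a rough Python rendering of the same enumerator idea. It is a sketch under assumptions, not O: ListEnum, ForEach, Break and double_until_four are hypothetical names, and a Python exception stands in for a predicate marked breaks.

```python
class Break(Exception):
    """Stand-in for a predicate that 'breaks'."""


class ListEnum:
    """Minimal enumerator over a list: shift() advances, current holds the item."""
    def __init__(self, items):
        self._items = list(items)
        self._pos = -1
        self.current = None

    def shift(self):
        self._pos += 1
        if self._pos < len(self._items):
            self.current = self._items[self._pos]
            return True
        return False


class ForEach:
    """Wraps a source enumerator and applies pred to each item."""
    def __init__(self, source, pred):
        self.source = source
        self.pred = pred
        self.current = None

    def shift(self):
        if self.source.shift():
            try:
                self.current = self.pred(self.source.current)
            except Break:
                return False  # mirrors: onbreak { return false }
            return True
        return False


def double_until_four(n):
    if n == 4:
        raise Break()
    return n * 2


it = ForEach(ListEnum([1, 2, 3, 4, 5]), double_until_four)
out = []
while it.shift():
    out.append(it.current)
print(out)  # [2, 4, 6]
```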

DECLARATIONS

Before these changes, these were the things you could define in O:

// class members:
int x ; // attribute
int thing() {} // method
expose member.a.b.c as d ; // expose
op + ( int rhs ) {} // operator
this -> string {} // cast
enum myenum {} // enum
this() {} // constructor
destroy() {} // destructor

// class extensions and external declarations:
int thing( this myclass c ) {} // extension method
enum myenum {} // enum

Some things have changed. First, extension methods are now written somewhat more intuitively:

int myclass.thing() {} // 'this' is available

Next, expose definitions are being dropped in favour of delegates, which are basically just lazy attributes:

del<int> myint = { ... } ; // provide a block<T>

Note that it's possible to create a block<T> variable, but in order to evaluate the block it must be invoked:

block<int> myint = { 3 } ; // implicit 'give'
int x = myint() ; // must be invoked

Operators are defined slightly differently and can be extensions:

op(+)( int rhs ) {}
myclass.op(+)( int rhs ) {}
op(dup)() {}

Invoke and index operators are being dropped as they are unnecessary. Read and write operators are now per-member rather than per-type and so no longer count as operators. These may look somewhat familiar to C# programmers. The key difference though is that - unlike C# - there aren't 2 types of attribute with arbitrary rules attached to them. Both operators may have contracts assigned to them. A fairly full attribute definition may look like this:

int x
	requires ( x >= 0 )
	get ensures ( return >= 0 ) { x }
	set( int a ) requires ( a >= 0 ) { x = a }
	= 3 ;

Casts have the same syntax, but may also be extensions. Also, I'm introducing 'implicit' casts. The idea is that if you have some method mymethod( B param ) that takes a parameter of type B but you have something of type A that you want to provide to the method, you can write an implicit cast, allowing you to do the following:

int mymethod( B param ) {}
A a ;
mymethod( a ) ; // type error
A -> B { ... } // cast A to B
mymethod( a -> B ) ; // fine because we have a cast
mymethod( a ) ; // type error
implicit A -> B { ... } // implicit cast A to B
mymethod( a ) ; // fine because the cast is implicit

Note that when checking parameter types, a maximum of 1 implicit cast may be used. This means if you have a mymethod( C param ) and you have implicit A -> B {} and implicit B -> C {} then you cannot call mymethod( A ). This is to avoid issues with ambiguous type coercion and avoid needing to write a path-finding algorithm to determine implicit cast routes between types.
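
The single-step rule is simple enough to model in a few lines. The Python sketch below is an illustration of the lookup only, not how the O compiler is or will be written: IMPLICIT_CASTS and coerce are hypothetical names, and the classes A, B and C are empty placeholders.

```python
class A: pass
class B: pass
class C: pass


# Hypothetical registry: (source type, target type) -> conversion function.
IMPLICIT_CASTS = {
    (A, B): lambda a: B(),
    (B, C): lambda b: C(),
}


def coerce(value, target):
    """Return value as `target`, using at most one implicit cast."""
    if isinstance(value, target):
        return value
    cast = IMPLICIT_CASTS.get((type(value), target))
    if cast is None:
        # No search through intermediate types: A -> B -> C is never tried.
        raise TypeError(
            f"no implicit cast {type(value).__name__} -> {target.__name__}")
    return cast(value)


print(type(coerce(A(), B)).__name__)  # B
```

Here coerce(A(), C) raises TypeError even though both hops exist, because the lookup is a single dictionary access rather than a graph search.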

Lastly, aliases are being added, which are used solely for types. They meet the need outlined in my foreach example. They are very simple:

alias myint = int ;

Aliases are expanded at compile time. Something accessing a member with an alias type does not need access to the alias, as the accessor sees the member as having the type the alias represents.


COMPLEX LITERALS

Complex literals can be used to represent classes, objects, methods, and arrays directly in expressions without having to declare them distinctly. This can be helpful in a variety of situations, such as providing them as method parameters or creating temporary types in situ if they are only needed once. These are all mostly staying the same, but with some changes in a few places.

Firstly, class literals are being "dropped" in a way. Their original purpose was to provide single-use types that may be used when being passed through methods that don't require specific types, such as the output of a select or map-style function. Class literals can no longer be named, but can still contain everything any other class could.

var people = new a<person>() ;
var applicants = people.select {
	// the literal is 'class { ... }'
	new class { pub string encodedname ; pub bool canapply ; }() {
		encodedname = v"{each.lname.upper()} {each.fname.upper()}" ,
		canapply = each.age >= 18
	}
}.where( each.canapply ).select { each.encodedname } ;
assert( applicants is _enum<string> ) ;

These class literals can be used anywhere a type can. Note though that because they are anonymous, you can present yourself with serious challenges if you start returning things with these types and taking them away from where the class is made.

Method literals are similar in that they can no longer be named. Instead, they look like this:

fn int ( int a , int b ) { a + b }

The type is optional, in which case type inference is used.

Notice how the literal fairly closely matches the appearance of a type representing a method, which for the literal above would be int ( int a , int b ).

These changes will make grammar and so also parsing simpler, hopefully resulting in faster parse times. The main benefit though will be in more readable code.


FILE STRUCTURE

I'm also changing how files are structured. I'm keeping the system of brace-less class definitions, but these will only apply to files where the class is the only definition. This means that for many classes, you save a whole indent level. However, braces and an indent level will be needed for classes where there are other things present in the file such as extension methods or other classes, and also required for sub-classes. The idea however is that most files will have a single class in them and so won't need the braces or indent level.


ATTRIBUTES

When you think about the properties like requires and throws on things, you can describe them in a similar way to attributes. Both act like metadata on a certain type or member. The difference is that the properties directly add behaviour, whereas attributes are acted upon externally when the type is reflected upon. This difference aside, it makes sense that both be applied to a type in a similar manner. Here is an example attribute creation and reading in C# which we will use as a means of comparison to the design in O:

using System;
using System.Linq;

class MyCustomAttribute : Attribute
{
	public string Str;
	public MyCustomAttribute(string s)
	{
		Str = s;
	}
}

class Program
{
	[MyCustom("hello")]
	public static int x;

	static void Main()
	{
		string importantValue = ((MyCustomAttribute)typeof(Program)
			.GetMembers().Where(m => m.Name == nameof(x))
			.FirstOrDefault()
			.GetCustomAttributes(typeof(MyCustomAttribute), true)
			.FirstOrDefault()).Str;
	}
}

Ok, first off, this is absolutely horrendous, but we're not here to rant about C#. Things to notice: the class name for the attribute has Attribute on the end despite that suffix not appearing where the attribute is applied, the attribute is part of the type of Program, not the type of x, and the attribute is applied by calling the constructor in brackets as a prefix to the variable declaration.

As I haven't properly planned out the reflection libraries for O yet, consider how the attribute is being retrieved as dummy code. What we're interested in is how the attribute is defined and how it is applied:

class myattr implements ( attribute ) {
	pub string str ;
	pub this( this.str ) ;
}

epoint program {
	pub int x myattr( "hello" ) ;

	pub this() {
		string val = typeof(x).get_attrs( myattr )
			.try first() catch { "" }.str ;
	}
}

Looks pretty familiar right? It's applied in exactly the same way as throws, requires, breaks et al, because for the most part they all have the same function: metadata on a type used to add or modify behaviour. I like this consistency.

And, as all these attributes are essentially just types or constructor calls, they needn't be in any specific order, so I can build the grammar rules to allow mixing of custom and builtin attributes so you can put them in whatever order seems best.


PIPELINE

After lots of thought, I'll also be dropping the pipeline. The main reason for it is that it tried to do 2 things at once, both of which could already easily be done other ways. Firstly, a method taking a piped parameter is just a roundabout extension method. Secondly, the approach of having the right-hand side of a pipe just be an expression in terms of pipe representing the left-hand side is fixed by - well - using a variable.

Now that the pipeline is gone, it means that my lovely ultimate fizz buzz splat example is no longer valid code, so here's a new one:

import static std.io.term ;

epoint ufbs ;

int getnum( ulong num , ulong f ) requires ( f < 10 ) {
	( num % f == 0 ? 1 # 0 ) + num.digits.where( each == f ).len
}

this() {
	1 .. rdln<ulong>( "Enter a number: " , "No, a number: " ).foreach {
		var str = getnum( each , 3 ) *~ "Fizz"  ~
		          getnum( each , 5 ) *~ "Buzz"  ~
		          getnum( each , 7 ) *~ "Splat" ;
		prln( str.len ? str # each ) ;
	}
}

Not quite as beautiful as the previous one with the pipeline that was a single statement with no variables, but still looks very nice I think. Technically you could argue that the one with the pipeline had a greater use of variables if you were to expand out each pipe section into a variable assignment, which is what would be happening under the hood in the old system.
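
If the operators are unfamiliar, this Python port shows what the example is expected to compute. It rests on my reading of the O code, so treat these as assumptions: *~ repeats a string a given number of times, ~ concatenates, ? with # is the conditional expression, and digits yields the decimal digits of the number. The helper name ufbs_line is made up for the port, and the interactive prompt is replaced with fixed inputs.

```python
def getnum(num, f):
    # Precondition from the original: requires ( f < 10 ).
    assert f < 10
    divisible = 1 if num % f == 0 else 0
    # Count how many decimal digits of num are equal to f.
    return divisible + str(num).count(str(f))


def ufbs_line(n):
    s = (getnum(n, 3) * "Fizz" +
         getnum(n, 5) * "Buzz" +
         getnum(n, 7) * "Splat")
    # Fall back to the number itself when nothing matched.
    return s if s else str(n)


for n in (1, 3, 15, 35):
    print(n, ufbs_line(n))
```

So 3 gives FizzFizz (divisible by 3 and containing the digit 3), while 35 gives FizzBuzzBuzzSplat.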


ARRAYS AND INDEXING

One thing I stopped and thought about for a bit was arrays. They seem like a rather strange construct, like an odd syntactical exception. Basically all types in most languages follow the form identifier<args...>... ok, I say "basically all", I really mean OOP languages with generics, and even some of them have differences, like D's identifier!(args...), but that's more just detail where what I'm talking about is the general form of a type. However, if you want an array, you write identifier[]. No other type is defined like this. There's the odd exception like D's associative arrays (dictionary/hash table etc.) which are defined as valuetype[keytype], but D's a special outlier it seems. My point is that this array type syntax is unusual and breaks a pattern, so why not just drop it?

Of course we need a replacement though, so welcome the a<T> type (or arr<T>, not sure yet). This does exactly the same thing as a normal array because it is a normal array, just with the type syntax aligned with everything else. You can then have a collection of types on a similar principle for things like this, like dict<K , V> for a dictionary or associative array.

Indexing needs to be thrown out too. It's another similarly unusual syntactic outlier. Instead, let's replace it with a member method at() or i() or similar. The only requirement is that it suggests indexing and is short. So let's do a little comparison:

int[][] table = [ [ 1 , 2 , 3 ] ,
                  [ 4 , 5 , 6 ] ,
                  [ 7 , 8 , 9 ] ] ;
assert( table[1][2] == 6 ) ;

string[int][string] complicated = [
	"a" : [ 1 : "one"   , 2 : "two"   , 3 : "three" ] ,
	"b" : [ 4 : "four"  , 5 : "five"  , 6 : "six"   ] ,
	"c" : [ 7 : "seven" , 8 : "eight" , 9 : "nine"  ]
] ;
assert( complicated["b"][5] == "five" ) ;

Thing to note immediately: The type says [int][string], but we index with [string][int]. I cannot tell you how often I've been caught out by this. Now let's see how this stacks up to more "normal" types:

a<a<int>> table = [ [ 1 , 2 , 3 ] ,
                    [ 4 , 5 , 6 ] ,
                    [ 7 , 8 , 9 ] ] ;
assert( table.at( 1 ).at( 2 ) == 6 ) ;

dict<string , dict<int , string>> complicated = [
	"a" : [ 1 : "one"   , 2 : "two"   , 3 : "three" ] ,
	"b" : [ 4 : "four"  , 5 : "five"  , 6 : "six"   ] ,
	"c" : [ 7 : "seven" , 8 : "eight" , 9 : "nine"  ]
] ;
assert( complicated.at( "b" ).at( 5 ) == "five" ) ;

Array and dictionary literals are the same, but how we declare the type and how we index into it have changed. Notice how the type of table very obviously says "this is an array of arrays of ints" and how our at() method for indexing is no more complicated than the old [] syntax. The biggest improvement though is with the associative array. Where before the types in the declaration were the reverse of the indexing, here we can see if we index with a string we will get a dict<int , string>, which we can index with an int. The types in the declaration and the indexing align, meaning it's one fewer little gotcha to deal with. The "compromise" is that these types take up a little bit of extra room in source. Here are some other suggested methods for doing the other things we need for indexing:

array.from( 2 to 8 ) ; // [ 2 , 3 , 4 , 5 , 6 , 7 , 8 ]
array.at( 1 , 4 , 8 .. 11 , 2 ) ; // [ 1 , 4 , 8 , 9 , 10 , 11 , 2 ]

CONCLUSION

This is easily the longest post I've written for this site, and it's been something of an eye-opener to the fact that - as much as you might think you're on top of the design of something - when it's something as large as a programming language, it's very easy to get swept up in the countless details and find yourself changing large parts of the language. At this point though, I think I'm starting to get more of a grip on it, and hopefully this design will be able to get me through semantic analysis, to where we're past syntax and onto the real language behaviours.

You can see all the source code for O on Sourcehut, and if you find waffling about this stuff interesting then I do plenty of it on Mastodon.


CATEGORIES

O-lang - Programming



Copyright Oliver Ayre 2019. Site licensed under the GNU Affero General Public Licence version 3 (AGPLv3).