Last Change: 2016 Jul 01
UNDER CONSTRUCTION
This page contains both an informal explanation and a formal specification of the Zimbu programming language.
More information on the Zimbu website.
There is another specification document for Zimbu Templates (ZUT).
The notation used to specify the exact syntax can be found near the end.
A Zimbu file is always UTF-8 encoded.
The file name must end in ".zu".
For an imported file the file name up to ".zu" must match the toplevel item in the file (class, module, enum, etc.) exactly. Case matters.
Before a file is parsed the following operations are performed:
A Zimbu file usually starts with comments. Describe what is in the file and what the code is intended to do. You can also add a copyright statement and license. The Apache license is recommended.
IMPORT statements come next. They must appear before any other items, except comments. It is recommended to group imports by directory and sort them alphabetically.
The main program file must define the Main() method somewhere, in the file scope. Other items can come before and after it, in any order. There is no requirement to define an item before using it in the file scope.
The Main program file will look like this:
# Description of what this program does, arguments, etc.
# Copyright, license.

IMPORT SomeClass.zu

# Methods and other things go here.

FUNC Main() int
  # your code goes here
  RETURN exitVal
}

# More methods and other things go here.
The Main() method is the entry point to the program. It will be called after initializations are done, see the Startup Sequence section.
Command line arguments are not passed to Main(); they can be obtained with the ARG module.
The Main method returns an int, which is the exit code for the program. The convention is that zero is returned for success and a positive number for failure. Alternatively, the program ends with an EXIT statement, or when an exception is thrown and not caught.
MAINFILE is the starting point for a Zimbu program file. IMPORTFILE is the starting point for an imported file.
MAINFILE    -> skip import* file-item* main file-item* ;
IMPORTFILE  -> skip import* common-item ;
file-item   -> ( common-item | var-def | method-def ) ;
common-item -> ( module-def | class-def | interface-def | piece-def | enum-def | bits-def ) ;
main        -> "FUNC" sep "Main()" sep "int" sep-with-eol block-item+ block-end ;
An IMPORT specifies a file to include.
The imported file must define exactly one item, usually a class or module. The name of this item must match the file name up to the ".zu". Thus when defining a class "FooBar" the file name must be "FooBar.zu". This way, at every place where the file is imported, you know exactly what symbol is going to be defined by that import.
Zimbu files can import each other; the compiler takes care of cyclic dependencies.
There is no need to import builtin modules, such as IO, ARG and E. The compiler will take care of that automatically.
The name of the file to import can usually be given as-is. If the file name contains special characters, such as a space, put it inside double quotes, similar to a string literal. Always use a slash as the path separator, not a backslash.
If the symbol that the IMPORT defines conflicts with another symbol, the AS part can be used to give the imported symbol another name in this file only. The name after "AS" must follow the naming rules of what is being imported: it must start with an upper case letter and have a lower case letter somewhere.
This example uses the name OldParser for the Parser defined in the second imported file. Thus both Parser and OldParser can be used.
IMPORT Parser.zu
IMPORT old/Parser.zu AS OldParser
An IMPORT can specify a plugin to use. A Zimbu plugin converts the specified file and turns it into Zimbu code. The Zimbu code is then what gets imported.
When using a plugin the OPTIONS part can be used to pass command line arguments to the plugin. Example:
IMPORT.WavToBytes poing.wav OPTIONS "--multiline --name=poingSound"
NOTE: Custom plugins have not been implemented yet. There will be a way to configure what executable is used for each plugin name.
At the start of the filename $PLUGIN can be used. This refers to the "plugin" directory. This can be used by plugins to find files imported in the generated Zimbu file, for example:
IMPORT $PLUGIN/proto/Message.zu
Imports using a plugin should be put before other imports, so that they are easy to spot.
A builtin plugin is PROTO, it generates Zimbu code from a .proto file. This is used in the Zimbu compiler:
IMPORT.PROTO zui.proto
How the PROTO plugin works is specified elsewhere. TODO: add a link
Another builtin plugin is ZUT, which stands for Zimbu Templates and can be used to create active web pages. A page is written in HTML and CSS, mixed with Zimbu code that generates it dynamically and controllers that make it interactive. More information can be found in the separate zut.html document.
Another builtin plugin is ZWT, which stands for Zimbu Web Toolkit and can be used to build a GUI. It is specified on the ZWT page: https://sites.google.com/site/zimbuweb/documentation/zwt-a-javascript-ui
The CHEADER plugin can be used to directly include a C header file in the program. See the native code section.
Inside a test file this imports another test file. See the Running tests section.
import         -> "IMPORT" plugin? sep ( file-name | """ file-name """ | "<" file-name ">" )
                  ( import-as? import-options? | import-options? import-as? ) sep-with-eol ;
plugin         -> "." ( var-name | "PROTO" | "ZWT" | "ZUT" | "CHEADER" ) ;
import-as      -> sep "AS" sep var-name ;
import-options -> sep "OPTIONS" sep \(g ;
TODO
The module name must start with an upper case character and must be followed by at least one lower case letter.
module-def -> "MODULE" sep group-name sep-with-eol block-item* block-end ;
The Zimbu CLASS is much like a class in C++ and Java, but not exactly the same:
The class name must start with an upper case letter and must be followed by at least one lower case letter. The second character cannot be an underscore. Builtin types start with a lower case letter; this nicely creates two namespaces and allows for adding more builtin types later.
Single inheritance is done with EXTENDS. The child class inherits all members and methods from the parent class. A child object can be used where a parent object is expected.
AUGMENTS is used to add methods to the parent class. It does not result in a separate object type: the class and its parent have identical objects.
GROWS is like AUGMENTS but also allows for adding members to the parent class. These members are not visible in the parent class, but they do exist there. This matters when an object is created.
CLASS Parent
  int $nr
  FUNC $getNr() int
    RETURN $nr
  }
}

CLASS Child EXTENDS Parent
  string $name
  FUNC $getName() string
    RETURN $name
  }
}

CLASS MoreMethods AUGMENTS Child
  FUNC $getNrAndName() string
    RETURN $nr .. ": " .. $name
  }
}

CLASS BiggerParent GROWS Child
  float $fraction
  FUNC $getResult() string
    RETURN ($nr * $fraction) .. ": " .. $name
  }
}

Parent p = NEW()
p.nr = 5
IO.print(p.getNr())          # "5"

Child c = NEW()
c.nr = 5                     # member inherited from Parent
c.name = "foo"
IO.print(c.getNr())          # function inherited from Parent: "5"
IO.print(c.getName())        # "foo"
IO.print(c ISA Parent)       # "TRUE": c is a child of Parent
p = c

MoreMethods mm = c
IO.print(mm.getNrAndName())  # "5: foo"

BiggerParent bp = c
bp.fraction = 0.5
IO.print(bp.getResult())     # "2.5: foo"
GROWS can do what AUGMENTS does, but AUGMENTS clearly states that no members are added to the parent. Always use AUGMENTS when only methods are added, so that one does not need to inspect the class to see there are no members.
When a class implements an interface, it can be used as that interface. A good example is the I.Iterator interface:
CLASS ReverseListIterator<Titem> IMPLEMENTS I.Iterator<Titem>
  list<Titem> $list
  int $idx

  NEW(list<Titem> list)
    $list = list
    $idx = list.Size()
  }

  FUNC $hasNext() bool
    RETURN $idx > 0
  }

  FUNC $next() Titem
    IF $idx == 0
      THROW E.OutOfRange.NEW("No more items")
    }
    --$idx
    RETURN $list[$idx]
  }

  FUNC $peekSupported() bool @public
    RETURN TRUE
  }

  FUNC $peek() Titem @public
    IF $idx == 0
      THROW E.OutOfRange.NEW("No more items")
    }
    RETURN $list[$idx - 1]
  }
}
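The following Python sketch models the class above for experimenting with the iterator protocol outside Zimbu. It is a translation, not Zimbu code. The snake_case method names mirror $hasNext(), $next(), $peekSupported() and $peek(), and IndexError stands in for E.OutOfRange.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ReverseListIterator(Generic[T]):
    """Python model of the Zimbu ReverseListIterator above."""

    def __init__(self, items: List[T]) -> None:
        self._list = items
        self._idx = len(items)  # start just past the last item

    def has_next(self) -> bool:
        return self._idx > 0

    def next(self) -> T:
        if self._idx == 0:
            # Stands in for THROW E.OutOfRange.NEW("No more items")
            raise IndexError("No more items")
        self._idx -= 1
        return self._list[self._idx]

    def peek_supported(self) -> bool:
        return True

    def peek(self) -> T:
        if self._idx == 0:
            raise IndexError("No more items")
        # Look at the next item without moving the index.
        return self._list[self._idx - 1]


it = ReverseListIterator([1, 2, 3])
result = []
while it.has_next():
    result.append(it.next())
print(result)  # [3, 2, 1]
```

Because the index starts just past the last item, peek() reads the item at index - 1 without changing the index. By contrast, next() decrements the index first and then reads.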
When a class has an IMPLEMENTS argument, the compiler will check that the members of the interface are actually implemented by the class.
An INCLUDE block can be used to compose a CLASS from other classes and pieces.
CLASS Address
  string $street
  string $city
}

PIECE Locator
  FUNC $location() string
    RETURN $street .. ", " .. $city
  }
}

CLASS Person
  string $name
  INCLUDE
    Address $address
    Locator $locator
  }
}

Person p = NEW()
p.name = "John Doe"
p.street = "Langstraat 42"  # equivalent to setting p.address.street
p.city = "Amsterdam"
IO.print(p.location())      # prints: "Langstraat 42, Amsterdam"
Fields and methods declared in the SHARED section are shared by all objects of the class. In C++ and Java these are called "static" (which is a weird name).
Fields and methods in the SHARED section do not start with a $, which makes them easy to recognize.
The methods in the SHARED section cannot directly access members of an object or call object methods, that is, the ones that start with a $. They can access them if the object reference is passed in:
CLASS Foo
  int $nr
  SHARED
    FUNC right(Foo f) int
      RETURN f.nr
    }
    FUNC wrong() int
      RETURN $nr  # ERROR!
    }
  }
}
There can be multiple SHARED sections in a CLASS. This is convenient for keeping the shared members close to where they are used.
Calling a method that is defined in the SHARED section of a class does not require specifying the class name. This does not apply to the SHARED section of an inner class (a class defined in the scope of the class).
An object is constructed from a class by calling a NEW() method. Before the statements in NEW() are executed the object is initialized. This may involve an $Init() method defined in the class. This is explained in the section Object Initialization Sequence.
There can be several NEW() methods with different types of arguments. Which one is used is explained in the Method call section.
If a class does not define a NEW() method, then invoking NEW() on that class will create an object with all members set to their default values.
If a class defines a NEW() method that accepts a list, it is used when a list is assigned to a variable of this class. Similarly for a dict. Examples:
CLASS MyList
  list<string> $items
  NEW(list<string> l)
    $items = l
  }
}

CLASS MyDict
  dict<string, int> $lookup
  NEW(dict<string, int> d)
    $lookup = d
  }
}
...
MyList foo = ["one", "two", "three"]           # Invokes MyList NEW()
MyDict bar = ["one": 1, "two": 2, "three": 3]  # Invokes MyDict NEW()
See Object Destruction.
class-def -> "CLASS" sep group-name sep-with-eol block-item* block-end ;
TODO
The interface name must start with "I_" and must be followed by at least one lower case letter.
interface-def -> "INTERFACE" sep group-name sep-with-eol block-item* block-end ;
TODO
The piece name must start with an upper case character and must be followed by at least one lower case letter.
piece-def     -> "PIECE" sep group-name sep-with-eol block-item* block-end ;
include-block -> "INCLUDE" sep-with-eol block-item* block-end ;
An enum is a value type where the value is one out of the list of possible values. The implementation uses a number, thus enums are very efficient. In most places the values are referred to by name.
Example:
ENUM Color
  black
  white
  red
  green
}

Color one = Color.green
IO.print("Color: \(one)")  # invokes one.ToString()
The enum name must start with an upper case character and must be followed by at least one lower case letter.
The enum values must start with a lower case character and can be followed by letters, numbers and an underscore (not two together).
Like with classes, enums can be extended:
ENUM MoreColors EXTENDS Color
  purple
  orange
}

MoreColors color = Color.green
IO.print("Color: \(color)")
color = MoreColors.purple
IO.print("Color: \(color)")
As the example shows, the child enum can use values from the parent enum.
Using the ToString() method on an enum value returns the name as specified in the enum declaration as a string, as illustrated in the above example. To do the opposite use FromString() on the Enum, see below.
Using the value() method on an enum value returns the integer that is used for that value. The values are numbered sequentially, starting with zero. Note that this means that the values change whenever the list of values is changed. Also note that when an enum is extended there is no guarantee in which order the parent and children are numbered. Each one will be numbered sequentially.
Using the FromString(name) method on the enum returns the associated enum value. The name must match exactly. When the name does not have a match, the first value is returned. Example:
Color red = Color.FromString("red")
Color fail = Color.FromString("xxx")  # will return Color.black
Using FromStringOrThrow(name) is similar to FromString(name), but when the name does not have a match then it throws E.BadValue. Example:
Color c
string name = "xxx"
TRY
  c = Color.FromStringOrThrow(name)
CATCH E.BadValue e
  IO.print("There is no Color called " .. name)
}
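The lookup rules above can be summarized in a small Python model. ZimbuStyleEnum and its methods are hypothetical names chosen for this sketch. Enum values are represented by their value() integer, ValueError stands in for E.BadValue, and extended enums (EXTENDS) are not modeled.

```python
from typing import List


class ZimbuStyleEnum:
    """Models ToString(), value(), FromString() and FromStringOrThrow()."""

    def __init__(self, names: List[str]) -> None:
        self.names = list(names)  # declaration order defines value()

    def value(self, name: str) -> int:
        # Values are numbered sequentially from zero, in declaration order.
        return self.names.index(name)

    def to_string(self, value: int) -> str:
        return self.names[value]

    def from_string(self, name: str) -> int:
        # Exact, case-sensitive match; no match falls back to the first value.
        return self.names.index(name) if name in self.names else 0

    def from_string_or_throw(self, name: str) -> int:
        if name not in self.names:
            # Stands in for throwing E.BadValue.
            raise ValueError("There is no value called " + name)
        return self.names.index(name)


color = ZimbuStyleEnum(["black", "white", "red", "green"])
print(color.to_string(color.from_string("red")))  # red
print(color.to_string(color.from_string("xxx")))  # black
```

The silent fallback in from_string() is why FromStringOrThrow() exists. A typo in the name would otherwise quietly produce the first value.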
The Type() method returns a type describing the Enum.
Not implemented yet: Define a method in the ENUM.
enum-def -> "ENUM" sep group-name sep-with-eol (( var-name sep )* var-name sep-with-eol )? block-end ;
BITS is a value where each bit or group of bits is given a specific meaning. It is an efficient way to store several flags and small values in one value, with the convenience of accessing them as if they were individual items.
BITS is a value type, thus a copy is made when it is passed around. It does not need to be allocated, which makes it as efficient as an int.
A BITS is often used to pass settings to a chain of functions, and allows the fields to be changed when it is passed to another function. For example, to specify how a file is to be written:
BITS WriteFlags
  bool    :create         # create file when it does not exist
  bool    :overwrite      # truncate an existing file
  OnError :errorHandling  # what to do on an error
  nat4    :threads        # number of threads to be used for writing
}

FUNC write(string name) status
  RETURN write(name, :create + :overwrite + :errorHandling=return)
}

FUNC writeFast(string name, WriteFlags flags) status
  RETURN write(name, flags + :threads=8)
}

FUNC write(string name, WriteFlags flags) status
  ...
The field names are prefixed with a colon, just like member names of a class are prefixed with a dollar. When using the field name with the BITS type, a dot is used, just like for a member of a class. However, when the BITS type is inferred, the colon must be put before the field name. This makes field names easy to recognize: they look different from a variable name.
These types are supported in a BITS:
The current limitation is that up to 64 bits can be used.
The assignment to a BITS variable is just like an assignment to any other value type variable. The expression on the right must evaluate to the correct BITS type. See below for what expression can be used for this.
There is one special value: Assigning zero to a BITS type resets all the fields to their default value (FALSE, zero).
When assigning a BITS field to another type of variable, the value of the field is used. Note the difference:
BITS MyBits
  bool :enabled
}
MyBits mine = MyBits.enabled  # Results in a MyBits with "enabled" TRUE.
bool error = MyBits.enabled   # ERROR: Cannot assign a BITS field to a bool
bool enabled = mine.enabled   # gets the "enabled" field out of "mine"
The value of an individual field is assigned with the equal sign and followed by the value, without any white space. Examples:
:create=TRUE
:errorHandling=return
:threads=3
The values of fields are combined with the plus sign, which must be surrounded by white space. Example:
:create=TRUE + :errorHandling=no + :threads=3
The plus operator can be used to set fields to a value, using a field value as specified above. Example:
WriteFlags wf1 = :create + :threads=2
WriteFlags wf2 = wf1 + :threads=4 # assign 4 to wf2.threads
The value of individual fields can be accessed like object members: variable-name dot field-name.
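The Python sketch below gives a rough picture of a BITS value. It packs the WriteFlags fields into one integer and models the plus operator as a series of field assignments, each returning a new value. The bit offsets and widths are assumptions made for illustration, including a 2-bit errorHandling field. This document does not specify the layout the Zimbu compiler chooses.

```python
from typing import Dict, Tuple

# Hypothetical layout for the WriteFlags example: field -> (bit offset, width).
LAYOUT: Dict[str, Tuple[int, int]] = {
    "create": (0, 1),
    "overwrite": (1, 1),
    "errorHandling": (2, 2),  # assumes OnError has at most 4 values
    "threads": (4, 4),        # nat4: 0..15
}
assert sum(width for _, width in LAYOUT.values()) <= 64  # the BITS limit


def get_field(bits: int, name: str) -> int:
    offset, width = LAYOUT[name]
    return (bits >> offset) & ((1 << width) - 1)


def set_field(bits: int, name: str, value: int) -> int:
    """Return a copy of bits with one field replaced (BITS is a value type)."""
    offset, width = LAYOUT[name]
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {name}")
    mask = ((1 << width) - 1) << offset
    return (bits & ~mask) | (value << offset)


def plus(bits: int, **fields: int) -> int:
    """Model of `bits + :name=value + ...`: apply each field assignment."""
    for name, value in fields.items():
        bits = set_field(bits, name, value)
    return bits


# A bare :create is modeled as create=1 (TRUE).
wf1 = plus(0, create=1, threads=2)  # WriteFlags wf1 = :create + :threads=2
wf2 = plus(wf1, threads=4)          # WriteFlags wf2 = wf1 + :threads=4
print(get_field(wf1, "threads"), get_field(wf2, "threads"))  # 2 4
```

Starting from 0 gives every field its default value, which matches assigning zero to a BITS variable.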
The standard method ToString() returns a string representation of the BITS. NOT IMPLEMENTED YET, currently returns the int value.
Methods can be defined inside the BITS. This mostly works like methods defined in a CLASS. NOT IMPLEMENTED YET.
bits-def -> "BITS" sep group-name sep-with-eol block-item* block-end ;
The bits name must start with an upper case character and must be followed by at least one lower case letter.
A method name used with PROC or FUNC must start with a lower case character, unless it is a predefined method.
A method can be defined multiple times with the same name if the arguments are different. When there are optional arguments the arguments before them must be different. When the last argument has "..." (varargs) then the arguments before it must be different. In short: the non-optional arguments must be different.
What is considered to be different arguments depends on the rules for automatic conversion. This is explained in the Method call section.
It is recommended to use the same name for methods that do almost the same thing. When the intention of the functions is different it's better to use a different name than just using different arguments. For example, if there is a method "append(int x)" there should not be a method "append(int x, bool insert)", which inserts instead of appends.
NEW() is used to create an object from a class. It is the object constructor. This is not a normal method: it does not contain a RETURN statement, but the caller gets the newly created object as if it was returned.
See Constructor.
A procedure is declared like this:
PROC write(string text)
  fd.write(text)
}
A procedure can also be defined in an expression. In that case the name is omitted:
proc<int> callback = PROC (int result)
  IO.print("Received: " .. result)
}
Only use this for short methods; for longer ones it's better to define them elsewhere. When the argument types can be figured out from the context, it is possible to use a Lambda expression or method, see the sections below.
A function is just like a procedure, but additionally returns a value. The type of the return value goes after the arguments:
FUNC write(string text) status
  RETURN fd.write(text)
}
The RETURN statement with an expression of the specified return type is the only way a FUNC may end.
In a class an object method can use the return type THIS. This means the class type is used.
CLASS Base
  FUNC $next() THIS  # return type is Base
    RETURN $nextItem
  }
}

CLASS Child EXTENDS Base
  # $next() is inherited from Base, but here the return type is Child
  FUNC $prev() THIS  # return type is Child
    RETURN $prevItem
  }
}
Multiple values can be returned at once. The types are listed separated with a comma. And the RETURN statement has a comma separated list of expressions. Example:
FUNC $read() string, status
  IF $closed
    RETURN "", FAIL
  }
  RETURN $fd.read(), OK
}
It is recommended to add a comment about what is returned, especially if this is not obvious:
FUNC minMax() int /* minimum */, int /* maximum */
  ...
  RETURN min, max
}
To use only one of the returned values add a subscript:
FUNC tryIt() int, string
  RETURN 33, "yes"
}
...
IO.print(tryIt()[0])  # prints "33"
IO.print(tryIt()[1])  # prints "yes"
Do not return more than a few values, otherwise it may be difficult to understand what the code is doing.
A function can also be defined in an expression. In that case the name is omitted:
func<int => int> nextNr = FUNC (int increment) int
  counter += increment
  RETURN counter
}
Only use this for short methods; for longer ones it's better to define them elsewhere. When the argument and return types can be figured out from the context, it is possible to use a Lambda expression or method, see the next sections.
This is shorthand for defining a PROC or a FUNC that only evaluates one expression. Lambda functions are especially useful for the map() and keyMap() methods of containers:
intDict.map({ v => v + 3 })                     # add 3 to every item
stringDict.keyMap({ k, v => k .. ": " .. v })  # every item becomes "key: value"
The types of the arguments and the return type are inferred from the context. Therefore the context must provide these types. Illustration:
VAR callback = { a, b => a * b }                    # ERROR: types can't be inferred.
func<int, int => int> callback = { a, b => a * b }  # OK
Before the => is the comma separated list of arguments. This is like in a method declaration, but without types. If there are no arguments use white space.
After the => goes a single expression. For a FUNC this is what is returned. For a PROC it must have a side effect to be useful.
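The two lambda functions above correspond roughly to the Python helpers below. map_values() and key_map() are hypothetical names. The sketch assumes a new dict is returned, because this section does not say whether map() and keyMap() change the container in place.

```python
from typing import Callable, Dict, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def map_values(d: Dict[K, V], f: Callable[[V], R]) -> Dict[K, R]:
    """Like intDict.map({ v => v + 3 }): f only sees the value."""
    return {k: f(v) for k, v in d.items()}


def key_map(d: Dict[K, V], f: Callable[[K, V], R]) -> Dict[K, R]:
    """Like stringDict.keyMap({ k, v => ... }): f sees the key and the value."""
    return {k: f(k, v) for k, v in d.items()}


int_dict = {"a": 1, "b": 2}
string_dict = {"x": "one", "y": "two"}
print(map_values(int_dict, lambda v: v + 3))            # {'a': 4, 'b': 5}
print(key_map(string_dict, lambda k, v: k + ": " + v))  # {'x': 'x: one', 'y': 'y: two'}
```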
This is shorthand for defining a nameless PROC or FUNC. Lambda methods that consist of a few statements are especially useful for the map() and keyMap() methods of containers:
intDict.map(LAMBDA (v); count(); RETURN v + 3; })  # add 3 to every item
stringDict.keyMap(LAMBDA (k, v)
  IO.print("processing " .. k)
  RETURN k .. ": " .. v  # every item becomes "key: value"
})
The types of the arguments and the return type are inferred from the context. Therefore the context must provide these types. Illustration:
VAR callback = LAMBDA (a, b); RETURN a * b; }                    # ERROR: types can't be inferred.
func<int, int => int> callback = LAMBDA (a, b); RETURN a * b; }  # OK
Inside the parentheses after LAMBDA is the list of arguments. This is like in a method declaration, but without types. If there are no arguments use "()".
The statements can either be on a separate line, or separated with a semicolon.
Arguments can be declared to have a default value. In that case the argument can be omitted and the default value will be used.
When an argument has a default value, all following arguments must have a default value.
PROC foo(int x, int y = 0, int z = 0)
  IO.print("x:\(x) y:\(y) z:\(z)")
}
foo(3)         # prints "x:3 y:0 z:0"
foo(3, 7)      # prints "x:3 y:7 z:0"
foo(3, 7, 11)  # prints "x:3 y:7 z:11"
The last argument may have "..." between the type and the name. This means this argument can be present zero or more times in the call.
Example:
FUNC add(int ... numbers) int
  int result
  FOR nr IN numbers.values
    result += nr
  }
  RETURN result
}
IO.print(add(1, 2, 3))  # prints 6
IO.print(add())         # prints 0
When using the argument in the method the type is a tuple with two arrays: tuple<array<string> names, array<arg-type> values>. This tuple and the arrays cannot be changed.
A short name for the tuple is varargs<arg-type>.
Example:
PROC show(int ... numbers)
  FOR idx IN 0 UNTIL numbers.values.Size()
    IO.print("\(numbers.names[idx]) is \(numbers.values[idx])")
  }
}
show(one = 1, five = 5)  # prints "one is 1", "five is 5"
To pass the varargs to another method, or to pass a tuple as the varargs argument, pass it by name. Example using the show() function from above:
varargs<int> tup = [["a", "b"], [3, 9]]
show(numbers = tup)
Note that "numbers" is the name of the varargs argument.
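The varargs tuple can be modeled in Python as a pair of immutable tuples, names and values. In this sketch, pack_varargs() is a hypothetical helper standing in for what happens at the call site. Giving unnamed arguments an empty name is an assumption, since the text above only shows names for arguments passed as name = value.

```python
from typing import List, Tuple

VarArgs = Tuple[Tuple[str, ...], Tuple[int, ...]]  # (names, values), both immutable


def pack_varargs(*positional: int, **named: int) -> VarArgs:
    # Assumption: positional arguments get an empty name.
    names = ("",) * len(positional) + tuple(named)
    values = positional + tuple(named.values())
    return names, values


def show(numbers: VarArgs) -> List[str]:
    names, values = numbers
    return [f"{names[i]} is {values[i]}" for i in range(len(values))]


def add(numbers: VarArgs) -> int:
    return sum(numbers[1])


print(show(pack_varargs(one=1, five=5)))  # ['one is 1', 'five is 5']
print(add(pack_varargs(1, 2, 3)))         # 6
print(add(pack_varargs()))                # 0
print(show((("a", "b"), (3, 9))))         # like show(numbers = tup)
```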
A function cannot have both optional arguments and varargs.
A method can pick up variables from its context. The method is then called a closure.
Let's start with an example for USE by value:
string m = "one"
PROC display(USE m)
  IO.print(m)
}
display()  # displays "one"
m = "two"
display()  # displays "one"
proc<> p = display
m = "three"
p()        # displays "one"
You can see that the value of "m" is taken at the moment when the PROC is defined. Changing "m" later has no effect.
To use the changed value of "m" it has to be a USE by reference:
string m = "one"
PROC display(USE &m)
  IO.print(m)
}
display()  # displays "one"
m = "two"
display()  # displays "two"
proc<> p = display
m = "three"
p()        # displays "three"
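Python closures always capture variables by reference, so both forms of USE can be imitated in Python. A default argument, evaluated when the function is defined, behaves like USE m. A plain closure behaves like USE &m. This analogy is only for checking the expected output; it does not describe how Zimbu implements closures.

```python
from typing import List


def capture_demo() -> List[str]:
    shown: List[str] = []
    m = "one"

    # Like USE m (by value): the default is evaluated once, when the
    # function is defined, so it keeps the value "one".
    def display_by_value(value: str = m) -> None:
        shown.append(value)

    # Like USE &m (by reference): the closure reads m each time it is called.
    def display_by_reference() -> None:
        shown.append(m)

    display_by_value()      # "one"
    display_by_reference()  # "one"
    m = "two"
    display_by_value()      # still "one"
    display_by_reference()  # "two"
    m = "three"
    display_by_value()      # still "one"
    display_by_reference()  # "three"
    return shown


print(capture_demo())  # ['one', 'one', 'one', 'two', 'one', 'three']
```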
If the variable is not a simple name, it must be given one with AS:
CLASS Foo
  SHARED
    string foo = "foo"
    string bar = " bar"
  }
}
Foo.foo = "two"
PROC display(USE Foo.foo AS f, Foo.bar AS b)
  IO.print(f .. b)
}
display()  # displays "two bar"
The USE keyword must come after the normal arguments. There must be a space before and after USE. When there are no normal arguments, only the space after USE is required. There is no comma before USE.
A more useful example (translated from the Python example on Wikipedia):
FUNC getCounter() proc<int>
  int x
  PROC increment(int y USE &x)
    x += y
    IO.print("total: " .. x)
  }
  RETURN increment
}

VAR increment1 = getCounter()
VAR increment2 = getCounter()
increment1(1)  # prints "total: 1"
increment1(7)  # prints "total: 8"
increment2(1)  # prints "total: 1"
increment1(1)  # prints "total: 9"
increment2(1)  # prints "total: 2"
What happens here is that the variable "x" in getCounter() is referenced by the callback stored in increment1, even though the function itself has returned and the scope no longer exists. Zimbu recognizes this situation and puts "x" into allocated memory. This happens every time getCounter() is called, thus increment1 and increment2 each have their own instance of "x".
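The Python version below follows the spirit of the Wikipedia example. Here nonlocal plays the role of USE &x. Python also keeps x alive in a closure cell after get_counter() returns, much like Zimbu moves "x" to allocated memory. This version returns the running total instead of printing it, so the results are easy to check.

```python
from typing import Callable


def get_counter() -> Callable[[int], int]:
    x = 0  # each call of get_counter() creates a fresh x

    def increment(y: int) -> int:
        nonlocal x  # like USE &x: update the captured variable itself
        x += y
        return x

    return increment


increment1 = get_counter()
increment2 = get_counter()
print(increment1(1))  # 1
print(increment1(7))  # 8
print(increment2(1))  # 1
print(increment1(1))  # 9
print(increment2(1))  # 2
```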
The USE arguments can also be used with lambda functions. Here is an example with a lambda function and a thread:
string m = "world"
pipe<string> sp = Z.evalThread<string>.NEW().eval({ USE m => "hello " .. m })
IO.print(sp.read())
Method names starting with an upper case letter are reserved for predefined methods. You can define these methods in your class or module. They must behave as specified, have the specified arguments and return type.
FUNC Main() int
Main() is the program entrance point. It can only appear at the toplevel of the main program file. Also see File Level.
FUNC Init() status
Used in a module or shared section of a class. Invoked during the startup sequence. Not to be confused with $Init(), see below.
FUNC EarlyInit() status
Used in a module or shared section of a class. Invoked during the startup sequence.
FUNC $ToString() string
Returns a string representation of the object. If a class does not define a ToString method, one is generated that lists the value of every member, using curly braces, similar to an initializer for the object.
CLASS NoToString
  int $value
  string $name
}
NoToString nts = NEW()
nts.value = 555
nts.name = "foobar"
IO.print(nts.ToString())  # result: {value: 555, name: "foobar"}
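The output format of the generated method can be imitated in Python with a dataclass. generated_to_string() is a hypothetical helper that only handles int and string members and does not escape quotes inside strings. This section does not cover how the generated ToString() formats other member types.

```python
from dataclasses import dataclass, fields


def generated_to_string(obj: object) -> str:
    """Sketch of the generated ToString(): {member: value, ...} in declaration order."""
    parts = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        # Strings are quoted; this sketch only handles str and int members.
        text = f'"{value}"' if isinstance(value, str) else str(value)
        parts.append(f"{f.name}: {text}")
    return "{" + ", ".join(parts) + "}"


@dataclass
class NoToString:
    value: int = 0
    name: str = ""


nts = NoToString()
nts.value = 555
nts.name = "foobar"
print(generated_to_string(nts))  # {value: 555, name: "foobar"}
```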
FUNC $Type() type
Returns a type object, which contains information about the type. Especially useful to find out what a "dyn" variable contains.
FUNC $Size() int
Returns the number of items. For a primitive type (int, nat, float, etc.) this can be the number of bytes. For a string it is the number of characters, for a byteString it is the number of bytes.
FUNC $Equal(Titem other) bool
Makes it possible to compare the value of two objects. It must return TRUE when the value of the object is equal to "other".
This does not necessarily mean all members of the object have the same value. For example, cached results of computations can be ignored.
Defining the $Equal() method on an object makes it possible to use the "==" and "!=" operators.
FUNC $Compare(Titem other) int
Must return zero when the object value is equal to "other", smaller than zero when the object value is smaller than "other", and larger than zero when the object value is larger than "other".
If the relevant value of the object is "int $value", it can be implemented like this:
FUNC $Compare(Titem other) int
  RETURN $value - other.value
}
Defining the $Compare() method on an object makes it possible to use the ">", ">=", "<" and "<=" operators.
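The subtraction in the $Compare() example above is compact, but it can give the wrong sign when the difference between the two values does not fit in an int. The Python sketch below simulates this. It assumes int is a 64-bit two's-complement integer that wraps on overflow; Python's own integers never overflow, hence the wrap_int64() helper. Comparing with < and > and returning -1, 0 or 1 avoids the problem.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def wrap_int64(n: int) -> int:
    """Reduce n to a signed 64-bit value, the way two's-complement hardware wraps."""
    return (n + 2**63) % 2**64 - 2**63


def compare_by_subtraction(a: int, b: int) -> int:
    return wrap_int64(a - b)


def compare_safely(a: int, b: int) -> int:
    # Compare instead of subtracting: no intermediate value can overflow.
    if a < b:
        return -1
    if a > b:
        return 1
    return 0


print(compare_by_subtraction(3, 5))           # -2: correct sign
print(compare_by_subtraction(INT64_MAX, -1))  # negative: wrong sign after wrap-around
print(compare_safely(INT64_MAX, -1))          # 1
```

When the values are known to stay small, the subtraction is fine. Otherwise, the explicit comparison is the safer default.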
FUNC $Init() status
Used for initializing an object. See Object Initialization Sequence.
FUNC $Finish() status
FUNC $Finish(Z.FinishReason fr) status
Used when an object is about to be destroyed.
When the Z.FinishReason is unused or called and the method returns OK, it will not be called again. When it returns FAIL, it will be called again the next time the object is about to be destroyed.
When the Z.FinishReason is leave or exit, Finish() is only called once. The return value is ignored.
See Object Destruction.
TODO: lambda method
TODO: PROC and FUNC without a name, used in an expression
method-def    -> func-def | proc-def | new-def ;
func-def      -> "FUNC" sep var-name method-args sep type method-common ;
proc-def      -> "PROC" sep var-name method-args method-common ;
new-def       -> "NEW" method-args method-common ;
method-args   -> "(" sep-with-eol? arg-defs? ")" ;
arg-defs      -> arg-def ( "," sep arg-def )* skip ;
arg-def       -> type sep "&"? var-name ;
arguments     -> "&"? expr ( "," sep "&"? expr )* ;
method-common -> sep-with-eol block-item* block-end ;
Variables can be declared in these scopes:
It is not allowed to declare a variable with the same name as a method.
It is not allowed to declare a variable with the same name as another variable that can be used at that point.
Variables can be declared with these statements:
Type varName         # simple variable declaration
Type varName = expr  # simple variable declaration with initialization
Type var1, var2      # declare multiple variables of the same type
Note that when declaring multiple variables it is not possible to initialize any of them.
In a variable declaration VAR can be used instead of the type. The type will then be inferred from the first assignment. If the variable has an initializer, that is the first assignment.
VAR s = "string"  # type inferred from initializer
VAR n
n = 15 * 20       # int type inferred from first assignment
n = "string"      # ERROR, n is an int
Note that VAR and the dyn type are very different. A VAR variable gets its type at compile time: the compiler infers it from how the variable is used, and the type cannot change afterwards. A variable of the dyn type can store any type of value.
dyn s = "string"  # type of s is a string
s = 15 * 20       # type of s is now an int
s = "string"      # type of s is a string again
When the initialization value is a constant or a computation of constants, and the value does not fit in the variable, the compiler produces an error. When the initializer is a non-constant expression, this check is not done.
A variable declared inside a method normally only exists while executing the method, it is located on the stack. To have a variable exist forever, prepend STATIC. The variable will then be located in static memory.
PROC printStartTime()
  STATIC int startTime
  IF startTime == 0
    startTime = TIME.current()
  }
  IO.print(TIME.Values.NEW(startTime).ToString())
}
printStartTime()  # prints the current time
TIME.sleepSec(3)
printStartTime()  # prints the same time again
Variables declared with STATIC are shared by all calls to the method. Only one variable exists, no matter how often the method is called. Still, the variable can only be accessed inside the method, it is not visible outside the method.
The static variable can be initialized. The expression must evaluate to a constant.
There is no thread safety: when the method is called from several threads, they all share the same variable.
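A Python function attribute gives a rough equivalent of a STATIC local variable: one value shared by every call. The sketch below mirrors printStartTime(). get_start_time() and its now parameter are hypothetical; the parameter lets a test replace the clock. Unlike a Zimbu STATIC variable, the attribute can also be reached from outside the function.

```python
import time
from typing import Callable


def get_start_time(now: Callable[[], float] = time.time) -> float:
    """Stand-in for printStartTime(): the first call records the time,
    later calls return the recorded value.

    The function attribute plays the role of STATIC int startTime: one
    variable shared by every call. Like the Zimbu version, the
    check-then-set below is not thread safe.
    """
    if get_start_time.start_time == 0:
        get_start_time.start_time = now()
    return get_start_time.start_time


get_start_time.start_time = 0  # default value, like an uninitialized STATIC int

first = get_start_time(lambda: 100.0)   # records 100.0
second = get_start_time(lambda: 200.0)  # clock moved on, stored value did not
print(first, second)  # 100.0 100.0
```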
[STATIC] type name [attribute-list] [= expression]
[STATIC] | Optional. Places the variable in static memory, see above. |
type | The type name, such as "int", "string" or "MyClass". VAR can also be used here. |
name | The variable name, e.g., "$foo" or "foo". |
attribute-list | Optional attributes, such as @public. |
= expression | Optional initializer. When the type is VAR, it also determines the variable type. |
Examples:
int i
string hello = "Hello"
VAR ref @public
STATIC int startTime
var-def  -> type sep var-decl ( skip "," sep var-decl )* line-sep ;
var-decl -> var-name attribute* var-init? ;
var-init -> sep "=" sep expr ;
In a class, not in the SHARED section, all variable names start with a dollar and then a lower case letter. Example:
CLASS Foo
  string $name
}
Everywhere else the variable names start with a lower case letter. Example:
PROC foo()
  string name
}
TODO
The @local attribute can be used on members and methods of a class and a piece. The effect is that the declaration is local to the scope where it is defined. It is not visible in child classes, interfaces and, for a piece, the class where it is included.
For example, this piece keeps $done and $maxVal local. A class that includes this piece may define $done and $maxVal without causing a conflict.
PIECE Max
  bool $done @local
  int $maxVal @local = T.int.min
  FUNC $max() int
    IF !$done
      $done = TRUE
      FOR n IN $Iterator()
        IF n > $maxVal
          $maxVal = n
        }
      }
    }
    RETURN $maxVal
  }
}
TODO
TODO: this section is incomplete
The default visibility is the directory where the item is defined and subdirectories thereof. This implies that code can be organized in a directory tree without worrying about visibility too much.
top-directory | can access items in top-directory |
sub-directory | can access items in top- and sub-directory |
sub-sub-directory | can access items in top-, sub- and sub-sub-directory |
These are attributes that can be added to specify the visibility:
@private | only the current class, not in a child class |
@protected | only the current class and child classes |
@local | only the current directory, not subdirectories |
@file | only the current file |
@directory | only the current directory and subdirectories |
@public | everywhere |
For example, to make a class member only visible in the class itself:
int $count @private
Attributes that can be prepended to the above:
@read= | only for read access |
@items= | applies to all members |
For example, to make all members of a module public:
MODULE Parse @items=public
To make a class member writable only in the class itself, and readable everywhere:
int $count @private @read=public
Although Zimbu does not follow the "everything is an object" concept, you can use every type as if it were an object. For example, you can invoke a method on a value type:
bool nice
IO.print(nice.ToString())
IO.print(1234.toHex())
Value types, such as int and bool, are passed around by value. Every time a value is passed as an argument to a method or assigned to another variable, a copy is made. Changing the original value leaves the copy unchanged.
int aa = 3      # |aa| is assigned the value 3
someMethod(aa)  # |aa| is still 3, no matter what someMethod() does.
int bb = aa     # the value of |aa| is copied to |bb|
bb = 8          # |aa| is still 3, changing |bb| has no effect on that.
Value types always have a valid value; there is no "undefined" state. There is a default value, but you cannot tell whether a variable holds it because of an assignment or because it was never assigned.
See below for the list of builtin value types.
BITS is a special kind of value type. It contains several small fields, like a class. But it is passed by value, unlike objects.
Reference types, such as string, list and objects, are passed around by reference. When two variables reference the same item, changing one also changes the other.
list<string> aa = ["one"]
someMethod(aa)        # |aa| may have been changed by someMethod()
list<string> bb = aa  # |bb| refers to the same list as |aa|
bb.add("two")         # |aa| is now ["one", "two"], as is |bb|
However, the reference itself is a value that is copied. Example:
list<string> aa = ["one"]
list<string> bb = aa
bb = ["two"]  # |aa| is unchanged
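Python happens to behave the same way for the cases shown above, so it can serve as a quick mental model: an int behaves like a Zimbu value type, and a list like a Zimbu reference type. This is a Python analogy, not Zimbu code.

```python
# Value-like behavior: rebinding bb never affects aa.
aa = 3
bb = aa
bb = 8

# Reference-like behavior: both names refer to the same list object.
names = ["one"]
alias = names
alias.append("two")   # the change is visible through names as well

# The reference itself is copied, so rebinding it leaves names unchanged.
other = names
other = ["three"]
```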
The default value for all reference types is NIL. That means it refers to nothing. Trying to use the NIL value usually leads to an E.NilAccess exception. You usually call NEW() to create an instance of a reference type.
See below for the list of builtin reference types: string types, container types and other types.
The special value THIS is a reference for the current object. It can only be used in object methods (the ones that start with a $).
THIS can also be used as the return type of an object method. It means the type of the class is used. If the class is extended and the child class does not replace the method, then the type of the child class is used for THIS. Thus in the child class the return type differs from the one in the parent class. This is especially useful in functions that return THIS. There is an example in the FUNC part of the Method Declaration section.
Any variable, including a value typed variable, can be referred to with the "&" operator. This results in a reference to the variable, and whatever receives it must be declared as a reference.
int aa = 4
someMethod(&aa)  # |aa| may have been changed by someMethod()
Use this with care; it can be confusing, especially when referencing a variable of a reference type. You do not need it to return more than one value from a function, because a function can return multiple values directly. It is useful for passing a variable that is used both for input and output, e.g. a counter.
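Python cannot take a reference to a local variable, so this sketch models `&` with a small mutable holder for the input-and-output counter case mentioned above. `IntRef` and `count_words` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class IntRef:
    # Stand-in for "&total": a handle through which the callee can
    # read and write the caller's int variable.
    value: int

def count_words(text: str, counter: IntRef) -> None:
    # Reads the current value (input) and stores the new one (output).
    counter.value += len(text.split())

total = IntRef(4)
count_words("one two three", total)
```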
There are three method reference types:
proc | reference to a PROC |
func | reference to a FUNC |
callback | reference to a PROC or FUNC with extra arguments |
It also matters whether the method is to be used with an object or not. When it is not, the reference can still refer to an object method: the object is then stored in the reference, which works like a callback.
Type declaration examples:
proc<string>      # A reference to a PROC taking one string argument.
proc<>            # A reference to a PROC without arguments.
func<int => int>  # A reference to a FUNC taking one int argument and returning an int.
func< => string>  # A reference to a FUNC without arguments and returning a string.
Note the use of "=>" between arguments and the return type of a FUNC. You can pronounce "=>" as "gives". There is always a space before and after the "=>".
To use a method reference, simply put the variable name in place of where the method name would go. Continuing the example above:
proc<int> p = addFive
p(20)  # prints 25
You can think of these method references as a pointer to the method. However, it can in fact be a callback, where the reference holds the object and additional arguments. This does not matter to the caller, only to where the reference is created. In this example the object is stored:
CLASS MyClass
  int $count
  PROC $add(int n)
    $count += n
  }
}

MyClass obj = NEW()
proc<int> add = obj.add
add(7)
IO.print(obj.count)  # prints "7"
Compare this to the example below that passes the object when calling the method.
This is similar to method references without an object, but the name of the class is prepended:
MyClass.proc<string>      # A reference to a PROC taking one string argument.
MyClass.proc<>            # A reference to a PROC without arguments.
MyClass.func<int => int>  # A reference to a FUNC taking one int argument and returning an int.
MyClass.func< => string>  # A reference to a FUNC without arguments and returning a string.
To use the method reference, put it in parentheses in place of where the method name would go:
CLASS MyClass
  int $count
  PROC $add(int n)
    $count += n
  }
}

MyClass.proc<int> add = MyClass.add
MyClass obj = NEW()
obj.(add)(7)
IO.print(obj.count)  # prints "7"
An object method reference needs to be called using an object. The object is *not* stored with the reference, even though it is possible to obtain the reference using an object. This is useful especially for objects with inheritance, where the method to be called depends on the class of the object.
CLASS ParentClass
  int $count
  PROC $add(int n) @default
    $count += n
  }
}
CLASS ChildClass EXTENDS ParentClass
  PROC $add(int n) @replace
    $count += n + 2
  }
}

ChildClass child = NEW()
ParentClass.proc<int> add = child.add  # stores ChildClass.add()
ParentClass parent = NEW()
parent.(add)(7)
IO.print(parent.count)  # prints "9"
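In Python the same effect can be modeled with a plain function taken from the child's class: the method is selected from the object's class, but no object is stored, and the object is supplied at call time. This is a Python analogy of the Zimbu example above, not Zimbu semantics in every detail (Python performs no type check on the object passed in).

```python
class ParentClass:
    def __init__(self):
        self.count = 0

    def add(self, n):
        self.count += n

class ChildClass(ParentClass):
    def add(self, n):  # plays the role of the @replace method
        self.count += n + 2

child = ChildClass()
add = type(child).add  # like "child.add": picks ChildClass.add, stores no object
parent = ParentClass()
add(parent, 7)         # like "parent.(add)(7)"
```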
Type declaration examples:
callback<proc<int>, int> # A reference to a PROC with two int arguments, one of which is stored in the callback.
Calling a method using the reference is just like a method call:
func<int => string> f = { n => "number " .. n }
IO.print(f(3))  # output: number 3
A callback has two method type specifications:
Example:
PROC add(int val, int inc)
  IO.print(val + inc)
}
callback<proc<int>, int> addFive = NEW(add, 5)
callback<proc<int>, int> addEight = NEW(add, 8)
addFive(10)   # prints 15
addEight(10)  # prints 18
Once a callback is created, it can be passed around as if it were a reference to the inner method. That the callback stores the extra arguments is transparent: it has the type of the inner method. The arguments stored inside the callback only come into play when the callback is invoked.
Note that the extra arguments of the outer method always come after the arguments of the inner method. There is no way to change that.
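A minimal Python sketch of the callback behavior described above, assuming a hypothetical helper `make_callback`. It returns a function with the inner signature and appends the stored arguments after the caller's arguments. The sketch returns the sum instead of printing it.

```python
def make_callback(outer, *stored):
    # The result is called with the inner method's arguments only;
    # the stored extra arguments are appended after them.
    def inner(*args):
        return outer(*args, *stored)
    return inner

def add(val, inc):
    return val + inc

add_five = make_callback(add, 5)
add_eight = make_callback(add, 8)
```

Python's `functools.partial` would not be a faithful model here, because it binds leading positional arguments, so the stored value would come first instead of last.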
A method reference for a method with USE arguments is very similar to a callback but the way it is created is different. See Closures.
Classes, interfaces and methods can be defined with template types. The type is declared by adding the actual types in angle brackets:
list<string>          # list with string items
dict<int, bool>       # dict with int key and bool items
MyContainer<Address>  # MyContainer class with Address objects
I.Iterable<int>       # I.Iterable interface for iterating over ints
For most code, types should be specified and are checked at compile time. This catches mistakes as early as possible. E.g., if you declare a string variable and pass it to a method that requires an int, the compiler will tell you this is wrong.
string word = "hello"
increment(word)  # Compile time error: int required.
For more flexibility, at the cost of performance and causing mistakes to be discovered only when the program is being executed, the dyn type can be used. A variable of this type can contain any kind of value or reference. Assignment to a dyn variable never fails. However, using the variable where a specific type is expected will invoke a runtime type check. For this purpose the dyn type stores information about the actual type.
The dyn type is most useful in containers. This example stores key-value pairs where the value can be any type:
dict<string, dyn> keyValue = NEW()
parseFile("keyvalue.txt", keyValue)
FOR key IN keyValue.keys()
  dyn value = keyValue[key]
  SWITCH value.Type()
    CASE T.int; IO.print(key .. " is number " .. value)
    CASE T.string; IO.print(key .. " is string '" .. value .. "'")
    DEFAULT; IO.print(key .. " is not a number or string")
  }
}
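Python variables are dynamically typed throughout, so a dict of arbitrary values with a runtime type check gives a rough model of the dyn example above. This is an analogy, not the dyn API. The dictionary contents are hypothetical, and the sketch returns the lines instead of printing them.

```python
def describe(key, value):
    # Runtime type check, similar in spirit to SWITCH value.Type().
    # bool is excluded explicitly because in Python it is a subclass of int.
    if isinstance(value, int) and not isinstance(value, bool):
        return f"{key} is number {value}"
    if isinstance(value, str):
        return f"{key} is string '{value}'"
    return f"{key} is not a number or string"

key_value = {"width": 80, "name": "zimbu", "debug": True}
lines = [describe(k, v) for k, v in key_value.items()]
```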
Methods for the dyn type are documented in the dyn class.
Value typed variables have no identity, only a value. You can not tell one FALSE from another.
Reference typed variables can have exactly the same value and still refer to different instances. Therefore there are different operators to compare the value and the identity:
string a = "one1"
string b = "one" .. 1
IO.print(a == b)  # TRUE
IO.print(a IS b)  # FALSE
Note: String constants are de-duplicated. This also applies when the compiler can perform the concatenation at compile time:
string a = "one"
string b = "o" .. "ne"
IO.print(a == b)  # TRUE
IO.print(a IS b)  # TRUE !
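For comparison, Python draws the same distinction with `==` (value) and `is` (identity). This sketch uses lists rather than strings, because whether equal Python strings share one object is an implementation detail. It is an analogy, not Zimbu code.

```python
# Two distinct list objects with equal contents.
a = ["one", 1]
b = ["one", 1]
same_value = a == b    # like Zimbu "=="
same_object = a is b   # like Zimbu "IS"

c = a                  # a second reference to the same object
```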
All builtin type names start with a lower case letter. The types defined in Zimbu code must start with an upper case letter. That way new types can be added later without breaking an existing program.
When used in an expression the standard types need to be preceded with "T.":
thread t = T.thread.NEW()
It's rarely needed though: in this example you would normally leave out "T.thread." and NEW() would work with the inferred type.
type name | contains |
bool | TRUE or FALSE |
status | FAIL or OK |
int | 64 bit signed number |
int8 | 8 bit signed number |
int16 | 16 bit signed number |
int32 | 32 bit signed number |
int64 | 64 bit signed number, identical to int |
nat | 64 bit unsigned number |
nat8 | 8 bit unsigned number |
nat16 | 16 bit unsigned number |
nat32 | 32 bit unsigned number |
nat64 | 64 bit unsigned number, identical to nat |
float | 64 bit floating point number |
float32 | 32 bit floating point number |
float64 | 64 bit floating point number, identical to float |
float80 | 80 bit floating point number |
float128 | 128 bit floating point number |
fixed1 | 64 bit signed number with one decimal: 1.1 |
fixed2 | 64 bit signed number with two decimals: 1.12 |
... | |
fixed15 | 64 bit signed number with 15 decimals: 1.123456789012345 |
See Default Values for the value a variable has when it is not explicitly initialized.
status is similar to bool, but with clearer meaning for success/failure. It is often used as return value for methods.
NOTE: fixed types have not been implemented yet
fixed1, fixed2, ... fixed15 are used for computations where the number of digits after the decimal point needs to be fixed. fixed2 is especially useful for money, fixed3 for meters, etc.
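Since the fixed types are not implemented yet, here is only a Python sketch of the general idea, assuming a fixed2 value is stored as a whole number of hundredths. That representation, and the helpers `to_fixed2` and `fmt_fixed2`, are assumptions for illustration. Parsing truncates extra decimals, and the 64-bit range is not enforced because Python ints are unbounded.

```python
SCALE = 100  # fixed2: two decimals, stored as hundredths

def to_fixed2(text: str) -> int:
    # Parse "12.34" into 1234 without going through float.
    sign = -1 if text.startswith("-") else 1
    whole, _, frac = text.lstrip("-").partition(".")
    frac = (frac + "00")[:2]  # pad, or truncate, to exactly two digits
    return sign * (int(whole) * SCALE + int(frac))

def fmt_fixed2(value: int) -> str:
    sign = "-" if value < 0 else ""
    whole, frac = divmod(abs(value), SCALE)
    return f"{sign}{whole}.{frac:02d}"

total = to_fixed2("0.10") + to_fixed2("0.20")  # exact, unlike 0.1 + 0.2 in float
```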
type name | functionality |
string | a sequence of utf-8 encoded Unicode characters, immutable |
byteString | a sequence of 8-bit bytes, immutable |
varString | a sequence of utf-8 encoded Unicode characters, mutable |
varByteString | a sequence of 8-bit bytes, mutable |
All string types can contain a NUL character. The length is stored with the string, so getting the length of a very long string is fast, unlike with NUL-terminated strings.
String and byteString use the same storage format and can be typecast to each other without conversion. Same for varString and varByteString.
varString and varByteString are mutable. They are implemented in a way that does not require reallocating memory and copying text for every mutation.
When a varString is used where a string is expected, it is automatically converted using the ToString() method. In the other direction, the toVarstring() method is used.
When a varByteString is used where a byteString is expected, it is automatically converted using the toBytes() method. In the other direction, the toVarbytes() method is used.
These conversions also work for NIL, so that this works:
varString vs   # NIL by default
string s = vs  # no problem
Most other operations on string types fail when the value is NIL.
type name | functionality |
array | multi-dimensional vector of known size |
list | one-dimensional, can insert |
sortedList | one-dimensional, can insert, ordered |
dict | lookup by key, no duplicate keys |
multiDict | lookup by key, duplicate keys allowed |
set | lookup by key, no duplicate keys |
multiSet | lookup by key, duplicate keys allowed |
All containers contain items of the same type. However, the type can be dyn, in which case the container can hold items of any type.
type name | functionality |
tuple | structure with one or more items of a specified type |
A tuple requires the type of every item it contains to be specified. It is convenient for when a function returns more than one thing:
# Read a line. Returns a tuple with:
#   |status|  OK or FAIL
#   |string|  the text when |status| is OK, an error message when |status| is FAIL
FUNC readLine() tuple<status, string>
The items in a tuple can be accessed with an index, starting at zero, like with a list. With square brackets on the left side of an assignment all items can be obtained at once:
tuple<int, string> tup = NEW()  # sets all values to their default
tup = [5, "foo"]                # create tuple and initialize from a list
tup[0] = 7
tup[1] = "bar"
int i = tup[0]                  # get 7
string s = tup[1]               # get "bar"
[i, s] = tup                    # unpack the tuple, get 7 and "bar" at once
To make clear what each item in the tuple is for names can be added. The items can then be accessed by that name, like a class member:
tuple<int x, int y, string title> tup = NEW(5, 10, "hello")
int xval = tup.x          # same as int xval = tup[0]
string title = tup.title  # same as string title = tup[2]
tup.y = 3                 # same as tup[1] = 3
tup.title = "there"       # same as tup[2] = "there"
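Python's `collections.namedtuple` offers a close analogy for access by index, access by name, and unpacking. One difference matters: Python named tuples are immutable, so the sketch uses `_replace()` to produce a changed copy where the Zimbu example assigns to `tup.y` directly. The type name `Titled` is hypothetical.

```python
from collections import namedtuple

Titled = namedtuple("Titled", ["x", "y", "title"])

tup = Titled(5, 10, "hello")
xval = tup.x        # same as tup[0]
title = tup[2]      # same as tup.title
x, y, title = tup   # unpack all items at once

# Immutable in Python: build a modified copy instead of assigning in place.
tup = tup._replace(y=3, title="there")
```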
It is not possible to add a method to a tuple. If you need that use a CLASS instead.
type name | functionality |
pipe | synchronized stream |
thread | unit of execution |
evalThread | unit of execution to evaluate an expression |
lock | object used to get exclusive access |
cond | condition to wait on |
The standard libraries define many useful types, but they do not have a short type name, e.g.
type name | functionality |
IO.File | opened file |
IO.Stat | information about a file |
Z.Pos | position in a file |
Some type declarations can become long, and using a short name instead makes code easier to read. Zimbu offers two ways to do this: ALIAS and TYPE. ALIAS is nothing more than a different name for the same type; the name can be used anywhere instead of that type. TYPE defines a new type and restricts how that type can be used.
TYPE type name
ALIAS type name
type | The type name, such as "int", "string" or "MyClass". | |
name | The declared name, e.g., "BirdName" or "Length". |
ALIAS is used to give a short name to a type, method or variable. Example:
ALIAS Token.Type TType
Here the name TType stands for Token.Type.
This can also be used to define a name in a module or class as if it is part of that module or class, while it is actually defined elsewhere. For example, the ZWT library defines items that are actually defined in another file.
IMPORT "zwt/PanelModule.zu"
...
MODULE ZWT
  ...
  ALIAS PanelModule.Panel @public Panel
}
Now the Panel class defined in PanelModule can be used as ZWT.Panel.
TYPE is used to define a new type from another type. There are two reasons to do this:
Example for the first reason:
TYPE int WeightPerMeter
TYPE int Length
TYPE int Weight
WeightPerMeter w = 8
Length l = 100
Weight t = w * l
w = l  # Error!
Here WeightPerMeter, Length and Weight are all integers, but they are different types. When assigning l (a Length) to w (a WeightPerMeter) the compiler generates an error.
When operating on a typedef'ed type it loses its special meaning and the type it stands for is used instead. Therefore the result of multiplying w and l can be assigned to t, even though its type is different.
Also, the typedef'ed type can be assigned to and from the type it stands for. This is more apparent when using container types:
TYPE dict<string, int> KeyValue
TYPE dict<string, int> NameNumber
KeyValue kv = NEW()
NameNumber nn = NEW()
dict<string, int> xx = kv
nn = ["hello": 5]
kv = nn  # Error!
In a block it is possible to declare a class, method, enum, etc. Just like other items declared in the block, these items are only visible inside the block.
A nested block can be used to restrict the visibility of declared items.
The NOP statement does nothing.
block-item -> ( file-item | assignment | method-call | conditional | switch
              | try | while | do-until | for-in | break | continue | nop | block ) ;
nop        -> "NOP" line-sep ;
block      -> "{" line-sep block-item+ block-end ;
A simple assignment has the form:
variable = expression
The type of the expression must match the type of the variable, or it must be possible to convert the value without loss of information. E.g. you can assign a byte to an int variable, but not the other way around. The same applies to the other kinds of assignment below.
When the expression is a constant, or a computation using only constants, and the value does not fit in the variable, the compiler produces an error.
It is possible to assign multiple values at the same time:
var1, var2 = multiFunc()
var3, var4 = someTupleHere
multiFunc() returns two values and someTupleHere is a tuple with two items.
It is also possible to swap two variables, rotate three or do related assignments at the same time:
x, y = y, x
a, b, c = b, c, a
r, g, b = red, green, blue
There is no limit on the number of variables, but it quickly becomes unreadable with more than three. Only use this when it makes sense; otherwise split it into multiple assignments.
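Python's tuple assignment has the behavior that the swap and rotate examples above rely on: every right-hand value is read before any variable is assigned. This is shown here as an analogy in Python, not as Zimbu code.

```python
x, y = 1, 2
x, y = y, x          # swap: both old values are read first

a, b, c = "A", "B", "C"
a, b, c = b, c, a    # rotate three
```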
It is possible to do multiple assignments and declare some variables at the same time:
string var1, status var2 = getStringWithStatus()
var3, list<int> var4 = getCountAndList()
Note that there cannot be a line break between the type and the variable name, because the compiler would see this as a declaration and an assignment:
string var1, status
var2 = someFunction()
This declares a variable named status as a string and assigns the result of someFunction() to var2.
lhs += expr   # add expr to lhs (numbers only)
lhs -= expr   # subtract expr from lhs (numbers only)
lhs *= expr   # multiply lhs by expr (numbers only)
lhs /= expr   # divide lhs by expr (numbers only)
lhs ..= expr  # concatenate expr to lhs (strings only)
This works like "lhs = lhs OP expr", except that "lhs" is only evaluated once. This matters when evaluating "lhs" has side effects.
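The difference between "lhs OP= expr" and the spelled-out form shows up when evaluating lhs has a side effect. Python follows the same rule, so this Python sketch can illustrate it. The function `slot` is a hypothetical stand-in for such a side effect.

```python
calls = []

def slot():
    # An index expression with a side effect: records each evaluation.
    calls.append("slot")
    return 0

counts = [10]
counts[slot()] += 5                  # slot() is evaluated once
counts[slot()] = counts[slot()] + 5  # spelled out: slot() is evaluated twice
```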
assignment -> comp-name sep "=" sep expr line-sep ;
TODO: more types
TODO
NEW() can be used as an expression when the type can be inferred from the context. This is usually the case when assigned to a variable:
list<string> names = NEW()   # empty list of strings
array<int> numbers = NEW(8)  # one-dimensional array containing 8 ints
Otherwise the class must be specified:
VAR names = NameList.NEW()
Normally arguments are passed by position: their order in the call is the same as in the method being called. When passing arguments by name, the order can differ. When an argument is passed by name, all following arguments must also be passed by name.
The following example outputs "There are 3 red desks" and "There are 2 green chairs".
PROC show(string color, string what, int amount)
  IO.print("There are \(amount) \(color) \(what)")
}
show("red", "desks", 3)
show(amount = 2, what = "chairs", color = "green")
This has advantages and disadvantages. The main advantage is that you can see at the caller side what the argument means. When there are several booleans and you pass TRUE or FALSE, it is easy to get confused about what each value is used for.
The main disadvantage is that you can't change the name used in the method without also changing it for all callers. This can be a problem when adding a new argument which makes the meaning of an existing argument unclear. Or when the name turns out to be a bad choice.
Since there can be multiple methods with the same name there are rules about which one to call, depending on the arguments used.
The return type, and whether the method is a PROC or a FUNC, does not matter for selecting the method.
Generally, the method with the lowest argument conversion cost is selected. If there is more than one method with the lowest cost, this results in a compile time error, since the compiler does not know which one to use. For computing the conversion cost add up the conversion cost for each argument, as explained in the following section.
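A minimal Python sketch of the selection rule described above: sum the per-argument conversion costs for each candidate, pick the lowest total, and report an error on a tie. The cost table is hypothetical. It borrows the cost levels listed in the spec but only lists a few type pairs, and the sketch ignores optional and named arguments.

```python
# Hypothetical per-argument costs: (used argument type, method arg type) -> cost.
# Pairs that are missing mean "no conversion possible".
COST = {
    ("int8", "int8"): 0,
    ("int8", "int"): 1,    # same kind of type, but bigger
    ("int", "int"): 0,
    ("int", "float"): 100,
}

def select(candidates, arg_types):
    """Return the label of the candidate with the lowest total cost.

    candidates maps a method label to its list of parameter types.
    Raises ValueError when nothing fits or the lowest cost is tied.
    """
    totals = {}
    for name, params in candidates.items():
        if len(params) != len(arg_types):
            continue
        try:
            totals[name] = sum(COST[(a, p)] for a, p in zip(arg_types, params))
        except KeyError:
            continue  # some argument cannot be converted
    if not totals:
        raise ValueError("no matching method")
    best = min(totals.values())
    winners = [n for n, t in totals.items() if t == best]
    if len(winners) > 1:
        raise ValueError("ambiguous call: " + ", ".join(winners))
    return winners[0]

chosen = select({"f(int)": ["int"], "f(float)": ["float"]}, ["int8"])
```

In Zimbu the tie case is a compile time error; the sketch raises an exception instead.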
When the argument name is used in the call ("name = expression") the name itself is used, not the type of the expression. All arguments passed by name must exist.
Optional arguments, the ones specified with a default value and the varargs argument, are not used to select the method.
When a method is called with an argument that is of different type than the type specified for the function, the compiler will attempt an automatic conversion.
When the method arg is a typedef and the used argument is not a typedef, the method arg is considered to be what the typedef is defined to be. For example, if the argument is a typedef Length, which is an int, conversion cost for using an int is zero. If the used argument is a typedef Width, which is also an int, no conversion is possible.
When two ways of conversion are possible the one with the lower cost is used.
Cost 0: When no conversion is to be done. This includes:
Cost 1: When the method arg type is of the same type as the used argument but bigger. This includes:
Cost 2: When the method arg type is very similar and no information will be lost.
Cost 100: When the conversion is cheap
Cost 10000: When the conversion takes some effort
Some resulting choices:
method-call -> comp-name skip "(" arguments? ")" line-sep ;
The RETURN statement causes the flow of execution to return to the caller. When inside a TRY statement any FINALLY block will be executed before returning. When DEFER statements were executed, their function calls will be executed, in reverse order.
A PROC can have a RETURN statement without any arguments.
A FUNC must end in a RETURN statement and the argument or arguments must match the return type or return types of the function. When there is more than one return type they are separated with commas, like the arguments to a function.
No statements may directly follow RETURN. They would never be executed.
return -> "RETURN" ( sep expr )? ( "," sep expr)* line-sep ;
The EXIT statement causes the program to end. However, a TRY statement may catch the E.Exit exception and continue execution.
The EXIT statement has one integer argument, which is used as the exit status for the program.
exit -> "EXIT" sep expr line-sep ;
conditional -> "IF" sep expr line-sep block-item+ elseif-part* else-part? block-end ;
elseif-part -> "ELSEIF" sep expr line-sep block-item+ ;
else-part   -> "ELSE" line-sep block-item+ ;
IFNIL is just like IF, except that it does not take an expression. Its condition is TRUE when THIS (the object the method is invoked on) is NIL.
FUNC $values() list<int>
  IFNIL
    RETURN []
  }
  RETURN $members.values()
}

FUNC $Size() int
  IFNIL
    RETURN 0
  }
  ...
}

FUNC $find(int c) int
  IFNIL
    RETURN -1  # not found
  }
  ...
}
IFNIL must be the very first statement in the method. It can only be used inside a method of a class.
Without IFNIL an E.NilAccess exception will be thrown.
An alternative is to use the ?. operator, which results in the default return value. The advantage of IFNIL is that you can return any value, such as an empty list for $values() above, or -1 for $find() above.
When inheritance is involved a NIL object can be one of several classes. All the classes that the object could be an instance of should use IFNIL in the called method. Otherwise the program may crash. If @replace is not used then it will always work.
Let's start with an example, where "color" is an enum:
SWITCH color
  CASE Color.red;    IO.print("stop!")
  CASE Color.yellow; IO.print("brake!")
  CASE Color.green;  IO.print("go!")
  DEFAULT;           IO.print("what?")
}
After SWITCH comes an expression, which must evaluate to a number, enum, string or type. This value is compared to each of the arguments of the following CASE statements and the code block of the matching CASE is executed.
The argument of CASE must be a value. Each value can only appear once.
Multiple CASE statements can appear before a block of code. A match with any of the CASE values causes that block to be executed. The block ends at the next CASE or DEFAULT statement.
SWITCH val
  CASE 1
  CASE 2
    IO.print("one or two")
  CASE 3
    IO.print("three")
}
A BREAK statement in a CASE block causes execution to jump to the end of the SWITCH statement.
A PROCEED statement at the end of a block, before a CASE statement, causes execution to continue in the next block.
SWITCH val
  CASE 1; IO.print("one")
    PROCEED
  CASE 2; IO.print("one or two")
}
The optional DEFAULT block is used when none of the CASE statements match. There can be only one DEFAULT statement, it must come after all the CASE statements and if there is a CASE before it there must be code in between.
When the SWITCH expression is a string then the MATCH statement can be used in place of a CASE. The argument of MATCH is either a string, which is used as a regex, or a regex.
SWITCH text
  CASE "foo";  IO.print("text is foo")
  MATCH "foo"; IO.print("text contains foo")
  MATCH re;    IO.print("text matches re")
}
The CASE and MATCH items are checked in the order given, the first one that matches is used and no further items are checked.
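A Python sketch of the ordered CASE/MATCH lookup described above. A CASE compares for equality, and a MATCH searches the text with a regex. The items are tried in order and the first hit wins. The pattern `^\d+$` stands in for the unspecified `re` of the example and is an assumption.

```python
import re

def classify(text):
    # Tried in the given order; the first item that matches is used.
    items = [
        ("CASE", "foo", "text is foo"),
        ("MATCH", "foo", "text contains foo"),
        ("MATCH", r"^\d+$", "text matches re"),
    ]
    for kind, pattern, result in items:
        if kind == "CASE" and text == pattern:
            return result
        if kind == "MATCH" and re.search(pattern, text):
            return result
    return None  # no DEFAULT block in this example
```

The order matters: "foo" itself also contains "foo", but the CASE comes first, so the MATCH is never tried.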
switch       -> "SWITCH" sep expr line-sep switch-item+ default-item? block-end ;
switch-item  -> ( ( "CASE" sep expr line-sep ) | ( "MATCH" sep expr line-sep ) )+ block-item+ ;
default-item -> "DEFAULT" line-sep block-item+ ;
A BREAK statement inside the loop causes execution to jump to the end of the WHILE statement.
A CONTINUE statement inside the loop causes execution to jump back to the start of the WHILE statement, evaluating the condition again.
while    -> "WHILE" loop-name? sep expr line-sep block-item+ block-end ;
break    -> "BREAK" loop-name? line-sep ;
continue -> "CONTINUE" loop-name? line-sep ;
BREAK and CONTINUE work as with WHILE.
The condition of the UNTIL is evaluated in the context of the loop block. That allows checking a variable defined in that block. Example:
DO
  bool doPass = ++loop < 3
UNTIL !doPass
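Python has no DO ... UNTIL, but the usual `while True` plus `break` pattern models it. The body always runs at least once, and the exit test can use a variable set in the body. This sketch assumes `loop` starts at 0 and adds a `passes` counter for illustration.

```python
loop = 0
passes = 0
while True:
    passes += 1
    loop += 1            # ++loop
    do_pass = loop < 3   # variable defined inside the loop body
    if not do_pass:      # UNTIL !doPass
        break
```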
do-until -> "DO" loop-name? line-sep block-item+ "UNTIL" sep expr sep-with-eol ;
The FOR loop is used to iterate over anything that can be iterated over.
A number range:
# TO is inclusive
FOR i IN 1 TO 5
  # i is set to 1, 2, 3, 4 and 5
  IO.write(i)
}
# UNTIL is exclusive
FOR i IN 0 UNTIL list.Size()
  # i is set to 0, 1, .. list.Size() - 1
  IO.write(list[i])
}
A backwards range:
FOR i IN 5 TO 0 STEP -1
  # range is inclusive
  # i = 5, 4, 3, 2, 1, 0
}
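The TO/UNTIL distinction maps onto Python's `range` by adjusting the end bound. `zimbu_to` and `zimbu_until` are hypothetical helpers. This sketch does not model changing the loop variable inside the loop body, which Zimbu allows but a Python `range` loop does not.

```python
def zimbu_to(start, end, step=1):
    # "FOR i IN start TO end STEP step": end is included (when it is reached).
    return range(start, end + (1 if step > 0 else -1), step)

def zimbu_until(start, end, step=1):
    # "FOR i IN start UNTIL end STEP step": end is excluded.
    return range(start, end, step)
```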
The loop variable can be set inside the loop, e.g. to skip over some numbers:
FOR idx IN 0 UNTIL l.Size()
  IF l[idx] == '\\'
    ++idx  # skip over next item
  ELSE
    produce(l[idx])
  }
}
Characters in a string:
FOR c IN "1234"
  # c is set to each character in the string
  IO.write(c)
}
Values of an enum:
ENUM Some
  one
  two
}
FOR v IN Some
  # v is set to each value in the enum
  IO.write(v.ToString())
}
Items in a list (array is the same):
FOR item IN [1, 2, 3]
  # item is set to each item in the list
  IO.write(item)
}
Items in a list with the index:
FOR index, item IN ["zero", "one", "two", "three"]
  IO.write(index .. ": " .. item)
}
Items in a dictionary, using only the values:
FOR item IN [1: "one", 2: "two", 3: "three"]
  # item is set to each string
  IO.write(item)
}
Items in a dictionary, using the keys and the values:
FOR key, val IN [1: "one", 2: "two", 3: "three"]
  # key is set to each number, val is set to each string
  IO.write(key .. ": " .. val)
}
Any class that implements I.Iterable can be iterated over:
FOR name IN nameList
  # name is obtained with nameList.Iterator()
  IO.write(name.ToString())
}
Any class that implements I.KeyIterable can be iterated over with two loop variables:
FOR key, name IN nameList
  # name is obtained with nameList.KeyIterator()
  IO.write(key .. ":" .. name.ToString())
}
For the above, if the variable to be iterated over is NIL, this works as if there are no items. Thus it does not throw an E.NilAccess exception.
BREAK and CONTINUE work as with WHILE.
There can be multiple, comma separated iterable expressions after IN. There must be one loop variable for each iterable. The loop uses one item from each iterable on each iteration. The loop ends when one of the iterables runs out of items.
list<string> week_en = ["Mon", "Tue", "Wed", "Thu", "Fri"]
list<string> week_nl = ["ma", "di", "wo", "do", "fr"]
list<string> week_de = ["Mo", "Di", "Mi", "Do", "Fr"]
FOR en, nl, de IN week_en, week_nl, week_de
  IO.print("English: " .. en .. ", Dutch: " .. nl .. ", German: " .. de)
}
None of the iterable expressions can be an I.KeyIterator. When any iterator is NIL the loop is skipped, as if there are no items to iterate over.
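The multi-iterable loop rules above correspond closely to Python's `zip`: one item from each iterable per iteration, stopping when the shortest runs out. This sketch adds the NIL rule by treating `None` as "no items". `zimbu_for` is a hypothetical helper that collects the iterations into a list.

```python
def zimbu_for(*iterables):
    # When any iterable is NIL (None here), the loop runs zero times.
    if any(it is None for it in iterables):
        return []
    # One item from each iterable per iteration; stop at the shortest.
    return list(zip(*iterables))
```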
The type of the loop variable(s) is inferred from what is being iterated over.
When using two loop variables and one expression the first variable is the index or key and the second the value.
For a class, a FOR loop with one variable will use the I.Iterator interface, and with two variables the I.KeyIterator interface. If an object is given, the Iterator() and KeyIterator() methods will be used to obtain the iterator.
The loop variable is available in the scope of the FOR block. If it needs to be available elsewhere, explicitly declare a variable and use it with the USE keyword:
int idx
FOR USE idx IN 0 UNTIL list.Size()
  IF list[idx] == 0
    BREAK
  }
}
IO.print("valid size: " .. idx)
for-in -> "FOR" loop-name? sep ( "USE"? key-var-name )? "USE"? item-var-name "IN" expr ( ("TO" | "UNTIL") expr)? ( "STEP" expr )? line-sep block-item+ block-end ;
A DEFER statement has one argument, which must be a method call. This call is postponed until the end of the current method. The arguments for the method call are evaluated at the time the DEFER statement is executed.
DEFER is most useful right after a resource is allocated. The argument is then a call to free up the resource. Example:
PROC copy()
  IO.File in = IO.fileReader("source")
  DEFER in.close()
  IO.File out = IO.fileWriter("destination")
  DEFER out.close()
  ... copy from in to out, possibly throws an exception
  # out.close() is called here
  # in.close() is called here
}
The callbacks are invoked in reverse order: the callback from the last DEFER statement is called first.
It is possible to use a DEFER statement inside a loop. Keep in mind that the arguments for the called method are evaluated when the DEFER statement is executed:
FOR idx IN 1 TO 3
  DEFER IO.print("loop " .. idx)
}
# At the end of the method will print:
# loop 3
# loop 2
# loop 1
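The ordering and argument-capture rules can be modeled outside Zimbu. The following is a rough C sketch under stated assumptions. It does not show how the Zimbu compiler implements DEFER. The names "defer_stack", "defer_push" and "defer_run_all" are hypothetical, each deferred call takes a single int argument, and at most 16 calls can be pending. The argument is stored when "defer_push" runs. The calls are then replayed last-in, first-out, which reproduces the "loop 3, loop 2, loop 1" order of the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of DEFER: a callback plus the argument value that was
   evaluated at the moment the DEFER statement ran. */
typedef void (*deferred_fn)(int arg);

#define MAX_DEFERRED 16

typedef struct {
    deferred_fn fn[MAX_DEFERRED];
    int arg[MAX_DEFERRED];
    size_t count;
} defer_stack;

/* Runs where the DEFER statement appears: the argument is captured now. */
static void defer_push(defer_stack *ds, deferred_fn fn, int arg) {
    assert(ds->count < MAX_DEFERRED);
    ds->fn[ds->count] = fn;
    ds->arg[ds->count] = arg;
    ds->count++;
}

/* Runs when the method ends: the most recent DEFER is invoked first. */
static void defer_run_all(defer_stack *ds) {
    while (ds->count > 0) {
        ds->count--;
        ds->fn[ds->count](ds->arg[ds->count]);
    }
}

/* Records the order of the deferred calls instead of printing "loop N". */
static int run_order[MAX_DEFERRED];
static size_t run_count = 0;

static void record_loop(int idx) {
    assert(run_count < MAX_DEFERRED);
    run_order[run_count++] = idx;
}

/* Mirrors the Zimbu loop: FOR idx IN 1 TO 3 with a DEFER inside. */
static void loop_example(void) {
    defer_stack ds = { .count = 0 };
    for (int idx = 1; idx <= 3; idx++) {
        defer_push(&ds, record_loop, idx);
    }
    defer_run_all(&ds);  /* end of the method */
}
```

The sketch leaves out exception handling; it only shows the order of the calls and when their arguments are evaluated.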
If an exception is thrown somewhere in the method and it is not caught by a TRY/CATCH, the callbacks for the DEFER statements that were executed are invoked before the exception is handled. This also happens in the calling methods, going up the stack until either Main() is left or a TRY/CATCH handles the exception.
When a deferred method throws an exception, this is reported on stderr and the remaining callbacks are still processed. Note that this means the deferred methods are executed inside a TRY/CATCH, which has some overhead.
Freeing resources could also be done with exception handling, but this has more overhead and gets messy when there are several resources to free.
Another alternative is to use a Finish() method in a class. This has the advantage that it does not require an extra statement. A disadvantage is that it won't be called until the object is garbage collected, unless a not allocated variable is used.
defer -> "DEFER" sep expr line-sep ;
TRY can be used to handle an exception. The TRY block contains statements that might cause an exception to be thrown. CATCH blocks are used to deal with them:
string s
TRY
  IO.File f = openFile("does not exist")
CATCH E.AccessDenied e
  IO.print("Could not open file: " .. e.toString())
ELSE
  IF f == NIL
    IO.print("File does not exist")
  ELSE
    TRY
      s = f.read()
    FINALLY
      f.close()
    }
  }
}
This example uses the openFile() method, which returns NIL when the file does not exist. That is the normal way to fail, thus it does not throw an exception but returns NIL. Another way to fail is that the file exists, but cannot be accessed. This throws an E.AccessDenied exception, which is caught by the CATCH statement.
The ELSE block is executed when no exception was thrown in the TRY block.
Note that the variable "f" that was declared in the TRY block is also available in the ELSE block. They use the same scope.
The FINALLY block is always executed, also when an exception is thrown in a CATCH or ELSE block. In that case the exception is thrown again at the end of the FINALLY block. However, if an exception is thrown inside the FINALLY block itself, the earlier exception is not thrown again.
Also, when BREAK, CONTINUE or RETURN was used, the FINALLY block is executed and the statement takes effect at the end of it.
Exceptions thrown in the CATCH, ELSE and FINALLY blocks are not caught by this TRY statement, although they may cause the FINALLY block to be executed.
try -> "TRY" line-sep block-item+ catch-part* else-part? finally-part? block-end ;
catch-part -> "CATCH" sep type ( "," sep type)* sep var-name line-sep block-item+ ;
else-part -> "ELSE" line-sep block-item+ ;
finally-part -> "FINALLY" line-sep block-item+ ;
TODO
When writing a module that uses a C type, it can be included in a class like this:
C(pthread_t) thread_id
The text between C( and ) is used literally in the produced C code. There cannot be a line break between C( and ).
This does not automatically define the type, see the next section about including the C header file.
NOTE: Variables defined this way will NOT be garbage-collected! You must take care of this yourself, possibly using a Finish() method.
For C header files you can use IMPORT.CHEADER. That makes sure the header file is included early and only once.
The include statement will appear near the start of the generated C code. The compiler discards duplicate names. The meaning of using "" or <> matters, it is passed on to the C code. Example:
IMPORT.CHEADER <ncurses.h>
For small pieces of C code you can use C(code):
bool special = (value & C(SPECIAL_MASK)) != 0
The Zimbu compiler does not check the code; if you do something wrong the C compiler will produce errors or warnings.
Text between ">>>" and "<<<" is copied as-is to the generated C or Javascript file.
>>> blockgc
  FILE *fd = fopen("temp", "r");
<<<
Both the "<<<" and the ">>>" must appear at the start of the line without any preceding white space. They cannot appear in the middle of a statement:
string x = >>> "This does not work!"; <<<
Comments are allowed in the same line after ">>>" and "<<<":
>>> # debug code
  printf("hello\n");
<<< # end of debug code
The "blockgc" argument means the garbage collector (GC) should not run while inside this block. "blockgc" must be used for a block that contains an unsafe function. An unsafe function is any function that is not safe, as indicated by the POSIX standard. This includes a function that allocates memory.
"fopen" is an unsafe function: it allocates memory, and the GC must not run while this is happening. Unfortunately, "fopen" may take a while, and any pending GC is blocked until it returns. Long-running code inside a "blockgc" block should therefore be avoided.
After a block marked with "blockgc" the GC will run if it was postponed.
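The postponing behavior can be pictured as a nesting counter plus a "pending" flag. The C sketch below is a hypothetical model and not the actual Zimbu runtime. The functions "gc_block", "gc_unblock" and "gc_request" are invented names, and "gc_runs" only counts collections instead of freeing memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical runtime state: how deeply we are nested in "blockgc" blocks,
   and whether a GC was requested while blocked. */
static int gc_block_depth = 0;
static bool gc_pending = false;
static int gc_runs = 0;   /* counts collections instead of doing real work */

static void gc_collect(void) {
    gc_runs++;
    gc_pending = false;
}

/* Entering a ">>> blockgc" block. */
static void gc_block(void) {
    gc_block_depth++;
}

/* A GC is wanted: run it now, or remember it while blocked. */
static void gc_request(void) {
    if (gc_block_depth > 0) {
        gc_pending = true;
    } else {
        gc_collect();
    }
}

/* Leaving the block at "<<<": run the GC if it was postponed. */
static void gc_unblock(void) {
    assert(gc_block_depth > 0);
    gc_block_depth--;
    if (gc_block_depth == 0 && gc_pending) {
        gc_collect();
    }
}
```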
To test for missing "blockgc" run your code compiled with the --exitclean argument.
Inside >>> and <<< references to Zimbu variables and methods can be used. Examples:
%var%
%obj.member%
%funcName%
For functions this results in a callback. If that is not wanted and the function name itself is to be obtained, use %[ expr ]% instead:
%[$funcName]%
Note that for a function in a parent class the value of THIS is used to determine which method needs to be called, since a child class can replace it.
Zimbu expressions can be used as: %{ expression }%. Examples:
%{var + 5}%
%{ myFunc("foobar") }%
Note that mixing C and Zimbu variables can be tricky. Look at the generated code to make sure this is what you wanted.
To specify what items the native code depends on, so that it gets added to the program, the uses() item is put after ">>>":
>>> uses(getCstring)
>>> uses(sys_types, socket, hostname, unistd, getCstring)
Items available in uses() for C and what they make available:
name | made available | comment |
ctype_h | ctype.h include file | |
dirent | dirent.h include file | |
errno | errno.h include file | |
fcntl | fcntl.h include file | |
gcRun | garbage collection | rarely needed |
getCstring | ZgetCstring(s) | converts a Zimbu string to a C "char *" NUL terminated string |
hostname | netdb.h include file | |
limits | limits.h include file | |
pthread | pthread.h include file | also adds pthread library to link with |
setjmp_h | setjmp.h include file | |
socket | include files needed for sockets | also adds socket library to link with |
string_h | string.h include file | |
sys_stat | sys/stat.h include file | |
sys_time | sys/time.h include file | |
sys_types | sys/types.h include file | |
sys_wait | sys/wait.h include file | not available on MS-Windows |
time_h | time.h include file | |
unistd | unistd.h include file | |
windows_h | windows.h include file | only available on MS-Windows |
Items available in uses() for JavaScript and what they make available:
name | made available | comment |
jsChildProcess | child_process Node module | |
jsFile | fs Node module | |
xhr | RPC | XML HTTP request from client to server |
Items available in uses() for Java and what they make available:
name | made available | comment |
javaCalendar | java.util.Calendar class | |
javaDate | java.util.Date class | |
The GENERATE_IF statement can be used to produce output only when a condition is true or false. All alternative code paths are still parsed and verified. This is useful in libraries where different code must be produced depending on the situation.
The BUILD_IF statement can be used to build code only when a condition is true or false. This allows skipping code which would not compile, e.g. a missing enum value. This can be used to build code with different versions of the compiler, with different features or for different purposes (testing, profiling).
Example:
GENERATE_IF Z.lang == "C"
>>>
  fputs("this is C code", stdout);
<<<
GENERATE_ELSEIF Z.lang == "JS"
>>>
  alert("this is JavaScript");
<<<
GENERATE_ELSE
  Z.error("Language " .. Z.lang .. " not supported")
}
All alternative code paths are still parsed and resolved. Thus even when producing C code an error in the JavaScript code will be noticed.
The structure of the statement is:
GENERATE_IF boolean_expr
  statements
GENERATE_ELSEIF boolean_expr
  statements
GENERATE_ELSE
  statements
}
The GENERATE_ELSEIF can appear any number of times.
The GENERATE_ELSE is optional.
For "boolean_expr" see the Compile time expression section below.
NOT IMPLEMENTED YET
Examples:
BUILD_IF Z.has("thread")  # compiler has thread support
  # run jobs in parallel
  job1.start()
  job2.start()
  job1.wait()
  job2.wait()
BUILD_ELSE
  # run jobs sequentially
  job1.run()
  job2.run()
}
BUILD_IF Color.has("purple")
  c = Color.purple  # purple is available
BUILD_ELSE
  c = Color.red  # there is no purple, use red
}
The alternate code paths are all parsed, to be able to find the end of the BUILD_IF statements. Thus the syntax must be correct, a missing } will be noticed. But only when the condition evaluates to true will the code be resolved and produced. This allows for using variables that don't exist, enum values that are not defined, etc.
The structure of the statement is:
BUILD_IF boolean_expr
  statements
BUILD_ELSEIF boolean_expr
  statements
BUILD_ELSE
  statements
}
The BUILD_ELSEIF can appear any number of times.
The BUILD_ELSE is optional.
For "boolean_expr" see the next section.
When the code cannot be produced for the current situation, e.g. an unsupported target language, GENERATE_ERROR can be used inside a GENERATE_IF to produce an error at compile time. This avoids producing broken code, which would cause a cryptic error from the C compiler or an error message at runtime.
GENERATE_ERROR takes one argument, which must evaluate to a string at compile time.
GENERATE_IF Z.lang == "C"
>>>
  printf("%d", %nr%);
<<<
GENERATE_ELSE
  GENERATE_ERROR "Unsupported"
}
The boolean_expr supports these operators:
||  # OR
&&  # AND
==  # equal
!=  # not equal
These values are supported:
TRUE
FALSE
"string literal"
Z.lang               # string: "C" when producing C code, or "JS" when producing JavaScript
Z.have("backtrace")  # boolean, TRUE when stack backtrace is available
Expressions are evaluated according to the operator precedence and then from left to right.
level | syntax | meaning | note |
expr1 | expr2 ?: expr1 | if-nil | |
expr2 | expr3 ? expr1 : expr1 | ternary operator | |
expr3 | expr4 || expr3 | boolean or | |
expr4 | expr5 && expr4 | boolean and | |
expr5 | expr6 == expr6 | equal | |
 | expr6 != expr6 | not equal | |
 | expr6 >= expr6 | greater than or equal | |
 | expr6 > expr6 | greater than | |
 | expr6 <= expr6 | smaller than or equal | |
 | expr6 < expr6 | smaller than | |
 | expr6 IS expr6 | same object | |
 | expr6 ISNOT expr6 | not same object | |
 | expr6 ISA expr6 | is a (class or interface) | |
 | expr6 ISNOTA expr6 | is not a (class or interface) | |
expr6 | expr7 .. expr6 | string concatenation | |
expr7 | expr8 & expr7 | bitwise and | |
 | expr8 | expr7 | bitwise or | |
 | expr8 ^ expr7 | bitwise xor | |
expr8 | expr9 << expr9 | bitwise left shift | |
 | expr9 >> expr9 | bitwise right shift | |
expr9 | expr10 + expr9 | add | |
 | expr10 - expr9 | subtract | |
expr10 | expr11 * expr11 | multiply | |
 | expr11 / expr11 | divide | |
 | expr11 % expr11 | remainder | |
expr11 | ++expr12 | pre-increment | can be combined |
 | --expr12 | pre-decrement | can be combined |
 | expr12++ | post-increment | can be combined |
 | expr12-- | post-decrement | can be combined |
expr12 | -expr13 | negate | not in front of a number |
 | !expr13 | boolean invert | |
 | ~expr13 | bitwise invert | |
 | &expr13 | reference | |
expr13 | expr14.name | member | |
 | expr14?.name | not-nil member | |
 | expr14(expr1 ...) | method call | |
 | expr14.name(expr1 ...) | object method call | |
 | expr14?.name(expr1 ...) | not-nil method call | |
 | expr14.(expr1 ...) | method reference call | |
 | expr14[expr1 ...] | get item | |
 | expr14.name[expr1 ...] | get object item | |
 | expr14=name | bits item value | |
 | expr14<expr1 ...> | template | |
 | expr14.<expr1 ...> | typecast | |
expr14 | ( expr1 ) | grouping | |
 | 1234 | number | |
 | -1234 | negative number | |
 | 0x1abc | hex number | |
 | 0b010110 | binary number | |
 | 'c' | character constant | |
 | "string" | string literal | |
 | R"string" | raw string literal | |
 | ''"string"'' | multi-line string literal | |
 | name | identifier | |
 | $name | member | |
 | [ expr1, ... ] | list initializer | |
 | [ expr1: expr1, ... ] | dict initializer | |
 | { name: expr1, ... } | object initializer | |
 | NIL, THIS, PARENT | | |
 | NEW(expr1, ...) | | |
 | PROC (args) .. } | | |
 | FUNC (args) type .. } | | |
 | TRUE, FALSE, FAIL, OK | | |
Note that compared to C the precedence of &, | and ^ is different. In C their precedence is lower than that of the comparison operators, which often leads to mistakes.
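A C illustration of that pitfall, using a bit test like the earlier C(SPECIAL_MASK) example. The mask value 0x4 is an arbitrary assumption. According to the precedence table above, Zimbu groups "value & mask != 0" as "(value & mask) != 0". C groups the same text as "value & (mask != 0)", which silently tests a different bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary mask value, assumed for illustration only. */
#define SPECIAL_MASK 0x4u

/* The intended meaning, written with the parentheses C requires. */
static bool has_special(unsigned value) {
    return (value & SPECIAL_MASK) != 0;
}

/* How C groups "value & SPECIAL_MASK != 0": the comparison binds first,
   so this computes value & 1 and tests the lowest bit instead. */
static bool c_grouping_without_parens(unsigned value) {
    return value & (SPECIAL_MASK != 0);
}
```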
Note that with "-1234" the minus sign belongs to the number, while otherwise "-" is a separate operator. This matters for members:
-1234.toHex()  # apply toHex() on -1234
-var.member    # apply "-" to "var.member"
-var.func()    # apply func() on "var", then apply "-"
This is a binary operator that evaluates to the left value when it is not zero or NIL and the right value otherwise. This is referred to as the null-coalescing operator or Elvis operator in other languages.
Example, where a translated message is used if it exists, otherwise the untranslated message is used:
getValue(translateMessage(msg) ?: msg)
Simplified syntax:
left ?: right
When "left" has its default value then the result is "right". Otherwise the result is "left".
This is equivalent to:
left != NIL ? left : right
Except that "left" is evaluated only once.
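The "evaluated only once" detail is what separates "?:" from writing the ternary by hand. Here is the translated-message example expressed in C, where the left value is stored in a temporary before it is tested. "translate_message" is a hypothetical lookup that returns NULL (the NIL equivalent) when no translation exists. The counter shows that it runs once per call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts how often the "left" expression is evaluated. */
static int translate_calls = 0;

/* Hypothetical lookup: returns NULL when no translation exists. */
static const char *translate_message(const char *msg) {
    translate_calls++;
    if (strcmp(msg, "Hello") == 0) {
        return "Hallo";
    }
    return NULL;
}

/* Equivalent of: translateMessage(msg) ?: msg
   "left" is evaluated exactly once and kept in a temporary. */
static const char *translated_or_original(const char *msg) {
    assert(msg != NULL);
    const char *left = translate_message(msg);
    return left != NULL ? left : msg;
}
```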
This operator uses a condition and two value expressions:
cond ? left : right
When the condition evaluates to TRUE the result is the left expression, otherwise the right expression. The expression that is not used is not evaluated.
Simplified syntax:
left || right
The result is TRUE when "left" or "right" or both evaluate to TRUE. The result is FALSE when both "left" and "right" evaluate to FALSE.
When "left" evaluates to TRUE then "right" is not evaluated.
The compiler will generate an error when "left" or "right" does not evaluate to a bool type.
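C's "||" follows the same short-circuit rule, so it can demonstrate the behavior. "right_side" is a made-up stand-in for the right operand that counts its own evaluations.

```c
#include <assert.h>
#include <stdbool.h>

static int right_evaluations = 0;

/* Stand-in for the "right" operand; counts how often it is evaluated. */
static bool right_side(void) {
    right_evaluations++;
    return true;
}

/* left || right: right_side() only runs when left is false. */
static bool either(bool left) {
    return left || right_side();
}
```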
TODO
TODO
left == right  # equal value
left != right  # unequal value
"left" and "right" must be of the same type, but size does not matter. Thus you can compare an int8 with int64. Also, signedness does not matter, you can compare a nat with an int. TODO: what if the nat value doesn't fit in an int?
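Comparing by numeric value across signedness is not what C does by default. C's usual arithmetic conversions turn -1 into the largest unsigned value before comparing. The sketch below shows one way generated C could compare an int with a nat numerically. This is an assumption about a possible implementation, not the Zimbu compiler's actual output. Under this approach a negative int never equals any nat, which is one possible answer to the TODO above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Numeric equality of a signed and an unsigned 64-bit value. */
static bool int_equals_nat(int64_t i, uint64_t n) {
    if (i < 0) {
        return false;           /* no nat value is negative */
    }
    return (uint64_t)i == n;    /* safe: i is known to be non-negative */
}

/* Plain C conversion, for contrast: -1 becomes UINT64_MAX. */
static bool c_default_equals(int64_t i, uint64_t n) {
    return (uint64_t)i == n;
}
```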
Comparing Strings:
It is possible to compare a Bits value with zero. The result is TRUE if all fields in the Bits are at their default value.
When comparing objects the Equal() method is used. When there is no Equal() method this is a compilation error.
These operators have a string on the left and a regular expression pattern on the right. The =~ operator evaluates to TRUE when the pattern matches the string, !~ evaluates to TRUE when the pattern does not match the string. =~? and !~? do the same while ignoring differences in upper and lower case letters.
This is a short way of using a regex:
string =~ pattern
string !~ pattern
# equivalent to:
RE.Regex.NEW(pattern).matches(string)
!RE.Regex.NEW(pattern).matches(string)
string =~? pattern
string !~? pattern
# equivalent to:
RE.Regex.NEW(pattern, ignoreCase).matches(string)
!RE.Regex.NEW(pattern, ignoreCase).matches(string)
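For comparison, here is a similar match in C with the POSIX <regex.h> API. This is only an analogy. POSIX extended regular expressions are not necessarily the same dialect as Zimbu's RE module, and this sketch searches anywhere in the string; whether Zimbu's matches() behaves the same way is not stated here. REG_ICASE plays the role of ignoreCase for "=~?". "regex_matches" is a made-up helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when pattern matches somewhere in text, like "text =~ pattern".
   With ignore_case set it behaves like "text =~? pattern". */
static bool regex_matches(const char *text, const char *pattern, bool ignore_case) {
    regex_t re;
    int flags = REG_EXTENDED | REG_NOSUB;
    if (ignore_case) {
        flags |= REG_ICASE;
    }
    if (regcomp(&re, pattern, flags) != 0) {
        return false;               /* invalid pattern: treat as no match */
    }
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

The "!~" form is simply the negation: !regex_matches(text, pattern, false).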
See the regex type
TODO
left > right   # larger than
left >= right  # larger or equal
left < right   # smaller than
left <= right  # smaller or equal
TODO
Using IS for string values may give unexpected results, because concatenation of string constants is done at compile time, and equal string values point to the same string. Therefore this condition evaluates to TRUE:
IF "Hello" IS "Hel" .. "lo"
These operators are used to test for the type of an object which can be one of multiple classes or interfaces. Example:
IF e ISA E.NilAccess
IF decl ISNOTA Declaration
Simplified syntax:
left ISA right
left ISNOTA right
The "left" expression must evaluate to a value. The "right" expression must evaluate to a class or interface type.
This also works for an interface:
CLASS Foo IMPLEMENTS I_One
  ...
}
Foo foo = NEW()
IF foo ISA I_One  # TRUE
  I_One one = foo
}
For ISA, if "left" is not NIL and can be typecast to "right", then the result is TRUE, otherwise it is FALSE.
For ISNOTA the result is the opposite. These two expressions are equivalent:
left ISNOTA right
!(left ISA right)
To test whether a value is of a specific class and not a child of that class, use the Type() function:
VAR left = ChildOfFoo.NEW()
left ISA Foo               # TRUE
left.Type() IS Foo.Type()  # FALSE
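The difference between ISA and comparing Type() can be modeled with class descriptors that point to their parent class. The following is a hypothetical C sketch: "type_desc", "isa", "is_exact_type" and the descriptor names are invented. It also ignores interfaces, which a real ISA check would have to search as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical class descriptor: each class knows its parent class. */
typedef struct type_desc {
    const char *name;
    const struct type_desc *parent;
} type_desc;

static const type_desc FooType = { "Foo", NULL };
static const type_desc ChildOfFooType = { "ChildOfFoo", &FooType };

/* Every object starts with a pointer to its class descriptor. */
typedef struct {
    const type_desc *type;
} object;

static const object example_child = { &ChildOfFooType };

/* Like "obj ISA t": NIL is never ISA anything; otherwise walk up the parents. */
static bool isa(const object *obj, const type_desc *t) {
    if (obj == NULL) {
        return false;
    }
    for (const type_desc *d = obj->type; d != NULL; d = d->parent) {
        if (d == t) {
            return true;
        }
    }
    return false;
}

/* Like "obj.Type() IS t": only the exact class matches. */
static bool is_exact_type(const object *obj, const type_desc *t) {
    return obj != NULL && obj->type == t;
}
```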
See .<Typecast> for when a typecast is valid.
TODO
left .. right
If "left" or "right" is not a string automatic conversion is done for these types, using their ToString() method:
TODO
left & right  # bitwise AND
left | right  # bitwise OR
left ^ right  # bitwise XOR
"left" and "right" must be of a number or bits type.
When "left" and "right" are of the Bits type the operator is applied to all fields.
NOTE: In Javascript only the lower 32 bits are used.
TODO
NOTE: In Javascript only the lower 32 bits are used.
TODO
TODO
TODO
TODO
TODO
TODO
The "?." operator, called dotnil operator, works like ".", unless the expression before the "?." evaluates to NIL. In that case using "." would throw an E.NilAccess exception. When using "?." the result is the default value: zero, NIL or FALSE.
var?.member # value of "var.member" or 0/FALSE/NIL if var is NIL
Simplified syntax:
left?.right
When "left" is NIL then the result is the default value for "right". Otherwise the result is equal to "left.right".
This is equivalent to:
left == NIL ? 0 : left.right
Except that "left" is evaluated only once.
foo?.member = "value" # Does not work!
Using "?." on a member in the left-hand side of an assignment will still throw E.NilAccess, since there is no place to write the value.
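In C terms, reading with "?." is a NULL check that falls back to the default value. Writing has no such fallback. A sketch with a made-up struct "Foo" follows. Where Zimbu throws E.NilAccess, this sketch stops on a failed assert instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class with a single int member. */
typedef struct {
    int member;
} Foo;

/* Like "var?.member": the default value 0 when var is NIL. */
static int member_or_default(const Foo *var) {
    return var == NULL ? 0 : var->member;
}

/* Like "var.member = value": a NIL target has nowhere to store the value,
   so it is treated as an error here. */
static void set_member(Foo *var, int value) {
    assert(var != NULL);
    var->member = value;
}
```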
The "?." operator, called dotnil operator, works like ".", unless the expression before the "?." evaluates to NIL. In that case using "." would throw an E.NilAccess exception (unless IFNIL is used, see below). When using "?." the result is usually the default return value: zero, NIL or FALSE.
var?.Size() # size of "var", or 0 if var is NIL
Simplified syntax:
left?.right()
When "left" is NIL then the result is the default return value for "right()". Otherwise the result is equal to "left.right()".
This is equivalent to:
left == NIL ? 0 : left.right()
Except that "left" is evaluated only once.
mylist?.add("value") # Does not work!
Using "?." on a method that modifies the object will still throw E.NilAccess, since there is no sensible fallback.
Note that when using IFNIL as the first statement in a method then "." behaves like "?.". And the behavior of both depends on the statements inside the IFNIL block.
TODO
TODO
TODO
foo.<ChildOfFoo>.childOfFooMethod()
This operator is most useful when invoking a method on an object which was declared to be of a parent class, while the method exists on a child class.
Simplified syntax:
left.<Type>
In general, a type is cast from the type of "left" to a more specific type. At compile time there is only a check if this typecast would be possible for some value of "left". If the typecast is never possible that is an error.
At runtime there is a check whether "left" is indeed of the type being cast to, or a child of it. If not, an E.WrongType exception is thrown.
expr -> alt-expr ;
alt-expr -> or-expr ( sep "?" sep alt-expr sep ":" sep alt-expr )? ;
or-expr -> and-expr ( sep "||" sep and-expr )* ;
and-expr -> comp-expr ( sep "&&" sep comp-expr )* ;
comp-expr -> concat-expr ( sep ( "==" | "!=" | ">" | ">=" | "<" | "<=" | "IS" | "ISNOT" | "ISA" | "ISNOTA" ) sep concat-expr )* ;
concat-expr -> bitwise-expr ( sep ".." sep bitwise-expr )* ;
bitwise-expr -> shift-expr ( sep ( "&" | "|" | "^" ) sep shift-expr )* ;
shift-expr -> add-expr ( sep ( ">>" | "<<" ) sep add-expr )* ;
add-expr -> mult-expr ( sep ( "+" | "-" ) sep mult-expr )* ;
mult-expr -> incr-expr ( sep ( "*" | "/" | "%" ) sep incr-expr )* ;
incr-expr -> ( "++" | "--" )? neg-expr ( "++" | "--" )? ;
neg-expr -> ( "-" | "!" )? dot-expr ;
dot-expr -> paren-expr ( TODO )? ;
paren-expr -> "(" skip expr skip ")" | base-expr ;
base-expr -> ( "EOF" | "NIL" | "THIS" | "TRUE" | "FALSE" | "OK" | "FAIL" | new-item | string | char | number | list | dict | comp-name ) ;
type -> comp-name ;
comp-name -> var-name comp-follow* | member-name comp-follow* | group-name comp-follow+ ;
comp-follow -> ( dot-item | paren-item | bracket-item | angle-item ) ;
dot-item -> sep-with-eol? "." ( var-name | member-name ) ;
paren-item -> "(" arguments? ")" ;
bracket-item -> "[" skip expr skip "]" ;
angle-item -> "<" arguments ">" ;
Using clear names for variables, classes, methods, etc. is very important to make a program easy to understand. Here are a few recommendations:
A few rules are enforced when using names:
Using CamelCase is recommended, but not enforced.
bool camelCaseName         # recommended
bool underscore_separated  # discouraged
It is possible to use the builtin type names for variable names, if you really want:
string string = "foo"
bool bool = TRUE
dict<string, int> dict = ["foo": 6]
func< => int> func = { => 6 }
When Zimbu grows and more features are added we want to make sure that your existing programs keep on working. Therefore you can not use names that are reserved for the language itself and for builtin libraries.
All words made out of upper case letters, underscores and digits are reserved. When there is at least one lower case letter the word is not reserved. Examples:
MY
THERE_
MY_NAME
_OPEN
KEY2
Names cannot contain two or more consecutive underscores. Examples:
My__name
__Foo
there_____too
Type names starting with a lower case letter are reserved for predefined types. This applies to the name of classes, enums, modules, etc. Not to member variables and methods, which actually must start with a lower case letter. Examples:
bigInt
bool
string
dict
multiDict
Method and member names starting with an upper case letter are reserved for predefined methods and members. The methods can be defined in your class or module, so long as the arguments and return type match the predefined method, see predefined method. Examples:
FUNC $ToString() string
FUNC $Equal(Titem other) bool
FUNC Main() int
loop-name -> "." var-name ;
file-name -> ( ! NL ) + ;
group-name -> upper id-char* lower id-char* ;
var-name -> lower id-char* ;
member-name -> upper id-char* lower id-char* | lower id-char* ;
id-char -> alpha | digit | "_" ;
alpha -> upper | lower ;
upper -> "A" .. "Z" ;
lower -> "a" .. "z" ;
digit -> "0" .. "9" ;
block-end -> "}" sep-with-eol ;
The type of a value depends on the context. For example, using "123" can be an int or a nat, depending on where it is used. You will get an error if the value does not match the expected type. For example, using "1000" for a byte does not work, a byte can only store a number from 0 to 255.
int a = 1234              # 1234 used as an int
nat b = 1234              # 1234 used as a nat
byte c = 1234             # Error! 1234 does not fit in a byte
list<int> la = [1, 2, 3]  # 1, 2 and 3 used as an int
list<dyn> ld = [1, 2, 3]  # 1, 2 and 3 used as a dyn
Examples:
0                                    # int or nat
-123                                 # int
32239234789382798039480923432734343  # bigInt or bigNat
0xff00                               # int or nat
0b11110000                           # int or nat
0.01                                 # float
It can be difficult to see the value of large numbers. Zimbu allows using single quotes to separate groups of digits. For Java programmers an underscore can be used as well. But the single quote is recommended, it's easier to read. Swiss bankers use it!
1'000'000
0xffff'00ff
0b1010'0000'1111'1111
1_000_000
0xffff_00ff
0b1010_0000_1111_1111
A string value is mostly written with double quotes: "string". It cannot contain a literal line break. Special characters start with a backslash:
\\          \
\'          '
\"          "
\a          BEL 0x07
\b          BS  0x08
\f          FF  0x0c
\n          NL  0x0a
\r          CR  0x0d
\t          TAB 0x09
\v          VT  0x0b
\123        octal byte, must have three digits, start with 0, 1, 2 or 3
\x12        hex byte, must have two hex digits
\u1234      hex character, must have four hex digits
\U12345678  hex character, must have eight hex digits
With the \x item it is possible to create invalid UTF-8. In that case the type of the result will be a byteString instead of a string. When concatenating string literals with ".." and one of them is a byteString the result becomes a byteString.
If the bytes specified with "\x" result in valid UTF-8 then the result is still a string type.
IO.write("\u00bb mark \u00ab ¡really!\n")
# output: » mark « ¡really!
All Unicode characters can be entered directly, the backslash notation is only required for control characters.
A raw string is written as R"string". Only the double quote character is special, it must be doubled to get one. A raw string cannot contain a line break: a literal line break is not allowed and \n does not stand for a line break.
IO.print(R"contains a \ backslash, \n no newline and a "" quote")
# output: contains a \ backslash, \n no newline and a " quote
A long string can contain line breaks. Only "'' is special: it always terminates the string.
IO.write(''"line one
line two
line three
"'')
# output: line one
# line two
# line three
Note that leading white space is included, as is the line break just before "''.
A string can contain an expression in \(), for example:
list<string> names = ["Peter", "John"]
IO.print("The \( names.Size() ) names are \( names )")
# prints: The 2 names are ["Peter", "John"]
After the expression inside \() is evaluated it is converted to a string, as if calling ToString().
Inside the \() spaces are optional. Usually it's easier to read when the \( is followed by a space and there is a space before the ).
Just after the \( a format can be specified. This format is passed to the ToString() method. Example:
int number = 111
int result = -8
IO.print("the \(.5d number) is \(5d result)")
# prints: the 00111 is -8
There must be no space between the \( and the format.
All the parts are concatenated into one string result. The string expression:
"the \(5d number ) is \( result )"
is equivalent to:
"the " .. number.ToString("5d") .. " is " .. result.ToString()
[1, 2, 3]
["one", "two", "three", ]  # trailing comma is allowed
[1, "two", [3, 3, 3]]      # mix of types can be used for list<dyn>
[]                         # empty list
The type of the items is inferred from the context, if possible. Otherwise the type of the first item is used. If needed, cast the first item to the desired type. For example, to have a list that starts with a number but force the item type to be dyn:
[1, "text", TRUE]        # Error: list<int> cannot contain "text"
[1.<dyn>, "text", TRUE]  # list<dyn> value
A list can also be used to initialize an array and a tuple. In the case of a tuple the type of each value must be correct.
Dict constants:
[1: "one", 2: "two", ]  # trailing comma is allowed
O[1: "one", 2: "two"]   # with ordered keys
[:]                     # empty dict
The type of the keys and items is inferred from the context, if possible. If the context doesn't specify the type the first key and item types are used.
An empty dict can only be used if the context specifies the types.
If the context specifies a parent type while the first key or item is a child of that parent, the parent type is used.
An object initializer can only be used when assigned to an object of a known class. The compiler will verify the type of each value.
{name: "Peter",
 address: {
   street: "Gracht",
   nr: 1234,
   city: "Amsterdam",
 },
 phone: ["+3120987644", "+31623423432"],
}
As the example shows nesting is allowed. Not only with objects, also with lists, arrays and dicts.
The class must support a NEW() method without arguments. It is used to create an object before applying the values.
The last comma is optional.
string -> """ ( "^\"" | "\" ANY )* """ ;
char -> "'" ( "^\'" | "\" ANY ) "'" ;
number -> decimal-number | hex-number | binary-number ;
decimal-number -> digit ( digit | "'")* ;
hex-number -> ( "0x" | "0X" ) ( "0" .. "9" | "a" .. "f" | "A" .. "F" | "'" )+ ;
binary-number -> ( "0b" | "0B" ) ( "0" | "1" | "'" )+ ;
list -> "[" ( skip ( expr "," sep )* expr ( "," sep)? )? skip "]" ;
dict -> empty-dict | non-empty-dict ;
empty-dict -> "[:]" ;
non-empty-dict -> "[" ( skip ( dict-item "," sep )* dict-item ","? )? skip "]" ;
dict-item -> expr skip ":" sep expr ;
new-item -> "NEW" "(" arguments? ")" ;
When a variable has not been explicitly initialized it will have the default value. This also applies to all members of an object. At the lowest level all bytes have the value zero.
type | value | also for |
bool | FALSE | |
status | FAIL | |
int | 0 | int8 int16 int32 int64 bigInt |
nat | 0 | byte nat8 nat16 nat32 nat64 bigNat |
float | 0.0 | float32 float64 float80 float128 |
fixed10 | 0 | fixed1 fixed2 ... fixed15 |
enum | the first item | |
string | NIL | byteString varString varbyteString |
container | NIL | list, dict, set, etc. |
object | NIL | |
For a bits type every field will have its default value.
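The rule that "all bytes have the value zero" corresponds to zero-initialization in C. Here is a sketch with a made-up record type, whose comments map each C field to the matching row of the table. It assumes, as the table states, that zero bytes mean FALSE, 0, 0.0 and NIL. In C, objects with static storage duration are zero-initialized, with pointers set to NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A made-up record with one member per basic kind from the table. */
typedef struct {
    bool flag;         /* bool   -> FALSE */
    long count;        /* int    -> 0     */
    unsigned long n;   /* nat    -> 0     */
    double ratio;      /* float  -> 0.0   */
    const char *name;  /* string -> NIL   */
} record;

/* Static storage duration: every member starts at its zero value. */
static record defaults;

static bool has_default_values(const record *r) {
    return !r->flag && r->count == 0 && r->n == 0
        && r->ratio == 0.0 && r->name == NULL;
}
```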
Modules and classes can define an Init() method to initialize things when the program is starting. In its simplest form this executes code that does not depend on other initializations. Example:
MODULE Foo
  list<string> weekendDays
  FUNC Init() status
    weekendDays = NEW()
    weekendDays.add("Saturday")
    weekendDays.add("Sunday")
    RETURN OK
  }
}
The EarlyInit() method is used in the same way, but it is called before the command line arguments are processed.
If an Init() or EarlyInit() method depends on other initialization to be done, and that has not been done yet, it should return FAIL. It will then be called again after making a round through all modules and classes.
This is how it works exactly:
Illustration:
MODULE Foo
  # A boolean command line argument "-v" or "--verbose".
  # This will be initialized in step 3, because ARG.Bool has the @earlyInit attribute.
  ARG.Bool verbose = NEW("v", "verbose", FALSE, "Verbose messages")
  # This will be initialized in step 6, after "verbose".
  string leader = verbose.value() ? "Foo module: " : ""
  # This will be invoked in step 7, after "leader" was initialized.
  FUNC Init() status
    IF Bar.Ready  # when Bar has been initialized
      Bar.setLeader(leader)
      RETURN OK  # initialization of Foo is done
    }
    RETURN FAIL  # we need another round
  }
}
If a class extends a class that has an Init method, and it does not define its own Init method, the Init method of the parent is invoked. Only one "Ready" flag is used to avoid calling it again after it returns OK.
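The FAIL-and-retry rule amounts to calling the pending Init() methods in rounds until each one has returned OK and its "Ready" flag is set. The C sketch below models that loop with invented names ("init_fn", "run_init_rounds", "foo_init", "bar_init"). It adds one assumption not stated above: if a full round makes no progress, the loop stops instead of repeating forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { FAIL, OK } status;
typedef status (*init_fn)(void);

/* Calls every init function that is not ready yet, round after round.
   Returns true when all are done, false when a round makes no progress. */
static bool run_init_rounds(init_fn *inits, bool *ready, size_t count) {
    size_t remaining = count;
    while (remaining > 0) {
        size_t progress = 0;
        for (size_t i = 0; i < count; i++) {
            if (!ready[i] && inits[i]() == OK) {
                ready[i] = true;        /* the "Ready" flag */
                progress++;
            }
        }
        if (progress == 0) {
            return false;               /* assumption: give up on a cycle */
        }
        remaining -= progress;
    }
    return true;
}

/* Example: Foo depends on Bar, but is listed first. */
static bool bar_ready = false;
static int foo_calls = 0;

static status foo_init(void) {
    foo_calls++;
    return bar_ready ? OK : FAIL;       /* like "IF Bar.Ready ... RETURN FAIL" */
}

static status bar_init(void) {
    bar_ready = true;
    return OK;
}

static bool run_example(void) {
    init_fn inits[] = { foo_init, bar_init };
    bool ready[] = { false, false };
    return run_init_rounds(inits, ready, 2);
}
```

In the example, Foo returns FAIL in the first round because Bar is not ready yet. It succeeds in the second round, so it is called twice.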
Note that the initialization happens in one thread. If an Init() or EarlyInit() blocks then the whole program startup is blocked. It is not a good idea to block on something that takes longer than reading a file. It is better not to use Internet connections, unless the program really can't do anything without them.
When NEW() is invoked to create a new object, this happens:
The $Init() method is a PROC without arguments.
It is allowed to call $Init() again later. It will execute both the assignments for members and the body of the $Init() method. That includes the parent class, and its parent, etc. Note that none of the NEW() methods are called.
Best is to do simple initializations in the declaration, e.g.:
CLASS Example
  list<int> $numbers = NEW()
  string $message = "this is an example"
}
More complicated initializations belong in $Init():
CLASS Example
  list<int> $numbers = NEW()
  string $message
  PROC $Init()
    FOR i IN 1 TO 10
      $numbers.add(i)
    }
    IF Lang.current == Lang.ID.nl
      $message = "dit is een voorbeeld"
    ELSE
      $message = "this is an example"
    }
  }
}
Keep in mind that these initializations cannot be overridden in sub-classes. Use NEW() if you do want that.
Garbage collection (GC) will find allocated objects that are no longer used and free the associated memory. This is done automatically, the programmer does not need to keep track of what objects are in use. The GC can be invoked intentionally with:
GC.run()
Normally there are no side effects when an object is destructed, other than the memory becoming available. If a side effect is desired, a Finish method can be defined. For example, when an object is used to keep track of a temp file:
CLASS TempFileName
  string $tempFileName
  NEW()
    $tempFileName = createTempFile()
  }
  FUNC $Finish() status
    IF $tempFileName != NIL
      IO.delete($tempFileName)
      $tempFileName = NIL  # only delete it once
    }
    RETURN OK
  }
}
NOTE: $Finish() is only fully supported for generated C code. For JavaScript it only works for not allocated variables; there $Finish() is never called when an object is garbage collected.
NOTE: $Finish() is not called when memory management has been disabled at compile time with --manage=none.
NOTE: An alternative is to use a DEFER statement. The advantage is that the work is done at the end of the function, not later when the object is garbage collected. The disadvantage is that it requires an extra statement.
Finish has one optional argument of type Z.FinishReason, which specifies why it was called.
The attribute @notOnExit can be added to the Finish method. It is then not called when the program is exiting. IO.File.Finish() uses this to prevent stdin, stdout and stderr from being closed when exiting.
The Finish method can do anything. For allocated objects, if Finish() is called with unused and returns FAIL, the object is not freed. The object is also not freed when executing the Finish() method causes it to be referenced from another object that is in use.
If a Finish method throws an exception, it is caught and a message is written to stderr. Finish will then not be called again, just as when it returns OK. However, running out of memory or another fatal error may cause the program to exit, in which case some Finish methods may not be called.
For not-allocated objects, e.g., on the stack, the Finish() method is called once, with the argument leave, when leaving the block in which the object was declared. Exceptions are thrown as usual. This can be used to execute code automatically at the end of a block:
FOR name IN ["1", "22", "333"]
  TempFileName %tf = NEW()
  doSomething(tf, name)  # uses the temp file.
  # %tf.Finish() is called here, because the block where %tf is declared is left.
}
When an exception causes the block to be left, the same happens as when the block is left in a normal way, thus Finish() is called with leave.
In a single-threaded application Finish methods are called by the GC and thus delay execution of the program. To avoid this, put the work in a work queue (e.g., using a pipe) and process it at a convenient time.
In a multi-threaded application Finish methods are called by the thread that executes the GC. This is usually OK, but a Finish method that takes very long delays the next GC round. To avoid this, run a separate thread to do the work and use a pipe to send the work from the Finish method to that thread.
One can also call Finish directly. This is useful to avoid waiting for the GC to kick in. You are expected to pass the called argument, but this is not enforced. Returning OK prevents the method from being called again automatically. The method can still be called directly multiple times, also when it returned OK previously. Unlike when Finish is called by the GC, exceptions are not caught.
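A sketch of calling Finish directly, assuming the TempFileName class is changed to accept the optional Z.FinishReason argument so that the caller can pass Z.FinishReason.called. The functions createTempFile(), doSomething() and useTempFile() are hypothetical helpers used only for illustration.

```zimbu
CLASS TempFileName
  string $tempFileName

  NEW()
    $tempFileName = createTempFile()  # hypothetical helper
  }

  # Accepts the optional reason, so that direct callers can pass "called".
  FUNC $Finish(Z.FinishReason reason) status
    IF $tempFileName != NIL
      IO.delete($tempFileName)
      $tempFileName = NIL  # only delete it once
    }
    RETURN OK
  }
}

FUNC useTempFile() status
  TempFileName tf = NEW()
  doSomething(tf, "data")           # hypothetical helper
  tf.Finish(Z.FinishReason.called)  # delete the file now, don't wait for the GC
  RETURN OK
}
```

Since $Finish() returned OK, the GC will not call it again for this object, and clearing $tempFileName keeps any further direct call harmless.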
This is how objects with a Finish method are handled by the GC:
The result is that an object with a Finish() method is not freed in the first GC round, but only in the GC round after it returned OK.
On exit (also when exiting because of an exception) the following happens:
The program may hang on exit when a Finish() method hangs; it is up to the programmer to make sure this does not happen. When a Finish() method throws an exception that it does not catch itself, e.g. when running out of memory or on a NIL pointer access, the exception is written to stderr. If an error occurs that is not caught, the program exits and some Finish() methods may not be called.
The CTX module offers a way to pass objects down the call stack. This is useful for deciding at a high level what happens at a low level, without having to pass the object all the way down as a function argument. For example, create one of several backends when a request arrives, and use that backend in a function much deeper in the call stack.
This is also very useful for testing, to insert mock objects.
See the CTX module for more information.
Run Zimbu with the "test" argument and the main test file, like this:
zimbu test Something_test.zu
It is recommended to name test files like the file they are testing, with "_test" appended to the root name. This way they sort together.
The test file is like a main Zimbu file, without the Main function. The methods that execute tests must have a name starting with "test_".
FUNC test_Escaping() status
  TEST.equal("&lt;div&gt;", ZUT.htmlEscape("<div>"))
  RETURN OK
}
These test functions will be called one by one. If an exception is thrown it is caught and reported. This counts as a failure.
Any other methods, variables, etc. can be present. There are no rules for these, they can go anywhere in the file. IMPORT can be used normally.
To include another test file use IMPORT.TEST, e.g.:
IMPORT.TEST One_test.zu
IMPORT.TEST Two_test.zu
This allows for making one main test file that imports all the individual test files. That is faster than running each individual test separately.
While running tests each test file is reported. At the end the number of tests and the number of failed tests are reported. To report each test function when it is executed, add the -v argument after the execute argument -x:
zimbu test Something_test.zu -x -v
To run the tests with Javascript add the --js argument:
zimbu test --js Something_test.zu
A test method does not have arguments and must return status.
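The test is considered to have failed when it returns FAIL, when it throws an exception that is not caught, when a TEST or CHECK method reports a failure, or when LOG.error() is called.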
Use methods from the CHECK module when continuing the test makes no sense if the check fails.
Use methods from the TEST module if testing can always continue.
Use LOG.error() if there is no TEST method for what you want to check.
FUNC test_Parser() status
  MyParser parser = MyParser.get()
  CHECK.notNil(parser)
  TEST.equal("result", parser.getResult())
  IF parser.failCount() > 5
    LOG.error("Too many parser failures")
  }
  IF parser.success()
    RETURN OK
  }
  parser.reportError()
  RETURN FAIL
}
If all the test methods in a test file require some work before the actual testing starts, and/or some cleanup must be done after the test, the setUp and tearDown methods can be used. Example:
IO.File tmpFile
string tmpFileName = "junk"

PROC setUp()
  tmpFile = IO.fileWriter(tmpFileName)
}

PROC tearDown()
  tmpFile.close()
  IO.delete(tmpFileName)
}

FUNC test_One() status
  TEST.true(MyModule.dump(tmpFile))
  RETURN OK
}
The setUp method is called before every test method is called. If setUp throws an exception the test method is not invoked.
The tearDown method is called after the test method finishes, also when the test method throws an exception and when the setUp method throws an exception.
There are two types of comments. The first type starts with a # and continues until the end of the line. Multi-line comments require repeating the # in every line.
The second type of comment starts with /* and ends with */. This comment must not contain a line break.
Comments can be used in many places, but not inside a string.
It is recommended to make a comment either a short note or a full sentence. A sentence starts with a capital letter and ends in a full stop; a short note does not.
# Whole line comments are usually a sentence.
idx++ # next item
b = 0 # Reset b so that blah blah blah blah blah blah blah blah.
Zudocu can be used to generate documentation from source code. Special markers in the comments are used. A wiki-like syntax is used for formatting. See the web page. This is extensively used in the Zimbu library code.
Zimbu is very strict about use of white space. This ensures that every Zimbu program that compiles has at least the basic spacing right. Examples:
a="foo"    # Error: Must have white space before and after the "=".
a = "foo"  # OK
f(1,2)     # Error: A comma must be followed by white space.
f(1, 2)    # OK
f( 1)      # Error: No white space after "(" if text follows.
f(1 )      # Error: No white space before ")" if text precedes.
f(1)       # OK
Zimbu uses line breaks to separate statements, so there is no need for a semicolon. This works in a natural way; the exact syntax specifies the rules.
If you do want to put several statements in one line, use a semicolon as a statement separator:
SWITCH count
  CASE 0; $write("no items"); RETURN FAIL
  CASE 1; $write("1 item"); RETURN OK
  DEFAULT; $write("\(count) items"); RETURN OK
}
line-sep     | Line separator: either a semicolon or an NL with optional white space and comments.
semicolon    | A semicolon with mandatory following white space. This is only used to separate statements.
sep-with-eol | At least one line break, with optional comments and white space.
sep          | Mandatory white space with optional comments and line breaks.
skip         | Optional white space, comments and line breaks.
white        | One or more spaces.
comment      | One comment, continues until the end of the line.
line-sep     -> semicolon | sep-with-eol ;
semicolon    -> ";" white ;
sep-with-eol -> ( white comment )? NL skip ;
sep          -> ( white | NL ) skip ;
skip         -> ( ( white | NL ) ( white | comment | NL )* )? ;
white        -> " "+ ;
comment      -> "#" ( ! NL )* ;
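A small illustration of the separator rules; the function sum() and its variables are made up for this example. Between "1" and "int y" the text (two spaces, the comment, the line break and the indentation of the next line) matches sep-with-eol: white, comment, NL, then skip. Between "2" and "int z" the text "; " matches semicolon: a ";" followed by white.

```zimbu
FUNC sum() int
  int x = 1  # first value
  int y = 2; int z = 3
  RETURN x + y + z
}
```

Writing "int y = 2;int z = 3" would not match, because semicolon requires white space after the ";".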
one-item     non-terminal
"abc"        terminal representing string literal "abc"
"a" .. "z"   terminal: a character in the range from "a" to "z"
"^abc"       terminal: any character but "a", "b" or "c"
NL           terminal, New Line character, ASCII 0x0a
ANY          terminal, any character not discarded by the preprocessor
->           produces
|            alternative
;            end of rule
()           group items into one non-terminal
?            preceding item is optional
*            preceding item appears zero or more times
+            preceding item appears one or more times
!            anything but next item
Copyright 2013 Bram Moolenaar All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. The License can be found in the LICENSE file, or you may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.