Qri
Starlark
automating qri dataset version creation & maintenance
Qri Starlark is built atop the Go implementation of Starlark, a language that originated in the Bazel build tool at Google. The spec provides a nice high-level description:
Starlark is an untyped dynamic language with high-level data types, first-class functions with lexical scope, and automatic memory management or garbage collection.
Starlark is strongly influenced by Python, and is almost a subset of that language. In particular, its data types and syntax for statements and expressions will be very familiar to any Python programmer. However, Starlark is intended not for writing applications but for expressing configuration: its programs are short-lived and have no external side effects and their main result is structured data or side effects on the host application. As a result, Starlark has no need for classes, exceptions, reflection, concurrency, and other such features of Python.
Runtime Options
Starlark provides a number of runtime options. Qri Starlark runs with the following nonstandard runtime options:
- while loops and recursion are enabled
- floating point numbers are enabled
- set builtins are enabled
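As a rough illustration of what these options permit, the sketch below uses recursion, a while loop over a floating point value, and the set builtin. The function names are made up for this example, and the code is written in the subset shared by Starlark and Python.

```python
def factorial(n):
    # Recursion: rejected by default Starlark, allowed with the option enabled.
    if n <= 1:
        return 1
    return n * factorial(n - 1)

def count_halvings(x):
    # A while loop operating on a floating point number.
    steps = 0
    while x > 1.0:
        x = x / 2
        steps += 1
    return steps

def unique_count(values):
    # The set builtin removes duplicate values.
    return len(set(values))
```

Without these options, a standard Starlark interpreter would be expected to reject each of these functions.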
Nondeterminism
Starlark execution is deterministic: all functions and operators in the core language produce the same execution each time the program is run; there are no sources of random numbers, clocks, or unspecified iterators. This makes Starlark suitable for use in applications where reproducibility is paramount, such as build tools.
Unlike Starlark, Qri Starlark is not deterministic. Users have access to packages that make HTTP calls and generate random numbers, both of which are sources of nondeterminism. Because these sources are a hard requirement of doing data maintenance, we take this to its natural conclusion and enable while loops and recursion.
Transform function
A Qri Starlark step can define either zero or exactly one function that will be called by the qri runtime:
def transform(ds):
This is equivalent to a main function in a C-style program. The qri runtime will always pass a dataset as the first argument. The function can modify the dataset argument to mutate the dataset, or return a dataset to explicitly define the returned dataset. Transform functions must return either None or a dataset. Any other value is a fatal error. If a transform function returns a dataset, mutations to the dataset argument are ignored.
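The return rule can be sketched as follows. This is only a model of the behavior described here, not the qri runtime itself: a plain dict stands in for the dataset, and effective_dataset is a hypothetical helper that plays the role of the runtime. The fatal error for other return values is not modeled.

```python
def transform_mutate(ds):
    # Mutates the argument and returns None implicitly.
    ds["body"] = [[1, 2], [3, 4]]

def transform_return(ds):
    # The mutation below is ignored because a dataset is returned.
    ds["body"] = [["ignored"]]
    return {"body": [[5, 6]]}

def effective_dataset(transform, ds):
    # Hypothetical stand-in for the runtime: prefer the returned dataset,
    # fall back to the (possibly mutated) argument when None is returned.
    result = transform(ds)
    if result == None:
        return ds
    return result
```

Calling effective_dataset(transform_mutate, {}) yields the mutated argument, while effective_dataset(transform_return, {}) yields the returned dict.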
Enhanced print
Print is "enhanced" to accept numerous arguments and coerce them to output strings. Its output strives to match that of console.log() in Node.js:
print({"a": 1, "b": False })
# output: { "a": 1, "b": False }
Standard library
Qri Starlark exposes a standard library. The standard library is called starlib, and all packages are available by default.
The version of Starlark syntax recorded in a transform step matches that of this standard library, and follows the semantic versioning vMAJOR.MINOR.PATCH format. Any given version of qri is de facto bound to a specific version of Starlark, and will use that version for all new programs. For example, we might say "qri version 0.10.0 ships with starlark version 2.1.0". The qri runtime may ship with multiple Starlark runtimes for backwards compatibility.
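For illustration only, a version string in this format could be split into its numeric parts as sketched below. Both parse_version and same_major are hypothetical helpers, not part of qri or starlib. Comparing major versions reflects the general semantic versioning convention and is an assumption about how a runtime might be chosen.

```python
def parse_version(version):
    # Strip the leading "v", then split "MAJOR.MINOR.PATCH" into integers.
    bare = version[1:] if version.startswith("v") else version
    parts = bare.split(".")
    return (int(parts[0]), int(parts[1]), int(parts[2]))

def same_major(a, b):
    # Under semantic versioning, a major version bump signals breaking changes.
    return parse_version(a)[0] == parse_version(b)[0]
```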
Python Differences
The following is an edited version of the Bazel docs:
- Global variables are immutable.
- for statements are not allowed at the top-level. Use them within functions instead.
- if statements are not allowed at the top-level. However, if expressions can be used: first = data[0] if len(data) > 0 else None.
- Deterministic order for iterating through dictionaries.
- Recursion is not allowed in standard Starlark (Qri Starlark enables it, see Runtime Options).
- Int type is limited to 32-bit signed integers. Overflows will throw an error.
- Modifying a collection during iteration is an error.
- Except for equality tests, comparison operators <, <=, >=, >, etc. are not defined across value types. In short: 5 < 'foo' will throw an error and 5 == "5" will return false.
- In tuples, a trailing comma is valid only when the tuple is between parentheses, e.g. write (1,) instead of 1,.
- Dictionary literals cannot have duplicated keys. For example, this is an error: {"a": 4, "b": 7, "a": 1}.
- Strings are represented with double-quotes (e.g. when you call repr).
- Strings aren’t iterable.
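A few of these rules can be seen side by side in a short sketch. All names here are invented for the example, and the code sticks to constructs that are valid in both Starlark and Python.

```python
data = [3, 1, 2]

# An if statement is not allowed at the top level, but an if expression is.
first = data[0] if len(data) > 0 else None

def total(values):
    # for statements must live inside a function.
    acc = 0
    for v in values:
        acc += v
    return acc

def chars(s):
    # Strings are not iterable, so index into them instead.
    return [s[i] for i in range(len(s))]

def drop_negatives(values):
    # Build a new list rather than modifying one during iteration.
    return [v for v in values if v >= 0]

# A trailing comma in a tuple needs surrounding parentheses.
single = (1,)
```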
The following Python features are not supported:
- implicit string concatenation (use explicit + operator)
- Chained comparisons (e.g. 1 < x < 5)
- class (see struct function)
- import (see load statement)
- yield
- generators and generator expressions
- is (use == instead)
- try, raise, except, finally (see fail for fatal errors)
- global, nonlocal
- most builtin functions, most methods
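Some of the unsupported features have direct rewrites. The sketch below shows three of them, using hypothetical function names and only constructs valid in both Starlark and Python.

```python
def in_open_interval(x, low, high):
    # Instead of the chained comparison low < x < high.
    return low < x and x < high

def is_missing(value):
    # Instead of "value is None".
    return value == None

def greet(name):
    # Instead of implicit concatenation of adjacent string literals.
    return "hello, " + name + "!"
```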