Qri Inside CI
Building dataset versions within a continuous integration pipeline
A user wants to construct a GitHub Action that updates a dataset and publishes it to qri cloud on completion.
In this case GitHub Actions acts as the orchestrator, so no workflow needs to be defined. The dataset may or may not be automated. This is a real use case for qri save --apply, which runs and saves transforms in a one-off fashion. That said, users may opt to skip Starlark entirely in this context, since they have the benefit of containers and their own language of comfort.
Assumptions:
- The qri repo that creates the new version will operate in an ephemeral container
- An online node external to the container will host the dataset post-construction
- The dataset is private & encrypted at rest
- A credential is created in advance, stored in a secure location, and provided to the ephemeral container at startup
Steps:
- Ned is the dataset author. Ned constructs a dataset with qri init --encrypted ned/dataset. At this point the dataset has no history, but an operation is added to logbook stating that all created versions will be encrypted.
- Ned runs qri access container-token ned/dataset. This does a few things:
  - Creates a new ED25519 key
  - Creates a UCAN granting write access to ned/dataset. The token contains the initID of the dataset.
  - Adds an operation to the ned/dataset logbook stating that the new key is a collaborator
  - Outputs a JSON file of { key, token }. This is the container token.
- Ned hops over to CI and sets up a job to run the following commands in a qri container:
  - curl -fsSL https://qri.io/install_for_container.sh | sh QRI_CONTAINER_TOKEN to install qri and set it up with the container token
  - qri save --pull --push --body my_file.csv ned/dataset to write a new commit. The --pull flag instructs qri to pull the dataset before saving. The --push flag tells qri to push to the default location on successful save.
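The CI steps above can be sketched as a small shell function. This is a rough sketch under assumptions, not a tested recipe. It assumes the container token reaches the job as a QRI_CONTAINER_TOKEN environment variable, that install_for_container.sh accepts the token as its first positional argument (hence sh -s --, which forwards arguments to a script read from stdin), and that the qri flags behave as proposed in this document. The function name qri_ci_job and the DRY_RUN switch are hypothetical helpers added for illustration.

```shell
#!/bin/sh
# Sketch of a CI job body. The installer URL and the qri flags come from this
# proposal; they are not verified against a released qri version.

INSTALL_URL="https://qri.io/install_for_container.sh"

# qri_ci_job installs qri using the container token, then saves and pushes a
# new dataset version.
#   $1: dataset reference (e.g. ned/dataset)
#   $2: body file         (e.g. my_file.csv)
# Set DRY_RUN=1 to print the commands instead of running them.
qri_ci_job() {
  if [ "$#" -ne 2 ]; then
    echo "usage: qri_ci_job <dataset> <body-file>" >&2
    return 2
  fi
  dataset="$1"
  body="$2"

  # Assumption: the CI secret store exposes the token as an env variable.
  if [ -z "${QRI_CONTAINER_TOKEN:-}" ]; then
    echo "QRI_CONTAINER_TOKEN is not set" >&2
    return 1
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Never echo the token itself, even in a dry run.
    echo "curl -fsSL $INSTALL_URL | sh -s -- <token hidden>"
    echo "qri save --pull --push --body $body $dataset"
    return 0
  fi

  # Install qri and configure it with the container token.
  curl -fsSL "$INSTALL_URL" | sh -s -- "$QRI_CONTAINER_TOKEN" || return 1

  # Pull the latest version, save a new commit, push on success.
  qri save --pull --push --body "$body" "$dataset"
}
```

A job step would then call qri_ci_job ned/dataset my_file.csv. Running it first with DRY_RUN=1 prints the two commands without touching the network, which is a cheap way to check the job wiring before handing it a real token.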
install_for_container.sh
This script is a variant of the installer. It does what the base install.sh script does (download and install the latest version of qri), then runs one additional command, qri setup --container-token QRI_CONTAINER_TOKEN, to create a new qri instance backed by the key stored in the container token. Using the --container-token flag with setup also configures qri to disable many services that are unnecessary in a containerized context.
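As a minimal sketch of the two stages described above, the script might be structured like this. It is an illustration, not the actual script. The location of the base install.sh is not given here, so the sketch reads it from a hypothetical QRI_BASE_INSTALL_URL variable. The function name install_for_container is also an assumption, and qri setup --container-token is the flag proposed in this document.

```shell
#!/bin/sh
# Sketch of install_for_container.sh: run the base installer, then configure
# qri from the container token. Names marked "hypothetical" are not from qri.

# install_for_container (hypothetical name)
#   $1: the container token produced by the dataset author
install_for_container() {
  token="${1:-}"
  if [ -z "$token" ]; then
    echo "usage: install_for_container <container-token>" >&2
    return 2
  fi

  # Hypothetical variable: where the base install.sh is served from.
  if [ -z "${QRI_BASE_INSTALL_URL:-}" ]; then
    echo "QRI_BASE_INSTALL_URL is not set" >&2
    return 2
  fi

  # Stage 1: same work as the base installer (download and install qri).
  curl -fsSL "$QRI_BASE_INSTALL_URL" | sh || return 1

  # Stage 2: the one additional command. Per this proposal it creates a qri
  # instance backed by the token's key and disables unneeded services.
  qri setup --container-token "$token"
}
```

When used as a standalone script, the file would end with a call such as install_for_container "$@" so that the token passed on the command line reaches the function.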
This seems like a lot of steps, but it provides the necessary flexibility. The qri access container-token command should accept multiple datasets, or an entire user namespace (ned in this case), which would provide edit access to all of that user's datasets.
Using a base qri image
Another option is to use a container that already comes with qri installed. This would remove the need for the install_for_container.sh script.
Alternative: BLS co-signing
Moving keys around is a bad idea. To get around this we'd need to design a solution that provisions a capability to a key that's not known at the time of construction. This adds a bunch of complexity, and I'm content for the moment to inform users that the security of this setup is dependent on them.