Inside CI

Building dataset versions within a continuous integration pipeline


A user wants to construct a GitHub Action that updates a dataset & publishes it to Qri Cloud on completion.

In this case GitHub Actions acts as the orchestrator, so no qri workflow necessarily needs to be defined. The dataset may or may not be automated. This is a real use case for qri save --apply, which runs & saves transforms in a one-off fashion. That said, users may opt to skip Starlark entirely in this context, since they have the benefit of containers & their own language of choice.

Assumptions:

  • The qri repo that creates the new version will operate in an ephemeral container
  • An online node external to the container will host the dataset post-construction
  • The dataset is private & encrypted at rest
  • A credential is created in advance, stored in a secure location, and provided to the ephemeral container at startup

Steps:

  1. Ned is the dataset author. Ned constructs a dataset with qri init --encrypted ned/dataset. At this point the dataset has no history, but an operation is added to logbook stating all created versions will be encrypted.
  2. Ned runs qri access container-token ned/dataset. This does a few things:

    • Creates a new ED25519 key
    • Creates a UCAN granting write access to ned/dataset. The token contains the initID of the dataset.
    • Adds an operation into the ned/dataset logbook that states the new key is a collaborator
    • Outputs a JSON file of { key, token }. This is the container token
  3. Ned hops over to CI and sets up a job to run the following commands in a qri container:

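    • curl -fsSL https://qri.io/install_for_container.sh | sh -s -- QRI_CONTAINER_TOKEN to install qri & set it up with the container token. The -s -- is needed so that sh reads the script from the pipe and treats QRI_CONTAINER_TOKEN as an argument rather than as a script file to run.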
    • qri save --pull --push --body my_file.csv ned/dataset to write a new commit. The --pull flag instructs qri to pull the dataset before saving. The --push flag tells qri to push to the default location on successful save.
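
The CI job in step 3 could be sketched as the POSIX shell script below. This is only a sketch under stated assumptions: the container-token tooling is a proposal, so the script prints its plan by default and only executes when DRY_RUN=0 is set. The DATASET, BODY_FILE and DRY_RUN variables and the function names are made up for illustration, and the script assumes the CI system exposes the container token as a secret environment variable named QRI_CONTAINER_TOKEN. The token is forwarded with sh -s --, which is how sh passes arguments to a script it reads from stdin.

```shell
#!/bin/sh
# Sketch of the CI job from step 3. Prints the plan unless DRY_RUN=0,
# because the container-token commands are proposed, not shipped.
set -eu

DATASET="${DATASET:-ned/dataset}"        # dataset reference used in the steps
BODY_FILE="${BODY_FILE:-my_file.csv}"    # body file produced earlier in the job
INSTALLER_URL="https://qri.io/install_for_container.sh"
DRY_RUN="${DRY_RUN:-1}"                  # hypothetical switch: 1 = print only

# install_qri installs qri and configures it from the container token.
install_qri() {
  if [ "$DRY_RUN" = "1" ]; then
    # Print the variable name only, never the secret value.
    printf '%s\n' "curl -fsSL $INSTALLER_URL | sh -s -- \"\$QRI_CONTAINER_TOKEN\""
    return 0
  fi
  # Assumption: the CI secret store injects QRI_CONTAINER_TOKEN at startup.
  : "${QRI_CONTAINER_TOKEN:?set QRI_CONTAINER_TOKEN from the CI secret store}"
  curl -fsSL "$INSTALLER_URL" | sh -s -- "$QRI_CONTAINER_TOKEN"
}

# save_dataset pulls the dataset, writes a new commit, then pushes it.
save_dataset() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "qri save --pull --push --body $BODY_FILE $DATASET"
    return 0
  fi
  qri save --pull --push --body "$BODY_FILE" "$DATASET"
}

install_qri
save_dataset
```

Keeping the token in an environment variable, rather than writing it into the job definition, matches the assumption that the credential lives in a secure location outside the container.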

install_for_container.sh

This script is a variant of the installer. It does what the base install.sh script does (download and install the latest version of qri), then runs one additional command: qri setup --container-token QRI_CONTAINER_TOKEN, which creates a new qri instance backed by the key stored in the container token. Using the --container-token flag with setup also configures qri to disable many services that are unnecessary in a containerized context.

It seems like a lot of steps, but it provides the necessary flexibility. The qri access container-token command should accept multiple datasets, or an entire user namespace (ned in this case), which would provide edit access to all of that user's datasets.
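
For illustration, the installer variant described above might look roughly like the following. Treat it as a sketch, not the actual script: the base installer URL (https://qri.io/install.sh), the DRY_RUN switch and the install_for_container function name are assumptions, and qri setup --container-token is the proposed flag from this document rather than a shipped one. As a precaution the script prints its plan by default and redacts the token.

```shell
#!/bin/sh
# install_for_container.sh (sketch): base install plus container-token setup.
set -eu

# Assumption: the base installer is served from this URL.
BASE_INSTALLER_URL="${BASE_INSTALLER_URL:-https://qri.io/install.sh}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print only

# install_for_container takes the container token as its only argument.
install_for_container() {
  if [ "$#" -ne 1 ] || [ -z "$1" ]; then
    echo "usage: install_for_container.sh QRI_CONTAINER_TOKEN" >&2
    return 2
  fi
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "curl -fsSL $BASE_INSTALLER_URL | sh"
    printf '%s\n' "qri setup --container-token <redacted>"
    return 0
  fi
  # Step 1: the same work the base installer does.
  curl -fsSL "$BASE_INSTALLER_URL" | sh
  # Step 2: create a qri instance backed by the key in the container token.
  qri setup --container-token "$1"
}

# When invoked as `... | sh -s -- TOKEN`, "$@" carries the token.
if [ "$#" -gt 0 ]; then
  install_for_container "$@"
fi
```

Wrapping the logic in a function keeps the argument check in one place, so a missing token fails before anything is downloaded.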

Using a base qri image

Another option is to use a container that already comes with qri installed. This would remove the need for the install_for_container script.

Alternative: BLS co-signing

Moving keys around is a bad idea. To get around this we'd need to design a solution that provisions a capability to a key that's not known at the time of construction. This adds a bunch of complexity, and I'm content for the moment to inform users that the security of this setup is dependent on them.