Project
A workflow is defined within a Project.
To create a new ZnTrack project, initialize a new repository:
Tip
ZnTrack builds a DVC data pipeline for you. You don’t need to know DVC to use ZnTrack, but it is recommended to familiarize yourself with the basics. For more information, see the DVC documentation on data pipelines.
mkdir my_project
cd my_project
git init
dvc init
Note
This documentation assumes that you have a single workflow file, main.py, in the root of your project.
Additionally, all Node definitions that do not originate from a package should be imported from src/__init__.py.
To ensure this structure, run:
touch main.py
mkdir src
touch src/__init__.py
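If you prefer to create this layout from Python, for instance in a test fixture, the following is a minimal sketch using only the standard library. The helper name scaffold_project is hypothetical and not part of ZnTrack; the sketch only creates the files and does not run git init or dvc init.

```python
from pathlib import Path
import tempfile


def scaffold_project(root: Path) -> list[Path]:
    """Create main.py and src/__init__.py under root (hypothetical helper)."""
    root.mkdir(parents=True, exist_ok=True)
    main_file = root / "main.py"
    src_init = root / "src" / "__init__.py"
    src_init.parent.mkdir(exist_ok=True)
    for path in (main_file, src_init):
        path.touch(exist_ok=True)  # same effect as the shell `touch`
    return [main_file, src_init]


# Demonstrate in a throwaway directory so no real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_project(Path(tmp) / "my_project")
    created_names = [p.relative_to(tmp).as_posix() for p in created]
    all_exist = all(p.is_file() for p in created)

print(created_names)
```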
Available Nodes
This project uses the following Node definitions in src/__init__.py:
import zntrack


class Add(zntrack.Node):
    a: int = zntrack.params()
    b: int = zntrack.params()
    result: int = zntrack.outs()

    def run(self) -> None:
        self.result = self.a + self.b


# Multiply uses ``zntrack.deps`` to process data from other nodes
class Multiply(zntrack.Node):
    a: int = zntrack.deps()
    b: int = zntrack.deps()
    result: int = zntrack.outs()

    def run(self) -> None:
        self.result = self.a * self.b
We will now define a workflow that connects multiple Node instances.
ZnTrack allows you to connect Nodes directly through their attributes.
It is important to treat the main.py file purely as a workflow configuration file.
For a great explanation of this approach, refer to the Apache Airflow documentation.
Note
In addition to the predefined fields (e.g., a, b, and result), it is also possible to pass the full Node instance as an argument.
For on-the-fly computations, you can define @property methods that are not stored in the Node state and pass them between Node instances.
The @property decorator can also be used to define custom file readers.
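The difference between stored fields and a computed @property can be illustrated in plain Python. The sketch below uses a dataclass as a stand-in for a Node; AddSketch and result_squared are hypothetical names, and the code does not show how ZnTrack itself stores Node state.

```python
from dataclasses import asdict, dataclass


@dataclass
class AddSketch:
    """Plain-Python stand-in for a Node; not a zntrack.Node."""

    a: int
    b: int
    result: int = 0

    def run(self) -> None:
        self.result = self.a + self.b

    @property
    def result_squared(self) -> int:
        # Derived on access, so it never appears among the stored fields.
        return self.result ** 2


node = AddSketch(a=1, b=2)
node.run()
stored = asdict(node)  # only the declared fields: a, b, result
print(stored)
print(node.result_squared)
```

The property is recomputed from result each time it is read, which is the behaviour the note describes for on-the-fly computations.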
The Project Context Manager
import zntrack
from src import Add, Multiply

project = zntrack.Project()

with project:
    add1 = Add(a=1, b=2)
    add2 = Add(a=3, b=4)
    add3 = Multiply(a=add1.result, b=add2.result)

project.build()
Calling project.build() generates all necessary configuration files and prepares the project for execution.
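To make the data flow of this workflow concrete, here is a plain-Python sketch that evaluates the same three steps in dependency order. It is only an illustration of the graph that the workflow describes, not how ZnTrack or DVC execute it; the keys add1, add2, and add3 are the Python variable names from main.py, not the Node names ZnTrack assigns.

```python
from graphlib import TopologicalSorter

# Each entry: step name -> (operation, inputs). Integer inputs are parameters;
# string inputs refer to the result of another step.
workflow = {
    "add1": ("add", (1, 2)),
    "add2": ("add", (3, 4)),
    "add3": ("mul", ("add1", "add2")),
}
operations = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}

# A step depends on every step it references by name.
graph = {
    name: {arg for arg in args if isinstance(arg, str)}
    for name, (_, args) in workflow.items()
}

results: dict[str, int] = {}
for name in TopologicalSorter(graph).static_order():
    op, args = workflow[name]
    values = [results[arg] if isinstance(arg, str) else arg for arg in args]
    results[name] = operations[op](*values)

print(results["add3"])  # (1 + 2) * (3 + 4) = 21
```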
ZnTrack Configuration Files
A ZnTrack project typically consists of three configuration files:
params.yaml: Stores parameters defined in main.py, organized by node name keys.
dvc.yaml: Defines the DVC workflow. For details, see the DVC documentation.
zntrack.json: Contains additional metadata used by ZnTrack to manage the workflow.
You should not modify dvc.yaml or zntrack.json manually.
While you can edit params.yaml, it is recommended to change parameters within main.py to maintain a single source of truth.
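As a rough picture of what "organized by node name keys" means, the sketch below writes the parameters of the example workflow as a nested Python dict and looks one value up. The node names "Add" and "Add_1" are assumptions based on the default naming rule (class name plus a counter), and get_param is a hypothetical helper; inspect the generated params.yaml for the actual contents.

```python
# Assumed shape of params.yaml for the example workflow, written as a dict.
# "Add" and "Add_1" are assumed node names; verify them with `zntrack list`.
params = {
    "Add": {"a": 1, "b": 2},
    "Add_1": {"a": 3, "b": 4},
}


def get_param(params: dict, node_name: str, key: str) -> int:
    """Look up one parameter of one node, failing loudly on unknown names."""
    try:
        return params[node_name][key]
    except KeyError as err:
        raise KeyError(f"no parameter {key!r} for node {node_name!r}") from err


print(get_param(params, "Add_1", "b"))
```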
To execute the workflow, use the dvc command-line tool:
dvc repro
Tip
Rather than calling project.build() and then running dvc repro, you can call project.repro() in place of project.build().
Groups
To organize the workflow, you can group Node instances. Groups are purely for organization and do not affect execution.
Note
Each Node is assigned a unique name. By default, this name consists of the class name followed by a counter. If a Node is part of a group, the group name is prefixed to its name.
You can list all Node names using the CLI command zntrack list.
If you want to set a custom name, pass the name argument when creating the Node instance:
add1 = Add(a=1, b=2, name="custom_name")
If a Node is in a group, the group name is also prefixed to the custom name. Custom names must be unique within their group. If a duplicate name is found, ZnTrack will raise an error.
import zntrack
from src import Add

project = zntrack.Project()

with project:
    add1 = Add(a=1, b=2)
    print(add1.name)  # Add

with project.group("grp"):
    add2 = Add(a=1, b=2)
    print(add2.name)  # grp_Add

with project.group("grp", "subgrp"):
    add3 = Add(a=3, b=4)
    print(add3.name)  # grp_subgrp_Add

project.build()
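The naming rule behind the names printed above can be sketched in plain Python. This is an illustration of the rule as described in this section, not ZnTrack's implementation: NameRegistry is a hypothetical class, and the "_1" suffix format for repeated default names is an assumption.

```python
from typing import Optional


class NameRegistry:
    """Sketch of the described naming rule; not ZnTrack's implementation."""

    def __init__(self) -> None:
        self._used: set[str] = set()

    def register(
        self,
        class_name: str,
        groups: tuple[str, ...] = (),
        name: Optional[str] = None,
    ) -> str:
        prefix = "_".join(groups)
        if name is not None:
            # Custom names get the group prefix and must be unique.
            full = "_".join(filter(None, (prefix, name)))
            if full in self._used:
                raise ValueError(f"duplicate node name: {full}")
        else:
            # Default names: class name, plus a counter once the name is taken.
            base = "_".join(filter(None, (prefix, class_name)))
            full, counter = base, 0
            while full in self._used:
                counter += 1
                full = f"{base}_{counter}"  # assumed counter format
        self._used.add(full)
        return full


registry = NameRegistry()
names = [
    registry.register("Add"),
    registry.register("Add", ("grp",)),
    registry.register("Add", ("grp", "subgrp")),
    registry.register("Add"),
]
print(names)
```

Registering the same custom name twice in one group raises ValueError in this sketch, mirroring the error ZnTrack is described as raising for duplicates.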
MLflow Integration
ZnTrack provides an integration between DVC and MLflow. If mlflow is installed, you can upload existing runs through the command-line interface. See the CLI help for how to configure the MLflow server, the selected Node instances, and the experiment ID.
zntrack mlflow-sync --help