Project

A workflow is defined within a Project.

To create a new ZnTrack project, initialize a new repository:

mkdir my_project
cd my_project
git init
dvc init

Tip

ZnTrack builds a DVC data pipeline for you. You don’t need to know DVC to use ZnTrack, but it is recommended to familiarize yourself with the basics. For more information, see the DVC documentation on data pipelines.

Note

This documentation assumes that you have a single workflow file, main.py, in the root of your project. Additionally, all Node definitions that do not originate from a package should be imported from src/__init__.py.

To ensure this structure, run:

touch main.py
mkdir src
touch src/__init__.py

We will now define a workflow that connects multiple Node instances. ZnTrack lets you connect Nodes directly through their attributes: in the example below, the results of add1 and add2 are passed as inputs to a Multiply Node. Treat the main.py file purely as a workflow configuration file; it should describe how Nodes are connected, not perform the computation itself. For a great explanation of this approach, refer to the Apache Airflow documentation.

Note

In addition to the predefined fields (e.g., a, b, and result), it is also possible to pass the full Node instance as an argument. For on-the-fly computations, you can define @property methods that are not stored in the Node state and pass them between Node instances. The @property decorator can also be used to define custom file readers.

import zntrack

from src import Add, Multiply

project = zntrack.Project()

with project:
    add1 = Add(a=1, b=2)
    add2 = Add(a=3, b=4)
    add3 = Multiply(a=add1.result, b=add2.result)

project.build()

Calling project.build() generates all necessary configuration files and prepares the project for execution.
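The note above mentions @property methods that are computed on the fly and not stored in the Node state. The following is a minimal sketch of that idea using a plain dataclass as a stand-in. AddSketch and doubled are hypothetical names, and the class deliberately does not inherit from zntrack.Node, so it only illustrates the distinction between stored fields and computed properties.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AddSketch:
    """Plain-Python stand-in for a Node with the fields a, b, and result."""

    a: float
    b: float
    result: Optional[float] = None

    def run(self) -> None:
        # Stored state: the value a real Node would keep as its output.
        self.result = self.a + self.b

    @property
    def doubled(self) -> float:
        # Computed on every access; it is not one of the dataclass fields.
        if self.result is None:
            raise RuntimeError("call run() before reading 'doubled'")
        return 2 * self.result


add = AddSketch(a=1, b=2)
add.run()

stored = [f.name for f in fields(add)]
print(stored)       # ['a', 'b', 'result']
print(add.doubled)  # 6
```

Because doubled is derived from result each time it is read, nothing extra has to be saved or kept in sync. The same pattern could wrap a custom file reader that loads data only when it is requested.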

To execute the workflow, use the dvc command-line tool:

dvc repro
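Connecting Nodes through their attributes implies a dependency graph, and a pipeline runner has to execute stages in an order that respects it. The sketch below shows that ordering for the example workflow using only the Python standard library. It is a conceptual illustration, not how ZnTrack or DVC are implemented, and the labels add1, add2, and multiply are hypothetical stage names.

```python
from graphlib import TopologicalSorter

# Each key is a stage; each value is the set of stages whose results it consumes.
# The labels are hypothetical names for the three Nodes in the workflow above.
dependencies = {
    "add1": set(),
    "add2": set(),
    "multiply": {"add1", "add2"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The two Add stages do not depend on each other, so they may appear in either order, but multiply always comes last because it needs both results.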

Tip

If you prefer not to run dvc repro separately, you can call project.repro() in place of project.build().

Groups

To organize the workflow, you can group Node instances. Groups are purely for organization and do not affect execution.

Note

Each Node is assigned a unique name. By default, this name consists of the class name followed by a counter. If a Node is part of a group, the group name is prefixed to its name.

You can list all Node names using the CLI command zntrack list. If you want to set a custom name, pass the name argument when creating the Node instance:

add1 = Add(a=1, b=2, name="custom_name")

If a Node is in a group, the group name is also prefixed to the custom name. Custom names must be unique within their group. If a duplicate name is found, ZnTrack will raise an error.

import zntrack

from src import Add

project = zntrack.Project()

with project:
    add1 = Add(a=1, b=2)
    print(add1.name)  # Add

with project.group("grp"):
    add2 = Add(a=1, b=2)
    print(add2.name)  # grp_Add

with project.group("grp", "subgrp"):
    add3 = Add(a=3, b=4)
    print(add3.name)  # grp_subgrp_Add

project.build()
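The naming rules described in this section can be summarized in a small, self-contained sketch. NameRegistry is a hypothetical helper written for illustration, not part of ZnTrack. Two details are assumptions: the counter is rendered as a numeric suffix such as _1, and a duplicate custom name raises ValueError. Use zntrack list to see the names ZnTrack actually assigns.

```python
from typing import Optional, Set, Tuple


class NameRegistry:
    """Illustrative only: mirrors the naming rules above, not ZnTrack's code."""

    def __init__(self) -> None:
        self._taken: Set[str] = set()

    def register(
        self,
        class_name: str,
        groups: Tuple[str, ...] = (),
        name: Optional[str] = None,
    ) -> str:
        prefix = "_".join(groups)
        base = name if name is not None else class_name
        full = f"{prefix}_{base}" if prefix else base

        if name is not None:
            # Custom names must be unique within their group.
            if full in self._taken:
                raise ValueError(f"duplicate node name: {full!r}")
        else:
            # Default names get a counter (suffix format is an assumption).
            counter, candidate = 0, full
            while candidate in self._taken:
                counter += 1
                candidate = f"{full}_{counter}"
            full = candidate

        self._taken.add(full)
        return full


registry = NameRegistry()
names = [
    registry.register("Add"),
    registry.register("Add", groups=("grp",)),
    registry.register("Add", groups=("grp", "subgrp")),
    registry.register("Add", groups=("grp",), name="custom_name"),
]
print(names)
# ['Add', 'grp_Add', 'grp_subgrp_Add', 'grp_custom_name']
```

Registering "custom_name" in the group "grp" a second time raises ValueError in this sketch, which corresponds to the error ZnTrack reports for duplicate custom names.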

MLflow Integration

ZnTrack provides an integration between DVC and MLflow. If the mlflow package is installed, you can upload existing runs through the command-line interface. See the CLI help for how to configure the MLflow server, the selected Node instances, and the experiment ID.

zntrack mlflow-sync --help