Node
A Node is the core component of ZnTrack, defining a unit of computation used in a workflow. It encapsulates a self-contained piece of logic that can be executed independently or as part of a larger pipeline.
Note
The Node is built on top of Python’s dataclasses, leveraging their simplicity and power to define structured, reusable components.
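Because a Node builds on dataclasses, the idea of tagging fields as inputs or outputs can be illustrated with the standard library alone. The sketch below is a conceptual stand-in, not ZnTrack's implementation: the helpers param() and output(), the "role" metadata key and the class AddSketch are hypothetical names chosen for illustration.

```python
import dataclasses
from typing import Any, List, Optional


# Hypothetical helpers: tag dataclass fields with a "role" in their metadata.
# This only mimics the concept; ZnTrack's own field functions work differently.
def param() -> Any:
    # An input supplied when the object is constructed (no default value).
    return dataclasses.field(metadata={"role": "input"})


def output() -> Any:
    # An output is produced by run(), so it is excluded from __init__.
    return dataclasses.field(default=None, init=False, metadata={"role": "output"})


@dataclasses.dataclass
class AddSketch:
    a: int = param()
    b: int = param()
    result: Optional[int] = output()

    def run(self) -> None:
        self.result = self.a + self.b


def fields_by_role(obj: Any, role: str) -> List[str]:
    """Return the names of all fields whose metadata carries the given role."""
    return [f.name for f in dataclasses.fields(obj) if f.metadata.get("role") == role]


node = AddSketch(a=1, b=2)
node.run()
```

Here fields_by_role(node, "input") gives ["a", "b"], and node.result is 3 after run(). The dataclass machinery supplies __init__ and __repr__; the metadata is what lets a framework tell inputs and outputs apart.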
A Node consists of three key parts:
Inputs
Every parameter or dependency required to run the Node. Inputs define the data or configuration that the Node needs to perform its computation. Possible inputs include:
- zntrack.params() for JSON-serializable data, e.g., {"loss": "mse", "epochs": 10}.
- zntrack.params_path() for parameter files. See parameter dependencies for more information.
- zntrack.deps() for dependencies from another Node. More details are provided in the Project section.
- zntrack.deps_path() for file dependencies. See simple dependencies for more information.
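Since zntrack.params() expects JSON-serializable data, it can help to check candidate values before assigning them. The helpers below are a hypothetical sketch built only on the standard json module; they are not part of ZnTrack.

```python
import json
from typing import Any, Dict, List


def is_json_serializable(value: Any) -> bool:
    """Return True if the standard json module can encode value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. a set); ValueError: circular reference.
        return False
    return True


def non_serializable_keys(params: Dict[str, Any]) -> List[str]:
    """List the keys whose values could not be stored as JSON parameters."""
    return [key for key, value in params.items() if not is_json_serializable(value)]


good = {"loss": "mse", "epochs": 10}
bad = {"loss": "mse", "layers": {64, 32}}  # a set is not JSON-serializable
```

non_serializable_keys(bad) returns ["layers"]. Note that a JSON round trip is not always exact even when encoding succeeds: a tuple such as (64, 32) is written as a list and reads back as [64, 32].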
Outputs
Every result produced by the Node. Outputs are the data or artifacts generated after the Node has executed its logic. Possible outputs include:
- zntrack.outs() for any output data. This uses JSON and pickle to serialize data.
- zntrack.outs_path() to define an output file path.
- zntrack.metrics() for metrics stored as dict[str, int | float].
- zntrack.metrics_path() for file paths to store metrics.
- zntrack.plots() for plots as pandas DataFrames.
- zntrack.plots_path() for file paths to store plots.
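zntrack.outs() relies on JSON and pickle, and metrics take the shape dict[str, int | float]. The sketch below illustrates one plausible way to combine the two formats (try JSON first, fall back to pickle) and to check the metrics shape. The function names and the .json/.pkl file naming are assumptions for illustration; ZnTrack's actual serialization logic may differ.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_output(value: Any, directory: Path, name: str) -> Path:
    """Write value as JSON if possible, otherwise fall back to pickle."""
    directory.mkdir(parents=True, exist_ok=True)
    try:
        text = json.dumps(value)
    except (TypeError, ValueError):
        path = directory / f"{name}.pkl"
        path.write_bytes(pickle.dumps(value))
    else:
        path = directory / f"{name}.json"
        path.write_text(text)
    return path


def load_output(path: Path) -> Any:
    """Read a value written by save_output, choosing the format by suffix."""
    if path.suffix == ".json":
        return json.loads(path.read_text())
    return pickle.loads(path.read_bytes())


def is_valid_metrics(metrics: Any) -> bool:
    """Check the dict[str, int | float] shape expected for metrics."""
    return isinstance(metrics, dict) and all(
        isinstance(key, str)
        and isinstance(value, (int, float))
        and not isinstance(value, bool)  # bool is a subclass of int
        for key, value in metrics.items()
    )


with tempfile.TemporaryDirectory() as tmp:
    json_path = save_output({"result": 3}, Path(tmp), "result")
    pickle_path = save_output({1, 2, 3}, Path(tmp), "unique")  # sets need pickle
    loaded_json = load_output(json_path)
    loaded_pickle = load_output(pickle_path)
```

JSON keeps simple outputs human-readable, while pickle covers arbitrary Python objects. Only unpickle files from trusted sources.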
Run
The function executed when the Node is run. This is where the core computation or logic of the Node is defined.
It is also possible to define multiple run methods for a single Node, enabling flexible execution strategies depending on the context. For more details, see Custom Run Methods.
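The idea of several run methods on one Node can be pictured in plain Python: the class carries alternative methods and the caller picks one by name. This is a hypothetical sketch of the concept only; SumSketch and execute() are illustrative names, and the way ZnTrack itself selects a custom run method is covered in Custom Run Methods.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SumSketch:
    """Hypothetical node-like class with two interchangeable run methods."""

    values: List[int]
    result: Optional[int] = field(default=None, init=False)

    def run(self) -> None:
        # Default strategy: an explicit loop.
        total = 0
        for value in self.values:
            total += value
        self.result = total

    def run_builtin(self) -> None:
        # Alternative strategy: delegate to the built-in sum().
        self.result = sum(self.values)


def execute(node: object, method: str = "run") -> None:
    """Call the requested run method; refuse names that are not run methods."""
    if not method.startswith("run"):
        raise ValueError(f"{method!r} is not a run method")
    getattr(node, method)()


default_node = SumSketch([1, 2, 3])
execute(default_node)
builtin_node = SumSketch([1, 2, 3])
execute(builtin_node, "run_builtin")
```

Both strategies fill the same result field, so downstream code does not need to know which method produced it.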
Example
ZnTrack integrates features that simplify file writing and reading. The file paths for fields defined without the _path suffix (e.g., zntrack.params() or zntrack.outs()) are handled automatically by ZnTrack.
The following example demonstrates how to define a simple Node that adds two numbers.
```python
import zntrack


class Add(zntrack.Node):  # Inherit from zntrack.Node
    # Define parameters, similar to dataclasses.field
    a: int = zntrack.params()
    b: int = zntrack.params()

    # Define an output
    result: int = zntrack.outs()

    def run(self) -> None:
        # Core computation of the Node
        self.result = self.a + self.b
```
The Node above can also be written in a more explicit manner, manually saving and loading inputs and outputs.
Tip
ZnTrack provides an nwd (node working directory) path specific to each Node in the workflow. It is highly recommended to store all data generated by a Node under this path to avoid file name conflicts between Nodes.
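```python
import json
from pathlib import Path

import zntrack


class AddViaFile(zntrack.Node):
    # Parameter file, expected to map each node name to {"a": ..., "b": ...}
    params_file: str = zntrack.params_path()
    # Output file placed inside this Node's working directory
    results_file: Path = zntrack.outs_path(zntrack.nwd / "results.json")

    def run(self) -> None:
        # Load the parameters stored under this Node's name
        with open(self.params_file, "r") as f:
            params = json.load(f)
        result = params[self.name]["a"] + params[self.name]["b"]
        # Create the output directory before writing the result
        self.results_file.parent.mkdir(parents=True, exist_ok=True)
        with open(self.results_file, "w") as f:
            json.dump({"result": result}, f)
```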
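To see what AddViaFile expects on disk, its run logic can be exercised without ZnTrack. The sketch below reproduces the same computation in plain Python, assuming a parameter file keyed by node name (as params[self.name] implies) and a hypothetical nodes/<name>/ layout standing in for the nwd. The file name params.json, the helper names and the node names are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def node_working_directory(root: Path, node_name: str) -> Path:
    # Assumption: one subdirectory per node under "nodes/", mimicking the nwd idea.
    return root / "nodes" / node_name


def add_via_file(params_file: Path, root: Path, name: str) -> Path:
    """Same logic as AddViaFile.run, with the node name passed explicitly."""
    params = json.loads(params_file.read_text())
    result = params[name]["a"] + params[name]["b"]
    results_file = node_working_directory(root, name) / "results.json"
    results_file.parent.mkdir(parents=True, exist_ok=True)
    results_file.write_text(json.dumps({"result": result}))
    return results_file


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    params_file = root / "params.json"
    # Parameters are keyed by node name.
    params_file.write_text(json.dumps({
        "AddViaFile": {"a": 1, "b": 2},
        "AddViaFile_1": {"a": 10, "b": 20},
    }))
    outputs = {}
    for name in ("AddViaFile", "AddViaFile_1"):
        results_file = add_via_file(params_file, root, name)
        # Both nodes write "results.json", but each into its own directory.
        outputs[name] = json.loads(results_file.read_text())
    result_files = sorted(p.relative_to(root).as_posix() for p in root.rglob("results.json"))
```

Because each node writes into its own directory, two nodes can use the same output file name without overwriting each other, which is the conflict the Tip above warns about.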
Design Patterns
A Node should encapsulate a single, well-defined piece of logic to improve readability and maintainability. However, since communication between Node instances occurs through files, excessive splitting can slow down the workflow due to file I/O overhead. To optimize performance, related tasks that always run together should be grouped within a single Node. For example, if a task can be efficiently parallelized—such as preprocessing data in batches—it is better to handle the parallelization within a single Node rather than splitting it into multiple Node instances.
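As an illustration of keeping parallel work inside one Node, the sketch below splits data into batches and processes them concurrently within a single function that could serve as the body of a run method. It uses the standard library's concurrent.futures; preprocess_batch (which doubles each value) is a hypothetical placeholder for real preprocessing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def preprocess_batch(batch: Sequence[float]) -> List[float]:
    # Hypothetical preprocessing step: scale every value by 2.
    return [2 * x for x in batch]


def split_into_batches(data: Sequence[float], batch_size: int) -> List[Sequence[float]]:
    """Cut data into consecutive slices of at most batch_size items."""
    return [data[i : i + batch_size] for i in range(0, len(data), batch_size)]


def run_preprocessing(
    data: Sequence[float], batch_size: int = 3, max_workers: int = 4
) -> List[float]:
    """Process all batches concurrently and return one flat, ordered result."""
    batches = split_into_batches(data, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so no re-sorting is needed.
        processed = list(pool.map(preprocess_batch, batches))
    return [x for batch in processed for x in batch]


data = [1, 2, 3, 4, 5, 6, 7]
output = run_preprocessing(data)
```

Threads suit I/O-bound steps; for CPU-bound preprocessing, ProcessPoolExecutor offers the same map interface but requires picklable, module-level functions. Either way, the intermediate batches stay in memory instead of being written to files between separate Nodes.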