Yggdrasil is an in-house orchestration framework designed to automate well-defined workflows. It watches directories, CouchDB changes, etc., then calls realm modules (external or internal packages) to do the heavy lifting. Example realms today:
- tenx (internal) - 10x Genomics best practice analysis
- smartseq3 (internal) - Smart-seq3 best practice analysis
- dataflow-dmx (external) - [under development] Demultiplexing pipeline for Illumina / Aviti / ONT
- (more to come)
External realms self-register through the entry-point group ygg.handler.
- Installation
- Install External Realms
- Project Structure
- Usage
- Configuration
- Development Guidelines
- Contributing
- License
# Clone & create an isolated env
git clone https://github.com/NationalGenomicsInfrastructure/Yggdrasil.git
cd Yggdrasil
conda create -n ygg-dev python=3.11 pip
conda activate ygg-dev
# Editable install with dev extras (ruff, mypy, ...)
pip install -e .[dev]
# Run Yggdrasil
yggdrasil
# Or alternatively
python -m yggdrasil
- Runtime dependencies come from [project] dependencies in pyproject.toml.
- Dev tooling is pulled from [project.optional-dependencies] dev.
# 1. Clone & create an isolated env
git clone https://github.com/NationalGenomicsInfrastructure/Yggdrasil.git
cd Yggdrasil
conda create -n ygg python=3.11 pip
conda activate ygg
# 2. Install locked runtime stack
pip install -r requirements/lock.txt
# 3. Install Yggdrasil itself (no dev extras)
pip install -e .
requirements/lock.txt is generated from the dependency list with pip-compile --strip-extras.
# Clone next to Yggdrasil or organize in a `realms` dir (any folder works)
git clone https://github.com/NationalGenomicsInfrastructure/dmx.git
pip install -e ./dmx
Restart Yggdrasil so it re-scans entry-points. The startup log shows that the handler is active:
✓ registered external handler flowcell-dmx for FLOWCELL_READY
When a new event is detected, Yggdrasil schedules the appropriate handler as an async background task in its event loop.
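Scheduling a handler as a background task can be sketched with plain asyncio. This is an illustration under assumptions, not Yggdrasil's dispatcher: the `HANDLERS` mapping, `dispatch`, and the handler body are hypothetical, and only the event name FLOWCELL_READY appears in the text above.

```python
import asyncio


async def handle_flowcell_ready(payload: dict) -> str:
    """Stand-in handler; a real one would do the (possibly long) processing."""
    await asyncio.sleep(0)  # placeholder for real asynchronous work
    return f"processed {payload['id']}"


# Hypothetical registry: event type -> async handler.
HANDLERS = {"FLOWCELL_READY": handle_flowcell_ready}


async def dispatch(event_type: str, payload: dict, tasks: set) -> None:
    """Schedule the handler for `event_type` without waiting for it to finish."""
    handler = HANDLERS.get(event_type)
    if handler is None:
        return  # no handler registered for this event type
    task = asyncio.create_task(handler(payload))
    # Keep a strong reference so the task is not garbage-collected mid-run.
    tasks.add(task)
    task.add_done_callback(tasks.discard)


async def main() -> list:
    tasks: set = set()
    await dispatch("FLOWCELL_READY", {"id": "fc-001"}, tasks)
    await dispatch("UNKNOWN_EVENT", {"id": "x"}, tasks)  # silently ignored
    # Snapshot the set, since done-callbacks remove tasks as they complete.
    return await asyncio.gather(*list(tasks))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because `dispatch` returns as soon as the task is created, a watcher loop calling it stays free to pick up the next event while earlier handlers are still running.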
Brief overview of the main components and directories:
Yggdrasil/
├── lib/
│ ├── base/
│ ├── core_utils/
│ ├── couchdb/
│ ├── handlers/
│ ├── module_utils/
│ ├── realms/
│ │ ├── tenx/
│ │ └── smartseq3/
│ └── watchers/
├── tests/
├── .github/
│ └── workflows/
├── requirements/
├── yggdrasil.py
├── ygg_trunk.py (depr)
├── ygg-mule.py (depr)
├── pyproject.toml
├── LICENSE
└── README.md
- lib/: Core library containing base classes and utilities.
- base/: Abstract base classes and interfaces.
- core_utils/: Utility modules for Yggdrasil core functionalities.
- couchdb/: Classes specific for Yggdrasil-CouchDB interactions and document management.
- handlers/: Base classes and built-in event/data handlers for processing and workflow orchestration.
- module_utils/: Utility modules for various Yggdrasil module functionalities.
- realms/: Internal modules specific to different sequencing technologies (e.g. TenX, SmartSeq3, etc.)
- watchers/: File system and CouchDB watchers for monitoring and triggering events.
- tests/: Test cases for the application.
- .github/workflows/: GitHub Actions workflows for CI/CD.
- requirements/: Dependency lock files and requirements management for reproducible environments.
Yggdrasil has a single entry-point for both daemon operation (background watchers + handlers) and one-off project processing. After installing Yggdrasil in an environment, invoke it as follows:
yggdrasil [--dev] {daemon | run-doc} [OPTIONS]

| Global flag | Description |
|---|---|
| --dev | Turns on development mode: DEBUG-level logging and dev-mode configuration overrides (useful on a laptop) |
You can also run the CLI via python -m yggdrasil or python -m yggdrasil.cli if you prefer.
The daemon mode starts the long-running service:
- instantiates all configured watchers (file-system, CouchDB, ...);
- auto-registers built-in and external handlers;
- processes events until you stop it with Ctrl-C.
# production-style run
yggdrasil daemon
# verbose local run
yggdrasil --dev daemon
Logs are written to the directory set in yggdrasil_workspace/common/configurations/config.json → yggdrasil_log_dir.
The run-doc mode processes exactly one CouchDB project document and then exits. It is useful for manual re-processing or debugging.
yggdrasil run-doc DOC_ID [--manual-submit]

| Option | Meaning |
|---|---|
| --manual-submit | Force manual HPC submission for this invocation (handlers check a session flag instead of auto-calling sbatch). |
Objective: Rerun project N.Surname (CouchDB doc_id: a1b2c3d4e5f), but stop before Slurm submission because we need to manually edit the project's configurations.
# Initially run
yggdrasil run-doc a1b2c3d4e5f --manual-submit
After you run this, manually edit the project as needed and submit it to Slurm. Copy the Slurm job_id to the respective field in the project's CouchDB doc, and re-run the same command:
yggdrasil run-doc a1b2c3d4e5f --manual-submit
Yggdrasil will pick up the running Slurm job and wait for it to finish before continuing with post-processing.
| You want to… | Command |
|---|---|
| Run Yggdrasil as a background service | yggdrasil daemon |
| Same, but with dev logging & dev servers | yggdrasil --dev daemon |
| (re)Process one document | yggdrasil run-doc <DOC_ID> |
| (re)Process with manual Slurm submission | yggdrasil run-doc <DOC_ID> --manual-submit |
| When developing, use module form instead of console-script | python -m yggdrasil ... |
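The command layout summarized above maps naturally onto argparse sub-commands. The sketch below only mirrors the flags and modes documented here; it is not Yggdrasil's actual CLI code, and `build_parser` and the `mode` attribute are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: yggdrasil [--dev] {daemon | run-doc} [OPTIONS]."""
    parser = argparse.ArgumentParser(prog="yggdrasil")
    # Global flag: must appear before the sub-command.
    parser.add_argument("--dev", action="store_true", help="development mode")
    sub = parser.add_subparsers(dest="mode", required=True)

    sub.add_parser("daemon", help="start the long-running service")

    run_doc = sub.add_parser("run-doc", help="process one CouchDB document")
    run_doc.add_argument("doc_id", metavar="DOC_ID")
    run_doc.add_argument("--manual-submit", action="store_true",
                         help="force manual HPC submission")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--dev", "run-doc", "a1b2c3d4e5f", "--manual-submit"]
    )
    print(args.dev, args.mode, args.doc_id, args.manual_submit)
```

Declaring `--dev` on the top-level parser is what makes it a global flag: `yggdrasil --dev daemon` parses, whereas `yggdrasil daemon --dev` would be rejected by this sketch.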
Yggdrasil uses a configuration loader to manage settings. Configuration files should be placed in the yggdrasil_workspace/common/configurations directory. This directory path can be adjusted in the lib/core_utils/common.py script if needed.
config.json: This file contains global settings for Yggdrasil.
Fields:
- yggdrasil_log_dir: Directory where logs will be stored.
- couchdb_url: URL of the CouchDB server (host:port format).
- couchdb_database: Name of the CouchDB project database.
- couchdb_status_tracking: Name of the CouchDB yggdrasil database for project status tracking.
- couchdb_poll_interval: Interval (in seconds) for polling CouchDB for changes.
- job_monitor_poll_interval: Interval (in seconds) for polling the job monitor.
- activate_ngi_cmd: Command to activate NGI environment (can be "None" if not used).
- report_transfer: Settings for transferring reports (server, user, destination, ssh_key).
Example Configuration File (config.json)
{
"yggdrasil_log_dir": "yggdrasil_workspace/logs",
"couchdb":{
"url": "<host>:<port>"
},
"couchdb_database": "my_projects",
"couchdb_status_tracking": "my_yggdrasil_db",
"couchdb_poll_interval": 3,
"job_monitor_poll_interval": 5,
"activate_ngi_cmd": "None",
"report_transfer": {
"server": "<server>",
"user": "<username>",
"destination": "<destination_path>",
"ssh_key": "<ssh_key_path>"
}
}
module_registry.json: This file maps library construction methods to their respective internal processing modules. The modules specified here are dynamically loaded and executed based on the full library_prep_method name specified in the CouchDB document, or on a designated prefix of it.
Example:
{
"SmartSeq 3": {
"module": "lib.realms.smartseq3.smartseq3.SmartSeq3"
},
"10X": {
"module": "lib.realms.tenx.tenx_project.TenXProject",
"prefix": true
}
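}
- SmartSeq 3:
  - module: The path to the module handling SmartSeq 3 library data.
- 10X:
  - module: The path to the module handling 10X-prefixed library data.
  - prefix: When true, the key is matched as a prefix of library_prep_method rather than as the full name.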
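The lookup described above (full-name match, or prefix match for entries flagged with `prefix`) followed by a dynamic import can be sketched as follows. This is an assumption-laden illustration rather than Yggdrasil's loader: `resolve_module_path` and `load_class` are hypothetical helpers, the preference for exact matches over prefixes is a design guess, and the method name "10X Chromium" is a made-up input.

```python
import importlib
import json
from typing import Optional

# Same content as the module_registry.json example above.
REGISTRY_JSON = """
{
  "SmartSeq 3": {"module": "lib.realms.smartseq3.smartseq3.SmartSeq3"},
  "10X": {"module": "lib.realms.tenx.tenx_project.TenXProject", "prefix": true}
}
"""


def resolve_module_path(method: str, registry: dict) -> Optional[str]:
    """Return the dotted class path registered for `method`, or None."""
    # 1. An exact match on the full library_prep_method name wins.
    if method in registry:
        return registry[method]["module"]
    # 2. Otherwise try the entries flagged as prefixes.
    for key, entry in registry.items():
        if entry.get("prefix") and method.startswith(key):
            return entry["module"]
    return None


def load_class(dotted_path: str):
    """Import `package.module.ClassName` and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


if __name__ == "__main__":
    registry = json.loads(REGISTRY_JSON)
    for method in ("10X Chromium", "SmartSeq 3", "Unknown prep"):
        print(method, "->", resolve_module_path(method, registry))
```

`load_class` is kept separate so that the mapping logic can be exercised without the realm packages being importable.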
The following variables can also be set in config.json, but for security reasons you are encouraged to set them as environment variables instead:
- COUCH_USER: Your CouchDB username.
- COUCH_PASS: Your CouchDB password.
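One way to honor that preference in code is to read the environment first and fall back to the configuration only when a variable is unset. The sketch below is hypothetical: `get_couch_credentials` is not a Yggdrasil function, and it assumes that the config.json keys carry the same names as the environment variables.

```python
import os
from typing import Mapping, Tuple


def get_couch_credentials(
    config: Mapping[str, str], env: Mapping[str, str] = os.environ
) -> Tuple[str, str]:
    """Prefer COUCH_USER / COUCH_PASS from the environment, then the config."""
    # Assumption: config.json uses the same key names as the env variables.
    user = env.get("COUCH_USER") or config.get("COUCH_USER")
    password = env.get("COUCH_PASS") or config.get("COUCH_PASS")
    if not user or not password:
        raise RuntimeError(
            "CouchDB credentials missing: set COUCH_USER and COUCH_PASS"
        )
    return user, password


if __name__ == "__main__":
    # Explicit mappings are passed here so the demo does not depend on the shell.
    demo_env = {"COUCH_USER": "alice", "COUCH_PASS": "s3cret"}
    print(get_couch_credentials({}, env=demo_env)[0])
```

Failing loudly when either value is missing avoids a less obvious authentication error later, at connection time.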
Yggdrasil uses a custom logging utility to manage logs. Logs are stored in the directory specified by the yggdrasil_log_dir configuration.
Debug Logging: Running Yggdrasil with the --dev flag enables debug logging automatically.
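The relationship between the log directory and the --dev flag can be sketched with the standard logging module. This is not Yggdrasil's logging utility: `configure_logging`, the logger name, the file name yggdrasil.log, and INFO as the non-dev level are all assumptions made for illustration.

```python
import logging
import tempfile
from pathlib import Path


def configure_logging(log_dir: str, dev: bool = False) -> logging.Logger:
    """Log to <log_dir>/yggdrasil.log; DEBUG when `dev` is True, else INFO."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("yggdrasil-sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if dev else logging.INFO)
    # Remove and close old handlers so repeated calls do not duplicate output.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(Path(log_dir) / "yggdrasil.log")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # A temporary directory stands in for the configured yggdrasil_log_dir.
    log = configure_logging(tempfile.mkdtemp(), dev=True)
    log.debug("visible only in dev mode")
```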
Ensure you have activated the Conda environment and installed the runtime and dev tools. Both can be installed in one go with:
pip install -e .[dev]
.[dev] pulls:
- ruff (lint) · black (format) · mypy (type-check)
- pip-tools (pip-compile)
- pre-commit itself, so no separate pip install is needed.
Use pre-commit to automate code formatting and linting on each commit.
# Install Git hooks (runs ruff / black / mypy automatically)
pre-commit install

| Task | Command |
|---|---|
| Format everything | black . |
| Lint | ruff check . |
| Static types | mypy . |
| Run all hooks | pre-commit run --all-files |
(Hooks fire automatically on git commit; run manually only if you want a
full pass before staging.)
Install extensions:
- Python (Microsoft)
- Ruff (Astral Software)
- Black Formatter (Microsoft)
- Mypy Type Checker (Microsoft)
VSCode Settings
Add to settings.json (user or workspace):
{
"editor.defaultFormatter": "ms-python.black-formatter",
"editor.formatOnSave": true,
"ruff.configuration": "pyproject.toml",
"mypy-type-checker.args": [ "--config-file=pyproject.toml" ]
}
Ignore bulk-format commits so git blame stays useful:
git config blame.ignoreRevsFile .git-blame-ignore-revs
Append the full commit hashes of large "black-only" or "ruff-fix" commits to the .git-blame-ignore-revs file (one hash per line), e.g.:
a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0
b1c2d3e4f5g6h7i8j9k0l1m2n3o4p5q6r7s8t9u0
GitHub Actions are set up to automatically run ruff, black, and mypy on pushes and pull requests.
- Workflow File: .github/workflows/lint.yml
- Jobs: ruff-check, black-check, mypy-check
- Each job installs exact runtime versions from requirements/lock.txt, then the tool it needs.
Contributions are very welcome! To make the experience as smooth as possible, please follow these guidelines:
- Forking: Fork the main repository to your personal GitHub account.
- Git workflow: Open pull requests against the dev branch.
- Code Style: Format with black and lint with ruff.
- Type Annotations: If you use type annotations, make sure mypy checks are set up and pass.
- Pre-commit: black, ruff, and mypy run automatically. Make sure pre-commit install has been run and that hooks pass before pushing.
- Documentation: Documented contributions are easier to understand and review.
Suggested contributions: Tests, Bug Fixes, Code Optimization, New Modules (reach out to Anastasios if you don't know where to start with developing a new module).
Yggdrasil is licensed under the MIT License - see the LICENSE file for details.