| layout | title | nav_order |
|---|---|---|
| default | Experiment tracking | 10 |
The tracking module handles parsing training logs to extract Marian training metrics in real time.
The parser supports different sources:
- Online publication from Taskcluster training or evaluation tasks.
- Deferred publication from a Taskcluster task or group of tasks.
- Deferred publication from a local directory containing archived training data.
The parser supports writing metrics to Weight & Biases external storage (see the section above), or producing local artifacts (CSV files).
It currently supports logs from Marian 1.10 and Marian 1.12. Newer versions (even minor ones) will raise a warning and may result in missing data.
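The version check described above can be sketched as follows. This is only an illustration, not the parser's actual code: the shape of the version banner (`Marian v1.12.0 …`), the `SUPPORTED_VERSIONS` set, and the function name `check_marian_version` are assumptions, and the sketch warns on any unlisted major.minor pair rather than only on newer ones.

```python
import re
import warnings

# Versions named in the text above as supported (major, minor).
SUPPORTED_VERSIONS = {(1, 10), (1, 12)}

# Assumed shape of the version banner in a Marian log; adjust to the real format.
VERSION_RE = re.compile(r"Marian v(\d+)\.(\d+)\.(\d+)")


def check_marian_version(line: str):
    """Return (major, minor) if the line carries a version banner, else None.

    Emits a warning when the version is not in SUPPORTED_VERSIONS.
    """
    match = VERSION_RE.search(line)
    if match is None:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) not in SUPPORTED_VERSIONS:
        warnings.warn(
            f"Unsupported Marian version {major}.{minor}: some metrics may be missing"
        )
    return major, minor
```

Warning instead of failing matches the behaviour described above: parsing continues, but some data may be missing.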
Publication is implemented within the training (pipeline.train.train.get_log_parser_command) and evaluation (pipeline.eval.eval.main) tasks. This is the preferred way to track metrics, as machine resource usage will also be published to Weight & Biases.
Any new experiment will automatically be published to the public Weight & Biases dashboard.
Any new pull request will trigger publication to the ci project in Weight & Biases. Because of the Taskcluster cache, you may want to edit a value in taskcluster/configs/config.ci.yml (e.g. the first disp-freq entry) to force a new publication.
It is possible to use the parser on Taskcluster's tasks that have finished. The parser supports reading training tasks directly from the Taskcluster API (no authentication).
This method is useful to re-upload data from past training and evaluation tasks.
You can run the parser on a Taskcluster group by running:
$ parse_tc_group <task_group_id>
By default, this command will fetch other traversal tasks (related experiments). You can avoid this behavior by using the --no-recursive-lookup option.
You can also run the parser based on the logs of a single task:
$ parse_tc_logs --input-file=live_backing.log
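To obtain a log file such as live_backing.log for a finished task, one option is to download it from the Taskcluster Queue API, which serves public artifacts without authentication. The sketch below is hedged: the root URL of the deployment, the artifact name `public/logs/live_backing.log`, and the helper names `log_url` and `download_log` are assumptions, not part of the tracking package.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: root URL of the Taskcluster deployment that ran the task.
ROOT_URL = "https://firefox-ci-tc.services.mozilla.com"
# Assumption: artifact name under which the worker stores the task log.
LOG_ARTIFACT = "public/logs/live_backing.log"


def log_url(task_id: str, root_url: str = ROOT_URL) -> str:
    """Build the Queue API URL of the log artifact for the latest run of a task."""
    task = quote(task_id, safe="")
    return f"{root_url}/api/queue/v1/task/{task}/artifacts/{LOG_ARTIFACT}"


def download_log(task_id: str, destination: str = "live_backing.log") -> None:
    """Download the task log to a local file (performs a network request)."""
    with urlopen(log_url(task_id)) as response, open(destination, "wb") as output:
        output.write(response.read())
```

The downloaded file can then be passed to parse_tc_logs through its --input-file option.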
The parser supports browsing a folder structure from a GCP archive of multiple training runs. This method is useful to re-upload data from past training and evaluation tasks that are no longer available from Taskcluster (expired), or when handling a large amount of data.
The structure from experiments that ran on Taskcluster should look like this:
.
├── logs
│   └── en-hu
│       └── baseline_enhu_aY25-4fXTcuJNuMcWXUYtQ
│           └── student
│               ├── train.log
│               └── …
└── models
    └── en-hu
        └── baseline_enhu_aY25-4fXTcuJNuMcWXUYtQ
            └── evaluation
                ├── speed
                │   ├── sacrebleu_wmt09.metrics
                │   └── …
                └── student
                    ├── flores_devtest.metrics
                    └── …
The structure from older experiments that ran with Snakemake should look like this:
.
├── logs
│   └── …
└── models
    └── en-sv
        └── opusmt-multimodel-test
            ├── evaluation
            │   └── speed
            │       ├── tc_Tatoeba-Challenge-v2021-08-07.metrics
            │       └── …
            ├── student-finetuned
            │   ├── train.log
            │   └── valid.log
            └── …
You can run the parser from a local GCP archive folder by running:
$ parse_experiment_dir --directory gcp_archive --mode taskcluster
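To illustrate how such an archive can be browsed, here is a minimal sketch that walks a directory and lists training logs and evaluation metrics files. It is not the parser's implementation: the function name `find_experiment_files` is hypothetical, and it only assumes the file names visible in the trees above (train.log and files ending in .metrics).

```python
from pathlib import Path


def find_experiment_files(archive: Path):
    """Yield (kind, path) for training logs and evaluation metrics files.

    kind is "training" for train.log files and "evaluation" for *.metrics files.
    """
    for path in sorted(archive.rglob("*")):
        if not path.is_file():
            continue
        if path.name == "train.log":
            yield "training", path
        elif path.suffix == ".metrics":
            yield "evaluation", path


if __name__ == "__main__":
    # Example: list what would be picked up from a local archive folder.
    for kind, path in find_experiment_files(Path("gcp_archive")):
        print(kind, path)
```

Matching on file names rather than on fixed depths keeps the walk usable for both the Taskcluster and the Snakemake layouts, at the cost of not checking that a file sits in the expected folder.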
Publication is handled by the extensible module translations_parser.publishers.
Runs on Weight & Biases are grouped by experiment. The group is suffixed by the complete Taskcluster group ID, and each of its runs is suffixed by the first 5 characters of that ID. This is required to compare runs with similar names across different groups.
Examples of run naming for Taskcluster group dzijiL-PQ4ScKBB3oIjGQg:
- Training task: teacher-1_dziji
- Evaluation task: teacher-ensemble_dziji
- Experiment summary: group_logs_dziji (see #group-logs)
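The run naming rule can be expressed in a few lines. This is a sketch of the convention as described here, with a hypothetical helper name `run_name`; the underscore separator is taken from the examples above.

```python
def run_name(task_label: str, group_id: str) -> str:
    """Suffix a run label with the first 5 characters of the Taskcluster group ID."""
    return f"{task_label}_{group_id[:5]}"


if __name__ == "__main__":
    group_id = "dzijiL-PQ4ScKBB3oIjGQg"
    for label in ("teacher-1", "teacher-ensemble", "group_logs"):
        print(run_name(label, group_id))
```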
Metrics parsed in real time during the training are published in the Charts section of Weight & Biases.
Training runs have their Marian and OpusTrainer configuration published to the Overview section in Weight & Biases:
- arguments: Full list of arguments used to run the marian command.
- marian: Marian runtime configuration read from logs.
- model: YAML configuration file passed to Marian as configs/model/${model_type}.yml.
- opustrainer: OpusTrainer YAML configuration read from fixed path config.opustrainer.yml.
- training: YAML configuration file passed to Marian as configs/model/${model_type}.yml.
The list above covers the categories we came up with (model, arguments, marian, opustrainer, training), what they mean, and where those configs come from.
Metrics from evaluation tasks are published as table artifacts on Weight & Biases, with a custom chart for better comparison among runs.
On every group, a last run named group_logs
is also published. This group does not represent a training nor evaluation task, but contains the overall experiment configuration in the Overview link in the left menu.
This run also contain a table published as artifact, with a summary of all evaluation metrics which is visible in the Tables section.
When running online from Taskcluster, the resources used by the machine will be published in a System section of Weight & Biases.
The parser can be built as a distinct package using pip, which makes development easier.
In a virtual environment, you can install the package in editable mode (i.e. from the local folder):
$ pip install -e ./tracking
The parser does not support arbitrary metrics (i.e. via the --valid-metrics argument).
In order to support new values, you may want to update the regular expression matching Marian output lines in tracking.translations_parser.parser and the dataclass in tracking.translations_parser.data.
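To show how a regular expression and a dataclass fit together, here is a minimal, self-contained sketch. It does not reproduce the code in tracking.translations_parser: the `TrainingStep` dataclass, the `TRAINING_RE` pattern, and the assumed shape of a Marian training line (`Ep. <n> : Up. <n> : … : Cost <x>`) are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingStep:
    """Hypothetical container for the values extracted from one log line."""

    epoch: int
    up: int
    cost: float


# Assumed shape of a Marian training line; each named group maps to one field.
TRAINING_RE = re.compile(
    r"Ep\.\s*(?P<epoch>\d+)\s*:\s*"
    r"Up\.\s*(?P<up>\d+)\s*:"
    r".*?Cost\s+(?P<cost>\d+(?:\.\d+)?)"
)


def parse_training_line(line: str) -> Optional[TrainingStep]:
    """Return a TrainingStep if the line matches the assumed format, else None."""
    match = TRAINING_RE.search(line)
    if match is None:
        return None
    return TrainingStep(
        epoch=int(match["epoch"]),
        up=int(match["up"]),
        cost=float(match["cost"]),
    )
```

In a design like this, supporting a new value means adding one named group to the pattern and one matching field to the dataclass, which is why both files are mentioned above.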