Releases: instructlab/sdg
v0.6.0
SDG v0.6.0
What's Changed
- fix: formatting error by @RobotSail in #378
- Prefer tesserocr over easyocr, if available by @bbrowning in #369
- ci: add large-size E2E CI job by @nathan-weinberg in #380
- Add Release Strategy Document by @khaledsulayman in #381
- Docling models path by @aakankshaduggal in #362
- Check for tokenizer in downloaded models directory by @khaledsulayman in #364
- fix: upsample the phase10 knowledge dataset by @RobotSail in #377
- build(deps): bump DavidAnson/markdownlint-cli2-action from 17.0.0 to 18.0.0 by @dependabot in #386
- Delete .gitattributes by @khaledsulayman in #393
New Contributors
- @RobotSail made their first contribution in #378
Full Changelog: v0.5.0...v0.6.0
v0.3.3
What's Changed
- Prepare release-v0.3 branch for backports by @bbrowning in #371
- Run the simple pipeline on small runners by @bbrowning in #372
- Data mix fix (backport #366) by @mergify in #368
Full Changelog: v0.3.2...v0.3.3
v0.5.0
What's Changed
- build(deps): bump actions/cache from 4.1.0 to 4.1.1 by @dependabot in #300
- build(deps): bump rojopolis/spellcheck-github-actions from 0.42.0 to 0.43.0 by @dependabot in #299
- build(deps): bump actions/checkout from 4.2.0 to 4.2.1 by @dependabot in #298
- chore: rename 'basic-workflow-tests' to 'e2e-custom' by @nathan-weinberg in #306
- fix: change "group" to "tag" for mmlu_branch task config by @alimaredia in #305
- fix: remove stop token from mixtral by @cdoern in #310
- ci: update small E2E job to align with CLI and Training by @nathan-weinberg in #317
- ci: update medium job to run as PR check by @nathan-weinberg in #318
- build(deps): bump rojopolis/spellcheck-github-actions from 0.43.0 to 0.43.1 by @dependabot in #314
- fix: medium E2E CI job was missing HF_TOKEN by @nathan-weinberg in #319
- build(deps): bump actions/cache from 4.1.1 to 4.1.2 by @dependabot in #320
- ci: use org variable for AWS EC2 AMI in E2E CI jobs by @nathan-weinberg in #322
- ci: convert med E2E CI job to L4 GPU by @nathan-weinberg in #325
- build(deps): bump rojopolis/spellcheck-github-actions from 0.43.1 to 0.44.0 by @dependabot in #326
- build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in #323
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0 by @dependabot in #327
- build(deps): bump actions/checkout from 4.2.1 to 4.2.2 by @dependabot in #321
- build(deps): bump machulav/ec2-github-runner from 2.3.6 to 2.3.7 by @dependabot in #328
- build(deps): bump hynek/build-and-inspect-python-package from 2.9.0 to 2.10.0 by @dependabot in #329
- build(deps): bump rhysd/actionlint from 1.7.3 to 1.7.4 in /.github/workflows by @dependabot in #332
- build(deps): bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.0 by @dependabot in #337
- build(deps): bump rojopolis/spellcheck-github-actions from 0.44.0 to 0.45.0 by @dependabot in #338
- build(deps): bump pypa/gh-action-pypi-publish from 1.12.0 to 1.12.2 by @dependabot in #342
- Integrate Context-Aware Chunking and PDF Support by @khaledsulayman in #284
- feat: parametrize system prompt by @jaideepr97 in #339
- feat: support converting messages datasets into multiple pre-training formats by @jaideepr97 in #341
- Move to Docling v2 APIs by @bbrowning in #347
- feat: expose max_num_tokens as configurable by @cdoern in #340
- Remove unnecessary requirement for qna.yaml in ContextAwareChunker by @khaledsulayman in #351
- Upgrade docling, expand chunking testing by @bbrowning in #349
- Don't attempt batching with InstructLab's llama-cpp-python by @bbrowning in #358
- Consolidate test sample documents into one subdir by @bbrowning in #356
- Move a spurious print to a debug log message by @bbrowning in #359
- Only use CPU for the docling OCR models by @bbrowning in #361
- Data mix fix by @aakankshaduggal in #366
New Contributors
- @alimaredia made their first contribution in #305
Full Changelog: v0.4.2...v0.5.0
v0.5.0a2
What's Changed
- build(deps): bump actions/checkout from 4.2.1 to 4.2.2 by @dependabot in #321
- build(deps): bump machulav/ec2-github-runner from 2.3.6 to 2.3.7 by @dependabot in #328
- build(deps): bump hynek/build-and-inspect-python-package from 2.9.0 to 2.10.0 by @dependabot in #329
- build(deps): bump rhysd/actionlint from 1.7.3 to 1.7.4 in /.github/workflows by @dependabot in #332
- build(deps): bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.0 by @dependabot in #337
- build(deps): bump rojopolis/spellcheck-github-actions from 0.44.0 to 0.45.0 by @dependabot in #338
- build(deps): bump pypa/gh-action-pypi-publish from 1.12.0 to 1.12.2 by @dependabot in #342
- Integrate Context-Aware Chunking and PDF Support by @khaledsulayman in #284
- feat: parametrize system prompt by @jaideepr97 in #339
- feat: support converting messages datasets into multiple pre-training formats by @jaideepr97 in #341
- Move to Docling v2 APIs by @bbrowning in #347
- feat: expose max_num_tokens as configurable by @cdoern in #340
- Remove unnecessary requirement for qna.yaml in ContextAwareChunker by @khaledsulayman in #351
- Upgrade docling, expand chunking testing by @bbrowning in #349
Full Changelog: v0.5.0a1...v0.5.0a2
v0.5.0a1
What's Changed
- build(deps): bump actions/cache from 4.1.0 to 4.1.1 by @dependabot in #300
- build(deps): bump rojopolis/spellcheck-github-actions from 0.42.0 to 0.43.0 by @dependabot in #299
- build(deps): bump actions/checkout from 4.2.0 to 4.2.1 by @dependabot in #298
- chore: rename 'basic-workflow-tests' to 'e2e-custom' by @nathan-weinberg in #306
- fix: change "group" to "tag" for mmlu_branch task config by @alimaredia in #305
- fix: remove stop token from mixtral by @cdoern in #310
- ci: update small E2E job to align with CLI and Training by @nathan-weinberg in #317
- ci: update medium job to run as PR check by @nathan-weinberg in #318
- build(deps): bump rojopolis/spellcheck-github-actions from 0.43.0 to 0.43.1 by @dependabot in #314
- fix: medium E2E CI job was missing HF_TOKEN by @nathan-weinberg in #319
- build(deps): bump actions/cache from 4.1.1 to 4.1.2 by @dependabot in #320
- ci: use org variable for AWS EC2 AMI in E2E CI jobs by @nathan-weinberg in #322
- ci: convert med E2E CI job to L4 GPU by @nathan-weinberg in #325
- build(deps): bump rojopolis/spellcheck-github-actions from 0.43.1 to 0.44.0 by @dependabot in #326
- build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in #323
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0 by @dependabot in #327
New Contributors
- @alimaredia made their first contribution in #305
Full Changelog: v0.4.2...v0.5.0a1
v0.3.2
What's Changed
- map mistral model name to mixtral by @cdoern in #315
- Without this change, Mistral models use the merlinite templates, which results in unusable output.
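To illustrate the kind of mapping this fix describes, here is a minimal sketch, not the actual instructlab/sdg code. The function name `resolve_model_family`, the alias table, and the substring matching are all hypothetical; the only details taken from the note above are that "mistral" names should resolve to the mixtral family and that unmatched names would otherwise end up with merlinite templates.

```python
# Hypothetical sketch: these names are illustrative, not the instructlab/sdg API.
DEFAULT_FAMILY = "merlinite"              # assumed fallback, per the note above
KNOWN_FAMILIES = ("merlinite", "mixtral")
FAMILY_ALIASES = {"mistral": "mixtral"}   # the mapping this release describes


def resolve_model_family(model_name: str) -> str:
    """Pick a prompt-template family from a model name (case-insensitive)."""
    name = model_name.lower()
    # Check aliases first so "mistral-*" names do not fall through to the default.
    for alias, family in FAMILY_ALIASES.items():
        if alias in name:
            return family
    for family in KNOWN_FAMILIES:
        if family in name:
            return family
    return DEFAULT_FAMILY
```

With the alias entry removed, a name such as `mistral-7b-instruct` would match no known family and fall back to the default, which is the failure mode the note describes.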
Full Changelog: v0.3.1...v0.3.2
v0.4.2
What's Changed
Full Changelog: v0.4.1...v0.4.2
v0.4.1
v0.4.0
What's Changed
- build(deps): bump actions/cache from 4.0.2 to 4.1.0 by @dependabot in #297
- build(deps): bump rhysd/actionlint from 1.7.2 to 1.7.3 in /.github/workflows by @dependabot in #291
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.2 to 1.10.3 by @dependabot in #295
- fix: e2e job using wrong flags by @nathan-weinberg in #303
- Mistral Family support and Logging by @cdoern in #302
Full Changelog: v0.3.1...v0.4.0
v0.3.1
What's Changed
- Add more tests for golden/distractor context picking by @bbrowning in #256
- Document dataset formats by @markmc in #236
- ci: move E2E runner from github to AWS by @nathan-weinberg in #260
- ci: add AWS tag to show github PR number for all jobs by @nathan-weinberg in #264
- build(deps): bump pypa/gh-action-pypi-publish from 1.9.0 to 1.10.0 by @dependabot in #263
- ci: add GitHubRef to AWS labels as well by @nathan-weinberg in #265
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.0 to 1.10.1 by @dependabot in #266
- chore: replace platformdirs with xdg-base-dirs by @jaideepr97 in #269
- chore: add auto-merging policy for SDG by @khaledsulayman in #262
- ci: update lint workflow by @nathan-weinberg in #278
- build(deps): bump step-security/harden-runner from 2.9.1 to 2.10.1 by @dependabot in #274
- build(deps): bump hynek/build-and-inspect-python-package from 2.8.0 to 2.9.0 by @dependabot in #268
- build(deps): bump actions/checkout from 4.1.6 to 4.1.7 by @dependabot in #280
- build(deps): bump actions/setup-python from 5.1.0 to 5.2.0 by @dependabot in #279
- build(deps): bump rhysd/actionlint from 1.7.1 to 1.7.2 in /.github/workflows by @dependabot in #285
- build(deps): bump rojopolis/spellcheck-github-actions from 0.41.0 to 0.42.0 by @dependabot in #283
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.1 to 1.10.2 by @dependabot in #282
- build(deps): bump DavidAnson/markdownlint-cli2-action from 16.0.0 to 17.0.0 by @dependabot in #275
- ci: add additional autolabeling rules by @nathan-weinberg in #286
- github: add stale bot to sdg repo by @nathan-weinberg in #287
- ci: fix lint action by @nathan-weinberg in #288
- build(deps): bump actions/checkout from 4.1.7 to 4.2.0 by @dependabot in #289
- Handle empty dataset from output of sdg leaf node without raising error by @relyt0925 in #272
New Contributors
- @jaideepr97 made their first contribution in #269
- @khaledsulayman made their first contribution in #262
- @relyt0925 made their first contribution in #272
Full Changelog: v0.3.0...v0.3.1