v0.0.5


Added

  • SFT (Supervised Fine-tuning):

    • SFTDataset for instruction tuning with input masking
    • SFTDataModule for data loading
    • SFTTask registered as --task sft in CLI
    • Tests for all SFT components
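
As a rough sketch of the input-masking idea behind SFTDataset (the helper below is illustrative, not the dataset's actual API): prompt tokens get the label -100, which PyTorch's cross-entropy ignores, so loss is computed only on the response.

```python
import torch

IGNORE_INDEX = -100  # ignored by nn.CrossEntropyLoss by default

def build_sft_labels(prompt_ids: list[int], response_ids: list[int]) -> dict:
    # Hypothetical helper: concatenate prompt + response, mask the prompt.
    input_ids = torch.tensor(prompt_ids + response_ids)
    labels = input_ids.clone()
    labels[: len(prompt_ids)] = IGNORE_INDEX  # no loss on instruction tokens
    return {"input_ids": input_ids, "labels": labels}

example = build_sft_labels(prompt_ids=[1, 2, 3], response_ids=[4, 5])
assert (example["labels"][:3] == IGNORE_INDEX).all()
```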
  • DPO (Direct Preference Optimization):

    • DPODataset handling chosen/rejected pairs
    • DPODataModule for preference data loading
    • DPOTask with reference model management and DPO loss
    • Registered as --task dpo in CLI
    • Tests for all DPO components
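
For reference, a minimal version of the standard DPO objective (Rafailov et al., 2023) that DPOTask's loss is based on; names and signatures below are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratios of the trainable policy against the frozen reference model.
    chosen = policy_chosen_logps - ref_chosen_logps
    rejected = policy_rejected_logps - ref_rejected_logps
    # Push the chosen completion's ratio above the rejected one's.
    return -F.logsigmoid(beta * (chosen - rejected)).mean()

loss = dpo_loss(torch.tensor([-4.2]), torch.tensor([-5.0]),
                torch.tensor([-4.5]), torch.tensor([-4.8]))
```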
  • Continuous Batching Engine (Serving):

    • src/llm/serving/engine.py with ContinuousBatchingEngine class
    • Iteration-level scheduling via Scheduler and SlotAllocator
    • Pre-allocated KV cache pool for efficient memory management
    • Supports mixed prefill/decode batching with automatic padding
    • Clean API: requires model and tokenizer instances upfront
    • src/llm/serving/scheduler.py with FCFS scheduling logic
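
A conceptual sketch of iteration-level scheduling with a fixed slot pool; the class below is a toy, not the engine's real API:

```python
from collections import deque

class ToyScheduler:
    """FCFS admission into a fixed pool of pre-allocated KV-cache slots."""

    def __init__(self, num_slots: int):
        self.free_slots = list(range(num_slots))
        self.waiting = deque()   # requests not yet admitted (FCFS order)
        self.running = []        # requests holding a slot: prefill or decode

    def admit(self):
        # After each engine step, fill any freed slots with waiting requests.
        while self.waiting and self.free_slots:
            request = self.waiting.popleft()
            request["slot"] = self.free_slots.pop()
            self.running.append(request)

    def finish(self, request):
        self.running.remove(request)
        self.free_slots.append(request["slot"])  # slot returns to the pool

# Each engine iteration would then run one forward pass over `running`,
# batching new (prefill) and in-flight (decode) requests with padding.
```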
  • LoRA (Low-Rank Adaptation):

    • src/llm/core/lora.py with LoRALinear class for parameter-efficient fine-tuning
    • apply_lora(), merge_lora(), get_lora_parameters() helper functions
    • Device/dtype handling for CUDA compatibility
    • 17 tests covering training and weight merging
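
A minimal sketch of the LoRA idea; the class name mirrors the release notes, but the internals shown here are assumptions:

```python
import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the adapters are trained
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

Merging (what merge_lora() presumably does) amounts to adding scaling * B @ A into the base weight once, so inference pays no adapter overhead.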
  • QLoRA (Quantized LoRA):

    • src/llm/core/qlora.py with QLoRALinear class
    • NF4 4-bit quantization for base weights (~4x memory reduction)
    • LoRA adapters remain in fp16/bf16 for training stability
    • apply_qlora() and get_qlora_parameters() helpers
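
The NF4 scheme in one toy function (illustrative, not the repo's kernel): weights are snapped to a 16-value normal-float codebook and stored as 4-bit indices plus a scale, while the LoRA adapters stay in 16-bit floats:

```python
import torch

# The 16 NF4 levels: normal-distribution quantiles normalized to [-1, 1].
NF4_LEVELS = torch.tensor([
    -1.0000, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0000,
     0.0796,  0.1609,  0.2461,  0.3379,  0.4407,  0.5626,  0.7230,  1.0000,
])

def nf4_quantize(w: torch.Tensor):
    scale = w.abs().max()                                 # absmax scale (per block in practice)
    codes = (w / scale - NF4_LEVELS.view(-1, 1, 1)).abs().argmin(dim=0)
    return codes.to(torch.uint8), scale                   # 4-bit indices + one fp scale

def nf4_dequantize(codes: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return NF4_LEVELS[codes.long()] * scale               # done on the fly at forward time

w = torch.randn(8, 8)
w_hat = nf4_dequantize(*nf4_quantize(w))                  # lossy ~4x-smaller representation
```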
  • RoPE (Rotary Position Embedding):

    • src/llm/core/rope.py with RotaryPositionEmbedding class
    • Linear, dynamic, and NTK-aware scaling methods for extended context
    • apply_rotary_pos_emb(), get_rope_scaling_factor() utilities
    • 15 tests
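
A compact sketch of the rotation itself, with linear scaling shown as the simplest of the three methods (layout details here are assumptions):

```python
import torch

def apply_rope_sketch(x: torch.Tensor, base: float = 10000.0,
                      linear_scale: float = 1.0) -> torch.Tensor:
    seq_len, dim = x.shape  # dim must be even
    inv_freq = 1.0 / base ** (torch.arange(0, dim, 2).float() / dim)
    # Linear scaling: divide positions to stretch the trained context window.
    pos = torch.arange(seq_len).float() / linear_scale
    angles = torch.outer(pos, inv_freq)          # (seq_len, dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin           # rotate each channel pair
    out[:, 1::2] = x1 * sin + x2 * cos           # by a position-dependent angle
    return out

q_rotated = apply_rope_sketch(torch.randn(16, 64))
```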
  • ALiBi (Attention with Linear Biases):

    • src/llm/core/alibi.py with ALiBiPositionBias class
    • get_alibi_slopes(), build_alibi_bias() functions
    • Cached bias computation for efficiency
    • 13 tests
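
The whole ALiBi recipe fits in a few lines (illustrative sketch; assumes a power-of-two head count): each head gets a geometric slope, and attention scores are biased by slope times the negative query-key distance:

```python
import torch

def alibi_slopes_sketch(num_heads: int) -> torch.Tensor:
    start = 2.0 ** (-8.0 / num_heads)  # geometric sequence from the ALiBi paper
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias_sketch(num_heads: int, seq_len: int) -> torch.Tensor:
    pos = torch.arange(seq_len)
    # 0 on the diagonal, increasingly negative for more distant past keys.
    distance = (pos[None, :] - pos[:, None]).clamp(max=0).float()
    # Added to attention scores before softmax; shape (heads, query, key).
    return alibi_slopes_sketch(num_heads)[:, None, None] * distance

bias = alibi_bias_sketch(num_heads=8, seq_len=16)
```

Since the bias depends only on head count and sequence length, it can be computed once and sliced per batch, which is presumably what the cached bias computation refers to.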
  • Sliding Window Attention:

    • window_size parameter in scaled_dot_product_attention
    • Propagated through MultiHeadAttention, TransformerBlock, DecoderModel
    • Reduces memory for long sequences by limiting attention scope
    • 10 tests
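
Functionally, window_size boils down to tightening the causal mask (toy version below; the repo threads the parameter through its attention stack):

```python
import torch

def sliding_window_mask(seq_len: int, window_size: int) -> torch.Tensor:
    i = torch.arange(seq_len)[:, None]  # query positions
    j = torch.arange(seq_len)[None, :]  # key positions
    # Allowed iff causal (j <= i) and within the last `window_size` tokens.
    return (j <= i) & (j > i - window_size)

mask = sliding_window_mask(seq_len=8, window_size=3)
# Row 5 attends only to positions 3, 4, 5 instead of all of 0..5.
```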
  • KV Cache Optimization:

    • src/llm/core/kv_cache.py with KVCache class for pre-allocated cache buffers
    • In-place updates during autoregressive generation (avoids O(n²) memory operations)
    • Integrated into MHA, TransformerBlock, DecoderModel
    • Factory method KVCache.from_model_config() for easy instantiation
    • Backward compatible: legacy past_key_value tuple format still works
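
The core trick in miniature (the real KVCache class has more to it): allocate once at max_seq_len and write each step's keys/values in place, rather than torch.cat-ing ever-growing tensors:

```python
import torch

class ToyKVCache:
    def __init__(self, batch: int, heads: int, max_seq_len: int, head_dim: int):
        shape = (batch, heads, max_seq_len, head_dim)
        self.k = torch.zeros(shape)  # allocated once, reused every step
        self.v = torch.zeros(shape)
        self.pos = 0                 # number of tokens cached so far

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor):
        t = k_new.shape[2]           # tokens this step (1 during decode)
        self.k[:, :, self.pos:self.pos + t] = k_new  # in-place, no reallocation
        self.v[:, :, self.pos:self.pos + t] = v_new
        self.pos += t
        return self.k[:, :, :self.pos], self.v[:, :, :self.pos]
```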
  • E2E Testing Infrastructure:

    • tests/e2e/ directory with comprehensive pipeline tests
    • test_training.py, test_sft.py, test_dpo.py
    • test_gradient_accumulation.py, test_resume_training.py
    • Advanced inference and callback tests
  • Documentation:

    • notebooks/quick_start.ipynb interactive tutorial
    • Covers model building, training, inference, and advanced features

Changed

  • SDPA Refactoring:

    • Consolidated scaled_dot_product_attention wrapper into src/llm/core/attn/sdpa.py
    • Refactored MultiHeadAttention and MultiLatentAttention to use common sdpa wrapper
    • Archived custom implementation to _learning/03_lab/experiments/custom_sdpa.py
  • Test Suite Refactoring:

    • Organized test files into subdirectories (tests/training/, tests/inference/, etc.)
    • Converted to functional testing style (real components over mocks)
    • Added shared fixtures in tests/conftest.py
    • Test count: 385 → 432
  • TrainingEngine:

    • Support for dictionary batches in training/validation loops
    • Gradient accumulation implementation
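
The accumulation pattern in brief (the loop below is a sketch of the idea, not TrainingEngine's actual code; it assumes the model returns an object with a .loss attribute):

```python
def train_with_accumulation(model, loader, optimizer, accum_steps: int = 4):
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(loader):
        # Scale so the accumulated gradient equals the large-batch average.
        loss = model(**batch).loss / accum_steps
        loss.backward()              # gradients add up across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()         # one optimizer step per accum_steps batches
            optimizer.zero_grad()
```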
  • DPO Reference Model:

    • Use model reconstruction instead of deepcopy for ref_model creation
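
The pattern, roughly (the builder below is hypothetical): reconstruct a fresh module and copy weights over, which sidesteps deepcopy pitfalls such as copied hooks or tied-tensor surprises:

```python
import torch.nn as nn

def make_reference(policy: nn.Module, build_fn) -> nn.Module:
    ref = build_fn()                          # fresh instance from the same config
    ref.load_state_dict(policy.state_dict())  # copy weights, not the object graph
    ref.eval()
    for p in ref.parameters():
        p.requires_grad_(False)               # the DPO reference stays frozen
    return ref

ref_model = make_reference(nn.Linear(4, 4), build_fn=lambda: nn.Linear(4, 4))
```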
  • Documentation:

    • Added docs/README.md as documentation entry point
    • Added MkDocs Material configuration (mkdocs.yml) for documentation site
    • Added GitHub Actions workflow for automatic GitHub Pages deployment
    • Added guide-finetuning.md (LoRA/QLoRA) and guide-inference.md (KVCache/GQA/Continuous Batching)
    • Enhanced architecture.md with detailed component diagrams and data flow analysis
    • Updated ROADMAP Phase 10.2 (Continuous Batching complete)