## Added
- SFT (Supervised Fine-tuning):
  - `SFTDataset` for instruction tuning with input masking
  - `SFTDataModule` for data loading
  - `SFTTask` registered as `--task sft` in CLI
  - Tests for all SFT components
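A minimal sketch of the input-masking idea used for instruction tuning (standalone illustration with made-up helper names, not the project's `SFTDataset` code): prompt tokens are set to the ignore index so the loss only covers the response.

```python
import torch

IGNORE_INDEX = -100  # ignored by torch.nn.CrossEntropyLoss by default

def build_sft_example(prompt_ids: list, response_ids: list) -> dict:
    """Concatenate prompt and response; supervise only the response tokens."""
    input_ids = torch.tensor(prompt_ids + response_ids)
    labels = input_ids.clone()
    labels[: len(prompt_ids)] = IGNORE_INDEX  # mask the prompt so the loss covers only the response
    return {"input_ids": input_ids, "labels": labels}
```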
- DPO (Direct Preference Optimization):
  - `DPODataset` handling chosen/rejected pairs
  - `DPODataModule` for preference data loading
  - `DPOTask` with reference model management and DPO loss
  - Registered as `--task dpo` in CLI
  - Tests for all DPO components
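For reference, the DPO loss computed from policy and frozen reference log-probabilities (a generic sketch; `DPOTask`'s actual implementation may differ in details such as the `beta` default):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """DPO: -log sigmoid(beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)))."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```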
- Continuous Batching Engine (Serving):
  - `src/llm/serving/engine.py` with `ContinuousBatchingEngine` class
  - Iteration-level scheduling via `Scheduler` and `SlotAllocator`
  - Pre-allocated KV cache pool for efficient memory management
  - Supports mixed prefill/decode batching with automatic padding
  - Clean API: requires `model` and `tokenizer` instances upfront
  - `src/llm/serving/scheduler.py` with FCFS scheduling logic
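A simplified picture of iteration-level FCFS scheduling with a fixed slot pool (hypothetical `Request` fields and class names; the real `Scheduler`/`SlotAllocator` API may differ): each engine step frees slots of finished sequences, admits waiting requests into free KV-cache slots, and returns the mixed prefill/decode batch.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    request_id: int
    prompt_ids: list
    generated: list = field(default_factory=list)
    done: bool = False

class FCFSScheduler:
    """First-come-first-served admission into a fixed pool of KV-cache slots."""

    def __init__(self, num_slots: int):
        self.waiting = deque()
        self.running = {}                       # slot index -> Request
        self.free_slots = list(range(num_slots))

    def add(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> dict:
        # Release slots whose sequences have finished.
        for slot, req in list(self.running.items()):
            if req.done:
                del self.running[slot]
                self.free_slots.append(slot)
        # Admit waiting requests (prefill) while slots are available.
        while self.waiting and self.free_slots:
            self.running[self.free_slots.pop()] = self.waiting.popleft()
        return self.running  # mixed prefill/decode batch for this iteration
```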
- LoRA (Low-Rank Adaptation):
  - `src/llm/core/lora.py` with `LoRALinear` class for parameter-efficient fine-tuning
  - `apply_lora()`, `merge_lora()`, `get_lora_parameters()` helper functions
  - Device/dtype handling for CUDA compatibility
  - 17 tests covering training and weight merging
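The core idea behind a LoRA linear layer, as a standalone sketch (not the project's `LoRALinear`): the base weight stays frozen and a low-rank update `B @ A`, scaled by `alpha / r`, is trained.

```python
import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    """Frozen base Linear plus a trainable low-rank update, scaled by alpha / r."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # base weights stay frozen
            p.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```

Weight merging in the `merge_lora()` sense amounts to folding `scaling * (B @ A)` into the base weight, so inference needs no extra matmul.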
- QLoRA (Quantized LoRA):
  - `src/llm/core/qlora.py` with `QLoRALinear` class
  - NF4 4-bit quantization for base weights (~4x memory reduction)
  - LoRA adapters remain in fp16/bf16 for training stability
  - `apply_qlora()` and `get_qlora_parameters()` helpers
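A simplified illustration of block-wise 4-bit weight quantization (an evenly spaced codebook stands in for the actual NF4 levels, which are derived from normal-distribution quantiles; this is not the project's `QLoRALinear` code):

```python
import torch

def quantize_4bit(weight: torch.Tensor, block_size: int = 64):
    """Block-wise 4-bit quantization: store codebook indices plus one absmax scale per block."""
    codebook = torch.linspace(-1.0, 1.0, 16)                  # stand-in for the NF4 levels
    flat = weight.reshape(-1, block_size)                     # assumes numel divisible by block_size
    scales = flat.abs().amax(dim=1, keepdim=True).clamp_min(1e-8)
    normed = flat / scales                                    # values now in [-1, 1]
    idx = (normed.unsqueeze(-1) - codebook).abs().argmin(-1)  # nearest codebook entry per value
    return idx.to(torch.uint8), scales, codebook

def dequantize_4bit(idx, scales, codebook, shape):
    """Look up codebook values, rescale per block, and restore the original shape."""
    return (codebook[idx.long()] * scales).reshape(shape)
```

During fine-tuning the 4-bit base weight is dequantized on the fly for the forward pass, while the LoRA adapters remain in fp16/bf16 as noted above.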
- RoPE (Rotary Position Embedding):
  - `src/llm/core/rope.py` with `RotaryPositionEmbedding` class
  - Linear, dynamic, and NTK-aware scaling methods for extended context
  - `apply_rotary_pos_emb()`, `get_rope_scaling_factor()` utilities
  - 15 tests
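Rotary embedding in its basic form, with linear position scaling shown as one extension method (standalone sketch; `apply_rotary_pos_emb()` may use a different channel layout):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0, scale: float = 1.0) -> torch.Tensor:
    """Rotate channel pairs by position-dependent angles. x: (batch, seq, heads, head_dim)."""
    _, seq_len, _, head_dim = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    pos = torch.arange(seq_len).float() / scale              # linear scaling stretches positions
    angles = torch.einsum("s,d->sd", pos, inv_freq)          # (seq, head_dim // 2)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Output channel order differs from the interleaved input; fine as long as q and k use the same layout.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```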
- ALiBi (Attention with Linear Biases):
  - `src/llm/core/alibi.py` with `ALiBiPositionBias` class
  - `get_alibi_slopes()`, `build_alibi_bias()` functions
  - Cached bias computation for efficiency
  - 13 tests
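The slope schedule and additive bias, sketched for a power-of-two head count (standalone illustration, not necessarily identical to `get_alibi_slopes()`/`build_alibi_bias()`):

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Geometric slopes 2^(-8/n), 2^(-16/n), ... for a power-of-two number of heads."""
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head additive bias slope * (key_pos - query_pos); distant keys are penalized."""
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]                       # (query, key) relative positions
    return alibi_slopes(num_heads)[:, None, None] * rel     # (heads, seq, seq), added to attention scores
```

Positive entries above the diagonal are irrelevant in practice because the causal mask removes them.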
- Sliding Window Attention:
  - `window_size` parameter in `scaled_dot_product_attention`
  - Propagated through `MultiHeadAttention`, `TransformerBlock`, `DecoderModel`
  - Reduces memory for long sequences by limiting attention scope
  - 10 tests
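What limiting the attention scope to `window_size` amounts to, expressed as a boolean mask (standalone sketch; how the parameter is applied inside `scaled_dot_product_attention` may differ):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window_size: int) -> torch.Tensor:
    """True where attention is allowed: causal, and at most window_size - 1 positions back."""
    q = torch.arange(seq_len)[:, None]   # query positions
    k = torch.arange(seq_len)[None, :]   # key positions
    return (k <= q) & (k > q - window_size)
```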
- KV Cache Optimization:
  - `src/llm/core/kv_cache.py` with `KVCache` class for pre-allocated cache buffers
  - In-place updates during autoregressive generation (avoids O(n²) memory operations)
  - Integrated into `MHA`, `TransformerBlock`, `DecoderModel`
  - Factory method `KVCache.from_model_config()` for easy instantiation
  - Backward compatible: legacy `past_key_value` tuple format still works
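The pre-allocation and in-place update pattern, as a standalone sketch with assumed `(batch, heads, seq, head_dim)` buffers (not the project's `KVCache` class):

```python
import torch

class KVCacheSketch:
    """Pre-allocated K/V buffers updated in place during autoregressive decoding."""

    def __init__(self, batch, num_heads, max_seq_len, head_dim, dtype=torch.float16, device="cpu"):
        shape = (batch, num_heads, max_seq_len, head_dim)
        self.k = torch.zeros(shape, dtype=dtype, device=device)
        self.v = torch.zeros(shape, dtype=dtype, device=device)
        self.length = 0  # number of valid positions written so far

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor):
        """Write the new step(s) into the buffers and return views over the valid prefix."""
        steps = k_new.shape[2]
        self.k[:, :, self.length : self.length + steps] = k_new
        self.v[:, :, self.length : self.length + steps] = v_new
        self.length += steps
        return self.k[:, :, : self.length], self.v[:, :, : self.length]
```

Because the buffers are written in place, each decode step avoids re-concatenating the full history, which is where the O(n²) memory traffic of the tuple-based approach comes from.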
- E2E Testing Infrastructure:
  - `tests/e2e/` directory with comprehensive pipeline tests
  - `test_training.py`, `test_sft.py`, `test_dpo.py`
  - `test_gradient_accumulation.py`, `test_resume_training.py`
  - Advanced inference and callback tests
- Documentation:
  - `notebooks/quick_start.ipynb` interactive tutorial
  - Covers model building, training, inference, and advanced features
## Changed
- SDPA Refactoring:
  - Consolidated `scaled_dot_product_attention` wrapper into `src/llm/core/attn/sdpa.py`
  - Refactored `MultiHeadAttention` and `MultiLatentAttention` to use common `sdpa` wrapper
  - Archived custom implementation to `_learning/03_lab/experiments/custom_sdpa.py`
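A wrapper of this kind is typically a thin layer over PyTorch's fused kernel; a minimal sketch under that assumption (the actual signature in `sdpa.py` may differ):

```python
import torch.nn.functional as F

def sdpa(q, k, v, attn_mask=None, dropout_p: float = 0.0, is_causal: bool = False):
    """Delegate to the fused PyTorch kernel; one place to add masks, dropout, and fallbacks."""
    return F.scaled_dot_product_attention(
        q, k, v, attn_mask=attn_mask, dropout_p=dropout_p, is_causal=is_causal
    )
```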
- Test Suite Refactoring:
  - Organized test files into subdirectories (`tests/training/`, `tests/inference/`, etc.)
  - Converted to functional testing style (real components over mocks)
  - Added shared fixtures in `tests/conftest.py`
  - Test count: 385 → 432
- `TrainingEngine`:
  - Support for dictionary batches in training/validation loops
  - Gradient accumulation implementation
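The gradient-accumulation pattern in generic form (sketch; `compute_loss` and the dictionary batch layout are stand-ins, not the engine's actual interface):

```python
def train_epoch(model, optimizer, loader, compute_loss, accumulation_steps: int):
    """Accumulate gradients over several micro-batches before each optimizer step."""
    optimizer.zero_grad()
    for step, batch in enumerate(loader, start=1):
        # batch is a dict (e.g. {"input_ids": ..., "labels": ...}); compute_loss is a stand-in callable.
        loss = compute_loss(model, batch) / accumulation_steps  # scale so grads match one full batch
        loss.backward()
        if step % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```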
- DPO Reference Model:
  - Use model reconstruction instead of `deepcopy` for `ref_model` creation
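The reconstruction pattern in outline (sketch; `build_model` is a hypothetical factory, not the project's API): rebuild the architecture from its config, copy the trained weights, and freeze the result.

```python
def make_reference_model(model, build_model, config):
    """Rebuild from config and copy weights instead of deep-copying the live module."""
    ref_model = build_model(config)                  # hypothetical factory call
    ref_model.load_state_dict(model.state_dict())
    ref_model.eval()
    for p in ref_model.parameters():
        p.requires_grad_(False)                      # the reference model stays frozen during DPO
    return ref_model
```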
- Documentation:
  - Added `docs/README.md` as documentation entry point
  - Added MkDocs Material configuration (`mkdocs.yml`) for documentation site
  - Added GitHub Actions workflow for automatic GitHub Pages deployment
  - Added `guide-finetuning.md` (LoRA/QLoRA) and `guide-inference.md` (KVCache/GQA/Continuous Batching)
  - Enhanced `architecture.md` with detailed component diagrams and data flow analysis
  - Updated ROADMAP Phase 10.2 (Continuous Batching complete)