@Alex-Wengg Alex-Wengg commented Jan 25, 2026

Summary

  • Fix check_array_shape to properly detect MLX vs PyTorch conv weight formats for 1D convolutions
  • Update weight sanitization in kokoro.py and istftnet.py to use format detection instead of unconditional transpose
  • Add Chinese-to-Bopomofo conversion using pypinyin for ZH model compatibility (the ZH model uses Bopomofo symbols, not IPA)
  • Add number-to-Chinese conversion for proper TTS of numeric content (e.g., "23" → "二十三")
  • Add mixed Chinese/English text processing to the pipeline
  • Update tests for check_array_shape function
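
The format detection in the first two bullets can be illustrated with a small sketch. It is not the code from this PR: the helper name `conv1d_needs_transpose` and the idea of comparing against the shape the target layer expects are assumptions made for illustration. It relies only on the documented layouts, where PyTorch stores 1D conv weights as `(out_channels, in_channels, kernel_size)` and MLX stores them as `(out_channels, kernel_size, in_channels)`.

```python
from typing import Sequence


def conv1d_needs_transpose(
    weight_shape: Sequence[int], expected_shape: Sequence[int]
) -> bool:
    """Decide whether a 1D conv weight must be converted to the MLX layout.

    Hypothetical helper, not the PR's check_array_shape.
    weight_shape:   shape of the checkpoint tensor
    expected_shape: shape the MLX layer expects, (out, kernel, in)
    """
    weight_shape = tuple(weight_shape)
    expected_shape = tuple(expected_shape)
    if len(weight_shape) != 3 or len(expected_shape) != 3:
        raise ValueError("1D conv weights must be 3-dimensional")

    # Already in MLX layout (also covers the ambiguous case where
    # kernel_size == in_channels and both layouts share one shape).
    if weight_shape == expected_shape:
        return False

    # PyTorch layout: swapping the last two axes yields the expected shape.
    out_ch, in_ch, kernel = weight_shape
    if (out_ch, kernel, in_ch) == expected_shape:
        return True

    raise ValueError(
        f"shape {weight_shape} matches neither layout for {expected_shape}"
    )


# A PyTorch-style weight (out=8, in=4, kernel=3) for a layer expecting (8, 3, 4)
print(conv1d_needs_transpose((8, 4, 3), (8, 3, 4)))  # True
print(conv1d_needs_transpose((8, 3, 4), (8, 3, 4)))  # False
```

When the helper returns `True`, the weight would be converted with a swap of its last two axes (for example `weight.transpose(0, 2, 1)`). When `kernel_size` equals `in_channels`, the two layouts cannot be told apart from shapes alone, so this sketch leaves such weights untouched.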

Test plan

  • Verify Kokoro-82M-v1.1-zh model loads without shape mismatch errors
  • Test Chinese TTS output with lang_code="z"
  • Test mixed Chinese/English text (e.g., "今天天气很好。Hello, how are you?")
  • Test numbers in Chinese text (e.g., "大概是23度左右")
  • Verify transcription of generated audio matches input
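
The number and mixed-text cases in the test plan can be reproduced with a minimal, standard-library sketch. This is not the PR's implementation: the function names (`int_to_chinese`, `normalize_numbers`, `split_by_script`, `prepare`) and the `"zh"`/`"en"` segment labels are hypothetical, and the number reader only covers integers from 0 to 9999, reading longer digit runs one digit at a time.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def int_to_chinese(n: int) -> str:
    """Read an integer in 0..9999 as Chinese numerals (illustrative only)."""
    if not 0 <= n <= 9999:
        raise ValueError("only 0..9999 is supported in this sketch")
    if n == 0:
        return "零"
    digits = [int(c) for c in str(n)]
    parts = []
    pending_zero = False
    for pos, d in enumerate(digits):
        if d == 0:
            pending_zero = True  # emitted only if a non-zero digit follows
            continue
        if pending_zero:
            parts.append("零")
            pending_zero = False
        parts.append(DIGITS[d] + UNITS[len(digits) - 1 - pos])
    text = "".join(parts)
    # 10..19 are read "十".."十九", not "一十".."一十九"
    return text[1:] if text.startswith("一十") else text


def normalize_numbers(text: str) -> str:
    """Replace ASCII digit runs with their Chinese reading."""
    def repl(match):
        s = match.group()
        if len(s) > 4 or (len(s) > 1 and s[0] == "0"):
            return "".join(DIGITS[int(c)] for c in s)  # digit by digit
        return int_to_chinese(int(s))
    return re.sub(r"[0-9]+", repl, text)


def split_by_script(text: str):
    """Split text into (label, chunk) runs of Chinese and English.

    Punctuation, spaces, and digits stay attached to the preceding run.
    """
    segments = []  # each item is [label, chunk]
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            lang = "zh"
        elif ch.isascii() and ch.isalpha():
            lang = "en"
        else:
            lang = None  # neutral character
        if not segments:
            segments.append([lang, ch])
        elif lang is None or segments[-1][0] in (None, lang):
            segments[-1][0] = segments[-1][0] or lang
            segments[-1][1] += ch
        else:
            segments.append([lang, ch])
    return [(lang, chunk) for lang, chunk in segments]


def prepare(text: str):
    """Split by script, then normalize numbers in the Chinese runs only."""
    return [
        (lang, normalize_numbers(chunk) if lang == "zh" else chunk)
        for lang, chunk in split_by_script(text)
    ]


print(prepare("大概是23度左右"))
# [('zh', '大概是二十三度左右')]
print(prepare("今天天气很好。Hello, how are you?"))
# [('zh', '今天天气很好。'), ('en', 'Hello, how are you?')]
```

Each segment could then be routed to the phonemizer for its language. Numbers are converted only inside Chinese runs so that digits in English text keep their English reading.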

@Alex-Wengg Alex-Wengg marked this pull request as draft January 25, 2026 16:44
@Alex-Wengg Alex-Wengg force-pushed the fix/kokoro-zh-shape-mismatch branch from 026f053 to a3b1976 Compare January 25, 2026 16:44
- Fix check_array_shape to properly detect MLX vs PyTorch conv weight formats
- Update weight sanitization in kokoro.py and istftnet.py to use format detection
- Add Chinese-to-Bopomofo conversion using pypinyin for ZH model compatibility
- Add number-to-Chinese conversion for proper TTS of numeric content
- Add mixed Chinese/English text processing in pipeline
- Update tests for check_array_shape function

Fixes Blaizzy#226