The IEEE CPU is a high-performance RISC-V compatible processor with advanced coprocessor offload capabilities. The design features a pipelined architecture with integrated floating-point, system control, and memory management coprocessors.
- 5-stage pipeline: IF → ID → EX → MM → WB
- Advanced hazard detection and resolution
- Branch prediction and speculation
- Pipeline stall management
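
The hazard logic itself is not described in this README. As a minimal sketch, assuming the classic load-use case (an instruction in ID reads a register that a load in EX has not produced yet), the stall decision could be modeled in Python as below. The `Instr` fields and the `load_use_stall` name are hypothetical and are not taken from the RTL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    """Minimal view of a decoded instruction (hypothetical fields)."""
    rd: Optional[int] = None      # destination register, None if no writeback
    rs1: Optional[int] = None     # first source register
    rs2: Optional[int] = None     # second source register
    is_load: bool = False         # True if the result is only ready after MM


def load_use_stall(id_instr: Instr, ex_instr: Instr) -> bool:
    """Return True if the instruction in ID must stall behind a load in EX."""
    if not ex_instr.is_load or ex_instr.rd in (None, 0):
        return False              # x0 is hardwired to zero in RISC-V
    return ex_instr.rd in (id_instr.rs1, id_instr.rs2)


lw = Instr(rd=5, rs1=2, is_load=True)     # lw  x5, 0(x2)
add = Instr(rd=6, rs1=5, rs2=7)           # add x6, x5, x7
print(load_use_stall(add, lw))            # True: add needs x5 from the load
```

A real implementation would also cover forwarding paths and coprocessor results still in flight; this model only captures the single-cycle load-use bubble.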
- CP0: System Control and CSR operations
- CP1: Floating Point Unit (FPU)
- CP2: Memory Management Unit (MMU)
- CP3: GPU/Custom operations (optional)
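
To illustrate how instructions might be routed to these units, here is a rough Python sketch that selects a coprocessor from the RISC-V major opcode (instruction bits [6:0]). The opcode values are the standard RISC-V ones, but the routing table is an assumption made for illustration; the actual mapping lives in the RTL and may differ. CP2 is left out because this README does not say which instructions it handles.

```python
from typing import Optional

# RISC-V major opcodes (instruction bits [6:0]) from the base ISA opcode map.
OPCODE_SYSTEM = 0b1110011    # CSR access, ecall, ebreak
OPCODE_OP_FP = 0b1010011     # floating-point arithmetic
OPCODE_CUSTOM_0 = 0b0001011  # reserved for custom extensions

# Assumed routing, for illustration only.
CP_FOR_OPCODE = {
    OPCODE_SYSTEM: 0,    # CP0: system control and CSR operations
    OPCODE_OP_FP: 1,     # CP1: FPU
    OPCODE_CUSTOM_0: 3,  # CP3: optional GPU/custom unit
}


def select_coprocessor(instr: int) -> Optional[int]:
    """Return the coprocessor index for a 32-bit instruction, or None."""
    return CP_FOR_OPCODE.get(instr & 0x7F)


print(select_coprocessor(0x00000073))  # ecall -> 0
print(select_coprocessor(0x003100D3))  # fadd.s f1, f2, f3 (rm=RNE) -> 1
print(select_coprocessor(0x003100B3))  # add x1, x2, x3 -> None (stays in core)
```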
The CPU offloads coprocessor instructions through three cooperating components:
- Offload Stall Handler: Manages pipeline stalls and hazard detection
- Destination Table: Tracks offloaded instructions and completion
- Offload Manager: Unified interface for offload coordination
See Offload System Documentation for detailed information.
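
As a rough behavioral model of what a destination table does, the Python sketch below tracks in-flight offloaded instructions by tag and lets them retire in any order. The class name, the tag scheme, and the default depth of 4 are all assumptions for illustration; see the offload documentation for the real design.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Entry:
    cp: int   # coprocessor index that owns the instruction
    rd: int   # destination register awaiting the result


class DestinationTable:
    """Tracks in-flight offloaded instructions by tag (hypothetical model)."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.entries: Dict[int, Entry] = {}
        self.next_tag = 0

    def full(self) -> bool:
        return len(self.entries) >= self.depth

    def issue(self, cp: int, rd: int) -> int:
        """Record an offloaded instruction and return its tag."""
        if self.full():
            raise RuntimeError("table full: pipeline must stall")
        tag = self.next_tag
        self.next_tag += 1
        self.entries[tag] = Entry(cp, rd)
        return tag

    def complete(self, tag: int) -> int:
        """Retire an entry in any order and return its destination register."""
        return self.entries.pop(tag).rd

    def pending(self, reg: int) -> bool:
        """True if some in-flight instruction will still write `reg`."""
        return any(e.rd == reg for e in self.entries.values())


table = DestinationTable()
t_fpu = table.issue(cp=1, rd=10)   # long-latency FPU operation
t_csr = table.issue(cp=0, rd=11)   # short CSR read
table.complete(t_csr)              # finishes first: out-of-order completion
print(table.pending(10), table.pending(11))  # True False
```

In this model, a stall handler would consult `pending()` for each source register of the instruction in decode and consult `full()` before issuing another offload.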
- Multi-cycle operation support
- Out-of-order completion for coprocessors
- Advanced stall minimization
- Timeout protection for hung operations
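
Timeout protection for a hung operation can be pictured as a per-operation cycle counter. The following Python sketch is one possible model; the `OffloadWatchdog` name and the cycle limits are assumptions, not documented values.

```python
class OffloadWatchdog:
    """Counts cycles an offloaded operation has been outstanding."""

    def __init__(self, limit: int = 64) -> None:
        self.limit = limit   # assumed cycle budget, not a documented value
        self.count = 0

    def tick(self, busy: bool) -> bool:
        """Advance one cycle; return True when the operation has timed out."""
        self.count = self.count + 1 if busy else 0
        return self.count >= self.limit


wd = OffloadWatchdog(limit=3)
print([wd.tick(busy=True) for _ in range(4)])  # [False, False, True, True]
print(wd.tick(busy=False))                     # False: counter cleared
```

On a timeout, hardware would typically cancel the operation and raise an exception or release the stall; which of these this CPU does is not stated here.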
- Virtual memory support with TLB
- Cache control operations
- Memory-mapped I/O
- DMA integration
- Comprehensive debug interface
- Performance counters
- Trace buffer
- Breakpoint/watchpoint support
src/
├── cpu_top.sv # Top-level CPU module
├── pipeline_stages.sv # Pipeline stage implementations
├── control_unit.sv # Main control unit
├── dispatcher.sv # Basic coprocessor dispatcher
├── offload_logic.sv # Unified offload interface
└── coprocessor.sv # Individual coprocessor modules
rtl_utils/
├── fifo.sv # FIFO implementations
├── mux_n.sv # Parameterized multiplexers
├── adder_n.sv # N-bit adders
└── arbiter.sv # Bus arbitration
tb/
├── cpu_top_tb.py # Main CPU testbench
├── offload_stall_handler_tb.py # Offload system tests
└── coprocessor_*_tb.py # Individual CP tests
docs/
├── README.md # This file
├── offload_system.md # Offload system documentation
├── cpu_architecture.md # Architecture details
├── cpu_instruction_set.md # ISA documentation
└── memory_map.md # Memory layout
- Python 3.7+
- Cocotb testing framework
- Verilator or other Verilog simulator
- GTKWave (for waveform viewing)
# Install dependencies
cd tb
pip install -r requirements.txt
# Run all tests
make all
# Run specific tests
make test-offload-stall
make test-coprocessor-fpu
# Clean build files
make clean
cd synth
# Xilinx Vivado
vivado -mode tcl -source board_config.tcl
# Intel Quartus (if supported)
quartus_sh --flow compile cpu_project
- CPU Architecture - Detailed architecture overview
- Instruction Set - ISA specification
- Memory Map - Memory layout and mapping
- Offload System - Coprocessor offload documentation
- Create coprocessor module in `src/coprocessor_cp*.sv`
- Add opcode mapping in `offload_destination_table.sv`
- Update coprocessor system integration
- Add testbench in `tb/`
- Update documentation
- All new features must include comprehensive tests
- Use cocotb for Python-based testing
- Include both unit and integration tests
- Verify timing and performance requirements
- Base clock: 100MHz - 200MHz (depending on synthesis)
- IPC: 0.7 - 0.9 (instructions per cycle)
- Coprocessor latency: 1-10 cycles (operation dependent)
- Memory latency: 2-5 cycles (cache hit)
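
As a quick worked example, the clock and IPC ranges above bound the sustained instruction throughput, since throughput is clock frequency times IPC. The helper name `mips` is only illustrative.

```python
def mips(clock_mhz: float, ipc: float) -> float:
    """Million instructions per second = clock (MHz) x instructions per cycle."""
    return clock_mhz * ipc


low = mips(100, 0.7)    # slowest clock, lowest IPC
high = mips(200, 0.9)   # fastest clock, highest IPC
print(f"{low:.0f} to {high:.0f} MIPS")  # 70 to 180 MIPS
```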
- Logic Elements: ~15K - 25K
- Memory: ~500KB BRAM
- DSP blocks: ~20 (for FPU)
- Single-threaded execution only
- Limited cache size options
- Basic branch prediction
- Simplified exception handling
- Multi-core support
- Advanced branch prediction
- Larger cache hierarchies
- Hardware virtualization support
- Vector processing unit
- Fork the repository
- Create feature branch
- Implement changes with tests
- Update documentation
- Submit pull request
- Follow SystemVerilog best practices
- Include comprehensive comments
- Use consistent naming conventions
- Implement proper error handling
[Specify license here]
[Contact information]
This documentation is part of the Holy CPU project. Last updated: [Date]