
[LFX 2026 term1] Cloud-Edge Simulation Benchmark for LLM Speculative Decoding in KubeEdge-Ianvs #304

@hsj576

Description

LLM inference acceleration is increasingly important for cloud-edge collaborative AI deployments. Speculative decoding can improve end-to-end generation speed by using a lightweight draft model to propose token candidates and a larger target model to verify them, but its real-world gains depend heavily on cloud-edge constraints such as network latency, bandwidth limits, and heterogeneous compute.
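
To make the draft-propose/target-verify mechanism concrete, here is a minimal sketch using HuggingFace Transformers' assisted generation, which is one realization of speculative decoding. The model names are placeholder assumptions; any draft/target pair sharing a tokenizer family could stand in.

```python
# Minimal sketch of speculative decoding via HuggingFace assisted generation.
# Model names below are placeholder assumptions; any draft/target pair that
# shares a tokenizer family can be substituted.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TARGET_NAME = "meta-llama/Llama-2-7b-hf"           # assumed cloud-side target (verify) model
DRAFT_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed edge-side draft model

tokenizer = AutoTokenizer.from_pretrained(TARGET_NAME)
target = AutoModelForCausalLM.from_pretrained(TARGET_NAME, torch_dtype=torch.float16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(DRAFT_NAME, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Explain speculative decoding in one sentence.", return_tensors="pt").to(target.device)
prompt_len = inputs["input_ids"].shape[-1]

# Baseline: the target model decodes every token itself.
start = time.perf_counter()
baseline = target.generate(**inputs, max_new_tokens=128, do_sample=False)
baseline_s = time.perf_counter() - start

# Speculative: the draft model proposes candidate tokens, the target model verifies them.
start = time.perf_counter()
speculative = target.generate(**inputs, max_new_tokens=128, do_sample=False, assistant_model=draft)
speculative_s = time.perf_counter() - start

print(f"baseline:    {baseline.shape[-1] - prompt_len} new tokens in {baseline_s:.2f}s")
print(f"speculative: {speculative.shape[-1] - prompt_len} new tokens in {speculative_s:.2f}s")
```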
Ianvs provides a unified benchmarking framework, and KubeEdge scenarios often require evaluating AI workloads under cloud-edge conditions. This project proposes a single-host cloud-edge simulation benchmark in Ianvs to evaluate speculative decoding for LLM inference. The benchmark will simulate edge (draft) and cloud (verify) roles as separate processes, inject configurable network constraints, and report standardized throughput and latency metrics, enabling reproducible comparison between baseline decoding and speculative decoding under different cloud-edge budgets.
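
One possible shape for the single-host simulation runner is sketched below: two processes stand in for the edge (draft) and cloud (verify) roles, and every message is delayed to emulate a configurable latency/bandwidth budget. The queue protocol, constants, and function names are illustrative assumptions, not the Ianvs API.

```python
# Minimal sketch of a single-host cloud/edge split with injected network constraints.
# The "cloud" process plays the verify role; the main process plays the edge (draft) role.
# Constants and the queue protocol are illustrative assumptions, not the Ianvs API.
import time
from multiprocessing import Process, Queue

LATENCY_S = 0.05           # assumed one-way latency per message (50 ms)
BANDWIDTH_BPS = 1_000_000  # assumed link bandwidth (1 MB/s)

def constrained_send(q: Queue, payload: bytes) -> None:
    """Delay delivery to emulate latency plus serialization time over a limited link."""
    time.sleep(LATENCY_S + len(payload) / BANDWIDTH_BPS)
    q.put(payload)

def cloud_verify(requests: Queue, responses: Queue) -> None:
    """Stand-in for the cloud-side target model that verifies draft tokens."""
    while True:
        payload = requests.get()
        if payload is None:
            break
        # Placeholder verification: echo the payload; a real runner would call the target model.
        constrained_send(responses, payload)

if __name__ == "__main__":
    requests, responses = Queue(), Queue()
    cloud = Process(target=cloud_verify, args=(requests, responses))
    cloud.start()
    # Edge side: send a batch of "draft tokens" and time the simulated round trip.
    start = time.perf_counter()
    constrained_send(requests, b"\x00" * 4096)  # 4 KB of draft-token data (illustrative)
    responses.get()
    print(f"simulated round trip: {time.perf_counter() - start:.3f}s")
    requests.put(None)
    cloud.join()
```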

Expected Outcome

  • An Ianvs benchmark test case for cloud-edge speculative decoding.
  • A single-host simulation runner that emulates edge/cloud roles and configurable network constraints (e.g., latency, bandwidth).
  • Benchmark reports comparing baseline vs. speculative decoding with key metrics (e.g., time to first token (TTFT), end-to-end latency, tokens/s), and reproducible configs/scripts; a metric-computation sketch follows this list.
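
The reported metrics could be derived from per-request timestamps collected by the simulation runner, roughly as in the sketch below. The field names and summary format are illustrative assumptions, not the Ianvs report schema.

```python
# Minimal sketch of the reported metrics, assuming the runner records per-request timestamps.
# Field names and the summary layout are illustrative, not the Ianvs report schema.
from dataclasses import dataclass

@dataclass
class RequestTrace:
    start: float           # time the prompt was submitted
    first_token: float     # time the first generated token was produced
    end: float             # time generation finished
    generated_tokens: int  # number of new tokens produced

def summarize(traces: list[RequestTrace]) -> dict:
    ttft = [t.first_token - t.start for t in traces]
    e2e = [t.end - t.start for t in traces]
    tokens = sum(t.generated_tokens for t in traces)
    return {
        "ttft_avg_s": sum(ttft) / len(ttft),
        "e2e_latency_avg_s": sum(e2e) / len(e2e),
        # Throughput over total generation time; assumes requests are run sequentially.
        "throughput_tokens_per_s": tokens / sum(e2e),
    }

# Example: 128 tokens generated in 4.0 s, first token after 0.3 s -> 32 tokens/s.
print(summarize([RequestTrace(start=0.0, first_token=0.3, end=4.0, generated_tokens=128)]))
```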

Recommended Skills
Python, PyTorch, HuggingFace Transformers, LLM inference/decoding, benchmarking & performance profiling, KubeEdge/Ianvs basics

Labels

kind/feature
