[build] support build sm 80,86,89,90 to one whl package #6173
Conversation
Thanks for your contribution!
Pull request overview
This PR adds support for building multiple CUDA SM versions (80, 86, 89, 90) into a single wheel package. By compiling the custom_ops for each SM version into its own subdirectory, every custom_ops package stays under 2 GB, which makes unified packaging possible.
Changes:
- Add an FD_UNIFY_BUILD mode that compiles multiple SM versions into a single wheel package
- Select the matching custom_ops module at runtime based on the GPU's SM version
- Extend package_data in setup.py to include the SM-version-specific directories
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| build.sh | Adds a `build_custom_ops` function for the unified build mode and refactors `build_and_install_ops` to take the build architectures and target directory as parameters |
| `fastdeploy/model_executor/ops/gpu/__init__.py` | Implements `decide_module`, which selects the custom_ops module matching the current GPU's SM version at runtime |
| `fastdeploy/import_ops.py` | Improves error logging by printing detailed exception information when an import fails |
| setup.py | Extends the `package_data` configuration to include the SM-version-specific `fastdeploy_ops` subdirectories and their contents |
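For context, here is a minimal sketch, under stated assumptions, of how the relative module path returned by `decide_module` could be resolved into an actual import; it is illustrative only and not necessarily how `fastdeploy/import_ops.py` wires it up.

```python
# Sketch only (not the PR's exact wiring): resolve the relative path returned by
# decide_module(), e.g. ".fastdeploy_ops_90.fastdeploy_ops", against the gpu ops package.
import importlib

from fastdeploy.model_executor.ops import gpu as gpu_pkg

relative_name = gpu_pkg.decide_module()
ops_module = importlib.import_module(relative_name, package=gpu_pkg.__name__)
```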
In `fastdeploy/model_executor/ops/gpu/__init__.py`:

```python
import paddle

prop = paddle.device.cuda.get_device_properties()
sm_version = prop.major * 10 + prop.minor
print(f"current sm_version={sm_version}")

import os

curdir = os.path.dirname(os.path.abspath(__file__))
sm_version_path = os.path.join(curdir, f"fastdeploy_ops_{sm_version}")
if os.path.exists(sm_version_path):
    return f".fastdeploy_ops_{sm_version}.fastdeploy_ops"
```
Copilot AI · Jan 23, 2026
The decide_module function lacks error handling. When CUDA is unavailable or paddle.device.cuda.get_device_properties() fails (for example, in a CPU-only environment), this raises an uncaught exception. Consider adding a try-except block to handle that case and returning the default fastdeploy_ops module when CUDA is unavailable.
Suggested change (new body of decide_module):

```python
"""Decide which GPU ops module to load based on CUDA SM version.

If CUDA is unavailable or querying device properties fails (e.g., on CPU-only
environments), fall back to the default fastdeploy_ops module.
"""
try:
    import paddle
except Exception:
    # Paddle is not available, fall back to default ops.
    return ".fastdeploy_ops"
try:
    # If Paddle is compiled without CUDA, fall back directly.
    if hasattr(paddle.device, "is_compiled_with_cuda") and not paddle.device.is_compiled_with_cuda():
        return ".fastdeploy_ops"
    prop = paddle.device.cuda.get_device_properties()
    sm_version = prop.major * 10 + prop.minor
    print(f"current sm_version={sm_version}")

    import os

    curdir = os.path.dirname(os.path.abspath(__file__))
    sm_version_path = os.path.join(curdir, f"fastdeploy_ops_{sm_version}")
    if os.path.exists(sm_version_path):
        return f".fastdeploy_ops_{sm_version}.fastdeploy_ops"
except Exception:
    # Any failure when querying CUDA properties should not break import;
    # fall back to the default ops implementation.
    return ".fastdeploy_ops"
```
```python
import paddle

prop = paddle.device.cuda.get_device_properties()
sm_version = prop.major * 10 + prop.minor
print(f"current sm_version={sm_version}")

import os
```
Copilot AI · Jan 23, 2026
The decide_module function does not account for heterogeneous multi-GPU environments. If the system has GPUs with different SM versions, paddle.device.cuda.get_device_properties() only returns the properties of the current device (usually device 0), so the selected custom_ops may not be suitable for the other GPUs. Consider documenting that this feature assumes all GPUs share the same SM version, or detecting mixed SM versions in multi-GPU environments and emitting a warning.
Suggested change:

```python
import os
import warnings

import paddle

# Collect SM versions of all visible CUDA devices
device_count = paddle.device.cuda.device_count()
sm_versions = set()
for device_id in range(device_count):
    prop = paddle.device.cuda.get_device_properties(device_id)
    sm_versions.add(prop.major * 10 + prop.minor)

# Warn user if multiple different SM versions are detected.
# Custom ops are still selected based on device 0 and are assumed
# to be compatible with all GPUs in the current environment.
if len(sm_versions) > 1:
    warnings.warn(
        "Detected multiple GPUs with different SM versions. "
        "Custom GPU ops are selected based on the SM version of device 0; "
        "please ensure that all GPUs used with these ops have compatible "
        "SM versions.",
        UserWarning,
    )

# Use device 0 as the canonical device for selecting SM-specific ops.
prop0 = paddle.device.cuda.get_device_properties(0)
sm_version = prop0.major * 10 + prop0.minor
print(f"current sm_version={sm_version}")
```
```python
def decide_module():
    import paddle

    prop = paddle.device.cuda.get_device_properties()
    sm_version = prop.major * 10 + prop.minor
    print(f"current sm_version={sm_version}")

    import os

    curdir = os.path.dirname(os.path.abspath(__file__))
    sm_version_path = os.path.join(curdir, f"fastdeploy_ops_{sm_version}")
    if os.path.exists(sm_version_path):
        return f".fastdeploy_ops_{sm_version}.fastdeploy_ops"
    return ".fastdeploy_ops"
```
Copilot AI · Jan 23, 2026
The new decide_module function and the SM-version auto-selection logic lack unit tests. Given the many test cases under tests/operators/, consider adding tests that verify: 1) the correct module path is returned when the SM-version-specific directory exists; 2) the default module is used as a fallback when it does not; 3) the CUDA-unavailable case is handled. This is essential for ensuring the correctness of the multi-SM-version packaging feature.
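As an illustration of the kind of test being requested, here is a minimal pytest sketch of the fallback case; the import path, fixture usage, and patched attributes are assumptions and are not part of this PR.

```python
# Hypothetical test sketch (not part of this PR); module paths and names are assumptions.
import os
import types

import paddle


def test_decide_module_falls_back_without_sm_dir(monkeypatch):
    from fastdeploy.model_executor.ops import gpu as gpu_ops

    # Pretend the visible device reports SM 8.6.
    fake_prop = types.SimpleNamespace(major=8, minor=6)
    monkeypatch.setattr(paddle.device.cuda, "get_device_properties", lambda *args, **kwargs: fake_prop)

    # Pretend no SM-specific subdirectory is bundled, so the default module should be chosen.
    monkeypatch.setattr(os.path, "exists", lambda _path: False)

    assert gpu_ops.decide_module() == ".fastdeploy_ops"
```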
```python
prop = paddle.device.cuda.get_device_properties()
sm_version = prop.major * 10 + prop.minor
print(f"current sm_version={sm_version}")
```
Copilot AI · Jan 23, 2026
The print statement should go through a logger instead of printing directly. Using print to emit the sm_version on line 28 does not follow the codebase's logging conventions. Following the logger pattern in fastdeploy/import_ops.py, this information should be emitted with logger.info or logger.debug.
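For illustration, a minimal sketch using Python's standard logging module; the project's actual logger in fastdeploy/import_ops.py may be configured differently, so treat the logger setup here as an assumption.

```python
# Illustrative only: emit the SM version through a logger instead of print().
import logging

import paddle

logger = logging.getLogger(__name__)

prop = paddle.device.cuda.get_device_properties()
sm_version = prop.major * 10 + prop.minor
logger.info("current sm_version=%s", sm_version)
```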
In build.sh:

```bash
build_and_install_ops "[89]" "$custom_ops_dir"

build_and_install_ops "[80, 90]" "${OPS_TMP_DIR}"
cp -r $OPS_SRC_DIR/$OPS_TMP_DIR/* ./fastdeploy/model_executor/ops/gpu
```
Copilot AI · Jan 23, 2026
In the FD_UNIFY_BUILD branch of build_custom_ops, the cp command on line 282 does not check its return value. If the copy fails, the script keeps running and may produce an incomplete package. Consider adding an error check after the cp command, or using 'set -e' so the script aborts when a command fails.
Suggested change:

```bash
cp -r "$OPS_SRC_DIR/$OPS_TMP_DIR/"* ./fastdeploy/model_executor/ops/gpu || {
    echo -e "${RED}[FAIL]${NONE} failed to copy unified custom ops to target directory"
    exit 1
}
```
```bash
function build_custom_ops() {
    if [ "$FD_UNIFY_BUILD" ]; then
        mkdir -p ${OPS_SRC_DIR}/${OPS_TMP_DIR}

        custom_ops_dir=${OPS_TMP_DIR}/fastdeploy_ops_86
        build_and_install_ops "[86]" "$custom_ops_dir"

        custom_ops_dir=${OPS_TMP_DIR}/fastdeploy_ops_89
        build_and_install_ops "[89]" "$custom_ops_dir"

        build_and_install_ops "[80, 90]" "${OPS_TMP_DIR}"
        cp -r $OPS_SRC_DIR/$OPS_TMP_DIR/* ./fastdeploy/model_executor/ops/gpu
```
Copilot AI · Jan 23, 2026
The unified build significantly increases the wheel size because it bundles multiple SM-version copies of custom_ops. The PR description says each custom_ops package is under 2 GB, but it does not state the total size of the final wheel. Consider adding an estimate of the final package size to the PR description or documentation, along with the impact on storage and download time, so users can decide whether the unified build mode is appropriate for them.
```bash
function build_and_install_ops() {
    local building_arcs=${1:-$FD_BUILDING_ARCS}
    local tmp_dir=${2:-$OPS_TMP_DIR}
    echo "BUILD CUSTOM OPS: ${building_arcs}, ${tmp_dir}"
```
Copilot AI · Jan 23, 2026
The log output on line 294 is inconsistent with the rest of the script. Elsewhere the script uses color-coded, formatted log output (e.g. "${BLUE}[build]${NONE}" on line 297), but line 294 uses a plain echo. For consistency and readability, consider a format such as: echo -e "${BLUE}[build]${NONE} Building custom ops: ${building_arcs} to ${tmp_dir}".
Suggested change:

```bash
echo -e "${BLUE}[build]${NONE} Building custom ops: ${building_arcs} to ${tmp_dir}"
```
```bash
function build_custom_ops() {
    if [ "$FD_UNIFY_BUILD" ]; then
        mkdir -p ${OPS_SRC_DIR}/${OPS_TMP_DIR}
```
Copilot AI · Jan 23, 2026
build_custom_ops does not check for errors when creating the directory in FD_UNIFY_BUILD mode. If mkdir -p fails, or permission problems break later steps, the script gives no clear error message. Consider verifying after mkdir that the directory was created successfully.
Suggested change:

```bash
# Ensure ops temporary directory exists and is writable before building
local ops_build_dir="${OPS_SRC_DIR}/${OPS_TMP_DIR}"
mkdir -p "${ops_build_dir}"
if [ $? -ne 0 ] || [ ! -d "${ops_build_dir}" ] || [ ! -w "${ops_build_dir}" ]; then
    echo "[ERROR] Failed to create or access directory: ${ops_build_dir}" >&2
    exit 1
fi
```
```bash
custom_ops_dir=${OPS_TMP_DIR}/fastdeploy_ops_86
build_and_install_ops "[86]" "$custom_ops_dir"

custom_ops_dir=${OPS_TMP_DIR}/fastdeploy_ops_89
build_and_install_ops "[89]" "$custom_ops_dir"

build_and_install_ops "[80, 90]" "${OPS_TMP_DIR}"
cp -r $OPS_SRC_DIR/$OPS_TMP_DIR/* ./fastdeploy/model_executor/ops/gpu
else
    build_and_install_ops "$FD_BUILDING_ARCS" "$OPS_TMP_DIR"
```
Copilot AI · Jan 23, 2026
The relative paths used in build_custom_ops can cause problems. On lines 275 and 278, custom_ops_dir is set to the relative path ${OPS_TMP_DIR}/fastdeploy_ops_XX, which is resolved against the current directory. Because build_and_install_ops changes into OPS_SRC_DIR, the paths may not resolve correctly. Although line 298 converts to an absolute path with readlink -f, it is safer to pass absolute paths (or guarantee the correct working directory) before calling build_and_install_ops to avoid potential path-resolution issues.
Suggested change:

```bash
custom_ops_dir=${OPS_SRC_DIR}/${OPS_TMP_DIR}/fastdeploy_ops_86
build_and_install_ops "[86]" "$custom_ops_dir"
custom_ops_dir=${OPS_SRC_DIR}/${OPS_TMP_DIR}/fastdeploy_ops_89
build_and_install_ops "[89]" "$custom_ops_dir"
build_and_install_ops "[80, 90]" "${OPS_SRC_DIR}/${OPS_TMP_DIR}"
cp -r $OPS_SRC_DIR/$OPS_TMP_DIR/* ./fastdeploy/model_executor/ops/gpu
else
    build_and_install_ops "$FD_BUILDING_ARCS" "${OPS_SRC_DIR}/${OPS_TMP_DIR}"
```
```python
import paddle

prop = paddle.device.cuda.get_device_properties()
sm_version = prop.major * 10 + prop.minor
```
Copilot AI · Jan 23, 2026
The sm_version computation uses prop.major * 10 + prop.minor without validating the result. For future GPU architectures, a minor version greater than 9 would produce unexpected values. NVIDIA's current naming keeps the minor version below 10, but for robustness consider adding an assertion or validation that the computed sm_version falls within the expected range (e.g. 80-100).
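One possible shape of the suggested guard, as a sketch only; the helper name and the accepted range are assumptions derived from the SM versions this PR targets, not part of the PR.

```python
# Hypothetical helper (not part of this PR); the accepted range is an assumption.
import paddle


def _current_sm_version():
    """Return the SM version of the current device, or None if it looks implausible."""
    prop = paddle.device.cuda.get_device_properties()
    sm_version = prop.major * 10 + prop.minor
    # major * 10 + minor cannot represent a minor version >= 10, and values far
    # outside the architectures this wheel ships would select no bundled ops anyway.
    if prop.minor > 9 or not (80 <= sm_version <= 100):
        return None
    return sm_version
```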
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             develop    #6173   +/-   ##
==========================================
  Coverage           ?   67.03%
==========================================
  Files              ?      383
  Lines              ?    50543
  Branches           ?     7894
==========================================
  Hits               ?    33882
  Misses             ?    14188
  Partials           ?     2473
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Motivation
Compiling each sm_version separately keeps every custom_ops package under 2 GB, which makes it possible to build SM 80, 86, 89, and 90 into a single whl package.
Modifications
Usage or Command
```bash
export FD_UNIFY_BUILD="true"
bash build.sh 1 python false
```
(In this mode the build always compiles SM 80, 90, 86, and 89. In all other cases, when FD_UNIFY_BUILD is not set, the build behaves the same as before.)
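As a quick sanity check after installing a unified wheel, one could list which per-SM ops directories were bundled; this snippet is illustrative only and not part of the PR.

```python
# Illustrative check (not part of this PR): list the SM-specific ops directories
# bundled next to fastdeploy/model_executor/ops/gpu/__init__.py.
import os

import fastdeploy.model_executor.ops.gpu as gpu_ops

ops_dir = os.path.dirname(os.path.abspath(gpu_ops.__file__))
print(sorted(d for d in os.listdir(ops_dir) if d.startswith("fastdeploy_ops_")))
```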
Accuracy Tests
Not applicable.
Checklist
- PR tag: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a PR targeting the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.