[build] support build sm 80,86,89,90 to one whl package #6173

mitu626 · 2026-01-22T13:09:27Z

Motivation

By compiling each sm_version separately, every custom_ops package stays under 2 GB, which makes it possible to build sm80, 86, 89, and 90 into a single whl package.

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

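build.sh: when the FD_UNIFY_BUILD environment variable is set, custom_ops are compiled separately for each SM version.
fastdeploy/model_executor/ops/gpu/__init__.py: loads the custom_ops build that matches the SM version of the current system.
setup.py: packages the separately compiled per-SM custom_ops into the final whl package.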

Usage or Command

export FD_UNIFY_BUILD="true"
bash build.sh 1 python false

(In this mode the build always compiles for 80, 90, 86, and 89. In all other cases, when FD_UNIFY_BUILD is not set, the build behaves as before.)

Accuracy Tests

Not applicable.

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-01-22T13:09:34Z

Thanks for your contribution!

Copilot

Pull request overview

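This PR adds support for building multiple CUDA SM versions (80, 86, 89, 90) into a single wheel package. The custom_ops for different SM versions are compiled separately into different subdirectories so that each custom_ops package stays under 2 GB, which allows them to be packaged together.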

Changes:

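Adds an FD_UNIFY_BUILD mode that compiles multiple SM versions into one wheel package
Selects the matching custom_ops module at runtime based on the GPU's SM version
Extends package_data in setup.py to include the SM-specific directories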

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.

File	Description
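build.sh	Adds a build_custom_ops function to support the unified build mode, and refactors build_and_install_ops to accept the target architectures and output directory as parameters
fastdeploy/model_executor/ops/gpu/__init__.py	Implements decide_module, which selects the custom_ops module matching the current GPU's SM version at runtime
fastdeploy/import_ops.py	Improves error logging by printing detailed exception information when an import fails
setup.py	Extends the package_data configuration to include the SM-specific fastdeploy_ops subdirectories and their contents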
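The setup.py change itself is not quoted in this thread. As a rough illustration of what "extending package_data" could look like, here is a minimal sketch that generates glob patterns for the default ops directory and for the SM-specific ones. The helper name `sm_package_data`, the `base` path, and the exact glob patterns are assumptions for illustration; only the `fastdeploy_ops_86` / `fastdeploy_ops_89` directory names come from the build.sh hunks quoted in this PR.

```python
def sm_package_data(sm_versions, base="model_executor/ops/gpu"):
    """Build package_data glob patterns for default and SM-specific ops dirs.

    Hypothetical helper: the real setup.py in this PR may be organized differently.
    """
    # The default directory holds the build shared by the remaining SM versions.
    patterns = [f"{base}/fastdeploy_ops/*"]
    for sm in sm_versions:
        # Each SM-specific directory nests its own fastdeploy_ops package,
        # matching the ".fastdeploy_ops_<sm>.fastdeploy_ops" module path.
        patterns.append(f"{base}/fastdeploy_ops_{sm}/*")
        patterns.append(f"{base}/fastdeploy_ops_{sm}/fastdeploy_ops/*")
    return {"fastdeploy": patterns}


if __name__ == "__main__":
    for pattern in sm_package_data([86, 89])["fastdeploy"]:
        print(pattern)
```

The resulting dictionary has the shape that setuptools expects for the `package_data` argument of `setup()`.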

Copilot · 2026-01-23T08:13:06Z

fastdeploy/model_executor/ops/gpu/__init__.py

+    import paddle
+
+    prop = paddle.device.cuda.get_device_properties()
+    sm_version = prop.major * 10 + prop.minor
+    print(f"current sm_version={sm_version}")
+
+    import os
+
+    curdir = os.path.dirname(os.path.abspath(__file__))
+    sm_version_path = os.path.join(curdir, f"fastdeploy_ops_{sm_version}")
+    if os.path.exists(sm_version_path):
+        return f".fastdeploy_ops_{sm_version}.fastdeploy_ops"


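decide_module has no error handling. If CUDA is unavailable or the call to paddle.device.cuda.get_device_properties() fails (for example, in a CPU-only environment), an uncaught exception is raised. Consider adding a try-except block to handle this case and returning the default fastdeploy_ops module when CUDA is unavailable.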

Suggested change

-    import paddle
-    prop = paddle.device.cuda.get_device_properties()
-    sm_version = prop.major * 10 + prop.minor
-    print(f"current sm_version={sm_version}")
-    import os
-    curdir = os.path.dirname(os.path.abspath(__file__))
-    sm_version_path = os.path.join(curdir, f"fastdeploy_ops_{sm_version}")
-    if os.path.exists(sm_version_path):
-        return f".fastdeploy_ops_{sm_version}.fastdeploy_ops"
+    """Decide which GPU ops module to load based on CUDA SM version.
+    If CUDA is unavailable or querying device properties fails (e.g., on CPU-only
+    environments), fall back to the default fastdeploy_ops module.
+    """
+    try:
+        import paddle
+    except Exception:
+        # Paddle is not available, fall back to default ops.
+        return ".fastdeploy_ops"
+    try:
+        # If Paddle is compiled without CUDA, fall back directly.
+        if hasattr(paddle.device, "is_compiled_with_cuda") and not paddle.device.is_compiled_with_cuda():
+            return ".fastdeploy_ops"
+        prop = paddle.device.cuda.get_device_properties()
+        sm_version = prop.major * 10 + prop.minor
+        print(f"current sm_version={sm_version}")
+        import os
+        curdir = os.path.dirname(os.path.abspath(__file__))
+        sm_version_path = os.path.join(curdir, f"fastdeploy_ops_{sm_version}")
+        if os.path.exists(sm_version_path):
+            return f".fastdeploy_ops_{sm_version}.fastdeploy_ops"
+    except Exception:
+        # Any failure when querying CUDA properties should not break import;
+        # fall back to the default ops implementation.
+        return ".fastdeploy_ops"

Copilot · 2026-01-23T08:13:06Z

fastdeploy/model_executor/ops/gpu/__init__.py

+    import paddle
+
+    prop = paddle.device.cuda.get_device_properties()
+    sm_version = prop.major * 10 + prop.minor
+    print(f"current sm_version={sm_version}")
+
+    import os
+


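decide_module does not account for heterogeneous multi-GPU environments. If the system has multiple GPUs with different SM versions, paddle.device.cuda.get_device_properties() only returns the properties of the current device (usually device 0). The selected custom_ops may therefore not be suitable for the other GPUs. Consider documenting that this feature assumes all GPUs share the same SM version, or detecting differing SM versions in multi-GPU environments and emitting a warning.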

Suggested change

-    import paddle
-    prop = paddle.device.cuda.get_device_properties()
-    sm_version = prop.major * 10 + prop.minor
-    print(f"current sm_version={sm_version}")
-    import os
+    import os
+    import warnings
+    import paddle
+    # Collect SM versions of all visible CUDA devices
+    device_count = paddle.device.cuda.device_count()
+    sm_versions = set()
+    for device_id in range(device_count):
+        prop = paddle.device.cuda.get_device_properties(device_id)
+        sm_versions.add(prop.major * 10 + prop.minor)
+    # Warn user if multiple different SM versions are detected.
+    # Custom ops are still selected based on device 0 and are assumed
+    # to be compatible with all GPUs in the current environment.
+    if len(sm_versions) > 1:
+        warnings.warn(
+            "Detected multiple GPUs with different SM versions. "
+            "Custom GPU ops are selected based on the SM version of device 0; "
+            "please ensure that all GPUs used with these ops have compatible "
+            "SM versions.",
+            UserWarning,
+        )
+    # Use device 0 as the canonical device for selecting SM-specific ops.
+    prop0 = paddle.device.cuda.get_device_properties(0)
+    sm_version = prop0.major * 10 + prop0.minor
+    print(f"current sm_version={sm_version}")

Copilot · 2026-01-23T08:13:06Z

fastdeploy/model_executor/ops/gpu/__init__.py

+def decide_module():
+    import paddle
+
+    prop = paddle.device.cuda.get_device_properties()
+    sm_version = prop.major * 10 + prop.minor
+    print(f"current sm_version={sm_version}")
+
+    import os
+
+    curdir = os.path.dirname(os.path.abspath(__file__))
+    sm_version_path = os.path.join(curdir, f"fastdeploy_ops_{sm_version}")
+    if os.path.exists(sm_version_path):
+        return f".fastdeploy_ops_{sm_version}.fastdeploy_ops"
+    return ".fastdeploy_ops"


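The new decide_module function and the automatic SM version selection logic have no unit tests. Given that the project has a large number of test cases under tests/operators/, consider adding tests that verify: 1) the correct module path is returned when the SM-specific directory exists; 2) the default module is used as a fallback when the SM-specific directory does not exist; 3) the case where CUDA is unavailable is handled. This is essential for ensuring that the multi-SM packaging feature works correctly.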
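The three cases listed above could be covered without a GPU if the directory check is separated from the device query. The sketch below is one possible shape, not the tests that exist in the repository: `decide_module_for` and `decide_module_safe` are hypothetical variants of decide_module that take the SM version (or a callable that queries it) and the base directory as arguments, and the tests are written in plain pytest style.

```python
import os
import tempfile


def decide_module_for(sm_version, base_dir):
    """Hypothetical, GPU-free variant of the directory check in decide_module."""
    sm_dir = os.path.join(base_dir, f"fastdeploy_ops_{sm_version}")
    if os.path.exists(sm_dir):
        return f".fastdeploy_ops_{sm_version}.fastdeploy_ops"
    return ".fastdeploy_ops"


def decide_module_safe(query_sm, base_dir):
    """Fall back to the default module if querying the SM version fails."""
    try:
        sm_version = query_sm()
    except Exception:
        return ".fastdeploy_ops"
    return decide_module_for(sm_version, base_dir)


def test_sm_specific_dir_is_selected():
    # Case 1: the SM-specific directory exists.
    with tempfile.TemporaryDirectory() as base:
        os.mkdir(os.path.join(base, "fastdeploy_ops_86"))
        assert decide_module_for(86, base) == ".fastdeploy_ops_86.fastdeploy_ops"


def test_falls_back_when_dir_is_missing():
    # Case 2: no SM-specific directory, so the default module is used.
    with tempfile.TemporaryDirectory() as base:
        assert decide_module_for(90, base) == ".fastdeploy_ops"


def test_falls_back_when_cuda_query_fails():
    # Case 3: the device query raises, as it might on a CPU-only machine.
    def failing_query():
        raise RuntimeError("CUDA is not available")

    with tempfile.TemporaryDirectory() as base:
        os.mkdir(os.path.join(base, "fastdeploy_ops_86"))
        assert decide_module_safe(failing_query, base) == ".fastdeploy_ops"
```

Passing the query as a callable keeps paddle out of the test process entirely; the production function would supply a callable that wraps paddle.device.cuda.get_device_properties().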

Copilot · 2026-01-23T08:13:06Z

fastdeploy/model_executor/ops/gpu/__init__.py

+
+    prop = paddle.device.cuda.get_device_properties()
+    sm_version = prop.major * 10 + prop.minor
+    print(f"current sm_version={sm_version}")


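The print statement should be replaced with a logger call. Printing the sm_version on line 28 does not follow the logging conventions of the codebase. Following the logger pattern used in fastdeploy/import_ops.py, this kind of information should be emitted with logger.info or logger.debug.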
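A minimal sketch of the replacement, using the standard library `logging` module as a stand-in because the logger used in fastdeploy/import_ops.py is not shown in this thread. The logger name and the helper `report_sm_version` are assumptions for illustration.

```python
import logging

# Hypothetical logger name; the project's own logger object may differ.
logger = logging.getLogger("fastdeploy.ops.gpu")


def report_sm_version(major, minor):
    """Compute the SM version and log it instead of printing it."""
    sm_version = major * 10 + minor
    # Lazy %-style formatting avoids building the string when DEBUG is off.
    logger.debug("current sm_version=%s", sm_version)
    return sm_version
```

With debug-level logging the message stays out of normal output at import time, which a bare print cannot offer.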

Copilot · 2026-01-23T08:13:07Z

build.sh

+    build_and_install_ops "[89]" "$custom_ops_dir"
+
+    build_and_install_ops "[80, 90]" "${OPS_TMP_DIR}"
+    cp -r $OPS_SRC_DIR/$OPS_TMP_DIR/* ./fastdeploy/model_executor/ops/gpu


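In the FD_UNIFY_BUILD mode of build_custom_ops, the cp command on line 282 does not check its return value. If the copy fails, the script continues and may produce an incomplete package. Consider adding an error check after the cp command, or using 'set -e' so that the script terminates when a command fails.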

Suggested change

-    cp -r $OPS_SRC_DIR/$OPS_TMP_DIR/* ./fastdeploy/model_executor/ops/gpu
+    cp -r "$OPS_SRC_DIR/$OPS_TMP_DIR/"* ./fastdeploy/model_executor/ops/gpu || {
+      echo -e "${RED}[FAIL]${NONE} failed to copy unified custom ops to target directory"
+      exit 1
+    }

Copilot · 2026-01-23T08:13:08Z

build.sh

+function build_custom_ops() {
+  if [ "$FD_UNIFY_BUILD" ]; then
+    mkdir -p ${OPS_SRC_DIR}/${OPS_TMP_DIR}
+
+    custom_ops_dir=${OPS_TMP_DIR}/fastdeploy_ops_86
+    build_and_install_ops "[86]" "$custom_ops_dir"
+
+    custom_ops_dir=${OPS_TMP_DIR}/fastdeploy_ops_89
+    build_and_install_ops "[89]" "$custom_ops_dir"
+
+    build_and_install_ops "[80, 90]" "${OPS_TMP_DIR}"
+    cp -r $OPS_SRC_DIR/$OPS_TMP_DIR/* ./fastdeploy/model_executor/ops/gpu


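The new unified build significantly increases the size of the wheel package, because it contains copies of custom_ops for multiple SM versions. The PR description states that each custom_ops package is under 2 GB, but it does not give the total size of the final wheel. Consider adding an estimate of the final package size to the PR description or documentation, along with the impact on storage and download time, to help users decide whether to use the unified build mode.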
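One way to produce the size estimate requested here is to total the bytes under each fastdeploy_ops* directory after the build. The sketch below is an illustration only: the helper names are hypothetical, and because the thread does not say whether the 2 GB figure applies per shared library or per directory, it simply reports directory totals.

```python
import os

TWO_GIB = 2 * 1024**3  # the 2 GB figure quoted in the PR description


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def report_ops_sizes(gpu_ops_dir):
    """Return {directory name: size in bytes} for each fastdeploy_ops* subdirectory."""
    sizes = {}
    for entry in sorted(os.listdir(gpu_ops_dir)):
        full = os.path.join(gpu_ops_dir, entry)
        if entry.startswith("fastdeploy_ops") and os.path.isdir(full):
            sizes[entry] = dir_size_bytes(full)
    return sizes


if __name__ == "__main__":
    target = "./fastdeploy/model_executor/ops/gpu"
    if os.path.isdir(target):
        for name, size in report_ops_sizes(target).items():
            flag = "OVER 2 GiB" if size >= TWO_GIB else "ok"
            print(f"{name}: {size / 1024**2:.1f} MiB ({flag})")
```

Summing the reported values gives a rough lower bound on the uncompressed contribution of custom_ops to the wheel.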

Copilot · 2026-01-23T08:13:08Z

build.sh

 function build_and_install_ops() {
+  local building_arcs=${1:-$FD_BUILDING_ARCS}
+  local tmp_dir=${2:-$OPS_TMP_DIR}
+  echo "BUILD CUSTOM OPS: ${building_arcs}, ${tmp_dir}"


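The log output on line 294 is formatted inconsistently with the rest of the script. Elsewhere the script uses color-coded log output (for example "${BLUE}[build]${NONE}" on line 297), but line 294 uses a plain echo. For consistency and readability, consider a format such as: echo -e "${BLUE}[build]${NONE} Building custom ops: ${building_arcs} to ${tmp_dir}"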

Suggested change

-  echo "BUILD CUSTOM OPS: ${building_arcs}, ${tmp_dir}"
+  echo -e "${BLUE}[build]${NONE} Building custom ops: ${building_arcs} to ${tmp_dir}"

Copilot · 2026-01-23T08:13:09Z

build.sh


+function build_custom_ops() {
+  if [ "$FD_UNIFY_BUILD" ]; then
+    mkdir -p ${OPS_SRC_DIR}/${OPS_TMP_DIR}


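build_custom_ops does not check for errors when creating the directory in FD_UNIFY_BUILD mode. If mkdir -p fails, or a permission problem on the directory causes later steps to fail, the script does not report a clear error. Consider checking that the directory was created successfully after the mkdir.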

Suggested change

-    mkdir -p ${OPS_SRC_DIR}/${OPS_TMP_DIR}
+    # Ensure ops temporary directory exists and is writable before building
+    local ops_build_dir="${OPS_SRC_DIR}/${OPS_TMP_DIR}"
+    mkdir -p "${ops_build_dir}"
+    if [ $? -ne 0 ] || [ ! -d "${ops_build_dir}" ] || [ ! -w "${ops_build_dir}" ]; then
+      echo "[ERROR] Failed to create or access directory: ${ops_build_dir}" >&2
+      exit 1
+    fi

Copilot · 2026-01-23T08:13:09Z

build.sh

+    custom_ops_dir=${OPS_TMP_DIR}/fastdeploy_ops_86
+    build_and_install_ops "[86]" "$custom_ops_dir"
+
+    custom_ops_dir=${OPS_TMP_DIR}/fastdeploy_ops_89
+    build_and_install_ops "[89]" "$custom_ops_dir"
+
+    build_and_install_ops "[80, 90]" "${OPS_TMP_DIR}"
+    cp -r $OPS_SRC_DIR/$OPS_TMP_DIR/* ./fastdeploy/model_executor/ops/gpu
+  else
+    build_and_install_ops "$FD_BUILDING_ARCS" "$OPS_TMP_DIR"


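Using relative paths in build_custom_ops may cause problems. On lines 275 and 278, custom_ops_dir is set to the relative path ${OPS_TMP_DIR}/fastdeploy_ops_XX, which is resolved against the current directory. Because build_and_install_ops changes into the OPS_SRC_DIR directory, the path may not resolve as intended. Although line 298 converts it to an absolute path with readlink -f, consider passing absolute paths explicitly before calling build_and_install_ops, or making sure the call happens in the correct directory context, to avoid path resolution problems.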

Suggested change

-    custom_ops_dir=${OPS_TMP_DIR}/fastdeploy_ops_86
-    build_and_install_ops "[86]" "$custom_ops_dir"
-    custom_ops_dir=${OPS_TMP_DIR}/fastdeploy_ops_89
-    build_and_install_ops "[89]" "$custom_ops_dir"
-    build_and_install_ops "[80, 90]" "${OPS_TMP_DIR}"
-    cp -r $OPS_SRC_DIR/$OPS_TMP_DIR/* ./fastdeploy/model_executor/ops/gpu
-  else
-    build_and_install_ops "$FD_BUILDING_ARCS" "$OPS_TMP_DIR"
+    custom_ops_dir=${OPS_SRC_DIR}/${OPS_TMP_DIR}/fastdeploy_ops_86
+    build_and_install_ops "[86]" "$custom_ops_dir"
+    custom_ops_dir=${OPS_SRC_DIR}/${OPS_TMP_DIR}/fastdeploy_ops_89
+    build_and_install_ops "[89]" "$custom_ops_dir"
+    build_and_install_ops "[80, 90]" "${OPS_SRC_DIR}/${OPS_TMP_DIR}"
+    cp -r $OPS_SRC_DIR/$OPS_TMP_DIR/* ./fastdeploy/model_executor/ops/gpu
+  else
+    build_and_install_ops "$FD_BUILDING_ARCS" "${OPS_SRC_DIR}/${OPS_TMP_DIR}"

Copilot · 2026-01-23T08:13:09Z

fastdeploy/model_executor/ops/gpu/__init__.py

+    import paddle
+
+    prop = paddle.device.cuda.get_device_properties()
+    sm_version = prop.major * 10 + prop.minor


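sm_version is computed as prop.major * 10 + prop.minor, but the result is not validated. For future GPU architectures, if the minor version exceeds 9, this computation could produce unexpected results. Although NVIDIA's current naming convention keeps the minor version at 9 or below, for robustness consider adding an assertion or check that the computed sm_version falls within the expected range (for example 80-100).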
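To make the concern concrete: with major = 8 and minor = 10 the formula gives 8 * 10 + 10 = 90, which would be indistinguishable from sm_90. A minimal sketch of a guarded computation follows; the helper name `compute_sm_version` is hypothetical, and it rejects out-of-range minors rather than enforcing the 80-100 window mentioned above, since the set of supported architectures is a project decision.

```python
def compute_sm_version(major, minor):
    """Combine a compute capability (major, minor) into an integer such as 86.

    Raises ValueError when minor is not a single digit, because
    major * 10 + minor would then collide with another architecture.
    """
    if major < 0 or not 0 <= minor <= 9:
        raise ValueError(f"unexpected compute capability {major}.{minor}")
    return major * 10 + minor
```

A caller that prefers not to fail at import time could catch the ValueError and fall back to the default fastdeploy_ops module.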

codecov-commenter · 2026-01-23T10:35:38Z

Codecov Report

❌ Patch coverage is 75.00000% with 5 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@17866c0). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/model_executor/ops/gpu/__init__.py	72.22%	4 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #6173   +/-   ##
==========================================
  Coverage           ?   67.03%           
==========================================
  Files              ?      383           
  Lines              ?    50543           
  Branches           ?     7894           
==========================================
  Hits               ?    33882           
  Misses             ?    14188           
  Partials           ?     2473

Flag	Coverage Δ
GPU	`67.03% <75.00%> (?)`


mitu626 had a problem deploying to Metax_ci January 22, 2026 13:09 — with GitHub Actions Failure

support build sm 80,86,89,90 to one whl package

ceefed5

mitu626 force-pushed the build branch from 10b73af to ceefed5 Compare January 23, 2026 06:47

mitu626 had a problem deploying to Metax_ci January 23, 2026 06:47 — with GitHub Actions Failure

create tmp dir before build custom ops in FD_UNIFY_BUILD mode

948eb27

mitu626 had a problem deploying to Metax_ci January 23, 2026 07:49 — with GitHub Actions Error

typo fix

7a7cd17

mitu626 had a problem deploying to Metax_ci January 23, 2026 07:53 — with GitHub Actions Failure

Jiang-Jia-Jun previously approved these changes Jan 23, 2026

Jiang-Jia-Jun requested a review from Copilot January 23, 2026 08:03

Copilot started reviewing on behalf of Jiang-Jia-Jun January 23, 2026 08:04 View session

Copilot AI reviewed Jan 23, 2026

ignore exceptions in xpu ..

fe8ee93

mitu626 dismissed Jiang-Jia-Jun’s stale review via fe8ee93 January 23, 2026 11:38

mitu626 temporarily deployed to Metax_ci January 23, 2026 11:38 — with GitHub Actions Inactive

Jiang-Jia-Jun approved these changes Jan 26, 2026

EmmonsCurse approved these changes Jan 26, 2026

EmmonsCurse merged commit 84a1780 into PaddlePaddle:develop Jan 26, 2026
22 of 24 checks passed

