GH-36411: [C++][Python] Use meson-python for PyArrow build system #45854
python/meson.build
Outdated
Before doing this, include the following code:
# no-op placeholder
arrow_dep = dependency('', required: false)
if get_option('wrap_mode') != 'forcefallback'
  arrow_dep = dependency('arrow', 'Arrow', modules: ['Arrow::arrow_shared'], required: false)
endif

And then shift the rest to look like this:

if not arrow_dep.found()
  cmake = import('cmake')
  # further lookups
  # ...
  arrow_dep = arrow_proj.dependency('arrow_shared')
endif
What this does:
- check build options for wrap_mode, which is a builtin meson option allowing you to choose whether you wish to resolve bundled dependencies or look for system dependencies. It defaults to finding system dependencies, but when users run meson with --wrap-mode=forcefallback they are asking to explicitly avoid system deps
- first try to find an arrow dependency, using both names it might be available as:
  - "arrow" (pkgconfig)
  - "Arrow" (cmake, yes capitalization does matter), with modules: ensuring we pick up the correct cmake find_package() variable
- if it is not available, required: false means we continue to import the cmake subproject as a fallback
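
The lookup order described above can be modeled as a small decision function. This is only an illustrative Python sketch of the control flow, not Meson code; `resolve_arrow` and its return values are hypothetical names chosen for this example.

```python
def resolve_arrow(wrap_mode, system_arrow_found):
    """Model which Arrow the build would end up using.

    Hypothetical helper: it mirrors the if/else flow of the suggested
    meson.build snippet and does not call Meson itself.
    """
    # With forcefallback the system lookup is skipped entirely.
    if wrap_mode != "forcefallback" and system_arrow_found:
        return "system"
    # Either the lookup was skipped or nothing was found: use the subproject.
    return "cmake-subproject"


for mode in ("default", "forcefallback"):
    for found in (True, False):
        print(mode, found, "->", resolve_arrow(mode, found))
```

Only the combination of a non-forcefallback wrap mode and a successful system lookup selects the system library; every other combination falls through to the CMake subproject.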
It is possible to avoid doing if/else checks:
$ cat subprojects/arrow.wrap
[wrap-file]
directory = arrow
method = cmake
[provide]
arrow = arrow_static_dep
However, using wrap files with method=cmake doesn't (currently) allow you to pass your add_cmake_defines. If you didn't need any defines, then you could simply do this:
arrow_dep = dependency('arrow', 'Arrow', modules: ['Arrow::arrow_shared'])

and you would not need any if/else, it would automatically build the cmake subproject if either:
- wrap-mode=forcefallback
- no system arrow was found
How does that wrap file work? The directory to the cpp source is in arrow/cpp whereas the wrap file itself will be located in arrow/python/subprojects - how would that resolve to the right directory?
Since you included a symlink anyway, I chose not to bother including any mechanism for downloading the wrap contents. Meson skips over that because the directory already exists with the correct content.
The key benefit of the wrap file is that it allows specifying in ini syntax:
- the subproject should use the method=cmake automatically, when used via dependency()
- the autogenerated arrow_static_dep (maybe this should be arrow_shared_dep instead?) will fulfill dependency('arrow')
Again, it's missing the necessary cmake defines so it may not be worth pursuing further.
python/pyproject.toml
Outdated
You could still use setuptools-scm with meson if you want.
project(
  'pyarrow',
  # ....,
  version: run_command('python3', '-m', 'setuptools_scm', '--force-write-version-files', check: true).stdout().strip(),
)
@kou I have made some offline progress on this, but one of the things I am getting stuck on is how the pyarrow C++ modules are being compiled. From what I understand, the current build process will compile Cython modules first (at least lib.pyx) and from that auto-generate lib.h and lib_api.h.

Assuming that understanding is correct, where in the process are lib.h and lib_api.h being generated? I found the CMake command that copies them from the source to the build folder, but I can't figure out where they come from in the first place. Any guidance would be appreciated.
The following code may be related:
- arrow/cpp/cmake_modules/UseCython.cmake, lines 120 to 126 in 5e9fce4
- Line 674 in 5e9fce4
Ah, never mind, I think I have figured it out. It looks like Cython generates the header files in the build directory when compiling lib.pyx, so the idea is to copy those header files to a directory structure in the build directory that the sources can resolve to. I'll have to think about the best way to accomplish that via Meson.
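
To make the copy step concrete, here is a rough Python sketch of staging the generated headers into an include directory. The header names come from the discussion above; `stage_cython_headers` and the directory arguments are hypothetical, and in an actual Meson build this step would presumably be expressed with Meson's own facilities (for example a custom target) rather than a script like this.

```python
import shutil
from pathlib import Path

# Header names taken from the discussion; where they should land is an assumption.
GENERATED_HEADERS = ("lib.h", "lib_api.h")


def stage_cython_headers(cython_out_dir, include_dir):
    """Copy Cython-generated headers to a directory the C++ sources can include from.

    Returns the list of destination paths that were written.
    """
    src = Path(cython_out_dir)
    dest = Path(include_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in GENERATED_HEADERS:
        header = src / name
        if not header.is_file():
            raise FileNotFoundError(f"expected Cython to have generated {header}")
        # copy2 preserves timestamps, which keeps incremental rebuilds quieter.
        copied.append(Path(shutil.copy2(header, dest / name)))
    return copied
```

Failing loudly when a header is missing is deliberate: a silent skip would only surface later as a confusing include error in the C++ compile step.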
@github-actions crossbow submit -g python

Revision: 62168a0
Submitted crossbow builds: ursacomputing/crossbow @ actions-180cf98f75

Rebased on main and all green. From the crossbow jobs, the freethreading and pandas job failures appear directly related to #48314 and not this PR. The emscripten failure also seems to be unrelated, as it fails on an install step prior to the PyArrow steps. The cuda failures don't seem to have produced logs? If there is something to look into there, please let me know.
@WillAyd I am going to have some time for this once I finish releasing Arrow 23.0.0. I've fixed the remaining pre-existing CI failures on main, both for the PR checks and the extended archery ones. From what I understand, this PR should be ready for review, and the current PR failures were also happening on main at the time of the last push. Given that, can you rebase to see if we can get cleaner CI?
That's awesome, thanks @raulcd! The biggest challenge with this, I think, is just having maintained a long-lived PR. I am under the impression that the core foundation is in a good place, but some of the non-Meson items (particularly the conda library versions) can take a lot of time to troubleshoot as they come up. Any help troubleshooting, and another set of eyes on this, is greatly appreciated.
raulcd
left a comment
The current Python failures are related to azurite not being installed properly, which makes the test_fs azure tests fail. This is because the version of nodejs installed by mamba is a really old one. I am unsure why the resolver is using nodejs 12.4.0 here; see the error when installing azurite:
2026-01-13T18:32:39.6961900Z #15 [ 9/11] RUN /arrow/ci/scripts/install_azurite.sh
2026-01-13T18:32:39.9624124Z #15 0.417 Node.js version = v12.4.0
2026-01-13T18:32:42.2087322Z #15 2.663 npm WARN deprecated [email protected]: Rimraf versions prior to v4 are no longer supported
2026-01-13T18:32:42.3917601Z #15 2.846 npm WARN deprecated [email protected]: Please upgrade to version 7 or higher. Older versions may use Math.random() in certain circumstances, which is known to be problematic. See https://v8.dev/blog/math-random for details.
2026-01-13T18:32:51.4870197Z #15 11.94 npm WARN deprecated [email protected]: Glob versions prior to v9 are no longer supported
2026-01-13T18:32:51.7035681Z #15 12.01 npm WARN deprecated [email protected]: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
2026-01-13T18:33:02.1406491Z #15 22.59 /opt/conda/envs/arrow/bin/azurite -> /opt/conda/envs/arrow/lib/node_modules/azurite/dist/src/azurite.js
2026-01-13T18:33:02.3841290Z #15 22.60 /opt/conda/envs/arrow/bin/azurite-queue -> /opt/conda/envs/arrow/lib/node_modules/azurite/dist/src/queue/main.js
2026-01-13T18:33:02.3842792Z #15 22.60 /opt/conda/envs/arrow/bin/azurite-blob -> /opt/conda/envs/arrow/lib/node_modules/azurite/dist/src/blob/main.js
2026-01-13T18:33:02.3844216Z #15 22.60 /opt/conda/envs/arrow/bin/azurite-table -> /opt/conda/envs/arrow/lib/node_modules/azurite/dist/src/table/main.js
2026-01-13T18:33:02.3846002Z #15 22.66 npm WARN [email protected] requires a peer of applicationinsights-native-metrics@* but none is installed. You must install peer dependencies yourself.
2026-01-13T18:33:02.3847278Z #15 22.66
2026-01-13T18:33:02.3847564Z #15 22.66 + [email protected]
2026-01-13T18:33:02.3848038Z #15 22.66 added 376 packages from 296 contributors in 20.644s
2026-01-13T18:33:02.3848830Z #15 22.69 /opt/conda/envs/arrow/bin/azurite
2026-01-13T18:33:02.8929329Z #15 23.35 /opt/conda/envs/arrow/lib/node_modules/azurite/node_modules/fs-extra/lib/util/async.js:14
2026-01-13T18:33:02.8930231Z #15 23.35 (err) => err ?? new Error('unknown error')
2026-01-13T18:33:02.8930740Z #15 23.35 ^
2026-01-13T18:33:02.8931089Z #15 23.35
2026-01-13T18:33:02.8931379Z #15 23.35 SyntaxError: Unexpected token ?
2026-01-13T18:33:02.8931960Z #15 23.35 at Module._compile (internal/modules/cjs/loader.js:718:23)
2026-01-13T18:33:02.8932747Z #15 23.35 at Object.Module._extensions..js (internal/modules/cjs/loader.js:785:10)
2026-01-13T18:33:02.8933534Z #15 23.35 at Module.load (internal/modules/cjs/loader.js:641:32)
2026-01-13T18:33:02.8934253Z #15 23.35 at Function.Module._load (internal/modules/cjs/loader.js:556:12)
2026-01-13T18:33:02.8934994Z #15 23.35 at Module.require (internal/modules/cjs/loader.js:681:19)
2026-01-13T18:33:02.8935642Z #15 23.35 at require (internal/modules/cjs/helpers.js:16:16)
2026-01-13T18:33:02.8936708Z #15 23.35 at Object.<anonymous> (/opt/conda/envs/arrow/lib/node_modules/azurite/node_modules/fs-extra/lib/copy/copy.js:9:44)
2026-01-13T18:33:02.8937785Z #15 23.35 at Module._compile (internal/modules/cjs/loader.js:774:30)
2026-01-13T18:33:02.8938733Z #15 23.35 at Object.Module._extensions..js (internal/modules/cjs/loader.js:785:10)
2026-01-13T18:33:02.8939855Z #15 23.35 at Module.load (internal/modules/cjs/loader.js:641:32)
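
The `SyntaxError: Unexpected token ?` above points at the `??` (nullish coalescing) operator, which, as far as I know, Node.js only parses from version 14 onwards. A minimal Python sketch of a guard that would flag this before azurite is launched; the function names and the `MIN_NODE_MAJOR` threshold are assumptions for illustration, not part of the CI scripts.

```python
import re

# Assumption: nullish coalescing (`??`) needs Node.js 14 or newer.
MIN_NODE_MAJOR = 14


def parse_node_version(text):
    """Turn the output of `node --version` (e.g. 'v12.4.0') into a tuple of ints."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_nullish_coalescing(version_text):
    """Return True when the major version is at least MIN_NODE_MAJOR."""
    return parse_node_version(version_text)[0] >= MIN_NODE_MAJOR


print(supports_nullish_coalescing("v12.4.0"))  # False
print(supports_nullish_coalescing("v16.0.0"))  # True
```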
I have reproduced this locally and fixed it by pinning the minimum node version required for azurite to run:
diff --git a/ci/conda_env_cpp.txt b/ci/conda_env_cpp.txt
index 5f264fe515..3fd1fdea1a 100644
--- a/ci/conda_env_cpp.txt
+++ b/ci/conda_env_cpp.txt
@@ -39,7 +39,7 @@ lz4-c>=1.10.0
 make
 meson
 ninja
-nodejs
+nodejs>=16
 orc<2.1.0
 pkg-config
 python

Reproduced locally with:
ARCHERY_DEBUG=1 PYTHON=3.13 PANDAS=nightly NUMPY=nightly PANDAS_FUTURE_INFER_STRING=1 archery docker run -e SETUPTOOLS_SCM_PRETEND_VERSION="23.0.0.dev328" --no-leaf-cache conda-python-pandas
and final result with the diff:
=== 7814 passed, 253 skipped, 22 xfailed, 2 xpassed, 53 warnings in 311.51s (0:05:11) =====
@WillAyd let me know if you prefer me to push those minor fixes directly to your branch or if you prefer me to comment with diffs so you can apply 👍

Wow thanks for finding that! Feel free to push directly - thanks @raulcd !

@github-actions crossbow submit -g python -g wheel

Revision: ce85d06
Submitted crossbow builds: ursacomputing/crossbow @ actions-3efa5e7447

git init .
git add --all .
git commit -m "dummy commit for meson dist"
${PYTHON:-python} -m build --sdist .
I've tried a couple of things to try and fix sdist:
diff --git a/ci/scripts/python_sdist_build.sh b/ci/scripts/python_sdist_build.sh
index 7bea9c3dc1..a8d849183c 100755
--- a/ci/scripts/python_sdist_build.sh
+++ b/ci/scripts/python_sdist_build.sh
@@ -24,8 +24,12 @@ source_dir=${1}/python
 pushd "${source_dir}"
 export SETUPTOOLS_SCM_PRETEND_VERSION=${PYARROW_VERSION:-}
 # Meson dist must be run from a VCS, so initiate a dummy repo
-git init .
-git add --all .
-git commit -m "dummy commit for meson dist"
+#git init .
+#git config --global --add safe.directory "${source_dir}"
+#git config --global user.name "Your Name"
+#git config --global user.email [email protected]
+#git add --all .
+#git commit -m "dummy commit for meson dist"
+${PYTHON:-python} -m pip install build
 ${PYTHON:-python} -m build --sdist .
 popd

I am unsure whether creating the git repo is necessary (we are already cloning arrow) and build seems missing from this docker image.
The problem now is that it seems to want to build pyarrow (requires build-tools), even though building only the sdist shouldn't require building pyarrow as far as I understand:
+ /arrow-dev/bin/python -m build --sdist .
* Creating isolated environment: venv+pip...
* Installing packages in isolated environment:
- cython >= 3.1
- meson-python
- numpy>=1.25
- setuptools_scm[toml]>=8
* Getting build dependencies for sdist...
* Installing packages in isolated environment:
- ninja >= 1.8.2
* Building sdist...
+ meson setup /arrow/python /arrow/python/.mesonpy-f5plfyxb -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/arrow/python/.mesonpy-f5plfyxb/meson-python-native-file.ini
The Meson build system
Version: 1.10.0
Source dir: /arrow/python
Build dir: /arrow/python/.mesonpy-f5plfyxb
Build type: native build
Project name: pyarrow
Project version: 23.0.0.dev362+gce85d064a.d20260114
../meson.build:18:0: ERROR: Unknown compiler(s): [['c++'], ['g++'], ['clang++'], ['nvc++'], ['pgc++'], ['icpc'], ['icpx']]
The following exception(s) were encountered:
Running `c++ --version` gave "[Errno 2] No such file or directory: 'c++'"
Running `g++ --version` gave "[Errno 2] No such file or directory: 'g++'"
Running `clang++ --version` gave "[Errno 2] No such file or directory: 'clang++'"
Running `nvc++ --version` gave "[Errno 2] No such file or directory: 'nvc++'"
Running `pgc++ --version` gave "[Errno 2] No such file or directory: 'pgc++'"
Running `icpc --version` gave "[Errno 2] No such file or directory: 'icpc'"
Running `icpx --version` gave "[Errno 2] No such file or directory: 'icpx'"
A full log can be found at /arrow/python/.mesonpy-f5plfyxb/meson-logs/meson-log.txt
ERROR Backend subprocess exited when trying to invoke build_sdist
If I install build-essential it fails locating Arrow CPP because it's basically trying to build pyarrow.
> I am unsure whether creating the git repo is necessary

It is, because building an sdist calls meson dist under the hood, which itself must be run from a VCS.

> The problem now is that it seems to want to build pyarrow

Try adding -Csetup-args="-Dsdist=true" to the invocation. This is something we encountered previously on this PR, so I added that option (noted in the top level python/meson.build file and upstream in mesonbuild/meson-python#647).
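
For reference, a small Python sketch of how the suggested invocation could be assembled as an argument list. `sdist_command` is a hypothetical helper written for this example; the `-Dsdist=true` option is the one this PR adds, and the command still needs to run from inside a git repository for the reason given above.

```python
import sys


def sdist_command(python=sys.executable, source_dir="."):
    """Build the argv for an sdist build that passes -Dsdist=true to meson setup."""
    return [
        python, "-m", "build", "--sdist",
        # -C forwards a config setting to the build backend (meson-python);
        # setup-args are passed on to `meson setup`.
        "-Csetup-args=-Dsdist=true",
        source_dir,
    ]


print(" ".join(sdist_command("python", ".")))
# To actually run it: subprocess.run(sdist_command(), check=True)
```

Passing the arguments as a list avoids the shell quoting around "-Dsdist=true" that the command-line form needs.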
Rationale for this change
This helps simplify the steps to build pyarrow by leveraging Meson, a build system strongly inspired by Python's syntax. In its current form, it requires Arrow to be installed on the host system, but in the future we may even be able to have PyArrow build Arrow as a subproject, as needed.
What changes are included in this PR?
This PR adds Meson configuration files to the Python code base within Arrow.
Are these changes tested?
Yes
Are there any user-facing changes?
We may want to deprecate the traditional setup.py way of building PyArrow alongside this.