Enable address sanitizer for mpi4py runs #10

devreal · 2025-12-10T16:21:36Z

No description provided.

Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>

github-actions · 2025-12-10T16:58:41Z

Hello! The Git Commit Checker CI bot found a few problems with this PR:

b8cba3b: Install ASAN through apt

check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

Signed-off-by: Nathan Bellalou <nbellalou@nvidia.com>

Create two bool variables, opal_single_threaded and opal_common_ucx_single_threaded, that mimic behavior of variables opal_uses_threads and opal_common_ucx_single_threaded, in order to propagate the MPI thread level to OPAL while preserving abstraction. opal_single_threaded is true if and only if the MPI thread level is MPI_THREAD_SINGLE.

Signed-off-by: Nathan Bellalou <nbellalou@nvidia.com>

It turns out that the requests being returned to the UCX PML's persistent request list weren't being properly finalized. mpi4py unit testing exercises all kinds of edge cases, like getting the Fortran handle for a persistent request, and thus triggered a bug in the UCX PML when OMPI is configured with debug.

Characteristic traceback at finalize prior to this patch is:

python3: ../opal/mca/threads/pthreads/threads_pthreads_mutex.h:86: opal_thread_internal_mutex_lock: Assertion `0 == ret' failed.
[er-head:1179128] *** Process received signal ***
[er-head:1179128] Signal: Aborted (6)
[er-head:1179128] Signal code: (-6)
[er-head:1179128] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x7ffff71edcf0]
[er-head:1179128] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7ffff66daacf]
[er-head:1179128] [ 2] /lib64/libc.so.6(abort+0x127)[0x7ffff66adea5]
[er-head:1179128] [ 3] /lib64/libc.so.6(+0x21d79)[0x7ffff66add79]
[er-head:1179128] [ 4] /lib64/libc.so.6(+0x47426)[0x7ffff66d3426]
[er-head:1179128] [ 5] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(+0x414a2)[0x7ffff1ccb4a2]
[er-head:1179128] [ 6] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(+0x4150d)[0x7ffff1ccb50d]
[er-head:1179128] [ 7] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(opal_pointer_array_set_item+0x7c)[0x7ffff1ccbd40]
[er-head:1179128] [ 8] /home/foobar/ompi/install_it/lib/libmpi.so.0(+0x3a5adb)[0x7ffff21c1adb]
[er-head:1179128] [ 9] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(+0x3a7aa)[0x7ffff1cc47aa]
[er-head:1179128] [10] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(+0x3b34d)[0x7ffff1cc534d]
[er-head:1179128] [11] /home/foobar/ompi/install_it/lib/libmpi.so.0(+0x39e934)[0x7ffff21ba934]
[er-head:1179128] [12] /home/foobar/ompi/install_it/lib/libmpi.so.0(mca_pml_ucx_cleanup+0x314)[0x7ffff21bc96d]
[er-head:1179128] [13] /home/foobar/ompi/install_it/lib/libmpi.so.0(+0x3a79ad)[0x7ffff21c39ad]
[er-head:1179128] [14] /home/foobar/ompi/install_it/lib/libmpi.so.0(+0x39c57e)[0x7ffff21b857e]
[er-head:1179128] [15] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(opal_finalize_cleanup_domain+0x3e)[0x7ffff1cd32fa]
[er-head:1179128] [16] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(opal_finalize+0x56)[0x7ffff1cc1ca0]
[er-head:1179128] [17] /home/foobar/ompi/install_it/lib/libmpi.so.0(ompi_rte_finalize+0x312)[0x7ffff1edaad5]
[er-head:1179128] [18] /home/foobar/ompi/install_it/lib/libmpi.so.0(+0xc4dd8)[0x7ffff1ee0dd8]
[er-head:1179128] [19] /home/foobar/ompi/install_it/lib/libmpi.so.0(ompi_mpi_instance_finalize+0x13a)[0x7ffff1ee1064]
[er-head:1179128] [20] /home/foobar/ompi/install_it/lib/libmpi.so.0(ompi_mpi_finalize+0x5f3)[0x7ffff1ed4c44]
[er-head:1179128] [21] /home/foobar/ompi/install_it/lib/libmpi.so.0(PMPI_Finalize+0x54)[0x7ffff1f29440]

Related to open-mpi#13623

Signed-off-by: Howard Pritchard <howardp@lanl.gov>

When we added the MCA_BASE_COMPONENT_INIT() macro to clean up LTO build issues, we accidentally added a _component to the end of the component name, breaking the build for any platform that uses the bsdx_ipv4 component. Signed-off-by: Brian Barrett <bbarrett@amazon.com>

Don't search for a .git directory; it might not exist. Also, remove unnecessary Mercurial and Subversion support; we haven't used these for years. Signed-off-by: Jeff Squyres <jeff@squyres.com>
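One way to avoid searching for a .git directory is to ask git itself whether a path is inside a work tree. The sketch below is only an illustration under that assumption; the helper name in_git_workspace is hypothetical and this is not necessarily how update-my-copyright.py does it.

```python
import subprocess


def in_git_workspace(path="."):
    """Return True if `path` is inside a git work tree.

    Asks git directly instead of looking for a `.git` directory, which
    may not exist (in a linked worktree, `.git` is a plain file).
    Hypothetical helper, shown for illustration only.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=path,
            capture_output=True,
            text=True,
            check=False,
        )
    except (FileNotFoundError, NotADirectoryError):
        # git is not installed, or `path` is not a usable directory.
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"
```

Calling in_git_workspace() from a script's working directory gives a yes/no answer without any assumptions about how the checkout is laid out on disk.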

Signed-off-by: Jeff Squyres <jeff@squyres.com> Signed-off-by: Howard Pritchard <howardp@lanl.gov>

Use Intersphinx (https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html) for making links out to PMIx and PRTE docs.

If we simply always linked against the https/internet PMIx and PRTE docs, Intersphinx makes this very easy. But that's not the Open MPI way! Instead, we want to support linking against the internal (embedded) PMIx and PRTE docs when relevant and possible, mainly to support fully-offline HTML docs (e.g., for those who operate in not-connected-to-the-internet scenarios).

As such, there are several cases that need to be handled properly:

1. When building the internal PMIx / PRTE, link to the local instances of those docs (vs. the https/internet instance). Be sure to use relative paths (vs. absolute paths) so that the pre-built HTML docs that we include in OMPI distribution tarballs work, regardless of the --prefix/etc. used at configure time.

   NOTE: When the Open MPI Sphinx docs are built, we have not yet installed the PMIx / PRTE docs. So create our own (fake) objects.inv inventory file for where the PMIx / PRTE docs *will* be installed so that Intersphinx can do its deep linking properly. At least for now, we only care about deep links for pmix_info(1) and prte_info(1), so we can just hard-code those into those inventory files and that's good enough. If the OMPI docs link more deeply into the PMIx / PRTE docs someday (i.e., link to a bunch more things than just pmix_info(1) / prte_info(1)), we might need to revisit this design decision.

2. When building against an external PMIx / PRTE, make a best guess as to where their local HTML doc instance may be (namely: $project_prefix/share/doc/PROJECT). Don't try to handle all the possibilities -- it just gets even more complicated than this already is. If we can't find it, just link out to the https/internet docs.

Other miscellaneous small changes:

* Added another Python module in docs/requirements.txt (for building the Sphinx inventory file).
* Use slightly-more-Pythonic dict.get() API calls in docs/conf.py for simplicity.
* Updated OMPI PRTE submodule pointer to get a prte_info.1.rst label update that works for both upstream PRTE and the OMPI PRTE fork.

Signed-off-by: Jeff Squyres <jeff@squyres.com>
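The selection logic described in this commit message could be sketched roughly as follows. This is a simplified illustration, not the code in docs/conf.py: the function name pick_doc_target, the relative path layout, the inventory file name, and the example.invalid URL are all placeholders assumed for the example. Only the (target, inventory) tuple shape of an intersphinx_mapping entry comes from Sphinx itself.

```python
import os


def pick_doc_target(project, internal, prefix=None,
                    internet_url="https://example.invalid/",
                    isdir=os.path.isdir):
    """Return a (target, inventory) pair for one intersphinx_mapping entry.

    All paths and names here are hypothetical placeholders.
    """
    if internal:
        # Case 1: embedded build. Use a relative target so pre-built HTML
        # keeps working regardless of --prefix, and point at a locally
        # generated (fake) inventory, since the real docs are not
        # installed yet when the OMPI docs are built.
        return (f"../{project}", f"{project}-objects.inv")

    if prefix is not None:
        # Case 2: external build. Best guess at the local HTML docs.
        local = os.path.join(prefix, "share", "doc", project)
        if isdir(local):
            # inventory=None means "objects.inv lives at the target".
            return (local, None)

    # Fallback: link out to the https/internet docs.
    return (internet_url, None)


# Example of assembling a mapping (values are placeholders):
intersphinx_mapping = {
    "pmix": pick_doc_target("pmix", internal=True),
    "prte": pick_doc_target("prte", internal=False, prefix=None),
}
```

The isdir parameter exists only so the directory check can be stubbed out when trying the function in isolation.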

Per the prior commit, update all OMPI docs RST to properly link to PMIx and PRTE documentation. Also added a few mpirun(1) links because they were in the vicinity of the pmix_info(1) and prte_info(1) links that were being updated. Signed-off-by: Jeff Squyres <jeff@squyres.com>

if/bsdx_ipv4: Fix name in COMPONENT_INIT()

…eq_fix PML/UCX: properly handle persistent req free list items

The default algorithm selections were out of date and not performing well. After gathering data using the ompi-collectives-tuning package, new default algorithm decisions are selected for bcast. Signed-off-by: Jessie Yang <jiaxiyan@amazon.com>

Fixes the deadcode path issues from coverity in bcast and reduce. Signed-off-by: Nithya V S <Nithya.VS@amd.com>

Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>

…ixes coll/han: Fix null dereference in revoke_local

…odeFix opal/mca/common/ucx : assert fix - change thread mode sent to UCX api

docs: update TCP docs + support deep linking into PMIx and PRTE docs

…y_fix coll/acoll: Fixes for coverity deadcode issues

coll/tuned: Change the bcast default collective algorithm selection

Signed-off-by: George Bosilca <gbosilca@nvidia.com>

Use #ifdef with system headers

btl/ofi: fault tolerance

Run mpi4py with ASAN, with a separate step that aborts on errors. The existing steps should run to completion even if an error is detected. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

30 minutes are not enough to run two extra tests so just enable ASAN for the existing tests. Also test `ompi_info` and `mpicc`. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

This may reduce overhead, although according to https://github.com/google/sanitizers/wiki/addresssanitizerflags it should be disabled by default. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
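ASAN reads its runtime flags from the ASAN_OPTIONS environment variable as a colon-separated list of key=value pairs. A minimal sketch of setting the flag named in this commit before launching a test process is shown below. The helper names are hypothetical, and how the CI workflow actually sets the variable is not shown in this PR text.

```python
import os


def build_asan_options(flags):
    """Join ASAN runtime flags into the colon-separated ASAN_OPTIONS form."""
    return ":".join(f"{key}={value}" for key, value in flags.items())


def asan_env(base_env=None):
    """Return a copy of the environment with the stack-use-after-return
    check explicitly disabled, keeping any ASAN_OPTIONS already present.

    Hypothetical helper, for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    extra = build_asan_options({"detect_stack_use_after_return": 0})
    existing = env.get("ASAN_OPTIONS", "")
    env["ASAN_OPTIONS"] = f"{existing}:{extra}" if existing else extra
    return env
```

The resulting dictionary can be passed as the env argument of subprocess.run when starting the test run.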

Matthew-Whitlock added 2 commits November 12, 2025 09:35

btl/ofi: fault tolerance

97ccdd6

Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>

btl/ofi check for valid pointer in error handler

6152e7e

Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>

devreal force-pushed the mpi4py-asan branch from afd07eb to cf48926 Compare December 10, 2025 16:33

devreal force-pushed the mpi4py-asan branch from b8cba3b to e32b5c3 Compare December 10, 2025 17:07

nbellalou and others added 15 commits December 23, 2025 13:52

opal/mca/common/ucx : assert fix - change thread mode sent to UCX api

5938c94

Signed-off-by: Nathan Bellalou <nbellalou@nvidia.com>

update-my-copyright.py: properly support git workspaces

5e859a9

Don't search for a .git directory; it might not exist. Also, remove unnecessary Mercurial and Subversion support; we haven't used these for years. Signed-off-by: Jeff Squyres <jeff@squyres.com>

docs: update the TCP tuning page

8c027cf

Signed-off-by: Jeff Squyres <jeff@squyres.com> Signed-off-by: Howard Pritchard <howardp@lanl.gov>

Merge pull request open-mpi#13633 from bwbarrett/bugfix/fix-bsd-compile

11ffe84

if/bsdx_ipv4: Fix name in COMPONENT_INIT()

Merge pull request open-mpi#13632 from hppritcha/ucx_pml_persistent_r…

57b8c9b

…eq_fix PML/UCX: properly handle persistent req free list items

coll/acoll: Fixes for coverity deadcode issues

fe641a7

Fixes the deadcode path issues from coverity in bcast and reduce. Signed-off-by: Nithya V S <Nithya.VS@amd.com>

revoke: Fix null dereference, improve debug prints, comment assumptions

59f8e2e

Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>

Merge pull request open-mpi#13641 from Matthew-Whitlock/coll_revoke_f…

cf73d24

…ixes coll/han: Fix null dereference in revoke_local

Merge pull request open-mpi#13590 from nbellalou/nbellalou/ucxThreadM…

3e00498

…odeFix opal/mca/common/ucx : assert fix - change thread mode sent to UCX api

devreal force-pushed the mpi4py-asan branch from e32b5c3 to 00f4cd2 Compare January 14, 2026 10:57

Merge pull request open-mpi#13599 from jsquyres/pr/tcp-docs-updates

4b7a78a

docs: update TCP docs + support deep linking into PMIx and PRTE docs

devreal force-pushed the mpi4py-asan branch 7 times, most recently from c2eeee3 to 49ecbba Compare January 14, 2026 13:48

Merge pull request open-mpi#13639 from amd-nithyavs/13Jan2026_coverit…

18b6e4a

…y_fix coll/acoll: Fixes for coverity deadcode issues

bosilca and others added 6 commits January 14, 2026 11:34

Merge pull request open-mpi#12278 from jiaxiyan/bcast

0516270

coll/tuned: Change the bcast default collective algorithm selection

Use #ifdef with HAVE_* defines.

1698b45

Signed-off-by: George Bosilca <gbosilca@nvidia.com>

Fix an #endif comment

13280e7

Signed-off-by: George Bosilca <gbosilca@nvidia.com>

Merge pull request open-mpi#13649 from bosilca/fix/13647

15999cc

Use #ifdef with system headers

Merge pull request open-mpi#13429 from Matthew-Whitlock/ofi_ft

a8d9b44

btl/ofi: fault tolerance

Enable ASAN for mpi4py in CI

c6fc05d

Run mpi4py with ASAN, with a separate step that aborts on errors. The existing steps should run to completion even if an error is detected. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

devreal force-pushed the mpi4py-asan branch from 0abff70 to c6fc05d Compare January 14, 2026 21:29

Remove disable of ASLR and apt update

7f5eea7

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

devreal force-pushed the mpi4py-asan branch from 64dfccf to 9b3d857 Compare January 14, 2026 22:05

Enable ASAN for all mpi4py tests and increase optimizations

1af95bb

30 minutes are not enough to run two extra tests so just enable ASAN for the existing tests. Also test `ompi_info` and `mpicc`. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>

devreal force-pushed the mpi4py-asan branch from 9b3d857 to 1af95bb Compare January 14, 2026 22:06

ASAN: explicitly disable stack-use-after-return check

b8a4c15

This may reduce overhead, although according to https://github.com/google/sanitizers/wiki/addresssanitizerflags it should be disabled by default. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>


12 participants
