forked from open-mpi/ompi
Enable address sanitizer for mpi4py runs #10
Open
devreal wants to merge 28 commits into main from mpi4py-asan
+561 −243
Conversation
Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>
Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>
Hello! The Git Commit Checker CI bot found a few problems with this PR:

b8cba3b: Install ASAN through apt

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!
Signed-off-by: Nathan Bellalou <nbellalou@nvidia.com>
Create two bool variables, opal_single_threaded and opal_common_ucx_single_threaded, that mimic the behavior of the existing opal_uses_threads variable, in order to propagate the MPI thread level down to OPAL while preserving the abstraction layers. opal_single_threaded is true if and only if the MPI thread level is MPI_THREAD_SINGLE. Signed-off-by: Nathan Bellalou <nbellalou@nvidia.com>
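A minimal sketch of the shape of this change, assuming a setter called from the OMPI layer: the two variable names come from the commit message, while opal_set_single_threaded and its call site are hypothetical.

```c
/* Illustrative sketch only -- not the actual OMPI implementation. */
#include <stdbool.h>

/* OPAL-level flags: true iff the application runs at MPI_THREAD_SINGLE.
 * Living in OPAL means OPAL components never see MPI_THREAD_* constants. */
bool opal_single_threaded = false;
bool opal_common_ucx_single_threaded = false;

/* Called once from the OMPI layer after the thread level is known; the
 * caller would compute 'single' as
 * (ompi_mpi_thread_provided == MPI_THREAD_SINGLE). */
void opal_set_single_threaded(bool single)
{
    opal_single_threaded = single;
    opal_common_ucx_single_threaded = single;
}
```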
Turns out the requests being returned to the UCX PML's persistent request list weren't being properly finalized. The mpi4py unit tests exercise all kinds of edge cases, like getting the Fortran handle of a persistent request, and thus triggered a bug in the UCX PML when OMPI is configured with debug. Characteristic traceback at finalize prior to this patch:

python3: ../opal/mca/threads/pthreads/threads_pthreads_mutex.h:86: opal_thread_internal_mutex_lock: Assertion `0 == ret' failed.
[er-head:1179128] *** Process received signal ***
[er-head:1179128] Signal: Aborted (6)
[er-head:1179128] Signal code: (-6)
[er-head:1179128] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x7ffff71edcf0]
[er-head:1179128] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7ffff66daacf]
[er-head:1179128] [ 2] /lib64/libc.so.6(abort+0x127)[0x7ffff66adea5]
[er-head:1179128] [ 3] /lib64/libc.so.6(+0x21d79)[0x7ffff66add79]
[er-head:1179128] [ 4] /lib64/libc.so.6(+0x47426)[0x7ffff66d3426]
[er-head:1179128] [ 5] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(+0x414a2)[0x7ffff1ccb4a2]
[er-head:1179128] [ 6] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(+0x4150d)[0x7ffff1ccb50d]
[er-head:1179128] [ 7] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(opal_pointer_array_set_item+0x7c)[0x7ffff1ccbd40]
[er-head:1179128] [ 8] /home/foobar/ompi/install_it/lib/libmpi.so.0(+0x3a5adb)[0x7ffff21c1adb]
[er-head:1179128] [ 9] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(+0x3a7aa)[0x7ffff1cc47aa]
[er-head:1179128] [10] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(+0x3b34d)[0x7ffff1cc534d]
[er-head:1179128] [11] /home/foobar/ompi/install_it/lib/libmpi.so.0(+0x39e934)[0x7ffff21ba934]
[er-head:1179128] [12] /home/foobar/ompi/install_it/lib/libmpi.so.0(mca_pml_ucx_cleanup+0x314)[0x7ffff21bc96d]
[er-head:1179128] [13] /home/foobar/ompi/install_it/lib/libmpi.so.0(+0x3a79ad)[0x7ffff21c39ad]
[er-head:1179128] [14] /home/foobar/ompi/install_it/lib/libmpi.so.0(+0x39c57e)[0x7ffff21b857e]
[er-head:1179128] [15] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(opal_finalize_cleanup_domain+0x3e)[0x7ffff1cd32fa]
[er-head:1179128] [16] /home/foobar/ompi/install_it/lib/libopen-pal.so.0(opal_finalize+0x56)[0x7ffff1cc1ca0]
[er-head:1179128] [17] /home/foobar/ompi/install_it/lib/libmpi.so.0(ompi_rte_finalize+0x312)[0x7ffff1edaad5]
[er-head:1179128] [18] /home/foobar/ompi/install_it/lib/libmpi.so.0(+0xc4dd8)[0x7ffff1ee0dd8]
[er-head:1179128] [19] /home/foobar/ompi/install_it/lib/libmpi.so.0(ompi_mpi_instance_finalize+0x13a)[0x7ffff1ee1064]
[er-head:1179128] [20] /home/foobar/ompi/install_it/lib/libmpi.so.0(ompi_mpi_finalize+0x5f3)[0x7ffff1ed4c44]
[er-head:1179128] [21] /home/foobar/ompi/install_it/lib/libmpi.so.0(PMPI_Finalize+0x54)[0x7ffff1f29440]

Related to open-mpi#13623

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
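The shape of such a fix, in a self-contained sketch: release the request's Fortran handle slot before recycling the request, so that tearing down the handle table at finalize finds no stale entries. All names below (request_t, handle_table, etc.) are hypothetical stand-ins for the OMPI structures, not the actual symbols the patch touches.

```c
#include <stddef.h>

#define HANDLE_INVALID (-1)

typedef struct request {
    int f2c_index;          /* slot in the global handle table, or -1 */
    struct request *next;   /* free-list linkage */
} request_t;

static request_t *free_list = NULL;      /* stand-in for the PML free list  */
static request_t *handle_table[64];      /* stand-in for opal_pointer_array */

/* Finalize a request before recycling it: clear its handle-table slot. */
static void request_fini(request_t *req)
{
    if (req->f2c_index != HANDLE_INVALID) {
        handle_table[req->f2c_index] = NULL;
        req->f2c_index = HANDLE_INVALID;
    }
}

/* Returning a persistent request to the free list without the
 * request_fini() call is the bug pattern described above. */
static void return_to_free_list(request_t *req)
{
    request_fini(req);
    req->next = free_list;
    free_list = req;
}
```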
When we added the MCA_BASE_COMPONENT_INIT() macro to clean up LTO build issues, we accidentally added a _component to the end of the component name, breaking the build for any platform that uses the bsdx_ipv4 component. Signed-off-by: Brian Barrett <bbarrett@amazon.com>
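The failure mode is easy to illustrate with any token-pasting macro; the macro below is a stand-in, not the real MCA_BASE_COMPONENT_INIT definition.

```c
/* Stand-in macro that pastes a component name into an init symbol. */
#define COMPONENT_INIT(name) void name##_component_init(void) {}

/* Intended use: pass the bare component name ... */
COMPONENT_INIT(bsdx_ipv4)   /* defines bsdx_ipv4_component_init() */

/* Appending "_component" to the argument instead would emit
 * bsdx_ipv4_component_component_init() -- a symbol nothing else
 * references, which is the kind of mismatch that broke this build. */
```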
Don't search for a .git directory; it might not exist. Also, remove unnecessary Mercurial and Subversion support; we haven't used these for years. Signed-off-by: Jeff Squyres <jeff@squyres.com>
Signed-off-by: Jeff Squyres <jeff@squyres.com> Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Use Intersphinx (https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html) for making links out to PMIx and PRTE docs.

If we simply always linked against the https/internet PMIx and PRTE docs, Intersphinx would make this very easy. But that's not the Open MPI way! Instead, we want to support linking against the internal (embedded) PMIx and PRTE docs when relevant and possible, mainly to support fully-offline HTML docs (e.g., for those who operate in not-connected-to-the-internet scenarios). As such, there are several cases that need to be handled properly:

1. When building the internal PMIx / PRTE, link to the local instances of those docs (vs. the https/internet instances). Ensure to use relative paths (vs. absolute paths) so that the pre-built HTML docs that we include in OMPI distribution tarballs work regardless of the --prefix/etc. used at configure time.

   NOTE: When the Open MPI Sphinx docs are built, we have not yet installed the PMIx / PRTE docs. So create our own (fake) objects.inv inventory file for where the PMIx / PRTE docs *will* be installed so that Intersphinx can do its deep linking properly. At least for now, we only care about deep links for pmix_info(1) and prte_info(1), so we can just hard-code those into those inventory files and that's good enough. If the OMPI docs link more deeply into the PMIx / PRTE docs someday (i.e., link to many more things than just pmix_info(1) / prte_info(1)), we might need to revisit this design decision.

2. When building against an external PMIx / PRTE, make a best guess as to where their local HTML doc instance may be (namely: $project_prefix/share/doc/PROJECT). Don't try to handle all the possibilities -- it just gets even more complicated than this already is. If we can't find it, just link out to the https/internet docs.

Other miscellaneous small changes:

* Added another Python module in docs/requirements.txt (for building the Sphinx inventory file).
* Use slightly-more-pythonic dict.get() API calls in docs/conf.py for simplicity.
* Updated the OMPI PRTE submodule pointer to get a prte_info.1.rst label update that works for both upstream PRTE and the OMPI PRTE fork.

Signed-off-by: Jeff Squyres <jeff@squyres.com>
Per the prior commit, update all OMPI docs RST to properly link to the PMIx and PRTE documentation. Also added a few mpirun(1) links because they were in the vicinity of the pmix_info(1) and prte_info(1) links that were being updated. Signed-off-by: Jeff Squyres <jeff@squyres.com>
if/bsdx_ipv4: Fix name in COMPONENT_INIT()
…eq_fix PML/UCX: properly handle persistent req free list items
The default algorithm selections were out of date and not performing well. After gathering data using the ompi-collectives-tuning package, new default algorithm decisions are selected for bcast. Signed-off-by: Jessie Yang <jiaxiyan@amazon.com>
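Decision rules of this kind boil down to picking an algorithm from the message size and communicator size. The sketch below shows the general shape only; the thresholds and algorithm choices are invented for illustration and are not the new defaults from this commit.

```c
#include <stddef.h>

enum bcast_alg { BCAST_LINEAR, BCAST_BINOMIAL, BCAST_KNOMIAL, BCAST_PIPELINE };

/* Hypothetical decision function; all cutoffs are made up. */
static enum bcast_alg choose_bcast(size_t msg_size, int comm_size)
{
    if (comm_size <= 8) {
        return BCAST_LINEAR;      /* few ranks: flat send is fine    */
    } else if (msg_size < 8192) {
        return BCAST_BINOMIAL;    /* small messages: latency-bound   */
    } else if (msg_size < (1 << 20)) {
        return BCAST_KNOMIAL;     /* medium messages                 */
    }
    return BCAST_PIPELINE;        /* large messages: bandwidth-bound */
}
```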
Fixes the dead-code path issues reported by Coverity in bcast and reduce. Signed-off-by: Nithya V S <Nithya.VS@amd.com>
Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>
…ixes coll/han: Fix null dereference in revoke_local
…odeFix opal/mca/common/ucx : assert fix - change thread mode sent to UCX api
docs: update TCP docs + support deep linking into PMIx and PRTE docs
Force-pushed from c2eeee3 to 49ecbba
…y_fix coll/acoll: Fixes for coverity deadcode issues
coll/tuned: Change the bcast default collective algorithm selection
Signed-off-by: George Bosilca <gbosilca@nvidia.com>
Signed-off-by: George Bosilca <gbosilca@nvidia.com>
Use #ifdef with system headers
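That title refers to the usual autoconf pattern: guard optional system headers with the configure-generated HAVE_* macros instead of including them unconditionally. A generic example (the specific headers here are just for illustration):

```c
#include "opal_config.h"   /* configure-generated HAVE_* macros */

#ifdef HAVE_SYS_TIME_H
#include <sys/time.h>
#endif

#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
```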
btl/ofi: fault tolerance
Run mpi4py with ASAN, with a separate step that aborts on errors. The existing steps should run to completion even if an error is detected. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
30 minutes are not enough to run two extra tests, so just enable ASAN for the existing tests. Also test `ompi_info` and `mpicc`. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
This may reduce overhead, although according to https://github.com/google/sanitizers/wiki/addresssanitizerflags it should be disabled by default. Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>