[Common] Enable determinism for cuDNN >= 9.18 on Blackwell #2584
base: main
Conversation
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Greptile Summary

This PR enables deterministic FusedAttention on Blackwell GPUs (sm_arch >= 100) for FP16/BF16 precisions with cuDNN >= 9.18.0. The key changes move the determinism checks into the C++ backend-selection layer and plumb the deterministic flag through the PyTorch/JAX extensions.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant PyTorchJAX as PyTorch/JAX API
    participant PythonUtils as Python Utils
    participant CppExtensions as C++ Extensions
    participant BackendSelection as Backend Selection
    participant cuDNN

    User->>PyTorchJAX: Set NVTE_ALLOW_NONDETERMINISTIC_ALGO=0
    User->>PyTorchJAX: Call FusedAttention with training=True
    PyTorchJAX->>PythonUtils: get_attention_backend with deterministic=True
    PythonUtils->>CppExtensions: get_fused_attn_backend with deterministic param
    CppExtensions->>BackendSelection: nvte_get_fused_attn_backend with deterministic param
    alt Blackwell GPU sm_arch >= 100 and Training and Deterministic
        BackendSelection->>BackendSelection: Check cuDNN version >= 9.18.0
        BackendSelection->>BackendSelection: Check dropout == 0.0
        BackendSelection->>BackendSelection: Check bias == NO_BIAS
        alt All conditions satisfied
            BackendSelection-->>CppExtensions: Return NVTE_F16_arbitrary_seqlen
        else Conditions not satisfied
            BackendSelection-->>CppExtensions: Return NVTE_F16_max512_seqlen fallback
        end
    else Other GPU or Non-deterministic mode
        BackendSelection-->>CppExtensions: Return NVTE_F16_arbitrary_seqlen
    end
    CppExtensions-->>PythonUtils: Selected backend
    PythonUtils-->>PyTorchJAX: Backend information
    PyTorchJAX->>CppExtensions: Forward pass with deterministic=false
    CppExtensions->>cuDNN: Execute forward pass always deterministic
    cuDNN-->>CppExtensions: Output and auxiliary tensors
    CppExtensions-->>PyTorchJAX: Forward results
    PyTorchJAX->>CppExtensions: Backward pass with deterministic=true
    CppExtensions->>cuDNN: Execute backward deterministic path
    cuDNN-->>CppExtensions: Gradients
    CppExtensions-->>PyTorchJAX: Backward results
```
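To make the branching above concrete, here is a minimal Python sketch of the selection logic as the diagram presents it. The function name and argument list are illustrative stand-ins; the actual check lives in the C++ nvte_get_fused_attn_backend and covers many more cases.

```python
# Illustrative sketch only -- mirrors the diagram's branches, not the real
# C++ implementation in nvte_get_fused_attn_backend.
def select_f16_fused_attn_backend(sm_arch, cudnn_version, is_training,
                                  deterministic, dropout, bias_type):
    if deterministic and is_training and sm_arch >= 100:  # Blackwell, training
        if (cudnn_version >= (9, 18, 0)       # deterministic cuDNN support
                and dropout == 0.0
                and bias_type == "NO_BIAS"):
            return "NVTE_F16_arbitrary_seqlen"
        return "NVTE_F16_max512_seqlen"       # fallback per the diagram
    return "NVTE_F16_arbitrary_seqlen"        # other GPUs / non-deterministic

# Blackwell + cuDNN 9.18 + no dropout/bias keeps the arbitrary-seqlen backend.
print(select_f16_fused_attn_backend(100, (9, 18, 0), True, True, 0.0, "NO_BIAS"))
```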
1 file reviewed, 1 comment
Greptile Overview

This PR enables determinism for FusedAttention on Blackwell GPUs (SM 100) with cuDNN version 9.18.0 or higher. The implementation moves determinism checking logic from Python to the C++ backend selection layer.

Architecture

The change follows a layered approach: the Python utilities extract the deterministic flag from the attention parameters, the C++ extensions forward it to nvte_get_fused_attn_backend, and the backend selection applies the cuDNN-version, bias, and dropout checks. The implementation correctly restricts deterministic FusedAttention to cases where cuDNN guarantees deterministic behavior, avoiding silent non-determinism.

Confidence Score: 4/5
Important Files Changed

File Analysis
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as User/Test
    participant PyAPI as Python API
    participant Utils as utils.py
    participant CppExt as C++ Extensions
    participant Backend as Backend Selection
    participant cuDNN as cuDNN Library

    User->>PyAPI: Call attention with deterministic=True
    PyAPI->>Utils: get_attention_backend(params)
    Utils->>Utils: Extract deterministic from params
    Utils->>CppExt: get_fused_attn_backend(..., deterministic)
    CppExt->>Backend: nvte_get_fused_attn_backend(..., deterministic)
    alt Blackwell (sm_arch >= 100) & Training & Deterministic
        Backend->>Backend: Check cuDNN version >= 9.18.0
        Backend->>Backend: Check bias_type == NO_BIAS
        Backend->>Backend: Check dropout == 0.0
        alt All checks pass
            Backend-->>CppExt: F16_arbitrary_seqlen backend
        else Any check fails
            Backend-->>CppExt: No_Backend (disabled)
        end
    else Other architectures or inference
        Backend->>Backend: Apply standard backend selection
        Backend-->>CppExt: Selected backend
    end
    CppExt-->>Utils: Backend choice
    Utils-->>PyAPI: Backend configuration
    alt Forward Pass
        PyAPI->>CppExt: nvte_fused_attn_fwd(..., deterministic=true)
        Note over PyAPI,CppExt: Forward always uses deterministic=true
    else Backward Pass
        PyAPI->>CppExt: nvte_fused_attn_bwd(..., deterministic)
        Note over PyAPI,CppExt: Backward respects user's deterministic flag
    end
    CppExt->>cuDNN: Execute attention operation
    cuDNN-->>CppExt: Results
    CppExt-->>PyAPI: Output tensors
    PyAPI-->>User: Attention output
```
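Given that forward/backward split, one way to validate the deterministic path end to end is to repeat an identical training step and compare gradients bitwise. A hedged sketch in plain PyTorch (the make_model argument is a placeholder; in practice it would build a module around TE's FusedAttention):

```python
# Bitwise determinism check: run the same step twice, compare gradients.
# torch.nn.Linear below just keeps the sketch runnable; substitute a module
# built on TE's FusedAttention with NVTE_ALLOW_NONDETERMINISTIC_ALGO=0.
import torch

def grads_match(make_model, x):
    runs = []
    for _ in range(2):
        torch.manual_seed(0)                  # identical weights both runs
        model = make_model()
        model(x.clone()).sum().backward()
        runs.append([p.grad.clone() for p in model.parameters()])
    return all(torch.equal(a, b) for a, b in zip(*runs))

if __name__ == "__main__":
    x = torch.randn(4, 16)
    print(grads_match(lambda: torch.nn.Linear(16, 16), x))  # expect True
```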
2 files reviewed, 2 comments
make .xml file specific to deterministic tests in qa/
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
1 file reviewed, 1 comment
No files reviewed, no comments
1 file reviewed, 1 comment
1 file reviewed, 1 comment
fix typo
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

fix indentation
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
1 file reviewed, 1 comment
1 file reviewed, 1 comment
3 files reviewed, 3 comments
2 files reviewed, 2 comments
3 files reviewed, 3 comments
1 file reviewed, 1 comment
/te-ci L0
1 file reviewed, 1 comment
No files reviewed, no comments
Greptile's behavior is changing! From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section. This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".
/te-ci L0
/te-ci jax L0
Description
This PR enables determinism for FusedAttention on Blackwell for FP16/BF16 precisions and cuDNN >= 9.18.0. To run TE-PyTorch with determinism, please set this flag: export NVTE_ALLOW_NONDETERMINISTIC_ALGO=0.

Type of change
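A usage sketch of that flag. Only the environment variable comes from this description; the module, sizes, and dtype below are illustrative placeholders, and the variable must be set before TransformerEngine reads it:

```python
import os
os.environ["NVTE_ALLOW_NONDETERMINISTIC_ALGO"] = "0"  # set before TE reads it

import torch
import transformer_engine.pytorch as te

# Illustrative layer and sizes; any TE module that uses FusedAttention applies.
layer = te.TransformerLayer(
    hidden_size=1024, ffn_hidden_size=4096, num_attention_heads=16,
    params_dtype=torch.bfloat16,  # determinism here targets FP16/BF16
).cuda()

x = torch.randn(128, 2, 1024, device="cuda", dtype=torch.bfloat16)
layer(x).sum().backward()  # backward should now take a deterministic path
```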
Changes
Please see Description.
Checklist: