Conversation

@chatton chatton commented Jan 13, 2026

Overview

Add additional tracing within NewServiceHandler.

Add OpenTelemetry tracing for all RPC server methods:

- ExecutorService: InitChain, GetTxs, ExecuteTxs, SetFinal
- StoreService: GetBlock, GetState, GetMetadata, GetGenesisDaHeight, GetP2PStoreInfo
- P2PService: GetPeerInfo, GetNetInfo
- ConfigService: GetNamespace, GetSignerInfo

All implementations follow the decorator pattern, with comprehensive test coverage.
Tracer names: ev-node/{service}-service
@gemini-code-assist
Summary of Changes

Hello @chatton, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly enhances the observability of the system by integrating OpenTelemetry for distributed tracing. It introduces tracing capabilities for the Store, P2P, and Config RPC services, allowing for detailed insights into their execution flow and performance. The tracing can be toggled via configuration, providing flexibility for different environments.

Highlights

  • OpenTelemetry Integration: Introduced OpenTelemetry for distributed tracing across key RPC services.
  • Service Tracing: Added tracing to Store, P2P, and Config RPC service handlers.
  • Conditional Tracing: Tracing is conditionally enabled based on the config.Instrumentation.IsTracingEnabled() flag.
  • Detailed Span Attributes: Spans for RPC calls capture relevant request and response attributes (e.g., block height, hash, peer count, namespace) and record errors.
  • Dependency Updates: Updated go.mod and go.sum to include OpenTelemetry and testify dependencies.
  • Comprehensive Testing: New unit tests ensure the correct functionality and attribute capturing of the tracing wrappers.


claude bot commented Jan 13, 2026

Claude finished @chatton's task in 2m 27s


Code Review Complete

I've performed a comprehensive code review of this PR, which adds OpenTelemetry tracing to the Store, P2P, and Config RPC services. Overall, this is a high-quality implementation with good test coverage and a clean decorator pattern. However, there are several issues that need attention.


Critical Issues

1. Missing Per-Store Attributes in GetP2PStoreInfo 🔴

Location: pkg/rpc/server/tracing.go:140-158

The GetP2PStoreInfo method only records the count of stores but doesn't capture detailed attributes about each store (height, label, etc.). The test at line 315 of tracing_test.go also doesn't verify these attributes, which would have caught this gap.

Impact: Reduced observability for P2P store debugging and monitoring.

Recommendation: Add per-store attributes like:

span.SetAttributes(
    attribute.Int("store_count", len(res.Msg.Stores)),
)
for i, store := range res.Msg.Stores {
    span.SetAttributes(
        attribute.String(fmt.Sprintf("store_%d_label", i), store.Label),
        attribute.Int64(fmt.Sprintf("store_%d_height", i), int64(store.Height)),
    )
}

Related Test Gap: Update TestTracedStoreService_GetP2PStoreInfo_Success to verify these attributes.



Moderate Issues

2. Code Duplication in Error Handling 🟡

Location: Throughout pkg/rpc/server/tracing.go (lines 50-53, 80-83, 108-111, etc.)

Every traced method repeats this error handling pattern:

res, err := t.inner.Method(ctx, req)
if err != nil {
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error())
    return nil, err
}

Impact: Maintenance burden - any change to error handling requires updates in ~10 places.

Recommendation: Extract into a generic helper function. Example:

func traceUnary[Req, Res any](
    tracer trace.Tracer,
    ctx context.Context,
    spanName string,
    req *connect.Request[Req],
    call func(context.Context, *connect.Request[Req]) (*connect.Response[Res], error),
    attrs ...attribute.KeyValue,
) (*connect.Response[Res], error) {
    ctx, span := tracer.Start(ctx, spanName, trace.WithAttributes(attrs...))
    defer span.End()
    
    res, err := call(ctx, req)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }
    return res, nil
}

This would significantly reduce boilerplate across all traced methods.


3. Inconsistent Attribute Recording 🟡

Location: pkg/rpc/server/tracing.go:32-69 (GetBlock method)

The GetBlock method sets initial attributes differently than it sets response attributes. Initial attributes are set before span creation, while response attributes are set after the inner call. This inconsistency makes the code harder to follow.

Current approach:

  • Request attributes: Set before span creation
  • Response attributes: Set after inner call

Recommendation: Maintain consistency - either set all attributes after the call, or clearly separate request vs response attributes with prefixes like request.height and response.block_size_bytes.

4. Missing Nil Check Safety 🟡

Location: pkg/rpc/server/tracing.go:59-68

While there's a nil check for res.Msg.Block, there's no explicit check for res.Msg itself. Although this is likely guaranteed by the framework, defensive programming would add the check.

Recommendation:

if res.Msg != nil && res.Msg.Block != nil && res.Msg.Block.Data != nil {
    // ... existing logic
}

Minor Issues & Observations

5. Test Coverage Gaps ℹ️

Test Coverage: 82.24% patch coverage (per Codecov report)

Missing coverage includes:

  • Error scenarios for some methods (only GetBlock, GetPeerInfo, GetNamespace, and GetSignerInfo have error tests)
  • Edge cases like empty stores, nil responses, etc.

Recommendation: Add error test cases for:

  • GetState
  • GetMetadata
  • GetGenesisDaHeight
  • GetP2PStoreInfo
  • GetNetInfo

6. Conditional Tracing Pattern ℹ️

Location: pkg/rpc/server/server.go:385-389

The conditional wrapping approach is clean:

if config.Instrumentation.IsTracingEnabled() {
    storeServer = WithTracingStoreServer(storeServer)
    // ...
}

Positive: This is excellent! It allows zero-overhead when tracing is disabled and maintains clean separation of concerns using the decorator pattern.

7. Hex Encoding Consistency ✅

Location: Throughout tracing.go

Good consistent use of hex.EncodeToString() for binary data (hashes, addresses) in trace attributes. This makes the data readable in tracing UIs.

8. Import Organization ✅

Location: pkg/rpc/server/tracing.go:3-16

Imports are well-organized with standard library, third-party, and internal packages properly separated.


Security & Performance

Security ✅

  • No sensitive data exposure: Reviewed all attribute recordings - no private keys, credentials, or sensitive user data logged
  • Hash encoding: Binary hashes properly hex-encoded, preventing binary injection in trace backends
  • No injection risks: All string interpolation uses proper formatting

Performance ✅

  • Zero overhead when disabled: Conditional wrapping means no performance impact when tracing is off
  • Efficient attribute recording: Attributes are recorded inline without unnecessary allocations
  • Deferred span end: Proper use of defer span.End() ensures spans are closed even on errors

Minor consideration: The block size calculation (lines 60-67) iterates over all transactions. For very large blocks, this could add latency. Consider adding an attribute like calculate_block_size_ms to track this overhead, or make it conditional based on tx count.


Architecture & Design

Decorator Pattern ✅

The implementation correctly uses the decorator pattern:

  • Each traced server wraps the inner implementation
  • Clean separation between tracing logic and business logic
  • Easy to compose with other decorators if needed

Tracer Initialization ✅

Good use of otel.Tracer() with descriptive names:

  • "ev-node/store-service"
  • "ev-node/p2p-service"
  • "ev-node/config-service"

This allows filtering and grouping in trace backends.

Testing Strategy ✅

Excellent testing approach:

  • Mock implementations with injectable functions
  • SpanRecorder for verifying trace output
  • Helper functions (setupStoreTrace, requireAttribute) reduce test boilerplate
  • Tests verify both successful and error cases

Documentation

Missing Documentation 📝

Location: pkg/rpc/server/tracing.go

The file lacks:

  1. Package-level documentation explaining the tracing approach
  2. Comments on the traced server structs explaining their purpose
  3. Function comments on public constructor functions

Recommendation: Add:

// Package server provides RPC server implementations with optional OpenTelemetry tracing.
//
// Tracing is implemented using the decorator pattern. Each service (Store, P2P, Config)
// has a traced variant that wraps the original implementation and emits OpenTelemetry spans.

// tracedStoreServer decorates a StoreServiceHandler with OpenTelemetry spans.
// Each RPC method creates a span with relevant request/response attributes.
type tracedStoreServer struct { ... }

// WithTracingStoreServer decorates the provided store service handler with tracing spans.
// This should be applied when config.Instrumentation.IsTracingEnabled() returns true.
func WithTracingStoreServer(inner v1connect.StoreServiceHandler) v1connect.StoreServiceHandler { ... }

Summary & Recommendations

✅ Strengths

  1. Clean decorator pattern implementation
  2. Good test coverage with well-structured tests
  3. Proper error recording in traces
  4. Zero overhead when disabled
  5. No security issues identified
  6. Consistent attribute naming

🔧 Must Fix

  1. Add per-store attributes in GetP2PStoreInfo (Critical for observability)
  2. Add corresponding test assertions (would have caught critical issue 1 above)

🎯 Should Fix

  1. Extract error handling into generic helper (reduces duplication)
  2. Add error test cases for remaining methods
  3. Add package and function documentation

💡 Consider

  1. Make block size calculation conditional for large blocks
  2. Add attribute prefixes for request vs response data
  3. Add defensive nil checks

Test Execution Note

I don't have permissions to run the test suite directly. Please ensure:

make test           # Run unit tests
make lint           # Verify linting passes
make test-cover     # Check coverage remains above threshold

Branch: tracing-part-6

github-actions bot commented Jan 13, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed | Format: ⏩ skipped | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Jan 14, 2026, 2:21 PM

@gemini-code-assist bot left a comment
Code Review

This pull request introduces OpenTelemetry tracing for the Store, P2P, and Config services. The implementation uses a decorator pattern to wrap the existing service handlers, which is a good approach. The changes are well-tested, with a new test file covering the tracing logic. I've identified a few areas for improvement: a bug in attribute key generation, an opportunity to simplify the code in one of the traced methods, a suggestion to reduce code duplication across the tracing decorators, and a gap in the test coverage. Overall, this is a great addition for observability.

	)
	for i, store := range res.Msg.Stores {
		span.SetAttributes(
			attribute.Int64("store_"+string(rune(i))+"_height", int64(store.Height)),
Severity: high

There's a bug in how the attribute key is constructed. string(rune(i)) will convert the integer index i into its corresponding Unicode character, not its string representation. For example, for i=0, it will produce a null character (\x00), which is likely not what you intend and may cause issues with tracing backends.

You should use strconv.Itoa(i) to convert the integer to a string. This will require adding an import for the strconv package.

			attribute.Int64("store_"+strconv.Itoa(i)+"_height", int64(store.Height)),

Comment on lines 36 to 54
	var height uint64
	switch identifier := req.Msg.Identifier.(type) {
	case *pb.GetBlockRequest_Height:
		height = identifier.Height
	case *pb.GetBlockRequest_Hash:
		// for hash-based queries, we'll add the hash as an attribute
	}

	ctx, span := t.tracer.Start(ctx, "StoreService.GetBlock",
		trace.WithAttributes(
			attribute.Int64("height", int64(height)),
		),
	)
	defer span.End()

	// add hash attribute if hash-based query
	if hashIdentifier, ok := req.Msg.Identifier.(*pb.GetBlockRequest_Hash); ok {
		span.SetAttributes(attribute.String("hash", hex.EncodeToString(hashIdentifier.Hash)))
	}
Severity: medium

The logic for setting span attributes based on the request identifier can be simplified. Currently, you're using a switch to get the height, and then a separate if to get the hash, which involves checking the type of req.Msg.Identifier twice. This can be done more cleanly in a single switch statement that builds the attributes before starting the span. This will make the code more readable and efficient.

	var attrs []attribute.KeyValue
	switch identifier := req.Msg.Identifier.(type) {
	case *pb.GetBlockRequest_Height:
		attrs = append(attrs, attribute.Int64("height", int64(identifier.Height)))
	case *pb.GetBlockRequest_Hash:
		if identifier.Hash != nil {
			attrs = append(attrs, attribute.String("hash", hex.EncodeToString(identifier.Hash)))
		}
	}

	ctx, span := t.tracer.Start(ctx, "StoreService.GetBlock", trace.WithAttributes(attrs...))
	defer span.End()

Comment on lines +86 to +91
	res, err := t.inner.GetState(ctx, req)
	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		return nil, err
	}
Severity: medium

This block of code for handling errors from the inner service call and updating the span is repeated in almost every traced method in this file. To improve maintainability and reduce boilerplate, consider extracting this logic into a generic helper function.

For example, you could create a function that takes the context, request, the inner function call, and returns the response and error, handling the span creation and error recording internally. Here's a conceptual example:

func traceUnary[Req, Res any](
    tracer trace.Tracer,
    ctx context.Context,
    spanName string,
    req *connect.Request[Req],
    call func(context.Context, *connect.Request[Req]) (*connect.Response[Res], error),
    // ... other params for attributes
) (*connect.Response[Res], error) {
    ctx, span := tracer.Start(ctx, spanName)
    defer span.End()

    res, err := call(ctx, req)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return nil, err
    }

    // ... set success attributes

    return res, nil
}

Applying this pattern would make the tracing decorators much more concise.

Comment on lines +314 to +315
	attrs := span.Attributes()
	requireAttribute(t, attrs, "store_count", 2)
Severity: medium

This test is missing assertions for the per-store attributes (store_0_height, store_1_height). The current test only checks for store_count. Adding assertions for individual store heights would have caught the attribute key generation bug in GetP2PStoreInfo and would make the test more robust.

	attrs := span.Attributes()
	requireAttribute(t, attrs, "store_count", 2)
	requireAttribute(t, attrs, "store_0_height", int64(100))
	requireAttribute(t, attrs, "store_1_height", int64(99))

codecov bot commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 82.23684% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.19%. Comparing base (8cd52bd) to head (d63f852).

Files with missing lines    Patch %   Lines
pkg/rpc/server/tracing.go   84.13%    18 Missing and 5 partials ⚠️
pkg/rpc/server/server.go    42.85%    3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2972      +/-   ##
==========================================
+ Coverage   57.82%   58.19%   +0.37%     
==========================================
  Files          98       99       +1     
  Lines        9394     9543     +149     
==========================================
+ Hits         5432     5554     +122     
- Misses       3358     3379      +21     
- Partials      604      610       +6     
Flag       Coverage Δ
combined   58.19% <82.23%> (+0.37%) ⬆️

Base automatically changed from cian/add-tracing-part-5 to main January 13, 2026 16:14
@chatton chatton marked this pull request as ready for review January 14, 2026 15:54