
Conversation

@tac0turtle
Contributor

Overview

This PR adds sync modes, allowing us to move to subscriptions when at the head of the chain.

@claude
Contributor

claude bot commented Jan 8, 2026

Claude finished @tac0turtle's task in 3m 48s


Code Review: Sync Mode Implementation ✅

I've completed a comprehensive review of this PR implementing dual-mode synchronization (catchup/follow). Overall, the implementation introduces a valuable optimization for nodes at chain head, but there are several important issues to address.


Critical Issues

1. Silent Zero Timestamp in LocalDA RPC Implementation ⚠️

Location: tools/local-da/rpc.go:146-153

The LocalHead RPC method can silently return a zero timestamp:

func (s *headerServer) LocalHead(_ context.Context) (*jsonrpc.Header, error) {
    s.da.mu.Lock()
    defer s.da.mu.Unlock()

    return &jsonrpc.Header{
        Height:    s.da.height,
        BlockTime: s.da.timestamps[s.da.height],  // ← missing key silently yields zero time.Time
    }, nil
}

Issues:

  • If s.da.timestamps has no entry for s.da.height, the lookup silently returns a zero time.Time (silently wrong)
  • Check map membership instead: ts, ok := s.da.timestamps[s.da.height]
  • The same pattern in GetByHeight handles this correctly (lines 170-173)

Impact: Medium - Can cause incorrect timestamps in local DA testing, affecting based sequencer time calculations.



2. Closed Channel Handling in Subscribe ⚠️

Location: block/internal/syncing/syncer.go:505-520

The subscription response handlers don't distinguish between normal closure and error closure:

case resp, ok := <-headerCh:
    if !ok {
        return errors.New("header subscription closed")
    }

Issues:

  • All channel closures are treated as errors
  • Normal shutdown (context cancellation) will increment SubscribeErrors metric incorrectly
  • Should check ctx.Err() before treating closure as an error

Recommendation:

case resp, ok := <-headerCh:
    if !ok {
        if ctx.Err() != nil {
            return ctx.Err()  // Normal shutdown
        }
        return errors.New("header subscription closed unexpectedly")
    }



3. LocalDA Subscribe Returns Closed Channel ⚠️

Location: tools/local-da/rpc.go:132-137

func (s *blobServer) Subscribe(_ context.Context, _ libshare.Namespace) (<-chan *jsonrpc.SubscriptionResponse, error) {
    ch := make(chan *jsonrpc.SubscriptionResponse)
    close(ch)
    return ch, nil
}

Issues:

  • This immediately returns a closed channel, triggering subscription failures
  • Follow mode will never work with LocalDA
  • Should either return an error or implement a basic subscription mechanism

Impact: High - Makes follow mode untestable with LocalDA, forcing reliance on external DA for testing.

Recommendation: Return errors.New("LocalDA does not support subscriptions") instead, or implement a simple polling-based subscription.



High Priority Issues

4. Missing Context Propagation in ProcessBlobs

Location: block/internal/syncing/syncer.go:563

events := s.daRetriever.ProcessBlobs(s.ctx, blobs, resp.Height)

The context used here is the syncer's root context, not the subscription context. If the subscription is cancelled, blob processing continues with the wrong context.

Recommendation: Pass subCtx instead of s.ctx to respect subscription lifecycle.


5. Loop Variable Capture in processSubscriptionResponse

Location: block/internal/syncing/syncer.go:566-579

for _, event := range events {
    select {
    case s.heightInCh <- event:
        // ...
    default:
        s.cache.SetPendingEvent(event.Header.Height(), &event)
    }
}

Issue: In Go 1.21 and earlier, the loop variable event is reused across iterations, so storing &event aliases a single variable for the whole loop. In Go 1.22+ each iteration gets a fresh variable, making this safe.

Status: Not a bug on Go 1.22+, which this project likely uses; @gemini-code-assist's concern applies only to older Go versions. For absolute safety and code clarity on older toolchains, consider adding event := event at the start of the loop body.


6. Watchdog Timeout Configuration

Location: block/internal/syncing/syncer.go:492-496

watchdogTimeout := s.config.DA.BlockTime.Duration * followWatchdogMultiplier
if watchdogTimeout <= 0 {
    watchdogTimeout = 30 * time.Second
}

Issues:

  • followWatchdogMultiplier is hardcoded to 3 (line 61)
  • Default 30s might be too aggressive for slower DA layers
  • No configuration option to tune this per deployment

Recommendation: Consider making this configurable or document the assumption that DA block time is properly configured.


7. Mode Determination Race Window

Location: block/internal/syncing/syncer.go:385-406

func (s *Syncer) determineSyncMode() SyncMode {
    // ...
    localHead, err := s.daClient.LocalHead(s.ctx)
    // ...
    currentDAHeight := s.daRetrieverHeight.Load()
    
    if currentDAHeight+catchupThreshold >= localHead {
        return SyncModeFollow
    }
    return SyncModeCatchup
}

Issue: The LocalHead call and the daRetrieverHeight load are not a single atomic snapshot. Between the two operations, the node could process several blocks, making the comparison stale.

Impact: Low - Results in slightly suboptimal mode selection, will self-correct on next iteration.


Medium Priority Issues

8. Insufficient Test Coverage

Coverage Report: 36.25% patch coverage (109 missing lines)

Missing test coverage for:

  • subscribeAndFollow() - the core follow mode logic (0% coverage)
  • processSubscriptionResponse() - subscription event handling (0% coverage)
  • LocalHead() in client.go (0% coverage)
  • Subscribe() in client.go (0% coverage)

Location:

  • block/internal/syncing/syncer.go: 74 missing lines
  • tools/local-da/rpc.go: 22 missing lines
  • block/internal/da/client.go: 12 missing lines

Recommendation: Add integration tests that:

  1. Verify subscription lifecycle (subscribe → receive events → unsubscribe)
  2. Test mode switching behavior (catchup → follow → catchup)
  3. Test subscription error handling and recovery
  4. Test watchdog timeout triggering

9. Error Handling in runFollowMode

Location: block/internal/syncing/syncer.go:436-447

func (s *Syncer) runFollowMode() {
    s.logger.Debug().Msg("running follow mode")
    s.metrics.SyncMode.Set(float64(SyncModeFollow))

    err := s.subscribeAndFollow()
    if err != nil && !errors.Is(err, context.Canceled) {
        s.metrics.SubscribeErrors.Add(1)
        s.logger.Warn().Err(err).Msg("subscribe failed, will retry via mode check")
        // No explicit catchup call needed...
    }
}

Issues:

  • No backoff on subscription failures - will hammer DA client if subscriptions fail repeatedly
  • Comment says "will retry via mode check" but there's no delay before the next mode check
  • Should add a small delay (e.g., s.sleepOrDone(time.Second)) before returning to prevent tight retry loops

10. Metrics Naming Consistency

Location: block/internal/common/metrics.go:74-76

SyncMode        metrics.Gauge   // Current sync mode: 0=catchup, 1=follow
SubscribeErrors metrics.Counter // Number of subscription failures
ModeSwitches    metrics.Counter // Number of catchup<->follow mode transitions

Observation: Two naming layers are in play:

  • Go struct fields use PascalCase (SyncMode, SubscribeErrors)
  • Exported Prometheus metrics use snake_case with a _total suffix for counters (e.g. subscribe_errors_total)

This split is conventional and functionally correct; consider documenting it so the mapping between field names and exported metric names stays obvious.


Design & Architecture

11. Interface Abstraction Violation ⚠️

Location: block/internal/syncing/syncer.go:563

events := s.daRetriever.ProcessBlobs(s.ctx, blobs, resp.Height)

Issue: Gemini flagged this call as breaking the DARetriever interface abstraction.

Current State: ProcessBlobs is declared on the DARetriever interface (defined in da_retriever.go:22-26), so the abstraction holds and no type assertion is involved.

Recommendation: Keep as-is, but consider adding a comment explaining why this method is exported (it is used by both the polling and subscription paths).


12. Sync Mode State Machine Documentation

The sync mode switching logic is well-implemented but could benefit from additional documentation:

States: Catchup, Follow
Transitions:
  Catchup → Follow: when currentHeight + 2 >= localHead
  Follow → Catchup: when subscription fails OR watchdog timeout + sync check fails

Recommendation: Add state machine diagram or detailed comment at the top of daWorkerLoop explaining the state transitions and invariants.


13. Namespace Deduplication Logic

Location: block/internal/syncing/syncer.go:467-474

The subscription code has good logic to avoid subscribing to the same namespace twice:

var dataCh <-chan *blobrpc.SubscriptionResponse
if !bytes.Equal(headerNS, dataNS) {
    dataCh, err = s.daClient.Subscribe(subCtx, dataNS)
    // ...
}

This is well-designed and handles the common case where header and data share a namespace. ✅


Positive Aspects ✅

  1. Metrics Implementation: Comprehensive observability with SyncMode, SubscribeErrors, and ModeSwitches metrics
  2. Mode Switching Logic: Clean state machine with atomic mode tracking
  3. Watchdog Mechanism: Good safety net to detect subscription stalls
  4. Graceful Degradation: Properly falls back to catchup on subscription failures
  5. Context Handling: Proper use of context cancellation in subscription lifecycle
  6. Namespace Handling: Smart deduplication when header/data namespaces match
  7. Loop Variable Handling: Code is safe for Go 1.22+ (no loop variable capture issues)

Recommendations Summary

Must Fix Before Merge:

  1. Fix LocalDA LocalHead missing-timestamp lookup
  2. Fix LocalDA Subscribe returning closed channel
  3. Add error vs. normal closure distinction in subscription handlers

Should Fix:

  1. Add comprehensive tests for subscription code (boost coverage from 36% to >70%)
  2. Add backoff/delay in runFollowMode on subscription failures
  3. Pass subscription context (not root context) to ProcessBlobs

Nice to Have:

  1. Make watchdog timeout configurable
  2. Add state machine documentation
  3. Add configuration validation for DA block time

Security & Correctness

  • ✅ No unsafe concurrency patterns detected (proper use of atomics, mutexes)
  • ✅ No credential exposure risks
  • ✅ Context cancellation properly propagated
  • ✅ No SQL injection or XSS vectors (not applicable)
  • ⚠️ Silent zero timestamp in LocalDA RPC (testing only, low severity)
  • ⚠️ Potential resource leak if subscriptions fail repeatedly without backoff

Performance Considerations

  • ✅ Subscription mode significantly reduces DA polling overhead when caught up
  • ✅ Proper use of buffered channels (heightInCh with capacity 100)
  • ✅ Mode switches are rare (only when sync status changes)
  • ⚠️ No backoff on subscription failures could cause excessive reconnection attempts
  • ✅ Watchdog prevents indefinite blocking on stalled subscriptions

Testing Gaps

Based on codecov report, the following scenarios need test coverage:

  1. Subscription lifecycle: Subscribe → receive multiple events → unsubscribe
  2. Mode switching: Catchup → Follow transition when caught up
  3. Subscription failures: Network errors, closed connections, malformed responses
  4. Watchdog trigger: No events for watchdogTimeout duration
  5. LocalHead errors: DA client unavailable during mode determination
  6. Namespace deduplication: Header and data in same namespace
  7. Forced inclusion subscription: When FI namespace is configured

Final Verdict

This is a solid implementation of an important optimization. The core logic is sound, metrics are comprehensive, and the state machine is well-designed. However, the low test coverage (36%) and LocalDA compatibility issues should be addressed before merge.

Recommended Actions:

  1. Approve with changes - Core logic is correct
  2. 📝 Request changes for:
    • LocalDA RPC fixes (critical for testing)
    • Subscription error handling improvements
    • Test coverage improvements (target >70%)

Estimated Effort: 2-4 hours to address all critical and high-priority issues.


@github-actions
Contributor

github-actions bot commented Jan 8, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Jan 14, 2026, 4:35 PM

@gemini-code-assist
Contributor

Summary of Changes

Hello @tac0turtle, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the data availability (DA) worker's synchronization capabilities by introducing dynamic sync modes. It allows the system to efficiently transition between an aggressive polling strategy when catching up to the DA chain and a more resource-optimized, subscription-based approach once it reaches the chain head. This change aims to improve the overall performance and responsiveness of DA data processing.

Highlights

  • Dynamic Sync Modes: Introduced two dynamic synchronization modes for the DA worker: 'Catchup' for aggressive polling when behind the chain head, and 'Follow' for efficient, subscription-based real-time updates when caught up.
  • Subscription-based Updates: Implemented a new Subscribe method in the DA client, enabling the 'Follow' mode to receive real-time blob notifications, significantly reducing polling overhead.
  • Local Head Tracking: Added a LocalHead method to the DA client and a corresponding RPC endpoint, allowing the syncer to determine its current position relative to the DA chain head and facilitate mode switching.
  • New Metrics for Observability: Integrated new Prometheus metrics (SyncMode, SubscribeErrors, ModeSwitches) to provide visibility into the current sync mode, subscription failures, and transitions between modes.
  • Refactored DA Worker Loop: The daWorkerLoop has been refactored to intelligently determine and switch between 'Catchup' and 'Follow' modes based on the node's synchronization status, including a watchdog mechanism for 'Follow' mode.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a dual-mode synchronization mechanism, allowing the node to switch between an aggressive polling 'catchup' mode and a more efficient subscription-based 'follow' mode. This is a significant enhancement for nodes that are at the head of the chain. The changes are well-structured, introducing new DA client methods, metrics, and the core state machine logic in the daWorkerLoop. My review identified two critical bugs related to incorrect loop variable capturing that could lead to data corruption, and a couple of medium-severity design and style issues. Once these points are addressed, the implementation will be much more robust.

@codecov

codecov bot commented Jan 12, 2026

Codecov Report

❌ Patch coverage is 36.25731% with 109 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.29%. Comparing base (aaae087) to head (ecfcf83).

Files with missing lines | Patch % | Lines
block/internal/syncing/syncer.go | 44.44% | 74 Missing and 1 partial ⚠️
tools/local-da/rpc.go | 0.00% | 22 Missing ⚠️
block/internal/da/client.go | 0.00% | 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2961      +/-   ##
==========================================
- Coverage   57.88%   57.29%   -0.60%     
==========================================
  Files          97       97              
  Lines        9306     9457     +151     
==========================================
+ Hits         5387     5418      +31     
- Misses       3315     3435     +120     
  Partials      604      604              
Flag | Coverage Δ
combined | 57.29% <36.25%> (-0.60%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@tac0turtle force-pushed the marko/sync_subscribe branch from 95aeea4 to ecfcf83 on January 12, 2026 at 08:22
@tac0turtle marked this pull request as ready for review on January 12, 2026 at 08:37