
[Bug] Activity/workflow pollers drop to 0 ignoring minimum/maximum settings #1268

@sashaneb

Description


What are you really trying to do?

Run Temporal workers that reliably poll for workflow and activity tasks. Workers should maintain active pollers as long as task slots are available.

Describe the bug

Worker pollers (both activity and workflow) randomly drop to 0 despite available task slots and explicit poller configuration. This causes workflow/activity task timeouts as no worker is polling for tasks.

Tested configurations - all fail:

  1. PollerBehaviorAutoscaling(minimum=5, maximum=20, initial=10) - pollers drop to 0 despite minimum=5
  2. Default behavior (no poller config) - pollers drop to 0
  3. PollerBehaviorSimpleMaximum(maximum=5) - pollers still drop to 0
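Each tested configuration implies a different expected range for the activity poller count. The sketch below makes those expectations explicit. It is a minimal illustration under stated assumptions. `SimpleMaximum`, `Autoscaling`, `expected_poller_range` and `is_violation` are hypothetical local stand-ins that only mirror the fields of `PollerBehaviorSimpleMaximum(maximum=...)` and `PollerBehaviorAutoscaling(minimum=..., maximum=..., initial=...)`. They are not the SDK classes. The sketch also assumes that `minimum` is honored only while enough slots are free.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class SimpleMaximum:
    """Hypothetical stand-in mirroring PollerBehaviorSimpleMaximum(maximum=...)."""
    maximum: int


@dataclass(frozen=True)
class Autoscaling:
    """Hypothetical stand-in mirroring PollerBehaviorAutoscaling(minimum, maximum, initial)."""
    minimum: int
    maximum: int
    initial: int


Behavior = Union[SimpleMaximum, Autoscaling]


def expected_poller_range(behavior: Behavior, slots_available: int) -> Tuple[int, int]:
    """Return the (low, high) poller count we would expect, under our reading of the docs."""
    if slots_available <= 0:
        # No free slots: having zero pollers is legitimate.
        return (0, behavior.maximum)
    if isinstance(behavior, Autoscaling):
        # Assumption: the minimum applies as long as enough slots are free.
        return (min(behavior.minimum, slots_available), behavior.maximum)
    # SimpleMaximum: "attempt to poll as long as a slot is available", so at least one.
    return (1, behavior.maximum)


def is_violation(behavior: Behavior, slots_available: int, num_pollers: int) -> bool:
    """True when the observed poller count falls outside the expected range."""
    low, high = expected_poller_range(behavior, slots_available)
    return not (low <= num_pollers <= high)
```

With 14 free activity slots, an observed count of 0 falls outside the expected range for both `Autoscaling(5, 20, 10)` and `SimpleMaximum(5)`. That is the discrepancy this issue reports.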

Evidence from Prometheus metrics:

Worker with 14 available activity slots but 0 pollers:

temporal_num_pollers{poller_type="activity_task"} 0
temporal_num_pollers{poller_type="sticky_workflow_task"} 3
temporal_num_pollers{poller_type="workflow_task"} 1
temporal_worker_task_slots_available{worker_type="ActivityWorker"} 14
temporal_worker_task_slots_used{worker_type="ActivityWorker"} 1
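The stuck state in the snapshot above can be detected mechanically. The sketch below parses Prometheus text-format lines like those and reports poller types that sit at 0 while their worker still has free slots. It makes several assumptions. `parse_metrics`, `starved_pollers` and the `POLLER_TO_WORKER` mapping are hypothetical. It assumes a single task queue per scrape. It keys each sample by its `poller_type` or `worker_type` label and ignores any other labels, such as namespace or task queue.

```python
import re
from typing import Dict, List, Tuple

# Matches exposition lines such as: name{label="value",...} 3 [optional timestamp]
_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)(?:\s+\d+)?$'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical mapping from poller_type to the worker_type whose slots it fills.
POLLER_TO_WORKER = {
    "activity_task": "ActivityWorker",
    "workflow_task": "WorkflowWorker",
    "sticky_workflow_task": "WorkflowWorker",
}


def parse_metrics(text: str) -> Dict[Tuple[str, str], float]:
    """Map (metric name, poller_type or worker_type label) -> value; skips other lines."""
    samples: Dict[Tuple[str, str], float] = {}
    for raw in text.splitlines():
        m = _LINE.match(raw.strip())
        if not m:
            continue
        labels = dict(_LABEL.findall(m.group("labels")))
        key_label = labels.get("poller_type") or labels.get("worker_type", "")
        samples[(m.group("name"), key_label)] = float(m.group("value"))
    return samples


def starved_pollers(text: str) -> List[str]:
    """Poller types reporting 0 pollers while their worker still has free slots."""
    samples = parse_metrics(text)
    starved = []
    for poller_type, worker_type in POLLER_TO_WORKER.items():
        pollers = samples.get(("temporal_num_pollers", poller_type))
        free = samples.get(("temporal_worker_task_slots_available", worker_type))
        if pollers == 0 and free is not None and free > 0:
            starved.append(poller_type)
    return starved
```

Fed the five lines above, `starved_pollers` returns `["activity_task"]`. The workflow pollers are not flagged because they are non-zero and no `WorkflowWorker` slot metric is present in the snapshot.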

The docs for PollerBehaviorSimpleMaximum state:

"A poller behavior that will attempt to poll as long as a slot is available, up to the provided maximum."

This is, however, not what we're observing. With 14 available slots, there should be active pollers.

Timeline:

  • Workers start healthy with expected poller counts
  • After 10-20 minutes, pollers drop to 0 on some/all workers
  • No errors in logs - worker appears healthy but doesn't poll
  • Happens randomly, affects different workers at different times
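To pin down when the drop happens relative to worker start, periodic samples can be scanned for the first sustained run of "0 pollers with free slots". The sketch below assumes the samples were collected by some periodic scrape, for example once a minute. `drop_onset` and the `Sample` tuple layout are hypothetical. `min_consecutive` guards against counting a momentary dip as a drop.

```python
from typing import Iterable, Optional, Tuple

# (seconds since worker start, num_pollers, slots_available)
Sample = Tuple[float, int, int]


def drop_onset(samples: Iterable[Sample], min_consecutive: int = 2) -> Optional[float]:
    """Return the start time of the first starvation run lasting min_consecutive samples."""
    run_start: Optional[float] = None
    run_length = 0
    for ts, pollers, free in samples:
        if pollers == 0 and free > 0:
            if run_length == 0:
                run_start = ts
            run_length += 1
            if run_length >= min_consecutive:
                return run_start
        else:
            # Pollers recovered (or no free slots): reset the run.
            run_start, run_length = None, 0
    return None
```

For a series that dips to 0 once at 900 s and then again from 1020 s onward, `drop_onset` returns 1020, which is 17 minutes after start. That is consistent with the 10-20 minute window above.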

Impact:

  • Workflow Task Timeouts: New workflows can't start
  • Activity Task Timeouts: Activities can't execute
  • Silent Failures: No errors logged, worker appears healthy

Minimal Reproduction

import asyncio
from datetime import timedelta

from temporalio.client import Client
from temporalio.worker import Worker, PollerBehaviorSimpleMaximum
from temporalio import workflow, activity

@activity.defn
async def my_activity() -> str:
    return "done"

@workflow.defn
class MyWorkflow:
    @workflow.run
    async def run(self) -> str:
        return await workflow.execute_activity(
            my_activity,
            start_to_close_timeout=timedelta(seconds=60),
        )

async def main():
    client = await Client.connect("your-temporal-address")

    worker = Worker(
        client,
        task_queue="test-queue",
        workflows=[MyWorkflow],
        activities=[my_activity],
        max_concurrent_activities=15,
        max_concurrent_workflow_tasks=10,
        # Fails with every poller configuration tested (see list above); SimpleMaximum shown here:
        workflow_task_poller_behavior=PollerBehaviorSimpleMaximum(maximum=5),
        activity_task_poller_behavior=PollerBehaviorSimpleMaximum(maximum=5),
    )

    # Run worker and monitor temporal_num_pollers metric
    # After 10-20 minutes, activity_task pollers drop to 0
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())

To observe:

  1. Start worker with Prometheus metrics enabled
  2. Run some workflows
  3. Monitor temporal_num_pollers metric over 15-20 minutes
  4. Observe poller_type="activity_task" drop to 0 despite available slots

Environment/Versions

  • OS and processor: Linux x86_64 (Kubernetes/EKS), also tested on M1 Mac
  • SDK version: temporalio==1.21.1 (latest); also seen on 1.13.0, which we used previously
  • Temporal: Temporal Cloud (us-east-1)
  • Python: 3.12
  • Using Kubernetes (EKS on AWS) with multiple worker pods autoscaled with KEDA

Additional context

Release notes reviewed:

Questions:

  1. What causes pollers to drop to 0 despite available slots?
  2. Why does minimum=5 in autoscaling not enforce the minimum?
  3. Is there a known issue with the Rust core's poller management?

Current workaround:

  • Use PollerBehaviorSimpleMaximum (slightly more stable than autoscaling)
  • Run multiple workers and hope not all drop to 0 simultaneously
  • Consider periodic worker restarts to reset pollers (not ideal)
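A more targeted version of the restart workaround is to restart only when starvation persists, instead of on a fixed schedule. The sketch below is a hypothetical `PollerWatchdog`. The caller feeds it readings, for example `temporal_num_pollers` and `temporal_worker_task_slots_available`. It returns True once pollers have stayed at 0 with free slots for longer than `grace_s`. The 120-second default is an arbitrary assumption, and what "restart" means is left to the caller.

```python
import time
from typing import Callable, Optional


class PollerWatchdog:
    """Hypothetical helper: flag a worker whose pollers stay at 0 while slots are free."""

    def __init__(self, grace_s: float = 120.0, clock: Callable[[], float] = time.monotonic):
        self.grace_s = grace_s
        self._clock = clock
        self._starved_since: Optional[float] = None

    def observe(self, num_pollers: int, slots_available: int) -> bool:
        """Record a reading; return True once starvation has lasted at least grace_s."""
        now = self._clock()
        if num_pollers == 0 and slots_available > 0:
            if self._starved_since is None:
                self._starved_since = now
            return now - self._starved_since >= self.grace_s
        # Healthy reading (or no free slots): reset the starvation timer.
        self._starved_since = None
        return False
```

In a Kubernetes deployment, a True result could, for instance, be surfaced through a liveness probe so that only the affected pod is restarted. This is still a mitigation, not a fix for the underlying poller behavior.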

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
