Skip to content

Conversation

@double-fault
Copy link

Fixes #1813, #1896.

Lot of the code already existed but it was incomplete, like tracking of multiple leaders already existed but there was no extrapolation based off time running.

There were multiple groups in #1896 and the later groups always returned zero as the first group was pinned.

@google-cla
Copy link

google-cla bot commented Nov 28, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@double-fault
Copy link
Author

@dmah42 @LebedevRI (or whoever is the maintainer) ping for review. I think it's a pretty important bug, especially since there is no warning that there is no multiplexing, which leads to issues such as #1896 where I remember I had spent a lot of time on an unrelated program confused about why the counters were giving zero values.

If this patch is improper/there is no one to review it, it would be nice if a warning could be added somewhere regarding the same.

@dmah42
Copy link
Member

dmah42 commented Jan 10, 2026

I'm not an expert in the perf counter stuff but I'm going to take a look a bit later today

@dmah42
Copy link
Member

dmah42 commented Jan 11, 2026

i think this looks good. i've approved but maybe @LebedevRI wants to also weigh in.

@LebedevRI
Copy link
Collaborator

I'm sorry, i'm unfamiliar with this code. The change looks fairly simple.
Presumaby the author has tested this extensively and found no issues?

@double-fault
Copy link
Author

@LebedevRI Yes the changes are fairly simple and I have tested to some extent - I have run it on programs with a lot of counters and the values seemed to make sense (such as the program I had described in #1896 ), but I have not tested it to check the accuracy of the values reported in some objective way. I am not aware of any easy method to do this, if there is maybe you could let me know and I can do it.

From what I remember, none of the unit tests check for the accuracy of the actual values either (as I said, I'm not sure if there is a proper way to do this).

@double-fault
Copy link
Author

@LebedevRI ping

@LebedevRI
Copy link
Collaborator

@LebedevRI ping

I've already provided all the feedback that i have.

@double-fault
Copy link
Author

Alright then can this PR be merged, unless there's something else required? @dmah42

Comment on lines +31 to +47
// See man page of perf_event_open for the exact data format
// We are using PERF_FORMAT_GROUP |
// PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
struct {
uint64_t nr_;
uint64_t time_enabled_;
uint64_t time_running_;
std::array<uint64_t, kMaxCounters> values_;
} buffer{};

// Any valid read will have nr_, time_enabled_, time_running_,
// and at least one value_
const size_t minimum_read_size = 4 * sizeof(uint64_t);

Value* current_counter = values_.data();
for (int lead : leaders) {
auto read_bytes = ::read(lead, ptr, size);
if (read_bytes >= ssize_t(sizeof(uint64_t))) {
// Actual data bytes are all bytes minus initial padding
std::size_t data_bytes =
static_cast<std::size_t>(read_bytes) - sizeof(uint64_t);
// This should be very cheap since it's in hot cache
std::memmove(ptr, ptr + sizeof(uint64_t), data_bytes);
// Increment our counters
ptr += data_bytes;
size -= data_bytes;
} else {
ssize_t read_bytes = ::read(lead, &buffer, sizeof(buffer));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct thing is nice, but i'm like 80% sure there is UB here.
The most safe way would be to read into array<uint64_t, 3+kMaxCounters>,
and then produce that buffer struct from it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can that be sunk into for (int lead : leaders) {?


double GetEstimate() const {
return static_cast<double>(value_ * time_enabled_) /
static_cast<double>(time_running_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks iffy. What are the typical perf counter value ranges?
And what about the time_enabled/time_running,
what units are they? Cycles?

We are either going to be losing precision here,
and/or having a multiplication overflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Perf counters not multiplexed

3 participants