Skip to content

Using unlikely markers for PR386#387

Open
lemire wants to merge 4 commits into
mainfrom
pr386
Open

Using unlikely markers for PR386#387
lemire wants to merge 4 commits into
mainfrom
pr386

Conversation

@lemire
Copy link
Copy Markdown
Member

@lemire lemire commented Jun 6, 2026

PR #386 improves performance when parsing short strings.

However, it does so by marking functions as cold.

I think that a better option is to make branches as unlikely.

This is what this PR does.

fcostaoliveira and others added 3 commits June 3, 2026 09:30
parsed_number_string_t carries two span<UC const> members (integer, fraction)
that are only read on the rare slow paths (digit_comp, and the >19-significant-
digit truncation recompute). Materializing them on every parse forces the ~56/64-
byte struct to be written out and marshaled through the by-value return, which
shows up as backend/store pressure on the hot path.

This adds a runtime `store_spans` flag (default true, so all existing callers are
unchanged) to parse_number_string; from_chars_float_advanced parses with it false,
attempts the Clinger and Eisel-Lemire fast paths inline, and only re-parses with
spans on the two rare slow branches. The re-parse is pushed into a single
`fastfloat_noinline` (noinline+cold) helper so the force-inlined hot scanner is
emitted once rather than duplicated into the caller (without this the extra inline
copies regress some targets, e.g. ARM gcc, by bloating the hot frame and lengthening
the loop-carried dependency chain).

A runtime flag is used deliberately rather than a template parameter: a template
would create a second instantiation of the whole scanner whose icache cost wipes
out the gain.

Measured (per-parser microbench, median of 5, pinned core), fast_float from_chars
<double>/<float>, vs the current tip:
  - Intel Ice Lake (Xeon 8360Y): +17-19% (gcc), Intel TMA shows backend-bound
    26.0% -> 2.2% and retiring 60.3% -> 77.3% on short floats (the eliminated span
    spill), with -36% pipeline slots.
  - Intel Cascade Lake (Xeon 6248): +18-22% (gcc), +13-23% (clang).
  - ARM Neoverse-V2 (Graviton4): +73-196% (gcc), +8-11% (clang) -- the struct spill
    dominated the gcc hot loop there.
Correctness: the full float exhaustive suite (exhaustive32, exhaustive32_64,
exhaustive32_midpoint, random64) passes, and a 2^32 sweep is byte-identical to the
current tip. Public from_chars / from_chars_advanced / parsed_number_string_t are
unchanged.
Comment thread include/fast_float/float_common.h Outdated
// under -std=c++17, where using it would trip -Wc++20-extensions/-Werror.
#if (__cplusplus >= 202002L || \
(defined(_MSVC_LANG) && _MSVC_LANG >= 202002L)) && \
defined(__has_cpp_attribute) && __has_cpp_attribute(unlikely) >= 201803L
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If defined(__has_cpp_attribute) is false then __has_cpp_attribute(foo) is ill formed because the undefined macro is replaced with 0 and you get 0(foo) which is an error.

Copy link
Copy Markdown
Contributor

@jwakely jwakely Jun 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid that problem with:

#if (__cplusplus >= 202002L ||                                                 \
     (defined(_MSVC_LANG) && _MSVC_LANG >= 202002L)) &&                        \
    defined(__has_cpp_attribute)
#if __has_cpp_attribute(unlikely) >= 201803L
#define fastfloat_unlikely(x) (x) [[unlikely]]
#endif
#endif

#ifndef fastfloat_unlikely
#if defined(__GNUC__) || defined(__clang__)
#define fastfloat_unlikely(x) (__builtin_expect(!!(x), 0))
#else
#define fastfloat_unlikely(x) (x)
#endif
#endif

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively something like this keeps all the messy conditions in one place, separate from the main if-else logic:

#ifdef __has_cpp_attribute
#if __has_cpp_attribute(unlikely) /* && any extra conditions you want/need */
#define FASTFLOAT_USE_UNLIKELY_ATTR 1
#endif
#endif

#ifdef FASTFLOAT_USE_UNLIKELY_ATTR
#define fastfloat_unlikely(x) (x) [[unlikely]]
#elif defined(__GNUC__) || defined(__clang__)
#define fastfloat_unlikely(x) (__builtin_expect(!!(x), 0))
#else
#define fastfloat_unlikely(x) (x)
#endif

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

covered in my last commit

Comment thread include/fast_float/float_common.h Outdated
// attribute in C++20 or newer, otherwise to __builtin_expect on GCC/Clang, or
// to a no-op elsewhere (e.g. pre-C++20 MSVC, which has no equivalent hint).
// The [[unlikely]] branch is gated on the language version, not just on
// __has_cpp_attribute: GCC and Clang report the attribute as available even
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know Clang gives that warning, but I don't think GCC does. It's annoying, because it defeats the entire purpose of standardizing things like __has_cpp_attribute and feature test macros, which was to get rid of the fragile preprocessor checks for specific versions of specific compilers 😭

You could put this around the whole of parse_number.h where the attribute is used:

#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wc++20-extensions"
#endif
...
...
#ifdef __clang__
#pragma clang diagnostic pop
#endif

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would allow the attribute to be used if __has_cpp_attribute(unlikely) is true, without the __cplusplus >= 202002L check alongside it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

covered in my last commit

@lemire
Copy link
Copy Markdown
Member Author

lemire commented Jun 6, 2026

@jwakely Much appreciated. I update and ping you again.

@lemire
Copy link
Copy Markdown
Member Author

lemire commented Jun 6, 2026

@jwakely I guess that you have noticed that @fcostaoliveira provided a few significant optimizations. I think GCC should eventually update to benefit from this work.

@jwakely
Copy link
Copy Markdown
Contributor

jwakely commented Jun 6, 2026

@jwakely I guess that you have noticed that @fcostaoliveira provided a few significant optimizations. I think GCC should eventually update to benefit from this work.

Yes, I've already mentioned it to Patrick who did the work to include your code in GCC, thanks :)

@lemire lemire requested a review from jwakely June 6, 2026 17:14
@lemire
Copy link
Copy Markdown
Member Author

lemire commented Jun 6, 2026

@jwakely I went with your proposals, please have a look.

It is somewhat depressing code, but...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants