Conversation
parsed_number_string_t carries two span<UC const> members (integer, fraction)
that are only read on the rare slow paths (digit_comp, and the >19-significant-
digit truncation recompute). Materializing them on every parse forces the ~56/64-
byte struct to be written out and marshaled through the by-value return, which
shows up as backend/store pressure on the hot path.
This adds a runtime `store_spans` flag (default true, so all existing callers are
unchanged) to parse_number_string; from_chars_float_advanced parses with it false,
attempts the Clinger and Eisel-Lemire fast paths inline, and only re-parses with
spans on the two rare slow branches. The re-parse is pushed into a single
`fastfloat_noinline` (noinline+cold) helper so the force-inlined hot scanner is
emitted once rather than duplicated into the caller (without this the extra inline
copies regress some targets, e.g. ARM gcc, by bloating the hot frame and lengthening
the loop-carried dependency chain).
A runtime flag is used deliberately rather than a template parameter: a template
would create a second instantiation of the whole scanner whose icache cost wipes
out the gain.
Measured (per-parser microbench, median of 5, pinned core), fast_float from_chars
<double>/<float>, vs the current tip:
- Intel Ice Lake (Xeon 8360Y): +17-19% (gcc), Intel TMA shows backend-bound
26.0% -> 2.2% and retiring 60.3% -> 77.3% on short floats (the eliminated span
spill), with -36% pipeline slots.
- Intel Cascade Lake (Xeon 6248): +18-22% (gcc), +13-23% (clang).
- ARM Neoverse-V2 (Graviton4): +73-196% (gcc), +8-11% (clang) -- the struct spill
dominated the gcc hot loop there.
Correctness: the full float exhaustive suite (exhaustive32, exhaustive32_64,
exhaustive32_midpoint, random64) passes, and a 2^32 sweep is byte-identical to the
current tip. Public from_chars / from_chars_advanced / parsed_number_string_t are
unchanged.
| // under -std=c++17, where using it would trip -Wc++20-extensions/-Werror. | ||
| #if (__cplusplus >= 202002L || \ | ||
| (defined(_MSVC_LANG) && _MSVC_LANG >= 202002L)) && \ | ||
| defined(__has_cpp_attribute) && __has_cpp_attribute(unlikely) >= 201803L |
There was a problem hiding this comment.
If defined(__has_cpp_attribute) is false then __has_cpp_attribute(foo) is ill formed because the undefined macro is replaced with 0 and you get 0(foo) which is an error.
There was a problem hiding this comment.
You can avoid that problem with:
#if (__cplusplus >= 202002L || \
(defined(_MSVC_LANG) && _MSVC_LANG >= 202002L)) && \
defined(__has_cpp_attribute)
#if __has_cpp_attribute(unlikely) >= 201803L
#define fastfloat_unlikely(x) (x) [[unlikely]]
#endif
#endif
#ifndef fastfloat_unlikely
#if defined(__GNUC__) || defined(__clang__)
#define fastfloat_unlikely(x) (__builtin_expect(!!(x), 0))
#else
#define fastfloat_unlikely(x) (x)
#endif
#endif
There was a problem hiding this comment.
Alternatively something like this keeps all the messy conditions in one place, separate from the main if-else logic:
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(unlikely) /* && any extra conditions you want/need */
#define FASTFLOAT_USE_UNLIKELY_ATTR 1
#endif
#endif
#ifdef FASTFLOAT_USE_UNLIKELY_ATTR
#define fastfloat_unlikely(x) (x) [[unlikely]]
#elif defined(__GNUC__) || defined(__clang__)
#define fastfloat_unlikely(x) (__builtin_expect(!!(x), 0))
#else
#define fastfloat_unlikely(x) (x)
#endif
| // attribute in C++20 or newer, otherwise to __builtin_expect on GCC/Clang, or | ||
| // to a no-op elsewhere (e.g. pre-C++20 MSVC, which has no equivalent hint). | ||
| // The [[unlikely]] branch is gated on the language version, not just on | ||
| // __has_cpp_attribute: GCC and Clang report the attribute as available even |
There was a problem hiding this comment.
I know Clang gives that warning, but I don't think GCC does. It's annoying, because it defeats the entire purpose of standardizing things like __has_cpp_attribute and feature test macros, which was to get rid of the fragile preprocessor checks for specific versions of specific compilers 😭
You could put this around the whole of parse_number.h where the attribute is used:
#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wc++20-extensions"
#endif
...
...
#ifdef __clang__
#pragma clang diagnostic pop
#endif
There was a problem hiding this comment.
That would allow the attribute to be used if __has_cpp_attribute(unlikely) is true, without the __cplusplus >= 202002L check alongside it.
|
@jwakely Much appreciated. I update and ping you again. |
|
@jwakely I guess that you have noticed that @fcostaoliveira provided a few significant optimizations. I think GCC should eventually update to benefit from this work. |
Yes, I've already mentioned it to Patrick who did the work to include your code in GCC, thanks :) |
|
@jwakely I went with your proposals, please have a look. It is somewhat depressing code, but... |
PR #386 improves performance when parsing short strings.
However, it does so by marking functions as cold.
I think that a better option is to make branches as unlikely.
This is what this PR does.