Reduce TypeForm recognition slowdown: 2 identifier fast-reject filters #21596

Open
davidfstr wants to merge 5 commits into python:master from davidfstr:f/typeform_complete--take3.1


References #21262.

Includes 2 filters from the original combined 7-filter PR:

Summary

Enabling TypeForm made SemanticAnalyzer.try_parse_as_type_expression run
eagerly on every expression in certain syntactic positions (~2.84M calls per
self-check). <5% of those reach an expensive full-parse block
(expr_to_analyzed_type + isolated_error_analysis), and ~91% of
those fail — pure wasted work.

This PR adds 2 cheap, early-reject filters for the StrExpr-identifier
case, plus a reordering of the existing early-reject checks by decreasing
rejection frequency. Together the two filters eliminate 8% of full
parses (2,548 → 2,343) per self-check.

2 infrastructure commits:

  • misc/perf_compare.py: Add options/behaviors
  • TypeForm: Add instrumentation of full parses

followed by 3 key commits, all operating within the str_value.isidentifier()
branch of try_parse_as_type_expression:

  1. Filter A: isinstance(node, Var) + more conditions
  2. Reorder the 3 mutually-exclusive isinstance checks by decreasing rejection frequency.
  3. Filter B: isinstance(node, (FuncDef, OverloadedFuncDef, MypyFile))

Looking at the str_value.isidentifier() branch alone,
84% (244 → 39) of failing full parses are eliminated per self-check.
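The percentages above can be cross-checked against the per-filter rejection counts quoted in the commit messages (157 for Filter A, 48 for Filter B). A small arithmetic sketch, using only figures stated in this PR:

```python
# Figures quoted in this PR description and its commit messages.
full_parses_before = 2548   # full parses per self-check, before the filters
full_parses_after = 2343    # full parses per self-check, after the filters
ident_fails_before = 244    # failing full parses in the isidentifier() branch
ident_fails_after = 39

rejected_by_filter_a = 157  # from the Filter A commit message
rejected_by_filter_b = 48   # from the Filter B commit message

eliminated = full_parses_before - full_parses_after
overall_pct = 100 * eliminated / full_parses_before
branch_pct = 100 * (ident_fails_before - ident_fails_after) / ident_fails_before

print(eliminated)                                   # 205
print(eliminated == rejected_by_filter_a + rejected_by_filter_b)  # True
print(f"{overall_pct:.1f}% of all full parses")     # 8.0% of all full parses
print(f"{branch_pct:.1f}% of failing identifier-branch parses")   # 84.0% ...
```

Both reductions come from the same 205 eliminated parses; only the denominator differs.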

Performance

misc/perf_compare.py, single worker, paired per-round deltas.

CPU time (canonical, lowest-variance) — master vs branch tip, n=100

python misc/perf_compare.py --warmup-runs 3 --num-runs 100 -j 3 \
    --metric cpu --workers1 <master> <tip>
  • n=100: −7.6 ms ±5.9 (−0.28%)

Significant (CI excludes 0). This is the net effect of the branch.

Wall-clock — master vs branch tip

python misc/perf_compare.py --warmup-runs 3 --num-runs 400 -j 3 <master> <tip>
  • n=400: −3.6 ms ±4.3 (−0.26%), CI [−7.9, +0.7].

Borderline: the CI just barely includes 0, so this result is not significant
on its own. It is consistent with a real per-call win partly masked by
multi-worker wall-clock noise.
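The two significance calls above follow directly from the reported "delta ± half-width" figures. A minimal sketch of that check (the helper names are illustrative, not part of misc/perf_compare.py):

```python
from typing import Tuple

def interval(delta_ms: float, half_width_ms: float) -> Tuple[float, float]:
    """Turn a 'delta ± half_width' report into (low, high), rounded to 0.1 ms."""
    return (round(delta_ms - half_width_ms, 1), round(delta_ms + half_width_ms, 1))

def excludes_zero(ci: Tuple[float, float]) -> bool:
    """True when the whole interval lies on one side of zero."""
    low, high = ci
    return low > 0 or high < 0

cpu_ci = interval(-7.6, 5.9)    # CPU time, n=100
wall_ci = interval(-3.6, 4.3)   # wall-clock, n=400

print(cpu_ci, excludes_zero(cpu_ci))     # (-13.5, -1.7) True
print(wall_ci, excludes_zero(wall_ci))   # (-7.9, 0.7) False
```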

Correctness

All tests pass.

Note that the var_is_typing_special_form helper needed to be extended to
recognize stringified forms of typing.Self so that Filter A would not
incorrectly reject a stringified 'Self' annotation, regressing the
testSelfRecognizedInOtherSyntacticLocations test.

Open Questions

  • Should the 2 infrastructure commits be moved to a separate PR?

  • The full commit messages (after the subject line) on the 3 key commits
    are rather verbose. Let me know if you'd like me to trim them down,
    perhaps by removing everything after the subject.

davidfstr and others added 5 commits June 5, 2026 13:16
Specifically:
* Median is reported in addition to the existing mean+stdev. The median is
  significantly more resistant to skew by outliers.
* --metric {wall,cpu} (default wall): Enables profiling using CPU time
  rather than wall-clock time. CPU profiling has roughly half the coefficient
  of variation of wall-clock profiling at an equal run count.
* --workers1: Forces MYPY_NUM_WORKERS=1 (rather than the default 4) to
  cut CPU scheduling variance. Strongly recommended when using --metric cpu.
* --warmup-runs N (default 1): Configurable number of leading cold runs to discard.
  Previously this was always 1. Higher warmup counts reduce the outliers that
  skew the reported mean.
* A new "Paired deltas vs <first commit>" section is added to the report,
  showing per-round paired differencing against the first commit
  to cancel round-level common-mode noise, reducing variance.
  Reported as median +/-95% CI.
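
To illustrate why paired differencing reduces variance, here is a minimal sketch on synthetic timings. It is not the code in misc/perf_compare.py: the commit message does not say how the 95% CI is computed, so the percentile bootstrap below is an assumption, and all function names and numbers are made up for illustration.

```python
import random
import statistics
from typing import List, Sequence, Tuple

def paired_deltas(baseline: Sequence[float], candidate: Sequence[float]) -> List[float]:
    """Per-round difference candidate - baseline; noise shared by a round cancels."""
    if len(baseline) != len(candidate):
        raise ValueError("need one measurement per commit per round")
    return [c - b for b, c in zip(baseline, candidate)]

def median_with_bootstrap_ci(
    deltas: Sequence[float], resamples: int = 2000, seed: int = 0
) -> Tuple[float, float, float]:
    """Return (median, low, high) using a 95% percentile bootstrap (an assumption)."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(deltas, k=len(deltas)))
        for _ in range(resamples)
    )
    low = medians[int(0.025 * resamples)]
    high = medians[int(0.975 * resamples) - 1]
    return statistics.median(deltas), low, high

# Synthetic data, NOT measurements from this PR: each round has a large shared
# noise term (e.g. machine load) plus a small independent per-run term.
rng = random.Random(1)
round_noise = [rng.gauss(0, 50) for _ in range(100)]
baseline = [2700 + n + rng.gauss(0, 5) for n in round_noise]
candidate = [2692 + n + rng.gauss(0, 5) for n in round_noise]

deltas = paired_deltas(baseline, candidate)
median_delta, ci_low, ci_high = median_with_bootstrap_ci(deltas)
print(f"unpaired stdev: {statistics.stdev(candidate):.1f} ms")
print(f"paired stdev:   {statistics.stdev(deltas):.1f} ms")
print(f"median delta:   {median_delta:.1f} ms, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")
```

Because both commits are timed within the same round, the shared noise term drops out of each delta, so the spread of the deltas is much smaller than the spread of either commit's raw timings.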

Also:
* --cache-binaries (default false): Caches each commit's compiled clone
  to avoid ~5min recompile whenever comparing the same commit multiple times.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…_parse_as_type_expression()

Specifically:

- If you set the MYPY_TYPEFORM_PROFILE_FULL_PARSE environment variable,
  mypy will output a .tsv to that filepath which characterizes the
  kinds of Expressions that try_parse_as_type_expression() in semanal.py
  was forced to fully parse because they were not rejected early.

- A misc/analyze_typeform_full_parse_profile.py script is added which
  takes those .tsvs and prints an expression-time summary (by total time)
  plus top-N descriptors per FAIL class.
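
As a rough illustration of environment-variable-gated instrumentation of this kind, here is a sketch that does not reproduce the actual code in semanal.py. Only the environment variable name comes from the commit message. The column names, the descriptor string format, and both helper functions are hypothetical.

```python
import csv
import os
import time
from typing import Callable, List, Optional, Tuple

ENV_VAR = "MYPY_TYPEFORM_PROFILE_FULL_PARSE"  # name taken from the commit message

# Hypothetical in-memory buffer of (descriptor, outcome, seconds) rows.
_rows: List[Tuple[str, str, float]] = []

def profiled_full_parse(
    descriptor: str, parse: Callable[[], Optional[object]]
) -> Optional[object]:
    """Run parse(); record timing and outcome only when profiling is enabled."""
    if not os.environ.get(ENV_VAR):
        return parse()  # fast path: no timing overhead when profiling is off
    start = time.perf_counter()
    result = parse()
    elapsed = time.perf_counter() - start
    _rows.append((descriptor, "OK" if result is not None else "FAIL", elapsed))
    return result

def flush_profile() -> None:
    """Write the buffered rows as a .tsv to the path named by the env var."""
    path = os.environ.get(ENV_VAR)
    if path:
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, delimiter="\t")
            writer.writerow(["descriptor", "outcome", "seconds"])  # hypothetical columns
            writer.writerows(_rows)
```

Keeping the check on the environment variable at the top of the wrapper means an unprofiled run pays only one dictionary lookup per full parse.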

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…s_type_expression()

Add a fast-rejection filter to SemanticAnalyzer.try_parse_as_type_expression():
a string literal that is an identifier naming a Var whose declared type is a
concrete Instance (and is not a typing special form) is a value -- a local,
parameter, or module-level constant -- never a type expression. Reject it
before the expensive full-parse block (expr_to_analyzed_type +
isolated_error_analysis).
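
The shape of this filter can be sketched with toy stand-ins. The classes below are not mypy's real Var or Instance, the special-form set is abbreviated and hypothetical, and the isinstance test on node.type stands in for the get_proper_type(node.type) call in the actual change.

```python
from dataclasses import dataclass
from typing import Optional

# Toy stand-ins, NOT mypy's real classes.
@dataclass
class Instance:
    type_name: str

@dataclass
class Var:
    fullname: str
    type: Optional[Instance] = None

# Hypothetical, abbreviated set; the real helper recognizes more special forms.
SPECIAL_FORMS = {"typing.Self", "typing_extensions.Self"}

def var_is_typing_special_form(var: Var) -> bool:
    """Toy version of the guard that keeps special forms such as Self parseable."""
    return var.fullname in SPECIAL_FORMS

def is_value_var(node: object) -> bool:
    """True if node names a plain value and can be rejected without a full parse."""
    return (
        isinstance(node, Var)                     # cheap conjunct first
        and isinstance(node.type, Instance)       # stand-in for get_proper_type(...)
        and not var_is_typing_special_form(node)  # do not reject a stringified 'Self'
    )

doc_var = Var("builtins.__doc__", Instance("builtins.str"))
self_var = Var("typing.Self", Instance("typing._SpecialForm"))
untyped_var = Var("m.x", None)

print(is_value_var(doc_var))      # True: a value, reject early
print(is_value_var(self_var))     # False: special form, must reach the full parse
print(is_value_var(untyped_var))  # False: no concrete Instance type known
```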

On the mypy self-check this filter rejects 157 of the 381 identifier-string
literals that currently reach the full-parse block (e.g. "__doc__", "__name__",
enum/constant members like "ROUND_DOWN", "GEN_CREATED"), all of which were
failing full parses -- pure wasted work.

Insertion point chosen empirically. The filter is placed AFTER the existing
PlaceholderNode and unbound-tvar checks rather than before them. Its only
expensive conjunct (get_proper_type(node.type)) runs solely for Var nodes, and
all Var nodes already survive both earlier checks, so position cannot change how
often the expensive part runs -- only how often the cheap isinstance(node, Var)
conjunct is evaluated. Because 951 of the identifier-strings reaching this block
are unbound type variables, evaluating the filter before the tvar check would
force an extra isinstance onto ~951 nodes it can never catch. perf_compare.py
(--metric cpu --workers1 --num-runs 100) confirms: paired median vs baseline was
-15.6ms +/-4.6 here, vs -12.9ms (before placeholder) and -9.3ms (before tvar) --
matching the eval-count model's predicted ordering.

Also add typing.Self / typing_extensions.Self to var_is_typing_special_form().
Self is a _SpecialForm-typed Var, so without this guard the new filter would
wrongly reject a stringified "Self" type annotation (regressing
testSelfRecognizedInOtherSyntacticLocations). This guard is a correctness
prerequisite of the filter and is committed together with it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…frequency

Move the unbound-type-variable check ahead of the PlaceholderNode check (and
ahead of the Var-value filter added in the previous commit) in
SemanticAnalyzer.try_parse_as_type_expression(). Final order:

    unbound type variable  ->  value Var  ->  placeholder

These three checks are mutually exclusive -- a node is at most one of a
TypeVarExpr/ParamSpecExpr, a Var, or a PlaceholderNode -- so reordering cannot
change which expressions are rejected (verified: check-typeform and testsemanal
unchanged). Ordering them by descending rejection frequency, as measured on
mypy's self-check (unbound type vars ~951 >> value Vars ~157 > placeholders
~23), lets the commonest rejections exit first and minimizes total check
evaluations (~2700 -> ~1750 cheap isinstance calls over the self-check).
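
A simplified version of this evaluation-count reasoning can be sketched as follows. The model counts one unit per check and covers only the three rejected node kinds, so it does not reproduce the ~2700 -> ~1750 figures above, which presumably also account for nodes that pass every check. The function and dictionary names are illustrative.

```python
from typing import Dict, Sequence

# Approximate per-self-check rejection counts quoted in this commit message.
REJECTIONS: Dict[str, int] = {"unbound_tvar": 951, "value_var": 157, "placeholder": 23}

def checks_evaluated(order: Sequence[str], counts: Dict[str, int] = REJECTIONS) -> int:
    """Total checks run when each node exits at the first check matching its kind."""
    return sum(counts[kind] * (position + 1) for position, kind in enumerate(order))

# Order before this commit, as implied by the previous commit message.
old_order = ["placeholder", "unbound_tvar", "value_var"]
# Order after this commit: descending rejection frequency.
new_order = ["unbound_tvar", "value_var", "placeholder"]

print(checks_evaluated(old_order))  # 2396
print(checks_evaluated(new_order))  # 1334
```

Because the checks are mutually exclusive, sorting them by descending rejection count minimizes this total: the most common kind pays for one check, the next for two, and so on.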

The win is below perf_compare's noise floor on its own (~10us), but the
reordering is free and behavior-preserving, and it makes the final ordering
self-documenting. A rationale comment is added at the head of the block.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ression()

Add a second fast-rejection filter: a string literal that is an identifier
naming a FuncDef, OverloadedFuncDef, or MypyFile is a function or module, never
a type expression. Reject it before the expensive full-parse block.

On mypy's self-check this rejects 48 of the identifier-string literals that
reach the full-parse block (e.g. builtin functions like "classmethod",
"staticmethod", "hash"; user functions; module names like "platform"), all of
which were failing full parses.

Unlike the Var-value filter, this check is a single isinstance with no expensive
follow-on work, and FuncDef/OverloadedFuncDef/MypyFile are mutually exclusive
with the other early-reject node kinds, so it is freely positionable and its
rejection count is order-independent. It is placed by descending rejection
frequency: after the Var-value filter (~157) and before the placeholder check
(~23), i.e. the final order is

    unbound type variable (~951) -> value Var (~157)
        -> function/module (~48) -> placeholder (~23)

No companion guard is needed (a function or module name is never a valid type,
so nothing valid is rejected; check-typeform and testsemanal unchanged).
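
The final ordering can be pictured as a chain of early returns. This is a sketch with empty toy classes rather than mypy's real node types, and it collapses each check to a bare isinstance. The actual type-variable and Var checks carry extra conditions (unboundness, a concrete Instance type, the special-form guard) that are omitted here. The function name is hypothetical.

```python
from typing import Optional

# Empty toy stand-ins for mypy node kinds; NOT mypy's real classes.
class TypeVarExpr: ...
class ParamSpecExpr: ...
class Var: ...
class FuncDef: ...
class OverloadedFuncDef: ...
class MypyFile: ...
class PlaceholderNode: ...
class TypeInfo: ...  # a class definition; may name a real type, so never rejected here

def early_reject_reason(node: object) -> Optional[str]:
    """Return why node is rejected before the full parse, or None to continue."""
    if isinstance(node, (TypeVarExpr, ParamSpecExpr)):        # ~951 per self-check
        return "unbound type variable"
    if isinstance(node, Var):                                 # ~157
        return "value Var"
    if isinstance(node, (FuncDef, OverloadedFuncDef, MypyFile)):  # ~48
        return "function/module"
    if isinstance(node, PlaceholderNode):                     # ~23
        return "placeholder"
    return None  # falls through to the expensive full-parse block

print(early_reject_reason(FuncDef()))   # function/module
print(early_reject_reason(MypyFile()))  # function/module
print(early_reject_reason(TypeInfo()))  # None
```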

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions Bot commented Jun 5, 2026

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
