Commit 7de451fc3a5 for php.net

commit 7de451fc3a5811cad4639c5eca36218e40832ce2
Author: Ilia Alshanetsky <ilia@ilia.ws>
Date:   Mon Jun 1 15:28:20 2026 -0400

    uri: Do not copy and normalize already-normalized URIs for uri_parser_rfc3986 (#21726)

    When Uri\Rfc3986\Uri::parse() produces a URI already in canonical form
    (the common case: http/https URLs with no uppercase host, no
    percent-encoding in unreserved ranges, no ".." path segments),
    get_normalized_uri() no longer deep-copies the parsed struct and runs
    a full normalization pass. It calls uriNormalizeSyntaxMaskRequiredExA
    once to compute the dirty mask; a zero mask means we alias the raw
    uri. The struct caches the dirty mask, so multiple non-raw reads on
    the same instance only run the scan once.

    Fallback: when the mask is nonzero, we copy and normalize as before,
    but only for the flagged components (uriNormalizeSyntaxExMmA(...,
    dirty_mask, ...) instead of (..., -1, ...)).

    Measurements on a 17-URL mix with a realistic parse-and-read workload
    (10 runs of 1.7M parses each, CPU pinned via taskset, same-session
    stash-pop A/B so both builds share machine state):

                               baseline mean      optimized mean     delta
        parse only         0.3992s (4.26M/s)  0.4083s (4.16M/s)  noise
        parse + 1 read     0.6687s (2.54M/s)  0.5464s (3.11M/s)  -18.3%
        parse + 7 reads    0.8510s (2.00M/s)  0.7305s (2.33M/s)  -14.2%

    The "parse + 1 read" row isolates the first-read cost where this
    change lands. The "parse + 7 reads" row shows the amortized effect
    under a realistic user pattern: the first getter pays the reduced
    normalization cost, and the remaining six getters hit the cached
    normalized uri and cost the same as before.

    hyperfine cross-check on the whole benchmark script, 15 runs each:

        baseline   20.471 s +/- 1.052 s  [19.535 .. 22.985]
        optimized  17.240 s +/- 0.540 s  [16.556 .. 18.190]
        optimized runs 1.19 +/- 0.07 times faster.

    All 309 tests in ext/uri/tests pass. I checked that URIs needing
    normalization (http://EXAMPLE.com/A/%2e%2e/c resolving to /c) still
    hit the full normalize path through the nonzero dirty mask.

    Co-authored-by: Tim Düsterhus <tim@bastelstu.be>
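
The scan-then-alias pattern described in the message can be sketched outside of PHP and uriparser. This is a minimal illustration under stated assumptions, not the extension's code: the names toy_uris, toy_mask_required, toy_get_normalized, TOY_CLEAN and TOY_DIRTY_CASE are hypothetical, the "URI" is a plain C string, and the only normalization modeled is lowercasing the whole string (real RFC 3986 normalization is per-component and does more).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dirty-mask bits for this sketch; not uriparser's constants. */
enum { TOY_CLEAN = 0, TOY_DIRTY_CASE = 1u << 0 };

typedef struct {
    const char *raw;     /* parsed input, never modified */
    char *normalized;    /* allocated only when the mask is nonzero */
    unsigned int mask;   /* cached result of the read-only scan */
    bool initialized;    /* true once the scan has run */
    int scans;           /* instrumentation: number of scans performed */
} toy_uris;

/* Read-only scan: report which normalization steps would change the input. */
static unsigned int toy_mask_required(const char *s) {
    unsigned int mask = TOY_CLEAN;
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s)) {
            mask |= TOY_DIRTY_CASE;
        }
    }
    return mask;
}

/* Return the normalized form, aliasing the raw input when nothing is dirty. */
static const char *toy_get_normalized(toy_uris *u) {
    if (!u->initialized) {
        u->mask = toy_mask_required(u->raw);
        u->scans++;
        if (u->mask != TOY_CLEAN) {
            /* Slow path: copy, then apply only the flagged steps. */
            size_t len = strlen(u->raw);
            u->normalized = malloc(len + 1);
            if (u->normalized == NULL) {
                abort();
            }
            memcpy(u->normalized, u->raw, len + 1);
            if (u->mask & TOY_DIRTY_CASE) {
                for (char *p = u->normalized; *p != '\0'; p++) {
                    *p = (char)tolower((unsigned char)*p);
                }
            }
        }
        u->initialized = true;
    }
    /* Fast path: a zero mask means the raw input is already normalized. */
    return u->mask == TOY_CLEAN ? u->raw : u->normalized;
}
```

With a zero-initialized toy_uris whose raw field points at an all-lowercase string, repeated calls return the raw pointer itself and the scan runs once; with an input containing uppercase characters, the first call allocates and lowercases a copy, and later calls reuse it. The caller owns and frees the normalized buffer in this sketch.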

diff --git a/UPGRADING b/UPGRADING
index 2a32c2a5a36..95299dd5117 100644
--- a/UPGRADING
+++ b/UPGRADING
@@ -448,7 +448,10 @@ PHP 8.6 UPGRADE NOTES
   . Improved performance of str_split().

 - URI:
-  . Reduced allocations when reading RFC3986 IPv6/IPFuture hosts and paths.
+  . Reduced allocations when reading IPv6/IPFuture hosts and paths with
+    Uri\Rfc3986\Uri.
+  . Improved performance and memory consumption when using normalizing
+    (non-raw) getters on already-normalized URIs with Uri\Rfc3986\Uri.

 - Zip:
   . Avoid string copies in ZipArchive::addFromString().
diff --git a/ext/uri/uri_parser_rfc3986.c b/ext/uri/uri_parser_rfc3986.c
index ad47aa1946c..4e2c5656aa7 100644
--- a/ext/uri/uri_parser_rfc3986.c
+++ b/ext/uri/uri_parser_rfc3986.c
@@ -24,6 +24,7 @@
 struct php_uri_parser_rfc3986_uris {
 	UriUriA uri;
 	UriUriA normalized_uri;
+	unsigned int normalization_mask;
 	bool normalized_uri_initialized;
 };

@@ -84,12 +85,21 @@ ZEND_ATTRIBUTE_NONNULL static void copy_uri(UriUriA *new_uriparser_uri, const Ur

 ZEND_ATTRIBUTE_NONNULL static UriUriA *get_normalized_uri(php_uri_parser_rfc3986_uris *uriparser_uris) {
 	if (!uriparser_uris->normalized_uri_initialized) {
-		copy_uri(&uriparser_uris->normalized_uri, &uriparser_uris->uri);
-		int result = uriNormalizeSyntaxExMmA(&uriparser_uris->normalized_uri, (unsigned int)-1, mm);
-		ZEND_ASSERT(result == URI_SUCCESS);
+		int mask_result = uriNormalizeSyntaxMaskRequiredExA(&uriparser_uris->uri, &uriparser_uris->normalization_mask);
+		ZEND_ASSERT(mask_result == URI_SUCCESS);
+
+		if (uriparser_uris->normalization_mask != URI_NORMALIZED) {
+			copy_uri(&uriparser_uris->normalized_uri, &uriparser_uris->uri);
+			int result = uriNormalizeSyntaxExMmA(&uriparser_uris->normalized_uri, uriparser_uris->normalization_mask, mm);
+			ZEND_ASSERT(result == URI_SUCCESS);
+		}
 		uriparser_uris->normalized_uri_initialized = true;
 	}

+	if (uriparser_uris->normalization_mask == URI_NORMALIZED) {
+		return &uriparser_uris->uri;
+	}
+
 	return &uriparser_uris->normalized_uri;
 }