Commit 1f3fe93eff6 for php.net

commit 1f3fe93eff69f994be72ba47e143a0f6d01279e4
Author: Heran Yang <heran55@126.com>
Date:   Fri Dec 12 10:58:37 2025 +0800

    Add GB18030-2022 to default encoding list for zh-CN (#20604)

    GB18030-2022 is the current official Chinese national standard; it supersedes the 2005 and 2000 editions. Supporting it matters for modern Chinese text processing for three reasons:

        1. Superset Relationship: GB18030 is a superset of CP936 (GBK) and EUC-CN (GB2312). Using GB18030 as a detection target covers every character in these older encodings and adds support for a much wider range of characters.
        2. Extended Character Coverage: The 2022 edition covers over 87,000 characters. It adds support for CJK Extensions (C, D, E, F, G) and remaps rare characters that the 2005 edition assigned to the Private Use Area (PUA). This is critical for correctly handling names that contain rare characters (e.g., in banking or government data).
        3. Backward Compatibility: Adding GB18030-2022 to the list is safe, because files encoded in EUC-CN or CP936 are also valid GB18030 streams.

    This PR appends GB18030-2022 to the default encoding list for zh-CN (Simplified Chinese).
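
    The backward-compatibility point above can be checked with a small sketch. It is illustrative only and not part of this PR: the sample string and variable names are assumptions, and it uses the long-standing 'GB18030' encoding name rather than 'GB18030-2022' so that it also runs on PHP builds that predate the 2022 tables.

    ```php
    <?php
    // Hypothetical sample text: the two characters U+4E2D U+6587.
    $utf8 = "\u{4E2D}\u{6587}";

    // Encode the sample as EUC-CN (GB2312), the oldest of the three encodings.
    $eucCnBytes = mb_convert_encoding($utf8, 'EUC-CN', 'UTF-8');

    // The same byte string should validate under each wider encoding.
    $validAsCp936   = mb_check_encoding($eucCnBytes, 'CP936');
    $validAsGb18030 = mb_check_encoding($eucCnBytes, 'GB18030');

    // Decoding the EUC-CN bytes as GB18030 should give back the original text.
    $decoded = mb_convert_encoding($eucCnBytes, 'UTF-8', 'GB18030');

    var_dump(bin2hex($eucCnBytes), $validAsCp936, $validAsGb18030, $decoded === $utf8);
    ```

    Because the byte sequences are shared, a detector that lists EUC-CN and CP936 first keeps reporting those names for legacy files; the GB18030-2022 entry only comes into play for input the earlier entries reject.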

diff --git a/NEWS b/NEWS
index af1bb92003d..f2e81284bdf 100644
--- a/NEWS
+++ b/NEWS
@@ -25,6 +25,7 @@ PHP                                                                        NEWS
 - Mbstring:
   . ini_set() with mbstring.detect_order changes the order of mb_detect_order
     as intended, since mbstring.detect_order is an INI_ALL setting. (tobee94)
+  . Added GB18030-2022 to default encoding list for zh-CN. (HeRaNO)

 - Opcache:
   . Fixed bug GH-20051 (apache2 shutdowns when restart is requested during
diff --git a/UPGRADING.INTERNALS b/UPGRADING.INTERNALS
index e35557e71f4..71b6e1fd01e 100644
--- a/UPGRADING.INTERNALS
+++ b/UPGRADING.INTERNALS
@@ -64,6 +64,9 @@ PHP 8.6 INTERNALS UPGRADE NOTES
   . Removed the XML_GetCurrentByteCount() libxml compatibility wrapper,
     as it was unused and could return the wrong result.

+- ext/mbstring:
+  . Added GB18030-2022 to default encoding list for zh-CN.
+
 ========================
 4. OpCode changes
 ========================
diff --git a/ext/mbstring/mbstring.c b/ext/mbstring/mbstring.c
index 8f94e50ddc8..7422c600284 100644
--- a/ext/mbstring/mbstring.c
+++ b/ext/mbstring/mbstring.c
@@ -116,7 +116,8 @@ static const enum mbfl_no_encoding php_mb_default_identify_list_cn[] = {
 	mbfl_no_encoding_ascii,
 	mbfl_no_encoding_utf8,
 	mbfl_no_encoding_euc_cn,
-	mbfl_no_encoding_cp936
+	mbfl_no_encoding_cp936,
+	mbfl_no_encoding_gb18030_2022
 };

 static const enum mbfl_no_encoding php_mb_default_identify_list_tw_hk[] = {
diff --git a/ext/mbstring/tests/zh_CN_default_encodings.phpt b/ext/mbstring/tests/zh_CN_default_encodings.phpt
new file mode 100644
index 00000000000..213c304b52c
--- /dev/null
+++ b/ext/mbstring/tests/zh_CN_default_encodings.phpt
@@ -0,0 +1,24 @@
+--TEST--
+Default encodings in Simplified Chinese
+--EXTENSIONS--
+mbstring
+--INI--
+mbstring.language=Simplified Chinese
+--FILE--
+<?php
+var_dump(mb_detect_order());
+
+?>
+--EXPECT--
+array(5) {
+  [0]=>
+  string(5) "ASCII"
+  [1]=>
+  string(5) "UTF-8"
+  [2]=>
+  string(6) "EUC-CN"
+  [3]=>
+  string(5) "CP936"
+  [4]=>
+  string(12) "GB18030-2022"
+}