Skip to content

Reimplement String.prototype.search, replace, replaceAll, split #1870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 10, 2025

Conversation

andreabergia
Copy link
Contributor

@andreabergia andreabergia commented Apr 3, 2025

This PR adds implementation for:

  • RegExp.prototype[Symbol.search]
  • RegExp.prototype[Symbol.replace]
  • RegExp.prototype[Symbol.split]

and rewrites the implementation of:

  • String.prototype.search
  • String.prototype.replace
  • String.prototype.replaceAll
  • String.prototype.split

following the spec, in order to rely on the RegExp implementation when the spec says so.

A little more than a hundred test262 cases now pass. The few apparent regressions are related to things that are known to be broken and were already failing for other tests, such as rhino doing too much auto-boxing or the RegExp.prototype.flags implementation not being correct according to the spec. I am happy to discuss any such case in more details.

I have a couple of style decisions that I would like some feedback on:

  • There is a generic algorithm in the spec named GetSubstitution that is used by both String and RegExp. I have placed it in a new class named AbstractEcmaStringOperations but maybe it would be better in the existing AbstractEcmaObjectOperations.java?
  • I have removed some dead code from RegExpImpl, though I cannot remove it all or some old test cases in MozillaSuiteTest would break. However, this was part of the public API of rhino, so I am uncertain if removing it is the best choice. I don't really imagine anyone would be using this... but who knows. Any opinion?

I've made an effort to keep the git history very clean, so it's ok to not squash this.

@rbri
Copy link
Collaborator

rbri commented Apr 3, 2025

A little more than a hundred test262 cases now pass

❤️

I have removed some dead code from RegExpImpl, though I cannot remove it all or some old test cases in MozillaSuiteTest would break. However, this was part of the public API of rhino, so I am uncertain if removing it is the best choice. I don't really imagine anyone would be using this... but who knows. Any opinion?

Not sure but if you like you can have a look at a889fa44451ee6a0610820983cc72b53ac8bc42d. Maybe this is (was) one of the real usages of RegExpImpl. For HtmlUnit we tried to map the js RegExp stuff to real Java Regular Expressions (if possible). Therefore we did a lot of customization. I guess if we do not use the stuff you removed then there is a good chance that you can safely remove it.
And since the last release i have completely removed that stuff and we are back to use the Rhino RegExp because of the progress made in this area. Many thanks & kudos to @balajirrao and @andreabergia.

@andreabergia
Copy link
Contributor Author

I have removed some dead code from RegExpImpl, though I cannot remove it all or some old test cases in MozillaSuiteTest would break. However, this was part of the public API of rhino, so I am uncertain if removing it is the best choice. I don't really imagine anyone would be using this... but who knows. Any opinion?

Not sure but if you like you can have a look at a889fa44451ee6a0610820983cc72b53ac8bc42d. Maybe this is (was) one of the real usages of RegExpImpl. For HtmlUnit we tried to map the js RegExp stuff to real Java Regular Expressions (if possible). Therefore we did a lot of customization. I guess if we do not use the stuff you removed then there is a good chance that you can safely remove it.

Hm that makes me think we should not remove that code, though. It's just a few lines, won't really hurt to keep them in.

@rbri
Copy link
Collaborator

rbri commented Apr 3, 2025

It's just a few lines, won't really hurt to keep them in.

Maybe hurts the inlining....

@rbri
Copy link
Collaborator

rbri commented Apr 4, 2025

For me the cleanup is fine, extending RegExpImpl is a nightmare. I do not believe there are soo many projects out there doing this. And if then they might be able to do it because they have some good understanding what has to be done. Usually simpler code helps everybody.

@gbrail
Copy link
Collaborator

gbrail commented Apr 5, 2025

Side comment -- is "RegExpImpl" getting useful enough that it should be moved to a separate module that could theoretically be used by other projects? I used to always assume that we'd be better off with Java's native regex engine if we could figure it out and you all are showing me that the opposite may be true.

@gbrail
Copy link
Collaborator

gbrail commented Apr 5, 2025

Also, if someone is looking for work, put "convert RegExpImpl to lambdas" as a possible project!

@gbrail
Copy link
Collaborator

gbrail commented Apr 5, 2025

Finally, I will not lose too much sleep if we decide to change the public interface of RegExpImpl given, as @rbri says, it's super hard to extend and I doubt that anyone did. I'll let you guys decide and then take a more careful look at this before merging it. Happy to see it's happening!

@rbri
Copy link
Collaborator

rbri commented Apr 5, 2025

Side comment -- is "RegExpImpl" getting useful enough that it should be moved to a separate module that could theoretically be used by other projects? I used to always assume that we'd be better off with Java's native regex engine if we could figure it out and you all are showing me that the opposite may be true.

Have had something similar in mind and already discussed with @balajirrao in a different PR. Will open an issue for that and write down my point of view.

Issue: #1875

It is now implemented as `match` and `matchAll`, which means a few more
test262 cases now pass
Now follows correctly the spec, and therefore we have some more test262
cases passing.
Also implemented abstract operation `GetSubstitution`.
More test262 cases passing!
diff --git c/rhino/src/main/java/org/mozilla/javascript/AbstractEcmaStringOperations.java i/rhino/src/main/java/org/mozilla/javascript/AbstractEcmaStringOperations.java
new file mode 100644
index 000000000..5405ecf
--- /dev/null
+++ i/rhino/src/main/java/org/mozilla/javascript/AbstractEcmaStringOperations.java
@@ -0,0 +1,133 @@
+package org.mozilla.javascript;
+
+/** Abstract operations for string manipulation as defined by EcmaScript */
+public class AbstractEcmaStringOperations {
+    /**
+     * GetSubstitution(matched, str, position, captures, namedCaptures, replacementTemplate)
+     *
+     * <p>https://tc39.es/ecma262/multipage/text-processing.html#sec-getsubstitution
+     */
+    public static String getSubstitution(
+            Context cx,
+            Scriptable scope,
+            String matched,
+            String str,
+            int position,
+            NativeArray capturesArray,
+            Object namedCaptures,
+            String replacementTemplate) {
+        // See ECMAScript spec 22.1.3.19.1
+        int stringLength = str.length();
+        if (position > stringLength) Kit.codeBug();
+        StringBuilder result = new StringBuilder();
+        String templateRemainder = replacementTemplate;
+        while (!templateRemainder.isEmpty()) {
+            String ref = templateRemainder.substring(0, 1);
+            String refReplacement = ref;
+
+            if (templateRemainder.charAt(0) == '$') {
+                if (templateRemainder.length() > 1) {
+                    char c = templateRemainder.charAt(1);
+                    switch (c) {
+                        case '$':
+                            ref = "$$";
+                            refReplacement = "$";
+                            break;
+
+                        case '`':
+                            ref = "$`";
+                            refReplacement = str.substring(0, position);
+                            break;
+
+                        case '&':
+                            ref = "$&";
+                            refReplacement = matched;
+                            break;
+
+                        case '\'':
+                            {
+                                ref = "$'";
+                                int matchLength = matched.length();
+                                int tailPos = position + matchLength;
+                                refReplacement = str.substring(Math.min(tailPos, stringLength));
+                                break;
+                            }
+
+                        case '0':
+                        case '1':
+                        case '2':
+                        case '3':
+                        case '4':
+                        case '5':
+                        case '6':
+                        case '7':
+                        case '8':
+                        case '9':
+                            {
+                                int digitCount = 1;
+                                if (templateRemainder.length() > 2) {
+                                    char c2 = templateRemainder.charAt(2);
+                                    if (c2 == '0' || c2 == '1' || c2 == '2' || c2 == '3'
+                                            || c2 == '4' || c2 == '5' || c2 == '6' || c2 == '7'
+                                            || c2 == '8' || c2 == '9') {
+                                        digitCount = 2;
+                                    }
+                                }
+                                String digits = templateRemainder.substring(1, 1 + digitCount);
+
+                                // No need for ScriptRuntime version; we know the string is one or
+                                // two characters and
+                                // contains only [0-9]
+                                int index = Integer.parseInt(digits);
+                                long captureLen = capturesArray.getLength();
+                                if (index > captureLen && digitCount == 2) {
+                                    digitCount = 1;
+                                    digits = digits.substring(0, 1);
+                                    index = Integer.parseInt(digits);
+                                }
+                                ref = templateRemainder.substring(0, 1 + digitCount);
+                                if (1 <= index && index <= captureLen) {
+                                    Object capture = capturesArray.get(index - 1);
+                                    if (capture
+                                            == null) { // Undefined or missing are returned as null
+                                        refReplacement = "";
+                                    } else {
+                                        refReplacement = ScriptRuntime.toString(capture);
+                                    }
+                                } else {
+                                    refReplacement = ref;
+                                }
+                                break;
+                            }
+
+                        case '<':
+                            {
+                                int gtPos = templateRemainder.indexOf('>');
+                                if (gtPos == -1 || Undefined.isUndefined(namedCaptures)) {
+                                    ref = "$<";
+                                    refReplacement = ref;
+                                } else {
+                                    ref = templateRemainder.substring(0, gtPos + 1);
+                                    String groupName = templateRemainder.substring(2, gtPos);
+                                    Object capture =
+                                            ScriptRuntime.getObjectProp(
+                                                    namedCaptures, groupName, cx, scope);
+                                    if (Undefined.isUndefined(capture)) {
+                                        refReplacement = "";
+                                    } else {
+                                        refReplacement = ScriptRuntime.toString(capture);
+                                    }
+                                }
+                            }
+                            break;
+                    }
+                }
+            }
+
+            int refLength = ref.length();
+            templateRemainder = templateRemainder.substring(refLength);
+            result.append(refReplacement);
+        }
+        return result.toString();
+    }
+}
diff --git c/rhino/src/main/java/org/mozilla/javascript/Context.java i/rhino/src/main/java/org/mozilla/javascript/Context.java
index fd8ef6ebb..ca5f72d32 100644
--- c/rhino/src/main/java/org/mozilla/javascript/Context.java
+++ i/rhino/src/main/java/org/mozilla/javascript/Context.java
@@ -12,21 +12,20 @@ import java.beans.PropertyChangeEvent;
 import java.beans.PropertyChangeListener;
 import java.io.Closeable;
 import java.io.IOException;
 import java.io.PrintWriter;
 import java.io.Reader;
 import java.io.StringWriter;
 import java.io.Writer;
 import java.lang.reflect.InvocationTargetException;
 import java.lang.reflect.Method;
 import java.util.ArrayDeque;
-import java.util.Collection;
 import java.util.Deque;
 import java.util.EnumSet;
 import java.util.HashMap;
 import java.util.HashSet;
 import java.util.Locale;
 import java.util.Map;
 import java.util.Set;
 import java.util.TimeZone;
 import java.util.function.Consumer;
 import java.util.function.UnaryOperator;
diff --git c/rhino/src/main/java/org/mozilla/javascript/NativeString.java i/rhino/src/main/java/org/mozilla/javascript/NativeString.java
index 8aa19e9..0d2ff57 100644
--- c/rhino/src/main/java/org/mozilla/javascript/NativeString.java
+++ i/rhino/src/main/java/org/mozilla/javascript/NativeString.java
@@ -803,23 +803,98 @@ final class NativeString extends ScriptableObject {

         Object method = ScriptRuntime.getObjectElem(rx, SymbolKey.SEARCH, cx, scope);
         if (!(method instanceof Callable)) {
             throw ScriptRuntime.notFunctionError(rx, method, SymbolKey.SEARCH.getName());
         }
         return ((Callable) method).call(cx, scope, rx, new Object[] {s});
     }

     private static Object js_replace(
             Context cx, Scriptable scope, Scriptable thisObj, Object[] args) {
-        requireObjectCoercible(cx, thisObj, CLASS_NAME, "replace");
-        return ScriptRuntime.checkRegExpProxy(cx)
-                .action(cx, scope, thisObj, args, RegExpProxy.RA_REPLACE);
+        // See ECMAScript spec 22.1.3.19
+        Object o = requireObjectCoercible(cx, thisObj, CLASS_NAME, "replace");
+
+        if (cx.getLanguageVersion() <= Context.VERSION_1_8) {
+            // Use old algorithm for backward compatibility
+            return ScriptRuntime.checkRegExpProxy(cx)
+                    .action(cx, scope, thisObj, args, RegExpProxy.RA_REPLACE);
+        }
+
+        // Spec-compliant algorithm
+
+        Object searchValue = args.length > 0 ? args[0] : Undefined.instance;
+        Object replaceValue = args.length > 1 ? args[1] : Undefined.instance;
+
+        if (!Undefined.isUndefined(searchValue) && searchValue != null) {
+            Object replacer =
+                    ScriptRuntime.getObjectElem(searchValue, SymbolKey.REPLACE, cx, scope);
+            // If method is not undefined, it should be a Callable
+            if (replacer != null && !Undefined.isUndefined(replacer)) {
+                if (!(replacer instanceof Callable)) {
+                    throw ScriptRuntime.notFunctionError(
+                            searchValue, replacer, SymbolKey.SEARCH.getName());
+                }
+                return ((Callable) replacer)
+                        .call(
+                                cx,
+                                scope,
+                                ScriptRuntime.toObject(scope, searchValue),
+                                new Object[] {
+                                    o instanceof NativeString ? ((NativeString) o).string : o,
+                                    replaceValue
+                                });
+            }
+        }
+
+        String string = ScriptRuntime.toString(o);
+        String searchString = ScriptRuntime.toString(searchValue);
+        boolean functionalReplace = replaceValue instanceof Callable;
+        if (!functionalReplace) {
+            replaceValue = ScriptRuntime.toString(replaceValue);
+        }
+        int searchLength = searchString.length();
+        int position = string.indexOf(searchString);
+        if (position == -1) {
+            return string;
+        }
+        String preceding = string.substring(0, position);
+        String following = string.substring(position + searchLength);
+
+        String replacement;
+        if (functionalReplace) {
+            Scriptable callThis =
+                    ScriptRuntime.getApplyOrCallThis(cx, scope, null, 0, (Callable) replaceValue);
+
+            Object replacementObj =
+                    ((Callable) replaceValue)
+                            .call(
+                                    cx,
+                                    scope,
+                                    callThis,
+                                    new Object[] {
+                                        searchString, position, string,
+                                    });
+            replacement = ScriptRuntime.toString(replacementObj);
+        } else {
+            NativeArray captures = (NativeArray) cx.newArray(scope, 0);
+            replacement =
+                    AbstractEcmaStringOperations.getSubstitution(
+                            cx,
+                            scope,
+                            searchString,
+                            string,
+                            position,
+                            captures,
+                            Undefined.SCRIPTABLE_UNDEFINED,
+                            (String) replaceValue);
+        }
+        return preceding + replacement + following;
     }

     private static Object js_replaceAll(
             Context cx, Scriptable scope, Scriptable thisObj, Object[] args) {
         requireObjectCoercible(cx, thisObj, CLASS_NAME, "replaceAll");
         return ScriptRuntime.checkRegExpProxy(cx)
                 .action(cx, scope, thisObj, args, RegExpProxy.RA_REPLACE_ALL);
     }

     private static Object js_matchAll(
diff --git c/rhino/src/main/java/org/mozilla/javascript/regexp/NativeRegExp.java i/rhino/src/main/java/org/mozilla/javascript/regexp/NativeRegExp.java
index 5ce6393e6..6263c3e6c 100644
--- c/rhino/src/main/java/org/mozilla/javascript/regexp/NativeRegExp.java
+++ i/rhino/src/main/java/org/mozilla/javascript/regexp/NativeRegExp.java
@@ -5,20 +5,21 @@
  * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

 package org.mozilla.javascript.regexp;

 import java.io.Serializable;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import org.mozilla.javascript.AbstractEcmaObjectOperations;
+import org.mozilla.javascript.AbstractEcmaStringOperations;
 import org.mozilla.javascript.Callable;
 import org.mozilla.javascript.Constructable;
 import org.mozilla.javascript.Context;
 import org.mozilla.javascript.Function;
 import org.mozilla.javascript.IdFunctionObject;
 import org.mozilla.javascript.IdScriptableObject;
 import org.mozilla.javascript.Kit;
 import org.mozilla.javascript.NativeArray;
 import org.mozilla.javascript.NativeObject;
 import org.mozilla.javascript.ScriptRuntime;
@@ -3619,40 +3620,41 @@ public class NativeRegExp extends IdScriptableObject {
                 ++n;
             }
             NativeArray capturesArray = (NativeArray) cx.newArray(scope, captures.toArray());

             Object namedCaptures = ScriptRuntime.getObjectProp(result, "groups", cx, scope);
             String replacementString;

             if (functionalReplace) {
                 List<Object> replacerArgs = new ArrayList<>();
                 replacerArgs.add(matched);
-                replacerArgs.addAll(capturesArray);
+                replacerArgs.addAll(captures);
                 replacerArgs.add(position);
                 replacerArgs.add(s);
                 if (!Undefined.isUndefined(namedCaptures)) {
                     replacerArgs.add(namedCaptures);
                 }

                 Scriptable callThis =
                         ScriptRuntime.getApplyOrCallThis(
                                 cx, scope, null, 0, (Callable) replaceValue);
                 Object replacementValue =
                         ((Callable) replaceValue).call(cx, scope, callThis, replacerArgs.toArray());
                 replacementString = ScriptRuntime.toString(replacementValue);
             } else {
                 if (!Undefined.isUndefined(namedCaptures)) {
                     namedCaptures = ScriptRuntime.toObject(scope, namedCaptures);
                 }

+                NativeArray capturesArray = (NativeArray) cx.newArray(scope, captures.toArray());
                 replacementString =
-                        getSubstitution(
+                        AbstractEcmaStringOperations.getSubstitution(
                                 cx,
                                 scope,
                                 matched,
                                 s,
                                 position,
                                 capturesArray,
                                 namedCaptures,
                                 (String) replaceValue);
             }

@@ -3664,144 +3666,20 @@ public class NativeRegExp extends IdScriptableObject {
         }

         if (nextSourcePosition >= lengthS) {
             return accumulatedResult.toString();
         } else {
             accumulatedResult.append(s.substring(nextSourcePosition));
             return accumulatedResult.toString();
         }
     }

-    private static String getSubstitution(
-            Context cx,
-            Scriptable scope,
-            String matched,
-            String str,
-            int position,
-            NativeArray capturesArray,
-            Object namedCaptures,
-            String replacementTemplate) {
-        // See ECMAScript spec 22.1.3.19.1
-        int stringLength = str.length();
-        // TODO: assert(position <= stringLength)
-        StringBuilder result = new StringBuilder();
-        String templateRemainder = replacementTemplate;
-        while (!templateRemainder.isEmpty()) {
-            String ref = templateRemainder.substring(0, 1);
-            String refReplacement = ref;
-
-            if (templateRemainder.charAt(0) == '$') {
-                if (templateRemainder.length() > 1) {
-                    char c = templateRemainder.charAt(1);
-                    switch (c) {
-                        case '$':
-                            ref = "$$";
-                            refReplacement = "$";
-                            break;
-
-                        case '`':
-                            ref = "$`";
-                            refReplacement = str.substring(0, position);
-                            break;
-
-                        case '&':
-                            ref = "$&";
-                            refReplacement = matched;
-                            break;
-
-                        case '\'':
-                            {
-                                ref = "$'";
-                                int matchLength = matched.length();
-                                int tailPos = position + matchLength;
-                                refReplacement = str.substring(Math.min(tailPos, stringLength));
-                                break;
-                            }
-
-                        case '0':
-                        case '1':
-                        case '2':
-                        case '3':
-                        case '4':
-                        case '5':
-                        case '6':
-                        case '7':
-                        case '8':
-                        case '9':
-                            {
-                                int digitCount = 1;
-                                if (templateRemainder.length() > 2) {
-                                    char c2 = templateRemainder.charAt(2);
-                                    if (c2 == '0' || c2 == '1' || c2 == '2' || c2 == '3'
-                                            || c2 == '4' || c2 == '5' || c2 == '6' || c2 == '7'
-                                            || c2 == '8' || c2 == '9') {
-                                        digitCount = 2;
-                                    }
-                                }
-                                String digits = templateRemainder.substring(1, 1 + digitCount);
-
-                                // No need for ScriptRuntime version; we know the string is one or
-                                // two characters and
-                                // contains only [0-9]
-                                int index = Integer.parseInt(digits);
-                                long captureLen = capturesArray.getLength();
-                                if (index > captureLen && digitCount == 2) {
-                                    digitCount = 1;
-                                    digits = digits.substring(0, 1);
-                                    index = Integer.parseInt(digits);
-                                }
-                                ref = templateRemainder.substring(0, 1 + digitCount);
-                                if (1 <= index && index <= captureLen) {
-                                    Object capture = capturesArray.get(index - 1);
-                                    if (capture
-                                            == null) { // Undefined or missing are returned as null
-                                        refReplacement = "";
-                                    } else {
-                                        refReplacement = ScriptRuntime.toString(capture);
-                                    }
-                                } else {
-                                    refReplacement = ref;
-                                }
-                                break;
-                            }
-
-                        case '<':
-                            {
-                                int gtPos = templateRemainder.indexOf('>');
-                                if (gtPos == -1 || Undefined.isUndefined(namedCaptures)) {
-                                    ref = "$<";
-                                    refReplacement = ref;
-                                } else {
-                                    ref = templateRemainder.substring(0, gtPos + 1);
-                                    String groupName = templateRemainder.substring(2, gtPos);
-                                    Object capture =
-                                            ScriptRuntime.getObjectProp(
-                                                    namedCaptures, groupName, cx, scope);
-                                    if (Undefined.isUndefined(capture)) {
-                                        refReplacement = "";
-                                    } else {
-                                        refReplacement = ScriptRuntime.toString(capture);
-                                    }
-                                }
-                            }
-                            break;
-                    }
-                }
-            }
-
-            int refLength = ref.length();
-            templateRemainder = templateRemainder.substring(refLength);
-            result.append(refReplacement);
-        }
-        return result.toString();
-    }
-
     private static long getLastIndex(Context cx, Scriptable thisObj) {
         return ScriptRuntime.toLength(ScriptRuntime.getObjectProp(thisObj, "lastIndex", cx));
     }

     private static NativeRegExp realThis(Scriptable thisObj, IdFunctionObject f) {
         return realThis(thisObj, f.getFunctionName());
     }

     private static NativeRegExp realThis(Scriptable thisObj, String functionName) {
         return ensureType(thisObj, NativeRegExp.class, functionName);
diff --git c/tests/testsrc/test262.properties i/tests/testsrc/test262.properties
index c681d67..684bd3e 100644
--- c/tests/testsrc/test262.properties
+++ i/tests/testsrc/test262.properties
@@ -1401,47 +1401,45 @@ built-ins/Reflect 12/153 (7.84%)
     deleteProperty/return-boolean.js strict
     get/return-value-from-receiver.js
     ownKeys/return-on-corresponding-order-large-index.js
     set/call-prototype-property-set.js
     set/different-property-descriptors.js
     set/receiver-is-not-object.js
     set/return-abrupt-from-result.js
     set/return-false-if-receiver-is-not-writable.js
     set/return-false-if-target-is-not-writable.js

-built-ins/RegExp 1038/1854 (55.99%)
+built-ins/RegExp 1036/1854 (55.88%)
     CharacterClassEscapes 24/24 (100.0%)
     dotall 4/4 (100.0%)
     escape 20/20 (100.0%)
     match-indices/indices-array.js
     match-indices/indices-array-element.js
     match-indices/indices-array-matched.js
     match-indices/indices-array-non-unicode-match.js
     match-indices/indices-array-properties.js
     match-indices/indices-array-unicode-match.js
     match-indices/indices-array-unicode-property-names.js
     match-indices/indices-array-unmatched.js
     match-indices/indices-groups-object.js
     match-indices/indices-groups-object-undefined.js
     match-indices/indices-groups-object-unmatched.js
     match-indices/indices-groups-properties.js
     match-indices/indices-property.js
     named-groups/duplicate-names-match-indices.js
-    named-groups/duplicate-names-replace.js
     named-groups/duplicate-names-replaceall.js
     named-groups/duplicate-names-split.js
     named-groups/functional-replace-global.js
     named-groups/functional-replace-non-global.js
     named-groups/groups-object.js
     named-groups/groups-object-subclass.js
     named-groups/groups-object-subclass-sans.js
-    named-groups/groups-object-unmatched.js
     named-groups/groups-properties.js
     named-groups/lookbehind.js
     named-groups/non-unicode-property-names.js
     named-groups/non-unicode-property-names-valid.js
     named-groups/string-replace-escaped.js
     named-groups/string-replace-get.js
     named-groups/string-replace-missing.js
     named-groups/string-replace-nocaptures.js
     named-groups/string-replace-numbered.js
     named-groups/string-replace-unclosed.js
@@ -1786,21 +1784,21 @@ built-ins/Set 158/381 (41.47%)
     prototype/union/subclass-receiver-methods.js
     prototype/union/subclass-symbol-species.js
     prototype/union/union.js
     proto-from-ctor-realm.js
     valid-values.js

 built-ins/SetIteratorPrototype 0/11 (0.0%)

 ~built-ins/SharedArrayBuffer

-built-ins/String 49/1182 (4.15%)
+built-ins/String 46/1182 (3.89%)
     prototype/endsWith/return-abrupt-from-searchstring-regexp-test.js
     prototype/includes/return-abrupt-from-searchstring-regexp-test.js
     prototype/indexOf/position-tointeger-errors.js
     prototype/indexOf/position-tointeger-wrapped-values.js
     prototype/indexOf/searchstring-tostring-wrapped-values.js
     prototype/isWellFormed/to-string-primitive.js
     prototype/matchAll/flags-nonglobal-throws.js
     prototype/matchAll/flags-undefined-throws.js
     prototype/matchAll/regexp-matchAll-invocation.js
     prototype/matchAll/regexp-prototype-matchAll-invocation.js
@@ -1816,23 +1814,20 @@ built-ins/String 49/1182 (4.15%)
     prototype/replaceAll/searchValue-flags-toString-abrupt.js
     prototype/replaceAll/searchValue-get-flags-abrupt.js
     prototype/replaceAll/searchValue-isRegExp-abrupt.js
     prototype/replaceAll/searchValue-replacer-before-tostring.js
     prototype/replaceAll/searchValue-replacer-call.js
     prototype/replaceAll/searchValue-replacer-call-abrupt.js
     prototype/replaceAll/searchValue-replacer-method-abrupt.js
     prototype/replaceAll/searchValue-replacer-RegExp-call.js {unsupported: [class]}
     prototype/replaceAll/searchValue-replacer-RegExp-call-fn.js {unsupported: [class]}
     prototype/replaceAll/searchValue-tostring-regexp.js
-    prototype/replace/cstm-replace-get-err.js
-    prototype/replace/cstm-replace-invocation.js
-    prototype/replace/S15.5.4.11_A12.js non-strict
     prototype/search/cstm-search-invocation.js
     prototype/split/cstm-split-get-err.js
     prototype/split/cstm-split-invocation.js
     prototype/split/separator-regexp.js
     prototype/split/separator-tostring-error.js
     prototype/split/this-value-tostring-error.js
     prototype/startsWith/return-abrupt-from-searchstring-regexp-test.js
     prototype/substring/S15.5.4.15_A1_T5.js
     prototype/toLocaleLowerCase/Final_Sigma_U180E.js {unsupported: [u180e]}
     prototype/toLocaleLowerCase/special_casing_conditional.js
@@ -5292,27 +5287,25 @@ language/expressions/unsigned-right-shift 2/45 (4.44%)
     order-of-evaluation.js

 language/expressions/void 0/9 (0.0%)

 language/expressions/yield 4/63 (6.35%)
     rhs-omitted.js
     rhs-primitive.js
     star-return-is-null.js
     star-rhs-iter-nrml-next-invoke.js

-language/function-code 100/217 (46.08%)
+language/function-code 98/217 (45.16%)
     10.4.3-1-1-s.js non-strict
     10.4.3-1-10-s.js non-strict
-    10.4.3-1-100-s.js
-    10.4.3-1-100gs.js
-    10.4.3-1-102-s.js
-    10.4.3-1-102gs.js
+    10.4.3-1-102-s.js compiled
+    10.4.3-1-102gs.js compiled
     10.4.3-1-104.js strict
     10.4.3-1-106.js strict
     10.4.3-1-10gs.js non-strict
     10.4.3-1-11-s.js strict
     10.4.3-1-11gs.js strict
     10.4.3-1-12-s.js non-strict
     10.4.3-1-12gs.js non-strict
     10.4.3-1-14-s.js non-strict
     10.4.3-1-14gs.js non-strict
     10.4.3-1-16-s.js non-strict
Now follows correctly the spec, and therefore we have some more test262
cases passing.
@andreabergia andreabergia force-pushed the regexp-symbol-and-string-methods branch from 8ed166b to 3e71209 Compare April 8, 2025 06:48
@andreabergia
Copy link
Contributor Author

I've decided to drop the "dead code removal" from this PR; we can always discuss and do it in another. It was only a few lines anyway. I've also rebased it on top of master.
For me, this is ready to be merged. Can be squashed, or not, as you prefer.

Copy link
Collaborator

@gbrail gbrail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- this is indeed looking very nice and I'd like to try and be very careful about such a big change. I can see a place where we could improve code coverage -- could you please take a look?

ScriptRuntime.toInteger(
ScriptRuntime.getObjectProp(result, "index", cx, scope));
int position;
if (positionDbl < 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern (clamping the position to the length in some way) appears more than once in this code, and looking at the code coverage, we only test one of the three possible paths.

if (separatorLength == 0) {
int strLen = s.length();
long outLen;
if (lim < 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the second place where we have this kind of clamping code and we only test one of the three code paths. Would you consider either making a new function in ScriptRuntime for this generic behavior, or adding some artificial test cases so that we execute all three code paths? It's this kind of tricky stuff that, when not tested, leads to problems, because otherwise the code coverage is darn near 100% for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've extracted a function ScriptRuntime.clamp and added some unit tests for it.
The coverage really comes from test262, luckily. 🙂

char c2 = templateRemainder.charAt(2);
if (c2 == '0' || c2 == '1' || c2 == '2' || c2 == '3'
|| c2 == '4' || c2 == '5' || c2 == '6' || c2 == '7'
|| c2 == '8' || c2 == '9') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I don't think I "feel strongly" but it does feel wrong to me, because we're introducing five comparison instructions in the average case to what I imagine is a performance-sensitive bit of code. I would prefer a function that is implemented as a switch statement.

Copy link
Contributor Author

@andreabergia andreabergia Apr 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think this is particularly performance sensitive - it happens only when calling String.prototype.replace or replaceAll when the replacement is something like $23 (two digits). But I've done the change anyway. 🙂

@andreabergia andreabergia force-pushed the regexp-symbol-and-string-methods branch from ca67468 to 9439176 Compare April 9, 2025 07:21
@gbrail
Copy link
Collaborator

gbrail commented Apr 10, 2025

Thank you! I appreciate those small things.

@gbrail gbrail merged commit f6e5590 into mozilla:master Apr 10, 2025
3 checks passed
@andreabergia andreabergia deleted the regexp-symbol-and-string-methods branch April 10, 2025 06:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants