✨ Feature request: utility function to check if a string contains only alphanumeric characters #962

Closed
PaulRBerg opened this issue Jun 17, 2024 · 5 comments


@PaulRBerg

PaulRBerg commented Jun 17, 2024

Rationale

Onchain generation of NFT SVGs is on the rise. Many SVGs rely on third-party string data, e.g. ERC-20 symbols.

To sanitize strings and prevent XSS attacks, developers should only allow alphanumeric strings in the token symbol [1]. This should be enough, since the vast majority of tokens don't contain any special symbols.

It would thus be helpful to have a utility function in Solady for checking whether a string contains only alphanumeric characters.

Example Implementation

/// @notice Checks whether the provided string contains only alphanumeric characters and spaces.
/// @dev Note that this returns true for empty strings, but it is not a security concern.
function isAlphanumeric(string memory str) internal pure returns (bool) {
    // Convert the string to bytes to iterate over its characters.
    bytes memory b = bytes(str);

    uint256 length = b.length;
    for (uint256 i = 0; i < length; ++i) {
        bytes1 char = b[i];

        // Check if it's a space or an alphanumeric character.
        bool isSpace = char == 0x20; // space
        bool isDigit = char >= 0x30 && char <= 0x39; // 0-9
        bool isUppercase = char >= 0x41 && char <= 0x5A; // A-Z
        bool isLowercase = char >= 0x61 && char <= 0x7A; // a-z
        if (!(isSpace || isDigit || isUppercase || isLowercase)) {
            return false;
        }
    }
    return true;
}
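A minimal sketch of how such a check might be used when embedding a third-party symbol in an onchain SVG. The library name `SymbolSanitizer`, the function `safeSymbol`, and the placeholder string are hypothetical and chosen only for illustration; they are not part of Solady or of the proposal above.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.19;

/// @notice Hypothetical helper, for illustration only.
library SymbolSanitizer {
    /// @dev Same check as the example implementation: letters, digits, and spaces only.
    function isAlphanumeric(string memory str) internal pure returns (bool) {
        bytes memory b = bytes(str);
        uint256 length = b.length;
        for (uint256 i = 0; i < length; ++i) {
            bytes1 char = b[i];
            bool isSpace = char == 0x20; // space
            bool isDigit = char >= 0x30 && char <= 0x39; // 0-9
            bool isUppercase = char >= 0x41 && char <= 0x5A; // A-Z
            bool isLowercase = char >= 0x61 && char <= 0x7A; // a-z
            if (!(isSpace || isDigit || isUppercase || isLowercase)) {
                return false;
            }
        }
        return true;
    }

    /// @dev Returns `symbol` unchanged if it passes the check,
    /// otherwise a fixed placeholder (an assumed fallback policy).
    function safeSymbol(string memory symbol) internal pure returns (string memory) {
        if (!isAlphanumeric(symbol)) {
            return "Unsupported Symbol";
        }
        return symbol;
    }
}
```

Under these assumptions, a symbol such as `<script>` would be replaced by the placeholder before it reaches the SVG markup, while `DAI` or `Wrapped Ether` would pass through unchanged.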

Footnotes

  1. See, for example, finding M-01 in Sablier's recent audit contest on CodeHawks.

@PaulRBerg
Author

Alternatively, a utility function to check if a single character is alphanumeric would also be helpful:

/// @dev Like the string version above, this also accepts the space character.
function isAlphanumericChar(bytes1 char) internal pure returns (bool) {
    bool isSpace = char == 0x20; // space
    bool isDigit = char >= 0x30 && char <= 0x39; // 0-9
    bool isUppercaseLetter = char >= 0x41 && char <= 0x5A; // A-Z
    bool isLowercaseLetter = char >= 0x61 && char <= 0x7A; // a-z
    return isSpace || isDigit || isUppercaseLetter || isLowercaseLetter;
}

@Vectorized
Owner

Good feature request. Will add, thanks.

@atarpara
Collaborator

atarpara commented Jun 18, 2024

@PaulRBerg your request has been fulfilled; check it out.

@0xCLARITY

Personally, I think it's quite surprising that isAlphanumeric would return "true" for spaces - that seems quite counterintuitive.

Other examples of is_alphanumeric that I can find don't do that.

Maybe it would be better to have 2 functions? isAlphanumeric() and isAlphanumericWithSpaces()?
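A rough sketch of what that two-function split could look like. The library name `AlnumSketch` and the private helper `_check` are hypothetical, and this is not how Solady implements it.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.19;

/// @notice Hypothetical sketch of a strict check and a space-tolerant check.
library AlnumSketch {
    /// @dev True only if every byte is 0-9, A-Z, or a-z. Empty strings return true.
    function isAlphanumeric(string memory s) internal pure returns (bool) {
        return _check(s, false);
    }

    /// @dev Same as `isAlphanumeric`, but the space character (0x20) is also accepted.
    function isAlphanumericWithSpaces(string memory s) internal pure returns (bool) {
        return _check(s, true);
    }

    /// @dev Shared loop; `allowSpaces` toggles whether 0x20 is accepted.
    function _check(string memory s, bool allowSpaces) private pure returns (bool) {
        bytes memory b = bytes(s);
        uint256 length = b.length;
        for (uint256 i = 0; i < length; ++i) {
            bytes1 c = b[i];
            bool ok = (c >= 0x30 && c <= 0x39) // 0-9
                || (c >= 0x41 && c <= 0x5A) // A-Z
                || (c >= 0x61 && c <= 0x7A) // a-z
                || (allowSpaces && c == 0x20); // space, only if requested
            if (!ok) return false;
        }
        return true;
    }
}
```

With this split, `isAlphanumeric("Wrapped Ether")` would be false while `isAlphanumericWithSpaces("Wrapped Ether")` would be true, which matches the naming expectation raised above.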

Other Characters

If we are going to special case "spaces", I can also think of other characters that might deserve the same treatment.

I'm currently writing a contract that requires strings to contain only alphanumeric characters, underscores (_), or hyphens (-), since those are commonly considered valid characters for URL slugs. At the moment, I have something like this:

    function _validateUrlSafe(string calldata urlSlug) internal pure {
        // Check that the slug is no more than 16 bytes (which will be 16 characters assuming ASCII).
        uint256 length = bytes(urlSlug).length;
        if (length > 16) revert TooLong();

        // Check all characters are alphanumeric or hyphen/underscore.
        for (uint256 i = 0; i < length; i++) {
            bytes1 charCode = bytes(urlSlug)[i];

            // a-z, A-Z, hyphen, underscore
            if (
                (charCode > 0x60 && charCode < 0x7B) // a-z
                    || (charCode > 0x40 && charCode < 0x5B) // A-Z
                    || (charCode == 0x2D) // hyphen (-)
                    || (charCode == 0x5F) // underscore (_)
            ) continue;

            // numbers (0-9)
            if (charCode > 0x2F && charCode < 0x3A) continue;

            revert InvalidChar();
        }
    }
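One way to generalize beyond special-casing individual characters is to let the caller pass a set of allowed characters. The sketch below encodes that set as a 128-bit mask, one bit per 7-bit ASCII code. The library name `AllowedChars`, its constants, and `containsOnly` are all hypothetical names introduced here for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.19;

/// @notice Hypothetical sketch: bit `n` of a mask is set if ASCII code `n` is allowed.
library AllowedChars {
    uint128 internal constant DIGITS = 0x3FF << 48; // bits 48..57, '0'-'9'
    uint128 internal constant UPPERCASE = 0x3FFFFFF << 65; // bits 65..90, 'A'-'Z'
    uint128 internal constant LOWERCASE = 0x3FFFFFF << 97; // bits 97..122, 'a'-'z'
    uint128 internal constant HYPHEN = uint128(1) << 0x2D; // '-'
    uint128 internal constant UNDERSCORE = uint128(1) << 0x5F; // '_'

    uint128 internal constant ALPHANUMERIC = DIGITS | UPPERCASE | LOWERCASE;
    uint128 internal constant URL_SLUG = ALPHANUMERIC | HYPHEN | UNDERSCORE;

    /// @dev True if every byte of `s` is 7-bit ASCII and has its bit set in `allowed`.
    /// Empty strings return true.
    function containsOnly(string memory s, uint128 allowed) internal pure returns (bool) {
        bytes memory b = bytes(s);
        uint256 length = b.length;
        for (uint256 i = 0; i < length; ++i) {
            uint8 c = uint8(b[i]);
            // Bytes above 127 are outside 7-bit ASCII and can never be allowed.
            if (c > 127) return false;
            if (((allowed >> c) & 1) == 0) return false;
        }
        return true;
    }
}
```

Under these assumptions, the slug check above would reduce to a length check followed by `AllowedChars.containsOnly(urlSlug, AllowedChars.URL_SLUG)`, and a space-tolerant variant would only need one more bit in the mask.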

@Vectorized
Owner

Btw, there is also a LibString.escapeHTML(s).

@Vectorized changed the title from "Feature request: utility function to check if a string contains only alphanumeric characters" to "✨ Feature request: utility function to check if a string contains only alphanumeric characters" on Jun 19, 2024