html2text does not work with uppercase html tags #23

Open
@tobiase

Description

print (new Html2Text('<P>Test string</P>'))->getText();

prints nothing while

print (new Html2Text('<p>Test string</p>'))->getText();

prints Test string as expected.

The reason for that is in \voku\Html2Text\Html2Text::pregCallback.
$matches['element'] is initially converted to lowercase with

$element = \strtolower($matches['element']);

but the lowercase version is not used in the switch statement or to match headings:

protected function pregCallback(array $matches): string
{
    // init
    $element = \strtolower($matches['element']);

    switch ($matches['element']) { // Case sensitive
        case 'p':
            // Replace newlines with spaces.
            $para = \str_replace("\n", ' ', $matches['value']);

            // Add trailing newlines for this paragraph.
            return "\n\n" . $para . "\n\n";
        ...
        ...
    }

    // h1 - h6
    if (\preg_match('/h[123456]/', $matches['element'])) {  // Case sensitive
        return $this->convertElement($matches['value'], $matches['element']);
    }
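
A possible fix, sketched outside the library so that it runs on its own: normalize the element name once and use only the normalized value in both the switch and the heading check. The function name convertMatch is hypothetical, and the uppercase heading output is just a placeholder for whatever $this->convertElement does in the real class.

```php
<?php
// Hypothetical stand-in for pregCallback; this is not the library's code.
function convertMatch(array $matches): string
{
    // Normalize once, then use only $element below.
    $element = \strtolower($matches['element']);

    switch ($element) {
        case 'p':
            // Replace newlines with spaces.
            $para = \str_replace("\n", ' ', $matches['value']);

            // Add trailing newlines for this paragraph.
            return "\n\n" . $para . "\n\n";
    }

    // h1 - h6, anchored so that a name such as "h1x" does not match.
    if (\preg_match('/^h[1-6]$/', $element) === 1) {
        // Placeholder for the real heading conversion (assumption).
        return \strtoupper($matches['value']) . "\n\n";
    }

    // Same fallback as the original: unknown elements yield an empty string.
    return '';
}
```

With this change, convertMatch(['element' => 'P', 'value' => 'Test string']) and the same call with 'p' take the same branch.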

I don't understand why pregCallback returns an empty string as its last line.
Shouldn't the code always know how to handle the $matches['element'] that ends up in pregCallback?
Wouldn't it be better to throw an exception when there is no handler for $matches['element'],
or to return $matches['value'] ?? '' instead?
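
To make the two alternatives concrete, here is a minimal sketch of each. Both helper names are hypothetical and neither exists in the library; they only show what the last line of pregCallback could do instead of returning ''.

```php
<?php
// Hypothetical helpers illustrating the two proposed fallbacks.

/** Lenient: keep the inner text of an element that has no dedicated handler. */
function fallbackKeepValue(array $matches): string
{
    return $matches['value'] ?? '';
}

/** Strict: fail loudly so that a missing handler gets noticed. */
function fallbackThrow(array $matches): string
{
    $element = $matches['element'] ?? '(unknown)';

    throw new \UnexpectedValueException("No handler for element: {$element}");
}
```

The lenient variant would have turned this bug into slightly unformatted output rather than lost text, while the strict variant would have surfaced it immediately.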
