Readability error in single feed causes the entire update job to fail #3102

Open
3 tasks done
Ellpeck opened this issue Mar 3, 2025 · 0 comments

Ellpeck commented Mar 3, 2025

IMPORTANT

Read and tick the following checkbox after you have created the issue or place an x inside the brackets ;)

  • I have read the CONTRIBUTING.md and followed the provided tips
  • I accept that the issue will be closed without comment if I do not check here
  • I accept that the issue will be closed without comment if I do not fill out all items in the issue template.

Explain the Problem

I'm subscribed to one specific feed (https://taz.de/Politik/!p4615;rss/) that has recently started to contain an oddly formatted item. The Readability library fails on it with a TypeError (it is handed a DOMCdataSection node that it does not accept, see the trace below), and the uncaught error makes the entire feed update job fail, so none of my other feeds are updated either.

Other taz.de feeds I'm subscribed to still work fine, so the problem appears to be specific to an item currently in this feed rather than to taz.de feeds in general.
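To illustrate the behavior that would be expected instead, here is a minimal sketch in Python (the News app itself is PHP, so this is not its code). The names `update_all` and `fake_fetch` and the example.org URLs are hypothetical. The idea is only that a failure in one feed is recorded and skipped, so the remaining feeds still update. Note that in PHP, TypeError is an Error rather than an Exception, so an equivalent per-feed handler there would have to catch \Throwable to cover this case; whether the updater already has such a handler is not established by this report.

```python
import logging
from typing import Callable, Dict, Iterable

logger = logging.getLogger("feed_update_sketch")


def update_all(feed_urls: Iterable[str], fetch: Callable[[str], None]) -> Dict[str, str]:
    """Try to update every feed; return a map of failed URL -> error summary."""
    failures: Dict[str, str] = {}
    for url in feed_urls:
        try:
            fetch(url)
        except Exception as exc:  # broad on purpose: one bad feed must not stop the loop
            logger.warning("Skipping feed %s: %s", url, exc)
            failures[url] = f"{type(exc).__name__}: {exc}"
    return failures


updated = []


def fake_fetch(url: str) -> None:
    """Hypothetical stand-in for the real fetcher; fails only for the feed from this report."""
    if url == "https://taz.de/Politik/!p4615;rss/":
        raise TypeError("DOMCdataSection given")
    updated.append(url)


feeds = [
    "https://example.org/a.xml",  # placeholder feed
    "https://taz.de/Politik/!p4615;rss/",
    "https://example.org/b.xml",  # placeholder feed
]
failures = update_all(feeds, fake_fetch)
```

With this structure, `updated` contains both placeholder feeds and `failures` contains only the taz.de feed, which is the outcome the update job would ideally produce.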

Updating just the failing feed with the update-feed subcommand throws the following exception. The same exception also appears in the log every hour, which is the interval I have set for feed updates through the background job system:

An unhandled exception has been thrown:
TypeError: fivefilters\Readability\Nodes\DOM\DOMNodeList::add(): Argument #1 ($node) must be of type fivefilters\Readability\Nodes\DOM\DOMNode|fivefilters\Readability\Nodes\DOM\DOMElement|fivefilters\Readability\Nodes\DOM\DOMText|fivefilters\Readability\Nodes\DOM\DOMComment|fivefilters\Readability\Nodes\DOM\DOMProcessingInstruction, fivefilters\Readability\Nodes\DOM\DOMCdataSection given, called in /var/www/html/apps/news/vendor/fivefilters/readability.php/src/Nodes/NodeUtility.php on line 162 and defined in /var/www/html/apps/news/vendor/fivefilters/readability.php/src/Nodes/DOM/DOMNodeList.php:45
Stack trace:
#0 /var/www/html/apps/news/vendor/fivefilters/readability.php/src/Nodes/NodeUtility.php(162): fivefilters\Readability\Nodes\DOM\DOMNodeList->add(Object(fivefilters\Readability\Nodes\DOM\DOMCdataSection))
#1 /var/www/html/apps/news/vendor/fivefilters/readability.php/src/Nodes/NodeTrait.php(353): fivefilters\Readability\Nodes\NodeUtility::filterTextNodes(Object(DOMNodeList))
#2 /var/www/html/apps/news/vendor/fivefilters/readability.php/src/Readability.php(917): fivefilters\Readability\Nodes\DOM\DOMElement->hasSingleTagInsideElement('p')
#3 /var/www/html/apps/news/vendor/fivefilters/readability.php/src/Readability.php(178): fivefilters\Readability\Readability->getNodes(Object(fivefilters\Readability\Nodes\DOM\DOMElement))
#4 /var/www/html/apps/news/lib/Scraper/Scraper.php(78): fivefilters\Readability\Readability->parse('<!DOCTYPE html>...')
#5 /var/www/html/apps/news/lib/Fetcher/FeedFetcher.php(191): OCA\News\Scraper\Scraper->scrape('https://taz.de/...')
#6 /var/www/html/apps/news/lib/Service/FeedServiceV2.php(320): OCA\News\Fetcher\FeedFetcher->fetch('https://taz.de/...', true, NULL, NULL, 'Sat, 01 Mar 202...')
#7 /var/www/html/apps/news/lib/Command/Updater/UpdateFeed.php(58): OCA\News\Service\FeedServiceV2->fetch(Object(OCA\News\Db\Feed))
#8 /var/www/html/3rdparty/symfony/console/Command/Command.php(326): OCA\News\Command\Updater\UpdateFeed->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#9 /var/www/html/3rdparty/symfony/console/Application.php(1078): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#10 /var/www/html/3rdparty/symfony/console/Application.php(324): Symfony\Component\Console\Application->doRunCommand(Object(OCA\News\Command\Updater\UpdateFeed), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#11 /var/www/html/3rdparty/symfony/console/Application.php(175): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#12 /var/www/html/lib/private/Console/Application.php(187): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#13 /var/www/html/console.php(87): OC\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput))
#14 /var/www/html/occ(11): require_once('/var/www/html/c...')
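
The error message above implies that the library's DOMCdataSection class is not a subclass of any of the node types that DOMNodeList::add() accepts, since PHP would otherwise have allowed it. The following Python sketch mimics that situation with made-up classes (`StrictNodeList`, `copy_nodes`, and the small node classes are all hypothetical and are not the readability.php implementation). It shows a type-restricted list rejecting a CDATA node, and the same input passing once the CDATA type is on the accepted list. Whether widening the accepted types is the right fix for the library is an open question.

```python
from dataclasses import dataclass


@dataclass
class Element:
    tag: str


@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class CdataSection:
    # Deliberately unrelated to Text, mirroring what the error message implies.
    data: str


class StrictNodeList:
    """List that only accepts nodes of explicitly allowed classes."""

    def __init__(self, allowed):
        self.allowed = tuple(allowed)
        self.items = []

    def add(self, node):
        if not isinstance(node, self.allowed):
            raise TypeError(f"{type(node).__name__} given")
        self.items.append(node)


def copy_nodes(children, allowed):
    """Copy every child into a StrictNodeList; raises on the first rejected node."""
    result = StrictNodeList(allowed)
    for child in children:
        result.add(child)
    return result


children = [Text("Intro "), CdataSection("raw <b>markup</b>"), Comment("note")]

# Without CdataSection in the allowed types, the copy fails on the second child.
try:
    copy_nodes(children, allowed=(Element, Text, Comment))
    error = None
except TypeError as exc:
    error = str(exc)

# With CdataSection allowed, the same input is accepted.
tolerant = copy_nodes(children, allowed=(Element, Text, Comment, CdataSection))
```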

Steps to Reproduce

Explain what you did to encounter the issue

  1. Add the erroring feed (link above)
  2. Wait for the update job to run; no feeds are updated at all
  3. Check the logs for the exception
  4. :( sad times

System Information

  • News app version: 25.2.1
  • Nextcloud version: 31.0.0
  • Cron type: system cron
  • PHP version: 8.3.17
  • Database and version: PostgreSQL 15.12 (Ubuntu 15.12-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
  • Browser and version: n/a
  • OS and version: Linux 6.8.0-54-generic x86_64
Contents of nextcloud/data/nextcloud.log

Each log entry for the error described above is already extremely long in its raw format, so here is a single one of them: https://gist.github.com/Ellpeck/9134bde37bfb4c6748de2e29f3bdded2

Contents of Browser Error Console

Read http://ggnome.com/wiki/Using_The_Browser_Error_Console if you are unsure what to put here

n/a
@Ellpeck Ellpeck added the bug label Mar 3, 2025
@Grotax Grotax added help wanted starter issue API Impact API/Backend code labels Mar 3, 2025