Duplicate article across feeds #3081

Open
3 tasks done
SethFalco opened this issue Feb 5, 2025 · 0 comments

SethFalco commented Feb 5, 2025

IMPORTANT

Read and tick the following checkbox after you have created the issue or place an x inside the brackets ;)

  • I have read the CONTRIBUTING.md and followed the provided tips
  • I accept that the issue will be closed without comment if I do not check here
  • I accept that the issue will be closed without comment if I do not fill out all items in the issue template.

Explain the Problem

I'm subscribed to multiple news feeds from the BBC:

However, frequently the same article appears in more than one of these feeds.

For example, here is an identical record that appears in both technology and business right now (every field is identical, including the URL):

<item>
<title>Google lifts ban on using AI for weapons</title>
<description>
The tech giant has updated the principles governing its development of artificial intelligence.
</description>
<link>https://www.bbc.com/news/articles/cy081nqx2zjo</link>
<guid isPermaLink="false">https://www.bbc.com/news/articles/cy081nqx2zjo#0</guid>
<pubDate>Wed, 05 Feb 2025 00:43:40 GMT</pubDate>
<media:thumbnail width="240" height="135" url="https://ichef.bbci.co.uk/ace/standard/240/cpsprodpb/9489/live/6de4ebf0-e351-11ef-a08f-756c6bc158bd.jpg"/>
</item>

My understanding is that Nextcloud News generates a GUID/hash for each article to avoid presenting duplicates to the user. This logic doesn't appear to apply across multiple feeds, however.

Steps to Reproduce

Explain what you did to encounter the issue

  1. Subscribe to http://feeds.bbci.co.uk/news/business/rss.xml
  2. Subscribe to http://feeds.bbci.co.uk/news/technology/rss.xml
  3. Sync feeds
  4. Ensure the All articles view is selected
  5. Scroll to see duplicates (if you don't see any, add other BBC feeds; a duplicate is bound to show up)

System Information

  • News app version: 25.2.0
  • Nextcloud version: Nextcloud Hub 8 (29.0.0)
  • Cron type: system cron
  • PHP version: 8.2.18
  • Database and version: mysql 10.6.17
  • Browser and version: Mozilla Firefox Flatpak 134.0.2 (64-bit)
  • OS and version: Debian GNU/Linux 12 (bookworm) x86_64
Contents of nextcloud/data/nextcloud.log

N/A: This is a logical/UX issue afaik.

Contents of Browser Error Console

Read http://ggnome.com/wiki/Using_The_Browser_Error_Console if you are unsure what to put here

N/A: No logs in console!

Proposal

Would it be feasible to check this hash globally, to see whether an article with the same hash has already been processed?

The article should still appear in each individual feed when that feed is accessed. These duplicates would only be hidden when viewing a category/tag that contains multiple feeds, or when viewing All articles.
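To illustrate the behaviour I have in mind, here is a minimal sketch in Python. It is not taken from the Nextcloud News codebase; the `Article` class and the `aggregate_view` and `feed_view` functions are hypothetical names made up for this example, and it assumes the article GUID is a sufficient key for spotting duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    feed: str   # name of the feed the article was fetched from (hypothetical field)
    guid: str   # GUID taken from the feed item
    title: str


def aggregate_view(articles):
    """Articles for a multi-feed view, keeping only the first occurrence of each GUID."""
    seen = set()
    unique = []
    for article in articles:
        if article.guid in seen:
            continue  # same GUID already shown via another feed
        seen.add(article.guid)
        unique.append(article)
    return unique


def feed_view(articles, feed):
    """Articles for a single feed; nothing is hidden here."""
    return [a for a in articles if a.feed == feed]


GUID = "https://www.bbc.com/news/articles/cy081nqx2zjo#0"
articles = [
    Article("BBC Technology", GUID, "Google lifts ban on using AI for weapons"),
    Article("BBC Business", GUID, "Google lifts ban on using AI for weapons"),
]

print(len(aggregate_view(articles)))             # one entry in All articles
print(len(feed_view(articles, "BBC Business")))  # still listed in its own feed
```

The point is only that de-duplication would be applied when a view is built, not when items are stored, so each feed keeps its own copy.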

Considerations

At the top of each article, the app states from {Source} (for example, from BBC Technology or from BBC Business). If there are duplicate articles, it may be worth thinking about how to handle this in the UI. Could it just use whichever feed it found first? Should it list all feeds the article appeared in (from BBC Technology and BBC Business)?
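As a rough sketch of the second option (listing every feed), the source label could be built by grouping feed names per GUID. This is again illustrative Python, not app code: `source_labels` is a hypothetical helper, and the label wording is only a suggestion.

```python
def source_labels(entries):
    """Map each GUID to a label naming every feed it appeared in.

    `entries` is an iterable of (guid, feed_name) pairs in the order found.
    """
    feeds_by_guid = {}
    for guid, feed in entries:
        feeds = feeds_by_guid.setdefault(guid, [])
        if feed not in feeds:  # avoid repeating a feed name
            feeds.append(feed)
    return {
        guid: "from " + " and ".join(feeds)
        for guid, feeds in feeds_by_guid.items()
    }


GUID = "https://www.bbc.com/news/articles/cy081nqx2zjo#0"
labels = source_labels([
    (GUID, "BBC Technology"),
    (GUID, "BBC Business"),
])
print(labels[GUID])
```

Using whichever feed was found first would amount to taking only the first name in each list.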

@SethFalco SethFalco added the bug label Feb 5, 2025