Bridge request for National Geographic #1029

Closed
5 of 10 tasks
tomislav opened this issue Feb 5, 2019 · 18 comments
Labels
Bridge-Request Request for a new bridge

Comments

@tomislav

tomislav commented Feb 5, 2019

Bridge request

Sadly, National Geographic doesn't have an RSS feed anymore. The bridge should get the most recent articles published on National Geographic.

Also, it would be nice to be able to specify a category you're interested in, e.g. "magazine" only.

General information

Get a feed of the most recent articles published on National Geographic.

  • How should the information be displayed/formatted?

Title
Lead image
Description

  • Which of the following parameters do you expect?

    • Title
    • URI (link to the original article)
    • Author
    • Timestamp
    • Content (the content of the article)
    • Enclosures (pictures, videos, etc...)
    • Categories (categories, tags, etc...)

Options

  • Limit number of returned items
    • Default limit: 5
  • Load full articles
    • Cache articles (articles are stored in a local cache on first request): yes
    • Cache timeout (max = 24 hours): 24 hours
  • Balance requests (RSS-Bridge uses cached versions to reduce bandwidth usage)
    • Timeout (default = 5 minutes, max = 24 hours): 5 minutes
@logmanoriginal logmanoriginal added the Bridge-Request Request for a new bridge label Mar 2, 2019
@logmanoriginal
Contributor

I quickly checked the site, which unfortunately returns no content if JavaScript is disabled. That makes it unusable for RSS-Bridge. However, they do provide an API, which can be used by individuals and open source projects: https://newsapi.org/s/national-geographic-api

They require an attribution link for their contents, which is reasonable and actually a desired outcome for RSS-Bridge as well. Generally, their terms sound reasonable to me. I don't have time to go further into it, but it sure looks like a feasible task to make a Bridge using their API (which is not limited to National Geographic it seems).

https://newsapi.org/pricing

I'm actually impressed 😮

@tomislav
Author

tomislav commented Mar 2, 2019

All the content that is displayed on their front page is embedded in the HTML as a JavaScript array/dictionary. Maybe it could be scraped with a regex?

@logmanoriginal
Contributor

You are right, it does contain the JSON data. Not sure how I missed that before. I went ahead and made a small bridge from the contents I could find, see #1065. Let me know if this is what you wanted. There are other endpoints from which contents can possibly be extracted (like the one I linked in the PR).
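
For reference, the regex idea discussed above could look roughly like the sketch below. The helper name `extractEmbeddedJson` and the variable name `__DATA__` are assumptions made for illustration; the thread does not say which variable the real page uses, and this is not the code from #1065.

```php
<?php
// Hypothetical helper: pulls a JSON object assigned to a JavaScript
// variable out of raw HTML. The variable name is an assumption.
function extractEmbeddedJson(string $html, string $variable): ?array
{
    // Match `window.<variable> = {...};` up to the closing script tag.
    $pattern = '/window\.' . preg_quote($variable, '/')
        . '\s*=\s*(\{.*?\})\s*;?\s*<\/script>/s';
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null; // nothing that looks like the embedded data
    }
    $data = json_decode($matches[1], true);
    return is_array($data) ? $data : null;
}

// Made-up page fragment standing in for the real front page.
$html = '<html><body><script>window.__DATA__ = '
    . '{"stories":[{"id":"a1","title":"Example"}]};</script></body></html>';
$data = extractEmbeddedJson($html, '__DATA__');
echo $data['stories'][0]['title'], PHP_EOL;
```

A regex like this is brittle if the page layout changes, so returning `null` instead of throwing leaves the caller free to produce an empty feed.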

@tomislav
Author

Thanks! Looks good to me.

About other endpoints, I think people would be most interested in getting a feed of articles published in the magazine: https://www.nationalgeographic.com/magazine/

@logmanoriginal
Contributor

I changed the bridge to build a feed off articles in the magazine. Please take a look.
How about including full articles? Currently the items in the feed have no content, because there is no content on the original page. Technically it's possible to collect each article, but that takes extra time on each request. Let me know what you think about that.
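
One way to limit that extra time is to cache each article body, as the issue's options request (24 hours). The sketch below is only an illustration: `ArticleCache` is a hypothetical in-memory class with an injected fetcher, not RSS-Bridge's own cache, and a real deployment would need to persist entries between requests.

```php
<?php
// Hypothetical in-memory cache keyed by URL; the fetcher is injected so
// the example runs without network access.
final class ArticleCache
{
    private $entries = [];
    private $timeout;

    public function __construct(int $timeoutSeconds)
    {
        $this->timeout = $timeoutSeconds;
    }

    // Returns the cached body while it is fresh, otherwise fetches again.
    public function get(string $url, callable $fetch, int $now): string
    {
        if (isset($this->entries[$url])
            && $now - $this->entries[$url]['time'] < $this->timeout) {
            return $this->entries[$url]['body'];
        }
        $body = $fetch($url);
        $this->entries[$url] = ['time' => $now, 'body' => $body];
        return $body;
    }
}

$calls = 0;
$fetch = function (string $url) use (&$calls): string {
    $calls++; // count how often the "network" is actually hit
    return '<p>Body of ' . $url . '</p>';
};

$cache = new ArticleCache(24 * 60 * 60); // 24 hours
$cache->get('https://example.com/a', $fetch, 0);     // fetched
$cache->get('https://example.com/a', $fetch, 3600);  // served from cache
$cache->get('https://example.com/a', $fetch, 90000); // expired, fetched again
echo $calls, PHP_EOL;
```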

@tomislav
Author

IMHO, there should still be a "latest stories" bridge, built off https://www.nationalgeographic.com/latest-stories/. That's where most of the articles and daily news are posted.

But it would be nice to have an additional "magazine only" bridge, for people who are interested only in the big stories.

I don't know if this requires two separate bridges?

About the full articles: I poked around with the web inspector and it seems doable; only the images would have to be extracted from the tags and rewritten so they work in RSS readers. Not sure how much of a hassle that is, but this is great as is.

@logmanoriginal
Contributor

Thanks for the feedback. I'll see if I can find some time this week to get it done.

> I don't know if this requires two separate bridges?

It's doable in a single bridge, using contexts:
https://github.com/RSS-Bridge/rss-bridge/wiki/const-PARAMETERS#level-1---context
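
As a rough illustration of the contexts idea, the array below is shaped like a bridge's `PARAMETERS` constant with one context per feed. The context names, the `full` option, and the `feedUrlForContext` helper are assumptions for this sketch; the actual bridge in #1065 may be organised differently (it ended up using a topic drop-down).

```php
<?php
// Shaped like an RSS-Bridge PARAMETERS array with two contexts.
// Context names and the 'full' option are illustrative assumptions.
const EXAMPLE_PARAMETERS = [
    'Latest stories' => [
        'full' => ['name' => 'Load full articles', 'type' => 'checkbox'],
    ],
    'Magazine' => [
        'full' => ['name' => 'Load full articles', 'type' => 'checkbox'],
    ],
];

// Hypothetical mapping from the selected context to the page to read.
function feedUrlForContext(string $context): string
{
    switch ($context) {
        case 'Latest stories':
            return 'https://www.nationalgeographic.com/latest-stories/';
        case 'Magazine':
            return 'https://www.nationalgeographic.com/magazine/';
        default:
            throw new InvalidArgumentException('Unknown context: ' . $context);
    }
}

echo feedUrlForContext('Magazine'), PHP_EOL;
```

In a real bridge the array would be declared as the class constant `PARAMETERS`, and the context chosen by the user would be read from the bridge instance rather than passed in as a string.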

> About the full articles: I poked around with the web inspector and it seems doable; only the images would have to be extracted from the tags and rewritten so they work in RSS readers. Not sure how much of a hassle that is, but this is great as is.

I suppose you mean images have relative links, right? (haven't checked yet)
This is easily solvable, using defaultLinkTo.
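
To show what rewriting relative links means in practice, here is a standalone sketch. `absolutizeImageSources` is a hypothetical stand-in written for this illustration; it is not the implementation of `defaultLinkTo`, and it only handles root-relative `src` attributes on `<img>` tags.

```php
<?php
// Hypothetical stand-in for a link-rewriting helper: prefixes
// root-relative src attributes on <img> tags with the site origin.
function absolutizeImageSources(string $html, string $origin): string
{
    $result = preg_replace_callback(
        // Group 1: everything up to the opening quote of src.
        // Group 2: a path starting with a single "/" (not "//").
        '/(<img\b[^>]*\bsrc=")(\/(?!\/)[^"]*)"/i',
        function (array $m) use ($origin): string {
            return $m[1] . rtrim($origin, '/') . $m[2] . '"';
        },
        $html
    );
    return $result ?? $html; // keep the input if the regex engine fails
}

// Made-up markup: one relative image, one already absolute.
$html = '<p><img src="/content/photo.jpg" alt="x">'
    . '<img src="https://cdn.example/a.jpg"></p>';
$fixed = absolutizeImageSources($html, 'https://www.nationalgeographic.com/');
echo $fixed, PHP_EOL;
```

Already absolute URLs are left untouched, so running the function twice on the same content does no harm.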

@logmanoriginal
Contributor

I've added most features. You can now select the topic from a drop-down list and choose to include the full article as well (which can take a while and may not work if the timeout is set too low on your server). Images in the article are not included, however.

Also, there is no timestamp included in the raw data, so feeds will have to rely on titles.

Let me know if this now works for you.

@tomislav
Author

Thanks. I just tried it out and it works perfectly. I'll let you know in a few days if there are any issues.

I presume they load images with javascript? Bummer.

@logmanoriginal
Contributor

> I presume they load images with javascript?

To be honest I haven't checked yet. Lead images are simply provided in the JSON data. For full articles the current filter only covers text. I'll take another look, maybe images can be extracted the same way.

@logmanoriginal
Contributor

That was easier than I thought. Try the latest version, it includes images for full articles.

@tomislav
Author

Thanks, I'll try it.

One thing I noticed is that I'm getting duplicated articles in my RSS reader (Feedbin). Are the uids on the articles changing when they update the page? I've tried commenting out the uid assignment line so it relies on URIs, to see if it makes a difference.

@tomislav
Author

I can confirm I'm no longer getting duplicates after I removed the following line:

$item['uid'] = $story['id'];

Otherwise, it's working great. I appreciate it a lot.
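
A possible explanation, not confirmed anywhere in this thread, is that the story ids change between fetches while the URIs stay the same. The sketch below simulates a reader that treats an entry as new whenever its identifier is unseen; the function names, ids, and URL are made up for illustration and do not describe how Feedbin actually works.

```php
<?php
// Hypothetical reader-side rule: prefer an explicit uid, otherwise
// fall back to the article URI as the identifier.
function identifierFor(array $item): string
{
    return $item['uid'] ?? $item['uri'];
}

// Counts entries in the second fetch whose identifier was not in the first.
function countNewEntries(array $firstFetch, array $secondFetch): int
{
    $seen = [];
    foreach ($firstFetch as $item) {
        $seen[identifierFor($item)] = true;
    }
    $new = 0;
    foreach ($secondFetch as $item) {
        if (!isset($seen[identifierFor($item)])) {
            $new++;
        }
    }
    return $new;
}

$uri = 'https://example.com/story'; // placeholder URL
// Same article fetched twice, with an id that changed in between (assumed).
$withUid = countNewEntries(
    [['uid' => 'id-1', 'uri' => $uri]],
    [['uid' => 'id-2', 'uri' => $uri]]
);
// Same article fetched twice, identified by its URI only.
$withoutUid = countNewEntries([['uri' => $uri]], [['uri' => $uri]]);
echo $withUid, ' ', $withoutUid, PHP_EOL;
```

Under that assumption the uid-based run reports one spurious new entry and the URI-based run reports none, which matches the behaviour described above.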

Some "Maybe/Someday" things that I wanted to write down for reference:

  • Include image captions below images
  • Include "hero" images and carousels (at the top of the page, above the article title)
  • Include carousels inside article text

@logmanoriginal
Contributor

Great, I'm glad this is working for you!

I removed the uid and included image captions.
What do you mean by "hero" images and carousels?

There are carousels mentioned in the JSON data, but from what I can tell they are placed below the contents and not above; maybe I'm looking at the wrong contents. It would be great if you could share a screenshot to illustrate what you mean.

@logmanoriginal
Contributor

Thanks for the screenshots. Hero images were already included as enclosures. I just added support for hero carousels at top (added to enclosures) and in the article (added to contents).

Find the latest version at #1065

Does that work for you?

@tomislav
Author

Thank you. I don’t think any RSS reader displays enclosures, so hero images and carousels should probably go directly into the content (top) with their corresponding captions.
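
Placing hero images at the top of the content could be sketched as follows. `prependHeroImages` and the `src`/`caption` keys are assumptions for illustration, not the structure of the site's JSON or the code that was merged.

```php
<?php
// Hypothetical helper: builds <figure> markup for each hero image and
// places it ahead of the article body. The array keys are assumptions.
function prependHeroImages(string $content, array $images): string
{
    $figures = '';
    foreach ($images as $image) {
        $figures .= '<figure><img src="' . htmlspecialchars($image['src']) . '">'
            . '<figcaption>' . htmlspecialchars($image['caption'])
            . '</figcaption></figure>';
    }
    return $figures . $content;
}

$content = prependHeroImages('<p>Article text</p>', [
    ['src' => 'https://example.com/hero.jpg', 'caption' => 'A hero image'],
]);
echo $content, PHP_EOL;
```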

@logmanoriginal
Contributor

This was added, so I'm going to merge this now. Please open a new issue if further changes are necessary.

infominer33 pushed a commit to web-work-tools/rss-bridge that referenced this issue Apr 17, 2020