« Ruby Gem of the Week Series

The Future of Online News - The Future of Facebook & Co - Web Feeds, Web Feeds, Web Feeds

feedparser library - Read Web Feeds in XML, HTML, JSON, TXT and More; Build Your Own News Reader from Scratch in Twenty Lines

github: feedparser/feedparser, rubygems: feedparser, rdoc: feedparser

What’s a web feed?

A web feed (or news feed) is a (simple) document/text format that:

(1) lets you publish a list of:

and that

(2) lets others subscribe to your updates.

Example:

{
    "version": "https://jsonfeed.org/version/1",
    "title": "Jason Fried's Microblog",
    "home_page_url": "https://micro.blog/jasonfried/",
    "feed_url": "https://micro.blog/jasonfried/feed.json",
    "author": {
        "name": "Jason Fried",
        "url": "https://micro.blog/jasonfried/",
        "avatar": "https://micro.blog/jasonfried/avatar.png"
    },
    "items": [
        {
            "id": "865767227416612864",
            "url": "https://micro.blog/jasonfried/status/865767227416612864",
            "content_text": "JSON Feed? I know that guy.",
            "date_published": "2017-05-19T20:12:00-00:00"
        }
    ]
}

Aside: The Wonders of RSS - What’s RSS!?

Triva Quiz - Q: What’s RSS?

RDF = Resource Description Framework

Trivia Quiz - Find the Content - Q: What’s your favorite way to add content in hypertext to RSS 2.0?

Bonus: Is your content in plain text, in html, in xhtml, in html escaped? Is your content a summary? or full text?

The State of Web Feed Formats in 2017 - XML, JSON, YAML, HTML, TXT

Let’s celebrate diversity! Live and let live! Web feed formats today in 2017 include:

And some more.

What’s the feedparser library?

One library to rule them all! All your base are blong to feedparser.

In the end all formats are just 0 and 1s or:

=> Let your computer handle the reading of web feeds ;-).

Read Feed Example

require 'open-uri'
require 'feedparser'

txt = open( 'http://openfootball.github.io/feed.xml' ).read

feed = FeedParser::Parser.parse( txt )

puts feed.title
# => "football.db - Open Football Data"

puts feed.url
# => "http://openfootball.github.io/"

puts feed.items[0].title
# => "football.db - League Quick Starter Sample - Mauritius Premier League - Create Your Own Repo/League(s) from Scratch"

puts feed.items[0].url
# => "http://openfootball.github.io/2015/08/30/league-quick-starter.html"

puts feed.items[0].updated
# => Sun, 30 Aug 2015 00:00:00 +0000

puts feed.items[0].content
# => "Added a new quick starter sample using the Mauritius Premier League to get you started..."

...

or reading a feed in the new JSON Feed format in - surprise, surprise - JSON; note: nothing changes :-)

txt = open( 'http://openfootball.github.io/feed.json' ).read

feed = FeedParser::Parser.parse( txt )

puts feed.title
# => "football.db - Open Football Data"

puts feed.url
# => "http://openfootball.github.io/"

puts feed.items[0].title
# => "football.db - League Quick Starter Sample - Mauritius Premier League - Create Your Own Repo/League(s) from Scratch"

puts feed.items[0].url
# => "http://openfootball.github.io/2015/08/30/league-quick-starter.html"

puts feed.items[0].updated
# => Sun, 30 Aug 2015 00:00:00 +0000

puts feed.items[0].content_text
# => "Added a new quick starter sample using the Mauritius Premier League to get you started..."

...

Microformats

Microformats let you mark up feeds and posts in HTML with h-entry, h-feed, and friends.

Note: Microformats support in feedparser is optional. Install and require the the microformats library to read feeds in HTML with Microformats.


require 'microformats'

text =<<HTML
<article class="h-entry">
  <h1 class="p-name">Microformats are amazing</h1>
  <p>Published by
    <a class="p-author h-card" href="http://example.com">W. Developer</a>
     on <time class="dt-published" datetime="2013-06-13 12:00:00">13<sup>th</sup> June 2013</time>

  <p class="p-summary">In which I extoll the virtues of using microformats.</p>

  <div class="e-content">
    <p>Blah blah blah</p>
  </div>
</article>
HTML

feed = FeedParser::Parser.parse( text )

puts feed.format
# => "html"
puts feed.items.size
# =>  1
puts feed.items[0].authors.size
# => 1
puts feed.items[0].content_html  
# => "<p>Blah blah blah</p>"
puts feed.items[0].content_text  
# => "Blah blah blah"
puts feed.items[0].title
# => "Microformats are amazing"
puts feed.items[0].summary
# => "In which I extoll the virtues of using microformats."
puts feed.items[0].published
# => 2013-06-13 12:00:00
puts feed.items[0].authors[0].name
# => "W. Developer"
...

Samples

Feed Reader

Planet Feed Reader in 20 Lines of Ruby

planet.rb:

require 'open-uri'
require 'feedparser'
require 'erb'

# step 1) read a list of web feeds

FEED_URLS = [
  'http://vienna-rb.at/atom.xml',
  'http://weblog.rubyonrails.org/feed/atom.xml',
  'http://www.ruby-lang.org/en/feeds/news.rss',
  'http://openfootball.github.io/feed.json',
]

items = []

FEED_URLS.each do |url|
  feed = FeedParser::Parser.parse( open( url ).read )
  items += feed.items
end

# step 2) mix up all postings in a new page

FEED_ITEM_TEMPLATE = <<EOS
<% items.each do |item| %>
  <div class="item">
    <h2><a href="<%= item.url %>"><%= item.title %></a></h2>
    <div><%= item.content %></div>
  </div>
<% end %>
EOS

puts ERB.new( FEED_ITEM_TEMPLATE ).result

Run the script:

$ ruby ./planet.rb      

Prints:

<div class="item">
  <h2><a href="http://vienna-rb.at/blog/2017/11/06/picks/">Picks / what the vienna.rb team thinks is worth sharing this week</a></h2>
  <div>
   <h3>6/11 Picks!!</h3>
   <p>In a series on this website we'll entertain YOU with our picks...
 ...

Real World Usage

See the Planet Pluto feed reader family:

Built with Ruby (running Jekyll) on 2021-05-13 17:08:11 +0000 in 0.371 seconds.
Hosted on GitHub Pages. </> Source on GitHub. (0) Dedicated to the public domain.