Howto MagpieRSS: Implementing Web Developer News
Intro
By popular demand, I have authored this howto and tutorial to help loadaverageZero visitors understand how the dnews application was built using the MagpieRSS feed parsing library, PHP and MySQL. The instructions for installing MagpieRSS are clear and simple, so I will not reiterate them here.
Note: There is a lot of code and data to look at here, and I’m too busy to write an article that includes every gory detail (such as installing the Magpie library). I write pretty clean code, with adequate commenting. So if you are comfortable with PHP and MySQL, then you should have no problem grasping the logic from reading the source code. If not, there are a ton of resources available using the drx application to help get you started.
Data
First, the data. Luckily, I have already built another application for browsing both the schema (table structure) and contents (data) of my databases. And it is called dbrowse. What a shock!
There are only two simple MySQL tables that make up the backend of the dnews application: news, from which the “channel selector” menu is built, and feed, which holds some details about each RSS channel. The two are bound together by the foreign key nid.
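For orientation, here is a rough sketch of what two such tables and the join between them might look like, written as PHP strings that can be handed to any MySQL client. Only the table names news and feed and the key nid come from the description above. Every other column name, every type, and the direction of the foreign key are assumptions made for illustration, so check the real schema in dbrowse before relying on any of it.

```php
<?php
// Hypothetical DDL. Only "news", "feed" and "nid" are taken from the article;
// the remaining columns and all types are illustrative assumptions.
function dnews_schema_sketch()
{
    return array(
        'news' => 'CREATE TABLE news (
            nid   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            label VARCHAR(64)  NOT NULL
        )',
        'feed' => 'CREATE TABLE feed (
            nid   INT UNSIGNED NOT NULL,
            title VARCHAR(128) NOT NULL,
            uri   VARCHAR(255) NOT NULL,
            image VARCHAR(64)  NOT NULL,
            FOREIGN KEY (nid) REFERENCES news (nid)
        )',
    );
}

// One row per channel: site label, image basename, feed title and URI.
// The "?" is a placeholder for a prepared-statement parameter.
function dnews_feed_query()
{
    return 'SELECT n.label, f.image, f.title, f.uri
              FROM news AS n
              JOIN feed AS f ON f.nid = n.nid
             WHERE n.nid = ?';
}
```

The query selects the same four details that the feed() function is described as returning further down.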
Note: You may get sidetracked in dbrowse. If you need help understanding how that application works, I recommend you rewind all the way back to the beginning and start from there.
Code
All of my applications are derived from a base PHP class called application. Another shocker! To understand what is going on long before this kicks in, I recommend reading as much of the PHP Labs series as you can stomach. The base class builds everything that surrounds the content area, which is the container that this document is sitting in. After that, anything goes.
So, to build dnews, I first create a node, or instance of the application class.
Example initialization code:
Node
The feed() function is only called after the news channel menu is rendered and a user
selects an entry, at which point the script calls itself with a PATH_INFO argument
indicating which feed was selected. The feed() function returns an object which contains
the site label, the image basename, and the feed’s title and URI.
A #news fragment identifier is also used in the URI to move the focus of
the page down past the introduction, so the user can begin scanning headlines, or select
another channel from the menu.
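The PATH_INFO round trip described above can be sketched roughly as follows. The function names, the allowed character set for a feed key, and the example script path are all assumptions. The article does not show how dnews validates the argument.

```php
<?php
// Extract a feed key from a PATH_INFO string such as "/slashdot".
// Returns null when nothing usable is present. The whitelist of
// characters is an assumption, chosen to reject path tricks like "../".
function feed_key_from_path_info($pathInfo)
{
    $key = trim((string) $pathInfo, '/');
    if ($key === '' || !preg_match('/^[A-Za-z0-9_-]+\z/', $key)) {
        return null;
    }
    return $key;
}

// Build the self-referencing link for a menu entry, including the
// #news fragment that moves the focus past the introduction.
function feed_link($script, $key)
{
    return $script . '/' . rawurlencode($key) . '#news';
}
```

In a live script the first argument would typically come from $_SERVER['PATH_INFO'], and the resulting key would be used to look up the matching row in the feed table.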
Armed with this information, dnews can now proceed to render the feed’s banner, link and headlines. To understand how all of this is assembled, take a moment to read the dnews source code. For the insanely curious, you can also view the PHP source code to this document.
Below is the source code to the API functions that are used by the application.
To render the channel selector menu, dnews calls the feeds() function, which returns the
menu pre-built as an XHTML ordered list. All presentation details are
separated from the markup by employing several CSS stylesheets. In addition
to the base classes in root.css, dnews also uses classes shared by drx and several
other applications and documents on loadaverageZero. These include resource.css, hreview.css
and lists.css. Further information on the CSS stylesheets used on this site can be found
at the Sitemap. Good luck with all of that.
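A menu builder in the spirit of feeds() might look something like the sketch below. It is not the actual dnews function. The function name, the class attribute, and the shape of the input array (feed key mapped to display label) are assumptions.

```php
<?php
// Render an array of feed key => label pairs as an XHTML ordered list
// of links. Each link points back at the script with the key appended
// as PATH_INFO and a #news fragment.
function render_channel_menu(array $channels, $script)
{
    $html = "<ol class=\"channels\">\n";
    foreach ($channels as $key => $label) {
        $href = htmlspecialchars(
            $script . '/' . rawurlencode($key) . '#news',
            ENT_QUOTES,
            'UTF-8'
        );
        $text = htmlspecialchars($label, ENT_QUOTES, 'UTF-8');
        $html .= "  <li><a href=\"$href\">$text</a></li>\n";
    }
    return $html . "</ol>\n";
}
```

Escaping both the href and the label keeps the list well-formed even when a label contains an ampersand, and the markup carries no presentation of its own, so the stylesheets stay in charge of how it looks.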
Fetch
Okay, we’re on the home stretch now. All that needs to be done is to fetch the feed, and render each item’s headline, link, description (aka blurb or teaser) and URI.
To do this, dnews calls the fetch() wrapper function with the feed object. fetch() returns
the banner image as a link to the source site, and an array of <item> objects
which hold the details of each headline as described above. These are by default in reverse
chronological order, in other words newest first, as is the convention with RSS news
feeds. This is where Magpie really kicks in, since the items and the $rss->items returned
by the fetch_rss() function are one and the same, with some slight modifications.
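A wrapper along these lines could be sketched as shown here. The function name, the field names on the feed object (uri, label, image), the image directory and the .png extension are assumptions. The second parameter exists only so the sketch can be exercised without Magpie installed; by default it calls Magpie's fetch_rss().

```php
<?php
// $feed is assumed to be an object with uri, label and image fields.
// $fetcher defaults to Magpie's fetch_rss(), which returns an object
// with ->channel and ->items, or false on failure.
function fetch_feed_sketch($feed, $fetcher = 'fetch_rss')
{
    $rss = call_user_func($fetcher, $feed->uri);
    if (!$rss) {
        return false;
    }

    // Link the banner to the source site when the channel names one.
    $site = isset($rss->channel['link']) ? $rss->channel['link'] : $feed->uri;

    $banner = sprintf(
        '<a href="%s"><img src="/img/%s.png" alt="%s" /></a>',
        htmlspecialchars($site, ENT_QUOTES, 'UTF-8'),
        rawurlencode($feed->image),
        htmlspecialchars($feed->label, ENT_QUOTES, 'UTF-8')
    );

    // The items are passed through as Magpie delivered them.
    return array('banner' => $banner, 'items' => $rss->items);
}
```

Returning false on failure lets the caller decide how to report the problem instead of burying that decision in the wrapper.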
Cache
If you don’t appreciate the importance of caching the requests made to the services provided by your news sources, then you probably don’t produce your own feeds, or worse, don’t pay attention to your log files. Even after many hours of hard work on my part, and extended contacts with the aggregator folks, the requests made to this host by said providers outnumber any other category of requests, including search engine indexing bots like those from Google and Yahoo. Certain sites, such as Slashdot, will even ban you if you keep hammering them with requests. Caching not only keeps you in good graces with your sources, it also dramatically improves the response time for your users by storing local (serialized) copies of the feed data until they go stale. Hell, the bandwidth savings alone are worth the effort.
Luckily, MagpieRSS makes this very simple. All that is necessary to trigger the caching
is to define a constant MAGPIE_CACHE_DIR, which is just that, a path to your cache directory.
See the code in the first snip box above for an example. Okay, if you’re thinking
this isn’t strictly necessary, you’re right. Magpie will default
to using the working directory to store cache files if you don’t specify one in
your program. Trust me, you do not want to do this.
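As a minimal sketch of that setup, the fragment below defines the cache directory and, optionally, how long a cached feed stays fresh. MAGPIE_CACHE_DIR and MAGPIE_CACHE_AGE are Magpie's own constants. The directory under the system temp path and the Magpie install location are placeholders, so substitute paths that suit your host.

```php
<?php
// Placeholder location. In practice, pick a private directory that the
// web server can write to and that is not served to visitors.
$cacheDir = sys_get_temp_dir() . '/dnews-magpie-cache';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0755, true);
}

// Define these before Magpie is used, so its defaults do not apply.
define('MAGPIE_CACHE_DIR', $cacheDir);
define('MAGPIE_CACHE_AGE', 2 * 60 * 60); // seconds until a cached copy goes stale

// Assumed install path. The guard only keeps this sketch runnable on a
// machine where Magpie is not installed.
$magpie = __DIR__ . '/magpierss/rss_fetch.inc';
if (is_readable($magpie)) {
    require_once $magpie;
}
```

A longer cache age means fewer requests to your sources at the cost of staler headlines, so the two-hour value here is only an example.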
Items
Almost done, I promise. Now that dnews has an array of <item> objects, all
it needs to do is loop through this list and display each one. To do this, dnews calls the item()
function, which just returns the data formatted into an XHTML ordered list element. Since each
feed source is a little different, this function tries its best to present a
unified format for dates, and it makes a weak effort at cleaning up markup and
character entities that may be present in the description. Occasionally, some feeds
(such as Technorati) will not function correctly due to invalid (not well-formed) markup.
This is not necessarily their fault: their sources are countless blogs, and you have
no control over the quality of these sources. So, if you are using XHTML for markup
like I am, it is very important to pick your feeds carefully, and test them
thoroughly.
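To make the idea concrete, here is a rough sketch of an item formatter. It is not the dnews item() function. The function names, the output markup and the date format are assumptions, and the array keys (title, link, description, pubdate, and dc/date for Dublin Core) reflect how Magpie usually exposes item fields, so verify them against the feeds you actually use.

```php
<?php
// Strip embedded markup, decode entities once, collapse whitespace,
// then escape the result so it is safe to drop into XHTML.
function clean_text($text)
{
    $text = html_entity_decode(strip_tags((string) $text), ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Format one feed item as an XHTML list element.
function render_item(array $item)
{
    $title = clean_text(isset($item['title']) ? $item['title'] : '(untitled)');
    $desc  = clean_text(isset($item['description']) ? $item['description'] : '');
    $link  = htmlspecialchars(
        isset($item['link']) ? $item['link'] : '#',
        ENT_QUOTES,
        'UTF-8'
    );

    // Feeds disagree on where the date lives and how it is written.
    $raw = '';
    if (!empty($item['pubdate'])) {
        $raw = $item['pubdate'];
    } elseif (!empty($item['dc']['date'])) {
        $raw = $item['dc']['date'];
    }
    $ts   = $raw !== '' ? strtotime($raw) : false;
    $date = $ts !== false
        ? ' <span class="date">' . gmdate('Y-m-d H:i', $ts) . ' UTC</span>'
        : '';

    return sprintf('<li><a href="%s">%s</a>%s<p>%s</p></li>', $link, $title, $date, $desc);
}
```

Stripping tags before escaping is a blunt instrument, but it means a badly formed description cannot break the well-formedness of the surrounding page.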
If there is an error, dnews catches it, does its best to display what went wrong, and even provides a link to an RSS Validator so you can determine why it failed. Okay, you got me, so I can determine what went wrong. You probably couldn’t care less and will move on to another feed or another site.
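One way to present such an error is sketched below. The function name and markup are hypothetical, and the W3C Feed Validation Service is used as the validator only as an assumption; the article does not say which validator dnews links to.

```php
<?php
// Build a list element that reports a feed failure and links to a
// validator pre-filled with the feed's address.
function render_feed_error($feedUri, $message)
{
    $check = 'https://validator.w3.org/feed/check.cgi?url=' . rawurlencode($feedUri);
    return sprintf(
        '<li class="error">Could not load feed: %s (<a href="%s">validate</a>)</li>',
        htmlspecialchars($message, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($check, ENT_QUOTES, 'UTF-8')
    );
}
```

With Magpie, the message would typically come from magpie_error() after fetch_rss() has returned false.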
That’s it! Sorry, I do not have a commenting system yet. If you would like to provide feedback, you can send me an email via the Contact page and I will be happy to respond, and perhaps even include comments here, or at least add clarifications and of course fix any errors or omissions.
Notes
Fetching secure feeds.
If you want to fetch feeds from SSL (secure) servers, use the https protocol (aka
scheme) and make sure you have cURL installed. I found that in order to fetch some feeds, from
mozilla.org for instance, I needed to compile and install cURL, and edit the Snoopy package
used by MagpieRSS. This was necessary because the default location for cURL is /usr/bin/curl,
and I put mine in /usr/local/bin. See the Snoopy.class.inc file in the MagpieRSS extlib
install directory for details.
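Before editing Snoopy, it can help to confirm where cURL actually lives on your host. The helper below is a hypothetical convenience, not part of Snoopy or MagpieRSS. It simply returns the first executable candidate, which is the value Snoopy's curl_path setting would then need to hold.

```php
<?php
// Return the first path in $candidates that exists and is executable,
// or null when none qualifies.
function locate_curl(array $candidates)
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}

// The two locations mentioned above: Snoopy's default, then a local build.
$curl = locate_curl(array('/usr/bin/curl', '/usr/local/bin/curl'));
```

If the result is null, cURL is either missing or installed somewhere else entirely, and https feeds will not work until that is sorted out.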
Open source at work.
I personally like to see who is accessing my resources. Since MagpieRSS is a popular feed parser
and I produce several feeds myself, I see a lot of requests for mine using this software. I felt
it was important to identify myself when doing the same for other feeds. Kellan provides a hook
to set the user-agent string in the rss_fetch.inc file by defining the constant
MAGPIE_USER_AGENT before you include the file. Because I only wanted to modify the
URL portion of the agent string (much like you will see when friendly robots are indexing
your site), and leave the rest alone (the agent identifying string and version number), I created
a new constant MAGPIE_AGENT_URL and modified rss_fetch.inc accordingly.
Note that you can still override the entire user-agent string by defining the constant as described
above before including the library. Below is a context diff you can apply to rss_fetch.inc
using the patch program if you want this same functionality. Or you can download the file as rss_fetch.diff.
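If you only need the stock hook and not the patch, overriding the whole string might look like the sketch below. The helper function, the agent name, the version and the URL are made-up placeholders. Only the MAGPIE_USER_AGENT constant itself belongs to Magpie.

```php
<?php
// Assemble an agent string in the familiar "name/version (+url)" shape
// that friendly robots use. All three values are placeholders.
function build_agent_string($name, $version, $url)
{
    return sprintf('%s/%s (+%s)', $name, $version, $url);
}

// Define the constant before Magpie is included, as described above,
// so the library's default is never applied.
if (!defined('MAGPIE_USER_AGENT')) {
    define(
        'MAGPIE_USER_AGENT',
        build_agent_string('example-reader', '0.1', 'http://example.org/bot.html')
    );
}
```

The drawback of this route is that it replaces the library name and version as well, which is exactly what the patch avoids.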
To apply the patch, simply upload it into the same directory as the original and issue a:
After making these modifications, my new user-agent string when requesting feeds via dnews and MagpieRSS is:
Alternative approach.
Using MagpieRSS in this manner is PHP version agnostic (although I recommend using at least
4.3.x). If you’re interested in another approach that leverages PHP5, its built-in
SimpleXML parsing library, the DOM XML extension and the APC from PECL, then have a look at
Rasmus’ simple_rss.php. You’ll find more great code like this at the Yahoo! Developer Network.
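To give a feel for the SimpleXML route, here is a bare-bones sketch that pulls items out of an RSS 2.0 document held in a string. It is not Rasmus’ code and leaves out fetching, caching, Atom and namespaces. The function name and the shape of the returned array are assumptions.

```php
<?php
// Parse an RSS 2.0 document and return its items as plain arrays,
// or false when the XML is not well-formed or has no channel.
function parse_rss_items($xml)
{
    // Collect parser errors quietly instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false || !isset($doc->channel)) {
        return false;
    }

    $items = array();
    foreach ($doc->channel->item as $item) {
        $items[] = array(
            'title'       => (string) $item->title,
            'link'        => (string) $item->link,
            'description' => (string) $item->description,
        );
    }
    return $items;
}
```

Because SimpleXML refuses input that is not well-formed, a broken feed fails outright here, which is the same hazard described for the Items section.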
Good luck with your own RSS News feed pages using MagpieRSS, PHP and MySQL (should you choose to implement something similar to dnews).
—Douglas Clifton