I decided to play with Yahoo Pipes to see if it could ease my problems with online news aggravation aggregation.
I've been playing with RSS for the News Machine project for a while now - so I already had a few FeedBurner feeds that I've been using.
To recap for anyone who hasn't read the "News Machine" site - the basic idea is to use Google News to filter the news sources on a set of keywords.
Each set of keywords is thematically related - for example the "Current Obsessions" feed solely filters on the following keywords: cleanfeed, hackers, privacy internet, virus, censorship internet, microsoft security, copyright, drm, riaa, mpaa, piracy internet, piracy sea, malware, blacklists, smartfilter, dmca, spyware, exploit, security internet, and censorware.
Each of these feeds gets a name - e.g. "Current Obsessions" - and for shorthand I've dubbed them "Keyword Cluster Units" (KCU).
News Machine currently uses 10 KCUs and this works very well.
If I drop the feed into e.g. RSS Owl I get a nice listing for each RSS feed - each RSS feed has the "KCU" name as the name of the feed, making it easy to figure out where things have come from - and underneath the "Category" column the keyword shows up as a category.
It makes "the discovery of the new" very easy. My only real problem is that Google News requires an account for every 20 keywords, and maintenance is hell if I want to add, delete or (*shudder*) re-cluster any of the keywords.
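To make the maintenance problem concrete, here is a minimal sketch of keeping the clusters in one place and working out how many 20-keyword batches (and so accounts) each one needs. The `KCUS` dict and the helper names `batches` and `accounts_needed` are my own invention for illustration, not part of News Machine or any Google News tooling.

```python
# Hypothetical layout: one dict entry per Keyword Cluster Unit (KCU).
KCUS = {
    "Current Obsessions": [
        "cleanfeed", "hackers", "privacy internet", "virus",
        "censorship internet", "microsoft security", "copyright", "drm",
        "riaa", "mpaa", "piracy internet", "piracy sea", "malware",
        "blacklists", "smartfilter", "dmca", "spyware", "exploit",
        "security internet", "censorware",
    ],
}

# The per-account keyword limit described in the post.
KEYWORDS_PER_ACCOUNT = 20


def batches(keywords, size=KEYWORDS_PER_ACCOUNT):
    """Split a keyword list into account-sized batches."""
    return [keywords[i:i + size] for i in range(0, len(keywords), size)]


def accounts_needed(kcus):
    """Return how many batches (accounts) each KCU would need."""
    return {name: len(batches(words)) for name, words in kcus.items()}


if __name__ == "__main__":
    print(accounts_needed(KCUS))
```

With the clusters in a single structure like this, re-clustering becomes an edit to one dict rather than a trawl through several accounts, although the feeds themselves would still have to be rebuilt by hand.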
I've also had problems with online aggravators aggregators, such as Rojo, which have a tendency to strip out all the nice keyword information that appears as a "category" ...
The basic RSS dished up from FeedBurner has the following structure for an <item>:
<item>
<title>
<link>
<guid isPermaLink="false">
<category>
<pubDate>
<description>
.. and of course the text between the <category> and </category> tags is the keyword being searched for in the Google News feed ...
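As a rough illustration of why the <category> element matters, the sketch below groups story titles by keyword using Python's standard `xml.etree.ElementTree`. The sample feed, its URLs and the function name `titles_by_keyword` are made up for the example; a real FeedBurner feed would carry more elements than this.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample feed following the <item> structure listed above.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Current Obsessions</title>
<item>
<title>Example story about DRM</title>
<link>http://example.com/story-1</link>
<guid isPermaLink="false">story-1</guid>
<category>drm</category>
<pubDate>Mon, 01 Jan 2007 00:00:00 GMT</pubDate>
<description>Placeholder text</description>
</item>
<item>
<title>Example story about malware</title>
<link>http://example.com/story-2</link>
<guid isPermaLink="false">story-2</guid>
<category>malware</category>
<pubDate>Mon, 01 Jan 2007 01:00:00 GMT</pubDate>
<description>Placeholder text</description>
</item>
</channel></rss>"""


def titles_by_keyword(rss_text):
    """Group item titles under the keyword found in each <category>."""
    root = ET.fromstring(rss_text)
    grouped = defaultdict(list)
    for item in root.iter("item"):
        # Items without a <category> land in a catch-all bucket.
        keyword = item.findtext("category", default="(no category)")
        grouped[keyword].append(item.findtext("title", default=""))
    return dict(grouped)


if __name__ == "__main__":
    for keyword, titles in titles_by_keyword(SAMPLE_RSS).items():
        print(keyword, "->", titles)
```

If the <category> element is missing, every story falls into the catch-all bucket and the link between story and keyword is lost.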
With this in mind I started off making what could possibly be the simplest of Yahoo Pipes feeds - connecting the feed for the Current Obsessions Keyword Cluster Unit directly into the "pipe output" - like this:
It is probably the simplest pipe going - but it is not a pipe at all. Once again the all-important <category> field is stripped out, so when I feed the RSS into RSS Owl I can no longer tell which keyword triggered which story.
The RSS given out by the Yahoo Pipes filter for an <item> looks like this:
<item>
<title>
<link>
<description>
<pubDate>
Now I might be just being a little picky here - but from my UNIX programming days I remember a "pipe" was just that - it passed everything through the pipe from input to output without changing anything.
What we have here is a "filter" - a pipe that passes some things through unchanged but changes and/or removes others - such as the <category> information ...
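The difference between the two <item> listings above can be checked mechanically. This is a small sketch, assuming the items are available as XML strings; the `BEFORE` and `AFTER` samples only mirror the tag names listed in this post (with empty elements), and `child_tags` and `dropped_tags` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Empty stand-ins for the FeedBurner item and the Yahoo Pipes item.
BEFORE = (
    "<item><title/><link/><guid isPermaLink='false'/>"
    "<category/><pubDate/><description/></item>"
)
AFTER = "<item><title/><link/><description/><pubDate/></item>"


def child_tags(item_xml):
    """Return the set of element names directly under an <item>."""
    return {child.tag for child in ET.fromstring(item_xml)}


def dropped_tags(before_xml, after_xml):
    """List the elements present on the way in but missing on the way out."""
    return sorted(child_tags(before_xml) - child_tags(after_xml))


if __name__ == "__main__":
    print(dropped_tags(BEFORE, AFTER))  # prints ['category', 'guid']
```

A true pass-through pipe would give an empty list here. Going by the two listings, <guid> goes missing along with <category>.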
I might have missed something - I wondered if the "pipe output" module could be changed to allow stuff through - but it appears that this module is not configurable.
Anyway, my conclusion from all this was that "Yahoo Pipes" should be renamed "Yahoo RSS Filters" - if anyone out there has an idea how to get round this limitation, I'd love to hear from them.
On the upside, Yahoo Pipes is a nifty network toy for playing around with RSS feeds and aggregating them - even if it doesn't do what I want at the moment - and sometime real soon now I'll be coming back to this topic as I construct a whole KCU inside Yahoo Pipes.
Tags: rss, yahoo pipes, feedburner, rss owl, rojo, news machine, google news