To answer the question of whether a legitimate news or website aggregator can be successful on today’s Internet, you really have to consider a few other questions:
- Can you find a blogging platform that provides you with the aggregation features you need (or design your own services from scratch)?
- Can you get access to content that you are legitimately allowed to post and is worth posting (i.e., you have written permission)?
- What will Google think?
This article addresses these questions within the context of my own experiments with converting Texxors to a news aggregation site.
Why set up a news or website aggregator?
On one hand, the goal of switching to a news aggregator was to improve the services Texxors offers. A secondary goal, though, was to determine whether a legitimate website aggregator can be successful on today’s Internet.
At a high level, Texxors has always been a technology news aggregator. For most of its history that aggregation has been done via original content. However, given the success of TechCrunch, I thought it might be time to switch to a more passive format.
Setting up the aggregator
The first step was implementing the functionality to parse, aggregate, and post content. I wanted to know exactly what format I would be using before I began contacting collaborators, and I wanted to be able to point them to the site so they could see how their content would be displayed. Initially, I had big plans for sorting content based on popularity, freshness, and ratings. Ultimately, these features were just too difficult to implement simultaneously within the WordPress structure. As a result, I focused primarily on getting the basic functions right: parsing and presenting the content.
The primary plugin I used for this was FeedWordPress, a comprehensive plugin that was relatively easy to set up and provides a lot of options for organizing feeds. I also wanted a visually dynamic and appealing front-page layout, so I installed a second plugin to pull images from the parsed articles. Beyond those two, no other plugins were needed for the aggregation task.
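FeedWordPress handled all of this for me, but as a rough illustration of what the parsing step actually does, here is a minimal, stdlib-only Python sketch. The feed content and field names are hypothetical stand-ins for a contributor's RSS output, not anything Texxors actually ran:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for a contributor's RSS feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Tech Blog</title>
    <item>
      <title>First Post</title>
      <link>https://example.com/first-post</link>
      <description>An excerpt of the first post.</description>
    </item>
    <item>
      <title>Second Post</title>
      <link>https://example.com/second-post</link>
      <description>An excerpt of the second post.</description>
    </item>
  </channel>
</rss>"""

def parse_feed(xml_text):
    """Extract the fields an aggregator needs from an RSS feed:
    the title, the permalink back to the source, and an excerpt."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "excerpt": item.findtext("description", default=""),
        })
    return items

posts = parse_feed(SAMPLE_FEED)
print(posts[0]["title"])  # First Post
```

A real aggregator would fetch each feed on a schedule, de-duplicate items it has already seen, and create a WordPress post per item; FeedWordPress wraps all of that behind its configuration screens.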
Getting access to content
I wanted to obtain content legitimately, by which I mean with permission from the original source. From my perspective, legitimate meant finding contributors and getting their written approval to repost portions of their content. This would cover me legally, should a DMCA complaint ever be submitted, and it would also allow me to sleep at night. It was easier than I expected: about 50% of those I contacted were willing to share content in exchange for a backlink and attribution. I was expecting closer to 10%.
What did Google think?
With the functionality implemented and content ready to parse, it was time for probably the most important question: how was Google going to respond to these changes? I tested two different formats for presenting the articles over the couple of months of this experiment. In the first, the front page of the site listed a brief excerpt from the aggregated content, with the title linking back to the full article on the original source. This was the ideal scenario in terms of user experience and source attribution, but it caused problems with the sitemap, because the permalinks pointed to a different site. In the image below, the blue vertical line indicates the transition to the aggregator layout. As you can see, the site's traffic, which comes almost entirely from search, began slowly dropping. After a few days I concluded that the sitemap errors were degrading Google's opinion of the site too much for that approach to be workable. Although it was feasible to remove the aggregated content from the sitemap, a setup that didn't keep Google informed of new content didn't seem workable either.
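To make the sitemap problem concrete, the fix I considered amounts to filtering out any permalink whose host isn't your own before generating the sitemap. A minimal sketch, assuming a hypothetical site host (the real domain and URL list would come from WordPress):

```python
from urllib.parse import urlparse

SITE_HOST = "texxors.com"  # assumed host for illustration

def local_urls(permalinks, host=SITE_HOST):
    """Drop permalinks that point at another domain, such as
    aggregated entries whose permalink is the original source."""
    return [u for u in permalinks if urlparse(u).netloc == host]

def sitemap_xml(urls):
    """Render a minimal sitemap for the remaining local URLs."""
    entries = "\n".join(
        "  <url><loc>{}</loc></url>".format(u) for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + entries + "\n</urlset>")

permalinks = [
    "https://texxors.com/about",
    "https://example.com/aggregated-article",  # external permalink
    "https://texxors.com/original-post",
]
sitemap = sitemap_xml(local_urls(permalinks))
```

This silences the sitemap errors, but at the cost described above: Google is never told about the new aggregated content, so it can't drive search traffic to it.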
With the second layout, the front page listed excerpts that linked to the full article republished on Texxors, and above each article a box provided links and information for the original source. Given how much Google dislikes duplicate content, I was fairly sure this would cause a decrease in traffic; what surprised me was how quickly and sharply it fell. I had been optimistic that Google might be smart enough to see the value, from a user's perspective, in a site that housed a lot of relevant content even if it was duplicated elsewhere. The counterargument, of course, is that such content is usually associated with at least unethical and probably illegal behavior, and Google had no way of knowing I had obtained it legally. The red line indicates when the title links were pointed at the Texxors copies and the sitemaps were refreshed. After a two-day boost, Google quickly identified the duplication and traffic decreased precipitously. Two days ago, ALL aggregated content was removed, returning the site to its original-content format. It will be interesting to see how search traffic responds over the next few days and weeks.
Did it work?
My answer to the question “Can a legitimate website aggregator be successful?” is a resounding “no”. Formats built around user communities, such as Reddit, are obviously successful, but passive approaches like the ones tried above are too dependent on search results. Since most search engines penalize duplicate content, it’s almost impossible to build enough readership to justify the effort.
Even as traffic decreased, the site's administrative burden increased. In the two short months this experiment ran, I received two requests to remove published content. One came from the legitimate owner of a work that had been sourced via a contributing site. It is unclear whether the contributing site had permission to “re-license” the work, but it’s easy to see the issues this format invites. In short, the results seem pretty conclusive: in the post-Panda, post-Penguin web, a website aggregator is likely to be more trouble than it is worth.