Duplicate Content: The Impact of Canonical URLs

As a web developer, I am trying to become savvier when it comes to factoring in additional SEO practices, which (in my view) should be considered compulsory.

Ever since Google updated its Search Console (formerly known as Webmaster Tools), it has opened my eyes to how my site is performing in greater detail, especially the pages Google deems not worthy of indexing. I started becoming more aware of this last August, when I wrote a post about attempting to reduce the number of "Crawled - Currently not indexed" pages on my site. Through trial and error, I managed to find a way to reduce the number of excluded page links.

The area I have now become fixated on is the sheer number of pages being classed as "Duplicate without user-selected canonical". Google describes these pages as:

This page has duplicates, none of which is marked canonical. We think this page is not the canonical one. You should explicitly mark the canonical for this page. Inspecting this URL should show the Google-selected canonical URL.

In simple terms, Google has detected pages that can be accessed by different URLs with either the same or similar content. In my case, this is the result of many years of unintentional neglect whilst migrating my site through different platforms and URL structures during the infancy of my online presence.

Google Search Console has marked around 240 links as duplicates due to the following two reasons:

  1. Pages can be accessed with or without a ".aspx" extension.
  2. Paginated content.

I was surprised to see paginated content was classed as duplicate content, as I was always under the impression that this would never be the case. After all, the listed content is different and I have ensured that the page titles are different for when content is filtered by either category or tag. However, if a site consists of duplicate or similar content, it is considered a negative in the eyes of a search engine. 

Two weeks ago I added canonical tagging across my site, as I was intrigued to see if there would be any considerable change towards how Google crawls my site. Would it make my site easier to crawl and aid Google in understanding the page structure?
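
To illustrate the ".aspx" case listed above, here is a minimal Python sketch of how a single preferred URL could be derived for a page and rendered as a canonical tag. It is not the code running on my site. The function names and the example.com URLs are hypothetical, and it assumes the extensionless URL is the preferred one. The query string is deliberately left alone, so a paginated or filtered URL keeps its own address rather than being pointed at the first page.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return one preferred URL for a page reachable under several URLs.

    Assumptions (not taken from the post): the extensionless form is
    preferred, the host is lower-cased, and fragments are dropped.
    """
    parts = urlsplit(url)
    path = parts.path
    # Treat "/post.aspx" and "/post" as the same page.
    if path.lower().endswith(".aspx"):
        path = path[: -len(".aspx")]
    # Keep the query string (e.g. ?page=2) and drop the fragment.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Render the <link> element that belongs in the page <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(canonical_link_tag("https://example.com/Blog/My-Post.aspx"))
```

Whatever rule is chosen, the important part is that every duplicate URL for a page outputs exactly the same canonical value.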

Surprising Outcome

I think I was quite naive about how my Search Console Coverage statistics would shift post-canonicalisation. I was just expecting the number of pages classed as "Duplicate without user-selected canonical" to decrease, which was the case. I wasn't expecting anything more. On further investigation, it was interesting to see an overall positive change across all other coverage areas.

Here's the full breakdown:

  • Duplicate without user-selected canonical: Reduced by 10 pages
  • Crawled - Currently not indexed: Reduced by 65 pages
  • Crawl anomaly: Reduced by 20 pages
  • Valid: Increased by 60 pages

The change in figures may not look that impressive, but we have to remember this report covers only the first two weeks after implementing canonical tags. All positive so far, and I'm expecting to see further improvements over the coming weeks.

Conclusion

Canonical markup is often overlooked, both in its implementation and its importance when it comes to SEO. I still see sites that don't use it, as the emphasis is placed on other areas that require more effort to meet Google's search criteria, such as building for mobile, structured data and performance. So it's understandable why canonical tags could be missed.

If you are in a similar position to me, where you are adding canonical markup to an existing site, it's really important to spend the time to set the original source page URL correctly the first time, as an incorrect implementation can lead to issues.

Even though my Search Console stats have improved, the jury's still out on whether this translates to better site visibility across search engines. But anything that helps search engines and visitors understand your content source can only be beneficial.

Reducing The Number of 'Crawled - Currently not indexed' Pages

Every few weeks, I check over the health of my site through Google Search Console (aka Webmaster Tools) and Analytics to see how Google is indexing my site and look into potential issues that could affect the click-through rate.

Over the years the content of my site has grown steadily and, as it stands, it consists of 250 published blog posts. When you take into consideration other potential pages Google indexes, such as filter URLs that group posts by tag or category, the number of links my site consists of increases considerably. It's at the discretion of Google's search algorithm whether it includes these links for indexing.

Last month, I decided to scrutinise the Search Console Index Coverage report in great detail just to see if there are any improvements I can make to alleviate some minor issues. What I wasn't expecting to see is the large volume of links marked as "Crawled - Currently not indexed".

Crawled Currently Not Indexed - 225 Pages

Wow! 225 affected pages! What does "Crawled - Currently not indexed" mean? According to Google:

The page was crawled by Google, but not indexed. It may or may not be indexed in the future; no need to resubmit this URL for crawling.

Pretty self-explanatory, but there's not much guidance on how to lessen the number of links that aren't indexed. From my experience, the best place to start is to look at the list of links that are being excluded and to form a judgement based on the page content of these links. Unfortunately, there isn't an exact science. It's a process of trial and error.

Let's take a look at the links from my own 225 excluded pages:

Crawled Currently Not Indexed - Non Indexed Links

At first glance, I could see that the majority of the URLs were links where users can filter posts by either category or tag. Inspecting these pages, I could see nothing content-wise that gave a conclusive reason for index exclusion. However, what I did notice is that these links were found automatically by Google when the site was spidered. The sitemap I submitted in the Search Console only lists blog posts and content pages.

This led me to believe a possible solution would be to create a separate sitemap that consisted purely of links for these categories and tags. I called it metasitemap.xml. Whenever I added a post, the sitemap's "lastmod" date would get updated, just like the pages listed in the default sitemap.
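
For anyone wanting to try the same thing, a separate sitemap of this kind can be generated with a few lines of Python. This is only a sketch under stated assumptions, not my actual implementation. The build_sitemap function and the example.com URLs are hypothetical, and it assumes every category and tag link shares one "lastmod" date, namely the date of the most recent post.

```python
from datetime import date
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, last_modified: date) -> bytes:
    """Build sitemap XML listing each URL with the same lastmod date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod uses the W3C date format, e.g. 2019-08-01.
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical category and tag links, for illustration only.
    links = [
        "https://example.com/blog/category/seo",
        "https://example.com/blog/tag/google",
    ]
    with open("metasitemap.xml", "wb") as sitemap_file:
        sitemap_file.write(build_sitemap(links, date.today()))
```

The generated file can then be submitted in the Search Console alongside the default sitemap.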

I created and submitted this new sitemap around mid-July and it wasn't until four days ago that the improvement was reported within the Search Console. The number of non-indexed pages was reduced to 58. That's a 74% reduction!

Crawled Currently Not Indexed - 58 Pages

Conclusion

As I stated above, there isn't an exact science for reducing the number of non-indexed pages as every site is different. Supplementing my site with an additional sitemap just happened to alleviate my issue. But that is not to say copying this approach won't help you. Just ensure you look into the list of excluded links for any patterns.

I still have some work to do, and the next thing on my list is to implement canonical tags on all my pages, since I have become aware I have duplicate content on different URLs, remnants of when I moved blogging platforms.

If anyone has any other suggestions or solutions that worked for them, please leave a comment.

Google Seems To Have An Issue With My Server Response Time...

...and I think I know why...

Out of all the issues Google PageSpeed Insights seems to have when analysing my site, two specific things crop up that annoy me:

  1. Reduce server response time
  2. Leverage browser caching (due to Google Analytics JavaScript file)

The Google Analytics issue is something I will have to live with since (as far as I'm aware) there's nothing I can do. It would be nice if Google wouldn't penalise you for using a product they have developed. However, the "Reduce server response time" was something that perplexed me. My site is relatively simple and not doing anything over-the-top.

Due to the nature of my hosting setup (shared), I didn't have all the capabilities to make my website respond any better. The only way I could think of improving server response time was to move my hosting to another region and purchase a VPS to get more control.

Now, I think I have resolved the server response time issue... It has something to do with a Web Statistics service called AWStats that was enabled by default as an "addon" service on my hosting. Once it was disabled through my Plesk Management Portal, Google PageSpeed didn't seem to have any issue with my server response.

I cannot 100% confirm whether disabling the Web Statistics service is a permanent solution or whether it will work for everyone else. But there might be some truth behind this. Web Statistics services like AWStats store all analytical data in log files directly on the server, so this must have some effect on the time a request takes. I could be talking complete nonsense.
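
One way to put a hunch like this to the test is to time a handful of requests before and after switching the service off. The Python sketch below is a rough illustration only. The function names are hypothetical, the timing is taken from the client side (so it includes network latency), and it is not the same measurement Google PageSpeed Insights reports.

```python
import time
import urllib.request
from statistics import median


def time_to_first_byte(url, opener=urllib.request.urlopen):
    """Seconds from sending the request until the first body byte is read."""
    start = time.perf_counter()
    with opener(url) as response:
        response.read(1)
    return time.perf_counter() - start


def median_response_time(url, samples=5, opener=urllib.request.urlopen):
    """Median of several timings, to smooth out one-off network spikes."""
    return median(time_to_first_byte(url, opener) for _ in range(samples))


if __name__ == "__main__":
    # Hypothetical URL: replace it with a page on your own site.
    seconds = median_response_time("https://example.com/")
    print(f"Median response time: {seconds * 1000:.0f} ms")
```

Running it a few times with the service enabled, and again with it disabled, gives a like-for-like comparison from the same machine.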

If you have experienced the same problem as me, check your own hosting setup and its "addon" services. You never know, it may give you that extra Google PageSpeed point. :-)

Official Google Nexus 7 (2013) Case Review

Nexus 7 Case Google Label

Ever since I purchased my Nexus 7 last year, I've been trying to find a nice case for it. Having failed, I settled for a cheap and cheerful folio case from eBay, which (still to this day) has served me well. But I was dying to have a case that looked different and oozed some unique design elements.

When I noticed Google selling their own collection of Nexus 5 and 7 cases, I purchased one straight away. The Grey/Blue colour scheme caught my eye. It seemed that Google's offering ticked all the boxes. What could they possibly do wrong? It's an official product designed and manufactured by the very people who made the Nexus 7. If anyone could make a case without fault, it would have to be Google... right?

Sadly no.

For starters, the case lacks the magnets that allow the tablet to power on and off automatically when the case is opened or closed. Secondly, there was no type of latch to keep the case closed, and I found that the case opened whilst it was moving around in my backpack. Maybe I just had high expectations since my current case already had these features.

Yes. These might be small things. But I found myself getting increasingly agitated (maybe an overstatement!) whilst using my Nexus 7, especially for a case that cost four times the price of the case I previously used.

Nexus 7 Case - Outside

Nexus 7 Case - Inside

It wasn't all doom and gloom. There were things I did like about Google's case offering. I loved that the case looked and felt very different to what is available on the market currently. The outside was covered with hard-wearing fabric, with an inner lining of suede. Definitely high quality stuff!

Unfortunately, Google just seemed to miss the mark by not including a few key features, mainly the magnetic sensor.

Back to Google it goes.

Update - 11/02/2014

I was expecting to pay for all postage costs to return the case. But after I contacted Google Support to explain why I wanted to return the item, they sent me a prepaid shipping package and were very helpful throughout the return process. Quick and easy!

Integrating Into Google Plus - Is it worth it?

When I first heard Google were introducing their own social-networking platform, I was intrigued, to say the least, about what they could offer compared to the other social sites I use: Facebook and Twitter.

As I stated in one of my earlier posts, I am more of a tweeter since I can share my blog posts easily along with my random ramblings. I think Facebook will have a problem competing alongside Twitter or Google+. Facebook is seen to be more of a personal social network rather than an open professional network, and that’s its biggest downfall. It’s quite difficult to cross the boundaries between posting professional/business content alongside personal posts. Thankfully, this is something Google Plus does quite well through its new “circles” feature, allowing complete control over who sees what.

I jumped at the chance of using Google Plus when I was offered an invite during the initial release. I was very impressed. Simple and straightforward. My posts looked really beautiful within its minimalist user interface. Well, what else would you expect from Google? Don’t get me started on the eyesore that is Facebook’s new interface; I’ll leave that for another blog post.

For me, Google Plus is like an extension of Twitter with some added benefits such as:

  • Ability to make posts private/public.
  • Follow people by adding them to a circle.
  • No character limit on the length of posts.
  • Nice interoperability with the search-daddy that is Google.

For a new social networking site, I get a higher click-through rate to my blog than I ever got from tweeting on Twitter. In the process, I managed to get more people adding me to their circles. So take any remarks regarding the inactivity of Google+ with a pinch of salt. I don’t buy it. Google encompasses a big community that you feel part of.

I briefly touched upon the interoperability factor with Google search. People underestimate the power of having the backing of Google search. For example, what if you wrote an article and linked it to your Google+ profile? This information will be displayed as author information within search results to help users discover great content and learn more about the person who wrote the article.

One thing that did surprise me is the fact that at this point in time there’s no advertising. With its predecessors (yes, that’s how confident I am in Google Plus), you always manage to find advertising in some form or another. I can view my profile page without constantly having an advert rubbing my single relationship status in my face, something Facebook does far too often.

I trust Google with my data over Facebook any day. I know Google can’t exactly be trusted either, but unlike Facebook they’re not in the news on a monthly basis regarding some type of data scandal. At time of writing, it is being reported that Facebook is now facing a privacy suit over internet tracking.

In conclusion, integrating oneself into Google Plus is definitely worth it. I only recently started to make more of an effort on Google+ and I find myself posting my content here over other social-networking sites. The key to making a good start is to make some of your posts public to show others your interests, and even to connect with these types of people either by adding them to a circle or joining a hangout.

On a final note, if you have a Google Plus account and like what I post, then why not circle me? :-)

Backing up Google Account Data

What happened recently, with some 150,000 Google Account holders losing their information due to a mishap at Google HQ over the weekend, really reinforces the fact that our data is not safe… even in the “cloud”.

At the end of the day, our information is stored on hardware that can fail. I think that this whole “cloud computing” malarkey has lured us all into a false sense of security, where we think we don’t need to take measures to ensure our data is backed up on a regular basis. I have to admit, I too have become a bit tardy when it comes to backing up my online data. If a large company like Google can get it wrong, what hope is there for other companies offering the same thing?

I practically live on the “cloud” in terms of what Google has to offer. I use their email, calendar, document and notebook applications. Even their mobile phone OS: Android! Luckily, there are steps we can take to ensure our data is backed up on our own terms:

Google Calendar

Google Calendar is the one application I use the most. If I lost all my data, I would be quite annoyed to say the least (and be very disorganised).

You can back up all your calendar entries by opening your calendar settings, clicking on Calendars and selecting “Export Calendars”. A zip file will be created containing your calendars in iCalendar (.ics) format.
 
Gmail

This is a simple one. Use a desktop email client such as Thunderbird (or any other client you prefer) to download all your emails directly to your computer through POP access.
 
Docs

If you only store a handful of documents in your Google Account, you could just download them one-by-one. Understandably, if you have a long list of documents a more automated approach is required.

Lifehacker.com shows a really great script that allows you to download documents in whatever format you require. Take a look here.
 

Hooray! Our data is saved!

The Google Chrome Comic Strip

I have to say that I am quite impressed with the way Google markets its own applications and services. Who would ever have thought of using a comic strip to introduce the key workings of a specific application? It's a lengthy comic to say the least, consisting of 38 “fun filled” pages, which actually makes learning about the Chrome browser an interesting read.

Google Chrome Comic 1

Google Chrome Comic 2

But this does raise the question of why Google is releasing its own browser. I thought they had extended their search deal with Mozilla Firefox in return for setting Google as the default search engine. I guess this may cause an awkward relationship between the two in the future. But I suppose any attack against the dreaded Microsoft Internet Explorer browser can only be a benefit!

I have to say that the guy wearing the glasses on the left bears a striking (less cool) resemblance to me. :-)

You can view the full comic strip here.