Reducing The Number of 'Crawled - Currently not indexed' Pages

Every few weeks, I check the health of my site through Google Search Console (aka Webmaster Tools) and Analytics to see how Google is indexing my site and to look into potential issues that could affect the click-through rate.

Over the years the content of my site has grown steadily and, as it stands, it consists of 250 published blog posts. When you take into consideration the other potential pages Google indexes - filter URLs based on grouping posts by tag or category - the number of links my site consists of increases considerably. It's at the discretion of Google's search algorithm whether it includes these links for indexing.

Last month, I decided to scrutinise the Search Console Index Coverage report in great detail just to see if there were any improvements I could make to alleviate some minor issues. What I wasn't expecting to see was the large volume of links marked as "Crawled - Currently not indexed".

Crawled Currently Not Indexed - 225 Pages

Wow! 225 affected pages! What does "Crawled - Currently not indexed" mean? According to Google:

The page was crawled by Google, but not indexed. It may or may not be indexed in the future; no need to resubmit this URL for crawling.

Pretty self-explanatory, but not much guidance on how to lessen the number of links that aren't indexed. From my experience, the best place to start is to look at the list of links that are being excluded and form a judgement based on the page content of those links. Unfortunately, there isn't an exact science. It's a process of trial and error.

Let's take a look at the links from my own 225 excluded pages:

Crawled Currently Not Indexed - Non Indexed Links

At first glance, I could see that the majority of the URLs consisted of links where users can filter posts by either category or tag. When inspecting these pages, I could see nothing content-wise that gave a conclusive reason for index exclusion. However, what I did notice is that these links are found automatically by Google when the site gets spidered. The sitemap I submitted in the Search Console only lists blog posts and content pages.

This led me to believe a possible solution would be to create a separate sitemap that consisted purely of links for these categories and tags. I called it metasitemap.xml. Whenever I added a post, the sitemap's "lastmod" date would get updated, just like the pages listed in the default sitemap.

I created and submitted this new sitemap around mid-July, and it wasn't until four days ago that the improvement was reported within the Search Console. The number of non-indexed pages was reduced to 58. That's a 74% reduction!

Crawled Currently Not Indexed - 58 Pages

Conclusion

As I stated above, there isn't an exact science for reducing the number of non-indexed pages as every site is different. Supplementing my site with an additional sitemap just happened to alleviate my issue. But that is not to say copying this approach won't help you. Just ensure you look into the list of excluded links for any patterns.

I still have some work to do, and the next thing on my list is to implement canonical tags on all my pages, since I have become aware I have duplicate content on different URLs - remnants from when I moved blogging platforms.

If anyone has any other suggestions or solutions that worked for them, please leave a comment.

My First Butter CMS JavaScript Implementation

For one of my side projects, I was asked to use Butter CMS to allow for basic blog integration using JavaScript. I had never heard of or used Butter CMS before and was intrigued to know more about the platform.

Butter CMS is another headless CMS variant that allows a developer to utilise API endpoints to push content to an application via a range of approaches. So nothing new here. Just like any headless CMS, the proof is in the pudding when it comes to the following factors:

  • Quality of features
  • Ease of integration
  • Price points
  • Quality of documentation

I haven't had a chance to properly look into what Butter CMS fully has to offer, but from what I have seen from working on the requirements for this side project, I was pleasantly surprised. I found it really easy to get set up with a minimal amount of fuss! For this project I used Butter CMS's Blog Engine package, which does exactly what it says on the tin. All the fields you need for writing blog posts are already provided.

JavaScript Code

My JavaScript implementation is pretty basic and provides the following functionality:

  • Outputs a list of posts consisting of title, date and summary text
  • Pagination
  • Outputs a single blog post

All key functionality is derived from the "ButterCMS" JavaScript file:

/*****************************************************/
/*                    Butter CMS                     */
/*****************************************************/
var BEButterCMS =
{
    ButterCmsObj: null,

    "Init": function () {
        // Initiate Butter CMS.
        this.ButterCmsObj = new ButterCmsBlogData();
        this.ButterCmsObj.Init();
    },
    "GetBlogPosts": function () {
        BEButterCMS.ButterCmsObj.GetBlogPosts(1);
    },
    "GetSinglePost": function (slug) {
        BEButterCMS.ButterCmsObj.GetSinglePost(slug);
    }
};

/*****************************************************/
/*                 Butter CMS Data                   */
/*****************************************************/
function ButterCmsBlogData() {
    var apiKey = "<Enter API Key>",
        baseUrl = "/",
        butterInstance = null,
        $blogListingContainer = $("#posts"),
        $blogPostContainer = $("#post-individual"),
        pageSize = 10;

    // Initialise the ButterCmsBlogData object and get the data.
    this.Init = function () {
        getCMSInstance();
    };

    // Returns a list of blog posts.
    this.GetBlogPosts = function (pageNo) {
        // The blog listing container needs to be cleared before any new markup is pushed.
        // For example when the next page of data is requested.
        $blogListingContainer.empty();

        // Request blog posts.
        butterInstance.post.list({ page: pageNo, page_size: pageSize }).then(function (resp) {
            var body = resp.data,
                blogPostData = {
                    posts: body.data,
                    next_page: body.meta.next_page,
                    previous_page: body.meta.previous_page
                };

            for (var i = 0; i < blogPostData.posts.length; i++) {
                $blogListingContainer.append(blogPostListItem(blogPostData.posts[i]));
            }

            //----------BEGIN: Pagination--------------//

            $blogListingContainer.append("<div>");

            if (blogPostData.previous_page) {
                $blogListingContainer.append("<a class=\"page-nav\" href=\"#\" data-pageno=" + blogPostData.previous_page + " href=\"\">Previous Page</a>");
            }

            if (blogPostData.next_page) {
                $blogListingContainer.append("<a class=\"page-nav\" href=\"#\" data-pageno=" + blogPostData.next_page + " href=\"\">Next Page</a>");
            }

            $blogListingContainer.append("</div>");

            paginationOnClick();

            //----------END: Pagination--------------//
        });
    };

    // Retrieves a single blog post based on the current URL of the page if a slug has not been provided.
    this.GetSinglePost = function (slug) {
        var currentPath = location.pathname,
            blogSlug = !slug ? currentPath.match(/([^\/]*)\/*$/)[1] : slug; // Fall back to the last URL segment when no slug is passed.

        butterInstance.post.retrieve(blogSlug).then(function (resp) {
            var post = resp.data.data;

            $blogPostContainer.append(blogPost(post));
        });
    };

    // Renders the HTML markup and fields for a single post.
    function blogPost(post) {
        var html = "";

        html = "<article>";

        html += "<h1>" + post.title + "</h1>";
        html += "<div>" + blogPostDateFormat(post.created) + "</div>";
        html += "<div>" + post.body + "</div>";
        
        html += "</article>";

        return html;
    }

    // Renders the HTML markup and fields when listing out blog posts.
    function blogPostListItem(post) {
        var html = "";

        html = "<h2><a href=" + baseUrl + post.url + ">" + post.title + "</a></h2>";
        html += "<div>" + blogPostDateFormat(post.created) + "</div>";
        html += "<p>" + post.summary + "</p>";

        if (post.featured_image) {
            html += "<img src=" + post.featured_image + " />";
        }

        return html;
    }

    // Set click event for previous/next pagination buttons and reload the current data.
    function paginationOnClick() {
        $(".page-nav").on("click", function (e) {
            e.preventDefault();
            var pageNo = $(this).data("pageno"),
                butterCmsObj = new ButterCmsBlogData();

            butterCmsObj.Init();
            butterCmsObj.GetBlogPosts(pageNo);
        });
    }

    // Format the blog post date to dd/MM/yyyy HH:mm
    function blogPostDateFormat(date) {
        var dateObj = new Date(date);

        return [dateObj.getDate().padLeft(), (dateObj.getMonth() + 1).padLeft(), dateObj.getFullYear()].join('/') + ' ' + [dateObj.getHours().padLeft(), dateObj.getMinutes().padLeft()].join(':');
    }

    // Get an instance of Butter CMS on initialisation so only one instance is created.
    function getCMSInstance() {
        butterInstance = new Butter(apiKey);
    }
}

// Set a prototype for padding numerical values.
Number.prototype.padLeft = function (base, chr) {
    var len = (String(base || 10).length - String(this).length) + 1;

    return len > 0 ? new Array(len).join(chr || '0') + this : this;
};

To get a list of blog posts:

// Initiate Butter CMS.
BEButterCMS.Init();

// Get all blog posts.
BEButterCMS.GetBlogPosts();

To get a single blog post, you will need to pass in the slug of the blog post via your own approach:

// Initiate Butter CMS.
BEButterCMS.Init();

// Get single blog post.
BEButterCMS.GetSinglePost(postSlug);
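
Since GetSinglePost falls back to the current page URL when no slug is provided, it can also be called without any arguments from a blog post page (assuming the last segment of the URL is the post slug):

// Initiate Butter CMS.
BEButterCMS.Init();

// Get single blog post based on the last segment of the current URL.
BEButterCMS.GetSinglePost();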

Check Your Website Headers People!

A website can tell the public a lot about you, from the things you want people to see to the things you probably would not. HTTP headers can divulge things about your website that you wouldn't necessarily want to make public, and it's up to the individual to decide what headers they're willing to expose. But what I would recommend is to at least analyse any site prior to moving it to a production environment.

Why all of a sudden am I talking about questioning your website HTTP Headers?

It was only by chance, when perusing StackOverflow, that I came across a question about securing HTTP headers and was directed to a site called securityheaders.io. I immediately submitted this very site for scanning, thinking it would fare quite well. But boy oh boy, was I wrong!

Security Headers (Before)

Based on this result, does this make my website vulnerable? To a certain extent, yes. By default, you're exposing key information to potential hackers about how your website is built. For example, here is a simple list of things that the HTTP headers returned from the server can reveal or control:

  • Web server
  • Framework version
  • Cache handling
  • Cross-site scripting access
  • Referrer policies

Now, based on that list alone, what HTTP headers would you hide? Having had my eyes opened by the report generated by securityheaders.io, as a minimum I would hide anything that shows what technology, framework and server platform I am using. If there happens to be an exploit for the very server or technology you are using, you don't want the whole world to know that, especially if you happen to be hosting a high-traffic website.
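
If you want a quick look at what your own site currently exposes, the response headers can be inspected in the browser's network tab, or with a small snippet along these lines run from your site's own browser console (this is just a rough sketch, requesting the current site's homepage):

// List the response headers returned by your own site (run from the site's browser console).
fetch("/", { method: "HEAD" }).then(function (response) {
    response.headers.forEach(function (value, name) {
        console.log(name + ": " + value);
    });
});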

I decided to correct all the issues highlighted by securityheaders.io and spent some extra time obfuscating other headers. Now I can proudly say I've passed. There is just one blemish on the report, to do with the "Content-Security-Policy" header, which defines approved sources of content that the browser may load.

Security Headers (After)

I've been tweaking the rules for this header and, I'll be honest, it shafted the administration dashboard of the content management system I use for my site - Kentico CMS. So before I reinstate the header, I need a little more time to tweak it.

Another great site to use to analyse the security of your site (.NET sites only) is ASafaWeb, which scans for common configuration vulnerabilities.


Google Seems To Have An Issue With My Server Response Time...

...and I think I know why...

Out of all the issues Google PageSpeed Insights seems to have when analysing my site, there are two specific things that crop up and annoy me:

  1. Reduce server response time
  2. Leverage browser caching (due to the Google Analytics JavaScript file)

The Google Analytics issue is something I will have to live with since (as far as I'm aware) there's nothing I can do. It would be nice if Google wouldn't penalise you for using a product they have developed. However, the "Reduce server response time" issue was something that perplexed me. My site is relatively simple and not doing anything over-the-top.

Due to the nature of my hosting setup (shared), I didn't have all the capabilities to make my website respond any better. The only ways I could think of to improve server response time were to move my hosting to another region or purchase a VPS to get more control.

Now, I think I have resolved the server response time issue... It has something to do with a Web Statistics service called AWStats that was enabled by default as an "addon" service on my hosting. Once it was disabled through my Plesk Management Portal, Google PageSpeed didn't seem to have any issue with my server response.

I cannot 100% confirm whether disabling the Web Statistics service is a permanent solution that will work for everyone else, but there might be some truth behind this. Web statistics services like AWStats store all analytical data in log files directly on the server, so this must have some effect on the time a request takes. I could be talking complete nonsense.

If you have experienced the same problem as me, check your own hosting setup and its "addon" services. You never know, it may give you that extra Google PageSpeed point. :-)

Microsoft Virtual Academy...Something every Microsoft Developer Should Take A Look At!

There are many roads and avenues a tech-head can take to either get a grasp of a new technology or prepare for certification. Unfortunately, some methods of gaining knowledge on a subject can come at a great cost... especially when it comes to anything Microsoft.

Generally, Microsoft has always had some great forum and blogging communities that enable developers to get the expertise they require, but I've always found them to be somewhat divided and rough around the edges. Now Microsoft has reworked its community and provided learners with a wide variety of courses freely available to anyone!

While MVA courses are not specifically meant to focus on exam preparation, they should be used as an addition to paid courses, books and online practice exams when preparing for a certification. But they definitely help; it takes more than just learning theory to pass an exam.

So if you require some extra exam training or just want to brush up your skills, give a few topics a go. I myself decided to test my skills by starting right from the beginning and covering courses that relate to my industry. In this case, to name a few:

  • Database Fundamentals
  • Building Web Apps with ASP.NET Jump Start
  • Developing ASP.NET MVC 4 Web Applications Jump Start
  • Programming In C# Jump Start
  • Twenty C# Questions Explained

I can guarantee you'll be stumped by some of the exam questions after covering each topic. Some questions can be quite challenging!

I've been a .NET developer for around 7 years and even I had to go through the learning content more than once. However long you've been in the technical industry, we are all susceptible to forgetting things or being unaware of different coding techniques.

One of the great motivations of using MVA is the ranking system, which places you on a leaderboard against other avid learners and lets you see yourself progress as you complete each exam. All I can advise is: don't let the ranking system be your sole motivation to just "show off" your knowledge. The important part is learning. What's the point in making a random attempt at each exam without a deep understanding of why you got an answer correct or incorrect?

You can see how far I have progressed by viewing my MVA profile here: http://www.microsoftvirtualacademy.com/Profile.aspx?alias=2181504

All in all: Fantastic resource and fair play to Microsoft for offering some free training!

C# In Depth by Jon Skeet Review

C# In Depth Third Edition

When working as a programmer, it's really easy to continue coding in the same manner you have done since you picked up a language and made your first program.

The saying: "Why fix it if it ain't broken?" comes to mind...

I for one sometimes fail to move with the times (unbeknownst to me) and find new and better ways of coding. It's only on the off chance that I get introduced to different approaches through a work colleague or whilst Googling for an answer to one of my coding queries.

After reading some rave reviews of C# In Depth, written by the one and only StackOverflow god Jon Skeet, I decided to part with my hard-earned money and make a purchase.

C# In Depth is different from other programming books I've read on C#. In fact, it's really good - don't let the title of the book deter you. The content is ideal for novice and semi-experienced programmers alike.

Firstly, you start off by being shown code samples of how C# has evolved through its iterations (v1 - v4). In most cases I gave myself a gratifying pat on the back when I noticed that the approaches I've taken in my own projects utilised practices and features of the current language. ;-)

Secondly, unlike some programming books I've read in the past, it's not intimidating to read at all. Jon Skeet really has a great way of talking about concepts I find difficult to comprehend in a clear and meaningful way, so that I could utilise these concepts within my current applications.

The only minor niggle I have is that there were a few places where I would have liked specific chapters to go into more detail. On the other hand, it gave me the opportunity to research the nitty-gritty details for myself.

Since I purchased this book, I have found myself referencing it many times and appreciating what C# has to offer, along with its misunderstood and underused features.

All in all, the author truly has a gift in clearly demonstrating his understanding on the subject with finesse and if I am able to comprehend even one-tenth of his knowledge, I will be a happy man.

The Easy Way To Run a PHP Site In A Windows Environment

Even though my programming weapon of choice is .NET C#, there are times (unfortunate times!) where I need to dabble in a bit of PHP. Being a .NET developer means I do not have the setup required to run PHP-based sites, such as Apache and MySQL.

In the past I have tried to set up an Apache-configured server, but I could never get it running 100% - possibly because I didn't have the patience, or couldn't justify the additional setup time when I might only work on a PHP site once in a blue moon...

Last year, I came across a program called EasyPHP that allowed me to install a local instance of Apache and MySQL together in just one installation. It made it really easy to get up and running without all the setup and configuration hassle.

Once installed, you can create numerous websites and MySQL instances in a version of your own choosing. Wicked! I've never been so excited about PHP in my life!

I have only scratched the surface of the features EasyPHP provides, and whenever I have needed to use it there have always been great improvements. Take a look at their site for more information: www.easyphp.org.

So if you're a Windows man who needs to carry out PHP odds and ends, I can't recommend EasyPHP enough.

It’s all about Website Hotkeys!

During the latter end of 2010, Twitter overhauled their somewhat simplistic website to compete with client-side offerings (e.g. TweetDeck, Seesmic). What I found really impressive was a hidden bit of functionality that allowed the user to navigate around the site using keyboard shortcuts (or hotkeys). If you haven't tried them, take a look at the list of shortcuts below and try them out.

Twitter Keyboard Shortcuts

Some people I know in the industry think it's a pointless feature, but I believe something so simple automatically enhances the user's experience when accessing a site. In fact, you could think of hotkeys as an additional web accessibility feature for those who don't have a mouse or just prefer a more direct approach to navigating through a site. Many sites have been utilising hotkeys to make their sites act like locally installed software programmes, for example Google Docs.

I was very keen on replicating hotkey functionality in my next project. Not surprisingly, there are a lot of custom jQuery plugins that allow you to implement basic keyboard shortcut functionality. The best one I found through trial and error is Mousetrap, which I found to be the most flexible at firing your own custom JavaScript events by binding a single, sequence or combination key press.

Using Mousetrap, I could replicate a simple Twitter-style shortcut to take a user back to the homepage by pressing the following keys in sequence: “G H”:

Mousetrap.bind("g h",
    function () { 
        window.location = "/Home.aspx"; 
    }
);
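
Mousetrap isn't limited to key sequences. As a rough sketch (the key choices and URLs below are just examples of my own, not Twitter's), single keys and key combinations can be bound in exactly the same way:

// Single key press: "?" could display a list of available shortcuts.
Mousetrap.bind("?", function () {
    window.location = "/Shortcuts.aspx";
});

// Key combination: Ctrl+Shift+P could jump to the posts listing.
Mousetrap.bind("ctrl+shift+p", function () {
    window.location = "/Posts.aspx";
});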

iOS Safari Browser Has A Massive Caching Issue!

Safari iOS6

It wasn't until today that I found the Safari browser used on the iPad and iPhone caches page functionality to such an extent that it stops the intended functionality, so much so that it affects the user experience. I think Apple has gone a step too far in making their browser uber-efficient to minimise page loading times.

We can accept that browsers will cache style sheets and client-side scripts, but I never expected Safari to go as far as caching responses from web services. This is a big issue. Something as simple as the following will have problems in Safari:

// JavaScript function calling web service
function GetCustomerName(id)
{
    var name = "";

    $.ajax({
        type: "POST",
        url: "/Internal/ShopService.asmx/GetCustomerName",
        data: "{ 'id' : '" + id + "' }",
        contentType: "application/json; charset=utf-8",
        dataType: "json",
        cache: false,
        success: function (result) {
            var data = result.d;
            name = data;
        },
        error: function () {
        },
        complete: function () {
        }
    });
    
    return name;
}
//ASP.NET Web Service method
[WebMethod]
public string GetCustomerName(int id)
{
   return CustomerHelper.GetFullName(id);
}

In the past, to ensure my jQuery AJAX requests were not cached, the "cache: false" option within the AJAX call normally sufficed. Not if you're making POST web service requests. It was only recently that I found the "cache: false" option has no effect on POST requests, as stated in the jQuery API documentation:

Pages fetched with POST are never cached, so the cache and ifModified options in jQuery.ajaxSetup() have no effect on these requests.

In addition to trying to fix the problem by using the jQuery AJAX cache option, I implemented practical techniques covered by the tutorial: How to stop caching with jQuery and JavaScript.
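
Those techniques generally boil down to making every request URL unique so the browser has nothing cached to return. As a rough sketch of the idea (not the exact code from the tutorial), a timestamp can be appended to every jQuery AJAX request URL via a prefilter:

// Append a unique timestamp to every jQuery AJAX request URL to bypass any cached response.
$.ajaxPrefilter(function (options) {
    var cacheBuster = "_nocache=" + new Date().getTime();
    options.url += (options.url.indexOf("?") === -1 ? "?" : "&") + cacheBuster;
});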

Luckily, I found an informative StackOverflow post by someone who experienced the exact same issue a few days ago. It looks like the exact same caching bug is still present in Apple's newest operating system, iOS6*. Well, you didn't expect Apple to fix important problems like these now, would you (remember the Maps fiasco!)? The StackOverflow poster found a suitable workaround: passing a timestamp to the web service method being called, like so (modifying the code above):

// JavaScript function calling web service with time stamp addition
function GetCustomerName(id)
{
    var timestamp = new Date();

    var name = "";

    $.ajax({
        type: "POST",
        url: "/Internal/ShopService.asmx/GetCustomerName",
        data: "{ 'id' : '" + id + "', 'timestamp' : '" + timestamp.getTime() + "' }", //Timestamp parameter added.
        contentType: "application/json; charset=utf-8",
        dataType: "json",
        cache: false,
        success: function (result) {
            var data = result.d;
            name = data;
        },
        error: function () {
        },
        complete: function () {
        }
    });
    
    return name;
}
//ASP.NET Web Service method with time stamp parameter
[WebMethod]
public string GetCustomerName(int id, string timestamp)
{
    string iOSTime = timestamp;
    return CustomerHelper.GetFullName(id);
}

The timestamp parameter doesn't need to do anything once passed to the web service. It simply ensures that every call to the web service is unique and therefore never cached.

*UPDATE: After further testing it looks like only iOS6 contains the AJAX caching bug.

HTTP Request Script

In one of my website builds, I needed to load a couple of thousand records from a database permanently into the .NET cache. Even though I set the cache to never expire, it still gets cleared whenever the application pool recycles (currently set to every 24 hours). As you can imagine, if a user happens to visit the site soon after the cache is cleared, they will experience excess page loading times.

The only way I could avoid this was to set up a Scheduled Task that runs a script to carry out a web request straight after the application pool recycles.

Luckily, I managed to find a PowerShell script on StackOverflow that does exactly that:

# Issue a simple HTTP request to warm up the site after the application pool recycles.
$request = [System.Net.WebRequest]::Create("<Enter site URL>")

# Fetch and immediately close the response - we only care that the page has been hit.
$response = $request.GetResponse()
$response.Close()