Generate Google Sitemap From A List of URLs In A Text File

I had around 2000 webpage URLs listed in a text file that needed to be turned into a simple Google sitemap.

I decided to write a quick Google Sitemap generator console application, fit for purpose. The program iterates through each line of the text file and passes it to an XmlTextWriter to create the required XML format.

Feel free to copy and make modifications to the code below.

Code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;

namespace GoogleSitemapGenerator
{
    class Program
    {
        static void Main(string[] args)
        {
            string textFileLocation = String.Empty;

            if (args != null && args.Length > 0)
            {
                textFileLocation = args[0];
            }

            if (!String.IsNullOrEmpty(textFileLocation))
            {
                string fullSitemapPath = String.Format("{0}sitemap.xml", GetCurrentFileDirectory(textFileLocation));

                //Read text file; the stacked using statements dispose both the reader and the writer
                using (StreamReader sr = File.OpenText(textFileLocation))
                using (XmlTextWriter xmlWriter = new XmlTextWriter(fullSitemapPath, Encoding.UTF8))
                {
                    xmlWriter.WriteStartDocument();
                    xmlWriter.WriteStartElement("urlset");
                    xmlWriter.WriteAttributeString("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9");

                    while (!sr.EndOfStream)
                    {
                        //Trim so that whitespace-only lines are skipped and URLs carry no stray spaces
                        string currentLine = sr.ReadLine().Trim();

                        if (!String.IsNullOrEmpty(currentLine))
                        {
                            xmlWriter.WriteStartElement("url");
                            xmlWriter.WriteElementString("loc", currentLine);
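                            //Invariant culture keeps the date in the Gregorian calendar on every machine
                            xmlWriter.WriteElementString("lastmod", DateTime.Now.ToString("yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture));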
                            //xmlWriter.WriteElementString("changefreq", "weekly");
                            //xmlWriter.WriteElementString("priority", "1.0");

                            xmlWriter.WriteEndElement();
                        }
                    }

                    xmlWriter.WriteEndElement();
                    xmlWriter.WriteEndDocument();
                    xmlWriter.Flush();

                    if (File.Exists(fullSitemapPath))
                        Console.Write("Sitemap successfully created at: {0}", fullSitemapPath);
                    else
                        Console.Write("Sitemap has not been generated. Please check your text file for any problems.");

                }
            }
            else
            {
                Console.Write("Please enter the full path to where the text file is situated.");
            }
        }

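        static string GetCurrentFileDirectory(string path)
        {
            //Resolve to an absolute path so both separator styles and bare file names are handled
            string directory = Path.GetDirectoryName(Path.GetFullPath(path)) ?? String.Empty;

            //Keep the trailing separator because the caller appends "sitemap.xml" directly
            if (directory.Length > 0 && !directory.EndsWith(Path.DirectorySeparatorChar.ToString()))
            {
                directory += Path.DirectorySeparatorChar;
            }

            return directory;
        }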
    }
}
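
The same loop can also be written against a TextWriter rather than a file path, which makes it easier to try out without touching the disk. The following is only a sketch: SitemapBuilder and its Write method are names invented for this illustration, and it swaps the XmlTextWriter used above for XmlWriter.Create with indentation switched on.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original console application.
static class SitemapBuilder
{
    const string SitemapNamespace = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Writes one <url> entry per non-blank line to the supplied writer.
    public static void Write(IEnumerable<string> urls, DateTime lastModified, TextWriter output)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true; // one element per line, easier to inspect by hand

        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("urlset", SitemapNamespace);

            foreach (string line in urls)
            {
                string url = (line ?? String.Empty).Trim();
                if (url.Length == 0)
                    continue; // skip blank lines

                writer.WriteStartElement("url", SitemapNamespace);
                writer.WriteElementString("loc", SitemapNamespace, url);
                writer.WriteElementString("lastmod", SitemapNamespace,
                    lastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
                writer.WriteEndElement(); // </url>
            }

            writer.WriteEndElement(); // </urlset>
            writer.WriteEndDocument();
        }
    }
}
```

Passing a StringWriter is handy for a quick check, although the XML declaration will then report UTF-16. For a real sitemap file, a writer from File.CreateText (which writes UTF-8) is probably the better fit.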

I will be uploading the console application project, including the executable, shortly.

Adding An XML Sitemap to a Website

In 2005, Google launched the Sitemap 0.84 protocol, which uses the XML format. A sitemap is a way of organizing a website, identifying its URLs and the data under each section. Previously, sitemaps were primarily geared toward the users of a website. Google's XML format, however, was designed for search engines, allowing them to find the data faster and more efficiently.

Even the simplest sitemap is important, because it allows search engines such as Google and Microsoft Live Search to crawl your website for any changes. The following example shows what a basic XML sitemap contains:

<?xml version="1.0" encoding="UTF-8" ?> 
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>/blog/</loc> 
<priority>0.5</priority> 
<changefreq>weekly</changefreq> 
</url>
</urlset>

As you can see, the sitemap contains the following:
  • <loc> = Location of the page
  • <priority> = The priority of a particular URL relative to other pages on the same site. The value for this tag is a number between 0.0 and 1.0, where 0.0 identifies the lowest priority page(s) on your site and 1.0 identifies the highest priority page(s) on your site.
    The default priority of a page is 0.5.
  • <changefreq> = This value indicates how frequently the content at a particular URL is likely to change.
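
If you write these optional tags from code, it can help to check the values before they reach the file. The sketch below is one possible way to do that: SitemapValues, IsValidChangeFreq and FormatPriority are hypothetical names made up for this example, and the list of change frequencies is the set defined by the sitemaps.org protocol.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers for the optional <changefreq> and <priority> tags.
static class SitemapValues
{
    // The values the sitemap protocol accepts for <changefreq>.
    static readonly string[] ChangeFrequencies =
        { "always", "hourly", "daily", "weekly", "monthly", "yearly", "never" };

    // Returns true only for an exact, lower-case match with one of the values above.
    public static bool IsValidChangeFreq(string value)
    {
        return Array.IndexOf(ChangeFrequencies, value) >= 0;
    }

    // Formats a priority to one decimal place, rejecting anything outside 0.0 to 1.0.
    public static string FormatPriority(double priority)
    {
        if (double.IsNaN(priority) || priority < 0.0 || priority > 1.0)
            throw new ArgumentOutOfRangeException("priority", "Priority must be between 0.0 and 1.0.");

        // Invariant culture guarantees a '.' decimal separator on every machine.
        return priority.ToString("0.0", CultureInfo.InvariantCulture);
    }
}
```

For example, SitemapValues.FormatPriority(0.5) gives "0.5", and SitemapValues.IsValidChangeFreq("weekly") returns true while "fortnightly" is rejected.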

Thankfully, there is a site that will automatically generate an XML sitemap for you: http://www.sitemapspal.com/

I wrote a blog post a little while back on how to manually submit your sitemap to search engines, which proves quite useful if you find that your site has not been crawled for a long time. You can find that blog post here.

Adding Google Sitemap To Blogger Account

After my last blog posting (Google Is Better Than I Thought!!!), one of my mates said that he was unable to integrate the Google Sitemap facility with his Blogger.com blog, since we could not find a Sitemap file. You can create a Google Sitemap for your Blogger.com site by carrying out the following:

1) Log in to your Google Sitemap account.

2) As soon as you log in you will be able to add your Blogger site. In my example, I am creating a fictitious blog called "someblog.blogsite.com".


3) Once you have added your new site you will get a message saying that your new site was added successfully.

4) Now you will need to verify that you are the owner of the blog you have submitted (you wouldn't want anybody to track your site, would you? :-P). You have two options here. You can use either an HTML file (which cannot be used on Blogger blogs) or a META tag. Select META tag.

5) A unique META tag will be generated for you:

6) Log in to your Blogger.com account and edit your Blog Template. Insert your generated META tag straight after the <head> tag and save your template.

7) In the Google Sitemap account, click on the "Verify" button. If everything goes to plan, you will be able to create a Sitemap file.

8) Click on the "Sitemap" link from the left navigation and then click "Add Sitemap".

9) From the Drop Down List select "Add General Web Sitemap".

10) The Blogger RSS feed will be used as your Sitemap, so enter the full web address of your RSS feed (for example: http://someblog.blogsite.com/feeds/posts/) and press the "Add General Web Sitemap" button.

That should be it. It may take up to 24 hours for Google to crawl through your blog, depending on the amount of content on your site.

UPDATE:

It looks like there are four possibilities for referencing your Blogger Sitemap:

> someblog.blogsite.com/rss.xml
> someblog.blogsite.com/feeds/posts/full
> someblog.blogsite.com/feeds/posts/default?alt=rss
> someblog.blogsite.com/atom.xml
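
If you want to try each of these in turn, the four addresses can be built from the blog's host name. This is just a small sketch: BloggerFeeds and CandidateUrls are hypothetical names, and the http:// prefix is an assumption carried over from the example address in step 10.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that lists the feed addresses mentioned above.
static class BloggerFeeds
{
    // The four feed paths from the update, in the same order.
    static readonly string[] FeedPaths =
        { "/rss.xml", "/feeds/posts/full", "/feeds/posts/default?alt=rss", "/atom.xml" };

    // Builds one candidate sitemap URL per feed path for the given blog host.
    public static List<string> CandidateUrls(string blogHost)
    {
        // Drop surrounding whitespace and any trailing slash so the paths join cleanly.
        string host = blogHost.Trim().TrimEnd('/');

        List<string> urls = new List<string>();
        foreach (string path in FeedPaths)
        {
            urls.Add("http://" + host + path); // assumed scheme, see the note above
        }
        return urls;
    }
}
```

Calling BloggerFeeds.CandidateUrls("someblog.blogsite.com") returns the four addresses listed above, each prefixed with http://.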