GatsbyJS Markdown Plugin: Automatically Open External Links In A New Tab

I’ve been doing a little research into how I can make posts written in markdown more suited for my needs and decided to use this opportunity to develop my own Gatsby Markdown plugin. Ever since I moved to Gatsby, making my own Markdown plugin has been on my todo list as I wanted to see how I could render slightly different HTML markup based on the requirements of my blog post content.

As this is my first markdown plugin, I thought it best to start small and tackle a bugbear of mine - making external links automatically open in a new window. From what I have seen online, some suggest simply adding an HTML anchor tag to the markdown, as you then have the ability to apply any attributes you want - including target. I’ll quote my previous post about aligning images in markdown on why I’m not keen on mixing HTML with markdown:

HTML can be mingled alongside the markdown syntax... I wouldn't recommend this from a maintainability perspective. Markdown is platform-agnostic so your content is not tied to a specific platform. By adding HTML to markdown, you're instantly sacrificing the portability of your content.

Setup

We will need to create a local Gatsby plugin, which I’ve named gatsby-remark-auto-link-new-window. Ugly name... maybe you can come up with something more imaginative. :-)

To register this with your Gatsby project, you will need to start off with the following:

  • Create a plugins folder at the root of your project (if one hasn’t been created already).
  • Add a new folder based on the name of our plugin, in this case - /plugins/gatsby-remark-auto-link-new-window.
  • Every plugin consists of two files:

    • index.js - where the plugin code to carry out our functionality will reside.
    • package.json - contains the details of our plugin, such as name, description, dependencies etc. For the moment this can just contain an empty JSON object {}. If we were to publish our plugin, this will need to be completed in its entirety.
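
As a rough sketch, the folder and its two files could also be created with a few lines of Node. The scaffoldPlugin helper and the throwaway temp directory are my own illustration, not part of Gatsby; in a real project you would pass your project root instead.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: creates /plugins/<name> with the two files a local plugin needs.
function scaffoldPlugin(projectRoot, pluginName) {
  const pluginDir = path.join(projectRoot, "plugins", pluginName);
  fs.mkdirSync(pluginDir, { recursive: true });

  // index.js will hold the plugin code; start with a transformer that changes nothing.
  fs.writeFileSync(
    path.join(pluginDir, "index.js"),
    "module.exports = ({ markdownAST }) => markdownAST\n"
  );

  // An empty JSON object is enough for a local, unpublished plugin.
  fs.writeFileSync(path.join(pluginDir, "package.json"), "{}\n");

  return pluginDir;
}

// Demo against a throwaway directory rather than a real project.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "gatsby-demo-"));
const pluginDir = scaffoldPlugin(demoRoot, "gatsby-remark-auto-link-new-window");
console.log(fs.readdirSync(pluginDir).sort());
```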

Now that we have our bare-bones structure, we need to register our local plugin by adding a reference to the gatsby-config.js file. Since this plugin is to do with transforming markdown, the reference will be added inside the gatsby-transformer-remark options:

{
  // ...
  resolve: 'gatsby-transformer-remark',
  options: {
    plugins: [
      {
        resolve: 'gatsby-remark-embed-gist',
        options: {
          username: 'SurinderBhomra',
        },
      },
      {
        resolve: 'gatsby-remark-auto-link-new-window',
        options: {},
      },
      'gatsby-remark-prismjs',
    ],
  },
  // ...
}

For the moment, I’ve left the options field empty as we currently have no settings to pass to our plugin. This is something I will show in another blog post.

To make sure we have registered our plugin with no errors, we need to run our build using the gatsby clean && gatsby develop command. This command will need to be run after every change made to the plugin. By adding gatsby clean, we ensure all previously built files are cleared out before the next build.

Rewriting Links In Markdown

As the plugin is relatively straight-forward, let’s go straight into the code that will be added to our index.js file.

const visit = require("unist-util-visit")

module.exports = ({ markdownAST }, pluginOptions) => {
  visit(markdownAST, "link", node => {
    // Check if link is external by checking if the "url" attribute starts with http.
    if (node.url.startsWith("http")) {
      if (!node.data) {
        // hProperties refers to the HTML attributes of the node in question.
        // Ensure this object is added to the node.
        node.data = { hProperties: {} };
      }
      
      // Assign the 'target' attribute.
      node.data.hProperties = Object.assign({}, node.data.hProperties, {
        target: "_blank",
      });
    }
  })

  return markdownAST
}

As you can see, I want to target all markdown link nodes and, depending on the contents of the url property, perform a custom transformation. If the url property starts with "http", we add a new "target" attribute using hProperties. hProperties refers to the HTML attributes of the targeted node.
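
To see what the transformation does to the tree without running a full build, here is a small self-contained sketch. The visitNodes function is a simplified stand-in I wrote for unist-util-visit, and the tree is a hand-written approximation of what remark produces for two links, so treat the exact node shapes as an assumption.

```javascript
// Minimal stand-in for unist-util-visit: calls `visitor` on every node of `type`.
function visitNodes(node, type, visitor) {
  if (node.type === type) visitor(node);
  (node.children || []).forEach(child => visitNodes(child, type, visitor));
}

// Same transformation as the plugin above, applied to a plain object tree.
function openExternalLinksInNewTab(markdownAST) {
  visitNodes(markdownAST, "link", node => {
    if (node.url.startsWith("http")) {
      node.data = node.data || {};
      node.data.hProperties = Object.assign({}, node.data.hProperties, {
        target: "_blank",
      });
    }
  });
  return markdownAST;
}

// Hand-written tree for: [Gatsby](https://www.gatsbyjs.com) and [About](/about)
const tree = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "link", url: "https://www.gatsbyjs.com", children: [{ type: "text", value: "Gatsby" }] },
        { type: "text", value: " and " },
        { type: "link", url: "/about", children: [{ type: "text", value: "About" }] },
      ],
    },
  ],
};

openExternalLinksInNewTab(tree);
const [external, , internal] = tree.children[0].children;
console.log(external.data); // { hProperties: { target: '_blank' } }
console.log(internal.data); // undefined
```

Only the external link gains a data.hProperties object; the internal link is left untouched, so it renders as a normal anchor.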

To see the changes take effect, we will need to re-run gatsby clean && gatsby develop.

Now that we have understood the basics, we can beef up our plugin by adding some more functionality, such as plugin options. But that's for another post.
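
As a taste of where options could go, here is one possible shape, not necessarily what a follow-up would use. The target and rel option names, their defaults and the applyLinkAttributes helper are all assumptions of mine. Adding rel="noopener noreferrer" alongside target="_blank" is a common precaution so the new tab cannot reach back to the opening page through window.opener.

```javascript
// Hypothetical defaults; the option names are assumptions for this sketch.
const defaultOptions = { target: "_blank", rel: "noopener noreferrer" };

// Stricter than startsWith("http"): only absolute http:// or https:// URLs count.
const isExternal = url => /^https?:\/\//i.test(url);

// Applies the configured attributes to a single markdown link node.
function applyLinkAttributes(node, pluginOptions = {}) {
  const { target, rel } = Object.assign({}, defaultOptions, pluginOptions);
  if (!isExternal(node.url)) return node;

  node.data = node.data || {};
  node.data.hProperties = Object.assign({}, node.data.hProperties, { target, rel });
  return node;
}

const link = applyLinkAttributes({ type: "link", url: "https://example.com" }, { rel: "noopener" });
console.log(link.data.hProperties); // { target: '_blank', rel: 'noopener' }
```

Inside the plugin, a helper like this would be called from the visit callback with the pluginOptions argument Gatsby passes in.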

Aligning Images In Markdown

Every post on this site has been written in markdown since I successfully moved over to GatsbyJS. Overall, the transition has been painless and I have found that writing blog posts using the markdown syntax is a lot more efficient than using a conventional WYSIWYG editor. Until making the move to markdown, I never noticed how fiddly those editors were, as you sometimes needed to clean the generated markup at HTML level.

Of course, all the efficiency of markdown does come at a minor cost in terms of flexibility. Out of the minor limitations, there was one I couldn't let pass. I needed to find a way to position images left, right and centre, as the majority of my previous posts have been formatted in this way. When going through the conversion process from HTML to markdown, all my posts were somewhat messed up and images were rendered at 100% width.

HTML can be mingled alongside the markdown syntax, so I do have an option to use the image tag and append styling. I wouldn't recommend this from a maintainability perspective. Markdown is platform-agnostic so your content is not tied to a specific platform. By adding HTML to markdown, you're instantly sacrificing the portability of your content.

I found a more suitable approach would be to handle the image positioning by appending a hashed value to the end of the image URL. For example, #left, #right, or #center. At CSS level, we can target the src attribute of the image and position the image, along with any additional styling, based on the hashed value. Very neat!

img[src*='#left'] {
  float: left;
  margin: 10px 10px 10px 0;
}

img[src*='#center'] {
  display: block;
  margin: 0 auto;
}

img[src*='#right'] {
  float: right;
  margin: 10px 0 10px 10px;
}
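
The reason this works is that the fragment after the # stays part of the src attribute the browser sees, but it is not sent to the server, so the image still loads as normal. Below is a small sketch of the same substring matching in JavaScript. The imageAlignment helper is purely for illustration (the CSS selectors above do the real work), and it assumes the markdown renderer leaves the image URL untouched.

```javascript
// Mirrors the CSS attribute selectors: img[src*='#left'] matches any src containing "#left".
function imageAlignment(src) {
  const match = ["left", "center", "right"].find(position => src.includes(`#${position}`));
  return match || "none";
}

// Markdown such as ![Sunset](/media/Blog/Travel/VilamendhooSunset.jpg#right)
// renders an <img> whose src keeps the "#right" suffix.
console.log(imageAlignment("/media/Blog/Travel/VilamendhooSunset.jpg#right")); // right
console.log(imageAlignment("/media/Blog/Travel/VilamendhooSunset.jpg")); // none
```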

Being someone who doesn’t delve into front-end coding techniques as much as I used to, I am amazed at the type of things you can do within CSS. I’ve obviously come late to the more advanced CSS selectors party.

Journey To GatsbyJS: Exporting Kentico Blog Posts To Markdown Files

The first thing that came into my head when testing the waters to start the process of moving over to Gatsby was my blog post content. If I could get my content in a form a Gatsby site accepts then that's half the battle won right there, the theory being it will simplify the build process.

I opted to go down the local storage route where Gatsby would serve markdown files for my blog post content. Everything else such as the homepage, archive, about and contact pages can be static. I am hoping this isn’t something I will live to regret, but I like the idea of my content being nicely preserved in source control where I have full ownership without relying on a third-party platform.

My site is currently built on the .NET framework using Kentico CMS. Exporting data is relatively straight-forward, but as I transition to a somewhat content-less managed approach, I need to ensure all fields used within my blog posts are transformed appropriately into the core building blocks of my markdown files.

A markdown file can carry additional field information about my post that can be declared at the start of the file, wrapped by triple dashes at the start and end of the block. This is called frontmatter.

Here is a snippet of one of my blog posts exported to a markdown file:

---
title: "Maldives and Vilamendhoo Island Resort"
summary: "At Vilamendhoo Island Resort you are surrounded by serene beauty wherever you look. Judging by the serendipitous chain of events where the stars aligned, going to the Maldives has been a long time in the coming - I just didn’t know it."
date: "2019-09-21T14:51:37Z"
draft: false
slug: "/Maldives-and-Vilamendhoo-Island-Resort"
disqusId: "b08afeae-a825-446f-b448-8a9cae16f37a"
teaserImage: "/media/Blog/Travel/VilamendhooSunset.jpg"
socialImage: "/media/Blog/Travel/VilamendhooShoreline.jpg"
categories: ["Surinder's Log"]
tags: ["holiday", "maldives"]
---

Writing about my holiday has started to become a bit of a tradition (for those that are worthy of such time and effort!) which seem to start when I went to [Bali last year](/Blog/2018/07/06/My-Time-At-Melia-Bali-Hotel). 
I find it's a way to pass the time in airports and flights when making the return journey home. So here's another one...

Everything looks well structured and from the way I have formatted the date, category and tags fields, it will lend itself to be quite accommodating for the needs of future posts. I made the decision to keep the slug value void of any directory structure to give me the flexibility on dynamically creating a URL structure.
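
To make the structure concrete, here is a minimal sketch that splits a markdown file into its frontmatter fields and body. It only handles simple key: value lines whose values happen to be valid JSON, and the splitFrontmatter name is my own. Gatsby uses a full frontmatter parser for this, so treat the sketch purely as an illustration.

```javascript
// Splits "---\nkey: value\n---\nbody" into { fields, body }.
// Assumes every value is valid JSON (quoted string, boolean or array), as in the export above.
function splitFrontmatter(fileContents) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(fileContents);
  if (!match) return { fields: {}, body: fileContents };

  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    // Split on the first colon only, so values such as dates may contain colons.
    const separator = line.indexOf(":");
    if (separator === -1) continue;
    const key = line.slice(0, separator).trim();
    fields[key] = JSON.parse(line.slice(separator + 1).trim());
  }
  return { fields, body: match[2].trim() };
}

const sample = [
  "---",
  'title: "Maldives and Vilamendhoo Island Resort"',
  'date: "2019-09-21T14:51:37Z"',
  "draft: false",
  'tags: ["holiday", "maldives"]',
  "---",
  "",
  "Writing about my holiday has started to become a bit of a tradition...",
].join("\n");

const { fields, body } = splitFrontmatter(sample);
console.log(fields.title, fields.draft, fields.tags);
```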

Kentico Blog Posts to Markdown Exporter

The quickest way to get the content out was to create a console app to carry out the following:

  1. Loop through all blog posts in post date descending.
  2. Update all image paths used as a teaser and within the content.
  3. Convert rich text into markdown.
  4. Construct frontmatter key-value fields.
  5. Output to a text file in the following naming convention: “yyyy-MM-dd---Post-Title.md”.

Tasks 2 and 3 will require the most effort…

When I first started using Kentico, all references to images were made directly via the file path. As I got more familiar with Kentico, this was changed to use permanent URLs. Using permanent URLs caused the link to an image to change from "/Surinder/media/Surinder/myimage.jpg" to "/getmedia/27b68146-9f25-49c4-aced-ba378f33b4df/myimage.jpg?width=500". I need to create additional checks to find these URLs and transform them into a new path.
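
As a sketch of the kind of check involved, the pattern below picks the GUID, file name and optional width out of a permanent URL. It is written in JavaScript for illustration only. parsePermanentUrl is a hypothetical helper, and resolving the GUID to a real media library path still needs a lookup against Kentico, which is not shown here.

```javascript
// Matches /getmedia/<guid>/<file name> with an optional ?width=<n> query string.
const permanentUrlPattern =
  /\/getmedia\/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\/([\w,\s-]+\.[a-z]{3,4})(?:\?width=(\d+))?/i;

// Hypothetical helper: returns the parts of a permanent URL, or null for plain file paths.
function parsePermanentUrl(url) {
  const match = permanentUrlPattern.exec(url);
  if (!match) return null;
  const [, guid, fileName, width] = match;
  return { guid, fileName, width: width ? Number(width) : null };
}

console.log(parsePermanentUrl("/getmedia/27b68146-9f25-49c4-aced-ba378f33b4df/myimage.jpg?width=500"));
console.log(parsePermanentUrl("/Surinder/media/Surinder/myimage.jpg")); // null
```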

Finding a good .NET markdown converter is imperative. Without this, there is a high chance the rich text content would not be translated to a satisfactory standard, resulting in some form of manual intervention to carry out corrections. Combing through 250 posts manually isn’t my idea of fun! :-)

I found the ReverseMarkdown .NET library offered enough options to deal with the rich text to markdown conversion. I could configure the conversion to pass through any HTML that couldn’t be transformed, thus preserving content.

Code

using CMS.DataEngine;
using CMS.DocumentEngine;
using CMS.Helpers;
using CMS.MediaLibrary;
using Export.BlogPosts.Models;
using ReverseMarkdown;
using System;
using System.Collections.Generic;
using System.Configuration;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace Export.BlogPosts
{
    class Program
    {
        public const string SiteName = "SurinderBhomra";
        public const string MarkdownFilesOutputPath = @"C:\Temp\BlogPosts\";
        public const string NewMediaBaseFolder = "/media";
        public const string CloudImageServiceUrl = "https://xxxx.cloudimg.io";

        static void Main(string[] args)
        {
            CMSApplication.Init();
            List<BlogPost> blogPosts = GetBlogPosts();
            if (blogPosts.Any())
            {
                foreach (BlogPost bp in blogPosts)
                {
                    bool isMDFileGenerated = CreateMDFile(bp);
                    Console.WriteLine($"{bp.PostDate:yyyy-MM-dd} - {bp.Title} - {(isMDFileGenerated ? "EXPORTED" : "FAILED")}");
                }
                Console.ReadLine();
            }
        }

        /// <summary>
        /// Retrieve all blog posts from Kentico.
        /// </summary>
        /// <returns></returns>
        private static List<BlogPost> GetBlogPosts()
        {
            List<BlogPost> posts = new List<BlogPost>();
            InfoDataSet<TreeNode> query = DocumentHelper.GetDocuments()
                .OnSite(SiteName)
                .Types("SurinderBhomra.BlogPost")
                .Path("/Blog", PathTypeEnum.Children)
                .Culture("en-GB")
                .CombineWithDefaultCulture()
                .NestingLevel(-1)
                .Published()
                .OrderBy("BlogPostDate DESC")
                .TypedResult;
            if (!DataHelper.DataSourceIsEmpty(query))
            {
                foreach (TreeNode blogPost in query)
                {
                    posts.Add(new BlogPost
                    {
                        Guid = blogPost.NodeGUID.ToString(),
                        Title = blogPost.GetStringValue("BlogPostTitle", string.Empty),
                        Summary = blogPost.GetStringValue("BlogPostSummary", string.Empty),
                        Body = RichTextToMarkdown(blogPost.GetStringValue("BlogPostBody", string.Empty)),
                        PostDate = blogPost.GetDateTimeValue("BlogPostDate", DateTime.MinValue),
                        Slug = blogPost.NodeAlias,
                        DisqusId = blogPost.NodeGUID.ToString(),
                        Categories = blogPost.Categories.DisplayNames.Select(c => c.Value.ToString()).ToList(),
                        Tags = blogPost.DocumentTags.Replace("\"", string.Empty).Split(',').Select(t => t.Trim(' ')).Where(t => !string.IsNullOrEmpty(t)).ToList(),
                        SocialImage = GetMediaFilePath(blogPost.GetStringValue("ShareImageUrl", string.Empty)),
                        TeaserImage = GetMediaFilePath(blogPost.GetStringValue("BlogPostTeaser", string.Empty))
                    });
                }
            }
            return posts;
        }

        /// <summary>
        /// Creates the markdown content based on Blog Post data.
        /// </summary>
        /// <param name="bp"></param>
        /// <returns></returns>
        private static string GenerateMDContent(BlogPost bp)
        {
            StringBuilder mdBuilder = new StringBuilder();

            #region Post Attributes
            mdBuilder.Append($"---{Environment.NewLine}");
            mdBuilder.Append($"title: \"{bp.Title.Replace("\"", "\\\"")}\"{Environment.NewLine}");
            mdBuilder.Append($"summary: \"{HTMLHelper.HTMLDecode(bp.Summary).Replace("\"", "\\\"")}\"{Environment.NewLine}");
            mdBuilder.Append($"date: \"{bp.PostDate.ToString("yyyy-MM-ddTHH:mm:ssZ")}\"{Environment.NewLine}");
            mdBuilder.Append($"draft: {bp.IsDraft.ToString().ToLower()}{Environment.NewLine}");
            mdBuilder.Append($"slug: \"/{bp.Slug}\"{Environment.NewLine}");
            mdBuilder.Append($"disqusId: \"{bp.DisqusId}\"{Environment.NewLine}");
            mdBuilder.Append($"teaserImage: \"{bp.TeaserImage}\"{Environment.NewLine}");
            mdBuilder.Append($"socialImage: \"{bp.SocialImage}\"{Environment.NewLine}");

            #region Categories
            if (bp.Categories?.Count > 0)
            {
                CommaDelimitedStringCollection categoriesCommaDelimited = new CommaDelimitedStringCollection();
                foreach (string categoryName in bp.Categories)
                    categoriesCommaDelimited.Add($"\"{categoryName}\"");
                mdBuilder.Append($"categories: [{categoriesCommaDelimited.ToString()}]{Environment.NewLine}");
            }
            #endregion

            #region Tags
            if (bp.Tags?.Count > 0)
            {
                CommaDelimitedStringCollection tagsCommaDelimited = new CommaDelimitedStringCollection();
                foreach (string tagName in bp.Tags)
                    tagsCommaDelimited.Add($"\"{tagName}\"");
                mdBuilder.Append($"tags: [{tagsCommaDelimited.ToString()}]{Environment.NewLine}");
            }
            #endregion

            mdBuilder.Append($"---{Environment.NewLine}{Environment.NewLine}");
            #endregion

            // Add blog post body content.
            mdBuilder.Append(bp.Body);
            return mdBuilder.ToString();
        }

        /// <summary>
        /// Creates files with a .md extension.
        /// </summary>
        /// <param name="bp"></param>
        /// <returns></returns>
        private static bool CreateMDFile(BlogPost bp)
        {
            string markdownContents = GenerateMDContent(bp);
            if (string.IsNullOrEmpty(markdownContents))
                return false;
            string fileName = $"{bp.PostDate:yyyy-MM-dd}---{bp.Slug}.md";
            File.WriteAllText($@"{MarkdownFilesOutputPath}{fileName}", markdownContents);
            if (File.Exists($@"{MarkdownFilesOutputPath}{fileName}"))
                return true;
            return false;
        }

        /// <summary>
        /// Gets the full relative path of a file based on its Permanent URL ID.
        /// </summary>
        /// <param name="filePath"></param>
        /// <returns></returns>
        private static string GetMediaFilePath(string filePath)
        {
            if (filePath.Contains("getmedia"))
            {
                // Get GUID from file path.
                Match regexFileMatch = Regex.Match(filePath, @"(\{){0,1}[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}(\}){0,1}");
                if (regexFileMatch.Success)
                {
                    MediaFileInfo mediaFile = MediaFileInfoProvider.GetMediaFileInfo(Guid.Parse(regexFileMatch.Value), SiteName);
                    if (mediaFile != null)
                        return $"{NewMediaBaseFolder}/{mediaFile.FilePath}";
                }
            }
            // Return the file path and remove the base file path.
            return filePath.Replace("/SurinderBhomra/media/Surinder", NewMediaBaseFolder);
        }

        /// <summary>
        /// Convert parsed rich text value to markdown.
        /// </summary>
        /// <param name="richText"></param>
        /// <returns></returns>
        public static string RichTextToMarkdown(string richText)
        {
            if (!string.IsNullOrEmpty(richText))
            {
                #region Loop through all images and correct the path
                // Clean up tildes.
                richText = richText.Replace("~/", "/");

                #region Transform Image Url's Using Width Parameter
                Regex regexFileUrlWidth = new Regex(@"\/getmedia\/(\{{0,1}[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}\}{0,1})\/([\w,\s-]+\.[A-Za-z]{3})(\?width=([0-9]*))", RegexOptions.Multiline | RegexOptions.IgnoreCase);
                foreach (Match fileUrl in regexFileUrlWidth.Matches(richText))
                {
                    string width = fileUrl.Groups[4] != null ? fileUrl.Groups[4].Value : string.Empty;
                    string newMediaUrl = $"{CloudImageServiceUrl}/width/{width}/n/https://www.surinderbhomra.com{GetMediaFilePath(ClearQueryStrings(fileUrl.Value))}";
                    if (newMediaUrl != string.Empty)
                        richText = richText.Replace(fileUrl.Value, newMediaUrl);
                }
                #endregion

                #region Transform Generic File Url's
                Regex regexGenericFileUrl = new Regex(@"\/getmedia\/(\{{0,1}[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}\}{0,1})\/([\w,\s-]+\.[A-Za-z]{3})", RegexOptions.Multiline | RegexOptions.IgnoreCase);
                foreach (Match fileUrl in regexGenericFileUrl.Matches(richText))
                {
                    // Construct media URL required by image hosting company - CloudImage.
                    string newMediaUrl = $"{CloudImageServiceUrl}/cdno/n/n/https://www.surinderbhomra.com{GetMediaFilePath(ClearQueryStrings(fileUrl.Value))}";
                    if (newMediaUrl != string.Empty)
                        richText = richText.Replace(fileUrl.Value, newMediaUrl);
                }
                #endregion
                #endregion

                Config config = new Config
                {
                    UnknownTags = Config.UnknownTagsOption.PassThrough, // Include the unknown tag completely in the result (default as well)
                    GithubFlavored = true, // generate GitHub flavoured markdown, supported for BR, PRE and table tags
                    RemoveComments = true, // will ignore all comments
                    SmartHrefHandling = true // remove markdown output for links where appropriate
                };
                Converter markdownConverter = new Converter(config);
                return markdownConverter.Convert(richText).Replace(@"[!\", @"[!").Replace(@"\]", @"]");
            }
            return string.Empty;
        }

        /// <summary>
        /// Returns media url without query string values.
        /// </summary>
        /// <param name="mediaUrl"></param>
        /// <returns></returns>
        private static string ClearQueryStrings(string mediaUrl)
        {
            if (mediaUrl == null)
                return string.Empty;
            if (mediaUrl.Contains("?"))
                mediaUrl = mediaUrl.Split('?').ToList()[0];
            return mediaUrl.Replace("~", string.Empty);
        }
    }
}
There is a lot going on here, so let's do a quick breakdown:

  1. GetBlogPosts(): Get all blog posts from Kentico and parse them to a “BlogPost” class object containing all the fields we want to export.
  2. GetMediaFilePath(): Take the image path and carry out all the transformation required to change to a new file path. This method is used in GetBlogPosts() and RichTextToMarkdown() methods.
  3. RichTextToMarkdown(): Takes rich text and goes through a transformation process to relink images in a format that will be accepted by my image hosting provider - Cloud Image. In addition, this is where ReverseMarkdown is used to finally convert to markdown.
  4. CreateMDFile(): Creates the .md file based on the blog posts found in Kentico.
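
For comparison, the frontmatter and file-name steps are small enough to sketch in a few lines of JavaScript. This is not a port of the console app: buildMarkdownFile and the shape of the post object are assumptions made for the example, only a handful of fields are included, and dates are formatted in UTC.

```javascript
// Escapes double quotes so a value can sit inside a quoted frontmatter string.
const quote = value => `"${String(value).replace(/"/g, '\\"')}"`;

// Hypothetical helper: returns the file name and contents for one exported post.
function buildMarkdownFile(post) {
  const lines = [
    "---",
    `title: ${quote(post.title)}`,
    // toISOString() includes milliseconds; drop them to get yyyy-MM-ddTHH:mm:ssZ.
    `date: ${quote(post.date.toISOString().replace(/\.\d{3}Z$/, "Z"))}`,
    `slug: ${quote("/" + post.slug)}`,
    `tags: [${post.tags.map(tag => quote(tag)).join(", ")}]`,
    "---",
    "",
    post.body,
  ];
  // Naming convention from the export: yyyy-MM-dd---Post-Title.md
  const fileName = `${post.date.toISOString().slice(0, 10)}---${post.slug}.md`;
  return { fileName, contents: lines.join("\n") };
}

const file = buildMarkdownFile({
  title: 'A "quoted" title',
  date: new Date("2019-09-21T14:51:37Z"),
  slug: "Maldives-and-Vilamendhoo-Island-Resort",
  tags: ["holiday", "maldives"],
  body: "So here's another one...",
});
console.log(file.fileName); // 2019-09-21---Maldives-and-Vilamendhoo-Island-Resort.md
```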