The first thing that came to mind when testing the waters for a move over to Gatsby was my blog post content. If I could get my content into a form a Gatsby site accepts, that's half the battle won right there, the theory being that it will simplify the build process.
I opted to go down the local storage route, where Gatsby would serve markdown files for my blog post content. Everything else, such as the homepage, archive, about and contact pages, can be static. I am hoping this isn’t something I will live to regret, but I like the idea of my content being nicely preserved in source control, where I have full ownership without relying on a third-party platform.
My site is currently built on the .NET framework using Kentico CMS. Exporting data is relatively straightforward, but as I transition to a somewhat content-less managed approach, I need to ensure all fields used within my blog posts are transformed appropriately into the core building blocks of my markdown files.
A markdown file can carry additional field information about my post that can be declared at the start of the file, wrapped by triple dashes at the start and end of the block. This is called frontmatter.
Here is a snippet of one of my blog posts exported to a markdown file:
---
title: "Maldives and Vilamendhoo Island Resort"
summary: "At Vilamendhoo Island Resort you are surrounded by serene beauty wherever you look. Judging by the serendipitous chain of events where the stars aligned, going to the Maldives has been a long time in the coming - I just didn’t know it."
date: "2019-09-21T14:51:37Z"
draft: false
slug: "/Maldives-and-Vilamendhoo-Island-Resort"
disqusId: "b08afeae-a825-446f-b448-8a9cae16f37a"
teaserImage: "/media/Blog/Travel/VilamendhooSunset.jpg"
socialImage: "/media/Blog/Travel/VilamendhooShoreline.jpg"
categories: ["Surinder's Log"]
tags: ["holiday", "maldives"]
---

Writing about my holiday has started to become a bit of a tradition (for those that are worthy of such time and effort!) which seem to start when I went to [Bali last year](/Blog/2018/07/06/My-Time-At-Melia-Bali-Hotel). I find it's a way to pass the time in airports and flights when making the return journey home. So here's another one...
Everything looks well structured, and the way I have formatted the date, category and tags fields should accommodate the needs of future posts. I made the decision to keep the slug value free of any directory structure, which gives me the flexibility to create a URL structure dynamically.
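To illustrate why a flat slug is flexible, here is a minimal sketch of how a dated URL could be composed from the `date` and `slug` frontmatter fields. It is written in Python purely for illustration (a Gatsby site would do this in JavaScript), the function name `build_post_url` is hypothetical, and the `/Blog/yyyy/MM/dd/` pattern is an assumption borrowed from the Bali link in the snippet above.

```python
from datetime import datetime


def build_post_url(date_iso: str, slug: str, prefix: str = "/Blog") -> str:
    """Combine the post date and the flat slug into a dated URL path."""
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it.
    posted = datetime.fromisoformat(date_iso.replace("Z", "+00:00"))
    # The slug carries no directory structure, so any prefix can be put in front of it.
    return f"{prefix}/{posted:%Y/%m/%d}/{slug.strip('/')}"


print(build_post_url("2019-09-21T14:51:37Z", "/Maldives-and-Vilamendhoo-Island-Resort"))
# prints "/Blog/2019/09/21/Maldives-and-Vilamendhoo-Island-Resort"
```

Because the directory part is generated rather than stored, switching to a different URL scheme later only means changing this one function, not every markdown file.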
Kentico Blog Posts to Markdown Exporter
The quickest way to get the content out was to create a console app to carry out the following:
1. Loop through all blog posts in descending post-date order.
2. Update all image paths used as a teaser and within the content.
3. Convert rich text into markdown.
4. Construct frontmatter key-value fields.
5. Output to a text file in the following naming convention: “yyyy-MM-dd---Post-Title.md”.
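The naming convention in the last step can be sketched as follows. This is a Python illustration rather than the exporter's own .NET code; the function name `build_file_name` is hypothetical, and the way the title is reduced to hyphen-separated words is an assumption.

```python
import re
from datetime import datetime


def build_file_name(date_iso: str, title: str) -> str:
    """Return a file name in the form yyyy-MM-dd---Post-Title.md."""
    posted = datetime.fromisoformat(date_iso.replace("Z", "+00:00"))
    # Assumption: every run of non-alphanumeric characters becomes a single hyphen.
    safe_title = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-")
    return f"{posted:%Y-%m-%d}---{safe_title}.md"


print(build_file_name("2019-09-21T14:51:37Z", "Maldives and Vilamendhoo Island Resort"))
# prints "2019-09-21---Maldives-and-Vilamendhoo-Island-Resort.md"
```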
Tasks 2 and 3 will require the most effort…
When I first started using Kentico, all references to images were made directly via the file path; as I became more familiar with Kentico, I changed this to use permanent URLs. Using permanent URLs caused the link to an image to change from "/Surinder/media/Surinder/myimage.jpg" to "/getmedia/27b68146-9f25-49c4-aced-ba378f33b4df/myimage.jpg?width=500". I need to create additional checks to find these URLs and transform them into a new path.
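A minimal sketch of those checks, in Python for illustration only, might look like this. Everything here is an assumption: the function name `to_new_media_path` is hypothetical, the target form `/media/...` is inferred from the image paths in the frontmatter snippet, and the `MEDIA_BY_GUID` dictionary stands in for whatever lookup the CMS offers to resolve a media GUID to its library path.

```python
import re
from urllib.parse import urlsplit

# Hypothetical lookup from media GUID to library path; in practice this
# information would have to come from the CMS itself.
MEDIA_BY_GUID = {
    "27b68146-9f25-49c4-aced-ba378f33b4df": "/media/Surinder/myimage.jpg",
}

# Matches permanent URLs such as /getmedia/<guid>/myimage.jpg
GETMEDIA = re.compile(r"^/getmedia/(?P<guid>[0-9a-fA-F-]{36})/")


def to_new_media_path(url: str) -> str:
    """Map either legacy image URL form onto a /media/... path."""
    path = urlsplit(url).path  # drops query strings such as ?width=500
    match = GETMEDIA.match(path)
    if match:
        # Permanent URL: resolve the GUID, leaving the URL untouched if unknown.
        return MEDIA_BY_GUID.get(match.group("guid").lower(), url)
    # Direct file path: keep everything from "/media/" onwards.
    index = path.find("/media/")
    return path[index:] if index != -1 else url


print(to_new_media_path("/Surinder/media/Surinder/myimage.jpg"))
# prints "/media/Surinder/myimage.jpg"
print(to_new_media_path("/getmedia/27b68146-9f25-49c4-aced-ba378f33b4df/myimage.jpg?width=500"))
# prints "/media/Surinder/myimage.jpg"
```

Returning the original URL when nothing matches keeps unknown links visible in the output, so they can be found and corrected by hand rather than silently broken.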
Finding a good .NET markdown converter is imperative. Without this, there is a high chance the rich text content would not be translated to a satisfactory standard, resulting in some form of manual intervention to carry out corrections. Combing through 250 posts manually isn’t my idea of fun! :-)
I found that the ReverseMarkdown .NET library offered enough options to deal with rich text to markdown conversion. I could configure the conversion process to leave any HTML that couldn’t be transformed as it was, thus preserving content.
There is a lot going on here, so let's do a quick breakdown:
GetBlogPosts(): Gets all blog posts from Kentico and parses them into a “BlogPost” class object containing all the fields we want to export.
GetMediaFilePath(): Takes the image path and carries out all the transformations required to change it to a new file path. This method is used in both the GetBlogPosts() and RichTextToMarkdown() methods.
RichTextToMarkdown(): Takes rich text and goes through a transformation process to relink images in a format that will be accepted by my image hosting provider - Cloud Image. In addition, this is where ReverseMarkdown is used to finally convert to markdown.
CreateMDFile(): Creates the .md file based on the blog posts found in Kentico.
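For the final step, the shape of the generated file can be sketched as below. This is a Python illustration and not the exporter's CreateMDFile() code; the function name `render_markdown` is hypothetical, and using JSON encoding to quote the values is an assumption that happens to produce the same double-quoted strings, `false` and bracketed lists seen in the frontmatter snippet earlier.

```python
import json


def render_markdown(fields: dict, body: str) -> str:
    """Render frontmatter key-value fields followed by the markdown body."""
    lines = ["---"]
    for key, value in fields.items():
        # JSON encoding yields "quoted strings", true/false and ["a", "b"] lists.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.strip() + "\n"


post = render_markdown(
    {
        "title": "Maldives and Vilamendhoo Island Resort",
        "draft": False,
        "tags": ["holiday", "maldives"],
    },
    "So here's another one...",
)
print(post)
```

The resulting string can then be written to disk under the “yyyy-MM-dd---Post-Title.md” name, for example with `pathlib.Path(...).write_text(post, encoding="utf-8")`.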