Posts written in April 2017

Azure WebJob To Delete Old Files from A Blob Container

Posted in: Azure

I like to keep my blob containers tidy and delete any files that would unnecessarily increase their size. For a project I was working on, I had a blob container that was being used to temporarily store images a user uploaded for manipulation at a later time. I saw no reason to keep these files for longer than 24 hours. An Azure WebJob seemed an ideal way to do this.

I could have left the blob container to stagnate and fester over time; cost wasn't the reason for creating a cleanup task. A blob container is very reasonably priced for the amount of storage and the number of requests I would be making. I was more concerned about performance when trawling through many thousands of files to get back an image a user had uploaded for temporary use by my web application.

Azure WebJobs are easy to create and versatile. You can develop a WebJob as any of the following scripts or programs:

  • .cmd, .bat, .exe (using Windows cmd)
  • .ps1 (using PowerShell)
  • .sh (using Bash)
  • .php (using PHP)
  • .py (using Python)
  • .js (using Node.js)
  • .jar (using Java)

In this post, I will be developing my WebJob using a Console Application that will generate an executable. In Visual Studio 2017, there are two ways you can go about creating a project for your WebJob:

  1. Console Application project
  2. Azure WebJob project, which you will find under the "Cloud" category.

If you create your WebJob using a Console Application, you will still have the option later on to "Publish as an Azure WebJob..." when right-clicking on the project. In the code below I used a Console Application only because I didn't know an Azure WebJob project existed until after I had completed development on my project. Doh!

Program.cs

I have created a new project called "Site.AzureWebJob.Cleanup". The project uses the following two Azure NuGet packages:


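using System;
using System.Collections.Generic;
using System.Configuration;
using System.Linq;
// Assumes the WindowsAzure.Storage client library, which provides these namespaces.
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

namespace Site.AzureWebJob.Cleanup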
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                CloudStorageAccount storageAccount = CloudStorageAccount.Parse("<Insert Storage Connection String Here>");

                CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
                CloudBlobContainer dataContainer = blobClient.GetContainerReference("<Blob container name>");

                // Read the age threshold (in hours) once from App.config.
                int cleanupHours = int.Parse(ConfigurationManager.AppSettings["Azure.CleanupHours"]);
                DateTimeOffset cutoff = DateTimeOffset.UtcNow.AddHours(-cleanupHours);

                Console.WriteLine("Hourly threshold to remove records: {0}", cleanupHours);

                #region Retrieve all data items older than the threshold and delete them

                Console.WriteLine("Retrieving old data files...");

                // Get files whose "Last Modified" timestamp is older than the cutoff.
                // Compare the full timestamp: truncating it to .Date would also match
                // files modified late on the previous day, well inside the threshold.
                IEnumerable<CloudBlob> oldData = dataContainer.ListBlobs()
                                .OfType<CloudBlob>()
                                .Where(b => b.Properties.LastModified.HasValue
                                            && b.Properties.LastModified.Value < cutoff);

                IList<CloudBlob> dataBlobs = oldData as IList<CloudBlob> ?? oldData.ToList();

                Console.WriteLine("Data records retrieved: {0}.", dataBlobs.Count);
                Console.WriteLine("Removing old data files...");

                // Loop through the files and delete if they exist.
                foreach (CloudBlob dataBlob in dataBlobs)
                {
                    bool isDeleted = dataBlob.DeleteIfExists();

                    if (isDeleted)
                        Console.WriteLine("Deleted: {0}.", dataBlob.Name);
                }

                #endregion

                Console.WriteLine("Removing old data complete.");
            }
            catch (Exception ex)
            {
                Console.WriteLine("Error cleaning container files: {0}", ex.Message);
            }

            Console.WriteLine("Clean Containers WebJob complete.");
        }
    }
}

There isn't really much to it. All I am doing is retrieving all files older than the number of hours set in the App.config app setting "Azure.CleanupHours" (24 in my case) and then deleting each one in a loop.
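One detail worth care in an age filter like this is comparing full timestamps rather than calendar dates, since truncating a timestamp to its date would make a file modified late yesterday evening look more than a day old. The following is a minimal, standalone sketch of that comparison. The names CleanupFilter, IsExpired and Demo are hypothetical and not part of the project; the sketch uses fixed timestamps so that it runs without a storage account.

```csharp
using System;

static class CleanupFilter
{
    // Hypothetical helper: true when lastModified is earlier than
    // (now - maxAgeHours). The full timestamp is compared, not just the date.
    public static bool IsExpired(DateTimeOffset? lastModified, DateTimeOffset now, int maxAgeHours)
    {
        // A blob without a timestamp has an unknown age, so keep it.
        if (!lastModified.HasValue)
            return false;

        return lastModified.Value < now.AddHours(-maxAgeHours);
    }
}

static class Demo
{
    static void Main()
    {
        // Fixed "current time" so the result does not depend on the clock.
        DateTimeOffset now = new DateTimeOffset(2017, 4, 10, 10, 0, 0, TimeSpan.Zero);

        // Modified 11 hours ago (23:00 the previous day): inside the threshold.
        Console.WriteLine(CleanupFilter.IsExpired(now.AddHours(-11), now, 24)); // False

        // Modified 25 hours ago: past the threshold.
        Console.WriteLine(CleanupFilter.IsExpired(now.AddHours(-25), now, 24)); // True
    }
}
```

Passing the current time in as a parameter is a design choice for the sketch only; it keeps the check easy to verify with known values.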

The safest way to delete a file is to use the CloudBlob.DeleteIfExists() call. As the method name suggests, it will only delete a file if it exists. Calling CloudBlob.Delete() instead throws an exception if for some reason the file isn't there, which requires additional error handling.
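To give an idea of the extra handling Delete() would need, here is a rough sketch. The helper name DeleteTolerant is hypothetical, and the sketch assumes the WindowsAzure.Storage client library, where a failed request surfaces as a StorageException whose RequestInformation carries the HTTP status code. It needs a real blob reference to do anything useful, so treat it as an illustration rather than a drop-in replacement.

```csharp
using System.Net;
// Assumption: the WindowsAzure.Storage NuGet package is referenced.
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

static class BlobDeleteSketch
{
    // Hypothetical helper: deletes the blob and returns true, or returns
    // false when the service reports that the blob was not found (HTTP 404).
    // Any other storage error is left to propagate to the caller.
    public static bool DeleteTolerant(CloudBlob blob)
    {
        try
        {
            blob.Delete();
            return true;
        }
        catch (StorageException ex)
            when (ex.RequestInformation != null
                  && ex.RequestInformation.HttpStatusCode == (int)HttpStatusCode.NotFound)
        {
            return false;
        }
    }
}
```

This is roughly the behaviour DeleteIfExists() gives you in a single call, which is why I prefer it.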

Final Steps

Now that we have our Azure WebJob ready to go, the only thing left is to publish it to your Azure Web App by right-clicking on your project and selecting "Publish as an Azure WebJob...". Here you will connect to your Azure instance and choose how your WebJob should run:

  • Continuously
  • On Demand
  • On Schedule