Backing up Site Content from an Azure WebJob
Being from an Ops background, I am generally predisposed to worry about keeping regular backups of stuff. So, before I even started writing this blog, I wanted to make sure I wouldn't lose any of the blog posts through a failure of the site - or more likely an error on my part.
The site itself, I wasn't too bothered about; I have it in my Git repo - I can just re-deploy it. However, the SQLite database that sits behind the site is explicitly excluded from source control so that it doesn't get overwritten with every deploy. I wanted some way of taking a regular backup of the `ghost.db` file and putting it somewhere away from the site itself. And since I'm using Azure, an Azure storage blob seemed to be a logical choice for that.
Copying stuff from a Windows Azure Web Site to an Azure storage blob is easy enough to do from the outside - you can FTP directly to your website, and PowerShell has some cmdlets that you can use to copy files into storage blobs. That would work, sure, but where do I run it? What if the machine is turned off? I'd also be paying extra for unnecessary data transfer - pulling files out of Azure only to upload them again.
The answer to this comes in the form of Azure WebJobs. WebJobs run on the same instance as your Azure web site, with access to its filesystem, and you can submit jobs as CMD scripts, PowerShell, .NET assemblies and a bunch of other formats. The standard output stream (`ECHO`, `Write-Host`, `Console.WriteLine` and so on) is recorded as a log that can be accessed via the Kudu interface of your Web Site.
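To get a feel for how little plumbing is involved, a .NET WebJob can be as small as a plain console application; here's a minimal sketch (nothing in it is WebJobs-specific, it's simply a console program whose output ends up in the job's log):

```csharp
using System;

class Program
{
    static void Main()
    {
        // Anything written to standard output is captured in the
        // WebJob's log and viewable through the Kudu interface.
        Console.WriteLine("Hello from a WebJob.");
    }
}
```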
Great, I thought - I'll just upload a basic PowerShell script to copy the `ghost.db` file to an Azure storage blob, same as I would do from a remote machine. Or so I hoped; there are a few snags when trying to do it this way:
- Ghost needs to be shut down while the backup is running. You can copy the file while Ghost is live, but you run the risk of the backup being in an inconsistent state if SQLite has the file open when the copy happens.
- To achieve the above, you need to kill the `node.exe` process, but not the underlying `w3wp.exe` - WebJobs run in the context of the IIS App Pool and simply won't run if the worker process has been killed. And even once you kill `node.exe`, it gets re-spawned as soon as the site receives another connection, so you also need to make sure that doesn't happen while the backup runs.
- The Windows Azure PowerShell cmdlets don't appear to be present on an Azure Web Site instance, and it doesn't seem to be possible to load them, which makes writing to Azure storage from a script more tricky.
PowerShell didn't seem to be the way to go in this instance, so I created the WebJob in C# instead. There's a cool WebJobs SDK for .NET which lets you do neat things like bind methods to Azure storage queues so that they trigger when a message is posted, but in this case I just wanted a basic job that copies a file from my site to an Azure storage blob, and keeps `node.exe` from running while it's at it.
In the end I settled on the following method that, while perhaps not as elegant as I would like, does the job effectively.
First, a method to kill the `node.exe` process:
```csharp
// Requires: using System; using System.Diagnostics;
// Kills every running node.exe so that SQLite's lock on ghost.db is
// released, waiting for each process to exit before returning.
private static void killNode()
{
    foreach (var proc in Process.GetProcessesByName("node"))
    {
        Console.WriteLine("Killing PID: " + proc.Id);
        proc.Kill();
        proc.WaitForExit();
    }
}
```
Next, a class with a method to copy the `ghost.db` file to an Azure storage blob. The URI and credentials for the storage account are held in the `web.config` of the Azure Web Site and can be edited from the management portal.
```csharp
// These usings assume the original (v1) storage client library,
// Microsoft.WindowsAzure.StorageClient, which is where CreateIfNotExist()
// and UploadFile() live; later SDK versions rename them to
// CreateIfNotExists() and UploadFromFile().
using System;
using System.Configuration;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class BackupGhostDB
{
    public void backupGhostDB()
    {
        // Storage connection string and container name are read from the
        // site's configuration.
        var connectionString = ConfigurationManager.ConnectionStrings["BackupAccount"].ConnectionString;
        var containerName = ConfigurationManager.AppSettings["BackupContainer"];

        var csa = CloudStorageAccount.Parse(connectionString);
        var client = csa.CreateCloudBlobClient();
        var container = client.GetContainerReference(containerName);

        // Timestamp the blob name so each backup is kept separately.
        var blobName = DateTime.UtcNow.ToString("yyyyMMddHHmmss") + "_ghost.db";
        container.CreateIfNotExist();

        Console.WriteLine("Backing up database to " + container.Uri + "/" + blobName);
        var blob = container.GetBlockBlobReference(blobName);
        Console.WriteLine("Copy started.");
        blob.UploadFile(@"D:\home\site\wwwroot\content\data\ghost.db");
        Console.WriteLine("Copy completed.");
    }
}
```
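For reference, the two settings the code reads might look something like this in `web.config` (the account name, key and container name below are placeholder values, not the real ones):

```xml
<connectionStrings>
  <!-- Placeholder credentials; use your storage account's real name and key. -->
  <add name="BackupAccount"
       connectionString="DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=..." />
</connectionStrings>
<appSettings>
  <add key="BackupContainer" value="ghost-backups" />
</appSettings>
```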
Now, to hook it all together. The key is that the `node.exe` process must be prevented from starting while the `backupGhostDB()` method runs. The solution I chose was to start the backup asynchronously and call `killNode()` in a loop until the backup has completed.
In the `BackupGhostDB` class, a delegate for the `backupGhostDB()` method:
```csharp
// Delegate type used to invoke backupGhostDB() asynchronously below.
public delegate void backupCaller();
```
and finally, the `Main()` method:
```csharp
static void Main(string[] args)
{
    var ghostBackup = new BackupGhostDB();
    var caller = new backupCaller(ghostBackup.backupGhostDB);

    // Kill node.exe once up front so the database file is closed...
    killNode();

    Console.WriteLine("About to invoke backup delegate");
    IAsyncResult result = caller.BeginInvoke(null, null);

    // ...then keep killing it while the copy runs, in case an incoming
    // request to the site re-spawns it.
    while (!result.IsCompleted)
    {
        killNode();
    }

    caller.EndInvoke(result);
    Console.WriteLine("Backup completed");
}
```
Simple as that! Once compiled, the executable and its dependencies can be zipped up, uploaded to the WebJobs section of the site in the Azure management console, and added to a schedule to run once per night, or however often you wish.
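A couple of things worth noting about the `Main()` method above: `caller.EndInvoke(result)` re-throws any exception raised inside `backupGhostDB()`, so a failed upload shows up in the WebJob's log rather than failing silently. The `while` loop is also a busy-wait; a short `Thread.Sleep` inside it would ease the CPU load, at the cost of a slightly wider window for `node.exe` to re-spawn.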
This method of checking for and killing `node.exe` in a loop is probably not going to be a good solution for sites that get lots of traffic, but for your average blog like this one, it should be just fine.
Cheers!