Here at FrogSlayer, we’ve started the process of moving our QA and Demo environments from local hosting (on a single 4-year-old, dust-covered VM server in a closet, just waiting for the right day to crash) to cloud-based hosting (where a hopefully dust-free data center server of unknown age, loaded to capacity, is waiting to crash in a hopefully more fault-tolerant way). Part of this process is providing our thoughts in a bit of a retrospective on the various cloud providers.
Our use case is a bit different from what marketing advertises. We aren’t yet looking to move our 1M-visitor-per-day sites over; rather, we’re looking to move our relatively low-traffic, semi-public-facing infrastructure into a more fault-tolerant environment. We’re also looking to move some of our existing (low-traffic) hosted sites with minimal effort.
This is part of an unplanned series, starting with our thoughts on Azure. This series will also hit AWS and possibly a local deployment of OpenStack.
We start this series with Azure as I have the most experience with it and the most bile for it. As an organization, we started with Azure because historically we’ve been a Microsoft shop, and we had some experience with it from the free credits offered by MSDN.
See Entire Cloud at Once
I didn’t realize this was a pro until working with AWS, but you can see your entire infrastructure at a glance. No more asking yourself, “Did I put that server in Sydney or Singapore?” It is all listed in one place.
Multi-Account Support Deeply Integrated
With Azure it’s very easy to create new subscriptions that bill to different accounts, and otherwise clearly differentiate and filter what infrastructure belongs to one account vs another (in the old portal at least). The differentiation is still present in the new one, but not the filtering.
Azure Websites could have been Good
Azure Websites is, in theory, a set of sites placed on top of a scalable infrastructure. You can put 30 sites on a single server instance, with each site separated by whatever dark magic Microsoft has woven server side such that each site thinks it’s in D:\home\site\wwwroot\. If you have 30 sites that both need to be up and are barely ever hit, that’s a nice feature from a cost savings point of view. It’s hampered by the fact that local storage isn’t persisted, so almost any existing site that allows file upload needs to be modified. It’s alright for new stuff, but isn’t a friction-free alternative for migration.
I’m curious how this works in FTP deployment environments though. With FTP, you’re (presumably) connecting to one of the running instances or to some weird management server that happens to have aggregations of all server logs. If connecting to a running instance, that should imply that file uploads would be persisted as there’s not really any difference between a random FTP upload and a file created by an IIS process (barring weird things like custom FTP server, hooks, etc). However, due to a con listed somewhere below, this has to remain idle speculation.
Azure SQL is Nice
Azure SQL is a high-availability, mostly MS SQL-compatible database-as-a-service offering. If I’m reading the marketing right, under the right circumstances it supports recovering to 5 seconds ago. By default it spreads/replicates data over multiple servers, and supports geo-replication. It starts at about $5/mo, with no licensing worries.
SLA for VMs
… in a correctly configured availability set. This is better than the AWS offering of “we super promise that our services will be up 99.99% of the time in at least one of the AZ’s”. The entire concept of the SLA is a bit of false comfort, of course. It isn’t a guarantee and only provides a path for limited compensation in the event of violation. It provides incentive, but ultimately it’s still on you to architect for failure.
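To put a number on how much comfort a percentage buys you, here’s a quick back-of-the-envelope sketch. The function name and the 30-day month are my own assumptions, and 99.9% is only there as an illustrative comparison against the 99.99% figure above; neither is a quote from anyone’s actual SLA terms.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime an uptime percentage still permits over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # 99.99 is the figure mentioned above; 99.9 is an illustrative comparison.
    for pct in (99.99, 99.9):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% uptime allows about {minutes:.2f} minutes down per 30 days")
```

Even at four nines that’s a few minutes a month the provider can be down without owing you anything, which is the point: the SLA is an incentive, not an architecture.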
Easy Port Redirection
This falls into security through obscurity, but it’s nice to be able to very easily say, “Okay, redirect port 38948 to the RDP port.” There are also some methods for limiting IP ranges, but those don’t fall under the moniker “easy”.
Dallas Data Center
Did you know that Texas is the best state? And yet AWS doesn’t have a data center here. 🙁 What are we supposed to do once we finally get around to seceding? Have our servers hosted internationally? Not only that, but on AWS our options are limited to either hurricane alley or earthquake boulevard. Or Oregon. Not exactly spoiled for choice.
Bloody Slow UI
Have you ever used a multibillion dollar technology corporation’s website and noticed every action has a 2 to 4 second load time? Have you ever wondered how the engineers weren’t taken out for trampling under the company brontosaurus? This is Microsoft Azure. The servers themselves (once they’ve had time to set up/cache) aren’t that bad, but anything on the management UI, anything that calls for not-quite-static content, will drive you to frustration. Frustration and bourbon.
My best story actually comes from this. We had a support issue (wanted to diagnose why a VM would periodically be listed as running but unresponsive), but at the time didn’t have a support contract. In attempting to file a ticket, their server side check for ‘does this account have a support contract’ timed out after 30 seconds. And because it timed out, I assume out of fear of the wrath of people that actually do pay 10k/month for support, their support software allowed us to file a ticket under a gratis 1k/month plan. Their site is so slow we got free support.
Unrelated: we actually got a response on that ticket that a piece of networking hardware had failed on the VM host. Okay, that happens. It’s not statistically comforting that this happened to one of our first VMs, but it happens. Surely they’ve either fixed the issue or, say, migrated the VMs on the faulty hardware to somewhere else, right?… nope. We continued having issues until we went through the magical process of ‘change the VM size to force a reprovision on different (and hopefully better) hardware’. That does do well for confidence building.
Poor Management UI
At time of writing, there are roughly four ways of interacting with Azure Management:
- https://manage.windowsazure.com/, the classic UI.
- https://portal.azure.com/, the “preview” portal.
- PowerShell cmdlets
- A REST API
The PowerShell cmdlets and REST API are close enough to one and the same for the purposes of ranting. These are the only ways to do some tasks and can generally do anything either portal can do. From here on, we ignore them because I don’t actually spend enough time on Azure to justify learning the esoteric syntax that the PowerShell cmdlets call for, or take the time to write my own “better” interface.
The Preview Portal is a thing out of my Windows 8 nightmares. Ignoring the eye gouging design and continued insistence of Microsoft that “No really, tiles are the future!”, it’s painfully buggy. A fair bit of it is just small things: making a choice leads to a faulty ‘if you close this your edits will be lost’ dialog; tiles on the main page stick around after you delete the thing they point to (despite the UI auto creating them); a request times out and suddenly the entire UI is unresponsive. Standard buggy preview software things. But at the same time, it’s also the only way to do certain operations. Recreating a VM that had a static public IP can ONLY be done in preview. Putting a VM on a virtual network can only be done in the preview portal. Support tickets can only be filed in the new portal. They’ve rushed out a beta into production.
The older portal is/(was) at least close to usable. It’s a bit old fashioned site design-wise, but I’m fine with that. By virtue of being older, it doesn’t have support for all the newer features that you probably want. But at the same time, it’s the only way to do certain other tasks (Active Directory management is the officially listed item that’s only in the old portal, but I recall hitting other things that were only possible in the old portal).
Part of the drive for creating the preview portal is that, supposedly, the old portal has a lot of (UI and code) inconsistency. It’s not unreasonable, as it was originally created back when Azure was a grand total of three service offerings by a guy driving ’round the country-side in a van. It was then extended by various individual groups. It wasn’t built for the intertwining of services that are now offered. It’s just a shame that the replacement is so awful.
My favorite comment from trawling the internet: “Yet another task that can only be completed by deleting and recreating the VM.” Once most resources are created, very few things about them can be changed. Want to move a VM to a new/different virtual network? Delete the VM and re-create it by recycling the disks. Want to rename anything? Delete it and recreate it. Want to add an IP that is preserved through deleting and recreating VMs? Delete and recreate. Want to move VMs from one subscription to another? Good luck with that because you need to move the disks to a storage account on that subscription, and you can’t do that in either of the web UIs. The preview portal added the concept of Resource Groups. Want to change what Resource Group a resource is in? Delete and recreate. Want to change the name of a Resource Group? You can’t delete it until you delete everything under it.
Deleting and recreating isn’t facilitated in any way in the UI. You’re given a delete button and a create button with the expectation that you can fill out the multi-page form without mistakes this time. This is a good part of why I suspect “build your own interface” is the intended UI. And that Microsoft hates the user.
To be fair, AWS is only slightly better. Moving things between VPCs is basically the same delete/recreate process. But they at least had the foresight to allow the user to rename things. They realized that IT folk like to keep things organized so their OCD doesn’t flare up.
The only Azure documentation I’ve been able to reliably find is the marketing blurb and the ‘Baby’s First Deployment’ guides. I wanted to know more about how Azure Websites handled migration from one server to another during failover. After an hour of searching I found a forum post that had a link to a 3 year old shaky cam recording of a talk given by a Microsoft Engineer that worked on that part of the system. And that was the only “documentation” I could find.
One problem we had was that a Python hosted site would become unresponsive whenever a single user attempted a large file upload. It doesn’t happen on our local (*nix) server. Best guess for the cause is that IIS is set up to interact with WSGI in a single process / single threaded manner. Okay… how do we find the Web.config that actually defines how IIS talks to Python? Go to etc.scm.azurewebsites.net of course! Somewhere in there you’ll find the IIS applicationhost.config file. You can’t change it of course. In order to do that, you need to play around with Kudu Site Extensions to be able to apply an XML transform on it. How did I find any of that? Sure as hell wasn’t the Microsoft docs.
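To show the failure mode I’m guessing at (and only that; this says nothing about how IIS is actually wired up on Azure), here’s a minimal sketch using nothing but the Python standard library. A toy WSGI app takes half a second per request, standing in for a big upload, and we time two simultaneous requests against a single-threaded server and a threaded one. `slow_app`, `ThreadedWSGIServer`, `QuietHandler` and `time_two_requests` are my own names for illustration.

```python
import http.client
import threading
import time
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


def slow_app(environ, start_response):
    """Toy WSGI app: each request takes ~0.5s, standing in for a large upload."""
    time.sleep(0.5)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"done"]


class ThreadedWSGIServer(ThreadingMixIn, WSGIServer):
    """Same server, but each request is handled on its own thread."""
    daemon_threads = True


class QuietHandler(WSGIRequestHandler):
    def log_message(self, *args):
        pass  # keep the per-request access log out of the output


def time_two_requests(server_class):
    """Start a server on a free local port, fire two requests at once, time them."""
    server = make_server("127.0.0.1", 0, slow_app,
                         server_class=server_class, handler_class=QuietHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    def fetch():
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
        conn.request("GET", "/")
        conn.getresponse().read()
        conn.close()

    clients = [threading.Thread(target=fetch) for _ in range(2)]
    start = time.monotonic()
    for client in clients:
        client.start()
    for client in clients:
        client.join()
    elapsed = time.monotonic() - start

    server.shutdown()
    server.server_close()
    return elapsed


if __name__ == "__main__":
    print(f"single-threaded: {time_two_requests(WSGIServer):.2f}s")
    print(f"threaded:        {time_two_requests(ThreadedWSGIServer):.2f}s")
```

On the single-threaded server the second request has to wait for the first to finish, so the pair takes roughly twice as long. Swap the half-second sleep for a multi-minute upload and you get a site that looks dead to everyone else.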
Fellow travelers may find these of use: Azure Website Cheat Sheet, and Website Tools You Should Know About. I’ve given up on Azure Websites for anything other than the most basic of (.NET) sites, but someone else may find them useful.
The other problem is the documentation gets out of date quickly. Azure Web Sites are now called Azure Web Apps under the App Service moniker. As far as I can tell this is just marketing. They used to be called Azure Web Roles, and behaved very differently. The three are just compatible enough that a good half of the (user generated) knowledge is still applicable though. But good luck finding it.
Is Azure worth it?
I’m not sure if you can tell, but I’m a bit of a curmudgeon. I’m also nonplussed by Azure. It’s definitely usable, and we do have some infrastructure hosted on it. Even a few production sites. Once we got past the teething pains, it’s been reasonably acceptable. The ideas are solid and I’m willing to say the people responsible for the backend know what they’re doing. It’s just that interacting with it past the most basic use cases is frustrating.
BlogSlayer is the official blog of FrogSlayer, a custom software product development shop in Bryan/College Station, Texas. Our specialty is getting your product to market in 90 days or less. If you would like a free consultation for your project or big idea, email us at email@example.com. You can also connect with us on Twitter, Facebook, or LinkedIn.