Cisco 9800 WLC in Azure Deployment Issues

Intro

I thought this would be a quick, fun one for my first practice blog. One of my customers is looking to deploy a 9800 WLC into Azure for their wireless deployment. Easy enough, right? Cisco has had this available in the Azure Marketplace for a while, either as a VM or as an application. Hmmmmmm……

What is the issue?

When trying to deploy the VM version of the WLC from the Azure Marketplace, they found that the VM doesn’t deploy successfully. I wanted to replicate what they were seeing, so I followed Cisco’s guide, the Cisco Catalyst 9800 Wireless Controller for Cloud on Microsoft Azure Deployment Guide. Following the guide, I built a WLC VM running version 17.12.3. One key point: I selected SSH public key authentication for SSH access, which is what my customer had selected. Remember this for later. I hit deploy and watched all the resources deploy successfully, until it got to the VM itself.

The VM is created, it boots, and I can access the GUI. So what is the complaint? Well, in Azure the deployment just sits in the deploying state until it eventually reports that the deployment failed. But why, when the VM is powered on and working?

Looking closer at the raw error message and researching online, it seems a timeout is occurring. Azure does not see OS provisioning complete in the allotted time, even though the VM is powered on and running. Odd.

{
    "status": "Failed",
    "error": {
        "code": "OSProvisioningTimedOut",
        "message": "OS Provisioning for VM '9800wlc2' did not finish in the allotted time. The VM may still finish provisioning successfully. Please check provisioning state later. For details on how to check current provisioning state of Windows VMs, refer to https://aka.ms/WindowsVMLifecycle and Linux VMs, refer to https://aka.ms/LinuxVMLifecycle."
    }
}

At first I put it down to Azure being Azure, and carried on. I thought I would SSH in using my key and start some basic configuration, just to check there was nothing else going on. The issue was that I couldn’t SSH on to the box no matter what I tried: PuTTY and OpenSSH, with various arguments to force authentication types. My first theory was a key-size mismatch, because the WLC creates a 2048-bit RSA key by default while the Azure-generated one defaults to 3072 bits. So I uploaded my own 2048-bit public key and redeployed the VM, only to hit the same issue: provisioning timed out and I couldn’t SSH to the box.

I did some further testing. I deployed a WLC using just username/password SSH authentication, and it deployed successfully; furthermore, I could log in to it fine. The issue seems to be tied to the authentication type being set to public key.

Spending some time looking at the serial console output, I noticed the logs show it finding the public key and setting the VM to use key-based authentication.

binos[5673]: 2025/08/26 21:22:30 Not able to retrieve password. This should be Key based Authentication deployment.
binos[5673]: 2025/08/26 21:22:30 Wrote /var/lib/waagent/CustomData
2025/08/26 21:22:30 Disabled SSH password-based authentication methods. 
2025/08/26 21:22:30 entry: in CreateAccount 
binos[5673]: 2025/08/26 21:22:30 ovf env parse: public key found
2025/08/26 21:22:30 Created user account: azureuser 
binos[5673]: 2025/08/26 21:22:30 ovf env parse: path: /home/azureuser/.ssh/authorized_keys
binos[5673]: 2025/08/26 21:22:30 entry: Ovf env process method

What happens next seems to be the root of why I am unable to use the private key to access the VM over SSH.

2025/08/26 21:22:30 Deploy public key:None 
2025/08/26 21:22:30 ERROR:Traceback (most recent call last): 
binos[5673]: 2025/08/26 21:22:30 Setting the host name to 9800wlc2
2025/08/26 21:22:30 ERROR:  File "/usr/bin/csr_azure_init_agent", line 2447, in main 
binos[5673]: 2025/08/26 21:22:30 Disabled SSH password-based authentication methods.
binos[5673]: 2025/08/26 21:22:30 entry: in CreateAccount
2025/08/26 21:22:30 ERROR:    rc = WaAgent.Run1() 
2025/08/26 21:22:30 ERROR:  File "/usr/bin/csr_azure_init_agent", line 1718, in Run1 
2025/08/26 21:22:30 ERROR:    provisionError = self.Provision() 
binos[5673]: 2025/08/26 21:22:30 Created user account: azureuser
binos[5673]: 2025/08/26 21:22:30 Deploy public key:None
binos[5673]: 2025/08/26 21:22:30 ERROR:Traceback (most recent call last):
2025/08/26 21:22:30 ERROR:  File "/usr/bin/csr_azure_init_agent", line 1090, in Provision 
binos[5673]: 2025/08/26 21:22:30 ERROR:  File "/usr/bin/csr_azure_init_agent", line 2447, in main
2025/08/26 21:22:30 ERROR:    error = ovfobj.Process() 
binos[5673]: 2025/08/26 21:22:30 ERROR:    rc = WaAgent.Run1()
2025/08/26 21:22:30 ERROR:  File "/usr/bin/csr_azure_init_agent", line 588, in Process 
binos[5673]: 2025/08/26 21:22:30 ERROR:  File "/usr/bin/csr_azure_init_agent", line 1718, in Run1
2025/08/26 21:22:30 ERROR:    if not os.path.isfile(TmpAzure + pkey[0] + ".crt"): 
binos[5673]: 2025/08/26 21:22:30 ERROR:    provisionError = self.Provision()
binos[5673]: 2025/08/26 21:22:30 ERROR:  File "/usr/bin/csr_azure_init_agent", line 1090, in Provision
2025/08/26 21:22:30 ERROR:TypeError: can only concatenate str (not "NoneType") to str 
2025/08/26 21:22:30 ERROR: 
2025/08/26 21:22:30 ERROR:Exception: can only concatenate str (not "NoneType") to str 
binos[5673]: 2025/08/26 21:22:30 ERROR:    error = ovfobj.Process()
binos[5673]: 2025/08/26 21:22:30 ERROR:  File "/usr/bin/csr_azure_init_agent", line 588, in Process
2025/08/26 21:22:30 Restart agent in 15 seconds 
binos[5673]: 2025/08/26 21:22:30 ERROR:    if not os.path.isfile(TmpAzure + pkey[0] + ".crt"):
binos[5673]: 2025/08/26 21:22:30 ERROR:TypeError: can only concatenate str (not "NoneType") to str
binos[5673]: 2025/08/26 21:22:30 ERROR:
binos[5673]: 2025/08/26 21:22:30 ERROR:Exception: can only concatenate str (not "NoneType") to str
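The console output above is hard to read because every message appears twice, once with a binos[...] prefix and once without, and the two copies are interleaved. As a rough aid, here is a minimal Python sketch that splits the two copies apart so the traceback can be read in order. The function name split_streams and the regular expression are my own and are based only on the line shapes visible in this excerpt, so treat it as an assumption rather than a general parser for 9800 console logs.

```python
import re

# Optional "binos[PID]: " prefix, then a timestamp, then the message.
# This pattern is an assumption based on the excerpt in this post.
LINE_RE = re.compile(
    r"^(?P<prefix>binos\[\d+\]: )?"
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<msg>.*)$"
)


def split_streams(console_text):
    """Split interleaved console output into (binos_lines, raw_lines)."""
    binos, raw = [], []
    for line in console_text.splitlines():
        match = LINE_RE.match(line.rstrip())
        if not match:
            continue  # skip anything that is not a timestamped log line
        target = binos if match.group("prefix") else raw
        target.append(match.group("msg"))
    return binos, raw


# A few lines copied from the console output in this post.
sample = """\
2025/08/26 21:22:30 Deploy public key:None
2025/08/26 21:22:30 ERROR:Traceback (most recent call last):
binos[5673]: 2025/08/26 21:22:30 Setting the host name to 9800wlc2
2025/08/26 21:22:30 ERROR:  File "/usr/bin/csr_azure_init_agent", line 2447, in main
binos[5673]: 2025/08/26 21:22:30 Deploy public key:None
binos[5673]: 2025/08/26 21:22:30 ERROR:Traceback (most recent call last):
"""

binos_lines, raw_lines = split_streams(sample)
for msg in raw_lines:
    print(msg)
```

Reading either copy on its own gives the same story: the key is logged as None, and the traceback follows straight after.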

It seems that even though the agent finds a public key when parsing the OVF environment, by the time the key is deployed its value is None, which, as you can see, the deployment program does not expect or handle well. As the error correctly points out, you can’t concatenate a NoneType to a string, which is what it is trying to do (takes me back to learning Python at school 🙂). This is also perhaps why the VM deployment errors out: the agent crashes and restarts rather than completing provisioning, so Azure may never be told that provisioning has finished.
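To make the failure concrete, here is a small Python sketch that reproduces the same kind of TypeError. It is not Cisco's code. The only thing taken from the traceback is the shape of the failing expression (a string, plus pkey[0], plus ".crt"). The TMP_AZURE value, the two-element layout of pkey, and both function names are assumptions I made up for illustration.

```python
import os.path

# Hypothetical stand-in: the real value of TmpAzure in the agent is unknown.
TMP_AZURE = "/tmp/azure/"


def cert_path_unguarded(pkey):
    # Same shape as the failing line in the traceback: str + pkey[0] + str.
    return TMP_AZURE + pkey[0] + ".crt"


def cert_path_guarded(pkey):
    """Return the certificate path, or None when no key identifier is present."""
    if not pkey or pkey[0] is None:
        return None
    return TMP_AZURE + pkey[0] + ".crt"


def reproduce():
    # Assumed layout: first element is a key identifier that arrived as None,
    # second element is the authorized_keys path seen in the console log.
    pkey = (None, "/home/azureuser/.ssh/authorized_keys")
    try:
        os.path.isfile(cert_path_unguarded(pkey))
    except TypeError as exc:
        return str(exc)
    return None


error_message = reproduce()
print(error_message)
```

The guarded variant only shows the sort of check that would avoid the crash. Whether skipping the certificate lookup is the right fix inside the real agent is something only Cisco can answer.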

What next?

While Cisco looks into this, I will aim to update this post if it gets fixed or if a workaround is given. For now, I would deploy the VM with username/password authentication for SSH, and if you want key-based SSH authentication, configure it once the VM is deployed and you can access it.