Convert public AKS cluster to private cluster

For security reasons, your organization may require its AKS cluster to be converted to a private AKS cluster, so that the cluster's API server is not exposed to the public Internet.
This article shows how to convert a public AKS cluster to a private cluster by using the API Server VNet Integration feature.

Precautions

  • This process triggers node rotation, which can cause downtime. Treat it like an AKS upgrade and review the pre-upgrade considerations for precautions. There is NO ROLLBACK.
  • After the cluster is converted to a private cluster, ONLY a user-assigned managed identity can be used for AKS. Note the current role assignments of the AKS security principal, if any: they will need to be re-granted to the new identity.
  • If the AKS cluster uses a service principal as its security principal, do not abort the conversion midway; otherwise, the resulting permission confusion between the old and new principals can have serious consequences.
  • You can no longer use the Service Tags for API Server authorized IP ranges feature.
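Because there is no rollback, you may want to review each command before it actually runs. A minimal sketch, assuming a hypothetical `run` wrapper and a hypothetical `DRY_RUN` variable (neither is an Azure CLI feature): with `DRY_RUN=1` it prints the command instead of executing it. The printed form drops the original quoting, so treat it as a preview only.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # "$*" joins the arguments with spaces; original quoting is not preserved.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example usage (preview first, then run for real):
#   DRY_RUN=1 run az aks update -n "${aks}" -g "${rG}" --enable-private-cluster -o none
#   run az aks update -n "${aks}" -g "${rG}" --enable-private-cluster -o none
```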

Preliminary inspection: check the current AKS security principal type

To convert an AKS cluster to a private cluster with the API Server VNet Integration feature, the cluster must use a user-assigned managed identity. Therefore, before converting, we need to check which type of security principal the cluster is using.

az aks show -n ${aks} -g ${rG} --query identity.type -o tsv

The command above returns one of three results:

  • SystemAssigned -> system-assigned managed identity
  • UserAssigned -> user-assigned managed identity
  • (empty output, because identity is null) -> service principal

If the result is “UserAssigned”, you can skip migrating the AKS security principal to a user-assigned managed identity. Otherwise, you need to create a user-assigned managed identity for AKS to use.
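The three possible results can be turned into a decision with a small helper. A minimal sketch, assuming a hypothetical `classify_identity` function that takes the value printed by the command above and reports whether the migration section applies:

```shell
# Hypothetical helper: interpret the value printed by
#   az aks show -n ${aks} -g ${rG} --query identity.type -o tsv
classify_identity() {
  case "$1" in
    UserAssigned)
      echo "user-assigned managed identity: skip the migration section" ;;
    SystemAssigned)
      echo "system-assigned managed identity: migrate to a user-assigned identity" ;;
    "")
      # identity is null for service-principal clusters, so tsv prints nothing
      echo "service principal: migrate to a user-assigned identity" ;;
    *)
      echo "unexpected identity type: $1" >&2
      return 1 ;;
  esac
}

# Example usage:
#   classify_identity "$(az aks show -n "${aks}" -g "${rG}" --query identity.type -o tsv)"
```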

Migrate AKS security principal to user-assigned managed identity (Conditional skip)

NOTE

If the AKS cluster already uses a user-assigned managed identity, skip this entire section, “Migrate AKS security principal to user-assigned managed identity (Conditional skip)”; otherwise, continue.

Check which external resources are used by the current AKS security principal

In some scenarios, AKS uses external resources, such as public IP resources, storage accounts outside the Node Resource Group (NRG), or a custom VNet. To ensure that AKS can keep using them after the security principal changes, the new managed identity must be assigned the same roles on these resources.

  1. Get the object ID of the current AKS security principal
aksIdentityType=$(az aks show -n ${aks} -g ${rG} \
--query identity.type -o tsv)

if [[ "$aksIdentityType" == "SystemAssigned" ]]
then
aksIdentityId=$(az aks show -n ${aks} -g ${rG} \
--query identity.principalId -o tsv)
fi
if [[ "$aksIdentityType" == "UserAssigned" ]]
then
aksIdentityId=$(az aks show -n ${aks} -g ${rG} \
--query identity.userAssignedIdentities.*.principalId -o tsv)
fi
if [[ "$aksIdentityType" == "" ]]
then
aksSPclientId=$(az aks show -n ${aks} -g ${rG} \
--query servicePrincipalProfile.clientId -o tsv)
aksIdentityId=$(az ad sp show \
--id ${aksSPclientId} --query id -o tsv)
fi

oldaksIdentityId=${aksIdentityId}
  2. Check the role assignments of the current AKS security principal, excluding the scope of the AKS Node Resource Group (NRG)
aksNrgId=$(az group show --query id -o tsv \
-n $(az aks show -n ${aks} -g ${rG} \
--query nodeResourceGroup -o tsv))

assignmentList=$(az role assignment list --all --query \
"[?principalId=='${aksIdentityId}'&&"'!'"starts_with(scope,'${aksNrgId}')]. \
{roleId:roleDefinitionId,name:roleDefinitionName,scope:scope}")

echo ${assignmentList}

If the output is an empty list ([]), we can assume that AKS currently has no permissions on resources outside the Node Resource Group (NRG).

NOTE

When you change the security principal used by AKS, AKS automatically grants the new managed identity permissions on the Node Resource Group (NRG). So you don’t need to handle permissions on resources inside the NRG.
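The query above excludes the NRG with a plain `starts_with()` prefix match, which is case-sensitive, while Azure resource IDs are case-insensitive. A raw prefix can also match a sibling resource group whose name merely begins with the NRG name (for example `..._eastus2` versus `..._eastus`). If you want to double-check individual scopes, a minimal sketch with a hypothetical `scope_in_nrg` helper that compares lower-cased IDs on a `/` boundary:

```shell
# Hypothetical helper: succeed if scope $1 is the NRG $2 itself or a resource
# inside it. Both IDs are lower-cased first because Azure resource IDs are
# case-insensitive.
scope_in_nrg() {
  local scope_lc nrg_lc
  scope_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  nrg_lc=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$scope_lc" in
    "$nrg_lc" | "$nrg_lc"/*) return 0 ;;  # exact match or a child path
    *) return 1 ;;
  esac
}

# Example usage: list assignment scopes that fall outside the NRG.
#   az role assignment list --all -o tsv \
#     --query "[?principalId=='${aksIdentityId}'].scope" |
#   while read -r s; do
#     scope_in_nrg "$s" "${aksNrgId}" || echo "outside NRG: $s"
#   done
```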

Preparing a new user-assigned managed identity

As mentioned above, an AKS cluster must use a user-assigned managed identity before it can be converted to a private cluster. Therefore, we will create a new user-assigned managed identity here and grant it the same permissions as the current AKS security principal, excluding the scope of the Node Resource Group (NRG).

  1. Create a user-assigned managed identity
uami=${aks}-newIdentity
az identity create -n ${uami} -g ${rG} -o none
# Quote the JMESPath query so the shell does not treat [0] as a glob
uamiId=$(az resource list -n ${uami} -g ${rG} \
--resource-type Microsoft.ManagedIdentity/userAssignedIdentities \
--query "[0].id" -o tsv)
uamiIdentityId=$(az identity show --ids $uamiId \
--query principalId -o tsv)
  2. Grant the new managed identity the permissions of the current AKS security principal (Conditional skip)
NOTE

If the current AKS security principal has no control over resources outside of the Node Resource Group (NRG), you can skip this section.

# Copy each role assignment (role + scope) to the new managed identity
assignmentNum=$(echo "${assignmentList}" | jq length)
for ((i=0; i<${assignmentNum}; i++)); do
  roleId=$(echo "${assignmentList}" | jq -r '.['$i'] | .roleId')
  scope=$(echo "${assignmentList}" | jq -r '.['$i'] | .scope')

  az role assignment create --role "${roleId}" \
  --assignee-object-id "${uamiIdentityId}" -o none \
  --scope "${scope}" --assignee-principal-type ServicePrincipal
done
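The loop above depends on `jq`. If `jq` is not available, a jq-free sketch can read `roleDefinitionId`/`scope` pairs from the Azure CLI's tab-separated output instead. The `grant_assignments` helper below is hypothetical; it issues the same `az role assignment create` call as the loop.

```shell
# Hypothetical jq-free variant of the loop above.
# Reads "roleDefinitionId<TAB>scope" lines from stdin and creates the same
# role assignment for the principal (object) ID given in $1.
grant_assignments() {
  local tab roleId scope
  tab=$(printf '\t')
  # "|| [ -n "$roleId" ]" also handles a last line without a trailing newline
  while IFS="$tab" read -r roleId scope || [ -n "$roleId" ]; do
    [ -n "$roleId" ] || continue  # skip blank lines
    az role assignment create --role "$roleId" \
      --assignee-object-id "$1" --assignee-principal-type ServicePrincipal \
      --scope "$scope" -o none
  done
}

# Example usage, reusing the NRG exclusion filter from the earlier query:
#   az role assignment list --all -o tsv --query \
#   "[?principalId=='${aksIdentityId}'&&"'!'"starts_with(scope,'${aksNrgId}')].[roleDefinitionId,scope]" \
#   | grant_assignments "${uamiIdentityId}"
```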

Update the current AKS cluster to use the new user-assigned managed identity

After all the preparation, we can migrate the current AKS security principal to the new user-assigned managed identity.

az aks update -n ${aks} -g ${rG} -o none \
--enable-managed-identity --assign-identity ${uamiId}
NOTE

This process DOES NOT automatically rotate the nodes.
If you were using system-assigned managed identity, the old system-assigned identity will be deleted.

CAUTION

If the AKS cluster was using a service principal, there may be no going back. Check the note in Use a managed identity in Azure Kubernetes Service (AKS).
Converting the cluster to a private cluster naturally resolves conflicts between the old and new security principals, because the nodes are rotated during the process. However, if AKS pulls images from ACR, you will need to re-grant permissions.
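Before moving on, you may want to confirm that the cluster now reports the new identity. A minimal sketch with a hypothetical `uses_identity` helper, assuming the principal ID of the user-assigned identity can be read from `identity.userAssignedIdentities` in the `az aks show` output, as in the earlier lookup:

```shell
# Hypothetical check: succeed if the cluster's user-assigned identity has the
# principal (object) ID given in $3. $1 = cluster name, $2 = resource group.
uses_identity() {
  local actual
  actual=$(az aks show -n "$1" -g "$2" \
    --query "identity.userAssignedIdentities.*.principalId" -o tsv)
  [ "$actual" = "$3" ]
}

# Example usage:
#   uses_identity "${aks}" "${rG}" "${uamiIdentityId}" && echo "cluster uses the new identity"
```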

Re-grant permission to access ACR (Conditional skip)

NOTE

If the AKS cluster was originally using a system-assigned managed identity, skip this section.

This step is required for AKS clusters that were using a service principal; otherwise, image pulls from ACR will fail after the cluster is converted to a private cluster. If you do not use any Azure Container Registry (ACR) resources, you can also skip this step.

Now that the cluster uses managed identities, AKS relies on the kubelet managed identity to access Azure Container Registry (ACR). This managed identity is different from the one we just created. The two identities can be the same, but we won’t discuss that here.
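To see the two identities side by side, a minimal sketch with a hypothetical `show_identities` helper, assuming the kubelet identity is exposed under `identityProfile.kubeletidentity` in the `az aks show` output (the object ID there is the identity that `--attach-acr` grants access to):

```shell
# Hypothetical helper: print the cluster (control plane) identity and the
# kubelet identity object IDs. $1 = cluster name, $2 = resource group.
show_identities() {
  local cluster kubelet
  cluster=$(az aks show -n "$1" -g "$2" \
    --query "identity.userAssignedIdentities.*.principalId" -o tsv)
  kubelet=$(az aks show -n "$1" -g "$2" \
    --query identityProfile.kubeletidentity.objectId -o tsv)
  printf 'cluster identity: %s\nkubelet identity: %s\n' "$cluster" "$kubelet"
}

# Example usage:
#   show_identities "${aks}" "${rG}"
```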

  1. List the ACR resources in your subscription
az resource list --resource-type Microsoft.ContainerRegistry/registries \
--query "[].{name:name,id:id}" -o yaml
  2. Re-attach the ACR to AKS
az aks update -n ${aks} -g ${rG} --attach-acr {ACR_name/ID}
NOTE

This also applies to scenarios where you use managed identities to access storage accounts without access keys, or similar resources. You will need to reconfigure them to ensure they still have access after node rotation.
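To confirm that re-attaching the ACR worked, a minimal sketch with a hypothetical `has_acr_pull` helper. It assumes `az aks update --attach-acr` grants the AcrPull role to the kubelet identity directly at the registry scope, and relies on `roleDefinitionName` in the `az role assignment list` output, as the earlier query does:

```shell
# Hypothetical check: succeed if object ID $1 holds an AcrPull role assignment
# directly at scope $2 (the ACR resource ID).
has_acr_pull() {
  local count
  count=$(az role assignment list --assignee "$1" --scope "$2" \
    --query "length([?roleDefinitionName=='AcrPull'])" -o tsv)
  [ "${count:-0}" -gt 0 ]
}

# Example usage:
#   kubeletId=$(az aks show -n "${aks}" -g "${rG}" \
#     --query identityProfile.kubeletidentity.objectId -o tsv)
#   acrId=$(az acr show -n {ACR_name} --query id -o tsv)
#   has_acr_pull "${kubeletId}" "${acrId}" && echo "AcrPull is in place"
```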

Convert the current AKS cluster to a private cluster

Now we can convert the AKS cluster to a private cluster.
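The commands in this section reference ${vnetId}, which no earlier step sets, and ${uamiIdentityId}, which is only set if you ran the migration section in the same shell session. A minimal sketch of hypothetical helpers to resolve both through the Azure CLI, plus a check that the CIDR you choose for the API server subnet meets the /28 minimum. The helper names are illustrative; {resource_group}, {vnet_name} and {your_desired_cidr} are the same placeholders used in the subnet step.

```shell
# Hypothetical helpers for the conversion steps; not part of the Azure CLI.

# Print the resource ID of a VNet. $1 = resource group, $2 = VNet name.
get_vnet_id() {
  az network vnet show -g "$1" -n "$2" --query id -o tsv
}

# Print the principal (object) ID of the cluster's user-assigned identity.
# $1 = AKS cluster name, $2 = resource group.
get_cluster_identity_principal_id() {
  az aks show -n "$1" -g "$2" \
    --query "identity.userAssignedIdentities.*.principalId" -o tsv
}

# Succeed if the CIDR in $1 (e.g. 10.0.32.0/28) is /28 or larger,
# i.e. its prefix length is 28 or less.
cidr_at_least_28() {
  local prefix
  prefix=${1##*/}
  case "$prefix" in
    ''|*[!0-9]*) return 1 ;;  # missing "/" or non-numeric prefix
  esac
  [ "$prefix" -le 28 ]
}

# Example usage:
#   vnetId=$(get_vnet_id {resource_group} {vnet_name})
#   uamiIdentityId=$(get_cluster_identity_principal_id "${aks}" "${rG}")
#   cidr_at_least_28 {your_desired_cidr} || echo "API server subnet must be /28 or larger"
```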

  1. Create a subnet in the AKS VNet for the API server
# The later steps reference this subnet as ${vnetId}/subnets/apisubnet
az network vnet subnet create -g {resource_group} --vnet-name {vnet_name} \
-n apisubnet --delegations Microsoft.ContainerService/managedClusters \
--address-prefixes {your_desired_cidr}
NOTE

The API server subnet must be /28 or larger.

  2. Grant the new managed identity permission on the new subnet
az role assignment create --role "Network Contributor" \
--assignee-object-id ${uamiIdentityId} -o none \
--scope ${vnetId}/subnets/apisubnet --assignee-principal-type ServicePrincipal
  3. Speed up the node rotation process by configuring max surge (Optional)
az aks nodepool update -n {nodepool_name} -o none \
--cluster-name ${aks} -g ${rG} --max-surge 2
  4. Enable API Server VNet Integration
az aks update -n ${aks} -g ${rG} --enable-apiserver-vnet-integration \
--apiserver-subnet-id ${vnetId}/subnets/apisubnet -o none
IMPORTANT

Because the API server IP address changes during this process and all nodes are rotated, expect downtime. To speed up the process, consider the strategies used for regular AKS upgrades, such as max surge.

  5. Enable private cluster mode
az aks update -n ${aks} -g ${rG} --enable-private-cluster -o none
CAUTION

You cannot combine steps 4 and 5. If you do, metrics-server fails its readiness probe because it cannot refresh the API server address, and the Pod does not recover on its own. This in turn triggers a Pod Disruption Budget (PDB) failure.

Therefore, the following command is prohibited, and the AKS update will fail if you run it:

az aks update -n ${aks} -g ${rG} --enable-apiserver-vnet-integration -o none \
--apiserver-subnet-id ${vnetId}/subnets/apisubnet --enable-private-cluster

I also wrote a write-up on this error; see How Pod Disruption Budget (PDB) error mislead the investigation.
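Because the VNet integration update and the private cluster update must run separately, you may want to confirm that the first one has completed before starting the second. A minimal sketch with a hypothetical `ready_for_private_cluster` helper, assuming `az aks show` exposes `provisioningState` and `apiServerAccessProfile.enableVnetIntegration` as in the AKS managed cluster API:

```shell
# Hypothetical gate to run after enabling API Server VNet Integration and
# before enabling private cluster mode. $1 = cluster name, $2 = resource group.
ready_for_private_cluster() {
  local state vnetIntegration
  state=$(az aks show -n "$1" -g "$2" --query provisioningState -o tsv)
  vnetIntegration=$(az aks show -n "$1" -g "$2" \
    --query apiServerAccessProfile.enableVnetIntegration -o tsv)
  # The previous update must have finished successfully...
  [ "$state" = "Succeeded" ] || return 1
  # ...and VNet integration must be on (accept "true" or "True").
  case "$vnetIntegration" in
    [Tt]rue) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   ready_for_private_cluster "${aks}" "${rG}" &&
#     az aks update -n ${aks} -g ${rG} --enable-private-cluster -o none
```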

Clean up old role assignments

After successfully converting the AKS cluster to a private cluster, we need to clean up the role assignments of the old security principal, because they are not cleaned up automatically. If you skipped the migration section, there is no old security principal, so you can skip this step.

oldassignmentItems=$(az role assignment list --all \
--query "[?principalId=='${oldaksIdentityId}'].id" -o tsv)

# Only call delete when there is something to delete
if [[ -n "${oldassignmentItems}" ]]; then
az role assignment delete --ids ${oldassignmentItems}
fi

Hands-on lab

For a better understanding of the operation details, I wrote two demonstration scripts for practice:

  • Converting public AKS with system-assigned managed identity to private cluster
  • Converting public AKS with service principal to private cluster

The scripts can be accessed here: https://github.com/JoeyC-Dev/aks-example/tree/main/aks-convert-public-to-private

Convert public AKS cluster to private cluster
https://blog.joeyc.dev/posts/aks-convert-public-to-private/
Author: Joey Chen
Published at: 2024-11-10