For security reasons, your organization may need to convert an existing AKS cluster to a private AKS cluster so that the cluster's API server is not exposed to the public Internet.
This article walks you through converting an AKS cluster to a private cluster by using the API Server VNet Integration feature.
The conversion rotates the nodes, which can cause downtime. Treat it like an AKS upgrade and follow the pre-upgrade considerations for precautions. There is NO ROLLBACK.
After the conversion, AKS can ONLY use a user-assigned managed identity. Note that any current role assignments for the AKS security principal will need to be re-granted to the new identity.
If the AKS cluster uses a service principal as its security principal, do not stop the conversion midway. Otherwise, the cluster can end up with permissions split between the old and new principals, which can cause serious problems.
Preliminary inspection: check the current AKS security principal type
To convert an AKS cluster into a private cluster with API Server VNet Integration, the cluster must use a user-assigned managed identity. Therefore, before converting, we need to check which type of security principal the cluster is using.
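A minimal sketch of this check, assuming the Azure CLI is installed and logged in. `RESOURCE_GROUP` and `CLUSTER_NAME` are placeholder names that you can override by exporting them first. The function reads the `identity.type` field from `az aks show`; a cluster that uses a service principal has no managed identity block, so that query comes back empty, and the function reports this case as `ServicePrincipal` (a label of this script, not a value Azure returns).

```bash
# Placeholder names (hypothetical); exported values take precedence.
: "${RESOURCE_GROUP:=myResourceGroup}"
: "${CLUSTER_NAME:=myAKSCluster}"

# Print the security principal type of the cluster.
# Azure returns "SystemAssigned" or "UserAssigned" in identity.type; for a
# service principal cluster the query is empty, reported here as "ServicePrincipal".
aks_identity_type() {
  local id_type
  if ! id_type=$(az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
      --query "identity.type" --output tsv); then
    echo "az aks show failed" >&2
    return 1
  fi
  echo "${id_type:-ServicePrincipal}"
}

# Usage:
#   aks_identity_type
```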
If the result is “UserAssigned”, you can skip converting the AKS security principal to a user-assigned managed identity. Otherwise, a user-assigned managed identity needs to be created for AKS to use.
Migrate AKS security principal to user-assigned managed identity (Conditional skip)
NOTE
If the AKS cluster already uses a user-assigned managed identity, skip this entire section, “Migrate AKS security principal to user-assigned managed identity (Conditional skip)”; otherwise, continue.
Check which external resources the current AKS security principal uses
In some scenarios, AKS uses external resources, such as public IP resources, storage accounts outside the Node Resource Group (NRG), or a custom VNet. The new managed identity must be assigned the same roles on these resources so that AKS can continue to use them after the security principal changes.
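A minimal sketch of this check, assuming a logged-in Azure CLI and placeholder names (`RESOURCE_GROUP`, `CLUSTER_NAME`) that you can override by exporting them first. It looks up the current principal (the object ID of a system-assigned identity, or the application ID of a service principal), lists its role assignments across the current subscription, and drops the ones scoped to the NRG. The NRG filter is a simple case-insensitive text match on the scope, not an authoritative check, so review the output before relying on it.

```bash
# Placeholder names (hypothetical); exported values take precedence.
: "${RESOURCE_GROUP:=myResourceGroup}"
: "${CLUSTER_NAME:=myAKSCluster}"

# Print "role<TAB>scope" lines for the current AKS security principal,
# excluding assignments scoped to the node resource group (NRG).
list_external_role_assignments() {
  local assignee nrg
  # System-assigned identity: identity.principalId (object ID).
  # Service principal: servicePrincipalProfile.clientId (application ID).
  assignee=$(az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --query "identity.principalId || servicePrincipalProfile.clientId" --output tsv) || return 1
  # On managed identity clusters clientId is the literal "msi", which is not a principal.
  if [ -z "$assignee" ] || [ "$assignee" = "msi" ]; then
    echo "Could not determine a principal to inspect" >&2
    return 1
  fi
  nrg=$(az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --query nodeResourceGroup --output tsv) || return 1

  # --all covers every scope in the current subscription. grep exits 1 when
  # nothing is left, which just means no external assignments were found.
  az role assignment list --assignee "$assignee" --all \
    --query "[].[roleDefinitionName, scope]" --output tsv \
    | grep -Eiv "/resourceGroups/${nrg}(/|$)" || true
}

# Usage:
#   list_external_role_assignments
```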
If there is no output, we can assume that AKS currently has no permissions on resources outside the Node Resource Group (NRG).
NOTE
When you change the identity that AKS uses, AKS automatically grants the new managed identity permissions on the Node Resource Group (NRG), so you don’t need to handle permissions on resources in the NRG.
As mentioned above, to convert the AKS cluster into a private cluster, it must use a user-assigned managed identity. Therefore, we create a new user-assigned managed identity here and grant it the same permissions as the current AKS security principal, excluding the Node Resource Group (NRG) scope.
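A minimal sketch of this step, assuming a logged-in Azure CLI. `RESOURCE_GROUP` and `IDENTITY_NAME` are hypothetical placeholders. The function creates the identity, then reads tab-separated role and scope pairs on standard input (for example, the output of `az role assignment list ... --query "[].[roleDefinitionName, scope]" -o tsv` with NRG scopes removed) and grants each role to the new identity at the same scope. Passing `--assignee-object-id` together with `--assignee-principal-type` avoids a directory lookup that can fail while a freshly created identity is still replicating.

```bash
# Placeholder names (hypothetical); exported values take precedence.
: "${RESOURCE_GROUP:=myResourceGroup}"
: "${IDENTITY_NAME:=myAKSIdentity}"

# Create the user-assigned managed identity, then copy each "role<TAB>scope"
# pair read from stdin onto it.
create_identity_and_copy_roles() {
  local principal_id role scope
  principal_id=$(az identity create --resource-group "$RESOURCE_GROUP" \
    --name "$IDENTITY_NAME" --query principalId --output tsv) || return 1

  while IFS=$'\t' read -r role scope; do
    [ -n "$role" ] || continue
    # < /dev/null keeps az from consuming the remaining input lines.
    az role assignment create --assignee-object-id "$principal_id" \
      --assignee-principal-type ServicePrincipal \
      --role "$role" --scope "$scope" < /dev/null || return 1
  done
}

# Usage (role/scope list saved to a file beforehand):
#   create_identity_and_copy_roles < external-roles.tsv
```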
Updating the cluster to use the new identity DOES NOT automatically rotate the nodes.
If the cluster was using a system-assigned managed identity, the old system-assigned identity is deleted.
CAUTION
If the current AKS security principal is a service principal, this change might not be reversible. Check the note in Use a managed identity in Azure Kubernetes Service (AKS).
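A minimal sketch of switching the cluster to the new identity, assuming a logged-in Azure CLI and that the identity named by the hypothetical `IDENTITY_NAME` placeholder already exists. It resolves the identity's resource ID and passes it to `az aks update --enable-managed-identity --assign-identity`.

```bash
# Placeholder names (hypothetical); exported values take precedence.
: "${RESOURCE_GROUP:=myResourceGroup}"
: "${CLUSTER_NAME:=myAKSCluster}"
: "${IDENTITY_NAME:=myAKSIdentity}"

# Point the cluster at the user-assigned identity. Nodes are NOT rotated here.
switch_to_user_assigned_identity() {
  local identity_id
  identity_id=$(az identity show --resource-group "$RESOURCE_GROUP" \
    --name "$IDENTITY_NAME" --query id --output tsv) || return 1
  az aks update --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --enable-managed-identity --assign-identity "$identity_id"
}

# Usage:
#   switch_to_user_assigned_identity
```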
Converting the cluster to a private cluster resolves the conflict between the old and new security principals, because the nodes are rotated during the conversion. However, if the cluster pulls images from Azure Container Registry (ACR), you will need to re-grant permissions.
Re-grant permission to access ACR (Conditional skip)
NOTE
If the AKS cluster was using a system-assigned managed identity, skip this section.
This procedure is required if the AKS cluster was using a service principal; otherwise, your workloads will fail to pull images after the cluster is converted to a private cluster. If you do not use any Azure Container Registry (ACR) resources, you can also skip this step.
Because the cluster now uses managed identities, AKS relies on the kubelet managed identity to access ACR. By default, this identity is different from the managed identity we just created. The two can be the same identity, but that setup is out of scope here.
The same applies when you use managed identities to access storage accounts without access keys, or similar resources. You will need to re-configure those permissions so that access continues after the node rotation.
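For the ACR case, a minimal sketch assuming a logged-in Azure CLI and a hypothetical registry placeholder `ACR_NAME`. It prints the kubelet identity's object ID (useful if you also need `az role assignment create` for other resources such as storage accounts) and then uses `az aks update --attach-acr`, which assigns the AcrPull role on the registry to the kubelet identity.

```bash
# Placeholder names (hypothetical); exported values take precedence.
: "${RESOURCE_GROUP:=myResourceGroup}"
: "${CLUSTER_NAME:=myAKSCluster}"
: "${ACR_NAME:=myregistry}"

# Grant the kubelet identity pull access to the registry.
attach_acr_to_kubelet_identity() {
  local kubelet_object_id
  kubelet_object_id=$(az aks show --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --query "identityProfile.kubeletidentity.objectId" --output tsv) || return 1
  echo "Kubelet identity object ID: ${kubelet_object_id}"
  # --attach-acr assigns AcrPull on the registry to the kubelet identity.
  az aks update --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --attach-acr "$ACR_NAME"
}

# Usage:
#   attach_acr_to_kubelet_identity
```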
Because the API server IP address changes during the process, all nodes are rotated and you will experience downtime. Consider the strategies you would use for a regular AKS upgrade, such as max surge, to speed up the process.
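A minimal sketch of raising max surge on every node pool before the conversion, assuming a logged-in Azure CLI. `MAX_SURGE=33%` is only an example value; choose one that fits your capacity and quota, since surge nodes consume extra VM cores during the rotation.

```bash
# Placeholder names (hypothetical); exported values take precedence.
: "${RESOURCE_GROUP:=myResourceGroup}"
: "${CLUSTER_NAME:=myAKSCluster}"
: "${MAX_SURGE:=33%}"

# Apply the same max surge setting to all node pools of the cluster.
set_max_surge_on_all_nodepools() {
  local pools pool
  pools=$(az aks nodepool list --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" --query "[].name" --output tsv) || return 1
  for pool in $pools; do
    az aks nodepool update --resource-group "$RESOURCE_GROUP" \
      --cluster-name "$CLUSTER_NAME" --name "$pool" --max-surge "$MAX_SURGE" || return 1
  done
}

# Usage:
#   set_max_surge_on_all_nodepools
```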
You cannot combine steps 4 and 5. If you do, metrics-server fails its readiness probe because it cannot refresh the API server address, and its Pod does not start automatically. This in turn causes PodDisruptionBudget (PDB) failures.
Therefore, the following command combination is prohibited, and the AKS operation will fail if you run it:
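Assuming steps 4 and 5 are enabling API Server VNet Integration and then enabling private cluster mode, the prohibited form is a single `az aks update` call that passes both `--enable-apiserver-vnet-integration` and `--enable-private-cluster`. A minimal sketch that keeps them as two separate updates follows; `APISERVER_SUBNET_ID` is a placeholder for the subnet delegated to the API server. The `kubectl rollout status` check on `deployment/metrics-server` between the two calls is an extra safeguard added here as an assumption, not part of the original steps. Depending on your Azure CLI version, the VNet integration flags may require the `aks-preview` extension.

```bash
# Placeholder names (hypothetical); exported values take precedence.
: "${RESOURCE_GROUP:=myResourceGroup}"
: "${CLUSTER_NAME:=myAKSCluster}"
: "${APISERVER_SUBNET_ID:=/subscriptions/<sub-id>/resourceGroups/<vnet-rg>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<apiserver-subnet>}"

convert_to_private_cluster() {
  # Assumed step 4: enable API Server VNet Integration on its own.
  # az aks update blocks until the operation finishes (no --no-wait).
  az aks update --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --enable-apiserver-vnet-integration \
    --apiserver-subnet-id "$APISERVER_SUBNET_ID" || return 1

  # Safeguard (assumption): wait until metrics-server is ready again.
  kubectl --namespace kube-system rollout status deployment/metrics-server \
    --timeout=15m || return 1

  # Assumed step 5: enable private cluster mode in a separate update.
  az aks update --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --enable-private-cluster
}

# Usage:
#   convert_to_private_cluster
```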
After the cluster is successfully converted to a private cluster, we need to clean up the role assignments of the old security principal, because they are not removed automatically.
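A minimal sketch of the cleanup, assuming a logged-in Azure CLI and that you recorded the old principal's object ID before the migration (`OLD_PRINCIPAL_OBJECT_ID` is a placeholder; for a service principal, `az ad sp show --id <appId> --query id --output tsv` returns it). Filtering on `principalId` rather than using `--assignee` is meant to also catch assignments whose principal no longer exists, such as a deleted system-assigned identity. `--all` only covers the current subscription, so repeat it per subscription if needed.

```bash
# Placeholder (hypothetical); export the real object ID before running.
: "${OLD_PRINCIPAL_OBJECT_ID:=00000000-0000-0000-0000-000000000000}"

delete_old_role_assignments() {
  local ids
  ids=$(az role assignment list --all \
    --query "[?principalId=='${OLD_PRINCIPAL_OBJECT_ID}'].id" --output tsv) || return 1
  if [ -z "$ids" ]; then
    echo "No role assignments found for ${OLD_PRINCIPAL_OBJECT_ID}"
    return 0
  fi
  echo "Deleting:"
  echo "$ids"
  # Unquoted on purpose: each assignment ID becomes a separate --ids value.
  # shellcheck disable=SC2086
  az role assignment delete --ids $ids
}

# Usage:
#   delete_old_role_assignments
```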