Throttling in Azure copy, restore, and expiry jobs
Azure has throttling limits on the number of copy operations for snapshots and page blobs.
These limits can cause throttling errors, which lead to retries and in turn slow down copy and restore jobs. If many copy, restore, or expiry jobs are triggered simultaneously, the jobs take longer to complete and are more likely to fail because of throttling.
To avoid these issues, Cloud Snapshot Manager limits the number of DDVE copy, restore, and expiry jobs that run within an Azure region.
Cloud Snapshot Manager allows up to 15 disks to be copied or restored at a time within a region. Each VM might have a different number of disks, so the number of copy or restore jobs that run at a time varies. After one set of copy, restore, or expiry jobs completes, the next set of jobs is queued to run. As a result, you might notice the following:
- Even though 15 copy jobs are due for a protection plan in a particular region, only 5 jobs are triggered at first. As these 5 jobs near completion, the subsequent copy jobs are triggered.
- After the DDVE restore jobs are triggered, you might see an error message stating that not enough sessions are available and asking you to try again after some time. This error is caused by throttling in Cloud Snapshot Manager. After any of the running restore jobs completes, you can run the next restore job.
- You might see the same error message for expiry jobs, because Cloud Snapshot Manager handles expiry jobs based on the number of disk copy operations available for the cloud account and region.
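The behavior described above can be illustrated with a small simulation. This is only a sketch of a disk-budget queue, not Cloud Snapshot Manager's actual scheduler: the `admit_jobs` function, the first-in-first-out ordering, and the assumption that every VM has 3 disks are hypothetical and chosen for illustration. Only the limit of 15 disks per region comes from the text.

```python
from collections import deque

REGION_DISK_LIMIT = 15  # from the text: up to 15 disks at a time per region


def admit_jobs(pending, disks_in_use=0, limit=REGION_DISK_LIMIT):
    """Admit queued jobs in FIFO order while they fit in the disk budget.

    `pending` is a deque of (job_name, disk_count) tuples (hypothetical shape).
    Returns the admitted job names and the updated number of disks in use.
    """
    admitted = []
    while pending and disks_in_use + pending[0][1] <= limit:
        name, disks = pending.popleft()
        disks_in_use += disks
        admitted.append(name)
    return admitted, disks_in_use


# Assumption for illustration: 15 VMs, each with 3 disks.
pending = deque((f"vm-{i}", 3) for i in range(1, 16))

first_wave, in_use = admit_jobs(pending)
print(first_wave)  # ['vm-1', 'vm-2', 'vm-3', 'vm-4', 'vm-5']

# One job finishes and frees its 3 disks, so one more job fits.
in_use -= 3
next_wave, in_use = admit_jobs(pending, in_use)
print(next_wave)  # ['vm-6']
```

Under these assumptions, 5 jobs use the whole 15-disk budget, and the remaining 10 jobs wait in the queue until disks are freed. With VMs that have fewer or more disks, the number of jobs running at a time would differ.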
Cloud Snapshot Manager tries to reuse Azure Container Instances across copy, restore, and expiry jobs. Each container copies 3 disks at a time. These 3 disks can belong to the same VM or to different VMs. After the copy or restore operation completes for any disk within the container, Cloud Snapshot Manager assigns another disk to that container for copy or restore. In this manner, Cloud Snapshot Manager reuses the existing container instead of spawning another container for the jobs.
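The container reuse described above can be sketched as a pool of containers with 3 slots each. This is a simplified model under stated assumptions, not the product's implementation: the `Container` class, the `assign` and `complete` functions, and the disk names are hypothetical. Only the figure of 3 disks per container comes from the text.

```python
from collections import deque

DISKS_PER_CONTAINER = 3  # from the text: each container copies 3 disks at a time


class Container:
    """Hypothetical stand-in for one Azure Container Instance."""

    def __init__(self, name):
        self.name = name
        self.active = set()  # disks currently being copied or restored

    def has_free_slot(self):
        return len(self.active) < DISKS_PER_CONTAINER


def assign(containers, queue):
    """Fill free slots in existing containers from the queue."""
    made = []
    for container in containers:
        while queue and container.has_free_slot():
            disk = queue.popleft()
            container.active.add(disk)
            made.append((container.name, disk))
    return made


def complete(container, disk, containers, queue):
    """Mark one disk as done, then hand the freed slot to the next disk."""
    container.active.discard(disk)
    return assign(containers, queue)


containers = [Container("c1"), Container("c2")]
queue = deque(f"disk-{i}" for i in range(1, 9))  # 8 disks, possibly from different VMs

initial = assign(containers, queue)          # 6 disks start, 2 keep waiting
reused = complete(containers[0], "disk-2", containers, queue)
print(reused)  # [('c1', 'disk-7')]
```

In this model the number of containers stays at 2 throughout: the slot freed by `disk-2` is given to `disk-7` in the same container rather than starting a new one.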