Migrating Data within Azure
At times you need to move data within Azure from one storage account to another, for reasons such as reorganization, billing constraints, data usability, or data governance.
Moving every container in a storage account, along with all the blobs inside each one, can be challenging. This is where AzCopy ("azcopy") comes to the rescue. It lets you selectively copy only certain blobs inside your containers, or copy specific containers, from one storage account to another. It can also copy the entire contents of a storage account, with all its containers and blobs, to another storage account.
Below is an example of the syntax for copying all the contents of one storage account, including containers and blobs, to another storage account. The storage accounts can be in the same region or in different regions. The <SAS-token> placeholder stands for the full token, including its leading "?".
azcopy copy 'https://<source-storage-account-name>.blob.core.windows.net/<SAS-token>' 'https://<destination-storage-account-name>.blob.core.windows.net/' --recursive
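The same command shape works for narrower copies. As a minimal sketch, the hypothetical shell function below copies a single container, optionally restricted to blobs matching a wildcard pattern through azcopy's --include-pattern flag. The function name, its argument order, and the use of a SAS token on the destination are assumptions made for illustration; here the SAS tokens are passed without the leading "?". Setting AZCOPY=echo prints the command instead of running it.

```shell
# copy_container: hypothetical helper, not part of azcopy itself.
# Arguments:
#   $1 source storage account name
#   $2 source SAS token (without the leading '?')
#   $3 destination storage account name
#   $4 destination SAS token (without the leading '?')
#   $5 container name (the same name is used on both sides)
#   $6 optional wildcard pattern, e.g. '*.csv;*.json'
copy_container() {
    src="https://$1.blob.core.windows.net/$5?$2"
    dst="https://$3.blob.core.windows.net/$5?$4"
    if [ -n "${6:-}" ]; then
        # Copy only blobs whose names match the pattern.
        "${AZCOPY:-azcopy}" copy "$src" "$dst" --recursive --include-pattern "$6"
    else
        # Copy every blob in the container.
        "${AZCOPY:-azcopy}" copy "$src" "$dst" --recursive
    fi
}
```

For example, copy_container mysource "$SRC_SAS" mydest "$DST_SAS" images '*.jpg' would copy only the .jpg blobs from the images container, where the account names and variables are placeholders.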
Now, how do we ensure data consistency in the results of azcopy? That is, how do we confirm that the data was copied without any issues, without opening each file, given that there might be thousands of them?
AzCopy copies all the data (both upload and download) when you start the process, and you should not expect consistency issues. AzCopy creates log and plan files for every job, and you can use the logs to investigate and troubleshoot any potential problems. The logs contain the failure status (UPLOADFAILED, COPYFAILED, or DOWNLOADFAILED), the full path, and the reason for the failure. By default, the log and plan files are located in the %USERPROFILE%\.azcopy directory on Windows or the $HOME/.azcopy directory on Mac and Linux.
Note: When you resume a job, AzCopy looks at the job plan file. The plan file lists all the files that were identified for processing when the job was first created. AzCopy then attempts to transfer every file listed in the plan file that wasn't already transferred.
Data transfers are done with spare bandwidth, and there is no SLA as to whether a transfer will be fast or slow.
Will 'sync' delete files in the destination if they no longer exist in the source location? By default, the 'sync' command doesn't delete files in the destination unless you use an optional flag (--delete-destination) with the command. To learn more, see "Synchronize files" in the AzCopy documentation.
AzCopy is designed to delete partial data after failures if it has deletion rights and if the failure is a controlled one (that is, one that AzCopy can detect and clean up after). Sudden termination of the AzCopy process itself, for example killing the process from Task Manager, is not a controlled failure, so AzCopy cannot clean up partially completed destination files in that case. Block blobs are a special case with stricter behavior around partial data: because PutBlock and PutBlockList are atomic, you won't see incomplete data on block blobs even after an uncontrolled failure. On page blobs and Azure Files, which are not updated atomically, you can see incomplete data after an uncontrolled failure.
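Because the logs record a failure status for each failed transfer, a quick consistency check is to search a job's log for those statuses. The sketch below assumes a Linux or macOS shell. The function name is hypothetical, and the log file name (the job ID with a .log extension under $HOME/.azcopy) should be confirmed against your own installation.

```shell
# list_failed_transfers: hypothetical helper that prints every line of an
# AzCopy job log that records a failed transfer.
#   $1 path to the job log, e.g. "$HOME/.azcopy/<job-id>.log"
# grep exits with status 1 when nothing matches, i.e. no failures were logged.
list_failed_transfers() {
    grep -E 'UPLOADFAILED|COPYFAILED|DOWNLOADFAILED' "$1"
}

# Example (placeholder job ID, replace with your own):
#   list_failed_transfers "$HOME/.azcopy/<job-id>.log"
#
# AzCopy's own job commands report similar information:
#   azcopy jobs list                                  # find the job ID
#   azcopy jobs show <job-id> --with-status=Failed    # list failed transfers
#   azcopy jobs resume <job-id>                       # retry what did not transfer
# When the original job used SAS tokens, resume needs them again through
# --source-sas and --destination-sas.
```

An empty result from list_failed_transfers only means no failure status was logged for that job; it does not compare file contents.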
Additional information: To validate data integrity, you can download the file using AzCopy with the /CheckMD5 option (--check-md5 in AzCopy v10) and then compare the downloaded file with your local original file. However, since AzCopy already makes its best effort to protect data integrity during transfer, this validation step is probably redundant and is not recommended unless data integrity matters much more than performance.
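The comparison step can be scripted rather than done by eye. As a sketch, assuming the blob has already been downloaded alongside the original and that md5sum (GNU coreutils, typical on Linux) is available, the hypothetical function below reports whether two files have the same MD5 digest.

```shell
# same_md5: hypothetical helper; succeeds (exit 0) when both files have the
# same MD5 digest and fails (non-zero exit) otherwise.
#   $1 original local file
#   $2 copy downloaded from the destination storage account
same_md5() {
    # md5sum prints "<hex digest>  <name>"; keep only the digest.
    a=$(md5sum < "$1" | cut -d ' ' -f 1)
    b=$(md5sum < "$2" | cut -d ' ' -f 1)
    # An empty digest means a file could not be read, so treat it as a mismatch.
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A call such as same_md5 report.csv downloaded/report.csv && echo "match" would print "match" only when the two digests agree; the file names are placeholders.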