Azure Data Analytics: Part 2: Benefits of using Azure Data Lake Storage Gen2

Azure has been providing great storage capabilities through Azure storage accounts and Azure Blob storage. Then why go in for Azure Data Lake Storage?
Below are the key benefits of using Azure Data Lake Storage Gen2 over normal storage for your analytics needs:

1. Has tiering and data life cycle management capabilities.
2. Provides high availability, security and durability.
3. Designed to handle exabyte-scale data while sustaining hundreds of gigabits per second of throughput, so it can be used for both real-time and batch solutions.
4. Hadoop Compatible Access: You can treat the data as if it were stored in a Hadoop Distributed File System (HDFS), so it can be used directly with Azure Databricks, HDInsight and Azure Synapse Analytics without moving data between environments.
5. Security: Supports Access Control Lists (ACLs) and Portable Operating System Interface (POSIX) permissions. You can set permissions at directory level and file level. Data at rest is encrypted using Microsoft-managed or customer-managed keys.
6. Performance: Stores data in a hierarchy of directories, subdirectories and files, much like a file system. As a result, data processing requires fewer computational resources, which saves time and cost.
7. Data Redundancy: Provides options such as Locally Redundant Storage (LRS) and Geo Redundant Storage (GRS).
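
To make the Hadoop compatible access point a little more concrete: Hadoop-based tools usually address Data Lake Storage Gen2 through an `abfss://` URI that names the container and the storage account. The small helper below is only a sketch of how such a URI is put together. The function name `abfss_uri` and the account, container and path values are made up for illustration.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an ABFS (TLS) URI for a path inside a Data Lake Storage Gen2 container."""
    # Drop any leading slash so the URI never ends up with a double slash.
    clean_path = path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical account, container and file names, for illustration only.
uri = abfss_uri("mydatalake", "raw", "sales/2024/orders.parquet")
print(uri)
```

A URI like this is what you would typically pass to a reader in Azure Databricks or HDInsight, so the same copy of the data can be used from each service.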

One of the USPs of Azure Data Lake Storage is its support for hierarchical namespaces. With a hierarchical namespace you can organize your data into directories, with metadata kept about each file and directory. This allows operations like directory renames and deletes to be performed as a single atomic operation. Simple Azure storage accounts only support flat namespaces, where the same rename or delete requires a number of operations proportional to the number of objects in the structure. A hierarchical namespace keeps the data better organized and yields better storage and retrieval performance for analytical use cases, which lowers the cost of analysis.
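
The rename difference described above can be illustrated with a toy model in plain Python. This is not how Azure implements either namespace. It is only a sketch, with made-up function names and paths, that counts how many entries have to change when a "directory" is renamed in each layout.

```python
def rename_flat(blobs: dict, old_prefix: str, new_prefix: str) -> int:
    """Flat namespace: a 'directory' is just a key prefix, so every matching object is rewritten."""
    ops = 0
    # Snapshot the matching keys first so the dict is not mutated while iterating over it.
    for key in [k for k in blobs if k.startswith(old_prefix)]:
        blobs[new_prefix + key[len(old_prefix):]] = blobs.pop(key)
        ops += 1
    return ops


def rename_hierarchical(parent: dict, old_name: str, new_name: str) -> int:
    """Hierarchical namespace: only the directory entry in its parent changes."""
    parent[new_name] = parent.pop(old_name)
    return 1


# Toy data: 1000 files under logs/2024 in both layouts.
flat = {f"logs/2024/file{i}.csv": b"" for i in range(1000)}
tree = {"logs": {"2024": {f"file{i}.csv": b"" for i in range(1000)}}}

flat_ops = rename_flat(flat, "logs/", "archive/")
tree_ops = rename_hierarchical(tree, "logs", "archive")
print(flat_ops, tree_ops)  # 1000 1
```

In the flat layout the work grows with the number of objects under the prefix, while in the hierarchical layout it stays at one entry no matter how many files sit below the directory.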

To see the difference between the flat namespace and the hierarchical namespace in the Azure portal:

Create a normal storage account with hierarchical namespace disabled. Go to Containers and create a new container. Open the new container. You will not find any option to create directories:



Whereas if you create a storage account with hierarchical namespace enabled (Azure Data Lake Storage Gen2), then go to Containers, create a new container and open it, you will see the option to create directories:
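
The same directory can also be created from code instead of the portal. The sketch below assumes the `azure-storage-file-datalake` and `azure-identity` packages are installed, that the account has hierarchical namespace enabled, and that the signed-in identity has permission to write to the container. The helper names `dfs_account_url` and `create_directory`, and every account, container and directory name, are hypothetical.

```python
def dfs_account_url(account_name: str) -> str:
    """Return the Data Lake (dfs) endpoint URL for a storage account."""
    return f"https://{account_name}.dfs.core.windows.net"


def create_directory(account_name: str, container: str, directory: str):
    """Create a directory in an existing container of a Gen2 account."""
    # Imported here so the URL helper above works even without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        dfs_account_url(account_name),
        credential=DefaultAzureCredential(),
    )
    # A "file system" in the SDK corresponds to a container in the portal.
    file_system = service.get_file_system_client(container)
    return file_system.create_directory(directory)


# Example call (hypothetical names, needs real Azure credentials to run):
# create_directory("mydatalake", "raw", "sales/2024")
```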












