Maloy Manna

Data, Tech, Cloud Security & Agile Project Management

Delta Lake and the Lakehouse Architecture

The technology world often sees upheavals when disparate concepts are put together to achieve different objectives, creating something which is much more than the sum of its parts. Delta Lake is one such concept: it has melded the transaction log with ACID guarantees, bringing transactions and atomicity into the ETL, analytics and big data field and creating a revolution of sorts.

The problem(s):
Ever since traditional data warehousing, the design and modeling of analytics systems has relied on denormalized tables, because analytics systems were considered separate from transactional systems. This started to change with the move to the cloud and the availability of more real-time data. Big data technology like HDFS/Hadoop added further constraints: updating and storing relational datasets carried a high performance cost. The difficulty was particularly acute for cloud customers, who faced additional latency compared to on-premises HDFS/Hadoop users. GDPR compliance made matters worse: deleting or correcting the data of a few customers required massive table-wide updates, and a crashed update increased the probability of data corruption and consistency issues.
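To make the "log plus ACID" idea concrete, here is a minimal toy sketch in Python, not Delta Lake's actual implementation or API. It assumes a table is a set of immutable data files plus an ordered log of commits, where each commit lists files to add and remove. The function names, the simplified action format and the file names are all hypothetical, chosen only for illustration.

```python
import json
import os
import tempfile


def commit(log_dir, version, actions):
    """Publish one commit as a numbered JSON file; fail if that version exists."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    # Mode "x" refuses to overwrite, so two writers cannot both claim a version.
    # (Toy only: a real system must also make the write itself atomic.)
    with open(path, "x") as f:
        json.dump(actions, f)


def live_files(log_dir):
    """Replay the log in version order to find the files in the current table."""
    files = set()
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if "add" in action:
                    files.add(action["add"])
                elif "remove" in action:
                    files.discard(action["remove"])
    return files


log_dir = tempfile.mkdtemp()

# Version 0: the initial load adds two (hypothetical) data files.
commit(log_dir, 0, [{"add": "part-0.parquet"}, {"add": "part-1.parquet"}])

# Version 1: a GDPR-style delete that only affects part-1. The rewritten file
# replaces the old one in a single commit, so readers see either the old
# table or the new one, never a half-applied update.
commit(log_dir, 1, [{"remove": "part-1.parquet"}, {"add": "part-2.parquet"}])

current = live_files(log_dir)
```

In this sketch only the file containing the affected records is rewritten, and a writer that crashes before its commit file is published leaves the table unchanged, because readers only trust what the log lists.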

Identity Basics 2 - Permissions, Scopes and Consent

In my previous post, we saw how app registrations add identity configurations for applications on Azure AD. Just like a user, an application also requires access to resources like Microsoft Graph, and that access needs authorization. The resource owner can grant (consent to) or deny this authorization to the application. There are two main access scenarios:

  1. Delegated access - access on behalf of a signed-in user. The user is signed in to a client application, which accesses the resource on the user's behalf. This requires delegated permissions (also referred to as scopes). All scenarios involving user actions should use delegated access. Scopes should also be limited according to the principle of least privilege.
    See also: the full list of Microsoft Graph permissions
  2. App-only access - access without a user, as the application’s own identity. This scenario is when the application runs as a background service or daemon used for automation or backup, or the data can’t be scoped to a single user. The client app needs to be granted appropriate application roles of the resource app it’s calling to access the requested data. Application roles granted through consent are called application permissions.
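The two scenarios differ in the OAuth 2.0 request the client makes. The sketch below is a minimal illustration using only the Python standard library: it builds the requests but does not send them. It assumes the Microsoft identity platform v2.0 endpoints, the authorization code flow for delegated access and the client credentials flow for app-only access. The helper names, tenant, client ID, secret and redirect URI are hypothetical placeholders.

```python
from urllib.parse import urlencode

AUTHORITY = "https://login.microsoftonline.com"


def delegated_authorize_url(tenant, client_id, redirect_uri, scopes):
    """Delegated access: send the user to sign in and consent to named scopes."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        # Request only the scopes the app needs (least privilege).
        "scope": " ".join(scopes),
    })
    return f"{AUTHORITY}/{tenant}/oauth2/v2.0/authorize?{query}"


def app_only_token_request(tenant, client_id, client_secret,
                           resource="https://graph.microsoft.com"):
    """App-only access: the app authenticates as itself, with no user involved."""
    url = f"{AUTHORITY}/{tenant}/oauth2/v2.0/token"
    body = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # ".default" asks for the application permissions already granted
        # to this app for the resource.
        "scope": f"{resource}/.default",
    }
    return url, body


# Hypothetical placeholder values, for illustration only.
TENANT = "contoso.onmicrosoft.com"
CLIENT_ID = "00000000-0000-0000-0000-000000000000"

authorize_url = delegated_authorize_url(
    TENANT, CLIENT_ID, "http://localhost:8000/callback",
    ["User.Read", "Mail.Read"],
)
token_url, token_body = app_only_token_request(TENANT, CLIENT_ID, "placeholder-secret")
```

Note the difference in the scope parameter: the delegated request names individual scopes that the user is asked to consent to, while the app-only request cannot name them per call and relies on application permissions that were consented to in advance.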

Access Scenarios

Identity Basics 1 - Application Registrations

For some time now, I’ve been working on security risk assessments of web applications. Modern identity management can be complex and often requires diving deep into the authentication flow and registration process to understand risk blocks in order to design appropriate controls and counter-measures. I hope to write a short series of posts to document the components and flows of this process, so that it can be my handy reference.

Modern authentication fundamentals

In my post on Identity and Access Management (IAM), I provided a very high-level view of how modern authentication works on the basis of a centralized Identity provider, like Azure Active Directory.

In this post, let’s look at a Microsoft Azure video in which Azure AD Program Manager Stuart Kwan presents the basics of modern claims-based authentication in a lucid and eloquent way. Clearly, if a picture is worth a thousand words, a video is probably worth a million!

Azure Security - Service endpoint vs Private endpoint

Endpoints are a critical aspect of securing your resources in the cloud. When using Azure PaaS services, it is important to understand the differences between two types of endpoint available in Azure: service endpoint and private endpoint.

Service endpoint:

A service endpoint is a way of extending your virtual network’s private address space to Azure services over the Azure backbone network. When a service endpoint is enabled, traffic between your virtual network and the Azure service of your choice stays on the Azure backbone network, rather than going over the public internet. This provides better security and performance for your resources.