How to Connect an Azure Synapse Analytics Workspace / Azure Data Factory to an Azure Data Lake Storage Gen 2 (Cross-Tenant and with Activated Firewall)

Nolock
10 min read · Aug 16, 2021


Lucky you if you work for a company that has just one tenant. If you have a storage account with an activated firewall, you can then connect to it with ease from any Azure Data Factory (ADF), Synapse Analytics Workspace, and so on, even across subscriptions. Everything works together like a charm. It becomes a challenge once you have more than one tenant. Below, I describe how I connected an Azure Synapse Analytics Workspace in one tenant to a storage account in another tenant. The same approach also works with Azure Data Factory and a storage account.

Let’s visualize what we want to configure.

We have two tenants: black and blue. There are also two resource groups, rg-tenant-black and rg-tenant-blue. The goal is to connect an Azure Synapse Analytics Workspace in the blue tenant to a container in a storage account in the black tenant. We will start with a shared key for authentication and authorization, then replace the shared key with a service principal. At the end of the article, we will activate the firewall on the storage account and configure a managed private endpoint to the storage account.

What we have

Storage account

I have provisioned a storage account with a hierarchical namespace, the so-called Azure Data Lake Storage Gen 2 (ADLS Gen 2), in the black tenant. (All screenshots have either a black or a blue header to show which tenant we are in.)

Azure Synapse Analytics

In the blue tenant, there is a Synapse Analytics Workspace called syn-tenant-blue. The blue header in the screenshot shows that we are in the blue tenant.

Shared key, no firewall

Let’s start with the simplest case. We want to provide access to a storage account from an Azure Synapse Analytics Workspace.

We click on the Data button, then on the Linked tab, and finally on the plus sign. The following menu opens.

We select Connect to external data and choose Azure Data Lake Storage Gen 2.

A new form opens on the right-hand side. Here we can configure a new linked service.

Let’s start with filling out the form. We need a name for the new linked service. If you like, you can also add a description.

Next, we initially use the authentication method Account key. This is the access key you know from the storage account settings.

Furthermore, we have to choose our storage account. The form only allows selecting a storage account from the same tenant. But we are now in the blue tenant, and our storage account was provisioned in the black one. We need to find another way. In the Account selection method menu, we click on Enter manually, which lets us insert the URL of our ADLS Gen 2.

The URL can be found in the left pane in the storage account under Endpoints.
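
As a small illustration of what that URL looks like, here is a minimal sketch that builds and checks it in Python. It assumes the public Azure cloud, where the Data Lake Storage endpoint has the form https://<account>.dfs.core.windows.net; the helper names are made up for this example, and the account name sttenantblack is the one used in this article.

```python
from urllib.parse import urlparse

# Assumption: public Azure cloud. Sovereign clouds use a different DNS suffix.
DFS_SUFFIX = ".dfs.core.windows.net"


def dfs_endpoint(account_name: str) -> str:
    """Build the Data Lake Storage (dfs) endpoint URL for a storage account."""
    return f"https://{account_name}{DFS_SUFFIX}"


def account_from_endpoint(url: str) -> str:
    """Extract the storage account name from a dfs endpoint URL."""
    host = urlparse(url).hostname or ""
    if not host.endswith(DFS_SUFFIX):
        raise ValueError(f"not a dfs endpoint: {url!r}")
    return host[: -len(DFS_SUFFIX)]


if __name__ == "__main__":
    print(dfs_endpoint("sttenantblack"))
```

A check like account_from_endpoint can catch the common mistake of pasting the blob endpoint (blob.core.windows.net) instead of the Data Lake Storage one.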

We will also need an access key to authenticate on our storage account. These keys can be found in the left pane under Access keys.

Let’s click on Show keys to unhide them.

Now we can copy the key to the clipboard and use it in the form in the blue tenant.

Then we click on Test connection in the lower right corner to test the connection to ADLS Gen 2. It was successful.
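
For readers who prefer to see the form as a definition, the following sketch builds roughly what such a linked service looks like as JSON. It is not exported from the workspace in this article: the property names (type AzureBlobFS, url, accountKey as a SecureString) follow my understanding of the ADLS Gen 2 connector and should be verified against the current documentation, and the linked service name and key value are placeholders.

```python
import json


def account_key_linked_service(name: str, url: str, account_key: str) -> dict:
    """Sketch of an ADLS Gen 2 linked service that authenticates with an account key.

    Assumption: the property names mirror the AzureBlobFS connector schema.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": url,
                # Never commit a real key; in practice reference a Key Vault secret.
                "accountKey": {"type": "SecureString", "value": account_key},
            },
        },
    }


if __name__ == "__main__":
    definition = account_key_linked_service(
        "ls_adls_black_key",  # hypothetical name
        "https://sttenantblack.dfs.core.windows.net",
        "<access-key>",  # placeholder
    )
    print(json.dumps(definition, indent=2))
```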

The big disadvantage of shared access keys is that they bypass RBAC and ACLs. Whoever holds the key gets full access to all containers and their data, which is something you do not want in most scenarios. Even Microsoft itself does not recommend using access keys (see "Manage account access keys" in the Azure Storage documentation on Microsoft Docs). Let’s find another, better way.

Service principal, no firewall

Let’s switch to the Service Principal authentication method. Thanks to RBAC and ACLs, we will be able to limit access at any level, and there is no storage account key to share, maintain, or store.

Earlier in this article, when we used the Account key authentication method, all we needed was an access key. Authenticating via a service principal requires much more. First, we must create an app registration in our black Azure Active Directory (AAD), that is, in the tenant with the storage account. Let’s go to the black tenant’s AAD and navigate to App registrations in the left pane. Next, we click on the New registration button.

The following form opens. We choose the second option for supported account types (accounts in any organizational directory, that is, multitenant) so that the app can be accessed from any AAD.

After some more clicks, which are simple and not important for this scenario, a new app registration is created.

We copy the Application (client) ID and the Directory (tenant) ID and paste them into the form of the new linked service. (We copy them from the black tenant and paste them into the blue tenant.)

Next, you must decide whether to use a client secret or a certificate as the service principal credential type. For simplicity, this demonstration uses a client secret. It can be generated in the Certificates & secrets menu on the left-hand side.

After creating a New client secret, we copy the value.

… and paste it into the form in the blue tenant.
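
Expressed as a definition, the service principal variant of the linked service might look like the sketch below. Again, this is an approximation rather than an export from the workspace: the property names (tenant, servicePrincipalId, servicePrincipalCredentialType, servicePrincipalCredential) reflect my understanding of the ADLS Gen 2 connector, and all IDs and names in the usage are placeholders.

```python
import json


def service_principal_linked_service(
    name: str, url: str, tenant_id: str, client_id: str, client_secret: str
) -> dict:
    """Sketch of an ADLS Gen 2 linked service that authenticates with a service principal.

    tenant_id is the Directory (tenant) ID of the black tenant,
    client_id is the Application (client) ID of the app registration.
    Assumption: the property names mirror the AzureBlobFS connector schema.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": url,
                "tenant": tenant_id,
                "servicePrincipalId": client_id,
                "servicePrincipalCredentialType": "ServicePrincipalKey",
                "servicePrincipalCredential": {
                    "type": "SecureString",
                    "value": client_secret,
                },
            },
        },
    }


if __name__ == "__main__":
    definition = service_principal_linked_service(
        "ls_adls_black_sp",  # hypothetical name
        "https://sttenantblack.dfs.core.windows.net",
        "<directory-tenant-id>",  # placeholder
        "<application-client-id>",  # placeholder
        "<client-secret>",  # placeholder
    )
    print(json.dumps(definition, indent=2))
```

The important cross-tenant detail is that the tenant value is the ID of the tenant that owns the app registration (black), not the tenant in which the workspace lives (blue).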

There is still one thing we must configure in the app registration. The app must have permission to communicate with the data lake service. Let’s go to the API permissions menu and click on Add a permission. We scroll down and select Azure Data Lake. Then we choose Delegated permissions and select user_impersonation from the permissions in the middle of the window.

Our app can now communicate with the Azure Data Lake service. This does not mean that the app can access any data in a storage account, only that it is allowed to talk to the service.

To better demonstrate the capabilities of the service principal approach, we create two containers in our storage account sttenantblack. They are called company-internal and nolock-private.

Our Azure Synapse Analytics Workspace instance will have access only to the company-internal container. Let’s navigate to the container company-internal and assign the role Storage Blob Data Contributor to the app registration appreg-syn-tenant-blue.
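
When a role is assigned on a container rather than on the whole account, the scope of the assignment is the resource ID of that container. The sketch below builds such a scope string; the helper name and the all-zero subscription ID are placeholders, while the resource group, account, and container names come from this article.

```python
def container_scope(
    subscription_id: str, resource_group: str, account: str, container: str
) -> str:
    """Build the resource ID of a blob container, usable as an RBAC assignment scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
        f"/blobServices/default/containers/{container}"
    )


if __name__ == "__main__":
    scope = container_scope(
        "00000000-0000-0000-0000-000000000000",  # placeholder subscription ID
        "rg-tenant-black",
        "sttenantblack",
        "company-internal",
    )
    print(scope)
```

Because the scope ends at company-internal, the role Storage Blob Data Contributor grants nothing on nolock-private.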

The result looks like this:

Now we can start with testing. First, we try to connect to the whole storage account.

It does not work. That is expected, because our service principal was not granted access to the whole storage account but only to one container. Therefore, we click on Test connection / To file path and enter the name of our shared container, company-internal. Voilà, it works.
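
To make it more tangible what such a test does behind the scenes, here is a rough sketch of the two HTTP requests involved: an OAuth 2.0 client credentials token request against the black tenant, and a list call on the container through the Data Lake Storage REST API. This is my reconstruction, not something taken from the Synapse UI; the code only builds the requests and does not send them, and the function names are made up for this example.

```python
from urllib.parse import quote, urlencode

# Scope for Azure Storage when using the client credentials flow.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple:
    """Return (url, form_body) for a client credentials token request.

    tenant_id must be the tenant that owns the app registration (black).
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": STORAGE_SCOPE,
        }
    )
    return url, body


def list_paths_url(account: str, container: str) -> str:
    """Return the URL that lists the top-level paths of a container (file system)."""
    query = urlencode({"resource": "filesystem", "recursive": "false"})
    return f"https://{account}.dfs.core.windows.net/{quote(container)}?{query}"


if __name__ == "__main__":
    print(token_request("<tenant-id>", "<client-id>", "<secret>")[0])
    print(list_paths_url("sttenantblack", "company-internal"))
```

The list request would be sent as a GET with the header Authorization: Bearer <token>. With the role assignment from above, it should succeed for company-internal and be rejected for nolock-private.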

A short intermezzo

At this point, if you want, you can also disable access to the storage account via access keys. Once it is disabled, you can use neither access keys nor shared access signatures (SAS) anymore. You can enable it again at any time if needed.

Service principal, activated firewall

The last part is the most challenging. We want to activate the firewall of the storage account and still be able to communicate with it from the Azure Synapse Analytics Workspace. How do we do that? If both resources, the storage account and the Azure Synapse Analytics Workspace, were in the same tenant, there would be no challenge. We would simply use a system-assigned managed identity for authentication, and it would work automatically.

However, as far as I know, we cannot use the system-assigned managed identity if the communication is cross-tenant. We must authenticate with a service principal and find a way to create an exception in the firewall rules. The problem is that we never know from which IP ranges a request will come, because the list of Azure Integration Runtime IP addresses changes every week (for more information, read here). The list also contains IPv6 addresses, which cannot be used as exceptions in the storage account firewall. Moreover, the request can be an internal one that does not have a public IP address at all. Therefore, we need another solution.
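
To illustrate the IPv6 point, here is a small sketch that splits a list of address prefixes by IP version using Python's standard ipaddress module. The prefixes in the usage are documentation-only example ranges, not real Azure addresses, and the function name is made up for this example.

```python
import ipaddress


def split_by_ip_version(prefixes):
    """Split address prefixes into (ipv4, ipv6) lists of normalized CIDR strings."""
    ipv4, ipv6 = [], []
    for prefix in prefixes:
        # strict=False tolerates host bits being set, e.g. "192.0.2.5/24".
        network = ipaddress.ip_network(prefix, strict=False)
        (ipv4 if network.version == 4 else ipv6).append(str(network))
    return ipv4, ipv6


if __name__ == "__main__":
    # Documentation-only example ranges (RFC 5737 and RFC 3849).
    usable, unusable = split_by_ip_version(
        ["192.0.2.0/24", "2001:db8::/32", "198.51.100.7"]
    )
    print("IPv4 candidates for firewall rules:", usable)
    print("IPv6 prefixes that cannot be added:", unusable)
```

Even with such a filter, the IPv4 part would have to be refreshed every time the published list changes, which is exactly why an IP allow list is not a workable approach here.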

Let’s activate the firewall on the storage account.
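
In terms of the storage account's resource properties, activating the firewall essentially means setting the default network action to Deny. The sketch below shows that fragment as a Python dictionary. It assumes the networkAcls shape of the Microsoft.Storage resource provider as I know it; the helper names are made up for this example.

```python
def firewall_properties(allowed_ipv4=()):
    """Sketch of the networkAcls fragment of a storage account with the firewall on.

    Assumption: property names follow the Microsoft.Storage ARM schema.
    """
    return {
        "networkAcls": {
            "defaultAction": "Deny",  # block everything not explicitly allowed
            "bypass": "None",
            "ipRules": [{"value": ip, "action": "Allow"} for ip in allowed_ipv4],
            "virtualNetworkRules": [],
        }
    }


def firewall_is_active(properties: dict) -> bool:
    """Return True if the default action of the network rules is Deny."""
    return properties.get("networkAcls", {}).get("defaultAction") == "Deny"


if __name__ == "__main__":
    props = firewall_properties()
    print(firewall_is_active(props))
```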

If we now try to connect to the storage account again, the read operation fails. We get the error AuthorizationFailure.

The solution is a so-called managed private endpoint. We can create it directly in the form of a new linked service with just a few clicks. We need a name for the managed private endpoint and the Resource ID of the storage account in the black tenant.
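
The Resource ID has a fixed structure, so it can be assembled from the subscription ID, the resource group, and the account name. The sketch below does that and wraps it into what I understand a managed private endpoint definition to look like (privateLinkResourceId plus a groupId of dfs for Data Lake Storage). The endpoint name and the all-zero subscription ID are placeholders, and the property names are an assumption to verify against the documentation.

```python
def storage_account_resource_id(
    subscription_id: str, resource_group: str, account: str
) -> str:
    """Build the Azure resource ID of a storage account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def managed_private_endpoint(name: str, resource_id: str, group_id: str = "dfs") -> dict:
    """Sketch of a managed private endpoint definition targeting a storage account.

    Assumption: "dfs" is the sub-resource for the Data Lake Storage endpoint.
    """
    return {
        "name": name,
        "properties": {
            "privateLinkResourceId": resource_id,
            "groupId": group_id,
        },
    }


if __name__ == "__main__":
    resource_id = storage_account_resource_id(
        "00000000-0000-0000-0000-000000000000",  # placeholder subscription ID
        "rg-tenant-black",
        "sttenantblack",
    )
    print(managed_private_endpoint("mpe-sttenantblack", resource_id))  # hypothetical name
```

Because the storage account lives in another tenant, it cannot be picked from a list, which is why the Resource ID has to be entered by hand.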

After applying we see that the new resource is provisioning.

After a moment the status changes to Pending.

What does it mean? The other side, the storage account in the black tenant, must approve that we can access the storage account this way.

Let’s go back to the black tenant and navigate to Networking. In the Private endpoint connections tab, there is a pending request.

We select the pending request and click on the Approve button.

The status changes to Approved.

From now on, the new linked service can connect to the storage account again. (Do not ask me why, but the status in the new linked service form stays Pending even though the request has already been approved. Refreshing does not help.)

Azure Data Factory

If we want to configure a new linked service in Azure Data Factory, the option to use a managed private endpoint is not available by default. To get it, we must provision a new Azure Integration Runtime with virtual network support. If we want to test the connection, we must also enable the interactive authoring capability, as we see in the following screenshot. When we switch to the new integration runtime in the New linked service form, the option to use a managed private endpoint shows up.
That is the only difference from the setup used in the Azure Synapse Analytics Workspace.
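
For orientation, an Azure Integration Runtime with virtual network support might be defined roughly as in the sketch below. I am reconstructing the shape from memory (a Managed runtime that references the factory's managed virtual network named default), so treat every property name as an assumption. The runtime name is hypothetical, and the interactive authoring switch is not part of this sketch.

```python
import json


def managed_vnet_integration_runtime(name: str) -> dict:
    """Sketch of an Azure Integration Runtime placed in a managed virtual network.

    Assumption: property names follow the Data Factory JSON schema as I recall it.
    """
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {"location": "AutoResolve"},
            },
            "managedVirtualNetwork": {
                "type": "ManagedVirtualNetworkReference",
                "referenceName": "default",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(managed_vnet_integration_runtime("ir-managed-vnet"), indent=2))
```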

Final notes

We have done a lot of work today. We have a storage account with an activated firewall in one tenant and an Azure Synapse Analytics Workspace in another. They communicate with each other thanks to a private endpoint connection. The requests are authenticated via a service principal, which has been granted permissions on only one container in the storage account.
