Video processing is the kind of scenario that fits cloud computing perfectly, right? It usually demands considerable processing power (video is a heavy workload), a fair amount of storage, and high scalability (depending on the volume of videos being processed), and so on.

In education (an industry I have been working closely with for a long time), especially in Higher-Ed, processing video is a pretty common operation, because institutions deliver tons of content in video format through their internal LMS (Learning Management System). So, in today’s post I’m going to guide you through the process of creating an automated video processing pipeline that relies on several Azure services:

  • Azure Media Services (AMS): an extensible cloud-based platform that enables developers to build scalable media management and delivery applications. Media Services is based on REST APIs that enable you to securely upload, store, encode, and package video or audio content for both on-demand and live streaming delivery to various clients (for example, TV, PC, and mobile devices).
  • Video Indexer (VI): an Azure service that consolidates various audio and video artificial intelligence (AI) technologies offered by Microsoft into one integrated service, making development simpler.
  • Durable Functions (ADF): an extension of Azure Functions that lets you write stateful functions in a serverless compute environment. The extension lets you define stateful workflows by writing orchestrator functions and stateful entities by writing entity functions using the Azure Functions programming model.
  • Logic Apps: Azure Logic Apps is a cloud service that helps you schedule, automate, and orchestrate tasks, business processes, and workflows when you need to integrate apps, data, systems, and services across enterprises or organizations.

Architectural view

You know that, in the cloud world, everything starts with designing an architecture for the solution you’re aiming to build. We’re going to do the same here. Figure 1 shows the general view of the architecture we’re proposing.

Figure 1. Solution’s proposed architecture

A brief explanation of the general flow presented above. In summary, this is what happens when a new video arrives in a specific container (here named “incoming-videos”) within a given Azure Storage Account:

  1. A new event is triggered by the storage account and is captured by a Logic App.
  2. The Logic App then gathers the information sent along with the event and calls a function which validates it and then starts a new stateful video processing flow by calling an orchestration function under ADF.
  3. Under the hood, ADF calls activity functions following a pattern known as “function chaining” (take a look at Figure 2 and the sketch right after it), through which the ingestion, encoding, publishing, and insights extraction routines are performed. All these activity functions actually do is call specific routines (actually performed by AMS/VI) and wait for their responses.
Figure 2. The chaining pattern
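For reference, here is a minimal sketch of what function chaining looks like with Durable Functions. Function names and types here are illustrative only; the actual orchestrator for this solution is discussed later on.

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class ChainingSample
{
    [FunctionName("ChainingSample")]
    public static async Task<object> Run(
        [OrchestrationTrigger] DurableOrchestrationContext context)
    {
        // Each activity only starts once the previous one has returned,
        // and its output feeds the next call in the chain.
        var a = await context.CallActivityAsync<object>("F1", context.GetInput<object>());
        var b = await context.CallActivityAsync<object>("F2", a);
        return await context.CallActivityAsync<object>("F3", b);
    }
}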

Receiving the incoming video and starting the encoding process

Throughout this article, I’m going to cover the entire process of handling the video in an automated fashion, so let’s start by understanding it from the beginning.

The flow both starts and finishes with Logic Apps. In the middle, to handle the Azure Media Services side, we are using Azure Durable Functions as the primary automation tool, since Logic Apps doesn’t currently offer connectors and tasks for this purpose. So, the very first thing to do here is to answer the following question: “what happens when a new video lands in Azure Blob Storage?”

The straightforward answer is: the storage account generates a new “Blob added/updated” event and, because I have pre-configured a Logic App instance to “listen” and react (by calling the Starter function) to that kind of event, the processing flow gets started. Figure 3 presents a general view of the flow itself.

Figure 3. Logic App which reacts to a new video file into blob storage

In simple words:

  • The first step indicates that the Logic App will be “listening” for that particular kind of event in a specific container (incoming-videos) within the blob storage. It checks for that event every minute and returns the information about a single blob.
  • Once that event fires, the next step calls an Azure Function, passing along the payload described in its body (a sketch of a possible shape for that payload follows this list). The starter function just called is in charge of beginning the ADF processing flow.
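Just to make that payload tangible, here is a hypothetical sketch of the DTO the starter function could bind the request body to. The property names are assumptions of mine; the actual model lives in the solution’s repository.

// Hypothetical DTO; property names are assumptions, not the actual model from the repository.
public class VideoAMS
{
    public string FileName { get; set; }      // name of the blob dropped into "incoming-videos"
    public string FileUrl { get; set; }       // full URL of the incoming blob
    public string AssetId { get; set; }       // filled in once the AMS Asset is created
    public string StreamingUrl { get; set; }  // filled in once the video is published
}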

A deeper look into the orchestration code

Video encoding is a stateful process by design. This is because you somehow have to hold the results from all of the different steps (encoding, insights, streaming endpoint, and so on) comprising the entire process so that, at the end, you can provide either another system or an end user with a single, consolidated view of everything that happened.

Because ADF brings a built-in way to create stateful (relying on Storage Queues and Tables) and programmatic workflows, we picked it as the primary solution to address our needs. It is important to mention that this is only one way to get it done; you could solve the “stateful” part of the problem in many different ways.

The starter function

Everything starts with a regular function (in my case, called “Starter”). Who calls this function? The Logic App that was “listening” for the blob creation/update event mentioned earlier. The code below shows what it actually does.

Aspects to be considered here:

  • Line 2: We declare a parameter of type “[OrchestrationClient] DurableOrchestrationClient starter“. This object has the ability to initiate a new orchestration flow, which is why it sits here.
  • Lines 9–13: I’m binding the dynamic object to the information that just arrived from the initial request.
  • Line 24: I’m starting the orchestration flow itself. Note that this call references both the orchestration function (in this case "O_Orchestrator") and the initial set of information (here referenced as "videoModel") that the first activity function will rely on to get its work done.
  • Line 27: Checks the current status of the orchestration process.
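The original listing is not reproduced here, but a minimal sketch of a starter function along these lines might look like the following. It follows the Durable Functions 1.x model the post is based on; apart from “Starter” and “O_Orchestrator”, the names are illustrative.

using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

public static class Starter
{
    [FunctionName("Starter")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        [OrchestrationClient] DurableOrchestrationClient starter,
        ILogger log)
    {
        // Bind the incoming payload (sent by the Logic App) to the DTO.
        var videoModel = JsonConvert.DeserializeObject<VideoAMS>(
            await req.Content.ReadAsStringAsync());

        // Start the orchestration, passing the model as its initial input.
        string instanceId = await starter.StartNewAsync("O_Orchestrator", videoModel);
        log.LogInformation($"Started orchestration with ID = '{instanceId}'.");

        // Return the built-in status/management endpoints for this instance.
        return starter.CreateCheckStatusResponse(req, instanceId);
    }
}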

The orchestration function

If you’re not familiar with the concepts behind ADF, I strongly recommend going over the official documentation to get started. In short, everything begins with the orchestration function receiving an initial call and, from that moment on, calling and orchestrating sub-functions (known as “activity functions”) which actually perform the individual tasks within the flow.

Below you can see the orchestration function adopted for this project.

Important aspects:

  • Line 2: The orchestration function declares an [OrchestrationTrigger] DurableOrchestrationContext context, through which information about the flow itself is accessed. If you recall, we’ve injected a DTO object called videoModel into the orchestration context, so from now on we’re able to access that information from within the flow.
  • Line 6: I’m retrieving the context’s information.
  • Line 17: I call the first activity function, which does the initial setup of my Azure environment; when it finishes, the result is stored in the resultInitialSetup object.
  • Lines 23, 29 and 35: I make the same kind of call for each remaining activity function, injecting the proper context information.
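As a hedged sketch, an orchestrator following the description above could look roughly like this. The activity names and result types are my assumptions; the real code lives in the repository.

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class Orchestrator
{
    [FunctionName("O_Orchestrator")]
    public static async Task<object> Run(
        [OrchestrationTrigger] DurableOrchestrationContext context)
    {
        // Retrieve the DTO injected by the starter function.
        var videoModel = context.GetInput<VideoAMS>();

        // Chain the activity functions; each call waits for the previous one to finish.
        var resultInitialSetup = await context.CallActivityAsync<VideoAMS>("A_InitialSetup", videoModel);
        var resultEncoding     = await context.CallActivityAsync<VideoAMS>("A_EncodingJob", resultInitialSetup);
        var resultStreaming    = await context.CallActivityAsync<VideoAMS>("A_PublishStreaming", resultEncoding);
        var resultInsights     = await context.CallActivityAsync<string>("A_ExtractInsights", resultStreaming);

        // Consolidated view returned at the end of the flow.
        return new { resultStreaming, resultInsights };
    }
}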

Every step accomplished within the flow is automatically tracked by the Durable Functions engine. If, for some reason, something fails within the flow, the failure itself is logged internally. You can easily track everything happening inside the flow by calling the endpoints below (examples) with the respective flow’s ID.

{
    "id": "abc123",
    "purgeHistoryDeleteUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/abc123?code=XXX",
    "sendEventPostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/abc123/raiseEvent/{eventName}?code=XXX",
    "statusQueryGetUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/abc123?code=XXX",
    "terminatePostUri": "http://localhost:7071/runtime/webhooks/durabletask/instances/abc123/terminate?reason={text}&code=XXX"
}

For more information on how to track the internal flow’s progress, please follow this link.
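As a quick illustration, checking the flow’s progress is just an HTTP GET against the statusQueryGetUri returned by the starter function (the URI shown above is only the local example):

using System.Net.Http;
using System.Threading.Tasks;

public static class StatusCheck
{
    private static readonly HttpClient _client = new HttpClient();

    public static async Task<string> GetFlowStatusAsync(string statusQueryGetUri)
    {
        // A GET against statusQueryGetUri returns the orchestration's runtime status,
        // input, output and timestamps as JSON.
        var response = await _client.GetAsync(statusQueryGetUri);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}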

Activity Function 1: Initial Set up

When it comes to the individual operations within the video processing flow, everything starts with setting up the environment for this chain of activities. That’s what the first activity function takes care of, as you can see below.

A quick description of what’s happening:

  • Lines 4–8: Reads the environment variables related to the pre-existing Azure services: Azure AD, the Azure Media Services API, the Storage Account, and so on.
  • Line 11: [ActivityTrigger] VideoAMS videoDto declares the object which receives the context information from the orchestrator’s call.
  • Line 14: Creates a new AMS Asset object.
  • Line 15: Creates a new AMS Locator object to be tied to the Asset later on.
  • Line 19: Accesses the AMS instance.
  • Line 22: Requests an access token from Azure AD for the service principal that was set up when AMS was created and configured.
  • Line 34: Creates the required policy ID for the Asset within the AMS instance.
  • Line 35: Effectively creates the new Asset within the referenced AMS instance.
  • Line 39: Creates the Locator within AMS and ties it to the referenced Asset so the videos can be accessed externally.
  • Lines 54–57: Moves the video that originally arrived in regular blob storage into the Asset’s blob container.
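To give you an idea of the shape of this activity, here is a hedged sketch. The AMS-specific work is delegated to hypothetical helper methods (AmsHelper.*) standing in for the support methods discussed later in this post, and the environment variable names are also assumptions.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class InitialSetup
{
    [FunctionName("A_InitialSetup")]
    public static async Task<VideoAMS> Run(
        [ActivityTrigger] VideoAMS videoDto,
        ILogger log)
    {
        // Environment variables describing the pre-existing services (names are assumptions).
        var tenant      = Environment.GetEnvironmentVariable("AadTenantDomain");
        var amsEndpoint = Environment.GetEnvironmentVariable("AmsRestApiEndpoint");
        var storageConn = Environment.GetEnvironmentVariable("StorageConnectionString");

        // Authenticate with the service principal configured for AMS and create the
        // Asset plus a SAS Locator (all hypothetical helpers, not real SDK calls).
        var amsContext = await AmsHelper.GetContextAsync(tenant, amsEndpoint);
        videoDto.AssetId = await AmsHelper.CreateAssetAsync(amsContext, videoDto.FileName);
        var sasLocatorUri = await AmsHelper.CreateSasLocatorAsync(amsContext, videoDto.AssetId);

        // Move the incoming video from the regular blob container into the Asset's container.
        await AmsHelper.CopyBlobToAssetAsync(storageConn, videoDto.FileUrl, sasLocatorUri);

        log.LogInformation($"Initial setup completed for {videoDto.FileName}.");
        return videoDto;
    }
}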

Activity Function 2: Encoding’s Job

Once the initial setup completes successfully, we can start the video’s encoding job within AMS. That’s exactly what the orchestration function calls next. The code below shows how to queue up a new encoding job in AMS.

Again, a quick explanation of what’s happening:

  • Lines 9–11: Asks Azure AD for a new token to be used by the service principal to manipulate resources (in this case, create a new processing job) within AMS.
  • Line 14: Creates a new queue in Azure Queue Storage to receive notifications from the encoding process.
  • Line 17: Creates a notification endpoint (in Azure Tables) and maps it to the just-created queue.
  • Lines 23–24: Creates the encoding job for the given video within the Asset.
  • Line 28: Because the encoding process needs to finish before we move to the next stage, we synchronously wait for its completion here.
  • Lines 38–39: Clean up the temporary resources (the queue and the notification endpoint, respectively).
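Again as a hedged sketch, the overall shape of this activity could be something like the following. The AMS- and queue-related calls are hypothetical helpers standing in for the support methods in the repository.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class EncodingJob
{
    [FunctionName("A_EncodingJob")]
    public static async Task<VideoAMS> Run(
        [ActivityTrigger] VideoAMS videoDto,
        ILogger log)
    {
        var storageConn = Environment.GetEnvironmentVariable("StorageConnectionString");

        // Temporary queue + notification endpoint used by AMS to report job progress (hypothetical helpers).
        var queueName = $"encoding-{Guid.NewGuid():N}";
        await AmsHelper.CreateNotificationQueueAsync(storageConn, queueName);

        // Submit the encoding job for the Asset created in the previous activity.
        var jobId = await AmsHelper.SubmitEncodingJobAsync(videoDto.AssetId, queueName);

        // The next stage depends on the encoded output, so wait for the job to finish.
        await AmsHelper.WaitForJobCompletionAsync(jobId, queueName);

        // Clean up the temporary resources.
        await AmsHelper.DeleteNotificationQueueAsync(storageConn, queueName);

        log.LogInformation($"Encoding job {jobId} completed for asset {videoDto.AssetId}.");
        return videoDto;
    }
}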

Activity Function 3: Streaming out the video

OK, now that the video is finally encoded, we are ready to stream it out. The activity function described here shows how I got there.

Quick description:

  • Line 14: After getting the token issued by Azure AD for my service principal, I call a method which publishes (streams) the video out over Azure CDN.
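The corresponding activity is short; as a sketch (PublishStreamingUrlAsync is a hypothetical helper standing in for the support method that creates the streaming locator and returns the CDN-backed URL):

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class PublishStreaming
{
    [FunctionName("A_PublishStreaming")]
    public static async Task<VideoAMS> Run(
        [ActivityTrigger] VideoAMS videoDto,
        ILogger log)
    {
        // Hypothetical helper: publishes the encoded Asset and returns its streaming URL.
        videoDto.StreamingUrl = await AmsHelper.PublishStreamingUrlAsync(videoDto.AssetId);

        log.LogInformation($"Streaming URL: {videoDto.StreamingUrl}");
        return videoDto;
    }
}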

Activity Function 4: Extracting insights

The insights extraction in this solution is outsourced to a built-in Video Indexer connector within the Logic Apps service. We could call VI’s APIs directly; however, to maximize productivity, we’re relying on a serverless mechanism to get the job done. Because of this, the activity function presented here only makes a simple HTTP call to the Logic App endpoint that holds the connectors to the VI service.

As you can see from the call var response = client.PostAsync(_logicappuri, content);, here I’m just pointing the call to the Logic App endpoint and, at the same time, sending the context object along so the useful information generated internally by the ADF flow can be reused on the other side.
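Putting that call in context, the whole activity is little more than the following sketch (the "LogicAppUri" setting name is an assumption; in the real solution the endpoint would come from configuration):

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Newtonsoft.Json;

public static class ExtractInsights
{
    private static readonly HttpClient client = new HttpClient();

    [FunctionName("A_ExtractInsights")]
    public static async Task<string> Run([ActivityTrigger] VideoAMS videoDto)
    {
        // Endpoint of the Logic App that holds the Video Indexer connectors.
        var _logicappuri = Environment.GetEnvironmentVariable("LogicAppUri");

        // Forward the context object so the Logic App can use it in its own flow.
        var content = new StringContent(
            JsonConvert.SerializeObject(videoDto), Encoding.UTF8, "application/json");

        var response = await client.PostAsync(_logicappuri, content);
        response.EnsureSuccessStatusCode();

        return await response.Content.ReadAsStringAsync();
    }
}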

Done. From that moment on, the flow is outsourced to a Logic App instance, and we’re going to discuss what happens down there in detail.

Support methods

You may have noticed that most of the work related to the encoding process across the four activity functions I have described relies on methods living in a different part of the application, right?

If you want to go even deeper than we have so far and take a look at the operations being executed under the hood, please be my guest and follow this link. It will direct you to the file within the solution’s GitHub repository where each of these methods lives.

Managing the extraction of insights

As stated earlier, the process of extracting insights from the video being processed here is outsourced to a new Logic App instance. I have chosen this approach for one simple reason: it saves us a considerable amount of time without compromising either performance or cost. So, why not? Figure 4 presents the general view of this last mile.

Figure 4. Video’s insights extraction with Logic App

As you can see, our video insights extraction comprises seven different steps, which I’m going to guide you through from now on.

Step 1: When a HTTP request is received

I’m pretty sure you recall when I mentioned that the activity function dedicated to extracting insights just calls the Logic App endpoint and sends the context information to it, don’t you? So, that call arrives here and, once it does, this entire flow gets started and the first step passes the received context information (respecting the pre-determined schema) on to the upcoming steps. Figure 5 zooms in on step 1.

Figure 5. Zooming in on the first step

Step 2: Get Access Token

To perform any operation within the Video Indexer service, the Logic App needs to rely on a service principal. This step requests an access token from Azure AD for that service principal and then attaches it to the subsequent calls that perform the insights extraction. Figure 6 zooms in on this step.

Figure 6. Set up for acquiring a new access token

Step 3: Upload video and index

Armed with the access token (note in Figure 7 that I’m attaching it to the task), we can finally start the process of extracting the insights. There are two different ways to tell the task which video to process: 1) you can directly provide a hot URL where the video is sitting; or 2) you can pass an asset ID, assuming you outsourced the encoding process to Azure Media Services. I’ll pick the second option, as I handled the encoding process with AMS’ tools.

Also, as you can see in Figure 7, I have selected the option “NoStream” for the “Streaming Preset” parameter, indicating that the video won’t be streamed by the Video Indexer service. It will only extract the insights, and that’s it.

I have also picked “Portuguese” as the language for the extraction, as the videos I’m delivering through this solution are all pt-BR.

Figure 7. Set up for the extraction process

Step 4: Until

Until is a special kind of task delivered by the Logic Apps service which lets you repeat a certain operation (or group of operations) inside it until some criterion is satisfied. In our case, because I depend on the result of the extraction to move forward, I have to monitor the progress of the work and repeat the monitoring operation, followed by a delay (in my case, 3 minutes), until it finishes. When the job gets done (that is, the processing state turns to “Processed”), we can move on to the next step.

Figure 8 shows the configuration in place to make this happen.

Figure 8. Monitoring the status of the processing

Step 5: Get Access Token

Because the access token has a set expiration period, and considering that the extraction process can take longer than that (depending on the size of the video), we ask Azure AD for another access token before proceeding to the next step.

Step 6: Get Video Caption

Now that the processing is complete and the insights are extracted, and because one requirement for this solution is to return the automatically generated subtitle, I have a step set up to retrieve it from the asset just generated. The task returns the subtitle in VTT format, and the customer can take advantage of it by attaching it to the player that will eventually play the video. Figure 9 presents the configuration for this task.

Figure 9. Retrieving the caption for the video just processed
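For completeness, if you ever wanted to skip the connector and call the Video Indexer API yourself, retrieving the captions is a single GET against the Get Video Captions operation. The sketch below only approximates that call; the exact route and parameters should be double-checked against the Video Indexer API reference.

using System.Net.Http;
using System.Threading.Tasks;

public static class VideoIndexerCaptions
{
    private static readonly HttpClient _client = new HttpClient();

    // Approximate shape of the Get Video Captions call; verify the route against
    // the official Video Indexer API reference before relying on it.
    public static async Task<string> GetVttCaptionsAsync(
        string location, string accountId, string videoId, string accessToken)
    {
        var uri = $"https://api.videoindexer.ai/{location}/Accounts/{accountId}" +
                  $"/Videos/{videoId}/Captions?format=Vtt&accessToken={accessToken}";

        var response = await _client.GetAsync(uri);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync(); // the .vtt content as plain text
    }
}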

Step 7: Send email

Finally, to notify the end user that the encoding, streaming and insights extraction succeeded, I’m sending them a message which contains both the streaming URL and the caption generated by the previous step. I’m using the built-in SendGrid task in Logic Apps to get it done. Please see Figure 10.

Figure 10. Sending an email with the video information attached

Additional considerations

As you might have noticed, this post takes advantage of a bunch of pre-existing Azure services. If you would like to know how to get those services up and running and how to build this solution from the ground up, you can refer to the documentation available on GitHub. The repository is available here.

In the same repository, you will find the complete code for the solution. Feel free to use it in your own products.

If you find an issue or an opportunity to contribute something, I would be happy to receive, review and approve your pull request, so feel free to open one.

Also, for the solution proposed here to work properly, you will need to manually create a link between your Azure Media Services instance and the Video Indexer instance. You can do that by creating a new account within the Video Indexer web portal and then, from there, providing both your Azure AD tenant’s and AMS’ information to properly connect the two. Please refer to this link to guide you through the process.

Another important point: to configure certain Video Indexer tasks within the Logic App insights extraction flow, you will have to provide an “Account ID”. You can grab that information from the Video Indexer API Portal by logging in, navigating to “Products”, then “Authorization”, adding a new account and, finally, generating a new key for the account.

That’s it. Hopefully this is going to help you out with this process.


2 Comments

Martin Kearn · September 20, 2020 at 12:37 pm

Hi, thanks for this great insight. I’m just wondering why there is a logic app in the picture at all? The Durable Function could be triggered by the arrival of the video in storage, thus removing the need for the logic app and simplifying the architecture and deployment process. This would be through a blob storage trigger binding. You can see the supported Durable Function bindings here: https://docs.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings#supported-bindings

    Fabricio Sanchez · October 2, 2020 at 8:29 am

    Thanks for the comments, Martin.
    You’re right. We brought Logic Apps into the scene for demoing how different pieces in Azure could come together.
    But, architecture-wise, in a production scenario we wouldn’t need Logic Apps to get this work done.
    Thanks for the call out.
