This is the first article in a series in which I intend to present and discuss, at a high level, how educational institutions have been implementing projects around Big Computing, AI, and Data, among many other areas, on top of Azure services. It should be great, and I'm excited to use this channel to share it with you.

Since I started my professional career back in early 2000, I have been passionate about two tremendous and transformational areas: technology and education. By the time I earned my Computer Science degree, I had only one certainty: somehow I wanted to work with technology, applying it to education. Why? Because I firmly believed (and still do) that these two things combined can generate powerful tools capable of changing people's lives.

Ever since, I have been trying to accomplish this. I earned a master's degree after two years of research in Bioengineering, and I have served as a tenured professor in IT undergraduate courses. When I had the opportunity to create my own company, I didn't miss the chance: I developed a solution from scratch that automatically identified students' learning gaps and addressed them through personalized recovery activities. Once at Microsoft, I started as a Technical Evangelist, a role which, in a certain way, is all about education (of the market) as well.

It turns out that a few months ago I had the opportunity to become a Microsoft Cloud Solution Architect (CSA) for the education industry, supporting the United States, Canada, Brazil, and some English-speaking countries in Latin America. Once again, I get to have fun in my day-to-day work with education and technology at the same time. And you know what? What a fantastic experience it has been so far! There are so many disruptive and transformational projects going on in educational institutions around the Americas (some of which I have the pleasure of being directly involved with) that I decided to compile the main aspects of them in this post. The cool thing about it? Most of those projects are currently using Azure services as a transformation agent behind the scenes.

Because the projects and new possibilities I've been seeing and working with are so exciting, I decided to use this post to invite you to come along and see how Microsoft's cloud and its related services are genuinely transforming K-12, Higher Ed, libraries, and museums.

Research

As mentioned earlier, back in early 2006 I had the chance to earn my master's degree in Bioengineering at the University of Sao Paulo, Brazil. Long story short, I was part of a team whose primary duty was developing a new computational approach to automatically identify speech-related (and possibly throat-related) illnesses. I was the "code" guy; the team included bioengineers (creating scenarios to be analyzed), mathematicians (building mathematical models based on those scenarios), and speech therapists (in charge of verifying the accuracy of the generated results). I would take the newly created numerical models, turn them into C++ code, and then apply transformational algorithms like the Fast Fourier Transform (FFT), Wavelets, and so on to the data.
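Just to give a flavor of that kind of transformation: the original work was in C++, but here is a minimal Python sketch with NumPy/SciPy of the core idea, moving an audio signal into the frequency domain. The file name is made up, and a mono recording is assumed.

```python
import numpy as np
from scipy.io import wavfile

# Load a (hypothetical) speech recording; rate is the sampling
# frequency in Hz, signal holds the raw amplitude samples (mono assumed).
rate, signal = wavfile.read("patient_001.wav")

# Apply the FFT to move from the time domain to the frequency domain.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)

# The magnitude spectrum is what downstream models would analyze,
# e.g., looking for frequency patterns associated with voice disorders.
magnitude = np.abs(spectrum)
print(freqs[np.argmax(magnitude)], "Hz is the dominant frequency")
```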

I still remember: depending on the amount of data (audio files) being processed, we could spend up to two weeks in a row running parallel algorithms over those files on conventional servers to extract something meaningful from them. Furthermore, we needed to request access to the university's data center (and naturally wait for the next available slot) to use the machines with GPU units. It was tough!

Thanks to public clouds, this is no longer (or at least shouldn't be) the reality. Modern cloud computing platforms like Microsoft Azure offer lots of different "Big Computing" or High-Performance Computing (HPC) options for researchers to accomplish the laborious task of running complex, compute-intensive algorithms.

Big Computing

One of the options currently available on Azure (I would have loved to have it in 2006) is a service called "Batch," or "Azure Batch." This service allows us to quickly create and configure a large and complex Linux- or Windows-based environment capable of hosting one or more machine pools, where each pool can have dozens or hundreds of nodes, and each node can combine regular processing cores and GPU units, as you can see in the image below.

Azure Batch is ideal for "embarrassingly parallel" computing scenarios, where the processes are automatically distributed over the nodes' cores and there is no communication or dependency between them. This model is also known as "intrinsically parallel" computing. Basically, upon receiving a given "job" from the client, the Batch service breaks that job up into several tasks and then assigns nodes in the indicated pool to execute them asynchronously. The application, input files, and output files must all sit in the storage account configured when the Batch service was created and set up. If the algorithm uses the MPI framework to distribute the processing across the cluster, Azure Batch is a nice option too, since the service natively supports that.
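To make the pool/job/task model more concrete, here is a minimal sketch using a recent version of the azure-batch package for Python. The account name, key, URL, VM size, pool size, and command lines are all placeholders for illustration, not a production setup:

```python
import azure.batch.models as batchmodels
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials

# Placeholder credentials for the Batch account.
credentials = SharedKeyCredentials("mybatchaccount", "<account-key>")
client = BatchServiceClient(
    credentials, batch_url="https://mybatchaccount.eastus.batch.azure.com")

# 1. Create a pool of Linux (CentOS) compute nodes.
client.pool.add(batchmodels.PoolAddParameter(
    id="research-pool",
    vm_size="STANDARD_D2_V3",  # pick a GPU SKU (e.g., NC-series) for GPU workloads
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="OpenLogic", offer="CentOS", sku="7.5", version="latest"),
        node_agent_sku_id="batch.node.centos 7"),
    target_dedicated_nodes=10))

# 2. Create a job bound to that pool.
client.job.add(batchmodels.JobAddParameter(
    id="audio-processing",
    pool_info=batchmodels.PoolInformation(pool_id="research-pool")))

# 3. Break the work into independent tasks; Batch schedules them
#    across the nodes with no communication between them.
tasks = [batchmodels.TaskAddParameter(
             id=f"task-{i:03d}",
             command_line=f"/bin/bash -c 'python3 process.py input-{i:03d}.wav'")
         for i in range(100)]
client.task.add_collection("audio-processing", tasks)
```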

There are several cases where Azure Batch has been the natural choice of research departments inside universities. Some of the projects I'm involved with nowadays are using Azure Batch for the following:

  1. Data-intensive applications: The customer needs to process a significant amount of data (nearly 40 TB) on a weekly basis, and they would like to decrease the response time of this process while reducing investment in physical devices. To accomplish that, they're starting to use Azure Batch with a few dozen Linux (CentOS), GPU-based VMs to maximize processing capacity. This way, they should be able to get answers faster and, by creating and deleting clusters as needed, reduce costs. This is all about astronomical and geospatial data processing.
  2. Genomics: Azure does have a specialized PaaS service for genomics processing (you can see more information about it here). However, the way this customer does that processing requires some additional configuration on the pool's nodes, which is why we opted for the Batch service. By using start tasks, we can configure aspects like the Python version and so on (see the sketch after this list).
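As an illustration of that last point, a pool can be given a start task that runs on every node as it joins, before any workload task is scheduled there. A minimal sketch (the packages installed are just examples):

```python
import azure.batch.models as batchmodels

# Hypothetical start task: runs once on each node when it boots,
# before the node is allowed to pick up workload tasks.
start_task = batchmodels.StartTask(
    command_line="/bin/bash -c 'sudo yum install -y python3 && "
                 "sudo pip3 install numpy scipy'",
    user_identity=batchmodels.UserIdentity(
        auto_user=batchmodels.AutoUserSpecification(
            elevation_level=batchmodels.ElevationLevel.admin)),
    wait_for_success=True)  # nodes only become usable if this succeeds

# Attach it to the pool definition from the previous snippet:
# batchmodels.PoolAddParameter(..., start_task=start_task)
```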

There are other options for "Big Computing" or HPC on Azure, though. The following list presents some of them:

  • Azure Genomics Service: a fully-managed and specialized cloud service for genomics processing. 
  • HDInsight: a managed service designed to process massive amounts of data using the Hadoop technology stack. This service is ideal for Big Data scenarios.
  • Azure Batch AI: Azure Batch AI helps you experiment with your AI models using any framework and then train them at scale across GPU and CPU clusters.
  • Deep Learning VM: The Deep Learning Virtual Machine (DLVM) is a specially configured variant of the Data Science Virtual Machine (DSVM) to make it easier to use GPU-based VM instances for training deep learning models.
  • Azure VMs for HPC: a broad set of VM sizes specifically designed to support HPC workloads. You can always build your own clusters with extremely robust VM configurations (GPUs included).

Artificial Intelligence

Azure AI services are another area that is a perfect fit for research scenarios. I have often seen research departments combining their vast amounts of processed data with machine learning algorithms to classify things or even make predictions. For these scenarios, the approaches I have been recommending, and helping customers move fast with, are:

  1. Azure Machine Learning Studio: A SaaS solution in which users can create both simple and complex ML models from scratch just by dragging and dropping visual blocks that represent actions to be executed by the tool. A wide range of ML algorithms and additional resources are available. Furthermore, Azure ML supports both R and Python scripts to implement customizations on top of the data model being used. Once the experiment is done, you can publish it as a RESTful web service to make it available to external applications (a minimal example of calling such a service follows below). The picture below shows what an Azure ML experiment looks like.
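Consuming a published experiment from an external application is just an authenticated HTTP call. Here is a minimal sketch in Python; the scoring URL, API key, and input columns below are hypothetical and depend entirely on your own experiment (copy the real values from the web service's dashboard after publishing):

```python
import requests

# Hypothetical values; your published service provides the real ones.
SCORING_URL = ("https://ussouthcentral.services.azureml.net/workspaces/"
               "<workspace-id>/services/<service-id>/execute?api-version=2.0")
API_KEY = "<your-api-key>"

# The input schema mirrors the dataset the experiment was built on.
payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["attendance_rate", "avg_grade"],
            "Values": [["0.85", "6.3"]],
        }
    },
    "GlobalParameters": {},
}

response = requests.post(
    SCORING_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()
print(response.json())  # the experiment's scored output
```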

To give you a sense of what is possible with Azure ML, a research department in the US is currently using Azure to automatically classify brain signals received from an external application as "normal" or "non-normal." In Brazil, a K-12 institution is actively using Azure ML to create experiments that accurately predict student dropout and, based on that, to take preventive actions in advance.

However, it goes beyond that. There are other approaches to working with AI on Azure. You can find more information about each of these solutions by following the links below.

  • Cognitive Services: a fully managed and highly scalable collection of specialized services for a wide variety of scenarios, like face recognition, sentiment analysis, video indexing, search recommendations, bot conversations, natural language understanding, and so forth (see the sentiment-analysis sketch after this list).
  • Deep Learning VM: The Deep Learning Virtual Machine (DLVM) is a specially configured variant of the Data Science Virtual Machine (DSVM) to make it easier to use GPU-based VM instances for training deep learning models.
  • Azure CNTK: A free, easy-to-use, open-source, and commercial-grade toolkit that trains deep learning algorithms to learn like the human brain.
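To illustrate how lightweight the Cognitive Services are to consume, here is a minimal sketch calling the Text Analytics sentiment endpoint; the region, API version, and subscription key are placeholders, so check the service documentation for your own values:

```python
import requests

# Placeholder endpoint and key; get yours from the Azure portal.
ENDPOINT = "https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment"
SUBSCRIPTION_KEY = "<your-cognitive-services-key>"

documents = {
    "documents": [
        {"id": "1", "language": "en",
         "text": "The new personalized recovery activities are great!"},
    ]
}

response = requests.post(
    ENDPOINT,
    json=documents,
    headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
)
response.raise_for_status()

# Each document gets a score between 0 (negative) and 1 (positive).
for doc in response.json()["documents"]:
    print(doc["id"], doc["score"])
```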

Storage and archiving

Research departments usually produce a significant amount of data. It is not uncommon to see a research department's storage systems surpassing petabytes of data over time. Another characteristic of this data is that the majority of this massive amount of information is usually not frequently accessed by people or systems, so, to be cost-effective, a robust archiving solution is valuable as well. This is why the "infinite" storage capacity and features provided by public cloud providers (like Microsoft, through Azure) are vital for research departments.

The good news is that Azure can help with that too. Azure Storage accounts were designed to provide "infinite" storage capacity with high levels of I/O throughput, security, geo-distribution, and high availability, both for hot access (files that need to be ready to be read and written) and cold access (data that is not frequently accessed); of course, this fits researchers' needs perfectly.
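As a concrete example, moving a blob between access tiers is a one-line operation. A minimal sketch with a recent azure-storage-blob package for Python follows; the connection string, container, and blob names are placeholders:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection string; copy the real one from the portal.
service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="research-data",
                               blob="2018/run-001/results.csv")

# Upload the file (it lands in the account's default tier, usually Hot).
with open("results.csv", "rb") as data:
    blob.upload_blob(data, overwrite=True)

# Once the results are no longer actively used, archive them to
# pay cold-storage prices instead of hot-storage prices.
blob.set_standard_blob_tier("Archive")
```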

Microsoft Azure also offers good options for those interested in moving large amounts of data from on-premises to Azure with no dependency on the internet. There are two different services in this regard: Azure Import/Export and Azure Data Box. With Azure Import/Export, you ship your own data disks to a Microsoft Azure data center; by opting for Azure Data Box, you use Microsoft-supplied disks to ship your information to the chosen data center.

Important reminder: storage accounts are the foundation of every service on Azure. I'm featuring them in the research section; however, they will be valuable in every other aspect of educational workloads.

I will stop here for now, because Azure can add value to educational institutions in so many ways that I don't want to cram all those aspects into one overly long text. See you soon!

