What is Neural Architecture Search?
Building a solid neural network architecture is time-consuming because it is an iterative process. Designing an architecture that matches your precise requirements is even harder: crafting such networks takes a great deal of trial and error and experimentation, and most of those experiments fail.
Optimization and efficiency are key to getting the most out of a neural model. To reach that potential and overcome the challenges of designing architectures by hand, you can turn to Neural Architecture Search (NAS). In this post, we will take a look at this increasingly popular technique and how it is shaping the way neural models are built.
Table of Contents
- Understanding the complexity of NAS
- Major components of NAS
- Conclusion
Understanding the complexity of NAS
Neural Architecture Search works by combing through many candidate architectures to find the one best suited to a given task and dataset. It is a relatively new area, and a lot of research is ongoing, particularly around the different methods and approaches used to carry out the search.
NAS falls under AutoML and overlaps with feature engineering, hyperparameter optimization, and transfer learning. It is one of the most challenging machine learning problems currently under active research.
NAS replaces the human effort of manually finding a good neural network for a specific dataset through trial and error. It does this by systematically modifying candidate networks and learning from how each one behaves and performs. In doing so, it automates the task of finding an architecture that runs with high efficiency instead of relying on hand-crafted designs. NAS uses a set of tools and methods to test and evaluate a huge number of architectures across a search space using a search strategy. The end goal is to choose the one model that meets the problem’s objective and maximizes the fitness function.
Major components of NAS
Neural Architecture Search can be broken down into three major components:
Search Space
The search space defines which candidate architectures can be considered for the target neural network. Layers are the most common building blocks of a deep learning model, so you have to decide which types of layers, and how many of them, should be explored. These are typically drawn from a set of operations such as pooling, fully connected, and convolutional layers.
Next, you need to configure these layers: the number of units, the number of filters, the stride, and the kernel size. Other elements, such as additional operations (say, dropout or pooling layers) and activation functions, can also be included in the search space.
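As a rough illustration, a small search space can be written down as a plain dictionary of candidate choices. The layer types, filter counts, and other options below are arbitrary examples for this sketch, not a prescribed set.

```python
# A minimal sketch of a search space: each key is a design decision,
# each value is the list of options the search is allowed to try.
search_space = {
    "num_layers": [2, 4, 6],
    "layer_type": ["conv3x3", "conv5x5", "max_pool"],
    "num_filters": [16, 32, 64],
    "activation": ["relu", "tanh"],
    "dropout": [0.0, 0.25, 0.5],
}

# Every combination of these options is one candidate architecture.
num_candidates = 1
for options in search_space.values():
    num_candidates *= len(options)
print(f"Candidate architectures in this toy space: {num_candidates}")
```

Even this tiny space already contains 162 combinations (3 × 3 × 3 × 2 × 3), which hints at how quickly the cost grows as more elements are added.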
You could add more elements to the search space and increase its versatility, but there is a catch. The more elements you add, the more degrees of freedom the search has, which in turn increases the cost of finding the optimal architecture. Advanced architectures tend to have several branches of layers and a plethora of elements, and such architectures are not easy to explore with NAS.
To reduce the complexity of the search space without limiting the complexity of the resulting neural network, you can use the “cells” technique: the NAS algorithm optimizes small blocks (cells) separately and later combines and stacks those blocks to form the full network.
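Below is a minimal PyTorch sketch of the cell idea. The two candidate cell types, the fixed channel width, and the stacking order are invented for illustration; a real NAS system would also search over the cell’s internal structure.

```python
import torch
import torch.nn as nn

def make_cell(channels: int, op_name: str) -> nn.Module:
    """Build one small, searchable block ('cell') from a named operation."""
    if op_name == "conv3x3":
        return nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                             nn.BatchNorm2d(channels), nn.ReLU())
    if op_name == "sep_conv3x3":  # depthwise separable convolution
        return nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
                             nn.Conv2d(channels, channels, 1),
                             nn.BatchNorm2d(channels), nn.ReLU())
    raise ValueError(f"Unknown cell op: {op_name}")

def build_network(cell_ops, channels=32, num_classes=10) -> nn.Module:
    """Stack the chosen cells to form the full network."""
    layers = [nn.Conv2d(3, channels, 3, padding=1), nn.ReLU()]  # stem
    layers += [make_cell(channels, op) for op in cell_ops]
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, num_classes)]
    return nn.Sequential(*layers)

# The search only has to decide which cell goes in each position.
net = build_network(["conv3x3", "sep_conv3x3", "conv3x3"])
print(net(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```

Because the search now chooses among a handful of cells rather than designing every layer from scratch, the number of decisions stays small even for deep networks.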
Search Strategy
Generally, even a simple search space requires plenty of trial and error to find the optimal architecture, so a neural architecture search needs a search strategy: a way for the algorithm to decide which neural networks to experiment with in pursuit of the desired results. The most basic strategy is “random search,” in which the algorithm randomly picks a neural network from the search space, trains and validates it, records the result, and then moves on to pick the next neural network from the search space.
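Here is a minimal sketch of random search. The evaluate function is a stand-in for training and validating the sampled network, and a trimmed-down search space is defined inline so the snippet runs on its own.

```python
import random

search_space = {
    "num_layers": [2, 4, 6],
    "num_filters": [16, 32, 64],
    "activation": ["relu", "tanh"],
}

def sample_architecture(space):
    """Pick one option at random for every design decision."""
    return {name: random.choice(options) for name, options in space.items()}

def evaluate(config):
    """Stand-in for 'train, validate, and record the result'.
    A real implementation would build and train the network here."""
    return random.random()  # pretend this is validation accuracy

best_config, best_score = None, float("-inf")
for _ in range(20):                 # evaluation budget
    config = sample_architecture(search_space)
    score = evaluate(config)
    if score > best_score:
        best_config, best_score = config, score

print(best_config, best_score)
```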
This random search method is extremely expensive because it brute-forces its way through the search space, wasting resources on testing candidates that a smarter method could eliminate early. Depending on the complexity of the networks involved, verifying every possible architecture this way can take days or weeks of GPU time.
Bayesian optimization is another technique that can speed up the process. It starts with a few random samples and then uses the results and information gathered so far to decide which architecture to evaluate next.
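A minimal sketch of this idea follows, using a Gaussian-process surrogate from scikit-learn and an upper-confidence-bound rule to pick the next candidate. The two numeric hyperparameters and the synthetic score function are invented for illustration; a real run would score each configuration by training a network.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def evaluate(x):
    """Stand-in for training a network described by x = [num_layers, log2(num_filters)]."""
    return -(x[0] - 4) ** 2 - (x[1] - 5) ** 2 + np.random.normal(0, 0.1)

rng = np.random.default_rng(0)
candidates = np.array([[l, f] for l in range(1, 9) for f in range(3, 8)], dtype=float)

# Start with a few random evaluations, then let the surrogate guide the search.
X = candidates[rng.choice(len(candidates), 3, replace=False)]
y = np.array([evaluate(x) for x in X])

for _ in range(10):
    gp = GaussianProcessRegressor().fit(X, y)
    mean, std = gp.predict(candidates, return_std=True)
    ucb = mean + 1.5 * std                  # upper confidence bound acquisition
    x_next = candidates[np.argmax(ucb)]     # most promising unexplored region
    X = np.vstack([X, x_next])
    y = np.append(y, evaluate(x_next))

print("Best configuration found:", X[np.argmax(y)], "score:", y.max())
```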
The next strategy is to frame NAS as a reinforcement learning problem. Here, the search space is the environment, the different configurations of the neural network are the actions, and the performance of the trained network is the reward. The search starts with essentially random choices, but over time the policy adapts to favor the configurations that give the best results.
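A very reduced sketch of this framing: a softmax “controller” keeps a preference score for each option, samples a configuration (the action), receives a score (the reward), and nudges its preferences toward choices that paid off, in the spirit of a REINFORCE-style update. The option lists and the reward function are toy stand-ins.

```python
import numpy as np

options = {"num_layers": [2, 4, 6], "num_filters": [16, 32, 64]}
logits = {k: np.zeros(len(v)) for k, v in options.items()}  # controller preferences

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reward(config):
    """Toy stand-in for the validation accuracy of the sampled network."""
    return 1.0 if config["num_layers"] == 4 and config["num_filters"] == 32 else 0.2

rng = np.random.default_rng(0)
baseline, lr = 0.0, 0.5
for step in range(200):
    # Sample an action (a full configuration) from the controller's policy.
    choices = {k: rng.choice(len(v), p=softmax(logits[k])) for k, v in options.items()}
    config = {k: options[k][i] for k, i in choices.items()}
    r = reward(config)
    baseline += 0.05 * (r - baseline)        # moving-average baseline
    for k, i in choices.items():             # REINFORCE update on each decision
        grad = -softmax(logits[k])
        grad[i] += 1.0
        logits[k] += lr * (r - baseline) * grad

print({k: options[k][int(np.argmax(logits[k]))] for k in options})
```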
There are various other search strategies, such as Monte Carlo tree search and evolutionary algorithms. Each of these strategies has its own strengths and weaknesses, and by exploring and experimenting with them, engineers find the optimal architecture for a given dataset.
Evaluation/Performance Estimation Strategy
In the process of choosing an optimal neural network, the NAS algorithm goes through the search space, trains and validates various deep learning models, and finally compares their performances. This typically takes a long time: performing full training for each of these networks is not only time-consuming but also requires a large amount of computational resources. To reduce these costs, engineers use proxy metrics, which spare each candidate from undergoing full training.
Using proxy metrics, one can train models for just a few epochs, on lower-resolution data, or on a smaller dataset. The resulting models will not reach their full potential, but they provide a baseline for comparing candidates at minimal cost. Once the search has been narrowed down to a few promising networks, the NAS algorithm can then train and evaluate those candidates in full.
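A minimal sketch of a proxy evaluation using scikit-learn: each candidate is scored after a short training run on a small slice of a synthetic dataset rather than a full training run. The hidden-layer sizes being compared, the subset size, and the iteration cap are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

def proxy_score(hidden_layers, subset=1000, max_iter=20):
    """Cheap estimate: train briefly on a small subset instead of the full data."""
    model = MLPClassifier(hidden_layer_sizes=hidden_layers, max_iter=max_iter, random_state=0)
    model.fit(X_train[:subset], y_train[:subset])
    return model.score(X_val, y_val)

candidates = [(32,), (64, 64), (128, 64, 32)]
scores = {c: proxy_score(c) for c in candidates}
print(scores)  # rank candidates cheaply; fully train only the top few
```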
An experienced engineer might reduce the cost of evaluation even further by initializing new candidate models with the weights of previously trained models, a practice popularly known as transfer learning. This strategy has been observed to converge much more quickly; in other words, the deep learning model ends up requiring fewer training epochs. However, it only works when the source and destination models have compatible architectures.
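A small PyTorch sketch of this idea: a new candidate inherits the backbone weights of a previously trained model, and only its new head starts from random initialization. The layer names and sizes are made up for illustration; the transfer works only because the two backbones have identical shapes.

```python
import torch.nn as nn

def make_backbone():
    return nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 128), nn.ReLU())

# "Parent": a previously trained model (assume it has already been trained).
parent = nn.Sequential(make_backbone(), nn.Linear(128, 10))

# "Child": a new candidate that reuses the same backbone but adds a different head.
child = nn.Sequential(make_backbone(), nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 5))

# Copy only the weights belonging to the shared backbone (submodule index 0).
backbone_weights = {k: v for k, v in parent.state_dict().items() if k.startswith("0.")}
missing, unexpected = child.load_state_dict(backbone_weights, strict=False)
print(f"Inherited {len(backbone_weights)} tensors; {len(missing)} left randomly initialized.")
```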
Conclusion
As you have seen in this post, NAS provides an automated way to achieve better optimization and performance when building an efficient neural model. Research in this space continues to intensify as NAS keeps delivering consistent results, and it may not be long before NAS reaches every industry, bringing greater efficiency to the companies that adopt it.