Deep learning is computationally intensive. A typical model is a neural network with many nodes, and each node has many connections whose weights must be updated repeatedly during training. In other words, every layer of a neural network contains hundreds or thousands of identical artificial neurons performing the same calculation. This structure suits the GPU (Graphics Processing Unit), which is designed specifically to execute the same instructions on many pieces of data in parallel.
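To make this concrete, here is a small sketch in NumPy (our choice for illustration; any array library would do). It computes one ReLU layer twice: once neuron by neuron, and once as a single matrix multiplication. The two results match, and the matrix form is the kind of uniform, parallel work a GPU executes efficiently. The sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out = 4, 3, 5
x = rng.standard_normal((batch, n_in))   # 4 input examples, 3 features each
W = rng.standard_normal((n_in, n_out))   # weights of a layer with 5 neurons
b = rng.standard_normal(n_out)           # one bias per neuron

def layer_loop(x, W, b):
    """Neuron by neuron: the same multiply-add-ReLU repeated for every neuron."""
    out = np.empty((x.shape[0], W.shape[1]))
    for i in range(x.shape[0]):          # each example
        for j in range(W.shape[1]):      # each neuron
            out[i, j] = max(0.0, float(x[i] @ W[:, j] + b[j]))
    return out

def layer_vectorized(x, W, b):
    """The whole layer at once: one matrix multiply plus an elementwise ReLU."""
    return np.maximum(x @ W + b, 0.0)

print(np.allclose(layer_loop(x, W, b), layer_vectorized(x, W, b)))  # True
```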
With the rapid development of deep learning and artificial intelligence over the past few years, many deep learning frameworks have been introduced. Their goal is to run deep learning systems efficiently on the GPU. These frameworks rely on the concept of a computational graph, which defines the order in which calculations must be performed. In these frameworks you use a language to build the graph, and the graph is executed by a mechanism that differs from that of the host language itself. The computational graph can then be optimized and run in parallel on the target GPU.
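As a rough illustration of "define the graph first, run it later", here is a minimal plain-Python sketch. The `Node`, `placeholder`, `add`, `mul`, and `run` names are hypothetical and only loosely echo TensorFlow 1.x terminology; a real framework would also optimize the graph and dispatch its operations to a GPU.

```python
class Node:
    """One operation in the graph (hypothetical, not any framework's API)."""
    def __init__(self, op, inputs=(), name=None):
        self.op, self.inputs, self.name = op, inputs, name

def placeholder(name):
    return Node("placeholder", name=name)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def run(node, feed, cache=None):
    """Evaluate `node`, computing its inputs first (a topological order)."""
    cache = {} if cache is None else cache
    if id(node) in cache:
        return cache[id(node)]
    if node.op == "placeholder":
        value = feed[node.name]
    else:
        a, b = (run(n, feed, cache) for n in node.inputs)
        value = a + b if node.op == "add" else a * b
    cache[id(node)] = value
    return value

# 1) Define the graph once: z = (x + y) * y. Nothing is computed yet.
x, y = placeholder("x"), placeholder("y")
z = mul(add(x, y), y)

# 2) Run the same graph many times, feeding different input values.
print(run(z, {"x": 1.0, "y": 2.0}))  # 6.0
print(run(z, {"x": 3.0, "y": 4.0}))  # 28.0
```

Separating definition from execution is what lets a framework inspect and optimize the whole graph before any number is computed.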
In this article, I would like to introduce five major frameworks driving the development of deep learning. These frameworks make it easier for data scientists and engineers to build deep learning solutions to complex problems. They are only a small part of the many open source frameworks, backed by different tech giants, that push one another to innovate.
1. TensorFlow (Google)
TensorFlow was originally developed by researchers and engineers on the Google Brain team for research on deep neural networks and machine intelligence. Since the end of 2015, the TensorFlow library has been open source on GitHub. TensorFlow is well suited to fast graph-based computation, and its flexible API can deploy models across multiple devices thanks to its GPU-capable architecture.
In short, the TensorFlow ecosystem has three main components:
The TensorFlow API, written in C++, provides functions for defining models and training them on data. It also has a user-friendly Python interface.
TensorBoard is a visualization toolkit that helps analyze, visualize, and debug TensorFlow computational graphs.
TensorFlow Serving is a flexible, high-performance serving system for deploying pre-trained machine learning models in production. Serving is also written in C++, can be accessed through a Python interface, and can switch instantly from an old model to a new one.
TensorFlow has been widely used in academic research and industrial applications. Notable current uses include Deep Speech, RankBrain, SmartReply, and on-device computer vision. You can browse the best official models, examples, and tutorials in TensorFlow's GitHub projects.
Let's look at a working example. Here, I train a two-layer ReLU network with an L2 loss on random data in TensorFlow.
This code has two main parts: defining the computational graph and running it many times. To define the graph, I create placeholders for the input x, the weights w1 and w2, and the target y. Then, in the forward pass, I compute the prediction of the target y and the loss (the L2 distance between the true and predicted values of y). Finally, I have TensorFlow compute the gradients of the loss with respect to w1 and w2.
After building the graph, I create a session to run it. I create NumPy arrays that fill the placeholders created during construction, supplying values for x, y, w1, and w2. To train the network, I run the graph repeatedly, fetching the NumPy arrays for loss, grad_w1, and grad_w2 and using the gradients to update the weights.
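The step where TensorFlow computes the gradients for w1 and w2 (in TensorFlow 1.x, via `tf.gradients`) relies on reverse-mode automatic differentiation over the graph. Below is a minimal, scalar-only sketch of that idea in plain Python. The `Value` class is hypothetical and not part of TensorFlow; real frameworks differentiate whole tensors. It is applied here to a one-unit-wide version of the same two-layer ReLU network with a squared-error loss.

```python
class Value:
    """A scalar graph node that remembers how it was computed (illustrative only)."""
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # input nodes
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def relu(self):
        return Value(max(0.0, self.data), (self,), (1.0 if self.data > 0 else 0.0,))

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node.parents:
                    visit(p)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad

# Forward pass: h = relu(x * w1), pred = h * w2, loss = (pred - y)^2
x, y = Value(2.0), Value(1.0)
w1, w2 = Value(0.5), Value(-1.5)
h = (x * w1).relu()
pred = h * w2
diff = pred + Value(-1.0) * y
loss = diff * diff

loss.backward()                      # fills in .grad on every node
print(loss.data, w1.grad, w2.grad)   # 6.25 15.0 -5.0
```

A gradient-descent update would then be `w1.data -= learning_rate * w1.grad`, mirroring the weight updates in the training loop described above.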
Keras: A High-Level Wrapper
Deep learning frameworks operate at two levels of abstraction: a low level, which implements mathematical operations and basic neural network primitives (TensorFlow, Theano, PyTorch, etc.), and a high level, which uses those low-level primitives to implement neural network abstractions such as models and layers (Keras).
Keras is a wrapper around its backend library, which can be either TensorFlow or Theano. This means that if you use Keras with TensorFlow as the backend, you are actually running TensorFlow code. Keras takes care of many basic details for you because it is aimed at users of neural networks, which makes it well suited to data science practitioners. It supports simple and rapid prototyping, supports multiple neural network architectures, and runs seamlessly on CPU and GPU.
In this example, which trains a network similar to the one in the previous example, I first define the model object as a sequence of layers and then define the optimizer object. Next, I compile the model, specifying the loss function, and train it with a single "fit" call.
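To show what "a model defined as a sequence of layers, trained with one fit call" hides, here is a hypothetical mini-framework in NumPy. The `Dense`, `ReLU`, and `Sequential` names and the `fit` signature only imitate Keras's style; this is not the Keras API. It uses plain gradient descent in place of a separate optimizer object and assumes an L2 (mean squared error) loss.

```python
import numpy as np

class Dense:
    """Fully connected layer: out = x @ W + b (hypothetical, Keras-like)."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                       # cached for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out, lr):
        grad_in = grad_out @ self.W.T    # computed before W changes
        self.W -= lr * self.x.T @ grad_out
        self.b -= lr * grad_out.sum(axis=0)
        return grad_in

class ReLU:
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, grad_out, lr):
        return grad_out * self.mask

class Sequential:
    """A model defined as a list of layers and trained by a single fit() call."""
    def __init__(self, layers):
        self.layers = layers

    def predict(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def fit(self, x, y, epochs=100, lr=1e-2):
        history = []
        for _ in range(epochs):
            diff = self.predict(x) - y
            history.append(float(np.mean(np.sum(diff ** 2, axis=1))))  # L2 loss
            grad = 2.0 * diff / len(x)   # d(loss)/d(prediction)
            for layer in reversed(self.layers):
                grad = layer.backward(grad, lr)
        return history

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 10))
y = rng.standard_normal((64, 5))
model = Sequential([Dense(10, 32, rng), ReLU(), Dense(32, 5, rng)])
history = model.fit(x, y, epochs=200, lr=1e-2)
print(history[0], history[-1])   # first and last loss values
```

The user-facing part is the last four lines; everything above them is the low-level bookkeeping a high-level wrapper takes off your hands.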
2. Theano (University of Montreal)
Theano is another Python library for fast numerical computation that can run on a CPU or GPU. It is an open source project developed by the Montreal Institute for Learning Algorithms (MILA) at the University of Montreal. Its most prominent features include transparent use of GPUs, tight integration with NumPy, efficient symbolic differentiation, speed and stability optimizations, and extensive unit testing.
Unfortunately, Yoshua Bengio (head of MILA) announced in late 2017 that they would no longer actively maintain or develop Theano. The reason is that most of the innovations Theano introduced over the years have now been adopted and refined by other frameworks. If you are interested, you can still contribute to its open source repository.
Theano is similar to TensorFlow in many ways, so let's look at another code example that trains the neural network with the same batch size and input/output dimensions.
I first define Theano symbolic variables (similar to TensorFlow placeholders). In the forward pass I compute the prediction and the loss; in the backward pass I compute the gradients. Then I compile a function that computes the loss, scores, and gradients from the data and weights. Finally, I run this function several times to train the network.
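Theano's key idea is symbolic differentiation: the gradient is itself a new symbolic expression, and only a compiled function touches real numbers. The plain-Python sketch below illustrates that idea with a tiny expression tree. The classes and the `function` helper are hypothetical; `function` is only loosely named after `theano.function`, and real Theano also optimizes the graph and generates C/GPU code.

```python
class Expr:
    def __add__(self, other): return Add(self, other)
    def __mul__(self, other): return Mul(self, other)

class Var(Expr):
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]
    def diff(self, wrt): return Const(1.0 if self is wrt else 0.0)

class Const(Expr):
    def __init__(self, value): self.value = value
    def eval(self, env): return self.value
    def diff(self, wrt): return Const(0.0)

class Add(Expr):
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) + self.b.eval(env)
    def diff(self, wrt): return self.a.diff(wrt) + self.b.diff(wrt)

class Mul(Expr):
    def __init__(self, a, b): self.a, self.b = a, b
    def eval(self, env): return self.a.eval(env) * self.b.eval(env)
    def diff(self, wrt):  # product rule, producing a new expression
        return self.a.diff(wrt) * self.b + self.a * self.b.diff(wrt)

def function(inputs, outputs):
    """'Compile' symbolic outputs into an ordinary Python callable."""
    def f(*values):
        env = {v.name: val for v, val in zip(inputs, values)}
        return [out.eval(env) for out in outputs]
    return f

x, w = Var("x"), Var("w")
err = x * w + Const(-3.0)          # symbolic: x*w - 3
loss = err * err                   # symbolic: (x*w - 3)^2
grad_w = loss.diff(w)              # symbolic gradient, no numbers yet
f = function([x, w], [loss, grad_w])
print(f(2.0, 1.0))                 # [1.0, -4.0]
```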
3. PyTorch (Facebook)
PyTorch is a relatively new deep learning framework that is very popular among academic researchers. Facebook's AI Research group developed PyTorch to address some of the problems it encountered with its predecessor framework, Torch. Because the Lua programming language is not widely used, Torch never grew as quickly as Google's TensorFlow. PyTorch therefore adopts the imperative Python programming style already familiar to many researchers, developers, and data scientists. It also supports dynamic computational graphs, a feature that makes it attractive to researchers and engineers working with time-series and natural language data.
One of the most notable uses of PyTorch so far is at Uber, which built Pyro, a general-purpose probabilistic programming language that uses PyTorch as its backend. PyTorch's dynamic, differentiable execution and its ability to construct gradients are very valuable for the random operations in probabilistic models.
PyTorch has three levels of abstraction:
Tensor: an imperative ndarray that runs on the GPU
Variable: a node in the computational graph; stores data and gradients
Module: a neural network layer; may store state or learnable weights
Here I focus on the tensor level. PyTorch tensors are like NumPy arrays, but they can run on the GPU. They have no built-in notion of computational graphs, gradients, or deep learning. Here, we use PyTorch tensors to fit a two-layer network.
As you can see, I first create random tensors for the data and weights. Then I compute the predictions and loss in the forward pass and compute the gradients manually in the backward pass. I also take a gradient descent step on each weight. Finally, I train the network by repeating these steps many times.
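Because PyTorch tensors behave like NumPy arrays, the training loop just described can be sketched directly in NumPy. With PyTorch you would create the arrays as tensors (for example with `torch.randn`) and could move them to a GPU; the forward pass, hand-written backward pass, and update are otherwise the same. The layer sizes, iteration count, and step size of 1e-6 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 64, 1000, 100, 10    # batch size and layer widths (assumed)
x = rng.standard_normal((N, D_in))       # random input data
y = rng.standard_normal((N, D_out))      # random targets
w1 = rng.standard_normal((D_in, H))      # random initial weights
w2 = rng.standard_normal((H, D_out))

learning_rate = 1e-6
losses = []
for t in range(500):
    # Forward pass: matrix multiply, ReLU, matrix multiply.
    h = x @ w1
    h_relu = np.maximum(h, 0)
    y_pred = h_relu @ w2
    loss = np.square(y_pred - y).sum()   # L2 loss
    losses.append(loss)

    # Backward pass: gradients computed by hand with the chain rule.
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T @ grad_y_pred
    grad_h_relu = grad_y_pred @ w2.T
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0                    # ReLU passes no gradient where h < 0
    grad_w1 = x.T @ grad_h

    # Gradient-descent step with a fixed step size.
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

print(losses[0], losses[-1])
```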
4. Torch (NYU / Facebook)
Next, let's talk about Torch. It is an open source machine learning library, scientific computing framework, and scripting language based on the Lua programming language, developed and used extensively at Facebook. It provides a wide range of deep learning algorithms and has been used by Facebook, IBM, Yandex, and other companies.
As the direct ancestor of PyTorch, Torch shares much of its C backend with PyTorch. Unlike PyTorch, which has three levels of abstraction, Torch has only two: tensors and modules. Let's look at a code tutorial that uses Torch tensors to train a two-layer neural network.
First, I build a multilayer neural network model and a loss function. Next, I define a callback (closure) function that takes the weights as input and returns the loss and the gradient with respect to the weights. Inside the function, I compute the prediction and loss in the forward pass and the gradient in the backward pass. Finally, I repeatedly pass this callback to the optimizer.
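The callback pattern is the interesting part here: in Lua Torch, the `optim` package's functions (such as `optim.sgd`) take a closure that returns the loss and gradient, together with the parameters. The Python sketch below mimics that pattern with NumPy. The `sgd` and `feval` functions are hypothetical stand-ins, not Torch APIs, and the data, sizes, and learning rate are assumptions.

```python
import numpy as np

def sgd(feval, params, lr):
    """One optimizer step: ask the closure for loss and gradients, update in place."""
    loss, grads = feval(params)
    for p, g in zip(params, grads):
        p -= lr * g
    return loss

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 20))
y = rng.standard_normal((64, 5))
params = [rng.standard_normal((20, 16)) * 0.1,   # w1
          rng.standard_normal((16, 5)) * 0.1]    # w2

def feval(params):
    """Closure: forward pass, L2 loss, and hand-written backward pass."""
    w1, w2 = params
    h = np.maximum(x @ w1, 0)
    diff = h @ w2 - y
    loss = 0.5 * np.sum(diff ** 2) / len(x)
    grad_pred = diff / len(x)
    grad_w2 = h.T @ grad_pred
    grad_h = (grad_pred @ w2.T) * (h > 0)        # ReLU derivative
    grad_w1 = x.T @ grad_h
    return loss, [grad_w1, grad_w2]

losses = [sgd(feval, params, lr=0.1) for _ in range(200)]
print(losses[0], losses[-1])
```

Keeping the model logic in the closure lets the optimizer stay generic: swapping SGD for another update rule does not touch the model code.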
5. Caffe (UC Berkeley)
Caffe is a deep learning framework designed with expressiveness, speed, and modularity in mind. It was developed by Berkeley AI Research and the Berkeley Vision and Learning Center. Although its core is written in C++, Caffe has Python and MATLAB interfaces. It is well suited to training or fine-tuning feedforward classification models. Although it is not used much in research, it remains popular for deploying models, as its community of contributors demonstrates.
To train and fine-tune neural networks with Caffe, you go through four steps:
Convert data: we read the data files, clean them, and store them in a format that Caffe can use. We write a Python script for this preprocessing and storage.
Define the model: the model defines the structure of the neural network. We choose a CNN architecture and define its parameters in a configuration file with the .prototxt extension.
Define the solver: the solver is responsible for model optimization and specifies how gradient descent is performed. We define the solver parameters in a configuration file with the .prototxt extension.
Train the model: once the model and solver are ready, we train the model by calling the caffe binary from the terminal. After training, we get the trained model in a file with the .caffemodel extension.
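For the first step, part of a typical preprocessing script reshapes images into the layout Caffe's reference image models expect: channels first (N, C, H, W), BGR channel order, pixel values in 0 to 255, and per-channel mean subtraction. The NumPy sketch below shows only that transformation, under those assumptions; writing the result to an LMDB or HDF5 database is omitted, and the mean values are illustrative placeholders in the range commonly quoted for ImageNet, not values taken from this article.

```python
import numpy as np

def to_caffe_blob(images, mean_bgr):
    """Convert RGB images shaped (N, H, W, 3) with values in [0, 1] into a
    float32 (N, 3, H, W) array: 0-255 scale, BGR order, mean-subtracted."""
    x = images.astype(np.float32) * 255.0              # rescale to 0-255
    x = x[..., ::-1]                                   # RGB -> BGR channel order
    x = x - np.asarray(mean_bgr, dtype=np.float32)     # per-channel mean subtraction
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))  # NHWC -> NCHW

batch = np.random.default_rng(0).random((2, 4, 4, 3))  # two tiny RGB "images"
blob = to_caffe_blob(batch, mean_bgr=(104.0, 117.0, 123.0))  # placeholder means
print(blob.shape)  # (2, 3, 4, 4)
```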
I will not give a code demonstration for Caffe, but you can find tutorials on Caffe's home page. In general, Caffe is very useful for feedforward networks and for fine-tuning existing networks. You can train models without writing any code. Its Python interface is very useful, and you can deploy models without using Python at all. The downside is that you need to write C++/CUDA code for each new GPU layer. In addition, defining large networks (AlexNet, VGG, GoogLeNet, ResNet, etc.) layer by layer in configuration files can be very tedious.
Which deep learning framework should you use?
Since Theano is no longer being developed, Torch is written in Lua, a language unfamiliar to many people, and Caffe is still relatively immature, TensorFlow and PyTorch have become the preferred frameworks for most deep learning practitioners. Although both frameworks use Python, there are some differences between them:
PyTorch has a cleaner, simpler interface that is easier to use and especially suitable for beginners. Most code can be written intuitively instead of fighting the library. In contrast, TensorFlow's library is more complex and its API less clear.
However, TensorFlow has more support and a very large, active, and helpful community. This means there are more online courses, code tutorials, documentation, and blog posts for TensorFlow than for PyTorch.
That said, as a new platform, PyTorch has many interesting features that are not yet mature. Still, it is remarkable how much PyTorch has achieved in just over a year.
TensorFlow is more scalable and well suited to distributed execution. It supports everything from single-GPU setups to large systems running heavily distributed reinforcement learning with real-time trial and error.
Most importantly, TensorFlow is "define-and-run": you define conditionals and iterations in the graph structure and then run it. PyTorch, on the other hand, is "define-by-run": the graph structure is built on the fly during the forward computation. In other words, TensorFlow uses static computational graphs, while PyTorch uses dynamic ones. The dynamic approach makes debugging easier and handles complex architectures such as dynamic neural networks more naturally. The static approach is easier to deploy to mobile devices and to a wider range of architectures, and it allows ahead-of-time compilation.
Therefore, PyTorch is better suited to rapid prototyping for hobbyists and small projects, while TensorFlow is better suited to large-scale deployment, especially when cross-platform and embedded deployment matter. TensorFlow has stood the test of time and is still widely used. It has more features and scales better for large projects. PyTorch is easier to learn, but it lacks TensorFlow's integrated deployment capabilities. That makes it useful for small projects that need to be completed quickly, but not the best choice for production deployment.
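The difference between the two styles is easiest to see with a loop. In a static graph (TensorFlow 1.x), a loop whose length depends on the data has to be expressed with graph-level control flow such as `tf.while_loop` before anything runs. In define-by-run, ordinary Python control flow executes and the operations are recorded as they happen, so each input can produce a different graph. The plain-Python sketch below shows the define-by-run side; the `Tape` class is hypothetical and far simpler than PyTorch's autograd.

```python
class Tape:
    """Records operations as they execute (define-by-run), illustrative only."""
    def __init__(self):
        self.ops = []

    def mul(self, a, b):
        self.ops.append("mul")
        return a * b

    def add(self, a, b):
        self.ops.append("add")
        return a + b

def rnn_like(sequence, w=0.5):
    """A toy recurrence: the recorded graph grows with the input length."""
    tape = Tape()
    state = 0.0
    for value in sequence:               # ordinary Python control flow
        state = tape.add(tape.mul(state, w), value)
    return state, tape.ops

print(rnn_like([1.0, 2.0]))          # (2.5, ['mul', 'add', 'mul', 'add'])
print(len(rnn_like([1.0] * 5)[1]))   # 10
```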
Final thoughts
The frameworks above are just the most prominent of many, and most of them support Python. Several new deep learning frameworks have been released over the past few years, such as DeepLearning4j (Java), Apache MXNet (R, Python, Julia), Microsoft CNTK (C++, Python), and Intel's Neon (Python). Each framework is different because each was developed by different people for different purposes. Having a general overview will help you tackle your next deep learning challenge. When choosing the best fit for you, important considerations include ease of use (in terms of architecture and processing speed), GPU support, the availability of tutorials and training materials, neural network modeling capabilities, and supported languages.