The Tencent Cloud FPGA joint team, made up of the Tencent Cloud Basic Product Center and the Tencent Architecture Platform Department, introduces the implementation of a deep learning algorithm (AlexNet) on China's first FPGA cloud server, and discusses the architecture of an FPGA hardware acceleration platform for deep learning algorithms.
On January 20th, Tencent Cloud launched the FPGA cloud server, the first high-performance heterogeneous computing infrastructure of its kind in China. It offers FPGAs, which until now only large companies able to sustain long-term investment could afford, to more enterprises as a cloud service. At approximately 40% of the cost of a general-purpose CPU server, performance can reach more than 30 times that of a general-purpose CPU server. The content of the talk follows.
In March 2016, the Go-playing AI program AlphaGo defeated human Go player Lee Sedol, igniting the industry's enthusiasm for artificial intelligence and establishing it as a major trend for the future.
Artificial intelligence rests on three elements: algorithms, computing power, and data. The most mainstream class of artificial intelligence algorithms is deep learning. The hardware platforms that supply the computing power are the CPU, GPU, FPGA, and ASIC. With the arrival of the mobile Internet, users generate a large amount of data every day, which is collected by entry-point applications such as search and communication. In our QQ and WeChat businesses, users generate hundreds of millions of pictures every day. If we regard this user-generated data as a mineral deposit, then the computing hardware platform is the excavator, and the efficiency of the excavator is the standard by which hardware platforms are compared.
The first computing platform used for deep learning algorithms was the CPU. The CPU is general-purpose, its hardware framework is mature, and it is very friendly to programmers. However, as deep learning algorithms demanded more and more computing power, it became clear that CPUs do not execute deep learning efficiently. To remain general-purpose, a CPU devotes a large part of its chip area to complicated control flow and cache, leaving only a small area for arithmetic units.
GPUs then came to the attention of deep learning researchers. The GPU was originally designed for image rendering. Because rendering computations are largely independent from one pixel to the next, the GPU provides a large number of parallel computing units that can process many pixels at the same time, and this architecture turns out to suit deep learning algorithms as well.
The GPU runs deep learning algorithms much faster than the CPU, but its high price and large power consumption create many problems for large-scale deployment in an IDC. One might ask: would a dedicated chip (ASIC) designed for deep learning be more efficient than the GPU? In practice, building a dedicated deep learning chip involves great uncertainty. First, achieving good performance requires the best semiconductor manufacturing process, and manufacturing a chip on the latest process can cost millions of dollars. Second, even setting the funding problem aside, a team designing from scratch often needs more than a year for a complete design cycle. Deep learning algorithms are updated constantly during that time, so there is a real risk that the finished chip architecture no longer suits the latest algorithms.
Some may ask: hasn't Google designed a dedicated chip, the TPU, for deep learning? Judging from the performance-per-watt improvement Google has announced (more than tenfold), the TPU is still far from the upper limit of a dedicated processor, so it probably adopts a GPU-like architecture with a lower data bit width and may still retain considerable generality.
In recent years, FPGAs have attracted wide attention. Internet companies such as Amazon and Facebook have deployed FPGAs in volume in their data centers to provide a hardware platform for their deep learning workloads.
FPGA stands for "Field Programmable Gate Array". The basic principle is that the FPGA chip integrates a large number of basic digital logic gates and memories, and users define the connections between these gates and memories by burning an FPGA configuration file into the chip. This burning is not a one-off: a user can configure the FPGA as an image codec today, then edit the configuration file and configure the same FPGA as an audio codec tomorrow. This feature greatly improves the flexibility of data center services. An FPGA can therefore quickly realize a chip architecture developed for a deep learning algorithm, at a lower cost than designing a dedicated chip (ASIC), although its performance does not match that of a dedicated chip. An ASIC is a one-shot deal: if a problem is found after the design is finished, there is essentially no chance to change it. An FPGA can be reconfigured again and again until the best solution is found, so the risk of developing on an FPGA is much smaller than with an ASIC.
2. AlexNet algorithm analysis
2.1 AlexNet model structure
The structure of the AlexNet model is shown in Figure 2.1 below.
Figure 2.1 AlexNet model
The input of the model is a picture of size 3x224x224. The model has 5 convolution layers followed by 3 fully connected layers. Some of the convolution layers are followed by ReLU, Pooling and Normalization layers, and the last fully connected layer is a Softmax layer that outputs 1000 classes. As shown in Table 2.1, the 8 layers together require about 1.45 GFLOP of multiply-accumulate computation. The calculation method is as follows.
| Layer | Number of kernels | Convolutions per kernel | One convolution operation per kernel | Floating-point operations (1 multiply-accumulate = 2 FLOP) |
|---|---|---|---|---|
| 1 | 96 | 3025 | (1x363)x(363x1) | 96x3025x363 = 105M = 210 MFLOP |
| 2 | 256 | 729 | (1x1200)x(1200x1) | 256x729x1200 = 224M = 448 MFLOP |
| 3 | 384 | 169 | (1x2304)x(2304x1) | 384x169x2304 = 150M = 300 MFLOP |
| 4 | 384 | 169 | (1x1728)x(1728x1) | 384x169x1728 = 112M = 224 MFLOP |
| 5 | 256 | 169 | (1x1728)x(1728x1) | 256x169x1728 = 75M = 150 MFLOP |
| 6 | 1 | 4096 | (1x9216)x(9216x1) | 4096x9216 = 38M = 76 MFLOP |
| 7 | 1 | 4096 | (1x4096)x(4096x1) | 4096x4096 = 17M = 34 MFLOP |
| 8 | 1 | 1000 | (1x4096)x(4096x1) | 1000x4096 = 4M = 8 MFLOP |
| Sum | | | | 1.45 GFLOP |
Table 2.1 AlexNet floating-point operation count
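The arithmetic in Table 2.1 can be checked with a short script. The following is a minimal Python sketch and not part of the original material: the names LAYERS, layer_macs and total_flops are illustrative, the rows are copied from the table, and it assumes, as the table does, that one multiply-accumulate counts as two floating-point operations.

```python
# Each row is copied from Table 2.1:
# (layer, number of kernels, dot products per kernel, dot-product length)
LAYERS = [
    (1, 96, 3025, 363),
    (2, 256, 729, 1200),
    (3, 384, 169, 2304),
    (4, 384, 169, 1728),
    (5, 256, 169, 1728),
    (6, 1, 4096, 9216),
    (7, 1, 4096, 4096),
    (8, 1, 1000, 4096),
]


def layer_macs(kernels, dots_per_kernel, dot_length):
    """Multiply-accumulates for one layer: every kernel performs
    dots_per_kernel dot products of length dot_length."""
    return kernels * dots_per_kernel * dot_length


def total_flops(layers=LAYERS):
    """Total floating-point operations, counting 1 multiply-accumulate
    as 2 operations (one multiply plus one add)."""
    return sum(2 * layer_macs(k, n, d) for _, k, n, d in layers)


if __name__ == "__main__":
    for layer, k, n, d in LAYERS:
        macs = layer_macs(k, n, d)
        print(f"layer {layer}: {macs / 1e6:.1f} M multiply-accumulates")
    print(f"total: {total_flops() / 1e9:.2f} GFLOP")
```

The exact total comes to about 1.45 GFLOP, in agreement with the table. The table rounds each multiply-accumulate count to the nearest million before doubling it, so its per-layer MFLOP figures can differ by 1 from an exact doubling.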
2.2 AlexNet convolution operation features