IO patterns, data storage, and deep learning

Artificial intelligence, especially deep learning, is a computing technology that is changing many aspects of people's lives. Deep learning algorithms require a great deal of data. The amount depends on the goals of the algorithm and the resulting network model, but for some complex problems it can run to hundreds of millions of input samples.

Artificial intelligence is one of the hottest topics in computing, and with good reason. New techniques in deep learning (DL) can create neural networks that are more accurate than humans on certain tasks. Image recognition is an example: a deep learning model can achieve better accuracy than humans in identifying objects in an image (object detection and classification).

The ImageNet competition is an example. Since 2010, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been used as a benchmark for progress in image recognition. In 2011, the error rate was about 25% (the tools correctly identified 75% of the images outside the training data set). In 2012, a deep neural network (DNN) reduced the error rate to 16%. Over the next few years, error rates fell to single digits. In 2017, 29 of 36 teams achieved an error rate below 5%, which is typically better than human recognition.

Deep learning uses various types of neural networks and can be applied to a wide range of problems. Creating a deep learning model usually involves two main steps. The first step is called training. This is the process by which the model repeatedly reads the input data set and adjusts its parameters to minimize the error (the difference between the correct output and the computed output). This step requires a large amount of input data and an extremely large amount of computation.

The second step occurs after the model is trained and is called inference. This is the deployment of the trained model in production, where it reads data that was not used for training and produces output for a task rather than training a neural network. This step also has a computational component. It does not require massive amounts of computation, but it needs to meet goals such as minimizing latency, achieving the best possible accuracy, maximizing throughput, and maximizing energy efficiency.

The computations for both steps are performed by a framework. These software tools and libraries read scripts, usually written in Python, that tell the framework what operations are needed and what the neural network looks like. The framework then executes that code. Examples of frameworks are TensorFlow, Caffe, and PyTorch.

Understanding the IO pattern

Studying how a deep learning (DL) framework operates makes it possible to understand its IO pattern. There is no need to know the details of a specific framework, nor the mathematics behind neural networks.

The basic flow of the training step in a deep learning (DL) framework is very simple. Neural networks require quite a bit of input data to train the network to perform its task properly. The input can be images, video, volumes, numbers, or a combination of almost any kind of data.

A great deal of data is needed. In addition, the data must be very diverse and provide broad information for each input. For example, even simple face recognition that determines whether a person is male or female can require more than 100 million images.

The input data can be stored in a variety of ways, from simple CSV files holding small amounts of input data for learning about deep neural networks (DNNs), to databases containing images. As long as the DNN can access the data and understand the input format, the data can be spread across different formats and tools. It can also be a combination of structured and unstructured data, as long as the user knows the data and its format and can express them in the model.
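As a concrete illustration, a training script might pull a small input set from a CSV file. The sketch below is a minimal pure-Python example; the column names and values are hypothetical, not taken from any particular dataset:

```python
import csv
import io

# A tiny, hypothetical CSV dataset: one label and two feature columns per row.
csv_text = "label,x1,x2\n0,0.5,1.2\n1,0.9,0.1\n"

samples = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Convert the text fields into the numeric inputs a model expects.
    samples.append((int(row["label"]), float(row["x1"]), float(row["x2"])))

print(samples)  # two (label, x1, x2) tuples
```

In practice the file would be opened from disk rather than built in memory, but the parsing step is the same.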

The size of the data on the storage media can vary widely. At one extreme, a simple image from the MNIST data set is a 28 x 28 grayscale image (values from 0 to 255), a total of 784 pixels. This format is very small. Today people have 4K-resolution televisions and cameras; at 4,096 x 4,096 pixels, that is a total of 16,777,216 pixels.

4K color representation usually starts at 8 bits per channel (256 values) and can reach 16 bits of information. This can lead to very large images. A single uncompressed 4K TIFF file at 4,520 x 2,540 resolution and 8 bits is 45.9 MB in size. For a 16-bit color image, the size is 91.8 MB.

If an organization has 100 million images, which is reasonable for some facial recognition algorithms, then it has that many files. This is not too bad for today's file systems. The total space used for the 8-bit images, however, is 4.59 PB. That is a considerable amount of space for a single neural network (NN) that uses large, high-resolution images.
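The arithmetic behind these figures can be checked in a few lines. Note that the stated 45.9 MB file size implies roughly 4 bytes per pixel at that resolution, which is an assumption on my part, since the text does not give the channel layout:

```python
# MNIST: a 28 x 28 grayscale image.
mnist_pixels = 28 * 28                       # 784 pixels

# A 4,096 x 4,096 image as described above.
large_pixels = 4096 * 4096                   # 16,777,216 pixels

# Uncompressed 4,520 x 2,540 TIFF; the stated 45.9 MB implies
# about 4 bytes per pixel (assumption).
pixels = 4520 * 2540
size_8bit_mb = pixels * 4 / 1e6              # about 45.9 MB
size_16bit_mb = pixels * 8 / 1e6             # about 91.8 MB

# 100 million of the 8-bit images.
total_pb = 100_000_000 * pixels * 4 / 1e15   # about 4.59 PB

print(mnist_pixels, large_pixels, round(size_8bit_mb, 1), round(total_pb, 2))
```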

In general, training a neural network involves two phases. The first phase is called feed-forward: it takes the input and processes it through the network. The output is compared with the correct output to produce an error. That error is then propagated back through the network (back-propagation) to adjust the network's parameters so that the error the network produces is reduced.

This process continues until all images have been processed through the network. This is called an epoch (one epoch equals one pass of training using all the samples in the training set). Training a network to the required level of performance may take hundreds, thousands, or tens of thousands of epochs. The deep learning framework (e.g. TensorFlow, Caffe, or PyTorch) is responsible for orchestrating this entire process for the user's network model.
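As a toy illustration of the feed-forward/back-propagation cycle over epochs, here is a minimal pure-Python sketch (no real framework) that fits a single weight to the made-up relation y = 2x by gradient descent on a squared-error loss:

```python
# Toy training set: inputs and their correct outputs for y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0    # the single model parameter
lr = 0.05  # learning rate

for epoch in range(100):          # each epoch is one pass over all samples
    for x, y_true in data:
        y_pred = w * x            # feed-forward: compute the output
        error = y_pred - y_true   # compare with the correct output
        grad = 2 * error * x      # back-propagation: gradient of squared error
        w -= lr * grad            # adjust the parameter to reduce the error

print(round(w, 3))  # close to 2.0
```

A real framework does the same thing for millions of parameters, rereading the entire input set once per epoch, which is what drives the IO pattern discussed below.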

Overall IO process

A brief summary of the deep learning IO pattern is that data is read again and again: deep learning is dominated by repeated reads (re-reads). Note that there is some writing, but its volume is very small compared with the reading, since it consists mainly of checkpoints written during neural network training. However, to improve training, several options can be used that affect the IO pattern.

As an example of the amount of data read and written, assume that the network requires 100 million images, each 45.9 MB in size. Also assume that the network model takes about 40 MB to save, that it is saved once every 100 epochs, and that 5,000 epochs are needed to train the model.

As mentioned earlier, one epoch reads 4.59 PB of data. This is repeated 5,000 times, for a total of 22.95 EB of data read. If each image is a single file, this also means 500 billion file reads.

For write IO, the model needs to be written 50 times, for a total of 50 writes and 2 GB. Compared with the reads, this workload is very small.

For this example, each block of 100 epochs reads a total of 459 PB in 10 billion read IOs, followed by a single 40 MB write. This IO pattern repeats 50 times in total.
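These totals follow directly from the assumptions above and can be verified with a few lines of arithmetic:

```python
images = 100_000_000        # input images in the training set
image_mb = 45.9             # size of each image in MB
epochs = 5000               # epochs needed to train the model
checkpoint_every = 100      # the model is saved once per 100 epochs
checkpoint_mb = 40          # size of a saved model in MB

epoch_pb = images * image_mb / 1e9        # PB read per epoch (4.59)
total_eb = epoch_pb * epochs / 1000       # EB read in total (22.95)
file_reads = images * epochs              # file reads (500 billion)
writes = epochs // checkpoint_every       # checkpoint writes (50)
write_gb = writes * checkpoint_mb / 1000  # GB written (2)

print(round(epoch_pb, 2), round(total_eb, 2), file_reads, writes, write_gb)
```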

This is the basic IO pattern for deep neural networks (DNNs) in image recognition applications. Several techniques can be used to reduce training time. The following topics are a quick overview of these techniques from an IO perspective.

Training techniques

The first technique used in neural network (NN) training is random shuffling of the input data. It is used almost all the time to reduce the number of epochs required and to prevent overfitting (fitting the model to the training dataset so closely that it does not perform well on real-world data).

Before each new epoch begins, the order in which the data is read is randomized. This means the read IO pattern is driven by a random ordering of the images: reads are sequential within an individual image but random between images. Because of this randomness, the pattern is better characterized as random re-reads than as sequential reads.
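An input pipeline typically implements this by shuffling the list of file names before each epoch, roughly like the following sketch (the file names are hypothetical, and the fixed seed is only there to make the sketch reproducible):

```python
import random

# Hypothetical list of image files in the training set.
files = [f"img_{i:04d}.tiff" for i in range(8)]

random.seed(0)  # fixed seed for reproducibility of this sketch only

for epoch in range(2):
    order = files[:]       # copy so the original list is untouched
    random.shuffle(order)  # new random read order each epoch
    for name in order:
        pass               # here the framework would read the file sequentially

# Every epoch reads the same set of files, just in a different order.
print(sorted(order) == files)
```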

Some frameworks can also read data from a database. The IO pattern is still read-heavy, and the data may still be shuffled randomly. This can complicate the details of the IO pattern, because the database sits between the framework and the underlying storage.

Sometimes the framework also uses the mmap() IO function. This is a system call that maps a file or device into memory. When a virtual memory region is mapped to a file, it is called a "file-backed mapping." Reading that memory region reads the corresponding part of the file; this is the default behavior.

Whether or not mmap() is used, the IO pattern is still re-reads, following the patterns discussed above. However, using mmap() complicates the analysis because IO goes directly from the file into memory.
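In Python, the same mmap() mechanism is exposed through the mmap module. This small sketch maps a temporary file (standing in for an input data file) and reads its bytes through memory:

```python
import mmap
import os
import tempfile

# A small temporary file standing in for an input data file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"image-bytes-go-here")

# File-backed mapping: reads of the mapped memory region read the file.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    header = mm[:5]  # slicing the map performs the file read
    print(header)    # b'image'

os.remove(path)
```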

Another common technique for improving training performance is called batching. Instead of updating the network after each input image (including forward and backward propagation), the network is updated after a "batch" of images. The back-propagation step then operates on the combined errors, for example by averaging them, to update the network parameters. This does not usually change the IO pattern, because every image still needs to be read, but it can affect the convergence rate. In general it can slow convergence, but because less back-propagation occurs, it increases the speed of computation.
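As a toy sketch of batching, again using a single-weight model and squared-error loss purely for illustration, the per-sample gradients are averaged over each batch and a single update is applied per batch:

```python
# Toy training set for the made-up relation y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

w = 0.0
lr = 0.05
batch_size = 2

for epoch in range(200):
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # Feed every sample forward, then average the gradients.
        grads = [2 * (w * x - y) * x for x, y in batch]
        w -= lr * sum(grads) / len(grads)  # one update per batch

print(round(w, 3))  # close to 2.0
```

Every sample is still read once per epoch, so the read volume is unchanged; only the number of parameter updates shrinks.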

Batching also helps improve performance when training on GPUs (Graphics Processing Units). Instead of moving one file at a time from the CPU to the GPU, batching lets the user copy multiple files to the GPU at once. This can increase CPU-to-GPU throughput and reduce data transfer time. In this example, a batch size of 32 reduces the number of data transfers to 3,125,000.
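The transfer count is simple arithmetic: the number of host-to-GPU copies per epoch is the number of images divided by the batch size, rounded up:

```python
import math

images = 100_000_000  # images per epoch, from the running example
batch_size = 32

# Each transfer carries one batch of images to the GPU.
transfers = math.ceil(images / batch_size)
print(transfers)  # 3125000
```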

Batching does help convergence, but it does not really affect the IO pattern, which is still random reads with very few writes. It can, however, change the outputs the framework creates.

Data storage and deep learning

As noted at the outset, deep learning algorithms require a great deal of data. The amount depends on the goals of the algorithm and the resulting network model, but for some complex problems it can run to hundreds of millions of input samples. In general, the more data used to train the model, and the more diverse that data, the better the final trained model. This points to very large data sets.

In the past, it has been observed that data becomes colder and colder: after data is created, it is rarely used again. Examinations of data workloads, including engineering and enterprise data, have revealed some very interesting trends:

• Both workloads are more write-oriented. Read-to-write byte ratios have dropped significantly (from 4:1 to 2:1).

• Read-write access patterns have increased 30-fold relative to read-only and write-only access patterns.

• Files are rarely reopened. More than 66% are reopened only once, and 95% fewer than five times.

• More than 90% of the active storage space went untouched during the study period.

• A small percentage of clients accounts for a large share of file activity. Fewer than 1% of clients account for 50% of file requests.

The overall data usage is easy to summarize:

• The IO pattern is heavily write-oriented.

• Data is rarely reused, but it is kept around.

A closer study of the deep learning IO pattern reveals that it is almost the opposite of traditional engineering, HPC, and enterprise applications. Deep learning puts heavy stress on read IO: data is reused repeatedly while the model is designed and trained. Even after the model is trained, new data must still be added to the existing training data set, especially inputs on which the model produced errors, so that the model improves over time.
