CN111126600A - Training method of neural network model, data processing method and related product - Google Patents

Training method of neural network model, data processing method and related product

Info

Publication number
CN111126600A
CN111126600A (application CN201911324388.XA)
Authority
CN
China
Prior art keywords
data
network model
training
weight
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911324388.XA
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201911324388.XA priority Critical patent/CN111126600A/en
Publication of CN111126600A publication Critical patent/CN111126600A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods

Abstract

The application relates to a training method of a neural network model, a data processing method and a related product. The method comprises the following steps: obtaining an initial network model; wherein the initial network model is a neural network model; inputting training data into the initial network model for training to obtain a target network model; the training data includes at least one of voice data, text data, and image data. By adopting the method, the processing efficiency of the voice data, the text data and the image data can be improved, and the power consumption of the processor can be reduced.

Description

Training method of neural network model, data processing method and related product
Technical Field
The present application relates to the field of computer technologies, and in particular, to a training method for a neural network model, a data processing method, and a related product.
Background
With the development of neural network technology, deep learning frameworks such as Caffe have been widely applied.
A Caffe-based neural network model can be trained to process data such as images, speech, and text to obtain a desired recognition result, for example, recognizing images to extract image features, or recognizing speech to obtain control commands. In traditional neural network models, however, the amount of data to be processed keeps growing, so the energy consumption overhead of the processor during data processing and computation is large.
Disclosure of Invention
Therefore, in order to solve the above technical problems, it is necessary to provide a training method of a neural network model, a data processing method, an apparatus, a processor, a chip, a board, and an electronic device, which can reduce the processor energy consumption overhead.
In a first aspect, an embodiment of the present application provides a training method for a neural network model, which is applied in a computing platform including a processor, and the method includes:
obtaining an initial network model; wherein the initial network model is a neural network model;
inputting training data into the initial network model for training using a data sparsification method to obtain a target network model;
wherein data sparsification sets part of the operation data to zero to obtain new, sparsified operation data, and performs the operation with the new operation data; the training data includes at least one of voice data, text data, and image data.
In one embodiment, the data sparsification comprises weight data sparsification and/or neuron data sparsification.
In one embodiment, the weight data sparsification includes static weight data sparsification, and inputting training data into the initial network model for training using a data sparsification method to obtain the target network model includes:
inputting the training data into the initial network model for training using a static weight data sparsification method to obtain a target network model;
wherein static weight data sparsification is an operation that, at each iterative operation during training, updates part of the weight data in the neural network model to zero according to a preset sparse condition and performs the operation with the result as new weight data; the sparse condition is set according to a preset weight threshold and/or a preset weight sparsity rate.
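A minimal sketch of the static scheme just described, in plain Python: once a weight is pruned it stays zero in every later iteration. The pruned-index set and the threshold value are illustrative choices, not from the patent.

```python
def static_sparsify_step(weights, pruned, threshold=0.001):
    """One training iteration of static weight sparsification: any weight
    below the threshold is pruned permanently (the pruned set only grows),
    and pruned weights stay zero in all later iterations."""
    for i, w in enumerate(weights):
        if abs(w) < threshold:
            pruned.add(i)
    return [0.0 if i in pruned else w for i, w in enumerate(weights)], pruned

weights = [0.5, 0.0004, -0.2, 0.3]
pruned = set()
weights, pruned = static_sparsify_step(weights, pruned)
weights[1] = 0.1          # a later gradient update tries to revive the weight
weights, pruned = static_sparsify_step(weights, pruned)
print(weights)            # the pruned weight remains zero: [0.5, 0.0, -0.2, 0.3]
```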
In one embodiment, the weight data sparsification includes dynamic weight data sparsification, and inputting training data into the initial network model for training using a data sparsification method to obtain the target network model includes:
inputting the training data into the initial network model for training using a dynamic weight data sparsification method to obtain a target network model;
wherein dynamic weight data sparsification is an operation that, at each iterative operation during training, multiplies an obtained sparsifying matrix by the corresponding weight data and uses the product as the weight data of the current operation; the sparsifying matrix is obtained by traversing the weight data according to a preset sparse condition, updating data that meet the sparse condition to 0 and data that do not meet the sparse condition to 1; the sparse condition is set according to a preset weight threshold and/or a preset weight sparsity rate.
In one embodiment, the method further comprises:
and adjusting the sparse condition according to the output result corresponding to the training data.
In one embodiment, the adjusting the sparse condition according to the output result corresponding to the training data includes:
if the degree of difference between the output result and a reference result is greater than or equal to a preset difference threshold, adjusting the sparse condition toward a lower degree of sparsity; and/or
if the degree of difference between the output result and the reference result is smaller than the difference threshold, adjusting the sparse condition toward a higher degree of sparsity.
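One way this adjustment rule could look in code, with a weight threshold standing in for the sparse condition (the difference threshold and step factor are illustrative, not specified by the patent):

```python
def adjust_sparse_condition(threshold, output, reference,
                            diff_threshold=0.05, step=2.0):
    """Adjust the sparse condition from a training output: a large error
    lowers the degree of sparsity (smaller threshold), while a small
    error raises it (larger threshold)."""
    if abs(output - reference) >= diff_threshold:
        return threshold / step    # reduce sparsity to recover accuracy
    return threshold * step        # increase sparsity to save computation

print(adjust_sparse_condition(0.001, output=0.90, reference=0.99))  # loosened
print(adjust_sparse_condition(0.001, output=0.98, reference=0.99))  # tightened
```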
In one embodiment, the inputting training data into the initial network model for training by using a data sparsification method to obtain a target network model includes:
inputting the training data into the initial network model for training by adopting a neuron data sparsification method to obtain a target network model;
wherein, at each iterative operation during training, neuron data sparsification sets part of the neuron data to zero according to the hyper-parameter corresponding to each network layer to obtain new neuron data, and performs the operation with the new neuron data.
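A sketch of per-layer neuron sparsification under the assumption that each layer's hyper-parameter is the fraction of smallest-magnitude activations to zero out (the layer names and ratios below are illustrative, not from the patent):

```python
def sparsify_neurons(activations, layer_ratio):
    """Zero out the fraction `layer_ratio` of smallest-magnitude neuron
    outputs for one layer; each layer carries its own hyper-parameter."""
    k = int(len(activations) * layer_ratio)
    smallest = set(sorted(range(len(activations)),
                          key=lambda i: abs(activations[i]))[:k])
    return [0.0 if i in smallest else a for i, a in enumerate(activations)]

# Per-layer hyper-parameters (illustrative values).
layer_ratios = {"conv1": 0.25, "fc1": 0.5}
acts = [0.9, 0.05, -0.4, 0.01]
print(sparsify_neurons(acts, layer_ratios["fc1"]))  # [0.9, 0.0, -0.4, 0.0]
```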
In one embodiment, the method further comprises:
and adjusting the hyper-parameters according to the output result corresponding to the training data.
In one embodiment, the adjusting the hyper-parameter according to the output result corresponding to the training data includes:
if the degree of difference between the output result and a reference result is greater than or equal to a preset difference threshold, decreasing the hyper-parameter; and/or
if the degree of difference between the output result and the reference result is smaller than the difference threshold, increasing the hyper-parameter.
In one embodiment, inputting the training data into the initial network model for training with the neuron data sparsification method to obtain the target network model includes:
initializing the hyper-parameter corresponding to each network layer.
In a second aspect, an embodiment of the present application provides a data processing method, which is applied in a computing platform including a processor, and the method includes:
acquiring data to be processed; the data to be processed comprises at least one of voice data, text data and image data;
and identifying the data to be processed by adopting the target network model in any embodiment to obtain an identification result.
In a third aspect, an embodiment of the present application provides an apparatus for training a neural network model, which is applied in a computing platform including a processor, and the apparatus includes:
the first acquisition module is used for acquiring an initial network model; wherein the initial network model is a neural network model;
the training module is used for inputting training data into the initial network model for training using a data sparsification method to obtain a target network model;
wherein data sparsification sets part of the operation data to zero to obtain new, sparsified operation data, and performs the operation with the new operation data; the training data includes at least one of voice data, text data, and image data.
In a fourth aspect, an embodiment of the present application provides a data processing apparatus, which is applied in a computing platform including a processor, and the apparatus includes:
the second acquisition module is used for acquiring data to be processed; the data to be processed comprises at least one of voice data, text data and image data;
and the processing module is used for identifying the data to be processed by adopting the target network model in any embodiment to obtain an identification result.
In a fifth aspect, the present application provides a processor, where the processor is configured to implement the method in any one of the foregoing embodiments.
In a sixth aspect, an embodiment of the present application provides a neural network chip, where the neural network chip includes a training device of a neural network model as described in the foregoing embodiment or a data processing device as described in the foregoing embodiment.
In a seventh aspect, an embodiment of the present application provides a board card, where the board card includes: a memory device, a receiving device and a control device and a neural network chip as described in the above embodiments;
wherein the neural network chip is respectively connected with the storage device, the control device and the receiving device;
the storage device is used for storing data;
the receiving device is used for realizing data transmission between the chip and external equipment;
and the control device is used for monitoring the state of the chip.
In one embodiment, the storage device includes multiple groups of storage units, each group connected to the chip through a bus, the storage units being DDR SDRAM;
the chip includes a DDR controller for controlling data transmission and data storage of each storage unit;
the receiving device is a standard PCIe interface.
In an eighth aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a chip as described in the foregoing embodiments.
According to the training method and the data processing method and apparatus for a neural network model, and the processor, chip, board card, and electronic device described above, the processor obtains an initial network model and inputs training data into it for training using a data sparsification method to obtain a target network model. Because the initial network model is a neural network model, and data sparsification sets part of the operation data to zero, obtains new sparsified operation data, and performs the operation with that new data, training the initial network model with the data sparsification method greatly reduces the amount of operation data and the amount of computation; meanwhile, the reduced data volume greatly reduces data movement and the storage space required, so the efficiency of processing training data in the neural network model is greatly improved and the power consumption of the processor is reduced. Because the training data includes at least one of voice data, text data, and image data, the processor, by processing voice, text, and image data with the target network model, can reduce the amount of operation data and computation, further save storage space, greatly improve the processing efficiency of voice, text, and image data, and reduce the power consumption of the processor.
Drawings
FIG. 1 is a diagram illustrating an internal structure of a computer device according to an embodiment;
FIG. 2 is a schematic flow chart of a method for training a neural network model according to an embodiment;
FIG. 3 is a flow diagram illustrating a data processing method, according to an embodiment;
FIG. 4 is a schematic structural diagram of a training apparatus for a neural network model according to yet another embodiment;
FIG. 5 is a schematic diagram of a data processing apparatus according to another embodiment;
fig. 6 is a schematic structural diagram of a board card according to an embodiment.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, not all embodiments of the present disclosure. All other embodiments, which can be derived by one skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the description and drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
The term "if" may be interpreted as "when", "upon", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
The training method and the data processing method for a neural network model provided by the embodiments of the present application can be applied to the computer device shown in fig. 1, which may include a processor. Optionally, the processor may be an artificial intelligence processor or a deep learning processor; the type of processor is not limited in the embodiments of the present application. It should be noted that the execution subject of the methods provided by the embodiments of the present application may be a board carrying a processor, or an electronic device including that board. In the following method embodiments, the execution subject is described as a processor by way of example.
Those skilled in the art will appreciate that the architecture shown in fig. 1 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than shown, combine certain components, or arrange components differently.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a flowchart illustrating a training method of a neural network model according to an embodiment, which may be applied to a computing platform including a processor. The method comprises the following steps:
s10, obtaining an initial network model; wherein the initial network model is a neural network model.
Specifically, the processor obtains the initial network model, optionally, the processor downloads a pre-established initial neural network model from a database, or receives an initial network model sent by other devices, or initializes or trains the downloaded neural network model in other manners to obtain the initial network model. The initial network model may include any combination of a plurality of network layers, such as a convolutional layer, a normalization layer, a pooling layer, and a full connection layer.
S20, inputting training data into the initial network model for training by adopting a data sparsization method to obtain a target network model; the data thinning is to set part of data in the operation data to be zero to obtain thinned new operation data, and to perform operation by adopting the new operation data; the training data includes at least one of voice data, text data, and image data.
It should be noted that the training data may include one or a combination of voice data, text data, and image data, such as a photograph, a piece of speech, a piece of video, or text. Generally, as the data in a neural network grows, the redundancy of the network's data increases, and within the redundant operation data some entries have little influence on the operation result. Data sparsification sets part of the operation data to zero, for example data with little influence on the operation result, to obtain new sparsified operation data, and then performs the operation with the sparsified data, thereby greatly reducing the data amount and computation of the operation.
Specifically, the processor inputs training data into an initial network model, trains the initial network model by a data thinning method, and in the operation process of each network layer, the processor may perform thinning processing on all or part of operation data to be operated, such as weight data and/or neuron data, to obtain thinned operation data, perform operation of the network layer based on the thinned operation data, and perform iterative computation for multiple network layers and multiple times, thereby obtaining a trained target network model.
In this embodiment, the processor obtains the initial network model and inputs training data into it for training using a data sparsification method to obtain a target network model, thereby implementing fine-tuning retraining of the network model. Because the initial network model is a neural network model, and data sparsification sets part of the operation data to zero, obtains new sparsified operation data, and performs the operation with that new data, training the initial network model with the data sparsification method greatly reduces the amount of operation data and computation; meanwhile, the reduced data volume greatly reduces data movement and the storage space required, so the efficiency of processing training data in the neural network model is greatly improved and the power consumption of the processor is reduced. Because the training data includes at least one of voice data, text data, and image data, the processor, by processing voice, text, and image data with the target network model, can reduce the amount of operation data and computation, further save storage space, greatly improve the processing efficiency of voice, text, and image data, and reduce the power consumption of the processor.
Optionally, on the basis of the foregoing embodiment, data sparsification includes weight data sparsification and/or neuron data sparsification. Specifically, the processor can train the initial network model with a weight data sparsification method to obtain the target network model; in this process, weights close to 0 can be set to 0, greatly sparsifying the weight data and thereby greatly reducing data movement, data volume, and computation, improving efficiency, and lowering power consumption. The processor can also train the initial network model with a neuron data sparsification method to obtain the target network model; in this process, neuron data close to 0 can be set to 0, greatly sparsifying the neuron data with the same reductions in data movement, data volume, and computation. The processor can also combine weight data sparsification with neuron data sparsification to train the initial network model: the operation data are sparsified along both dimensions, sparse neuron data and sparse weight data, and the operation is performed with both to obtain the result. This approach sparsifies the operation data sufficiently while ensuring the rationality and accuracy of data processing.
Optionally, on the basis of the foregoing embodiment, the weight data thinning may include static thinning of the weight data, and one possible implementation manner of step S20 includes: inputting the training data into the initial network model for training by adopting a weight data static sparsification method to obtain a target network model; wherein, the static sparsity of the weight data is the operation of updating part of the weight data in the neural network model to zero and operating as new weight data according to the preset sparse condition when each iterative operation in the training process; the sparse condition is a condition set according to a preset weight threshold and/or a preset weight sparse rate.
It should be noted that static weight data sparsification is an operation that, at each iterative operation during training, updates part of the weight data in the neural network model to zero according to a preset sparse condition and performs the operation with the result as new weight data. In this process, a weight that has been updated to zero always remains zero, and the new weight data participate as the weight data in subsequent operations and iterations. Specifically, the processor inputs the training data into the initial network model for training with the static weight data sparsification method to obtain the target network model. The sparse condition may be set according to a weight threshold: weights whose absolute value is greater than or equal to the threshold are kept unchanged, while weights whose absolute value is below the threshold are considered to have little influence on the result and are directly updated to zero. Optionally, the weight threshold may be a number close to 0, for example 0.001, though other values are of course possible. The sparse condition may also be set according to a preset sparsity rate: for example, with a sparsity rate of twenty percent, the processor may sort a group of one hundred weights by absolute value from large to small and update the last twenty to zero, thereby obtaining new weight data with the sparsity rate as the sparse condition. Optionally, the processor may also combine the weight threshold and the sparsity rate to update part of the weight data to zero and operate with the result as new weight data: the sparsity achieved by thresholding is computed and compared with the preset sparsity rate; if the sparsity achieved by the threshold is greater than the preset sparsity rate, the weight threshold is used as the sparse condition, and if it is smaller than the preset sparsity rate, the preset sparsity rate is used as the sparse condition. In this way, the sparse condition with the higher degree of sparsity is selected from the weight threshold and the preset sparsity rate, further reducing the data volume and computation, further improving data processing efficiency, and further lowering the power consumption of the processor. Optionally, the sparsity rate and the weight threshold may be set as needed: if high output precision is required, the sparsity rate or the weight threshold may be reduced; if higher computational efficiency and lower processor power consumption are required, the sparsity rate or the weight threshold may be increased.
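The threshold-versus-rate selection described above, choosing whichever condition prunes more, can be sketched as follows (plain Python; the example weights and parameter values are illustrative):

```python
def choose_sparse_condition(weights, threshold, preset_rate):
    """Compare the sparsity the weight threshold would achieve with the
    preset sparsity rate and prune with whichever condition is sparser."""
    below = [i for i, w in enumerate(weights) if abs(w) < threshold]
    achieved_rate = len(below) / len(weights)
    if achieved_rate >= preset_rate:
        pruned = set(below)                      # threshold prunes more
    else:                                        # preset rate prunes more
        k = int(len(weights) * preset_rate)
        pruned = set(sorted(range(len(weights)),
                            key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]

w = [0.5, 0.0004, -0.2, 0.3]
# The threshold alone would prune only 25%; the preset rate of 50% prunes more.
print(choose_sparse_condition(w, threshold=0.001, preset_rate=0.5))
# [0.5, 0.0, 0.0, 0.3]
```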
In this embodiment, the processor inputs training data into the initial network model for training with the static weight data sparsification method to obtain the target network model. Static weight data sparsification updates part of the weight data to zero at each iterative operation according to a preset sparse condition, set according to a preset weight threshold and/or a preset weight sparsity rate, and operates with the result as new weight data, so a weight updated to zero always remains zero and the new weight data participate in subsequent operations and iterations. The degree of weight sparsity therefore keeps increasing as training iterations proceed, continuously reducing the data volume, computation, and data movement, further improving data processing efficiency and reducing the power consumption of the processor.
Experimental results show that after static sparsification of the weight data, the nonzero weight data of the AlexNet network can be compressed to about 11% (see the per-layer sparsity of the AlexNet network in Table 1.1), and the nonzero weight data of the VGG network can be compressed to about 7.5% (see the per-layer sparsity of the VGG network in Table 1.2).
TABLE 1.1 Sparsity of each layer of the AlexNet network
[Table provided as an image in the original; not reproduced.]
TABLE 1.2 Sparsity of each layer of the VGG network
[Table provided as an image in the original; not reproduced.]
Optionally, on the basis of the foregoing embodiment, the weight data sparsification includes dynamic weight data sparsification, and a possible implementation of step S20 may further include: inputting the training data into the initial network model for training using a dynamic weight data sparsification method to obtain a target network model; wherein dynamic weight data sparsification is an operation that, at each iterative operation during training, multiplies an obtained sparsifying matrix by the corresponding weight data and uses the product as the weight data of the current operation. The sparsifying matrix is obtained by traversing the weight data according to a preset sparse condition, updating data that meet the sparse condition to 0 and data that do not meet the sparse condition to 1; the sparse condition is set according to a preset weight threshold and/or a preset weight sparsity rate.
It should be noted that, in the training process, when the processor performs an operation on each network layer, it needs to acquire the sparse matrix corresponding to that network layer. The acquisition process may specifically include: traversing the weight data of the network layer according to a preset sparse condition, updating the data that meet the sparse condition to 0 and the data that do not meet it to 1, thereby obtaining a sparse matrix consisting of 0s and 1s. The sparse condition is a condition set according to a preset weight threshold and/or a preset weight sparsity rate. Optionally, the processor may update the weight data whose absolute value is greater than or equal to the weight threshold to 1, and the weight data whose absolute value is less than the weight threshold to 0. Optionally, the processor may also set the condition according to a preset sparsity rate; for example, with a sparsity rate of twenty percent, the processor may sort a group of one hundred weight values from large to small, update the twenty values ranked last to 0, and update the other eighty values to 1, thereby obtaining the sparse matrix. The processor may also combine the weight threshold with the sparsity rate: the sparsity ratio achieved by sparsifying with the weight threshold is calculated and compared with the preset sparsity rate; if the sparsity ratio obtained with the weight threshold is larger than the preset sparsity rate, the weight threshold is taken as the sparse condition, and if it is smaller than the preset sparsity rate, the preset sparsity rate is taken as the sparse condition. In this way, whichever of the weight threshold and the preset sparsity rate yields the higher degree of sparsity is selected, so that the data volume and the amount of computation are further reduced, the efficiency of data processing is further improved, and the power consumption of the processor is further reduced.
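As an illustrative sketch only (not the patent's implementation), the sparse-matrix construction described above can be expressed in NumPy. The function name, the tie-breaking rule between the two conditions, and the sample values are assumptions:

```python
import numpy as np

def sparse_matrix(weights, weight_threshold=None, sparsity_rate=None):
    """Build the 0/1 sparse matrix described above (illustrative sketch).

    weight_threshold: positions with |w| < threshold become 0, others 1.
    sparsity_rate:    the smallest `sparsity_rate` fraction of |w| becomes 0.
    If both are given, the mask that zeroes more weights (higher sparsity) wins.
    """
    candidates = []
    if weight_threshold is not None:
        candidates.append((np.abs(weights) >= weight_threshold).astype(np.int8))
    if sparsity_rate is not None:
        k = int(round(sparsity_rate * weights.size))  # how many weights to zero
        order = np.argsort(np.abs(weights).ravel())   # ascending by magnitude
        mask = np.ones(weights.size, dtype=np.int8)
        mask[order[:k]] = 0                           # zero the k smallest
        candidates.append(mask.reshape(weights.shape))
    # pick the condition that yields the higher degree of sparsity (fewer 1s)
    return min(candidates, key=lambda m: int(m.sum()))

w = np.array([0.05, -0.30, 0.80, -0.02, 0.40])
m = sparse_matrix(w, weight_threshold=0.10, sparsity_rate=0.20)
# the threshold mask zeroes 2 of 5 weights; the 20% rate mask zeroes only 1,
# so the threshold wins here: m == [0, 1, 1, 0, 1]
```

Multiplying `m` element-wise by the weight data then produces the weight data used in the current operation.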
In this embodiment, dynamic sparsification of the weight data works as follows. During each iterative operation in the training process, part of the weight data is not directly updated to zero; instead, a sparse matrix composed of 0s and 1s is acquired, and it is acquired anew in each iteration rather than being fixed. The processor uses the product of the sparse matrix and the corresponding weight data as the weight data of the current operation, so any weight multiplied by a 0 in the sparse matrix contributes 0, and the operation data are thereby sparsified to a certain extent. In this method, the processor inputs the training data into the initial network model for training with dynamic weight data sparsification to obtain the target network model. Because the weight data are never directly overwritten with zero, and a different sparse matrix is acquired each time, a weight that has been sparsified out (participating in the operation as zero) can return to its non-sparsified state (its non-zero value is restored) in a later iteration. The set of weight data participating in the operation therefore changes dynamically during training, and a given weight may or may not be zero at any iteration. Training with dynamic weight data sparsification thus greatly improves the stability of the training process, offers higher flexibility, and supports richer application scenarios.
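The dynamic behaviour just described (dense weights retained in full, a fresh mask each iteration, zeroed weights able to return) can be sketched as follows; the threshold value, matrix size, and the stand-in update rule are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
W = rng.normal(size=(4, 4))        # dense weights are always kept in full precision
threshold = 0.5                    # assumed sparse condition (weight threshold)

for step in range(3):
    # a new sparse matrix is acquired at EVERY iteration, not fixed once
    mask = (np.abs(W) >= threshold).astype(W.dtype)
    W_used = mask * W              # weights actually used in this iteration's ops
    # ... forward/backward pass with W_used would go here ...
    W += 0.05 * rng.normal(size=W.shape)   # stand-in for a gradient update on W
# because W itself was never zeroed, a weight masked out in one iteration can
# re-enter once its magnitude crosses the threshold again
```

This is the key difference from static sparsification, where the masked weights themselves would be overwritten with zero and could not recover.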
Experimental results show that, after dynamic sparsification of the weight data, the amount of non-zero weight data of the AlexNet network is roughly halved compared with static sparsification; the sparsity of each layer of the AlexNet network is shown in Table 1.3.
TABLE 1.3 Sparsity of each layer of the AlexNet network

Layer   Number of weights   Non-sparse weights (static)   Non-sparse weights (dynamic)
conv1   35K                 84%                           53.80%
conv2   307K                38%                           40.60%
conv3   885K                35%                           29.00%
conv4   664K                37%                           32.30%
conv5   443K                37%                           32.50%
fc1     38M                 9%                            3.70%
fc2     17M                 9%                            6.60%
fc3     4M                  25%                           4.60%
Total   61M                 11%                           5.70%
Optionally, the model training process described in each of the above embodiments may further include adjusting parameters according to the output result of the model for each training datum; optionally, this may include adjusting the sparse condition according to the output result corresponding to the training data. Specifically, the processor adjusts the sparse condition according to the output result of each iterative computation so that the output result meets the training target. Adjusting the sparse condition changes the degree of data sparsity, reducing the data volume, amount of computation, data transfer and storage space while ensuring the accuracy of the output result. Optionally, one possible implementation may include the following. If the degree of difference between the output result and the reference result is greater than or equal to a preset difference threshold (for example, the value of the convergence function of the output result and the reference result does not satisfy the convergence range), the processor determines that the difference is too large because the degree of data sparsity is too high, and therefore adjusts the sparse condition toward reduced sparsity, so that the accuracy of the output result is ensured. Conversely, if the degree of difference between the output result and the reference result is smaller than the preset difference threshold (for example, the convergence function of the output result and the reference result satisfies the convergence range), the degree of data sparsity is determined to meet the precision or accuracy requirement of the output result, but there may still be room for optimization; the processor may therefore adjust the sparse condition toward increased sparsity, further reducing the data volume, amount of computation, data transfer and storage space, thereby improving efficiency and reducing power consumption. Optionally, the sparsity ratio may be characterized as the ratio of the number of valid (i.e., non-zero) data after sparsification to the number of data before sparsification.
Optionally, adjusting the sparse condition of the weight data toward increased sparsity may mean increasing the sparsity rate and/or increasing the weight threshold, so that more weights are updated to 0; adjusting it toward reduced sparsity may mean decreasing the sparsity rate and/or decreasing the weight threshold, so that fewer weights are updated to 0. The degree of data sparsity can thus be adjusted via the sparsity rate or the weight threshold, which makes the method more flexible to apply.
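A minimal sketch of the adjustment rule just described, with the function name, step size and sample values as assumptions:

```python
def adjust_sparse_condition(weight_threshold, difference, difference_limit,
                            step=0.01):
    """Move the weight threshold per the adjustment rule (illustrative sketch)."""
    if difference >= difference_limit:
        # output too far from the reference: reduce sparsity (fewer zeros)
        return max(weight_threshold - step, 0.0)
    # accuracy acceptable: try to increase sparsity (more zeros)
    return weight_threshold + step

t_lowered = adjust_sparse_condition(0.10, difference=0.5, difference_limit=0.2)
t_raised = adjust_sparse_condition(0.10, difference=0.1, difference_limit=0.2)
```

The same shape of rule applies when the sparse condition is a sparsity rate rather than a weight threshold.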
The above embodiment describes in detail a specific implementation manner of weight data sparsification in the training process, and the following describes in detail training by using a neuron data sparsification method.
In an embodiment, another possible implementation of step S20 may include: inputting the training data into the initial network model for training by a neuron data sparsification method to obtain the target network model. Here, during each iterative operation in the training process, neuron data sparsification sets part of the neuron data to zero according to the hyper-parameter corresponding to each network layer, obtains new neuron data, and performs the operation on the new neuron data. It should be noted that neuron data sparsification is an operation of updating part of the neuron data in the neural network model to zero according to a preset sparse condition during each iterative operation and operating on the result as the new neuron data. Optionally, the sparse condition may be a sparsity rate, or a condition set according to a preset hyper-parameter t. Sparsifying the neuron data by a sparsity rate may follow the method described above for sparsifying the weight data by a sparsity rate; the two sparsity rates may of course be the same or different. Sparsifying the neuron data by the hyper-parameter may be realized by keeping the neuron data whose absolute value is greater than or equal to the hyper-parameter unchanged and updating the neuron data whose absolute value is less than the hyper-parameter to 0. It should be noted that each network layer may have such a hyper-parameter, and the hyper-parameters of different network layers may be the same or different. The processor sparsifies the neuron data of each network layer according to that layer's hyper-parameter, so the degree of neuron sparsity can be adjusted via the size of the hyper-parameter, which makes it convenient to balance the accuracy of the output result against the amount of computation.
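The per-layer hyper-parameter rule can be sketched as follows; the layer names, t values and input are hypothetical placeholders, not values from the patent:

```python
import numpy as np

def sparsify_neurons(activations, t):
    """Keep |x| >= t unchanged; update |x| < t to 0 (illustrative sketch)."""
    return np.where(np.abs(activations) >= t, activations, 0.0)

# hypothetical per-layer hyper-parameters; each layer may have its own t
layer_t = {"conv1": 0.05, "fc1": 0.10}

x = np.array([0.02, -0.30, 0.07, 0.00])
y = sparsify_neurons(x, layer_t["conv1"])   # -> [0.0, -0.3, 0.07, 0.0]
```

A larger t zeroes more neuron data (higher sparsity, less computation); a smaller t preserves more data (higher accuracy).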
According to experimental results, the sparsity of input data induced by the ReLU activation function in different layers of a deep learning algorithm is between 30% and 70%. The neuron data may be sparsified in every network layer or only in some network layers; for example, each ReLU layer may be implemented with an added hyper-parameter t, which is not limited in this embodiment.
In this embodiment, the processor inputs the training data into the initial network model for training by a neuron data sparsification method to obtain the target network model. During each iterative operation in the training process, neuron data sparsification sets part of the neuron data to zero according to the hyper-parameter of each network layer to obtain new neuron data and performs the operation on the new neuron data. As training iterations progress, the degree of neuron sparsity keeps improving, and the data volume, amount of computation and data transfer keep decreasing, which further improves data-processing efficiency, reduces storage space, and reduces the power consumption of the processor.
Optionally, the value of the hyper-parameter t is adjusted at each training iteration, and the sparsity of the neuron data input to the convolutional layer or fully-connected layer following a ReLU layer can be controlled by the value of t.
Optionally, on the basis of the above embodiment of training the network model by the neuron data sparsification method, the method may further include adjusting the hyper-parameter according to the output result of the model for each training datum. Specifically, the processor adjusts the hyper-parameter of each network layer according to the output result of each iterative computation so that the output result meets the training target; changing the hyper-parameter changes the degree of neuron sparsity, reducing the data volume, amount of computation, data transfer and storage space while ensuring the accuracy of the output result. Optionally, one possible implementation may include the following. If the degree of difference between the output result and the reference result is greater than or equal to a preset difference threshold (or the convergence function of the output result and the reference result does not satisfy the convergence range), the difference is considered too large because the degree of data sparsity is too high, so the processor reduces the hyper-parameter, lowering the degree of sparsity to ensure the accuracy of the output result. If the degree of difference is smaller than the difference threshold (or the convergence function satisfies the convergence range), the degree of sparsity is considered to meet the precision or accuracy requirement of the output result but may still leave room for optimization, so the processor may increase the hyper-parameter, raising the degree of sparsity to further reduce the data volume, amount of computation, data transfer and storage space, thereby further improving efficiency and reducing power consumption.
Optionally, before step S20, the method may further include initializing the hyper-parameter corresponding to each network layer. Specifically, before model training, the processor may initialize the hyper-parameter of each network layer to a reasonable initial value so that the trained target network model is more reasonable. Optionally, the hyper-parameter may be initialized to a small value, for example zero, and then tuned upward from 0 in steps during the training process; this achieves the maximum sparsity of the neuron data while keeping the accuracy of the output result within requirements, making the trained target network model, and thus its output results, more accurate.
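The initialize-at-zero-and-step-upward schedule can be sketched as follows; the step size and the stand-in accuracy checks are assumptions for illustration:

```python
t = 0.0           # hyper-parameter initialized to zero before training
step = 0.01       # assumed adjustment step per iteration

# stand-in accuracy checks, one per training iteration
accuracy_ok = [True, True, False, True]
for ok in accuracy_ok:
    if ok:
        t += step                 # accuracy within requirements: sparsify more
    else:
        t = max(t - step, 0.0)    # accuracy degraded: back off
# t has been pushed as high as the accuracy requirement allowed
```

Starting from zero means the first iterations run without neuron sparsification, so accuracy is established before sparsity is traded against it.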
An embodiment of the present application further provides a data processing method, which is applied to a computing platform including a processor, and may refer to the process shown in fig. 3, where the method includes:
S30, acquiring data to be processed; the data to be processed includes at least one of voice data, text data, and image data.
And S40, recognizing the data to be processed by adopting the neural network model according to any one of the embodiments to obtain a recognition result.
For the technical principle and effect of the data processing method provided by this embodiment, reference may be made to the descriptions in the above embodiments. Using the neural network model of the embodiments, i.e., the trained target network model, to process the data to be processed and obtain the recognition result greatly reduces the volume of data and the amount of computation during data processing; the reduced data volume in turn greatly reduces data transfer and the storage space required. Since the data to be processed includes at least one of voice data, text data and image data, processing such data with the target network model reduces the data and computation volumes, further saves storage space, greatly improves the processing efficiency of voice, text and image data, and reduces the power consumption of the processor. The data processing method may extract feature vectors from image data to identify the image, for example to recognize an object in the image or the type of the image; it may also recognize or convert voice data, or recognize or convert text data so as to identify semantics, which is not limited in the embodiments of the present application.
It should be understood that, although the steps in the flow charts of fig. 2-3 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2-3 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different times, and their execution order is not necessarily sequential; they may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided an apparatus for training a neural network model, which is applied in a computing platform including a processor, the apparatus including:
a first obtaining module 100, configured to obtain an initial network model; wherein the initial network model is a neural network model;
a training module 200, configured to input training data into the initial network model for training by using a data sparsification method, so as to obtain a target network model;
the data sparsification sets part of the operation data to zero to obtain sparsified new operation data, and the operation is performed using the new operation data; the training data includes at least one of voice data, text data, and image data.
In one embodiment, the data sparsification comprises weight data sparsification and/or neuron data sparsification.
In one embodiment, the weight data sparsification includes weight data static sparsification, and the training module 200 is specifically configured to input the training data into the initial network model for training by using a method of weight data static sparsification to obtain a target network model;
wherein the static sparsification of the weight data is an operation of updating part of the weight data in the neural network model to zero according to a preset sparse condition during each iterative operation in the training process and operating with the result as the new weight data; the sparse condition is a condition set according to a preset weight threshold and/or a preset weight sparsity rate.
In one embodiment, the weight data sparsification includes dynamic sparsification of the weight data, and the training module 200 is specifically configured to input the training data into the initial network model for training by using a method of dynamic weight data sparsification, so as to obtain a target network model;
wherein the dynamic sparsification of the weight data is an operation in which, during each iterative operation in the training process, the product of an acquired sparse matrix and the corresponding weight data is used as the weight data of the current operation; the sparse matrix is obtained by traversing the weight data according to a preset sparse condition, updating data in the weight data that meet the sparse condition to 0, and updating data that do not meet the sparse condition to 1; the sparse condition is a condition set according to a preset weight threshold and/or a preset weight sparsity rate.
In an embodiment, the training module 200 is further configured to adjust the sparse condition according to an output result corresponding to the training data.
In an embodiment, the training module 200 is specifically configured to adjust the sparsity condition in a direction of reducing sparsity when a difference between the output result and the reference result is greater than or equal to a preset difference threshold; and/or when the difference degree between the output result and the reference result is smaller than the difference degree threshold value, adjusting the sparse condition to the direction of improving the sparse degree.
In an embodiment, the training module 200 is specifically configured to input the training data into the initial network model for training by using a neuron data sparsification method, so as to obtain a target network model;
wherein, during each iterative operation in the training process, the neuron data sparsification sets part of the data in the neuron data to zero according to the hyper-parameter corresponding to each network layer to obtain new neuron data, and the operation is performed according to the new neuron data.
In an embodiment, the training module 200 is further configured to adjust the hyper-parameter according to an output result corresponding to the training data.
In an embodiment, the training module 200 is specifically configured to decrease the hyper-parameter when a difference between the output result and the reference result is greater than or equal to a preset difference threshold; and/or when the difference degree of the output result and the reference result is smaller than the difference degree threshold value, increasing the super parameter.
In an embodiment, the training module 200 is further configured to perform an initialization operation on the hyper-parameter corresponding to each network layer.
In one embodiment, as shown in fig. 5, there is provided a data processing apparatus for use in a computing platform including a processor, the apparatus comprising:
a second obtaining module 300, configured to obtain data to be processed; the data to be processed comprises at least one of voice data, text data and image data;
the processing module 400 is configured to identify the data to be processed by using the target network model according to any of the embodiments described above, so as to obtain an identification result.
For specific limitations of the training apparatus and the data processing apparatus of the neural network model, reference may be made to the above limitations of the training method and the data processing method of the neural network model, respectively, and details are not repeated here. The modules in the training device and the data processing device of the neural network model can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
An embodiment of the present application further provides a processor, where the processor is configured to implement the following steps:
obtaining an initial network model; wherein the initial network model is a neural network model;
inputting training data into the initial network model for training by adopting a data sparsification method to obtain a target network model;
the data sparsification sets part of the operation data to zero to obtain sparsified new operation data, and the operation is performed using the new operation data; the training data includes at least one of voice data, text data, and image data.
In one embodiment, the data sparsification comprises weight data sparsification and/or neuron data sparsification.
In one embodiment, the weight data sparsification comprises a weight data static sparsification, and the processor is further configured to implement the steps of:
inputting the training data into the initial network model for training by adopting a weight data static sparsification method to obtain a target network model;
wherein the static sparsification of the weight data is an operation of updating part of the weight data in the neural network model to zero according to a preset sparse condition during each iterative operation in the training process and operating with the result as the new weight data; the sparse condition is a condition set according to a preset weight threshold and/or a preset weight sparsity rate.
In one embodiment, the weight data sparsification comprises dynamic sparsification of the weight data, and the processor is further configured to implement the following steps:
inputting the training data into the initial network model for training by adopting a method of dynamic sparsification of weight data to obtain a target network model;
wherein the dynamic sparsification of the weight data is an operation in which, during each iterative operation in the training process, the product of an acquired sparse matrix and the corresponding weight data is used as the weight data of the current operation; the sparse matrix is obtained by traversing the weight data according to a preset sparse condition, updating data in the weight data that meet the sparse condition to 0, and updating data that do not meet the sparse condition to 1; the sparse condition is a condition set according to a preset weight threshold and/or a preset weight sparsity rate.
In one embodiment, the processor is further configured to implement the steps of:
and adjusting the sparse condition according to the output result corresponding to the training data.
In one embodiment, the processor is further configured to implement the steps of:
if the difference degree between the output result and the reference result is greater than or equal to a preset difference degree threshold value, adjusting the sparse condition to the direction of reducing the sparse degree; and/or
And if the difference degree of the output result and the reference result is smaller than the difference degree threshold value, adjusting the sparse condition to the direction of improving the sparse degree.
In one embodiment, the processor is further configured to implement the steps of:
inputting the training data into the initial network model for training by adopting a neuron data sparsification method to obtain a target network model;
wherein, during each iterative operation in the training process, the neuron data sparsification sets part of the data in the neuron data to zero according to the hyper-parameter corresponding to each network layer to obtain new neuron data, and the operation is performed according to the new neuron data.
In one embodiment, the processor is further configured to implement the steps of:
and adjusting the hyper-parameters according to the output result corresponding to the training data.
In one embodiment, the processor is further configured to implement the steps of:
if the difference degree between the output result and the reference result is greater than or equal to a preset difference degree threshold value, reducing the hyper-parameter; and/or
And if the difference degree of the output result and the reference result is smaller than the difference degree threshold value, increasing the hyper-parameter.
In one embodiment, the processor is further configured to implement the steps of:
and initializing the hyper-parameters corresponding to each network layer.
An embodiment of the present application further provides a processor, where the processor is configured to implement the following steps:
acquiring data to be processed; the data to be processed comprises at least one of voice data, text data and image data;
and identifying the data to be processed by adopting the target network model in any embodiment to obtain an identification result.
The embodiment of the application also provides a neural network chip, and the chip comprises the processor in the embodiment.
Fig. 6 is a schematic structural diagram of a board card according to an embodiment. The board may be used in an electronic device, and may include other accessories in addition to the artificial intelligence processor 389, including but not limited to: memory device 390, receiving means 391 and control device 392;
the memory device 390 is connected to the artificial intelligence processor through a bus for storing data. The memory device may include a plurality of groups of memory cells 393. Each group of the storage units is connected with the artificial intelligence processor through a bus. It is understood that each group of the memory cells may be a DDR SDRAM (Double Data Rate SDRAM). DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on the rising and falling edges of the clock pulse. DDR is twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 sets of the storage unit. Each group of the memory cells may include a plurality of DDR4 particles (chips).
In one embodiment, each group of the storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is provided in the artificial intelligence processor and is used for controlling the data transmission and data storage of each storage unit.
The receiving device is electrically connected with the artificial intelligence processor and is used for data transmission between the artificial intelligence processor and an external device (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIE interface, and the data to be processed is transmitted from the server to the artificial intelligence processor through the standard PCIE interface, thereby realizing data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may also be another interface; the present application does not limit the concrete form of such an interface, as long as the interface unit can implement the transfer function. In addition, the computation results of the artificial intelligence processor are transmitted back to the external device (e.g., the server) by the receiving device.
The control device is electrically connected with the artificial intelligence processor and is used for monitoring its state. Specifically, the artificial intelligence processor and the control device may be electrically connected through an SPI interface. The control device may include a micro controller unit (MCU). As described above, the artificial intelligence processor may comprise a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads; it can therefore be in different working states such as heavy load and light load. The control device can regulate the working states of the plurality of processing chips, processing cores and/or processing circuits in the artificial intelligence processor.
In one embodiment, an electronic device is provided, which includes the processor, chip or board described above.
The electronic device may be a data processor, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a program instructing related hardware, and the program can be stored in a non-volatile readable storage medium; when executed, the program can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (18)

1. A method for training a neural network model, applied to a computing platform comprising a processor, the method comprising:
obtaining an initial network model; wherein the initial network model is a neural network model;
inputting training data into the initial network model for training by adopting a data sparsification method to obtain a target network model;
wherein the data sparsification sets part of the operation data to zero to obtain new, sparsified operation data, and performs the operation using the new operation data; the training data includes at least one of voice data, text data, and image data.
2. The method of claim 1, wherein the data sparsification comprises weight data sparsification and/or neuron data sparsification.
3. The method according to claim 2, wherein the weight data sparsification includes weight data static sparsification, and the obtaining of the target network model by inputting training data into the initial network model for training by using a data sparsification method includes:
inputting the training data into the initial network model for training by adopting a weight data static sparsification method to obtain a target network model;
wherein the weight data static sparsification is an operation of, at each iteration of the training process, updating part of the weight data in the neural network model to zero according to a preset sparse condition and performing the operation with the result as new weight data; the sparse condition is set according to a preset weight threshold and/or a preset weight sparsity rate.
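For illustration only (not part of the claim language), a minimal sketch of the static sparsification step under the sparsity-rate variant of the sparse condition; the function and parameter names are hypothetical, and the threshold variant would instead compare each weight against a preset magnitude threshold:

```python
def static_sparsify(weights, sparsity_rate):
    """Zero out roughly the smallest-magnitude `sparsity_rate` fraction
    of the weights; the result is used as the new weight data for the
    current iteration. Ties at the cutoff may zero slightly more entries.
    """
    k = int(sparsity_rate * len(weights))
    if k == 0:
        return list(weights)
    # magnitude of the k-th smallest weight serves as the cutoff
    cutoff = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= cutoff else w for w in weights]

print(static_sparsify([0.1, 0.5, -0.2, 0.9], 0.5))  # [0.0, 0.5, 0.0, 0.9]
```

In an actual training loop this update would be applied to the model's weight tensors at each iteration, as the claim describes.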
4. The method according to claim 2, wherein the weight data sparsification includes weight data dynamic sparsification, and the obtaining of the target network model by inputting training data into the initial network model for training using a data sparsification method includes:
inputting the training data into the initial network model for training by adopting a method of dynamic sparsification of weight data to obtain a target network model;
the weight data dynamic sparsification is an operation of taking the result of multiplication of the obtained sparsifying matrix and the corresponding weight data as the weight data of the current operation for operation when each iteration operation in the training process is performed; the sparse matrix obtaining mode comprises the following steps: traversing the weight data according to a preset sparse condition, updating data which meet the sparse condition in the weight data to be 0, and updating data which do not meet the sparse condition in the weight data to be 1 to obtain the sparse matrix; the sparse condition is a condition set according to a preset weight threshold and/or a preset weight sparse rate.
5. The method according to claim 3 or 4, characterized in that the method further comprises:
adjusting the sparse condition according to the output result corresponding to the training data.
6. The method according to claim 5, wherein the adjusting the sparse condition according to the output result corresponding to the training data comprises:
if the degree of difference between the output result and a reference result is greater than or equal to a preset difference threshold, adjusting the sparse condition in the direction of reducing the degree of sparsity; and/or
if the degree of difference between the output result and the reference result is smaller than the difference threshold, adjusting the sparse condition in the direction of increasing the degree of sparsity.
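For illustration only (not part of the claim language), a minimal sketch of the feedback rule of claims 5 and 6 for a threshold-based sparse condition; the function name, parameter names, and the multiplicative `step` update are hypothetical choices, since the claims do not specify the adjustment magnitude:

```python
def adjust_sparse_threshold(threshold, diff, diff_limit, step=0.9):
    """If the output diverges too far from the reference, relax the
    sparse condition (lower threshold -> less sparsity); otherwise
    tighten it (higher threshold -> more sparsity)."""
    if diff >= diff_limit:
        return threshold * step
    return threshold / step
```

Applied once per evaluation, this drives the sparsity toward the largest value that still keeps the output difference below the preset limit.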
7. The method of claim 2, wherein the using the data sparsification method to input training data into the initial network model for training to obtain a target network model comprises:
inputting the training data into the initial network model for training by adopting a neuron data sparsification method to obtain a target network model;
wherein the neuron data sparsification is an operation of, at each iteration of the training process, setting part of the neuron data to zero according to the hyper-parameter corresponding to each network layer to obtain new neuron data, and performing the operation with the new neuron data.
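For illustration only (not part of the claim language), a minimal sketch of one way the per-layer hyper-parameter could govern neuron sparsification; interpreting the hyper-parameter as a keep ratio is an assumption, as the claims do not fix its meaning, and the function name is hypothetical:

```python
def sparsify_neurons(activations, keep_ratio):
    """Per-layer hyper-parameter `keep_ratio`: keep the largest-magnitude
    fraction of this layer's neuron outputs and zero the rest, yielding
    the new neuron data for the current iteration."""
    k = max(1, int(keep_ratio * len(activations)))
    # magnitude of the k-th largest activation serves as the cutoff
    cutoff = sorted((abs(a) for a in activations), reverse=True)[k - 1]
    return [a if abs(a) >= cutoff else 0.0 for a in activations]

print(sparsify_neurons([1.0, 0.2, -3.0, 0.5], 0.5))  # [1.0, 0.0, -3.0, 0.0]
```

Because each network layer carries its own hyper-parameter, layers that tolerate sparsity well can be thinned aggressively while sensitive layers remain nearly dense.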
8. The method of claim 7, further comprising:
adjusting the hyper-parameter according to the output result corresponding to the training data.
9. The method of claim 8, wherein the adjusting the hyper-parameter according to the output result corresponding to the training data comprises:
if the degree of difference between the output result and a reference result is greater than or equal to a preset difference threshold, decreasing the hyper-parameter; and/or
if the degree of difference between the output result and the reference result is smaller than the difference threshold, increasing the hyper-parameter.
10. The method of claim 2, wherein, before the training data is input into the initial network model for training to obtain the target network model, the neuron data sparsification method further comprises:
initializing the hyper-parameter corresponding to each network layer.
11. A data processing method, for use in a computing platform including a processor, the method comprising:
acquiring data to be processed; the data to be processed comprises at least one of voice data, text data and image data;
identifying the data to be processed by using the target network model obtained by the method according to any one of claims 1 to 10, to obtain an identification result.
12. An apparatus for training a neural network model, for use in a computing platform including a processor, the apparatus comprising:
the first acquisition module is used for acquiring an initial network model; wherein the initial network model is a neural network model;
the training module is used for inputting training data into the initial network model for training by adopting a data sparsification method to obtain a target network model;
wherein the data sparsification sets part of the operation data to zero to obtain new, sparsified operation data, and performs the operation using the new operation data; the training data includes at least one of voice data, text data, and image data.
13. A data processing apparatus for use in a computing platform including a processor, the apparatus comprising:
the second acquisition module is used for acquiring data to be processed; the data to be processed comprises at least one of voice data, text data and image data;
a processing module, configured to identify the data to be processed by using the target network model obtained by the method according to any one of claims 1 to 10, so as to obtain an identification result.
14. A processor, characterized in that the processor is configured to implement the method of any one of claims 1 to 11.
15. A neural network chip, comprising the processor of claim 14.
16. A board card, characterized in that the board card comprises: a memory device, a receiving device, a control device, and the neural network chip of claim 15;
wherein the neural network chip is connected to the memory device, the control device, and the receiving device, respectively;
the memory device is used for storing data;
the receiving device is used for implementing data transmission between the chip and an external device;
and the control device is used for monitoring the state of the chip.
17. The board card of claim 16, wherein
the memory device includes a plurality of groups of memory units, each group of memory units being connected to the chip through a bus, wherein the memory units are DDR SDRAM;
the chip includes a DDR controller for controlling data transmission to and data storage in each memory unit;
and the receiving device is a standard PCIE interface.
18. An electronic device, characterized in that it comprises a chip according to claim 15.
CN201911324388.XA 2019-12-20 2019-12-20 Training method of neural network model, data processing method and related product Pending CN111126600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911324388.XA CN111126600A (en) 2019-12-20 2019-12-20 Training method of neural network model, data processing method and related product


Publications (1)

Publication Number Publication Date
CN111126600A true CN111126600A (en) 2020-05-08

Family

ID=70500487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911324388.XA Pending CN111126600A (en) 2019-12-20 2019-12-20 Training method of neural network model, data processing method and related product

Country Status (1)

Country Link
CN (1) CN111126600A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107689224A (en) * 2016-08-22 2018-02-13 北京深鉴科技有限公司 The deep neural network compression method of reasonable employment mask
CN108229656A (en) * 2016-12-14 2018-06-29 上海寒武纪信息科技有限公司 Neural network computing device and method
CN109104876A (en) * 2017-04-20 2018-12-28 上海寒武纪信息科技有限公司 A kind of arithmetic unit and Related product
CN109740754A (en) * 2018-12-29 2019-05-10 北京中科寒武纪科技有限公司 Neural computing device, neural computing method and Related product
CN109740739A (en) * 2018-12-29 2019-05-10 北京中科寒武纪科技有限公司 Neural computing device, neural computing method and Related product
CN110490315A (en) * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 The reversed operation Sparse methods and Related product of neural network

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111490995A (en) * 2020-06-12 2020-08-04 支付宝(杭州)信息技术有限公司 Model training method and device for protecting privacy, data processing method and server
WO2022134873A1 (en) * 2020-12-25 2022-06-30 中科寒武纪科技股份有限公司 Data processing device, data processing method, and related product
CN112990446A (en) * 2021-05-19 2021-06-18 神威超算(北京)科技有限公司 Abnormal group identification method and device and intelligent chip
CN112990446B (en) * 2021-05-19 2021-09-24 神威超算(北京)科技有限公司 Abnormal group identification method and device and intelligent chip


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination