CN111882046B - Multimedia data identification method, device, equipment and computer storage medium - Google Patents
- Publication number: CN111882046B
- Application number: CN202011034311.1A
- Authority: CN (China)
- Prior art keywords: layer, neural network, network model, target, batch normalization
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045 — Combinations of networks (G—Physics; G06—Computing; G06N—Computing arrangements based on specific computational models; G06N3/04—Architecture, e.g. interconnection topology)
- G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (G06F17/10—Complex mathematical operations)
- G06N3/08 — Learning methods (G06N3/02—Neural networks)
Abstract
The application discloses a multimedia data identification method, device, equipment and computer storage medium, belonging to the technical field of data processing. The method comprises the following steps: obtaining a second neural network model; acquiring multimedia data to be identified; and identifying the multimedia data to be identified through the second neural network model to obtain an identification result. Because the second neural network model does not include the target batch normalization layer, but realizes the same function through a batch normalization processing formula determined from the preprocessing parameters of the target batch normalization layer, the structure of the second neural network model is simplified, the operation amount of batch normalization processing in the second neural network model is reduced, and the processing speed of the neural network model is accelerated. This solves the problems in the related art that the multimedia data identification method requires a large amount of computation and recognizes slowly, and achieves the effects of reducing the computation amount of the multimedia data identification method and accelerating the identification speed.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a computer storage medium for identifying multimedia data.
Background
Multimedia data recognition is a technology for recognizing various kinds of multimedia data, such as image data and audio data. By recognizing image data and audio data, technical effects such as classification, processing, and analysis can be achieved.
In a multimedia data recognition method in the related art, the multimedia data to be recognized is recognized through a neural network model. The neural network model includes at least one hidden layer; a Batch Normalization (BN) layer is connected behind a hidden layer of the neural network model, the output of the hidden layer serves as the input of the BN layer, and the output of the BN layer serves as the input of the next layer, thereby optimizing the neural network model.
However, in the above multimedia data recognition method, the structure of the neural network model is complicated and its computation amount is large, so the multimedia data recognition method requires heavy computation and recognizes slowly.
Disclosure of Invention
The embodiment of the application provides a multimedia data identification method, a multimedia data identification device, multimedia data identification equipment and a computer storage medium. The technical scheme comprises the following contents.
According to an aspect of the present application, there is provided a method of identifying multimedia data, the method comprising the following steps.
The method comprises the steps of obtaining a second neural network model, wherein the second neural network model is obtained by removing a target batch normalization layer of a first neural network model and then adding a batch normalization processing formula into a target hidden layer of the first neural network model, the first neural network model comprises at least one hidden layer, the target hidden layer of the at least one hidden layer is connected with the target batch normalization layer, the batch normalization processing formula is obtained by preprocessing parameters of the target batch normalization layer, and the batch normalization processing formula is used for processing an original output quantity of the target hidden layer into a processing output quantity and inputting the processing output quantity into the next layer of the target hidden layer.
And acquiring multimedia data to be identified.
And identifying the multimedia data to be identified through the second neural network model.
And obtaining the identification result of the second neural network model.
Optionally, the obtaining the second neural network model includes the following steps.
And acquiring the first neural network model.
And acquiring a preprocessing parameter of the target batch normalization layer, wherein the target batch normalization layer is used for processing the original output quantity of the target hidden layer according to the preprocessing parameter to obtain a processing output quantity.
And obtaining a batch normalization processing formula of the target batch normalization layer according to the preprocessing parameters.
And removing the target batch normalization layer.
And adding the batch normalization processing formula into the target hidden layer to obtain the second neural network model.
Optionally, the batch normalization processing formula includes:
y(i) = α(i)·x(i) + β(i), where α(i) = w_bn(i)/√(v_bn(i) + eps) and β(i) = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i) + eps);
wherein y(i) is the i-th element of the processing output quantity, x(i) is the i-th element of the original output quantity, w_bn(i) is the i-th element of the weight of the target batch normalization layer, v_bn(i) is the i-th element of the data variance of the target batch normalization layer, eps is a preset number, b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and e_bn(i) is the i-th element of the data mean of the target batch normalization layer;
the preprocessing parameters of the target batch normalization layer comprise w_bn(i), v_bn(i), b_bn(i) and e_bn(i).
Optionally, when the target hidden layer is a fully-connected layer, the i-th element of the original output quantity is x(i) = Σ_{j=1..input_size} w_fc(i, j)·s(j) + b_fc(i), and the batch normalization processing formula includes:
y(i) = Σ_{j=1..input_size} α(i)·w_fc(i, j)·s(j) + α(i)·b_fc(i) + β(i);
wherein input_size is the input dimension of the fully-connected layer, w_fc(i, j) is the element in row i and column j of the weight matrix of the fully-connected layer, s(j) is the j-th element of the input feature quantity of the fully-connected layer, and b_fc(i) is the i-th element of the compensation amount of the fully-connected layer.
Optionally, when the target hidden layer is a convolutional layer, the original output quantity corresponding to the J-th output channel of the convolutional layer is x_cnn(J) = Σ_L W_cnn(J, L) * s_cnn(L) + b_cnn(J), and the batch normalization processing formula includes:
y_cnn(J) = Σ_L α(J)·W_cnn(J, L) * s_cnn(L) + α(J)·b_cnn(J) + β(J);
wherein y_cnn(J) is the processing output quantity corresponding to the J-th output channel of the convolutional layer, W_cnn(J, L) is the convolution kernel matrix corresponding to the L-th input channel and the J-th output channel of the convolutional layer, * denotes convolution, s_cnn(L) is the input quantity of the L-th input channel of the convolutional layer, and b_cnn(J) is the compensation amount of the J-th output channel of the convolutional layer.
Optionally, after the obtaining the second neural network model, the method further includes:
and quantifying the parameters of the second neural network model and deploying the parameters on the embedded equipment.
According to another aspect of the present application, there is provided an apparatus for identifying multimedia data, including the following modules.
The first acquisition module is used for acquiring the multimedia data to be identified.
The second obtaining module is configured to obtain a second neural network model, where the second neural network model is obtained by removing a target batch normalization layer of a first neural network model and then adding a batch normalization processing formula to a target hidden layer of the first neural network model, the first neural network model includes at least one hidden layer, the target hidden layer in the at least one hidden layer is connected to the target batch normalization layer, the batch normalization processing formula is obtained from a preprocessing parameter of the target batch normalization layer, and the batch normalization processing formula is configured to process an original output quantity of the target hidden layer into a processing output quantity and input the processing output quantity into a next layer of the target hidden layer.
And the identification module is used for identifying the multimedia data to be identified through the second neural network model.
And the result obtaining module is used for obtaining the identification result of the second neural network model.
Optionally, the batch normalization processing formula includes:
y(i) = α(i)·x(i) + β(i), where α(i) = w_bn(i)/√(v_bn(i) + eps) and β(i) = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i) + eps);
wherein y(i) is the i-th element of the processing output quantity, x(i) is the i-th element of the original output quantity, w_bn(i) is the i-th element of the weight of the target batch normalization layer, v_bn(i) is the i-th element of the data variance of the target batch normalization layer, eps is a preset number, b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and e_bn(i) is the i-th element of the data mean of the target batch normalization layer;
the preprocessing parameters of the target batch normalization layer comprise w_bn(i), v_bn(i), b_bn(i) and e_bn(i).
According to another aspect of the present application, there is provided an apparatus for identifying multimedia data, the apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method for identifying multimedia data as any one of the above.
According to another aspect of the application, a computer storage medium has stored therein at least one instruction, at least one program, set of codes or set of instructions that is loaded and executed by a processor to implement the method of identifying any multimedia data as described above.
The beneficial effects brought by the technical solutions provided in the embodiments of the present application at least include the following. A multimedia data identification method is provided, in which multimedia data is identified through a second neural network model to obtain an identification result. Because the second neural network model does not include the target batch normalization layer, but realizes the same function through the batch normalization processing formula determined from the preprocessing parameters of the target batch normalization layer, the structure of the second neural network model is simplified, the operation amount of batch normalization processing in the second neural network model is reduced, and the processing speed of the neural network model is accelerated. This solves the problems in the related art that the multimedia data identification method requires a large amount of computation and recognizes slowly, and achieves the effects of reducing the computation amount of the multimedia data identification method and accelerating the identification speed.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of an implementation environment of a multimedia data identification method provided in an embodiment of the present application;
fig. 2 is a flowchart of a multimedia data identification method according to an embodiment of the present application;
fig. 3 is a flowchart of another multimedia data identification method provided in an embodiment of the present application;
fig. 4 is a block diagram of an apparatus for identifying multimedia data according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an apparatus for identifying multimedia data according to an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
A neural network model is a model that simulates an actual human neural network and includes at least one hidden layer. A Batch Normalization (BN) layer is connected behind a hidden layer of the neural network model: the output of the hidden layer serves as the input of the BN layer, and the output of the BN layer serves as the input of the next layer. Batch normalization is a technique that accelerates the training of a neural network model and improves its output precision, so adding a BN layer to a neural network model optimizes the model.
When training the neural network model, the mean of the training samples is calculated as μ = (1/m)·Σ_{i=1..m} x_i, where x_i is the i-th of the m training samples; the variance of the training samples is calculated as σ² = (1/m)·Σ_{i=1..m} (x_i − μ)²; the i-th sample is normalized as x̂_i = (x_i − μ)/√(σ² + ε), where ε is a small number that prevents the denominator from being 0; and the output of the hidden layer is reconstructed and transformed to obtain the output of the BN layer, y_i = γ·x̂_i + β. The parameter γ and the parameter β are trained parameters; when the trained neural network model is used, the calculations involving the parameter γ, the parameter β, the mean and the variance are performed in real time.
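The training-time computation above can be sketched in NumPy. This is a minimal illustration, not the patent's implementation; the function and variable names are assumptions chosen for readability:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization over a batch x of shape (m, n)."""
    mu = x.mean(axis=0)                    # mean of the m training samples
    var = x.var(axis=0)                    # variance of the training samples
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize; eps keeps the denominator nonzero
    return gamma * x_hat + beta            # reconstruction transform with trained gamma, beta
```

With gamma = 1 and beta = 0, the output of each feature column has approximately zero mean and unit variance, which is the point of the normalization step.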
The structure of the neural network model is complex and the calculation amount is large.
The embodiment of the application provides a multimedia data identification method, a multimedia data identification device, multimedia data identification equipment and a computer storage medium.
Fig. 1 is a schematic diagram of an implementation environment of a multimedia data identification method provided in an embodiment of the present application, where the implementation environment may include a server 11 and a terminal 12.
The server 11 may be a single server or a server cluster.
The terminal 12 may be a mobile phone, a tablet computer, a notebook computer, a smart wearable device, or any of various other terminals. The terminal 12 can be connected to the server 11 by wire or wirelessly (fig. 1 shows a wireless connection). The terminal 12 may be an embedded device, such as a mobile phone, a tablet computer, or a smart wearable device.
Fig. 2 is a flowchart of a multimedia data identification method according to an embodiment of the present application. The multimedia data recognition method can be applied to the server or the terminal of the implementation environment. The method for identifying multimedia data may include the following steps.
And step 201, obtaining a second neural network model.
And step 202, acquiring multimedia data to be identified.
And step 203, identifying the multimedia data to be identified through the second neural network model.
And step 204, obtaining the identification result of the second neural network model.
In summary, the embodiment of the present application provides a multimedia data identification method, which identifies multimedia data through a second neural network model to obtain an identification result. Because the second neural network model does not include the target batch normalization layer, but realizes the same function through the batch normalization processing formula determined from the preprocessing parameters of the target batch normalization layer, the structure of the second neural network model is simplified, the operation amount of batch normalization processing in the second neural network model is reduced, and the processing speed of the neural network model is accelerated. This solves the problems in the related art that the multimedia data identification method requires a large amount of computation and recognizes slowly, and achieves the effects of reducing the computation amount of the multimedia data identification method and accelerating the identification speed.
Fig. 3 is a flowchart of another multimedia data identification method according to an embodiment of the present application, where the multimedia data identification method can be applied to a server or a terminal in the above implementation environment. As can be seen with reference to fig. 3, the method for identifying multimedia data may include the following steps.
And step 301, acquiring the first neural network model.
The server may retrieve the first neural network model from memory (the first neural network model in memory may be provided by an operator). The neural network model generally includes an input layer, an output layer, and at least one hidden layer, which may be a feature representation layer. The hidden layer can comprise a full connection layer, a convolution layer and the like. The output of the target hidden layer can be input into a target batch normalization layer, and the output of the target batch normalization layer can be input into an output layer or the next hidden layer after being processed by an activation function.
Adding a batch normalization layer in the neural network model can improve the accuracy of the output of the neural network model.
It should be noted that the first neural network model may be a trained neural network model or an untrained neural network model, which is not limited in the embodiment of the present application.
For example, the first neural network model may be a trained neural network model. The server or the terminal may obtain a training sample and an initial neural network model from a database and train the initial neural network model according to the training sample to obtain the first neural network model. The first neural network model includes an input layer, an output layer, at least one hidden layer and at least one target batch normalization layer; the output of a hidden layer may be input into another hidden layer, a target batch normalization layer or the output layer, and when the output of a hidden layer is input into a target batch normalization layer, that hidden layer is a target hidden layer. When the initial neural network model is trained, the target batch normalization layer computes on its input to obtain the preprocessing parameters of the target batch normalization layer; when the trained first neural network model is applied, the target batch normalization layer computes on its input to obtain the processing output quantity, so the target batch normalization layer can accurately perform batch normalization processing on its input.
And step 302, acquiring preprocessing parameters of the target batch normalization layer of the first neural network model.
The server or the terminal may obtain the pre-processing parameters from a target batch normalization layer of the first neural network model.
The original output quantity of the target hidden layer is input into a target batch standardization layer, and the target batch standardization layer can carry out batch standardization processing on the original output quantity according to the preprocessing parameters to obtain the processing output quantity.
The raw output as well as the processed output may be in the form of vectors. When the original output quantity and the processing output quantity are in the form of vectors, the dimension of any vector can be set by an operator according to actual conditions.
For example, when the raw output is in the form of a vector, the dimension of the raw output may be node_size × 1, where node_size is the dimension of the input of the target hidden layer.
And step 303, obtaining a batch normalization processing formula of the target batch normalization layer according to the preprocessing parameters.
When the neural network model is used, the target batch normalization layer performs batch normalization processing on the original output quantity of the target hidden layer according to the preprocessing parameters to obtain the processing output quantity, so a batch normalization processing formula of the target batch normalization layer can be obtained according to the preprocessing parameters.
The batch normalization processing formula includes:
y(i) = α(i)·x(i) + β(i), with α(i) = w_bn(i)/√(v_bn(i) + eps) and β(i) = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i) + eps),
wherein y(i) is the i-th element of the processing output quantity, x(i) is the i-th element of the original output quantity, w_bn(i) is the i-th element of the weight of the target batch normalization layer, v_bn(i) is the i-th element of the data variance of the target batch normalization layer, eps is a preset number that prevents the denominator from being 0, b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and e_bn(i) is the i-th element of the data mean of the target batch normalization layer.
The preprocessing parameters of the target batch normalization layer include w_bn(i), v_bn(i), b_bn(i) and e_bn(i).
it should be noted that the quantities in the batch normalization formula may all be in the form of vectors.
Illustratively, w_bn, v_bn, b_bn and e_bn may be vectors with dimension node_size × 1, where node_size is the input dimension of the target batch normalization layer; the input dimension can be set by an operator according to the actual situation.
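The precomputation of α(i) and β(i) from the four preprocessing parameters can be sketched as follows. This is an illustrative NumPy sketch; names such as `fold_bn_params` are assumptions, not from the patent:

```python
import numpy as np

def fold_bn_params(w_bn, v_bn, b_bn, e_bn, eps=1e-5):
    """Precompute alpha(i) and beta(i) once from the stored BN parameters."""
    denom = np.sqrt(v_bn + eps)
    alpha = w_bn / denom                  # alpha(i) = w_bn(i) / sqrt(v_bn(i) + eps)
    beta = b_bn - w_bn * e_bn / denom     # beta(i) = b_bn(i) - w_bn(i)*e_bn(i)/sqrt(v_bn(i) + eps)
    return alpha, beta

def bn_apply(x, alpha, beta):
    """Runtime batch normalization: y(i) = alpha(i)*x(i) + beta(i)."""
    return alpha * x + beta
```

After `fold_bn_params` runs once offline, only the multiply-add in `bn_apply` remains at inference time.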
It should be noted that the hidden layer may include a fully-connected layer and a convolutional layer, and the batch normalization processing formula may be changed according to the characteristics of different hidden layers.
1) When the target hidden layer is a fully-connected layer, the i-th element of the original output quantity is x(i) = Σ_{j=1..input_size} w_fc(i, j)·s(j) + b_fc(i), and the batch normalization processing formula includes:
y(i) = Σ_{j=1..input_size} α(i)·w_fc(i, j)·s(j) + α(i)·b_fc(i) + β(i),
wherein input_size is the input dimension of the fully-connected layer, w_fc(i, j) is the element in row i and column j of the weight matrix of the fully-connected layer, s(j) is the j-th element of the input feature quantity of the fully-connected layer, and b_fc(i) is the i-th element of the compensation amount of the fully-connected layer.
Illustratively, s may be a vector with dimension input_size × 1 and b_fc may be a vector with dimension node_size × 1, where input_size is the input dimension of the fully-connected layer and node_size × 1 is the output dimension of the fully-connected layer.
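For the fully-connected case, the scaling by α(i) can be folded directly into the layer's weight matrix and compensation amount, as in this sketch (assuming α and β were already precomputed from the BN parameters; the function name is illustrative):

```python
import numpy as np

def fold_bn_into_fc(W_fc, b_fc, alpha, beta):
    """Fold BN into an FC layer; W_fc has shape (node_size, input_size)."""
    W_folded = alpha[:, None] * W_fc   # scale row i of the weight matrix by alpha(i)
    b_folded = alpha * b_fc + beta     # folded compensation amount
    return W_folded, b_folded
```

After folding, `W_folded @ s + b_folded` reproduces the BN output of the original fully-connected layer exactly, with no separate BN layer left at runtime.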
2) When the target hidden layer is a convolutional layer, the original output quantity corresponding to the J-th output channel of the convolutional layer is x_cnn(J) = Σ_L W_cnn(J, L) * s_cnn(L) + b_cnn(J), and the batch normalization processing formula includes:
y_cnn(J) = Σ_L W'_cnn(J, L) * s_cnn(L) + b'_cnn(J), with W'_cnn(J, L) = α(J)·W_cnn(J, L) and b'_cnn(J) = α(J)·b_cnn(J) + β(J),
wherein y_cnn(J) is the processing output quantity corresponding to the J-th output channel of the convolutional layer, W_cnn(J, L) is the convolution kernel matrix corresponding to the L-th input channel and the J-th output channel of the convolutional layer, * denotes convolution, s_cnn(L) is the input quantity of the L-th input channel of the convolutional layer, b_cnn(J) is the compensation amount of the J-th output channel of the convolutional layer, α(J) = w_bn(J)/√(v_bn(J) + eps) and β(J) = b_bn(J) − w_bn(J)·e_bn(J)/√(v_bn(J) + eps), where w_bn(J), v_bn(J), b_bn(J) and e_bn(J) are the weight, data variance, compensation amount and data mean of the target batch normalization layer corresponding to the J-th output channel.
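The per-output-channel fold for the convolutional case can be sketched as follows (an illustrative NumPy sketch assuming a kernel layout of (out_channels, in_channels, kH, kW); the name `fold_bn_into_conv` is not from the patent):

```python
import numpy as np

def fold_bn_into_conv(W_cnn, b_cnn, alpha, beta):
    """Fold per-channel BN into a conv layer.

    W_cnn: kernels of shape (out_channels, in_channels, kH, kW);
    alpha, beta: per-output-channel factors precomputed from the BN parameters.
    """
    W_folded = W_cnn * alpha[:, None, None, None]  # scale output channel J by alpha(J)
    b_folded = alpha * b_cnn + beta                # folded per-channel compensation
    return W_folded, b_folded
```

Because batch normalization after a convolution acts identically on every spatial position of a channel, scaling the whole kernel of channel J by α(J) is sufficient; no per-pixel work is added.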
And step 304, removing the target batch normalization layer.
The server or the terminal may remove the target batch normalization layer in the first neural network model, and after the target batch normalization layer in the first neural network model is removed, the next layer of the target hidden layer is a hidden layer or an output layer.
For example, in the first neural network model, the next layer of the target hidden layer is a target batch normalization layer, the next layer of the target batch normalization layer is a first hidden layer or an output layer, and when the target batch normalization layer in the first neural network model is removed, the next layer of the target hidden layer is the first hidden layer or the output layer.
And step 305, adding the batch normalization processing formula into the target hidden layer to obtain the second neural network model. The batch normalization processing formula is used to process the original output quantity of the target hidden layer into a processing output quantity and input the processing output quantity into the next layer of the target hidden layer.
After the target batch normalization layer is removed and the batch normalization processing formula is added to the target hidden layer, the second neural network model is obtained; the output of the target hidden layer in the second neural network model is the processing output quantity, which has already undergone batch normalization processing.
The processing output quantity output by the target hidden layer is input into the next hidden layer or the output layer, so the structure of the neural network model is simplified.
Illustratively, the target batch normalization layer in the first neural network model has four preprocessing parameters w_bn(i), v_bn(i), b_bn(i) and e_bn(i), and performs addition, subtraction, multiplication, division, square-root and similar operations on them; that is, when the initial neural network model is trained and the first neural network model is used, the computations in the target batch normalization layer are performed in real time. If each addition (subtraction, multiplication, division or square root) counts as one operation, obtaining α(i) = w_bn(i)/√(v_bn(i) + eps) takes 3 operations, obtaining β(i) = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i) + eps) takes 3 operations (obtaining √(v_bn(i) + eps) is a repeated step and is counted only once), and obtaining y(i) takes 2 operations, so the target batch normalization layer of the first neural network model performs 8 operations per element at runtime. In the second neural network model, where the batch normalization processing formula with precomputed α(i) and β(i) replaces the target batch normalization layer, the batch normalization processing takes only 2 operations per element. It can thus be seen that replacing the target batch normalization layer with the batch normalization processing formula provided in the embodiment of the present application reduces the operation amount.
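The equivalence behind this operation count can be checked numerically: the folded formula produces exactly the same output as a fully-connected layer followed by a separate BN layer. A hedged sketch with made-up shapes and random parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-5
W, b = rng.standard_normal((4, 3)), rng.standard_normal(4)   # FC weight matrix and compensation
w_bn, b_bn = rng.standard_normal(4), rng.standard_normal(4)  # BN weight / compensation amount
e_bn, v_bn = rng.standard_normal(4), rng.random(4) + 0.1     # BN data mean / data variance
s = rng.standard_normal(3)                                   # input feature quantity

# First model: FC layer followed by a separate BN layer (8 operations per element)
x = W @ s + b
y_first = w_bn * (x - e_bn) / np.sqrt(v_bn + eps) + b_bn

# Second model: precomputed alpha/beta formula (2 operations per element at runtime)
alpha = w_bn / np.sqrt(v_bn + eps)
beta = b_bn - alpha * e_bn
y_second = alpha * x + beta

assert np.allclose(y_first, y_second)  # identical outputs, fewer runtime operations
```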
It should be noted that, when the second neural network model is a pre-acquired model, the server or the terminal may directly retrieve the second neural network model from local without acquiring the second neural network model step by step through step 301 to step 305.
And step 306, quantizing the parameters of the second neural network model and deploying them.
The parameters in the second neural network model are floating point parameters, and they generally need to be quantized in order to deploy the second neural network model on an embedded device.
Quantization refers to converting the floating point parameters of a neural network model into low-bit (for example, 8-bit or 16-bit) storage units.
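Step 306 could be realized with, for example, a simple symmetric 8-bit scheme. The patent does not specify which quantization method is used, so the following NumPy sketch is purely illustrative:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float parameters to int8 storage."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float parameters for computation."""
    return q.astype(np.float32) * scale
```

The round-trip error of this scheme is bounded by half a quantization step (scale / 2), which is what makes low-bit storage usable on memory-constrained embedded devices.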
When steps 301 to 305 are executed by the server, in step 306 the server may quantize the parameters of the second neural network model and deploy them locally or on a terminal (e.g., an embedded device). When steps 301 to 305 are executed by the terminal, the terminal may quantize the parameters of the second neural network model and deploy them locally.
Illustratively, the second neural network model may be deployed on an ARM embedded device: the parameters of the second neural network model are quantized by a quantization method, and the operations are accelerated with ARM NEON instructions on the device, which reduces power consumption. Alternatively, the second neural network model may be deployed on a DSP embedded device: the parameters are quantized by a quantization method, and the operations are accelerated with HiFi instructions on the device, which ensures that the DSP embedded device can run a larger second neural network model with less memory.
And 307, acquiring multimedia data to be identified.
The server or the terminal may acquire multimedia data to be recognized, which may be provided by the user through the user terminal. The multimedia data may include image data or audio data, etc.
And 308, identifying the multimedia data to be identified through the second neural network model.
The server or the terminal can input the multimedia data into the second neural network model so as to identify the multimedia data to be identified through the second neural network model.
The flow of the second neural network model in performing the identification may include the following steps.
Step one, obtaining the original output quantity of a target hidden layer in a second neural network model.
When the second neural network model is used, the server or the terminal can obtain the original output quantity of the target hidden layer in the second neural network model.
And step two, carrying out batch standardization processing on the original output quantity according to a batch standardization processing formula to obtain the processing output quantity of the target hidden layer.
And according to a batch standardization processing formula, carrying out batch standardization processing on the original output quantity of the target hidden layer to obtain the processing output quantity of the target hidden layer, namely the output of the target hidden layer is the processing output quantity subjected to batch standardization processing.
And step three, inputting the processing output quantity into the next layer of the target hidden layer.
The processing output quantity, after being processed by the activation function, is input into the next layer of the target hidden layer, where the next layer of the target hidden layer may be a hidden layer or an output layer.
It can be seen that in this process the batch normalization processing is performed by the batch normalization processing formula; compared with performing batch normalization through the BN layer in the related art, the scheme of the present application has a smaller calculation amount and a higher calculation speed.
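Steps one to three above can be sketched as follows for a fully-connected target hidden layer; the ReLU activation and all variable names are assumptions for illustration:

```python
import numpy as np

def hidden_layer_forward(s, w_fc, b_fc, alpha, beta):
    """Inference through the target hidden layer of the second model:
    step one:   original output quantity o = W @ s + b
    step two:   batch normalization processing formula p = alpha * o + beta
    step three: activation, whose result feeds the next layer
    alpha and beta are assumed to have been precomputed from the BN
    preprocessing parameters before deployment."""
    o = w_fc @ s + b_fc        # step one: original output quantity
    p = alpha * o + beta       # step two: formula replaces the BN layer
    return np.maximum(p, 0.0)  # step three: ReLU (assumed), then next layer
```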
For example, when the second neural network model is a model for performing speech recognition, the recognition result is a speech recognition result (for example, a speaker of speech is recognized), and when the second neural network model is a model for performing image recognition, the recognition result is an image recognition result (for example, some objects of an image are recognized).
The first neural network model and the second neural network model provided by the embodiment of the application can be used in the technical fields of image recognition, voice recognition and the like. For example, the first neural network model and the second neural network model can be used for voice awakening (i.e., detecting the target keyword in continuous voice) and can also be used for image detection (i.e., detecting the person and object included in the image).
For example, when the second neural network model provided in the embodiment of the present application is used to detect whether to perform voice wakeup, the second neural network model is deployed in an embedded device, voice data acquired by the embedded device is used as an input of the second neural network model, an input layer may determine the type and the style of the input data, a hidden layer may perform linear division on features of the input data, since a batch normalization processing formula is added to the hidden layer of the second neural network model, a training speed and a convergence speed of the second neural network model are high, an output layer may output a detection result of the voice data, and the detection result may be used to determine whether to perform voice wakeup.
In summary, the embodiment of the present application provides a multimedia data identification method, which identifies multimedia data through the second neural network model to obtain an identification result. Because the second neural network model does not include the target batch normalization layer, but achieves the same function through the batch normalization processing formula determined from the preprocessing parameters of the target batch normalization layer, the structure of the second neural network model is simplified, the operation amount of batch normalization processing is reduced, and the processing speed of the neural network model is accelerated. This solves the problems in the related art that the multimedia data identification method has a large computation amount and a slow identification speed, and achieves the effects of reducing the computation amount of the multimedia data identification method and accelerating the identification speed.
In an exemplary embodiment, the multimedia data recognition method provided in the embodiment of the present application is applied in a server. A first neural network model is obtained, where the first neural network model is a trained neural network model including at least one hidden layer; the next layer of a target hidden layer in the at least one hidden layer is a target batch normalization layer, and the target hidden layer is a fully-connected layer. Preprocessing parameters of the target batch normalization layer are obtained, the preprocessing parameters including w_bn(i), v_bn(i), b_bn(i) and e_bn(i). The target batch normalization layer processes the original output quantity of the fully-connected layer according to the preprocessing parameters to obtain a processing output quantity, the dimension of the original output quantity being node_size × 1. A batch normalization processing formula of the target batch normalization layer is obtained according to the preprocessing parameters: p(i) = α_i·o(i) + β_i, wherein α_i = w_bn(i)/√(v_bn(i)+eps), β_i = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i)+eps), p(i) is the i-th element of the processing output quantity, o(i) is the i-th element of the original output quantity, w_bn(i) is the i-th element of the weight of the target batch normalization layer, v_bn(i) is the i-th element of the data variance of the target batch normalization layer, eps is a preset parameter, b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, e_bn(i) is the i-th element of the data mean of the target batch normalization layer, input_size is the input dimension of the fully-connected layer, w_fc(i,j) is the element in the i-th row and j-th column of the weight matrix of the fully-connected layer, s(j) is the j-th element of the input feature quantity of the fully-connected layer, and b_fc(i) is the i-th element of the compensation amount of the fully-connected layer, so that for the fully-connected layer o(i) = Σ_{j=1..input_size} w_fc(i,j)·s(j) + b_fc(i). The target batch normalization layer in the first neural network model is removed, and the batch normalization processing formula is added to the target hidden layer to obtain a second neural network model.
The multimedia data identification method provided by the embodiment of the application is applied to a terminal, the terminal can be an embedded device, and parameters of a second neural network model are quantized and deployed on the embedded device. And when the second neural network model is used, acquiring the original output quantity of the target hidden layer, performing batch standardization on the original output quantity according to a batch standardization processing formula to obtain the processing output quantity of the target hidden layer, and inputting the processing output quantity into the next layer of the target hidden layer.
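The remove-and-fold procedure of this embodiment can be illustrated end to end. The sketch below builds a toy fully-connected layer followed by a batch normalization layer (the first model), folds the layer into the α/β formula (the second model), and checks that both produce the same output. All shapes, values, and variable names are illustrative assumptions following the notation used above:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, node_size, eps = 4, 3, 1e-5

# First neural network model: fully-connected layer + target BN layer.
w_fc = rng.normal(size=(node_size, input_size))   # FC weight matrix
b_fc = rng.normal(size=node_size)                 # FC compensation amount
w_bn = rng.uniform(0.5, 1.5, size=node_size)      # BN layer weight
v_bn = rng.uniform(0.5, 1.5, size=node_size)      # BN data variance
b_bn = rng.normal(size=node_size)                 # BN compensation amount
e_bn = rng.normal(size=node_size)                 # BN data mean

s = rng.normal(size=input_size)                   # input feature quantity
o = w_fc @ s + b_fc                               # original output quantity

# Output of the target batch normalization layer (first model).
p_bn_layer = (o - e_bn) / np.sqrt(v_bn + eps) * w_bn + b_bn

# Second neural network model: BN layer removed, formula folded in.
alpha = w_bn / np.sqrt(v_bn + eps)
beta = b_bn - alpha * e_bn
p_formula = alpha * o + beta

assert np.allclose(p_bn_layer, p_formula)  # identical results, fewer ops
```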
Fig. 4 is a block diagram of an apparatus for identifying multimedia data according to an embodiment of the present application. Referring to fig. 4, the multimedia data recognition apparatus 400 may include the following modules.
The first obtaining module 410 is configured to obtain a second neural network model, where the second neural network model is obtained by removing a target batch normalization layer of the first neural network model and then adding a batch normalization processing formula to a target hidden layer of the first neural network model, the first neural network model includes at least one hidden layer, the target hidden layer in the at least one hidden layer is connected to a target batch normalization layer, the batch normalization processing formula is obtained from a preprocessing parameter of the target batch normalization layer, and the batch normalization processing formula is configured to process an original output quantity of the target hidden layer into a processing output quantity and input the processing output quantity into a next layer of the target hidden layer.
The second obtaining module 420 is configured to obtain multimedia data to be identified.
And the identifying module 430 is configured to identify the multimedia data to be identified through the second neural network model.
And a result obtaining module 440, configured to obtain a recognition result of the second neural network model.
The batch normalization processing formula includes: p(i) = α_i·o(i) + β_i, wherein α_i = w_bn(i)/√(v_bn(i)+eps), β_i = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i)+eps), p(i) is the i-th element of the processing output quantity, o(i) is the i-th element of the original output quantity, w_bn(i) is the i-th element of the weight of the target batch normalization layer, v_bn(i) is the i-th element of the data variance of the target batch normalization layer, eps is a preset parameter, b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and e_bn(i) is the i-th element of the data mean of the target batch normalization layer.
The preprocessing parameters of the target batch normalization layer include w_bn(i), v_bn(i), b_bn(i) and e_bn(i).
In summary, the present application provides a multimedia data recognition apparatus, which recognizes multimedia data through the second neural network model to obtain a recognition result. Because the second neural network model does not include the target batch normalization layer, but achieves the same function through the batch normalization processing formula determined from the preprocessing parameters of the target batch normalization layer, the structure of the second neural network model is simplified, the operation amount of batch normalization processing is reduced, and the processing speed of the neural network model is accelerated. This solves the problems in the related art that the multimedia data identification method has a large computation amount and a slow identification speed, and achieves the effects of reducing the computation amount of the multimedia data identification method and accelerating the identification speed.
Fig. 5 is a schematic structural diagram of an identification apparatus 500 for multimedia data according to an embodiment of the present application, where the identification apparatus 500 for multimedia data may be a server or a terminal. Illustratively, as shown in fig. 5, the multimedia data recognition apparatus 500 includes a Central Processing Unit (CPU) 501, a Memory 502, and a system bus 503 connecting the Memory 502 and the CPU 501, and the Memory 502 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM).
Without loss of generality, computer-readable storage media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, and magnetic disk or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing.
The memory 502 further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU to implement the multimedia data identification method provided in the embodiment of the present application.
The embodiment of the present application further provides an apparatus for identifying multimedia data, where the apparatus for identifying multimedia data includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the method for identifying multimedia data provided in the above method embodiment.
The embodiment of the present application further provides a computer storage medium, in which at least one instruction, at least one program, a code set, or an instruction set is stored, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the multimedia data identification method provided by the above method embodiment.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (8)
1. A method for identifying multimedia data, the method comprising:
obtaining a second neural network model, wherein the second neural network model is obtained by removing a target batch normalization layer of a first neural network model and then adding a batch normalization processing formula into a target hidden layer of the first neural network model, the first neural network model comprises at least one hidden layer, the target hidden layer of the at least one hidden layer is connected with the target batch normalization layer, the batch normalization processing formula is obtained by preprocessing parameters of the target batch normalization layer, and the batch normalization processing formula is used for processing an original output quantity of the target hidden layer into a processing output quantity and inputting the processing output quantity into the next layer of the target hidden layer;
acquiring multimedia data to be identified;
identifying the multimedia data to be identified through the second neural network model;
obtaining a recognition result of the second neural network model;
the batch normalization processing formula includes:
wherein p(i) = α_i·o(i) + β_i, the α_i = w_bn(i)/√(v_bn(i)+eps), the β_i = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i)+eps), the p(i) is the i-th element of the processing output quantity, the o(i) is the i-th element of the original output quantity, the w_bn(i) is the i-th element of the weight of the target batch normalization layer, the v_bn(i) is the i-th element of the data variance of the target batch normalization layer, the eps is a preset parameter, the b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and the e_bn(i) is the i-th element of the data mean of the target batch normalization layer;
2. The method of claim 1, wherein obtaining a second neural network model comprises:
acquiring the first neural network model;
acquiring a preprocessing parameter of the target batch normalization layer, wherein the target batch normalization layer is used for processing the original output quantity of the target hidden layer according to the preprocessing parameter to obtain a processing output quantity;
obtaining a batch standardization processing formula of the target batch standardization layer according to the pretreatment parameters;
removing the target batch normalization layer;
and adding the batch standardization processing formula into the target hidden layer to obtain the second neural network model.
3. The method of claim 1, wherein when the target hidden layer is a fully-connected layer, the i-th element of the original output quantity is o(i) = Σ_{j=1..input_size} w_fc(i,j)·s(j) + b_fc(i), and the batch normalization processing formula includes: p(i) = α_i·(Σ_{j=1..input_size} w_fc(i,j)·s(j) + b_fc(i)) + β_i;
wherein the input_size is the input dimension of the fully-connected layer, the w_fc(i,j) is the element in the i-th row and j-th column of the weight matrix of the fully-connected layer, the s(j) is the j-th element of the input feature quantity of the fully-connected layer, and the b_fc(i) is the i-th element of the compensation amount of the fully-connected layer.
4. The method of claim 1, wherein when the target hidden layer is a convolutional layer, the original output quantity corresponding to the J-th output channel of the convolutional layer is o_J = Σ_L k_{L,J} * s_L + b_J, and the batch normalization processing formula includes: p_J(i) = α_J(i)·o_J(i) + β_J(i);
wherein the p_J(i) is the i-th element of the processing output quantity corresponding to the J-th output channel of the convolutional layer, the k_{L,J} is the convolution kernel matrix corresponding to the L-th input channel and the J-th output channel of the convolutional layer, the s_L is the input quantity of the L-th input channel of the convolutional layer, the b_J is the compensation amount of the J-th output channel of the convolutional layer, α_J(i) = w_bn,J(i)/√(v_bn,J(i)+eps), and β_J(i) = b_bn,J(i) − w_bn,J(i)·e_bn,J(i)/√(v_bn,J(i)+eps), wherein the w_bn,J(i) is the i-th element of the weight vector corresponding to the J-th output channel of the target batch normalization layer, the v_bn,J(i) is the i-th element of the variance vector corresponding to the J-th output channel of the target batch normalization layer, the b_bn,J(i) is the i-th element of the compensation vector corresponding to the J-th output channel of the target batch normalization layer, and the e_bn,J(i) is the i-th element of the mean vector corresponding to the J-th output channel of the target batch normalization layer.
5. The method of claim 1, wherein after obtaining the second neural network model, the method further comprises:
and quantifying the parameters of the second neural network model and deploying the parameters on the embedded equipment.
6. An apparatus for identifying multimedia data, the apparatus comprising:
a first obtaining module, configured to obtain a second neural network model, where the second neural network model is obtained by removing a target batch normalization layer of a first neural network model and then adding a batch normalization processing formula to a target hidden layer of the first neural network model, the first neural network model includes at least one hidden layer, the target hidden layer in the at least one hidden layer is connected to the target batch normalization layer, the batch normalization processing formula is obtained from a preprocessing parameter of the target batch normalization layer, and the batch normalization processing formula is used to process an original output quantity of the target hidden layer into a processing output quantity and input the processing output quantity into a next layer of the target hidden layer;
the second acquisition module is used for acquiring multimedia data to be identified;
the identification module is used for identifying the multimedia data to be identified through the second neural network model;
the result obtaining module is used for obtaining the identification result of the second neural network model;
the batch normalization processing formula includes:
wherein p(i) = α_i·o(i) + β_i, the α_i = w_bn(i)/√(v_bn(i)+eps), the β_i = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i)+eps), the p(i) is the i-th element of the processing output quantity, the o(i) is the i-th element of the original output quantity, the w_bn(i) is the i-th element of the weight of the target batch normalization layer, the v_bn(i) is the i-th element of the data variance of the target batch normalization layer, the eps is a preset parameter, the b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and the e_bn(i) is the i-th element of the data mean of the target batch normalization layer;
7. An identification device of multimedia data, characterized in that it comprises a processor and a memory, in which at least one instruction, at least one program, set of codes or set of instructions is stored, which is loaded and executed by said processor to implement the identification method of multimedia data according to any one of claims 1 to 5.
8. A computer storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of identifying multimedia data according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011034311.1A CN111882046B (en) | 2020-09-27 | 2020-09-27 | Multimedia data identification method, device, equipment and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111882046A CN111882046A (en) | 2020-11-03 |
CN111882046B true CN111882046B (en) | 2021-01-19 |
Family
ID=73200061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011034311.1A Active CN111882046B (en) | 2020-09-27 | 2020-09-27 | Multimedia data identification method, device, equipment and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111882046B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117371533B (en) * | 2023-11-01 | 2024-05-24 | 深圳市马博士网络科技有限公司 | Method and device for generating data tag rule |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710919A (en) * | 2018-05-25 | 2018-10-26 | 东南大学 | A kind of crack automation delineation method based on multi-scale feature fusion deep learning |
CN109635937B (en) * | 2018-12-30 | 2023-07-11 | 南京大学 | Low-power consumption system oriented to low-bit wide convolution neural network |
JP7026357B2 (en) * | 2019-01-31 | 2022-02-28 | 日本電信電話株式会社 | Time frequency mask estimator learning device, time frequency mask estimator learning method, program |
CN109859107B (en) * | 2019-02-12 | 2023-04-07 | 广东工业大学 | Remote sensing image super-resolution method, device, equipment and readable storage medium |
- 2020-09-27 CN CN202011034311.1A patent/CN111882046B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | ||
Denomination of invention: Identification methods, devices, devices, and computer storage media for multimedia data Effective date of registration: 20230904 Granted publication date: 20210119 Pledgee: Zhongguancun Beijing technology financing Company limited by guarantee Pledgor: SOUNDAI TECHNOLOGY Co.,Ltd. Registration number: Y2023990000438 |