CN111882046B - Multimedia data identification method, device, equipment and computer storage medium - Google Patents
- Publication number: CN111882046B
- Application number: CN202011034311.1A
- Authority: CN (China)
- Prior art keywords: layer, neural network, network model, target, batch normalization
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045 — Combinations of networks (G—Physics; G06—Computing; G06N—Computing arrangements based on specific computational models; G06N3/04—Architecture, e.g. interconnection topology)
- G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (G06F17/10—Complex mathematical operations)
- G06N3/08 — Learning methods (G06N3/02—Neural networks)
Abstract
The application discloses a multimedia data identification method, device, equipment and computer storage medium, belonging to the technical field of data processing. The method comprises the following steps: obtaining a second neural network model; acquiring multimedia data to be identified; and identifying the multimedia data to be identified through the second neural network model to obtain an identification result. Because the second neural network model does not include the target batch normalization layer, but realizes the same function through a batch normalization processing formula determined from the preprocessing parameters of the target batch normalization layer, the structure of the second neural network model is simplified, the operation amount of batch normalization processing in the second neural network model is reduced, and the processing speed of the neural network model is accelerated. This solves the problems in the related art that the multimedia data identification method requires a large amount of computation and recognizes slowly, and achieves the effects of reducing the computation amount of the multimedia data identification method and accelerating the identification speed.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a computer storage medium for identifying multimedia data.
Background
Multimedia data recognition is a technology for recognizing various kinds of multimedia data, such as image data and audio data. By recognizing image data and audio data, technical effects such as classification, processing, and analysis can be achieved.
In a multimedia data recognition method in the related art, the multimedia data to be recognized is recognized through a neural network model. The neural network model includes at least one hidden layer; a Batch Normalization (BN) layer is connected behind a hidden layer of the neural network model, the output of the hidden layer serves as the input of the BN layer, and the output of the BN layer serves as the input of the next layer, thereby optimizing the neural network model.
However, in the above multimedia data recognition method, the structure of the neural network model is complicated and its computation amount is large, so the multimedia data recognition method requires heavy computation and recognizes slowly.
Disclosure of Invention
The embodiment of the application provides a multimedia data identification method, a multimedia data identification device, multimedia data identification equipment and a computer storage medium. The technical scheme comprises the following contents.
According to an aspect of the present application, there is provided a method of identifying multimedia data, the method comprising the following steps.
The method comprises the steps of obtaining a second neural network model, wherein the second neural network model is obtained by removing a target batch normalization layer of a first neural network model and then adding a batch normalization processing formula into a target hidden layer of the first neural network model, the first neural network model comprises at least one hidden layer, the target hidden layer of the at least one hidden layer is connected with the target batch normalization layer, the batch normalization processing formula is obtained by preprocessing parameters of the target batch normalization layer, and the batch normalization processing formula is used for processing an original output quantity of the target hidden layer into a processing output quantity and inputting the processing output quantity into the next layer of the target hidden layer.
And acquiring multimedia data to be identified.
And identifying the multimedia data to be identified through the second neural network model.
And obtaining the identification result of the second neural network model.
Optionally, the obtaining the second neural network model includes the following steps.
And acquiring the first neural network model.
And acquiring a preprocessing parameter of the target batch normalization layer, wherein the target batch normalization layer is used for processing the original output quantity of the target hidden layer according to the preprocessing parameter to obtain a processing output quantity.
And obtaining a batch normalization processing formula of the target batch normalization layer according to the preprocessing parameters.
And removing the target batch normalization layer.
And adding the batch normalization processing formula into the target hidden layer to obtain the second neural network model.
Optionally, the batch normalization processing formula includes:
y(i) = α(i)·x(i) + β(i), where α(i) = w_bn(i)/√(v_bn(i) + eps) and β(i) = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i) + eps);
wherein y(i) is the i-th element of the processing output quantity, x(i) is the i-th element of the original output quantity, w_bn(i) is the i-th element of the weight of the target batch normalization layer, v_bn(i) is the i-th element of the data variance of the target batch normalization layer, eps is a preset number, b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and e_bn(i) is the i-th element of the data mean of the target batch normalization layer;
the preprocessing parameters of the target batch normalization layer comprise w_bn(i), v_bn(i), b_bn(i) and e_bn(i).
Optionally, when the target hidden layer is a fully-connected layer, the i-th element of the original output quantity is x(i) = Σ_{j=1..input_size} w_fc(i, j)·s(j) + b_fc(i), and the batch normalization processing formula includes:
y(i) = Σ_{j=1..input_size} α(i)·w_fc(i, j)·s(j) + α(i)·b_fc(i) + β(i);
wherein input_size is the input dimension of the fully-connected layer, w_fc(i, j) is the element in row i and column j of the weight matrix of the fully-connected layer, s(j) is the j-th element of the input feature quantity of the fully-connected layer, and b_fc(i) is the i-th element of the compensation amount of the fully-connected layer.
Optionally, when the target hidden layer is a convolutional layer, the original output quantity corresponding to the J-th output channel of the convolutional layer is x_cnn(J) = Σ_L W_cnn(J, L) * s_cnn(L) + b_cnn(J), and the batch normalization processing formula includes:
y_cnn(J) = Σ_L α(J)·W_cnn(J, L) * s_cnn(L) + α(J)·b_cnn(J) + β(J);
wherein y_cnn(J) is the processing output quantity corresponding to the J-th output channel of the convolutional layer, W_cnn(J, L) is the convolution kernel matrix corresponding to the L-th input channel and the J-th output channel of the convolutional layer, * denotes convolution, s_cnn(L) is the input quantity of the L-th input channel of the convolutional layer, and b_cnn(J) is the compensation amount of the J-th output channel of the convolutional layer.
Optionally, after the obtaining the second neural network model, the method further includes:
and quantifying the parameters of the second neural network model and deploying the parameters on the embedded equipment.
According to another aspect of the present application, there is provided an apparatus for identifying multimedia data, including the following modules.
The first acquisition module is used for acquiring the multimedia data to be identified.
The second obtaining module is configured to obtain a second neural network model, where the second neural network model is obtained by removing a target batch normalization layer of a first neural network model and then adding a batch normalization processing formula to a target hidden layer of the first neural network model, the first neural network model includes at least one hidden layer, the target hidden layer in the at least one hidden layer is connected to the target batch normalization layer, the batch normalization processing formula is obtained from a preprocessing parameter of the target batch normalization layer, and the batch normalization processing formula is configured to process an original output quantity of the target hidden layer into a processing output quantity and input the processing output quantity into a next layer of the target hidden layer.
And the identification module is used for identifying the multimedia data to be identified through the second neural network model.
And the result obtaining module is used for obtaining the identification result of the second neural network model.
Optionally, the batch normalization processing formula includes:
y(i) = α(i)·x(i) + β(i), where α(i) = w_bn(i)/√(v_bn(i) + eps) and β(i) = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i) + eps);
wherein y(i) is the i-th element of the processing output quantity, x(i) is the i-th element of the original output quantity, w_bn(i) is the i-th element of the weight of the target batch normalization layer, v_bn(i) is the i-th element of the data variance of the target batch normalization layer, eps is a preset number, b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and e_bn(i) is the i-th element of the data mean of the target batch normalization layer;
the preprocessing parameters of the target batch normalization layer comprise w_bn(i), v_bn(i), b_bn(i) and e_bn(i).
According to another aspect of the present application, there is provided an apparatus for identifying multimedia data, the apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method for identifying multimedia data as any one of the above.
According to another aspect of the application, a computer storage medium has stored therein at least one instruction, at least one program, set of codes or set of instructions that is loaded and executed by a processor to implement the method of identifying any multimedia data as described above.
The beneficial effects brought by the technical solutions provided in the embodiments of the present application at least include the following. A multimedia data identification method is provided, in which multimedia data is identified through a second neural network model to obtain an identification result. Because the second neural network model does not include the target batch normalization layer, but realizes the same function through the batch normalization processing formula determined from the preprocessing parameters of the target batch normalization layer, the structure of the second neural network model is simplified, the operation amount of batch normalization processing in the second neural network model is reduced, and the processing speed of the neural network model is accelerated. This solves the problems in the related art that the multimedia data identification method requires a large amount of computation and recognizes slowly, and achieves the effects of reducing the computation amount of the multimedia data identification method and accelerating the identification speed.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of an implementation environment of a multimedia data identification method provided in an embodiment of the present application;
fig. 2 is a flowchart of a multimedia data identification method according to an embodiment of the present application;
fig. 3 is a flowchart of another multimedia data identification method provided in an embodiment of the present application;
fig. 4 is a block diagram of an apparatus for identifying multimedia data according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an apparatus for identifying multimedia data according to an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
A neural network model is a model that simulates an actual human neural network and includes at least one hidden layer. A Batch Normalization (BN) layer is connected behind a hidden layer of the neural network model: the output of the hidden layer serves as the input of the BN layer, and the output of the BN layer serves as the input of the next layer. Batch normalization is a technique that accelerates the training of a neural network model and improves its output precision, so adding a BN layer to a neural network model optimizes the model.
When training the neural network model, the mean of the training samples is calculated as μ = (1/m)·Σ_{i=1..m} x_i, where x_i is the i-th of the m training samples; the variance of the training samples is calculated as σ² = (1/m)·Σ_{i=1..m} (x_i − μ)²; the i-th sample is normalized as x̂_i = (x_i − μ)/√(σ² + ε), where ε is a small number that prevents the denominator from being 0; and the output of the hidden layer is reconstructed and transformed to obtain the output of the BN layer, y_i = γ·x̂_i + β. The parameter γ and the parameter β are trained parameters; when the trained neural network model is used, the calculations involving the parameter γ, the parameter β, the mean and the variance are performed in real time.
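The training-time computation above can be sketched in NumPy. This is a minimal illustration, not the patent's implementation; the function and variable names are assumptions chosen for readability:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization over a batch x of shape (m, n)."""
    mu = x.mean(axis=0)                    # mean of the m training samples
    var = x.var(axis=0)                    # variance of the training samples
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize; eps keeps the denominator nonzero
    return gamma * x_hat + beta            # reconstruction transform with trained gamma, beta
```

With gamma = 1 and beta = 0, the output of each feature column has approximately zero mean and unit variance, which is the point of the normalization step.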
The structure of the neural network model is complex and the calculation amount is large.
The embodiment of the application provides a multimedia data identification method, a multimedia data identification device, multimedia data identification equipment and a computer storage medium.
Fig. 1 is a schematic diagram of an implementation environment of a multimedia data identification method provided in an embodiment of the present application, where the implementation environment may include a server 11 and a terminal 12.
The server 11 may be a single server or a server cluster.
The terminal 12 may be a mobile phone, a tablet computer, a notebook computer, a smart wearable device, or any of various other terminals. The terminal 12 can be connected to the server 11 by wire or wirelessly (fig. 1 shows a wireless connection). The terminal 12 may be an embedded device, such as a mobile phone, a tablet computer, or a smart wearable device.
Fig. 2 is a flowchart of a multimedia data identification method according to an embodiment of the present application. The multimedia data recognition method can be applied to the server or the terminal of the implementation environment. The method for identifying multimedia data may include the following steps.
And step 201, obtaining a second neural network model.
And step 202, acquiring multimedia data to be identified.
And step 203, identifying the multimedia data to be identified through the second neural network model.
And step 204, obtaining the identification result of the second neural network model.
In summary, the embodiment of the present application provides a multimedia data identification method, which identifies multimedia data through a second neural network model to obtain an identification result. Because the second neural network model does not include the target batch normalization layer, but realizes the same function through the batch normalization processing formula determined from the preprocessing parameters of the target batch normalization layer, the structure of the second neural network model is simplified, the operation amount of batch normalization processing in the second neural network model is reduced, and the processing speed of the neural network model is accelerated. This solves the problems in the related art that the multimedia data identification method requires a large amount of computation and recognizes slowly, and achieves the effects of reducing the computation amount of the multimedia data identification method and accelerating the identification speed.
Fig. 3 is a flowchart of another multimedia data identification method according to an embodiment of the present application, where the multimedia data identification method can be applied to a server or a terminal in the above implementation environment. As can be seen with reference to fig. 3, the method for identifying multimedia data may include the following steps.
And step 301, acquiring the first neural network model.
The server may retrieve the first neural network model from memory (the first neural network model in memory may be provided by an operator). The neural network model generally includes an input layer, an output layer, and at least one hidden layer, which may be a feature representation layer. The hidden layer can comprise a full connection layer, a convolution layer and the like. The output of the target hidden layer can be input into a target batch normalization layer, and the output of the target batch normalization layer can be input into an output layer or the next hidden layer after being processed by an activation function.
Adding a batch normalization layer in the neural network model can improve the accuracy of the output of the neural network model.
It should be noted that the first neural network model may be a trained neural network model or an untrained neural network model, which is not limited in the embodiment of the present application.
For example, the first neural network model may be a trained neural network model. The server or the terminal may obtain a training sample and an initial neural network model from a database and train the initial neural network model according to the training sample to obtain the first neural network model. The first neural network model includes an input layer, an output layer, at least one hidden layer and at least one target batch normalization layer; the output of a hidden layer may be input into another hidden layer, a target batch normalization layer or the output layer, and when the output of a hidden layer is input into a target batch normalization layer, that hidden layer is a target hidden layer. When the initial neural network model is trained, the target batch normalization layer computes on its input to obtain the preprocessing parameters of the target batch normalization layer; when the trained first neural network model is applied, the target batch normalization layer computes on its input to obtain the processing output quantity, so the target batch normalization layer can accurately perform batch normalization processing on its input.
And step 302, acquiring preprocessing parameters of the target batch normalization layer of the first neural network model.
The server or the terminal may obtain the pre-processing parameters from a target batch normalization layer of the first neural network model.
The original output quantity of the target hidden layer is input into a target batch standardization layer, and the target batch standardization layer can carry out batch standardization processing on the original output quantity according to the preprocessing parameters to obtain the processing output quantity.
The raw output as well as the processed output may be in the form of vectors. When the original output quantity and the processing output quantity are in the form of vectors, the dimension of any vector can be set by an operator according to actual conditions.
For example, when the raw output is in the form of a vector, the dimension of the raw output may be node_size × 1, where node_size is the dimension of the input of the target hidden layer.
And step 303, obtaining a batch normalization processing formula of the target batch normalization layer according to the preprocessing parameters.
When the neural network model is used, the target batch normalization layer performs batch normalization processing on the original output quantity of the target hidden layer according to the preprocessing parameters to obtain the processing output quantity, so a batch normalization processing formula of the target batch normalization layer can be obtained according to the preprocessing parameters.
The batch normalization processing formula includes:
y(i) = α(i)·x(i) + β(i), with α(i) = w_bn(i)/√(v_bn(i) + eps) and β(i) = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i) + eps),
wherein y(i) is the i-th element of the processing output quantity, x(i) is the i-th element of the original output quantity, w_bn(i) is the i-th element of the weight of the target batch normalization layer, v_bn(i) is the i-th element of the data variance of the target batch normalization layer, eps is a preset number that prevents the denominator from being 0, b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and e_bn(i) is the i-th element of the data mean of the target batch normalization layer.
The preprocessing parameters of the target batch normalization layer include w_bn(i), v_bn(i), b_bn(i) and e_bn(i).
it should be noted that the quantities in the batch normalization formula may all be in the form of vectors.
Illustratively, w_bn, v_bn, b_bn and e_bn may be vectors with dimension node_size × 1, where node_size is the input dimension of the target batch normalization layer; the input dimension can be set by an operator according to the actual situation.
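The precomputation of α(i) and β(i) from the four preprocessing parameters can be sketched as follows. This is an illustrative NumPy sketch; names such as `fold_bn_params` are assumptions, not from the patent:

```python
import numpy as np

def fold_bn_params(w_bn, v_bn, b_bn, e_bn, eps=1e-5):
    """Precompute alpha(i) and beta(i) once from the stored BN parameters."""
    denom = np.sqrt(v_bn + eps)
    alpha = w_bn / denom                  # alpha(i) = w_bn(i) / sqrt(v_bn(i) + eps)
    beta = b_bn - w_bn * e_bn / denom     # beta(i) = b_bn(i) - w_bn(i)*e_bn(i)/sqrt(v_bn(i) + eps)
    return alpha, beta

def bn_apply(x, alpha, beta):
    """Runtime batch normalization: y(i) = alpha(i)*x(i) + beta(i)."""
    return alpha * x + beta
```

After `fold_bn_params` runs once offline, only the multiply-add in `bn_apply` remains at inference time.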
It should be noted that the hidden layer may include a fully-connected layer and a convolutional layer, and the batch normalization processing formula may be changed according to the characteristics of different hidden layers.
1) When the target hidden layer is a fully-connected layer, the i-th element of the original output quantity is x(i) = Σ_{j=1..input_size} w_fc(i, j)·s(j) + b_fc(i), and the batch normalization processing formula includes:
y(i) = Σ_{j=1..input_size} α(i)·w_fc(i, j)·s(j) + α(i)·b_fc(i) + β(i),
wherein input_size is the input dimension of the fully-connected layer, w_fc(i, j) is the element in row i and column j of the weight matrix of the fully-connected layer, s(j) is the j-th element of the input feature quantity of the fully-connected layer, and b_fc(i) is the i-th element of the compensation amount of the fully-connected layer.
Illustratively, s may be a vector with dimension input_size × 1 and b_fc may be a vector with dimension node_size × 1, where input_size is the input dimension of the fully-connected layer and node_size × 1 is the output dimension of the fully-connected layer.
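For the fully-connected case, the scaling by α(i) can be folded directly into the layer's weight matrix and compensation amount, as in this sketch (assuming α and β were already precomputed from the BN parameters; the function name is illustrative):

```python
import numpy as np

def fold_bn_into_fc(W_fc, b_fc, alpha, beta):
    """Fold BN into an FC layer; W_fc has shape (node_size, input_size)."""
    W_folded = alpha[:, None] * W_fc   # scale row i of the weight matrix by alpha(i)
    b_folded = alpha * b_fc + beta     # folded compensation amount
    return W_folded, b_folded
```

After folding, `W_folded @ s + b_folded` reproduces the BN output of the original fully-connected layer exactly, with no separate BN layer left at runtime.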
2) When the target hidden layer is a convolutional layer, the original output quantity corresponding to the J-th output channel of the convolutional layer is x_cnn(J) = Σ_L W_cnn(J, L) * s_cnn(L) + b_cnn(J), and the batch normalization processing formula includes:
y_cnn(J) = Σ_L W'_cnn(J, L) * s_cnn(L) + b'_cnn(J), with W'_cnn(J, L) = α(J)·W_cnn(J, L) and b'_cnn(J) = α(J)·b_cnn(J) + β(J),
wherein y_cnn(J) is the processing output quantity corresponding to the J-th output channel of the convolutional layer, W_cnn(J, L) is the convolution kernel matrix corresponding to the L-th input channel and the J-th output channel of the convolutional layer, * denotes convolution, s_cnn(L) is the input quantity of the L-th input channel of the convolutional layer, b_cnn(J) is the compensation amount of the J-th output channel of the convolutional layer, α(J) = w_bn(J)/√(v_bn(J) + eps) and β(J) = b_bn(J) − w_bn(J)·e_bn(J)/√(v_bn(J) + eps), where w_bn(J), v_bn(J), b_bn(J) and e_bn(J) are the weight, data variance, compensation amount and data mean of the target batch normalization layer corresponding to the J-th output channel.
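The per-output-channel fold for the convolutional case can be sketched as follows (an illustrative NumPy sketch assuming a kernel layout of (out_channels, in_channels, kH, kW); the name `fold_bn_into_conv` is not from the patent):

```python
import numpy as np

def fold_bn_into_conv(W_cnn, b_cnn, alpha, beta):
    """Fold per-channel BN into a conv layer.

    W_cnn: kernels of shape (out_channels, in_channels, kH, kW);
    alpha, beta: per-output-channel factors precomputed from the BN parameters.
    """
    W_folded = W_cnn * alpha[:, None, None, None]  # scale output channel J by alpha(J)
    b_folded = alpha * b_cnn + beta                # folded per-channel compensation
    return W_folded, b_folded
```

Because batch normalization after a convolution acts identically on every spatial position of a channel, scaling the whole kernel of channel J by α(J) is sufficient; no per-pixel work is added.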
And step 304, removing the target batch normalization layer.
The server or the terminal may remove the target batch normalization layer in the first neural network model, and after the target batch normalization layer in the first neural network model is removed, the next layer of the target hidden layer is a hidden layer or an output layer.
For example, in the first neural network model, the next layer of the target hidden layer is a target batch normalization layer, the next layer of the target batch normalization layer is a first hidden layer or an output layer, and when the target batch normalization layer in the first neural network model is removed, the next layer of the target hidden layer is the first hidden layer or the output layer.
And step 305, adding the batch normalization processing formula into the target hidden layer to obtain the second neural network model. The batch normalization processing formula is used to process the original output quantity of the target hidden layer into a processing output quantity and input the processing output quantity into the next layer of the target hidden layer.
After the target batch normalization layer is removed and the batch normalization processing formula is added to the target hidden layer, the second neural network model is obtained; the output of the target hidden layer in the second neural network model is the processing output quantity, which has already undergone batch normalization processing.
The processing output quantity output by the target hidden layer is input into the next hidden layer or the output layer, so the structure of the neural network model is simplified.
Illustratively, the target batch normalization layer in the first neural network model has four preprocessing parameters w_bn(i), v_bn(i), b_bn(i) and e_bn(i), and performs addition, subtraction, multiplication, division, square-root and similar operations on them; that is, when the initial neural network model is trained and the first neural network model is used, the computations in the target batch normalization layer are performed in real time. If each addition (subtraction, multiplication, division or square root) counts as one operation, obtaining α(i) = w_bn(i)/√(v_bn(i) + eps) takes 3 operations, obtaining β(i) = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i) + eps) takes 3 operations (obtaining √(v_bn(i) + eps) is a repeated step and is counted only once), and obtaining y(i) takes 2 operations, so the target batch normalization layer of the first neural network model performs 8 operations per element at runtime. In the second neural network model, where the batch normalization processing formula with precomputed α(i) and β(i) replaces the target batch normalization layer, the batch normalization processing takes only 2 operations per element. It can thus be seen that replacing the target batch normalization layer with the batch normalization processing formula provided in the embodiment of the present application reduces the operation amount.
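The equivalence behind this operation count can be checked numerically: the folded formula produces exactly the same output as a fully-connected layer followed by a separate BN layer. A hedged sketch with made-up shapes and random parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-5
W, b = rng.standard_normal((4, 3)), rng.standard_normal(4)   # FC weight matrix and compensation
w_bn, b_bn = rng.standard_normal(4), rng.standard_normal(4)  # BN weight / compensation amount
e_bn, v_bn = rng.standard_normal(4), rng.random(4) + 0.1     # BN data mean / data variance
s = rng.standard_normal(3)                                   # input feature quantity

# First model: FC layer followed by a separate BN layer (8 operations per element)
x = W @ s + b
y_first = w_bn * (x - e_bn) / np.sqrt(v_bn + eps) + b_bn

# Second model: precomputed alpha/beta formula (2 operations per element at runtime)
alpha = w_bn / np.sqrt(v_bn + eps)
beta = b_bn - alpha * e_bn
y_second = alpha * x + beta

assert np.allclose(y_first, y_second)  # identical outputs, fewer runtime operations
```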
It should be noted that, when the second neural network model is a pre-acquired model, the server or the terminal may directly retrieve the second neural network model from local without acquiring the second neural network model step by step through step 301 to step 305.
And step 306, quantizing the parameters of the second neural network model and deploying them.
The parameters in the second neural network model are floating point parameters, and they generally need to be quantized in order to deploy the second neural network model on an embedded device.
Quantization refers to converting the floating point parameters of a neural network model into low-bit (for example, 8-bit or 16-bit) storage units.
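Step 306 could be realized with, for example, a simple symmetric 8-bit scheme. The patent does not specify which quantization method is used, so the following NumPy sketch is purely illustrative:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float parameters to int8 storage."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float parameters for computation."""
    return q.astype(np.float32) * scale
```

The round-trip error of this scheme is bounded by half a quantization step (scale / 2), which is what makes low-bit storage usable on memory-constrained embedded devices.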
When steps 301 to 305 are executed by the server, in step 306 the server may quantize the parameters of the second neural network model and deploy them locally or on a terminal (e.g., an embedded device). When steps 301 to 305 are executed by the terminal, the terminal may quantize the parameters of the second neural network model and deploy them locally.
Illustratively, the second neural network model may be deployed on an ARM embedded device: the parameters of the second neural network model are quantized by a quantization method, and the operations are accelerated with ARM NEON instructions on the device, which reduces power consumption. Alternatively, the second neural network model may be deployed on a DSP embedded device: the parameters are quantized by a quantization method, and the operations are accelerated with HiFi instructions on the device, which ensures that the DSP embedded device can run a larger second neural network model with less memory.
And 307, acquiring multimedia data to be identified.
The server or the terminal may acquire multimedia data to be recognized, which may be provided by the user through the user terminal. The multimedia data may include image data or audio data, etc.
And 308, identifying the multimedia data to be identified through the second neural network model.
The server or the terminal can input the multimedia data into the second neural network model so as to identify the multimedia data to be identified through the second neural network model.
The flow of the second neural network model in performing the identification may include the following steps.
Step one, obtaining the original output quantity of a target hidden layer in a second neural network model.
When the second neural network model is used, the server or the terminal can obtain the original output quantity of the target hidden layer in the second neural network model.
And step two, carrying out batch standardization processing on the original output quantity according to a batch standardization processing formula to obtain the processing output quantity of the target hidden layer.
And according to a batch standardization processing formula, carrying out batch standardization processing on the original output quantity of the target hidden layer to obtain the processing output quantity of the target hidden layer, namely the output of the target hidden layer is the processing output quantity subjected to batch standardization processing.
And step three, inputting the processing output quantity into the next layer of the target hidden layer.
The processing output quantity, after being processed by the activation function, is input into the next layer of the target hidden layer, where the next layer of the target hidden layer may be a hidden layer or an output layer.
It can be seen that in this process the batch normalization processing is performed by the batch normalization processing formula; compared with performing batch normalization through the BN layer in the related art, the scheme of the present application has a smaller calculation amount and a higher calculation speed.
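Steps one to three above can be sketched as follows for a fully-connected target hidden layer; the ReLU activation and all variable names are assumptions for illustration:

```python
import numpy as np

def hidden_layer_forward(s, w_fc, b_fc, alpha, beta):
    """Inference through the target hidden layer of the second model:
    step one:   original output quantity o = W @ s + b
    step two:   batch normalization processing formula p = alpha * o + beta
    step three: activation, whose result feeds the next layer
    alpha and beta are assumed to have been precomputed from the BN
    preprocessing parameters before deployment."""
    o = w_fc @ s + b_fc        # step one: original output quantity
    p = alpha * o + beta       # step two: formula replaces the BN layer
    return np.maximum(p, 0.0)  # step three: ReLU (assumed), then next layer
```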
For example, when the second neural network model is a model for performing speech recognition, the recognition result is a speech recognition result (for example, a speaker of speech is recognized), and when the second neural network model is a model for performing image recognition, the recognition result is an image recognition result (for example, some objects of an image are recognized).
The first neural network model and the second neural network model provided by the embodiment of the application can be used in the technical fields of image recognition, voice recognition and the like. For example, the first neural network model and the second neural network model can be used for voice awakening (i.e., detecting the target keyword in continuous voice) and can also be used for image detection (i.e., detecting the person and object included in the image).
For example, when the second neural network model provided in the embodiment of the present application is used to detect whether to perform voice wakeup, the second neural network model is deployed in an embedded device, voice data acquired by the embedded device is used as an input of the second neural network model, an input layer may determine the type and the style of the input data, a hidden layer may perform linear division on features of the input data, since a batch normalization processing formula is added to the hidden layer of the second neural network model, a training speed and a convergence speed of the second neural network model are high, an output layer may output a detection result of the voice data, and the detection result may be used to determine whether to perform voice wakeup.
In summary, the embodiment of the present application provides a multimedia data identification method, which identifies multimedia data through the second neural network model to obtain an identification result. Because the second neural network model does not include the target batch normalization layer, but achieves the same function through the batch normalization processing formula determined from the preprocessing parameters of the target batch normalization layer, the structure of the second neural network model is simplified, the operation amount of batch normalization processing is reduced, and the processing speed of the neural network model is accelerated. This solves the problems in the related art that the multimedia data identification method has a large computation amount and a slow identification speed, and achieves the effects of reducing the computation amount of the multimedia data identification method and accelerating the identification speed.
In an exemplary embodiment, the multimedia data recognition method provided in the embodiment of the present application is applied in a server. A first neural network model is obtained, where the first neural network model is a trained neural network model including at least one hidden layer; the next layer of a target hidden layer in the at least one hidden layer is a target batch normalization layer, and the target hidden layer is a fully-connected layer. Preprocessing parameters of the target batch normalization layer are obtained, the preprocessing parameters including w_bn(i), v_bn(i), b_bn(i) and e_bn(i). The target batch normalization layer processes the original output quantity of the fully-connected layer according to the preprocessing parameters to obtain a processing output quantity, the dimension of the original output quantity being node_size × 1. A batch normalization processing formula of the target batch normalization layer is obtained according to the preprocessing parameters: p(i) = α_i·o(i) + β_i, wherein α_i = w_bn(i)/√(v_bn(i)+eps), β_i = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i)+eps), p(i) is the i-th element of the processing output quantity, o(i) is the i-th element of the original output quantity, w_bn(i) is the i-th element of the weight of the target batch normalization layer, v_bn(i) is the i-th element of the data variance of the target batch normalization layer, eps is a preset parameter, b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, e_bn(i) is the i-th element of the data mean of the target batch normalization layer, input_size is the input dimension of the fully-connected layer, w_fc(i,j) is the element in the i-th row and j-th column of the weight matrix of the fully-connected layer, s(j) is the j-th element of the input feature quantity of the fully-connected layer, and b_fc(i) is the i-th element of the compensation amount of the fully-connected layer, so that for the fully-connected layer o(i) = Σ_{j=1..input_size} w_fc(i,j)·s(j) + b_fc(i). The target batch normalization layer in the first neural network model is removed, and the batch normalization processing formula is added to the target hidden layer to obtain a second neural network model.
The multimedia data identification method provided by the embodiment of the application is applied to a terminal, the terminal can be an embedded device, and parameters of a second neural network model are quantized and deployed on the embedded device. And when the second neural network model is used, acquiring the original output quantity of the target hidden layer, performing batch standardization on the original output quantity according to a batch standardization processing formula to obtain the processing output quantity of the target hidden layer, and inputting the processing output quantity into the next layer of the target hidden layer.
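The remove-and-fold procedure of this embodiment can be illustrated end to end. The sketch below builds a toy fully-connected layer followed by a batch normalization layer (the first model), folds the layer into the α/β formula (the second model), and checks that both produce the same output. All shapes, values, and variable names are illustrative assumptions following the notation used above:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, node_size, eps = 4, 3, 1e-5

# First neural network model: fully-connected layer + target BN layer.
w_fc = rng.normal(size=(node_size, input_size))   # FC weight matrix
b_fc = rng.normal(size=node_size)                 # FC compensation amount
w_bn = rng.uniform(0.5, 1.5, size=node_size)      # BN layer weight
v_bn = rng.uniform(0.5, 1.5, size=node_size)      # BN data variance
b_bn = rng.normal(size=node_size)                 # BN compensation amount
e_bn = rng.normal(size=node_size)                 # BN data mean

s = rng.normal(size=input_size)                   # input feature quantity
o = w_fc @ s + b_fc                               # original output quantity

# Output of the target batch normalization layer (first model).
p_bn_layer = (o - e_bn) / np.sqrt(v_bn + eps) * w_bn + b_bn

# Second neural network model: BN layer removed, formula folded in.
alpha = w_bn / np.sqrt(v_bn + eps)
beta = b_bn - alpha * e_bn
p_formula = alpha * o + beta

assert np.allclose(p_bn_layer, p_formula)  # identical results, fewer ops
```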
Fig. 4 is a block diagram of an apparatus for identifying multimedia data according to an embodiment of the present application. Referring to fig. 4, the multimedia data recognition apparatus 400 may include the following modules.
The first obtaining module 410 is configured to obtain a second neural network model, where the second neural network model is obtained by removing a target batch normalization layer of the first neural network model and then adding a batch normalization processing formula to a target hidden layer of the first neural network model, the first neural network model includes at least one hidden layer, the target hidden layer in the at least one hidden layer is connected to a target batch normalization layer, the batch normalization processing formula is obtained from a preprocessing parameter of the target batch normalization layer, and the batch normalization processing formula is configured to process an original output quantity of the target hidden layer into a processing output quantity and input the processing output quantity into a next layer of the target hidden layer.
The second obtaining module 420 is configured to obtain multimedia data to be identified.
And the identifying module 430 is configured to identify the multimedia data to be identified through the second neural network model.
And a result obtaining module 440, configured to obtain a recognition result of the second neural network model.
The batch normalization processing formula includes: p(i) = α_i·o(i) + β_i, wherein α_i = w_bn(i)/√(v_bn(i)+eps), β_i = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i)+eps), p(i) is the i-th element of the processing output quantity, o(i) is the i-th element of the original output quantity, w_bn(i) is the i-th element of the weight of the target batch normalization layer, v_bn(i) is the i-th element of the data variance of the target batch normalization layer, eps is a preset parameter, b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and e_bn(i) is the i-th element of the data mean of the target batch normalization layer.
The preprocessing parameters of the target batch normalization layer include w_bn(i), v_bn(i), b_bn(i) and e_bn(i).
In summary, the present application provides a multimedia data recognition apparatus, which recognizes multimedia data through the second neural network model to obtain a recognition result. Because the second neural network model does not include the target batch normalization layer, but achieves the same function through the batch normalization processing formula determined from the preprocessing parameters of the target batch normalization layer, the structure of the second neural network model is simplified, the operation amount of batch normalization processing is reduced, and the processing speed of the neural network model is accelerated. This solves the problems in the related art that the multimedia data identification method has a large computation amount and a slow identification speed, and achieves the effects of reducing the computation amount of the multimedia data identification method and accelerating the identification speed.
Fig. 5 is a schematic structural diagram of an identification apparatus 500 for multimedia data according to an embodiment of the present application, where the identification apparatus 500 for multimedia data may be a server or a terminal. Illustratively, as shown in fig. 5, the multimedia data recognition apparatus 500 includes a Central Processing Unit (CPU) 501, a Memory 502, and a system bus 503 connecting the Memory 502 and the CPU 501, and the Memory 502 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM).
Without loss of generality, computer-readable storage media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, and magnetic disk or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing.
The memory 502 further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU to implement the multimedia data identification method provided in the embodiment of the present application.
The embodiment of the present application further provides an apparatus for identifying multimedia data, where the apparatus for identifying multimedia data includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the method for identifying multimedia data provided in the above method embodiment.
The embodiment of the present application further provides a computer storage medium, in which at least one instruction, at least one program, a code set, or an instruction set is stored, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the multimedia data identification method provided by the above method embodiment.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (8)
1. A method for identifying multimedia data, the method comprising:
obtaining a second neural network model, wherein the second neural network model is obtained by removing a target batch normalization layer of a first neural network model and then adding a batch normalization processing formula into a target hidden layer of the first neural network model, the first neural network model comprises at least one hidden layer, the target hidden layer of the at least one hidden layer is connected with the target batch normalization layer, the batch normalization processing formula is obtained by preprocessing parameters of the target batch normalization layer, and the batch normalization processing formula is used for processing an original output quantity of the target hidden layer into a processing output quantity and inputting the processing output quantity into the next layer of the target hidden layer;
acquiring multimedia data to be identified;
identifying the multimedia data to be identified through the second neural network model;
obtaining a recognition result of the second neural network model;
the batch normalization processing formula includes:
wherein p(i) = α_i·o(i) + β_i, the α_i = w_bn(i)/√(v_bn(i)+eps), the β_i = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i)+eps), the p(i) is the i-th element of the processing output quantity, the o(i) is the i-th element of the original output quantity, the w_bn(i) is the i-th element of the weight of the target batch normalization layer, the v_bn(i) is the i-th element of the data variance of the target batch normalization layer, the eps is a preset parameter, the b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and the e_bn(i) is the i-th element of the data mean of the target batch normalization layer;
2. The method of claim 1, wherein obtaining a second neural network model comprises:
acquiring the first neural network model;
acquiring a preprocessing parameter of the target batch normalization layer, wherein the target batch normalization layer is used for processing the original output quantity of the target hidden layer according to the preprocessing parameter to obtain a processing output quantity;
obtaining a batch standardization processing formula of the target batch standardization layer according to the pretreatment parameters;
removing the target batch normalization layer;
and adding the batch standardization processing formula into the target hidden layer to obtain the second neural network model.
3. The method of claim 1, wherein when the target hidden layer is a fully-connected layer, the i-th element of the original output quantity is o(i) = Σ_{j=1..input_size} w_fc(i,j)·s(j) + b_fc(i), and the batch normalization processing formula includes: p(i) = α_i·(Σ_{j=1..input_size} w_fc(i,j)·s(j) + b_fc(i)) + β_i;
wherein the input_size is the input dimension of the fully-connected layer, the w_fc(i,j) is the element in the i-th row and j-th column of the weight matrix of the fully-connected layer, the s(j) is the j-th element of the input feature quantity of the fully-connected layer, and the b_fc(i) is the i-th element of the compensation amount of the fully-connected layer.
4. The method of claim 1, wherein when the target hidden layer is a convolutional layer, the original output quantity corresponding to the J-th output channel of the convolutional layer is o_J = Σ_L k_{L,J} * s_L + b_J, and the batch normalization processing formula includes: p_J(i) = α_J(i)·o_J(i) + β_J(i);
wherein the p_J(i) is the i-th element of the processing output quantity corresponding to the J-th output channel of the convolutional layer, the k_{L,J} is the convolution kernel matrix corresponding to the L-th input channel and the J-th output channel of the convolutional layer, the s_L is the input quantity of the L-th input channel of the convolutional layer, the b_J is the compensation amount of the J-th output channel of the convolutional layer, α_J(i) = w_bn,J(i)/√(v_bn,J(i)+eps), and β_J(i) = b_bn,J(i) − w_bn,J(i)·e_bn,J(i)/√(v_bn,J(i)+eps), wherein the w_bn,J(i) is the i-th element of the weight vector corresponding to the J-th output channel of the target batch normalization layer, the v_bn,J(i) is the i-th element of the variance vector corresponding to the J-th output channel of the target batch normalization layer, the b_bn,J(i) is the i-th element of the compensation vector corresponding to the J-th output channel of the target batch normalization layer, and the e_bn,J(i) is the i-th element of the mean vector corresponding to the J-th output channel of the target batch normalization layer.
5. The method of claim 1, wherein after obtaining the second neural network model, the method further comprises:
and quantifying the parameters of the second neural network model and deploying the parameters on the embedded equipment.
6. An apparatus for identifying multimedia data, the apparatus comprising:
a first obtaining module, configured to obtain a second neural network model, where the second neural network model is obtained by removing a target batch normalization layer of a first neural network model and then adding a batch normalization processing formula to a target hidden layer of the first neural network model, the first neural network model includes at least one hidden layer, the target hidden layer in the at least one hidden layer is connected to the target batch normalization layer, the batch normalization processing formula is obtained from a preprocessing parameter of the target batch normalization layer, and the batch normalization processing formula is used to process an original output quantity of the target hidden layer into a processing output quantity and input the processing output quantity into a next layer of the target hidden layer;
the second acquisition module is used for acquiring multimedia data to be identified;
the identification module is used for identifying the multimedia data to be identified through the second neural network model;
the result obtaining module is used for obtaining the identification result of the second neural network model;
the batch normalization processing formula includes:
wherein p(i) = α_i·o(i) + β_i, the α_i = w_bn(i)/√(v_bn(i)+eps), the β_i = b_bn(i) − w_bn(i)·e_bn(i)/√(v_bn(i)+eps), the p(i) is the i-th element of the processing output quantity, the o(i) is the i-th element of the original output quantity, the w_bn(i) is the i-th element of the weight of the target batch normalization layer, the v_bn(i) is the i-th element of the data variance of the target batch normalization layer, the eps is a preset parameter, the b_bn(i) is the i-th element of the compensation amount of the target batch normalization layer, and the e_bn(i) is the i-th element of the data mean of the target batch normalization layer;
7. An identification device of multimedia data, characterized in that it comprises a processor and a memory, in which at least one instruction, at least one program, set of codes or set of instructions is stored, which is loaded and executed by said processor to implement the identification method of multimedia data according to any one of claims 1 to 5.
8. A computer storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of identifying multimedia data according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011034311.1A CN111882046B (en) | 2020-09-27 | 2020-09-27 | Multimedia data identification method, device, equipment and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111882046A CN111882046A (en) | 2020-11-03 |
CN111882046B true CN111882046B (en) | 2021-01-19 |
Family
ID=73200061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011034311.1A Active CN111882046B (en) | 2020-09-27 | 2020-09-27 | Multimedia data identification method, device, equipment and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111882046B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117371533B (en) * | 2023-11-01 | 2024-05-24 | 深圳市马博士网络科技有限公司 | Method and device for generating data tag rule |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710919A (en) * | 2018-05-25 | 2018-10-26 | 东南大学 | A kind of crack automation delineation method based on multi-scale feature fusion deep learning |
CN109635937B (en) * | 2018-12-30 | 2023-07-11 | 南京大学 | Low-power consumption system oriented to low-bit wide convolution neural network |
JP7026357B2 (en) * | 2019-01-31 | 2022-02-28 | 日本電信電話株式会社 | Time frequency mask estimator learning device, time frequency mask estimator learning method, program |
CN109859107B (en) * | 2019-02-12 | 2023-04-07 | 广东工业大学 | Remote sensing image super-resolution method, device, equipment and readable storage medium |
- 2020-09-27 CN CN202011034311.1A patent/CN111882046B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | ||
Denomination of invention: Identification methods, devices, devices, and computer storage media for multimedia data Effective date of registration: 20230904 Granted publication date: 20210119 Pledgee: Zhongguancun Beijing technology financing Company limited by guarantee Pledgor: SOUNDAI TECHNOLOGY Co.,Ltd. Registration number: Y2023990000438 |