WO2020220926A1 - Multimedia data identification method and device

Multimedia data identification method and device

Info

Publication number
WO2020220926A1
Authority
WO
WIPO (PCT)
Prior art keywords
alif
time step
multimedia data
data
network layer
Prior art date
Application number
PCT/CN2020/082961
Other languages
French (fr)
Chinese (zh)
Inventor
高岱恒
Original Assignee
北京灵汐科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2020220926A1 publication Critical patent/WO2020220926A1/en

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 20/00: Scenes; Scene-specific elements
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 20/00: Machine learning
          • G06N 3/00: Computing arrangements based on biological models
            • G06N 3/02: Neural networks
              • G06N 3/04: Architecture, e.g. interconnection topology
                • G06N 3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
          • Y02T 10/00: Road transport of goods or passengers
            • Y02T 10/10: Internal combustion engine [ICE] based vehicles
              • Y02T 10/40: Engine management systems

Definitions

  • the present invention relates to the technical field of deep learning, in particular to a method and device for identifying multimedia data.
  • Deep learning refers to a collection of algorithms that use various machine learning algorithms on multilayer neural networks to solve problems involving images, text and other data. Broadly, deep learning can be classified as a kind of neural network, but there are many variations in specific implementations.
  • the core of deep learning is feature learning, which aims to obtain hierarchical feature information through a hierarchical network, thereby replacing the manual feature design required in the past.
  • Deep learning is a framework that includes many important algorithms, such as Convolutional Neural Networks (CNN), AutoEncoders, Sparse Coding, Restricted Boltzmann Machines (RBM), Deep Belief Networks (DBN), and Recurrent Neural Networks (RNN).
  • CNN: Convolutional Neural Network
  • RBM: Restricted Boltzmann Machine
  • RNN: Recurrent Neural Network
  • SNN: Spiking Neural Network
  • HH: Hodgkin-Huxley
  • Image recognition is a classic problem in the field of computer vision. With the rapid development of AI technology represented by deep learning, the field of image recognition has attracted the attention of many researchers. However, in the field of fuzzy image recognition, because the data distribution of blur and noise is difficult to evaluate and model, existing ANN-based algorithms struggle to achieve recognition ability comparable to that of humans.
  • the classic fuzzy image recognition process can be divided into two steps: 1) remove the noise and blur from the image; 2) perform image recognition on the denoised image.
  • noise in images usually arises from abrupt spatial scene changes or from the shooting technique or device (such as capture equipment with too low a resolution).
  • denoising images/videos usually requires a large amount of prior knowledge (such as knowledge distillation over the various possible kinds of noise).
  • Freeman et al. proposed a heavy-tailed gradient prior in 2008 that can effectively remove, from a single image, the blur caused by the photographer's hand shaking.
  • motion blur kernel estimation and sequence modeling are highly sensitive to irregular and dense noise.
  • the recognition success rate of the timing model also drops significantly.
  • the present invention provides a method and device for recognizing multimedia data that overcomes the above problems or at least partially solves the above problems.
  • a method for identifying multimedia data including:
  • inputting multimedia data to be identified into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers;
  • the multimedia data to be identified includes image data and/or video data;
  • the multiple ALIF network layers in the neural network structure perform recognition computation on the multimedia data to be recognized and output the computation result.
  • for any ALIF network layer, the neuron output is calculated by the following formula: y_t = σ(v_t + δ)
  • t represents the t-th time step
  • y_t represents the output of the neurons of the ALIF network layer at the t-th time step
  • σ represents an activation function incorporating the adaptive f_thres adjustment algorithm
  • δ represents a tensor set to simulate the random noise of the brain
  • v_t represents the membrane potential at the t-th time step.
  • the membrane potential v_t at the t-th time step is calculated by the following formula: v_t = W_x x_t + α v_{t-1}
  • v_t represents the membrane potential at the t-th time step
  • W_x represents the two-dimensional weight matrix that transforms the input in the ALIF timing model
  • x_t represents the input of the ALIF network layer
  • v_{t-1} represents the membrane potential at the (t-1)-th time step
  • α represents a preset matrix
  • the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
  • the image data and/or video data are blurred image data and/or video data.
  • a multimedia data recognition device, including:
  • a data input module configured to input multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data;
  • a data calculation module configured to perform recognition computation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and output the computation result.
  • the data calculation module is further configured to calculate, for any ALIF network layer, the output of the neurons by the following formula: y_t = σ(v_t + δ)
  • t represents the t-th time step
  • y_t represents the output of the neurons of the ALIF network layer at the t-th time step
  • σ represents an activation function incorporating the adaptive f_thres adjustment algorithm
  • δ represents a tensor set to simulate the random noise of the brain
  • v_t represents the membrane potential at the t-th time step.
  • the data calculation module is further configured to calculate the membrane potential v_t at the t-th time step by the following formula: v_t = W_x x_t + α v_{t-1}
  • v_t represents the membrane potential at the t-th time step
  • W_x represents the two-dimensional weight matrix that transforms the input in the ALIF timing model
  • x_t represents the input of the ALIF network layer
  • v_{t-1} represents the membrane potential at the (t-1)-th time step
  • α represents a preset matrix
  • the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
  • a storage device in which a computer program is stored; when the computer program runs in an electronic device, it is loaded by the processor of the electronic device and executes the multimedia data recognition method described above.
  • an electronic device including:
  • a processor for running a computer program; and
  • a storage device for storing a computer program which, when run in the electronic device, is loaded by the processor to execute the multimedia data identification method described in any one of the above.
  • the present invention proposes ALIF, a new algorithm that combines SNN and ANN. With significantly fewer weights than the commonly used time series models RNN, LSTM and GRU, it shows better recognition ability and noise resistance in fuzzy image recognition tasks; the model is more robust and can effectively identify whether the scene contains target objects.
  • Figure 1 shows a schematic diagram of a blurred image
  • Figure 2 shows a schematic diagram of a neural network structure according to an embodiment of the present invention
  • Figure 3 shows a schematic flowchart of a method for identifying multimedia data according to an embodiment of the present invention
  • Figure 4 shows a schematic diagram of a neuron
  • Figure 5 shows a schematic diagram of the ALIF network layer calculation according to an embodiment of the present invention
  • Figure 6 shows a schematic diagram of training pictures input to the neural network according to an embodiment of the present invention
  • Figure 7 shows a schematic diagram of test pictures input to the neural network according to an embodiment of the present invention
  • Figure 8 shows a comparison of experimental results of neural networks with different implementation models according to an embodiment of the present invention
  • Figure 9 shows a schematic structural diagram of a multimedia data recognition device according to an embodiment of the present invention.
  • the embodiment of the present invention proposes an Adaptive Leaky Integrate-and-Fire (ALIF) timing algorithm model that integrates SNN and ANN, and uses it to perform fuzzy image recognition.
  • Figure 2 shows a schematic diagram of a neural network structure according to an embodiment of the present invention. As shown in Figure 2, because ALIF is a time series model, the input of the neural network shown in Figure 2 is a sequence of consecutive blurred images.
  • the time step of the image sequence (GoI, Group of Images) fed to the network takes one of five values: 2, 5, 10, 15 or 20.
  • in addition, RNN, LSTM and other timing models were added for comparison.
  • the FC in Figure 2 denotes a fully connected layer.
  • the key element of the neural network structure provided by the embodiment of the present invention is ALIF, which is a layer in the network. It is conceptually on the same level as popular deep learning time series models such as RNN, LSTM and GRU, but its implementation draws more heavily on the working mechanism of spiking neural networks and is therefore more biologically plausible.
  • FIG. 3 shows a schematic flow chart of a method for identifying multimedia data according to an embodiment of the present invention.
  • the multimedia data may include fuzzy image data or fuzzy video data.
  • the method for identifying multimedia data provided by an embodiment of the present invention may include:
  • Step S301: input the multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers;
  • the multimedia data to be identified includes image data and/or video data; the multimedia data to be identified is blurred image data or blurred video data, that is, multimedia data with low resolution and/or heavy noise.
  • Step S302 Perform recognition calculation on the multimedia data to be recognized through the multi-layer ALIF network layer in the neural network structure, and output the calculation result.
  • Fig. 5 shows a schematic diagram of ALIF network layer calculation according to an embodiment of the present invention.
  • the calculation logic of the ALIF network layer is similar to that of an RNN, but this embodiment distinguishes it by adding random noise, an adaptive firing module and other elements.
  • v_{t,1} denotes the membrane potential at the t-th time step when the current layer is layer 1
  • x t indicates the input of the ALIF network layer
  • y t indicates the output of the neuron at the t-th time step of the ALIF network layer.
  • the calculation logic can be expressed as follows (taking the current time step t as an example).
  • the neuron output at each time step of any ALIF network layer can be calculated by the following formula: y_t = σ(v_t + δ)
  • t represents the t-th time step
  • y_t represents the output of the neurons of the ALIF network layer at the t-th time step
  • σ represents an activation function incorporating the adaptive f_thres adjustment algorithm
  • δ represents a tensor set to simulate the random noise of the brain
  • v_t represents the membrane potential at the t-th time step.
  • the membrane potential v_t at the t-th time step is calculated by the following formula: v_t = W_x x_t + α v_{t-1}
  • W_x represents the two-dimensional weight matrix that transforms the input in the ALIF time series model
  • x_t represents the input of the ALIF network layer
  • v_{t-1} represents the membrane potential at the (t-1)-th time step
  • α represents a preset matrix
  • W_x is a two-dimensional weight matrix that transforms the input in the time series model; its matrix shape is input_dim (the fixed data dimension of the input at each time step) × unit_of_ALIF (the number of units of the ALIF network layer, consistent with the corresponding concept in time series models such as RNN)
  • α can represent a preset matrix used in place of the matrix W_h (where h stands for 'hidden') that would otherwise be matrix-multiplied with the membrane potential v_t; α is a 1 × unit_of_ALIF matrix.
  • if any value in the activation level y_t exceeds f_thres, the membrane potential at that position has the preset parameter β subtracted from it; β plays the role of the recovery (reset) potential in an SNN, i.e. the membrane potential returns to its initial position. Note that all of the above parameters are updated using the back-propagation mechanism.
  • the firing threshold f_thres is adaptively adjusted according to the distribution of activation levels at the current time step; the present invention does not limit how this is done.
  • the specific calculation logic of the ALIF timing model may be as follows, taking the forward propagation process of layer l at time step t as an example, where only f_thres is a scalar.
  • step 1: calculate the hidden state h_{t,l}
  • step 2: update the membrane potential v_{t,l}
  • step 3: obtain the activated output y_{t,l}
  • step 4: update f_thres through the adaptive learning method
  • step 5: regularize v_{t,l} according to y_{t,l} and f_{thres,l}
  • step 6: limit y_{t,l} by clipping to the varying bound
  • the embodiment of the present invention performs recognition on the training pictures and test pictures shown in FIG. 6 and FIG. 7 using neural network structures with different time series models.
  • if the input data is video data, it can be treated as a discrete sequence of pictures.
  • the training pictures are obtained by randomly rotating the original image by plus or minus 15 degrees
  • the test pictures additionally have Gaussian blur and salt noise applied
  • the test pictures are shown with noise covering 30% of the picture as an example, and the leftmost picture represents the original image.
  • Figure 8 shows a comparison diagram of the experimental results of neural networks for different implementation models.
  • for ALIF, CNN, MLP and ConvSNN, the recognition accuracy is tested while varying the proportion of salt noise in the whole image.
  • the experimental results shown in Figure 8 indicate that the neural network structure based on the ALIF time series model recognizes fuzzy pictures/videos with different noise ratios better than CNN, MLP and ConvSNN.
  • the embodiment of the present invention also provides a multimedia data recognition device.
  • the multimedia data recognition device may include:
  • the data input module 910 is configured to input multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data;
  • the data calculation module 920 is configured to perform recognition calculation on the multimedia data to be recognized through the multi-layer ALIF network layer in the neural network structure, and output the calculation result.
  • the data calculation module 920 is further configured to calculate, for any ALIF network layer, the output of the neurons by the following formula: y_t = σ(v_t + δ)
  • t represents the t-th time step
  • y_t represents the output of the neurons of the ALIF network layer at the t-th time step
  • σ represents an activation function incorporating the adaptive f_thres adjustment algorithm
  • δ represents a tensor set to simulate the random noise of the brain
  • v_t represents the membrane potential at the t-th time step.
  • the data calculation module 920 is further configured to calculate the membrane potential v_t at the t-th time step by the following formula: v_t = W_x x_t + α v_{t-1}
  • v_t represents the membrane potential at the t-th time step
  • W_x represents the two-dimensional weight matrix that transforms the input in the ALIF timing model
  • x_t represents the input of the ALIF network layer
  • v_{t-1} represents the membrane potential at the (t-1)-th time step
  • α represents a preset matrix
  • the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
  • an embodiment of the present invention also provides a storage device in which a computer program is stored; when the computer program runs in an electronic device, it is loaded by the processor of the electronic device to execute the multimedia data recognition method described in any of the above embodiments.
  • an embodiment of the present invention also provides an electronic device, including:
  • a processor for running a computer program; and
  • a storage device for storing a computer program which, when run in the electronic device, is loaded by the processor to execute the multimedia data identification method described in any of the above embodiments.
  • the present invention proposes a fuzzy image/video recognition method considering time sequence information and spatial information (based on the fusion of SNN and ANN methods).
  • the basic idea of the technical scheme proposed by the present invention is to combine the advantages of the SNN and ANN methods and design a new type of timing model, thereby effectively extracting the regions of interest in a video sequence and effectively improving image recognition ability when the noise sources are complex and the pictures/videos are heavily affected.
  • the model provided in this embodiment can also incorporate convolution, in a form similar to ConvLSTM2D, constructing the ConvALIF2D specific to this embodiment; compared with conventional time series model methods, it fuses SNN and ANN into a new algorithm.
  • with significantly fewer weights than the commonly used time series models RNN, LSTM and GRU, ALIF can show better recognition ability and anti-noise ability in fuzzy image recognition tasks; the model is more robust and can effectively identify whether there are target objects in the scene.
  • modules, units or components in the embodiments may be combined into one module, unit or component, and may also be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A multimedia data identification method and device. The method comprises: inputting multimedia data to be identified into a pre-constructed neural network structure, wherein the neural network structure comprises an adaptive Leaky Integrate-and-Fire (ALIF) timing model, the ALIF timing model comprises multiple ALIF network layers, and the multimedia data to be identified comprises image data and/or video data (S301); and performing recognition computation on the multimedia data to be identified through the multiple ALIF network layers in the neural network structure, and outputting a computation result (S302). A novel algorithm, ALIF, that integrates SNN with ANN is proposed; it can show better identification capability and anti-noise capability in fuzzy image identification tasks, the model is more robust, and it can effectively identify whether a target object is contained in the scene.

Description

Method and device for recognizing multimedia data
Technical field
The present invention relates to the technical field of deep learning, and in particular to a method and device for identifying multimedia data.
Background art
Deep learning refers to a collection of algorithms that use various machine learning algorithms on multilayer neural networks to solve problems involving images, text and other data. Broadly, deep learning can be classified as a kind of neural network, but there are many variations in specific implementations. The core of deep learning is feature learning, which aims to obtain hierarchical feature information through a hierarchical network, thereby replacing the manual feature design required in the past. Deep learning is a framework that includes many important algorithms, such as Convolutional Neural Networks (CNN), AutoEncoders, Sparse Coding, Restricted Boltzmann Machines (RBM), Deep Belief Networks (DBN), and Recurrent Neural Networks (RNN).
For different problems (images, speech, text), different network models need to be selected to achieve better results. Before the advent of deep learning, commonly used machine learning algorithms such as the Support Vector Machine (SVM) were also widely applied to various tasks. At present, what we call Artificial Intelligence (AI) mainly refers to deep learning algorithms represented by neural network models and machine learning algorithms represented by SVM.
Due to the shortcomings of Artificial Neural Network (ANN) methods represented by deep learning, such as poor interpretability and a low level of biological fidelity, attention has begun to turn to the field of brain-inspired computing, and the third generation of neural networks, represented by the Spiking Neural Network (SNN), has begun to receive widespread attention. Compared with ANNs, SNNs have very low power consumption, which makes it possible to approximately simulate the tens of billions of neurons of the human brain (because the essence of deep learning is to make the neural network deeper and larger, causing the number of parameters to explode). In addition, SNNs have stronger biological plausibility. From the Leaky Integrate-and-Fire (LIF) model proposed by the French physiologist Louis Lapicque in 1907 to the Hodgkin-Huxley (HH) model developed by Hodgkin and Huxley of Trinity College, Cambridge in the mid-20th century, these models start from the real biological brain and analyze the working mechanism of neurons and their responses to different levels of stimulation.
However, a pure SNN can only accept discrete signal inputs, whereas real-world tasks almost always involve continuous inputs. Research on signal conversion is not yet deep enough, and SNN-type algorithms are mainly used in the design and development of neuromorphic chips. It can be said that SNN has not yet shown, as ANN has, great power in practical tasks such as target recognition, object classification and image generation.
Image recognition is a classic problem in the field of computer vision. With the rapid development of AI technology represented by deep learning, the field of image recognition has attracted the attention of many researchers. However, in the field of fuzzy image recognition, because the data distribution of blur and noise is difficult to evaluate and model, existing ANN-based algorithms struggle to achieve recognition ability comparable to that of humans.
At present, the classic fuzzy image recognition process can be divided into two steps: 1) remove the noise and blur from the image; 2) perform image recognition on the denoised image. We know that noise in images usually arises from abrupt spatial scene changes or from the shooting technique or device (for example, capture equipment with too low a resolution). As a highly ill-posed problem, image/video denoising usually relies on a large amount of prior knowledge (such as knowledge distillation over the various possible kinds of noise). When the noise/blur source is relatively fixed, there are three commonly used prior-based denoising approaches:
1. Total variation blind deconvolution, proposed by Tony Chan et al. of UCLA in 1998;
2. A new method that takes a sparse image prior into account, proposed by Levin et al. in 2009 on the basis of Tony Chan's total variation blind deconvolution;
3. The heavy-tailed gradient prior proposed by Freeman et al. in 2008, which can effectively remove, from a single image, the blur caused by the photographer's hand shaking.
These algorithms all estimate the blur kernel through a coarse-to-fine maximum a posteriori (MAP) framework, but this type of algorithm is time-consuming and does not work well on low-resolution images.
Since 2012, with the renewed popularity of deep learning, many CNN-based image deblurring algorithms have been proposed. For example, Professor Sun Jian of Xi'an Jiaotong University proposed an end-to-end CNN architecture that estimates the direction of the blur kernel in each small region of the image. Researchers from Seoul University in South Korea proposed a unified framework incorporating temporal information (i.e. able to operate on video sequences), which can effectively deblur videos/images and perform super-resolution reconstruction, and estimates motion through optical-flow information (for effective deblurring).
For the case where the blur sources are highly heterogeneous, as shown in Figure 1 (an airport with an airplane), the original image itself does not contain enough information for a CNN-type model to recognize it; compared with a CNN, a sequence of blurred pictures with time-dimension information added is easier to recognize (based on the assumption that the regions of interest in the images can be fully collected over time by an RNN-like timing model).
Although the previous methods have achieved good results in image/video denoising, which benefits the recognition task, motion blur kernel estimation and sequence modeling are both highly sensitive to irregular and dense noise. In addition, as the noise and blur in the image/video increase, the recognition success rate of timing models also drops significantly.
Summary of the invention
In view of the above problems, the present invention provides a multimedia data recognition method and device that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for identifying multimedia data is provided, including:
inputting multimedia data to be identified into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data;
performing recognition computation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and outputting the computation result.
Optionally, for any ALIF network layer, the neuron output is calculated by the following formula:
y_t = σ(v_t + δ)
where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive f_thres adjustment algorithm; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step.
Optionally, the membrane potential v_t at the t-th time step is calculated by the following formula:
v_t = W_x x_t + α v_{t-1}
where v_t denotes the membrane potential at the t-th time step; W_x denotes the two-dimensional weight matrix that transforms the input in the ALIF timing model; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix;
if y_t ≥ f_thres, then v_t' = v_t - β, where β denotes a preset parameter.
Optionally, the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
Optionally, the image data and/or video data are blurred image data and/or video data.
According to another aspect of the present invention, a multimedia data recognition device is also provided, including:
a data input module configured to input multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data;
a data calculation module configured to perform recognition computation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and output the computation result.
Optionally, the data calculation module is further configured to calculate, for any ALIF network layer, the output of the neurons by the following formula:
y_t = σ(v_t + δ)
where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive f_thres adjustment algorithm; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step.
Optionally, the data calculation module is further configured to calculate the membrane potential v_t at the t-th time step by the following formula:
v_t = W_x x_t + α v_{t-1}
where v_t denotes the membrane potential at the t-th time step; W_x denotes the two-dimensional weight matrix that transforms the input in the ALIF timing model; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix;
if y_t ≥ f_thres, then v_t' = v_t - β, where β denotes a preset parameter.
Optionally, the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
According to yet another aspect of the present invention, a storage device is also provided, in which a computer program is stored; when the computer program runs in an electronic device, it is loaded by the processor of the electronic device to execute the multimedia data recognition method described in any one of the above.
According to yet another aspect of the present invention, an electronic device is also provided, including:
a processor for running a computer program; and
a storage device for storing a computer program which, when run in the electronic device, is loaded by the processor to execute the multimedia data identification method described in any one of the above.
The present invention proposes ALIF, a new algorithm that combines SNN and ANN. With significantly fewer weights than the commonly used time series models RNN, LSTM and GRU, it shows better recognition ability and noise resistance in fuzzy image recognition tasks; the model is more robust and can effectively identify whether the scene contains target objects.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the content of the specification, and in order to make the above and other objectives, features and advantages of the present invention more obvious and understandable, specific embodiments of the present invention are set forth below.
From the following detailed description of specific embodiments of the present invention in conjunction with the accompanying drawings, those skilled in the art will better understand the above and other objectives, advantages and features of the present invention.
Description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Figure 1 shows a schematic diagram of a blurred image;
Figure 2 shows a schematic diagram of a neural network structure according to an embodiment of the present invention;
Figure 3 shows a schematic flowchart of a method for identifying multimedia data according to an embodiment of the present invention;
Figure 4 shows a schematic diagram of a neuron;
Figure 5 shows a schematic diagram of the ALIF network layer calculation according to an embodiment of the present invention;
Figure 6 shows a schematic diagram of training pictures input to the neural network according to an embodiment of the present invention;
Figure 7 shows a schematic diagram of test pictures input to the neural network according to an embodiment of the present invention;
Figure 8 shows a comparison of experimental results of neural networks with different implementation models according to an embodiment of the present invention;
Figure 9 shows a schematic structural diagram of a multimedia data recognition device according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
The embodiment of the present invention proposes an Adaptive Leaky Integrate-and-Fire (ALIF) timing algorithm model that integrates SNN and ANN, and uses it to perform fuzzy image recognition. Figure 2 shows a schematic diagram of a neural network structure according to an embodiment of the present invention. As shown in Figure 2, because ALIF is a time series model, the input of the neural network shown in Figure 2 is a sequence of consecutive blurred images; the time step of the image sequence (GoI, Group of Images) fed to the network takes one of five values: 2, 5, 10, 15 or 20. In addition, for comparison, RNN, LSTM and other timing models were added. The FC in Figure 2 denotes a fully connected layer.
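As a concrete illustration of the input format, the following minimal NumPy sketch shapes a GoI for each of the five time-step settings before it would enter the ALIF layers of Figure 2; the 28x28 frame size, the batch size, and the flattening into a per-step vector are illustrative assumptions, not values given in the patent.

```python
import numpy as np

# Minimal sketch (assumption: grayscale 28x28 frames; batch of blurred sequences).
# It only shows how a GoI could be arranged before entering the ALIF layers of
# Figure 2; the ALIF layer computation itself is sketched further below.
batch_size, height, width = 4, 28, 28
for time_steps in (2, 5, 10, 15, 20):               # the five GoI lengths tested
    goi = np.random.rand(batch_size, time_steps, height, width)
    x = goi.reshape(batch_size, time_steps, height * width)   # per-step input_dim = 784
    print(time_steps, x.shape)                      # e.g. (4, 10, 784) for time_steps = 10
```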
The key element of the neural network structure provided by the embodiment of the present invention is ALIF. It is a layer in the network and is conceptually on the same level as popular deep learning time series models such as RNN, LSTM and GRU, but its implementation draws more heavily on the working mechanism of spiking neural networks and is therefore more biologically plausible.
Figure 3 shows a schematic flowchart of a method for identifying multimedia data according to an embodiment of the present invention. The multimedia data may include blurred image data or blurred video data. As shown in Figure 3, the multimedia data identification method provided by an embodiment of the present invention may include:
Step S301: input the multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data. The multimedia data to be identified is blurred image data or blurred video data, that is, multimedia data with low resolution and/or heavy noise.
Step S302: perform recognition computation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and output the computation result.
In the spiking neural network model, referring to Figure 4, the branched structures attached to the cell membrane are called dendrites and are the inputs, while the long 'tail' is called the axon and is the output. Neurons output electrical and chemical signals, the most important being an electrical pulse that propagates along the surface of the axon's cell membrane. Both dendrites and axons have a large number of branches. The end of an axon usually connects to the dendrites of other cells at a structure called a 'synapse'; the output of a neuron is transmitted through synapses to thousands of downstream neurons. A neuron has thousands of upstream neurons, accumulates their inputs, and produces an output.
Figure 5 shows a schematic diagram of the ALIF network layer calculation according to an embodiment of the present invention. As shown in Figure 5, the calculation logic of the ALIF network layer is similar to that of an RNN, but this embodiment distinguishes it by adding random noise, an adaptive firing module and other elements. In Figure 5, v_{t,1} denotes the membrane potential at the t-th time step when the current layer is layer 1, x_t denotes the input of the ALIF network layer, and y_t denotes the output of the neurons of the ALIF network layer at the t-th time step.
The calculation logic can be expressed as follows (taking the current time step t as an example). In the above step S302, the neuron output at each time step of any ALIF network layer can be calculated by the following formula:
y_t = σ(v_t + δ)
where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive f_thres adjustment algorithm; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step. The f_thres algorithm adaptively adjusts the firing threshold based on statistics of the values to be emitted. Suppose there are five values to be emitted: 0.5, 0.4, 0.7, 0.8 and 0.9. With the currently set f_thres = 0.8, only 0.8 and 0.9 can be successfully emitted (i.e. passed on); the three values 0.5, 0.4 and 0.7 are not emitted by the neuron.
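This firing rule can be reproduced in a few lines; the values and the threshold below are exactly those of the example above, and the sketch only illustrates the thresholded emission, not the adaptive update of f_thres.

```python
import numpy as np

y = np.array([0.5, 0.4, 0.7, 0.8, 0.9])    # candidate activation values from the example
f_thres = 0.8                               # currently set firing threshold
fired = np.where(y >= f_thres, y, 0.0)      # only values reaching the threshold are emitted
print(fired)                                # [0.  0.  0.  0.8 0.9] -> only 0.8 and 0.9 fire
```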
Further, the membrane potential v_t at the t-th time step is calculated by the following formula:
v_t = W_x x_t + α v_{t-1}
where W_x denotes the two-dimensional weight matrix that transforms the input in the ALIF time series model; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix.
W_x is the two-dimensional weight matrix that transforms the input in the time series model; its matrix shape is input_dim (the fixed data dimension of the input at each time step) × unit_of_ALIF (the number of units of the ALIF network layer, consistent with the corresponding concept in time series models such as RNN). α can represent a preset matrix used in place of the matrix W_h (where h stands for 'hidden') that would otherwise be matrix-multiplied with the membrane potential v_t; α is a 1 × unit_of_ALIF matrix. Compared with a normal unit_of_ALIF × unit_of_ALIF W_h, especially when the number of units is large, this saves a huge number of weights and can therefore effectively improve the calculation speed.
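To make the weight saving concrete, the short sketch below compares the parameter count of a conventional recurrent matrix W_h (unit_of_ALIF × unit_of_ALIF) with the 1 × unit_of_ALIF vector α used here; the sizes input_dim = 784 and 512 units are illustrative assumptions, not values stated in the patent.

```python
input_dim, units = 784, 512                        # illustrative sizes only

params_Wx     = input_dim * units                  # 401,408 -- needed in both designs
params_Wh_rnn = units * units                      # 262,144 -- standard recurrent matrix W_h
params_alpha  = 1 * units                          # 512     -- ALIF's 1 x units matrix alpha

print("recurrent parameters saved:", params_Wh_rnn - params_alpha)   # 261,632
```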
Optionally, if y_t ≥ f_thres, then v_t' = v_t - β.
If any value in the activation level y_t is greater than f_thres, the membrane potential at that position has the preset parameter β subtracted from it; β plays the role of the recovery (reset) potential in an SNN, i.e. the membrane potential returns to its initial position. It is worth noting that all of the above parameters are updated using the back-propagation mechanism, and that the firing threshold f_thres is adaptively adjusted according to the distribution of activation levels at the current time step; the present invention does not limit how this is done.
The specific calculation logic of the ALIF timing model provided by the embodiment of the present invention may be as follows, taking the forward propagation process of layer l at time step t as an example, where only f_thres is a scalar.
Input: y_{t,l-1}, v_{t-1,l}
Parameters: W_{l-1}, b_{l-1}, α, β, δ
Output: y_{t,l}, v_{t,l}
Step 1: calculate the hidden state h_{t,l}
  x_{t,l} = y_{t,l-1}
  h_{t,l} = W_{l-1} x_{t,l} + b_{l-1}
Step 2: update the membrane potential v_{t,l}
  v_{t,l} = h_{t,l} + α v_{t-1,l}
Step 3: obtain the activated output y_{t,l}
  y_{t,l} = σ(v_{t,l} + δ)
Step 4: update f_thres through the adaptive learning method
  f_{thres,l} = update(f_{thres,l}), with the following logic:
  1) set the hyperparameters p_1 = 85%, p_2 = 98%, lr_thres = 0.001
  2) if, for y_{t,l}, f_{thres,l} < p_1 or f_{thres,l} > p_2, then continue with:
       if f_{thres,l} < p_1, then f_{thres,l} = f_{thres,l} + lr_thres
       otherwise, f_{thres,l} = f_{thres,l} - lr_thres
     end of the check
  return f_{thres,l}
Step 5: regularize v_{t,l} according to y_{t,l} and f_{thres,l}
  v_{t,l} = v_{t,l} - β·step(y_{t,l} - f_{thres,l})
Step 6: limit y_{t,l} by clipping to the varying bound
  y_{t,l} = clip(y_{t,l}, 0, f_{thres,l})
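Putting steps 1 to 6 together, the following minimal NumPy sketch runs one ALIF forward step. It is an illustration under stated assumptions rather than the patent's implementation: the sigmoid activation, the array shapes, and the reading of the threshold update rule as tracking where f_thres sits in the current activation distribution are all choices made here for concreteness.

```python
import numpy as np

def step_fn(x):
    """Heaviside step: 1.0 where x >= 0, else 0.0 (used for the reset in step 5)."""
    return (x >= 0).astype(float)

def alif_forward(y_prev_layer, v_prev, W, b, alpha, beta, delta, f_thres,
                 p1=0.85, p2=0.98, lr_thres=0.001):
    """One forward step of an ALIF layer at time t, layer l (illustrative sketch).

    y_prev_layer : (batch, input_dim)  output of layer l-1 at time t
    v_prev       : (batch, units)      membrane potential at time t-1
    W, b         : (input_dim, units), (units,)  input weights and bias
    alpha        : (units,)            per-unit leak values (the 1 x units matrix alpha)
    beta, delta  : reset size and simulated brain-noise term
    f_thres      : scalar adaptive firing threshold
    """
    h = y_prev_layer @ W + b                      # step 1: hidden state
    v = h + alpha * v_prev                        # step 2: leaky membrane update
    y = 1.0 / (1.0 + np.exp(-(v + delta)))        # step 3: activation with noise (sigmoid assumed)
    frac_below = np.mean(y < f_thres)             # step 4: adaptive threshold (one plausible reading)
    if frac_below < p1:
        f_thres += lr_thres
    elif frac_below > p2:
        f_thres -= lr_thres
    v = v - beta * step_fn(y - f_thres)           # step 5: reset units that fired
    y = np.clip(y, 0.0, f_thres)                  # step 6: bound the emitted activation
    return y, v, f_thres

# Usage with random data (shapes are illustrative):
rng = np.random.default_rng(0)
x = rng.random((4, 784))                          # layer input at time t
v = np.zeros((4, 512))                            # membrane potential carried from t-1
W, b = rng.normal(0.0, 0.01, (784, 512)), np.zeros(512)
y, v, f_thres = alif_forward(x, v, W, b, alpha=np.full(512, 0.9), beta=1.0,
                             delta=rng.normal(0.0, 0.01, 512), f_thres=0.8)
```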
Based on the solution provided by the above embodiment, the embodiment of the present invention performs recognition on the training pictures and test pictures shown in Figure 6 and Figure 7 using neural network structures with different time series models. If the input data is video data, the video data can be treated as a discrete sequence of pictures. As shown in Figure 6 and Figure 7, the training pictures are obtained by randomly rotating the original image by plus or minus 15 degrees, while the test pictures additionally have Gaussian blur and salt noise applied. Taking a time step of 10 as an example, the test pictures are shown with noise covering 30% of the picture, and the leftmost picture represents the original image.
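A minimal sketch of how such inputs could be produced is shown below: a random rotation of up to plus or minus 15 degrees for a training frame, and Gaussian blur plus salt noise covering 30% of the pixels for a test frame. The use of scipy for rotation and blurring, the blur strength sigma, and the 28x28 stand-in frame are illustrative choices, not details given in the patent.

```python
import numpy as np
from scipy.ndimage import rotate, gaussian_filter

rng = np.random.default_rng(0)

def train_frame(img):
    """Random rotation by +/- 15 degrees, as used for the training pictures."""
    angle = rng.uniform(-15, 15)
    return rotate(img, angle, reshape=False, mode="nearest")

def test_frame(img, noise_ratio=0.30, sigma=1.0):
    """Gaussian blur plus salt noise covering `noise_ratio` of the pixels."""
    blurred = gaussian_filter(img, sigma=sigma)       # sigma is an illustrative choice
    mask = rng.random(img.shape) < noise_ratio        # 30% of pixels in the example
    noisy = blurred.copy()
    noisy[mask] = 1.0                                 # salt noise: set chosen pixels to white
    return noisy

img = rng.random((28, 28))                            # stand-in for one original frame
x_train, x_test = train_frame(img), test_frame(img)
```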
Figure 8 shows a comparison of the experimental results of neural networks with different implementation models. For ALIF, CNN, MLP and ConvSNN, the recognition accuracy was tested while varying the proportion of salt noise in the whole image. The experimental results shown in Figure 8 indicate that the neural network structure based on the ALIF time series model recognizes fuzzy pictures/videos with different noise ratios better than CNN, MLP and ConvSNN.
After many experiments, the recognition method provided by the embodiment of the present invention, which uses ALIF in place of common time series models such as RNN and LSTM, is very robust for fuzzy image recognition. The ALIF/ConvALIF2D structure is a complete, low-complexity network layer for modeling temporal and spatial information. In the field of object recognition under noisy conditions it brings an improvement of close to 10%, which counts as a milestone result in the field of computer vision.
Based on the same inventive concept, the embodiment of the present invention also provides a multimedia data recognition device. As shown in Figure 9, the multimedia data recognition device may include:
a data input module 910 configured to input the multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data;
a data calculation module 920 configured to perform recognition computation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and output the computation result.
In an optional embodiment of the present invention, the data calculation module 920 is further configured to calculate, for any ALIF network layer, the output of the neurons by the following formula:
y_t = σ(v_t + δ)
where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive f_thres adjustment algorithm; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step.
In an optional embodiment of the present invention, the data calculation module 920 is further configured to calculate the membrane potential v_t at the t-th time step by the following formula:
v_t = W_x x_t + α v_{t-1}
where v_t denotes the membrane potential at the t-th time step; W_x denotes the two-dimensional weight matrix that transforms the input in the ALIF timing model; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix;
if y_t ≥ f_thres, then v_t' = v_t - β, where β denotes a preset parameter.
In an optional embodiment of the present invention, the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
Based on the same inventive concept, an embodiment of the present invention further provides a storage device in which a computer program is stored; when the computer program runs in an electronic device, it is loaded by the processor of the electronic device to execute the multimedia data recognition method described in any one of the above embodiments.
Based on the same inventive concept, an embodiment of the present invention further provides an electronic device, including:
a processor, configured to run a computer program; and
a storage device, configured to store the computer program, which, when running in the electronic device, is loaded by the processor to execute the multimedia data recognition method described in any one of the above embodiments.
The present invention proposes a blurred image/video recognition method that takes both temporal and spatial information into account, based on the fusion of SNN and ANN approaches. The basic idea of the proposed technical solution is to combine the advantages of the SNN and ANN approaches into a new type of time-series model, thereby effectively extracting regions of interest in a video sequence and improving image recognition when the noise sources are complex and the pictures/videos are heavily affected. In addition, the model provided in this embodiment can also incorporate convolution, in a form similar to ConvLSTM2D, to construct the ConvALIF2D specific to this embodiment. Compared with conventional time-series approaches, ALIF, a new algorithm fusing SNN and ANN, has significantly fewer weights than the commonly used RNN, LSTM and GRU models, yet shows better recognition ability and noise resistance in blurred-image recognition tasks; the model is more robust and can effectively identify whether the scene contains the target object.
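A ConvALIF2D time step can be sketched by replacing the dense product W_x·x_t with a two-dimensional convolution, in the spirit of ConvLSTM2D. The sketch below uses a single channel and a single kernel via SciPy's correlate2d purely for brevity; the helper name, shapes and parameter values are assumptions rather than the embodiment's actual layer.

```python
import numpy as np
from scipy.signal import correlate2d

def conv_alif2d_step(x_t, v_prev, kernel, alpha, beta, delta, f_thres):
    """One ConvALIF2D time step on a single-channel H×W frame:
    v_t = conv(x_t) + α·v_{t-1}, y_t = σ(v_t + δ), with v_t ← v_t - β where y_t ≥ f_thres."""
    v_t = correlate2d(x_t, kernel, mode="same") + alpha * v_prev   # convolution replaces W_x·x_t
    y_t = 1.0 / (1.0 + np.exp(-(v_t + delta)))                     # σ with noise offset δ
    v_t = np.where(y_t >= f_thres, v_t - beta, v_t)                # reset where the threshold is reached
    return y_t, v_t

# Illustrative 32×32 frame and 3×3 kernel (assumed sizes).
rng = np.random.default_rng(1)
frame = rng.standard_normal((32, 32))
y, v = conv_alif2d_step(frame, np.zeros((32, 32)), 0.1 * rng.standard_normal((3, 3)),
                        alpha=0.9, beta=0.5, delta=0.01, f_thres=0.8)
```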
Numerous specific details are set forth in the description provided herein. It should be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention the various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the inventive aspects lie in less than all features of a single foregoing disclosed embodiment. The claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will understand that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present invention and form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and the like does not indicate any order; these words may be interpreted as names.
Those skilled in the art will by now appreciate that, although a number of exemplary embodiments of the present invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the present invention can still be directly determined or derived from the disclosure of the present invention without departing from its spirit and scope. Accordingly, the scope of the present invention should be understood and deemed to cover all such other variations or modifications.

Claims (10)

  1. A multimedia data recognition method, comprising:
    inputting multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive leaky-integrate-and-fire (ALIF) time-series model, the ALIF time-series model includes multiple ALIF network layers, and the multimedia data to be recognized includes image data and/or video data; and
    performing recognition calculation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and outputting a calculation result.
  2. The method according to claim 1, wherein, for any ALIF network layer, the neuron output is calculated by the following formula:
    y_t = σ(v_t + δ)
    where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive adjustment of the firing threshold f_thres; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step.
  3. The method according to claim 2, wherein the membrane potential v_t at the t-th time step is calculated by the following formula:
    v_t = W_x·x_t + α·v_{t-1}
    where v_t denotes the membrane potential at the t-th time step; W_x denotes a two-dimensional weight matrix in the ALIF time-series model that transforms the input; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix;
    and wherein, if y_t ≥ f_thres, then v_t' = v_t - β, where β denotes a preset parameter.
  4. The method according to claim 3, wherein the shape of W_x is: the data dimension input at each time step × the number of units of the ALIF network layer.
  5. The method according to any one of claims 1 to 4, wherein the image data and/or video data are blurred image data and/or blurred video data.
  6. A multimedia data recognition device, comprising:
    a data input module, configured to input multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive leaky-integrate-and-fire (ALIF) time-series model, the ALIF time-series model includes multiple ALIF network layers, and the multimedia data to be recognized includes image data and/or video data; and
    a data calculation module, configured to perform recognition calculation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and to output a calculation result.
  7. The device according to claim 6, wherein the data calculation module is further configured to calculate, for any ALIF network layer, the neuron output by the following formula:
    y_t = σ(v_t + δ)
    where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive adjustment of the firing threshold f_thres; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step.
  8. The device according to claim 7, wherein the data calculation module is further configured to calculate the membrane potential v_t at the t-th time step by the following formula:
    v_t = W_x·x_t + α·v_{t-1}
    where v_t denotes the membrane potential at the t-th time step; W_x denotes a two-dimensional weight matrix in the ALIF time-series model that transforms the input; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix;
    and wherein, if y_t ≥ f_thres, then v_t' = v_t - β, where β denotes a preset parameter.
  9. A storage device in which a computer program is stored, wherein, when the computer program runs in an electronic device, it is loaded by a processor of the electronic device to execute the multimedia data recognition method according to any one of claims 1-4.
  10. An electronic device, comprising:
    a processor, configured to run a computer program; and
    a storage device, configured to store the computer program, which, when running in the electronic device, is loaded by the processor to execute the multimedia data recognition method according to any one of claims 1-5.
PCT/CN2020/082961 2019-04-28 2020-04-02 Multimedia data identification method and device WO2020220926A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910348456.XA CN111860053B (en) 2019-04-28 2019-04-28 Multimedia data identification method and device
CN201910348456.X 2019-04-28

Publications (1)

Publication Number Publication Date
WO2020220926A1 true WO2020220926A1 (en) 2020-11-05

Family

ID=72966205

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/082961 WO2020220926A1 (en) 2019-04-28 2020-04-02 Multimedia data identification method and device

Country Status (2)

Country Link
CN (1) CN111860053B (en)
WO (1) WO2020220926A1 (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110141258A1 (en) * 2007-02-16 2011-06-16 Industrial Technology Research Institute Emotion recognition method and system thereof
EP3023911A1 (en) * 2014-11-24 2016-05-25 Samsung Electronics Co., Ltd. Method and apparatus for recognizing object, and method and apparatus for training recognizer
CN106250829A (en) * 2016-07-22 2016-12-21 中国科学院自动化研究所 Digit recognition method based on lip texture structure
CN108021927A (en) * 2017-11-07 2018-05-11 天津大学 A kind of method for extracting video fingerprints based on slow change visual signature
CN108960059A (en) * 2018-06-01 2018-12-07 众安信息技术服务有限公司 A kind of video actions recognition methods and device
CN109635791A (en) * 2019-01-28 2019-04-16 深圳大学 A kind of video evidence collecting method based on deep learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610707A (en) * 2021-07-23 2021-11-05 广东工业大学 Video super-resolution method based on time attention and cyclic feedback network
CN113610707B (en) * 2021-07-23 2024-02-09 广东工业大学 Video super-resolution method based on time attention and cyclic feedback network

Also Published As

Publication number Publication date
CN111860053B (en) 2023-11-24
CN111860053A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN107506712B (en) Human behavior identification method based on 3D deep convolutional network
KR102224253B1 (en) Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof
Liu et al. Predicting eye fixations using convolutional neural networks
Salama et al. Sheep identification using a hybrid deep learning and bayesian optimization approach
US11551076B2 (en) Event-driven temporal convolution for asynchronous pulse-modulated sampled signals
US11443514B2 (en) Recognizing minutes-long activities in videos
CN111507182B (en) Skeleton point fusion cyclic cavity convolution-based littering behavior detection method
CN112699956A (en) Neural morphology visual target classification method based on improved impulse neural network
CN112528830A (en) Lightweight CNN mask face pose classification method combined with transfer learning
Wang et al. Fire detection in infrared video surveillance based on convolutional neural network and SVM
Gao et al. An end-to-end broad learning system for event-based object classification
US20220132050A1 (en) Video processing using a spectral decomposition layer
CN112288080A (en) Pulse neural network-oriented adaptive model conversion method and system
Yang et al. RGBT tracking via cross-modality message passing
KR20210018600A (en) System for recognizing facial expression
CN115471831B (en) Image saliency detection method based on text reinforcement learning
WO2020220926A1 (en) Multimedia data identification method and device
Shi et al. Knowledge-guided semantic computing network
US20230076290A1 (en) Rounding mechanisms for post-training quantization
Wang et al. A fast interpretable adaptive meta-learning enhanced deep learning framework for diagnosis of diabetic retinopathy
Guan et al. Deep learning approaches for image classification techniques
KR102178469B1 (en) Method and system for estimation of pedestrian pose orientation using soft target training based on teacher-student framework
Zuo et al. NALA: A Nesterov Accelerated Look-Ahead optimizer for deep neural networks
Chen et al. Deep global-connected net with the generalized multi-piecewise ReLU activation in deep learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20799239

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20799239

Country of ref document: EP

Kind code of ref document: A1