WO2020220926A1 - Multimedia data identification method and device

Multimedia data identification method and device

Info

Publication number
WO2020220926A1
Authority
WO
WIPO (PCT)
Prior art keywords
alif
time step
multimedia data
data
network layer
Prior art date
Application number
PCT/CN2020/082961
Other languages
French (fr)
Chinese (zh)
Inventor
高岱恒
Original Assignee
北京灵汐科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2020220926A1 publication Critical patent/WO2020220926A1/en

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 20/00: Scenes; Scene-specific elements
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 20/00: Machine learning
          • G06N 3/00: Computing arrangements based on biological models
            • G06N 3/02: Neural networks
              • G06N 3/04: Architecture, e.g. interconnection topology
                • G06N 3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
          • Y02T 10/00: Road transport of goods or passengers
            • Y02T 10/10: Internal combustion engine [ICE] based vehicles
              • Y02T 10/40: Engine management systems

Definitions

  • the present invention relates to the technical field of deep learning, in particular to a method and device for identifying multimedia data.
  • Deep learning refers to a collection of algorithms that use various machine learning algorithms on multilayer neural networks to solve problems involving images, text and other data. Broadly, deep learning can be classified as a kind of neural network, but there are many variations in specific implementations.
  • the core of deep learning is feature learning, which aims to obtain hierarchical feature information through a hierarchical network, thereby replacing the manual feature design required in the past.
  • Deep learning is a framework that includes many important algorithms, such as Convolutional Neural Networks (CNN), AutoEncoders, Sparse Coding, Restricted Boltzmann Machines (RBM), Deep Belief Networks (DBN), and Recurrent Neural Networks (RNN).
  • CNN: Convolutional Neural Network
  • RBM: Restricted Boltzmann Machine
  • RNN: Recurrent Neural Network
  • SNN: Spiking Neural Network
  • HH: Hodgkin-Huxley
  • Image recognition is a classic problem in the field of computer vision. With the rapid development of AI technology represented by deep learning, the field of image recognition has attracted the attention of many researchers. However, in the field of fuzzy image recognition, because the data distribution of blur and noise is difficult to evaluate and model, existing ANN-based algorithms struggle to achieve recognition ability comparable to that of humans.
  • the classic fuzzy image recognition process can be divided into two steps: 1) remove the noise and blur from the image; 2) perform image recognition on the denoised image.
  • noise in images usually arises from abrupt spatial scene changes or from the shooting technique or device (such as capture equipment with too low a resolution).
  • denoising images/videos usually requires a large amount of prior knowledge (such as knowledge distillation over the various possible kinds of noise).
  • Freeman et al. proposed a heavy-tailed gradient prior in 2008 that can effectively remove, from a single image, the blur caused by the photographer's hand shaking.
  • motion blur kernel estimation and sequence modeling are highly sensitive to irregular and dense noise.
  • the recognition success rate of the timing model also drops significantly.
  • the present invention provides a method and device for recognizing multimedia data that overcomes the above problems or at least partially solves the above problems.
  • a method for identifying multimedia data including:
  • inputting multimedia data to be identified into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers;
  • the multimedia data to be identified includes image data and/or video data;
  • the multiple ALIF network layers in the neural network structure perform recognition computation on the multimedia data to be recognized and output the computation result.
  • for any ALIF network layer, the neuron output is calculated by the following formula: y_t = σ(v_t + δ)
  • t represents the t-th time step
  • y_t represents the output of the neurons of the ALIF network layer at the t-th time step
  • σ represents an activation function incorporating the adaptive f_thres adjustment algorithm
  • δ represents a tensor set to simulate the random noise of the brain
  • v_t represents the membrane potential at the t-th time step.
  • the membrane potential v_t at the t-th time step is calculated by the following formula: v_t = W_x x_t + α v_{t-1}
  • v_t represents the membrane potential at the t-th time step
  • W_x represents the two-dimensional weight matrix that transforms the input in the ALIF timing model
  • x_t represents the input of the ALIF network layer
  • v_{t-1} represents the membrane potential at the (t-1)-th time step
  • α represents a preset matrix
  • the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
  • the image data and/or video data are blurred image data and/or video data.
  • a multimedia data recognition device, including:
  • a data input module configured to input multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data;
  • a data calculation module configured to perform recognition computation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and output the computation result.
  • the data calculation module is further configured to calculate, for any ALIF network layer, the output of the neurons by the following formula: y_t = σ(v_t + δ)
  • t represents the t-th time step
  • y_t represents the output of the neurons of the ALIF network layer at the t-th time step
  • σ represents an activation function incorporating the adaptive f_thres adjustment algorithm
  • δ represents a tensor set to simulate the random noise of the brain
  • v_t represents the membrane potential at the t-th time step.
  • the data calculation module is further configured to calculate the membrane potential v_t at the t-th time step by the following formula: v_t = W_x x_t + α v_{t-1}
  • v_t represents the membrane potential at the t-th time step
  • W_x represents the two-dimensional weight matrix that transforms the input in the ALIF timing model
  • x_t represents the input of the ALIF network layer
  • v_{t-1} represents the membrane potential at the (t-1)-th time step
  • α represents a preset matrix
  • the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
  • a storage device in which a computer program is stored; when the computer program runs in an electronic device, it is loaded by the processor of the electronic device and executes the multimedia data recognition method described above.
  • an electronic device including:
  • a processor for running a computer program; and
  • a storage device for storing a computer program which, when run in the electronic device, is loaded by the processor to execute the multimedia data identification method described in any one of the above.
  • the present invention proposes ALIF, a new algorithm that combines SNN and ANN. With significantly fewer weights than the commonly used time series models RNN, LSTM and GRU, it shows better recognition ability and noise resistance in fuzzy image recognition tasks; the model is more robust and can effectively identify whether the scene contains target objects.
  • Figure 1 shows a schematic diagram of a blurred image
  • Figure 2 shows a schematic diagram of a neural network structure according to an embodiment of the present invention
  • Figure 3 shows a schematic flowchart of a method for identifying multimedia data according to an embodiment of the present invention
  • Figure 4 shows a schematic diagram of a neuron
  • Figure 5 shows a schematic diagram of the ALIF network layer calculation according to an embodiment of the present invention
  • Figure 6 shows a schematic diagram of training pictures input to the neural network according to an embodiment of the present invention
  • Figure 7 shows a schematic diagram of test pictures input to the neural network according to an embodiment of the present invention
  • Figure 8 shows a comparison of experimental results of neural networks with different implementation models according to an embodiment of the present invention
  • Figure 9 shows a schematic structural diagram of a multimedia data recognition device according to an embodiment of the present invention.
  • the embodiment of the present invention proposes an Adaptive Leaky Integrate-and-Fire (ALIF) timing algorithm model that integrates SNN and ANN, and uses it to perform fuzzy image recognition.
  • Figure 2 shows a schematic diagram of a neural network structure according to an embodiment of the present invention. As shown in Figure 2, because ALIF is a time series model, the input of the neural network shown in Figure 2 is a sequence of consecutive blurred images.
  • the time step of the image sequence (GoI, Group of Images) fed to the network takes one of five values: 2, 5, 10, 15 or 20.
  • in addition, RNN, LSTM and other timing models were added for comparison.
  • the FC in Figure 2 denotes a fully connected layer.
  • the key element of the neural network structure provided by the embodiment of the present invention is ALIF, which is a layer in the network. It is conceptually on the same level as popular deep learning time series models such as RNN, LSTM and GRU, but its implementation draws more heavily on the working mechanism of spiking neural networks and is therefore more biologically plausible.
  • FIG. 3 shows a schematic flow chart of a method for identifying multimedia data according to an embodiment of the present invention.
  • the multimedia data may include fuzzy image data or fuzzy video data.
  • the method for identifying multimedia data provided by an embodiment of the present invention may include:
  • Step S301: input the multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers;
  • the multimedia data to be identified includes image data and/or video data; the multimedia data to be identified is blurred image data or blurred video data, that is, multimedia data with low resolution and/or heavy noise.
  • Step S302 Perform recognition calculation on the multimedia data to be recognized through the multi-layer ALIF network layer in the neural network structure, and output the calculation result.
  • Fig. 5 shows a schematic diagram of ALIF network layer calculation according to an embodiment of the present invention.
  • the calculation logic of the ALIF network layer is similar to that of an RNN, but this embodiment distinguishes it by adding random noise, an adaptive firing module and other elements.
  • v_{t,1} denotes the membrane potential at the t-th time step when the current layer is layer 1
  • x t indicates the input of the ALIF network layer
  • y t indicates the output of the neuron at the t-th time step of the ALIF network layer.
  • the calculation logic can be expressed as follows (taking the current time step t as an example).
  • the neuron output at each time step of any ALIF network layer can be calculated by the following formula: y_t = σ(v_t + δ)
  • t represents the t-th time step
  • y_t represents the output of the neurons of the ALIF network layer at the t-th time step
  • σ represents an activation function incorporating the adaptive f_thres adjustment algorithm
  • δ represents a tensor set to simulate the random noise of the brain
  • v_t represents the membrane potential at the t-th time step.
  • the membrane potential v_t at the t-th time step is calculated by the following formula: v_t = W_x x_t + α v_{t-1}
  • W_x represents the two-dimensional weight matrix that transforms the input in the ALIF time series model
  • x_t represents the input of the ALIF network layer
  • v_{t-1} represents the membrane potential at the (t-1)-th time step
  • α represents a preset matrix
  • W_x is a two-dimensional weight matrix that transforms the input in the time series model; its matrix shape is input_dim (the fixed data dimension of the input at each time step) × unit_of_ALIF (the number of units of the ALIF network layer, consistent with the corresponding concept in time series models such as RNN)
  • α can represent a preset matrix used in place of the matrix W_h (where h stands for 'hidden') that would otherwise be matrix-multiplied with the membrane potential v_t; α is a 1 × unit_of_ALIF matrix.
  • if any value in the activation level y_t exceeds f_thres, the membrane potential at that position has the preset parameter β subtracted from it; β plays the role of the recovery (reset) potential in an SNN, i.e. the membrane potential returns to its initial position. Note that all of the above parameters are updated using the back-propagation mechanism.
  • the firing threshold f_thres is adaptively adjusted according to the distribution of activation levels at the current time step; the present invention does not limit how this is done.
  • the specific calculation logic of the ALIF timing model may be as follows, taking the forward propagation process of layer l at time step t as an example, where only f_thres is a scalar.
  • step 1: calculate the hidden state h_{t,l}
  • step 2: update the membrane potential v_{t,l}
  • step 3: obtain the activated output y_{t,l}
  • step 4: update f_thres through the adaptive learning method
  • step 5: regularize v_{t,l} according to y_{t,l} and f_{thres,l}
  • step 6: limit y_{t,l} by clipping to the varying bound
  • the embodiment of the present invention performs recognition on the training pictures and test pictures shown in FIG. 6 and FIG. 7 using neural network structures with different time series models.
  • if the input data is video data, it can be treated as a discrete sequence of pictures.
  • the training pictures are obtained by randomly rotating the original image by plus or minus 15 degrees
  • the test pictures additionally have Gaussian blur and salt noise applied
  • the test pictures are shown with noise covering 30% of the picture as an example, and the leftmost picture represents the original image.
  • Figure 8 shows a comparison diagram of the experimental results of neural networks for different implementation models.
  • for ALIF, CNN, MLP and ConvSNN, the recognition accuracy is tested while varying the proportion of salt noise in the whole image.
  • the experimental results shown in Figure 8 indicate that the neural network structure based on the ALIF time series model recognizes fuzzy pictures/videos with different noise ratios better than CNN, MLP and ConvSNN.
  • the embodiment of the present invention also provides a multimedia data recognition device.
  • the multimedia data recognition device may include:
  • the data input module 910 is configured to input multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data;
  • the data calculation module 920 is configured to perform recognition calculation on the multimedia data to be recognized through the multi-layer ALIF network layer in the neural network structure, and output the calculation result.
  • the data calculation module 920 is further configured to calculate, for any ALIF network layer, the output of the neurons by the following formula: y_t = σ(v_t + δ)
  • t represents the t-th time step
  • y_t represents the output of the neurons of the ALIF network layer at the t-th time step
  • σ represents an activation function incorporating the adaptive f_thres adjustment algorithm
  • δ represents a tensor set to simulate the random noise of the brain
  • v_t represents the membrane potential at the t-th time step.
  • the data calculation module 920 is further configured to calculate the membrane potential v_t at the t-th time step by the following formula: v_t = W_x x_t + α v_{t-1}
  • v_t represents the membrane potential at the t-th time step
  • W_x represents the two-dimensional weight matrix that transforms the input in the ALIF timing model
  • x_t represents the input of the ALIF network layer
  • v_{t-1} represents the membrane potential at the (t-1)-th time step
  • α represents a preset matrix
  • the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
  • an embodiment of the present invention also provides a storage device in which a computer program is stored; when the computer program runs in an electronic device, it is loaded by the processor of the electronic device to execute the multimedia data recognition method described in any of the above embodiments.
  • an embodiment of the present invention also provides an electronic device, including:
  • a processor for running a computer program; and
  • a storage device for storing a computer program which, when run in the electronic device, is loaded by the processor to execute the multimedia data identification method described in any of the above embodiments.
  • the present invention proposes a fuzzy image/video recognition method considering time sequence information and spatial information (based on the fusion of SNN and ANN methods).
  • the basic idea of the technical scheme proposed by the present invention is to combine the advantages of the SNN and ANN methods and design a new type of timing model, thereby effectively extracting the regions of interest in a video sequence and effectively improving image recognition ability when the noise sources are complex and the pictures/videos are heavily affected.
  • the model provided in this embodiment can also incorporate convolution, in a form similar to ConvLSTM2D, constructing the ConvALIF2D specific to this embodiment; compared with conventional time series model methods, it fuses SNN and ANN into a new algorithm.
  • with significantly fewer weights than the commonly used time series models RNN, LSTM and GRU, ALIF can show better recognition ability and anti-noise ability in fuzzy image recognition tasks; the model is more robust and can effectively identify whether there are target objects in the scene.
  • modules, units or components in the embodiments may be combined into one module, unit or component, and may also be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A multimedia data identification method and device. The method comprises: inputting multimedia data to be identified into a pre-constructed neural network structure, wherein the neural network structure comprises an adaptive Leaky Integrate-and-Fire (ALIF) timing model, the ALIF timing model comprises multiple ALIF network layers, and the multimedia data to be identified comprises image data and/or video data (S301); and performing recognition computation on the multimedia data to be identified through the multiple ALIF network layers in the neural network structure, and outputting a computation result (S302). A novel algorithm, ALIF, that integrates SNN with ANN is proposed; it can show better identification capability and anti-noise capability in fuzzy image identification tasks, the model is more robust, and it can effectively identify whether a target object is contained in the scene.

Description

Method and device for recognizing multimedia data
Technical field
The present invention relates to the technical field of deep learning, and in particular to a method and device for identifying multimedia data.
Background art
Deep learning refers to a collection of algorithms that use various machine learning algorithms on multilayer neural networks to solve problems involving images, text and other data. Broadly, deep learning can be classified as a kind of neural network, but there are many variations in specific implementations. The core of deep learning is feature learning, which aims to obtain hierarchical feature information through a hierarchical network, thereby replacing the manual feature design required in the past. Deep learning is a framework that includes many important algorithms, such as Convolutional Neural Networks (CNN), AutoEncoders, Sparse Coding, Restricted Boltzmann Machines (RBM), Deep Belief Networks (DBN), and Recurrent Neural Networks (RNN).
For different problems (images, speech, text), different network models need to be selected to achieve better results. Before the advent of deep learning, commonly used machine learning algorithms such as the Support Vector Machine (SVM) were also widely applied to various tasks. At present, what we call Artificial Intelligence (AI) mainly refers to deep learning algorithms represented by neural network models and machine learning algorithms represented by SVM.
Due to the shortcomings of Artificial Neural Network (ANN) methods represented by deep learning, such as poor interpretability and a low level of biological fidelity, attention has begun to turn to the field of brain-inspired computing, and the third generation of neural networks, represented by the Spiking Neural Network (SNN), has begun to receive widespread attention. Compared with ANNs, SNNs have very low power consumption, which makes it possible to approximately simulate the tens of billions of neurons of the human brain (because the essence of deep learning is to make the neural network deeper and larger, causing the number of parameters to explode). In addition, SNNs have stronger biological plausibility. From the Leaky Integrate-and-Fire (LIF) model proposed by the French physiologist Louis Lapicque in 1907 to the Hodgkin-Huxley (HH) model developed by Hodgkin and Huxley of Trinity College, Cambridge in the mid-20th century, these models start from the real biological brain and analyze the working mechanism of neurons and their responses to different levels of stimulation.
However, a pure SNN can only accept discrete signal inputs, whereas real-world tasks almost always involve continuous inputs. Research on signal conversion is not yet deep enough, and SNN-type algorithms are mainly used in the design and development of neuromorphic chips. It can be said that SNN has not yet shown, as ANN has, great power in practical tasks such as target recognition, object classification and image generation.
Image recognition is a classic problem in the field of computer vision. With the rapid development of AI technology represented by deep learning, the field of image recognition has attracted the attention of many researchers. However, in the field of fuzzy image recognition, because the data distribution of blur and noise is difficult to evaluate and model, existing ANN-based algorithms struggle to achieve recognition ability comparable to that of humans.
At present, the classic fuzzy image recognition process can be divided into two steps: 1) remove the noise and blur from the image; 2) perform image recognition on the denoised image. We know that noise in images usually arises from abrupt spatial scene changes or from the shooting technique or device (for example, capture equipment with too low a resolution). As a highly ill-posed problem, image/video denoising usually relies on a large amount of prior knowledge (such as knowledge distillation over the various possible kinds of noise). When the noise/blur source is relatively fixed, there are three commonly used prior-based denoising approaches:
1. Total variation blind deconvolution, proposed by Tony Chan et al. of UCLA in 1998;
2. A new method that takes a sparse image prior into account, proposed by Levin et al. in 2009 on the basis of Tony Chan's total variation blind deconvolution;
3. The heavy-tailed gradient prior proposed by Freeman et al. in 2008, which can effectively remove, from a single image, the blur caused by the photographer's hand shaking.
These algorithms all estimate the blur kernel through a coarse-to-fine maximum a posteriori (MAP) framework, but this type of algorithm is time-consuming and does not work well on low-resolution images.
Since 2012, with the renewed popularity of deep learning, many CNN-based image deblurring algorithms have been proposed. For example, Professor Sun Jian of Xi'an Jiaotong University proposed an end-to-end CNN architecture that estimates the direction of the blur kernel in each small region of the image. Researchers from Seoul University in South Korea proposed a unified framework incorporating temporal information (i.e. able to operate on video sequences), which can effectively deblur videos/images and perform super-resolution reconstruction, and estimates motion through optical-flow information (for effective deblurring).
For the case where the blur sources are highly heterogeneous, as shown in Figure 1 (an airport with an airplane), the original image itself does not contain enough information for a CNN-type model to recognize it; compared with a CNN, a sequence of blurred pictures with time-dimension information added is easier to recognize (based on the assumption that the regions of interest in the images can be fully collected over time by an RNN-like timing model).
Although the previous methods have achieved good results in image/video denoising, which benefits the recognition task, motion blur kernel estimation and sequence modeling are both highly sensitive to irregular and dense noise. In addition, as the noise and blur in the image/video increase, the recognition success rate of timing models also drops significantly.
Summary of the invention
In view of the above problems, the present invention provides a multimedia data recognition method and device that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for identifying multimedia data is provided, including:
inputting multimedia data to be identified into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data;
performing recognition computation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and outputting the computation result.
Optionally, for any ALIF network layer, the neuron output is calculated by the following formula:
y_t = σ(v_t + δ)
where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive f_thres adjustment algorithm; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step.
Optionally, the membrane potential v_t at the t-th time step is calculated by the following formula:
v_t = W_x x_t + α v_{t-1}
where v_t denotes the membrane potential at the t-th time step; W_x denotes the two-dimensional weight matrix that transforms the input in the ALIF timing model; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix;
if y_t ≥ f_thres, then v_t' = v_t - β, where β denotes a preset parameter.
Optionally, the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
Optionally, the image data and/or video data are blurred image data and/or video data.
According to another aspect of the present invention, a multimedia data recognition device is also provided, including:
a data input module configured to input multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data;
a data calculation module configured to perform recognition computation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and output the computation result.
Optionally, the data calculation module is further configured to calculate, for any ALIF network layer, the output of the neurons by the following formula:
y_t = σ(v_t + δ)
where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive f_thres adjustment algorithm; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step.
Optionally, the data calculation module is further configured to calculate the membrane potential v_t at the t-th time step by the following formula:
v_t = W_x x_t + α v_{t-1}
where v_t denotes the membrane potential at the t-th time step; W_x denotes the two-dimensional weight matrix that transforms the input in the ALIF timing model; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix;
if y_t ≥ f_thres, then v_t' = v_t - β, where β denotes a preset parameter.
Optionally, the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
According to yet another aspect of the present invention, a storage device is also provided, in which a computer program is stored; when the computer program runs in an electronic device, it is loaded by the processor of the electronic device to execute the multimedia data recognition method described in any one of the above.
According to yet another aspect of the present invention, an electronic device is also provided, including:
a processor for running a computer program; and
a storage device for storing a computer program which, when run in the electronic device, is loaded by the processor to execute the multimedia data identification method described in any one of the above.
The present invention proposes ALIF, a new algorithm that combines SNN and ANN. With significantly fewer weights than the commonly used time series models RNN, LSTM and GRU, it shows better recognition ability and noise resistance in fuzzy image recognition tasks; the model is more robust and can effectively identify whether the scene contains target objects.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the content of the specification, and in order to make the above and other objectives, features and advantages of the present invention more obvious and understandable, specific embodiments of the present invention are set forth below.
From the following detailed description of specific embodiments of the present invention in conjunction with the accompanying drawings, those skilled in the art will better understand the above and other objectives, advantages and features of the present invention.
Description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Figure 1 shows a schematic diagram of a blurred image;
Figure 2 shows a schematic diagram of a neural network structure according to an embodiment of the present invention;
Figure 3 shows a schematic flowchart of a method for identifying multimedia data according to an embodiment of the present invention;
Figure 4 shows a schematic diagram of a neuron;
Figure 5 shows a schematic diagram of the ALIF network layer calculation according to an embodiment of the present invention;
Figure 6 shows a schematic diagram of training pictures input to the neural network according to an embodiment of the present invention;
Figure 7 shows a schematic diagram of test pictures input to the neural network according to an embodiment of the present invention;
Figure 8 shows a comparison of experimental results of neural networks with different implementation models according to an embodiment of the present invention;
Figure 9 shows a schematic structural diagram of a multimedia data recognition device according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
The embodiment of the present invention proposes an Adaptive Leaky Integrate-and-Fire (ALIF) timing algorithm model that integrates SNN and ANN, and uses it to perform fuzzy image recognition. Figure 2 shows a schematic diagram of a neural network structure according to an embodiment of the present invention. As shown in Figure 2, because ALIF is a time series model, the input of the neural network shown in Figure 2 is a sequence of consecutive blurred images; the time step of the image sequence (GoI, Group of Images) fed to the network takes one of five values: 2, 5, 10, 15 or 20. In addition, for comparison, RNN, LSTM and other timing models were added. The FC in Figure 2 denotes a fully connected layer.
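As a concrete illustration of the input format, the following minimal NumPy sketch shapes a GoI for each of the five time-step settings before it would enter the ALIF layers of Figure 2; the 28x28 frame size, the batch size, and the flattening into a per-step vector are illustrative assumptions, not values given in the patent.

```python
import numpy as np

# Minimal sketch (assumption: grayscale 28x28 frames; batch of blurred sequences).
# It only shows how a GoI could be arranged before entering the ALIF layers of
# Figure 2; the ALIF layer computation itself is sketched further below.
batch_size, height, width = 4, 28, 28
for time_steps in (2, 5, 10, 15, 20):               # the five GoI lengths tested
    goi = np.random.rand(batch_size, time_steps, height, width)
    x = goi.reshape(batch_size, time_steps, height * width)   # per-step input_dim = 784
    print(time_steps, x.shape)                      # e.g. (4, 10, 784) for time_steps = 10
```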
The key element of the neural network structure provided by the embodiment of the present invention is ALIF. It is a layer in the network and is conceptually on the same level as popular deep learning time series models such as RNN, LSTM and GRU, but its implementation draws more heavily on the working mechanism of spiking neural networks and is therefore more biologically plausible.
Figure 3 shows a schematic flowchart of a method for identifying multimedia data according to an embodiment of the present invention. The multimedia data may include blurred image data or blurred video data. As shown in Figure 3, the multimedia data identification method provided by an embodiment of the present invention may include:
Step S301: input the multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data. The multimedia data to be identified is blurred image data or blurred video data, that is, multimedia data with low resolution and/or heavy noise.
Step S302: perform recognition computation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and output the computation result.
In the spiking neural network model, referring to Figure 4, the branched structures attached to the cell membrane are called dendrites and are the inputs, while the long 'tail' is called the axon and is the output. Neurons output electrical and chemical signals, the most important being an electrical pulse that propagates along the surface of the axon's cell membrane. Both dendrites and axons have a large number of branches. The end of an axon usually connects to the dendrites of other cells at a structure called a 'synapse'; the output of a neuron is transmitted through synapses to thousands of downstream neurons. A neuron has thousands of upstream neurons, accumulates their inputs, and produces an output.
Figure 5 shows a schematic diagram of the ALIF network layer calculation according to an embodiment of the present invention. As shown in Figure 5, the calculation logic of the ALIF network layer is similar to that of an RNN, but this embodiment distinguishes it by adding random noise, an adaptive firing module and other elements. In Figure 5, v_{t,1} denotes the membrane potential at the t-th time step when the current layer is layer 1, x_t denotes the input of the ALIF network layer, and y_t denotes the output of the neurons of the ALIF network layer at the t-th time step.
The calculation logic can be expressed as follows (taking the current time step t as an example). In the above step S302, the neuron output at each time step of any ALIF network layer can be calculated by the following formula:
y_t = σ(v_t + δ)
where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive f_thres adjustment algorithm; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step. The f_thres algorithm adaptively adjusts the firing threshold based on statistics of the values to be emitted. Suppose there are five values to be emitted: 0.5, 0.4, 0.7, 0.8 and 0.9. With the currently set f_thres = 0.8, only 0.8 and 0.9 can be successfully emitted (i.e. passed on); the three values 0.5, 0.4 and 0.7 are not emitted by the neuron.
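This firing rule can be reproduced in a few lines; the values and the threshold below are exactly those of the example above, and the sketch only illustrates the thresholded emission, not the adaptive update of f_thres.

```python
import numpy as np

y = np.array([0.5, 0.4, 0.7, 0.8, 0.9])    # candidate activation values from the example
f_thres = 0.8                               # currently set firing threshold
fired = np.where(y >= f_thres, y, 0.0)      # only values reaching the threshold are emitted
print(fired)                                # [0.  0.  0.  0.8 0.9] -> only 0.8 and 0.9 fire
```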
Further, the membrane potential v_t at the t-th time step is calculated by the following formula:
v_t = W_x x_t + α v_{t-1}
where W_x denotes the two-dimensional weight matrix that transforms the input in the ALIF time series model; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix.
W_x is the two-dimensional weight matrix that transforms the input in the time series model; its matrix shape is input_dim (the fixed data dimension of the input at each time step) × unit_of_ALIF (the number of units of the ALIF network layer, consistent with the corresponding concept in time series models such as RNN). α can represent a preset matrix used in place of the matrix W_h (where h stands for 'hidden') that would otherwise be matrix-multiplied with the membrane potential v_t; α is a 1 × unit_of_ALIF matrix. Compared with a normal unit_of_ALIF × unit_of_ALIF W_h, especially when the number of units is large, this saves a huge number of weights and can therefore effectively improve the calculation speed.
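To make the weight saving concrete, the short sketch below compares the parameter count of a conventional recurrent matrix W_h (unit_of_ALIF × unit_of_ALIF) with the 1 × unit_of_ALIF vector α used here; the sizes input_dim = 784 and 512 units are illustrative assumptions, not values stated in the patent.

```python
input_dim, units = 784, 512                        # illustrative sizes only

params_Wx     = input_dim * units                  # 401,408 -- needed in both designs
params_Wh_rnn = units * units                      # 262,144 -- standard recurrent matrix W_h
params_alpha  = 1 * units                          # 512     -- ALIF's 1 x units matrix alpha

print("recurrent parameters saved:", params_Wh_rnn - params_alpha)   # 261,632
```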
Optionally, if y_t ≥ f_thres, then v_t' = v_t - β.
If any value in the activation level y_t is greater than f_thres, the membrane potential at that position has the preset parameter β subtracted from it; β plays the role of the recovery (reset) potential in an SNN, i.e. the membrane potential returns to its initial position. It is worth noting that all of the above parameters are updated using the back-propagation mechanism, and that the firing threshold f_thres is adaptively adjusted according to the distribution of activation levels at the current time step; the present invention does not limit how this is done.
The specific calculation logic of the ALIF timing model provided by the embodiment of the present invention may be as follows, taking the forward propagation process of layer l at time step t as an example, where only f_thres is a scalar.
Input: y_{t,l-1}, v_{t-1,l}
Parameters: W_{l-1}, b_{l-1}, α, β, δ
Output: y_{t,l}, v_{t,l}
Step 1: calculate the hidden state h_{t,l}
  x_{t,l} = y_{t,l-1}
  h_{t,l} = W_{l-1} x_{t,l} + b_{l-1}
Step 2: update the membrane potential v_{t,l}
  v_{t,l} = h_{t,l} + α v_{t-1,l}
Step 3: obtain the activated output y_{t,l}
  y_{t,l} = σ(v_{t,l} + δ)
Step 4: update f_thres through the adaptive learning method
  f_{thres,l} = update(f_{thres,l}), with the following logic:
  1) set the hyperparameters p_1 = 85%, p_2 = 98%, lr_thres = 0.001
  2) if, for y_{t,l}, f_{thres,l} < p_1 or f_{thres,l} > p_2, then continue with:
       if f_{thres,l} < p_1, then f_{thres,l} = f_{thres,l} + lr_thres
       otherwise, f_{thres,l} = f_{thres,l} - lr_thres
     end of the check
  return f_{thres,l}
Step 5: regularize v_{t,l} according to y_{t,l} and f_{thres,l}
  v_{t,l} = v_{t,l} - β·step(y_{t,l} - f_{thres,l})
Step 6: limit y_{t,l} by clipping to the varying bound
  y_{t,l} = clip(y_{t,l}, 0, f_{thres,l})
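Putting steps 1 to 6 together, the following minimal NumPy sketch runs one ALIF forward step. It is an illustration under stated assumptions rather than the patent's implementation: the sigmoid activation, the array shapes, and the reading of the threshold update rule as tracking where f_thres sits in the current activation distribution are all choices made here for concreteness.

```python
import numpy as np

def step_fn(x):
    """Heaviside step: 1.0 where x >= 0, else 0.0 (used for the reset in step 5)."""
    return (x >= 0).astype(float)

def alif_forward(y_prev_layer, v_prev, W, b, alpha, beta, delta, f_thres,
                 p1=0.85, p2=0.98, lr_thres=0.001):
    """One forward step of an ALIF layer at time t, layer l (illustrative sketch).

    y_prev_layer : (batch, input_dim)  output of layer l-1 at time t
    v_prev       : (batch, units)      membrane potential at time t-1
    W, b         : (input_dim, units), (units,)  input weights and bias
    alpha        : (units,)            per-unit leak values (the 1 x units matrix alpha)
    beta, delta  : reset size and simulated brain-noise term
    f_thres      : scalar adaptive firing threshold
    """
    h = y_prev_layer @ W + b                      # step 1: hidden state
    v = h + alpha * v_prev                        # step 2: leaky membrane update
    y = 1.0 / (1.0 + np.exp(-(v + delta)))        # step 3: activation with noise (sigmoid assumed)
    frac_below = np.mean(y < f_thres)             # step 4: adaptive threshold (one plausible reading)
    if frac_below < p1:
        f_thres += lr_thres
    elif frac_below > p2:
        f_thres -= lr_thres
    v = v - beta * step_fn(y - f_thres)           # step 5: reset units that fired
    y = np.clip(y, 0.0, f_thres)                  # step 6: bound the emitted activation
    return y, v, f_thres

# Usage with random data (shapes are illustrative):
rng = np.random.default_rng(0)
x = rng.random((4, 784))                          # layer input at time t
v = np.zeros((4, 512))                            # membrane potential carried from t-1
W, b = rng.normal(0.0, 0.01, (784, 512)), np.zeros(512)
y, v, f_thres = alif_forward(x, v, W, b, alpha=np.full(512, 0.9), beta=1.0,
                             delta=rng.normal(0.0, 0.01, 512), f_thres=0.8)
```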
Based on the solution provided by the above embodiment, the embodiment of the present invention performs recognition on the training pictures and test pictures shown in Figure 6 and Figure 7 using neural network structures with different time series models. If the input data is video data, the video data can be treated as a discrete sequence of pictures. As shown in Figure 6 and Figure 7, the training pictures are obtained by randomly rotating the original image by plus or minus 15 degrees, while the test pictures additionally have Gaussian blur and salt noise applied. Taking a time step of 10 as an example, the test pictures are shown with noise covering 30% of the picture, and the leftmost picture represents the original image.
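A minimal sketch of how such inputs could be produced is shown below: a random rotation of up to plus or minus 15 degrees for a training frame, and Gaussian blur plus salt noise covering 30% of the pixels for a test frame. The use of scipy for rotation and blurring, the blur strength sigma, and the 28x28 stand-in frame are illustrative choices, not details given in the patent.

```python
import numpy as np
from scipy.ndimage import rotate, gaussian_filter

rng = np.random.default_rng(0)

def train_frame(img):
    """Random rotation by +/- 15 degrees, as used for the training pictures."""
    angle = rng.uniform(-15, 15)
    return rotate(img, angle, reshape=False, mode="nearest")

def test_frame(img, noise_ratio=0.30, sigma=1.0):
    """Gaussian blur plus salt noise covering `noise_ratio` of the pixels."""
    blurred = gaussian_filter(img, sigma=sigma)       # sigma is an illustrative choice
    mask = rng.random(img.shape) < noise_ratio        # 30% of pixels in the example
    noisy = blurred.copy()
    noisy[mask] = 1.0                                 # salt noise: set chosen pixels to white
    return noisy

img = rng.random((28, 28))                            # stand-in for one original frame
x_train, x_test = train_frame(img), test_frame(img)
```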
Figure 8 shows a comparison of the experimental results of neural networks with different implementation models. For ALIF, CNN, MLP and ConvSNN, the recognition accuracy was tested while varying the proportion of salt noise in the whole image. The experimental results shown in Figure 8 indicate that the neural network structure based on the ALIF time series model recognizes fuzzy pictures/videos with different noise ratios better than CNN, MLP and ConvSNN.
After many experiments, the recognition method provided by the embodiment of the present invention, which uses ALIF in place of common time series models such as RNN and LSTM, is very robust for fuzzy image recognition. The ALIF/ConvALIF2D structure is a complete, low-complexity network layer for modeling temporal and spatial information. In the field of object recognition under noisy conditions it brings an improvement of close to 10%, which counts as a milestone result in the field of computer vision.
Based on the same inventive concept, the embodiment of the present invention also provides a multimedia data recognition device. As shown in Figure 9, the multimedia data recognition device may include:
a data input module 910 configured to input the multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive Leaky Integrate-and-Fire (ALIF) timing model, and the ALIF timing model includes multiple ALIF network layers; the multimedia data to be identified includes image data and/or video data;
a data calculation module 920 configured to perform recognition computation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and output the computation result.
In an optional embodiment of the present invention, the data calculation module 920 is further configured to calculate, for any ALIF network layer, the output of the neurons by the following formula:
y_t = σ(v_t + δ)
where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive f_thres adjustment algorithm; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step.
In an optional embodiment of the present invention, the data calculation module 920 is further configured to calculate the membrane potential v_t at the t-th time step by the following formula:
v_t = W_x x_t + α v_{t-1}
where v_t denotes the membrane potential at the t-th time step; W_x denotes the two-dimensional weight matrix that transforms the input in the ALIF timing model; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix;
if y_t ≥ f_thres, then v_t' = v_t - β, where β denotes a preset parameter.
In an optional embodiment of the present invention, the shape of W_x is: (the data dimension of the input at each time step) × (the number of units of the ALIF network layer).
Based on the same inventive concept, an embodiment of the present invention further provides a storage device in which a computer program is stored; when the computer program runs in an electronic device, it is loaded by the processor of the electronic device to execute the multimedia data recognition method described in any one of the above embodiments.
Based on the same inventive concept, an embodiment of the present invention further provides an electronic device, including:
a processor, configured to run a computer program; and
a storage device, configured to store the computer program, which, when running in the electronic device, is loaded by the processor to execute the multimedia data recognition method described in any one of the above embodiments.
The present invention proposes a blurred image/video recognition method that takes both temporal and spatial information into account, based on the fusion of SNN and ANN approaches. The basic idea of the proposed technical solution is to combine the advantages of the SNN and ANN approaches into a new type of time-series model, thereby effectively extracting regions of interest in a video sequence and improving image recognition when the noise sources are complex and the pictures/videos are heavily affected. In addition, the model provided in this embodiment can also incorporate convolution, in a form similar to ConvLSTM2D, to construct the ConvALIF2D specific to this embodiment. Compared with conventional time-series approaches, ALIF, a new algorithm fusing SNN and ANN, has significantly fewer weights than the commonly used RNN, LSTM and GRU models, yet shows better recognition ability and noise resistance in blurred-image recognition tasks; the model is more robust and can effectively identify whether the scene contains the target object.
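A ConvALIF2D time step can be sketched by replacing the dense product W_x·x_t with a two-dimensional convolution, in the spirit of ConvLSTM2D. The sketch below uses a single channel and a single kernel via SciPy's correlate2d purely for brevity; the helper name, shapes and parameter values are assumptions rather than the embodiment's actual layer.

```python
import numpy as np
from scipy.signal import correlate2d

def conv_alif2d_step(x_t, v_prev, kernel, alpha, beta, delta, f_thres):
    """One ConvALIF2D time step on a single-channel H×W frame:
    v_t = conv(x_t) + α·v_{t-1}, y_t = σ(v_t + δ), with v_t ← v_t - β where y_t ≥ f_thres."""
    v_t = correlate2d(x_t, kernel, mode="same") + alpha * v_prev   # convolution replaces W_x·x_t
    y_t = 1.0 / (1.0 + np.exp(-(v_t + delta)))                     # σ with noise offset δ
    v_t = np.where(y_t >= f_thres, v_t - beta, v_t)                # reset where the threshold is reached
    return y_t, v_t

# Illustrative 32×32 frame and 3×3 kernel (assumed sizes).
rng = np.random.default_rng(1)
frame = rng.standard_normal((32, 32))
y, v = conv_alif2d_step(frame, np.zeros((32, 32)), 0.1 * rng.standard_normal((3, 3)),
                        alpha=0.9, beta=0.5, delta=0.01, f_thres=0.8)
```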
Numerous specific details are set forth in the description provided herein. It should be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention the various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the inventive aspects lie in less than all features of a single foregoing disclosed embodiment. The claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will understand that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present invention and form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and the like does not indicate any order; these words may be interpreted as names.
Those skilled in the art will by now appreciate that, although a number of exemplary embodiments of the present invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the present invention can still be directly determined or derived from the disclosure of the present invention without departing from its spirit and scope. Accordingly, the scope of the present invention should be understood and deemed to cover all such other variations or modifications.

Claims (10)

  1. A multimedia data recognition method, comprising:
    inputting multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive leaky-integrate-and-fire (ALIF) time-series model, the ALIF time-series model includes multiple ALIF network layers, and the multimedia data to be recognized includes image data and/or video data; and
    performing recognition calculation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and outputting a calculation result.
  2. The method according to claim 1, wherein, for any ALIF network layer, the neuron output is calculated by the following formula:
    y_t = σ(v_t + δ)
    where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive adjustment of the firing threshold f_thres; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step.
  3. The method according to claim 2, wherein the membrane potential v_t at the t-th time step is calculated by the following formula:
    v_t = W_x·x_t + α·v_{t-1}
    where v_t denotes the membrane potential at the t-th time step; W_x denotes a two-dimensional weight matrix in the ALIF time-series model that transforms the input; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix;
    and wherein, if y_t ≥ f_thres, then v_t' = v_t - β, where β denotes a preset parameter.
  4. The method according to claim 3, wherein the shape of W_x is: the data dimension input at each time step × the number of units of the ALIF network layer.
  5. The method according to any one of claims 1 to 4, wherein the image data and/or video data are blurred image data and/or blurred video data.
  6. A multimedia data recognition device, comprising:
    a data input module, configured to input multimedia data to be recognized into a pre-built neural network structure, wherein the neural network structure includes an adaptive leaky-integrate-and-fire (ALIF) time-series model, the ALIF time-series model includes multiple ALIF network layers, and the multimedia data to be recognized includes image data and/or video data; and
    a data calculation module, configured to perform recognition calculation on the multimedia data to be recognized through the multiple ALIF network layers in the neural network structure, and to output a calculation result.
  7. The device according to claim 6, wherein the data calculation module is further configured to calculate, for any ALIF network layer, the neuron output by the following formula:
    y_t = σ(v_t + δ)
    where t denotes the t-th time step; y_t denotes the output of the neurons of the ALIF network layer at the t-th time step; σ denotes an activation function incorporating the adaptive adjustment of the firing threshold f_thres; δ denotes a tensor set to simulate the random noise of the brain; and v_t denotes the membrane potential at the t-th time step.
  8. The device according to claim 7, wherein the data calculation module is further configured to calculate the membrane potential v_t at the t-th time step by the following formula:
    v_t = W_x·x_t + α·v_{t-1}
    where v_t denotes the membrane potential at the t-th time step; W_x denotes a two-dimensional weight matrix in the ALIF time-series model that transforms the input; x_t denotes the input of the ALIF network layer; v_{t-1} denotes the membrane potential at the (t-1)-th time step; and α denotes a preset matrix;
    and wherein, if y_t ≥ f_thres, then v_t' = v_t - β, where β denotes a preset parameter.
  9. A storage device in which a computer program is stored, wherein, when the computer program runs in an electronic device, it is loaded by a processor of the electronic device to execute the multimedia data recognition method according to any one of claims 1-4.
  10. An electronic device, comprising:
    a processor, configured to run a computer program; and
    a storage device, configured to store the computer program, which, when running in the electronic device, is loaded by the processor to execute the multimedia data recognition method according to any one of claims 1-5.
PCT/CN2020/082961 2019-04-28 2020-04-02 Multimedia data identification method and device WO2020220926A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910348456.XA CN111860053B (en) 2019-04-28 2019-04-28 Multimedia data identification method and device
CN201910348456.X 2019-04-28

Publications (1)

Publication Number Publication Date
WO2020220926A1 true WO2020220926A1 (en) 2020-11-05

Family

ID=72966205

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/082961 WO2020220926A1 (en) 2019-04-28 2020-04-02 Multimedia data identification method and device

Country Status (2)

Country Link
CN (1) CN111860053B (en)
WO (1) WO2020220926A1 (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110141258A1 (en) * 2007-02-16 2011-06-16 Industrial Technology Research Institute Emotion recognition method and system thereof
EP3023911A1 (en) * 2014-11-24 2016-05-25 Samsung Electronics Co., Ltd. Method and apparatus for recognizing object, and method and apparatus for training recognizer
CN106250829A (en) * 2016-07-22 2016-12-21 中国科学院自动化研究所 Digit recognition method based on lip texture structure
CN108021927A (en) * 2017-11-07 2018-05-11 天津大学 A kind of method for extracting video fingerprints based on slow change visual signature
CN108960059A (en) * 2018-06-01 2018-12-07 众安信息技术服务有限公司 A kind of video actions recognition methods and device
CN109635791A (en) * 2019-01-28 2019-04-16 深圳大学 A kind of video evidence collecting method based on deep learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610707A (en) * 2021-07-23 2021-11-05 广东工业大学 Video super-resolution method based on time attention and cyclic feedback network
CN113610707B (en) * 2021-07-23 2024-02-09 广东工业大学 Video super-resolution method based on time attention and cyclic feedback network

Also Published As

Publication number Publication date
CN111860053B (en) 2023-11-24
CN111860053A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN107506712B (en) Human behavior identification method based on 3D deep convolutional network
KR102224253B1 (en) Teacher-student framework for light weighted ensemble classifier combined with deep network and random forest and the classification method based on thereof
Liu et al. Predicting eye fixations using convolutional neural networks
Salama et al. Sheep identification using a hybrid deep learning and bayesian optimization approach
US11551076B2 (en) Event-driven temporal convolution for asynchronous pulse-modulated sampled signals
US11443514B2 (en) Recognizing minutes-long activities in videos
CN111507182B (en) Skeleton point fusion cyclic cavity convolution-based littering behavior detection method
CN112699956A (en) Neural morphology visual target classification method based on improved impulse neural network
CN112528830A (en) Lightweight CNN mask face pose classification method combined with transfer learning
Wang et al. Fire detection in infrared video surveillance based on convolutional neural network and SVM
Gao et al. An end-to-end broad learning system for event-based object classification
US20220132050A1 (en) Video processing using a spectral decomposition layer
CN112288080A (en) Pulse neural network-oriented adaptive model conversion method and system
Yang et al. RGBT tracking via cross-modality message passing
KR20210018600A (en) System for recognizing facial expression
CN115471831B (en) Image saliency detection method based on text reinforcement learning
WO2020220926A1 (en) Multimedia data identification method and device
Shi et al. Knowledge-guided semantic computing network
US20230076290A1 (en) Rounding mechanisms for post-training quantization
Wang et al. A fast interpretable adaptive meta-learning enhanced deep learning framework for diagnosis of diabetic retinopathy
Guan et al. Deep learning approaches for image classification techniques
KR102178469B1 (en) Method and system for estimation of pedestrian pose orientation using soft target training based on teacher-student framework
Zuo et al. NALA: A Nesterov Accelerated Look-Ahead optimizer for deep neural networks
Chen et al. Deep global-connected net with the generalized multi-piecewise ReLU activation in deep learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20799239

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20799239

Country of ref document: EP

Kind code of ref document: A1