WO2020220926A1 - Method and device for identifying multimedia data
- Publication number
- WO2020220926A1 (application PCT/CN2020/082961)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- alif
- time step
- multimedia data
- data
- network layer
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Definitions
- The present invention relates to the technical field of deep learning, and in particular to a method and device for identifying multimedia data.
- Deep learning refers to a family of machine learning algorithms that use multilayer neural networks to solve problems involving data such as images and text. Broadly, deep learning falls under neural networks, although specific implementations vary widely.
- The core of deep learning is feature learning, which aims to obtain hierarchical feature information through a hierarchical network and thereby removes the earlier need to design features by hand.
- Deep learning is a framework that includes many important algorithms, such as Convolutional Neural Networks (CNN), AutoEncoders, Sparse Coding, Restricted Boltzmann Machines (RBM), Deep Belief Networks (DBN) and Recurrent Neural Networks (RNN).
- CNN: Convolutional Neural Network
- RBM: Restricted Boltzmann Machine
- RNN: Recurrent Neural Network
- SNN: Spiking Neural Network
- HH: Hodgkin-Huxley
- Image recognition is a classic problem in computer vision, and with the rapid development of AI technology represented by deep learning it has attracted the attention of many researchers. In blurred image recognition, however, the distribution of blur and noise is hard to evaluate, simulate and model, so existing algorithms based on artificial neural networks (ANN) struggle to reach a recognition ability similar to that of humans.
- The classic blurred image recognition process can be divided into two steps: 1) remove the noise and blur from the image; 2) perform image recognition on the denoised image.
- Noise in images is usually caused by violent changes of the spatial scene, or by the shooting technique or device (such as shooting equipment with too low a resolution).
- Denoising images/videos usually requires a large amount of prior knowledge (such as distilled knowledge of the various possible kinds of noise).
- For example, the heavy-tailed gradient prior proposed by Freeman et al. in 2008 can effectively remove, from a single image, the blur caused by the photographer's hand shaking.
- However, motion blur kernel estimation and sequence modeling are highly sensitive to irregular and dense noise.
- Under such noise, the recognition success rate of a time series model is also significantly reduced.
- The present invention provides a method and device for identifying multimedia data that overcome the above problems or at least partially solve them.
- A method for identifying multimedia data, including:
- inputting multimedia data to be identified into a pre-built neural network structure, wherein the neural network structure includes an adaptive leaky integrate-and-fire (ALIF) time series model, and the ALIF time series model includes multiple ALIF network layers;
- the multimedia data to be identified includes image data and/or video data;
- performing recognition calculation on the multimedia data to be identified through the multiple ALIF network layers in the neural network structure, and outputting the calculation result.
- The neuron output is calculated by a formula over the following terms (the formula itself is not reproduced in this text):
- t represents the t-th time step;
- y_t represents the output of the neurons in the ALIF network layer at the t-th time step;
- the activation function includes the algorithm that adaptively adjusts the firing threshold f_thres;
- the noise term is a tensor set to simulate the random noise of the brain;
- v_t represents the membrane potential at the t-th time step.
- The membrane potential v_t at the t-th time step is calculated by a formula over the following terms (likewise not reproduced in this text):
- v_t represents the membrane potential at the t-th time step;
- W_x represents the two-dimensional weight matrix that transforms the input in the ALIF time series model;
- x_t represents the input of the ALIF network layer;
- v_{t-1} represents the membrane potential at the (t-1)-th time step;
- the remaining term is a preset matrix;
- the shape of W_x is: (data dimension of the input at each time step) × (number of units of the ALIF network layer).
- The image data and/or video data are blurred image data and/or blurred video data.
- A multimedia data recognition device, including:
- a data input module configured to input multimedia data to be identified into a pre-built neural network structure, wherein the neural network structure includes an adaptive leaky integrate-and-fire (ALIF) time series model, the ALIF time series model includes multiple ALIF network layers, and the multimedia data to be identified includes image data and/or video data;
- a data calculation module configured to perform recognition calculation on the multimedia data to be identified through the multiple ALIF network layers in the neural network structure, and to output the calculation result.
- The data calculation module is further configured to calculate, for any ALIF network layer, the output of the neurons by a formula over the following terms (the formula itself is not reproduced in this text):
- t represents the t-th time step;
- y_t represents the output of the neurons in the ALIF network layer at the t-th time step;
- the activation function includes the algorithm that adaptively adjusts the firing threshold f_thres;
- the noise term is a tensor set to simulate the random noise of the brain;
- v_t represents the membrane potential at the t-th time step.
- The data calculation module is further configured to calculate the membrane potential v_t at the t-th time step by a formula over the following terms (likewise not reproduced in this text):
- v_t represents the membrane potential at the t-th time step;
- W_x represents the two-dimensional weight matrix that transforms the input in the ALIF time series model;
- x_t represents the input of the ALIF network layer;
- v_{t-1} represents the membrane potential at the (t-1)-th time step;
- the remaining term is a preset matrix;
- the shape of W_x is: (data dimension of the input at each time step) × (number of units of the ALIF network layer).
- A storage device in which a computer program is stored; when the computer program runs in an electronic device, it is loaded by the processor of the electronic device and executes the multimedia data recognition method described in any one of the above.
- An electronic device, including:
- a processor for running a computer program; and
- a storage device for storing a computer program which, when running in the electronic device, is loaded by the processor and executes the multimedia data recognition method described in any one of the above.
- The present invention proposes a new algorithm, ALIF, that combines SNN and ANN. With significantly fewer weights than the commonly used time series models RNN, LSTM and GRU, it shows better recognition ability and noise resistance in blurred image recognition tasks; the model is more robust and can effectively identify whether a scene contains target objects.
- Figure 1 shows a schematic diagram of a blurred image;
- Figure 2 shows a schematic diagram of a neural network structure according to an embodiment of the present invention;
- Figure 3 shows a schematic flowchart of a method for identifying multimedia data according to an embodiment of the present invention;
- Figure 4 shows a schematic diagram of a neuron;
- Figure 5 shows a schematic diagram of ALIF network layer calculation according to an embodiment of the present invention;
- Figure 6 shows a schematic diagram of training pictures input to the neural network according to an embodiment of the present invention;
- Figure 7 shows a schematic diagram of test pictures input to the neural network according to an embodiment of the present invention;
- Figure 8 shows a comparison diagram of the experimental results of neural networks built on different models according to an embodiment of the present invention;
- Figure 9 shows a schematic structural diagram of a multimedia data recognition device according to an embodiment of the present invention.
- The embodiment of the present invention proposes an adaptive leaky integrate-and-fire (ALIF) time series model that integrates SNN and ANN, and uses it to perform blurred image recognition.
- Figure 2 shows a schematic diagram of a neural network structure according to an embodiment of the present invention. Because ALIF is a time series model, the input of the neural network shown in Figure 2 is a sequence of consecutive blurred images.
- The number of time steps of the input image sequence (GoI, Group of Images) takes five values: 2, 5, 10, 15 and 20.
- RNN, LSTM and other time series models were added for comparison.
- FC in Figure 2 denotes a fully connected layer.
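- A minimal sketch of how a stream of frames could be grouped into fixed-length image sequences (GoI) before being fed to a time series model. The function name `group_frames`, the non-overlapping windows, the dropped remainder and the 28 × 28 frame size are assumptions made for illustration; the text only states the time-step values 2, 5, 10, 15 and 20.

```python
import numpy as np


def group_frames(frames: np.ndarray, time_steps: int) -> np.ndarray:
    """Split (num_frames, H, W) into (num_groups, time_steps, H, W).

    Frames that do not fill a complete group are dropped (an assumption).
    """
    num_groups = frames.shape[0] // time_steps
    usable = frames[: num_groups * time_steps]
    return usable.reshape(num_groups, time_steps, *frames.shape[1:])


# 43 dummy frames of size 28 x 28 (hypothetical values).
frames = np.zeros((43, 28, 28), dtype=np.float32)
for t in (2, 5, 10, 15, 20):
    print(t, group_frames(frames, t).shape)
```

Each resulting group has the layout (time step, height, width), which is the usual input layout for recurrent layers once the spatial dimensions are flattened or convolved.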
- The key element of the neural network structure provided by the embodiment of the present invention is ALIF, a layer in the network that sits at the same conceptual level as popular deep learning time series models such as RNN, LSTM and GRU. In its implementation, however, it draws more on the working mechanism of spiking neural networks and is therefore more plausible at the biological level.
- Figure 3 shows a schematic flowchart of a method for identifying multimedia data according to an embodiment of the present invention.
- The multimedia data may include blurred image data or blurred video data.
- The method for identifying multimedia data provided by an embodiment of the present invention may include:
- Step S301: input multimedia data to be identified into a pre-built neural network structure, wherein the neural network structure includes an adaptive leaky integrate-and-fire (ALIF) time series model, and the ALIF time series model includes multiple ALIF network layers;
- the multimedia data to be identified includes image data and/or video data; it is blurred image data or blurred video data, that is, multimedia data with low resolution and/or heavy noise.
- Step S302: perform recognition calculation on the multimedia data to be identified through the multiple ALIF network layers in the neural network structure, and output the calculation result.
- Figure 5 shows a schematic diagram of ALIF network layer calculation according to an embodiment of the present invention.
- The calculation logic of the ALIF network layer is similar to that of an RNN, but this embodiment distinguishes it by adding random noise, an adaptive firing module and other elements.
- In Figure 5, v_{t,l} denotes the membrane potential of layer l at the t-th time step,
- x_t denotes the input of the ALIF network layer, and
- y_t denotes the output of the neurons of the ALIF network layer at the t-th time step.
- The calculation logic can be expressed as follows (taking the current time step as an example).
- The neuron output at each time step of any ALIF network layer can be calculated by a formula over the following terms (the formula itself is not reproduced in this text):
- t represents the t-th time step;
- y_t represents the output of the neurons in the ALIF network layer at the t-th time step;
- the activation function includes the algorithm that adaptively adjusts the firing threshold f_thres;
- the noise term is a tensor set to simulate the random noise of the brain;
- v_t represents the membrane potential at the t-th time step.
- The membrane potential v_t at the t-th time step is calculated by a formula over the following terms (likewise not reproduced in this text):
- W_x represents the two-dimensional weight matrix that transforms the input in the ALIF time series model;
- x_t represents the input of the ALIF network layer;
- v_{t-1} represents the membrane potential at the (t-1)-th time step;
- the remaining term is a preset matrix.
- W_x is a two-dimensional weight matrix that transforms the input in the time series model. Its shape is input_dim × units, where input_dim is the (fixed) data dimension of the input at each time step and units is the number of units of the ALIF network layer, consistent with the same concept in time series models such as RNN.
- The preset matrix replaces W_h (where h stands for hidden), the matrix that would otherwise be matrix-multiplied with the membrane potential; it is a matrix of shape 1 × units.
- When the neuron at a given position fires, a preset parameter is subtracted from the membrane potential at that position, which is equivalent to the recovery potential in SNN, that is, the membrane potential returns to its original level. Note that all the above parameters are updated using the back-propagation mechanism.
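- A minimal sketch of the shapes described above, assuming one possible reading of the membrane potential update: the projected input is added to the previous membrane potential scaled elementwise by the 1 × units preset matrix. The names `alpha`, `input_dim`, `units`, the constant value 0.5 and the additive form are illustrative assumptions, since the source formula is not available in this text.

```python
import numpy as np

input_dim, units, time_steps = 6, 3, 4
rng = np.random.default_rng(0)

# W_x has shape input_dim x units, as stated in the text.
W_x = rng.normal(size=(input_dim, units))
# Hypothetical preset matrix of shape 1 x units standing in for W_h.
alpha = np.full((1, units), 0.5)

x = rng.normal(size=(time_steps, input_dim))
v = np.zeros((1, units))
for t in range(time_steps):
    # Elementwise scaling by alpha instead of a (units x units) matrix product.
    v = x[t:t + 1] @ W_x + alpha * v

print(W_x.shape, alpha.shape, v.shape)
```

Under this reading the recurrent part costs one multiplication per unit rather than a full matrix product, which matches the statement that the preset matrix has only 1 × units entries.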
- The firing threshold f_thres is adaptively adjusted according to the distribution of activation levels at the current time step; the specific adjustment is not limited in the present invention.
- The specific calculation logic of the ALIF time series model may be as follows, taking the forward propagation of layer l at time step t as an example; among the quantities involved, only f_thres is a scalar.
- Step 1: calculate the hidden state h_{t,l};
- Step 2: update the membrane potential v_{t,l};
- Step 3: obtain the activated output y_{t,l};
- Step 4: update f_thres through an adaptive learning method;
- Step 5: regularize v_{t,l} according to y_{t,l} and f_{thres,l};
- Step 6: limit the range of y_{t,l}.
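- The six steps above can be sketched as a single forward step of one layer. This is an illustrative reading, not the formulas of the invention: the ReLU activation, the Gaussian noise, the moving-average threshold update, the subtractive reset and the clipping range [0, 1] are all assumptions, as are the names `alif_step`, `alpha`, `reset`, `noise_std` and `lr`.

```python
import numpy as np


def alif_step(x_t, v_prev, W_x, alpha, f_thres, reset, rng,
              noise_std=0.01, lr=0.1):
    """One hypothetical forward step of an ALIF-style layer."""
    # Step 1: hidden state from the input projection, shape (batch, units).
    h_t = x_t @ W_x
    # Step 2: update the membrane potential; alpha (1 x units) scales v_prev.
    v_t = h_t + alpha * v_prev
    # Step 3: noisy activation; only units above the threshold emit output.
    eps = rng.normal(0.0, noise_std, size=v_t.shape)
    a_t = np.maximum(v_t + eps, 0.0)
    fired = a_t > f_thres
    y_t = np.where(fired, a_t, 0.0)
    # Step 4: move the scalar threshold toward the mean activation level.
    f_thres = (1.0 - lr) * f_thres + lr * float(a_t.mean())
    # Step 5: subtract the reset value from the potential of units that fired.
    v_t = np.where(fired, v_t - reset, v_t)
    # Step 6: limit the output range.
    y_t = np.clip(y_t, 0.0, 1.0)
    return y_t, v_t, f_thres


rng = np.random.default_rng(0)
batch, input_dim, units = 4, 16, 8
W_x = rng.normal(0.0, 0.1, size=(input_dim, units))
alpha = np.full((1, units), 0.9)
v = np.zeros((batch, units))
f_thres = 0.5
for t in range(5):
    x_t = rng.random((batch, input_dim))
    y, v, f_thres = alif_step(x_t, v, W_x, alpha, f_thres, reset=0.5, rng=rng)
print(y.shape, v.shape, f_thres)
```

The sketch covers forward propagation only; the text states that the parameters are updated by back-propagation, which would require a differentiable framework and is omitted here.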
- The embodiment of the present invention uses neural network structures built on different time series models to recognize the training pictures and test pictures shown in Figure 6 and Figure 7, which are input to the neural network.
- When the input data is video data, the video data can be understood as a discrete picture sequence.
- The original image is randomly rotated by plus or minus 15 degrees.
- The test image is produced with Gaussian blur and salt noise.
- The test pictures take 30% of all pictures as an example; the leftmost picture is the original picture.
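- A minimal sketch of how salt noise could be added to a test picture. The function name `add_salt_noise`, the pixel value range [0, 1] and the ratio of 0.3 used in the example are assumptions for illustration; Gaussian blur and the random rotation are omitted.

```python
import numpy as np


def add_salt_noise(image: np.ndarray, ratio: float, rng) -> np.ndarray:
    """Return a copy of `image` with a fraction `ratio` of pixels set to 1.0."""
    noisy = image.copy()
    num_salt = int(round(ratio * image.size))
    # Pick distinct flat pixel indices and set them to white.
    idx = rng.choice(image.size, size=num_salt, replace=False)
    noisy.flat[idx] = 1.0
    return noisy


rng = np.random.default_rng(0)
image = np.zeros((28, 28), dtype=np.float32)
noisy = add_salt_noise(image, ratio=0.3, rng=rng)
print(int(noisy.sum()), "of", image.size, "pixels set to white")
```

Varying `ratio` gives test sets with different proportions of salt noise in the whole image.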
- Figure 8 shows a comparison diagram of the experimental results of neural networks built on different models.
- For ALIF, CNN, MLP and ConvSNN, the recognition accuracy is tested while varying the proportion of salt noise in the whole image.
- The experimental results in Figure 8 show that the neural network structure based on the ALIF time series model recognizes blurred pictures/videos with different noise ratios better than CNN, MLP and ConvSNN.
- The embodiment of the present invention also provides a multimedia data recognition device.
- The multimedia data recognition device may include:
- a data input module 910 configured to input multimedia data to be identified into a pre-built neural network structure, wherein the neural network structure includes an adaptive leaky integrate-and-fire (ALIF) time series model, the ALIF time series model includes multiple ALIF network layers, and the multimedia data to be identified includes image data and/or video data;
- a data calculation module 920 configured to perform recognition calculation on the multimedia data to be identified through the multiple ALIF network layers in the neural network structure, and to output the calculation result.
- The data calculation module 920 is further configured to calculate, for any ALIF network layer, the output of the neurons by a formula over the following terms (the formula itself is not reproduced in this text):
- t represents the t-th time step;
- y_t represents the output of the neurons in the ALIF network layer at the t-th time step;
- the activation function includes the algorithm that adaptively adjusts the firing threshold f_thres;
- the noise term is a tensor set to simulate the random noise of the brain;
- v_t represents the membrane potential at the t-th time step.
- The data calculation module 920 is further configured to calculate the membrane potential v_t at the t-th time step by a formula over the following terms (likewise not reproduced in this text):
- v_t represents the membrane potential at the t-th time step;
- W_x represents the two-dimensional weight matrix that transforms the input in the ALIF time series model;
- x_t represents the input of the ALIF network layer;
- v_{t-1} represents the membrane potential at the (t-1)-th time step;
- the remaining term is a preset matrix;
- the shape of W_x is: (data dimension of the input at each time step) × (number of units of the ALIF network layer).
- An embodiment of the present invention also provides a storage device in which a computer program is stored; when the computer program runs in an electronic device, it is loaded by the processor of the electronic device and executes the multimedia data recognition method described in any of the above embodiments.
- An embodiment of the present invention also provides an electronic device, including:
- a processor for running a computer program; and
- a storage device for storing a computer program which, when running in the electronic device, is loaded by the processor and executes the multimedia data recognition method described in any of the above embodiments.
- The present invention proposes a blurred image/video recognition method that considers both temporal and spatial information, based on a fusion of SNN and ANN methods.
- The basic idea of the technical scheme is to combine the advantages of SNN and ANN methods in a new type of time series model, thereby effectively extracting the region of interest in a video sequence and effectively improving image recognition when the noise sources are complex and strongly affect the pictures/videos.
- The model provided in this embodiment can also incorporate convolution, in a form similar to ConvLSTM2D, to construct the ConvALIF2D specific to this embodiment.
- Compared with existing time series model methods, the new algorithm ALIF combines SNN and ANN. With significantly fewer weights than the commonly used time series models RNN, LSTM and GRU, it shows better recognition ability and anti-noise ability in blurred image recognition tasks; the model is more robust and can effectively identify whether there are target objects in the scene.
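- A rough parameter count illustrating why replacing the units × units recurrent matrix by a 1 × units matrix reduces the number of weights. The RNN, LSTM and GRU counts use the standard textbook formulations with one bias vector per gate; the ALIF count (W_x, a 1 × units preset matrix and one scalar threshold, with no bias) is an assumption based on the description, and `input_dim = 784`, `units = 128` are hypothetical sizes.

```python
def rnn_params(input_dim: int, units: int) -> int:
    # Input weights + recurrent weights + bias.
    return units * (input_dim + units) + units


def lstm_params(input_dim: int, units: int) -> int:
    # Four gates, each shaped like a simple RNN cell.
    return 4 * rnn_params(input_dim, units)


def gru_params(input_dim: int, units: int) -> int:
    # Three gates, each shaped like a simple RNN cell.
    return 3 * rnn_params(input_dim, units)


def alif_params(input_dim: int, units: int) -> int:
    # Assumed: W_x + 1 x units preset matrix + scalar threshold.
    return input_dim * units + units + 1


input_dim, units = 784, 128
for name, fn in [("RNN", rnn_params), ("LSTM", lstm_params),
                 ("GRU", gru_params), ("ALIF (assumed)", alif_params)]:
    print(f"{name}: {fn(input_dim, units)}")
```

The gap widens as the number of units grows, because only the conventional models carry a term quadratic in `units`.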
- Modules, units or components in the embodiments can be combined into one module, unit or component, and can also be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
A method and device for identifying multimedia data, the method comprising: inputting multimedia data to be identified into a pre-built neural network structure, the neural network structure comprising an adaptive leaky integrate-and-fire (ALIF) time series model, the ALIF time series model comprising multiple ALIF network layers, and the multimedia data to be identified comprising image data and/or video data (S301); and performing recognition calculation on the multimedia data to be identified through the multiple ALIF network layers in the neural network structure, then outputting a calculation result (S302). A new algorithm, ALIF, that integrates an SNN with an ANN is proposed; it can show better recognition ability and better anti-noise ability in blurred image recognition tasks, the model is more robust, and it can effectively identify whether a target object is contained in the scene.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910348456.XA CN111860053B (zh) | 2019-04-28 | 2019-04-28 | Multimedia data identification method and device |
CN201910348456.X | 2019-04-28 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020220926A1 (fr) | 2020-11-05 |
Family
ID=72966205
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/082961 WO2020220926A1 (fr) | Method and device for identifying multimedia data | 2019-04-28 | 2020-04-02 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111860053B (fr) |
WO (1) | WO2020220926A1 (fr) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110141258A1 (en) * | 2007-02-16 | 2011-06-16 | Industrial Technology Research Institute | Emotion recognition method and system thereof |
EP3023911A1 (fr) * | 2014-11-24 | 2016-05-25 | Samsung Electronics Co., Ltd. | Method and apparatus for recognizing an object, and method and apparatus for training a recognizer |
CN106250829A (zh) * | 2016-07-22 | 2016-12-21 | 中国科学院自动化研究所 | Digit recognition method based on lip texture structure |
CN108021927A (zh) * | 2017-11-07 | 2018-05-11 | 天津大学 | Video fingerprint extraction method based on slowly varying visual features |
CN108960059A (zh) * | 2018-06-01 | 2018-12-07 | 众安信息技术服务有限公司 | Video action recognition method and device |
CN109635791A (zh) * | 2019-01-28 | 2019-04-16 | 深圳大学 | Video forensics method based on deep learning |
- 2019-04-28: CN application CN201910348456.XA, patent CN111860053B (zh), status: Active
- 2020-04-02: WO application PCT/CN2020/082961, patent WO2020220926A1 (fr), status: Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
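CN113610707A (zh) * | 2021-07-23 | 2021-11-05 | 广东工业大学 | Video super-resolution method based on temporal attention and recurrent feedback network |
CN113610707B (zh) * | 2021-07-23 | 2024-02-09 | 广东工业大学 | Video super-resolution method based on temporal attention and recurrent feedback network |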
Also Published As
Publication number | Publication date |
---|---|
CN111860053B (zh) | 2023-11-24 |
CN111860053A (zh) | 2020-10-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108133188B (zh) | | Behavior recognition method based on motion history images and convolutional neural network |
CN107506712B (zh) | | Human behavior recognition method based on 3D deep convolutional network |
KR102224253B1 (ko) | | Teacher-student framework for lightweighting an ensemble classifier combining a deep network and random forest, and classification method based thereon |
Liu et al. | | Predicting eye fixations using convolutional neural networks |
Salama et al. | | Sheep identification using a hybrid deep learning and bayesian optimization approach |
US11551076B2 (en) | | Event-driven temporal convolution for asynchronous pulse-modulated sampled signals |
US11443514B2 (en) | | Recognizing minutes-long activities in videos |
CN111507182B (zh) | | Littering behavior detection method based on skeleton-point fusion and cyclic dilated convolution |
CN112699956A (zh) | | Neuromorphic vision target classification method based on improved spiking neural network |
CN112528830A (zh) | | Lightweight CNN masked-face pose classification method combined with transfer learning |
Wang et al. | | Fire detection in infrared video surveillance based on convolutional neural network and SVM |
Gao et al. | | An end-to-end broad learning system for event-based object classification |
US20220132050A1 (en) | | Video processing using a spectral decomposition layer |
CN112288080A (zh) | | Adaptive model conversion method and system for spiking neural networks |
Yang et al. | | RGBT tracking via cross-modality message passing |
KR20210018600A (ko) | | Facial expression recognition system |
CN115471831B (zh) | | Image saliency detection method based on text-enhanced learning |
WO2020220926A1 (fr) | | Method and device for identifying multimedia data |
Shi et al. | | Knowledge-guided semantic computing network |
US20230076290A1 (en) | | Rounding mechanisms for post-training quantization |
Wang et al. | | A fast interpretable adaptive meta-learning enhanced deep learning framework for diagnosis of diabetic retinopathy |
Guan et al. | | Deep learning approaches for image classification techniques |
KR102178469B1 (ko) | | Method and system for estimating pedestrian pose orientation using a soft-target training method based on a teacher-student framework |
Zuo et al. | | NALA: A Nesterov Accelerated Look-Ahead optimizer for deep neural networks |
Chen et al. | | Deep global-connected net with the generalized multi-piecewise ReLU activation in deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20799239; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 20799239; Country of ref document: EP; Kind code of ref document: A1 |