CN113378917A - Event camera target identification method based on self-attention mechanism - Google Patents

Event camera target identification method based on self-attention mechanism

Info

Publication number
CN113378917A
Authority
CN
China
Prior art keywords
event camera
data
self
target
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110640443.7A
Other languages
Chinese (zh)
Other versions
CN113378917B (en)
Inventor
张世雄
魏文应
李楠楠
傅弘
龙仕强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Original Assignee
Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Instritute Of Intelligent Video Audio Technology Longgang Shenzhen filed Critical Instritute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority to CN202110640443.7A priority Critical patent/CN113378917B/en
Publication of CN113378917A publication Critical patent/CN113378917A/en
Application granted granted Critical
Publication of CN113378917B publication Critical patent/CN113378917B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A method of event camera target recognition based on a self-attention mechanism, comprising the steps of: S1, initializing an event camera; S2, completing a data acquisition task with the initialized event camera; S3, performing imaging conversion on the collected event camera data so that it can be used for a target recognition task; S4, extracting features from the converted event camera data with a trained network to obtain depth features of the target; S5, inputting the extracted depth features into a self-attention model for self-attention calculation to obtain the target type contained in the event camera data at the current moment; and S6, outputting the result of the self-attention calculation, taking the highest-confidence result as the final output. The method addresses technical problems such as the current shortage of event camera training data and the weak global perception of event camera data in traditional deep learning methods.

Description

Event camera target identification method based on self-attention mechanism
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an event camera target identification method based on a self-attention mechanism.
Background
Target recognition technology is widely applied in daily life, and traditional target recognition is based on conventional RGB cameras. Over the past decades, deep learning has matured into the dominant recognition technique. However, as the technology spreads, recognition schemes based on conventional RGB cameras show certain defects. A conventional RGB camera acquires the outside world as a series of frames, so a large amount of redundant information exists between consecutive frames, and because frames are captured at a relatively fixed refresh rate, key information occurring between adjacent frames is easily lost. Meanwhile, with the popularization of conventional RGB cameras, storage and transmission bottlenecks become increasingly apparent, and deep-learning-based target recognition consumes a large amount of computing power, which degrades the real-time performance of some applications.
An event camera uses a visual acquisition mode different from that of a conventional RGB camera: it is a neuromorphic vision sensor inspired by the human retina, capturing changes in the outside world in an event-driven manner. An event camera has no frame-based update; when the outside scene changes, the camera performs a series of pixel-level updates, and unchanged regions are not updated at all. Each event pixel carries four pieces of information, represented as (x, y, t, p), where (x, y) are the two-dimensional coordinates of the event pixel, t is its timestamp, and p is its polarity, which reflects the brightness change of the pixel and has two directions, rising and falling. Because of this update mode, the data volume of an event camera is very small, and fewer resources are required for data storage and processing.
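The four-field (x, y, t, p) event representation described above can be sketched as a structured array. The field dtypes and the sample events below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# Each event carries pixel coordinates (x, y), a timestamp t, and a
# polarity p (+1 for a brightness rise, -1 for a fall); dtypes assumed.
event_dtype = np.dtype([("x", np.uint16), ("y", np.uint16),
                        ("t", np.float64), ("p", np.int8)])

events = np.array([(3, 5, 0.001, 1),
                   (3, 6, 0.001, -1),
                   (7, 2, 0.002, 1)], dtype=event_dtype)

# Events sharing a timestamp form one "cross-section" of the stream,
# which the data conversion step later renders as an image.
same_slice = events[events["t"] == 0.001]
print(len(same_slice))  # 2
```

Only changed pixels appear in the stream, which is why the data volume stays small compared with full frames.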
The attention mechanism is a neural network model designed after the attention of the human brain. It can effectively extract the key information in network training. By dynamically attending to certain parts of the features, the attention mechanism distinguishes local from global information hierarchically and effectively assigns weight to the parts that matter for the result, thereby improving accuracy.
Disclosure of Invention
The invention aims to provide an event camera target recognition method based on a self-attention mechanism, addressing technical problems such as the current shortage of event camera training data and the weak global perception of event camera data in traditional deep learning methods.
The technical scheme of the invention is as follows:
the invention discloses a method for recognizing an event camera target based on a self-attention mechanism, comprising the following steps: S1, initialization: initialize the event camera, adopting different data initialization schemes for different event camera data types; S2, data acquisition: complete a data acquisition task with the initialized event camera; S3, data conversion: perform imaging conversion on the acquired event camera data so that it can be used for a target recognition task; S4, feature extraction: extract features from the converted event camera data with a trained network to obtain depth features of the target; S5, self-attention calculation: input the extracted depth features into a self-attention model for self-attention calculation to obtain the target type contained in the event camera data at the current moment; and S6, result output: output the result of the self-attention calculation, taking the highest-confidence result as the final output.
Preferably, in the above method for object recognition of an event camera based on the self-attention mechanism, in step S1 the event camera is initialized in one of two ways: in one, to identify a moving target, a fixed camera collects data of the moving target, and the lens is covered before the first frame is acquired to ensure that event camera initialization is not disturbed by other background information; in the other, to collect non-moving static targets, the event camera is fixed on a moving unmanned aerial vehicle and collects the static targets using the principle of relative motion; this setup can also collect moving-target data.
Preferably, in the above method for recognizing an event camera target based on a self-attention mechanism, in step S2 the data to be detected are collected and transmitted to the data conversion module in real time.
Preferably, in the above method for identifying an event camera target based on a self-attention mechanism, in step S3 the event camera data collected in step S2 undergo imaging conversion: the cross-section of pixels sharing the same timestamp t is sliced out and stored, so that every event pixel a(x, y, t, p) in the cross-section has the uniform time t; the pixel value is obtained by normalizing the polarity p to the range 0-255, yielding sparse image data with a clear target contour.
Preferably, in the above method for recognizing an event camera target based on a self-attention mechanism, in step S4 the event camera data obtained in step S3 are input into a depth model pre-trained on a large number of RGB images and fine-tuned on a small amount of event camera data, extracting depth features that are effective for the target and retain rich information about its contour structure.
Preferably, in the above method for identifying an event camera target based on a self-attention mechanism, in step S5 the self-attention calculation first weights the depth features by multiplying them by three different weight matrices W1, W2 and W3, and then combines the weighted feature matrices pairwise to obtain the result.
According to the technical scheme of the invention, the beneficial effects are as follows:
the method utilizes an attention mechanism to identify the data collected by an event camera and identify the moving target collected by the event camera; a special characteristic training and extracting mode is designed for a special data expression form of the event camera. The method can use the minimum processing cost to perform target recognition, and then provides a pre-training method for learning structural features by using the existing RGB data aiming at the problem of insufficient training data of the event camera, so that the method for training the data of the event camera effectively improves the generalization expression capability of the model; the method has the advantages that the self-attention mechanism is utilized to carry out associated learning on the trained event camera features, and finally, the effective event camera data recognition network is trained, so that the application field of the event camera is expanded.
For a better understanding and appreciation of the concepts, principles of operation, and effects of the invention, reference will now be made in detail to the following examples, taken in conjunction with the accompanying drawings, in which:
drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below.
FIG. 1 is a flow chart of a method of event camera target recognition based on a self-attention mechanism of the present invention; and
fig. 2 is a data diagram collected by an event camera.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments in conjunction with the accompanying drawings.
The invention discloses a method for identifying an event camera target based on a self-attention mechanism, relating to moving-target recognition for event cameras using self-attention. Specifically, for the data collected by an event camera, a neural network with a self-attention mechanism performs target detection and recognition, finally identifying the target object in the data. The event camera data to be recognized are input into a trained self-attention model, which performs recognition and detection effectively. The method intelligently identifies targets acquired by the event camera using self-attention, can effectively recognize a variety of targets captured by the event camera, and improves the stability and reliability of target recognition.
The principle of the method of the invention is as follows: a target recognition technique based on the self-attention mechanism is adapted to event camera data by improving its training and feature extraction. The improved strategy is to pre-train on abundant RGB images to learn the structural features of target recognition, which have strong expressive power for object contours. The learned features are then retrained with a small amount of event camera data so that the network adapts to the characteristics of event camera data while retaining the strong structural target features it has learned. After these effective features are learned, the target features undergo associative learning through the self-attention mechanism, yielding a similarity score for each candidate; the target is assigned to the class with the highest score.
As shown in fig. 1, the technique of the method of the present invention comprises the following steps:
s1, initialization: event cameras are initialized, with different data initialization schemes being employed for different event camera data types.
In this step, the event camera is initialized in one of two ways: in one, to identify a moving target, a fixed camera collects data of the moving target, and the lens is covered before the first frame is acquired to ensure that event camera initialization is not disturbed by other background information; in the other, to collect non-moving static targets, the event camera is fixed on a moving unmanned aerial vehicle and collects the static targets using the principle of relative motion, although this setup can also collect moving-target data.
S2, data acquisition: complete the data acquisition task with the initialized event camera.
After the setting of step S1, data to be detected is collected in step S2, and the data is transmitted to the data conversion module in real time.
S3, data conversion: perform imaging conversion on the acquired event camera data, i.e., image the event stream, so that the event camera data can be used for a target recognition task.
The event camera data collected in step S2 are converted into an image by slicing out and storing the cross-section of pixels sharing the same timestamp t. Every event pixel a(x, y, t, p) in the cross-section has the uniform time t; the pixel value is obtained by normalizing the polarity p to the range 0-255, yielding sparse image data with a clear target contour.
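The time-slice conversion just described (fix a timestamp t, keep the pixels that share it, normalize polarity into 0-255) might be sketched as follows. The concrete polarity-to-intensity mapping, the background value, and the image size are assumptions, since the patent only states that p is normalized into the 0-255 range:

```python
import numpy as np

# Assumed event record layout, matching the (x, y, t, p) description.
event_dtype = np.dtype([("x", np.uint16), ("y", np.uint16),
                        ("t", np.float64), ("p", np.int8)])

def events_to_image(events, t, shape=(8, 8)):
    """Slice out the cross-section of events at timestamp t and render it
    as a sparse image. Here -1 maps to 0, +1 maps to 255, and untouched
    background pixels sit at the midpoint 128 (mapping assumed)."""
    img = np.full(shape, 128, dtype=np.uint8)
    sliced = events[events["t"] == t]               # uniform time t
    img[sliced["y"], sliced["x"]] = np.where(sliced["p"] > 0, 255, 0)
    return img

events = np.array([(3, 5, 0.001, 1), (4, 5, 0.001, -1), (0, 0, 0.002, 1)],
                  dtype=event_dtype)
frame = events_to_image(events, 0.001)
print(frame[5, 3], frame[5, 4], frame[0, 0])  # 255 0 128
```

The event at t = 0.002 is excluded from the slice, which is what keeps the resulting image sparse and the target contour clear.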
As shown in Fig. 2, the visualization is obtained by acquiring data of a vehicle (the target) in step S2 and processing it with the event camera imaging method of step S3; the acquisition and processing of other targets follows this example.
S4, feature extraction: extract features from the converted event camera data with the trained network to obtain depth features of the target.
The event camera data obtained in step S3 are input into a depth model (i.e., the trained network) pre-trained on a large number of RGB images and fine-tuned on a small amount of event camera data, extracting depth features that are effective for the target and retain rich information about its contour structure.
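One hedged reading of this pre-train-then-fine-tune recipe is a frozen RGB-pretrained backbone with a small head retrained on the scarce event data. Everything concrete below (the random-projection backbone, the synthetic labels, the softmax head, all sizes) is an illustrative stand-in, not the patent's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen stand-in for a backbone pre-trained on abundant RGB images.
# A fixed random projection is an assumption for illustration only;
# the patent does not specify the network architecture.
W_backbone = rng.normal(size=(64, 16))
def extract_features(x):
    return np.tanh(x @ W_backbone)        # frozen: never updated below

# Scarce event-camera training set (synthetic stand-in data).
X = rng.normal(size=(40, 64))
y = (X[:, 0] > 0).astype(int)
F = extract_features(X)

# Fine-tune only a lightweight softmax head on the event data.
W_head = np.zeros((16, 2))

def loss_and_grad(W):
    logits = F @ W
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(len(y)), y]).mean()
    p[np.arange(len(y)), y] -= 1          # gradient of cross-entropy
    return loss, F.T @ p / len(y)

initial_loss, _ = loss_and_grad(W_head)
for _ in range(300):
    loss, grad = loss_and_grad(W_head)
    W_head -= 0.1 * grad

assert loss < initial_loss  # the head adapted; the backbone stayed frozen
```

The design point this sketch illustrates is that the structural features learned from RGB data are reused, and only a small amount of event data is needed to adapt the model to the event domain.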
S5, self-attention calculation: input the extracted depth features into a self-attention model for self-attention calculation, which yields the target type contained in the event camera data at the current moment.
The depth features of the target obtained in step S4 undergo self-attention calculation: the depth features are first weighted by multiplying them by three different weight matrices W1, W2 and W3, and the weighted feature matrices are then combined pairwise to obtain the result.
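A minimal sketch of this weighting-and-pairwise-combination step, under the assumption that W1, W2 and W3 play the usual query/key/value roles of scaled dot-product self-attention (the patent itself only names three weight matrices, so this interpretation and all sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 8                       # 4 depth-feature vectors of dimension 8
F = rng.normal(size=(n, d))       # stand-in for the extracted depth features

# Weight the features with three different matrices, as in step S5.
W1, W2, W3 = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = F @ W1, F @ W2, F @ W3

# "Pairwise operation": every weighted feature is compared with every
# other one via scaled dot products, then normalized row-wise.
scores = Q @ K.T / np.sqrt(d)
scores -= scores.max(axis=1, keepdims=True)   # numerical stability
attn = np.exp(scores)
attn /= attn.sum(axis=1, keepdims=True)
out = attn @ V                                # attended features

print(attn.shape, out.shape)  # (4, 4) (4, 8)
```

Because every feature interacts with every other one, the attention map captures global relations across the features, and each row can be computed independently, which is what allows parallel processing.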
S6, result output: output the result of the self-attention calculation, taking the highest-confidence result as the final output.
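The highest-confidence selection of step S6 can be sketched as a softmax over class scores followed by an argmax; the class names below are hypothetical examples, not from the patent:

```python
import numpy as np

def output_result(class_scores, class_names):
    """Convert raw scores from the self-attention calculation into
    confidences and output the most confident class (sketch)."""
    s = np.asarray(class_scores, dtype=float)
    conf = np.exp(s - s.max())     # shifted softmax for stability
    conf /= conf.sum()
    best = int(np.argmax(conf))
    return class_names[best], float(conf[best])

label, confidence = output_result(
    [2.0, 0.5, 0.1], ["vehicle", "pedestrian", "background"])
print(label)  # vehicle
```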
The method applies the self-attention mechanism to global information to solve the recognition problem on event camera data. It designs a complete pipeline covering the acquisition mode of event camera data, the imaging of the event camera, the feature learning of the self-attention depth model, and training with a small amount of event camera data, effectively solving problems such as the inability to recognize targets from event camera data and the inability to train effectively on small amounts of event camera data.
The foregoing description is of the preferred embodiment of the concepts and principles of operation in accordance with the invention. The above-described embodiments should not be construed as limiting the scope of the claims, and other embodiments and combinations of implementations according to the inventive concept are within the scope of the invention.

Claims (6)

1. A method for event camera target recognition based on a self-attention mechanism, comprising the steps of:
s1, initialization: initializing the event camera, wherein different data initialization schemes are adopted for different event camera data types;
s2, data acquisition: completing a data acquisition task by utilizing the initialized event camera;
s3, data conversion: carrying out imaging conversion on the acquired event camera data so that the event camera data can be used for a target recognition task;
s4, feature extraction: extracting features of the event camera data after the imaging conversion by using a trained network to obtain depth features of the target;
s5, self-attention calculation: inputting the extracted depth features into a self-attention mechanism model for self-attention calculation to obtain a target type contained in the event camera data at the current moment; and
s6, result output: outputting the result of the self-attention calculation, taking the highest-confidence result as the final output.
2. The method for event camera target recognition based on the self-attention mechanism as claimed in claim 1, wherein in step S1 the event camera is initialized in one of two ways: in one, to identify a moving target, a fixed camera collects data of the moving target, and the lens is covered before the first frame is acquired to ensure that event camera initialization is not disturbed by other background information; in the other, to collect non-moving static targets, the event camera is fixed on a moving unmanned aerial vehicle and collects the static targets using the principle of relative motion; this setup can also collect moving-target data.
3. The method for event camera target recognition based on self-attention mechanism as claimed in claim 1, wherein in step S2, the data to be detected is collected and transmitted to the data conversion module in real time.
4. The method for event camera target recognition based on the self-attention mechanism as claimed in claim 1, wherein in step S3 the event camera data collected in step S2 are converted into an image by slicing out and storing the cross-section of pixels sharing the same timestamp t, so that every event pixel a(x, y, t, p) in the cross-section has the uniform time t; the pixel value is obtained by normalizing the polarity p to the range 0-255, yielding sparse image data with a clear target contour.
5. The method for event camera target recognition based on the self-attention mechanism as claimed in claim 1, wherein in step S4 the event camera data obtained in step S3 are input into a depth model pre-trained on a large number of RGB images and fine-tuned on a small amount of event camera data to extract depth features that are effective for the target and retain rich information about its contour structure.
6. The method for event camera target recognition based on the self-attention mechanism as claimed in claim 1, wherein in step S5 the self-attention calculation first weights the depth features by multiplying them by three different weight matrices W1, W2 and W3, and then combines the weighted feature matrices pairwise to obtain the result; because it operates pairwise, the self-attention mechanism effectively captures global information and can be processed in parallel.
CN202110640443.7A 2021-06-09 2021-06-09 Event camera target recognition method based on self-attention mechanism Active CN113378917B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110640443.7A CN113378917B (en) 2021-06-09 2021-06-09 Event camera target recognition method based on self-attention mechanism


Publications (2)

Publication Number Publication Date
CN113378917A true CN113378917A (en) 2021-09-10
CN113378917B CN113378917B (en) 2023-06-09

Family

ID=77572905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110640443.7A Active CN113378917B (en) 2021-06-09 2021-06-09 Event camera target recognition method based on self-attention mechanism

Country Status (1)

Country Link
CN (1) CN113378917B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942518A (en) * 2018-09-24 2020-03-31 苹果公司 Contextual computer-generated reality (CGR) digital assistant
CN111766939A (en) * 2019-03-15 2020-10-13 苹果公司 Attention direction on optical transmission display
CN112686928A (en) * 2021-01-07 2021-04-20 大连理工大学 Moving target visual tracking method based on multi-source information fusion


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J. Yang et al., "Modeling point clouds with self-attention and Gumbel subset sampling", CVPR *
邱忠宇, "Research on target detection and recognition algorithms based on dynamic vision sensors", Information Science and Technology Series *
闵永浩, "Research on source camera model identification algorithms based on the attention mechanism", Wanfang *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140656A (en) * 2022-02-07 2022-03-04 中船(浙江)海洋科技有限公司 Marine ship target identification method based on event camera
CN114140656B (en) * 2022-02-07 2022-07-12 中船(浙江)海洋科技有限公司 Marine ship target identification method based on event camera

Also Published As

Publication number Publication date
CN113378917B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN110135319B (en) Abnormal behavior detection method and system
CN108460356B (en) Face image automatic processing system based on monitoring system
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
CN109919977B (en) Video motion person tracking and identity recognition method based on time characteristics
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN103729614A (en) People recognition method and device based on video images
CN110555420B (en) Fusion model network and method based on pedestrian regional feature extraction and re-identification
CN107392131A (en) A kind of action identification method based on skeleton nodal distance
CN111582095B (en) Light-weight rapid detection method for abnormal behaviors of pedestrians
CN112016402B (en) Self-adaptive method and device for pedestrian re-recognition field based on unsupervised learning
CN110390308B (en) Video behavior identification method based on space-time confrontation generation network
CN111639580A (en) Gait recognition method combining feature separation model and visual angle conversion model
CN113111758A (en) SAR image ship target identification method based on pulse neural network
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN115188066A (en) Moving target detection system and method based on cooperative attention and multi-scale fusion
CN113378917B (en) Event camera target recognition method based on self-attention mechanism
CN111881818B (en) Medical action fine-grained recognition device and computer-readable storage medium
CN113901931A (en) Knowledge distillation model-based behavior recognition method for infrared and visible light videos
CN110633631B (en) Pedestrian re-identification method based on component power set and multi-scale features
CN112487926A (en) Scenic spot feeding behavior identification method based on space-time diagram convolutional network
CN110334703B (en) Ship detection and identification method in day and night image
CN110555406B (en) Video moving target identification method based on Haar-like characteristics and CNN matching
CN111950476A (en) Deep learning-based automatic river channel ship identification method in complex environment
CN110852214A (en) Light-weight face recognition method facing edge calculation
CN113869151A (en) Cross-view gait recognition method and system based on feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant