CN113378917A - Event camera target identification method based on self-attention mechanism - Google Patents
- Publication number
- CN113378917A (application CN202110640443.7A)
- Authority
- CN
- China
- Prior art keywords
- event camera
- data
- self
- target
- event
- Prior art date
- Legal status: Granted (status is an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
A method of event camera target recognition based on a self-attention mechanism comprises the steps of: S1, initializing an event camera; S2, completing a data acquisition task with the initialized event camera; S3, performing imaging conversion on the collected event camera data so that it can be used for a target recognition task; S4, extracting features from the converted event camera data with a trained network to obtain depth features of the target; S5, inputting the extracted depth features into a self-attention model for self-attention calculation to obtain the target type contained in the event camera data at the current moment; and S6, outputting the results of the self-attention calculation and taking the result with the highest confidence as the final output. The method addresses the technical problems of insufficient event camera training data and the weak global perception of event camera data exhibited by conventional deep learning methods.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an event camera target identification method based on a self-attention mechanism.
Background
Target recognition technology is widely applied in many areas of daily life, and traditional target recognition is performed with a conventional RGB camera. Over the past decades, deep learning has matured into the dominant recognition technique. As the technology has been deployed, however, schemes based on conventional RGB cameras have shown certain defects: an RGB camera acquires external data as a series of frames, so a large amount of information is redundant between consecutive frames, while key information occurring between adjacent frames is easily lost because the frame refresh rate is relatively fixed. Meanwhile, with the popularization of conventional RGB cameras, memory and transmission bottlenecks have become more and more obvious, and deep-learning-based target recognition consumes a large amount of computing power, which degrades the real-time performance of some applications.
The event camera is a visual acquisition mode different from that of the traditional RGB camera: it is a neuromorphic vision sensor whose design was inspired by the human retina, and it captures changes in the outside world in an event-driven manner. An event camera has no frame-based update. When the external scene changes, the camera performs a series of pixel-level updates, while unchanged regions are not updated at all. Each event pixel carries four pieces of information, represented as (x, y, t, p), where (x, y) is the two-dimensional coordinate of the event pixel, t is its timestamp, and p is its polarity; the polarity reflects the brightness change at that pixel and can be either rising or falling. Because of this update mode, the data volume of an event camera is very small, and fewer resources are required for data storage and processing.
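The four-field event representation described above can be sketched directly. The code below is an illustrative aid, not part of the patent; the field types and synthetic values are assumptions:

```python
from typing import NamedTuple

class Event(NamedTuple):
    x: int    # column of the firing pixel
    y: int    # row of the firing pixel
    t: float  # timestamp of the brightness change
    p: int    # polarity: +1 for a brightness rise, -1 for a fall

# Only changing pixels emit events; static regions contribute nothing,
# which is why the data volume stays small.
stream = [Event(12, 40, 1000.0, +1), Event(13, 40, 1005.0, -1)]
rising = [e for e in stream if e.p > 0]
print(len(stream), len(rising))  # 2 1
```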
The attention mechanism is a neural network model designed after the attention of the human brain. It can effectively extract the key information during network training. By dynamically attending to particular parts of the features, the attention mechanism distinguishes local from global information hierarchically and assigns weight effectively to the parts that matter most to the result, thereby improving the accuracy of the result.
Disclosure of Invention
The invention aims to provide an event camera target recognition method based on a self-attention mechanism, solving the technical problems of insufficient event camera training data and the weak global perception of event camera data exhibited by conventional deep learning methods.
The technical scheme of the invention is as follows:
the invention discloses a method for recognizing an event camera target based on a self-attention mechanism, comprising the following steps: S1, initialization: initializing the event camera, with different data initialization schemes adopted for different event camera data types; S2, data acquisition: completing a data acquisition task with the initialized event camera; S3, data conversion: performing imaging conversion on the acquired event camera data so that it can be used for a target recognition task; S4, feature extraction: extracting features from the converted event camera data with a trained network to obtain depth features of the target; S5, self-attention calculation: inputting the extracted depth features into a self-attention model for self-attention calculation to obtain the target type contained in the event camera data at the current moment; and S6, result output: outputting the results of the self-attention calculation and taking the result with the highest confidence as the final output.
Preferably, in the above method, in step S1 the event camera is initialized in one of two ways. In the first, a moving target is to be recognized: a fixed camera collects data of the moving target, and the camera is covered before the first frame is captured so that initialization of the event camera is not disturbed by other background information. In the second, non-moving static targets are to be collected: the event camera is fixed on a moving unmanned aerial vehicle and the static targets are collected by exploiting relative motion; this configuration can also collect moving-target data.
Preferably, in the above method for recognizing an event camera target based on a self-attention mechanism, in step S2, data to be detected is collected and transmitted to the data conversion module in real time.
Preferably, in the above method, in step S3 the event camera data collected in step S2 is converted into images. The conversion intercepts and stores the cross-section of pixels sharing the same timestamp t, so that every event pixel a(x, y, t, p) of the cross-section carries the uniform time t; the pixel value is obtained by normalizing the polarity p into the range 0-255, yielding sparse image data with a clear target contour.
Preferably, in the above method, in step S4 the event camera data obtained in step S3 is input into a depth model that has been pre-trained on a large number of RGB images and then trained on a small amount of event camera data, so as to extract depth features that are effective for the target; these depth features retain rich information about the target's contour structure.
Preferably, in the above method, in step S5 the self-attention calculation first weights the depth features by multiplying them by three different weight matrices W1, W2 and W3, and then combines the weighted feature matrices pairwise to obtain the result.
According to the technical scheme of the invention, the beneficial effects are as follows:
the method utilizes an attention mechanism to identify the data collected by an event camera and identify the moving target collected by the event camera; a special characteristic training and extracting mode is designed for a special data expression form of the event camera. The method can use the minimum processing cost to perform target recognition, and then provides a pre-training method for learning structural features by using the existing RGB data aiming at the problem of insufficient training data of the event camera, so that the method for training the data of the event camera effectively improves the generalization expression capability of the model; the method has the advantages that the self-attention mechanism is utilized to carry out associated learning on the trained event camera features, and finally, the effective event camera data recognition network is trained, so that the application field of the event camera is expanded.
For a better understanding and appreciation of the concepts, principles of operation, and effects of the invention, reference will now be made in detail to the following examples, taken in conjunction with the accompanying drawings, in which:
drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below.
FIG. 1 is a flow chart of a method of event camera target recognition based on a self-attention mechanism of the present invention; and
fig. 2 is a data diagram collected by an event camera.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments in conjunction with the accompanying drawings.
The invention discloses a method for identifying an event camera target based on a self-attention mechanism, relating to a moving-target identification technology for event cameras that uses self-attention. Specifically, for the data collected by an event camera, a neural network with a self-attention mechanism performs target detection, identification and calculation, finally identifying the target object in the data. The event camera data to be recognized is input into a trained self-attention model, which performs recognition and detection effectively. The method intelligently identifies targets acquired by the event camera using self-attention, can effectively recognize the various targets the event camera captures, and improves the stability and reliability of target recognition.
The principle of the method is as follows. It builds on target recognition with a self-attention mechanism, adapted to event camera data by improving the training and feature extraction scheme. The improved strategy is to pre-train on abundant RGB images to learn the structural characteristics of target recognition, which express object contours richly, and then to retrain the learned features on a small amount of event camera data so that the network adapts to the characteristics of event camera data while retaining the strong structural features it has learned. Once effective features have been learned, the target features are learned associatively through the self-attention mechanism, finally producing a similarity score for each candidate; the target object is the one with the highest score.
As shown in fig. 1, the technique of the method of the present invention comprises the following steps:
s1, initialization: event cameras are initialized, with different data initialization schemes being employed for different event camera data types.
In this step, the event camera is initialized in one of two ways. In the first, a moving target is to be recognized: a fixed camera collects data of the moving target, and the camera is covered before the first frame is captured so that initialization of the event camera is not disturbed by other background information. In the second, non-moving static targets are to be collected: the event camera is fixed on a moving unmanned aerial vehicle and the static targets are collected by exploiting relative motion, although this configuration can also collect moving-target data.
S2, data acquisition: and completing the data acquisition task by utilizing the initialized event camera.
After the setting of step S1, data to be detected is collected in step S2, and the data is transmitted to the data conversion module in real time.
S3, data conversion: performing imaging conversion on the acquired event camera data, that is, rendering the event stream into images, so that the data can be used for a target recognition task.
The event camera data collected in step S2 is converted into images by intercepting and storing the cross-section of pixels sharing the same timestamp t. Every event pixel a(x, y, t, p) of the cross-section carries the uniform time t; the pixel value is obtained by normalizing the polarity p into the range 0-255, yielding sparse image data with a clear target contour.
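One plausible reading of this conversion can be sketched in Python. The patent only specifies a cross-section at a uniform time t and normalization into 0-255; the time-window bounds, the accumulation of repeated polarities, and the min-max normalization below are assumptions:

```python
import numpy as np

def events_to_frame(events, t0, t1, height, width):
    """Accumulate the events falling in [t0, t1) into a sparse image and
    min-max normalize the summed polarities into the 0-255 range."""
    frame = np.zeros((height, width), dtype=np.float64)
    for x, y, t, p in events:
        if t0 <= t < t1:
            frame[y, x] += p          # polarities of repeated events add up
    lo, hi = frame.min(), frame.max()
    if hi > lo:
        frame = (frame - lo) / (hi - lo) * 255.0
    return frame.astype(np.uint8)

# Synthetic events (x, y, t, p); the last one falls outside the time window.
events = [(2, 3, 0.5, +1), (2, 3, 0.7, +1), (5, 1, 0.9, -1), (0, 0, 2.0, +1)]
img = events_to_frame(events, t0=0.0, t1=1.0, height=8, width=8)
print(img[3, 2], img[1, 5], img[0, 0])  # 255 0 85
```

Pixels that fired positively saturate toward 255, negative pixels toward 0, and silent pixels take the normalized zero level, which is what gives the sparse image its clear contour.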
As shown in fig. 2, after the data of a vehicle (the target) is acquired in step S2, the imaging method of step S3 produces the visualization shown; the acquisition and processing of other targets follows the same pattern.
S4, feature extraction: and extracting the features of the event camera data after the imaging conversion by using the trained network to obtain the depth features of the target.
The event camera data obtained in step S3 is input into a depth model (i.e., the trained network) that has been pre-trained on a large number of RGB images and then trained on a small amount of event camera data, so as to extract depth features that are effective for the target; these features retain rich information about the target's contour structure.
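The two-phase feature extraction can be illustrated with a minimal stand-in. The patent does not name a specific network, so the convolution-plus-pooling pipeline below is an assumption, and the random filters are placeholders for weights that would actually come from RGB pre-training followed by event-data fine-tuning:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain valid-mode 2-D correlation, standing in for one layer of the
    pre-trained depth model."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
# Filters learned first on RGB images and then refined on event data would
# be loaded here; random arrays serve as placeholders.
filters = [rng.standard_normal((3, 3)) for _ in range(4)]
event_frame = rng.random((16, 16))  # a converted event image from step S3
maps = [np.maximum(conv2d_valid(event_frame, k), 0.0) for k in filters]
depth_feature = np.array([m.mean() for m in maps])  # pooled descriptor
print(depth_feature.shape)  # (4,)
```

The pooled descriptor plays the role of the "depth feature" that the next step feeds into the self-attention calculation.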
S5, self-attention calculation: inputting the extracted depth features into a self-attention model for self-attention calculation, which yields the target type contained in the event camera data at the current moment.
The depth features of the target obtained in step S4 undergo the self-attention calculation: the features are first weighted by multiplying them by three different weight matrices W1, W2 and W3, and the weighted feature matrices are then combined pairwise to obtain the result.
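Weighting by three matrices followed by pairwise operations matches the standard scaled dot-product self-attention, sketched below under that assumption; the softmax and the scaling factor are conventional details the patent does not state:

```python
import numpy as np

def self_attention(features, w1, w2, w3):
    """Weight the features with three matrices and combine the weighted
    matrices pairwise via a row-wise softmax over their similarities."""
    q, k, v = features @ w1, features @ w2, features @ w3
    scores = q @ k.T / np.sqrt(k.shape[1])            # pairwise similarities
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v                                # attention-weighted mix

rng = np.random.default_rng(1)
feats = rng.standard_normal((5, 16))                  # 5 depth-feature vectors
w1, w2, w3 = (rng.standard_normal((16, 16)) for _ in range(3))
out = self_attention(feats, w1, w2, w3)
print(out.shape)  # (5, 16)
```

Because every feature is compared with every other feature, each output row mixes in global information, and the pairwise products are independent of one another, which is what allows the parallel processing mentioned in claim 6.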
S6, outputting a result: and outputting the result of the feature subjected to the self-attention calculation, and taking the result with the highest confidence as a final result for outputting.
The method applies the self-attention mechanism to global information to solve the recognition problem for event camera data. It designs a complete pipeline covering the acquisition mode of event camera data, the imaging processing of the event camera, the feature learning of the self-attention depth model, and training on a small amount of event camera data, effectively solving the problems that an event camera otherwise cannot recognize targets and that a small amount of event camera data cannot be trained effectively.
The foregoing description is of the preferred embodiment of the concepts and principles of operation in accordance with the invention. The above-described embodiments should not be construed as limiting the scope of the claims, and other embodiments and combinations of implementations according to the inventive concept are within the scope of the invention.
Claims (6)
1. A method for event camera target recognition based on a self-attention mechanism, comprising the steps of:
s1, initialization: initializing the event camera, wherein different data initialization schemes are adopted for different event camera data types;
s2, data acquisition: completing a data acquisition task by utilizing the initialized event camera;
s3, data conversion: carrying out imaging conversion on the acquired event camera data so that the event camera data can be used for a target recognition task;
s4, feature extraction: extracting features of the event camera data after the imaging conversion by using a trained network to obtain depth features of the target;
s5, self-attention calculation: inputting the extracted depth features into a self-attention mechanism model for self-attention calculation to obtain a target type contained in the event camera data at the current moment; and
s6, outputting a result: and outputting the result of the feature subjected to the self-attention calculation, and taking the result with the highest confidence as a final result for outputting.
2. The method for event camera target recognition based on the self-attention mechanism as claimed in claim 1, wherein in step S1 the event camera is initialized in one of two ways: in the first, a moving target is to be recognized, a fixed camera collects data of the moving target, and the camera is covered before the first frame is captured so that initialization of the event camera is not disturbed by other background information; in the second, non-moving static targets are collected, the event camera is fixed on a moving unmanned aerial vehicle, and the static targets are collected by exploiting relative motion, an operation that can also collect moving-target data.
3. The method for event camera target recognition based on self-attention mechanism as claimed in claim 1, wherein in step S2, the data to be detected is collected and transmitted to the data conversion module in real time.
4. The method for event camera target recognition based on the self-attention mechanism as claimed in claim 1, wherein in step S3 the event camera data collected in step S2 is converted into images by intercepting and storing the cross-section of pixels sharing the same timestamp t, so that every event pixel a(x, y, t, p) of the cross-section carries the uniform time t, the pixel value being obtained by normalizing the polarity p into the range 0-255, yielding sparse image data with a clear target contour.
5. The method for event camera target recognition based on the self-attention mechanism as claimed in claim 1, wherein in step S4 the event camera data obtained in step S3 is input into a depth model pre-trained on a large number of RGB images and then trained on a small amount of event camera data, so as to extract depth features that are effective for the target, the depth features retaining rich target contour structure information.
6. The method for event camera target recognition based on the self-attention mechanism as claimed in claim 1, wherein in step S5 the self-attention calculation first weights the depth features by multiplying them by three different weight matrices W1, W2 and W3, and then combines the weighted feature matrices pairwise to obtain the result; because it operates pairwise, the self-attention mechanism effectively captures global information and can be processed in parallel.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110640443.7A (granted as CN113378917B) | 2021-06-09 | 2021-06-09 | Event camera target recognition method based on self-attention mechanism |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110640443.7A (granted as CN113378917B) | 2021-06-09 | 2021-06-09 | Event camera target recognition method based on self-attention mechanism |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113378917A | 2021-09-10 |
| CN113378917B | 2023-06-09 |
Family
ID=77572905
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110640443.7A (granted as CN113378917B, active) | Event camera target recognition method based on self-attention mechanism | 2021-06-09 | 2021-06-09 |
Country Status (1)
| Country | Link |
|---|---|
| CN | CN113378917B |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114140656A (en) * | 2022-02-07 | 2022-03-04 | 中船(浙江)海洋科技有限公司 | Marine ship target identification method based on event camera |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942518A (en) * | 2018-09-24 | 2020-03-31 | 苹果公司 | Contextual computer-generated reality (CGR) digital assistant |
CN111766939A (en) * | 2019-03-15 | 2020-10-13 | 苹果公司 | Attention direction on optical transmission display |
CN112686928A (en) * | 2021-01-07 | 2021-04-20 | 大连理工大学 | Moving target visual tracking method based on multi-source information fusion |
- 2021-06-09: CN application CN202110640443.7A, granted as patent CN113378917B (active)
Non-Patent Citations (3)
| Title |
|---|
| J. Yang et al., "Modeling point clouds with self-attention and Gumbel subset sampling", CVPR |
| Qiu Zhongyu, "Research on target detection and recognition algorithms based on dynamic vision sensors", Information Science & Technology series |
| Min Yonghao, "Research on source camera model identification algorithms based on attention mechanism", Wanfang |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114140656A (en) * | 2022-02-07 | 2022-03-04 | 中船(浙江)海洋科技有限公司 | Marine ship target identification method based on event camera |
CN114140656B (en) * | 2022-02-07 | 2022-07-12 | 中船(浙江)海洋科技有限公司 | Marine ship target identification method based on event camera |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113378917B | 2023-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135319B (en) | Abnormal behavior detection method and system | |
CN108460356B (en) | Face image automatic processing system based on monitoring system | |
CN111460968B (en) | Unmanned aerial vehicle identification and tracking method and device based on video | |
CN109919977B (en) | Video motion person tracking and identity recognition method based on time characteristics | |
CN105160310A (en) | 3D (three-dimensional) convolutional neural network based human body behavior recognition method | |
CN103729614A (en) | People recognition method and device based on video images | |
CN110555420B (en) | Fusion model network and method based on pedestrian regional feature extraction and re-identification | |
CN107392131A (en) | A kind of action identification method based on skeleton nodal distance | |
CN111582095B (en) | Light-weight rapid detection method for abnormal behaviors of pedestrians | |
CN112016402B (en) | Self-adaptive method and device for pedestrian re-recognition field based on unsupervised learning | |
CN110390308B (en) | Video behavior identification method based on space-time confrontation generation network | |
CN111639580A (en) | Gait recognition method combining feature separation model and visual angle conversion model | |
CN113111758A (en) | SAR image ship target identification method based on pulse neural network | |
CN110135435B (en) | Saliency detection method and device based on breadth learning system | |
CN115188066A (en) | Moving target detection system and method based on cooperative attention and multi-scale fusion | |
CN113378917B (en) | Event camera target recognition method based on self-attention mechanism | |
CN111881818B (en) | Medical action fine-grained recognition device and computer-readable storage medium | |
CN113901931A (en) | Knowledge distillation model-based behavior recognition method for infrared and visible light videos | |
CN110633631B (en) | Pedestrian re-identification method based on component power set and multi-scale features | |
CN112487926A (en) | Scenic spot feeding behavior identification method based on space-time diagram convolutional network | |
CN110334703B (en) | Ship detection and identification method in day and night image | |
CN110555406B (en) | Video moving target identification method based on Haar-like characteristics and CNN matching | |
CN111950476A (en) | Deep learning-based automatic river channel ship identification method in complex environment | |
CN110852214A (en) | Light-weight face recognition method facing edge calculation | |
CN113869151A (en) | Cross-view gait recognition method and system based on feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||