CN113221852A - Target identification method and device - Google Patents

Target identification method and device

Info

Publication number
CN113221852A
Authority
CN
China
Prior art keywords
detection data
source
source detection
data
vector
Prior art date
Legal status
Granted
Application number
CN202110645394.6A
Other languages
Chinese (zh)
Other versions
CN113221852B (en)
Inventor
吕亚飞
张筱晗
熊伟
崔亚奇
姚立波
黄猛
王雅芬
Current Assignee
Unit 91977 Of Pla
Original Assignee
Unit 91977 Of Pla
Priority date
Filing date
Publication date
Application filed by Unit 91977 Of Pla
Priority to CN202110645394.6A
Publication of CN113221852A
Application granted
Publication of CN113221852B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target identification method and a target identification device, belonging to the technical field of target identification, and mainly solves the problems in the prior art that multi-source fusion identification is difficult to perform and the accuracy of fusion identification is low. The method constructs a multi-source attention fusion module that fuses the feature representation vectors of multi-source detection data according to their mutual similarity and importance degree to obtain a fused feature representation of the multi-source detection data. The module corrects the importance degree of the multi-source feature representations, enhancing important features and weakening non-important features, which improves the representation capability of the fused features and thereby the accuracy of multi-source fusion recognition.

Description

Target identification method and device
Technical Field
The present invention relates to the field of target identification technologies, and in particular, to a target identification method and apparatus.
Background
Target identification has always been a research hotspot in the field of data processing. Its aim is to obtain the salient characteristics of a target by extracting features from the detected data and thereby discriminate the target's identity. With the development of multi-sensor systems across shore, sea, air and other domains, fully exploiting the complementary advantages of multi-source information to achieve fusion recognition of multi-source detection data is key to improving the recognition accuracy of detected targets, and in turn enables more accurate situation assessment and behavior prediction.
Within target identification research, multi-source fusion recognition has received relatively little attention, mainly because the heterogeneity of multi-source information forms a barrier to fusion. Fusion methods can generally be divided into three types according to the stage at which fusion occurs: data-layer fusion, feature-layer fusion and decision-layer fusion. Data-layer fusion becomes very difficult because of the heterogeneity among multi-source detection data. Decision-layer fusion is independent of the feature extraction process and only fuses the recognition results of the individual sources, so its improvement to target recognition accuracy is limited. With the development of artificial intelligence technologies such as deep learning in recent years, the feature extraction capability of deep neural networks has continuously improved, making feature-layer fusion of multi-source information possible. However, current feature-layer fusion methods mainly fuse the extracted multi-source features in simple ways, by directly concatenating, adding or multiplying them in the feature layer. Although this improves the characterization capability of the fused features to a certain extent, it neither exploits the relationships among the multi-source information nor considers the relative importance of the multi-source features, so the accuracy of fusion recognition still needs to be improved.
Disclosure of Invention
In view of this, the present invention provides a target identification method and apparatus, and mainly aims to solve the problems in the prior art that multi-source fusion identification is difficult to perform and the accuracy of fusion identification is not high.
According to an aspect of the present invention, there is provided an object recognition method, comprising the following steps. Step 1: collecting multi-source detection data of different sensors on the same target. Step 2: constructing a target recognition deep neural network model, wherein the model comprises a feature extraction network and multi-source attention fusion; the feature extraction network is used for extracting a feature representation vector F' of each multi-source detection data; the multi-source attention fusion takes the feature representation vector of each multi-source detection data as input and outputs multi-source fusion features. Step 3: collecting the multi-source detection data of a plurality of different targets according to step 1, preprocessing the multi-source detection data, and labeling the identity category of the multi-source detection data one by one according to the different targets; all the identity-labeled multi-source detection data form a training sample set D used to train the target recognition deep neural network model. Step 4: preprocessing the multi-source detection data, inputting the preprocessed multi-source detection data into the trained target recognition deep neural network model, and outputting the probability distribution of the target recognition result.
As a further improvement of the present invention, the feature extraction network specifically includes: extracting image class features from the multi-source heterogeneous data whose semantic class is marked as image class by using a convolutional neural network, and extracting sequence class features from the multi-source heterogeneous data whose semantic class is marked as sequence class by using a recurrent neural network.
As a further improvement of the invention, the specific steps of the multi-source attention fusion are as follows. The first step: based on the feature representation vector F' of each multi-source detection data, generate for each multi-source detection data a query vector $q_N$, a key vector $k_N$ and a value vector $v_N$. The second step: multiply the query vector $q_N$ of each multi-source detection data with the key vector $k_N$ corresponding to each multi-source detection data to obtain a similarity measure A between any two multi-source detection data, and carry out normalization calculation on the similarity measure A to obtain the attention matrix $\hat{A}$ between any two multi-source detection data; the values of the attention matrix $\hat{A}$ are a measure of the degree of importance between the multi-source detection data. The third step: multiply the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each multi-source detection data to obtain the corrected feature representation $B \in d^{k \times N}$ of each multi-source detection data:
$B = V\hat{A}$
The fourth step: add the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$:
$f_{fuse} = \sum_{n=1}^{N} B_{:,n}$
As a further improvement of the present invention, the training of the target recognition deep neural network model specifically includes: on the basis of the fused feature representation $f_{fuse} \in d^k$, a cross entropy loss function is used as a constraint and the training sample set D is used to train the target recognition deep neural network model; the cross entropy loss function takes $y_i$ in the training sample set D as the true value and the output of the whole network as the predicted value:
$L = -\sum_{i} q(y_i)\log p(f_{fuse})$
where $p(f_{fuse})$ represents the output of the deep neural network model, i.e. the probability distribution of the target recognition result for each input multi-source detection data, and $q(y_i)$ represents the probability distribution of the real target identity category label $y_i$.
As a further refinement of the present invention, the multi-source detection data types include image, text, speech, and location level data.
According to another aspect of the present invention, there is provided an object recognition apparatus, including: a data acquisition module, configured to collect multi-source detection data of different sensors on the same target; a model construction module, configured to construct a target recognition deep neural network model comprising a feature extraction network and a multi-source attention fusion submodule, wherein the feature extraction network is used for extracting a feature representation vector F' of each multi-source detection data, and the multi-source attention fusion submodule takes the feature representation vector of each multi-source detection data as input and outputs multi-source fusion features; a model training module, configured to collect the multi-source detection data of a plurality of different targets, preprocess the multi-source detection data, label the identity category of the multi-source detection data one by one according to the different targets, and form a training sample set D from all the identity-labeled multi-source detection data to train the target recognition deep neural network model; and a model application module, configured to preprocess the multi-source detection data, input it into the trained target recognition deep neural network model, and output the probability distribution of the target recognition result.
As a further improvement of the present invention, the feature extraction network specifically includes: extracting image class features from the multi-source heterogeneous data whose semantic class is marked as image class by using a convolutional neural network, and extracting sequence class features from the multi-source heterogeneous data whose semantic class is marked as sequence class by using a recurrent neural network.
As a further improvement of the present invention, the multi-source attention fusion submodule specifically includes: a vector generation unit, which generates for each multi-source detection data a query vector $q_N$, a key vector $k_N$ and a value vector $v_N$ based on the feature representation vector F' of that multi-source detection data; an attention matrix unit, which multiplies the query vector $q_N$ of each multi-source detection data with the key vector $k_N$ corresponding to each multi-source detection data to obtain a similarity measure A between any two multi-source detection data, and carries out normalization calculation on the similarity measure A to obtain the attention matrix $\hat{A}$ between any two multi-source detection data, the values of the attention matrix $\hat{A}$ being a measure of the degree of importance between the multi-source detection data; a correction unit, which multiplies the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each multi-source detection data to obtain the corrected feature representation $B \in d^{k \times N}$ of each multi-source detection data, $B = V\hat{A}$; and a fusion unit, which adds the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$, $f_{fuse} = \sum_{n=1}^{N} B_{:,n}$.
As a further improvement of the present invention, the training of the target recognition deep neural network model specifically includes: on the basis of the fused feature representation $f_{fuse} \in d^k$, a cross entropy loss function is used as a constraint and the training sample set D is used to train the target recognition deep neural network model; the cross entropy loss function takes $y_i$ in the training sample set D as the true value and the output of the whole network as the predicted value:
$L = -\sum_{i} q(y_i)\log p(f_{fuse})$
where $p(f_{fuse})$ represents the output of the deep neural network model, i.e. the probability distribution of the target recognition result for each input multi-source detection data, and $q(y_i)$ represents the probability distribution of the real target identity category label $y_i$.
As a further refinement of the present invention, the multi-source detection data types include image, text, speech, and location level data.
By the technical scheme, the beneficial effects provided by the invention are as follows:
(1) Multi-source detection data of different sensors on the same target are collected, wherein the multi-source detection data comprise image, text, voice and position level data. The data sources are richer and complement each other, overcoming the limitation that, when data from a single type of sensor are used, the recognition result depends entirely on the quality of that data.
(2) A multi-source attention fusion module is constructed, which fuses the feature representation vectors of the multi-source detection data according to their mutual similarity and importance degree to obtain the fused feature representation of the multi-source detection data. The module corrects the importance degree among the multi-source feature representations, so that important features are enhanced and non-important features are weakened, which improves the representation capability of the fused features and further improves the accuracy of multi-source fusion recognition.
(3) The feature extraction networks of the various data types and the multi-source attention fusion module are combined into a target recognition deep neural network, and a training data set constructed from multi-source detection data is used to train the whole network. The trained target recognition deep neural network can perform target recognition on multi-source detection data in real time, and compared with networks in which feature extraction and feature fusion are trained separately, the recognition efficiency and accuracy are improved to a certain extent.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 shows a schematic diagram of a target recognition deep neural network model in a target recognition method based on multi-source data fusion provided by an embodiment of the present invention;
fig. 2 shows a schematic diagram of a multi-source attention fusion module in a target identification method based on multi-source data fusion provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The method mainly solves the problems that in the prior art, multi-source fusion recognition is difficult to perform and the fusion recognition accuracy is low due to the fact that fusion is performed on multi-source information features in a simple linear superposition mode, the relationship among the multi-source information is not utilized, and the importance degree among the multi-source information features is not distinguished.
According to the invention, a multi-source attention fusion module is designed using the self-attention idea of the Transformer, and the importance degree among the multi-source feature representations is corrected, so that important features are enhanced and non-important features are weakened, the characterization capability of the fused features is improved, and the accuracy of multi-source fusion recognition is further improved.
Example 1
The technical scheme of the method comprises the following steps:
step 1: collecting multi-source detection data of different sensors on the same target;
multi-source refers to data obtained from multiple different types of sensors, including but not limited to satellites, drones, radars, AIS, recording equipment, etc.; the data types acquired by different types of sensors are different, for example, the data type acquired by a satellite is image data, a radar can acquire position data, and a recording device can acquire voice data. The multi-source detection data types include image, text, speech, and location level data.
Step 2: constructing a target recognition deep neural network model, wherein the model comprises a feature extraction network and multi-source attention fusion;
fig. 1 shows a schematic diagram of a deep neural network model for target recognition in a target recognition method based on multi-source data fusion provided in an embodiment of the present invention, and as shown in fig. 1, a feature extraction network module is used for extracting feature representation vectors of each multi-source detection data; the multi-source attention fusion module takes the feature expression vector of each multi-source detection data as input and outputs multi-source fusion features;
step 2.1: the feature extraction network module constructs a feature extraction network of each multi-source detection data to extract a feature expression vector of each multi-source detection data.
Step 2.1.1 the feature extraction network module includes the use of two broad classes of deep neural networks: extracting feature expression vectors of the image data by utilizing the representation capability of the convolutional neural network on the image data; extracting feature expression vectors of sequence information data by utilizing the representation capability of a recurrent neural network on the sequence information data such as texts, voices and positions; the feature expression vector F of the multi-source heterogeneous data is a high-dimensional vector represented by the last full-connection layer of the convolutional neural network and the cyclic neural network:
$F = \{(f_1, f_2, f_3, \ldots, f_N),\; f_N \in d^k\},$
wherein k represents the dimension of each feature vector; n denotes the total number of data entered.
The convolutional neural networks used in this embodiment include VGG, ResNet, SENet, ShuffleNet and GoogLeNet; any one of these convolutional neural networks can be selected in practical application.
The recurrent neural network used in this embodiment includes: RNN or GRU; any kind of recurrent neural network can be selected in practical application.
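As a concrete illustration of the two classes of backbones in step 2.1.1 and of the projection head of step 2.1.2 described below, the following PyTorch sketch shows one image branch and one sequence branch producing k-dimensional features; the class names, the choice of ResNet-18 and GRU, and the torchvision dependency are illustrative assumptions rather than requirements of the patent, which allows any of the listed networks.

```python
import torch
import torch.nn as nn
from torchvision import models  # assumed available; any listed CNN backbone could be substituted

class ImageBranch(nn.Module):
    """Image-type source: a CNN backbone (ResNet-18 chosen here from the listed
    options) whose final fully connected layer outputs the k-dim feature f_N."""
    def __init__(self, k: int):
        super().__init__()
        backbone = models.resnet18()  # randomly initialized backbone
        backbone.fc = nn.Linear(backbone.fc.in_features, k)
        self.backbone = backbone

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) -> (batch, k)
        return self.backbone(image)

class SequenceBranch(nn.Module):
    """Sequence-type source (text/voice/position): a GRU whose last hidden
    state is projected to the k-dim feature f_N."""
    def __init__(self, in_dim: int, k: int):
        super().__init__()
        self.gru = nn.GRU(in_dim, k, batch_first=True)
        self.fc = nn.Linear(k, k)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, T, in_dim); h_n[-1] is the last hidden state -> (batch, k)
        _, h_n = self.gru(seq)
        return self.fc(h_n[-1])

class SourceProjection(nn.Module):
    """Step 2.1.2: one further fully connected layer plus ReLU per source,
    f_N' = relu(W f_N + b), as in formulas (1) and (2) below."""
    def __init__(self, k: int):
        super().__init__()
        self.fc = nn.Linear(k, k)

    def forward(self, f_n: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc(f_n))
```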
Step 2.1.2: respectively connecting a full-connection layer and an activation function on the basis of the high-level feature representation F extracted in the step 2.1.1, performing one-step nonlinear processing on the obtained high-level feature representation, wherein the activation function is a relu function, and formulas of the processing process and the relu function are respectively shown in formulas (1) and (2), so as to obtain a feature representation vector F' { (F) of each multi-source detection data1′,f2′,f3′,...fN′),fN′∈dK};
$f_N' = \mathrm{relu}(\mathrm{FC}(f_N)) = \mathrm{relu}(W_{k \times k} \cdot f_N + b)$ (1)
$\mathrm{relu}(x) = \max(0, x)$ (2)
wherein FC represents the fully connected layer, and $W_{k \times k}$ and b are a matrix and a vector having dimensions (k × k) and (k, 1), respectively.
Step 2.2: and constructing a multi-source attention fusion module, and fusing the feature representation vectors of the multi-source detection data according to the similarity and the importance degree of each other to obtain the fusion feature representation of the multi-source detection data.
And the multi-source attention fusion module takes the feature representation vector F' of each multi-source detection data obtained in the step 2.1.2 as input and takes the multi-source fusion feature as output.
The multi-source attention fusion module corrects the importance degree among the multi-source information feature representations to achieve the purposes of enhancing important features and weakening non-important features and improve the representation capability of the fused features.
The importance degree of each data source is closely related to the similarity among the data sources, and the method corrects the weight of each data source according to this similarity. To compare the similarity of the data sources, three comparison variables are defined for each data source on the basis of its original feature representation vector: a query variable, a key variable and a value variable. The query and key variables are used to compare the similarity of the data sources: the similarity measure between data sources is obtained from their query and key variables, and the obtained similarity measure is used to correct the value variables of the data sources, thereby obtaining the corrected feature representation of each data source. Fig. 2 shows a schematic diagram of the multi-source attention fusion module in the target identification method based on multi-source data fusion provided by an embodiment of the present invention, taking feature fusion of two data sources as an example. As shown in Fig. 2, the specific implementation is as follows:
Step 2.2.1: predefine three trainable matrix variables $W_q, W_k, W_v \in d^{k \times k}$; the three matrix variables are initialized randomly, and their values are updated along with the training of the network;
Step 2.2.2: multiply the matrix variables predefined in step 2.2.1 with the feature representation vector F' respectively to obtain three sub-variable matrices, the query matrix $Q \in d^{k \times N}$, the key matrix $K \in d^{k \times N}$ and the value matrix $V \in d^{k \times N}$, as shown below:
$Q = (q_1, q_2, \ldots, q_N) = W_q \times F' = W_q \times (f_1', f_2', f_3', \ldots, f_N')$ (3)
$K = (k_1, k_2, \ldots, k_N) = W_k \times F' = W_k \times (f_1', f_2', f_3', \ldots, f_N')$ (4)
$V = (v_1, v_2, \ldots, v_N) = W_v \times F' = W_v \times (f_1', f_2', f_3', \ldots, f_N')$ (5)
For the feature representation $f_N$ of each data source, the corresponding query vector $q_N$, key vector $k_N$ and value vector $v_N$ are thus obtained through formulas (3), (4) and (5).
Step 2.2.3: to obtain the similarity and importance degree between the input data sources, multiply the query vector $q_N$ of each data source with the key vectors $k_N$ of the respective data sources to obtain a similarity measure A between any two data sources. On the basis of A, the similarity measurement values between the data sources are mapped into the range 0 to 1 by a softmax function, with the sum of all measurement values equal to 1, which gives the attention matrix $\hat{A}$ between any two data sources, as shown in formula (6):
$\hat{A} = \mathrm{softmax}\left(\frac{A}{\sqrt{k}}\right) = \mathrm{softmax}\left(\frac{Q^{T}K}{\sqrt{k}}\right)$ (6)
where the division by $\sqrt{k}$ provides normalization for the convenience of calculation.
Step 2.2.4: take the value vector $v_N$ as the feature representation of each data source and the magnitude of the values of the attention matrix $\hat{A}$ as the measure of the degree of importance between the data sources. Multiply the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each data source, i.e. correct the feature representation of each data source using the similarity measure between the data sources, to obtain the final corrected feature representation $B \in d^{k \times N}$ of each data source, as shown in formula (7):
$B = V\hat{A}$ (7)
Step 2.2.5: add the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$, as shown in formula (8):
$f_{fuse} = \sum_{n=1}^{N} B_{:,n}$ (8)
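To make steps 2.2.1 to 2.2.5 concrete, the following sketch implements the multi-source attention fusion in PyTorch for an input F' arranged as a k × N matrix (an optional leading batch dimension is also accepted); the class name and the softmax axis are assumptions chosen to be consistent with formulas (3) to (8) above.

```python
import math
import torch
import torch.nn as nn

class MultiSourceAttentionFusion(nn.Module):
    """Fuses the N per-source feature vectors F' (a k x N matrix) into a single
    k-dim fused representation f_fuse, following steps 2.2.1-2.2.5."""
    def __init__(self, k: int):
        super().__init__()
        # Step 2.2.1: trainable W_q, W_k, W_v in R^{k x k}, randomly initialized
        self.w_q = nn.Parameter(torch.randn(k, k) / math.sqrt(k))
        self.w_k = nn.Parameter(torch.randn(k, k) / math.sqrt(k))
        self.w_v = nn.Parameter(torch.randn(k, k) / math.sqrt(k))
        self.k = k

    def forward(self, f_prime: torch.Tensor) -> torch.Tensor:
        # f_prime: (..., k, N), columns are the per-source features f_1', ..., f_N'
        q = self.w_q @ f_prime                                  # (..., k, N), formula (3)
        key = self.w_k @ f_prime                                # (..., k, N), formula (4)
        v = self.w_v @ f_prime                                  # (..., k, N), formula (5)
        # Step 2.2.3: pairwise similarity A and normalized attention matrix A_hat
        a = q.transpose(-2, -1) @ key                           # (..., N, N)
        a_hat = torch.softmax(a / math.sqrt(self.k), dim=-1)    # formula (6)
        # Step 2.2.4: correct each source's value vectors with the attention weights
        b = v @ a_hat                                           # (..., k, N), formula (7)
        # Step 2.2.5: sum the corrected columns into the fused feature
        return b.sum(dim=-1)                                    # (..., k),   formula (8)
```

For example, with two data sources (N = 2) and k = 256, the input has shape (256, 2) and the module returns a 256-dimensional fused vector f_fuse.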
and step 3: collecting multi-source detection data of a plurality of different targets according to the step 1, preprocessing the multi-source detection data, and carrying out identity category labeling on the multi-source detection data one by one according to the different targets; all the multi-source detection data sets labeled by the identity categories form a training sample set D to train the target recognition deep neural network model;
step 31: carrying out data cleaning and preprocessing, wherein the data cleaning comprises denoising, missing value supplement and abnormal value elimination on each multi-source detection data, and the preprocessing comprises image correction, image enhancement, data slicing and data standardization on the multi-source data;
step 32: constructing a multi-source fusion recognition training sample set D
$D = \{(x_i^1, x_i^2, \ldots, x_i^m, y_i),\; i \in (0, n)\}$
where $x_i^m$ denotes the detection data of the m-th data source for the target $y_i$, m denotes the number of types of data sources, $y_i$ denotes the identity category label of the target corresponding to the m data sources, i indexes the categories of the data set, and n denotes the total number of categories in the data set;
Step 33: on the basis of the multi-source fusion feature representation $f_{fuse} \in d^k$ extracted in step 2, a cross entropy loss function is used as a constraint, and the target recognition deep neural network model is trained on the multi-source fusion recognition training data set constructed in step 32;
the cross entropy loss function takes $y_i$ in the training sample set D as the true value and the output of the whole network as the predicted value, as shown below:
$L = -\sum_{i} q(y_i)\log p(f_{fuse})$
where $p(f_{fuse})$ represents the output of the entire network, i.e. the probability distribution of the prediction for each input data source, and $q(y_i)$ represents the probability distribution of the real label $y_i$;
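A minimal PyTorch training sketch consistent with the cross entropy constraint above and with the settings described in the following paragraph (90%/10% split, stochastic gradient descent, one of the listed batch sizes, 100 epochs, learning rate 1e-4); the function name, the dataset interface and the use of nn.CrossEntropyLoss over class logits are assumptions for illustration, not the patent's prescribed implementation.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split

def train_model(model: nn.Module, dataset, epochs: int = 100,
                batch_size: int = 16, lr: float = 1e-4, device: str = "cuda"):
    """End-to-end training of the target recognition deep neural network with
    stochastic gradient descent and a cross-entropy constraint on its output."""
    n_train = int(0.9 * len(dataset))                       # 90% training split
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()                       # cross entropy between p(f_fuse) and q(y_i)
    for _ in range(epochs):
        for sources, labels in loader:
            # sources: the batched multi-source inputs; labels: identity categories y_i
            sources = [s.to(device) for s in sources]
            labels = labels.to(device)
            logits = model(sources)                         # class scores derived from f_fuse
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model, test_set
```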
the training and learning of the target recognition deep neural network model is to perform end-to-end training of the target recognition deep neural network model on a computer configured with a GPU; randomly selecting 90% of data in the training sample set D as a training set, and the rest as a test set. In the training process, randomly read multi-source data are input into a target recognition deep neural network model, the whole network is trained by adopting a random gradient descent method, the size of the data batch read in the training process can be selected to be 2, 4, 8, 16, 32 and 64 according to the calculation capacity of a GPU, the whole data set is trained and iterated for 100 cycles, and the learning rate is set to be 1 e-4.
And 4, step 4: and preprocessing the multi-source detection data, inputting the preprocessed multi-source detection data into a trained target recognition deep neural network model, and outputting probability distribution of a target recognition result of the multi-source detection data.
The trained network model can be used for real-time fusion recognition of multi-source detection data: the detected multi-source information is preprocessed and input into the trained network, and the probability distribution of its recognition result is output.
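For completeness, a sketch of how the pieces described above might be composed and applied in real time; the TargetRecognitionNet wrapper and its final classification layer are illustrative assumptions, since the patent only specifies that the model outputs a probability distribution over target identity categories.

```python
import torch
from torch import nn

class TargetRecognitionNet(nn.Module):
    """Per-source branches + projections + multi-source attention fusion,
    followed by a classifier that yields the identity probability distribution."""
    def __init__(self, branches, projections, fusion, k: int, num_classes: int):
        super().__init__()
        self.branches = nn.ModuleList(branches)        # one backbone per data source
        self.projections = nn.ModuleList(projections)  # step 2.1.2 heads
        self.fusion = fusion                           # e.g. MultiSourceAttentionFusion(k)
        self.classifier = nn.Linear(k, num_classes)

    def forward(self, sources):
        # sources: list of per-source input tensors, each with a leading batch dimension
        feats = [proj(branch(x)) for branch, proj, x in
                 zip(self.branches, self.projections, sources)]   # each (B, k)
        f_prime = torch.stack(feats, dim=-1)           # (B, k, N)
        f_fuse = self.fusion(f_prime)                  # (B, k)
        return self.classifier(f_fuse)                 # (B, num_classes) class scores

def recognize(model: TargetRecognitionNet, sources):
    """Step 4: preprocessed multi-source data in, probability distribution out.
    Assumes model and inputs already reside on the same device."""
    model.eval()
    with torch.no_grad():
        return torch.softmax(model(sources), dim=-1)
```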
Example 2
Further, as an implementation of the method shown in the above embodiment, another embodiment of the present invention further provides an object recognition apparatus. The embodiment of the apparatus corresponds to the embodiment of the method, and for convenience of reading, details in the embodiment of the apparatus are not repeated one by one, but it should be clear that the apparatus in the embodiment can correspondingly implement all the contents in the embodiment of the method. In the apparatus of this embodiment, there are the following modules:
1. A data acquisition module: configured to collect multi-source detection data of different sensors on the same target. This module corresponds to step 1 in Example 1.
2. A model construction module: configured to construct a target recognition deep neural network model comprising a feature extraction network and a multi-source attention fusion submodule; the feature extraction network is used for extracting the feature representation vector F' of each multi-source detection data; the multi-source attention fusion submodule takes the feature representation vector of each multi-source detection data as input and outputs multi-source fusion features. This module corresponds to step 2 in Example 1.
The feature extraction network specifically comprises: extracting image class features from the multi-source heterogeneous data whose semantic classes are marked as image classes by using a convolutional neural network, and extracting sequence class features from the multi-source heterogeneous data whose semantic classes are marked as sequence classes by using a recurrent neural network.
The multi-source attention fusion submodule specifically comprises:
A vector generation unit: generates for each multi-source detection data a query vector $q_N$, a key vector $k_N$ and a value vector $v_N$ based on the feature representation vector F' of that multi-source detection data.
An attention matrix unit: multiplies the query vector $q_N$ of each multi-source detection data with the key vector $k_N$ corresponding to each multi-source detection data to obtain a similarity measure A between any two multi-source detection data, and carries out normalization calculation on the similarity measure A to obtain the attention matrix $\hat{A}$ between any two multi-source detection data; the values of the attention matrix $\hat{A}$ are a measure of the degree of importance between the multi-source detection data.
A correction unit: multiplies the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each multi-source detection data to obtain the corrected feature representation $B \in d^{k \times N}$ of each multi-source detection data:
$B = V\hat{A}$
A fusion unit: adds the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$:
$f_{fuse} = \sum_{n=1}^{N} B_{:,n}$
3. A model training module: configured to collect the multi-source detection data of a plurality of different targets, preprocess the multi-source detection data, and label the identity category of the multi-source detection data one by one according to the different targets; all the multi-source detection data labeled with identity categories form a training sample set D to train the target recognition deep neural network model. This module corresponds to step 3 in Example 1.
4. A model application module: configured to preprocess the multi-source detection data, input it into the trained target recognition deep neural network model, and output the probability distribution of the target recognition result. This module corresponds to step 4 in Example 1.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

Claims (10)

1. A method of object recognition, comprising the steps of:
step 1: collecting multi-source detection data of different sensors on the same target;
step 2: constructing a target recognition deep neural network model, wherein the model comprises a feature extraction network and multi-source attention fusion; the feature extraction network is used for extracting a feature representation vector F' of each multi-source detection data; the multi-source attention fusion takes the feature representation vector of each multi-source detection data as input and outputs multi-source fusion features;
step 3: collecting the multi-source detection data of a plurality of different targets according to step 1, preprocessing the multi-source detection data, and carrying out identity class labeling on the multi-source detection data one by one according to the different targets; all the multi-source detection data labeled with identity categories form a training sample set D to train the target recognition deep neural network model;
step 4: preprocessing the multi-source detection data, inputting the preprocessed multi-source detection data into the trained target recognition deep neural network model, and outputting the probability distribution of the target recognition result.
2. The object recognition method according to claim 1, wherein the feature extraction network is specifically:
extracting image class features from the multi-source heterogeneous data whose semantic class is marked as image class by using a convolutional neural network, and extracting sequence class features from the multi-source heterogeneous data whose semantic class is marked as sequence class by using a recurrent neural network.
3. The target identification method of claim 2, wherein the multi-source attention fusion comprises the following specific steps:
the first step: generating, based on the feature representation vector F' of each multi-source detection data, a query vector $q_N$, a key vector $k_N$ and a value vector $v_N$ of each multi-source detection data;
the second step: multiplying the query vector $q_N$ of each of the multi-source detection data with the key vector $k_N$ corresponding to each of the multi-source detection data to obtain a similarity measure A between any two multi-source detection data, and carrying out normalization calculation on the similarity measure A to obtain the attention matrix $\hat{A}$ between any two multi-source detection data, the values of the attention matrix $\hat{A}$ being a measure of the degree of importance between the multi-source detection data;
the third step: multiplying the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each multi-source detection data to obtain the corrected feature representation $B \in d^{k \times N}$ of each multi-source detection data:
$B = V\hat{A}$
the fourth step: adding the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$:
$f_{fuse} = \sum_{n=1}^{N} B_{:,n}$
4. The target recognition method of claim 3, wherein the training of the target recognition deep neural network model specifically comprises:
on the basis of the fused feature representation $f_{fuse} \in d^k$, using a cross entropy loss function as a constraint and using the training sample set D to train the target recognition deep neural network model;
the cross entropy loss function takes $y_i$ in the training sample set D as the true value and the output of the whole network as the predicted value:
$L = -\sum_{i} q(y_i)\log p(f_{fuse})$
where $p(f_{fuse})$ represents the output of the target recognition deep neural network model, i.e. the probability distribution of the target recognition result for each input multi-source detection data, and $q(y_i)$ represents the probability distribution of the real target identity category label $y_i$.
5. The object recognition method of claim 1, wherein the multi-source detection data types include image, text, speech, and location level data.
6. An object recognition apparatus, comprising:
a data acquisition module: configured to collect multi-source detection data of different sensors on the same target;
a model construction module: configured to construct a target recognition deep neural network model comprising a feature extraction network and a multi-source attention fusion submodule; the feature extraction network is used for extracting a feature representation vector F' of each multi-source detection data; the multi-source attention fusion submodule takes the feature representation vector of each multi-source detection data as input and outputs multi-source fusion features;
a model training module: configured to collect the multi-source detection data of a plurality of different targets, preprocess the multi-source detection data, and label the identity category of the multi-source detection data one by one according to the different targets; all the multi-source detection data labeled with identity categories form a training sample set D to train the target recognition deep neural network model;
a model application module: configured to preprocess the multi-source detection data, input the preprocessed multi-source detection data into the trained target recognition deep neural network model, and output the probability distribution of the target recognition result.
7. The object recognition device of claim 6, wherein the feature extraction network is specifically:
extracting image class features from the multi-source heterogeneous data whose semantic class is marked as image class by using a convolutional neural network, and extracting sequence class features from the multi-source heterogeneous data whose semantic class is marked as sequence class by using a recurrent neural network.
8. The object recognition device of claim 7, wherein the multi-source attention fusion submodule is specifically:
a vector generation unit: generating, based on the feature representation vector F' of each multi-source detection data, a query vector $q_N$, a key vector $k_N$ and a value vector $v_N$ of each multi-source detection data;
an attention matrix unit: multiplying the query vector $q_N$ of each of the multi-source detection data with the key vector $k_N$ corresponding to each of the multi-source detection data to obtain a similarity measure A between any two multi-source detection data, and carrying out normalization calculation on the similarity measure A to obtain the attention matrix $\hat{A}$ between any two multi-source detection data, the values of the attention matrix $\hat{A}$ being a measure of the degree of importance between the multi-source detection data;
a correction unit: multiplying the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each multi-source detection data to obtain the corrected feature representation $B \in d^{k \times N}$ of each multi-source detection data:
$B = V\hat{A}$
a fusion unit: adding the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$:
$f_{fuse} = \sum_{n=1}^{N} B_{:,n}$
9. The object recognition device of claim 8, wherein the training of the target recognition deep neural network model is specifically:
on the basis of the fused feature representation $f_{fuse} \in d^k$, using a cross entropy loss function as a constraint and using the training sample set D to train the target recognition deep neural network model;
the cross entropy loss function takes $y_i$ in the training sample set D as the true value and the output of the whole network as the predicted value:
$L = -\sum_{i} q(y_i)\log p(f_{fuse})$
where $p(f_{fuse})$ represents the output of the deep neural network model, i.e. the probability distribution of the target recognition result for each input multi-source detection data, and $q(y_i)$ represents the probability distribution of the real target identity category label $y_i$.
10. The object recognition device of claim 6, wherein the multi-source detection data types include image, text, voice, and location level data.
CN202110645394.6A 2021-06-09 2021-06-09 Target identification method and device Active CN113221852B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110645394.6A CN113221852B (en) 2021-06-09 2021-06-09 Target identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110645394.6A CN113221852B (en) 2021-06-09 2021-06-09 Target identification method and device

Publications (2)

Publication Number Publication Date
CN113221852A true CN113221852A (en) 2021-08-06
CN113221852B CN113221852B (en) 2021-12-31

Family

ID=77083507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110645394.6A Active CN113221852B (en) 2021-06-09 2021-06-09 Target identification method and device

Country Status (1)

Country Link
CN (1) CN113221852B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778132A (en) * 2021-09-26 2021-12-10 大连海事大学 Integrated parallel control platform for sea-air collaborative heterogeneous unmanned system
CN115496976A (en) * 2022-08-29 2022-12-20 锋睿领创(珠海)科技有限公司 Visual processing method, device, equipment and medium for multi-source heterogeneous data fusion
CN115496975A (en) * 2022-08-29 2022-12-20 锋睿领创(珠海)科技有限公司 Auxiliary weighted data fusion method, device, equipment and storage medium
CN116956212A (en) * 2023-06-27 2023-10-27 四川九洲视讯科技有限责任公司 Multi-source visual information feature recognition and extraction method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102017206123A1 (en) * 2017-04-10 2018-10-11 Robert Bosch Gmbh Method and device for merging data from various sensors of a vehicle as part of an object recognition
CN110110765A (en) * 2019-04-23 2019-08-09 四川九洲电器集团有限责任公司 A kind of multisource data fusion target identification method based on deep learning
CN110873879A (en) * 2018-08-30 2020-03-10 沈阳航空航天大学 Device and method for deep fusion of characteristics of multi-source heterogeneous sensor
CN111860351A (en) * 2020-07-23 2020-10-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network
CN112434745A (en) * 2020-11-27 2021-03-02 西安电子科技大学 Occlusion target detection and identification method based on multi-source cognitive fusion
CN112465880A (en) * 2020-11-26 2021-03-09 西安电子科技大学 Target detection method based on multi-source heterogeneous data cognitive fusion

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102017206123A1 (en) * 2017-04-10 2018-10-11 Robert Bosch Gmbh Method and device for merging data from various sensors of a vehicle as part of an object recognition
CN110873879A (en) * 2018-08-30 2020-03-10 沈阳航空航天大学 Device and method for deep fusion of characteristics of multi-source heterogeneous sensor
CN110110765A (en) * 2019-04-23 2019-08-09 四川九洲电器集团有限责任公司 A kind of multisource data fusion target identification method based on deep learning
CN111860351A (en) * 2020-07-23 2020-10-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network
CN112465880A (en) * 2020-11-26 2021-03-09 西安电子科技大学 Target detection method based on multi-source heterogeneous data cognitive fusion
CN112434745A (en) * 2020-11-27 2021-03-02 西安电子科技大学 Occlusion target detection and identification method based on multi-source cognitive fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FELIX NOBIS et al.: "A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection", 2019 Sensor Data Fusion: Trends, Solutions, Applications *
李朝 et al.: "Attention-based millimeter wave-lidar fusion for object detection" (基于注意力的毫米波-激光雷达融合目标检测), 《计算机应用》 (Journal of Computer Applications) *
谭志明 et al.: "Healthcare Big Data and Artificial Intelligence" (《健康医疗大数据与人工智能》), 31 March 2019, 华南理工大学出版社 (South China University of Technology Press) *
阮敬 et al.: "Fundamentals of Python Data Analysis" (《python数据分析基础》), 31 August 2018, 中国统计出版社 (China Statistics Press) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778132A (en) * 2021-09-26 2021-12-10 大连海事大学 Integrated parallel control platform for sea-air collaborative heterogeneous unmanned system
CN113778132B (en) * 2021-09-26 2023-12-12 大连海事大学 Integrated parallel control platform for sea-air collaborative heterogeneous unmanned system
CN115496976A (en) * 2022-08-29 2022-12-20 锋睿领创(珠海)科技有限公司 Visual processing method, device, equipment and medium for multi-source heterogeneous data fusion
CN115496975A (en) * 2022-08-29 2022-12-20 锋睿领创(珠海)科技有限公司 Auxiliary weighted data fusion method, device, equipment and storage medium
CN115496976B (en) * 2022-08-29 2023-08-11 锋睿领创(珠海)科技有限公司 Visual processing method, device, equipment and medium for multi-source heterogeneous data fusion
CN115496975B (en) * 2022-08-29 2023-08-18 锋睿领创(珠海)科技有限公司 Auxiliary weighted data fusion method, device, equipment and storage medium
CN116956212A (en) * 2023-06-27 2023-10-27 四川九洲视讯科技有限责任公司 Multi-source visual information feature recognition and extraction method

Also Published As

Publication number Publication date
CN113221852B (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN113221852B (en) Target identification method and device
Lin et al. Transfer learning based traffic sign recognition using inception-v3 model
CN110334705B (en) Language identification method of scene text image combining global and local information
CN108960073B (en) Cross-modal image mode identification method for biomedical literature
CN109711464B (en) Image description method constructed based on hierarchical feature relationship diagram
CN110046671A (en) A kind of file classification method based on capsule network
Al-Haija et al. Multi-class weather classification using ResNet-18 CNN for autonomous IoT and CPS applications
CN110175248B (en) Face image retrieval method and device based on deep learning and Hash coding
CN115830471B (en) Multi-scale feature fusion and alignment domain self-adaptive cloud detection method
CN111461174A (en) Multi-mode label recommendation model construction method and device based on multi-level attention mechanism
CN109785409B (en) Image-text data fusion method and system based on attention mechanism
CN111597340A (en) Text classification method and device and readable storage medium
CN114863091A (en) Target detection training method based on pseudo label
Lin et al. Land cover classification of RADARSAT-2 SAR data using convolutional neural network
CN114626476A (en) Bird fine-grained image recognition method and device based on Transformer and component feature fusion
CN115731411A (en) Small sample image classification method based on prototype generation
CN112395953A (en) Road surface foreign matter detection system
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN111832580A (en) SAR target identification method combining few-sample learning and target attribute features
CN114898472A (en) Signature identification method and system based on twin vision Transformer network
CN111209886B (en) Rapid pedestrian re-identification method based on deep neural network
DR RECOGNITION OF SIGN LANGUAGE USING DEEP NEURAL NETWORK.
CN116310596A (en) Domain adaptation-based small sample target detection method for electric power instrument
CN117011219A (en) Method, apparatus, device, storage medium and program product for detecting quality of article
CN112765989B (en) Variable-length text semantic recognition method based on representation classification network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant