CN113221852A - Target identification method and device - Google Patents

Target identification method and device

Info

Publication number
CN113221852A
Authority
CN
China
Prior art keywords
detection data
source
source detection
data
vector
Prior art date
Legal status
Granted
Application number
CN202110645394.6A
Other languages
Chinese (zh)
Other versions
CN113221852B (en)
Inventor
吕亚飞
张筱晗
熊伟
崔亚奇
姚立波
黄猛
王雅芬
Current Assignee
Unit 91977 Of Pla
Original Assignee
Unit 91977 Of Pla
Priority date
Filing date
Publication date
Application filed by Unit 91977 Of Pla
Priority to CN202110645394.6A
Publication of CN113221852A
Application granted
Publication of CN113221852B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target identification method and a target identification device, belonging to the technical field of target identification, and mainly solves the problems in the prior art that multi-source fusion identification is difficult to perform and the accuracy of fusion identification is low. The method constructs a multi-source attention fusion module that fuses the feature representation vectors of multi-source detection data according to their mutual similarity and importance degree to obtain a fused feature representation of the multi-source detection data. The module corrects the importance degree of the multi-source feature representations, enhancing important features and weakening non-important features, which improves the representation capability of the fused features and thereby the accuracy of multi-source fusion recognition.

Description

Target identification method and device
Technical Field
The present invention relates to the field of target identification technologies, and in particular, to a target identification method and apparatus.
Background
Target identification has always been a research hotspot in the field of data processing. Its aim is to obtain the salient characteristics of a target by extracting features from the detected data and thereby discriminate the target's identity. With the development of multi-sensor systems across shore, sea, air and other domains, fully exploiting the complementary advantages of multi-source information to achieve fusion recognition of multi-source detection data is key to improving the recognition accuracy of detected targets, and in turn enables more accurate situation assessment and behavior prediction.
Within target identification research, multi-source fusion recognition has received relatively little attention, mainly because the heterogeneity of multi-source information forms a barrier to fusion. Fusion methods can generally be divided into three types according to the stage at which fusion occurs: data-layer fusion, feature-layer fusion and decision-layer fusion. Data-layer fusion becomes very difficult because of the heterogeneity among multi-source detection data. Decision-layer fusion is independent of the feature extraction process and only fuses the recognition results of the individual sources, so its improvement to target recognition accuracy is limited. With the development of artificial intelligence technologies such as deep learning in recent years, the feature extraction capability of deep neural networks has continuously improved, making feature-layer fusion of multi-source information possible. However, current feature-layer fusion methods mainly fuse the extracted multi-source features in simple ways, by directly concatenating, adding or multiplying them in the feature layer. Although this improves the characterization capability of the fused features to a certain extent, it neither exploits the relationships among the multi-source information nor considers the relative importance of the multi-source features, so the accuracy of fusion recognition still needs to be improved.
Disclosure of Invention
In view of this, the present invention provides a target identification method and apparatus, and mainly aims to solve the problems in the prior art that multi-source fusion identification is difficult to perform and the accuracy of fusion identification is not high.
According to an aspect of the present invention, there is provided an object recognition method, comprising the following steps. Step 1: collecting multi-source detection data of different sensors on the same target. Step 2: constructing a target recognition deep neural network model, wherein the model comprises a feature extraction network and multi-source attention fusion; the feature extraction network is used for extracting a feature representation vector F' of each multi-source detection data; the multi-source attention fusion takes the feature representation vector of each multi-source detection data as input and outputs multi-source fusion features. Step 3: collecting the multi-source detection data of a plurality of different targets according to step 1, preprocessing the multi-source detection data, and labeling the identity category of the multi-source detection data one by one according to the different targets; all the identity-labeled multi-source detection data form a training sample set D used to train the target recognition deep neural network model. Step 4: preprocessing the multi-source detection data, inputting the preprocessed multi-source detection data into the trained target recognition deep neural network model, and outputting the probability distribution of the target recognition result.
As a further improvement of the present invention, the feature extraction network specifically includes: extracting image class features from the multi-source heterogeneous data whose semantic class is marked as image class by using a convolutional neural network, and extracting sequence class features from the multi-source heterogeneous data whose semantic class is marked as sequence class by using a recurrent neural network.
As a further improvement of the invention, the specific steps of the multi-source attention fusion are as follows. The first step: based on the feature representation vector F' of each multi-source detection data, generate for each multi-source detection data a query vector $q_N$, a key vector $k_N$ and a value vector $v_N$. The second step: multiply the query vector $q_N$ of each multi-source detection data with the key vector $k_N$ corresponding to each multi-source detection data to obtain a similarity measure A between any two multi-source detection data, and carry out normalization calculation on the similarity measure A to obtain the attention matrix $\hat{A}$ between any two multi-source detection data; the values of the attention matrix $\hat{A}$ are a measure of the degree of importance between the multi-source detection data. The third step: multiply the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each multi-source detection data to obtain the corrected feature representation $B \in d^{k \times N}$ of each multi-source detection data:
$B = V\hat{A}$
The fourth step: add the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$:
$f_{fuse} = \sum_{n=1}^{N} B_{:,n}$
As a further improvement of the present invention, the training of the target recognition deep neural network model specifically includes: on the basis of the fused feature representation $f_{fuse} \in d^k$, a cross entropy loss function is used as a constraint and the training sample set D is used to train the target recognition deep neural network model; the cross entropy loss function takes $y_i$ in the training sample set D as the true value and the output of the whole network as the predicted value:
$L = -\sum_{i} q(y_i)\log p(f_{fuse})$
where $p(f_{fuse})$ represents the output of the deep neural network model, i.e. the probability distribution of the target recognition result for each input multi-source detection data, and $q(y_i)$ represents the probability distribution of the real target identity category label $y_i$.
As a further refinement of the present invention, the multi-source detection data types include image, text, speech, and location level data.
According to another aspect of the present invention, there is provided an object recognition apparatus, including: a data acquisition module, configured to collect multi-source detection data of different sensors on the same target; a model construction module, configured to construct a target recognition deep neural network model comprising a feature extraction network and a multi-source attention fusion submodule, wherein the feature extraction network is used for extracting a feature representation vector F' of each multi-source detection data, and the multi-source attention fusion submodule takes the feature representation vector of each multi-source detection data as input and outputs multi-source fusion features; a model training module, configured to collect the multi-source detection data of a plurality of different targets, preprocess the multi-source detection data, label the identity category of the multi-source detection data one by one according to the different targets, and form a training sample set D from all the identity-labeled multi-source detection data to train the target recognition deep neural network model; and a model application module, configured to preprocess the multi-source detection data, input it into the trained target recognition deep neural network model, and output the probability distribution of the target recognition result.
As a further improvement of the present invention, the feature extraction network specifically includes: extracting image class features from the multi-source heterogeneous data whose semantic class is marked as image class by using a convolutional neural network, and extracting sequence class features from the multi-source heterogeneous data whose semantic class is marked as sequence class by using a recurrent neural network.
As a further improvement of the present invention, the multi-source attention fusion submodule specifically includes: a vector generation unit, which generates for each multi-source detection data a query vector $q_N$, a key vector $k_N$ and a value vector $v_N$ based on the feature representation vector F' of that multi-source detection data; an attention matrix unit, which multiplies the query vector $q_N$ of each multi-source detection data with the key vector $k_N$ corresponding to each multi-source detection data to obtain a similarity measure A between any two multi-source detection data, and carries out normalization calculation on the similarity measure A to obtain the attention matrix $\hat{A}$ between any two multi-source detection data, the values of the attention matrix $\hat{A}$ being a measure of the degree of importance between the multi-source detection data; a correction unit, which multiplies the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each multi-source detection data to obtain the corrected feature representation $B \in d^{k \times N}$ of each multi-source detection data, $B = V\hat{A}$; and a fusion unit, which adds the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$, $f_{fuse} = \sum_{n=1}^{N} B_{:,n}$.
As a further improvement of the present invention, the training of the target recognition deep neural network model specifically includes: on the basis of the fused feature representation $f_{fuse} \in d^k$, a cross entropy loss function is used as a constraint and the training sample set D is used to train the target recognition deep neural network model; the cross entropy loss function takes $y_i$ in the training sample set D as the true value and the output of the whole network as the predicted value:
$L = -\sum_{i} q(y_i)\log p(f_{fuse})$
where $p(f_{fuse})$ represents the output of the deep neural network model, i.e. the probability distribution of the target recognition result for each input multi-source detection data, and $q(y_i)$ represents the probability distribution of the real target identity category label $y_i$.
As a further refinement of the present invention, the multi-source detection data types include image, text, speech, and location level data.
By the technical scheme, the beneficial effects provided by the invention are as follows:
(1) Multi-source detection data of different sensors on the same target are collected, wherein the multi-source detection data comprise image, text, voice and position level data. The data sources are richer and complement each other, overcoming the limitation that, when data from a single type of sensor are used, the recognition result depends entirely on the quality of that data.
(2) A multi-source attention fusion module is constructed, which fuses the feature representation vectors of the multi-source detection data according to their mutual similarity and importance degree to obtain the fused feature representation of the multi-source detection data. The module corrects the importance degree among the multi-source feature representations, so that important features are enhanced and non-important features are weakened, which improves the representation capability of the fused features and further improves the accuracy of multi-source fusion recognition.
(3) The feature extraction networks of the various data types and the multi-source attention fusion module are combined into a target recognition deep neural network, and a training data set constructed from multi-source detection data is used to train the whole network. The trained target recognition deep neural network can perform target recognition on multi-source detection data in real time, and compared with networks in which feature extraction and feature fusion are trained separately, the recognition efficiency and accuracy are improved to a certain extent.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 shows a schematic diagram of a target recognition deep neural network model in a target recognition method based on multi-source data fusion provided by an embodiment of the present invention;
fig. 2 shows a schematic diagram of a multi-source attention fusion module in a target identification method based on multi-source data fusion provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The method mainly solves the problems that in the prior art, multi-source fusion recognition is difficult to perform and the fusion recognition accuracy is low due to the fact that fusion is performed on multi-source information features in a simple linear superposition mode, the relationship among the multi-source information is not utilized, and the importance degree among the multi-source information features is not distinguished.
According to the invention, a multi-source attention fusion module is designed using the self-attention idea of the Transformer, and the importance degree among the multi-source feature representations is corrected, so that important features are enhanced and non-important features are weakened, the characterization capability of the fused features is improved, and the accuracy of multi-source fusion recognition is further improved.
Example 1
The technical scheme of the method comprises the following steps:
step 1: collecting multi-source detection data of different sensors on the same target;
multi-source refers to data obtained from multiple different types of sensors, including but not limited to satellites, drones, radars, AIS, recording equipment, etc.; the data types acquired by different types of sensors are different, for example, the data type acquired by a satellite is image data, a radar can acquire position data, and a recording device can acquire voice data. The multi-source detection data types include image, text, speech, and location level data.
Step 2: constructing a target recognition deep neural network model, wherein the model comprises a feature extraction network and multi-source attention fusion;
fig. 1 shows a schematic diagram of a deep neural network model for target recognition in a target recognition method based on multi-source data fusion provided in an embodiment of the present invention, and as shown in fig. 1, a feature extraction network module is used for extracting feature representation vectors of each multi-source detection data; the multi-source attention fusion module takes the feature expression vector of each multi-source detection data as input and outputs multi-source fusion features;
step 2.1: the feature extraction network module constructs a feature extraction network of each multi-source detection data to extract a feature expression vector of each multi-source detection data.
Step 2.1.1 the feature extraction network module includes the use of two broad classes of deep neural networks: extracting feature expression vectors of the image data by utilizing the representation capability of the convolutional neural network on the image data; extracting feature expression vectors of sequence information data by utilizing the representation capability of a recurrent neural network on the sequence information data such as texts, voices and positions; the feature expression vector F of the multi-source heterogeneous data is a high-dimensional vector represented by the last full-connection layer of the convolutional neural network and the cyclic neural network:
$F = \{(f_1, f_2, f_3, \ldots, f_N),\; f_N \in d^k\},$
wherein k represents the dimension of each feature vector; n denotes the total number of data entered.
The convolutional neural networks used in this embodiment include VGG, ResNet, SENet, ShuffleNet and GoogLeNet; any one of these convolutional neural networks can be selected in practical application.
The recurrent neural network used in this embodiment includes: RNN or GRU; any kind of recurrent neural network can be selected in practical application.
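As a concrete illustration of the two classes of backbones in step 2.1.1 and of the projection head of step 2.1.2 described below, the following PyTorch sketch shows one image branch and one sequence branch producing k-dimensional features; the class names, the choice of ResNet-18 and GRU, and the torchvision dependency are illustrative assumptions rather than requirements of the patent, which allows any of the listed networks.

```python
import torch
import torch.nn as nn
from torchvision import models  # assumed available; any listed CNN backbone could be substituted

class ImageBranch(nn.Module):
    """Image-type source: a CNN backbone (ResNet-18 chosen here from the listed
    options) whose final fully connected layer outputs the k-dim feature f_N."""
    def __init__(self, k: int):
        super().__init__()
        backbone = models.resnet18()  # randomly initialized backbone
        backbone.fc = nn.Linear(backbone.fc.in_features, k)
        self.backbone = backbone

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) -> (batch, k)
        return self.backbone(image)

class SequenceBranch(nn.Module):
    """Sequence-type source (text/voice/position): a GRU whose last hidden
    state is projected to the k-dim feature f_N."""
    def __init__(self, in_dim: int, k: int):
        super().__init__()
        self.gru = nn.GRU(in_dim, k, batch_first=True)
        self.fc = nn.Linear(k, k)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, T, in_dim); h_n[-1] is the last hidden state -> (batch, k)
        _, h_n = self.gru(seq)
        return self.fc(h_n[-1])

class SourceProjection(nn.Module):
    """Step 2.1.2: one further fully connected layer plus ReLU per source,
    f_N' = relu(W f_N + b), as in formulas (1) and (2) below."""
    def __init__(self, k: int):
        super().__init__()
        self.fc = nn.Linear(k, k)

    def forward(self, f_n: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc(f_n))
```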
Step 2.1.2: respectively connecting a full-connection layer and an activation function on the basis of the high-level feature representation F extracted in the step 2.1.1, performing one-step nonlinear processing on the obtained high-level feature representation, wherein the activation function is a relu function, and formulas of the processing process and the relu function are respectively shown in formulas (1) and (2), so as to obtain a feature representation vector F' { (F) of each multi-source detection data1′,f2′,f3′,...fN′),fN′∈dK};
$f_N' = \mathrm{relu}(\mathrm{FC}(f_N)) = \mathrm{relu}(W_{k \times k} \cdot f_N + b)$ (1)
$\mathrm{relu}(x) = \max(0, x)$ (2)
wherein FC represents the fully connected layer, and $W_{k \times k}$ and b are a matrix and a vector having dimensions (k × k) and (k, 1), respectively.
Step 2.2: and constructing a multi-source attention fusion module, and fusing the feature representation vectors of the multi-source detection data according to the similarity and the importance degree of each other to obtain the fusion feature representation of the multi-source detection data.
And the multi-source attention fusion module takes the feature representation vector F' of each multi-source detection data obtained in the step 2.1.2 as input and takes the multi-source fusion feature as output.
The multi-source attention fusion module corrects the importance degree among the multi-source information feature representations to achieve the purposes of enhancing important features and weakening non-important features and improve the representation capability of the fused features.
The importance degree of each data source is closely related to the similarity among the data sources, and the method corrects the weight of each data source according to this similarity. To compare the similarity of the data sources, three comparison variables are defined for each data source on the basis of its original feature representation vector: a query variable, a key variable and a value variable. The query and key variables are used to compare the similarity of the data sources: the similarity measure between data sources is obtained from their query and key variables, and the obtained similarity measure is used to correct the value variables of the data sources, thereby obtaining the corrected feature representation of each data source. Fig. 2 shows a schematic diagram of the multi-source attention fusion module in the target identification method based on multi-source data fusion provided by an embodiment of the present invention, taking feature fusion of two data sources as an example. As shown in Fig. 2, the specific implementation is as follows:
Step 2.2.1: predefine three trainable matrix variables $W_q, W_k, W_v \in d^{k \times k}$; the three matrix variables are initialized randomly, and their values are updated along with the training of the network;
Step 2.2.2: multiply the matrix variables predefined in step 2.2.1 with the feature representation vector F' respectively to obtain three sub-variable matrices, the query matrix $Q \in d^{k \times N}$, the key matrix $K \in d^{k \times N}$ and the value matrix $V \in d^{k \times N}$, as shown below:
$Q = (q_1, q_2, \ldots, q_N) = W_q \times F' = W_q \times (f_1', f_2', f_3', \ldots, f_N')$ (3)
$K = (k_1, k_2, \ldots, k_N) = W_k \times F' = W_k \times (f_1', f_2', f_3', \ldots, f_N')$ (4)
$V = (v_1, v_2, \ldots, v_N) = W_v \times F' = W_v \times (f_1', f_2', f_3', \ldots, f_N')$ (5)
For the feature representation $f_N$ of each data source, the corresponding query vector $q_N$, key vector $k_N$ and value vector $v_N$ are thus obtained through formulas (3), (4) and (5).
Step 2.2.3: to obtain the similarity and importance degree between the input data sources, multiply the query vector $q_N$ of each data source with the key vectors $k_N$ of the respective data sources to obtain a similarity measure A between any two data sources. On the basis of A, the similarity measurement values between the data sources are mapped into the range 0 to 1 by a softmax function, with the sum of all measurement values equal to 1, which gives the attention matrix $\hat{A}$ between any two data sources, as shown in formula (6):
$\hat{A} = \mathrm{softmax}\left(\frac{A}{\sqrt{k}}\right) = \mathrm{softmax}\left(\frac{Q^{T}K}{\sqrt{k}}\right)$ (6)
where the division by $\sqrt{k}$ provides normalization for the convenience of calculation.
Step 2.2.4: take the value vector $v_N$ as the feature representation of each data source and the magnitude of the values of the attention matrix $\hat{A}$ as the measure of the degree of importance between the data sources. Multiply the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each data source, i.e. correct the feature representation of each data source using the similarity measure between the data sources, to obtain the final corrected feature representation $B \in d^{k \times N}$ of each data source, as shown in formula (7):
$B = V\hat{A}$ (7)
Step 2.2.5: add the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$, as shown in formula (8):
$f_{fuse} = \sum_{n=1}^{N} B_{:,n}$ (8)
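To make steps 2.2.1 to 2.2.5 concrete, the following sketch implements the multi-source attention fusion in PyTorch for an input F' arranged as a k × N matrix (an optional leading batch dimension is also accepted); the class name and the softmax axis are assumptions chosen to be consistent with formulas (3) to (8) above.

```python
import math
import torch
import torch.nn as nn

class MultiSourceAttentionFusion(nn.Module):
    """Fuses the N per-source feature vectors F' (a k x N matrix) into a single
    k-dim fused representation f_fuse, following steps 2.2.1-2.2.5."""
    def __init__(self, k: int):
        super().__init__()
        # Step 2.2.1: trainable W_q, W_k, W_v in R^{k x k}, randomly initialized
        self.w_q = nn.Parameter(torch.randn(k, k) / math.sqrt(k))
        self.w_k = nn.Parameter(torch.randn(k, k) / math.sqrt(k))
        self.w_v = nn.Parameter(torch.randn(k, k) / math.sqrt(k))
        self.k = k

    def forward(self, f_prime: torch.Tensor) -> torch.Tensor:
        # f_prime: (..., k, N), columns are the per-source features f_1', ..., f_N'
        q = self.w_q @ f_prime                                  # (..., k, N), formula (3)
        key = self.w_k @ f_prime                                # (..., k, N), formula (4)
        v = self.w_v @ f_prime                                  # (..., k, N), formula (5)
        # Step 2.2.3: pairwise similarity A and normalized attention matrix A_hat
        a = q.transpose(-2, -1) @ key                           # (..., N, N)
        a_hat = torch.softmax(a / math.sqrt(self.k), dim=-1)    # formula (6)
        # Step 2.2.4: correct each source's value vectors with the attention weights
        b = v @ a_hat                                           # (..., k, N), formula (7)
        # Step 2.2.5: sum the corrected columns into the fused feature
        return b.sum(dim=-1)                                    # (..., k),   formula (8)
```

For example, with two data sources (N = 2) and k = 256, the input has shape (256, 2) and the module returns a 256-dimensional fused vector f_fuse.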
and step 3: collecting multi-source detection data of a plurality of different targets according to the step 1, preprocessing the multi-source detection data, and carrying out identity category labeling on the multi-source detection data one by one according to the different targets; all the multi-source detection data sets labeled by the identity categories form a training sample set D to train the target recognition deep neural network model;
step 31: carrying out data cleaning and preprocessing, wherein the data cleaning comprises denoising, missing value supplement and abnormal value elimination on each multi-source detection data, and the preprocessing comprises image correction, image enhancement, data slicing and data standardization on the multi-source data;
step 32: constructing a multi-source fusion recognition training sample set D
$D = \{(x_i^1, x_i^2, \ldots, x_i^m, y_i),\; i \in (0, n)\}$
where $x_i^m$ denotes the detection data of the m-th data source for the target $y_i$, m denotes the number of types of data sources, $y_i$ denotes the identity category label of the target corresponding to the m data sources, i indexes the categories of the data set, and n denotes the total number of categories in the data set;
Step 33: on the basis of the multi-source fusion feature representation $f_{fuse} \in d^k$ extracted in step 2, a cross entropy loss function is used as a constraint, and the target recognition deep neural network model is trained on the multi-source fusion recognition training data set constructed in step 32;
the cross entropy loss function takes $y_i$ in the training sample set D as the true value and the output of the whole network as the predicted value, as shown below:
$L = -\sum_{i} q(y_i)\log p(f_{fuse})$
where $p(f_{fuse})$ represents the output of the entire network, i.e. the probability distribution of the prediction for each input data source, and $q(y_i)$ represents the probability distribution of the real label $y_i$;
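A minimal PyTorch training sketch consistent with the cross entropy constraint above and with the settings described in the following paragraph (90%/10% split, stochastic gradient descent, one of the listed batch sizes, 100 epochs, learning rate 1e-4); the function name, the dataset interface and the use of nn.CrossEntropyLoss over class logits are assumptions for illustration, not the patent's prescribed implementation.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split

def train_model(model: nn.Module, dataset, epochs: int = 100,
                batch_size: int = 16, lr: float = 1e-4, device: str = "cuda"):
    """End-to-end training of the target recognition deep neural network with
    stochastic gradient descent and a cross-entropy constraint on its output."""
    n_train = int(0.9 * len(dataset))                       # 90% training split
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()                       # cross entropy between p(f_fuse) and q(y_i)
    for _ in range(epochs):
        for sources, labels in loader:
            # sources: the batched multi-source inputs; labels: identity categories y_i
            sources = [s.to(device) for s in sources]
            labels = labels.to(device)
            logits = model(sources)                         # class scores derived from f_fuse
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model, test_set
```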
the training and learning of the target recognition deep neural network model is to perform end-to-end training of the target recognition deep neural network model on a computer configured with a GPU; randomly selecting 90% of data in the training sample set D as a training set, and the rest as a test set. In the training process, randomly read multi-source data are input into a target recognition deep neural network model, the whole network is trained by adopting a random gradient descent method, the size of the data batch read in the training process can be selected to be 2, 4, 8, 16, 32 and 64 according to the calculation capacity of a GPU, the whole data set is trained and iterated for 100 cycles, and the learning rate is set to be 1 e-4.
And 4, step 4: and preprocessing the multi-source detection data, inputting the preprocessed multi-source detection data into a trained target recognition deep neural network model, and outputting probability distribution of a target recognition result of the multi-source detection data.
The trained network model can be used for real-time fusion recognition of multi-source detection data: the detected multi-source information is preprocessed and input into the trained network, and the probability distribution of its recognition result is output.
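For completeness, a sketch of how the pieces described above might be composed and applied in real time; the TargetRecognitionNet wrapper and its final classification layer are illustrative assumptions, since the patent only specifies that the model outputs a probability distribution over target identity categories.

```python
import torch
from torch import nn

class TargetRecognitionNet(nn.Module):
    """Per-source branches + projections + multi-source attention fusion,
    followed by a classifier that yields the identity probability distribution."""
    def __init__(self, branches, projections, fusion, k: int, num_classes: int):
        super().__init__()
        self.branches = nn.ModuleList(branches)        # one backbone per data source
        self.projections = nn.ModuleList(projections)  # step 2.1.2 heads
        self.fusion = fusion                           # e.g. MultiSourceAttentionFusion(k)
        self.classifier = nn.Linear(k, num_classes)

    def forward(self, sources):
        # sources: list of per-source input tensors, each with a leading batch dimension
        feats = [proj(branch(x)) for branch, proj, x in
                 zip(self.branches, self.projections, sources)]   # each (B, k)
        f_prime = torch.stack(feats, dim=-1)           # (B, k, N)
        f_fuse = self.fusion(f_prime)                  # (B, k)
        return self.classifier(f_fuse)                 # (B, num_classes) class scores

def recognize(model: TargetRecognitionNet, sources):
    """Step 4: preprocessed multi-source data in, probability distribution out.
    Assumes model and inputs already reside on the same device."""
    model.eval()
    with torch.no_grad():
        return torch.softmax(model(sources), dim=-1)
```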
Example 2
Further, as an implementation of the method shown in the above embodiment, another embodiment of the present invention further provides an object recognition apparatus. The embodiment of the apparatus corresponds to the embodiment of the method, and for convenience of reading, details in the embodiment of the apparatus are not repeated one by one, but it should be clear that the apparatus in the embodiment can correspondingly implement all the contents in the embodiment of the method. In the apparatus of this embodiment, there are the following modules:
1. A data acquisition module: configured to collect multi-source detection data of different sensors on the same target. This module corresponds to step 1 in Example 1.
2. A model construction module: configured to construct a target recognition deep neural network model comprising a feature extraction network and a multi-source attention fusion submodule; the feature extraction network is used for extracting the feature representation vector F' of each multi-source detection data; the multi-source attention fusion submodule takes the feature representation vector of each multi-source detection data as input and outputs multi-source fusion features. This module corresponds to step 2 in Example 1.
The feature extraction network specifically comprises: extracting image class features from the multi-source heterogeneous data whose semantic classes are marked as image classes by using a convolutional neural network, and extracting sequence class features from the multi-source heterogeneous data whose semantic classes are marked as sequence classes by using a recurrent neural network.
The multi-source attention fusion submodule specifically comprises:
A vector generation unit: generates for each multi-source detection data a query vector $q_N$, a key vector $k_N$ and a value vector $v_N$ based on the feature representation vector F' of that multi-source detection data.
An attention matrix unit: multiplies the query vector $q_N$ of each multi-source detection data with the key vector $k_N$ corresponding to each multi-source detection data to obtain a similarity measure A between any two multi-source detection data, and carries out normalization calculation on the similarity measure A to obtain the attention matrix $\hat{A}$ between any two multi-source detection data; the values of the attention matrix $\hat{A}$ are a measure of the degree of importance between the multi-source detection data.
A correction unit: multiplies the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each multi-source detection data to obtain the corrected feature representation $B \in d^{k \times N}$ of each multi-source detection data:
$B = V\hat{A}$
A fusion unit: adds the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$:
$f_{fuse} = \sum_{n=1}^{N} B_{:,n}$
3. A model training module: configured to collect the multi-source detection data of a plurality of different targets, preprocess the multi-source detection data, and label the identity category of the multi-source detection data one by one according to the different targets; all the multi-source detection data labeled with identity categories form a training sample set D to train the target recognition deep neural network model. This module corresponds to step 3 in Example 1.
4. A model application module: configured to preprocess the multi-source detection data, input it into the trained target recognition deep neural network model, and output the probability distribution of the target recognition result. This module corresponds to step 4 in Example 1.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

Claims (10)

1. A method of object recognition, comprising the steps of:
step 1: collecting multi-source detection data of different sensors on the same target;
step 2: constructing a target recognition deep neural network model, wherein the model comprises a feature extraction network and multi-source attention fusion; the feature extraction network is used for extracting a feature representation vector F' of each multi-source detection data; the multi-source attention fusion takes the feature representation vector of each multi-source detection data as input and outputs multi-source fusion features;
step 3: collecting the multi-source detection data of a plurality of different targets according to step 1, preprocessing the multi-source detection data, and carrying out identity class labeling on the multi-source detection data one by one according to the different targets; all the multi-source detection data labeled with identity categories form a training sample set D to train the target recognition deep neural network model;
step 4: preprocessing the multi-source detection data, inputting the preprocessed multi-source detection data into the trained target recognition deep neural network model, and outputting the probability distribution of the target recognition result.
2. The object recognition method according to claim 1, wherein the feature extraction network is specifically:
extracting image class features from the multi-source heterogeneous data whose semantic class is marked as image class by using a convolutional neural network, and extracting sequence class features from the multi-source heterogeneous data whose semantic class is marked as sequence class by using a recurrent neural network.
3. The target identification method of claim 2, wherein the multi-source attention fusion comprises the following specific steps:
the first step: generating, based on the feature representation vector F' of each multi-source detection data, a query vector $q_N$, a key vector $k_N$ and a value vector $v_N$ of each multi-source detection data;
the second step: multiplying the query vector $q_N$ of each of the multi-source detection data with the key vector $k_N$ corresponding to each of the multi-source detection data to obtain a similarity measure A between any two multi-source detection data, and carrying out normalization calculation on the similarity measure A to obtain the attention matrix $\hat{A}$ between any two multi-source detection data, the values of the attention matrix $\hat{A}$ being a measure of the degree of importance between the multi-source detection data;
the third step: multiplying the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each multi-source detection data to obtain the corrected feature representation $B \in d^{k \times N}$ of each multi-source detection data:
$B = V\hat{A}$
the fourth step: adding the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$:
$f_{fuse} = \sum_{n=1}^{N} B_{:,n}$
4. The target recognition method of claim 3, wherein the training of the target recognition deep neural network model specifically comprises:
on the basis of the fused feature representation $f_{fuse} \in d^k$, using a cross entropy loss function as a constraint and using the training sample set D to train the target recognition deep neural network model;
the cross entropy loss function takes $y_i$ in the training sample set D as the true value and the output of the whole network as the predicted value:
$L = -\sum_{i} q(y_i)\log p(f_{fuse})$
where $p(f_{fuse})$ represents the output of the target recognition deep neural network model, i.e. the probability distribution of the target recognition result for each input multi-source detection data, and $q(y_i)$ represents the probability distribution of the real target identity category label $y_i$.
5. The object recognition method of claim 1, wherein the multi-source detection data types include image, text, speech, and location level data.
6. An object recognition apparatus, comprising:
a data acquisition module: configured to collect multi-source detection data of different sensors on the same target;
a model construction module: configured to construct a target recognition deep neural network model comprising a feature extraction network and a multi-source attention fusion submodule; the feature extraction network is used for extracting a feature representation vector F' of each multi-source detection data; the multi-source attention fusion submodule takes the feature representation vector of each multi-source detection data as input and outputs multi-source fusion features;
a model training module: configured to collect the multi-source detection data of a plurality of different targets, preprocess the multi-source detection data, and label the identity category of the multi-source detection data one by one according to the different targets; all the multi-source detection data labeled with identity categories form a training sample set D to train the target recognition deep neural network model;
a model application module: configured to preprocess the multi-source detection data, input the preprocessed multi-source detection data into the trained target recognition deep neural network model, and output the probability distribution of the target recognition result.
7. The object recognition device of claim 6, wherein the feature extraction network is specifically:
extracting image class features from the multi-source heterogeneous data whose semantic class is marked as image class by using a convolutional neural network, and extracting sequence class features from the multi-source heterogeneous data whose semantic class is marked as sequence class by using a recurrent neural network.
8. The object recognition device of claim 7, wherein the multi-source attention fusion submodule is specifically:
a vector generation unit: generating, based on the feature representation vector F' of each multi-source detection data, a query vector $q_N$, a key vector $k_N$ and a value vector $v_N$ of each multi-source detection data;
an attention matrix unit: multiplying the query vector $q_N$ of each of the multi-source detection data with the key vector $k_N$ corresponding to each of the multi-source detection data to obtain a similarity measure A between any two multi-source detection data, and carrying out normalization calculation on the similarity measure A to obtain the attention matrix $\hat{A}$ between any two multi-source detection data, the values of the attention matrix $\hat{A}$ being a measure of the degree of importance between the multi-source detection data;
a correction unit: multiplying the attention matrix $\hat{A}$ with the value vector $v_N$ corresponding to each multi-source detection data to obtain the corrected feature representation $B \in d^{k \times N}$ of each multi-source detection data:
$B = V\hat{A}$
a fusion unit: adding the corrected feature representation B column-wise to obtain the final fused feature representation $f_{fuse} \in d^k$:
$f_{fuse} = \sum_{n=1}^{N} B_{:,n}$
9. The object recognition device of claim 8, wherein the training of the target recognition deep neural network model is specifically:
on the basis of the fused feature representation $f_{fuse} \in d^k$, using a cross entropy loss function as a constraint and using the training sample set D to train the target recognition deep neural network model;
the cross entropy loss function takes $y_i$ in the training sample set D as the true value and the output of the whole network as the predicted value:
$L = -\sum_{i} q(y_i)\log p(f_{fuse})$
where $p(f_{fuse})$ represents the output of the deep neural network model, i.e. the probability distribution of the target recognition result for each input multi-source detection data, and $q(y_i)$ represents the probability distribution of the real target identity category label $y_i$.
10. The object recognition device of claim 6, wherein the multi-source detection data types include image, text, voice, and location level data.
CN202110645394.6A 2021-06-09 2021-06-09 Target identification method and device Active CN113221852B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110645394.6A CN113221852B (en) 2021-06-09 2021-06-09 Target identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110645394.6A CN113221852B (en) 2021-06-09 2021-06-09 Target identification method and device

Publications (2)

Publication Number Publication Date
CN113221852A true CN113221852A (en) 2021-08-06
CN113221852B CN113221852B (en) 2021-12-31

Family

ID=77083507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110645394.6A Active CN113221852B (en) 2021-06-09 2021-06-09 Target identification method and device

Country Status (1)

Country Link
CN (1) CN113221852B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778132A (en) * 2021-09-26 2021-12-10 大连海事大学 Integrated parallel control platform for sea-air collaborative heterogeneous unmanned system
CN115496976A (en) * 2022-08-29 2022-12-20 锋睿领创(珠海)科技有限公司 Visual processing method, device, equipment and medium for multi-source heterogeneous data fusion
CN115496975A (en) * 2022-08-29 2022-12-20 锋睿领创(珠海)科技有限公司 Auxiliary weighted data fusion method, device, equipment and storage medium
CN116956212A (en) * 2023-06-27 2023-10-27 四川九洲视讯科技有限责任公司 Multi-source visual information feature recognition and extraction method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102017206123A1 (en) * 2017-04-10 2018-10-11 Robert Bosch Gmbh Method and device for merging data from various sensors of a vehicle as part of an object recognition
CN110110765A (en) * 2019-04-23 2019-08-09 四川九洲电器集团有限责任公司 A kind of multisource data fusion target identification method based on deep learning
CN110873879A (en) * 2018-08-30 2020-03-10 沈阳航空航天大学 Device and method for deep fusion of characteristics of multi-source heterogeneous sensor
CN111860351A (en) * 2020-07-23 2020-10-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network
CN112434745A (en) * 2020-11-27 2021-03-02 西安电子科技大学 Occlusion target detection and identification method based on multi-source cognitive fusion
CN112465880A (en) * 2020-11-26 2021-03-09 西安电子科技大学 Target detection method based on multi-source heterogeneous data cognitive fusion

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102017206123A1 (en) * 2017-04-10 2018-10-11 Robert Bosch Gmbh Method and device for merging data from various sensors of a vehicle as part of an object recognition
CN110873879A (en) * 2018-08-30 2020-03-10 沈阳航空航天大学 Device and method for deep fusion of characteristics of multi-source heterogeneous sensor
CN110110765A (en) * 2019-04-23 2019-08-09 四川九洲电器集团有限责任公司 A kind of multisource data fusion target identification method based on deep learning
CN111860351A (en) * 2020-07-23 2020-10-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network
CN112465880A (en) * 2020-11-26 2021-03-09 西安电子科技大学 Target detection method based on multi-source heterogeneous data cognitive fusion
CN112434745A (en) * 2020-11-27 2021-03-02 西安电子科技大学 Occlusion target detection and identification method based on multi-source cognitive fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FELIX NOBIS et al.: "A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection", 2019 Sensor Data Fusion: Trends, Solutions, Applications *
李朝 et al.: "Attention-based millimeter wave-lidar fusion for object detection" (基于注意力的毫米波-激光雷达融合目标检测), 《计算机应用》 (Journal of Computer Applications) *
谭志明 et al.: "Healthcare Big Data and Artificial Intelligence" (《健康医疗大数据与人工智能》), 31 March 2019, 华南理工大学出版社 (South China University of Technology Press) *
阮敬 et al.: "Fundamentals of Python Data Analysis" (《python数据分析基础》), 31 August 2018, 中国统计出版社 (China Statistics Press) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778132A (en) * 2021-09-26 2021-12-10 大连海事大学 Integrated parallel control platform for sea-air collaborative heterogeneous unmanned system
CN113778132B (en) * 2021-09-26 2023-12-12 大连海事大学 Integrated parallel control platform for sea-air collaborative heterogeneous unmanned system
CN115496976A (en) * 2022-08-29 2022-12-20 锋睿领创(珠海)科技有限公司 Visual processing method, device, equipment and medium for multi-source heterogeneous data fusion
CN115496975A (en) * 2022-08-29 2022-12-20 锋睿领创(珠海)科技有限公司 Auxiliary weighted data fusion method, device, equipment and storage medium
CN115496976B (en) * 2022-08-29 2023-08-11 锋睿领创(珠海)科技有限公司 Visual processing method, device, equipment and medium for multi-source heterogeneous data fusion
CN115496975B (en) * 2022-08-29 2023-08-18 锋睿领创(珠海)科技有限公司 Auxiliary weighted data fusion method, device, equipment and storage medium
CN116956212A (en) * 2023-06-27 2023-10-27 四川九洲视讯科技有限责任公司 Multi-source visual information feature recognition and extraction method

Also Published As

Publication number Publication date
CN113221852B (en) 2021-12-31

Similar Documents

Publication Publication Date Title
CN113221852B (en) Target identification method and device
Lin et al. Transfer learning based traffic sign recognition using inception-v3 model
CN110334705B (en) Language identification method of scene text image combining global and local information
CN108960073B (en) Cross-modal image mode identification method for biomedical literature
CN109711464B (en) Image description method constructed based on hierarchical feature relationship diagram
CN110046671A (en) A kind of file classification method based on capsule network
Al-Haija et al. Multi-class weather classification using ResNet-18 CNN for autonomous IoT and CPS applications
CN110175248B (en) Face image retrieval method and device based on deep learning and Hash coding
CN115830471B (en) Multi-scale feature fusion and alignment domain self-adaptive cloud detection method
CN111461174A (en) Multi-mode label recommendation model construction method and device based on multi-level attention mechanism
CN109785409B (en) Image-text data fusion method and system based on attention mechanism
CN111597340A (en) Text classification method and device and readable storage medium
CN114863091A (en) Target detection training method based on pseudo label
Lin et al. Land cover classification of RADARSAT-2 SAR data using convolutional neural network
CN114626476A (en) Bird fine-grained image recognition method and device based on Transformer and component feature fusion
CN115731411A (en) Small sample image classification method based on prototype generation
CN112395953A (en) Road surface foreign matter detection system
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN111832580A (en) SAR target identification method combining few-sample learning and target attribute features
CN114898472A (en) Signature identification method and system based on twin vision Transformer network
CN111209886B (en) Rapid pedestrian re-identification method based on deep neural network
DR RECOGNITION OF SIGN LANGUAGE USING DEEP NEURAL NETWORK.
CN116310596A (en) Domain adaptation-based small sample target detection method for electric power instrument
CN117011219A (en) Method, apparatus, device, storage medium and program product for detecting quality of article
CN112765989B (en) Variable-length text semantic recognition method based on representation classification network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant