CN113157678B - Multi-source heterogeneous data association method - Google Patents

Publication number
CN113157678B
Authority
CN
China
Prior art keywords
data
source
source heterogeneous
heterogeneous data
association
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110421743.6A
Other languages
Chinese (zh)
Other versions
CN113157678A (en)
Inventor
吕亚飞
张筱晗
石敏
江志浩
王雅芬
黄猛
涂卫红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unit 91977 Of Pla
Original Assignee
Unit 91977 Of Pla
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unit 91977 Of Pla
Priority to CN202110421743.6A
Publication of CN113157678A
Application granted
Publication of CN113157678B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-source heterogeneous data association method in the technical field of data fusion. It mainly addresses the heterogeneous gap that exists in the prior art between information of different modalities in multi-source heterogeneous data. The method comprises constructing a multi-source heterogeneous data association model, applying nonlinear mapping to the multi-source heterogeneous data with a deep neural network, and establishing the association relationship between information of different modalities. The method fuses the common features and the unique features of all data sources and makes full use of the complementary information contained in the unique features of each data source, which helps improve association judgment among multi-source heterogeneous data.

Description

Multi-source heterogeneous data association method
Technical Field
The invention relates to the technical field of data fusion, in particular to a multi-source heterogeneous data association method.
Background
In recent years, with the rapid development of data detection platforms, the types and number of sensors have grown continuously, and the accumulated detection data has reached big-data scale. Take two types of information, image and position, as an example. Image data generally features a wide detection range, a long revisit period, high positioning accuracy, and distinct visual features, and can be used for early, wide-area warning and for terminal identity recognition in the early-warning detection process. Position data, such as radar and AIS data, features strong real-time performance and weak visual features, and can be used for target tracking, situation generation, and intent judgment in the early-warning detection process. If the shortcomings of the individual information sources can be overcome by associating and fusing the various kinds of data, the information becomes complementary, which improves target recognition accuracy and intent judgment. Therefore, the key to effectively fusing multi-source heterogeneous data, and in turn to fully mining and exploiting big data, is establishing the association relationship among the multi-source heterogeneous data.
The main difficulty in establishing this association relationship is the heterogeneous gap between information of different modalities in multi-source heterogeneous data: the feature representations and feature distributions of different modalities are strongly inconsistent, so existing measures such as the Euclidean distance and the Mahalanobis distance are difficult to apply directly for similarity measurement.
Disclosure of Invention
In view of this, the present invention provides a multi-source heterogeneous data association method, which mainly aims to solve the problem in the prior art that it is difficult to directly establish an association relationship because multi-source heterogeneous data is in different probability distribution spaces and feature spaces.
The technical scheme of the invention comprises the following steps: step 1: detecting the same target with multiple sensors to acquire multi-data-source detection data; step 2: preprocessing the multi-data-source detection data and setting semantic category labels to form multi-source heterogeneous data; step 3: constructing a multi-source heterogeneous data association model, applying nonlinear mapping to the multi-source heterogeneous data with a deep neural network, and establishing the association relationship between information of different modalities; and step 4: inputting the multi-source heterogeneous data into the multi-source heterogeneous data association model to obtain an association result.
As a further improvement of the invention, the multi-source heterogeneous data association model comprises an information fusion network and an association metric space. The information fusion network performs feature extraction on the multi-source heterogeneous data to obtain the feature representation vector of each data source and fuses these into a fused feature representation vector. The association metric space is used to migrate the fused feature representation vector to the feature extraction network of each kind of multi-source heterogeneous data, so as to strengthen the semantic association among the multi-source heterogeneous data.
As a further improvement of the present invention, the feature extraction of each kind of multi-source heterogeneous data specifically comprises: extracting image features with a convolutional neural network from the multi-source heterogeneous data whose semantic category is labeled as image, and extracting sequence features with a recurrent neural network from the multi-source heterogeneous data whose semantic category is labeled as sequence.
As a further improvement of the present invention, the fused feature representation vector is constructed as follows: the feature representation vector F of the multi-source heterogeneous data is the high-dimensional vector output by the last fully connected layer of the convolutional neural network and of the recurrent neural network:
Figure BDA0003027289960000021
where k represents the dimension of each feature vector; M represents the number of types of data sources; N represents the total number of samples in the data set; i indexes the samples; and d is an element of the multi-source heterogeneous data association learning training data set D. A fully connected layer is attached to each feature representation vector in F, and the outputs are then joined by feature concatenation to obtain a fused feature representation vector that fuses the common features and the unique features of each data source. The information fusion network is trained on the data set D with a cross-entropy loss function and a center loss function as objective functions, which makes the fused feature representation vector more accurate.
As a further improvement of the present invention, the association metric space is constructed as follows: features are extracted from the multi-source heterogeneous data to obtain its feature representation vectors:
Figure BDA0003027289960000022
where k represents the dimension of each feature vector; M represents the number of types of data sources; N represents the total number of samples in the data set; i indexes the samples; d is an element of the multi-source heterogeneous data association learning training data set D; and
Figure BDA0003027289960000023
represents the feature representation vector of data source M. An L2 constraint narrows the distance between each data source's feature representation vector and the fused feature representation vector. The association metric space is trained with the semantic category labels in the data set D as supervision information and a ranking loss function as the objective function, which makes the association result more accurate.
With the above technical scheme, the invention provides the following beneficial effects:
(1) Multi-source data of the same target is collected and manually associated to form training data for multi-source heterogeneous data association learning; two types of deep neural networks perform the nonlinear mapping and establish the association relationship between information of different modalities.
(2) A multi-source heterogeneous data information fusion network is constructed to extract the fused feature representation vector among the multi-source heterogeneous data. This vector fuses the common features and the unique features of all data sources, and the unique features carry complementary information among the data sources, which helps improve association judgment among multi-source heterogeneous data.
(3) By migrating the fused feature representation vector to the feature extraction neural network of each modality, the feature representation of each data source acquires both the fused features among the data sources and its own unique features. This improves the accuracy of each data source's feature representation, strengthens the correlation among the feature representations of the multi-source heterogeneous data to be associated, and improves association accuracy.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic flow chart of a multi-source heterogeneous data association method according to an embodiment of the present invention;
FIG. 2 shows a schematic diagram of the relationship of common features to complementary features between multi-source heterogeneous data.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The invention mainly solves the prior-art problem that multi-source heterogeneous data cannot be effectively associated because of the heterogeneous gap between information of different modalities.
The method constructs a multi-source heterogeneous data association model, applies nonlinear mapping to the multi-source heterogeneous data with a deep neural network, and establishes the association relationship between information of different modalities. It further fuses the common features and the unique features of the data sources and makes full use of their complementary information, which improves the accuracy of each data source's feature representation and strengthens the correlation among the feature representations of the multi-source heterogeneous data to be associated.
Fig. 1 is a schematic flow chart of a multi-source heterogeneous data association method provided in an embodiment of the present invention, and as shown in fig. 1, a technical scheme of the method according to this embodiment includes the following steps:
Step 1: Detect the same target with multiple sensors to acquire multi-data-source detection data.
Multi-source refers to data obtained from multiple heterogeneous sensors, including but not limited to satellites, drones, radar, and AIS; heterogeneous data is data with different structures, such as image data, text data, voice data, and position data.
Step 1.1: Collect the detection data of different sensors for the same scene or the same target; the data types include image, text, voice, and position data.
Step 2: Preprocess the multi-data-source detection data and label its semantic categories to form multi-source heterogeneous data.
Step 2.1: Clean and preprocess the data. Data cleaning comprises denoising, missing-value filling, and outlier elimination on the multi-source detection data collected in step 1.1; preprocessing comprises image correction, image enhancement, data slicing, and data standardization on the multi-source data.
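The cleaning and standardization operations in step 2.1 can be illustrated for a one-dimensional numeric channel, such as one coordinate of a position track. This is only a minimal sketch: the patent does not specify the concrete algorithms, so mean imputation, a z-score outlier threshold, and fixed-size windows are assumptions, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def fill_missing(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def drop_outliers(values, z_max=3.0):
    """Discard points whose z-score magnitude exceeds z_max (assumed criterion)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) / sigma <= z_max]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def slice_windows(values, size, step):
    """Cut a sequence into fixed-size windows, one possible reading of data slicing."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, step)]


# Hypothetical track with one missing reading.
track = [1.0, None, 3.0, 2.0, 2.0]
clean = standardize(drop_outliers(fill_missing(track)))
```

Image correction and image enhancement depend on the sensor and are not sketched here.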
Step 3: Construct a multi-source heterogeneous data association model, apply nonlinear mapping to the multi-source heterogeneous data with a deep neural network, and establish the association relationship between information of different modalities.
The multi-source heterogeneous data association model comprises an information fusion network and an association metric space. The information fusion network extracts features from the multi-source heterogeneous data to obtain the feature representation vector of each data source and fuses these into a fused feature representation vector; the association metric space is used to migrate the fused feature representation vector to the feature extraction network of each kind of multi-source heterogeneous data, so as to strengthen the semantic association among the multi-source heterogeneous data.
The fused features among the multi-source heterogeneous data comprise common features and unique features. The unique features of the data sources carry complementary information among the data sources, so the fused features help improve association judgment among the multi-source heterogeneous data.
The prior art attends only to the common features among multi-source heterogeneous data and establishes the association relationship through them; it ignores the information complementarity among the unique features during association, so association accuracy is low.
For example, FIG. 2 shows a schematic diagram of the relationship between the common features and the unique features of multi-source heterogeneous data. Taking the image and the text shown in FIG. 2 as an example, features are extracted separately from the image information and the text information to obtain their respective feature representations. The feature representations of the two modalities can be divided into common features and unique features; the prior art exploits only the semantic consistency of the common features and neglects the information complementarity between the unique features. For instance, the count "five" describing the number of airplanes in the text is important high-level semantic information that is difficult to extract from the image. If this count can be migrated to the deep network trained on images, as supervision information for image feature extraction, the accuracy of the image feature representation improves. Exploiting the complementarity among the unique features of heterogeneous data therefore improves the feature representation of each modality and, in turn, the accuracy of the association relationship among multi-source heterogeneous data.
The steps of constructing the multi-source heterogeneous data association model are as follows:
Step 3.1: Construct the learning training data set of the multi-source heterogeneous data association model.
Label the semantic categories of the multi-source detection data of the same target to form the multi-source heterogeneous data association learning training data set $D = \{(x_i^1, x_i^2, \ldots, x_i^m, y_i) \mid i \in (0, N)\}$, where $x_i^m$ represents the detection data of the m-th data source for the target, m represents the number of data sources, $y_i$ represents the semantic category label of the target corresponding to the m data sources, N represents the total number of samples in the data set, and i indexes the samples.
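The structure of the training set D can be sketched as a list of records, each holding one observation per data source plus a shared semantic category label. The class and function names below are hypothetical, and the string observations merely stand in for real image, text, or track data.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    observations: Tuple[Any, ...]  # (x_i^1, ..., x_i^m), one entry per data source
    label: int                     # y_i, the shared semantic category label


def build_dataset(per_source: Sequence[Sequence[Any]], labels: Sequence[int]) -> List[Sample]:
    """Zip m aligned source lists and a label list into the data set D."""
    n = len(labels)
    if any(len(source) != n for source in per_source):
        raise ValueError("every data source needs exactly one observation per target")
    return [Sample(tuple(source[i] for source in per_source), labels[i]) for i in range(n)]


# Two sources (image, text) observing two targets; values are placeholders.
D = build_dataset([["img0", "img1"], ["txt0", "txt1"]], [3, 5])
```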
Step 3.2: Construct the multi-source heterogeneous data information fusion network to extract the fused feature representation vector among the multi-source heterogeneous data.
The multi-source heterogeneous data information fusion network provided by this embodiment comprises three parts: feature extraction for each data source, the multi-source feature fusion network, and objective function construction.
Specifically:
Step 3.2.1: Feature extraction for each data source. Two broad classes of deep neural networks are used: a convolutional neural network, whose representation capability suits image data, extracts the feature representation vectors of image data; a recurrent neural network, whose representation capability suits sequence data such as text, voice, and position, extracts the feature representation vectors of sequence data. The feature representation vector F of the multi-source heterogeneous data is the high-dimensional vector output by the last fully connected layer of the convolutional neural network and of the recurrent neural network:
Figure BDA0003027289960000061
wherein k represents the dimension of each feature vector; m represents the number of categories of data sources.
The convolutional neural networks usable in this embodiment include VGG, ResNet, SENet, ShuffleNet, and GoogLeNet; any one of them can be selected in practical applications.
The recurrent neural networks usable in this embodiment include RNN and GRU; any of them can be selected in practical applications.
Step 3.2.2: Multi-source feature fusion network. A fully connected layer is attached to each feature representation vector in F, and the outputs are then joined by feature concatenation to obtain the fused feature representation vector of the data sources.
Specifically, attaching a fully connected layer to each vector means applying one step of nonlinear processing to the obtained feature representation vectors, with the ReLU function as the activation function. The processing and the ReLU function are given in formulas (1) and (2):
Figure BDA0003027289960000062
Figure BDA0003027289960000063
where FC denotes the fully connected layer, and
Figure BDA0003027289960000064
and $b_i$ are a matrix and a vector of dimensions (k × k) and (M × k, 1), respectively.
Specifically, feature connection means concatenating the feature representation vectors of the data sources end to end to obtain the fused feature representation vector, as shown in formula (3):
Figure BDA0003027289960000065
Conc denotes the end-to-end concatenation function, and the resulting fused feature representation vector is
Figure BDA0003027289960000066
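The fusion in step 3.2.2 (one fully connected layer with ReLU per data source, followed by end-to-end concatenation) can be sketched in plain Python. Formulas (1) to (3) are images that are not reproduced in this text, so the code assumes the standard forms relu(x) = max(0, x) and h = relu(W f + b); the per-source bias of length k is likewise an assumption, and the function names are hypothetical.

```python
def relu(vector):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in vector]


def fully_connected(weight, bias, feature):
    """Compute relu(W f + b) for one data source; weight is a k x k list of rows."""
    pre_activation = [
        sum(w * x for w, x in zip(row, feature)) + b
        for row, b in zip(weight, bias)
    ]
    return relu(pre_activation)


def fuse(features, weights, biases):
    """Concatenate the per-source FC outputs end to end into one fused vector."""
    fused = []
    for feature, weight, bias in zip(features, weights, biases):
        fused.extend(fully_connected(weight, bias, feature))
    return fused


# Two sources with k = 2; identity weights keep the arithmetic easy to follow.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
f_image, f_text = [1.0, -2.0], [0.5, 3.0]
fused = fuse([f_image, f_text], [identity, identity], [zero_bias, zero_bias])
```

With M sources of dimension k, the fused vector has M × k entries, since concatenation preserves every per-source output rather than averaging them.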
Step 3.2.3: Objective function construction.
The objective function places a target constraint on the fused feature representation vector so that the fused feature representation vector of the multi-source heterogeneous data accurately fuses the common features and the unique features of each data source. It mainly comprises a cross-entropy loss function and a center loss function.
Specifically, the cross-entropy loss function takes $y_i$ in the data set D as the true value and the output of the whole network as the predicted value, as shown in formula (4):
Figure BDA0003027289960000067
Figure BDA0003027289960000071
represents the output of the information fusion network, i.e. the predicted probability distribution for each input data source, and $q(y_i)$ represents the probability distribution of the true label $y_i$.
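Formula (4) is an image that is not reproduced in this text. As an illustration only, the sketch below assumes the standard cross-entropy between a softmax prediction and a one-hot true label; the logits and function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """-sum_c q(c) log p(c) with one-hot q, which reduces to -log p(label)."""
    return -math.log(softmax(logits)[label])


# A confident correct prediction costs less than a confident wrong one.
loss_good = cross_entropy([10.0, 0.0], label=0)
loss_bad = cross_entropy([0.0, 10.0], label=0)
```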
Specifically, the center loss function is shown in formula (5):
Figure BDA0003027289960000072
where
Figure BDA0003027289960000073
represents the fused feature representation vector of the j-th group of multi-source data,
Figure BDA0003027289960000074
represents the mean of the fused feature representation vectors whose identity class label is $y_j$, and M represents the total number of source data items in the data set whose identity class label is $y_j$.
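Formula (5) is also an image that is not reproduced here. The sketch below assumes the commonly used center loss, half the summed squared distance between each fused vector and the mean vector of its class; the factor 1/2, the absence of further normalization, and the function names are assumptions.

```python
from collections import defaultdict


def class_centers(vectors, labels):
    """Mean fused vector per class label."""
    groups = defaultdict(list)
    for vector, label in zip(vectors, labels):
        groups[label].append(vector)
    return {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in groups.items()
    }


def center_loss(vectors, labels):
    """0.5 * sum_j ||f_j - c_{y_j}||^2 (assumed form)."""
    centers = class_centers(vectors, labels)
    total = 0.0
    for vector, label in zip(vectors, labels):
        total += sum((a - b) ** 2 for a, b in zip(vector, centers[label]))
    return 0.5 * total


# Class 0 has center [1, 0]; class 1 has a single member, which is its own center.
vectors = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
labels = [0, 0, 1]
loss = center_loss(vectors, labels)
```

The loss is small when vectors of the same class cluster tightly, which complements the cross-entropy term that separates different classes.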
Step 3.3: Migrate the fused feature representation vector among the multi-source heterogeneous data to the feature extraction network of each data source. This comprises three parts: feature extraction for each kind of multi-source heterogeneous data, migration of the fused feature representation vector, and objective function construction.
Step 3.3.1: Feature extraction for the multi-source heterogeneous data. This step uses the same feature extraction networks as step 3.2.1 to obtain the feature representation vectors of the multi-source heterogeneous data:
Figure BDA0003027289960000075
wherein k represents the dimension of each feature vector; m represents the number of categories of data sources.
Step 3.3.2: Migration of the fused feature representation vector. An L2 constraint pulls the feature representation vector of each data source closer to the fused feature representation vector, so that the fused information in the fused feature representation vector is migrated to each data source. The L2 constraint between the feature representation vector of data source M and the fused feature representation vector is shown in formula (6):
Figure BDA0003027289960000076
where
Figure BDA0003027289960000077
represents the fused feature representation vector of the j-th group of multi-source data;
Figure BDA0003027289960000078
represents the feature representation vector of data source M.
In this way the feature representation of each data source acquires both the fused features and the unique features among the data sources, which improves the accuracy of each data source's feature representation, strengthens the correlation among the feature representations of the multi-source heterogeneous data to be associated, and helps improve association accuracy.
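The L2 constraint of step 3.3.2 can be sketched as a squared Euclidean distance between each per-source feature vector and the fused vector. Formula (6) is an image that is not reproduced here, so the squared form and the summation over sources are assumptions. The sketch also assumes the two vectors already share a dimension (for example after a projection layer); the text does not state how the dimensions are matched.

```python
def l2_migration_loss(source_features, fused):
    """Sum over data sources of ||f_source - f_fused||^2 (assumed form)."""
    total = 0.0
    for feature in source_features:
        if len(feature) != len(fused):
            raise ValueError("source and fused vectors must share a dimension")
        total += sum((a - b) ** 2 for a, b in zip(feature, fused))
    return total


# Each source vector is at squared distance 1 from the fused vector [1, 1].
loss = l2_migration_loss([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```

During training, the fused vector would typically be treated as a fixed target so that gradients only move the per-source networks toward it; that choice is also an assumption.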
Step 3.3.3: With the semantic category labels $y_i$ in the data set D as supervision information and the ranking loss function as the objective function, complete the construction of the multi-source heterogeneous data association metric space.
Specifically, the ranking loss function is shown in formula (7):
Figure BDA0003027289960000079
where
Figure BDA0003027289960000081
represents the similarity score between data of data source 1 and data source 2 whose labels are consistent,
Figure BDA0003027289960000082
and
Figure BDA0003027289960000083
are the similarity scores between data of data source 1 and data source 2 whose labels are inconsistent, the constant α is a preset margin, and $[x]_+$ denotes max(x, 0). Under the ranking loss constraint, the similarity score of a label-consistent data pair in the association metric space must exceed the similarity score of a label-inconsistent data pair by at least the margin α.
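The margin constraint described above matches the usual bidirectional hinge ranking loss, which the following sketch illustrates with cosine similarity as the score. Formula (7) is an image that is not reproduced here, so the exact form, including the use of one negative per direction, is an assumption; the variable names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranking_loss(img, txt, txt_neg, img_neg, alpha=0.2):
    """[alpha - s(img, txt) + s(img, txt_neg)]+ + [alpha - s(img, txt) + s(img_neg, txt)]+."""
    s_pos = cosine(img, txt)          # label-consistent pair
    s_neg_txt = cosine(img, txt_neg)  # image paired with a mismatched text
    s_neg_img = cosine(img_neg, txt)  # text paired with a mismatched image

    def hinge(x):
        return max(x, 0.0)

    return hinge(alpha - s_pos + s_neg_txt) + hinge(alpha - s_pos + s_neg_img)


# Well-separated negatives incur no loss; a negative as similar as the positive costs alpha.
loss_separated = ranking_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
loss_confused = ranking_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```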
Step 3.4: Train the multi-source heterogeneous data association model for multiple rounds and determine the final multi-source heterogeneous data association model.
Specifically, this embodiment trains the model constructed above on a computer equipped with a GPU; 90% of the data in data set D is randomly selected as the training set and the rest serves as the test set.
During training, multi-source heterogeneous data with consistent semantic category labels is read at random. In one application scenario, the parameter α in the ranking loss constraint is set to 0.2 and the whole model is trained with the Adam optimizer. The batch size can be 2, 4, 8, 16, 32, or 64, chosen according to the computing capacity of the GPU. Training runs for 100 epochs over the whole data set with a learning rate of 2e-4, and the cosine distance serves as the similarity measure between different data sources.
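The training setup described above can be captured as a small configuration plus a random 90/10 split. The values come from this embodiment; the dictionary keys, the fixed seed, the chosen batch size of 32, and the helper name are assumptions for illustration.

```python
import random

# Hyperparameters stated in the embodiment; key names are hypothetical.
CONFIG = {
    "margin_alpha": 0.2,
    "optimizer": "Adam",
    "learning_rate": 2e-4,
    "epochs": 100,
    "batch_size": 32,        # any of 2, 4, 8, 16, 32, 64, depending on the GPU
    "similarity": "cosine",
}


def split_dataset(samples, train_fraction=0.9, seed=0):
    """Shuffle with a fixed seed (assumed, for reproducibility) and split."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


train_set, test_set = split_dataset(range(100))
```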
The trained multi-source heterogeneous data association model (model 1 in the tables) is tested on the multi-source data sets UCM_Captions and RSICD. For comparison, the model obtained by removing the information fusion network from the multi-source heterogeneous data association model (model 2 in the tables) is tested on the same data sets. The results of both models are shown in Tables 1 and 2 below.
Table 1 Experimental comparison results on the multi-source data set UCM_Captions
Figure BDA0003027289960000084
Table 2 Experimental comparison results on the multi-source data set RSICD
Figure BDA0003027289960000091
Here, in the "image -> text" task, an image is input into the model and the model returns the associated texts from a text library in descending order of association degree. "R@K" is the proportion of queries for which the first K returned texts contain the true value; the larger this metric, the better the algorithm performs. Med_r is the median rank at which the true value first appears in the returned sequence; the smaller this metric, the better the algorithm performs.
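The two metrics can be computed from the rank at which the first true match appears for each query. The following is a minimal sketch with hypothetical helper names and made-up ranks; it does not reproduce the figures in Tables 1 and 2.

```python
from statistics import median


def rank_of_truth(ranked_ids, truth_ids):
    """1-based position of the first true match in a returned sequence."""
    for position, item in enumerate(ranked_ids, start=1):
        if item in truth_ids:
            return position
    raise ValueError("no true match in the returned sequence")


def recall_at_k(ranks, k):
    """Fraction of queries whose first true match is within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks):
    """Median of the first-true-match ranks (Med_r)."""
    return median(ranks)


# Illustrative ranks for four queries.
ranks = [1, 3, 7, 12]
r_at_1 = recall_at_k(ranks, 1)
r_at_5 = recall_at_k(ranks, 5)
r_at_10 = recall_at_k(ranks, 10)
med_r = median_rank(ranks)
```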
As can be seen from tables 1 and 2, the method of the present embodiment significantly improves the correlation accuracy of the multi-source heterogeneous data on both multi-source data sets.
Step 4: Input the multi-source heterogeneous data into the multi-source heterogeneous data association model to obtain the association result.
When the trained multi-source heterogeneous data association model is used, the multi-source data to be judged for association is read and passed into the association metric space for feature extraction, which yields the feature representation vector of each piece of multi-source data. The cosine distance between the resulting feature representation vectors is then computed directly to judge similarity. The information fusion network only guides the feature extraction networks of the association metric space during training, acting as a teacher; it plays no role when the trained multi-source heterogeneous data association model is in use.
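At inference time the association judgment reduces to ranking candidates by cosine similarity between already extracted feature vectors. In the sketch below the feature vectors are placeholders standing in for the outputs of the trained feature extraction networks, and the function and key names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vector, library):
    """Return library keys ordered from most to least similar to the query."""
    return sorted(
        library,
        key=lambda name: cosine_similarity(query_vector, library[name]),
        reverse=True,
    )


# Placeholder text features; the query stands in for an image feature vector.
text_library = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = retrieve([1.0, 0.1], text_library)
```

Only the per-source feature extractors are needed at this stage, which is consistent with the information fusion network being used during training alone.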
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Claims (1)

1. A multi-source heterogeneous data association method is characterized by comprising the following steps:
step 1: detecting the same target by using various sensors to acquire multi-data source detection data;
step 2: preprocessing the multi-data-source detection data, and setting semantic category labels to form multi-source heterogeneous data;
the preprocessing comprises: denoising, missing value filling, outlier removal, image correction, image enhancement, data slicing and data standardization;
step 3: constructing a multi-source heterogeneous data association model, performing nonlinear mapping on the multi-source heterogeneous data by using a deep neural network, and constructing an association relation between the information of different modalities;
the multi-source heterogeneous data association model comprises an information fusion network and an association metric space; the information fusion network comprises a first multi-source heterogeneous data feature extraction network and is used for acquiring the feature representation vector of each data source and fusing these vectors into a fused feature representation vector; the association metric space is used for transferring the fused feature representation vector to a second multi-source heterogeneous data feature extraction network so as to strengthen the semantic association between the multi-source heterogeneous data;
the first multi-source heterogeneous data feature extraction network specifically comprises: extracting image-class features from the multi-source heterogeneous data whose semantic category label is the image class by using a first convolutional neural network, and extracting sequence-class features from the multi-source heterogeneous data whose semantic category label is the sequence class by using a first recurrent neural network;
the construction steps of the fusion feature representation vector are specifically as follows:
the feature representation vector F of the multi-source heterogeneous data is the high-dimensional vector output by the last fully connected layer of the first convolutional neural network and of the first recurrent neural network:
Figure FDA0003505793390000011
wherein K represents the dimension of each feature vector; M represents the number of types of data sources; N represents the size of the entire data set, and i indexes the data within it; d belongs to the multi-source heterogeneous data association learning training data set D;
Figure FDA0003505793390000012
a feature representation vector representing the data source M;
setting semantic category labels for the multi-data-source detection data of the same target to form the multi-source heterogeneous data association learning training data set $D = \{(x_i^1, x_i^2, \ldots, x_i^M, y_i) \mid i \in (0, N)\}$, where $x_i^M$ represents the detection data of the M-th data source for the target, and $y_i$ represents the semantic category label of the target corresponding to the M data sources;
connecting a fully connected layer after the feature representation vector F of each data source, and then obtaining, by feature concatenation, a fused feature representation vector that fuses the common features and the specific features of each data source;
taking a cross entropy loss function and a center loss function as objective functions, and training the information fusion network with the data set D, so that the fused feature representation vector is more accurate;
the cross entropy loss function is specifically:
the cross entropy loss function takes the semantic category label $y_i$ in the data set D as the true value and the output of the information fusion network as the predicted value, and the formula is as follows:
Figure FDA0003505793390000021
wherein
Figure FDA0003505793390000022
represents the probability distribution of the prediction result output by the information fusion network for the input data sources, and $q(y_i)$ represents the probability distribution of the true label $y_i$;
the center loss function is specifically:
Figure FDA0003505793390000023
wherein
Figure FDA0003505793390000024
represents the fused feature representation vector of the j-th group of multi-source data,
Figure FDA0003505793390000025
corresponds to the identity category label $y_j$, and M represents the total number of source data in the data set D whose identity category label is $y_j$;
the construction steps of the association metric space are specifically as follows:
extracting features from each item of multi-source heterogeneous data by using the second multi-source heterogeneous data feature extraction network to obtain the feature representation vector V of the multi-source heterogeneous data:
Figure FDA0003505793390000026
wherein K represents the dimension of each feature vector; M represents the number of types of data sources; N represents the size of the entire data set, and i indexes the data within it; d belongs to the multi-source heterogeneous data association learning training data set D;
Figure FDA0003505793390000027
a feature representation vector representing the data source M;
the second multi-source heterogeneous data feature extraction network specifically comprises: extracting image-class features from the multi-source heterogeneous data whose semantic category label is the image class by using a second convolutional neural network, and extracting sequence-class features from the multi-source heterogeneous data whose semantic category label is the sequence class by using a second recurrent neural network;
the first convolutional neural network and the second convolutional neural network are specifically any one of VGG, ResNet, SeNet, ShuffleNet and GoogleNet;
the first recurrent neural network and the second recurrent neural network are specifically RNN or GRU;
reducing the distance between the feature representation vector V of each data source and the fused feature representation vector by an L2 constraint, so that the fused information in the fused feature representation vector is transferred to each data source and the feature representation of each data source acquires both the fused features and the features specific to the various data sources;
the L2 constraint is specifically:
Figure FDA0003505793390000031
wherein
Figure FDA0003505793390000032
represents the fused feature representation vector of the j-th group of multi-source data in the data set D;
Figure FDA0003505793390000033
a feature representation vector representing the data source M;
training the association metric space with the semantic category labels in the data set D as supervision information and a ranking loss function as the objective function, so that the association result is more accurate; performing multiple rounds of training on the multi-source heterogeneous data association model, and determining the final multi-source heterogeneous data association model;
the ranking loss function is specifically:
Figure FDA0003505793390000034
wherein
Figure FDA0003505793390000035
represents the similarity score between data of data source 1 and data source 2 whose identity category labels are consistent,
Figure FDA0003505793390000036
and
Figure FDA0003505793390000037
represent the similarity scores between data of data source 1 and data source 2 whose identity category labels are inconsistent, the constant $\alpha$ is a preset margin, and $[x]_+$ represents $\max(x, 0)$;
under the constraint of the ranking loss function, the similarity score of a data pair with consistent identity category labels in the association metric space is required to exceed the similarity score of a data pair with inconsistent identity category labels by at least the margin $\alpha$;
step 4: inputting the multi-source heterogeneous data into the multi-source heterogeneous data association model to obtain an association result.
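The four objective terms named in the claim (cross entropy loss, center loss, L2 constraint and ranking loss) appear in this text only as figure placeholders. The following sketch shows the commonly used textbook forms of these terms, not the patent's exact formulas. The 1/2 scaling of the center loss, the use of a per-class center vector, the equal dimensionality of the source and fused vectors, and all function names are assumptions made for illustration.

```python
import math


def cross_entropy(pred_probs, label):
    """Cross entropy against a one-hot true distribution q(y):
    only the predicted probability of the true class contributes."""
    return -math.log(pred_probs[label])


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def center_loss(fused_vectors, labels, centers):
    """Textbook center loss: pulls each fused vector toward the center of
    its class. `centers` maps a label to a center vector (assumed form)."""
    return 0.5 * sum(squared_l2(f, centers[y]) for f, y in zip(fused_vectors, labels))


def l2_constraint(source_vectors, fused_vector):
    """Sum of squared distances between each data source's feature vector V
    and the fused vector (assumes both have the same dimension)."""
    return sum(squared_l2(v, fused_vector) for v in source_vectors)


def ranking_loss(s_pos, s_neg_a, s_neg_b, alpha):
    """Hinge-style ranking loss with margin alpha, using [x]_+ = max(x, 0):
    the label-consistent pair score s_pos should exceed each
    label-inconsistent pair score by at least alpha."""
    return max(alpha - s_pos + s_neg_a, 0.0) + max(alpha - s_pos + s_neg_b, 0.0)


# Toy values: the first call satisfies the margin, the second violates it once.
loss_ok = ranking_loss(0.9, 0.2, 0.5, alpha=0.3)
loss_violated = ranking_loss(0.5, 0.4, 0.1, alpha=0.3)
```

In the first toy call both inconsistent pairs score at least 0.3 below the consistent pair, so the loss is zero; in the second, one inconsistent pair is too close and contributes a positive penalty.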
CN202110421743.6A 2021-04-19 2021-04-19 Multi-source heterogeneous data association method Active CN113157678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110421743.6A CN113157678B (en) 2021-04-19 2021-04-19 Multi-source heterogeneous data association method

Publications (2)

Publication Number Publication Date
CN113157678A CN113157678A (en) 2021-07-23
CN113157678B true CN113157678B (en) 2022-03-15

Family

ID=76869216

Country Status (1)

Country Link
CN (1) CN113157678B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807122A (en) * 2019-10-18 2020-02-18 浙江大学 Image-text cross-modal feature disentanglement method based on depth mutual information constraint
CN111599438A (en) * 2020-04-02 2020-08-28 浙江工业大学 Real-time diet health monitoring method for diabetic patient based on multi-modal data
CN111666313A (en) * 2020-05-25 2020-09-15 中科星图股份有限公司 Correlation construction and multi-user data matching method based on multi-source heterogeneous remote sensing data
CN112100410A (en) * 2020-08-13 2020-12-18 中国科学院计算技术研究所 Cross-modal retrieval method and system based on semantic condition association learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308483B (en) * 2018-07-11 2021-09-17 南京航空航天大学 Dual-source image feature extraction and fusion identification method based on convolutional neural network
CN111738315B (en) * 2020-06-10 2022-08-12 西安电子科技大学 Image classification method based on countermeasure fusion multi-source transfer learning
CN112221156B (en) * 2020-10-27 2021-07-27 腾讯科技(深圳)有限公司 Data abnormality recognition method, data abnormality recognition device, storage medium, and electronic device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Cross-modality Person re-identification with Shared-Specific Feature Transfer;Yan Lu 等;《2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition》;20200805;13376-13386 *
Joint Feature Learning Network for Visible-Infrared Person Re-identification;Kunfeng Chen 等;《Pattern Recognition and Computer Version》;20201031;652-663 *
一种通用的跨模态遥感信息关联学习方法;吕亚飞 等;《武汉大学学报(信息科学版)》;20201224;1-10 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant