CN111160426B - Feature fusion method and system based on tensor fusion and LSTM network - Google Patents

Feature fusion method and system based on tensor fusion and LSTM network

Info

Publication number
CN111160426B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911299573.8A
Other languages
Chinese (zh)
Other versions
CN111160426A (en
Inventor
董爱美
李志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Application filed by Qilu University of Technology
Priority to CN201911299573.8A
Publication of CN111160426A
Application granted
Publication of CN111160426B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a feature fusion method and system based on tensor fusion and an LSTM network, relating to the technical field of heterogeneous data. The feature fusion method comprises the following steps: acquiring complete modal heterogeneous data, splitting it into A1 complete sub-modal heterogeneous data without missing data and A2 sub-modal heterogeneous data with missing data, and preprocessing the A2 sub-modal heterogeneous data into A2 missing sub-modal heterogeneous data; extracting features of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data using low-rank representation, and mathematically modeling the relation between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data using tensor fusion to obtain their commonality matrices, A1×A2 in total; and splicing the A1×A2 commonality matrices, inputting the spliced matrix into an LSTM network, and outputting the fused matrix from the LSTM network. The invention avoids the influence of the data errors introduced by completing the missing sub-modal heterogeneous data and effectively addresses the problem of missing data in heterogeneous data.

Description

Feature fusion method and system based on tensor fusion and LSTM network
Technical Field
The invention relates to the technical field of heterogeneous data, in particular to a feature fusion method based on tensor fusion and an LSTM network.
Background
Classifying heterogeneous data by exploiting features whose low-level representations differ but whose high-level semantics are related, and completing the missing values of heterogeneous data, have long been important research approaches in heterogeneous data processing. For various reasons, heterogeneous data often suffers from missing data. Although missing sub-modal heterogeneous data can be completed by various methods, the completed values deviate from the true values because the low-level representations of heterogeneous data have different characteristics.
Disclosure of Invention
In view of the needs and shortcomings of the prior art, the invention provides a feature fusion method and system based on tensor fusion and an LSTM network, which do not need to complete the missing sub-modal heterogeneous data, make use of the high-level semantic association features between the complete modal heterogeneous data and the missing sub-modal heterogeneous data, and effectively address the problem of missing data in heterogeneous data.
Firstly, the invention provides a feature fusion method based on tensor fusion and an LSTM network. The technical solution adopted is as follows:
A feature fusion method based on tensor fusion and an LSTM network, the method comprising:
Step S1, acquiring heterogeneous data, the acquired heterogeneous data being called complete modal heterogeneous data;
Step S2, splitting the complete modal heterogeneous data into A1+A2 sub-modal heterogeneous data, wherein each sub-modal heterogeneous data contains a full description of the same thing, A1 and A2 are natural numbers greater than 0, the A1 sub-modal heterogeneous data contain no missing data and are called complete sub-modal heterogeneous data, and the A2 sub-modal heterogeneous data each contain missing data;
Step S3, preprocessing each of the A2 sub-modal heterogeneous data: deleting all rows containing a missing value in each sub-modal heterogeneous data, the sub-modal heterogeneous data after deletion being called missing sub-modal heterogeneous data;
Step S4, extracting features of the A1 complete sub-modal heterogeneous data and the A2 missing sub-modal heterogeneous data respectively using low-rank representation, and then mathematically modeling, using tensor fusion, the relation between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data after feature extraction to obtain a commonality matrix of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data, the commonality matrices numbering A1×A2 in total;
Step S5, splicing the A1×A2 commonality matrices obtained in step S4 and inputting the spliced matrix into an LSTM network, which fuses the spliced commonality matrices and outputs a fused matrix.
The heterogeneous data acquired in step S1 has already been processed into feature vectors.
When step S3 is executed, MATLAB is used: the isnan function finds the rows containing missing values in the A2 sub-modal heterogeneous data, and those rows are deleted, yielding the A2 missing sub-modal heterogeneous data.
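The row deletion of step S3 can be sketched as follows. The patent names MATLAB's isnan function; this is a minimal Python analogue using only the standard library, and the function name `drop_rows_with_missing` and the sample values are illustrative assumptions rather than part of the patent.

```python
import math

def drop_rows_with_missing(rows):
    """Keep only the rows that contain no NaN entry (the analogue of isnan-based filtering)."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]

nan = float("nan")

# Hypothetical sub-modal data of size m x p = 4 x 3 with two incomplete rows.
sub_modal = [
    [0.2, 1.5, 3.1],
    [nan, 0.7, 2.2],
    [1.1, 0.4, nan],
    [0.9, 0.8, 0.6],
]

# Missing sub-modal data of size s x p, with s < m.
missing_sub_modal = drop_rows_with_missing(sub_modal)
print(len(sub_modal), "->", len(missing_sub_modal), "rows")
```

Only the rows are reduced; the number of columns p is unchanged, which is why the embodiment writes the result as s×p with s < m.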
When step S4 is executed, a commonality matrix of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data is obtained. The specific operation steps of this process comprise:
Step S4.1, introducing the linear function of formula (1),
y = ωx + b    (1)
wherein ω represents the weight, x represents the input complete sub-modal heterogeneous data or missing sub-modal heterogeneous data, b represents the bias, and y represents the output vector value;
Step S4.2, converting the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data from scalars into vectors respectively through the linear function of formula (1), obtaining matrices A_1, A_2, …, A_n representing the complete sub-modal heterogeneous data and matrices B_1, B_2, …, B_m representing the missing sub-modal heterogeneous data;
Step S4.3, mathematically modeling the matrix of the complete sub-modal heterogeneous data and the matrix of the missing sub-modal heterogeneous data using the tensor outer product to obtain a commonality matrix Z_l
[Formula image BDA0002321522590000021, not reproduced in this text]
The commonality matrix Z_l contains all feature information of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data;
Step S4.4, executing steps S4.2-S4.3 in a loop until A1×A2 commonality matrices are obtained.
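Steps S4.1 to S4.4 can be illustrated with a small sketch. The patent's exact formula for Z_l is in an image that is not reproduced here, so the code below assumes the Kronecker product as one concrete matrix form of the tensor outer product; the weight `W`, bias `B`, helper names and sample data are all hypothetical. For an m×q and an s×p input, the Kronecker product holds m·s·q·p entries, the same count as the mp×qs commonality matrix of the embodiment, although the layout may differ from the patent's.

```python
def linear(x, w, b):
    """Apply y = w * x + b to every entry of a matrix given as a list of rows."""
    return [[w * v + b for v in row] for row in x]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_val * b_val for a_val in a_row for b_val in b_row]
        for a_row in a
        for b_row in b
    ]

# Hypothetical weight and bias for formula (1).
W, B = 2.0, 1.0

# Two complete and three missing sub-modal matrices (illustrative values only).
complete = {
    "q": [[1.0, 2.0], [3.0, 4.0]],
    "u": [[0.5, 1.5, 2.5], [3.5, 4.5, 5.5]],
}
missing = {
    "p": [[1.0, 0.0, 2.0]],
    "e": [[0.1, 0.2]],
    "h": [[1.0], [2.0]],
}

# Loop over every (complete, missing) pair, as in step S4.4.
commonality = {}
for c_name, c_data in complete.items():
    for m_name, m_data in missing.items():
        a = linear(c_data, W, B)
        b_mat = linear(m_data, W, B)
        commonality[(c_name, m_name)] = kron(a, b_mat)

print(len(commonality), "commonality matrices")
```

With A1 = 2 complete and A2 = 3 missing inputs, the loop produces A1×A2 = 6 matrices, matching the count in the first embodiment.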
When step S5 is executed, the specific operations for outputting the fusion matrix comprise:
Step S5.1, arranging the A1×A2 commonality matrices obtained in step S4 in modeling order;
Step S5.2, adjusting the A1×A2 commonality matrices using MATLAB's reshape function, and splicing the adjusted A1×A2 commonality matrices by rows, in order, using MATLAB;
Step S5.3, inputting the spliced matrix into an LSTM network, which screens the important information in the spliced A1×A2 commonality matrices and outputs a two-dimensional fusion matrix containing all feature information of the A1 complete sub-modal heterogeneous data and the A2 missing sub-modal heterogeneous data.
Specifically, the number of rows of the two-dimensional fusion matrix equals the number of rows of the complete modal heterogeneous data/sub-modal heterogeneous data, and its number of columns is smaller than the maximum product of the column counts of any two of the A1×A2 commonality matrices and larger than the number of columns/rows of any one of the A1×A2 commonality matrices.
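The adjust-and-splice step can be sketched as below. MATLAB's reshape fills column by column, so the helper mimics that ordering. Reading "splicing by rows" as side-by-side concatenation of matrices that share the same number of rows is an assumption, as are the function names and sample values.

```python
def reshape_column_major(matrix, n_rows):
    """Reshape to n_rows rows, reading and filling column by column as MATLAB's reshape does."""
    flat = [matrix[r][c] for c in range(len(matrix[0])) for r in range(len(matrix))]
    if len(flat) % n_rows != 0:
        raise ValueError("element count must be divisible by n_rows")
    n_cols = len(flat) // n_rows
    return [[flat[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]

def splice_side_by_side(matrices):
    """Concatenate matrices with equal row counts along the column direction."""
    n_rows = len(matrices[0])
    if any(len(m) != n_rows for m in matrices):
        raise ValueError("all matrices must have the same number of rows")
    return [sum((m[r] for m in matrices), []) for r in range(n_rows)]

# Two hypothetical commonality matrices, adjusted to m = 2 rows each.
z1 = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 x 2
z2 = [[9], [10]]                        # 2 x 1

adjusted = [reshape_column_major(z, 2) for z in (z1, z2)]
spliced = splice_side_by_side(adjusted)
print(spliced)
```

A row-major reshape would give a different entry order, so the choice matters if results are to be compared against a MATLAB implementation.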
Secondly, the invention provides a feature fusion system based on tensor fusion and an LSTM network. The technical solution adopted is as follows:
A feature fusion system based on tensor fusion and an LSTM network, comprising:
an acquisition module, used for acquiring heterogeneous data that has been processed into feature vectors, the acquired heterogeneous data being called complete modal heterogeneous data;
a splitting module, used for splitting the complete modal heterogeneous data into A1+A2 sub-modal heterogeneous data, wherein after splitting each sub-modal heterogeneous data contains a full description of the same thing, A1 and A2 are natural numbers greater than 0, the A1 sub-modal heterogeneous data contain no missing data and are called complete sub-modal heterogeneous data, and the A2 sub-modal heterogeneous data each contain missing data;
a preprocessing module, used for preprocessing the A2 sub-modal heterogeneous data to obtain A2 missing sub-modal heterogeneous data;
a feature extraction module, used for extracting features of the A1 complete sub-modal heterogeneous data and the A2 missing sub-modal heterogeneous data respectively using low-rank representation;
a tensor fusion module, used for mathematically modeling, using tensor fusion, the relation between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data after feature extraction to obtain a commonality matrix of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data, the commonality matrices numbering A1×A2 in total;
a judging and circulating module, used for judging whether the tensor fusion module has mathematically modeled the relation between every complete sub-modal heterogeneous data and every missing sub-modal heterogeneous data; if so, the output of the tensor fusion module is input into the splicing module, and if not, control returns to the tensor fusion module to continue the mathematical modeling;
a splicing module, used for splicing the A1×A2 commonality matrices obtained by the tensor fusion module in modeling order;
an LSTM network module, used for receiving the spliced matrix output by the splicing module and outputting a two-dimensional fusion matrix, which contains all feature information of the A1 complete sub-modal heterogeneous data and the A2 missing sub-modal heterogeneous data.
Optionally, the preprocessing module preprocesses the A2 sub-modal heterogeneous data as follows:
the preprocessing module first finds the rows containing missing values in the A2 sub-modal heterogeneous data using MATLAB's isnan function, and then deletes those rows to obtain the A2 missing sub-modal heterogeneous data.
Optionally, the tensor fusion module mathematically models the relation between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data after feature extraction to obtain a commonality matrix, with the following specific operations:
Step 1, introducing the linear function of formula (1),
y = ωx + b    (1)
wherein ω represents the weight, x represents the input complete sub-modal heterogeneous data or missing sub-modal heterogeneous data, b represents the bias, and y represents the output vector value;
Step 2, converting the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data from scalars into vectors respectively through the linear function of formula (1), obtaining matrices A_1, A_2, …, A_n representing the complete sub-modal heterogeneous data and matrices B_1, B_2, …, B_m representing the missing sub-modal heterogeneous data;
Step 3, mathematically modeling the matrix of the complete sub-modal heterogeneous data and the matrix of the missing sub-modal heterogeneous data using the tensor outer product to obtain a commonality matrix Z_l
[Formula image BDA0002321522590000051, not reproduced in this text]
The commonality matrix Z_l contains all feature information of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data;
Step 4, executing steps 2-3 in a loop until A1×A2 commonality matrices are obtained.
Optionally, the splicing module splices the A1×A2 commonality matrices as follows:
first, the A1×A2 commonality matrices obtained by the tensor fusion module are arranged in modeling order;
then the A1×A2 commonality matrices are adjusted using MATLAB's reshape function, and the adjusted A1×A2 commonality matrices are spliced by rows, in order, using MATLAB.
Optionally, the LSTM network module receives the matrix obtained by ordered splicing, screens its important information and outputs a two-dimensional fusion matrix. The number of rows of the two-dimensional fusion matrix equals the number of rows of the complete modal heterogeneous data/sub-modal heterogeneous data, and its number of columns is smaller than the maximum product of the column counts of any two of the A1×A2 commonality matrices and larger than the number of columns/rows of any one of the A1×A2 commonality matrices.
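How an LSTM might turn the spliced matrix into an m-row fusion matrix can be sketched as follows. The patent does not give the network's architecture or weights, so this uses the standard LSTM gate equations, treats each row of the spliced matrix as one time step, and collects the hidden states as the fused rows; all of that, together with the constant weights and function names, is an assumption made purely for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One standard LSTM step; params maps 'i', 'f', 'o', 'g' to (W, U, b)."""
    def affine(gate):
        w, u, b = params[gate]
        return [
            sum(wk * xk for wk, xk in zip(w[j], x))
            + sum(uk * hk for uk, hk in zip(u[j], h_prev))
            + b[j]
            for j in range(len(b))
        ]
    i = [sigmoid(v) for v in affine("i")]    # input gate
    f = [sigmoid(v) for v in affine("f")]    # forget gate
    o = [sigmoid(v) for v in affine("o")]    # output gate
    g = [math.tanh(v) for v in affine("g")]  # candidate cell values
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def fuse_rows(rows, params, hidden_size):
    """Feed each row as one time step and collect the hidden states as fused rows."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    fused = []
    for row in rows:
        h, c = lstm_step(row, h, c, params)
        fused.append(h)
    return fused

def constant_params(input_size, hidden_size, value):
    """Hypothetical untrained weights: every weight equals `value`, every bias is zero."""
    def gate():
        return (
            [[value] * input_size for _ in range(hidden_size)],
            [[value] * hidden_size for _ in range(hidden_size)],
            [0.0] * hidden_size,
        )
    return {name: gate() for name in ("i", "f", "o", "g")}

spliced = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # hypothetical spliced matrix, m = 2 rows
fused = fuse_rows(spliced, constant_params(3, 2, 0.1), hidden_size=2)
print(len(fused), "x", len(fused[0]))
```

The output keeps one row per input row, which is consistent with the fusion matrix having the same number of rows as the input data. The hidden size plays the role of the column count n, and this sketch does not enforce the patent's bounds on n.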
Compared with the prior art, the feature fusion method and system based on tensor fusion and an LSTM network have the following beneficial effects:
1) The invention makes full use of the relations among the sub-modalities of the heterogeneous data, simulates those relations by establishing a mathematical model, and obtains a fusion matrix of the sub-modalities through an LSTM (long short-term memory) network, thereby avoiding the influence of the data errors introduced by completing the missing sub-modal heterogeneous data;
2) The invention does not need to complete the missing sub-modal heterogeneous data and makes use of the high-level semantic association features between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data, thereby effectively addressing the problem of missing data in heterogeneous data.
Drawings
FIG. 1 is a flow chart of the method according to the first embodiment of the invention;
FIG. 2 is a block diagram showing the structural connections of the second embodiment of the invention.
The reference numerals in the drawings represent:
1. acquisition module, 2. preprocessing module, 3. feature extraction module, 4. tensor fusion module,
5. splicing module, 6. LSTM network module, 7. splitting module, 8. judging and circulating module.
Detailed Description
In order to make the technical scheme, the technical problems to be solved and the technical effects of the invention more clear, the technical scheme of the invention is clearly and completely described below by combining specific embodiments.
For the following two embodiments, it should be noted that:
in the embodiments, the "m×y", "m×q", "m×u", "m×p", "m×e", "m×h", "s×p", "d×e", "k×h" respectively represent feature vectors of data;
if the obtained heterogeneous data is a video segment, which contains three types of information of voice, text and image, the voice is usually processed into feature vectors by adopting an LSTM, the text is processed into feature vectors by adopting a word2vec mode, and the image is processed into feature vectors by adopting a CNN.
Embodiment one:
With reference to FIG. 1, this embodiment proposes a feature fusion method based on tensor fusion and an LSTM network, the method comprising:
Step S1, acquiring heterogeneous data that has been processed into feature vectors, the acquired heterogeneous data being called complete modal heterogeneous data. In this embodiment, it is assumed that complete modal heterogeneous data m×y is acquired.
Step S2, splitting the complete modal heterogeneous data into five sub-modal heterogeneous data, wherein each sub-modal heterogeneous data contains a full description of the same thing, two sub-modal heterogeneous data contain no missing data and are called complete sub-modal heterogeneous data, and three sub-modal heterogeneous data each contain missing data. In this embodiment, it is assumed that splitting the complete modal heterogeneous data m×y yields complete sub-modal heterogeneous data m×q and complete sub-modal heterogeneous data m×u, together with sub-modal heterogeneous data m×p, sub-modal heterogeneous data m×e and sub-modal heterogeneous data m×h, which contain missing data.
Step S3, preprocessing each of the three sub-modal heterogeneous data containing missing data: using MATLAB, the isnan function finds the rows containing missing values in the three sub-modal heterogeneous data, and those rows are deleted to obtain three missing sub-modal heterogeneous data. In this embodiment, deleting rows from the sub-modal heterogeneous data m×p containing missing data yields the missing sub-modal heterogeneous data s×p, deleting rows from the sub-modal heterogeneous data m×e containing missing data yields the missing sub-modal heterogeneous data d×e, and deleting rows from the sub-modal heterogeneous data m×h containing missing data yields the missing sub-modal heterogeneous data k×h, so that s < m, d < m and k < m.
Step S4, extracting features of the two complete sub-modal heterogeneous data and the three missing sub-modal heterogeneous data respectively using low-rank representation, and then mathematically modeling, using tensor fusion, the relation between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data after feature extraction to obtain a commonality matrix of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data, the commonality matrices numbering six in total. The specific operation steps of this process comprise:
Step S4.1, introducing the linear function of formula (1),
y = ωx + b    (1)
wherein ω represents the weight, x represents the input complete sub-modal heterogeneous data or missing sub-modal heterogeneous data, b represents the bias, and y represents the output vector value;
Step S4.2, converting the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data from scalars into vectors respectively through the linear function of formula (1), obtaining matrices A_1, A_2, …, A_n representing the complete sub-modal heterogeneous data and matrices B_1, B_2, …, B_m representing the missing sub-modal heterogeneous data;
Step S4.3, mathematically modeling the matrix of the complete sub-modal heterogeneous data and the matrix of the missing sub-modal heterogeneous data using the tensor outer product to obtain a commonality matrix Z_l
[Formula image BDA0002321522590000071, not reproduced in this text]
The commonality matrix Z_l contains all feature information of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data;
Step S4.4, executing steps S4.2-S4.3 in a loop until six commonality matrices are obtained.
In this embodiment, processing the complete sub-modal heterogeneous data m×q and the missing sub-modal heterogeneous data s×p through steps S4.2-S4.3 yields the commonality matrix mp×qs; processing m×q and the missing sub-modal heterogeneous data d×e yields the commonality matrix me×qd; and processing m×q and the missing sub-modal heterogeneous data k×h yields the commonality matrix mh×qk. Likewise, processing the complete sub-modal heterogeneous data m×u and the missing sub-modal heterogeneous data s×p through steps S4.2-S4.3 yields the commonality matrix mp×us; processing m×u and d×e yields the commonality matrix me×ud; and processing m×u and k×h yields the commonality matrix mh×uk.
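The shape bookkeeping of the six commonality matrices can be checked with a short sketch. It reads the label mp×qs as (m·p) rows by (q·s) columns, which is an interpretation of the notation, and the concrete dimension values and function names are hypothetical.

```python
def commonality_shape(m, complete_cols, missing_rows, missing_cols):
    """Shape (m*p) x (q*s) for a complete m x q input and a missing s x p input."""
    return (m * missing_cols, complete_cols * missing_rows)

def adjusted_shape(m, complete_cols, missing_rows, missing_cols):
    """Shape m x (p*q*s) after the matrix is reshaped to m rows."""
    return (m, missing_cols * complete_cols * missing_rows)

# Hypothetical sizes: complete data m x q and m x u; missing data s x p, d x e, k x h.
m = 10
complete_cols = {"q": 3, "u": 4}
missing_dims = {"p": (8, 2), "e": (7, 5), "h": (9, 6)}   # name -> (rows, cols)

shapes = {}
for c_name, c_cols in complete_cols.items():
    for m_name, (rows, cols) in missing_dims.items():
        before = commonality_shape(m, c_cols, rows, cols)
        after = adjusted_shape(m, c_cols, rows, cols)
        # Reshaping must preserve the number of entries.
        assert before[0] * before[1] == after[0] * after[1]
        shapes[(c_name, m_name)] = (before, after)

for pair, (before, after) in shapes.items():
    print(pair, before, "->", after)
```

Every pair keeps its entry count under the reshape, which is what makes the later adjustment from mp×qs to m×pqs possible.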
Step S5, splicing the six commonality matrices obtained in step S4 and inputting the spliced matrix into an LSTM network, which fuses the spliced commonality matrices and outputs the fused matrix. The specific operations of this process comprise:
Step S5.1, arranging the six commonality matrices obtained in step S4 in modeling order;
Step S5.2, adjusting the six commonality matrices using MATLAB's reshape function, and splicing the six adjusted commonality matrices by rows using MATLAB;
Step S5.3, inputting the spliced matrix into an LSTM network, which screens the important information in the six spliced commonality matrices and outputs a two-dimensional fusion matrix containing all feature information of the two complete sub-modal heterogeneous data and the three missing sub-modal heterogeneous data. The number of rows of the two-dimensional fusion matrix equals the number of rows of the complete modal heterogeneous data/sub-modal heterogeneous data, and its number of columns is smaller than the maximum product of the column counts of any two of the six commonality matrices and larger than the number of columns/rows of any one of the six commonality matrices.
In this embodiment, the commonality matrices mp×qs, me×qd, mh×qk, mp×us, me×ud and mh×uk are arranged in sequence. The commonality matrix mp×qs is adjusted to the commonality matrix m×pqs, me×qd to m×eqd, mh×qk to m×hqk, mp×us to m×pus, me×ud to m×eud, and mh×uk to m×huk. The adjusted commonality matrices m×pqs, m×eqd, m×hqk, m×pus, m×eud and m×huk are then spliced by rows in sequence and input into the LSTM network, which outputs an m×n two-dimensional fusion matrix, where n is smaller than the product of the two largest numbers among q, u, p, e, h, s, d, k and larger than the maximum value among q, u, p, e, h, s, d, k.
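The constraint on the column count n of the fusion matrix is simple arithmetic and can be written out as a sketch. The dimension values below are hypothetical, and the function names are introduced only for illustration.

```python
def fusion_column_bounds(dims):
    """Return (lower, upper) such that a valid n satisfies lower < n < upper."""
    ordered = sorted(dims, reverse=True)
    lower = ordered[0]                 # maximum value among the dimensions
    upper = ordered[0] * ordered[1]    # product of the two largest dimensions
    return lower, upper

def is_valid_column_count(n, dims):
    lower, upper = fusion_column_bounds(dims)
    return lower < n < upper

# Hypothetical values for q, u, p, e, h, s, d, k.
dims = {"q": 3, "u": 4, "p": 2, "e": 5, "h": 6, "s": 8, "d": 7, "k": 9}

bounds = fusion_column_bounds(dims.values())
print(bounds, is_valid_column_count(32, dims.values()))
```

For these values the two largest dimensions are 9 and 8, so n must lie strictly between 9 and 72.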
Embodiment two:
With reference to FIG. 2, this embodiment proposes a feature fusion system based on tensor fusion and an LSTM network, comprising:
the acquisition module 1, used for acquiring heterogeneous data that has been processed into feature vectors, the acquired heterogeneous data being called complete modal heterogeneous data;
the splitting module 7, used for splitting the complete modal heterogeneous data into A1+A2 sub-modal heterogeneous data, wherein after splitting each sub-modal heterogeneous data contains a full description of the same thing, A1 and A2 are natural numbers greater than 0, the A1 sub-modal heterogeneous data contain no missing data and are called complete sub-modal heterogeneous data, and the A2 sub-modal heterogeneous data each contain missing data;
the preprocessing module 2, used for preprocessing the A2 sub-modal heterogeneous data to obtain A2 missing sub-modal heterogeneous data;
the feature extraction module 3, used for extracting features of the A1 complete sub-modal heterogeneous data and the A2 missing sub-modal heterogeneous data respectively using low-rank representation;
the tensor fusion module 4, used for mathematically modeling, using tensor fusion, the relation between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data after feature extraction to obtain a commonality matrix of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data, the commonality matrices numbering A1×A2 in total;
the judging and circulating module 8, used for judging whether the tensor fusion module 4 has mathematically modeled the relation between every complete sub-modal heterogeneous data and every missing sub-modal heterogeneous data; if so, the output of the tensor fusion module 4 is input into the splicing module 5, and if not, control returns to the tensor fusion module 4 to continue the mathematical modeling;
the splicing module 5, used for splicing the A1×A2 commonality matrices obtained by the tensor fusion module 4 in modeling order;
the LSTM network module 6, used for receiving the spliced matrix output by the splicing module 5 and outputting a two-dimensional fusion matrix, which contains all feature information of the A1 complete sub-modal heterogeneous data and the A2 missing sub-modal heterogeneous data.
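The control flow of the judging and circulating module 8 can be sketched as a loop over the pairs still to be modeled. The `fuse` callback stands in for the tensor fusion module 4 and is hypothetical, as are the function and variable names; the pair ordering shown is one plausible reading of "modeling order", matching the sequence listed in the first embodiment.

```python
from itertools import product

def model_all_pairs(complete_names, missing_names, fuse):
    """Call fuse on every (complete, missing) pair and return the results in modeling order."""
    pending = list(product(complete_names, missing_names))
    results = []
    while pending:                  # judging step: is any pair still unmodeled?
        pair = pending.pop(0)       # if so, return to the fusion step for the next pair
        results.append((pair, fuse(*pair)))
    return results                  # all pairs modeled: hand over to the splicing module

# Hypothetical stand-in that only labels the pair instead of computing a matrix.
def label_pair(complete_name, missing_name):
    return "Z_" + complete_name + missing_name

results = model_all_pairs(["q", "u"], ["p", "e", "h"], label_pair)
order = [pair for pair, _ in results]
print(order)
```

Keeping the results in a list rather than a set preserves the modeling order that the splicing module 5 relies on.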
In this embodiment, the preprocessing module 2 preprocesses the A2 sub-modal heterogeneous data, which specifically includes:
the preprocessing module 2 firstly finds the row where the missing value in the A2 sub-modal heterogeneous data is located by using the isnan function of matlab, and then deletes the row containing the missing value in the A2 sub-modal heterogeneous data to obtain the A2 missing sub-modal heterogeneous data.
In this embodiment, the tensor fusion module 4 mathematically models the relation between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data after feature extraction to obtain the commonality matrix, which specifically includes:
Step 1, introducing the linear function of formula (1),
y = ωx + b    (1)
wherein ω represents the weight, x represents the input complete sub-modal heterogeneous data or missing sub-modal heterogeneous data, b represents the bias, and y represents the output vector value;
Step 2, converting the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data from scalars into vectors respectively through the linear function of formula (1), obtaining matrices A_1, A_2, …, A_n representing the complete sub-modal heterogeneous data and matrices B_1, B_2, …, B_m representing the missing sub-modal heterogeneous data;
Step 3, mathematically modeling the matrix of the complete sub-modal heterogeneous data and the matrix of the missing sub-modal heterogeneous data using the tensor outer product to obtain a commonality matrix Z_l
[Formula image BDA0002321522590000101, not reproduced in this text]
The commonality matrix Z_l contains all feature information of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data;
Step 4, executing steps 2-3 in a loop until A1×A2 commonality matrices are obtained.
In this embodiment, the splicing module 5 splices the A1×A2 commonality matrices as follows:
first, the A1×A2 commonality matrices obtained by the tensor fusion module 4 are arranged in modeling order;
then the A1×A2 commonality matrices are adjusted using MATLAB's reshape function, and the adjusted A1×A2 commonality matrices are spliced by rows, in order, using MATLAB.
In this embodiment, the LSTM network module 6 receives the matrix obtained by ordered splicing, screens its important information, and outputs a two-dimensional fusion matrix. The number of rows of the two-dimensional fusion matrix equals the number of rows of the complete modal heterogeneous data/sub-modal heterogeneous data, and its number of columns is smaller than the maximum product of the column counts of any two of the A1×A2 commonality matrices and larger than the number of columns/rows of any one of the A1×A2 commonality matrices.
The embodiment is specifically implemented as follows:
the method comprises the steps that an acquisition module 1 is assumed to acquire complete modal heterogeneous data m x y, a splitting module 7 is assumed to split the complete modal heterogeneous data m x y into five sub-modal heterogeneous data, wherein two sub-modal heterogeneous data do not contain missing data, namely complete sub-modal heterogeneous data m x q and complete sub-modal heterogeneous data m x u respectively, and the remaining three sub-modal heterogeneous data contain missing data, namely sub-modal heterogeneous data m x p, sub-modal heterogeneous data m x e and sub-modal heterogeneous data m x h containing the missing data respectively;
then, the preprocessing module 2 deletes the row from the sub-mode heterogeneous data m×p containing the missing data to obtain the missing sub-mode heterogeneous data s×p, deletes the row from the sub-mode heterogeneous data m×e containing the missing data to obtain the missing sub-mode heterogeneous data d×e, deletes the row from the sub-mode heterogeneous data m×h containing the missing data to obtain the missing sub-mode heterogeneous data k×h, so that s < m, d < m, k < m;
subsequently, the feature extraction module 3 extracts features of the complete sub-modal heterogeneous data m×q and m×u and of the missing sub-modal heterogeneous data s×p, d×e and k×h by using low-rank representation, and the tensor fusion module 4 performs mathematical modeling on each of the six pairs formed by one piece of complete sub-modal heterogeneous data and one piece of missing sub-modal heterogeneous data, namely m×q and s×p, m×q and d×e, m×q and k×h, m×u and s×p, m×u and d×e, and m×u and k×h, obtaining a commonality matrix mp_qs of the complete sub-modal heterogeneous data m×q and the missing sub-modal heterogeneous data s×p, a commonality matrix me_qd of the complete sub-modal heterogeneous data m×q and the missing sub-modal heterogeneous data d×e, a commonality matrix mh_qk of the complete sub-modal heterogeneous data m×q and the missing sub-modal heterogeneous data k×h, a commonality matrix mp_us of the complete sub-modal heterogeneous data m×u and the missing sub-modal heterogeneous data s×p, a commonality matrix me_ud of the complete sub-modal heterogeneous data m×u and the missing sub-modal heterogeneous data d×e, and a commonality matrix mh_uk of the complete sub-modal heterogeneous data m×u and the missing sub-modal heterogeneous data k×h;
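The modeling of one pair can be illustrated with the linear function y = ωx + b and the tensor outer product named in the claims. The following is a minimal sketch on two short feature vectors; the weights, biases, sample values and function names are hypothetical, and the exact expression for the commonality matrix is given only as a formula image in the claims, so this shows a plain vector outer product rather than the patented formula.

```python
def linear(x, weight, bias):
    """Apply y = weight * x + bias element-wise to a feature vector."""
    return [weight * v + bias for v in x]


def outer_product(a, b):
    """Outer product of two vectors: entry (i, j) equals a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]


# Hypothetical feature vectors taken from one complete and one missing
# sub-modality after feature extraction.
complete_features = [1.0, 2.0]
missing_features = [3.0, 4.0, 5.0]

a_vec = linear(complete_features, weight=2.0, bias=1.0)   # [3.0, 5.0]
b_vec = linear(missing_features, weight=1.0, bias=0.0)    # [3.0, 4.0, 5.0]
commonality = outer_product(a_vec, b_vec)                 # 2 x 3 matrix
```

Every entry of the result is a product of one feature from each side, which is why the outer product can carry the pairwise interactions between the two sub-modalities.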
then, the splicing module 5 first adjusts the commonality matrix mp_qs to the commonality matrix m_pqs, the commonality matrix me_qd to the commonality matrix m_eqd, the commonality matrix mh_qk to the commonality matrix m_hqk, the commonality matrix mp_us to the commonality matrix m_pus, the commonality matrix me_ud to the commonality matrix m_eud, and the commonality matrix mh_uk to the commonality matrix m_huk; the splicing module 5 then splices the commonality matrices m_pqs, m_eqd, m_hqk, m_pus, m_eud and m_huk in that order by rows, and inputs the spliced commonality matrix m_3824 to the LSTM network module 6. The LSTM network module 6 outputs a two-dimensional fusion matrix of size m×n, where n is smaller than the product of the two largest values among q, u, p, e, h, s, d, k and greater than the largest value among q, u, p, e, h, s, d, k.
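The constraint on n stated above can be checked with a short calculation. The sketch below uses hypothetical sizes for q, u, p, e, h, s, d and k; the function name `fusion_width_bounds` is likewise an assumption made for illustration.

```python
def fusion_width_bounds(dims):
    """Return (lower, upper) such that lower < n < upper.

    lower is the largest value in `dims`; upper is the product of the
    two largest values in `dims`.
    """
    ordered = sorted(dims, reverse=True)
    return ordered[0], ordered[0] * ordered[1]


# Hypothetical sizes for q, u, p, e, h, s, d, k.
dims = {"q": 4, "u": 5, "p": 3, "e": 6, "h": 2, "s": 8, "d": 7, "k": 9}
lower, upper = fusion_width_bounds(dims.values())
valid_n = range(lower + 1, upper)   # every integer n with lower < n < upper
```

With these sizes the two largest values are 9 and 8, so n must lie strictly between 9 and 72.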
During implementation of this embodiment, the judging and circulating module 8 is also used for judging whether the tensor fusion module 4 has mathematically modeled the relationship between every piece of complete sub-modal heterogeneous data and every piece of missing sub-modal heterogeneous data; if so, the six commonality matrices output by the tensor fusion module 4 are input into the splicing module 5; if not, control returns to the tensor fusion module 4, which continues the mathematical modeling until six commonality matrices are obtained.
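The judge-and-loop behaviour can be sketched as a loop that keeps modeling pairs until A1×A2 results exist. This is an illustrative sketch only: `model_all_pairs` is a hypothetical name, and the `model_pair` callback is a placeholder standing in for the tensor fusion module.

```python
from itertools import product


def model_all_pairs(complete, missing, model_pair):
    """Model every (complete, missing) pair until A1 x A2 results exist."""
    expected = len(complete) * len(missing)
    results = {}
    pending = list(product(complete, missing))
    while len(results) < expected:          # the "judge" step
        c_name, m_name = pending.pop(0)
        # The "model" step, delegated to the supplied callback.
        results[(c_name, m_name)] = model_pair(c_name, m_name)
    return results


complete = ["m x q", "m x u"]               # A1 = 2
missing = ["s x p", "d x e", "k x h"]       # A2 = 3
# Placeholder for the tensor fusion module: it only records the pair.
pairs = model_all_pairs(complete, missing,
                        lambda c, m: f"commonality({c}, {m})")
```

The pairs are produced in modeling order, first all pairs of m×q and then all pairs of m×u, which is the order in which the embodiment lists its six commonality matrices.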
In summary, the feature fusion method and system based on tensor fusion and LSTM network make full use of the relationships among the sub-modalities of heterogeneous data: these relationships are modeled mathematically, and the fusion matrix of the sub-modalities is obtained through the LSTM network, which effectively addresses the problem of missing data in heterogeneous data.
Based on the above-mentioned embodiments of the present invention, any improvements and modifications made by those skilled in the art without departing from the principles of the present invention should fall within the scope of the present invention.

Claims (10)

1. A feature fusion method based on tensor fusion and LSTM network, characterized in that the method comprises the following steps:
step S1, acquiring heterogeneous data, wherein the acquired heterogeneous data is called complete modal heterogeneous data;
step S2, splitting the complete modal heterogeneous data into A pieces of sub-modal heterogeneous data, wherein each piece of sub-modal heterogeneous data comprises all descriptions of the same thing, A1 and A2 are natural numbers greater than 0, A1 pieces of sub-modal heterogeneous data contain no missing data and are called complete sub-modal heterogeneous data, and A2 pieces of sub-modal heterogeneous data each contain missing data;
step S3, preprocessing the A2 pieces of sub-modal heterogeneous data respectively: deleting all rows containing missing values in each piece of sub-modal heterogeneous data, wherein the sub-modal heterogeneous data after deletion is called missing sub-modal heterogeneous data;
step S4, extracting the features of the A1 pieces of complete sub-modal heterogeneous data and the A2 pieces of missing sub-modal heterogeneous data respectively by using low-rank representation, and then mathematically modeling the relationship between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data after feature extraction by using tensor fusion, to obtain commonality matrices of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data, wherein the total number of commonality matrices is A1×A2;
step S5, splicing the A1×A2 commonality matrices obtained in step S4, inputting the spliced matrix into an LSTM network, fusing the spliced commonality matrices by the LSTM network, and outputting a fusion matrix.
2. The method of claim 1, wherein heterogeneous data processed into feature vectors is acquired in step S1.
3. The method for feature fusion based on tensor fusion and LSTM network according to claim 2, wherein in step S3, the isnan function of MATLAB is used to find the rows in which missing values are located in the A2 pieces of sub-modal heterogeneous data, and the rows containing missing values are deleted to obtain the A2 pieces of missing sub-modal heterogeneous data.
4. The method for feature fusion based on tensor fusion and LSTM network according to claim 2, wherein, when executing step S4, mathematical modeling is performed on the relationship between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data to obtain a commonality matrix of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data, and the specific operation steps of the process include:
step S4.1, introducing the linear function of formula (1),
y = ωx + b    formula (1)
wherein ω represents the weight, x represents the input complete sub-modal heterogeneous data or missing sub-modal heterogeneous data, b represents the bias, and y represents the output vector value;
step S4.2, converting the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data from scalars into vectors respectively through the linear function of formula (1), to obtain matrices A_1, A_2, …, A_n representing the complete sub-modal heterogeneous data and matrices B_1, B_2, …, B_m representing the missing sub-modal heterogeneous data;
step S4.3, performing mathematical modeling on the matrices of the complete sub-modal heterogeneous data and the matrices of the missing sub-modal heterogeneous data by using the tensor outer product to obtain a commonality matrix Z_l:
Figure FDA0004131190660000021
wherein the commonality matrix Z_l contains all feature information of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data;
step S4.4, repeating steps S4.2 to S4.3 until A1×A2 commonality matrices are obtained.
5. The method of any one of claims 2-4, wherein the specific operations for outputting the fusion matrix when executing step S5 comprise:
step S5.1, arranging the A1×A2 commonality matrices obtained in step S4 according to the modeling sequence;
step S5.2, adjusting the A1×A2 commonality matrices by using the reshape function of MATLAB, and splicing the adjusted A1×A2 commonality matrices in row order by using MATLAB;
step S5.3, inputting the spliced matrix into an LSTM network, screening important information in the spliced A1×A2 commonality matrices by the LSTM network, and outputting a two-dimensional fusion matrix, wherein the two-dimensional fusion matrix comprises all feature information of the A1 pieces of complete sub-modal heterogeneous data and the A2 pieces of missing sub-modal heterogeneous data;
the number of rows of the two-dimensional fusion matrix is equal to the number of rows of the complete modal heterogeneous data/sub-modal heterogeneous data; the number of columns of the two-dimensional fusion matrix is smaller than the maximum product of the column counts of any two of the A1×A2 commonality matrices, and larger than the number of columns/rows of any one of the A1×A2 commonality matrices.
6. A feature fusion system based on tensor fusion and LSTM network, characterized in that the system comprises:
the acquisition module, used for acquiring heterogeneous data that has been processed into feature vectors, wherein the acquired heterogeneous data is called complete modal heterogeneous data;
the splitting module, used for splitting the complete modal heterogeneous data into A pieces of sub-modal heterogeneous data, wherein after splitting each piece of sub-modal heterogeneous data comprises all descriptions of the same thing, A1 and A2 are natural numbers greater than 0, A1 pieces of sub-modal heterogeneous data contain no missing data and are called complete sub-modal heterogeneous data, and A2 pieces of sub-modal heterogeneous data each contain missing data;
the preprocessing module, used for preprocessing the A2 pieces of sub-modal heterogeneous data to obtain A2 pieces of missing sub-modal heterogeneous data;
the feature extraction module, used for extracting features of the A1 pieces of complete sub-modal heterogeneous data and the A2 pieces of missing sub-modal heterogeneous data respectively by using low-rank representation;
the tensor fusion module, used for mathematically modeling the relationship between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data after feature extraction by using tensor fusion, to obtain commonality matrices of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data, wherein the total number of commonality matrices is A1×A2;
the judging and circulating module, used for judging whether the tensor fusion module has mathematically modeled the relationship between every piece of complete sub-modal heterogeneous data and every piece of missing sub-modal heterogeneous data; if so, the output of the tensor fusion module is input into the splicing module, and if not, control returns to the tensor fusion module to continue the mathematical modeling;
the splicing module, used for splicing the A1×A2 commonality matrices obtained by the tensor fusion module according to the modeling sequence;
the LSTM network module, used for receiving the spliced matrix output by the splicing module and outputting a two-dimensional fusion matrix, wherein the two-dimensional fusion matrix comprises all feature information of the A1 pieces of complete sub-modal heterogeneous data and the A2 pieces of missing sub-modal heterogeneous data.
7. The feature fusion system based on tensor fusion and LSTM network of claim 6, wherein the preprocessing module preprocesses the A2 pieces of sub-modal heterogeneous data specifically as follows:
the preprocessing module first finds the rows in which missing values are located in the A2 pieces of sub-modal heterogeneous data by using the isnan function of MATLAB, and then deletes the rows containing missing values from the A2 pieces of sub-modal heterogeneous data to obtain the A2 pieces of missing sub-modal heterogeneous data.
8. The feature fusion system based on tensor fusion and LSTM network of claim 6, wherein the specific operation of the tensor fusion module for mathematically modeling the relationship between the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data after feature extraction to obtain the commonality matrix is as follows:
step 1, introducing the linear function of formula (1),
y = ωx + b    formula (1)
wherein ω represents the weight, x represents the input complete sub-modal heterogeneous data or missing sub-modal heterogeneous data, b represents the bias, and y represents the output vector value;
step 2, converting the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data from scalars into vectors respectively through the linear function of formula (1), to obtain matrices A_1, A_2, …, A_n representing the complete sub-modal heterogeneous data and matrices B_1, B_2, …, B_m representing the missing sub-modal heterogeneous data;
step 3, performing mathematical modeling on the matrices of the complete sub-modal heterogeneous data and the matrices of the missing sub-modal heterogeneous data by using the tensor outer product to obtain a commonality matrix Z_l:
Figure FDA0004131190660000041
wherein the commonality matrix Z_l contains all feature information of the complete sub-modal heterogeneous data and the missing sub-modal heterogeneous data;
step 4, repeating steps 2 to 3 until A1×A2 commonality matrices are obtained.
9. The feature fusion system based on tensor fusion and LSTM network according to claim 6, 7 or 8, wherein the specific operation of the splicing module to splice the A1×A2 commonality matrices is:
firstly, arranging the A1×A2 commonality matrices obtained in the step S4 according to the modeling sequence;
and then adjusting the A1×A2 commonality matrices by using the reshape function of MATLAB, and splicing the adjusted A1×A2 commonality matrices in row order by using MATLAB.
10. The feature fusion system based on tensor fusion and LSTM network as set forth in claim 9, wherein said LSTM network module receives the matrix obtained by the ordered splicing, screens the important information in the matrix, and outputs a two-dimensional fusion matrix;
the number of rows of the two-dimensional fusion matrix is equal to the number of rows of the complete modal heterogeneous data/sub-modal heterogeneous data;
the number of columns of the two-dimensional fusion matrix is smaller than the maximum product of the column counts of any two of the A1×A2 commonality matrices, and larger than the number of columns/rows of any one of the A1×A2 commonality matrices.
CN201911299573.8A 2019-12-17 2019-12-17 Feature fusion method and system based on tensor fusion and LSTM network Active CN111160426B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911299573.8A CN111160426B (en) 2019-12-17 2019-12-17 Feature fusion method and system based on tensor fusion and LSTM network

Publications (2)

Publication Number Publication Date
CN111160426A (en) 2020-05-15
CN111160426B (en) 2023-04-28

Family

ID=70557272

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant