CN116821842A - Feature gating network training and feature fusion method, device and storage medium


Info

Publication number
CN116821842A
Authority
CN
China
Prior art keywords
feature
sample
semantic
features
decoupling
Prior art date
Legal status
Pending
Application number
CN202310571474.0A
Other languages
Chinese (zh)
Inventor
李有儒
陈林勋
朱振峰
Current Assignee
Zhejiang eCommerce Bank Co Ltd
Original Assignee
Zhejiang eCommerce Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang eCommerce Bank Co Ltd filed Critical Zhejiang eCommerce Bank Co Ltd
Priority to CN202310571474.0A
Publication of CN116821842A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification disclose a feature gating network training method, a feature fusion method, a device, and a storage medium. Sample features of at least two feature types are acquired, and each sample feature is projected to the semantic subspace set corresponding to its feature type by the feature gating network corresponding to that type, yielding a sample decoupling feature for each sample feature; a sample fusion feature is obtained from the sample decoupling features, and the sample prediction result output by the prediction model is determined from the sample fusion feature; a prediction loss is obtained from the sample prediction result, and each feature gating network is trained until convergence. Because the feature gating networks randomly assign a subspace subset to the features of each feature type, and the subspace subsets decouple the features along multiple semantic aspects, the aspect information in each type of feature that is effective for the prediction task is extracted, enabling the subsequent effective fusion of complex multivariate features.

Description

Feature gating network training and feature fusion method, device and storage medium
Technical Field
The embodiments of this specification relate to the technical field of deep learning in artificial intelligence, and in particular to a feature gating network training and feature fusion method, device, and storage medium.
Background
With the development of big data technology, a target task is often related to multivariate information acquired from several different sources, so each kind of information is separately characterized as a feature and all the feature characterizations are integrated into a whole for use by a downstream prediction model. However, the information relationships among these pieces of information directly affect the effect of feature fusion and, in turn, the prediction accuracy of the downstream model. Therefore, a feature fusion method is needed that can effectively and accurately fuse multiple feature characterizations.
Disclosure of Invention
The embodiments of this specification provide a feature gating network training and feature fusion method, device, storage medium, and terminal, which can solve the technical problem of poor multivariate feature fusion in the related art.
In a first aspect, embodiments of the present disclosure provide a feature-gated network training method, where the method includes:
acquiring sample features of at least two feature types corresponding to a sample input, and projecting each sample feature to the semantic subspace set corresponding to its feature type based on the feature gating network corresponding to that feature type in a prediction model, to obtain the sample decoupling feature corresponding to each sample feature output by each semantic subspace set, wherein each semantic subspace set is a subspace subset randomly determined by its feature gating network from all semantic subspaces;
obtaining a sample fusion feature corresponding to the sample input based on the sample decoupling features, and determining a sample prediction result output by the prediction model based on the sample fusion feature;
and obtaining a prediction loss of the prediction model based on the standard prediction result corresponding to the sample input and the sample prediction result, and training each feature gating network based on the prediction loss until the prediction model converges.
In a second aspect, embodiments of the present disclosure provide a feature fusion method, including:
acquiring features to be fused of at least two feature types corresponding to a target input, and projecting each feature to be fused to the semantic subspace set corresponding to its feature type based on the feature gating network corresponding to that feature type in a prediction model, to obtain the decoupling feature to be fused corresponding to each feature to be fused output by each semantic subspace set;
obtaining a target fusion feature corresponding to the target input based on the decoupling features to be fused, and determining a target prediction result output by the prediction model based on the target fusion feature;
wherein the prediction model is a prediction model obtained after training to convergence by the feature gating network training method of any of the embodiments of this specification, and the feature gating network is a feature gating network in that prediction model.
In a third aspect, embodiments of the present disclosure provide a feature-gated network training device, the device comprising:
a single feature decoupling module, configured to acquire sample features of at least two feature types corresponding to a sample input, and project each sample feature to the semantic subspace set corresponding to its feature type based on the feature gating network corresponding to that feature type in the prediction model, to obtain the sample decoupling feature corresponding to each sample feature output by each semantic subspace set, wherein each semantic subspace set is a subspace subset randomly determined by its feature gating network from all semantic subspaces;
a multi-feature fusion module, configured to obtain a sample fusion feature corresponding to the sample input based on the sample decoupling features, and determine a sample prediction result output by the prediction model based on the sample fusion feature;
and a gating network training module, configured to obtain a prediction loss of the prediction model based on the standard prediction result corresponding to the sample input and the sample prediction result, and train each feature gating network based on the prediction loss until the prediction model converges.
In a fourth aspect, embodiments of the present disclosure provide a feature fusion apparatus, the apparatus comprising:
a feature decoupling module, configured to acquire features to be fused of at least two feature types corresponding to a target input, and project each feature to be fused to the semantic subspace set corresponding to its feature type based on the feature gating network corresponding to that feature type in the prediction model, to obtain the decoupling feature to be fused corresponding to each feature to be fused output by each semantic subspace set;
a feature fusion module, configured to obtain a target fusion feature corresponding to the target input based on the decoupling features to be fused, and determine a target prediction result output by the prediction model based on the target fusion feature;
wherein the prediction model is a prediction model obtained after training to convergence by the feature gating network training method of any of the embodiments of this specification, and the feature gating network is a feature gating network in that prediction model.
In a fifth aspect, embodiments of the present description provide a computer program product comprising instructions which, when run on a computer or a processor, cause the computer or the processor to perform the steps of the method described above.
In a sixth aspect, the present description provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method described above.
In a seventh aspect, embodiments of the present description provide a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being adapted to be loaded by the processor and to perform the steps of the method described above.
The technical scheme provided by some embodiments of the present specification has the following beneficial effects:
The embodiments of this specification provide a feature gating network training method: sample features of at least two feature types corresponding to a sample input are acquired, and each sample feature is projected to the semantic subspace set corresponding to its feature type based on the feature gating network corresponding to that feature type in a prediction model, obtaining the sample decoupling feature corresponding to each sample feature output by each semantic subspace set, where each semantic subspace set is a subspace subset randomly determined by its feature gating network from all semantic subspaces; a sample fusion feature corresponding to the sample input is obtained based on the sample decoupling features, and a sample prediction result output by the prediction model is determined based on the sample fusion feature; a prediction loss of the prediction model is obtained based on the standard prediction result corresponding to the sample input and the sample prediction result, and each feature gating network is trained based on the prediction loss until the prediction model converges. Because a corresponding feature gating network is set for each feature type, each feature gating network randomly assigns a subspace subset to the features of its type and decouples the multi-aspect information of those features through the subset, resolving the subspace weight imbalance that arises when subspaces are not randomly assigned; the different types of features are thus decoupled along multiple aspects, and the aspect information in each type of feature that is effective for the prediction task is acquired more accurately, facilitating the effective fusion of complex multivariate features.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an exemplary system architecture diagram of a feature gating network training method provided in an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a feature gating network training method according to an embodiment of the present disclosure;
FIG. 3 is a logic diagram of assigning a random semantic subspace set based on a feature gating network according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart of a feature gating network training method according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart of a feature fusion method according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of a feature gating network training device according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a feature fusion device according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of the present disclosure.
Detailed Description
In order to make the features and advantages of the embodiments of the present specification more comprehensible, the technical solutions in the embodiments of the present specification are described in detail below with reference to the accompanying drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present specification, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the embodiments herein.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of the embodiments of the present description as detailed in the accompanying claims.
In many real-world algorithm modeling scenarios, analysis based on internet big data shows that a target task is often related to multivariate information acquired from several different sources. The pieces of information can therefore be embedded into feature vectors by deep learning methods, and all the feature characterizations integrated into a whole for use by a downstream prediction model, completing the final target prediction task. During training of the prediction model, the accuracy of the sample feature characterizations directly influences the prediction knowledge learned by the model, and ultimately the accuracy of the prediction results.
Existing fusion methods for deep feature characterizations fall into two modes. One is bitwise addition: the same-dimension embedded vector characterizations of the features are added bit by bit, and the result is used as the feature characterization of the target to be predicted. The other is direct concatenation: the embedded vector characterizations of the features are spliced end to end to obtain the final feature characterization of the target to be predicted. In addition, to achieve a better fusion effect, on the basis of these two modes the various types of features can be weighted during concatenation or addition through a trainable mapping matrix.
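A minimal sketch of the two baseline fusion modes, assuming PyTorch tensors; the names and sizes are illustrative, not from the patent:

```python
import torch

a = torch.randn(1, 8)   # embedded characterization of one feature type (illustrative)
b = torch.randn(1, 8)   # embedded characterization of another feature type (illustrative)

fused_add = a + b                      # bitwise addition: same dimension, added bit by bit
fused_cat = torch.cat([a, b], dim=-1)  # direct concatenation: spliced end to end
```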
However, when multiple types of multivariate features are acquired for one target to be predicted, multivariate data corresponding to that target must first be acquired from multiple data sources, and the feature characterization of the target is determined by analyzing the information contained in the multivariate data, so that the model can predict accurately. Due to the diversity of data sources, the information contained in the data from each source may exhibit both compatibility and complementarity: compatibility means that information from multiple sources inevitably overlaps and appears repeatedly, while complementarity means that information unique to each source can complement the others. These information relationships directly influence the effect of feature fusion and hence the prediction accuracy of the downstream model. When the information relationships among the multivariate data are ignored, the finally fused feature characterization is very likely to be inaccurate, and an accurate prediction result cannot be obtained.
Therefore, the embodiments of this specification provide a feature gating network training method: sample features of at least two feature types corresponding to a sample input are acquired, and each sample feature is projected to the semantic subspace set corresponding to its feature type based on the feature gating network corresponding to that feature type in a prediction model, obtaining the sample decoupling feature corresponding to each sample feature output by each semantic subspace set, where each semantic subspace set is a subspace subset randomly determined by its feature gating network from all semantic subspaces; a sample fusion feature corresponding to the sample input is obtained based on the sample decoupling features, and a sample prediction result output by the prediction model is determined based on the sample fusion feature; a prediction loss of the prediction model is obtained based on the standard prediction result corresponding to the sample input and the sample prediction result, and each feature gating network is trained based on the prediction loss until convergence, so as to solve the technical problem of poor multivariate feature fusion.
Referring to fig. 1, fig. 1 is an exemplary system architecture diagram of a feature gating network training method according to an embodiment of the present disclosure.
As shown in fig. 1, the system architecture may include a terminal 101, a network 102, and a server 103. Network 102 is the medium used to provide communication links between terminal 101 and server 103. Network 102 may include various types of wired or wireless communication links; for example, a wired communication link may be an optical fiber, a twisted pair, or a coaxial cable, and a wireless communication link may be a Bluetooth communication link, a Wireless-Fidelity (Wi-Fi) communication link, a microwave communication link, or the like.
Terminal 101 may interact with server 103 via network 102 to receive messages from server 103 or to send messages to server 103, or terminal 101 may interact with server 103 via network 102 to receive messages or data sent by other users to server 103. The terminal 101 may be hardware or software. When the terminal 101 is hardware, it may be various electronic devices including, but not limited to, a smart watch, a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like. When the terminal 101 is software, it may be installed in the above-listed electronic device, and it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module, which is not specifically limited herein.
In the embodiment of the present disclosure, the terminal 101 firstly obtains sample features of at least two feature types corresponding to sample input, and projects each sample feature to a semantic subspace set corresponding to each feature type based on a feature gating network corresponding to each feature type in a prediction model, so as to obtain sample decoupling features corresponding to each sample feature output by each semantic subspace set, where the semantic subspace set is a subspace subset determined randomly by each feature gating network from all semantic subspaces; further, the terminal 101 obtains a sample fusion feature corresponding to the sample input based on each sample decoupling feature, and determines a sample prediction result output by the prediction model based on the sample fusion feature; finally, the terminal 101 obtains a prediction loss of the prediction model based on the standard prediction result and the sample prediction result corresponding to the sample input, and trains each feature gating network based on the prediction loss until convergence.
The server 103 may be a business server providing various services. The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 103 is software, it may be implemented as a plurality of software or software modules (for example, to provide a distributed service), or may be implemented as a single software or software module, which is not specifically limited herein.
Alternatively, the system architecture may not include the server 103, in other words, the server 103 may be an optional device in the embodiment of the present specification, that is, the method provided in the embodiment of the present specification may be applied to a system architecture including only the terminal 101, which is not limited in the embodiment of the present specification.
It should be understood that the number of terminals, networks, and servers in fig. 1 is merely illustrative, and any number of terminals, networks, and servers may be used as desired for implementation.
Referring to fig. 2, fig. 2 is a flow chart of a feature gating network training method according to an embodiment of the present disclosure. The execution subject of the embodiments of the present disclosure may be a terminal that executes feature-gating network training, or may be a processor in a terminal that executes a feature-gating network training method, or may be a feature-gating network training service in a terminal that executes a feature-gating network training method. For convenience of description, a specific implementation procedure of the feature gating network training method is described below by taking an example that the implementation subject is a processor in the terminal.
As shown in fig. 2, the feature gating network training method at least may include:
s202, acquiring sample features of at least two feature types corresponding to sample input, respectively projecting each sample feature to a semantic subspace set corresponding to each feature type based on a feature gating network corresponding to each feature type in a prediction model, and obtaining sample decoupling features corresponding to each sample feature output by each semantic subspace set, wherein the semantic subspace set is a subspace subset determined randomly by each feature gating network from all semantic subspaces.
Optionally, because the multivariate features originate from various data sources, complex compatibility and complementarity information that is difficult to model explicitly exists among them, along with obvious source differences in their characterizations, so the fusion performance of direct concatenation and addition is very limited. To improve the fusion of multivariate feature characterizations, the information relationships among the features must be considered: the sample features of each feature type from different data sources need to be decoupled, and the whole, complex information contained in each sample feature decoupled into aspect feature information corresponding to several semantic aspects. The prediction model can then strengthen its attention on the aspect semantic information most needed by the target prediction task without being interfered with by unimportant aspect semantic information. In the end, all the sample decoupling features together contain the various semantic information required by the target prediction task, rather than information the prediction model is forced to attend to because of redundancy and complementarity among the feature information.
Optionally, in order to decouple the feature information of the sample features of each feature type, multiple semantic aspects of the sample features may be captured by multiple different semantic-aspect expert networks. Specifically, a feature fusion sub-module is designed in the prediction model to perform multivariate feature fusion, in which a plurality of semantic subspaces decouple the sample features from different semantic aspects. The principle is similar to a Mixture of Experts (MoE) system: MoE decomposes a predictive modeling task into several sub-tasks, trains an expert model on each sub-task, selects which experts to trust according to the input to be predicted through a gating model, completes the output of the corresponding sub-tasks through the trusted expert networks, and combines the outputs of all sub-tasks into the final prediction result. Using semantic subspaces corresponding to the semantic aspects in the same way makes it possible to capture the underlying semantic information of the sample features from different semantic angles.
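The following is a minimal mixture-of-experts sketch, assuming PyTorch; the class and parameter names are illustrative, not from the patent. A gate scores the experts for each input, and the expert outputs are combined by those scores:

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim_in: int, dim_out: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim_in, dim_out) for _ in range(n_experts)])
        self.gate = nn.Linear(dim_in, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)             # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], 1)    # (batch, n_experts, dim_out)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)       # trust-weighted combination
```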
Further, when the plurality of semantic subspaces are used to decouple and remap the sample features, if every sample feature is input into every semantic subspace so that all subspaces decouple the sample features from different semantic aspects, a problem arises. When the activation function of the fully connected layer of the prediction model is the softmax function, softmax amplifies the weight differences between tasks during iteration: when the initial weights differ, softmax tends to trust the currently larger weights and then further amplifies the gap on that basis, causing a winner-take-all situation in which, after some training, the gradients of the weaker subspaces vanish. As a result, the subspaces whose semantic aspects gradually receive more attention grow stronger and are assigned very large feature weights, while the subspaces of the other semantic aspects receive weights so small that the aspect information they decouple is barely reflected in the final feature. The decoupling effect of the full set of semantic subspaces thus degrades, and the quality of the finally fused characterization drops accordingly.
Optionally, in order to solve the gradient-vanishing problem that some semantic subspaces suffer when all semantic subspaces are used to decouple every sample feature, the sample features of each feature type can instead be decoupled by a randomly chosen part of the semantic subspaces. Increasing this uncertainty during training of the prediction model achieves effective information decoupling across multiple semantic aspects, removes the limitation that weight gaps cannot be adaptively adjusted when all semantic subspaces decouple information together, and strengthens the effective fusion of the multivariate features. After decoupling, the prediction model can therefore obtain more comprehensive and accurate prediction results based on the sample decoupling features corresponding to the sample features, directly improving its final prediction performance.
Specifically, when the sample features of each feature type are decoupled by a random part of the semantic subspaces, a corresponding feature gating network can be set for each feature type, and each feature gating network completes the assignment of the semantic subspace subset for the sample features of its feature type. When a feature gating network first assigns a partial set of semantic subspaces to the sample features of its feature type, it randomly determines a subspace subset from all semantic subspaces as the semantic subspace set for that feature type, and then places all sample features of that type into this semantic subspace set for multi-aspect decoupling of their semantic information.
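A minimal sketch of the one-time random subset assignment, in plain Python; the subspace count, subset size, and feature-type names are illustrative assumptions:

```python
import random

def assign_subspace_set(n_subspaces: int, subset_size: int) -> list[int]:
    # One random draw per feature type; the chosen subset then stays fixed for that type.
    return sorted(random.sample(range(n_subspaces), subset_size))

# Independent draws for three hypothetical feature types, e.g. S_p, S_l, b.
subsets = {ftype: assign_subspace_set(8, 3) for ftype in ("S_p", "S_l", "b")}
```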
Further, for the training process of the single prediction network, firstly, sample features of at least two feature types corresponding to sample input are obtained, each sample feature is projected to a semantic subspace set corresponding to each feature type respectively based on a feature gating network corresponding to each feature type in the prediction model, and the semantic subspace set is random for each feature gating network. At this time, each semantic subspace set corresponding to each feature type can respectively perform multi-semantic-angle decoupling on the received sample features, and finally each semantic subspace set outputs sample decoupling features corresponding to each sample feature.
It should be noted that, since the feature gating network is related to the feature types received by the prediction network during the setting, the specific feature types of the multiple features to be processed need to be explicitly preset in the modeling stage of the prediction model, and the feature gating network is correspondingly set for each feature type based on the specific feature types. At this time, each time the model is trained and deployed in an actual scene, the feature types of the received sample features are all feature types preset in the model modeling.
Referring to fig. 3, fig. 3 is a logic diagram of assigning a random semantic subspace set based on a feature gating network according to an embodiment of the present disclosure. As shown in fig. 3, taking feature type S_p, feature type S_l, and feature type b as the three preset feature types, the features obtained for the sample input E are: the sample feature E(S_p) of feature type S_p, the sample feature E(S_l) of feature type S_l, and the sample feature E(b_{i,j}) of feature type b. The sample feature E(S_p) is assigned by the feature gating network g_p(·) corresponding to feature type S_p to the semantic subspace set N'_s selected from the N_s semantic subspaces, yielding the sample decoupling feature E'(S_p). Similarly, E(S_l) is assigned by the feature gating network g_l(·) corresponding to feature type S_l to its corresponding semantic subspace set, yielding the sample decoupling feature E'(S_l); and E(b_{i,j}) is assigned by the feature gating network g_b(·) corresponding to feature type b to its corresponding semantic subspace set, yielding the sample decoupling feature E'(b_{i,j}). For ease of illustration, in fig. 3 the semantic subspace set N'_s is shown shaded, and the remaining subspaces among the N_s semantic subspaces not selected into N'_s are shown with dashed lines.
S204, obtaining sample fusion characteristics corresponding to sample input based on the sample decoupling characteristics, and determining a sample prediction result output by the prediction model based on the sample fusion characteristics.
Optionally, after the sample features of each feature type are decoupled along multiple semantic aspects, the sample decoupling feature corresponding to each sample feature is obtained. At this point the sample decoupling features reflect the multi-aspect semantic information of each sample feature, so the sample fusion feature corresponding to the sample input can be obtained from the sample decoupling features, and the resulting fusion feature takes into account the information relationships among the sample features of the multiple feature types. The information coupling within each semantic aspect of each sample decoupling feature is reduced, as are the information relationships among the decoupling features themselves, so the fused sample fusion feature can more effectively represent the multivariate information of the sample input across the semantic aspects; accordingly, the sample prediction result output by the final prediction model, determined from the sample fusion feature, allows the model to predict the sample input accurately through the multivariate features.
Optionally, when obtaining the sample fusion feature corresponding to the sample input from the sample decoupling features, since the decoupling features have already completed multi-aspect information decoupling, fusion means such as concatenation or bitwise addition may be used directly in the feature fusion stage. In this process, the sample decoupling features of each feature type may further be adaptively weighted, or the weight matrix corresponding to a feature type may be manually adjusted according to the actual scenario requirements; this is not specifically limited in the embodiments of this specification.
Optionally, in a preferred embodiment, the sample decoupling features may be fused by concatenation: after the sample decoupling features are concatenated, the sample fusion feature corresponding to the sample input is obtained. Referring to fig. 3, the sample decoupling feature E'(S_p) corresponding to feature type S_p, the sample decoupling feature E'(S_l) corresponding to feature type S_l, and the sample decoupling feature E'(b_{i,j}) corresponding to feature type b are fused by concatenation to obtain the embedded representation of the final sample fusion feature of the sample input E. This sample fusion feature is then the feature representation used by the prediction model to complete the target prediction task for the sample input.
S206, obtaining the prediction loss of the prediction model based on the standard prediction result and the sample prediction result which correspond to the sample input, and training each characteristic gating network based on the prediction loss until the prediction model converges.
Optionally, in the model training process, a loss function is generally used to calculate a loss value between an output value of each training round of the model and a standard value corresponding to a sample, the model uses the loss value to adjust parameters, and the model tends to fit in a direction in which the loss value decreases, and when the loss value meets a preset target value, the training result of the model is indicated to reach the preset target. In this way, when the prediction model is trained, first, when sample characteristics of sample input are prepared, a standard prediction result corresponding to the sample input is also prepared, a prediction loss value of the prediction model can be calculated from the standard prediction result and the sample prediction result of the sample input, and the prediction model can be trained until the prediction model converges using the prediction loss value. In particular, the deviation between the sample prediction result and the standard prediction result may be calculated as the prediction loss value using a cross entropy formula.
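A minimal training sketch, assuming PyTorch and cross entropy as described; the model, data, and hyperparameters are illustrative stand-ins, not the patent's:

```python
import torch
import torch.nn as nn

# Stand-in prediction model: any model whose gating networks are registered
# submodules receives gradients from the same prediction loss.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
criterion = nn.CrossEntropyLoss()                  # deviation from the standard results
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(32, 16)                     # sample features (illustrative)
labels = torch.randint(0, 2, (32,))                # standard prediction results (illustrative)

for _ in range(100):                               # iterate until the loss meets the preset target
    loss = criterion(model(features), labels)      # prediction loss
    optimizer.zero_grad()
    loss.backward()                                # gradients also reach each gating network
    optimizer.step()
```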
Optionally, as part of the prediction model, the feature gating network corresponding to each feature type also needs to learn through training how to better decouple the feature characterizations of its feature type along multiple aspects. The prediction loss obtained by the prediction model reflects, for each feature gating network, the multi-aspect decoupling effect of the subspace set it selected for each feature. The prediction loss therefore trains not only the prediction model but also each feature gating network within it, so that each gating network adjusts its network parameters during training; when the prediction model converges, the performance of each feature gating network has likewise converged to the target state.
In an embodiment of this specification, a feature gating network training method is provided: sample features of at least two feature types corresponding to a sample input are acquired, and each sample feature is projected to the semantic subspace set corresponding to its feature type based on the feature gating network corresponding to that feature type in a prediction model, obtaining the sample decoupling feature corresponding to each sample feature output by each semantic subspace set, where each semantic subspace set is a subspace subset randomly determined by its feature gating network from all semantic subspaces; a sample fusion feature corresponding to the sample input is obtained based on the sample decoupling features, and a sample prediction result output by the prediction model is determined based on the sample fusion feature; a prediction loss of the prediction model is obtained based on the standard prediction result corresponding to the sample input and the sample prediction result, and each feature gating network is trained based on the prediction loss until the prediction model converges. Because a corresponding feature gating network is set for each feature type, each feature gating network randomly assigns a subspace subset to the features of its type and decouples the multi-aspect information of those features through the subset, resolving the subspace weight imbalance that arises when subspaces are not randomly assigned; the different types of features are thus decoupled along multiple aspects, and the aspect information in each type of feature that is effective for the prediction task is acquired more accurately, facilitating the effective fusion of complex multivariate features.
Referring to fig. 4, fig. 4 is a flow chart of a feature gating network training method according to an embodiment of the present disclosure.
As shown in fig. 4, the feature gating network training method at least may include:
s402, sample characteristics of at least two characteristic types corresponding to sample input are obtained.
Optionally, for the training process of the single prediction network, firstly, sample features of at least two feature types corresponding to sample input are obtained, where a specific feature type is a specific feature type of a multi-element feature preset in a model modeling stage, so that when the sample features of the preset feature type are input to a model, semantic information decoupling can be performed in a corresponding semantic subspace through corresponding feature gating networks.
S404, based on a feature gating network corresponding to each feature type in the prediction model, each sample feature is respectively projected to a semantic subspace set corresponding to each feature type, and each sample feature is subjected to semantic remapping through each semantic subspace in each semantic subspace set, so that sample semantic features of each sample feature under each corresponding semantic subspace are obtained.
Optionally, in order to decouple semantic information from each sample feature at multiple angles, the partial-subspace weight bias that occurs when the full set of semantic subspaces is used for semantic information decoupling must be resolved. An independent feature gating network is therefore assigned to each feature type, and each feature gating network assigns the sample features of its type to the randomly selected semantic subspace set corresponding to that type.
The semantic subspace sets all contain the same number of semantic subspaces. When a sample feature is assigned to its corresponding semantic subspaces for information decoupling, it is semantically remapped through each semantic subspace in its semantic subspace set, yielding the sample semantic feature of that sample feature under each corresponding semantic subspace; that is, a sample semantic feature is the feature representation of the corresponding semantic aspect of the sample feature obtained in one semantic subspace.
Optionally, for ease of understanding, when each sample feature is semantically remapped through each semantic subspace in its semantic subspace set to obtain the sample semantic feature of each sample feature under each corresponding semantic subspace, the sample semantic features satisfy a formula of the following form:

E^(n)(S_p) = f_n(E(S_p)), E^(n)(S_p) ∈ R^(d_s), n ∈ N'_s

wherein, taking feature type S_p as an example, the sample feature corresponding to feature type S_p is E(S_p), the semantic subspace set is N'_s, the dimension of each semantic subspace is d_s, f_n denotes the semantic remapping performed by semantic subspace n, and the sample semantic feature of E(S_p) under semantic subspace n is E^(n)(S_p). By analogy, the sample semantic features after semantic remapping by every semantic subspace in the semantic subspace set N'_s are obtained.
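A minimal sketch of the per-subspace remapping, assuming PyTorch and linear projections for the subspaces (the projection form and all sizes are assumptions for illustration):

```python
import torch
import torch.nn as nn

d, d_s, n_sub = 64, 16, 3                          # illustrative: input dim, subspace dim, |N'_s|
subspaces = nn.ModuleList([nn.Linear(d, d_s) for _ in range(n_sub)])

e_sp = torch.randn(1, d)                           # sample feature E(S_p)
semantic_feats = [f_n(e_sp) for f_n in subspaces]  # E^(n)(S_p) for each subspace n
```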
S406, fusing all sample semantic features corresponding to the sample features to obtain sample decoupling features corresponding to the sample features.
Optionally, for a sample feature, the sample semantic features are obtained after the sample feature is remapped by each semantic subspace in its semantic subspace set, and all sample semantic features corresponding to each sample feature must then be fused to obtain the sample decoupling feature corresponding to that sample feature. When fusing them, the weight distribution of the sample feature over the different semantic aspects is considered. For example, if the information of the sample features of feature type A in semantic aspect B influences the target prediction task more strongly than their information in semantic aspect C, then the weight required by the semantic subspace that decouples aspect B is larger than the weight required by the semantic subspace that decouples aspect C. This ensures that the final sample decoupling feature of each sample feature carries the semantic information that influences the target prediction task, so that the final prediction model can acquire enough important information to output an accurate prediction result.
Optionally, each semantic subspace in each semantic subspace set corresponding to each feature type has a feature weight allocated by the feature gating network corresponding to the feature type. Taking one sample feature as an example, multiplying each sample semantic feature corresponding to each semantic subspace by the feature weight of each corresponding semantic subspace to obtain a sample decoupling sub-feature corresponding to each sample semantic feature, and then respectively performing bit-wise addition calculation on all sample decoupling sub-features corresponding to each sample feature to obtain the sample decoupling feature corresponding to the sample feature.
Optionally, for ease of understanding, when all sample semantic features corresponding to each sample feature are fused to obtain the sample decoupling feature corresponding to each sample feature, the sample decoupling features satisfy the following formula:

E'(S_p) = Σ_{n ∈ N'_s} g_p(E(S_p))_n · E^(n)(S_p)

wherein the feature gating network corresponding to feature type S_p is denoted g_p(·), the feature weight g_p(E(S_p))_n assigned to semantic subspace n takes values in (0, 1), and the sample decoupling feature corresponding to the sample feature E(S_p) is E'(S_p).
Further, the feature weights assigned by the feature gating network corresponding to each feature type satisfy the following formula:

W_p = g_p(E(S_p)) = [g_p(E(S_p))_n]_{n ∈ N'_s}

wherein the feature gating network is g_p(·), and W_p is the feature weight matrix formed by the feature weights assigned to each semantic subspace in the semantic subspace set N'_s.
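Putting both formulas together, a self-contained sketch of one feature gate, again assuming PyTorch, linear subspace projections, and a softmax gate (all assumptions for illustration; the patent does not fix these forms):

```python
import torch
import torch.nn as nn

d, d_s, n_sub = 64, 16, 3                                   # illustrative sizes
subspaces = nn.ModuleList([nn.Linear(d, d_s) for _ in range(n_sub)])
g_p = nn.Linear(d, n_sub)                                   # assumed form of g_p(·)

e_sp = torch.randn(1, d)                                    # sample feature E(S_p)
W = torch.softmax(g_p(e_sp), dim=-1)                        # feature weights, each in (0, 1)
feats = torch.stack([f_n(e_sp) for f_n in subspaces], 1)    # (1, n_sub, d_s)
e_sp_dec = (W.unsqueeze(-1) * feats).sum(dim=1)             # E'(S_p): weighted bitwise sum
```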
S408, obtaining sample fusion characteristics corresponding to sample input based on the sample decoupling characteristics, and determining a sample prediction result output by the prediction model based on the sample fusion characteristics.
For step S408, please refer to the detailed description in step S204, and the detailed description is omitted here.
S4010, obtaining the prediction loss of the prediction model based on the standard prediction result and the sample prediction result which correspond to the sample input, and adjusting the feature weight matrix in each feature gating network based on the prediction loss until the prediction model converges.
Optionally, as part of the prediction model, the feature gating network corresponding to each feature type also needs to learn through training how to better decouple the feature characterizations of its feature type. The prediction loss therefore trains not only the prediction model but also each feature gating network within it, so that each gating network adjusts its own network parameters during training; when the prediction model converges, the performance of each feature gating network has likewise converged to the target state. Specifically, through continuous training, a feature gating network controls the importance of each aspect of semantic information for its feature type by adjusting the weight matrix assigned to the corresponding semantic subspace set, so that the resulting sample decoupling feature of each sample feature carries the semantic information required by the target prediction task; fusing the sample decoupling features then yields a better multivariate feature fusion effect.
In the embodiments of this specification, a feature gating network training method is provided that decouples the information of the sample features of each feature type from different data sources, decoupling the whole, complex information contained in each sample feature into aspect feature information corresponding to several semantic aspects, so that the downstream prediction model can strengthen its attention on the aspect semantic information most needed by the target prediction task without being interfered with by unimportant aspect semantic information. A corresponding feature gating network is set for each feature type, so that each feature gating network randomly assigns a subspace subset to the features of its type and decouples those features along multiple aspects through the subset. This resolves the subspace weight imbalance that arises when subspaces are not randomly assigned, enables the different types of features to be decoupled from multiple aspects, achieves effective fusion of complex multivariate features, and allows the final prediction model to acquire enough important information to output accurate prediction results.
Referring to fig. 5, fig. 5 is a flow chart of a feature fusion method according to an embodiment of the present disclosure.
As shown in fig. 5, the feature fusion method may at least include:
S502, obtaining to-be-fused features of at least two feature types corresponding to target input, respectively projecting each to-be-fused feature to a semantic subspace set corresponding to each feature type based on a feature gating network corresponding to each feature type in a prediction model, and obtaining to-be-fused decoupling features corresponding to each to-be-fused feature output by each semantic subspace set.
Optionally, in a practical scenario, a prediction model trained to convergence by the feature gating network training method of any of the above embodiments is deployed, together with the feature gating networks in that prediction model. Features to be fused of at least two feature types corresponding to a target input are then acquired, and each feature to be fused is projected to the semantic subspace set corresponding to its feature type based on the feature gating network corresponding to that feature type in the prediction model, obtaining the decoupling feature to be fused corresponding to each feature to be fused output by each semantic subspace set. After the multivariate features to be fused are information-decoupled along each semantic aspect, the deep information of each feature at multiple semantic angles is obtained; and because the feature gating networks performed random semantic subspace set draws during training, the subspace weight imbalance that would exist without that randomness has been resolved, so more effective decoupling features to be fused are finally obtained.
S504, obtaining target fusion characteristics corresponding to target input based on the decoupling characteristics to be fused, and determining a target prediction result output by the prediction model based on the target fusion characteristics.
Similarly, in a practical scenario, a target fusion feature corresponding to the target input can be obtained from the decoupling features to be fused, and the converged prediction model then outputs the corresponding target prediction result based on the target fusion feature; the specific flow is the same as the prediction flow of the prediction model trained to convergence by the feature gating network training method in any of the above embodiments, and is not repeated here. Because the prediction model obtains effective decoupling features to be fused through the feature gating networks, the target fusion feature fused from them can effectively represent the deep information of the target input across multiple semantic aspects, and a more accurate prediction result can thus be obtained.
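An illustrative deployment-time flow, assuming PyTorch; the stand-in model below is only a placeholder for the converged prediction model, which decouples, fuses, and predicts in a single forward pass:

```python
import torch
import torch.nn as nn

# Stand-in for the trained, converged prediction model (illustrative only).
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))

model.eval()                                       # deployment: no further training
with torch.no_grad():
    target_features = torch.randn(1, 16)           # features to be fused for one target input
    target_prediction = model(target_features).argmax(dim=-1)
```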
In an embodiment of this specification, a feature fusion method is provided: features to be fused of at least two feature types corresponding to a target input are acquired, each feature to be fused is projected to the semantic subspace set corresponding to its feature type based on the feature gating network corresponding to that feature type in a prediction model, and the decoupling feature to be fused corresponding to each feature to be fused output by each semantic subspace set is obtained; a target fusion feature corresponding to the target input is obtained based on the decoupling features to be fused, and a target prediction result output by the prediction model is determined based on the target fusion feature. Because a corresponding feature gating network is set for each feature type, each feature gating network randomly assigns a subspace subset to the features of its type and decouples the multi-aspect information of those features through the subset, resolving the subspace weight imbalance that arises when subspaces are not randomly assigned; the different types of features are thus decoupled from multiple aspects, and the aspect information in each type of feature that is effective for the prediction task is acquired more accurately, facilitating the effective fusion of complex multivariate features.
Referring to fig. 6, fig. 6 is a block diagram of a feature gating network training device according to an embodiment of the present disclosure. As shown in fig. 6, the feature gating network training apparatus 600 includes:
the single feature decoupling module 610 is configured to obtain sample features of at least two feature types corresponding to sample input, respectively project each sample feature to a semantic subspace set corresponding to each feature type based on a feature gating network corresponding to each feature type in the prediction model, so as to obtain sample decoupling features corresponding to each sample feature output by each semantic subspace set, where the semantic subspace set is a subspace subset determined randomly by each feature gating network from all semantic subspaces;
the multiple feature fusion module 620 is configured to obtain a sample fusion feature corresponding to the sample input based on each sample decoupling feature, and determine a sample prediction result output by the prediction model based on the sample fusion feature;
and the gating network training module 630 is configured to obtain a prediction loss of the prediction model based on the standard prediction result and the sample prediction result corresponding to the sample input, and train each feature gating network based on the prediction loss until the prediction model converges.
Optionally, the single feature decoupling module 610 is further configured to project each sample feature to a semantic subspace set corresponding to each feature type, and perform semantic remapping on each sample feature through each semantic subspace in each semantic subspace set, so as to obtain a sample semantic feature of each sample feature under each corresponding semantic subspace; and fusing all sample semantic features corresponding to the sample features to obtain sample decoupling features corresponding to the sample features.
Optionally, the single feature decoupling module 610 is further configured to semantically remap each sample feature through each semantic subspace in each semantic subspace set to obtain the sample semantic feature of each sample feature under each corresponding semantic subspace, where the sample semantic features satisfy a formula of the following form:

E^(n)(S_p) = f_n(E(S_p)), E^(n)(S_p) ∈ R^(d_s), n ∈ N'_s

wherein the feature type corresponding to the sample feature is S_p, the sample feature corresponding to feature type S_p is E(S_p), the semantic subspace set is N'_s, the dimension of each semantic subspace is d_s, and the sample semantic feature of E(S_p) under semantic subspace n is E^(n)(S_p).
Optionally, the single feature decoupling module 610 is further configured to multiply each sample semantic feature by a feature weight of each semantic subspace corresponding to each sample semantic feature to obtain a sample decoupling sub-feature corresponding to each sample semantic feature, where the feature weight is allocated by a feature gating network corresponding to each feature type; and respectively carrying out bit-wise addition calculation on all sample decoupling sub-features corresponding to each sample feature to obtain the sample decoupling features corresponding to each sample feature.
Optionally, the single feature decoupling module 610 is further configured to fuse all sample semantic features corresponding to each sample feature to obtain the sample decoupling feature corresponding to each sample feature, where the sample decoupling features satisfy the following formula:

E'(S_p) = Σ_{n ∈ N'_s} g_p(E(S_p))_n · E^(n)(S_p)

wherein the feature gating network corresponding to feature type S_p is g_p(·), and the feature weight of semantic subspace n is g_p(E(S_p))_n, yielding the sample decoupling feature E'(S_p) corresponding to the sample feature E(S_p).
Optionally, the single feature decoupling module 610 is further configured to, when the feature weights are allocated by the feature gating network corresponding to each feature type, satisfy the following formula:
$w^{p} = \big[\, w_1^{p}, \ldots, w_{|N_s'|}^{p} \,\big] = g_p\big(f^{S_p}\big)$

wherein the feature gating network is $g_p(\cdot)$, the semantic subspace set is $N_s'$, and the feature weight matrix formed by the feature weights assigned to the semantic subspaces in $N_s'$ is $w^{p}$.
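A sketch of one feature gating network $g_p$ consistent with the formula above. The softmax scoring layer and the fixed binary mask realising the randomly determined subspace subset are assumptions; the text fixes only that the subset is chosen randomly and that a weight is assigned per subspace.

```python
import torch
import torch.nn as nn

class FeatureGate(nn.Module):
    """Hypothetical gating network g_p for one feature type."""

    def __init__(self, feature_dim: int, num_subspaces: int, subset_size: int):
        super().__init__()
        self.score = nn.Linear(feature_dim, num_subspaces)
        # Randomly determine the subspace subset N_s' once, at construction.
        subset = torch.randperm(num_subspaces)[:subset_size]
        mask = torch.zeros(num_subspaces)
        mask[subset] = 1.0
        self.register_buffer("mask", mask)

    def forward(self, sample_feature: torch.Tensor) -> torch.Tensor:
        # Weights outside the assigned subset are forced to zero by
        # masking their scores before the softmax.
        scores = self.score(sample_feature)
        scores = scores.masked_fill(self.mask == 0, float("-inf"))
        return torch.softmax(scores, dim=-1)
```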
Optionally, the gating network training module 630 is further configured to adjust the feature weight matrix in each feature gating network based on the prediction loss.
Optionally, the multi-feature fusion module 620 is further configured to splice the sample decoupling features to obtain sample fusion features corresponding to the sample input.
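The splicing itself is a plain concatenation; a one-line sketch:

```python
import torch

def splice(decoupling_features: list) -> torch.Tensor:
    # Each entry: (batch, subspace_dim); result: (batch, num_types * subspace_dim)
    return torch.cat(decoupling_features, dim=-1)
```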
In an embodiment of the present disclosure, a feature gating network training device is provided. The single feature decoupling module obtains sample features of at least two feature types corresponding to a sample input and, based on the feature gating network corresponding to each feature type in the prediction model, projects each sample feature to the semantic subspace set corresponding to its feature type to obtain the sample decoupling feature output by that semantic subspace set, where each semantic subspace set is a subspace subset randomly determined by its feature gating network from all semantic subspaces. The multi-feature fusion module obtains a sample fusion feature corresponding to the sample input from the sample decoupling features and determines the sample prediction result output by the prediction model from the sample fusion feature. The gating network training module computes the prediction loss of the prediction model from the standard prediction result corresponding to the sample input and the sample prediction result, and trains each feature gating network on this loss until the prediction model converges. Because a corresponding feature gating network is set for each feature type, each network randomly assigns a subspace subset to the features of its type and decouples the multi-aspect information of those features through that subset. This resolves the subspace weight imbalance that arises when no random subspace assignment is used, enables features of different types to be decoupled from multiple aspects, and captures more accurately the aspect information in each feature type that is effective for the prediction task, thereby facilitating the effective fusion of complex multi-element features.
Referring to fig. 7, fig. 7 is a block diagram of a feature fusion device according to an embodiment of the present disclosure.
As shown in fig. 7, the feature fusion apparatus 700 includes:
the feature decoupling module 710 is configured to obtain to-be-fused features of at least two feature types corresponding to the target input, and respectively project each to-be-fused feature to a semantic subspace set corresponding to each feature type based on the feature gating network corresponding to each feature type in the prediction model, so as to obtain the to-be-fused decoupling feature corresponding to each to-be-fused feature output by each semantic subspace set;
the feature fusion module 720 is configured to obtain a target fusion feature corresponding to the target input based on each decoupling feature to be fused, and determine a target prediction result output by the prediction model based on the target fusion feature;
wherein the prediction model is a prediction model obtained after training to convergence by the feature gating network training method of any embodiment of this specification, and the feature gating network is a feature gating network in that prediction model.
In an embodiment of the present disclosure, a feature fusion device is provided. The feature decoupling module obtains to-be-fused features of at least two feature types corresponding to a target input and, based on the feature gating network corresponding to each feature type in the prediction model, projects each to-be-fused feature to the semantic subspace set corresponding to its feature type to obtain the to-be-fused decoupling feature output by that semantic subspace set. The feature fusion module obtains a target fusion feature corresponding to the target input from the to-be-fused decoupling features and determines the target prediction result output by the prediction model from the target fusion feature. Because a corresponding feature gating network is set for each feature type, each network randomly assigns a subspace subset to the features of its type and decouples the multi-aspect information of those features through that subset, resolving the subspace weight imbalance that arises when no random subspace assignment is used, enabling features of different types to be decoupled from multiple aspects, and capturing more accurately the aspect information in each feature type that is effective for the prediction task, thereby facilitating the effective fusion of complex multi-element features.
The present description provides a computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to perform the steps of the method of any of the above embodiments.
The present description also provides a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to carry out the steps of the method according to any of the embodiments described above.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a terminal according to an embodiment of the present disclosure. As shown in fig. 8, the terminal 800 may include: at least one terminal processor 801, at least one network interface 804, a user interface 803, memory 805, at least one communication bus 802.
Wherein a communication bus 802 is used to enable connected communication between these components.
The user interface 803 may include a display screen (Display) and a camera (Camera); optionally, the user interface 803 may further include a standard wired interface and a wireless interface.
The network interface 804 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Wherein the terminal processor 801 may comprise one or more processing cores. The terminal processor 801 connects various parts within the entire terminal 800 using various interfaces and lines, and performs the various functions of the terminal 800 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 805 and invoking the data stored in the memory 805. Alternatively, the terminal processor 801 may be implemented in hardware as at least one of a digital signal processor (Digital Signal Processing, DSP), a field programmable gate array (Field-Programmable Gate Array, FPGA), or a programmable logic array (Programmable Logic Array, PLA). The terminal processor 801 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and so on; the GPU is responsible for rendering and drawing the content to be displayed by the display screen; and the modem handles wireless communications. It will be appreciated that the modem may also not be integrated into the terminal processor 801 and may instead be implemented by a separate chip.
The memory 805 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). Optionally, the memory 805 includes a non-transitory computer-readable storage medium. The memory 805 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 805 may include a stored-program area and a stored-data area, where the stored-program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, and the like; and the stored-data area may store the data referred to in the above method embodiments. The memory 805 may optionally also be at least one storage device located remotely from the aforementioned terminal processor 801. As shown in fig. 8, the memory 805, as a type of computer storage medium, may include an operating system, a network communication module, a user interface module, a feature gating network training program, and/or a feature fusion program.
In the terminal 800 shown in fig. 8, the user interface 803 is mainly used for providing an input interface for a user, and acquiring data input by the user; and the terminal processor 801 may be configured to invoke the feature gating network training program stored in the memory 805 and specifically perform the following operations:
Acquiring sample characteristics of at least two characteristic types corresponding to sample input, respectively projecting each sample characteristic to a semantic subspace set corresponding to each characteristic type based on a characteristic gating network corresponding to each characteristic type in a prediction model, and obtaining sample decoupling characteristics corresponding to each sample characteristic output by each semantic subspace set, wherein the semantic subspace set is a subspace subset randomly determined from all semantic subspaces by each characteristic gating network;
obtaining sample fusion characteristics corresponding to sample input based on the sample decoupling characteristics, and determining a sample prediction result output by a prediction model based on the sample fusion characteristics;
and obtaining the prediction loss of the prediction model based on the sample input corresponding standard prediction result and the sample prediction result, and training each characteristic gating network based on the prediction loss until the prediction model converges.
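Putting the three operations above together, the following is a minimal training-step sketch. It reuses the hypothetical SemanticSubspaces, FeatureGate, fuse_semantic_features, and splice helpers sketched earlier and assumes, for illustration only, two feature types, a task-specific prediction head, and a cross-entropy loss.

```python
import torch
import torch.nn as nn

def train_step(feats_a, feats_b, labels,
               subspaces_a, subspaces_b,   # SemanticSubspaces, one per type
               gate_a, gate_b,             # FeatureGate, one per type
               head: nn.Module, optimizer,
               criterion=nn.CrossEntropyLoss()):
    # Per-type sample decoupling features: remap, gate, add element-wise.
    dec_a = fuse_semantic_features(subspaces_a(feats_a), gate_a(feats_a))
    dec_b = fuse_semantic_features(subspaces_b(feats_b), gate_b(feats_b))
    # Splice into the sample fusion feature and predict.
    fused = splice([dec_a, dec_b])
    loss = criterion(head(fused), labels)   # prediction loss
    optimizer.zero_grad()
    loss.backward()                         # gradients also reach the gates
    optimizer.step()
    return loss.item()
```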
In some embodiments, when the terminal processor 801 performs projection of each sample feature onto a semantic subspace set corresponding to each feature type, to obtain a sample decoupling feature corresponding to each sample feature output by each semantic subspace set, the following steps are specifically performed: projecting each sample feature to a semantic subspace set corresponding to each feature type, and carrying out semantic remapping on each sample feature through each semantic subspace in each semantic subspace set to obtain sample semantic features of each sample feature under each corresponding semantic subspace; and fusing all sample semantic features corresponding to the sample features to obtain sample decoupling features corresponding to the sample features.
In some embodiments, when executing the semantic remapping of each sample feature through each semantic subspace in each semantic subspace set, the terminal processor 801 obtains a sample semantic feature of each sample feature in each corresponding semantic subspace, where the sample semantic feature satisfies the following formula:
$f_n^{S_p} = W_n\, f^{S_p}, \qquad n \in N_s'$

wherein the feature type corresponding to the sample feature is $S_p$, the sample feature of feature type $S_p$ is $f^{S_p}$, the semantic subspace set is $N_s'$, the dimension of each semantic subspace is $d_s$, $W_n$ denotes the projection into the $d_s$-dimensional semantic subspace $n$, and the sample semantic feature of $f^{S_p}$ under semantic subspace $n$ is $f_n^{S_p}$.
In some embodiments, when the terminal processor 801 performs fusion of all sample semantic features corresponding to each sample feature to obtain the sample decoupling feature corresponding to each sample feature, the following steps are specifically performed: multiplying each sample semantic feature by the feature weight of the semantic subspace corresponding to that sample semantic feature to obtain a sample decoupling sub-feature corresponding to each sample semantic feature, where the feature weights are assigned by the feature gating network corresponding to each feature type; and adding, element-wise, all sample decoupling sub-features corresponding to each sample feature to obtain the sample decoupling feature corresponding to that sample feature.
In some embodiments, when the terminal processor 801 performs fusion of all sample semantic features corresponding to each sample feature to obtain a sample decoupling feature corresponding to each sample feature, the sample decoupling feature satisfies the following formula:
$\hat{f}^{S_p} = \sum_{n \in N_s'} w_n^{p}\, f_n^{S_p}$

wherein the feature gating network corresponding to feature type $S_p$ is $g_p(\cdot)$, the feature weight of semantic subspace $n$ is $w_n^{p}$, and the sample decoupling feature obtained for sample feature $f^{S_p}$ is $\hat{f}^{S_p}$.
In some embodiments, the terminal processor 801, when performing the allocation of feature weights by the feature gating network corresponding to each feature type, satisfies the following equation:
$w^{p} = \big[\, w_1^{p}, \ldots, w_{|N_s'|}^{p} \,\big] = g_p\big(f^{S_p}\big)$

wherein the feature gating network is $g_p(\cdot)$, the semantic subspace set is $N_s'$, and the feature weight matrix formed by the feature weights assigned to the semantic subspaces in $N_s'$ is $w^{p}$.
In some embodiments, the terminal processor 801, when performing training of feature gating networks based on predictive loss, specifically performs the following steps: and adjusting a feature weight matrix in each feature gating network based on the prediction loss.
In some embodiments, the terminal processor 801, when executing the sample fusion feature corresponding to the sample input based on each sample decoupling feature, specifically performs the following steps: and splicing the sample decoupling characteristics to obtain sample fusion characteristics corresponding to the sample input.
In the terminal 800 shown in fig. 8, the user interface 803 is mainly used for providing an input interface for a user, and acquiring data input by the user; the terminal processor 801 may also be used to invoke a feature fusion program stored in the memory 805 and specifically perform the following operations:
acquiring to-be-fused features of at least two feature types corresponding to target input, respectively projecting each to-be-fused feature to a semantic subspace set corresponding to each feature type based on a feature gating network corresponding to each feature type in a prediction model, and obtaining to-be-fused decoupling features corresponding to each to-be-fused feature output by each semantic subspace set;
obtaining target fusion characteristics corresponding to target input based on the decoupling characteristics to be fused, and determining a target prediction result output by the prediction model based on the target fusion characteristics;
wherein the prediction model is a prediction model obtained after training to convergence by the feature gating network training method of any embodiment of this specification, and the feature gating network is a feature gating network in that prediction model.
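For completeness, the matching inference-time sketch: the gating networks and subspaces trained as above are reused as-is, and only a forward pass is run (same hypothetical helpers as before).

```python
import torch

@torch.no_grad()
def predict(feats_a, feats_b, subspaces_a, subspaces_b, gate_a, gate_b, head):
    dec_a = fuse_semantic_features(subspaces_a(feats_a), gate_a(feats_a))
    dec_b = fuse_semantic_features(subspaces_b(feats_b), gate_b(feats_b))
    fused = splice([dec_a, dec_b])      # target fusion feature
    return head(fused)                  # target prediction result
```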
In the several embodiments provided in this specification, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product described above includes one or more computer instructions. When the computer program instructions described above are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present specification are all or partially produced. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (Digital Subscriber Line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage media may be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a digital versatile Disk (Digital Versatile Disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the embodiments are not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the embodiments described in the specification.
In addition, it should be further noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.), and signals according to the embodiments of the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The foregoing describes a feature gating network training and feature fusion method, device and storage medium according to embodiments of the present disclosure. It is not to be construed as limiting those embodiments, since those skilled in the art may, following the ideas of the embodiments, make modifications in both implementation and scope of application.

Claims (14)

1. A method of feature gating network training, the method comprising:
acquiring sample characteristics of at least two characteristic types corresponding to sample input, respectively projecting each sample characteristic to a semantic subspace set corresponding to each characteristic type based on a characteristic gating network corresponding to each characteristic type in a prediction model, and obtaining sample decoupling characteristics corresponding to each sample characteristic output by each semantic subspace set, wherein the semantic subspace set is a subspace subset randomly determined from all semantic subspaces by each characteristic gating network;
obtaining sample fusion characteristics corresponding to the sample input based on the sample decoupling characteristics, and determining a sample prediction result output by the prediction model based on the sample fusion characteristics;
and obtaining the prediction loss of the prediction model based on the standard prediction result and the sample prediction result corresponding to the sample input, and training each characteristic gating network based on the prediction loss until the prediction model converges.
2. The method of claim 1, wherein projecting each sample feature onto a semantic subspace set corresponding to each feature type to obtain a sample decoupling feature corresponding to each sample feature output by each semantic subspace set, includes:
projecting each sample feature to a semantic subspace set corresponding to each feature type, and carrying out semantic remapping on each sample feature through each semantic subspace in each semantic subspace set to obtain sample semantic features of each sample feature under each corresponding semantic subspace;
and fusing all sample semantic features corresponding to the sample features to obtain sample decoupling features corresponding to the sample features.
3. The method of claim 2, wherein the semantic remapping is performed on each sample feature through each semantic subspace in each semantic subspace set, so as to obtain a sample semantic feature of each sample feature in each corresponding semantic subspace, where the sample semantic feature satisfies the following formula:
$f_n^{S_p} = W_n\, f^{S_p}, \qquad n \in N_s'$

wherein the feature type corresponding to the sample feature is $S_p$, the sample feature of the feature type $S_p$ is $f^{S_p}$, the semantic subspace set is $N_s'$, the dimension of each semantic subspace is $d_s$, $W_n$ denotes the projection into the $d_s$-dimensional semantic subspace $n$, and the sample semantic feature of the sample feature $f^{S_p}$ under semantic subspace $n$ is $f_n^{S_p}$.
4. The method according to claim 3, wherein the fusing all sample semantic features corresponding to each sample feature to obtain a sample decoupling feature corresponding to each sample feature includes:
multiplying each sample semantic feature by the feature weight of each semantic subspace corresponding to each sample semantic feature to obtain a sample decoupling sub-feature corresponding to each sample semantic feature, wherein the feature weights are distributed by a feature gating network corresponding to each feature type;
and respectively adding, element-wise, all sample decoupling sub-features corresponding to each sample feature to obtain the sample decoupling feature corresponding to each sample feature.
5. The method of claim 4, wherein the fusing is performed on all sample semantic features corresponding to each sample feature to obtain sample decoupling features corresponding to each sample feature, and the sample decoupling features satisfy the following formula:
$\hat{f}^{S_p} = \sum_{n \in N_s'} w_n^{p}\, f_n^{S_p}$

wherein the feature gating network corresponding to the feature type $S_p$ is $g_p(\cdot)$, the feature weight of semantic subspace $n$ is $w_n^{p}$, and the sample decoupling feature obtained for the sample feature $f^{S_p}$ is $\hat{f}^{S_p}$.
6. The method of claim 5, wherein the feature weights when assigned by the feature gating network corresponding to each feature type satisfy the following equation:
$w^{p} = \big[\, w_1^{p}, \ldots, w_{|N_s'|}^{p} \,\big] = g_p\big(f^{S_p}\big)$

wherein the feature gating network is $g_p(\cdot)$, the semantic subspace set is $N_s'$, and the feature weight matrix formed by the feature weights assigned to the semantic subspaces in $N_s'$ is $w^{p}$.
7. The method of claim 6, the training feature gating networks based on the predicted loss, comprising:
and adjusting a feature weight matrix in each feature gating network based on the prediction loss.
8. The method of claim 1, the deriving sample fusion features corresponding to the sample inputs based on each sample decoupling feature, comprising:
and splicing the sample decoupling characteristics to obtain sample fusion characteristics corresponding to the sample input.
9. A method of feature fusion, the method comprising:
acquiring to-be-fused features of at least two feature types corresponding to target input, respectively projecting each to-be-fused feature to a semantic subspace set corresponding to each feature type based on a feature gating network corresponding to each feature type in a prediction model, and obtaining to-be-fused decoupling features corresponding to each to-be-fused feature output by each semantic subspace set;
obtaining target fusion characteristics corresponding to the target input based on each decoupling characteristic to be fused, and determining a target prediction result output by the prediction model based on the target fusion characteristics;
wherein the prediction model is a prediction model obtained after training to convergence according to the feature gating network training method of any one of claims 1 to 8, and the feature gating network is a feature gating network in that prediction model.
10. A feature-gated network training device, the device comprising:
the single feature decoupling module is used for acquiring sample features of at least two feature types corresponding to sample input, respectively projecting each sample feature to a semantic subspace set corresponding to each feature type based on a feature gating network corresponding to each feature type in the prediction model to obtain sample decoupling features corresponding to each sample feature output by each semantic subspace set, wherein the semantic subspace set is a subspace subset determined randomly by each feature gating network from all semantic subspaces;
the multi-feature fusion module is used for obtaining sample fusion features corresponding to the sample input based on each sample decoupling feature, and determining a sample prediction result output by the prediction model based on the sample fusion features;
and the gating network training module is used for obtaining the prediction loss of the prediction model based on the standard prediction result corresponding to the sample input and the sample prediction result, and training each characteristic gating network based on the prediction loss until the prediction model converges.
11. A feature fusion device, the device comprising:
the feature decoupling module is used for acquiring to-be-fused features of at least two feature types corresponding to target input, and respectively projecting each to-be-fused feature to a semantic subspace set corresponding to each feature type based on a feature gating network corresponding to each feature type in the prediction model to obtain to-be-fused decoupling features corresponding to each to-be-fused feature output by each semantic subspace set;
the feature fusion module is used for obtaining target fusion features corresponding to the target input based on the decoupling features to be fused and determining a target prediction result output by the prediction model based on the target fusion features;
wherein the prediction model is a prediction model obtained after training to convergence according to the feature gating network training method of any one of claims 1 to 8, and the feature gating network is a feature gating network in that prediction model.
12. A computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to perform the steps of the method of any of claims 1 to 8 or 9.
13. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method according to any one of claims 1 to 8 or 9.
14. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1 to 8 or 9 when the program is executed.
CN202310571474.0A 2023-05-18 2023-05-18 Feature gating network training and feature fusion method, device and storage medium Pending CN116821842A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310571474.0A CN116821842A (en) 2023-05-18 2023-05-18 Feature gating network training and feature fusion method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310571474.0A CN116821842A (en) 2023-05-18 2023-05-18 Feature gating network training and feature fusion method, device and storage medium

Publications (1)

Publication Number Publication Date
CN116821842A true CN116821842A (en) 2023-09-29

Family

ID=88119416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310571474.0A Pending CN116821842A (en) 2023-05-18 2023-05-18 Feature gating network training and feature fusion method, device and storage medium

Country Status (1)

Country Link
CN (1) CN116821842A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination