CN116127175A - Mobile application classification and recommendation method based on multi-modal feature fusion - Google Patents

Mobile application classification and recommendation method based on multi-modal feature fusion

Info

Publication number
CN116127175A
Authority
CN
China
Prior art keywords
mobile application
layer
features
model
embedded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210751368.6A
Other languages
Chinese (zh)
Inventor
曹步清
钟为是
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Science and Technology
Original Assignee
Hunan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Science and Technology filed Critical Hunan University of Science and Technology
Priority to CN202210751368.6A priority Critical patent/CN116127175A/en
Publication of CN116127175A publication Critical patent/CN116127175A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a mobile application classification and recommendation method based on multi-modal feature fusion, which comprises the following steps: (1) a mobile application feature extraction layer; (2) a mobile application classification layer; (3) a mobile application recommendation layer. The invention belongs to the technical field of computer networks and specifically relates to a mobile application classification and recommendation method based on multi-modal feature fusion, which achieves better recommendation precision and quality and outperforms other methods on indexes such as Macro F1, Accuracy, AUC, and Logloss.

Description

Mobile application classification and recommendation method based on multi-modal feature fusion
Technical Field
The invention belongs to the technical field of computer networks, and specifically relates to a mobile application classification and recommendation method based on multi-modal feature fusion.
Background
According to Statista, the number of mobile applications in China was close to 3.99 million by 2021, ranking first worldwide. Rich applications for e-commerce, online takeout, games, self-media and the like touch every aspect of daily life and have changed how people live. In recent years, the number of mobile applications on the internet has grown exponentially. Facing these massive numbers of mobile applications, although a large amount of sample data is already available for training, problems such as cold start and data sparsity remain when new data must be processed. The main problem in training a model with the existing large-scale classified data samples is selecting a proper model. When a new mobile application appears, it contains information such as pictures, descriptions and publisher details. On the one hand, practitioners find it difficult to benchmark and analyze the mobile application market as a whole, so mobile applications need to be accurately classified to support subsequent tasks such as risk control and data analysis; on the other hand, users find it difficult to select a mobile application suited to their own personalized preferences and needs. It is therefore necessary to provide a high-quality mobile application recommendation mechanism to improve the user experience.
In traditional mobile application classification methods, such as the multi-layer perceptron and the support vector machine, the performance of most classification models depends on the quality of the labeled dataset, and acquiring high-quality labeled data requires a great deal of labor. Moreover, these methods depend on manual feature design, are affected by human factors, and generalize poorly: features that work well in one domain do not necessarily work in others. Traditional mobile application recommendation methods, such as collaborative filtering and matrix factorization, generally convert the mobile application recommendation problem into a supervised learning problem. Essentially, such models first embed users and applications separately, then use the interaction information between them to optimize the model and produce recommendations. These methods perform well in many recommendation and ranking tasks. However, they also suffer from drawbacks: for example, they are sensitive to sparse data, have limited predictive power for new users, and learn only linear interactions between users and services.
With the growth of multimodal data on networks, content information from different modalities (visual, auditory, etc.) has recently been used to provide complementary feature signals for traditional text features. Most existing research in this area focuses on emotion classification in conversations. Specifically, Poria et al. proposed a multi-kernel learning method and an LSTM-based sequential architecture in 2015 and 2017, respectively, to fuse text, visual, and audio features. Following this work, Zadeh et al. further designed tensor fusion networks and memory fusion networks to better capture interactions between different modalities. However, these approaches are designed for coarse-grained classification and may not be effective for our fine-grained, object-oriented mobile application classification.
Disclosure of Invention
In order to solve the above problems, the invention provides a mobile application classification and recommendation method based on multi-modal feature fusion, which achieves better recommendation precision and quality and outperforms other methods on indexes such as Macro F1, Accuracy, AUC, and Logloss.
In order to realize these functions, the technical scheme adopted by the invention is as follows: a mobile application classification and recommendation method based on multi-modal feature fusion comprises the following steps:
(1) Mobile application feature extraction layer
Extract a set of multimodal samples D from the mobile application dataset; each sample c ∈ D comprises a sentence S of n mobile application description words (w1, …, wn) and an associated mobile application image I. D is taken as the training corpus for training a mobile application classifier that must correctly predict the class labels of mobile applications in unseen samples. After initial normalization and self-coding tokenization preprocessing, a BERT model in the feature extraction layer extracts the mobile application description features, and a residual network built from involution modules (RedNet) extracts the image features;
(2) Mobile application classification layer
Distinguish and fuse the feature importance of the different modalities using the self-attention and multi-head attention mechanisms of a Transformer, and classify the mobile application from the fused feature information with a Softmax classifier;
(3) Mobile application recommendation layer
Input the classified data, by category, into a FiBiNet model, which dynamically learns the importance of features by fitting the relation between features and samples with weights; more important features are given more weight, while the weights of non-critical features are weakened. A bilinear operation considers the importance of each dimension simultaneously to complete mobile application recommendation. The upper half of the FiBiNet model is the deep part: an MLP network concatenates the outputs of the bilinear interaction layer into a dense vector through a concatenation layer, then feeds the crossed combination features into the neural network to obtain a prediction score at the prediction layer. The lower, shallow part is the core of FiBiNet and mainly processes the input features.
Further, extracting the description features in step 1 includes the following steps:
A pre-trained bidirectional encoder, BERT, is selected as the initial model, and its parameters are adjusted and learned by fine-tuning. A multi-head self-attention layer converts each position in the input sequence into a weighted sum of the input layer. Specifically, for the i-th attention head, the input $X \in \mathbb{R}^{d \times N}$ is transformed by the scaled dot-product attention mechanism:
$$\mathrm{Att}_i(X) = W_{V_i} X \cdot \mathrm{softmax}\!\left(\frac{(W_{K_i} X)^{\top} (W_{Q_i} X)}{\sqrt{d/m}}\right)$$
where $\{W_{Q_i}, W_{K_i}, W_{V_i}\} \in \mathbb{R}^{d/m \times d}$ are learnable parameters corresponding to the query, key, and value, respectively; the outputs of the m attention heads are then concatenated and linearly transformed;
The description information of each mobile application is characterized in a self-coding manner and input into the pre-trained BERT. In addition to the word tokens, a special classification token ([CLS]) is inserted at the beginning of each input sequence; the output of the last Transformer layer at this token aggregates the characterization information of the whole sequence, and the [CLS] vector together with the extracted semantic vector is kept as the output O to improve model accuracy:
$$O = [H_0, H_{[CLS]}]$$
The output O is then linearly transformed through a Softmax function to obtain the final $d \times N$-dimensional characterization vector $H_S$ of the mobile application's text information.
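For illustration, the text-feature step can be sketched with the HuggingFace transformers package as follows. The bert-base-chinese checkpoint, the 128-token limit, and mean pooling for the semantic vector are illustrative assumptions, not choices fixed by the patent:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def text_features(description):
    # Tokenize the description and run it through the pre-trained encoder
    inputs = tokenizer(description, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
    cls_vec = hidden[:, 0, :]               # H_[CLS], the [CLS] vector
    sem_vec = hidden[:, 1:, :].mean(dim=1)  # pooled semantic vector (assumed)
    return torch.cat([cls_vec, sem_vec], dim=-1)  # output O = [H_0, H_[CLS]]

h_s = text_features("一款轻量级的在线购物应用")  # hypothetical description text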
Further, the image feature extraction in step 1 includes the following steps:
The involution kernel $\mathcal{H}_{i,j} \in \mathbb{R}^{K \times K \times G}$ is generated by a function φ conditioned on the single pixel at (i, j) and then rearranged from the channel dimension to the spatial dimension. The closed multiply-add operation is decomposed into two steps: the multiplication multiplies the tensors of the C channels element-wise with the kernel $\mathcal{H}$, and the addition aggregates the elements within the kernel's spatial range. The kernel is tailored to the pixel $X_{i,j}$ at the corresponding coordinate (i, j) but shared across channels; G counts the number of kernel groups, within each of which the kernel is shared. Performing the multiply-add on the input with this kernel gives the characterization output of the involution module:
$$Y_{i,j,k} = \sum_{(u,v) \in \Delta_K} \mathcal{H}_{i,j,u+\lfloor K/2 \rfloor,v+\lfloor K/2 \rfloor,\lceil kG/C \rceil}\, X_{i+u,j+v,k}$$
The kernel generation function is denoted φ, and the function mapping for each location (i, j) is abstracted as:
$$\mathcal{H}_{i,j} = \phi(X_{\Psi_{i,j}})$$
The mobile application image I in the dataset is input into the visual model RedNet-152 to obtain the output of the final convolutional layer:
$$\mathrm{RedNet}(I) = \{r_j \mid r_j \in \mathbb{R}^{2048},\ j = 1, 2, \ldots, 49\}$$
The original mobile application image is segmented into 7×7 = 49 regions, each represented by a 2048-dimensional vector $r_j$. The mobile application visual features are projected into the same space as the text features using a linear transformation $G = W_v\,\mathrm{RedNet}(I)$, where $W_v \in \mathbb{R}^{d \times 2048}$ is a learnable parameter; the output $\mathrm{RedNet}(I)$ is then linearly transformed through a Softmax function, yielding the final characterization vector $G$ of the mobile application image information.
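A minimal PyTorch sketch of the involution operator described above is given below; realizing φ as two pointwise convolutions follows the RedNet design, while the kernel size, group count, and reduction ratio are assumed values:

import torch
import torch.nn as nn

class Involution2d(nn.Module):
    # Minimal involution operator: the kernel is generated from each pixel
    # by phi (two pointwise convs here) and shared across G channel groups.
    def __init__(self, channels, kernel_size=7, groups=4, reduction=4):
        super().__init__()
        self.k, self.g = kernel_size, groups  # groups must divide channels
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)
        self.span = nn.Conv2d(channels // reduction, kernel_size ** 2 * groups, 1)
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        # phi(X_{i,j}): one K*K kernel per pixel and per group
        kernel = self.span(self.reduce(x)).view(b, self.g, 1, self.k ** 2, h, w)
        # gather the K*K neighbourhood of every pixel (the "multiply" operands)
        patches = self.unfold(x).view(b, self.g, c // self.g, self.k ** 2, h, w)
        # multiply-add: weight the neighbourhood, then sum over kernel positions
        return (kernel * patches).sum(dim=3).view(b, c, h, w)

out = Involution2d(64)(torch.randn(1, 64, 56, 56))  # -> (1, 64, 56, 56)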
Further, the Transformer formula in step 2 is as follows:
$$H^{(l)} = \mathrm{Encoder}\big(H^{(l-1)}\big), \quad l = 1, \ldots, L_m$$
where $L_m$ is the number of layers of the multimodal encoder; the final hidden state at the "[CLS]" token is used for the mobile application classification task, effectively capturing dynamic attention within and across the modalities of the mobile application.
The Softmax function is as follows:
$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
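A minimal PyTorch sketch of this fusion-and-classification step follows; the hidden width, head count, layer count, and class count are assumed values, since the patent fixes none of them:

import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    # Fuse text and image token sequences with a Transformer encoder and
    # classify from the [CLS] position.
    def __init__(self, d_model=768, n_heads=8, n_layers=6, n_classes=20):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))  # [CLS] embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # L_m layers
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, text_tokens, image_tokens):
        # text_tokens: (B, n, d); image_tokens: (B, 49, d)
        b = text_tokens.size(0)
        seq = torch.cat([self.cls.expand(b, -1, -1),
                         text_tokens, image_tokens], dim=1)
        hidden = self.encoder(seq)  # attention within and across modalities
        return torch.softmax(self.head(hidden[:, 0]), dim=-1)  # class probs

probs = MultimodalClassifier()(torch.randn(2, 30, 768), torch.randn(2, 49, 768))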
further, the shallow layer part in the step 3 includes the following steps:
the classified mobile application is input into an initial embedding layer in the FibiNet according to the category, sparse features can be embedded into low-dimensional continuous real-valued vectors, the sparse matrix is converted into a dense matrix through linear transformation, hidden features of the matrix are extracted, and generalization capability of the model is improved. The output of the embedded layer is expressed as follows:
$$E = [e_1, e_2, \ldots, e_i, \ldots, e_f]$$
A SENET network is then introduced for training and learning, obtaining the embedding weights and outputting the final embedding result. A dimension-reduction (squeeze) operation on the embedded features from the embedding layer yields global features. A Sigmoid activation is then applied, and the relations among the embeddings are learned to obtain the embedding weights of the different fields. Finally, these weights are multiplied with the original embeddings to obtain the final embedding result.
Further, the dimension reduction includes the following steps:
The original embedding E is compressed into a statistical vector $Z = [z_1, \ldots, z_i, \ldots, z_f]$ using an average pooling operation, in which $z_i$ is computed by the following formula:
$$z_i = F_{sq}(e_i) = \frac{1}{k}\sum_{t=1}^{k} e_i^{(t)}$$
where $z_i$ is the global information about the i-th feature representation and k is the embedding size.
Further, the activating includes the steps of:
The embedding weights of each field are learned from the statistical vector Z using two fully connected layers. The first fully connected layer performs dimension reduction with parameter $W_1$, using $\sigma_1$ as a nonlinear function; the second fully connected layer restores the original dimension with parameter $W_2$. Formally, the field embedding weights are computed as follows:
$$A = \sigma_2\big(W_2\, \sigma_1(W_1 Z)\big)$$
where $A = [a_1, \ldots, a_f]$ is the weight vector, and $\sigma_1$ and $\sigma_2$ are activation functions.
Further, the re-weighting includes the steps of:
Each field of the embedding layer is multiplied by the corresponding weight to obtain the final embedding result $V = \{v_1, \ldots, v_f\}$. The whole operation can be seen as learning a weight coefficient for each field embedding, which makes the model more discriminative across field embeddings. The SENET mechanism increases the weights of important features and reduces the weights of uninformative ones, giving the output V of the SENET layer, expressed as follows:
$$V = [a_1 \cdot e_1, \ldots, a_f \cdot e_f] = [v_1, \ldots, v_f]$$
After obtaining the mobile application characterization embeddings from the initial embedding layer and the SENET layer, second-order and higher-order feature interactions are performed on the sparse and dense features;
The interaction vectors p and q are computed from the output E of the embedding layer and the output V of the SENET layer:
$$p_{ij} = v_i \cdot W_{ij} \odot v_j$$
$$p = [p_1, \ldots, p_i, \ldots, p_n]$$
$$q = [q_1, \ldots, q_i, \ldots, q_n]$$
The two interaction vectors obtained are concatenated and input into the deep part.
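The shallow part can be sketched in PyTorch as follows. For brevity, only one interaction branch over the SENET output V is shown, with a single shared bilinear matrix W; the patent's per-pair matrices W_ij and the parallel p branch over E follow the same pattern:

import torch
import torch.nn as nn

class SenetBilinear(nn.Module):
    # Shallow part of FiBiNet (sketch): squeeze -> excitation -> re-weight,
    # then bilinear interaction over the re-weighted field embeddings.
    def __init__(self, num_fields, emb_dim, reduction=3):
        super().__init__()
        self.w1 = nn.Linear(num_fields, num_fields // reduction)   # reduce
        self.w2 = nn.Linear(num_fields // reduction, num_fields)   # restore
        self.w_bilinear = nn.Linear(emb_dim, emb_dim, bias=False)  # shared W

    def forward(self, e):
        # e: (B, f, k) field embeddings E from the initial embedding layer
        z = e.mean(dim=2)                                   # squeeze: (B, f)
        a = torch.sigmoid(self.w2(torch.relu(self.w1(z))))  # excitation: A
        v = e * a.unsqueeze(-1)                             # re-weight: V
        interactions = []
        for i in range(v.size(1)):
            for j in range(i + 1, v.size(1)):
                # q_ij = (v_i W) * v_j, one interaction vector per field pair
                interactions.append(self.w_bilinear(v[:, i]) * v[:, j])
        return torch.cat(interactions, dim=1)  # concatenated input to deep part

q = SenetBilinear(num_fields=10, emb_dim=16)(torch.randn(8, 10, 16))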
Further, the deep-part calculation formula in step 3 is as follows:
$$\hat{y} = \sigma\!\left(w_0 + \sum_{i=1}^{m} w_i x_i + y_d\right)$$
where $\hat{y} \in (0, 1)$ is the model's predicted mobile application recommendation value, σ is the sigmoid function, m is the feature size, $y_d$ is the output of the deep part, and the remaining terms form the linear regression part;
Logloss is used as the model's recommendation optimization objective function:
$$\mathrm{Logloss} = -\frac{1}{N}\sum_{i=1}^{N}\Big(y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big)$$
where $y_i$ is the actual label of the i-th mobile application, $\hat{y}_i$ is the predicted label of the i-th mobile application, and N is the total number of mobile applications.
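A minimal PyTorch sketch of the deep part and the Logloss objective follows; the hidden width and input sizes are assumed, and BCELoss is used because it implements exactly the Logloss formula above:

import torch
import torch.nn as nn

class DeepPart(nn.Module):
    # Deep part of FiBiNet (sketch): an MLP over the concatenated interaction
    # vectors plus a first-order linear term, squashed by a sigmoid.
    def __init__(self, interaction_dim, linear_dim, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(interaction_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))
        self.linear = nn.Linear(linear_dim, 1)  # w_0 + sum_i w_i x_i

    def forward(self, interactions, raw_features):
        y_d = self.mlp(interactions)                            # deep output
        return torch.sigmoid(y_d + self.linear(raw_features))  # y_hat in (0,1)

model = DeepPart(interaction_dim=720, linear_dim=39)
y_hat = model(torch.randn(8, 720), torch.randn(8, 39))
loss = nn.BCELoss()(y_hat, torch.randint(0, 2, (8, 1)).float())  # Logloss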
With the above structure, the invention obtains the following beneficial effects:
1. The residual network built from involution modules is introduced into mobile application image feature extraction for the first time, which helps attend to local features in mobile application logo images and improves image feature extraction performance;
2. The attention mechanism learns the dynamic importance of features of different modalities and learns feature interactions at fine granularity, improving the accuracy of service classification and recommendation;
3. The method provided by the invention outperforms all comparison models on Macro F1, Accuracy, AUC and Logloss.
Drawings
FIG. 1 is a method framework diagram of a mobile application classification and recommendation method based on multi-modal feature fusion provided by the invention;
FIG. 2 is a diagram of a FiBiNet model of the mobile application classification and recommendation method based on multimodal feature fusion provided by the invention;
FIG. 3 is a mobile application classification Accuracy chart of the mobile application classification and recommendation method based on multi-modal feature fusion provided by the invention;
FIG. 4 is a Macro-F1 chart of the mobile application classification and recommendation method based on multi-modal feature fusion provided by the invention;
FIG. 5 is a mobile application recommendation Logloss chart of the mobile application classification and recommendation method based on multi-modal feature fusion provided by the invention;
FIG. 6 is a mobile application recommendation AUC chart of the mobile application classification and recommendation method based on multi-modal feature fusion provided by the invention.
Detailed Description
The following description of the embodiments of the present invention is made clearly and fully with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The present invention will be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the mobile application classification and recommendation method based on multi-modal feature fusion provided by the invention mainly comprises three parts: (1) the mobile application feature extraction layer, which extracts features from the images and description information of the mobile application; (2) the mobile application classification layer, which uses the self-attention and multi-head attention mechanisms of a Transformer to distinguish and fuse the feature importance of the different modalities, and classifies mobile applications from the fused feature information with a Softmax classifier; (3) the mobile application recommendation layer, which inputs the classified data, by category, into a FiBiNet model that dynamically learns the importance of features by fitting the relation between features and samples with weights. More important features are given more weight, while the weights of non-critical features are weakened; a bilinear operation considers the importance of each dimension simultaneously to complete mobile application recommendation.
As shown in FIG. 2, the upper half of the bilinear feature interaction model (FiBiNet) is the deep part: an MLP network concatenates the outputs of the bilinear interaction layer into a dense vector through a concatenation layer, then feeds the crossed combination features into the neural network to obtain a prediction score at the prediction layer. The lower, shallow part is the core of FiBiNet and mainly processes the input features. First, in the lower-left part of the figure, the high-dimensional sparse input features (sparse features of the APP) are mapped to low-dimensional dense vector representations by the initial embedding layer, and these embeddings then pass through a SENET layer that dynamically learns feature importance, producing the SENET-Like embedding. The initial characterization embedding and the SENET-Like embedding are each input into the bilinear interaction layer for feature crossing, and the output crossed features are finally input into the MLP to complete mobile application recommendation.
Specific example 1:
1. Experimental data
The top 5, 10, 15, and 20 categories with the largest number of mobile applications were selected as experimental data; the distribution of the top 20 categories is shown in Table 1. 60% of the experimental data was used as the training set, 20% as the validation set, and 20% as the test set.
Table 1 Kaggle dataset information
2. Mobile application classification experiment and analysis
(1) Evaluation index
To evaluate the effectiveness of mobile application classification, two commonly used evaluation criteria were used in the experiment, namely Macro F1 and Accuracy.
Accuracy: the ratio of the number of correct judgments to the total number of judgments. The number of correct judgments is the sum of true positives (TP) and true negatives (TN); the total number of judgments is the sum of the four possible outcomes (false positives FP, false negatives FN, true positives TP, true negatives TN). Accuracy is computed as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
macro F1: by calculating the recall rate (Rec i ) And accuracy (Pre) i ) Get the average recall of all N categories (Rec ma ) And average accuracy (Pre) ma ) Finally, macro F1 is calculated. Wherein, recall rate Rec i Describing the proportion of correctly classified mobile applications to all such mobile applications; accuracy Pre i The proportion of mobile applications that do belong to that class in the final classification result of the description model. Macro F1 is a harmonic mean of recall and accuracy, and the calculation formula is as follows:
$$\mathrm{Pre}_{ma} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{Pre}_i, \qquad \mathrm{Rec}_{ma} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{Rec}_i, \qquad \mathrm{Macro\ F1} = \frac{2\,\mathrm{Pre}_{ma}\,\mathrm{Rec}_{ma}}{\mathrm{Pre}_{ma} + \mathrm{Rec}_{ma}}$$
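Both classification metrics can be computed with scikit-learn as below; note that f1_score with average="macro" averages per-class F1 values, a slight variant of the harmonic mean of macro-averaged precision and recall given above. The label lists are illustrative placeholders:

from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 2, 1, 2, 0]   # hypothetical category labels
y_pred = [0, 2, 2, 2, 0]   # hypothetical model predictions

accuracy = accuracy_score(y_true, y_pred)             # (TP+TN) / all judgments
macro_f1 = f1_score(y_true, y_pred, average="macro")  # averaged per-class F1
print(f"Accuracy={accuracy:.3f}, Macro F1={macro_f1:.3f}")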
(2) Contrast method
TResBert: the text part is characterized by the text features and position encodings extracted by BERT; the image part takes the image region features extracted by the original ResNet plus the corresponding position encodings as input. The two characterization vectors are concatenated and input into the Encoder layer of a Transformer, whose attention mechanism dynamically assigns weights within and across the modalities; finally, a Softmax classifier classifies the mobile application from the final characterization.
Res-Bert: the text part is characterized by the text features and position encodings extracted by BERT; the image part takes the image region features extracted by the original ResNet plus the corresponding position encodings as input. The two characterization vectors are only concatenated and are input directly into a Softmax classifier to obtain the mobile application classification.
Red-Bert: the text part is characterized by the text features and position encodings extracted by BERT; the image part takes the image region features extracted by the involution residual network RedNet plus the corresponding position encodings as input. The two characterization vectors are only concatenated and are input directly into a Softmax classifier to obtain the mobile application classification.
Bert: mobile applications are classified using only the description features extracted by Bert.
(3) Experimental results and analysis
The relevant parameter settings include: a batch size of 32, a learning rate of 5e-5, and a warm-up rate of 0.1. The experimental results of all methods are shown in Tables 2 and 3 and FIGS. 3 and 4, from which the following can be found:
when the data is preprocessed, the mobile application text data is not subjected to desensitization, and only the mobile application document is subjected to token and self-coding processing, so that the overall experimental precision is not high.
Among all the comparison methods, using Bert alone performs worst; that is, classifying mobile applications with text information alone gives the worst accuracy. Mining the correlated information between different modality data more finely, such as image-text feature interaction, lets the model establish correlations between words and objects, so under the same experimental settings a multi-modal pre-trained model obtains better accuracy than a single-modal one.
In most cases, model accuracy is higher when involution rather than ordinary convolution is used for mobile application image feature extraction; it can thus also be seen that the attention mechanism better distinguishes the importance of different features when multi-modal features are fused.
Overall, TRedBert maintains the best performance. Specifically, when the number of categories is 20, TRedBert improves Accuracy by 50.77%, 66.55%, 76.75% and 83.6% over TResBert, RedBert, ResBert and Bert, respectively. Compared with models that use only vector concatenation, the model that uses a Transformer for feature fusion achieves higher precision, so the attention mechanism better distinguishes the importance of different features during multi-modal feature fusion, and the fine-grained characterization is closer to the downstream task.
Table 2 mobile application classification Accuracy
Table 3 Mobile application classification Macro-F1
3. Mobile application recommendation experiment and analysis
(1) Evaluation index
AUC: generally, for binary classification problems, a threshold can be set to divide samples into positive and negative classes. The corresponding coordinate points in ROC space are computed for different thresholds, forming the ROC curve; AUC is the area under this curve. When 0.5 < AUC < 1, the model is superior to a random classifier; the closer the AUC is to 1.0, the higher the fidelity, and when it equals 0.5 the fidelity is lowest. The calculation formula is as follows:
$$\mathrm{AUC} = \int_{0}^{1} \mathrm{tpr}\ d(\mathrm{fpr})$$
where fpr denotes the false positive rate and tpr denotes the true positive rate. In ROC space, the coordinate points describe the trade-off between false positives (FP) and true positives (TP).
Logloss: measures the accuracy of a classifier by penalizing wrong classifications; minimizing the log loss is essentially equivalent to maximizing the accuracy of the classifier. Logloss reflects the average deviation of the samples and is often used as the model's optimization loss function. The calculation formula is as follows:
$$\mathrm{Logloss} = -\frac{1}{N}\sum_{i=1}^{N}\Big(y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big)$$
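Both recommendation metrics can likewise be computed with scikit-learn; the labels and predicted probabilities below are illustrative placeholders:

from sklearn.metrics import log_loss, roc_auc_score

y_true = [1, 0, 1, 1, 0]            # hypothetical click labels
y_prob = [0.9, 0.3, 0.6, 0.8, 0.4]  # hypothetical predicted probabilities

auc = roc_auc_score(y_true, y_prob)  # area under the ROC curve
ll = log_loss(y_true, y_prob)        # penalizes confident wrong predictions
print(f"AUC={auc:.3f}, Logloss={ll:.3f}")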
(2) Contrast method
MLR: LR is a regression analysis that models the relationship between one or more independent variables and a dependent variable using a least-squares function called the linear regression equation. LR cannot fit nonlinear data; MLR can fit nonlinear data through multiple variables.
FNN: the FNN model contains only a deep part for extracting high-order mobile application features and interactions between the concatenated features; lacking a shallow part (a machine learning model), it cannot fit low-order features and requires a pre-trained model.
AFM: AFM introduces an attention mechanism into the factorization machine model, which can assign weights to different feature combinations. The overall idea is to give different attention to different combinations of mobile application features and to refine the processing of cross features.
NFM: the neural factorization machine is a neural-network extension of the FM model; it enhances the model's expressive power by feeding the second-order cross terms of FM into the Deep model as input.
DeepFM: DeepFM is divided into two parts, Wide & Deep. The Wide part uses FM to extract low-order features, and the Deep part uses a DNN to extract high-order features. In the mobile application recommendation scenario, both low-order and high-order combination features may influence the final recommendation result, so learning the feature combinations underlying users' click behavior is most important.
(3) Mobile application recommendation experimental results
The relevant parameter settings include: Test_Size is 0.2, the learning rate is 1e-5, and Batch_Size is 32. The experimental results of all methods are shown in Tables 4, 5, 6, and 7 and FIGS. 5 and 6, from which the following can be found:
when the number of categories of the data set is increased and other experimental settings are unchanged, the overall recommendation performance is reduced along with the increase of the categories, particularly the FM-like model, and the feature interaction performance of the factorizer is also reduced along with the increase of the sparseness of the feature matrix, but the performance gap is not obvious when the number of the categories reaches more than 15 due to the larger data set.
Among all comparison methods, MLR and AFM perform poorly. This is because they cannot learn higher-order interaction features, so the performance of mobile application recommendation suffers. The NFM and DeepFM models have better overall performance, showing that learning both low-order and high-order feature interactions helps improve recommendation quality.
Depth models such as NFM and DeepFM outperform MLR. With 20 categories as input, FNN and DeepFM improve performance by 15.88% and 13.72%, respectively. The results show that depth models can better model and mine effective information when features are sparse.
Overall, TRedBert+FiBiNet maintains the best performance. Specifically, when the number of categories is 20, FiBiNet improves AUC by 166.55%, 20.83%, 26.75% and 113.6% over AFM, DeepFM, NFM and MLR, respectively. Therefore, by distinguishing the importance of multi-dimensional mobile application features through the attention mechanism and learning fine-grained high- and low-order feature interactions, a model that considers both high- and low-order feature interactions obtains better recommendation performance under the same experimental settings.
Table 4 mobile application recommendation under five categories
Table 5 mobile application recommendation under ten categories
Table 6 mobile application recommendation under fifteen categories
Table 7 mobile application recommendation under twenty categories
The invention and its embodiments have been described above without limitation, and the actual construction is not limited to what is shown in the drawings. In summary, if a person of ordinary skill in the art, informed by this disclosure, devises structural modes and embodiments similar to this technical solution without inventive effort and without departing from the gist of the invention, they shall fall within the protection scope of the invention.

Claims (9)

1. A mobile application classification and recommendation method based on multi-modal feature fusion, characterized by comprising the following steps:
(1) Mobile application feature extraction layer
Extract a set of multimodal samples D from the mobile application dataset; each sample c ∈ D comprises a sentence S of n mobile application description words (w1, …, wn) and an associated mobile application image I. D is taken as the training corpus for training a mobile application classifier that must correctly predict the class labels of mobile applications in unseen samples. After initial normalization and self-coding tokenization preprocessing, a BERT model in the feature extraction layer extracts the mobile application description features, and a residual network built from involution modules (RedNet) extracts the image features;
(2) Mobile application classification layer
Distinguish and fuse the feature importance of the different modalities using the self-attention and multi-head attention mechanisms of a Transformer, and classify the mobile application from the fused feature information with a Softmax classifier;
(3) Mobile application recommendation layer
Input the classified data, by category, into a FiBiNet model, which dynamically learns the importance of features by fitting the relation between features and samples with weights; more important features are given more weight, while the weights of non-critical features are weakened. A bilinear operation considers the importance of each dimension simultaneously to complete mobile application recommendation. The upper half of the FiBiNet model is the deep part: an MLP network concatenates the outputs of the bilinear interaction layer into a dense vector through a concatenation layer, then feeds the crossed combination features into the neural network to obtain a prediction score at the prediction layer. The lower, shallow part is the core of FiBiNet and mainly processes the input features.
2. The mobile application classification and recommendation method based on multi-modal feature fusion according to claim 1, wherein extracting the description features in step 1 includes the following steps:
A pre-trained bidirectional encoder, BERT, is selected as the initial model, and its parameters are adjusted and learned by fine-tuning. A multi-head self-attention layer converts each position in the input sequence into a weighted sum of the input layer. Specifically, for the i-th attention head, the input $X \in \mathbb{R}^{d \times N}$ is transformed by the scaled dot-product attention mechanism:
$$\mathrm{Att}_i(X) = W_{V_i} X \cdot \mathrm{softmax}\!\left(\frac{(W_{K_i} X)^{\top} (W_{Q_i} X)}{\sqrt{d/m}}\right)$$
where $\{W_{Q_i}, W_{K_i}, W_{V_i}\} \in \mathbb{R}^{d/m \times d}$ are learnable parameters corresponding to the query, key, and value, respectively; the outputs of the m attention heads are then concatenated and linearly transformed;
The description information of each mobile application is characterized in a self-coding manner and input into the pre-trained BERT. In addition to the word tokens, a special classification token ([CLS]) is inserted at the beginning of each input sequence; the output of the last Transformer layer at this token aggregates the characterization information of the whole sequence, and the [CLS] vector together with the extracted semantic vector is kept as the output O to improve model accuracy:
$$O = [H_0, H_{[CLS]}]$$
The output O is then linearly transformed through a Softmax function to obtain the final $d \times N$-dimensional characterization vector $H_S$ of the mobile application's text information.
3. The mobile application classification and recommendation method based on multi-modal feature fusion according to claim 2, wherein the image feature extraction in step 1 comprises the following steps:
The involution kernel $\mathcal{H}_{i,j} \in \mathbb{R}^{K \times K \times G}$ is generated by a function φ conditioned on the single pixel at (i, j) and then rearranged from the channel dimension to the spatial dimension. The closed multiply-add operation is decomposed into two steps: the multiplication multiplies the tensors of the C channels element-wise with the kernel $\mathcal{H}$, and the addition aggregates the elements within the kernel's spatial range. The kernel is tailored to the pixel $X_{i,j}$ at the corresponding coordinate (i, j) but shared across channels; G counts the number of kernel groups, within each of which the kernel is shared. Performing the multiply-add on the input with this kernel gives the characterization output of the involution module:
$$Y_{i,j,k} = \sum_{(u,v) \in \Delta_K} \mathcal{H}_{i,j,u+\lfloor K/2 \rfloor,v+\lfloor K/2 \rfloor,\lceil kG/C \rceil}\, X_{i+u,j+v,k}$$
The kernel generation function is denoted φ, and the function mapping for each location (i, j) is abstracted as:
$$\mathcal{H}_{i,j} = \phi(X_{\Psi_{i,j}})$$
The mobile application image I in the dataset is input into the visual model RedNet-152 to obtain the output of the final convolutional layer:
$$\mathrm{RedNet}(I) = \{r_j \mid r_j \in \mathbb{R}^{2048},\ j = 1, 2, \ldots, 49\}$$
The original mobile application image is segmented into 7×7 = 49 regions, each represented by a 2048-dimensional vector $r_j$. The mobile application visual features are projected into the same space as the text features using a linear transformation $G = W_v\,\mathrm{RedNet}(I)$, where $W_v \in \mathbb{R}^{d \times 2048}$ is a learnable parameter; the output $\mathrm{RedNet}(I)$ is then linearly transformed through a Softmax function, yielding the final characterization vector $G$ of the mobile application image information.
4. The mobile application classification and recommendation method based on multi-modal feature fusion according to claim 3, wherein the Transformer formula in step 2 is as follows:
$$H^{(l)} = \mathrm{Encoder}\big(H^{(l-1)}\big), \quad l = 1, \ldots, L_m$$
where $L_m$ is the number of layers of the multimodal encoder; the final hidden state at the "[CLS]" token is used for the mobile application classification task, effectively capturing dynamic attention within and across the modalities of the mobile application.
The Softmax function is as follows:
$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
5. The mobile application classification and recommendation method based on multi-modal feature fusion according to claim 4, wherein the shallow part in step 3 comprises the following steps:
The classified mobile applications are input, by category, into the initial embedding layer of FiBiNet, which embeds the sparse features into low-dimensional continuous real-valued vectors; the sparse matrix is converted into a dense matrix through a linear transformation, the hidden features of the matrix are extracted, and the generalization ability of the model improves. The output of the embedding layer is expressed as:
$$E = [e_1, e_2, \ldots, e_i, \ldots, e_f]$$
A SENET network is then introduced for training and learning, obtaining the embedding weights and outputting the final embedding result. A dimension-reduction (squeeze) operation on the embedded features from the embedding layer yields global features. A Sigmoid activation is then applied, and the relations among the embeddings are learned to obtain the embedding weights of the different fields. Finally, these weights are multiplied with the original embeddings to obtain the final embedding result.
6. The mobile application classification and recommendation method based on multi-modal feature fusion of claim 5, wherein the dimension reduction comprises the steps of:
The original embedding E is compressed into a statistical vector $Z = [z_1, \ldots, z_i, \ldots, z_f]$ using an average pooling operation, in which $z_i$ is computed by the following formula:
$$z_i = F_{sq}(e_i) = \frac{1}{k}\sum_{t=1}^{k} e_i^{(t)}$$
where $z_i$ is the global information about the i-th feature representation and k is the embedding size.
7. The mobile application classification and recommendation method based on multimodal feature fusion as claimed in claim 6, wherein said activating comprises the steps of:
The embedding weights of each field are learned from the statistical vector Z using two fully connected layers. The first fully connected layer performs dimension reduction with parameter $W_1$, using $\sigma_1$ as a nonlinear function; the second fully connected layer restores the original dimension with parameter $W_2$. Formally, the field embedding weights are computed as follows:
$$A = \sigma_2\big(W_2\, \sigma_1(W_1 Z)\big)$$
where $A = [a_1, \ldots, a_f]$ is the weight vector, and $\sigma_1$ and $\sigma_2$ are activation functions.
8. The mobile application classification and recommendation method based on multi-modal feature fusion of claim 7, wherein the re-weighting comprises the steps of:
Each field of the embedding layer is multiplied by the corresponding weight to obtain the final embedding result $V = \{v_1, \ldots, v_f\}$. The whole operation can be seen as learning a weight coefficient for each field embedding, which makes the model more discriminative across field embeddings. The SENET mechanism increases the weights of important features and reduces the weights of uninformative ones, giving the output V of the SENET layer, expressed as follows:
$$V = [a_1 \cdot e_1, \ldots, a_f \cdot e_f] = [v_1, \ldots, v_f]$$
After obtaining the mobile application characterization embeddings from the initial embedding layer and the SENET layer, second-order and higher-order feature interactions are performed on the sparse and dense features;
The interaction vectors p and q are computed from the output E of the embedding layer and the output V of the SENET layer:
$$p_{ij} = v_i \cdot W_{ij} \odot v_j$$
$$p = [p_1, \ldots, p_i, \ldots, p_n]$$
$$q = [q_1, \ldots, q_i, \ldots, q_n]$$
The two interaction vectors obtained are concatenated and input into the deep part.
9. The mobile application classification and recommendation method based on multi-modal feature fusion according to claim 8, wherein the deep-part calculation formula in step 3 is as follows:
$$\hat{y} = \sigma\!\left(w_0 + \sum_{i=1}^{m} w_i x_i + y_d\right)$$
where $\hat{y} \in (0, 1)$ is the model's predicted mobile application recommendation value, σ is the sigmoid function, m is the feature size, $y_d$ is the output of the deep part, and the remaining terms form the linear regression part;
Logloss is used as the model's recommendation optimization objective function:
$$\mathrm{Logloss} = -\frac{1}{N}\sum_{i=1}^{N}\Big(y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big)$$
where $y_i$ is the actual label of the i-th mobile application, $\hat{y}_i$ is the predicted label of the i-th mobile application, and N is the total number of mobile applications.
CN202210751368.6A 2022-06-28 2022-06-28 Mobile application classification and recommendation method based on multi-modal feature fusion Pending CN116127175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210751368.6A CN116127175A (en) 2022-06-28 2022-06-28 Mobile application classification and recommendation method based on multi-modal feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210751368.6A CN116127175A (en) 2022-06-28 2022-06-28 Mobile application classification and recommendation method based on multi-modal feature fusion

Publications (1)

Publication Number Publication Date
CN116127175A true CN116127175A (en) 2023-05-16

Family

ID=86303206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210751368.6A Pending CN116127175A (en) 2022-06-28 2022-06-28 Mobile application classification and recommendation method based on multi-modal feature fusion

Country Status (1)

Country Link
CN (1) CN116127175A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630726A (en) * 2023-07-26 2023-08-22 成都大熊猫繁育研究基地 Multi-mode-based bird classification method and system
CN116630726B (en) * 2023-07-26 2023-09-22 成都大熊猫繁育研究基地 Multi-mode-based bird classification method and system
CN117611954A (en) * 2024-01-19 2024-02-27 湖北大学 Method, device and storage device for evaluating effectiveness of infrared video image
CN117611954B (en) * 2024-01-19 2024-04-12 湖北大学 Method, device and storage device for evaluating effectiveness of infrared video image

Similar Documents

Publication Publication Date Title
Sebe Machine learning in computer vision
CN109145245A (en) Predict method, apparatus, computer equipment and the storage medium of clicking rate
CN116127175A (en) Mobile application classification and recommendation method based on multi-modal feature fusion
Cao et al. Service recommendation based on attentional factorization machine
CN114936623A (en) Multi-modal data fused aspect-level emotion analysis method
CN114648031B (en) Text aspect emotion recognition method based on bidirectional LSTM and multi-head attention mechanism
Lai et al. Multimodal sentiment analysis: A survey
CN115270752A (en) Template sentence evaluation method based on multilevel comparison learning
CN117407571B (en) Information technology consultation service method and system based on correlation analysis
Zhang et al. Integrating an attention mechanism and convolution collaborative filtering for document context-aware rating prediction
CN116975776A (en) Multi-mode data fusion method and device based on tensor and mutual information
CN114662652A (en) Expert recommendation method based on multi-mode information learning
CN113918764A (en) Film recommendation system based on cross modal fusion
Wang et al. DAN: a deep association neural network approach for personalization recommendation
CN117150320B (en) Dialog digital human emotion style similarity evaluation method and system
CN112541541B (en) Lightweight multi-modal emotion analysis method based on multi-element layering depth fusion
Li et al. Joint inter-word and inter-sentence multi-relation modeling for summary-based recommender system
Wang et al. Cognitive process-driven model design: A deep learning recommendation model with textual review and context
Zeng et al. Research on the application of knowledge mapping and knowledge structure construction based on adaptive learning model
Narengerile et al. [Retracted] An Intelligent Assessment Method of English Teaching Ability Based on Improved Machine Learning Algorithm
CN111552881B (en) Sequence recommendation method based on hierarchical variation attention
CN114328931A (en) Topic correction method, model training method, computer device, and storage medium
Xian et al. Design of an English vocabulary e-learning recommendation system based on word bag model and recurrent neural network algorithm
Yang et al. A Fuzzy Neural Network‐Based System for Alleviating Students’ Boredom in English Learning
Wang et al. Online Learning Resource Recommendation Based on Attention Convolutional Neural Network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination