CN113590945B

CN113590945B - Book recommendation method and device based on user borrowing behavior-interest prediction

Info

Publication number: CN113590945B
Application number: CN202110846763.8A
Authority: CN
Inventors: 赵雪青
Original assignee: Xian Polytechnic University
Current assignee: Xian Polytechnic University
Priority date: 2021-07-26
Filing date: 2021-07-26
Publication date: 2023-07-28
Anticipated expiration: 2041-07-26
Also published as: CN113590945A

Abstract

The invention discloses a book recommendation method and device based on user borrowing behavior-interest prediction. The method comprises: acquiring user borrowing behavior data; determining basic feature tags based on the user borrowing behavior data, and determining prediction-class feature tags by using the TFIDF weight calculation algorithm and the cosine similarity method; and inputting the basic feature tags and the prediction-class feature tags into DeepFM, a neural network model built on a factorization machine, to perform Embedding feature vectorization, then inputting the feature vectors, after feature crossing, into a deep neural network that outputs a recommendation result. The method constructs interest prediction labels by analyzing the borrowing behaviors of library users and uses DeepFM to recommend books to them. It effectively builds a user behavior label system, performs personalized recommendation that takes user interests into account, accurately identifies the core needs of users, and improves user satisfaction.

Description

Book recommendation method and device based on user borrowing behavior-interest prediction

Technical Field

The invention relates to the field of big data, in particular to a book recommendation method and device based on user borrowing behavior-interest prediction.

Background

Digital libraries are growing rapidly in the big data era, and mining users' reading preferences to recommend books has become an inevitable trend. With the rapid development of the mobile internet and self-media, user attention keeps shifting from computers to mobile devices. How to capture a user's focus in the shortest possible time and keep improving user satisfaction remains a central problem for recommendation systems.

Currently, the neural network model built on a factorization machine (Deep Factorization Machine, hereinafter DeepFM) is widely used in click-through-rate (CTR) prediction tasks such as recommendation and advertising. DeepFM combines the memorization capability of logistic regression with the generalization capability of a neural network: its memorization capability lets it learn user and resource features directly, while the generalization capability of the neural network lets it mine a user's rare features and discover their correlation with the label. The neural network then combines features automatically, yielding relatively stable recommendation results.

In 2010, Massanari et al. proposed the user portrait, a representation model formed from user features that can effectively capture a user's important characteristics. Because the borrowing behavior of library users is relatively homogeneous, a traditional user portrait struggles to describe users' varied behavioral characteristics in fine detail, and the application of user portraits in the library field is still exploratory. In recent years, Yu Chuanming et al. proposed a behavior-content fusion model; Yao Yuan et al. proposed constructing user portraits with a knowledge graph; Chen Dan et al. held that user portrait tags can be obtained from three sources: user behavior, user social data, and user tag sets; He et al. proposed a hybrid model of decision trees and logistic regression; Kong et al. proposed a new context-aware attention convolutional neural network; and Zhou et al. proposed a deep interest network that generates different representation vectors for different candidate advertisements in order to describe users' diverse behavioral characteristics. Zhang et al. proposed a time-period division algorithm that analyzes and quantifies the distribution of a user's interests over a period. However, none of these studies accurately identifies the core needs of the user.

Disclosure of Invention

To make the supply of library services more accurate, the invention provides a book recommendation method and device that combine analysis of user borrowing behavior with interest prediction.

The book recommendation method based on the user borrowing behavior-interest prediction provided by the embodiment of the invention comprises the following steps:

acquiring data of borrowing behaviors of a user;

determining a basic feature tag based on the user borrowing behavior data;

based on the user borrowing behavior data, determining a prediction-class feature tag by using the TFIDF weight calculation algorithm and the cosine similarity method;

inputting the basic feature tag and the prediction-class feature tag into DeepFM, a neural network model built on a factorization machine, to perform Embedding feature vectorization, and inputting the feature vectors, after feature crossing, into a deep neural network to output a recommendation result.

In one embodiment, a book recommendation method based on user borrowing behavior-interest prediction further includes:

preprocessing the user borrowing behavior data, which specifically comprises data deduplication, outlier processing, missing value processing, and time format normalization, wherein:

the outlier processing includes normalizing abnormal data that fall outside the normal time range;

the missing value processing includes deleting rows whose book number is empty;

the time format normalization includes converting data with non-uniform time formats by means of the strptime() function in Python.
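The preprocessing steps above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the patented implementation: records are assumed to be (user_id, book_id, time_string) tuples, the formats in the hypothetical TIME_FORMATS tuple are illustrative, and "normalizing" out-of-range times is read here as clamping them to the bounds of the normal range. Deduplication runs after time normalization so that one borrowing event written in two formats collapses to a single record.

```python
from datetime import datetime

# Hypothetical candidate formats; the document does not list the formats it normalizes.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y%m%d")


def parse_time(raw):
    """Try each known format with datetime.strptime; return None if none matches."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    return None


def preprocess(records, start, end):
    """Drop rows with an empty book number, normalize time formats, clamp times
    outside [start, end] (assumed outlier rule), then deduplicate."""
    seen = set()
    cleaned = []
    for user_id, book_id, raw_time in records:
        if not book_id:  # missing-value processing: drop rows with no book number
            continue
        t = parse_time(raw_time)
        if t is None:  # unparseable timestamps are dropped (assumption)
            continue
        t = min(max(t, start), end)  # outlier processing: clamp into the normal range
        key = (user_id, book_id, t)
        if key in seen:  # deduplication on the normalized record
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned
```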

In one embodiment, the determining of the basic feature tag includes:

drawing histograms and word clouds from the preprocessed user borrowing behavior data to perform a visual analysis of basic features, and determining the basic feature tags according to the results of this analysis;

the basic feature tag comprises:

a fact-class label, comprising the total number of behaviors generated by the same user;

a rule-class label, for which thresholds are set on the user's fact-class labels based on manual experience, different thresholds corresponding to different user characteristics;

a text-class label, containing text information generated by the user, including the user's gender, occupation and address and the names of borrowed books; features are extracted from text-class labels by jieba word segmentation.
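A minimal sketch of the fact-class and rule-class labels, assuming behaviors arrive as (user_id, book_id) pairs. The threshold bands and label names in RULE_THRESHOLDS are hypothetical stand-ins for the "manual experience" the text mentions. Text-class labels would additionally need jieba segmentation and are not shown.

```python
from collections import Counter

# Hypothetical (lower_bound, label) bands, checked from the highest bound down.
RULE_THRESHOLDS = [(50, "heavy_reader"), (10, "regular_reader"), (0, "light_reader")]


def fact_labels(behaviors):
    """Fact-class label: total number of behaviors generated by each user."""
    return Counter(user_id for user_id, _ in behaviors)


def rule_label(total, thresholds=RULE_THRESHOLDS):
    """Rule-class label: map a fact-class count onto the first band it reaches."""
    for lower_bound, label in thresholds:
        if total >= lower_bound:
            return label
    return None  # only reached for negative counts
```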

In one embodiment, the TFIDF weight calculation algorithm includes:

the TFIDF weight calculation has two logical parts: the interaction depth and the total TFIDF label weight;

the interaction depth refers to the depth of the user's behavior, measured by some of the features under each interaction behavior: the user behavior-type weight, the number of user behaviors, and the decay of behaviors over time; the total TFIDF label weight reflects how important different labels are to the prediction result: the more important a label, the larger its weight. The total label weight is calculated as follows:

$$W = B_i \times I_t \times C_i \times \mathrm{TFIDF}$$

where B_i is the behavior-type weight; C_i is the number of user behaviors, i.e., the total number of behaviors generated by the user; I_t is the time-decayed interest degree; and TFIDF is the user behavior label assignment weight.

In one embodiment, the calculation formula of the user behavior label assignment weight is as follows:

$$\mathrm{TFIDF}(U, L) = \mathrm{TF}(U, L) \times \mathrm{IDF}(U, L)$$

where TFIDF(U, L) is the objective weight of label L with respect to user U, i.e., the product of the importance of label L to user U, TF(U, L), and the importance of that label among all labels, IDF(U, L);

the calculation formula of TF(U, L) is as follows:

$$\mathrm{TF}(U, L) = \frac{S(U, L)}{\sum_i S(U, L_i)}$$

where S(U, L) is the number of times label L marks user U, Σ_i S(U, L_i) is the number of times all labels mark user U, and TF(U, L) is the proportion of label L's marks among all label marks on user U;

the calculation formula of IDF (U, L) is as follows:

where Σ_i Σ_j S(U_i, L_j) is the sum of all labels over all users, Σ_i S(U_i, L) is the sum of label L over all users, and IDF(U, L) reflects the scarcity of label L among all labels, i.e., it grows as the label's probability of occurrence falls.
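The TF and IDF definitions above can be sketched as follows. TF follows the text, S(U, L) / Σ_i S(U, L_i). The IDF formula itself is not reproduced in this document, so the sketch assumes the common smoothed form log(Σ_i Σ_j S(U_i, L_j) / (Σ_i S(U_i, L) + 1)); treat that choice as an assumption. The counts dictionary mapping (user, label) to S(U, L) is likewise a hypothetical input layout.

```python
import math


def tag_tfidf(counts, user, tag):
    """Return TF(U, L) * IDF(U, L) for one user and one label.

    counts maps (user, label) -> S(U, L), the number of times the label marks the user.
    TF follows the text: S(U, L) / sum_i S(U, L_i).
    IDF is an ASSUMED smoothed form, log(total / (label_total + 1)), because the
    original IDF formula is not reproduced in this document.
    """
    user_total = sum(s for (u, _), s in counts.items() if u == user)  # sum_i S(U, L_i)
    if user_total == 0:
        return 0.0  # the user carries no labels, so TF is zero
    tf = counts.get((user, tag), 0) / user_total
    all_total = sum(counts.values())  # sum_i sum_j S(U_i, L_j)
    tag_total = sum(s for (_, t), s in counts.items() if t == tag)  # sum_i S(U_i, L)
    idf = math.log(all_total / (tag_total + 1))
    return tf * idf
```

With counts {("u1", "history"): 3, ("u1", "novel"): 1, ("u2", "novel"): 4, ("u3", "novel"): 2}, the rarer label "history" scores higher for u1 than the widely shared "novel", which is the behavior the IDF term is meant to produce.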

In one embodiment, the time-decayed interest degree is calculated as follows:

$$I_t = e^{-\lambda \Delta t}$$

where Δt is the number of days between the behavior occurrence time t and the observation point, λ is the decay factor, and I_t is the interest degree at each moment.
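The time-decayed interest degree and the total label weight W = B_i × I_t × C_i × TFIDF compose directly. In this sketch the decay argument plays the role of λ, its default of 0.1 per day is an arbitrary illustrative value, and the behavior-type weight B_i is passed in by the caller because the document does not fix its values.

```python
import math


def time_decay(delta_days, decay=0.1):
    """I_t = exp(-lambda * delta_t); decay=0.1 is an illustrative lambda, not a source value."""
    return math.exp(-decay * delta_days)


def total_tag_weight(behavior_weight, delta_days, behavior_count, tfidf, decay=0.1):
    """W = B_i * I_t * C_i * TFIDF, multiplied exactly as in the formula."""
    return behavior_weight * time_decay(delta_days, decay) * behavior_count * tfidf
```

With λ = 0.1, a behavior that occurred 10 days before the observation point keeps e^-1 (about 0.37) of its interest, so older borrowing contributes less to W.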

In one embodiment, the cosine similarity method specifically includes:

the input of the cosine-similarity weighted classification is the user borrowing behavior data, and the output is the label with the highest recommendation score; the calculation process is as follows:

calculating the pairwise similarity of labels: cross-joining the label table with itself to obtain all pairwise combinations of the labels under each user;

calculating the number of users corresponding to each label, i.e., how many different users carry each label;

calculating the similarity of every pair of labels by cosine similarity, finally obtaining the cosine similarity weight p;

calculating the related labels to recommend to the user by associating the user with all related labels, the recommendation score being calculated as follows:

$$R = W \times p$$

where W is the total weight of the label and p is the cosine similarity weight, calculated as follows:

$$p = \frac{\sigma}{\sqrt{\eta \lambda}}$$

where σ is the number of users who pay attention to both resource X and resource Y, η is the number of users who pay attention to resource X, λ is the number of users who pay attention to resource Y, and p is the similarity coefficient of the two resources that users pay attention to.
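The four steps above can be sketched as follows, assuming each user's labels are given as a set and the user's own labels carry precomputed total weights W. Counting pairwise co-occurrences per user stands in for the cross-join step, p is computed with the cosine form σ/√(ηλ) on binary user-label vectors, and, as an assumption, a candidate label is scored from its best-scoring source label. The function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine_weights(user_tags):
    """user_tags maps user -> set of labels. Returns p for every co-occurring label pair,
    using p = sigma / sqrt(eta * lam) on binary user-label vectors."""
    tag_users = defaultdict(int)   # eta / lam: number of users carrying each label
    pair_users = defaultdict(int)  # sigma: number of users carrying both labels
    for tags in user_tags.values():
        for tag in tags:
            tag_users[tag] += 1
        # all pairwise combinations of the labels under this user (the cross-join step)
        for a, b in combinations(sorted(tags), 2):
            pair_users[(a, b)] += 1
    p = {}
    for (a, b), sigma in pair_users.items():
        sim = sigma / math.sqrt(tag_users[a] * tag_users[b])
        p[(a, b)] = p[(b, a)] = sim  # store both directions for easy lookup
    return p


def recommend_tag(user_weights, p):
    """Score each label the user does not yet carry by R = W * p; return the best.
    user_weights maps the user's own labels to their total weight W."""
    best_tag, best_score = None, 0.0
    for (a, b), sim in p.items():
        if a in user_weights and b not in user_weights:
            score = user_weights[a] * sim
            if score > best_score:
                best_tag, best_score = b, score
    return best_tag, best_score
```

For example, if u1 carries {"history": 1.0, "novel": 2.0} and "novel" co-occurs with "poetry" at p = 0.5, then "poetry" scores R = 2.0 × 0.5 = 1.0 and is returned if no other candidate scores higher.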

In one embodiment, the method further comprises: normalizing the numerical features in the basic feature tags and the prediction-class feature tags by logarithmic function conversion, and converting the categorical features in these tags into numerical vectors by one-hot encoding.
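Both transformations are short. A sketch, assuming base-10 logarithms as the detailed description specifies (so values must be positive) and a fixed, ordered category list for one-hot encoding; the example occupations are hypothetical.

```python
import math


def log_normalize(value):
    """f(a) = log10(a), the base-10 logarithmic conversion; value must be > 0."""
    return math.log10(value)


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed, ordered vocabulary.
    A value outside the vocabulary maps to the all-zero vector."""
    return [1 if value == c else 0 for c in categories]
```

For instance, one_hot("teacher", ["student", "teacher", "engineer"]) yields [0, 1, 0], and log_normalize(1000) yields 3.0.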

In one embodiment, the determining of the recommendation result includes:

the DeepFM model comprises an input layer, an Embedding layer, a model layer, and an output layer;

the basic feature tags and the prediction-class feature tags are input into the DeepFM model, and Embedding feature vectorization is performed by the Embedding layer; the model layer first concatenates the linear features through a linear model, the linear model being the sum of the features multiplied by their corresponding weights; and the output layer outputs the recommendation result through two fully connected layers.
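To make the layer structure concrete, here is a forward-pass-only sketch in plain Python for one sample whose one-hot fields activate a few feature ids. It follows the standard DeepFM layout (a linear part, the factorization-machine pairwise term computed with the O(nk) identity, and a two-layer fully connected network on the concatenated embeddings, summed and passed through a sigmoid); it is not necessarily the exact configuration used here. All parameter values are illustrative, and training by back-propagation is not shown.

```python
import math


def dense(inputs, weights, bias, activation=None):
    """One fully connected layer: weights holds one row per output unit."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]
    if activation == "relu":
        out = [max(0.0, o) for o in out]
    return out


def fm_second_order(embeddings):
    """sum_{i<j} <v_i, v_j> via 0.5 * sum_f [(sum_i v_if)^2 - sum_i v_if^2]."""
    total = 0.0
    for f in range(len(embeddings[0])):
        s = sum(v[f] for v in embeddings)
        sq = sum(v[f] ** 2 for v in embeddings)
        total += 0.5 * (s * s - sq)
    return total


def deepfm_forward(active_ids, linear_w, bias, emb_table, hidden, out_layer):
    """Forward pass for one sample whose one-hot fields activate `active_ids`.

    linear part: bias + weights of the active features (the linear model with x_i = 1)
    FM part:     pairwise dot products of the active embeddings
    deep part:   concatenated embeddings -> hidden layer (ReLU) -> output layer
    hidden and out_layer are (weights, bias) tuples.
    """
    embeddings = [emb_table[i] for i in active_ids]
    y_linear = bias + sum(linear_w[i] for i in active_ids)
    y_fm = fm_second_order(embeddings)
    h = dense([x for v in embeddings for x in v], *hidden, activation="relu")
    y_deep = dense(h, *out_layer)[0]
    return 1.0 / (1.0 + math.exp(-(y_linear + y_fm + y_deep)))
```

The identity in fm_second_order is the reason factorization machines scale to sparse inputs: it avoids enumerating every feature pair while giving the same result as the explicit pairwise sum.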

A book recommendation device based on user borrowing behavior-interest prediction, comprising:

the data acquisition module is used for acquiring the borrowing behavior data of the user;

the basic tag determining module is used for determining basic feature tags based on the borrowing behavior data of the user;

the category label determining module is used for determining prediction-class feature tags by using the TFIDF weight calculation algorithm and the cosine similarity method based on the user borrowing behavior data;

the recommendation result determining module is used for inputting the basic feature tags and the prediction-class feature tags into DeepFM, a neural network model built on a factorization machine, to perform Embedding feature vectorization, and for inputting the feature vectors, after feature crossing, into a deep neural network to output a recommendation result.

Compared with the prior art, the book recommendation method and device based on the user borrowing behavior-interest prediction provided by the embodiment of the invention have the following beneficial effects:

to address the insufficient accuracy of library book recommendation, the invention provides a book recommendation method based on user borrowing behavior-interest prediction: interest prediction labels are constructed by analyzing the borrowing behavior of library users, and DeepFM is used to recommend books to them. The method effectively builds a user behavior label system, performs personalized recommendation that takes user interests into account, accurately identifies the core needs of users, and improves user satisfaction.

Drawings

FIG. 1 is a flow diagram of the construction of a user borrowing behavior-interest prediction method provided in one embodiment;

FIG. 2 is a histogram of user type distributions provided in one embodiment;

FIG. 3 is a word cloud of the family-reader group provided in one embodiment;

FIG. 4 is a word cloud of an individual user provided in one embodiment;

FIG. 5 compares the Accuracy curves of the book recommendation method without prediction-class labels and the method of the present invention (the book recommendation method including prediction-class labels), provided in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

In one embodiment, a book recommendation method based on user borrowing behavior-interest prediction is provided, the method comprising:

Step 1: read the data.

Step 2: preprocess the data read in step 1.

Step 3: perform visual data analysis and construct basic labels. Histograms and word clouds are drawn from the data preprocessed in step 2 for visual analysis, and three classes of basic labels are constructed: fact, rule, and text.

Step 4: construct prediction-class labels from the data preprocessed in step 2 by using the TFIDF and cosine similarity methods.

Step 5: normalize the feature labels, i.e., the numerical and categorical feature labels generated in steps 3 and 4.

Step 6: recommend books. The feature labels obtained in step 5 are input into the DeepFM model for Embedding feature vectorization; the embedded feature vectors are crossed and then input into a deep neural network, which outputs the recommendation result.

Step 7: compare and analyze the book recommendation results. Using recommendation Accuracy as the objective evaluation index, the book recommendation method without prediction-class labels is compared with the method that includes them.

The data preprocessing in step 2 specifically comprises:

preprocessing the data read in step 1 by deduplication, outlier processing, missing value processing, and time format normalization. Outlier processing mainly normalizes abnormal data that fall outside the normal time range; missing value processing deletes rows whose book number is empty; and time format normalization converts data with non-uniform time formats by using Python's strptime() function.

The data visualization analysis in step 3 specifically includes:

visualizing the data preprocessed in step 2 with the third-party wordcloud library and the matplotlib plotting library in Python. The basic label construction in step 3 specifically comprises:

on the basis of the visual analysis, constructing the three classes of basic labels (fact, rule, and text) from the preprocessed data. The fact-class label contains the total number of behaviors generated by the same user. The rule-class label builds on the fact-class labels: thresholds are set on a user's fact-class labels based on manual experience, different thresholds corresponding to different user characteristics. The text-class label contains text information generated by the user, such as the user's gender, occupation, address, and book names, and features are extracted from text-class labels by jieba word segmentation.

The construction of prediction-class labels in step 4 specifically includes:

constructing prediction-class labels from the data preprocessed in step 2 by using the TFIDF and cosine similarity methods, as follows:

the TFIDF weight calculation logic is divided into two parts: the interaction depth and the total TFIDF label weight. The interaction depth refers to the depth of user behavior, which can be measured by several features under each interaction behavior, such as the user behavior-type weight, the number of user behaviors, and the decay of behaviors over time. The TFIDF weight reflects how important different labels are to the prediction result: the more important a label, the larger its weight. The TFIDF label weight is calculated as follows:

$$\mathrm{TFIDF}(U, L) = \mathrm{TF}(U, L) \times \mathrm{IDF}(U, L) \tag{1}$$

where TFIDF(U, L) is the objective weight of label L with respect to user U, i.e., the product of the importance of label L to user U, TF(U, L), and the importance of that label among all labels, IDF(U, L).

The calculation formula of TF(U, L) is as follows:

$$\mathrm{TF}(U, L) = \frac{S(U, L)}{\sum_i S(U, L_i)} \tag{2}$$

where S(U, L) is the number of times label L marks user U, Σ_i S(U, L_i) is the number of times all labels mark user U, and TF(U, L) is the proportion of label L's marks among all label marks on user U.

The calculation formula of IDF (U, L) is as follows:

where Σ_i Σ_j S(U_i, L_j) is the sum of all labels over all users, Σ_i S(U_i, L) is the sum of label L over all users, and IDF(U, L) reflects the scarcity of label L among all labels, i.e., it grows as the label's probability of occurrence falls. If label L occurs rarely overall but is used to mark user U, the relationship between user U and label L is correspondingly closer.

In analyzing how behavior decays over time, the interest degree at each moment is given by the time decay function:

$$I_t = e^{-\lambda \Delta t} \tag{4}$$

The calculation formula of the total weight of the user tag is as follows:

$$W = B_i \times I_t \times C_i \times \mathrm{TFIDF} \tag{5}$$

The cosine similarity method in step 4 specifically comprises the following steps:

cosine similarity measures the difference between two vectors by the cosine of the angle between them: the closer the cosine is to 1, the closer the angle is to 0 degrees and the stronger the correlation. The calculation formula is as follows:

$$p = \frac{\sigma}{\sqrt{\eta \lambda}} \tag{6}$$

where σ is the number of users who pay attention to both resource X and resource Y, η is the number of users who pay attention to resource X, λ is the number of users who pay attention to resource Y, and p is the similarity coefficient of the two resources that users pay attention to. The larger p is, the more likely a user is to pay attention to both resources at the same time.

When cosine similarity is used for weighted classification, the input is the preprocessed data feature columns and the output is the label with the highest recommendation score. The calculation process is as follows:

1. Calculate the pairwise similarity of labels: cross-join the label table with itself to obtain all pairwise combinations of the labels under each user.

2. Calculate the number of users corresponding to each label, i.e., how many different users carry each label.

3. Calculate the similarity of every pair of labels by cosine similarity, finally obtaining the cosine similarity weight p.

4. Calculate the related labels to recommend to the user by associating the user with all related labels; the recommendation score is calculated as follows:

$$R = W \times p \tag{7}$$

wherein W is the total weight of the tag, and p is the cosine similarity weight.

The normalization of feature labels in step 5 is specifically as follows:

the numerical and categorical feature labels generated in steps 3 and 4 are normalized. Numerical features (including user numbers, book numbers, ages, etc.) are normalized by the logarithmic conversion shown in formula (8), and categorical features (including the user's gender, occupation, personal information, and resource category) are converted into numerical vectors by one-hot encoding.

$$f(a) = \log_{10}(a) \tag{8}$$

where a is the value of the numerical feature to be processed and f(a) is its base-10 logarithm.

The specific process of book recommendation in step 6 is as follows:

the DeepFM model comprises an input layer, an Embedding layer, a model layer, and an output layer; the feature labels obtained in step 5 are input into the DeepFM model, and the Embedding layer performs Embedding feature vectorization.

The model layer first concatenates the linear features through a generalized linear model, shown in formula (9); the embedded feature vectors are then crossed and input into a deep neural network:

$$y = \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_n x_n \tag{9}$$

where x_i denotes each input feature and ω_i its weight; the weights are learned through back-propagation until they reach values that fit the model's prediction objective.

The output layer outputs the recommended result through the two full-connection layers.

The specific process of the book recommendation result comparison in step 7 is as follows:

recommendation Accuracy is adopted as the objective evaluation index; for the book recommendation method without prediction-class labels and the method that includes them, the Accuracy values are calculated and Accuracy curves are drawn to compare and analyze the book recommendation results.

Example 1

Executing steps 1 and 2:

From the offline borrowing data of a provincial library, 100,000 user behavior records of 10,000 users and 600,000 book resource records were randomly selected as the experimental data set. User attributes include user number, age, gender, and occupation; book attributes include book number, title, category, author, and publisher. The data were preprocessed by deduplication, outlier processing, missing value processing, and time format normalization.

Executing step 3:

the preprocessed data were visualized with the third-party wordcloud library and the matplotlib plotting library in Python. On the basis of the visualization, the data were given basic labels, divided by label generation type into three classes (fact, text, and rule), and each user's labels were stored in a two-dimensional table.

Executing step 4:

prediction-class labels were constructed for the preprocessed data by using the TFIDF and cosine similarity methods. A time decay function was added when TFIDF computed the label weights, so that the decay of user behavior over time indirectly reflects changes in the user's interest in the corresponding resources. The pairwise similarity of labels was computed by cosine similarity to obtain the cosine similarity weight, and the label with the highest recommendation score was output.

Executing steps 5 and 6:

the numerical features were normalized by logarithmic conversion, and the categorical features were one-hot encoded into numerical vectors; the processed feature tags were then vectorized by the Embedding layer. To let the model learn every feature in the data, the generalized linear model concatenated the linear features, and the embedded feature vectors were then crossed. For example, recommending books from the cross of a user's age and occupation works far better than relying on age or occupation alone. Finally, the embedded feature vectors were input into the deep neural network for training, and the output was obtained through two fully connected layers.

Executing step 7:

to test the DeepFM model, the labeling results with basic labels only and with prediction-class labels added were used: 80% of the samples formed the training set and 20% the test set, on which the model's results were obtained. In addition, 20% of the training set was held out as a validation set to show the result of each training run. The Accuracy curves show that the method of the invention (the book recommendation method including prediction-class labels) recommends better than the method without prediction-class labels.
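The data split described above can be sketched as follows; the shuffle with a fixed seed is an assumption added for reproducibility, and the function name is hypothetical.

```python
import random


def split_dataset(samples, test_frac=0.2, val_frac=0.2, seed=42):
    """80/20 train/test split, then 20% of the training part held out for validation."""
    rng = random.Random(seed)
    shuffled = samples[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    test, train = shuffled[:n_test], shuffled[n_test:]
    n_val = int(len(train) * val_frac)
    val, train = train[:n_val], train[n_val:]
    return train, val, test
```

For 100 samples this yields 64 training, 16 validation, and 20 test samples, with no sample appearing in two sets.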


For objective evaluation, the commonly used Accuracy index was selected to verify the experimental results. Accuracy is calculated as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

wherein TP represents the number of samples for which the label is positive and the prediction is also positive; TN represents the number of samples that are marked as negative and predicted as negative; FP represents the number of samples that the label is a negative sample and the prediction is a positive sample; FN represents the number of samples that are marked positive and predicted negative.
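The Accuracy index follows directly from these four counts. A minimal sketch for binary 0/1 labels:

```python
def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # labeled positive, predicted positive
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # labeled negative, predicted negative
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # labeled negative, predicted positive
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # labeled positive, predicted negative
    return (tp + tn) / (tp + tn + fp + fn)
```

For labels [1, 0, 1, 1, 0] and predictions [1, 0, 0, 1, 1], TP = 2, TN = 1, FP = 1 and FN = 1, so Accuracy = 3/5 = 0.6.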

As the Accuracy curves in FIG. 5 show, the method of the invention recommends better than the book recommendation method without prediction-class labels and can be used for the book recommendation services of digital libraries.

In one embodiment, a book recommendation device based on user borrowing behavior-interest prediction is provided, the device comprising:

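a data acquisition module, configured to acquire user borrowing behavior data;

a basic tag determining module, configured to determine basic feature tags based on the user borrowing behavior data;

a category label determining module, configured to determine prediction-class feature tags by using the TFIDF weight calculation algorithm and the cosine similarity method based on the user borrowing behavior data; and

a recommendation result determining module, configured to input the basic feature tags and the prediction-class feature tags into DeepFM for Embedding feature vectorization, and to input the feature vectors, after feature crossing, into a deep neural network to output a recommendation result.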

For the specific definition of the book recommendation device based on user borrowing behavior-interest prediction, refer to the definition of the corresponding method above; it is not repeated here. The modules of the device may be implemented wholly or partly in software, hardware, or a combination of the two. They may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke them and execute the corresponding operations.

The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description. Also, the above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims

1. A book recommendation method based on user borrowing behavior-interest prediction, comprising:

acquiring data of borrowing behaviors of a user;

determining a basic feature tag based on the user borrowing behavior data;

inputting the basic feature tag and the prediction-class feature tag into DeepFM, a neural network model built on a factorization machine, to perform Embedding feature vectorization, and inputting the feature vector, after feature crossing, into a deep neural network to output a recommendation result;

the determining of the basic feature tag comprises the following steps:

the basic feature tag comprises:

a text class label containing text class information generated by a user; wherein the text class label comprises: the gender, occupation, address and book name of the user, and the characteristic extraction of the text label adopts a jieba word segmentation method;

the weight calculation algorithm TFIDF includes:

$$W = B_i \times I_t \times C_i \times \mathrm{TFIDF}$$

where B_i is the behavior-type weight; C_i is the number of user behaviors, i.e., the total number of behaviors generated by the user; I_t is the time-decayed interest degree; and TFIDF is the user behavior label assignment weight;

the cosine similarity method specifically comprises the following steps:

$$R = W \times p$$

2. The book recommendation method based on user borrowing behavior-interest prediction as recited in claim 1, further comprising:

3. The book recommendation method based on user borrowing behavior-interest prediction as recited in claim 1, wherein the calculation formula of the user behavior label assignment weight is as follows:

$$\mathrm{TFIDF}(U, L) = \mathrm{TF}(U, L) \times \mathrm{IDF}(U, L)$$

the calculation formula of TF(U, L) is as follows:

$$\mathrm{TF}(U, L) = \frac{S(U, L)}{\sum_i S(U, L_i)}$$

where S(U, L) is the number of times label L marks user U, Σ_i S(U, L_i) is the number of times all labels mark user U, and TF(U, L) is the proportion of label L's marks among all label marks on user U;

the calculation formula of IDF (U, L) is as follows:

4. The book recommendation method based on user borrowing behavior-interest prediction as recited in claim 1, wherein the calculation formula of the time-decay interest level is as follows:

$$I_t = e^{-\lambda \Delta t}$$

5. The book recommendation method based on user borrowing behavior-interest prediction as recited in claim 1, further comprising:

6. The book recommendation method based on user borrowing behavior-interest prediction as recited in claim 1, wherein the determining of the recommendation result comprises:

7. A book recommendation device based on user borrowing behavior-interest prediction, comprising:

the recommendation result determining module is used for inputting the basic feature tag and the prediction-class feature tag into DeepFM, a neural network model built on a factorization machine, to perform Embedding feature vectorization, and for inputting the feature vector, after feature crossing, into a deep neural network to output a recommendation result;

the basic tag determining module is specifically configured to:

the basic feature tag comprises:

the weight calculation algorithm TFIDF includes:

$$W = B_i \times I_t \times C_i \times \mathrm{TFIDF}$$

the cosine similarity method specifically comprises the following steps:

$$R = W \times p$$