WO2023134084A1

WO2023134084A1 - Multi-label identification method and apparatus, electronic device, and storage medium

Info

Publication number: WO2023134084A1
Application number: PCT/CN2022/090726
Authority: WO
Inventors: 舒畅; 陈又新
Original assignee: 平安科技（深圳）有限公司
Priority date: 2022-01-11
Filing date: 2022-04-29
Publication date: 2023-07-20
Also published as: CN114358007A

Abstract

The present application relates to the technical field of artificial intelligence. Embodiments of the present application provide a multi-label identification method and apparatus, an electronic device, and a storage medium. The method comprises: performing normalization processing on user basic data to obtain user basic features; performing feature extraction on user behavior data by means of a graph convolution model to obtain a behavior feature matrix; performing word segmentation processing on user comment data to obtain comment text word segment vectors; inputting the comment text word segment vectors to a comparative learning model so that matrix multiplication is performed on the comment text word segment vectors and a reference word embedding matrix so as to obtain comment word embedding vectors; performing fusion processing on the user basic features, the behavior feature matrix, and the comment word embedding vectors to obtain standard portrait feature vectors; performing label identification processing on the standard portrait feature vectors by means of a label identification model to obtain probability values of portrait labels; and obtaining a target portrait label according to the probability values. According to embodiments of the present application, the accuracy with which portrait labels of a user are identified is improved.

Description

Multi-label identification method, device, electronic equipment and storage medium

This application claims the priority of the Chinese patent application with the application number 202210027793.0 filed on January 11, 2022, and the title of the invention is "multi-label identification method, device, electronic equipment and storage medium", the entire content of which is incorporated by reference incorporated in this application.

technical field

The present application relates to the technical field of artificial intelligence, and in particular to a multi-tag identification method, device, electronic equipment and storage medium.

Background technique

At present, when labeling portraits of Internet users, manual labeling or machine learning is often used to identify and classify the portrait labels. When the existing method uses manual labeling, it often needs to go through a long period of labeling processing, and the error rate is high, which affects the accuracy of recognition; and when using machine learning to identify multi-label portraits, it often needs to target different labels. Classifiers are trained separately, which often takes a lot of time for model training, which affects the recognition efficiency. Therefore, how to provide a multi-tag recognition method that can improve the recognition accuracy and recognition efficiency of user portrait tags has become an urgent technical problem to be solved.

Contents of the invention

The main purpose of the embodiment of the present application is to provide a multi-tag identification method, device, electronic device and storage medium, aiming at improving the recognition accuracy and recognition efficiency of user portrait tags.

technical solution

In the first aspect, the embodiment of the present application proposes a multi-label identification method, the method comprising:

Obtaining raw data, wherein the raw data includes user basic data, user behavior data and user comment data;

performing normalization processing on the user basic data to obtain user basic features;

performing feature extraction on the user behavior data through a pre-trained graph convolution model to obtain a behavior feature matrix;

Carry out word segmentation processing to described user comment data, obtain comment text word segment vector;

The comment text word segment vector is input into the pre-trained comparative learning model, so that the comment text word segment vector and the reference word embedding matrix in the comparative learning model are matrix multiplied to obtain the comment word embedding vector;

Perform fusion processing on the user basic features, the behavior feature matrix and the comment word embedding vector to obtain a standard portrait feature vector;

Carrying out label recognition processing on the standard portrait feature vector through a pre-trained label recognition model to obtain the probability value of each preset portrait label;

According to the magnitude relationship between the probability value and the preset probability threshold, the target portrait label is obtained.

In the second aspect, the embodiment of the present application proposes a multi-tag identification device, which includes:

A data acquisition module, configured to acquire raw data, wherein the raw data includes user basic data, user behavior data and user comment data;

A normalization module, configured to perform normalization processing on the user basic data to obtain user basic features;

Feature extraction module, for carrying out feature extraction to described user behavior data by pre-trained graph convolution model, obtains behavior feature matrix;

A word segmentation module, configured to perform word segmentation processing on the user comment data to obtain a comment text word segment vector;

A comparative learning module, for inputting the comment text word segment vector into a pre-trained comparative learning model, so that the comment text word segment vector and the reference word embedding matrix in the comparative learning model are matrix multiplied, Get the comment embedding vector;

The fusion module is used to perform fusion processing on the user basic features, the behavior feature matrix and the comment word embedding vector to obtain a standard portrait feature vector;

A tag recognition module, configured to perform tag recognition processing on the standard portrait feature vector through a pre-trained tag recognition model to obtain the probability value of each preset portrait tag;

The comparison module is used to obtain the target portrait label according to the magnitude relationship between the probability value and the preset probability threshold.

In the third aspect, the embodiment of the present application provides an electronic device, the electronic device includes a memory, a processor, a program stored in the memory and operable on the processor, and a program for implementing the processor A data bus connecting and communicating with the memory, when the program is executed by the processor, a multi-label identification method is implemented, wherein the multi-label identification method includes: obtaining original data, wherein the original The data includes user basic data, user behavior data, and user comment data; the user basic data is normalized to obtain user basic features; the user behavior data is extracted through a pre-trained graph convolution model to obtain Behavior feature matrix; word segmentation processing is carried out to described user comment data, obtain comment text phrase vector; Described comment text phrase vector is input in the comparative learning model of pre-training, so that described comment text phrase vector and all The reference word embedding matrix in the comparative learning model is multiplied by matrix to obtain the comment word embedding vector; the user basic feature, the behavior feature matrix and the comment word embedding vector are fused to obtain a standard portrait feature vector; Perform label recognition processing on the standard portrait feature vector through a pre-trained label recognition model to obtain the probability value of each preset portrait label; obtain the target portrait label according to the magnitude relationship between the probability value and the preset probability threshold.

In a fourth aspect, the embodiment of the present application provides a storage medium, the storage medium is a computer-readable storage medium for computer-readable storage, the storage medium stores one or more programs, and the one or more This program can be executed by one or more processors to implement a multi-label identification method, wherein the multi-label identification method includes: obtaining original data, wherein the original data includes user basic data, user behavior data, and User comment data; normalize the user basic data to obtain user basic features; perform feature extraction on the user behavior data through a pre-trained graph convolution model to obtain a behavior feature matrix; Carry out participle processing, obtain comment text phrase vector; Described comment text phrase vector is input in the comparative learning model of pre-training, so that described comment text phrase vector and the reference word embedding matrix in described contrast learning model Perform matrix multiplication to obtain comment word embedding vectors; perform fusion processing on the user basic features, the behavior feature matrix, and the comment word embedding vectors to obtain standard portrait feature vectors; The standard portrait feature vector performs label recognition processing to obtain the probability value of each preset portrait label; according to the magnitude relationship between the probability value and the preset probability threshold, the target portrait label is obtained.

Beneficial effect

The multi-tag recognition method, device, electronic equipment, and storage medium proposed in this application can greatly shorten the model training time, improve the recognition efficiency, and improve the recognition accuracy of user portrait tags.

Description of drawings

The accompanying drawings are used to provide a further understanding of the technical solution of the present application, and constitute a part of the specification, and are used together with the embodiments of the present application to explain the technical solution of the present application, and do not constitute a limitation to the technical solution of the present application.

Fig. 1 is a flow chart of the multi-tag identification method provided by the embodiment of the present application;

Fig. 2 is the flowchart of step S103 in Fig. 1;

Fig. 3 is the flowchart of step S104 in Fig. 1;

Fig. 4 is the flowchart of step S105 in Fig. 1;

Fig. 5 is another flow chart of the multi-tag identification method provided by the embodiment of the present application;

Fig. 6 is the flowchart of step S107 in Fig. 1;

Fig. 7 is the flowchart of step S108 in Fig. 1;

Fig. 8 is a schematic structural diagram of a multi-tag identification device provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.

Embodiments of the present invention

In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit the present application.

It should be noted that although the functional modules are divided in the schematic diagram of the device, and the logical sequence is shown in the flowchart, in some cases, it can be executed in a different order than the module division in the device or the flowchart in the flowchart. steps shown or described. The terms "first", "second" and the like in the specification and claims and the above drawings are used to distinguish similar objects, and not necessarily used to describe a specific sequence or sequence.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application, and are not intended to limit the present application.

At present, when labeling portraits of Internet users, manual labeling or machine learning is often used to identify and classify the portrait labels. When manual labeling is used, it often takes a long time to label, and the error rate is high, which affects the accuracy of recognition; and when machine learning is used to identify multi-label portraits, it is often necessary to target different label categories. Training the classifiers separately often takes a lot of time for model training, which affects the recognition efficiency. Therefore, how to provide a multi-tag recognition method that can improve the recognition accuracy and recognition efficiency of user portrait tags has become an urgent technical problem to be solved.

Based on this, embodiments of the present application provide a multi-tag identification method, device, electronic equipment, and storage medium, aiming at improving the identification accuracy of user portrait tags.

The multi-tag identification method, device, electronic device, and storage medium provided in the embodiments of the present application are specifically described through the following embodiments. First, the multi-tag identification method in the embodiment of the present application is described.

The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Among them, artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. .

Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

The multi-tag identification method provided in the embodiment of the present application relates to the fields of artificial intelligence and digital medical technology. The multi-tag identification method provided in the embodiment of the present application can be applied to a terminal, can also be applied to a server, and can also be software running on a terminal or a server. In some embodiments, the terminal can be a smart phone, a tablet computer, a notebook computer, a desktop computer, etc.; the server end can be configured as an independent physical server, or can be configured as a server cluster or a distributed system composed of multiple physical servers, or It can be configured as a cloud that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The server; the software may be an application to realize the multi-label identification method, but is not limited to the above forms.

The application can be used in numerous general purpose or special purpose computer system environments or configurations. Examples: personal computers, server computers, handheld or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, including A distributed computing environment for any of the above systems or devices, etc. This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.

Fig. 1 is an optional flow chart of the multi-tag identification method provided by the embodiment of the present application. The method in Fig. 1 may include but not limited to steps S101 to S108.

Step S101, obtaining original data, wherein the original data includes user basic data, user behavior data and user comment data;

Step S102, performing normalization processing on the basic user data to obtain basic user features;

Step S103, using the pre-trained graph convolution model to perform feature extraction on user behavior data to obtain a behavior feature matrix;

Step S104, performing word segmentation processing on user comment data to obtain comment text word segment vectors;

Step S105, inputting the comment text word segment vector into the pre-trained comparative learning model, so that the comment text word segment vector and the reference word embedding matrix in the comparative learning model are matrix multiplied to obtain the comment word embedding vector;

Step S106, performing fusion processing on user basic features, behavior feature matrix and comment word embedding vectors to obtain standard portrait feature vectors;

Step S107, using the pre-trained label recognition model to perform label recognition processing on the standard portrait feature vector to obtain the probability value of each preset portrait label;

Step S108, according to the magnitude relationship between the probability value and the preset probability threshold, the target portrait label is obtained.

After the above steps S101 to S108, the multi-label recognition method of the present application can realize the recognition of different portrait labels through a label recognition model. Compared with the traditional technology, which needs to train classifiers for different label categories separately, the model can be greatly shortened. The training time improves the recognition efficiency. At the same time, the multi-label recognition method of the present application performs corresponding data preprocessing for different types of user data, so that the obtained standard portrait feature vectors can better meet the recognition requirements, and can improve the recognition accuracy of user portrait labels.

In some embodiments, when step S101 is executed, the original user data can be crawled from multiple preset data sources by means of a web crawler, wherein the basic data includes the user's gender, education background, age group, etc.; behavior data Including the user's click data on the display of the course content and the click data of the recommended courses on the course page, etc.; the comment data is the user's text comment data on the course, etc.

In some embodiments, when step S102 is executed, a series of numbers can be set for different types of basic data according to preset normalization conditions. A collection of {0,1}, 0 for female and 1 for male. Educational background is divided into {1,2,3,4,5,6,7,8} set, 1 represents primary school, 2 represents junior high school, 3 represents technical secondary school, 4 represents high school, 5 represents junior college, 6 represents undergraduate, 7 represents Master, 8 represents Ph.D. The age group is divided into {5, 6, 7, 8, 9, 0} sets, 5 represents the post-50s, 6 represents the post-60s, and so on.

Referring to FIG. 2, in some embodiments, step S103 may include but not limited to include steps S201 to S204:

Step S201, mapping user behavior data to a preset vector space to obtain user behavior feature vectors;

Step S202, constructing a behavior feature map according to the preset course types and user behavior feature vectors;

Step S203, performing graph convolution processing on the behavior feature map to obtain a behavior degree matrix and a behavior adjacency matrix;

Step S204, performing difference processing on the behavior degree matrix and the behavior matrix to obtain a behavior characteristic matrix.

Specifically, in step S201, the user behavior data can be mapped to a preset vector space by using the MLP network to map the semantic space of the user behavior data to the vector space, and obtain the user behavior feature vector.

In step S202, record each preset course as a node, analyze the user's behavior data, and if it is detected that the user clicks on another course through the recommendation module of a course page, then establish a link between the two courses. side. According to this mapping relationship, the relationship between each course type and user behavior feature vector is constructed to obtain an undirected graph, which is the behavior feature graph.

In step S203, the behavior feature graph can be expressed as G=(V, E), where V represents a node and E represents an edge. The Laplacian matrix of this behavior feature map can be defined as L=D-A, L is the Laplacian matrix, D is the angle matrix (because the element on the diagonal is the degree of the vertex, the angle matrix refers to the element The number of linked elements); A is an adjacency matrix, which represents the adjacency relationship between any two vertices. If the two vertices are adjacent, the adjacency matrix is 1. If the two vertices are not adjacent, the adjacency matrix is 0; therefore, by performing graph convolution processing on the behavior feature map, the Laplace transform of the behavior feature map can be realized, and the behavior degree matrix (ie, the angle matrix D) and the behavior adjacency matrix (ie, the adjacency matrix A) can be obtained.

In step S204, since the property of the Laplacian matrix and the graph satisfies L=D-A, that is, the behavior degree matrix D and the behavior adjacency matrix A are subtracted to obtain the behavior characteristic matrix L1.

It should be noted that the graph convolution layer of this graph convolution model can be expressed as shown in formula (1):

Among them, y is the output value and σ is the sigmoid activation function. L is the Laplacian matrix, x is the input labeled behavioral feature map, and j is the number of rows of the Laplacian matrix, where j is generally much smaller than the number of nodes in the behavioral feature map. α is the weight matrix. The parameter values of the weight matrix are randomly generated when the graph convolution model is initialized, and can be adjusted later by training the graph convolution model. Specifically, calculate the error between the marked behavior features and the predicted features, and then The error is backpropagated to update the parameter value to optimize the graph convolution model.

Referring to FIG. 3, in some embodiments, step S104 may include but not limited to include steps S301 to S302:

Step S301, performing word segmentation processing on the user comment data through a preset word segmenter to obtain comment text segments;

Step S302, encoding the word segment of the review text to obtain a word segment vector of the review text.

Specifically, in step S301, when using the Jieba tokenizer to perform word segmentation processing on the user comment data, first generate a directed acyclic graph corresponding to the user comment data by comparing the dictionary in the Jieba tokenizer, and then according to the preset selection Patterns and dictionaries search for the shortest path on the directed acyclic graph, and intercept the user comment data according to the shortest path, or directly intercept the user comment data to obtain comment text segments.

Further, HMM (Hidden Markov Model) can be used to discover new words for comment text segments that are not in the dictionary. Specifically, the positions B, M, E, and S of the characters in the comment text segment are regarded as hidden states, and the characters are observed states, where B/M/E/S represent the words that appear at the beginning, middle, end, and word respectively into words. A dictionary file is used to store representation probability matrix, initial probability vector and transition probability matrix between characters respectively. Then use the Viterbi algorithm to solve the maximum possible hidden state, so as to obtain the comment text segment.

In step S302, a preset BERT encoder may be used to encode the word segment of the comment text, so that each character on the word segment of the comment text carries a corresponding code, thereby obtaining a word segment vector of the comment text.

In some embodiments, before step S105, the method further includes pre-training a contrastive learning model, which may specifically include but not limited to steps a to f:

a. Obtain sample user data;

b. Mapping and encoding the sample user data through the comparative learning model to obtain the initial embedded data;

c. Construct sample pairs according to the initial embedding data, wherein the sample pairs include positive example pairs and negative example pairs;

d. Input the sample pair into the contrastive learning model;

e. Calculate the first similarity degree of the positive example pair and the second similarity degree of the negative example pair by comparing the loss function of the learning model.

f. Optimizing the loss function of the contrastive learning model according to the first similarity and the second similarity, so as to update the contrastive learning model.

Specifically, to execute step a and step b, first obtain the sample user data, encode the sample user data, map the sample user data to the embedding space, and perform vector representation on the sample user data, so as to obtain the initial embedded data (ie Initial embedding data), the initial embedding data includes positive sample data and negative sample data.

In step c of some embodiments, the data enhancement processing is performed on the initial embedded data through the dropout mask mechanism; the embodiment of the present application replaces the traditional data enhancement method through the dropout mask mechanism, that is, the same sample data is input into the dropout encoder twice to obtain The two vectors of the two vectors are used as positive example pairs for comparative learning, and the effect is good enough, because for example, a different dropout mask is randomly generated for each dropout inside BERT, so only the same sample data (that is, the initial Embedding data) is input to the simCSE model twice, and the two vectors obtained are the results of applying two different dropout masks. It is understandable that the dropout mask is a random network model, which is the mask of the model parameter W, which prevents overfitting.

In a batch, the data (that is, the first vector and the second vector) obtained through data enhancement processing are positive example pairs, and other data that have not undergone data enhancement are negative example pairs. In the embodiment of the present application, some of the initial embedded data in a batch can be processed through data enhancement to obtain positive example pairs, and the other part of the initial embedded data can be used as negative example pairs.

Further, step d is performed to input the sample pair into the contrastive learning model.

In step e of some embodiments, both the first similarity and the second similarity are cosine similarity.

In some embodiments, step f may include, but is not limited to include:

Maximize the first similarity to the first value and minimize the second similarity to the first value to optimize the loss function; where the first similarity is the numerator of the loss function, the first similarity and the second The similarity is the denominator of the loss function, the first value is 1, and the second value is 0. In this loss function, the numerator is the first similarity of the corresponding positive example pair, the denominator is the first similarity and the second similarity of all negative example pairs, and then the value of the molecular formula composed of the numerator and the denominator is wrapped in -log() In this way, the loss function can be minimized by maximizing the numerator and minimizing the denominator. In the embodiment of this application, minimizing the loss function infoNCE loss is to maximize the numerator and minimize the denominator, that is, to maximize the first similarity of the positive pair and minimize the second similarity of the negative pair, and the loss The function is minimized to realize the optimization of the loss function. More specifically, the loss function is shown in formula (2):

In this loss function, l _i is the loss value of the loss function, the positive example pair is <z, z′>, N is the size of the batch (N is a variable), and the loss function indicates that the i-th sample should be compared with the batch The similarity is calculated for each sample of , and each sample in the batch will be calculated according to the loss function. Therefore, the loss function represents the loss of sample i; in the loss function, the numerator is the similarity of the positive pair , the denominator is the similarity between positive pairs and all negative pairs, and then wrap this value in -log(), so that maximizing the numerator and minimizing the denominator can minimize the loss function.

Referring to FIG. 4, in some embodiments, step S105 may include but not limited to include steps S401 to S402:

Step S401, input the comment text word segment vector into the comparative learning model, so that the comment text word segment vector and the reference word embedding matrix are matrix multiplied to obtain a plurality of basic word embedding vectors;

Step S402, performing mapping processing on the basic word embedding vectors to obtain review word embedding vectors.

Specifically, step S401 is executed, and the value of the reference word embedding matrix in the comparison model can be completely fixed by training the comparison model, and other model parameters of the comparison model are also fixed. Therefore, inputting the comment text word segment vector into the comparison model, the fixed reference word embedding matrix can be used to perform matrix multiplication with each comment text word segment vector to obtain multiple basic word embedding vectors.

In step S402, use the fixed MLP network in the comparison model to perform mapping processing on the basic word embedding vector to obtain the comment word embedding vector. Among them, the MLP network includes a linear layer, a ReLu activation function, and a linear layer.

In some embodiments, when step S106 is executed, the basic feature data and the behavioral feature matrix are respectively vectorized to obtain the basic feature vector and the behavioral feature vector, and then the basic feature vector, the behavioral feature vector and the word embedding feature vector are performed Fusion processing to get the standard feature vector. For example, the standard feature vector X=[gender, education, age group, [GCN], [BERT]]. Among them, GCN is a 256-dimensional vector, BERT is a 512-dimensional vector, and X is a 3+256+512 vector.

Please refer to FIG. 5 , in some embodiments, before step S107, the method further includes pre-training the label recognition model, which may specifically include but not limited to steps S501 to S505:

Step S501, acquiring marked user data;

Step S502, performing feature extraction on the labeled user data to obtain sample feature vectors;

Step S503, inputting the sample feature vector into the label recognition model;

Step S504, calculate the sample probability prediction value of each portrait label category through the loss function of the label recognition model;

Step S505, optimize the loss function of the label recognition model according to the sample probability prediction value, so as to update the label recognition model.

It should be noted that the label recognition model can be a textcnn model, and the label recognition model includes an Embedding layer, a convolution layer, a pooling layer and an output layer. Usually, the Embedding layer of the label recognition model can use ELMO, GLOVE, Word2Vector, Bert and other algorithms to generate a dense vector from the input text data. Then, the dense vector is convoluted and pooled through the convolution layer and pooling layer of the label recognition model to obtain the target feature vector, and then the target feature vector is input to the output layer, and the preset function in the output layer is The classification operation can be performed on the target feature vector to obtain the label feature vector and the probability value of each preset category.

Firstly, step S501 is executed to obtain marked user data, where the marked user data includes user portrait category labels. Furthermore, step S502 is executed, using the MLP network to perform multiple mapping processes on the labeled user data to obtain sample feature vectors.

Furthermore, step S503 is executed to input the sample feature vector into the label recognition model.

When step S504 is executed, the sample feature vector is generated into a dense feature vector through the Embedding layer of the label recognition model, and then the dense feature vector is convoluted and pooled through the convolution layer and the pooling layer to obtain the target The feature vector, and then input the target feature vector to the output layer, and calculate the sample probability prediction value of each portrait label category through the loss function; where the loss function is shown in formula (3):

Among them, t is the target value (target), and t needs to take a value between [0,1]. Since in the embodiment of this application, t is used as a portrait label category, the value of t is 0 or 1, and o indicates label identification The model's probabilistic predictions.

Finally, step S505 is executed to calculate the model loss of the label recognition model based on the sample probability prediction value, that is, the loss value, and then use the gradient descent method to backpropagate the loss value, feed the loss value back to the label recognition model, and modify the label recognition model. For model parameters, repeat the above process until the loss value satisfies the preset iteration condition, wherein the preset iteration condition is that the number of iterations can reach the preset value, or the variance of the loss function is smaller than the preset threshold. When the loss value satisfies the preset iteration condition, the backpropagation can be stopped, and the final model parameter can be used as the final model parameter to complete the update of the label recognition model.

Referring to FIG. 6, in some embodiments, step S107 may also include but not limited to steps S601 to S602:

Step S601, reconstructing the standard portrait feature vector according to the preset label dimension to obtain the label feature vector;

Step S602, using a preset function to identify the tag feature vector to obtain the probability value of each preset portrait tag.

Specifically, step S601 is first performed to reconstruct the standard image feature vector according to the preset label dimension and encoder, for example, encode the standard image feature vector according to the bottom-up encoding sequence and label dimension. For example, the standard portrait feature vector is encoded for the first time to obtain the bottom label feature vector z1, and then the downsampling process is performed layer by layer to obtain the label feature vector [z2, z3..., zk] corresponding to each label dimension.

In step S602, the preset function is a sigmoid function, and the sigmoid function can be expressed as shown in formula (4):

Identify the label feature vector through the sigmoid function. The sigmoid function will classify the label feature vector according to the preset portrait label category, and create a probability distribution on each portrait label category, so as to obtain the value of each preset portrait label. probability value.

Please refer to FIG. 7, in some embodiments, step S108 may also include but not limited to include steps S701 to S702:

Step S701, including the portrait labels whose probability value is greater than or equal to the preset probability threshold into the same set to obtain a set of candidate portrait labels;

Step S702, screening the candidate portrait label set to obtain the target portrait label.

Specifically, first execute step S701, if the probability value is less than the preset probability threshold, filter out the portrait label corresponding to the probability value; if the probability value is greater than or equal to the preset probability threshold, then include the portrait label corresponding to the probability value into the candidate Portrait labels set. For example, the preset probability threshold is 0.6, and when the probability value is greater than or equal to 0.6, it can be considered that the user has the current portrait tag.

Further, by executing step S702, the portrait tags in the candidate portrait tag set can be screened by means of manual review, etc., and the portrait tag with the highest matching degree with the current user is extracted to obtain the target portrait tag. In addition, the portrait tags in the candidate portrait tag set can also be arranged in descending order according to the size of the probability value, and the top five portrait tags are selected as the target portrait tags of the current user. Other methods may also be used to filter the portrait tags in the candidate portrait tag set, which is not limited thereto.

In this embodiment of the present application, raw data is acquired, wherein the raw data includes basic user data, user behavior data, and user comment data. Furthermore, the user basic data is normalized to obtain the user basic features; the user behavior data is extracted through the pre-trained graph convolution model to obtain the behavior feature matrix; the user comment data is word-segmented to obtain the comment text word segment vector, and input the comment text segment vector into the pre-trained comparative learning model, so that the review text segment vector and the reference word embedding matrix in the comparative learning model are matrix multiplied to obtain the review word embedding vector. In this way, different types of data can be preprocessed separately to obtain user basic features, behavior feature matrices, and comment embedding vectors, which improves the rationality of user data. Furthermore, the standard portrait feature vector is obtained by fusing the user's basic features, behavior feature matrix, and comment word embedding vector. Finally, the pre-trained label recognition model is used to perform label recognition processing on the standard portrait feature vector to obtain the probability value of each preset portrait label, and obtain the target portrait label according to the relationship between the probability value and the preset probability threshold. The multi-label recognition method of the present application can realize the recognition of different portrait labels through a label recognition model. Compared with the traditional technology that needs to train classifiers for different label categories separately, it can greatly shorten the model training time and improve the recognition efficiency. At the same time, the multi-label recognition method of the present application performs corresponding data preprocessing for different types of user data, so that the obtained standard portrait feature vectors can better meet the recognition requirements, and can improve the recognition accuracy of user portrait labels.

Please refer to Figure 8, the embodiment of the present application also provides a multi-label identification device, which can realize the above-mentioned multi-label identification method, the device includes:

A data acquisition module 801, configured to acquire raw data, wherein the raw data includes user basic data, user behavior data and user comment data;

A normalization module 802, configured to perform normalization processing on user basic data to obtain user basic features;

The feature extraction module 803 is used to perform feature extraction on user behavior data through a pre-trained graph convolution model to obtain a behavior feature matrix;

The word segmentation module 804 is used to perform word segmentation processing on the user comment data to obtain the comment text word segment vector;

Contrastive learning module 805, for inputting the comment text word segment vector into the pre-trained comparative learning model, so that the comment text word segment vector and the reference word embedding matrix in the comparative learning model are matrix multiplied to obtain the comment word embedding vector ;

The fusion module 806 is used to fuse the user basic features, behavior feature matrix and comment word embedding vectors to obtain standard portrait feature vectors;

The label recognition module 807 is used to perform label recognition processing on the standard portrait feature vector through the pre-trained label recognition model to obtain the probability value of each preset portrait label;

The comparison module 808 is configured to obtain the target portrait label according to the magnitude relationship between the probability value and the preset probability threshold.

The specific implementation of the multi-tag identification device is basically the same as the specific embodiment of the above-mentioned multi-label identification method, and will not be repeated here.

The embodiment of the present application also provides an electronic device, the electronic device includes: a memory, a processor, a program stored in the memory and operable on the processor, and a data bus for realizing connection and communication between the processor and the memory , when the program is executed by the processor, the above multi-label identification method is realized. The electronic device may be any intelligent terminal including a tablet computer, a vehicle-mounted computer, and the like.

Please refer to FIG. 9. FIG. 9 illustrates a hardware structure of an electronic device in another embodiment. The electronic device includes:

The processor 901 may be implemented by a general-purpose CPU (Central Processing Unit, central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related programs, so as to realize The technical solutions provided by the embodiments of the present application;

The memory 902 may be implemented in the form of a read-only memory (ReadOnlyMemory, ROM), a static storage device, a dynamic storage device, or a random access memory (RandomAccessMemory, RAM). The memory 902 can store operating systems and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 902 and called by the processor 901 to execute a multi- The label recognition method, wherein the multi-label recognition method includes: obtaining original data, wherein the original data includes user basic data, user behavior data, and user comment data; normalizing the user basic data to obtain user basic features; The trained graph convolution model extracts features from user behavior data to obtain a behavior feature matrix; performs word segmentation processing on user comment data to obtain comment text segment vectors; input comment text segment vectors into the pre-trained comparative learning model, Multiply the word segment vector of the comment text with the reference word embedding matrix in the comparative learning model to obtain the comment word embedding vector; perform fusion processing on the user basic features, behavior feature matrix and comment word embedding vector to obtain the standard portrait feature vector Carrying out label recognition processing on the standard portrait feature vector through the pre-trained label recognition model to obtain the probability value of each preset portrait label; according to the size relationship between the probability value and the preset probability threshold, the target portrait label is obtained;

The input/output interface 903 is used to realize information input and output;

The communication interface 904 is used to realize the communication interaction between the device and other devices, and the communication can be realized through a wired method (such as USB, network cable, etc.), or can be realized through a wireless method (such as a mobile network, WIFI, Bluetooth, etc.);

bus 905, for transferring information between various components of the device (such as processor 901, memory 902, input/output interface 903 and communication interface 904);

The processor 901 , the memory 902 , the input/output interface 903 and the communication interface 904 are connected to each other within the device through the bus 905 .

An embodiment of the present application also provides a storage medium, which is a computer-readable storage medium for computer-readable storage, and the computer-readable storage medium may be non-volatile or volatile. The storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement a multi-label identification method, wherein the multi-label identification method includes: obtaining original data, wherein the original data Including user basic data, user behavior data and user comment data; normalize user basic data to obtain user basic features; use pre-trained graph convolution model to extract user behavior data to obtain behavior feature matrix; The user comment data is subjected to word segmentation processing to obtain the comment text segment vector; the comment text segment vector is input into the pre-trained comparative learning model, so that the comment text segment vector and the reference word embedding matrix in the comparative learning model are matrix correlated. Multiply the embedding vector of the comment to get the embedding vector of the comment; perform fusion processing on the user’s basic features, behavior feature matrix and embedding vector of the comment to obtain the standard portrait feature vector; use the pre-trained label recognition model to perform label recognition on the standard portrait feature vector to obtain each A probability value of a preset portrait label; according to a magnitude relationship between the probability value and a preset probability threshold, a target portrait label is obtained.

As a non-transitory computer-readable storage medium, memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The embodiments described in the embodiments of the present application are to illustrate the technical solutions of the embodiments of the present application more clearly, and do not constitute a limitation to the technical solutions provided by the embodiments of the present application. Those skilled in the art know that with the evolution of technology and new For the emergence of application scenarios, the technical solutions provided by the embodiments of the present application are also applicable to similar technical problems.

Those skilled in the art can understand that the technical solutions shown in Figures 1-7 do not constitute a limitation to the embodiments of the present application, and may include more or fewer steps than those shown in the illustrations, or combine certain steps, or be different A step of.

The device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art can understand that all or some of the steps in the methods disclosed above, the functional modules/units in the system, and the device can be implemented as software, firmware, hardware, and an appropriate combination thereof.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

If the integrated unit is realized in the form of a software function unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application is essentially or part of the contribution to the prior art or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium , including multiple instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the method in each embodiment of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory, referred to as ROM), random access memory (Random Access Memory, referred to as RAM), magnetic disk or optical disc, etc., which can store programs. medium.

The preferred embodiments of the embodiments of the present application have been described above with reference to the accompanying drawings, which does not limit the scope of rights of the embodiments of the present application. Any modifications, equivalent replacements and improvements made by those skilled in the art without departing from the scope and essence of the embodiments of the present application shall fall within the scope of rights of the embodiments of the present application.

Claims

A multi-label identification method, wherein the method comprises:

Obtaining raw data, wherein the raw data includes user basic data, user behavior data and user comment data;

performing normalization processing on the user basic data to obtain user basic features;

performing feature extraction on the user behavior data through a pre-trained graph convolution model to obtain a behavior feature matrix;

Carry out word segmentation processing to described user comment data, obtain comment text word segment vector;

The comment text word segment vector is input into the pre-trained comparative learning model, so that the comment text word segment vector and the reference word embedding matrix in the comparative learning model are matrix multiplied to obtain the comment word embedding vector;

Perform fusion processing on the user basic features, the behavior feature matrix and the comment word embedding vector to obtain a standard portrait feature vector;

Carrying out label recognition processing on the standard portrait feature vector through a pre-trained label recognition model to obtain the probability value of each preset portrait label;

According to the magnitude relationship between the probability value and the preset probability threshold, the target portrait label is obtained.
The multi-label recognition method according to claim 1, wherein the step of performing feature extraction on the user behavior data through a pre-trained graph convolution model to obtain a behavior feature matrix includes:

Mapping the user behavior data to a preset vector space to obtain a user behavior feature vector;

Construct a behavior feature map according to the preset course type and the user behavior feature vector;

Carrying out graph convolution processing on the behavior feature map to obtain a behavior degree matrix and a behavior adjacency matrix;

Performing difference processing on the behavior degree matrix and the behavior matrix to obtain a behavior feature matrix.
The multi-label recognition method according to claim 1, wherein the step of performing word segmentation processing on the user comment data to obtain a comment text word segment vector includes:

Segmenting the user comment data through a preset word segmenter to obtain comment text segments;

Encoding processing is performed on the word segment of the comment text to obtain a word segment vector of the comment text.
The multi-label recognition method according to claim 1, wherein, the comment text word segment vector is input into a pre-trained contrastive learning model, so that the comment text word segment vector is the same as that in the contrastive learning model The reference word embedding matrix is multiplied by matrix, and the steps of obtaining the comment word embedding vector include:

The comment text word segment vector is input into the comparative learning model, so that the comment text word segment vector and the reference word embedding matrix are matrix multiplied to obtain a plurality of basic word embedding vectors;

The basic word embedding vector is mapped to obtain the comment word embedding vector.
The multi-label recognition method according to claim 1, wherein the step of performing label recognition processing on the standard portrait feature vector through a pre-trained label recognition model to obtain the probability value of each preset portrait label includes:

Reconstructing the standard portrait feature vector according to the preset label dimension to obtain the label feature vector;

The label feature vector is identified by using a preset function to obtain the probability value of each preset portrait label.
The multi-label identification method according to claim 1, wherein the step of obtaining the target portrait label according to the magnitude relationship between the probability value and the preset probability threshold comprises:

Incorporating portrait tags whose probability values are greater than or equal to the preset probability threshold into the same set to obtain a set of candidate portrait tags;

The candidate portrait label set is screened to obtain the target portrait label.
The multi-label recognition method according to any one of claims 1 to 6, wherein, performing label recognition processing on the standard portrait feature vector through the pre-trained label recognition model to obtain the probability of each preset portrait label Before the step of value, the method also includes pre-training the label recognition model, specifically including:

Obtain marked user data;

performing feature extraction on the labeled user data to obtain a sample feature vector;

The sample feature vector is input into the label recognition model;

Calculate the sample probability prediction value of each portrait label category through the loss function of the label recognition model;

Optimizing the loss function of the label recognition model according to the sample probability prediction value, so as to update the label recognition model.
A multi-label identification device, wherein the device includes:

A data acquisition module, configured to acquire raw data, wherein the raw data includes user basic data, user behavior data and user comment data;

A normalization module, configured to perform normalization processing on the user basic data to obtain user basic features;

Feature extraction module, for carrying out feature extraction to described user behavior data by pre-trained graph convolution model, obtains behavior feature matrix;

A word segmentation module, configured to perform word segmentation processing on the user comment data to obtain a comment text word segment vector;

A comparative learning module, for inputting the comment text word segment vector into a pre-trained comparative learning model, so that the comment text word segment vector and the reference word embedding matrix in the comparative learning model are matrix multiplied, Get the comment embedding vector;

The fusion module is used to perform fusion processing on the user basic features, the behavior feature matrix and the comment word embedding vector to obtain a standard portrait feature vector;

A tag recognition module, configured to perform tag recognition processing on the standard portrait feature vector through a pre-trained tag recognition model to obtain the probability value of each preset portrait tag;

The comparison module is used to obtain the target portrait label according to the magnitude relationship between the probability value and the preset probability threshold.
An electronic device, wherein the electronic device includes a memory, a processor, a program stored on the memory and operable on the processor, and a program for realizing the connection between the processor and the memory A data bus for communication, when the program is executed by the processor, the following steps are implemented:

Obtaining raw data, wherein the raw data includes user basic data, user behavior data and user comment data;

performing normalization processing on the user basic data to obtain user basic features;

performing feature extraction on the user behavior data through a pre-trained graph convolution model to obtain a behavior feature matrix;

Carry out word segmentation processing to described user comment data, obtain comment text word segment vector;

The comment text word segment vector is input into the pre-trained comparative learning model, so that the comment text word segment vector and the reference word embedding matrix in the comparative learning model are matrix multiplied to obtain the comment word embedding vector;

Perform fusion processing on the user basic features, the behavior feature matrix and the comment word embedding vector to obtain a standard portrait feature vector;

Carrying out label recognition processing on the standard portrait feature vector through a pre-trained label recognition model to obtain the probability value of each preset portrait label;

According to the magnitude relationship between the probability value and the preset probability threshold, the target portrait label is obtained.
The electronic device according to claim 9, wherein the step of performing feature extraction on the user behavior data through a pre-trained graph convolution model to obtain a behavior feature matrix includes:

Mapping the user behavior data to a preset vector space to obtain a user behavior feature vector;

Construct a behavior feature map according to the preset course type and the user behavior feature vector;

Carrying out graph convolution processing on the behavior feature map to obtain a behavior degree matrix and a behavior adjacency matrix;

Performing difference processing on the behavior degree matrix and the behavior matrix to obtain a behavior feature matrix.
The electronic device according to claim 9, wherein the step of performing word segmentation processing on the user comment data to obtain the comment text word segment vector comprises:

Segmenting the user comment data through a preset word segmenter to obtain comment text segments;

Encoding processing is performed on the word segment of the comment text to obtain a word segment vector of the comment text.
The electronic device according to claim 9 , wherein the comment text word segment vector is input into a pre-trained comparative learning model, so that the comment text word segment vector and the reference in the comparative learning model The word embedding matrix is multiplied by matrix to obtain the steps of comment word embedding vector, including:

The comment text word segment vector is input into the comparative learning model, so that the comment text word segment vector and the reference word embedding matrix are matrix multiplied to obtain a plurality of basic word embedding vectors;

The basic word embedding vector is mapped to obtain the comment word embedding vector.
The electronic device according to claim 9, wherein the step of performing label recognition processing on the standard portrait feature vector through a pre-trained label recognition model to obtain the probability value of each preset portrait label comprises:

Reconstructing the standard portrait feature vector according to the preset label dimension to obtain the label feature vector;

The label feature vector is identified by using a preset function to obtain the probability value of each preset portrait label.
The electronic device according to claim 9, wherein the step of obtaining the target portrait label according to the magnitude relationship between the probability value and the preset probability threshold comprises:

Incorporating portrait tags whose probability values are greater than or equal to the preset probability threshold into the same set to obtain a set of candidate portrait tags;

The candidate portrait label set is screened to obtain the target portrait label.
A storage medium, the storage medium is a computer-readable storage medium for computer-readable storage, wherein the storage medium stores one or more programs, and the one or more programs can be used by one or more The processor executes to realize the following steps:

Obtaining raw data, wherein the raw data includes user basic data, user behavior data and user comment data;

performing normalization processing on the user basic data to obtain user basic features;

performing feature extraction on the user behavior data through a pre-trained graph convolution model to obtain a behavior feature matrix;

Carry out word segmentation processing to described user comment data, obtain comment text word segment vector;

The comment text word segment vector is input into the pre-trained comparative learning model, so that the comment text word segment vector and the reference word embedding matrix in the comparative learning model are matrix multiplied to obtain the comment word embedding vector;

Perform fusion processing on the user basic features, the behavior feature matrix and the comment word embedding vector to obtain a standard portrait feature vector;

Carrying out label recognition processing on the standard portrait feature vector through a pre-trained label recognition model to obtain the probability value of each preset portrait label;

According to the magnitude relationship between the probability value and the preset probability threshold, the target portrait label is obtained.
The storage medium according to claim 15, wherein the step of performing feature extraction on the user behavior data through a pre-trained graph convolution model to obtain a behavior feature matrix includes:

Mapping the user behavior data to a preset vector space to obtain a user behavior feature vector;

Construct a behavior feature map according to the preset course type and the user behavior feature vector;

Carrying out graph convolution processing on the behavior feature map to obtain a behavior degree matrix and a behavior adjacency matrix;

Performing difference processing on the behavior degree matrix and the behavior matrix to obtain a behavior feature matrix.
The storage medium according to claim 15, wherein the step of performing word segmentation processing on the user comment data to obtain the comment text word segment vector comprises:

Segmenting the user comment data through a preset word segmenter to obtain comment text segments;

Encoding processing is performed on the word segment of the comment text to obtain a word segment vector of the comment text.
The storage medium according to claim 15, wherein the comment text word segment vector is input into a pre-trained comparative learning model, so that the comment text word segment vector and the reference in the comparative learning model The word embedding matrix is multiplied by matrix to obtain the steps of comment word embedding vector, including:

The comment text word segment vector is input into the comparative learning model, so that the comment text word segment vector and the reference word embedding matrix are matrix multiplied to obtain a plurality of basic word embedding vectors;

The basic word embedding vector is mapped to obtain the comment word embedding vector.
The storage medium according to claim 15, wherein the step of performing label recognition processing on the standard portrait feature vector through a pre-trained label recognition model to obtain the probability value of each preset portrait label includes:

Reconstructing the standard portrait feature vector according to the preset label dimension to obtain the label feature vector;

The label feature vector is identified by using a preset function to obtain the probability value of each preset portrait label.
The storage medium according to claim 15, wherein the step of obtaining the target portrait label according to the magnitude relationship between the probability value and the preset probability threshold comprises:

Incorporating portrait tags whose probability values are greater than or equal to the preset probability threshold into the same set to obtain a set of candidate portrait tags;

The candidate portrait label set is screened to obtain the target portrait label.