CN113704620B

CN113704620B - User tag updating method, device, equipment and medium based on artificial intelligence

Info

Publication number: CN113704620B
Application number: CN202111013277.4A
Authority: CN
Inventors: 纪曾文
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2023-08-18
Anticipated expiration: 2041-08-31
Also published as: CN113704620A

Abstract

The invention discloses an artificial-intelligence-based user tag updating method, device, equipment and storage medium, which relate to artificial intelligence technology. The method comprises: first acquiring a subscription tag subset, clustering the embedded vector of the user history data together with the embedded vector set corresponding to the other users, and acquiring the target user cluster sub-cluster to which the embedded vector belongs and the target user unique identification code set corresponding to that sub-cluster; then obtaining the hot user portrait tag set corresponding to the target user unique identification code set; and finally combining the subscription tag subset with the hot user portrait tag set to obtain the user's current optimal tag set corresponding to the user unique identification code. The user's tags thus include both fixed subscription tags and dynamic tags that respond to the user's click behavior, which achieves diversity and accuracy in the content recommended based on the user tags.

Description

User tag updating method, device, equipment and medium based on artificial intelligence

Technical Field

The present invention relates to the field of intelligent decision making technologies of artificial intelligence, and in particular, to a method, an apparatus, a device, and a storage medium for updating a user tag based on artificial intelligence.

Background

At present, recommendation systems commonly suffer from the Matthew effect. In an information-flow recommendation scenario, the content recommended to a user becomes narrower and narrower, the user's interest tags become more and more concentrated, the content recommended to the user therefore narrows further, and the cycle repeats.

A common solution is to expand the user's interests to similar ones, or to explore the user's interests with an EE method (EE stands for Exploration and Exploitation, i.e., exploring and exploiting user interests). These methods, however, still take the user's click behavior as feedback and up-weight the content the user clicked, so the up-weighted content is recommended again. They thus fall into another Matthew effect, and the recommended content still becomes narrower and narrower.

Disclosure of Invention

The embodiment of the invention provides a user tag updating method, device, equipment and storage medium based on artificial intelligence, and aims to solve the following problem in the prior art: an information recommendation system takes users' click behavior as feedback to up-weight the content the users clicked, so the up-weighted content is recommended preferentially, the recommended content becomes more and more concentrated on the up-weighted tags, and users cannot receive a more comprehensive range of recommended information.

In a first aspect, an embodiment of the present invention provides an artificial intelligence based user tag updating method, including:

if a subscription tag set distribution instruction is detected, receiving a subscription tag subset uploaded by a user terminal, and acquiring user history data according to a unique user identification code of the user terminal;

invoking a pre-trained deep semantic matching model, and inputting the user history data into the deep semantic matching model for operation to obtain an embedded vector corresponding to the user history data;

acquiring an embedded vector set corresponding to other stored user sets, clustering according to the embedded vector and the embedded vector set to obtain a user cluster, and acquiring a target user cluster sub-cluster to which the embedded vector belongs and a target user unique identification code set corresponding to the target user cluster sub-cluster from the user cluster;

acquiring a user portrait tag set corresponding to each user unique identification code in the target user unique identification code set, and counting the number of each user portrait tag to obtain a user portrait tag counting result;

sorting the user portrait tag statistical results according to the number of the user portrait tags in a descending order to obtain user portrait tag sorting results, and obtaining user portrait tags which do not exceed a preset ranking threshold in the user portrait tag sorting results to form a hot user portrait tag set; and

combining the subscription tag subset with the hot user portrait tag set to obtain a user current optimal tag set corresponding to the user unique identification code.

In a second aspect, an embodiment of the present invention provides an artificial intelligence based user tag updating apparatus, including:

the user history data acquisition unit is used for receiving the subscription tag subset uploaded by the user side and acquiring user history data according to the unique user identification code of the user side if the subscription tag set distribution instruction is detected;

the embedded vector acquisition unit is used for calling a pre-trained deep semantic matching model, inputting the user history data into the deep semantic matching model for operation, and obtaining an embedded vector corresponding to the user history data;

the target identification code set acquisition unit is used for acquiring an embedded vector set corresponding to other stored user sets, clustering according to the embedded vector and the embedded vector set to obtain a user clustering cluster, and acquiring a target user clustering sub-cluster to which the embedded vector belongs and a target user unique identification code set corresponding to the target user clustering sub-cluster from the user clustering cluster;

the tag statistics unit is used for acquiring a user portrait tag set corresponding to each user unique identification code in the target user unique identification code set and counting the number of each user portrait tag to obtain a user portrait tag counting result;

the hot tag set acquisition unit is used for ordering the user portrait tag statistical results according to the descending order of the number of the user portrait tags to obtain user portrait tag ordering results, and acquiring user portrait tags which do not exceed a preset ranking threshold in the user portrait tag ordering results to form a hot user portrait tag set; and

the optimal tag set acquisition unit is used for combining the subscription tag subset with the hot user portrait tag set to obtain the current optimal tag set of the user corresponding to the unique user identification code.

In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor implements the artificial intelligence based user tag updating method according to the first aspect.

In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program when executed by a processor causes the processor to perform the artificial intelligence based user tag updating method according to the first aspect.

The embodiment of the invention provides a user tag updating method, device, equipment and storage medium based on artificial intelligence, which ensures that the user tag has a fixed subscription tag and a dynamic tag fed back along with the clicking behavior of the user, thereby realizing the diversity and accuracy of the recommended content based on the user tag.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an application scenario of an artificial intelligence-based user tag updating method according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of an artificial intelligence based user tag updating method according to an embodiment of the present invention;

FIG. 3 is a schematic block diagram of an artificial intelligence based user tag updating apparatus provided by an embodiment of the present invention;

FIG. 4 is a schematic block diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

Referring to fig. 1 and fig. 2, fig. 1 is a schematic diagram of an application scenario of an artificial intelligence-based user tag updating method according to an embodiment of the present application; fig. 2 is a schematic flow chart of an artificial intelligence-based user tag updating method according to an embodiment of the present application, where the artificial intelligence-based user tag updating method is applied to a server, and the method is executed by application software installed in the server.

As shown in fig. 2, the method includes steps S101 to S106.

S101, if a subscription tag set distribution instruction is detected, receiving a subscription tag subset uploaded by a user terminal, and acquiring user history data according to a user unique identification code of the user terminal.

In this embodiment, to make the technical solution easier to understand, the execution subjects involved are described in detail below. The present application describes the technical solution with the server as the execution subject.

The server stores a plurality of tagged content data items (e.g., video data, text data, voice data, shopping product data, etc.). The server also stores a plurality of user history data tables; each table stores the user history data of the user corresponding to one user unique identification code (user characteristics, such as user portrait tags, can be obtained by analyzing the pieces of user history data in a given table). The server further stores a DSSM model (DSSM stands for Deep Structured Semantic Model, i.e., a deep semantic matching model), which can both produce a low-dimensional semantic vector representation of a sentence (a sentence embedding) and predict the semantic similarity of two sentences; here it is used to convert user history data into vectors. After the user history data is converted into the corresponding embedded vectors, cluster analysis can be performed on the users' embedded vectors, so that content is recommended according to the user tags.

The user terminal is an intelligent terminal (such as a smartphone or a tablet computer) used by a user. After the user operates the user terminal to start a designated application (such as a video app, a music app, a reader app, or an online shopping app) and establish communication with the server, the application collects user behavior data while the user watches videos, listens to music, reads literary works, purchases goods, and so on (for example, the types of goods browsed or purchased and the types of items watched). The application uploads this data to the server, where it is stored in the corresponding user history data table.

When the server detects a subscription tag set distribution instruction, this indicates that the server intends to distribute a fixed subscription tag set to certain users. The server first acquires the locally stored subscription tag set and then sends it to the user terminal. When the instruction is triggered, the target user terminals that are to receive the subscription tag set can be selected, and the set is sent directly to those target user terminals, thereby expanding their tags.

In one embodiment, step S101 includes:

if a subscription tag set distribution instruction is detected, acquiring a stored subscription tag set, and sending the subscription tag set to a user side;

receiving a subscription tag subset sent by the user side according to the subscription tag set, mapping and binding the subscription tag subset to the user unique identification code corresponding to the user side, and storing them locally;

and searching and obtaining corresponding user history data in a local user database according to the unique user identification code.

In this embodiment, when the user side receives the subscription tag set sent by the server, the subscription tags selected for the user can be determined in either of the following ways:

First, the user side directly checks whether the user data stored locally at the user side contains a user tag identical to a subscription tag in the subscription tag set; if so, that subscription tag is selected automatically. The check is repeated for every subscription tag in the set, and the selected subscription tags form the subscription tag subset. For example, the subscription tag set originally sent by the server includes 10 subscription tags; after the comparison, 3 subscription tags are selected to form the subscription tag subset, and the tag fixed weight value of each of the 3 subscription tags is determined according to the user data local to the user side (for example, the historical frequency with which the user clicked content corresponding to those 3 subscription tags).

Second, the subscription tag set is displayed on the user terminal interface for the user to select by clicking. When the user has finished selecting tags on the interface and has set a tag fixed weight value for each selected subscription tag, the selected subscription tags form the subscription tag subset.
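The first, automatic way of forming the subscription tag subset can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `select_subscription_subset`, the `fixed_budget` parameter, and the rule of splitting that budget in proportion to historical click counts are all hypothetical, since the text says only that the fixed weights are determined from local user data such as click frequency.

```python
def select_subscription_subset(offered_tags, local_click_counts, fixed_budget=0.6):
    """Pick offered tags the user already has locally and give each a fixed weight.

    fixed_budget is a hypothetical parameter: the share of the total weight
    (strictly below 1) reserved for fixed subscription tags, so that the
    remainder stays available for dynamic tags.
    """
    if not 0.0 < fixed_budget < 1.0:
        raise ValueError("fixed_budget must lie strictly between 0 and 1")
    # Keep only the offered tags that also appear in the locally stored user data.
    matched = [tag for tag in offered_tags if local_click_counts.get(tag, 0) > 0]
    total_clicks = sum(local_click_counts[tag] for tag in matched)
    # Assumed rule: weight each matched tag by its share of the historical clicks.
    return {tag: fixed_budget * local_click_counts[tag] / total_clicks for tag in matched}


offered = ["finance", "health", "travel", "sports"]      # tag set sent by the server
clicks = {"finance": 30, "health": 10, "music": 5}       # local user data
subscription_subset = select_subscription_subset(offered, clicks)
```

Here only `finance` and `health` are selected, because they are the only offered tags found in the local data, and their fixed weights sum to the assumed budget of 0.6.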

Once the subscription tag subset has been obtained on the user side, it is sent to the server. On receiving the subscription tag subset, the server maps and binds it to the user unique identification code corresponding to the user side and stores it locally. The subscription tag subset comprises at least one subscription tag, each subscription tag corresponds to a tag fixed weight value, and the sum of the tag fixed weight values of the subscription tags is recorded as the tag fixed weight total value. Once the subscription tag subset has been set and its tags have been added as user tags of the corresponding user, the weights of the subscription tags do not change as the user goes on to watch videos, listen to music, read literary works, purchase goods, and so on at the user side. The server can therefore always push content for the user's fixed tags stably, and the situation in which the user clicks content corresponding to a certain tag and the server afterwards recommends only content corresponding to that tag does not arise.

After the user's fixed interest tags are set according to the subscription tag subset, the tag fixed weight total value is smaller than 1; that is, part of the weight space is reserved for the user's dynamic user tags, so that dynamic adjustment of the user's non-fixed tags is also taken into account. In order to dynamically adjust the user tags based on the user history data, the server searches the local user database for the corresponding user history data according to the user unique identification code.

S102, invoking a pre-trained deep semantic matching model, and inputting the user history data into the deep semantic matching model for operation to obtain an embedded vector corresponding to the user history data.

In this embodiment, the server locally stores a pre-trained deep semantic matching model (i.e., DSSM model), which can be generally divided into three layers, namely an input layer, a representation layer, and a matching layer.

User feature training data is input into the input layer. The user features comprise user dense features (such as user gender; their dimensionality is not particularly high and they appear in every sample) and user sparse features (such as user preferences; their feature dimensionality is high, but they appear in each sample with low frequency). The user dense features undergo one-hot encoding, the user sparse features are reduced to a low-dimensional space (64 or 32 dimensions) by embedding, and the results are then concatenated. The advertisement side (which can also be understood as the course side) is processed similarly to the user side.

The concatenated features are then fed to the respective deep learning network models. After passing through two fully connected layers, the user features and the advertisement features are converted into fixed-length vectors, giving a user embedding and an ad embedding of the same dimension. The number of network layers and the layer dimensions within each tower may differ, but the output dimensions must be the same so that operations can be performed at the matching layer.

After the model is trained, the user embeddings and ad embeddings are obtained separately. To recommend an audience for a specific advertisement, the similarity between that advertisement's ad embedding and the user embedding of each audience is computed, and the N closest audience subsets are selected as the audience for the advertisement, which completes the advertisement recommendation task.

In this embodiment, only the input layer and the representation layer of the DSSM model are used: the user dense features and the user sparse features in the user history data are input and processed, and the embedded vector corresponding to the user history data is obtained. Word embedding converts a word into a fixed-length vector representation, which facilitates mathematical processing. Similarly, the embedded vectors of other users may be calculated from the history data of those users.

In one embodiment, step S102 includes:

acquiring user dense features and user sparse features in the user history data;

inputting the user dense features to an input layer of the deep semantic matching model for one-hot encoding to obtain a first coding vector of the user;

inputting the user sparse features to the input layer of the deep semantic matching model for word embedding processing to obtain a second coding vector of the user;

performing feature concatenation on the first coding vector of the user and the second coding vector of the user to obtain a current coding vector;

and inputting the current coding vector to a representation layer of the deep semantic matching model for fully connected processing to obtain an embedded vector corresponding to the user history data.

In this embodiment, obtaining the user's embedded vector does not require the matching layer of the DSSM model. The user history data is input to the input layer of the DSSM model for one-hot encoding, word embedding processing and feature concatenation to obtain the current coding vector, and the current coding vector is then input to the representation layer of the deep semantic matching model for fully connected processing to obtain the embedded vector corresponding to the user history data, so the user's embedded vector can be obtained quickly. The word embedding processing reduces the dimensionality of the sparse high-dimensional feature vectors and maps them to intermediate features.
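The forward pass through the input layer and representation layer described above can be sketched in plain Python. This is a minimal illustration, not the patented model: the weights are random stand-ins for trained parameters, and the average pooling of sparse-feature embeddings, the tanh activation, the layer sizes, and all names (`user_embedding`, `params`, and so on) are assumptions made for the example.

```python
import math
import random


def one_hot(index, size):
    """One-hot encode a dense categorical feature (e.g., gender)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def embed(ids, table):
    """Look up the embedding row of each sparse feature id and average them."""
    dim = len(table[0])
    if not ids:
        return [0.0] * dim
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(dim)]


def dense_layer(x, weights, bias):
    """Fully connected layer with tanh activation; each weight row yields one output."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def user_embedding(gender_index, preference_ids, params):
    first = one_hot(gender_index, params["num_genders"])   # first coding vector
    second = embed(preference_ids, params["pref_table"])   # second coding vector
    current = first + second                                # feature concatenation
    hidden = dense_layer(current, params["w1"], params["b1"])
    return dense_layer(hidden, params["w2"], params["b2"])  # embedded vector


rng = random.Random(0)
params = {
    "num_genders": 2,
    "pref_table": random_matrix(1000, 32, rng),  # 1000 preference ids -> 32 dims
    "w1": random_matrix(64, 2 + 32, rng), "b1": [0.0] * 64,
    "w2": random_matrix(16, 64, rng), "b2": [0.0] * 16,
}
vector = user_embedding(gender_index=1, preference_ids=[3, 17, 256], params=params)
```

The matching layer is deliberately absent: as the text notes, only the user-side embedding is needed for the clustering that follows.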

S103, acquiring an embedded vector set corresponding to other stored user sets, clustering according to the embedded vector and the embedded vector set to obtain a user clustering cluster, and acquiring a target user clustering sub-cluster to which the embedded vector belongs and a target user unique identification code set corresponding to the target user clustering sub-cluster from the user clustering cluster.

In this embodiment, after obtaining the embedded vector set formed by the embedded vectors corresponding to each user, clustering may be performed on the embedded vector set in the server, so as to obtain a plurality of user clusters and target user cluster sub-clusters to which the embedded vectors belong, and accurately obtain the target user unique identification code set corresponding to the target user cluster sub-clusters.

In one embodiment, step S103 includes:

obtaining an embedded vector set corresponding to other stored user sets, and carrying out K-means clustering on the embedded vector and the embedded vector set to obtain user clustering clusters with the same number as the preset clustering group number;

and acquiring the user grouping sub-cluster corresponding to the embedded vector as a target user grouping sub-cluster, and acquiring the user unique identification codes respectively corresponding to the embedded vectors in the target user grouping sub-cluster to form a target user unique identification code set.

In this embodiment, K-means clustering is performed on the embedded vector and the embedded vector set to obtain a user cluster. And the user grouping sub-cluster to which the embedded vector belongs is directly obtained, so that the user corresponding to the user grouping sub-cluster can be determined, and at the moment, the unique user identification codes respectively corresponding to each embedded vector in the target user grouping sub-cluster are obtained to form a target user unique identification code set.

In an embodiment, performing K-means clustering on the embedded vector and the embedded vector set to obtain user clusters equal in number to the preset number of cluster groups includes:

selecting the same number of embedded vectors as the number of preset cluster groups in the embedded vector set, and taking the selected embedded vectors as the initial cluster center of each cluster;

dividing the embedded vector set according to cosine similarity between each embedded vector in the embedded vector set and each initial clustering center to obtain an initial clustering result;

acquiring an adjusted clustering center of each cluster according to the initial clustering result;

and re-dividing the embedded vectors of the embedded vector set according to their cosine similarity to the adjusted cluster centers, until the clustering result has remained unchanged for more than a preset number of iterations, so as to obtain the user clusters.

In this embodiment, the embedded vector set may be clustered by the K-means method; the specific procedure is as follows:

a) Randomly select N2 embedded vectors from the embedded vector set, which initially contains N1 embedded vectors, and take them as the initial cluster centers of N2 clusters (N2 < N1, where N2 is the preset number of clusters, i.e., the preset number of cluster groups).

b) Calculate the cosine similarity between each remaining embedded vector and each of the N2 initial cluster centers, and assign each remaining embedded vector to the cluster whose center has the highest cosine similarity to it, to obtain an initial clustering result. That is, each remaining embedded vector selects the initial cluster center closest to it and is grouped with that center, so the embedded vectors are divided into N2 clusters, each having one initially selected cluster center.

c) Recalculate the cluster centers of the N2 clusters according to the initial clustering result.

d) Re-cluster all N1 embedded vectors according to the new cluster centers.

e) Repeat steps c) and d) until the clustering result no longer changes, to obtain the clustering result corresponding to the preset number of clusters.

Once clustering is complete, the embedded vector set has been quickly grouped into a plurality of clusters, which together form the user clusters.
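Steps a) to e) can be sketched as follows, using cosine similarity as the closeness measure and stopping as soon as the assignment no longer changes. This is a simplified illustration under assumptions: the names `cosine_kmeans`, `max_iter` and `seed` are hypothetical, the centers are recomputed as plain arithmetic means, and the user IDs and vectors are made-up sample data.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_kmeans(vectors, k, max_iter=100, seed=0):
    """Return one cluster label per vector."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)                  # a) random initial centers
    labels = None
    for _ in range(max_iter):
        # b) / d) assign each vector to the center with the highest cosine similarity
        new_labels = [max(range(k), key=lambda c: cosine(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:                      # e) stop when nothing changes
            break
        labels = new_labels
        for c in range(k):                            # c) recompute each center
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = mean(members)
    return labels


user_ids = ["u1", "u2", "u3", "u4", "u5", "u6"]
vectors = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
           [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]]
labels = cosine_kmeans(vectors, k=2)
# Target sub-cluster: every user sharing the label of the current user (here "u1").
target_ids = {uid for uid, label in zip(user_ids, labels) if label == labels[0]}
```

With this sample data the current user `u1` ends up grouped with `u2` and `u3`, so those three IDs form the target user unique identification code set.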

S104, acquiring a user portrait tag set corresponding to each user unique identification code in the target user unique identification code set, and counting the number of each user portrait tag to obtain a user portrait tag counting result.

In this embodiment, all the user portrait tags contained in the user portrait tag sets corresponding to the user unique identification codes in the target user unique identification code set are counted, together with the number of occurrences of each user portrait tag, which completes the statistics.

S105, sorting the user portrait tag statistical results according to the number of the user portrait tags in a descending order to obtain user portrait tag sorting results, and obtaining user portrait tags which do not exceed a preset ranking threshold in the user portrait tag sorting results to form a hot user portrait tag set.

In this embodiment, the user portrait tags whose rank in the user portrait tag sorting result does not exceed the preset ranking threshold are taken as hot user portrait tags and form the hot user portrait tag set. This is equivalent to first dividing users into user groups based on user history data and then continuously adjusting each user's dynamic tags based on the hot tags of the user's group.
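Steps S104 and S105 amount to a frequency count followed by a top-N cut. A minimal sketch, assuming each user's portrait tags are available as a list keyed by user ID; the function name `hot_portrait_tags`, the alphabetical tie-break, and the sample data are illustrative assumptions.

```python
from collections import Counter


def hot_portrait_tags(tag_sets, rank_threshold):
    """Count how many users carry each tag and keep the top-ranked ones."""
    # set(tags) ensures a tag is counted at most once per user.
    counts = Counter(tag for tags in tag_sets for tag in set(tags))
    # Descending by count; ties broken alphabetically (an assumed rule).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:rank_threshold]


cluster_tags = {
    "u1": ["insurance", "fitness", "travel"],
    "u2": ["insurance", "fitness"],
    "u3": ["insurance", "cooking"],
}
hot = hot_portrait_tags(cluster_tags.values(), rank_threshold=2)
# hot == [("insurance", 3), ("fitness", 2)]
```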

S106, combining the subscription tag subset with the hot user portrait tag set to obtain a user current optimal tag set corresponding to the user unique identification code.

In this embodiment, after the fixed user tags corresponding to the subscription tag subset and the dynamic tags corresponding to the hot user portrait tag set are obtained, the subscription tag subset and the hot user portrait tag set may be combined to obtain the user's current optimal tag set corresponding to the user unique identification code.

The sum of the tag weight values corresponding to the hot user portrait tags in the hot user portrait tag set is recorded as the tag variation weight total value. The sum of the tag variation weight total value and the tag fixed weight total value is 1, so that the server can provide the user with the content corresponding to each tag according to the weight of that tag in the user's current optimal tag set.
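The weight constraint can be made concrete with a small sketch of the merge. The text fixes only the totals (fixed total below 1, fixed total plus variation total equal to 1); how the variation total is divided among the hot tags is not specified, so splitting it in proportion to tag counts, and leaving a hot tag that is already subscribed at its fixed weight, are assumptions of this example, as is the name `merge_tag_sets`.

```python
def merge_tag_sets(subscription_subset, hot_tag_counts):
    """Combine fixed subscription weights with dynamic hot-tag weights summing to 1."""
    fixed_total = sum(subscription_subset.values())
    if not 0.0 < fixed_total < 1.0:
        raise ValueError("the tag fixed weight total value must be between 0 and 1")
    # Assumption: a hot tag that is already subscribed keeps its fixed weight only.
    dynamic = {tag: count for tag, count in hot_tag_counts.items()
               if tag not in subscription_subset}
    count_total = sum(dynamic.values())
    if count_total == 0:
        raise ValueError("no dynamic tags available to carry the remaining weight")
    merged = dict(subscription_subset)
    for tag, count in dynamic.items():
        # Assumption: share the remaining weight in proportion to tag counts.
        merged[tag] = (1.0 - fixed_total) * count / count_total
    return merged


fixed = {"finance": 0.45, "health": 0.15}     # tag fixed weight total value: 0.6
hot_counts = {"insurance": 3, "fitness": 2}   # hot user portrait tags with counts
optimal = merge_tag_sets(fixed, hot_counts)
```

In this example the two hot tags share the remaining 0.4 of the weight, while `finance` and `health` keep their fixed values regardless of later clicks.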

In an embodiment, step S106 further includes:

if a hot user portrait tag updating instruction is detected, acquiring the user unique identification code, searching a local user database for the corresponding current user data, updating the user history data with the current user data, and returning to execute the step of invoking the pre-trained deep semantic matching model and inputting the user history data into the deep semantic matching model for operation to obtain an embedded vector corresponding to the user history data.

In this embodiment, in order to dynamically adjust the user's hot user portrait tag set, the server may further be configured to periodically trigger generation of a hot user portrait tag updating instruction (for example, an instruction is generated automatically at 1:00 a.m. on the 1st day of each calendar month). The server then acquires the current user data the user accumulated during the previous calendar month, updates the user history data with it, and returns to execute the step of invoking the pre-trained deep semantic matching model and inputting the user history data into the deep semantic matching model for operation to obtain an embedded vector corresponding to the user history data. In this way the hot user portrait tags are adjusted dynamically at regular intervals, and a Matthew effect on the tags is avoided.

The embodiment of the application can acquire and process the related data based on the artificial intelligence technology. Among these, artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.

With this method, a user's tags include both fixed subscription tags and dynamic tags that are updated in response to the user's click behavior, which improves both the diversity and the accuracy of the content recommended on the basis of the user tags.

The embodiment of the invention also provides an artificial intelligence-based user tag updating device which is used for executing any embodiment of the artificial intelligence-based user tag updating method. In particular, referring to fig. 3, fig. 3 is a schematic block diagram of an artificial intelligence based user tag updating apparatus according to an embodiment of the present invention. The artificial intelligence based user tag updating apparatus 100 may be configured in a server.

The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), big data, and artificial intelligence platforms.

As shown in fig. 3, the artificial intelligence based user tag updating apparatus 100 includes: a user history data acquisition unit 101, an embedded vector acquisition unit 102, a target identification code set acquisition unit 103, a tag statistics unit 104, a popular tag set acquisition unit 105, and an optimal tag set acquisition unit 106.

The user history data obtaining unit 101 is configured to, if a subscription tag set distribution instruction is detected, receive a subscription tag subset uploaded by a user terminal, and obtain user history data according to a unique user identification code of the user terminal.

In this embodiment, when the server detects a subscription tag set distribution instruction, this indicates that the server intends to distribute a fixed subscription tag set to certain users. The server first acquires the locally stored subscription tag set and then sends it to the user side. When the subscription tag set distribution instruction is triggered, a target user side that is to receive the subscription tag set can be selected, and the subscription tag set is then sent directly to that target user side so as to expand its tags.

In one embodiment, the user history data acquiring unit 101 includes:

the subscription tag set sending unit is used for obtaining a stored subscription tag set and sending the subscription tag set to a user side if a subscription tag set distribution instruction is detected;

The subscription tag subset storage unit is used for receiving the subscription tag subset that the user terminal selects from the subscription tag set and sends back, mapping and binding the subscription tag subset to the user unique identification code corresponding to the user terminal, and storing the bound subscription tag subset and user unique identification code locally;

and the historical data retrieval unit is used for retrieving and acquiring corresponding user historical data in a local user database according to the unique user identification code.

The embedded vector obtaining unit 102 is configured to invoke a pre-trained deep semantic matching model, input the user history data to the deep semantic matching model for operation, and obtain an embedded vector corresponding to the user history data.

User feature training data is input at the input layer. The user features comprise user dense features and user sparse features: the user dense features undergo one-hot encoding, the user sparse features are reduced by embedding to a low-dimensional space (64 or 32 dimensions), and a feature splicing operation is then performed. The advertisement side (which may also be understood as the course side) is processed in the same way as the user side.

In this embodiment, only the input layer and the representation layer of the DSSM model are used. The user dense features and the user sparse features in the user history data are input and processed separately, so that the embedded vector corresponding to the user history data can be obtained. Similarly, the embedded vectors of other users may be calculated from the historical data of those users.

In an embodiment, the embedded vector acquisition unit 102 includes:

A user characteristic obtaining unit, configured to obtain a user dense characteristic and a user sparse characteristic in the user history data;

the first encoding unit is used for inputting the user dense features into the input layer of the deep semantic matching model for one-hot encoding to obtain a first encoding vector of the user;

the second encoding unit is used for inputting the user sparse features into the input layer of the deep semantic matching model for word embedding processing to obtain a second encoding vector of the user;

and the full connection unit is used for inputting the current encoding vector into the representation layer of the deep semantic matching model for full connection processing to obtain the embedded vector corresponding to the user history data.

In this embodiment, obtaining the embedded vector of the user does not require the matching layer of the DSSM model. The user history data is input to the input layer of the DSSM model for one-hot encoding and feature splicing to obtain the current encoding vector, and the current encoding vector is then input to the representation layer of the deep semantic matching model for full connection processing to obtain the embedded vector corresponding to the user history data. The embedded vector corresponding to the user can therefore be obtained quickly.
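
The input-layer and representation-layer processing described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the trained model of the embodiment: the function names, the single categorical dense feature, the dictionary used as an embedding table, the single fully connected layer, and the tanh activation are all illustrative choices that the description does not specify.

```python
import math


def one_hot(index, size):
    """One-hot encode a category index into a vector of length `size`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def dense_layer(vector, weights, bias):
    """One fully connected layer; tanh activation is an assumption."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]


def user_embedding(dense_index, dense_size, sparse_id, embedding_table, weights, bias):
    first = one_hot(dense_index, dense_size)    # first encoding vector
    second = embedding_table[sparse_id]         # second encoding vector (embedding lookup)
    current = first + second                    # feature splicing (list concatenation)
    return dense_layer(current, weights, bias)  # representation layer output


# Usage with hand-picked (untrained) parameters: 3 categories, 2-d embedding, 2-d output.
table = {"course_42": [0.5, -0.5]}
layer_weights = [[1, 0, 0, 0, 0],
                 [0, 1, 0, 1, 0]]
embedding = user_embedding(1, 3, "course_42", table, layer_weights, [0.0, 0.0])
```

In a trained model the embedding table and layer weights would come from the pre-trained deep semantic matching model; the matching layer is never called, which is why the lookup is fast.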

The target identification code set obtaining unit 103 is configured to obtain an embedded vector set corresponding to another stored user set, cluster according to the embedded vector and the embedded vector set to obtain a user cluster, obtain a target user cluster sub-cluster to which the embedded vector belongs from the user cluster, and obtain a target user unique identification code set corresponding to the target user cluster sub-cluster.

In an embodiment, the target id set obtaining unit 103 includes:

the K-means clustering unit is used for acquiring the embedded vector set corresponding to the stored set of other users, and carrying out K-means clustering on the embedded vector and the embedded vector set to obtain user clusters equal in number to the preset number of clusters;

the target user unique identification code set acquisition unit is used for acquiring the user clustering sub-cluster to which the embedded vector belongs as the target user clustering sub-cluster, and acquiring the user unique identification codes respectively corresponding to the embedded vectors in the target user clustering sub-cluster to form the target user unique identification code set.

In one embodiment, the K-means clustering unit comprises:

the initial cluster center acquisition unit is used for selecting the embedded vectors with the same number as the preset cluster group number in the embedded vector set, and taking the selected embedded vectors as the initial cluster center of each cluster;

the initial clustering unit is used for dividing the embedded vector sets according to cosine similarity between each embedded vector in the embedded vector sets and each initial clustering center to obtain initial clustering results;

the cluster center adjusting unit is used for acquiring an adjusted cluster center of each cluster according to the initial cluster result;

and the final clustering result acquisition unit is used for re-dividing the embedded vectors of the embedded vector set according to their cosine similarity with the adjusted cluster centers, and repeating the adjustment and division until the clustering result has remained the same for more than a preset number of times, so as to obtain the user clusters.
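
The four sub-units above, together with the lookup of the target user clustering sub-cluster, can be sketched in plain Python. The names below, the choice of the first k vectors as initial centers, the use of the arithmetic mean as the adjusted center, and the `max_iter` safety cap are assumptions for illustration; the description fixes only the cosine-similarity assignment and the stability-based stopping rule.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Component-wise mean, used here as the adjusted cluster center."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_kmeans(vectors, k, stable_rounds=3, max_iter=100):
    """Return one cluster label per vector."""
    centers = [list(v) for v in vectors[:k]]  # assumption: first k vectors as initial centers
    labels, unchanged = None, 0
    for _ in range(max_iter):
        # Assign every vector to the center with the highest cosine similarity.
        new_labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        unchanged = unchanged + 1 if new_labels == labels else 0
        labels = new_labels
        if unchanged > stable_rounds:  # same result kept more than the preset number of times
            break
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = mean_vector(members)
    return labels


def target_cluster_ids(user_ids, labels, target_index):
    """User identification codes in the sub-cluster containing the target user."""
    return [uid for uid, lab in zip(user_ids, labels) if lab == labels[target_index]]


vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
labels = cosine_kmeans(vectors, k=2)
ids = target_cluster_ids(["u1", "u2", "u3", "u4"], labels, target_index=0)
```

Here the target user's embedded vector is simply placed at position `target_index` of the combined list, so the returned identification code set includes the target user together with the users whose vectors fall in the same sub-cluster.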

The tag statistics unit 104 is configured to acquire the user portrait tag set corresponding to each user unique identification code in the target user unique identification code set, and to count the number of occurrences of each user portrait tag to obtain a user portrait tag statistical result.

The hot tag set obtaining unit 105 is configured to sort the user portrait tag statistical result in descending order of the number of occurrences of each user portrait tag to obtain a user portrait tag sorting result, and to obtain from the user portrait tag sorting result the user portrait tags whose rank does not exceed a preset ranking threshold, so as to form the hot user portrait tag set.

The optimal tag set obtaining unit 106 is configured to combine the subscription tag subset with the hot user portrait tag set to obtain the current optimal tag set of the user corresponding to the user unique identification code.
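
The behavior of units 104 to 106 can be sketched as follows. The function names, counting each tag at most once per user, breaking ties alphabetically, and removing duplicates when combining are assumptions for illustration; the description specifies only counting, descending sort, the ranking threshold, and the combination step.

```python
from collections import Counter


def hot_tag_set(tag_sets, rank_threshold):
    """Count tags across the target users and keep the top-ranked ones."""
    # Assumption: a tag counts once per user, even if listed twice.
    counts = Counter(tag for tags in tag_sets for tag in set(tags))
    # Descending by count; ties broken alphabetically (assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:rank_threshold]]


def current_optimal_tags(subscription_subset, hot_tags):
    """Combine fixed subscription tags with hot tags, dropping duplicates."""
    merged = list(subscription_subset)
    merged += [tag for tag in hot_tags if tag not in merged]
    return merged


cluster_tag_sets = [
    ["health", "travel"],
    ["health", "finance"],
    ["health", "travel", "pets"],
]
hot = hot_tag_set(cluster_tag_sets, rank_threshold=2)
optimal = current_optimal_tags(["finance", "health"], hot)
```

With this sample data "health" appears for three users and "travel" for two, so those two form the hot set, and the combined result keeps the subscription tags first.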

In one embodiment, the artificial intelligence based user tag updating apparatus 100 further comprises:

The tag updating unit is configured to, if a hot user portrait tag update instruction is detected, acquire the user unique identification code, retrieve the corresponding current user data from a local user database, take the current user data as the updated user history data, and return to the step of invoking the pre-trained deep semantic matching model and inputting the user history data into the deep semantic matching model for operation to obtain an embedded vector corresponding to the user history data.

With this device, a user's tags include both fixed subscription tags and dynamic tags that are updated in response to the user's click behavior, which improves both the diversity and the accuracy of the content recommended on the basis of the user tags.

The artificial intelligence based user tag updating apparatus described above may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 4.

Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, and the server may be a stand-alone server or a server cluster formed by a plurality of servers.

With reference to FIG. 4, the computer device 500 includes a processor 502, a memory, and a network interface 505, connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.

The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform an artificial intelligence based user tag update method.

The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.

The internal memory 504 provides an environment for the execution of a computer program 5032 in the storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform an artificial intelligence based user tag update method.

The network interface 505 is used for network communication, such as providing for transmission of data information, etc. It will be appreciated by those skilled in the art that the architecture shown in fig. 4 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting of the computer device 500 to which the present inventive arrangements may be implemented, and that a particular computer device 500 may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

The processor 502 is configured to execute a computer program 5032 stored in a memory to implement the artificial intelligence-based user tag updating method disclosed in the embodiment of the present invention.

Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 4 is not limiting of the specific construction of the computer device, and in other embodiments, the computer device may include more or less components than those shown, or certain components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor, and in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 4, and will not be described again.

It should be appreciated that in embodiments of the present invention, the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a nonvolatile computer readable storage medium or a volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program when executed by a processor implements the artificial intelligence based user tag updating method disclosed in the embodiments of the present invention.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein. Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the units is merely a logical function division, there may be another division manner in actual implementation, or units having the same function may be integrated into one unit, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or other media capable of storing program code.

While the invention has been described with reference to certain preferred embodiments, those skilled in the art can readily conceive of various equivalent modifications or substitutions without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. An artificial intelligence based user tag updating method, comprising:

2. The method for updating user tag according to claim 1, wherein if the subscription tag set distribution command is detected, receiving the subscription tag subset uploaded by the user terminal, and obtaining the user history data according to the unique user identification code of the user terminal, comprises:

3. The method for updating user labels based on artificial intelligence according to claim 1, wherein the obtaining the set of embedded vectors corresponding to the stored set of other users, clustering according to the embedded vectors and the set of embedded vectors to obtain a user cluster, obtaining a target user cluster sub-cluster to which the embedded vectors belong from the user cluster, and a target user unique identification code set corresponding to the target user cluster sub-cluster, includes:

4. The method for updating user labels based on artificial intelligence according to claim 1, wherein the invoking the pre-trained deep semantic matching model, inputting the user history data into the deep semantic matching model for operation, obtaining an embedded vector corresponding to the user history data, comprises:

5. The artificial intelligence based user tag updating method of claim 3, wherein the K-means clustering the embedded vector and the set of embedded vectors to obtain user clusters having the same number as a preset number of clusters, comprises:

6. The method for updating user labels based on artificial intelligence according to claim 1, wherein after combining the subscription label subset with the popular user portrait label set to obtain a user current optimal label set corresponding to the user unique identification code, further comprises:

7. The method for updating user labels based on artificial intelligence according to claim 1, wherein the sum of the label weight values corresponding to the hot user portrait labels in the hot user portrait label set is denoted as a label variation weight total value, and the sum of the label variation weight total value and the label fixed weight total value is 1.

8. An artificial intelligence based user tag updating apparatus, comprising:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the artificial intelligence based user tag updating method of any of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the artificial intelligence based user tag updating method of any of claims 1 to 7.