CN113674009A

CN113674009A - Method and device for determining target user

Info

Publication number: CN113674009A
Application number: CN202010414541.4A
Authority: CN
Inventors: 鞠明兴
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-05-15
Filing date: 2020-05-15
Publication date: 2021-11-19

Abstract

The invention discloses a method and a device for determining a target user, and relates to the field of computer technology. One embodiment of the method comprises: determining, among predetermined item groups, the target group to which each item in a preset item set belongs; for each target group, acquiring a plurality of primary users of the target group, inputting the original feature data of each primary user for each item in the target group into a pre-trained feature generation model to obtain the optimized feature data of the primary user in the target group, and inputting the optimized feature data into a pre-trained user classification model to obtain a classification result of the primary user in the target group; and determining, from the classification results of each primary user in the target groups, whether the primary user is a target user for the item set. This embodiment makes it possible to determine target users accurately on the basis of predetermined item groups.

Description

Method and device for determining target user

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a target user.

Background

With the rapid development of internet technology and internet-based businesses, servers need to reach target users more precisely, for example, high-potential users with a strong tendency to purchase. At present, a server generally builds a target-user discrimination model on fixed item-category or brand dimensions. Such a model is usually a binary classification machine learning model. Its statistical features are constructed mainly from users' behavior on categories or brands and from user attribute information, and whether the user makes a purchase is taken as the target variable.

In the course of making the invention, the inventor found that the prior art has at least the following problems. First, fixed item categories or brands break up the associations between different categories or brands. As a result, the prior art cannot accurately characterize a user's behavior trajectory across various items, and so cannot use user behavior to describe the user's purchase expectation for a specific item pool. Second, prior-art models are tied to fixed item categories or brands. Their scope of application is therefore rigid and does not follow the dynamically changing, behavior-based associations between items. Third, the prior art requires substantial manpower and material resources to perform ETL (extract, transform, load) on the raw data, which hinders fast iteration of the business.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for determining a target user. They can identify target users accurately on the basis of predetermined item groups and construct features automatically with machine learning, thereby avoiding the adverse effects of ETL processing on the actual business.

To achieve the above object, according to one aspect of the present invention, there is provided a method of determining a target user.

The method for determining a target user according to an embodiment of the invention comprises the following steps.

First, determine, among predetermined item groups, the target group to which each item in a preset item set belongs. The item groups are determined according to the number of occurrences and/or the order of occurrence of a plurality of items in the same user behavior sequence.

Second, for each target group: acquire a plurality of primary (initially selected) users of the target group; input the original feature data of each primary user for each item in the target group into a pre-trained feature generation model corresponding to the target group, to obtain the optimized feature data of the primary user in the target group; and input the optimized feature data into a pre-trained user classification model corresponding to the target group, to obtain a classification result of the primary user in the target group.

Third, determine, from the classification results of each primary user in each target group, whether the primary user is a target user corresponding to the item set.

Optionally, the item groups are determined as follows. An item graph representing the associations among a plurality of items is generated according to the number of occurrences and/or the order of occurrence of the items in the same user behavior sequence. Graph embedding is performed on the item graph to obtain an embedding vector for each item. The items are then clustered by their embedding vectors to obtain a plurality of item groups.

Optionally, after the item graph is generated, the method further comprises the following steps. The attribute data and title data of each isolated item in the item graph are input into a pre-trained item similarity model to obtain at least one similar item of the isolated item. An association between the isolated item and each such similar item is then added to the item graph.

Optionally, determining the target group to which each item in the preset item set belongs among the predetermined item groups comprises checking, for any item in the item set, whether a predetermined item group contains the identifier of the item.

If so, the item group containing the identifier is determined as the target group of the item.

Otherwise, the attribute data and title data of the item are input into the item similarity model to obtain similar items of the item. The target group of the item is then determined from the item groups to which those similar items belong. Once the target group is determined, the item is added to it.

Optionally, the feature generation model is a deep neural network model comprising an input layer, a first hidden layer, a second hidden layer and an output layer connected in sequence. The first hidden layer comprises a first perceptron layer, a first normalization layer and a first activation layer. The second hidden layer comprises a second perceptron layer, a second normalization layer and a second activation layer. The input of the second perceptron layer is the output of the first activation layer concatenated with the original feature data. The input of the output layer is the output of the second activation layer concatenated with the original feature data.

Optionally, the original feature data of any primary user for any item in the corresponding target group comprises the attribute data of the primary user, the attribute data of the item, and the behavior record data of the primary user for the item.

Obtaining the optimized feature data of the primary user in the target group comprises the following steps:
1. Input the original feature data of the primary user for each item in the corresponding target group into the feature generation model, and acquire the multiple rows of multi-dimensional data of the primary user output by the second activation layer.
2. Divide each initial dimension of the multi-dimensional data into at least one target dimension according to numerical intervals.
3. Convert the multi-dimensional data into a single piece of optimized feature data of the primary user over the target dimensions. The value of the optimized feature data in any target dimension is the number of rows whose value in the corresponding initial dimension falls within the numerical interval of that target dimension.
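The interval-counting conversion described above can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes that the rows output by the second activation layer are plain lists of floats, and that the intervals are right-closed and given by their sorted right edges. The function name `bin_count_features` and the sample values are illustrative only.

```python
from bisect import bisect_left
from typing import List, Sequence


def bin_count_features(rows: Sequence[Sequence[float]],
                       upper_bounds: Sequence[Sequence[float]]) -> List[int]:
    """Convert one user's activation-layer rows into a single count vector.

    upper_bounds[d] lists the sorted right edges of the intervals for initial
    dimension d; e.g. [0.3, 0.6, 1.0] stands for [0, 0.3], (0.3, 0.6], (0.6, 1].
    Each interval becomes one target dimension, in order.
    """
    vector: List[int] = []
    for d, bounds in enumerate(upper_bounds):
        counts = [0] * len(bounds)
        for row in rows:
            # First interval whose right edge is >= the value (right-closed bins);
            # values below 0 would also land in the first interval.
            b = bisect_left(bounds, row[d])
            if b < len(bounds):  # values above the last edge are ignored
                counts[b] += 1
        vector.extend(counts)
    return vector


# Illustrative rows with two initial dimensions (feature0, feature1).
rows = [[0.0091, 0.2231], [0.3256, 0.3365], [0.5298, 0.4987],
        [0.7789, 0.5986], [0.7569, 0.8899], [0.9685, 0.8522]]
print(bin_count_features(rows, [[0.3, 0.6, 1.0], [0.4, 0.7, 1.0]]))  # [1, 2, 3, 2, 2, 2]
```

However many rows a user has, the output length depends only on the number of intervals. This gives every user a fixed-size input for the downstream classifier.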

Optionally, determining from the classification results whether the primary user is a target user corresponding to the item set comprises two steps. First, a weighted judgment result is computed for each primary user from the primary user's classification result in each target group and a weight preset for each target group. Second, when the weighted judgment result exceeds a preset threshold, the primary user is determined as a target user corresponding to the item set.
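A minimal sketch of the weighted judgment follows. It assumes that classification results are numbers (for example 0/1 labels or probabilities). It also assumes that a group in which the user is not a primary user contributes nothing to the weighted sum. The function name, user IDs, group IDs, weights and threshold are all hypothetical.

```python
from typing import Dict, List


def weighted_target_users(results: Dict[str, Dict[str, float]],
                          group_weights: Dict[str, float],
                          threshold: float) -> List[str]:
    """Return the users whose weighted judgment result exceeds the threshold.

    results[user][group] is the user's classification result in that target group.
    Groups where the user is not a primary user are simply absent (an assumption).
    """
    targets = []
    for user, by_group in results.items():
        score = sum(group_weights[g] * r for g, r in by_group.items())
        if score > threshold:
            targets.append(user)
    return targets


results = {"u1": {"g1": 1, "g2": 1}, "u2": {"g2": 1}, "u3": {"g1": 1, "g2": 0}}
print(weighted_target_users(results, {"g1": 0.6, "g2": 0.4}, threshold=0.5))  # ['u1', 'u3']
```

With these weights, a positive result in the more heavily weighted group g1 alone is enough to clear the threshold. A positive result in g2 alone is not.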

Optionally, the user classification model is an XGBoost model. The feature generation model and the user classification model corresponding to the same item group are trained as follows:
1. Connect the feature generation model and the user classification model into a combined model. The input of the user classification model is the optimized feature data converted from the output of the second activation layer of the feature generation model.
2. Construct a training sample set for the combined model. Each training sample comprises original feature data as its feature part and a user classification result as its label part.
3. Train the feature generation model and the user classification model with the training sample set.

To achieve the above object, according to another aspect of the present invention, there is provided an apparatus for determining a target user.

The apparatus for determining a target user according to an embodiment of the present invention may include the following units.

A group dividing unit determines, among predetermined item groups, the target group to which each item in a preset item set belongs. The item groups are determined according to the number of occurrences and/or the order of occurrence of a plurality of items in the same user behavior sequence.

A feature generation and classification unit operates on each target group. It acquires a plurality of primary users of the target group and inputs the original feature data of each primary user for each item in the target group into a pre-trained feature generation model corresponding to the target group, obtaining the optimized feature data of the primary user in the target group. It then inputs the optimized feature data into a pre-trained user classification model corresponding to the target group to obtain a classification result of the primary user in the target group.

A target user judging unit determines, from the classification results of each primary user in each target group, whether the primary user is a target user corresponding to the item set.

Optionally, the apparatus may further comprise a group pre-determining unit. This unit generates an item graph representing the associations among a plurality of items, according to the number of occurrences and/or the order of occurrence of the items in the same user behavior sequence. It then performs graph embedding on the item graph to obtain an embedding vector for each item, and clusters the items by their embedding vectors to obtain a plurality of item groups.

To achieve the above object, according to still another aspect of the present invention, there is provided an electronic apparatus.

An electronic device of the present invention includes: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of determining a target user provided by the present invention.

To achieve the above object, according to still another aspect of the present invention, there is provided a computer-readable storage medium.

A computer-readable storage medium of the present invention has stored thereon a computer program which, when executed by a processor, implements the method of determining a target user provided by the present invention.

The embodiments of the invention have the following advantages and beneficial effects.

An item graph is built in advance, offline, from the number of occurrences and/or the order of occurrence of items in user behavior sequences. An embedding vector is derived for each item from the graph by graph embedding, and the items are clustered by these vectors into a plurality of item groups. When target users need to be determined for an item set, the item group of each item in the set is found either by lookup or via similar items. Target users are then judged on the basis of those item groups.

This arrangement accurately reflects the associations among items of different categories or brands. It overcomes the inherent limitation of judging target users by category or brand and improves the accuracy of target-user identification. The item groups can also be updated dynamically from user behavior data, so they stay consistent with the changing, behavior-based associations among items. In addition, the invention uses deep learning to generate optimized feature data automatically from the original feature data without ETL processing. This improves efficiency and saves considerable manpower and material resources.

Further effects of the above optional features will be described below in connection with the embodiments.

Drawings

The drawings are provided for a better understanding of the invention and are not to be construed as unduly limiting it. In the drawings:

FIG. 1 is a schematic diagram of the main steps of a method for determining a target user in an embodiment of the present invention;

FIG. 2 is a schematic diagram of an item graph according to an embodiment of the invention;

FIG. 3 is a schematic diagram of a feature generation model composition of an embodiment of the present invention;

FIG. 4 is a flowchart illustrating a method for determining a target user according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the components of an apparatus for identifying a target user in an embodiment of the present invention;

FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 7 is a schematic structural diagram of an electronic device for implementing the method for determining a target user in an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the embodiments of the present invention, and the technical features in them, may be combined with each other provided there is no conflict.

Fig. 1 is a schematic diagram of the main steps of a method for determining a target user according to an embodiment of the present invention.

As shown in FIG. 1, the method for determining a target user according to the embodiment of the present invention may specifically include the following steps:

Step S101: determine, among the predetermined item groups, the target group to which each item in the preset item set belongs.

In this step, the item set is a set of items specified by business personnel, and the present invention identifies target users for that item set. For example, a target user may be a user with a strong purchase preference for the items in the item set. The specific meaning of a target user may vary with the actual scenario, and the present invention is not limited in this respect. In general, the items in an item set may span multiple categories or brands.

In the embodiment of the present invention, the item groups may be determined in advance in an offline state, as follows.

First, a certain number of items are selected (e.g., all items on the server platform). The users who performed predetermined actions on those items within a recent period (e.g., the last 3 months) are also selected. Examples of such actions are searching, browsing, clicking, following, adding to cart and/or placing orders.

Then an item graph is generated according to the number of occurrences and/or the order of occurrence of the items in the same user behavior sequence. The item graph is a knowledge graph representing the associations among the items.

A user behavior sequence may consist of the items involved, in order, in one complete user action (e.g., submitting an order). It may also consist of the items involved in several consecutive user actions of the same type (e.g., consecutive searches, views, clicks, follows or add-to-cart actions).

For example, consider the following five user behavior sequences:
- Sequence 1 consists of items D, A and B, searched in turn by user a.
- Sequence 2 consists of items B and E, browsed in turn by user b.
- Sequence 3 consists of items D, E and F, clicked in turn by user b.
- Sequence 4 consists of items E, C and B, followed in turn by user c.
- Sequence 5 consists of items B and A, involved in that order in an order submitted by user c.

The item graph can be built from the number of occurrences and/or the order of occurrence of items A, B, C, D, E and F in sequences 1 to 5. The number of occurrences is the number of times several items appear together in the same user behavior sequence. In the embodiment of the present invention, it may be the number of times two items appear adjacently in the same sequence. For example, in sequence 1, D and A occur together once, A and B occur together once, and D and B occur together zero times.

For example, to build the item graph, nodes representing the items are first created (drawn as circles). Then, following the order in which items appear in each user behavior sequence, a directed edge (arrow) is added between each pair of adjacent items. The arrow points in the order of appearance, and the number of occurrences of the two items may be used as the weight of that edge.

FIG. 2 shows the item graph built from user behavior sequences 1 to 5 in the embodiment of the present invention. It contains nodes representing the items and directed edges between them. Every directed edge in FIG. 2 has weight 1, meaning that each ordered pair of adjacent items occurs once across the five sequences.

The approach above uses the number of occurrences and the order of occurrence together. If only the number of occurrences is used, all edges in the item graph are undirected. If only the order of occurrence is used, all edges are directed and unweighted.
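The edge-counting rule can be sketched in a few lines. This is an illustrative sketch only. Edges are stored in a `collections.Counter` keyed by `(earlier, later)` pairs when direction is used, or by unordered `frozenset` pairs when only the occurrence count is used. Skipping consecutive repeats of the same item is an assumption not stated in the text, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def build_item_graph(sequences: Iterable[Sequence[str]], directed: bool = True) -> Counter:
    """Edge weight = number of times two items appear next to each other.

    directed=True keys edges by (earlier, later), using both count and order;
    directed=False keys them by an unordered frozenset, using the count only.
    """
    edges: Counter = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            if prev == nxt:  # skip repeated actions on the same item (assumption)
                continue
            key = (prev, nxt) if directed else frozenset((prev, nxt))
            edges[key] += 1
    return edges


# User behavior sequences 1 to 5 from the example above.
sequences = [["D", "A", "B"], ["B", "E"], ["D", "E", "F"], ["E", "C", "B"], ["B", "A"]]
graph = build_item_graph(sequences)
print(len(graph), set(graph.values()))  # 8 {1}
```

The directed graph has eight edges of weight 1, matching FIG. 2 as described. In the undirected variant, A→B and B→A merge into a single edge of weight 2. The order-only variant corresponds to keeping just the keys, `set(graph)`.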

In practice, an item graph built in this way often contains isolated items, which weakens the connectivity of the graph. An isolated item is an item whose node is isolated; the definition can be chosen as needed, e.g., a node with fewer than two incident edges.

Preferably, the connectivity of the item graph can be improved by adding at least one edge to each isolated item, as follows. First, the attribute data (e.g., the price, category and weight of the item) and title data (e.g., the words obtained by segmenting the item title) of the isolated item are input into a pre-trained item similarity model, which returns at least one similar item. Then an edge, generally undirected, is added between the isolated item and each similar item to represent their association. The similarity between the two items may be used as the weight of the edge.

Preferably, the item similarity model may be trained as follows. First, a training sample set is constructed. The feature part of each training sample is the attribute data and title data of an item. The label part is the identifiers (e.g., stock keeping unit, SKU) of several similar items and the similarity of each. The item similarity model is then trained on this sample set with a suitable machine learning algorithm.

When the trained model is used to find similar items of an isolated item, the similar items it outputs can be sorted by similarity in descending order. The top preset number of them are then selected to augment the item graph.
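The isolated-item repair can be sketched as follows. The sketch assumes that edges live in a dict keyed by tuples (directed) or frozensets (undirected). The trained item similarity model is replaced by a hypothetical function, `fake_similar`, that returns `(item, similarity)` pairs. The "fewer than two incident edges" threshold and the top-2 cut-off are illustrative parameters. The degree count scans all edges, which is acceptable for a sketch but not for a large graph.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def degree(edges: Dict[Hashable, float], node: str) -> int:
    # Count incident edges; works for directed (tuple) and undirected (frozenset) keys.
    return sum(1 for key in edges if node in key)


def link_isolated_items(edges: Dict[Hashable, float], items: List[str],
                        similar_items: Callable[[str], List[Tuple[str, float]]],
                        top_n: int = 2, min_degree: int = 2) -> List[str]:
    """Add undirected similarity edges to items with fewer than min_degree edges."""
    isolated = [it for it in items if degree(edges, it) < min_degree]
    for item in isolated:
        ranked = sorted((p for p in similar_items(item) if p[0] != item),
                        key=lambda p: p[1], reverse=True)
        for other, sim in ranked[:top_n]:
            edges[frozenset((item, other))] = sim  # undirected edge weighted by similarity
    return isolated


# Hypothetical stand-in for the trained item similarity model.
def fake_similar(item: str) -> List[Tuple[str, float]]:
    return [("A", 0.9), ("B", 0.2), ("C", 0.7)]


edges = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 1}
print(link_isolated_items(edges, ["A", "B", "C", "G"], fake_similar))  # ['G']
```

Item G has no edges, so it receives edges to its two most similar items, A (0.9) and C (0.7). Items A, B and C already have two edges each and are left unchanged.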

After the item graph is built, graph embedding can be performed on it to obtain an embedding vector for each item in the graph. The items are then clustered by their embedding vectors with an existing clustering algorithm such as K-Means, which finally yields a plurality of item groups. Obtaining embedding vectors by graph embedding is a known technique, so its implementation details are not described here.
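The text leaves the graph embedding method open; random-walk methods such as DeepWalk or node2vec are common choices. The sketch below substitutes a simpler technique that needs only NumPy: spectral embedding, using the eigenvectors of the normalized graph Laplacian. It then applies a plain K-Means with deterministic farthest-point initialization. It illustrates the embed-then-cluster idea under these assumptions and is not the patent's specific method. Edge direction is ignored when building the adjacency matrix, and all names are hypothetical.

```python
import numpy as np


def spectral_embedding(nodes, edges, dim):
    """Embed nodes with the `dim` smallest eigenvectors of the normalized Laplacian."""
    index = {n: i for i, n in enumerate(nodes)}
    n = len(nodes)
    adj = np.zeros((n, n))
    for (u, v), w in edges.items():  # direction ignored: symmetrize the weights
        adj[index[u], index[v]] += w
        adj[index[v], index[u]] += w
    deg = adj.sum(axis=1)
    inv_sqrt = np.divide(1.0, np.sqrt(deg), out=np.zeros(n), where=deg > 0)
    lap = np.eye(n) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    emb = vecs[:, :dim]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.where(norms > 0, norms, 1.0)  # row-normalize each embedding


def kmeans(x, k, iters=100):
    # Deterministic farthest-point initialization, then Lloyd iterations.
    centers = [x[0]]
    while len(centers) < k:
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def item_groups(edges, k):
    nodes = sorted({n for key in edges for n in key})
    labels = kmeans(spectral_embedding(nodes, edges, dim=k), k)
    groups = {}
    for node, label in zip(nodes, labels):
        groups.setdefault(int(label), []).append(node)
    return groups


# Two tightly linked item clusters joined by one weak edge.
edges = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 1,
         ("D", "E"): 1, ("E", "F"): 1, ("F", "D"): 1,
         ("C", "D"): 0.1}
print(item_groups(edges, k=2))
```

With a weak bridge between two dense clusters, the second eigenvector separates the clusters. The items therefore fall into the two expected groups.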

Item embedding vectors obtained in this way reflect behavior-based associations between items. Clustering them gathers items that are strongly related from the user's perspective into the same item group, and distributes weakly related items across different groups. This classifies items better than category or brand does.

Classifying items by item group does not break up the associations between items of different categories or brands. This helps to characterize a user's behavior trajectory across various items accurately, and therefore to identify target users accurately. It also allows items to be combined flexibly by item group for precision marketing.

For example, suppose a user repeatedly purchases several highly related items of different categories and brands. A category- or brand-based method cannot identify this user as a target user. With item groups, however, the items the user purchased are very likely to belong to the same item group despite their different categories or brands, so the user can be identified as a target user. The item-group-based method therefore identifies target users more accurately than the traditional category- or brand-based method.

It should be noted that item groups may also be divided according to the similarity between items or the activity level of items, and the present invention is not limited in this respect.

The above is the offline process of determining the item groups. When step S101 is applied, the item set is received first, and the item group to which each item in it belongs (hereinafter, the target group) is then determined.

Specifically, for any item in the item set, it is checked whether an item group contains the identifier (e.g., SKU) of the item. If so, the item group containing the identifier is determined as the target group of the item. Otherwise, the attribute data and title data of the item are input into the item similarity model to obtain a plurality of similar items. The target group is then determined from the item groups to which those similar items belong.

For example, the similar items output by the item similarity model are sorted by similarity in descending order, and the top preset number are selected. The item group of each selected similar item is looked up, and the item group containing the most selected similar items is determined as the target group of the item. In practice, once the target group of the item is determined, the item may be added to that group to expand it.
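The lookup-then-vote rule can be sketched as follows. The sketch assumes a dict mapping item identifiers to group IDs, and a hypothetical `similar_items` function standing in for the item similarity model. Ties between groups are broken by `Counter.most_common`, which keeps the first group counted among equal counts. The tie rule is an assumption, since the text does not specify one.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def assign_target_group(item: str, item_to_group: Dict[str, int],
                        similar_items: Callable[[str], List[Tuple[str, float]]],
                        top_n: int = 3) -> Optional[int]:
    if item in item_to_group:  # the identifier is already in a group
        return item_to_group[item]
    ranked = sorted(similar_items(item), key=lambda p: p[1], reverse=True)[:top_n]
    votes = Counter(item_to_group[s] for s, _ in ranked if s in item_to_group)
    if not votes:
        return None  # no similar item has a known group
    group = votes.most_common(1)[0][0]  # group containing the most similar items
    item_to_group[item] = group  # expand the group with the new item
    return group


item_to_group = {"A": 0, "B": 0, "C": 1, "D": 1}
fake_similar = lambda item: [("C", 0.95), ("A", 0.9), ("D", 0.8), ("B", 0.1)]  # hypothetical
print(assign_target_group("X", item_to_group, fake_similar))  # 1
```

New item X's top three similar items are C, A and D, so group 1 wins two votes to one. X is then recorded in group 1, so later lookups find it directly.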

Step S102: for each target group, perform the following:
1. Acquire a plurality of primary users of the target group.
2. Input the original feature data of each primary user for each item in the target group into a pre-trained feature generation model corresponding to the target group, to obtain the optimized feature data of the primary user in the target group.
3. Input the optimized feature data into a pre-trained user classification model corresponding to the target group, to obtain the classification result of the primary user in the target group.

After the target groups are obtained in step S101, this step first acquires the primary users of each target group. The rule for selecting primary users can be set as needed. For example, they may be the users who performed a preset action (e.g., browsing, clicking or following) on any item in the target group within a recent period (e.g., the last 15 days). Note that the same user may be a primary user of several target groups.

Next, for any primary user of any target group, the original feature data of the primary user is constructed. It is input into the pre-trained feature generation model corresponding to the target group to obtain the optimized feature data. The optimized feature data is then input into the user classification model corresponding to the target group to classify the primary user. The classification result is used to judge whether the primary user is a target user corresponding to the item set.
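The example selection rule can be expressed as a simple filter over a behavior log. The sketch assumes that log entries are `(user_id, item_id, action, timestamp)` tuples and that the action names are strings. The function name, the action vocabulary and the sample events are hypothetical; the 15-day window follows the example in the text.

```python
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

Event = Tuple[str, str, str, datetime]  # (user_id, item_id, action, timestamp), assumed layout


def select_primary_users(events: Iterable[Event], group_items: Set[str], now: datetime,
                         window_days: int = 15,
                         actions: Tuple[str, ...] = ("browse", "click", "follow")) -> Set[str]:
    """Users who performed a preset action on any item of the group within the window."""
    cutoff = now - timedelta(days=window_days)
    return {user for user, item, action, ts in events
            if item in group_items and action in actions and ts >= cutoff}


now = datetime(2020, 5, 15)
events = [("u1", "A", "click", datetime(2020, 5, 10)),   # qualifies
          ("u2", "A", "click", datetime(2020, 4, 1)),    # outside the 15-day window
          ("u3", "Z", "browse", datetime(2020, 5, 14)),  # item not in the target group
          ("u4", "B", "order", datetime(2020, 5, 14))]   # action not in the preset list
print(select_primary_users(events, {"A", "B"}, now))  # {'u1'}
```

Running the same filter once per target group naturally lets one user appear as a primary user of several groups.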

In the embodiment of the present invention, the user classification model may be any supervised classification model. One example is an XGBoost model, a gradient-boosted tree model that ensembles multiple weak classifiers.

The feature generation model may be a deep neural network (DNN) model. As shown in FIG. 3, its structure comprises an input layer, a first hidden layer, a second hidden layer and an output layer connected in sequence. The first hidden layer comprises a first perceptron layer, a first normalization layer and a first activation layer. The second hidden layer comprises a second perceptron layer, a second normalization layer and a second activation layer. The two perceptron layers form the main body of the hidden layers, and each contains a plurality of neurons. Both normalization layers may use batch normalization, and both activation layers may use the Leaky ReLU (Leaky Rectified Linear Unit) activation function. The output layer may use a softmax function.

In specific scenarios, the feature generation model may also be another suitable machine learning model.

In the offline state, the feature generation model and the user classification model corresponding to the same item group may be trained as follows.

First, the two models are connected into a combined model. Generally, the second activation layer of the feature generation model is connected to the input layer of the user classification model. The input of the user classification model is then the optimized feature data converted from the output of the second activation layer.

Then a training sample set for the combined model is constructed. Each training sample may comprise original feature data as its feature part and a user classification result as its label part. The original feature data may relate to the primary users of the item group. The original feature data of any primary user for any item in the corresponding target group may include:
- the attribute data of the primary user (e.g., age, gender and/or address);
- the attribute data of the item;
- the behavior record data of the primary user for the item (e.g., browsing, search, click, follow, add-to-cart and/or purchase records, expressed for instance as the number of times or the duration of the behavior).

In practice, the behavior record data may be transaction-level data obtained from user behavior logs.

In addition, the original feature data of any primary user consists of multiple rows, structured as follows. The user has original feature data for each item in the corresponding target group. The data for any one item includes a row for each preset time interval. Each row includes the primary user's behavior record data for the item during that interval, the attribute data of the primary user, and the attribute data of the item.
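This row layout produces one row per (item, time interval) pair for each primary user. The sketch below is an illustration under several assumptions: a fixed behavior vocabulary `BEHAVIORS`, numeric attributes kept in insertion-ordered dicts, and behavior counts keyed by `(item, interval)`. All of these names and the sample values are hypothetical, not from the source.

```python
from typing import Dict, List, Sequence, Tuple

BEHAVIORS = ("browse", "search", "click", "follow", "cart", "purchase")  # illustrative


def raw_feature_rows(user_attrs: Dict[str, float],
                     item_attrs: Dict[str, Dict[str, float]],
                     behavior_counts: Dict[Tuple[str, str], Dict[str, int]],
                     intervals: Sequence[str]) -> List[List[float]]:
    """One row per (item, time interval): behavior counts, user attributes, item attributes."""
    rows = []
    for item, attrs in item_attrs.items():
        for interval in intervals:
            counts = behavior_counts.get((item, interval), {})  # missing behavior = 0
            row = [float(counts.get(b, 0)) for b in BEHAVIORS]
            row += [float(v) for v in user_attrs.values()]
            row += [float(v) for v in attrs.values()]
            rows.append(row)
    return rows


rows = raw_feature_rows({"age": 30, "gender": 1},
                        {"A": {"price": 99.0}, "B": {"price": 15.5}},
                        {("A", "d1"): {"click": 3}},
                        intervals=["d1", "d2"])
print(len(rows), len(rows[0]))  # 4 9
```

Two items and two intervals give four rows of nine columns each. The stacked rows form the input matrix of the feature generation model.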

Finally, the feature generation model and the user classification model are trained with the training sample set.

During data processing, the original feature data of a primary user enters the input layer of the feature generation model. It passes through the first perceptron layer and the first normalization layer and is output by the first activation layer. The output of the first activation layer is concatenated with the original feature data to form the input of the second perceptron layer. This input passes through the second perceptron layer and the second normalization layer and is output by the second activation layer. The output of the second activation layer is again concatenated with the original feature data to form the input of the output layer.
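The forward data flow, including both concatenations with the raw features, can be sketched in NumPy. This is an untrained illustration under stated assumptions. The layer sizes and random weights are arbitrary, and batch normalization uses the current batch statistics without learned scale and shift. The Leaky ReLU slope of 0.01 is an assumed value, and training (including the joint training with XGBoost) is not shown. The names `init_params` and `forward` are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)


def batch_norm(x, eps=1e-5):
    # Normalize each column with the statistics of the current batch
    # (the learned scale and shift of a real batch-norm layer are omitted).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(n_in, n_h1, n_h2, n_out, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_h1)), "b1": np.zeros(n_h1),
        # second perceptron layer sees [first activation output, raw features]
        "W2": rng.normal(0.0, 0.1, (n_h1 + n_in, n_h2)), "b2": np.zeros(n_h2),
        # output layer sees [second activation output, raw features]
        "W3": rng.normal(0.0, 0.1, (n_h2 + n_in, n_out)), "b3": np.zeros(n_out),
    }


def forward(x, p):
    h1 = leaky_relu(batch_norm(x @ p["W1"] + p["b1"]))
    h2 = leaky_relu(batch_norm(np.concatenate([h1, x], axis=1) @ p["W2"] + p["b2"]))
    probs = softmax(np.concatenate([h2, x], axis=1) @ p["W3"] + p["b3"])
    return h2, probs  # h2 feeds the interval-counting step; probs is the model output


x = np.random.default_rng(1).normal(size=(6, 9))  # 6 raw rows, 9 raw features
h2, probs = forward(x, init_params(9, 8, 4, 2))
print(h2.shape, probs.shape)  # (6, 4) (6, 2)
```

Re-injecting the raw features before the second perceptron layer and before the output layer works like a skip connection: later layers see both learned and original features. The second activation layer's output `h2` is what the text converts into optimized feature data.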

In the embodiment of the present invention, the output of the second activation layer may be processed as follows to generate the optimized feature data of the primary user in the item group. Specifically, the multiple pieces of multi-dimensional data of the primary user output by the second activation layer are obtained first (each dimension of this data is called an initial dimension). Each initial dimension is then divided into at least one target dimension according to numerical intervals, and the multiple pieces of multi-dimensional data are converted into one piece of optimized feature data of the primary user over the target dimensions. The value of the optimized feature data in any target dimension is the number of times the value of the multi-dimensional data in the corresponding initial dimension falls within the numerical interval of that target dimension. For example, suppose there are two initial dimensions, feature0 and feature1, and the primary user has the following 6 pieces of multi-dimensional data:

No.	feature0	feature1
1	0.0091	0.2231
2	0.3256	0.3365
3	0.5298	0.4987
4	0.7789	0.5986
5	0.7569	0.8899
6	0.9685	0.8522

As a result of the normalization layer, the value of the multi-dimensional data in each initial dimension lies between 0 and 1. feature0 and feature1 are then each divided into several target dimensions according to preset numerical intervals: feature0 is divided into three target dimensions, feature0-bin0, feature0-bin1 and feature0-bin2, according to the intervals [0,0.3], (0.3,0.6] and (0.6,1]; feature1 is divided into three target dimensions, feature1-bin0, feature1-bin1 and feature1-bin2, according to the intervals [0,0.4], (0.4,0.7] and (0.7,1]. The 6 pieces of multi-dimensional data are then converted into one piece of optimized feature data. The number of times feature0 falls in [0,0.3], (0.3,0.6] and (0.6,1] is counted (1, 2 and 3 in this example), and these three counts are taken as the values of the optimized feature data in target dimensions feature0-bin0, feature0-bin1 and feature0-bin2. Likewise, the number of times feature1 falls in [0,0.4], (0.4,0.7] and (0.7,1] is counted (2, 2 and 2 in this example), and these three counts are taken as the values in target dimensions feature1-bin0, feature1-bin1 and feature1-bin2. This yields the following optimized feature vector for the primary user:

feature0-bin0	feature0-bin1	feature0-bin2	feature1-bin0	feature1-bin1	feature1-bin2
1	2	3	2	2	2

With this arrangement, the multiple transaction-level records of a user are converted into a single user-level record. This retains more of the information in the original data and requires no ETL processing of the original data, which improves efficiency and saves considerable labor and resources. In practical applications, the optimized feature data may alternatively be generated by computing the mean, maximum, minimum or similar statistics of each initial dimension across the multiple pieces of multi-dimensional data.
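The conversion from multiple pieces of second-activation-layer output to one user-level vector is a per-dimension histogram. The sketch below reproduces the six-row feature0/feature1 example; the interval boundaries are passed in explicitly, the first interval of each dimension is treated as closed at its lower end, and `summary_features` illustrates the mean/maximum/minimum alternative. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (low, high]; the first interval of a dimension also includes low


def bin_index(value: float, intervals: Sequence[Interval]) -> int:
    """Index of the target dimension (interval) that contains the value."""
    for i, (low, high) in enumerate(intervals):
        if (low < value or (i == 0 and value == low)) and value <= high:
            return i
    raise ValueError(f"value {value} lies outside all intervals")


def optimized_features(rows: Sequence[Sequence[float]],
                       intervals_per_dim: Sequence[Sequence[Interval]]) -> List[int]:
    """Histogram conversion: for each initial dimension, count how many rows fall
    in each of its intervals, and concatenate the counts into one user-level vector."""
    vector: List[int] = []
    for dim, intervals in enumerate(intervals_per_dim):
        counts = [0] * len(intervals)
        for row in rows:
            counts[bin_index(row[dim], intervals)] += 1
        vector.extend(counts)
    return vector


def summary_features(rows: Sequence[Sequence[float]]) -> List[float]:
    """Alternative aggregation: mean, maximum and minimum per initial dimension."""
    out: List[float] = []
    for column in zip(*rows):
        out.extend([sum(column) / len(column), max(column), min(column)])
    return out


# The six rows of (feature0, feature1) from the example table.
rows = [(0.0091, 0.2231), (0.3256, 0.3365), (0.5298, 0.4987),
        (0.7789, 0.5986), (0.7569, 0.8899), (0.9685, 0.8522)]
intervals = [
    [(0.0, 0.3), (0.3, 0.6), (0.6, 1.0)],  # feature0-bin0..bin2
    [(0.0, 0.4), (0.4, 0.7), (0.7, 1.0)],  # feature1-bin0..bin2
]
print(optimized_features(rows, intervals))  # [1, 2, 3, 2, 2, 2]
```

The histogram form keeps information about the spread of values across the user's records, which a single mean would discard; the summary statistics are cheaper but coarser.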

The above describes the offline training of the feature generation model and the user classification model. In the application phase of step S102, for the target group to which each item in the item set belongs, multiple primary users of the target group are first obtained, and the original feature data of each primary user for each item in the target group is input into the pre-trained feature generation model corresponding to that target group, yielding the optimized feature data of the primary user in the target group. The optimized feature data is then input into the pre-trained user classification model corresponding to the target group, yielding the classification result of the primary user in the target group. Generally, the classification result indicates whether the primary user has a consumption tendency within a preset future period (for example, within 5 days) and takes one of two values: 1 when the primary user has a consumption tendency, and 0 when the primary user does not.

Step S103: determine, using the classification result of each primary user in each target group, whether the primary user is a target user corresponding to the item set.

Step S102 yields the classification result of each primary user in each target group. In this step, the classification results across target groups are combined to determine whether the primary user is a target user corresponding to the item set. Specifically, a weighted discrimination result is first computed for the primary user from its classification result in each target group and the weight value preset for each target group, i.e., the weighted sum of the user's classification results across target groups (generally, if a user is not a primary user of a target group, the user's classification result in that group is taken as zero). The weighted discrimination result is then compared with a preset threshold: if it is greater than the threshold, the primary user is determined to be a target user corresponding to the item set; otherwise, the primary user is determined not to be a target user corresponding to the item set.
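The aggregation in step S103 can be sketched as a weighted sum followed by a strict threshold comparison. The group names, weights and threshold in the example are made-up values for illustration; the source only states that the weights and the threshold are preset.

```python
from typing import Dict, List

Results = Dict[str, Dict[str, int]]  # target group -> {user id -> classification result (0/1)}


def weighted_result(user: str, per_group: Results, weights: Dict[str, float]) -> float:
    """Weighted sum of a user's classification results over all target groups; a group
    in which the user is not a primary user contributes 0."""
    return sum(w * per_group.get(g, {}).get(user, 0) for g, w in weights.items())


def select_target_users(per_group: Results, weights: Dict[str, float], threshold: float) -> List[str]:
    """Users whose weighted discrimination result is strictly greater than the threshold."""
    users = sorted({u for results in per_group.values() for u in results})
    return [u for u in users if weighted_result(u, per_group, weights) > threshold]


per_group = {"g1": {"u1": 1, "u2": 0, "u3": 1}, "g2": {"u1": 1, "u3": 0}, "g3": {"u2": 1}}
weights = {"g1": 0.5, "g2": 0.3, "g3": 0.2}  # made-up preset weights
print(select_target_users(per_group, weights, threshold=0.4))  # ['u1', 'u3']
```

Here u1 scores 0.8, u2 scores 0.2 and u3 scores 0.5, so only u1 and u3 exceed the threshold of 0.4.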

Fig. 4 is a detailed flowchart of a method for determining a target user in an embodiment of the present invention. As shown in Fig. 4, the scheme of the present invention can be divided into an offline training phase and a target user determination phase. In the offline training phase, an item graph is first constructed from the co-occurrence relations of items in user behavior sequences (that is, their occurrence counts and/or occurrence order), and the connectivity of isolated items in the item graph is enhanced using an item similarity model trained on item data (including the attribute data and title data of the items). Item embedding vectors are then obtained from the item graph with a graph embedding method and clustered to form multiple item groups. Finally, a training sample set is constructed from the original feature data and the optimized feature data of the primary users in each item group to train the feature generation model and the user classification model. In the target user determination phase, the identifier of each item in the item set is first looked up in the item groups: if it is found, the target group to which the item belongs is determined directly; if it is not found (indicating a new item), similar items are obtained with the item similarity model and used to determine the item's target group. The optimized feature data of each primary user in each target group is then obtained with the trained feature generation model, the classification result of each primary user is computed with the trained user classification model, and the classification results are finally aggregated with weights to determine the target users corresponding to the item set.
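As a hedged sketch of the first offline step, co-occurrence relations could be counted as weighted directed edges between items that appear close together in the same behavior sequence. The window size and the use of directed edges (to reflect occurrence order) are assumptions; the source does not say how counts and order are combined, and the graph embedding step is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_item_graph(sequences: Iterable[List[str]], window: int = 1) -> Dict[Tuple[str, str], int]:
    """Directed co-occurrence graph: edge (a, b) counts how often item b appears within
    `window` positions after item a in the same behavior sequence."""
    edges: Counter = Counter()
    for seq in sequences:
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 1 + window]:
                if a != b:  # ignore repeated interactions with the same item
                    edges[(a, b)] += 1
    return dict(edges)


sequences = [["A", "B", "C"], ["A", "B"], ["B", "D"]]
print(build_item_graph(sequences))            # {('A', 'B'): 2, ('B', 'C'): 1, ('B', 'D'): 1}
print(build_item_graph(sequences, window=2))  # also contains ('A', 'C'): 1
```

A larger window links items that are further apart in a session, trading precision of the association for graph connectivity.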

In the technical solution of the embodiments of the present invention, an item graph is constructed offline in advance from the occurrence counts and/or occurrence order of items in user behavior sequences, an embedding vector of each item is obtained from the item graph with a graph embedding method, and the items are clustered by their embedding vectors into multiple item groups. When the target users for an item set need to be determined, the item group to which each item in the set belongs can be determined by lookup or via similar items, and the subsequent target user determination is performed per item group. This arrangement accurately reflects the associations between items of different categories or brands, overcomes the inherent limitation of determining target users by item category or brand, and thus supports accurate determination of target users. Moreover, the item groups can be updated dynamically from user behavior data, so that they stay consistent with the changing behavior-based associations among items. In addition, the invention uses deep learning to generate optimized feature data directly and automatically from the original feature data without ETL processing, which improves efficiency and saves considerable labor and resources.

It should be noted that, for ease of description, the foregoing method embodiments are described as a series of acts; however, those skilled in the art will appreciate that the present invention is not limited by the order of the acts described, and some steps may be performed in other orders or concurrently. Those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and the acts and modules involved are not necessarily all required by the present invention.

To better implement the above solutions of the embodiments of the present invention, a related apparatus for implementing them is also provided below.

Referring to fig. 5, an apparatus 500 for determining a target user according to an embodiment of the present invention may include a group dividing unit 501, a feature generating and classifying unit 502, and a target user determining unit 503.

The group dividing unit 501 may be configured to determine, among predetermined item groups, the target group to which each item in a preset item set belongs, where the item groups are determined according to the occurrence counts and/or occurrence order of multiple items in the same user behavior sequence. The feature generation and classification unit 502 may be configured to, for each target group: obtain multiple primary users of the target group; input the original feature data of each primary user for each item in the target group into a pre-trained feature generation model corresponding to the target group to obtain the optimized feature data of the primary user in the target group; and input the optimized feature data into a pre-trained user classification model corresponding to the target group to obtain the classification result of the primary user in the target group. The target user determining unit 503 may be configured to determine, from the classification result of each primary user in each target group, whether the primary user is a target user corresponding to the item set.

In an embodiment of the present invention, the apparatus 500 may further include a group predetermining unit configured to: generate an item graph representing the associations among multiple items according to the occurrence counts and/or occurrence order of the items in the same user behavior sequence; perform graph embedding on the item graph to obtain an embedding vector for each of the items; and cluster the items by their embedding vectors to obtain multiple item groups.
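The clustering step could, for example, use k-means on the item embedding vectors. The source names neither a clustering algorithm nor a graph embedding method, so both the choice of k-means and the toy two-dimensional embeddings below are assumptions. This minimal pure-Python version uses a deterministic initialization (the first k items by identifier).

```python
import math
from typing import Dict, List


def kmeans(vectors: Dict[str, List[float]], k: int, iterations: int = 20) -> Dict[str, int]:
    """Plain k-means on item embedding vectors; returns item id -> cluster (item group) index.
    Initialization is deterministic: the first k items in sorted id order."""
    ids = sorted(vectors)
    centroids = [list(vectors[i]) for i in ids[:k]]
    assignment: Dict[str, int] = {}
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        assignment = {
            i: min(range(k), key=lambda c, i=i: math.dist(vectors[i], centroids[c])) for i in ids
        }
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in ids if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def to_item_groups(assignment: Dict[str, int]) -> Dict[int, List[str]]:
    """Invert the assignment into item groups (cluster index -> member item ids)."""
    groups: Dict[int, List[str]] = {}
    for item, c in sorted(assignment.items()):
        groups.setdefault(c, []).append(item)
    return groups


embeddings = {"A": [0.0, 0.0], "B": [0.1, 0.0], "C": [5.0, 5.0], "D": [5.1, 5.0]}  # toy vectors
print(to_item_groups(kmeans(embeddings, k=2)))  # {0: ['A', 'B'], 1: ['C', 'D']}
```

In practice the number of groups and the initialization strategy would need tuning; deterministic initialization is used here only so the result is reproducible.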

In practical applications, the group predetermining unit may be further configured to: input the attribute data and title data of an isolated item in the item graph into a pre-trained item similarity model to obtain at least one similar item of the isolated item; and add an association between the isolated item and the similar item in the item graph.
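The source relies on a pre-trained item similarity model over attribute and title data. The sketch below substitutes a simple Jaccard overlap of title tokens purely as a stand-in so that the linking logic can run; `title_similarity`, `link_isolated`, `top_k` and `min_sim` are hypothetical names and parameters.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def title_similarity(a: str, b: str) -> float:
    """Stand-in for the pre-trained item similarity model: Jaccard overlap of title tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def link_isolated(titles: Dict[str, str], edges: Set[Edge], top_k: int = 1, min_sim: float = 0.2) -> Set[Edge]:
    """Return a copy of the edge set in which every item with no edge in the original
    graph is linked to its top_k most similar items (if similarity >= min_sim)."""
    connected = {node for edge in edges for node in edge}
    linked = set(edges)
    for item in sorted(titles):
        if item in connected:
            continue  # only isolated items get extra associations
        candidates = sorted(
            ((title_similarity(titles[item], titles[other]), other) for other in titles if other != item),
            reverse=True,
        )
        for sim, other in candidates[:top_k]:
            if sim >= min_sim:
                linked.add((item, other))
    return linked


titles = {"A": "red cotton t-shirt", "B": "blue cotton t-shirt", "C": "steel water bottle"}
print(sorted(link_isolated(titles, {("A", "C")})))  # [('A', 'C'), ('B', 'A')]
```

The `min_sim` cutoff keeps an isolated item isolated when nothing is genuinely similar, rather than forcing a spurious association.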

In a specific application, the group dividing unit 501 may be further configured to: for any item in the item set, determine whether the predetermined item groups contain the identifier of the item; if so, determine the item group containing the identifier as the target group to which the item belongs; otherwise, input the attribute data and title data of the item into the item similarity model to obtain a similar item of the item, determine the target group to which the item belongs from the item group to which the similar item belongs, and, once the target group is determined, add the item to that target group.
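The lookup-with-fallback logic of the group dividing unit might look like the following sketch, where `most_similar` is a hypothetical callable standing in for the item similarity model and is assumed to return an item that already belongs to a group.

```python
from typing import Callable, Dict, Iterable, Set


def assign_target_groups(
    items: Iterable[str],
    item_to_group: Dict[str, str],
    groups: Dict[str, Set[str]],
    most_similar: Callable[[str], str],
) -> Dict[str, str]:
    """Map each item in the item set to its target group.

    item_to_group / groups describe the predetermined item groups and are updated in
    place when a new item is added; most_similar stands in for the item similarity
    model and must return an item that already belongs to a group."""
    result: Dict[str, str] = {}
    for item in items:
        group = item_to_group.get(item)
        if group is None:  # identifier not found: treat as a new item
            group = item_to_group[most_similar(item)]
            groups[group].add(item)  # add the new item to its target group
            item_to_group[item] = group
        result[item] = group
    return result


item_to_group = {"A": "g1", "B": "g2"}
groups = {"g1": {"A"}, "g2": {"B"}}
print(assign_target_groups(["A", "X"], item_to_group, groups, most_similar=lambda item: "B"))
# {'A': 'g1', 'X': 'g2'}
```

Because new items are written back into the groups, a later query for the same item is answered by direct lookup without calling the similarity model again.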

Preferably, the feature generation model is a deep neural network model comprising an input layer, a first hidden layer, a second hidden layer and an output layer connected in sequence. The first hidden layer comprises a first perceptron layer, a first normalization layer and a first activation layer; the second hidden layer comprises a second perceptron layer, a second normalization layer and a second activation layer; the input data of the second perceptron layer is the output data of the first activation layer concatenated with the original feature data; and the input data of the output layer is the output data of the second activation layer concatenated with the original feature data.

Preferably, in the embodiment of the present invention, the original feature data of any primary user for any item in the corresponding target group includes the attribute data of the primary user, the attribute data of the item, and the behavior record data of the primary user for the item. The feature generation and classification unit 502 may be further configured to: input the original feature data of any primary user for each item in the corresponding target group into the feature generation model, and obtain the multiple pieces of multi-dimensional data of the primary user output by the second activation layer; and divide each initial dimension of the multi-dimensional data into at least one target dimension according to numerical intervals, converting the multi-dimensional data into one piece of optimized feature data of the primary user over the target dimensions, where the value of the optimized feature data in any target dimension is the number of times the value of the multi-dimensional data in the corresponding initial dimension falls within the numerical interval of that target dimension.

In some embodiments, the target user determining unit 503 may be further configured to: obtain a weighted discrimination result for each primary user from the classification result of the primary user in each target group and the weight value preset for each target group; and, when the weighted discrimination result is greater than a preset threshold, determine the primary user as a target user corresponding to the item set.

In addition, in the embodiment of the present invention, the user classification model is an XGBoost model, and the apparatus 500 may further include a training unit for training the feature generation model and the user classification model corresponding to the same item group. The training unit may be configured to: connect the feature generation model and the user classification model into a combined model, where the input data of the user classification model is the optimized feature data obtained by converting the output data of the second activation layer of the feature generation model; construct a training sample set for the combined model, where each training sample includes original feature data as the feature part and a user classification result as the label part; and train the feature generation model and the user classification model with the training sample set.

Fig. 6 illustrates an exemplary system architecture 600 to which the method or the apparatus for determining a target user according to embodiments of the present invention may be applied.

As shown in Fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604 and a server 605 (this architecture is merely an example; the components of a specific architecture may be adjusted to the application). The network 604 provides the medium for communication links between the terminal devices 601, 602, 603 and the server 605, and may include various connection types, such as wired or wireless communication links or fiber optic cables.

A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages and the like. Various client applications may be installed on the terminal devices 601, 602, 603, such as an application for determining a target user (for example only).

The terminal devices 601, 602, 603 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.

The server 605 may be a server that provides various services, such as a computing server (for example only) that supports the application for determining a target user used on the terminal devices 601, 602, 603. The computing server may process a received target user determination request and feed back the processing result (for example, the determined target users) to the terminal devices 601, 602, 603.

It should be noted that the method for determining the target user provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the device for determining the target user is generally disposed in the server 605.

It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

The present invention also provides an electronic device. The electronic device of the embodiment of the present invention includes one or more processors and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for determining a target user provided by the present invention.

Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with the electronic device implementing an embodiment of the present invention. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in Fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the computer system 700. The CPU 701, the ROM 702 and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.

In particular, the processes described in the main step diagrams above may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the main step diagram. In the above-described embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the central processing unit 701, performs the above-described functions defined in the system of the present invention.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a group dividing unit, a feature generation and classification unit, and a target user determining unit. The names of these units do not, in some cases, limit the units themselves; for example, the group dividing unit may also be described as "a unit that provides target groups to the feature generation and classification unit".

As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to perform the following steps: determining, among predetermined item groups, a target group to which each item in a preset item set belongs, where the item groups are determined according to the occurrence counts and/or occurrence order of multiple items in the same user behavior sequence; for each target group, obtaining multiple primary users of the target group, inputting the original feature data of each primary user for each item in the target group into a pre-trained feature generation model corresponding to the target group to obtain the optimized feature data of the primary user in the target group, and inputting the optimized feature data into a pre-trained user classification model corresponding to the target group to obtain the classification result of the primary user in the target group; and determining, using the classification result of each primary user in each target group, whether the primary user is a target user corresponding to the item set.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for determining a target user, comprising:

determining, among predetermined item groups, a target group to which each item in a preset item set belongs, wherein the item groups are determined according to occurrence counts and/or occurrence order of multiple items in the same user behavior sequence;

for each target group: obtaining multiple primary users of the target group; inputting original feature data of each primary user for each item in the target group into a pre-trained feature generation model corresponding to the target group to obtain optimized feature data of the primary user in the target group; and inputting the optimized feature data into a pre-trained user classification model corresponding to the target group to obtain a classification result of the primary user in the target group; and

determining, using the classification result of each primary user in each target group, whether the primary user is a target user corresponding to the item set.

2. The method of claim 1, wherein the item groups are determined by:

generating, according to occurrence counts and/or occurrence order of multiple items in the same user behavior sequence, an item graph representing associations among the items;

performing graph embedding on the item graph to obtain an embedding vector of each of the items; and

clustering the items by their embedding vectors to obtain multiple item groups.

3. The method of claim 2, further comprising, after generating the item graph:

inputting attribute data and title data of an isolated item in the item graph into a pre-trained item similarity model to obtain at least one similar item of the isolated item; and

adding an association between the isolated item and the similar item in the item graph.

4. The method of claim 3, wherein determining, among the predetermined item groups, the target group to which each item in the preset item set belongs comprises:

for any item in the item set, determining whether the predetermined item groups contain an identifier of the item:

if so, determining the item group containing the identifier of the item as the target group to which the item belongs;

otherwise, inputting attribute data and title data of the item into the item similarity model to obtain a similar item of the item; determining the target group to which the item belongs from the item group to which the similar item belongs; and, after determining the target group to which the item belongs, adding the item to the target group.

5. The method of claim 1, wherein the feature generation model is a deep neural network model comprising an input layer, a first hidden layer, a second hidden layer and an output layer connected in sequence, wherein:

the first hidden layer comprises a first perceptron layer, a first normalization layer and a first activation layer;

the second hidden layer comprises a second perceptron layer, a second normalization layer and a second activation layer;

input data of the second perceptron layer is output data of the first activation layer concatenated with the original feature data; and

input data of the output layer is output data of the second activation layer concatenated with the original feature data.

6. The method of claim 5, wherein the original feature data of any primary user for any item in the corresponding target group comprises attribute data of the primary user, attribute data of the item, and behavior record data of the primary user for the item; and obtaining the optimized feature data of the primary user in the target group comprises:

inputting the original feature data of any primary user for each item in the corresponding target group into the feature generation model, and obtaining multiple pieces of multi-dimensional data of the primary user output by the second activation layer; and

dividing each initial dimension of the multi-dimensional data into at least one target dimension according to numerical intervals, and converting the multi-dimensional data into one piece of optimized feature data of the primary user over the target dimensions, wherein the value of the optimized feature data in any target dimension is the number of times the value of the multi-dimensional data in the initial dimension corresponding to the target dimension falls within the numerical interval corresponding to the target dimension.

7. The method of claim 1, wherein determining, by using the classification result of each primary user in each target group, whether the primary user is a target user corresponding to the item set comprises:

obtaining a weighted judgment result of each primary user by utilizing the classification result of each primary user in each target group and a weight value preset for each target group;

and when the weighted judgment result is greater than a preset threshold value, determining the primary user as a target user corresponding to the item set.
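Claim 7 does not fix the form of the weighted judgment result. Assuming it is a weighted sum of the per-group classification results (for example 0/1 labels or probabilities), the decision reduces to a few lines; the function names are hypothetical.

```python
from typing import Dict


def weighted_judgment(results: Dict[str, float], weights: Dict[str, float]) -> float:
    # Sum of (preset group weight x classification result) over the target
    # groups in which this primary user was classified.
    return sum(weights[group] * result for group, result in results.items())


def is_target_user(results: Dict[str, float], weights: Dict[str, float], threshold: float) -> bool:
    # Target user only when the weighted result strictly exceeds the threshold.
    return weighted_judgment(results, weights) > threshold


weights = {"g1": 0.5, "g2": 0.3, "g3": 0.2}
print(is_target_user({"g1": 1, "g2": 0, "g3": 1}, weights, threshold=0.6))  # True
```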

8. The method of claim 5 or 6, wherein the user classification model is an XGBoost model; and the feature generation model and the user classification model corresponding to the same item group are trained by:

connecting the feature generation model and the user classification model into a combined model, wherein the input to the user classification model is the optimized feature data obtained by converting the output of the second activation layer of the feature generation model;

constructing a training sample set for the combined model, wherein each training sample comprises original feature data as the feature part and a user classification result as the label part; and

training the feature generation model and the user classification model by using the training sample set.

9. An apparatus for determining a target user, comprising:

a group dividing unit, configured to determine, among predetermined item groups, the target group to which each item in a preset item set belongs; wherein the item groups are determined according to the number of occurrences and/or the order of occurrence of a plurality of items in the same user behavior sequence;

a feature generation and classification unit, configured to, for each target group: acquire a plurality of primary users of the target group; input the original feature data of each primary user for each item in the target group into a pre-trained feature generation model corresponding to the target group, to obtain optimized feature data of the primary user in the target group; and input the optimized feature data into a pre-trained user classification model corresponding to the target group, to obtain a classification result of the primary user in the target group; and

a target user judging unit, configured to determine, by using the classification result of each primary user in each target group, whether the primary user is a target user corresponding to the item set.
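The cooperation of the three units can be sketched end to end with stand-in callables. Every parameter name here is hypothetical: `users_of_group` and `original_features` stand for whatever data access the apparatus uses, the per-group models are plain functions rather than trained networks, items outside every predetermined group are assumed to be skipped, and the judging step assumes a weighted sum compared against a threshold, in the manner of claim 7.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Features = List[float]


def determine_target_users(
    item_set: Iterable[str],
    item_to_group: Dict[str, str],
    users_of_group: Callable[[str], List[str]],
    original_features: Callable[[str, str], Features],
    feature_models: Dict[str, Callable[[List[Features]], Features]],
    classifiers: Dict[str, Callable[[Features], float]],
    weights: Dict[str, float],
    threshold: float,
) -> List[str]:
    # Group dividing unit: collect the items of the set under their target groups.
    items_by_group: Dict[str, List[str]] = defaultdict(list)
    for item in item_set:
        if item in item_to_group:
            items_by_group[item_to_group[item]].append(item)

    # Feature generation and classification unit: classify each primary user
    # per target group and accumulate the group-weighted results.
    scores: Dict[str, float] = defaultdict(float)
    for group, items in items_by_group.items():
        for user in users_of_group(group):
            original = [original_features(user, item) for item in items]
            optimized = feature_models[group](original)
            scores[user] += weights[group] * classifiers[group](optimized)

    # Target user judging unit: weighted result against a preset threshold.
    return [user for user, score in scores.items() if score > threshold]


groups_users = {"g1": ["u1", "u2"], "g2": ["u1"]}
result = determine_target_users(
    item_set=["a", "b", "c"],
    item_to_group={"a": "g1", "b": "g1", "c": "g2"},
    users_of_group=lambda g: groups_users[g],
    original_features=lambda user, item: [1.0 if user == "u1" else 0.0],
    feature_models={g: (lambda rows: [sum(r[0] for r in rows)]) for g in ("g1", "g2")},
    classifiers={g: (lambda x: 1.0 if x[0] > 0 else 0.0) for g in ("g1", "g2")},
    weights={"g1": 0.6, "g2": 0.4},
    threshold=0.5,
)
print(result)  # ['u1']
```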

10. The apparatus of claim 9, further comprising:

a group pre-determining unit, configured to: generate an item graph representing the associations among a plurality of items according to the number of occurrences and/or the order of occurrence of the items in the same user behavior sequence; perform graph embedding on the item graph to obtain an embedding vector for each of the items; and cluster the items by using the embedding vectors to obtain a plurality of item groups.
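Claim 10 leaves the graph construction, the embedding method and the clustering algorithm open. The following minimal sketch assumes a directed edge for each pair of adjacent items in a behavior sequence, weighted by how often the pair occurs, and generates DeepWalk-style weighted random walks as the corpus for a skip-gram embedding. The embedding training and the clustering step (for example k-means on the resulting vectors) are not shown, and the function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence

Graph = Dict[str, Dict[str, int]]


def build_item_graph(sequences: Sequence[Sequence[str]]) -> Graph:
    """Directed edge a -> b for each adjacent pair (a, b) in a user behavior
    sequence; the edge weight counts how many times that pair occurs."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:  # ignore repeated consecutive views of the same item
                counts[a][b] += 1
    return {a: dict(nbrs) for a, nbrs in counts.items()}


def random_walks(graph: Graph, walk_length: int, walks_per_node: int, seed: int = 0) -> List[List[str]]:
    """Weighted random walks; each walk is a 'sentence' for a skip-gram model."""
    rng = random.Random(seed)
    walks: List[List[str]] = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                nbrs = graph.get(walk[-1])
                if not nbrs:  # dead end: item never followed by another item
                    break
                nodes = list(nbrs)
                walk.append(rng.choices(nodes, weights=[nbrs[n] for n in nodes])[0])
            walks.append(walk)
    return walks


graph = build_item_graph([["a", "b", "c"], ["a", "b"], ["c", "a"]])
print(graph)  # {'a': {'b': 2}, 'b': {'c': 1}, 'c': {'a': 1}}
walks = random_walks(graph, walk_length=5, walks_per_node=2)
```

Weighting the next-step choice by edge counts lets frequently co-occurring items appear near each other in the walks more often, so a skip-gram model trained on them would tend to place such items close together before clustering.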

11. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.

12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-8.