CN114912009A - User portrait generation method, device, electronic equipment and computer program medium

Info

Publication number: CN114912009A
Application number: CN202110184908.2A
Authority: CN (China)
Prior art keywords: user, target, candidate, users, candidate users
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 刘雨丹, 郝晓波, 葛凯凯, 刘诗万, 林乐宇, 张旭
Current assignee: Tencent Technology Shenzhen Co Ltd
Original assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd; priority to CN202110184908.2A
Publication of CN114912009A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0241 Advertisements
    • G06Q 30/0251 Targeted advertisements
    • G06Q 30/0255 Targeted advertisements based on user history
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0241 Advertisements
    • G06Q 30/0251 Targeted advertisements
    • G06Q 30/0269 Targeted advertisements based on user profile or attribute
    • G06Q 30/0271 Personalized advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Databases & Information Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the present application provides a user portrait generation method and apparatus, an electronic device, and a computer program medium, relating to the technical field of artificial intelligence. The method in the embodiment of the application comprises the following steps: acquiring user characteristic information of a target user; generating, based on the user characteristic information, a plurality of user feature vectors of the target user under user feature dimensions; inputting the plurality of user feature vectors into a pre-trained machine learning model; acquiring a preference level label, output by the pre-trained machine learning model, of the target user for a target object under a target classification attribute; and if the acquired preference level label matches a preset preference level label, generating a user portrait of the target user based on the target classification attribute. The technical scheme of the embodiment of the application improves the accuracy of the obtained user portrait.

Description

User portrait generation method and device, electronic equipment and computer program medium
Technical Field
The application relates to the technical field of computers, in particular to a user portrait generation method and device, an electronic device and a computer program medium.
Background
With the development of internet technology, obtaining a user portrait based on big data and then realizing multiple services through that portrait has become one of the core technologies of the internet.
In the related art, when generating a user portrait, portrait tags are generally extracted from user behavior data, each user's portrait tags are scored according to their counted frequency, and the user portrait is then obtained from the tag scores. For users who have generated no behavior, a user portrait is difficult to obtain through such tag statistics; for users with little behavior data, the accuracy of the portrait obtained through tag statistics is low, which in turn affects the accuracy of related services performed according to the user portrait.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, an electronic device, and a computer program medium for generating a user representation, so as to improve accuracy of an obtained user representation.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a method for generating a user representation, including: acquiring user characteristic information of a target user, wherein the user characteristic information comprises user attribute information and historical behavior data of the target user; respectively generating a plurality of user feature vectors of the target user under a user feature dimension based on the user feature information, wherein the user feature dimension is a feature dimension formed by the category of the user attribute information, the operation event category contained in the historical behavior data and the classification attribute of the operation object; inputting the plurality of user feature vectors into a pre-trained machine learning model, wherein the pre-trained machine learning model comprises a plurality of user feature vectors of sample users under user feature dimensions and preference level labels of the sample users for target objects under target classification attributes; acquiring preference level labels of target objects under the target classification attributes of the target users output by the pre-trained machine learning model; and if the acquired preference level label is matched with a preset preference level label, generating the user portrait of the target user based on the target classification attribute.
According to an aspect of an embodiment of the present application, there is provided a user representation generation apparatus, including: the device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring user characteristic information of a target user, and the user characteristic information comprises user attribute information and historical behavior data of the target user; a first generating unit, configured to generate, based on the user feature information, a plurality of user feature vectors of the target user in a user feature dimension, where the user feature dimension is a feature dimension formed by a category of the user attribute information, and an operation event category and a classification attribute of an operation object included in the historical behavior data; the input unit is used for inputting the plurality of user feature vectors into a pre-trained machine learning model, and the pre-trained machine learning model comprises a plurality of user feature vectors of sample users under user feature dimensions and preference level labels of the sample users for target objects under target classification attributes; the output unit is used for acquiring a preference level label of a target object under the target classification attribute of a target user output by the pre-trained machine learning model; and the second generation unit is used for generating the user portrait of the target user based on the target classification attribute if the acquired preference level label is matched with a preset preference level label.
In some embodiments of the present application, based on the foregoing solution, the apparatus for generating a user representation further includes: a third generating unit, configured to respectively perform conversion processing on the plurality of user feature vectors to generate first feature vectors corresponding to the plurality of user feature vectors, where the first feature vectors corresponding to the plurality of user feature vectors are vectors of the same dimension; an aggregation unit, configured to perform aggregation processing on the first feature vectors corresponding to the plurality of user feature vectors based on preset association relations among the plurality of pieces of user feature information, to generate a plurality of aggregated second feature vectors; and a predicting unit, configured to predict a preference level label of the target user for the target object under the target classification attribute based on the aggregated second feature vectors.
In some embodiments of the present application, based on the foregoing solution, the apparatus for generating a user representation further includes: the second acquisition unit is used for acquiring training set sample data used for training a machine learning model to be trained, wherein each piece of sample data in the training set sample data comprises a plurality of user feature vectors of a sample user under a user feature dimension and a preference level label of the sample user on a target object under a target classification attribute; and the training unit is used for training the machine learning model to be trained through the training set sample data to obtain a pre-trained machine learning model.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: acquiring user characteristic information of candidate users in a candidate user set; for each candidate user in the candidate user set, determining a first class of candidate users and a second class of candidate users based on historical behavior data in user feature information of the candidate users, wherein the first class of candidate users are candidate users including target operation events, the second class of candidate users are candidate users not including the target operation events, and the target operation events refer to operation events for operating target objects under the target classification attributes; generating a plurality of user feature vectors of the first type candidate users under the user feature dimension based on the user feature information of the first type candidate users, and adding preference level labels with high preference degrees to the first type candidate users to obtain sample data of positive sample users; generating a plurality of user feature vectors of the second type candidate users under the user feature dimension based on the user feature information of the second type candidate users, and adding preference level labels with low preference degrees to the second type candidate users to obtain sample data of negative sample users; and obtaining training set sample data used for training a machine learning model to be trained based on the sample data of the positive sample user and the sample data of the negative sample user.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: extracting candidate users from the second type of candidate users; and generating a plurality of user feature vectors of the extracted candidate users under the user feature dimension based on the extracted user feature information of the candidate users, and adding preference grade labels with low preference degrees to the extracted candidate users to obtain sample data of the negative sample users.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: determining portrait intensity of the second type candidate users based on frequency of historical behavior data contained in user characteristic information of the second type candidate users, wherein the portrait intensity and the frequency are in positive correlation; candidate users are extracted at a first ratio among the second class of candidate users having a portrait intensity higher than a predetermined portrait intensity threshold, and candidate users are extracted at a second ratio among the second class of candidate users having a portrait intensity lower than or equal to the predetermined portrait intensity threshold, the first ratio being greater than the second ratio.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: acquiring a configuration file, wherein the configuration file is used for configuring the attribute type of the acquired user attribute information, the operation event type of the acquired historical behavior data and the classification attribute of an operation object; for each candidate user in the candidate user set, acquiring user attribute information of the candidate user under the configured attribute category based on the category of the user attribute information configured in the configuration file; and for each candidate user in the candidate user set, acquiring historical behavior data of the candidate user under the operation event category and the classification attribute of the operation object based on the operation event category and the classification attribute of the operation object configured in the configuration file.
According to an aspect of an embodiment of the present application, there is provided a computer readable medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the method for generating a user representation as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a method of user representation generation as described in the embodiments above.
According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method for generating a user representation provided in the various alternative embodiments described above.
In the technical scheme provided by some embodiments of the application, user characteristic information of a target user is obtained, wherein the user characteristic information comprises user attribute information and historical behavior data of the target user; respectively generating a plurality of user feature vectors of a target user under a user feature dimension based on the user feature information, wherein the user feature dimension is a feature dimension formed by the category of the user attribute information, the category of an operation event contained in historical behavior data and the classification attribute of an operation object; inputting a plurality of user feature vectors into a pre-trained machine learning model, wherein the pre-trained machine learning model comprises a plurality of user feature vectors of sample users under user feature dimensions and preference level labels of the sample users for target objects under target classification attributes; acquiring preference level labels of target objects under the target classification attributes of target users output by a pre-trained machine learning model; and if the acquired preference level label is matched with a preset preference level label, generating the user portrait of the target user based on the target classification attribute. Compared with the method for obtaining the user portrait based on the label statistics, the user characteristic vector can comprehensively represent the user preference, so that the accuracy of predicting the preference level label of the user on the target object under the specific classification attribute can be improved, the target user with the specific user portrait can be conveniently found, and the accuracy of recommending related services can be improved.
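For orientation only, the claimed flow (excluding feature extraction) can be sketched in Python roughly as follows; the model interface, label strings, and return format are hypothetical illustrations, not part of the claims.

```python
from typing import Dict, List, Optional

def generate_user_portrait(
    feature_vectors: List[List[float]],   # step 2 output: one vector per user feature dimension
    model,                                 # pre-trained model with a .predict() method (assumed interface)
    target_attribute: str,                 # the target classification attribute, e.g. a commodity category
    preset_label: str = "interested",      # the preset preference level label
) -> Optional[Dict[str, str]]:
    # Steps 3-4: input the feature vectors and read out the predicted
    # preference level label for the target object under target_attribute.
    predicted = model.predict(feature_vectors)
    # Step 5: generate a portrait entry only when the label matches.
    if predicted == preset_label:
        return {"portrait_tag": target_attribute, "preference_level": predicted}
    return None
```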
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
FIG. 2 illustrates a flow diagram of a method of user representation generation according to one embodiment of the present application.
FIG. 3 illustrates a flow diagram of a method of user representation generation in one embodiment in accordance with the present application.
FIG. 4 illustrates a structural schematic of a pre-trained machine learning model in one embodiment according to the present application.
Fig. 5 illustrates a detailed structural diagram of capsule layers in a pre-trained machine learning model in an embodiment in accordance with the present application.
FIG. 6 illustrates a flow diagram of a method of user representation generation in one embodiment in accordance with the present application.
FIG. 7 shows a detailed flowchart of step S610 of a user representation generation method according to an embodiment of the present application.
FIG. 8 shows a detailed flowchart of step S710 of a user representation generation method according to an embodiment of the present application.
FIG. 9 shows a detailed flowchart of step S740 of a user representation generation method according to an embodiment of the application.
FIG. 10 shows a detailed flowchart of step S910 of a user representation generation method according to an embodiment of the present application.
FIG. 11 shows a block diagram of a user representation generation apparatus, according to an embodiment of the present application.
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for use to implement the electronic device of the embodiments of the subject application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Artificial Intelligence (AI): a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning. For example, in the embodiment of the present application, a plurality of user feature vectors of a target user under user feature dimensions are determined through artificial intelligence technology; based on these user feature vectors, a preference level label of the target user for a target object under a target classification attribute is determined, and a user portrait of the target user is then generated according to that preference level label.
User portrait: a user portrait is a tagged user model abstracted from information such as the user's social attributes, living habits, and consumption behaviors. The core task in constructing a user portrait is to attach "tags" to the user, a tag being a highly refined feature identifier obtained by analyzing the user information.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specially studies how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formula learning.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in FIG. 1, the system architecture may include a user representation demander platform 101, a network 102, and a user representation provider platform 103. User representation requestor platform 101 and user representation provider platform 103 are connected via network 102, and data interaction is performed over network 102, which may include various types of connections, such as wired communication links, wireless communication links, and the like.
It should be understood that the number of user representation requestor platforms 101, network 102, and user representation provider platforms 103 in FIG. 1 are illustrative only. There may be any number of user representation requestor platforms 101, networks 102, and user representation provider platforms 103, as desired for the implementation. For example, user representation provider platform 103 may be a server cluster providing user representation generation services, user representation consumer platform 101 may be a server cluster or client that needs to obtain a user representation, and the client may be one or more of a cell phone, a tablet, a laptop, and a desktop, although not limited thereto. User representation provider platform 103 may be a platform that provides various business services for users, such as social applications or instant messaging applications, which may contain user attribute information and historical behavior data for a large number of users.
The user representation demander platform 101 provides identification information of a target user of a user representation needing to be generated, and the user representation provider platform 103 acquires user characteristic information of the target user based on the identification information of the user, wherein the user characteristic information comprises user attribute information and historical behavior data of the target user; respectively generating a plurality of user feature vectors of a target user under a user feature dimension based on the user feature information, wherein the user feature dimension is a feature dimension formed by the category of the user feature information, the category of an operation event contained in historical behavior data and the classification attribute of an operation object; inputting a plurality of user feature vectors into a pre-trained machine learning model, wherein the pre-trained machine learning model comprises a plurality of user feature vectors of sample users under user feature dimensions and preference level labels of the sample users for target objects under target classification attributes; acquiring preference level labels of target objects under the target classification attributes of target users output by a pre-trained machine learning model; and if the acquired preference level label is matched with a preset preference level label, generating a user portrait of the target user based on the target classification attribute.
Compared with the method for obtaining the user portrait based on tag statistics, the user feature vectors can comprehensively represent user preferences, so the accuracy of predicting the preference level label of a user for a target object under a specific classification attribute can be improved, target users with a specific user portrait can be conveniently found, and the accuracy of recommending related services can be improved.
It should be noted that the user representation generation method provided in the embodiment of the present application is generally executed by the user representation provider platform 103, and accordingly, the user representation generation apparatus is generally disposed in the user representation provider platform 103. However, in other embodiments of the present application, user representation requestor platform 101 may also have similar functionality to user representation provider platform 103, thereby implementing aspects of the user representation generation methods provided by embodiments of the present application. The details of implementation of the technical solutions of the embodiments of the present application are set forth in detail below.
FIG. 2 shows a flow diagram of a method of user representation generation, which may be performed by a user representation provider platform, which may be the user representation provider platform 103 shown in FIG. 1, according to one embodiment of the application. Referring to FIG. 2, the method for generating a user representation at least comprises steps S210 to S250, which are described in detail below.
In step S210, user characteristic information of the target user is obtained, where the user characteristic information includes user attribute information and historical behavior data of the target user.
In one embodiment of the present application, the user characteristic information refers to characteristic information of a user in multiple dimensions, and specifically includes user attribute information and historical behavior data, and the target user refers to a user who needs to generate a user representation.
In an embodiment of the present application, the user characteristic information of the target user may be obtained by the user portrait provider platform from the registration information of the target user, and the user attribute information of the target user may include attribute information of a plurality of categories, such as, but not limited to, age, date of birth, gender, address, and job position.
In one embodiment of the present application, the historical behavior data of the target user includes historical behavior data of the target user in the user representation provider platform. The behavior data comprises operation events and attribute information of operation objects, and the operation events can be clicking, browsing, collecting, commenting, forwarding and the like. The operation object may be a commodity advertisement or content, etc., and the classification attribute of the operation object may include a subject, a category, a tag, etc.
Alternatively, when the operation object is a commodity advertisement, the operation event may be a browsing event, and the classification attribute thereof may be a category; when the operation object is content, the operation event may be a click event, a browse event, a collection event, a comment event, or a forward event, and the classification attribute may be a topic, a category, or a tag.
In step S220, a plurality of user feature vectors of the target user in a user feature dimension are generated based on the user feature information, where the user feature dimension is a feature dimension formed by the category of the user attribute information, the operation event category included in the historical behavior data, and the classification attribute of the operation object.
In one embodiment of the present application, the user feature dimension refers to a feature dimension formed by a category of the user attribute information, and a category of the operation event and a classification attribute of the operation object included in the historical behavior data.
In an embodiment of the present application, for the feature dimension formed by the categories of the user attribute information, a user feature vector is generated for the user attribute information of each category, and if the user attribute information includes an age and a gender, the age and the gender form two feature dimensions, and thus a user feature vector is generated according to the age and the gender respectively.
In one embodiment of the present application, the feature dimension formed by the operation event category and the classification attribute of the operation object included in the historical behavior data may be implemented in various ways.
In one embodiment, the feature dimension may be determined from a classification attribute of the operation object. Taking the operation object as content as an example, when the classification attribute of the content includes a theme, a category and a label, the theme, the category and the label are respectively used as a feature dimension, and then user feature vectors of three feature dimensions are generated.
In another embodiment, the feature dimension may be determined according to the classification attribute of the operation object and the operation event category. Taking the example that the operation object is content, when the classification attribute of the content includes a topic, a category, and a tag, and the operation event category includes a click event and a comment event, the click event for the topic, the comment event for the topic, the click event for the category, the comment event for the category, the click event for the tag, and the comment event for the tag are respectively used as one feature dimension, and then a user feature vector of six feature dimensions is generated.
The embodiment of the feature dimension division is not limited to the above embodiment, and other embodiments may be used, and the present application is not limited to this.
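A small sketch of the second scheme above: each (operation event category, classification attribute) pair becomes one user feature dimension. The category names are illustrative.

```python
from itertools import product

# Cross operation event categories with classification attributes of the
# operation object; each pair becomes one user feature dimension.
event_categories = ["click", "comment"]
classification_attributes = ["topic", "category", "tag"]

feature_dimensions = [
    f"{event}_{attribute}"
    for event, attribute in product(event_categories, classification_attributes)
]
print(feature_dimensions)
# ['click_topic', 'click_category', 'click_tag',
#  'comment_topic', 'comment_category', 'comment_tag']  -> six dimensions
```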
In step S230, a plurality of user feature vectors are input into the pre-trained machine learning model, and the pre-trained machine learning model includes a plurality of user feature vectors of the sample user in the user feature dimension and a preference level label of the sample user for the target object under the target classification attribute.
In an embodiment of the present application, a plurality of user feature vectors of a target user in a user feature dimension may be input into a pre-trained machine learning model, and the pre-trained machine learning model outputs a preference level label of the target user for target objects under a target classification attribute. The pre-trained machine learning model may be a lookalike model or the like, or may be a deep neural network model or a CNN (Convolutional Neural Network) model. The preference level label reflects the degree of preference of the target user for the target object under the target classification attribute. The preference level label may include two grades, from high to low, of "interested" and "not interested"; alternatively, the preference level label may include four grades of "particularly interested", "interested", "not interested", and "particularly not interested". The higher the grade, the higher the preference degree of the target user for the target object under the target classification attribute.
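If the model emits a numeric preference score (as the capsule attention layer described later does), a hypothetical mapping to the four grades might look like the following; the threshold values are invented for illustration.

```python
def preference_level_label(score: float) -> str:
    # Hypothetical thresholds mapping a preference score in [0, 1]
    # to the four preference grades mentioned above.
    if score >= 0.75:
        return "particularly interested"
    if score >= 0.5:
        return "interested"
    if score >= 0.25:
        return "not interested"
    return "particularly not interested"
```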
Referring to fig. 3, fig. 3 shows a flowchart of a method for generating a user profile according to an embodiment of the present application, where the pre-trained machine learning model determines a preference level label of a target user for a target object under a target classification attribute based on the following method, which may specifically include steps S310 to S330.
Referring to fig. 4, fig. 4 shows a schematic structural diagram of a pre-trained machine learning model according to an embodiment of the present application, and the pre-trained machine learning model shown in fig. 4 may specifically include a Projection Layer (Projection Layer)402, a Capsule Layer (Capsule Layer)403, and a Capsule Attention Layer (Capsule Attention Layer) 405.
Referring to fig. 5, fig. 5 shows a detailed structural diagram of a capsule layer in a pre-trained machine learning model according to an embodiment of the present application, and the following describes steps S310 to S330 in detail with reference to fig. 3 to 5.
In step S310, a plurality of user feature vectors are respectively subjected to conversion processing, and first feature vectors corresponding to the plurality of user feature vectors are generated, where the first feature vectors corresponding to the plurality of user features are vectors of the same dimension.
In an embodiment of the present application, as shown in fig. 4, after a plurality of user feature vectors 401 are input into a pre-trained machine learning model, the plurality of user feature vectors are converted by a Projection Layer (Projection Layer)402 to generate first feature vectors corresponding to the plurality of user feature vectors, where it is noted that the first feature vectors corresponding to the plurality of user features are vectors of the same dimension.
Specifically, when the Projection Layer 402 performs conversion processing on the plurality of user feature vectors to generate the corresponding first feature vectors, the conversion can be implemented by the formula

$\hat{e}_i^u = W_d\, e_i^u$

where $e_i^u$ is the ith user feature vector, $\hat{e}_i^u$ is the first feature vector corresponding to the ith user feature vector, e and m are fixed parameters (read here as the output and input dimensions, so that $W_d \in \mathbb{R}^{e \times m}$), and $W_d$ is a conversion matrix for converting the plurality of user feature vectors. By converting the plurality of user feature vectors into vectors of the same dimension through the Projection Layer, the data processing efficiency of the pre-trained machine learning model can be improved.
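Under the reconstruction above, a minimal projection-layer sketch might look as follows; modeling each conversion matrix W_d as a bias-free nn.Linear, one per user feature dimension, is an assumption.

```python
import torch
import torch.nn as nn

class ProjectionLayer(nn.Module):
    """Maps each user feature vector (dimension m_i) to a shared dimension e."""
    def __init__(self, input_dims, shared_dim):
        super().__init__()
        # one conversion matrix W_d per user feature dimension
        self.projections = nn.ModuleList(
            nn.Linear(m, shared_dim, bias=False) for m in input_dims
        )

    def forward(self, feature_vectors):
        # feature_vectors: list of tensors, the i-th of shape (batch, input_dims[i])
        return [proj(v) for proj, v in zip(self.projections, feature_vectors)]
```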
In step S320, based on the preset association relations among the plurality of pieces of user feature information, the first feature vectors corresponding to the plurality of user feature vectors are aggregated to generate a plurality of aggregated second feature vectors.
In an embodiment of the present application, for the first feature vectors corresponding to the plurality of user feature vectors obtained through the Projection Layer, the Capsule Layer 403 performs aggregation processing on them according to the association relations among the plurality of pieces of user feature information, to generate a plurality of aggregated second feature vectors.
Specifically, for the ith first feature vector $\hat{e}_i^u$ input into the Capsule Layer 403, aggregation processing may be performed according to a predetermined formula

$z_j = \sum_{i \in I_u} w_{ij}\, S\, \hat{e}_i^u$

to generate the corresponding vector $z_j$, and $z_j$ is normalized by the formula

$v_j = \mathrm{Squash}(z_j) = \dfrac{\lVert z_j \rVert^2}{1 + \lVert z_j \rVert^2}\, \dfrac{z_j}{\lVert z_j \rVert}$

to generate the unit vector $v_j$. Here $I_u$ represents the sequence numbers of the first feature vectors, S is the transformation matrix, $w_{ij}$ is the weight used when the ith first feature vector is aggregated into the jth second feature vector, and $b_{ij}$ is a parameter initialized when the Capsule Layer 403 performs the forward propagation calculation; $w_{ij}$ can be determined through $w_{ij} \leftarrow \mathrm{softmax}(b_{ij})$, i.e., $b_{ij}$ is normalized to obtain $w_{ij}$. It should be noted that when training the machine learning model to be trained, the forward propagation calculation is performed to update the parameters in the machine learning model, and $b_{ij}$ is updated according to the formula

$b_{ij} \leftarrow b_{ij} + v_j \cdot (S\, \hat{e}_i^u)$

where $v_j \cdot (S\, \hat{e}_i^u)$ is the inner product of $v_j$ and the transformed first feature vector. When the machine learning model to be trained reaches the convergence condition, the weights $w_{ij}$ for aggregating the first feature vectors are determined along with it, and Squash is the transformation that normalizes $z_j$.
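Read as a MIND-style dynamic-routing step, the aggregation above can be sketched as follows; using one transformation matrix per output capsule, the initialization scale, and the three routing iterations are all assumptions beyond what the text states.

```python
import torch

def squash(z, eps=1e-8):
    # Squash normalization as reconstructed above.
    norm_sq = (z ** 2).sum(dim=-1, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * z / torch.sqrt(norm_sq + eps)

def dynamic_routing(e_hat, num_capsules, iterations=3):
    """e_hat: (n, d) first feature vectors; returns (num_capsules, d)
    aggregated second feature vectors."""
    n, d = e_hat.shape
    S = torch.randn(num_capsules, d, d) * 0.01      # transformation matrices S
    b = torch.zeros(num_capsules, n)                # routing logits b_ij
    u = torch.einsum("jde,ne->jnd", S, e_hat)       # S * e_hat_i, per capsule j
    for _ in range(iterations):
        w = torch.softmax(b, dim=0)                 # w_ij <- softmax(b_ij)
        z = torch.einsum("jn,jnd->jd", w, u)        # z_j = sum_i w_ij * S e_hat_i
        v = squash(z)                               # v_j = Squash(z_j)
        b = b + torch.einsum("jd,jnd->jn", v, u)    # b_ij += v_j . (S e_hat_i)
    return v
```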
In step S330, a preference level label of the target user for the target object under the target classification attribute is predicted based on the aggregated plurality of second feature vectors.
In an embodiment of the present application, the Capsule Attention Layer (Capsule Attention Layer)405 performs analysis processing on the aggregated multiple second feature vectors, predicts a preference score of the target user on the target object under the target classification attribute, and predicts a preference level label of the target user on the target object under the target classification attribute according to a corresponding relationship between the preference score and the preference level label. It will be appreciated that the higher the preference score, the higher the corresponding preference level label.
Specifically, when the Capsule Attention Layer 405 processes the input aggregated second feature vectors, the high-order feature interaction can be calculated according to the formula

$a'_{ij} = h^{T}\, \mathrm{ReLU}\big(W\,(v_i \odot v_j) + b\big)$

and $a'_{ij}$ is normalized by the formula

$a_{ij} = \dfrac{\exp(a'_{ij})}{\sum_{(i,j)} \exp(a'_{ij})}$

to obtain the normalized $a_{ij}$, based on which the target user's preference score for the target object under the target classification attribute is predicted. Here b is a fixed parameter, $W \in \mathbb{R}^{t \times k}$, $b \in \mathbb{R}^{t}$, $h \in \mathbb{R}^{t}$, t denotes the hidden layer dimension of the capsule attention layer 405, k denotes the vector dimension of the second feature vectors, $h^{T}$ denotes the transpose of h, $v_i$ and $v_j$ are respectively the ith and jth second feature vectors among the aggregated plurality of second feature vectors, and $v_i \odot v_j$ is the element-wise product of the ith and jth second feature vectors.
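A PyTorch sketch of this attention computation in the AFM-like form reconstructed above; restricting to unordered pairs (i < j) and returning an attention-weighted sum of the interactions are wiring assumptions that the text leaves open.

```python
import torch
import torch.nn as nn

class CapsuleAttention(nn.Module):
    """Pairwise attention over the aggregated second feature vectors.
    k: vector dimension of the second feature vectors; t: hidden dimension."""
    def __init__(self, k, t):
        super().__init__()
        self.W = nn.Linear(k, t)                  # W in R^{t x k}, bias b in R^t
        self.h = nn.Parameter(torch.randn(t))     # h in R^t

    def forward(self, v):
        # v: (num_capsules, k) aggregated second feature vectors
        n = v.size(0)
        i, j = torch.triu_indices(n, n, offset=1)
        inter = v[i] * v[j]                            # element-wise v_i ⊙ v_j
        a_prime = torch.relu(self.W(inter)) @ self.h   # a'_ij = h^T ReLU(W(v_i⊙v_j)+b)
        a = torch.softmax(a_prime, dim=0)              # normalized a_ij
        # an attention-weighted sum of the interactions can feed a preference score
        return (a.unsqueeze(-1) * inter).sum(dim=0)
```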
Referring to fig. 6, fig. 6 shows a flowchart of a user representation generation method in an embodiment according to the present application, and the user representation generation method in the embodiment may further include steps S610 to S620, which are described in detail below.
In step S610, training set sample data for training a machine learning model to be trained is obtained, where each sample data in the training set sample data includes a plurality of user feature vectors of a sample user in a user feature dimension and a preference level label of the sample user for a target object in a target classification attribute.
In one embodiment of the present application, the training set sample data is sample data that includes a large number of samples for training the machine learning model to be trained. When generating the training set sample data, an existing user of the user portrait provider platform can be used as a sample user. By acquiring the user attribute information and historical behavior data of the sample user, a plurality of user feature vectors of the sample user under the user feature dimensions are generated. In addition, for each sample user, the preference of the sample user for the target object under the target classification attribute is determined according to the specific user attribute information and historical behavior data, and a preference level label of the sample user for the target object under the target classification attribute is then generated according to the determined preference.
Referring to fig. 7, fig. 7 shows a detailed flowchart of step S610 of a user representation generation method according to an embodiment of the present application, where the step S610 may include steps S710 to S750, which are described in detail below.
In step S710, user feature information of the candidate users in the candidate user set is acquired.
In one embodiment, the set of candidate users is an existing set of users in the user representation provider platform that contains a large number of candidate users that can be selected as sample users.
Referring to fig. 8, fig. 8 is a detailed flowchart illustrating step S710 of a user representation generation method according to an embodiment of the present application, where the step S710 may include steps S810 to S830, which are described in detail below.
In step S810, a configuration file is acquired for configuring the attribute category of the acquired user attribute information, and the operation event category and the classification attribute of the operation object of the acquired historical behavior data.
In an embodiment of the present application, in order to facilitate generating a specific user portrait according to specific service scene requirements when obtaining user feature information of a candidate user, an attribute category of user attribute information to be obtained, an operation event category of historical behavior data, and a classification attribute of an operation object may be configured in advance to generate a corresponding configuration file.
Specifically, when the user portrait to be generated is directed to a commodity advertisement, in the configuration file, the attribute categories of the user attribute information to be acquired may include age, gender and address, the operation event categories of the historical behavior data may be a click event and a browse event, the classification attributes of the operation object may include the commodity category in the commodity advertisement, and it is noted that the commodity category may include one or more levels of classification categories. Taking the example that the commodity category comprises a primary commodity category and a secondary commodity category, the primary commodity category can comprise household appliances, food and drink, medical health care and the like, and the secondary commodity category is a specific category in the primary commodity category, for example, the secondary commodity category of the household appliances can comprise televisions, air conditioners, refrigerators and the like.
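As an illustration of such a configuration file for the commodity-advertisement scenario above, expressed here as a Python dict since the patent fixes no concrete file format; all key names are hypothetical.

```python
# Hypothetical configuration: attribute categories to collect, operation
# event categories of the historical behavior data, and classification
# attributes of the operation object (two levels of commodity category).
portrait_config = {
    "user_attribute_categories": ["age", "gender", "address"],
    "operation_event_categories": ["click", "browse"],
    "operation_object_classification": {
        "primary_commodity_categories": [
            "household appliances", "food and drink", "medical health care",
        ],
        "secondary_commodity_categories": {
            "household appliances": ["television", "air conditioner", "refrigerator"],
        },
    },
}
```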
In step S820, for each candidate user in the candidate user set, based on the category of the user attribute information configured in the configuration file, the user attribute information of the candidate user under the configured attribute category is acquired.
In an embodiment of the present application, after the configuration file is obtained, for each candidate user in the candidate user set, user attribute information of the candidate user in the configured attribute category may be respectively obtained based on the category of the configured user attribute information in the configuration file.
In step S830, for each candidate user in the candidate user set, based on the operation event category and the classification attribute of the operation object configured in the configuration file, historical behavior data of the candidate user under the operation event category and the classification attribute of the operation object is acquired.
In an embodiment of the application, after the configuration file is obtained, for each candidate user in the candidate user set, historical behavior data of the candidate user under the corresponding operation event category and the classification attribute of the operation object may be obtained based on the configured operation event category and the classification attribute of the operation object in the configuration file.
In the technical solution of the embodiment shown in fig. 8, the attribute type of the user attribute information to be acquired, the operation event type of the historical behavior data to be acquired, and the classification attribute of the operation object can be configured through the configuration file, so that the user feature data with a specific dimension can be acquired according to the requirement, the required feature data can be acquired in a targeted manner, and thus, sample data conforming to a specific service scene can be generated.
Still referring to fig. 7, in step S720, for each candidate user in the candidate user set, based on the historical behavior data in the user feature information of the candidate user, a first class of candidate users and a second class of candidate users are determined, where the first class of candidate users are candidate users including a target operation event, the second class of candidate users are candidate users not including the target operation event, and the target operation event is an operation event for operating a target object under the target classification attribute.
In an embodiment of the application, the pre-trained machine learning model is to predict a preference level label of a user for a target object under a specific target classification attribute, so sample users for training the machine learning model may include a positive sample user and a negative sample user, the positive sample user is the sample user with a higher preference degree for the target object under the specific target classification attribute, the negative sample user is the sample user with a lower preference degree for the target object under the specific target classification attribute, and the machine learning model is trained through sample data formed by the positive sample user and the negative sample user, so that the machine learning model may effectively identify whether the target user is the user with a higher preference degree for the target object under the specific target classification attribute.
In an embodiment of the present application, the target operation event refers to an operation event in which a user operates a target object under the target classification attribute, such as a click event or a browsing event for the target object under the target classification attribute. When a user's historical behavior data contains the target operation event, it indicates that the user's interest in the target object under the target classification attribute is high; conversely, it indicates that this interest is low. Therefore, for each candidate user in the candidate user set, the candidate users may be classified according to whether their historical behavior data includes the target operation event, so as to obtain a first class of candidate users whose data includes the target operation event and a second class of candidate users whose data does not.
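A minimal sketch of this split; representing each candidate's historical behavior data as a set of operation-event records is an assumption.

```python
from typing import Dict, List, Set, Tuple

def split_candidates(
    candidate_events: Dict[str, Set[str]],   # user id -> operation events in the historical behavior data
    target_event: str,                        # the target operation event, e.g. "click:refrigerator"
) -> Tuple[List[str], List[str]]:
    first_class, second_class = [], []
    for user_id, events in candidate_events.items():
        if target_event in events:
            first_class.append(user_id)    # candidates containing the target operation event
        else:
            second_class.append(user_id)   # candidates not containing it
    return first_class, second_class
```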
In step S730, based on the user feature information of the first class of candidate users, a plurality of user feature vectors of the first class of candidate users in the user feature dimension are generated, and preference level tags with high preference degrees are added to the first class of candidate users, so as to obtain sample data of the positive sample user.
In an embodiment of the application, when sample data of a positive sample user is generated, a plurality of user feature vectors of the first class candidate users in a user feature dimension may be generated based on user feature information of the first class candidate users, and a preference level label with a high preference degree is added to the first class candidate users.
Optionally, for the first category of candidate users, all candidate users in the first category of candidate users may be selected as positive sample users.
Optionally, for the first category of candidate users, part of the first category of candidate users may be selected as positive sample users.
In step S740, based on the user feature information of the second type of candidate users, a plurality of user feature vectors of the second type of candidate users in the user feature dimension are generated, and preference level tags with low preference degrees are added to the second type of candidate users, so as to obtain sample data of the negative sample user.
In an embodiment of the application, when sample data of a negative sample user is generated, a plurality of user feature vectors of a second type of candidate users in a user feature dimension may be generated based on user feature information of the second type of candidate users, and a preference level label with a low preference degree is added to the second type of candidate users.
Optionally, for the second class of candidate users, all candidate users in the second class of candidate users may be selected as negative sample users.
Referring to fig. 9, fig. 9 shows a detailed flowchart of step S740 of the user portrait generation method according to an embodiment of the present application, and the step S740 may include steps S910 to S920, which are described in detail as follows.
In step S910, among the second class of candidate users, candidate users are extracted.
Referring to fig. 10, fig. 10 shows a detailed flowchart of step S910 of the user representation generating method according to an embodiment of the present application, and the step S910 may include steps S1010 to S1020, which are described in detail as follows.
In step S1010, the portrait intensity of the second class of candidate users is determined based on the frequency of the historical behavior data contained in the user feature information of the second class of candidate users, where the portrait intensity and the frequency are in a positive correlation.
In an embodiment of the application, since the second class of candidate users are candidates whose historical behavior data does not contain the target operation event, they include both users with rich historical behaviors and users with sparse historical behaviors. When a candidate user's historical behaviors are rich and still contain no target operation event, the confidence that the candidate is a negative sample user is high; when a candidate user's historical behaviors are sparse, the absence of the target operation event may simply result from the sparsity, so some candidates could be wrongly selected as negative sample users. Therefore, candidates with relatively rich historical behaviors should preferentially be selected as negative sample users, while candidates with relatively sparse historical behaviors should be selected less often.
The portrait intensity is a measure of whether a user's historical behaviors are rich: the larger the portrait intensity, the richer the user's historical behaviors.
The portrait intensity can be determined from the frequency of the historical behavior data contained in the user feature information of the second class of candidate users, the portrait intensity being positively correlated with that frequency. The frequency of the historical behavior data may be calculated in a variety of ways.
Alternatively, the sum of the frequencies of each category of operation event contained in the historical behavior data may be used as the frequency of the historical behavior data.
Alternatively, different weights may be assigned to each category of operation event contained in the historical behavior data, and the weighted sum of their frequencies may be used as the frequency of the historical behavior data, as sketched below.
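For concreteness, the following minimal Python sketch shows both frequency computations described above; the operation-event categories and the weights are illustrative assumptions, not values prescribed by this application.

```python
# Minimal sketch of the two frequency computations above. The event
# categories ("click", "share", "purchase") and the weights are
# illustrative assumptions, not values prescribed by this application.

def behavior_frequency(event_counts, weights=None):
    """Frequency of the historical behavior data for one candidate user.

    Without weights: the plain sum of per-category operation-event
    frequencies. With weights: the weighted sum of those frequencies.
    """
    if weights is None:
        return sum(event_counts.values())
    return sum(weights.get(event, 1.0) * count
               for event, count in event_counts.items())

counts = {"click": 12, "share": 2, "purchase": 1}
print(behavior_frequency(counts))  # 15 (plain sum)
print(behavior_frequency(counts, {"click": 1.0, "share": 3.0, "purchase": 5.0}))  # 23.0
```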
In step S1020, candidate users are extracted at a first ratio from the second class of candidate users whose portrait intensity is higher than a predetermined portrait intensity threshold, and at a second ratio from the second class of candidate users whose portrait intensity is lower than or equal to the predetermined portrait intensity threshold, the first ratio being greater than the second ratio.
In one embodiment of the present application, the portrait intensity threshold is a metric for deciding whether a candidate user has rich historical behavior: if the portrait intensity is higher than the predetermined portrait intensity threshold, the candidate user is treated as a user with rich historical behavior; if it is lower than or equal to the threshold, the candidate user is treated as a user with sparse historical behavior. Among the second class of candidate users whose portrait intensity is higher than the threshold, candidate users may be extracted at a first ratio; among those whose portrait intensity is lower than or equal to the threshold, candidate users may be extracted at a second ratio; and the first ratio may be set to be greater than the second ratio.
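A minimal sketch of this stratified extraction follows; the threshold, the two ratios, and the fixed random seed are assumptions chosen for illustration only.

```python
import random

def extract_negative_candidates(candidates, intensity, threshold=10.0,
                                first_ratio=0.8, second_ratio=0.2, seed=42):
    """Extract candidate users from the second class at two ratios.

    candidates: list of user ids; intensity: dict mapping user id to
    portrait intensity. Users above the threshold (rich historical
    behavior) are sampled at the larger first ratio; the rest at the
    smaller second ratio.
    """
    rng = random.Random(seed)
    rich = [u for u in candidates if intensity[u] > threshold]
    sparse = [u for u in candidates if intensity[u] <= threshold]
    extracted = rng.sample(rich, int(len(rich) * first_ratio))
    extracted += rng.sample(sparse, int(len(sparse) * second_ratio))
    return extracted
```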
In the technical solution of the embodiment shown in fig. 10, among the second class of candidate users whose historical behavior data does not contain the target operation event, preferentially selecting candidate samples with relatively rich historical behaviors as negative sample users, and selecting fewer candidate samples with relatively sparse historical behaviors, improves the accuracy of the selected negative sample users and thus the classification confidence of the pre-trained machine learning model.
Referring again to fig. 9, in step S920, based on the user feature information of the extracted candidate users, a plurality of user feature vectors of the extracted candidate users in the user feature dimension are generated, and preference level labels with a low preference degree are added to the extracted candidate users, so as to obtain the sample data of the negative sample users.
In an embodiment of the application, when generating sample data corresponding to candidate users extracted from the second class of candidate users, multiple user feature vectors of the extracted candidate users in a user feature dimension may be generated based on user feature information of the extracted candidate users, and preference level tags with low preference degrees are added to the extracted candidate users, so as to generate sample data of a negative sample user.
Still referring to fig. 7, in step S750, based on the sample data of the positive sample user and the sample data of the negative sample user, training set sample data for training the machine learning model to be trained is obtained.
In an embodiment, after obtaining the sample data of the positive sample user and the sample data of the negative sample user, the obtained sample data of the positive sample user and the obtained sample data of the negative sample user may be used as sample data of a training set, so as to implement training of a machine learning model to be trained through the sample data of the training set.
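As a hedged illustration of step S750, the sketch below pairs each user's feature vectors with a numeric preference-level label (1 for high preference, 0 for low); the label encoding and the shuffle are assumptions, not details fixed by the application.

```python
import random

def build_training_set(positive_users, negative_users, feature_vectors, seed=0):
    """Assemble training set sample data from positive and negative users.

    feature_vectors: dict mapping user id to that user's list of feature
    vectors (one per user feature dimension). Labels: 1 = high preference,
    0 = low preference (an assumed encoding of the preference level tags).
    """
    samples = [(feature_vectors[u], 1) for u in positive_users]
    samples += [(feature_vectors[u], 0) for u in negative_users]
    random.Random(seed).shuffle(samples)  # interleave positives and negatives
    return samples
```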
Still referring to fig. 6, in step S620, the machine learning model to be trained is trained through the training set sample data, so as to obtain a pre-trained machine learning model.
In an embodiment of the application, the training set sample data is input into the machine learning model to be trained, and the model is trained with this data to obtain the pre-trained machine learning model. Training the machine learning model means adjusting the coefficients in the model's network layers so that, when the plurality of user feature vectors of an input target user in the user feature dimension are operated on by these coefficients, the model outputs the preference level label of the target user for the target object under the target classification attribute.
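The application does not fix a model family, so the following sketch uses a logistic-regression stand-in to show what adjusting the coefficients against the preference level labels can look like; the concatenation of the per-dimension vectors, the learning rate, and the epoch count are all assumptions.

```python
import numpy as np

def train(samples, dim, epochs=100, lr=0.1):
    """Adjust model coefficients (w, b) against the preference level labels.

    samples: list of (list-of-feature-vectors, label) pairs, where one
    user's vectors concatenate to length `dim` and label is 0 or 1.
    """
    w, b = np.zeros(dim), 0.0
    for _ in range(epochs):
        for vectors, label in samples:
            x = np.concatenate(vectors)
            p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted preference
            grad = p - label                         # cross-entropy gradient
            w -= lr * grad * x
            b -= lr * grad
    return w, b
```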
Still referring to fig. 2, in step S240, the preference level label of the target user for the target object under the target classification attribute, output by the pre-trained machine learning model, is obtained.
In an embodiment of the application, a preference level label of a target user for a target object under a target classification attribute output by a pre-trained machine learning model is obtained, so that a preference condition of the target user for the target object under the target classification attribute can be obtained.
In step S250, if the obtained preference level tag matches a preset preference level tag, a user representation of the target user is generated based on the target classification attribute.
In an embodiment of the present application, the preset preference level tag is a preference level tag representing a high degree of preference of the target user for the target object under the target classification attribute, such as the "interested" and "especially interested" preference level tags, or only the "especially interested" preference level tag. When the preference level label output by the pre-trained machine learning model matches the preset preference level label, it can be determined that the target user has a high preference for the target object under the target classification attribute, and the user portrait of the target user can therefore be generated according to the target classification attribute.
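A minimal sketch of this matching step follows; the label strings and the representation of a user portrait as a set of classification attributes are assumptions for illustration.

```python
# Assumed preset preference level tags; per the embodiment above, this could
# also be only {"especially interested"}.
PRESET_LABELS = {"interested", "especially interested"}

def update_user_portrait(portrait, predicted_label, target_attribute):
    """Add the target classification attribute to the portrait on a match.

    portrait: set of classification attributes already in the user portrait
    (an assumed representation, not one defined by the application).
    """
    if predicted_label in PRESET_LABELS:
        portrait.add(target_attribute)
    return portrait

print(update_user_portrait(set(), "especially interested", "sports video"))
# {'sports video'}
```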
In the technical solution of the embodiments of the present application, user characteristic information of the target user is obtained, the user characteristic information including user attribute information and historical behavior data of the target user; a plurality of user feature vectors of the target user in the user feature dimension are respectively generated based on the user feature information, the user feature dimension being a feature dimension formed by the category of the user attribute information, the category of the operation event contained in the historical behavior data, and the classification attribute of the operation object; the plurality of user feature vectors are input into a pre-trained machine learning model, the pre-trained machine learning model being trained with sample data comprising a plurality of user feature vectors of sample users in the user feature dimension and preference level labels of the sample users for the target object under the target classification attribute; the preference level label of the target user for the target object under the target classification attribute output by the pre-trained machine learning model is obtained; and if the obtained preference level label matches the preset preference level label, the user portrait of the target user is generated based on the target classification attribute. Compared with obtaining a user portrait by label statistics, the user feature vectors can represent user preferences more comprehensively, so the accuracy of predicting the preference level label of a user for a target object under a specific classification attribute is improved, target users with a specific user portrait can be found more easily, and the accuracy of recommending related services is improved.
Embodiments of the apparatus of the present application are described below, which may be used to implement the method for generating a user representation of the embodiments of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method described above in the present application.
FIG. 11 shows a block diagram of an apparatus for generating a user representation according to an embodiment of the present application.
Referring to FIG. 11, an apparatus 1100 according to an embodiment of the application includes: a first obtaining unit 1110, a first generating unit 1120, an input unit 1130, an output unit 1140, and a second generating unit 1150. The first obtaining unit 1110 is configured to obtain user characteristic information of a target user, where the user characteristic information includes user attribute information and historical behavior data of the target user; the first generating unit 1120 is configured to generate, based on the user feature information, a plurality of user feature vectors of the target user in the user feature dimension, where the user feature dimension is a feature dimension formed by the category of the user attribute information, the operation event category contained in the historical behavior data, and the classification attribute of the operation object; the input unit 1130 is configured to input the plurality of user feature vectors into a pre-trained machine learning model, where the pre-trained machine learning model is trained with sample data comprising a plurality of user feature vectors of sample users in the user feature dimension and preference level labels of the sample users for the target object under the target classification attribute; the output unit 1140 is configured to obtain the preference level label of the target user for the target object under the target classification attribute output by the pre-trained machine learning model; and the second generating unit 1150 is configured to generate a user portrait of the target user based on the target classification attribute if the obtained preference level label matches a preset preference level label.
In some embodiments of the present application, based on the foregoing solution, the apparatus for generating a user representation further includes: a third generating unit, configured to respectively perform conversion processing on the plurality of user feature vectors to generate first feature vectors corresponding to the plurality of user feature vectors, where the first feature vectors are vectors of the same dimension; an aggregation unit, configured to perform aggregation processing on the first feature vectors corresponding to the plurality of user feature vectors based on preset association relations among the plurality of pieces of user feature information, to generate a plurality of aggregated second feature vectors; and a prediction unit, configured to predict the preference level label of the target user for the target object under the target classification attribute based on the aggregated second feature vectors.
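To make the conversion-aggregation-prediction pipeline concrete, here is a hedged sketch: each user feature vector is projected to a shared dimension, vectors within each preset association group are mean-pooled, and a linear layer scores the result. The projection matrices, the grouping, mean pooling, and the sigmoid readout are all assumptions; the application leaves these operations unspecified.

```python
import numpy as np

def predict_preference(user_vectors, projections, groups, w, b):
    """Conversion, aggregation, and prediction over user feature vectors.

    user_vectors: one vector per user feature dimension (varying lengths);
    projections: one matrix per vector, mapping it to a shared dimension;
    groups: lists of vector indices, one list per preset association
    relation; w, b: coefficients of an assumed linear prediction layer.
    """
    first = [P @ v for P, v in zip(projections, user_vectors)]         # same-dimension first feature vectors
    second = [np.mean([first[i] for i in g], axis=0) for g in groups]  # aggregated second feature vectors
    score = w @ np.concatenate(second) + b
    return 1.0 / (1.0 + np.exp(-score))  # preference probability
```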
In some embodiments of the present application, based on the foregoing solution, the apparatus for generating a user representation further includes: the second acquisition unit is used for acquiring training set sample data used for training a machine learning model to be trained, wherein each piece of sample data in the training set sample data comprises a plurality of user feature vectors of a sample user under a user feature dimension and a preference level label of the sample user on a target object under a target classification attribute; and the training unit is used for training the machine learning model to be trained through the training set sample data to obtain a pre-trained machine learning model.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: acquire user characteristic information of candidate users in a candidate user set; for each candidate user in the candidate user set, determine a first class of candidate users and a second class of candidate users based on the historical behavior data in the user feature information of the candidate users, where the first class of candidate users are candidate users whose historical behavior data contains the target operation event, the second class of candidate users are candidate users whose historical behavior data does not contain the target operation event, and the target operation event refers to an operation event for operating the target object under the target classification attribute; generate a plurality of user feature vectors of the first class of candidate users in the user feature dimension based on the user feature information of the first class of candidate users, and add preference level labels with a high preference degree to the first class of candidate users, to obtain sample data of positive sample users; generate a plurality of user feature vectors of the second class of candidate users in the user feature dimension based on the user feature information of the second class of candidate users, and add preference level labels with a low preference degree to the second class of candidate users, to obtain sample data of negative sample users; and obtain training set sample data for training the machine learning model to be trained based on the sample data of the positive sample users and the sample data of the negative sample users.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: extract candidate users from the second class of candidate users; and generate a plurality of user feature vectors of the extracted candidate users in the user feature dimension based on the user feature information of the extracted candidate users, and add preference level labels with a low preference degree to the extracted candidate users, to obtain the sample data of the negative sample users.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: determine the portrait intensity of the second class of candidate users based on the frequency of the historical behavior data contained in the user feature information of the second class of candidate users, where the portrait intensity and the frequency are positively correlated; and extract candidate users at a first ratio from the second class of candidate users whose portrait intensity is higher than a predetermined portrait intensity threshold, and at a second ratio from the second class of candidate users whose portrait intensity is lower than or equal to the predetermined portrait intensity threshold, the first ratio being greater than the second ratio.
In some embodiments of the present application, based on the foregoing scheme, the second obtaining unit is configured to: acquire a configuration file, where the configuration file configures the attribute categories of the user attribute information to be acquired, the operation event categories of the historical behavior data to be acquired, and the classification attributes of the operation object; for each candidate user in the candidate user set, acquire the user attribute information of the candidate user under the configured attribute categories based on the categories of user attribute information configured in the configuration file; and for each candidate user in the candidate user set, acquire the historical behavior data of the candidate user under the configured operation event categories and classification attributes of the operation object.
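The sketch below shows one way such a configuration file could be represented and applied; the JSON keys, the example categories, and the raw user-record layout are hypothetical, not a format defined by the application.

```python
import json

# Hypothetical configuration file content; keys and values are assumptions.
CONFIG = json.loads("""
{
  "attribute_categories": ["age", "gender", "region"],
  "operation_event_categories": ["click", "share"],
  "object_classification_attributes": ["video category", "article topic"]
}
""")

def collect_user_feature_info(user_record, config):
    """Filter a raw user record down to the configured categories."""
    attributes = {k: user_record["attributes"].get(k)
                  for k in config["attribute_categories"]}
    history = [e for e in user_record["events"]
               if e["category"] in config["operation_event_categories"]
               and e["object_attribute"] in config["object_classification_attributes"]]
    return {"attributes": attributes, "history": history}
```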
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
It should be noted that the computer system 1200 of the electronic device shown in fig. 12 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 12, the computer system 1200 includes a Central Processing Unit (CPU)1201, which can perform various appropriate actions and processes, such as executing the method described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for system operation are also stored. The CPU 1201, ROM 1202, and RAM 1203 are connected to each other by a bus 1204. An Input/Output (I/O) interface 1205 is also connected to bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a Display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a Network interface card such as a LAN (Local Area Network) card, a modem, and the like. The communication section 1209 performs communication processing via a network such as the internet. A driver 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as necessary, so that a computer program read out therefrom is mounted into the storage section 1208 as necessary.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication portion 1209 and/or installed from the removable medium 1211. When executed by the Central Processing Unit (CPU) 1201, the computer program performs the various functions defined in the system of the present application.
It should be noted that the computer readable media shown in the embodiments of the present application may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for generating a user representation, comprising:
acquiring user characteristic information of a target user, wherein the user characteristic information comprises user attribute information and historical behavior data of the target user;
respectively generating a plurality of user feature vectors of the target user under a user feature dimension based on the user feature information, wherein the user feature dimension is a feature dimension formed by the category of the user attribute information, the operation event category contained in the historical behavior data and the classification attribute of the operation object;
inputting the plurality of user feature vectors into a pre-trained machine learning model, wherein the pre-trained machine learning model is trained with sample data comprising a plurality of user feature vectors of sample users under the user feature dimension and preference level labels of the sample users for target objects under a target classification attribute;
acquiring a preference level label of the target user for the target object under the target classification attribute output by the pre-trained machine learning model;
and if the acquired preference level label is matched with a preset preference level label, generating a user portrait of the target user based on the target classification attribute.
2. The method of generating a user representation as claimed in claim 1 wherein the pre-trained machine learning model determines a preference level label of a target user for a target object under a target classification attribute based on:
respectively carrying out conversion processing on the plurality of user characteristic vectors to generate first characteristic vectors corresponding to the plurality of user characteristic vectors, wherein the first characteristic vectors corresponding to the plurality of user characteristics are vectors with the same dimensionality;
based on preset association relations among the plurality of pieces of user feature information, performing aggregation processing on the first feature vectors corresponding to the plurality of user feature vectors to generate a plurality of aggregated second feature vectors;
and predicting preference level labels of the target users on the target objects under the target classification attributes based on the aggregated second feature vectors.
3. A user representation generation method as claimed in claim 1, further comprising:
acquiring training set sample data for training a machine learning model to be trained, wherein each piece of sample data in the training set sample data comprises a plurality of user feature vectors of a sample user under a user feature dimension and a preference level label of the sample user for a target object under a target classification attribute;
and training the machine learning model to be trained through the training set sample data to obtain the pre-trained machine learning model.
4. The method of generating a user representation of claim 3, wherein said obtaining training set sample data for training a machine learning model to be trained comprises:
acquiring user characteristic information of candidate users in a candidate user set;
for each candidate user in the candidate user set, determining a first class of candidate users and a second class of candidate users based on historical behavior data in user feature information of the candidate users, wherein the first class of candidate users are candidate users including target operation events, the second class of candidate users are candidate users not including the target operation events, and the target operation events refer to operation events for operating target objects under the target classification attributes;
generating a plurality of user characteristic vectors of the first type candidate users under the user characteristic dimension based on the user characteristic information of the first type candidate users, and adding preference grade labels with high preference degrees to the first type candidate users to obtain sample data of a positive sample user;
generating a plurality of user feature vectors of the second type candidate users under the user feature dimension based on the user feature information of the second type candidate users, and adding preference level labels with low preference degrees to the second type candidate users to obtain sample data of negative sample users;
and obtaining training set sample data used for training a machine learning model to be trained based on the sample data of the positive sample user and the sample data of the negative sample user.
5. The method of claim 4, wherein the generating a plurality of user feature vectors of the second type candidate users in the user feature dimension based on the user feature information of the second type candidate users and adding preference level labels with low preference degrees to the second type candidate users to obtain sample data of negative sample users comprises:
extracting candidate users from the second type of candidate users;
and generating a plurality of user feature vectors of the extracted candidate users under the user feature dimension based on the extracted user feature information of the candidate users, and adding preference grade labels with low preference degrees to the extracted candidate users to obtain sample data of the negative sample users.
6. The method of claim 5, wherein the extracting candidate users from the second category of candidate users comprises:
determining portrait intensity of the second type candidate users based on frequency of historical behavior data contained in user characteristic information of the second type candidate users, wherein the portrait intensity and the frequency are in positive correlation;
candidate users are extracted at a first ratio among the second category of candidate users having a portrait intensity higher than a predetermined portrait intensity threshold, and candidate users are extracted at a second ratio among the second category of candidate users having a portrait intensity lower than or equal to the predetermined portrait intensity threshold, the first ratio being greater than the second ratio.
7. The method of claim 4, wherein the obtaining user feature information of the candidate users in the candidate user set comprises:
acquiring a configuration file, wherein the configuration file is used for configuring the attribute category of the acquired user attribute information, the operation event category of the acquired historical behavior data and the classification attribute of an operation object;
for each candidate user in the candidate user set, acquiring user attribute information of the candidate user under the configured attribute category based on the category of the user attribute information configured in the configuration file;
and for each candidate user in the candidate user set, acquiring historical behavior data of the candidate user under the operation event category and the classification attribute of the operation object based on the operation event category and the classification attribute of the operation object configured in the configuration file.
8. An apparatus for generating a user representation, comprising:
the device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring user characteristic information of a target user, and the user characteristic information comprises user attribute information and historical behavior data of the target user;
a first generating unit, configured to generate, based on the user feature information, a plurality of user feature vectors of the target user in a user feature dimension, where the user feature dimension is a feature dimension formed by a category of the user attribute information, and an operation event category and a classification attribute of an operation object included in the historical behavior data;
the input unit is used for inputting the plurality of user feature vectors into a pre-trained machine learning model, and the pre-trained machine learning model is trained with sample data comprising a plurality of user feature vectors of sample users under user feature dimensions and preference level labels of the sample users for target objects under target classification attributes;
the output unit is used for acquiring a preference level label of the target user on the target object under the target classification attribute output by the pre-trained machine learning model;
and the second generation unit is used for generating the user portrait of the target user based on the target classification attribute if the acquired preference level label is matched with a preset preference level label.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out a method of user representation generation as claimed in any one of claims 1 to 7.
10. A computer program medium having computer readable instructions stored thereon which, when executed by a processor, implement a method of user representation generation as claimed in any of claims 1 to 7.
CN202110184908.2A 2021-02-10 2021-02-10 User portrait generation method, device, electronic equipment and computer program medium Pending CN114912009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110184908.2A CN114912009A (en) 2021-02-10 2021-02-10 User portrait generation method, device, electronic equipment and computer program medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110184908.2A CN114912009A (en) 2021-02-10 2021-02-10 User portrait generation method, device, electronic equipment and computer program medium

Publications (1)

Publication Number Publication Date
CN114912009A true CN114912009A (en) 2022-08-16

Family

ID=82761573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110184908.2A Pending CN114912009A (en) 2021-02-10 2021-02-10 User portrait generation method, device, electronic equipment and computer program medium

Country Status (1)

Country Link
CN (1) CN114912009A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117522450A (en) * 2023-11-23 2024-02-06 赵瑞涛 User portrayal method, apparatus, device and storage medium

Similar Documents

Publication Publication Date Title
WO2022041979A1 (en) Information recommendation model training method and related device
Lian et al. Scalable content-aware collaborative filtering for location recommendation
CN109299994B (en) Recommendation method, device, equipment and readable storage medium
CN112163165A (en) Information recommendation method, device, equipment and computer readable storage medium
CN115917535A (en) Recommendation model training method, recommendation device and computer readable medium
CN112765480B (en) Information pushing method and device and computer readable storage medium
US12020267B2 (en) Method, apparatus, storage medium, and device for generating user profile
Wang et al. A novel multi-label classification algorithm based on K-nearest neighbor and random walk
CN115631008B (en) Commodity recommendation method, device, equipment and medium
Basha et al. A roadmap towards implementing parallel aspect level sentiment analysis
CN110909222A (en) User portrait establishing method, device, medium and electronic equipment based on clustering
CN111429161B (en) Feature extraction method, feature extraction device, storage medium and electronic equipment
Afoudi et al. An enhanced recommender system based on heterogeneous graph link prediction
CN114912009A (en) User portrait generation method, device, electronic equipment and computer program medium
Xu et al. Research on context-aware group recommendation based on deep learning
CN116910357A (en) Data processing method and related device
Li et al. Empowering multi-class medical data classification by Group-of-Single-Class-predictors and transfer optimization: Cases of structured dataset by machine learning and radiological images by deep learning
CN116992124A (en) Label ordering method, device, equipment, medium and program product
Xin et al. When factorization meets heterogeneous latent topics: an interpretable cross-site recommendation framework
CN114298118B (en) Data processing method based on deep learning, related equipment and storage medium
Liu Restricted Boltzmann machine collaborative filtering recommendation algorithm based on project tag improvement
Wu et al. A Review of User Profiling Based on Social Networks
CN112101015A (en) Method and device for identifying multi-label object
CN111444338A (en) Text processing device, storage medium and equipment
CN111046300A (en) Method and device for determining crowd attributes of users

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination