CN113780333A - User group classification method and device - Google Patents

User group classification method and device

Info

Publication number
CN113780333A
Authority
CN
China
Prior art keywords
particle
classification
user
determining
groups
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110694486.3A
Other languages
Chinese (zh)
Inventor
王守艺
刘斌
孙维维
令狐长清
陶帅帅
尹琦
赵荣超
郭媛媛
张宏峥
刘侨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Tuoxian Technology Co Ltd
Original Assignee
Beijing Jingdong Tuoxian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Tuoxian Technology Co Ltd
Priority to CN202110694486.3A
Publication of CN113780333A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a user group classification method and device, and relates to the technical field of the internet. One embodiment of the method comprises: acquiring historical behavior data corresponding to each of one or more users; determining, according to the historical behavior data, a plurality of behavior preference attributes corresponding to each of the one or more users; dividing the plurality of behavior preference attributes corresponding to the one or more users into a plurality of groups to be classified by using a classification model, wherein the classification model runs in parallel on multiple instances in a distributed cluster; determining, in parallel on the multiple instances and by using the classification model, pre-classification results corresponding to each of the plurality of groups to be classified; and determining, by using the classification model and according to those pre-classification results, the user group corresponding to each of the one or more users. The embodiment reduces the hardware resources consumed when users are divided into user groups.

Description

User group classification method and device
Technical Field
The invention relates to the technical field of internet, in particular to a user group classification method and device.
Background
With the development of internet technology and electronic commerce, more and more users choose to browse and purchase items on e-commerce platforms. To recommend items to users more accurately, an e-commerce platform needs to classify its users and assign them accurately to user groups. However, because the volume of user data is very large, dividing users into groups consumes excessive hardware resources, which is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, embodiments of the present invention provide a user group classification method and apparatus, which can utilize a classification model running in parallel on multiple instances in a distributed cluster, first divide multiple behavior preference attributes corresponding to one or more users into multiple groups to be classified, then determine pre-classification results corresponding to the multiple groups to be classified in parallel on the multiple instances, and further determine a user group corresponding to one or more users according to the pre-classification results, thereby reducing consumption of hardware resources when the user group is divided, and saving hardware resources.
To achieve the above object, according to an aspect of an embodiment of the present invention, a user group classification method is provided.
The user group classification method of the embodiment of the invention comprises the following steps:
acquiring historical behavior data corresponding to one or more users respectively;
determining a plurality of behavior preference attributes corresponding to one or more users respectively according to historical behavior data;
dividing a plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified by utilizing a classification model; wherein the classification model runs in parallel on multiple instances in the distributed cluster;
determining, in parallel on the multiple instances and by using the classification model, pre-classification results corresponding to each of the plurality of groups to be classified;
and determining a user group corresponding to one or more users respectively by using a classification model according to the pre-classification result corresponding to each of the plurality of groups to be classified.
Optionally,
the classification model is trained based on a particle swarm algorithm.
Optionally,
dividing the plurality of behavior preference attributes corresponding to the one or more users into a plurality of groups to be classified by using the classification model comprises:
taking each behavior preference attribute of each user as a particle to obtain a particle swarm corresponding to the one or more users, wherein the particle swarm comprises a plurality of particles;
taking the particle swarm as the input of the classification model, dividing the particle swarm into a plurality of sub-particle groups by the classification model, and taking the sub-particle groups as the groups to be classified.
Optionally,
the difference between the numbers of particles in the respective sub-particle groups is not greater than a preset number threshold.
Optionally,
determining, in parallel on the multiple instances and by using the classification model, the pre-classification results corresponding to each of the plurality of groups to be classified comprises:
assigning, by using the classification model, the plurality of sub-particle groups to the multiple instances so that each instance corresponds to one sub-particle group;
executing, by using the classification model, the following steps on each instance in parallel to determine the pre-classification results corresponding to each of the plurality of groups to be classified:
calculating the optimal solution of each particle in the sub-particle group corresponding to the instance;
calculating, according to the optimal solution of each particle, the optimal solution of the sub-particle group corresponding to the instance;
and taking the optimal solution of the sub-particle group as the pre-classification result of the group to be classified corresponding to the sub-particle group.
Optionally,
determining, by using the classification model and according to the pre-classification results corresponding to each of the plurality of groups to be classified, the user group corresponding to each of the one or more users comprises:
determining, by using the classification model and according to the pre-classification results, a target classification result of the particle swarm to which the sub-particle groups belong;
and judging, according to the target classification result, whether the particle swarm meets a convergence condition, and if so, determining the user group corresponding to each of the one or more users according to the target classification result.
Optionally,
the target classification result indicates a plurality of particle aggregation clusters;
determining the user group corresponding to each of the one or more users according to the target classification result comprises:
determining a central aggregation point of each particle aggregation cluster;
determining the distance between the central aggregation point and the edge of the particle aggregation cluster;
determining, within the particle aggregation cluster, a region corresponding to a user group according to a preset proportion and the distance between the central aggregation point and the edge of the particle aggregation cluster;
and determining whether a user belongs to the user group corresponding to the region according to whether the particle corresponding to the user's behavior preference attribute is located in the region.
Optionally,
when the classification model judges that the particle swarm does not meet the convergence condition, the particle swarm is divided into a plurality of sub-particle groups again.
Optionally,
determining, according to the historical behavior data, the plurality of behavior preference attributes corresponding to each of the one or more users comprises:
for each user, performing:
inputting the historical behavior data corresponding to the user into a user portrait model, wherein the user portrait model comprises a plurality of preset labels, and obtaining the user's score for each preset label;
and taking each preset label together with the score corresponding to that preset label as a behavior preference attribute of the user.
Optionally,
the method further comprises the following step:
recommending items to the user according to an item recommendation strategy preset for the user group.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a user group classification apparatus.
The user group classification device comprises a data acquisition module, an attribute determination module, a group division module, a pre-classification module and a user group determination module; wherein:
the data acquisition module is used for acquiring historical behavior data corresponding to one or more users respectively;
the attribute determining module is used for determining a plurality of behavior preference attributes corresponding to one or more users respectively according to historical behavior data;
the group dividing module is used for dividing a plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified by utilizing the classification model; wherein the classification model runs in parallel on multiple instances in the distributed cluster;
the pre-classification module is used for determining, in parallel on the multiple instances and by using the classification model, pre-classification results corresponding to each of the plurality of groups to be classified;
and the user group determining module is used for determining, by using the classification model and according to the pre-classification results corresponding to each of the plurality of groups to be classified, the user group corresponding to each of the one or more users.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a user group classification electronic device.
An electronic device for classifying a user group according to an embodiment of the present invention includes: one or more processors; the storage device is used for storing one or more programs, and when the one or more programs are executed by one or more processors, the one or more processors implement the user group classification method of the embodiment of the invention.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of an embodiment of the present invention stores thereon a computer program, which, when executed by a processor, implements a user group classification method of an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: a classification model that runs in parallel on multiple instances in a distributed cluster first divides the plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified, then determines the pre-classification results corresponding to those groups in parallel on the multiple instances, and finally determines the user group corresponding to each user according to the pre-classification results, so that fewer hardware resources are consumed when users are divided into user groups.
Further effects of the above non-conventional optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a user group classification method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a process for determining a plurality of behavior preference attributes corresponding to one or more users, respectively, according to historical behavior data;
FIG. 3 is a schematic diagram of a process of dividing a plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified by using a classification model;
FIG. 4 is a schematic diagram of a process for determining pre-classification results corresponding to a plurality of groups to be classified in parallel on a plurality of instances by using a classification model;
FIG. 5 is a schematic diagram of a specific implementation process of steps S403-S404;
FIG. 6 is a schematic diagram of a specific implementation process of step S105;
FIG. 7 is a schematic diagram of a specific implementation process of step S603;
FIG. 8 is a schematic diagram of the main steps of another user population classification method;
FIG. 9 is a schematic diagram of the main modules of a user group classification apparatus according to an embodiment of the present invention;
FIG. 10 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 11 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic diagram of the main steps of a user group classification method according to an embodiment of the present invention.
As shown in fig. 1, a user group classification method according to an embodiment of the present invention mainly includes the following steps:
Step S101: acquiring historical behavior data corresponding to each of one or more users.
In embodiments of the present invention, historical behavior data includes, but is not limited to, any one or more of: categories of items purchased by a user (e.g., daily necessities, athletic fitness equipment, clothing, cosmetics, snacks, etc.), number of times the user purchased the same item, purchase price preferences of the user, specific information about items purchased by the user, location of the user (precision may go to province and/or city), and age of the user.
In the embodiment of the invention, after the historical behavior data corresponding to the one or more users is obtained, the data may be denoised. Specifically, data that cannot identify user characteristics may be removed from the historical behavior data. For example, if a user has purchased items of a certain category only once, the data corresponding to that purchase (including the category of the item, the number of times the user purchased it, specific information about the item, and the like) is not sufficient to identify the user's characteristics and may be removed. This reduces the volume of historical behavior data, which in turn reduces the amount of data subsequently input into the classification model and the hardware resources consumed when the classification model runs.
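The denoising rule can be illustrated with a minimal Python sketch. It assumes one reading of the rule, namely that a record is noise when the user bought from that category exactly once; the record layout and the field names `user`, `category`, and `item` are hypothetical and do not come from the patent.

```python
from collections import Counter

def denoise(records):
    """Drop purchase records whose (user, category) pair occurs only once.

    Assumption: a single purchase in a category says too little about the
    user, so every record of such a pair is treated as noise.
    """
    counts = Counter((r["user"], r["category"]) for r in records)
    return [r for r in records if counts[(r["user"], r["category"])] > 1]

# Hypothetical purchase history for two users.
records = [
    {"user": "u1", "category": "snacks", "item": "chips"},
    {"user": "u1", "category": "snacks", "item": "cookies"},
    {"user": "u1", "category": "cosmetics", "item": "lipstick"},
    {"user": "u2", "category": "clothing", "item": "jacket"},
]
cleaned = denoise(records)
```

Only the two `snacks` records of `u1` remain, because every other (user, category) pair appears once.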
Step S102: determining, according to the historical behavior data, a plurality of behavior preference attributes corresponding to each of the one or more users.
In the embodiment of the present invention, a process of determining a plurality of behavior preference attributes corresponding to one or more users respectively according to historical behavior data is shown in fig. 2, and specifically, the following steps may be performed for each user:
Step S201: inputting the historical behavior data corresponding to the user into a user portrait model, wherein the user portrait model comprises a plurality of preset labels, and obtaining the user's score for each preset label;
Step S202: taking each preset label together with the score corresponding to that preset label as a behavior preference attribute of the user.
In embodiments of the present invention, the preset labels include, but are not limited to, any one or more of the following:
[Table of preset labels: provided as images (Figure BDA0003127538460000071 and Figure BDA0003127538460000081) in the original publication]
In the embodiment of the invention, the user portrait model determines the user's score for each preset label according to the preset label and the user's historical behavior data. The score represents how well the preset label matches the user's characteristics, and different scores represent different degrees of match. In a preferred embodiment of the present invention, a score of 1 for a preset label indicates that the user is a heavy enthusiast of the preset label; a score of 2 indicates a moderate enthusiast; a score of 3 indicates a mild enthusiast; and a score of -1 indicates that the user is not an enthusiast of the preset label.
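As an illustration only, the sketch below shows how (label, score) pairs of this kind might be produced. The patent does not disclose how the user portrait model computes scores, so the rule used here (purchase counts compared with the thresholds `heavy` and `moderate`) is purely an assumption standing in for that model; only the score values 1, 2, 3, and -1 come from the text.

```python
def score_for_label(purchase_count, heavy=10, moderate=5):
    """Map a purchase count to the score scheme of the preferred embodiment.

    The thresholds `heavy` and `moderate` are hypothetical; the real user
    portrait model is not described in the source.
    """
    if purchase_count >= heavy:
        return 1   # heavy enthusiast
    if purchase_count >= moderate:
        return 2   # moderate enthusiast
    if purchase_count >= 1:
        return 3   # mild enthusiast
    return -1      # not an enthusiast

def preference_attributes(label_counts):
    """Pair every preset label with its score (steps S201 and S202)."""
    return [(label, score_for_label(count)) for label, count in label_counts.items()]

# Hypothetical per-label purchase counts for one user.
attrs = preference_attributes({"snacks": 12, "cosmetics": 1, "fitness": 0})
```

Each resulting pair is one behavior preference attribute, which later becomes one particle.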
Step S103: dividing a plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified by utilizing a classification model; wherein the classification model runs in parallel on multiple instances in the distributed cluster.
In the embodiment of the invention, the classification model is trained based on a particle swarm algorithm. In a preferred embodiment of the present invention, the classification model is trained based on a binary particle swarm algorithm.
In the embodiment of the invention, the classification model runs in parallel on a plurality of instances in a distributed cluster based on the MapReduce distributed computing framework, and accordingly uses the MapReduce computing mode during parallel operation. Specifically, the MapReduce framework abstracts a complex parallel computation running on a large-scale cluster into a Map function and a Reduce function. After the classification model divides the plurality of behavior preference attributes into a plurality of groups to be classified, each group to be classified is processed as one Map task, and the Map tasks run in parallel; after every Map task has finished, the execution results are collected into a Reduce task for processing. Combining the particle swarm algorithm with the MapReduce distributed computing mode addresses the excessive consumption of hardware resources in parallel computation over large-scale data.
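The Map and Reduce pattern described here can be sketched in Python. This is not a MapReduce cluster: worker threads on one machine stand in for cluster instances, and `map_task` and `reduce_task` are hypothetical placeholders (each group is a list of candidate costs, and the best one is the lowest).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_task(group):
    """Hypothetical Map task: the best (lowest) cost found in one group."""
    return min(group)

def reduce_task(a, b):
    """Hypothetical Reduce step: keep the better of two partial results."""
    return min(a, b)

# Three groups to be classified, each handled by its own Map task.
groups = [[5.0, 3.2, 7.1], [4.4, 2.9], [6.0, 3.0, 8.5]]

# Threads stand in for the instances of the distributed cluster.
with ThreadPoolExecutor(max_workers=len(groups)) as pool:
    pre_results = list(pool.map(map_task, groups))

# The Reduce task starts only after every Map task has finished.
target = reduce(reduce_task, pre_results)
```

`pool.map` returns results in input order, so `pre_results[i]` always belongs to `groups[i]` regardless of which task finishes first.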
In the embodiment of the present invention, a process of dividing a plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified by using a classification model is shown in fig. 3, and mainly includes the following steps:
Step S301: taking each behavior preference attribute of each user as a particle to obtain a particle swarm corresponding to the one or more users, wherein the particle swarm comprises a plurality of particles;
Step S302: taking the particle swarm as the input of the classification model, dividing the particle swarm into a plurality of sub-particle groups by the classification model, and taking the sub-particle groups as the groups to be classified.
In the embodiment of the present invention, the number of sub-particle groups may be configured so that the classification model divides one particle swarm into the specified number of sub-particle groups. The configured number may be determined according to the hardware resources of the distributed cluster, and is an integer greater than 1.
In the embodiment of the present invention, the classification model may divide the particle swarm into the plurality of sub-particle groups at random, and the difference between the numbers of particles in the respective sub-particle groups is not greater than a preset number threshold. In a preferred embodiment of the present invention, the preset number threshold is 0, that is, every sub-particle group contains the same number of particles.
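A random, balanced split of this kind can be sketched as follows. The function name `split_swarm` and the `seed` parameter are illustrative assumptions; the patent does not say how the random division is carried out.

```python
import random

def split_swarm(particles, n_groups, seed=None):
    """Randomly split particles into n_groups sub-particle groups.

    Group sizes differ by at most 1, and are exactly equal (threshold 0)
    when the number of particles is divisible by n_groups.
    """
    if n_groups < 2:
        raise ValueError("the number of sub-particle groups must be greater than 1")
    rng = random.Random(seed)
    shuffled = list(particles)
    rng.shuffle(shuffled)
    # Deal the shuffled particles round-robin so the groups stay balanced.
    return [shuffled[i::n_groups] for i in range(n_groups)]

sub_groups = split_swarm(range(12), 3, seed=0)
sizes = [len(g) for g in sub_groups]
```

With 12 particles and 3 groups, every sub-particle group holds 4 particles, which matches the preferred threshold of 0.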
Step S104: determining, in parallel on the multiple instances and by using the classification model, the pre-classification results corresponding to each of the plurality of groups to be classified.
In the embodiment of the present invention, a process of determining, in parallel, pre-classification results corresponding to a plurality of groups to be classified on a plurality of instances by using a classification model is shown in fig. 4, and mainly includes the following steps:
Step S401: assigning, by using the classification model, the plurality of sub-particle groups to the multiple instances so that each instance corresponds to one sub-particle group;
Step S402: executing steps S403-S405 on each instance in parallel by using the classification model to determine the pre-classification results corresponding to each of the plurality of groups to be classified;
Step S403: calculating the optimal solution of each particle in the sub-particle group corresponding to the instance;
Step S404: calculating, according to the optimal solution of each particle, the optimal solution of the sub-particle group corresponding to the instance;
Step S405: taking the optimal solution of the sub-particle group as the pre-classification result of the group to be classified corresponding to the sub-particle group.
In the embodiment of the present invention, a specific implementation process of steps S403-S404 is shown in fig. 5, and mainly includes the following steps:
Step S501: setting the maximum speed and the search space of the sub-particle group, and the initial speed and initial position of each particle in the sub-particle group, to initialize the sub-particle group;
Step S502: calculating the objective function of each particle in the sub-particle group to obtain the optimal solution of each particle;
Step S503: calculating the current optimal solution of the sub-particle group according to the optimal solution of each particle;
Step S504: updating the speed and position of each particle in the sub-particle group;
Step S505: judging whether the sub-particle group meets the convergence condition; if not, returning to step S502, and if so, executing step S506;
Step S506: taking the current optimal solution of the sub-particle group as the optimal solution of the sub-particle group.
In the embodiment of the present invention, the convergence condition in step S505 may be that the deviation between the current optimal solutions of the sub-particle group in two consecutive iterations is smaller than a first preset threshold. Specifically, when that deviation is not smaller than the first preset threshold, the sub-particle group is judged not to satisfy the convergence condition; when the deviation is smaller than the first preset threshold, the sub-particle group is judged to satisfy the convergence condition.
In this embodiment of the present invention, in step S504, the speed and the position of each particle in the sub-particle group may be updated through an update formula. The formula appears only as an image in the original publication; reconstructed from the symbol definitions that accompany it, it corresponds to the standard particle swarm update:
$$v_i^{k+1} = w\,v_i^{k} + c_1 r_1\left(p_i^{k} - x_i^{k}\right) + c_2 r_2\left(g^{k} - x_i^{k}\right), \qquad x_i^{k+1} = x_i^{k} + v_i^{k+1}$$
where $v_i^{k}$ characterizes the speed of the ith particle at the kth iteration; $v_i^{k+1}$ characterizes the speed of the ith particle at the (k+1)th iteration; $x_i^{k}$ characterizes the position of the ith particle at the kth iteration; $w$ represents the inertia factor; $c_1$ and $c_2$ are called acceleration constants, generally taken as $c_1 = c_2 \in [0, 4]$; $r_1$ and $r_2$ are random numbers in the interval $[0, 1]$; $p_i^{k}$ characterizes the best point in its own history found by the ith particle up to the kth iteration; and $g^{k}$ characterizes the best point in the history of the whole sub-particle group found up to the kth iteration.
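Steps S501 to S506 can be sketched in Python under stated assumptions. The sketch uses the standard continuous particle swarm update (velocity = inertia term + pull toward the particle's own best + pull toward the group best), not the binary variant the preferred embodiment mentions. The objective `sphere`, the parameter defaults, and the name `pso_subgroup` are hypothetical.

```python
import random

def sphere(position):
    """Hypothetical objective: squared distance from the origin (lower is better)."""
    return sum(c * c for c in position)

def pso_subgroup(objective, dim, n_particles=20, w=0.7, c1=1.5, c2=1.5,
                 v_max=1.0, bounds=(-5.0, 5.0), threshold=1e-8,
                 max_iter=200, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # S501: set search space, maximum speed, initial positions and speeds.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [float("inf")] * n_particles
    gbest, gbest_val, prev_val = x[0][:], float("inf"), None
    for _ in range(max_iter):
        # S502: evaluate the objective and refresh each particle's own best.
        for i in range(n_particles):
            val = objective(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
        # S503: current optimal solution of the sub-particle group.
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        gbest, gbest_val = pbest[g][:], pbest_val[g]
        # S504: update speed and position, clamped to v_max and the search space.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
        # S505: converge when two consecutive group bests differ by < threshold.
        if prev_val is not None and abs(prev_val - gbest_val) < threshold:
            break
        prev_val = gbest_val
    # S506: the current optimal solution becomes the sub-group's optimal solution.
    return gbest, gbest_val

best, best_val = pso_subgroup(sphere, dim=2)
```

Applied literally, this convergence test also stops as soon as one iteration brings no improvement, so a practical implementation may want to require several stable iterations before stopping.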
Step S105: determining, by using the classification model and according to the pre-classification results corresponding to each of the plurality of groups to be classified, the user group corresponding to each of the one or more users.
In the embodiment of the present invention, a specific implementation process of step S105 is shown in fig. 6, and mainly includes the following steps:
Step S601: determining, by using the classification model and according to the pre-classification results, a target classification result of the particle swarm to which the sub-particle groups belong;
Step S602: judging, according to the target classification result, whether the particle swarm meets a convergence condition; if so, executing step S603, and if not, executing step S604;
Step S603: determining the user group corresponding to each of the one or more users according to the target classification result;
Step S604: dividing the particle swarm into a plurality of sub-particle groups again.
In the embodiment of the present invention, step S601 may be implemented as follows: the classification model collects the pre-classification results obtained after the Map tasks have been executed into a Reduce task and processes them to obtain the target classification result of the particle swarm. The target classification result is the current optimal solution of the particle swarm, calculated from the optimal solution of each sub-particle group; at this point it may or may not satisfy the convergence condition. When the convergence condition is not met, the particle swarm is divided into a plurality of sub-particle groups again, and the pre-classification result of each sub-particle group is determined again; when the convergence condition is met, the user group corresponding to each of the one or more users can be determined according to the current optimal solution of the particle swarm.
In the embodiment of the present invention, the convergence condition in step S602 may be that the deviation between the target classification results of the particle swarm in two consecutive rounds is smaller than a second preset threshold. Specifically, when that deviation is not smaller than the second preset threshold, the particle swarm is judged not to meet the convergence condition; when the deviation is smaller than the second preset threshold, the particle swarm is judged to meet the convergence condition. The second preset threshold may be the same as or different from the first preset threshold.
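The control flow of steps S601 to S604 can be sketched as an outer loop. Everything passed into `classify` below is a placeholder assumption: `pre_classify` stands in for the per-instance Map task, `merge` for the Reduce task, and `deviation` for whatever measure compares two target classification results. The Map tasks run sequentially here for brevity.

```python
import random

def classify(particles, n_groups, pre_classify, merge, deviation,
             threshold, max_rounds=50, seed=0):
    """Repeat split, pre-classify, merge until two consecutive targets agree."""
    rng = random.Random(seed)
    previous = None
    for _ in range(max_rounds):
        # S604 (and S302): divide the swarm into sub-particle groups at random.
        shuffled = list(particles)
        rng.shuffle(shuffled)
        groups = [shuffled[i::n_groups] for i in range(n_groups)]
        # S104: one pre-classification result per sub-particle group.
        pre_results = [pre_classify(g) for g in groups]
        # S601: merge the pre-classification results into the target result.
        target = merge(pre_results)
        # S602: compare with the previous round's target classification result.
        if previous is not None and deviation(previous, target) < threshold:
            return target  # S603 would use this result to assign user groups
        previous = target
    return previous

def mean(values):
    return sum(values) / len(values)

# Toy stand-ins: particles are numbers and every "result" is a mean.
result = classify(range(12), 3, pre_classify=mean, merge=mean,
                  deviation=lambda a, b: abs(a - b), threshold=1e-9)
```

In this toy setup the groups are equal in size, so the mean of the group means equals the overall mean in every round and the loop stops after the second round.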
In the embodiment of the present invention, the target classification result indicates a plurality of particle aggregation clusters. In this case, the specific implementation process of step S603 is as shown in fig. 7, and the following steps are performed for each particle aggregation cluster:
Step S701: determining a central aggregation point of the particle aggregation cluster;
Step S702: determining the distance between the central aggregation point and the edge of the particle aggregation cluster;
Step S703: determining, within the particle aggregation cluster, a region corresponding to a user group according to a preset proportion and the distance between the central aggregation point and the edge of the particle aggregation cluster;
Step S704: determining whether a user belongs to the user group corresponding to the region according to whether the particle corresponding to the user's behavior preference attribute is located in the region.
In this embodiment of the present invention, the region corresponding to the user group may be determined in step S703 such that the ratio of the distance between the central aggregation point and the edge of the region corresponding to the user group to the distance between the central aggregation point and the edge of the particle aggregation cluster equals the preset proportion. In a preferred embodiment of the invention, the preset proportion is 0.5.
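Steps S701 to S704 can be sketched geometrically. The patent does not define the central aggregation point or the cluster edge, so this sketch assumes the centroid as the center and the farthest particle as the edge; the function name `region_members` is hypothetical.

```python
import math

def region_members(cluster, ratio=0.5):
    """Return (center, radius, members) for one particle aggregation cluster."""
    dim = len(cluster[0])
    # S701: central aggregation point, assumed here to be the centroid.
    center = tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))
    # S702: distance from the center to the edge, assumed to be the farthest particle.
    edge = max(math.dist(center, p) for p in cluster)
    # S703: the region's radius is the preset proportion of that distance.
    radius = ratio * edge
    # S704: a particle belongs to the user group if it lies inside the region.
    members = [p for p in cluster if math.dist(center, p) <= radius]
    return center, radius, members

# Hypothetical two-dimensional cluster: four corner particles and one in the middle.
cluster = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
center, radius, members = region_members(cluster)
```

With the preferred proportion of 0.5, only the particle at the center falls inside the region; the four corner particles lie on the cluster edge and are excluded.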
In the embodiment of the present invention, after step S604 re-divides the particle swarm into a plurality of sub-particle groups, each sub-particle group may be regarded as a group to be classified, and step S104 is executed again to re-determine the pre-classification results.
In this embodiment of the present invention, after determining a user group corresponding to each of one or more users, the method further includes: recommending articles to the user according to an article recommendation strategy preset for the user group, so that articles matching the user can be recommended more accurately.
A more complete embodiment of the user group classification method provided by the present solution is described in detail below. Fig. 8 is a schematic diagram of the main steps of another user group classification method. As shown in fig. 8, the method mainly includes the following steps:
step S801: acquiring historical behavior data corresponding to one or more users respectively;
step S802: determining a plurality of behavior preference attributes corresponding to one or more users respectively according to historical behavior data;
step S803: dividing a plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified by utilizing a classification model; wherein the classification model runs in parallel on multiple instances in the distributed cluster;
steps S8041 to S804n: adding group 1 to be classified, group 2 to be classified, ..., and group n to be classified to Map tasks respectively and starting the Map tasks;
steps S8051 to S805n: calculating in parallel the optimal solution of each particle in group 1 to be classified, group 2 to be classified, ..., and group n to be classified;
steps S8061 to S806n: calculating in parallel the pre-classification results of group 1 to be classified, group 2 to be classified, ..., and group n to be classified;
step S807: summarizing the pre-classification results of the plurality of groups to be classified, adding them to a Reduce task, and starting the Reduce task;
step S808: determining target classification results of particle swarms corresponding to a plurality of groups to be classified according to a plurality of pre-classification results by using a classification model;
step S809: judging whether the particle swarm meets the convergence condition according to the target classification result; if so, executing step S810; if not, returning to step S803;
step S810: and determining a user group corresponding to one or more users respectively according to the target classification result.
According to the user group classification method provided by the embodiment of the invention, a classification model that runs in parallel on a plurality of instances in a distributed cluster can be used to first divide the plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified, then determine in parallel on the plurality of instances the pre-classification results corresponding to the plurality of groups to be classified, and further determine the user groups corresponding to the one or more users according to the pre-classification results, so that the consumption of hardware resources when dividing user groups is reduced.
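The Map and Reduce stages of steps S804 to S808 can be sketched in Python as follows. This is only an illustration under stated assumptions: a local thread pool stands in for the instances of the distributed cluster, the hypothetical `fitness` function stands in for whatever objective the classification model uses, and the best particle of a group stands in for its pre-classification result. None of these choices is taken from the embodiment.

```python
from concurrent.futures import ThreadPoolExecutor


def fitness(particle):
    # Hypothetical objective: smaller is better (squared distance to the origin).
    return sum(x * x for x in particle)


def map_task(group):
    # S805/S806: the best particle of one group stands in for its pre-classification result.
    return min(group, key=fitness)


def reduce_task(pre_results):
    # S807/S808: combine the per-group results into one target classification result.
    return min(pre_results, key=fitness)


def classify_once(groups, max_workers=4):
    # S804: one Map task per group to be classified, run in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pre_results = list(pool.map(map_task, groups))
    return reduce_task(pre_results)
```

In a full run, `classify_once` would be called repeatedly, with the groups re-divided between calls, until the convergence condition of step S809 is met.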
Fig. 9 is a schematic diagram of main modules of a user group classification device according to an embodiment of the present invention.
As shown in fig. 9, a user group classification apparatus 900 according to an embodiment of the present invention includes: a data acquisition module 901, an attribute determination module 902, a group division module 903, a pre-classification module 904 and a user group determination module 905; wherein:
a data obtaining module 901, configured to obtain historical behavior data corresponding to one or more users respectively;
an attribute determining module 902, configured to determine, according to historical behavior data, a plurality of behavior preference attributes corresponding to one or more users, respectively;
a group dividing module 903, configured to divide, by using a classification model, a plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified; wherein the classification model runs in parallel on multiple instances in the distributed cluster;
a pre-classification module 904, configured to determine, in parallel, pre-classification results corresponding to the multiple groups to be classified on multiple instances by using the classification model;
and the user group determining module 905 is configured to determine, according to the pre-classification results respectively corresponding to the multiple groups to be classified, a user group respectively corresponding to one or more users by using the classification model.
In the embodiment of the invention, the classification model is trained based on a particle swarm algorithm.
In this embodiment of the present invention, the group division module 903 is further configured to: take each behavior preference attribute of each user as a particle to obtain a particle swarm corresponding to the one or more users, wherein the particle swarm comprises a plurality of particles; and take the particle swarm as the input of the classification model, divide the particle swarm into a plurality of sub-particle groups by using the classification model, and take the sub-particle groups as the groups to be classified.
In the embodiment of the present invention, the difference between the particle counts of the sub-particle groups is not greater than a preset number threshold.
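One simple way to satisfy the constraint on particle counts is a round-robin split, sketched below in Python. The embodiment does not say how the classification model performs the division, so this strategy and the name `split_swarm` are assumptions for illustration only.

```python
def split_swarm(particles, n_groups):
    """Divide a particle swarm into n_groups sub-particle groups of near-equal size."""
    if n_groups <= 0:
        raise ValueError("n_groups must be positive")
    # Particle i goes to group i % n_groups, so group sizes differ by at most one.
    return [particles[i::n_groups] for i in range(n_groups)]
```

Because the sizes differ by at most one particle, this split meets any preset number threshold of one or more.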
In an embodiment of the present invention, the pre-classification module 904 is further configured to: add the plurality of sub-particle groups to the plurality of instances respectively by using the classification model, so that each instance corresponds to one sub-particle group; and, by using the classification model, execute the following steps on each instance in parallel to determine the pre-classification results respectively corresponding to the plurality of groups to be classified: calculating the optimal solution of each particle in the sub-particle group corresponding to the instance; calculating the optimal solution of the sub-particle group corresponding to the instance according to the optimal solution of each particle; and taking the optimal solution of the sub-particle group as the pre-classification result of the group to be classified corresponding to the sub-particle group.
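The per-instance computation can be illustrated with a textbook particle swarm update, in which each particle tracks its own best position and the sub-particle group tracks the best position over all its particles. The sketch below is the standard algorithm with a hypothetical objective `fitness` and assumed parameters `w`, `c1` and `c2`; the embodiment does not disclose its objective function or parameter values.

```python
import random


def fitness(position):
    # Hypothetical objective: smaller is better (squared distance to the origin).
    return sum(x * x for x in position)


def run_sub_swarm(positions, iterations=20, w=0.5, c1=1.5, c2=1.5, seed=0):
    """Return (optimal solution of each particle, optimal solution of the sub-particle group)."""
    rng = random.Random(seed)
    positions = [list(p) for p in positions]
    velocities = [[0.0] * len(p) for p in positions]
    personal_best = [list(p) for p in positions]
    group_best = list(min(personal_best, key=fitness))
    for _ in range(iterations):
        for i, pos in enumerate(positions):
            for d in range(len(pos)):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus pulls toward the personal best and the group best.
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (personal_best[i][d] - pos[d])
                                    + c2 * r2 * (group_best[d] - pos[d]))
                pos[d] += velocities[i][d]
            if fitness(pos) < fitness(personal_best[i]):
                personal_best[i] = list(pos)
        # The group optimum is derived from the per-particle optima.
        group_best = list(min(personal_best, key=fitness))
    return personal_best, group_best
```

The returned group optimum plays the role of the pre-classification result that each instance reports for its group to be classified.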
In this embodiment of the present invention, the user group determining module 905 is further configured to: determining a target classification result of a particle swarm to which the sub-particle swarm belongs according to a pre-classification result by utilizing a classification model; and judging whether the particle swarm meets a convergence condition or not according to the target classification result, and if so, determining a user group corresponding to one or more users respectively according to the target classification result.
In an embodiment of the present invention, the target classification result indicates a plurality of particle aggregation clusters; the user group determination module 905 is further configured to, for each particle aggregation cluster: determine a central aggregation point of the particle aggregation cluster; determine the distance between the central aggregation point and the edge of the particle aggregation cluster; determine, from the particle aggregation cluster, a region corresponding to a user group according to a preset proportion and the distance between the central aggregation point and the edge of the particle aggregation cluster; and determine whether a user belongs to the user group corresponding to the region according to whether the particle corresponding to the behavior preference attribute of the user is located in the region.
In this embodiment of the present invention, the user group determining module 905 is further configured to: when the classification model judges that the particle swarm does not meet the convergence condition, re-divide the particle swarm into a plurality of sub-particle groups.
In this embodiment of the present invention, the attribute determining module 902 is further configured to perform the following for each user: inputting the historical behavior data corresponding to the user into a user portrait model, wherein the user portrait model comprises a plurality of preset labels, and obtaining the score of the user for each preset label; and taking each preset label and the score corresponding to the preset label as a behavior preference attribute corresponding to the user.
In this embodiment of the present invention, the user group determining module 905 is further configured to: recommend articles to the user according to an article recommendation strategy preset for the user group.
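The mapping from historical behavior data to behavior preference attributes can be sketched in Python as follows. The user portrait model itself is not described in the embodiment; here it is replaced by a stand-in that scores each preset label by its share of the user's recorded events. The label names, the event format and the function name are all hypothetical.

```python
from collections import Counter

PRESET_LABELS = ("electronics", "books", "groceries")  # hypothetical preset labels


def behavior_preference_attributes(history, labels=PRESET_LABELS):
    """Return one (label, score) pair per preset label for a single user."""
    # Stand-in for the user portrait model: score = share of events carrying the label.
    counts = Counter(e["label"] for e in history if e["label"] in labels)
    total = sum(counts.values())
    return [(label, counts[label] / total if total else 0.0) for label in labels]
```

Each returned pair corresponds to one behavior preference attribute, and therefore to one particle of the particle swarm.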
According to the user group classification device provided by the embodiment of the invention, a classification model that runs in parallel on a plurality of instances in a distributed cluster can be used to first divide the plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified, then determine in parallel on the plurality of instances the pre-classification results corresponding to the plurality of groups to be classified, and further determine the user groups corresponding to the one or more users according to the pre-classification results, so that the consumption of hardware resources when dividing user groups is reduced.
Fig. 10 shows an exemplary system architecture 1000 of a user population classification method or a user population classification apparatus to which an embodiment of the present invention may be applied.
As shown in fig. 10, the system architecture 1000 may include terminal devices 1001, 1002, 1003, a network 1004, and an electronic device 1005. The network 1004 is used to provide a medium for communication links between the terminal devices 1001, 1002, 1003 and the electronic device 1005. Network 1004 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may interact with an electronic device 1005 via a network 1004 using terminal devices 1001, 1002, 1003 to receive or transmit messages or the like. Various communication client applications, such as shopping applications, may be installed on the terminal devices 1001, 1002, and 1003.
The terminal devices 1001, 1002, 1003 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The electronic device 1005 may be a server that provides various services, for example, a background management server that supports a shopping site browsed by a user using the terminal devices 1001, 1002, and 1003. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that, a user group classification method provided by the embodiment of the present invention is generally executed by the electronic device 1005, and accordingly, a user group classification apparatus is generally disposed in the electronic device 1005.
It should be understood that the number of terminal devices, networks, and electronic devices in fig. 10 is merely illustrative. There may be any number of terminal devices, networks, and electronic devices, as desired for implementation.
Referring now to FIG. 11, shown is a block diagram of a computer system 1100 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 11, the computer system 1100 includes a Central Processing Unit (CPU)1101, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the system 1100 are also stored. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
The following components are connected to the I/O interface 1105: an input portion 1106 including a keyboard, a mouse, and the like; an output portion 1107 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the internet. A drive 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is installed into the storage section 1108 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 1101.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a data acquisition module, an attribute determination module, a group partitioning module, a pre-classification module, and a user group determination module. The names of the modules do not limit the module itself in some cases, for example, the data acquisition module may also be described as a "module for acquiring historical behavior data corresponding to one or more users respectively".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform: acquiring historical behavior data corresponding to one or more users respectively; determining a plurality of behavior preference attributes corresponding to the one or more users respectively according to the historical behavior data; dividing the plurality of behavior preference attributes corresponding to the one or more users into a plurality of groups to be classified by utilizing a classification model, wherein the classification model runs in parallel on multiple instances in a distributed cluster; determining, in parallel on the multiple instances, pre-classification results respectively corresponding to the plurality of groups to be classified by using the classification model; and determining user groups respectively corresponding to the one or more users by using the classification model according to the pre-classification results respectively corresponding to the plurality of groups to be classified.
According to the technical scheme of the embodiment of the invention, a classification model that runs in parallel on a plurality of instances in a distributed cluster can be used to first divide the plurality of behavior preference attributes corresponding to one or more users into a plurality of groups to be classified, then determine in parallel on the plurality of instances the pre-classification results corresponding to the plurality of groups to be classified, and then determine the user groups corresponding to the one or more users according to the pre-classification results, so that the consumption of hardware resources when dividing user groups is reduced.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A user group classification method is characterized by comprising the following steps:
acquiring historical behavior data corresponding to one or more users respectively;
determining a plurality of behavior preference attributes corresponding to the one or more users respectively according to the historical behavior data;
dividing a plurality of behavior preference attributes corresponding to the one or more users into a plurality of groups to be classified by utilizing a classification model; wherein the classification model runs in parallel on multiple instances in a distributed cluster;
determining pre-classification results respectively corresponding to the multiple groups to be classified in parallel on the multiple instances by utilizing the classification model;
and determining user groups respectively corresponding to the one or more users by utilizing the classification model according to the pre-classification results respectively corresponding to the multiple groups to be classified.
2. The method of claim 1,
the classification model is trained based on a particle swarm algorithm.
3. The method of claim 2, wherein the using the classification model to divide the plurality of behavior preference attributes corresponding to the one or more users into a plurality of groups to be classified comprises:
taking each behavior preference attribute of each user as a particle to obtain a particle swarm corresponding to the one or more users, wherein the particle swarm comprises a plurality of particles;
and taking the particle swarm as the input of the classification model, dividing the particle swarm into a plurality of sub-particle groups by using the classification model, and taking the sub-particle groups as the groups to be classified.
4. The method of claim 3,
the difference of the particle numbers of the sub-particle groups is not greater than a preset number threshold.
5. The method according to claim 3, wherein the determining, in parallel, the pre-classification results corresponding to the plurality of groups to be classified respectively on the plurality of instances by using the classification model comprises:
adding the plurality of sub-particle groups into the plurality of instances respectively by using the classification model so that each instance corresponds to one sub-particle group respectively;
using the classification model, executing the following steps on each instance in parallel to determine the pre-classification results respectively corresponding to the multiple groups to be classified:
calculating the optimal solution of each particle in the sub-particle group corresponding to the instance;
calculating the optimal solution of the sub-particle group corresponding to the instance according to the optimal solution of each particle;
and taking the optimal solution of the sub-particle group as a pre-classification result of the group to be classified corresponding to the sub-particle group.
6. The method according to claim 3, wherein the determining, by using the classification model, user groups respectively corresponding to the one or more users according to the pre-classification results respectively corresponding to the plurality of groups to be classified comprises:
determining, by utilizing the classification model, a target classification result of the particle swarm to which the sub-particle groups belong according to the pre-classification results;
and judging whether the particle swarm meets a convergence condition according to the target classification result, and if so, determining the user groups respectively corresponding to the one or more users according to the target classification result.
7. The method of claim 6, wherein the target classification result indicates a plurality of particle aggregation clusters;
the determining, according to the target classification result, user groups respectively corresponding to the one or more users includes:
for each particle aggregation cluster, determining a central aggregation point of the particle aggregation cluster;
determining the distance between the central aggregation point and the edge of the particle aggregation cluster;
determining a region corresponding to the user group from the particle aggregation cluster according to a preset proportion and the distance between the central aggregation point and the edge of the particle aggregation cluster;
and determining whether the user belongs to a user group corresponding to the region according to whether the particles corresponding to the behavior preference attribute corresponding to the user are located in the region.
8. The method of claim 6,
when the classification model judges that the particle swarm does not meet the convergence condition, the particle swarm is re-divided into a plurality of sub-particle groups.
9. The method of claim 1, wherein determining a plurality of behavior preference attributes corresponding to the one or more users respectively according to the historical behavior data comprises:
for each of the users, performing:
inputting historical behavior data corresponding to the user into a user portrait model, wherein the user portrait model comprises a plurality of preset labels, and obtaining the score of the user for each preset label;
and taking each preset label and the score corresponding to the preset label as behavior preference attributes corresponding to the user.
10. The method of claim 1, further comprising: recommending the articles to the user according to an article recommendation strategy preset for the user group.
11. A user group classification device is characterized by comprising a data acquisition module, an attribute determination module, a group division module, a pre-classification module and a user group determination module; wherein:
the data acquisition module is used for acquiring historical behavior data corresponding to one or more users respectively;
the attribute determining module is used for determining a plurality of behavior preference attributes corresponding to the one or more users respectively according to the historical behavior data;
the group dividing module is used for dividing a plurality of behavior preference attributes corresponding to the one or more users into a plurality of groups to be classified by utilizing a classification model; wherein the classification model runs in parallel on multiple instances in a distributed cluster;
the pre-classification module is used for determining pre-classification results corresponding to the multiple groups to be classified in parallel on the multiple instances by using the classification model;
and the user group determining module is used for determining the user groups respectively corresponding to the one or more users by using the classification model according to the pre-classification results respectively corresponding to the multiple groups to be classified.
12. An electronic device for user population classification, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
13. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-10.
CN202110694486.3A 2021-06-22 2021-06-22 User group classification method and device Pending CN113780333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110694486.3A CN113780333A (en) 2021-06-22 2021-06-22 User group classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110694486.3A CN113780333A (en) 2021-06-22 2021-06-22 User group classification method and device

Publications (1)

Publication Number Publication Date
CN113780333A true CN113780333A (en) 2021-12-10

Family

ID=78835800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110694486.3A Pending CN113780333A (en) 2021-06-22 2021-06-22 User group classification method and device

Country Status (1)

Country Link
CN (1) CN113780333A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114662595A (en) * 2022-03-25 2022-06-24 王登辉 Big data fusion processing method and system

Similar Documents

Publication Publication Date Title
EP4198771A1 (en) Data processing method and apparatus, computer readable medium, and electronic device
WO2019052263A1 (en) Object filtering method and system
CN111523977A (en) Wave order set creating method and device, computing equipment and medium
CN112330382B (en) Item recommendation method, device, computing equipment and medium
CN112818230A (en) Content recommendation method and device, electronic equipment and storage medium
CN112749323A (en) Method and device for constructing user portrait
CN113780333A (en) User group classification method and device
CN112950321A (en) Article recommendation method and device
CN111382315A (en) Merging method of sub-graph isomorphic matching results, electronic device and storage medium
CN110807097A (en) Method and device for analyzing data
WO2021208174A1 (en) Distributed-type graph computation method, terminal, system, and storage medium
CN112449217B (en) Method and device for pushing video, electronic equipment and computer readable medium
CN113360816A (en) Click rate prediction method and device
CN111667018A (en) Object clustering method and device, computer readable medium and electronic equipment
CN112667770A (en) Method and device for classifying articles
CN111833085A (en) Method and device for calculating price of article
CN115238676A (en) Method and device for identifying hot spots of bidding demands, storage medium and electronic equipment
CN114036397A (en) Data recommendation method and device, electronic equipment and medium
CN113762535A (en) Article recall method and device
CN113961797A (en) Resource recommendation method and device, electronic equipment and readable storage medium
CN113191840A (en) Article information display method and device, electronic equipment and computer readable medium
CN113743973A (en) Method and device for analyzing market hotspot trend
CN112905885A (en) Method, apparatus, device, medium, and program product for recommending resources to a user
CN113111133A (en) User classification method and device
CN111783808A (en) Method and apparatus for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination