CN111563775A - Crowd division method and device - Google Patents

Crowd division method and device

Info

Publication number
CN111563775A
Authority
CN
China
Prior art keywords
data
model
behavior data
crowd
evaluation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010383874.5A
Other languages
Chinese (zh)
Inventor
李见黎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shenyan Intelligent Technology Co ltd
Original Assignee
Beijing Shenyan Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenyan Intelligent Technology Co ltd filed Critical Beijing Shenyan Intelligent Technology Co ltd
Priority to CN202010383874.5A priority Critical patent/CN111563775A/en
Publication of CN111563775A publication Critical patent/CN111563775A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0254 Targeted advertisements based on statistics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a crowd division method and device. The method comprises: receiving behavior data to be evaluated, wherein the behavior data is data generated by operating on target information; and inputting the behavior data into an evaluation model, which outputs a score for the behavior data, wherein the evaluation model is a machine learning model obtained by training on multiple groups of training data, each group comprising historical behavior data and a corresponding score. The invention solves the technical problem in the related art that crowd division based on labels lacks a definite standard and therefore hardly reflects the actual condition of the crowd.

Description

Crowd division method and device
Technical Field
The invention relates to the field of crowd division, in particular to a crowd division method and device.
Background
In advertisement delivery, the audience needs to be divided into crowds. In practice, this division often lacks a definite standard or relies on fixed labels, so it hardly reflects the current condition of the crowd. A score based on the real-time behavior of the crowd better reflects the value of the crowd.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide a crowd division method and device, which at least solve the technical problem in the related art that crowd division based on labels lacks a definite standard and hardly reflects the actual condition of the crowd.
According to an aspect of an embodiment of the present invention, there is provided a crowd division method, including: receiving behavior data of a crowd, wherein the behavior data is data generated by the crowd operating on delivered target information; inputting the behavior data into an evaluation model, and outputting a score of the behavior data by the evaluation model, wherein the evaluation model is a machine learning model obtained by training on multiple groups of training data, and each group of training data comprises historical behavior data and a corresponding score; and classifying the crowd according to the score.
Optionally, before the behavior data is input into the evaluation model and the evaluation model outputs the score of the behavior data, the method further includes: selecting a model algorithm according to the data characteristics of the historical behavior data to construct a basic model; establishing multiple groups of training data from the historical behavior data and the corresponding scores, and training the basic model; and optimizing the trained basic model to determine the evaluation model.
Optionally, optimizing the trained basic model to determine the evaluation model includes at least one of the following: tuning the parameters of the trained basic model through a model parameter tuning algorithm to determine the evaluation model; segmenting the training data, recombining it into different training data, and training the evaluation model again to determine the evaluation model; and fusing a plurality of different models obtained from multiple training runs to determine the evaluation model.
Optionally, segmenting the training data, recombining it into different training data, and training the evaluation model again includes: segmenting the training data by cross-validation, recombining it into different training data, and training the evaluation model again; wherein the cross-validation comprises at least one of: simple cross-validation, S-fold cross-validation, and leave-one-out cross-validation.
Optionally, fusing the plurality of different models obtained from multiple training runs includes at least one of: fusing the different models by weighted averaging; fusing the different models by weighted voting; and fusing the different models, serving as primary learners, by means of a secondary learner.
Optionally, after the behavior data is input into the evaluation model and the evaluation model outputs the score of the behavior data, the method further includes: determining a predicted score curve for the behavior data; and calibrating the predicted score curve according to the score curve of the historical behavior data to determine the calibrated score of the behavior data.
Optionally, classifying the crowd according to the score includes: determining, according to the score and a plurality of preset score levels, that the crowd belongs to the crowd category corresponding to the score level into which the score of the behavior data falls.
According to another aspect of the embodiments of the present invention, there is also provided a crowd division apparatus, including: a receiving module, configured to receive behavior data of a crowd, wherein the behavior data is data generated by the crowd operating on delivered target information; an evaluation module, configured to input the behavior data into an evaluation model and output a score of the behavior data by the evaluation model, wherein the evaluation model is a machine learning model obtained by training on multiple groups of training data, and each group of training data comprises historical behavior data and a corresponding score; and a classification module, configured to classify the crowd according to the score.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium, where the storage medium includes a stored program, and when the program runs, the apparatus where the storage medium is located is controlled to execute the crowd division method according to any one of the above.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes to perform the crowd division method according to any one of the above.
In the embodiment of the invention, behavior data of a crowd is received, wherein the behavior data is data generated by the crowd operating on delivered target information; the behavior data is input into an evaluation model, which outputs a score of the behavior data, wherein the evaluation model is a machine learning model obtained by training on multiple groups of training data, and each group of training data comprises historical behavior data and a corresponding score; and the crowd is classified according to the score. Because the crowd's behavior data is scored by the evaluation model and the crowd is classified according to that score, the crowd is classified accurately. This improves the accuracy of crowd classification and solves the technical problem in the related art that crowd division based on labels lacks a definite standard and hardly reflects the actual condition of the crowd.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a crowd division method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a crowd division scheme according to an embodiment of the invention;
FIG. 3 is a flow diagram of a crowd division scheme according to an embodiment of the invention;
FIG. 4 is a schematic illustration of a predicted distribution curve according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of a calibrated predicted distribution curve according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a crowd division apparatus according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present invention, there is provided a method embodiment of a crowd division method, it should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer executable instructions, and that while a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than that herein.
FIG. 1 is a flowchart of a crowd division method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S102, receiving behavior data of a crowd, wherein the behavior data is data generated by the crowd operating on delivered target information;
Step S104, inputting the behavior data into an evaluation model, and outputting a score of the behavior data by the evaluation model, wherein the evaluation model is a machine learning model obtained by training on multiple groups of training data, and each group of training data comprises historical behavior data and a corresponding score;
Step S106, classifying the crowd according to the score.
Through the above steps, behavior data of a crowd is received, the behavior data is scored by the evaluation model, and the crowd is classified according to the score. Because the classification rests on a score of the crowd's own behavior data, the crowd is classified accurately. This improves the accuracy of crowd classification and solves the technical problem in the related art that crowd division based on labels lacks a definite standard and hardly reflects the actual condition of the crowd.
The crowd may include a plurality of user accounts, and the delivered target information may be advertisement information. The behavior data may be information on the operations performed after the user accounts of the crowd receive the advertisement information, such as exposure, click, collection, and saving. The behavior data can be used to determine whether a user account has delivery value for the advertisement information. For example, the longer the advertisement information is exposed and the more often it is exposed, the more interested the user account is in the advertisement information and the more likely it is to make a purchase, which generates revenue from the advertisement delivery. Likewise, a higher click rate, or more collections or saves of the advertisement information, indicates that the user account is interested in the advertisement, that is, that delivering the advertisement to this user account has higher value.
The evaluation model may be a machine learning model, for example a machine learning network, a deep learning network, or a convolutional neural network. It may include an input layer, one or more intermediate layers, and an output layer. The model is trained on multiple groups of training data, each group comprising historical behavior data and a corresponding score. The score of historical behavior data may be determined by the value that the behavior generated for the advertisement information: the higher the generated value, the higher the score.
Behavior data is input into the evaluation model, and the evaluation model outputs the score of the behavior data. The behavior data of the crowd is thereby scored, which determines the value of that behavior data for the advertisement information. The crowd can then be divided by value, so that different information delivery strategies can be applied to crowds of different value. For example, the lower the value of a crowd, the less frequently the advertisement is exposed to it and the shorter the exposure time.
Compared with classifying the crowd according to labels as in the related art, classifying the crowd according to scores improves the accuracy of crowd classification and solves the technical problem that label-based crowd division lacks a definite standard and hardly reflects the actual condition of the crowd.
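As an illustration of how such behavior data might be turned into model inputs, the following is a minimal sketch that aggregates a raw event log into one feature record per user account. The event schema, the feature names, and the derived click-through rate are assumptions made for this example; the description does not specify a data format.

```python
from collections import defaultdict

# Hypothetical event log: (user_account, action, exposure_seconds).
# The action names follow the operations listed above; the schema is an assumption.
EVENTS = [
    ("u1", "exposure", 12.0),
    ("u1", "click", 0.0),
    ("u1", "exposure", 8.0),
    ("u2", "exposure", 2.5),
    ("u2", "save", 0.0),
]

def build_features(events):
    """Aggregate raw events into one feature dict per user account."""
    features = defaultdict(lambda: {
        "exposures": 0, "exposure_seconds": 0.0,
        "clicks": 0, "collections": 0, "saves": 0,
    })
    counters = {"click": "clicks", "collection": "collections", "save": "saves"}
    for account, action, seconds in events:
        row = features[account]
        if action == "exposure":
            row["exposures"] += 1
            row["exposure_seconds"] += seconds
        elif action in counters:
            row[counters[action]] += 1
    # The click-through rate is derived once all counts are known.
    for row in features.values():
        row["ctr"] = row["clicks"] / row["exposures"] if row["exposures"] else 0.0
    return dict(features)

features = build_features(EVENTS)
```

Each resulting record could then serve as one input row for the evaluation model.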
Optionally, before the behavior data is input into the evaluation model and the evaluation model outputs the score of the behavior data, the method further includes: selecting a model algorithm according to the data characteristics of the historical behavior data to construct a basic model; establishing multiple groups of training data from the historical behavior data and the corresponding scores, and training the basic model; and optimizing the trained basic model to determine the evaluation model.
Specifically, in the above scenario of crowd classification for advertisement delivery, the model algorithm is selected according to the data characteristics of the historical behavior data. The basic data features are neither high-dimensional nor sparse but relatively dense. Given this, and given the inherent advantages of the algorithm (regularization, parallel training on large-scale data, and so on), this embodiment selects the XGBoost algorithm. The AUC of the model ranges from 0.75 to 0.82 across different data sets.
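For readers unfamiliar with the AUC figure quoted here, the sketch below computes the area under the ROC curve from labels and scores using the rank-sum formulation. It is a generic illustration, not the evaluation code of the embodiment; the function name `auc` and the sample data are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation.

    labels: 0/1 ground truth; scores: model outputs, higher means more positive.
    Tied scores receive their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of positive and negative samples.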
The historical behavior data and the corresponding scores can be stored in a database, and the historical behavior data and the corresponding scores in the database are updated at any time along with use, so that the validity of the data is ensured. Selecting a preset number of historical behavior data and corresponding scores from the database, establishing a plurality of groups of training data, wherein each group of training data comprises the historical behavior data and the corresponding scores, training the basic model, optimizing the trained basic model, and determining the evaluation model.
Optionally, optimizing the trained basic model to determine the evaluation model includes at least one of the following: tuning the parameters of the trained basic model through a model parameter tuning algorithm to determine the evaluation model; segmenting the training data, recombining it into different training data, and training the evaluation model again to determine the evaluation model; and fusing a plurality of different models obtained from multiple training runs to determine the evaluation model.
In the above scenario of crowd classification for advertisement delivery, the parameters of the trained basic model may be tuned with GridSearchCV (a grid-search parameter tuning method with cross-validation), which validates the model multiple times to optimize the model parameters. The optimal values found are n_estimators = 400, max_depth = 10, min_child_weight = 5, colsample_bytree = 0.3, learning_rate = 0.1, and so on.
Segmenting the training data, recombining it into different training data, and training the evaluation model again amounts to cross-validating the evaluation model. Specifically, the data is reused: the obtained sample data is segmented and recombined into different training sets and test sets, where the training sets are used to train the model and the test sets are used to evaluate the quality of the model's predictions.
Optionally, segmenting the training data, recombining it into different training data, and training the evaluation model again includes: segmenting the training data by cross-validation, recombining it into different training data, and training the evaluation model again; wherein the cross-validation comprises at least one of: simple cross-validation, S-fold cross-validation, and leave-one-out cross-validation.
In the above scenario of crowd classification for advertisement delivery, this embodiment selects S-fold cross-validation to improve the generalization ability of the evaluation model and to find the optimal model parameters.
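The idea behind grid-search tuning can be sketched in plain Python as an exhaustive loop over parameter combinations. This sketch does not call GridSearchCV or XGBoost: the candidate grid is an assumption chosen to contain the reported values, `colsample_bytree` is the XGBoost parameter name assumed for the column-subsampling setting, and `toy_score` is a hypothetical stand-in for a cross-validated AUC.

```python
from itertools import product

# Candidate values per parameter; the grid itself is an assumption, chosen so
# that it contains the values reported in the text.
PARAM_GRID = {
    "n_estimators": [200, 400],
    "max_depth": [6, 10],
    "min_child_weight": [1, 5],
    "colsample_bytree": [0.3, 0.8],
    "learning_rate": [0.1, 0.3],
}

def grid_search(param_grid, score_fn):
    """Evaluate every combination in the grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Hypothetical stand-in for a cross-validated AUC; peaks at the reported values."""
    target = {"n_estimators": 400, "max_depth": 10, "min_child_weight": 5,
              "colsample_bytree": 0.3, "learning_rate": 0.1}
    return sum(params[k] == v for k, v in target.items())

best_params, best_score = grid_search(PARAM_GRID, toy_score)
```

In a real setup, `score_fn` would train the model with the given parameters and return its mean validation metric across folds.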
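The fold construction used in this kind of cross-validation can be sketched as follows. The sketch only produces index splits; the function name, the shuffling seed, and the sample count are assumptions for illustration.

```python
import random

def s_fold_splits(n_samples, s, seed=0):
    """Yield (train_indices, test_indices) for each of the S folds.

    Indices are shuffled once, then cut into S nearly equal folds; each fold
    serves as the test set exactly once while the rest form the training set.
    """
    if not 2 <= s <= n_samples:
        raise ValueError("s must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::s] for i in range(s)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(s_fold_splits(n_samples=10, s=5))
```

Training on each `train` split and averaging the metric over the corresponding `test` splits gives an estimate of how well a parameter setting generalizes.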
Optionally, fusing the plurality of different models obtained from multiple training runs includes at least one of: fusing the different models by weighted averaging; fusing the different models by weighted voting; and fusing the different models, serving as primary learners, by means of a secondary learner.
In the above scenario of crowd classification for advertisement delivery, the voting method is selected, which contributes about 2% to the AUC of the evaluation model and thereby optimizes the evaluation model.
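Weighted voting, one of the fusion options listed above, can be sketched as follows. The three threshold models and their weights are hypothetical; the description does not state which models were fused or how they were weighted.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Fuse class predictions from several models by weighted voting.

    predictions: one predicted label per model; weights: one weight per model.
    Returns the label with the largest total weight.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per model is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties are broken in favour of the label that was seen first.
    return max(totals, key=totals.get)

def fuse(models, weights, sample):
    """Apply each model to the sample and combine their votes."""
    return weighted_vote([m(sample) for m in models], weights)

# Three hypothetical threshold models over a single click-rate feature.
models = [lambda x: int(x > 0.2), lambda x: int(x > 0.5), lambda x: int(x > 0.7)]
label = fuse(models, weights=[0.6, 0.25, 0.15], sample=0.4)
```

With equal weights this reduces to relative majority voting; unequal weights let a more trusted model outvote the others.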
Optionally, after the behavior data is input into the evaluation model and the evaluation model outputs the score of the behavior data, the method further includes: determining a predictive scoring curve for the behavioral data; and calibrating the prediction scoring curve according to the scoring curve of the historical behavior data, and determining the calibrated score of the behavior data.
The distribution of predicted values depends strongly on the ratio of positive to negative samples. Calibrating the predictions against the real distribution improves the accuracy of the predicted values and, in turn, the accuracy of crowd division.
Optionally, classifying the crowd according to the score includes: determining, according to the score and a plurality of preset score levels, that the crowd belongs to the crowd category corresponding to the score level into which the score of the behavior data falls.
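One common way to correct predictions for a difference in positive/negative ratio is to rescale the predicted odds by the ratio of the real-world odds to the training odds. The sketch below shows that correction only as an assumed example; the description does not state which calibration procedure the embodiment uses, and the function and parameter names are hypothetical.

```python
def calibrate(p, train_pos_rate, real_pos_rate):
    """Rescale a predicted probability from the training prior to the real prior.

    The odds are multiplied by the ratio of real-world odds to training odds,
    which is the standard correction when only the class ratio has shifted.
    """
    if not 0.0 < train_pos_rate < 1.0 or not 0.0 < real_pos_rate < 1.0:
        raise ValueError("positive rates must lie strictly between 0 and 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if p in (0.0, 1.0):
        return p  # 0 and 1 are fixed points of the correction
    ratio = (real_pos_rate / (1.0 - real_pos_rate)) / (train_pos_rate / (1.0 - train_pos_rate))
    odds = p / (1.0 - p) * ratio
    return odds / (1.0 + odds)

# A model trained on balanced data, applied where only 10% of samples are positive.
calibrated = calibrate(0.5, train_pos_rate=0.5, real_pos_rate=0.1)
```

The correction is monotonic, so it changes the score values without changing the ranking of user accounts.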
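Mapping a score to one of several preset levels can be sketched with a sorted list of cut-offs. The threshold values below are hypothetical, since the description gives none; the level names follow the four user levels named in the model application part of this description.

```python
from bisect import bisect_right

# Hypothetical cut-offs on a 0-1 score; the text does not give thresholds.
THRESHOLDS = [0.25, 0.5, 0.75]
LEVELS = ["non-value", "low-value", "medium-value", "high-value"]

def classify(score):
    """Return the level whose score interval contains the given score."""
    return LEVELS[bisect_right(THRESHOLDS, score)]

def divide_crowd(scores):
    """Group user accounts by level; scores maps account -> score."""
    groups = {level: [] for level in LEVELS}
    for account, score in scores.items():
        groups[classify(score)].append(account)
    return groups

groups = divide_crowd({"u1": 0.91, "u2": 0.40, "u3": 0.10, "u4": 0.75})
```

A score that falls exactly on a cut-off is assigned to the higher level in this sketch; that boundary rule is a design choice, not something the description prescribes.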
It should be noted that this embodiment also provides an alternative implementation, which is described in detail below.
This embodiment provides a score based on the real-time behavior of the crowd, which better reflects the value of the crowd. When the crowd is scored in real time, the focus is on the behavior feature data of the crowd and other related data, and a comprehensive model is constructed for scoring. The specific steps can be divided into data collection, data processing, scoring model construction, scoring model use, and so on.
FIG. 2 is a schematic diagram of a crowd division scheme according to an embodiment of the invention, and FIG. 3 is a flow diagram of a crowd division scheme according to an embodiment of the invention. As shown in FIG. 2 and FIG. 3, model selection in this embodiment takes into account that the basic data features are neither high-dimensional nor sparse but relatively dense, as well as the inherent advantages of the algorithm: 1. regularization; 2. parallel training on large-scale data; 3. flexibility, missing-value handling, and the like. The XGBoost algorithm is therefore selected, and the AUC of the model ranges from 0.75 to 0.82 across different data sets.
Model optimization comprises the following steps:
1) model parameter optimization
GridSearchCV is used with multiple rounds of validation to optimize the model parameters. The optimal values found are n_estimators = 400, max_depth = 10, min_child_weight = 5, colsample_bytree = 0.3, learning_rate = 0.1, and so on.
2) Cross validation
Cross-validation reuses the data: the obtained sample data is segmented and recombined into different training sets and test sets, where the training sets are used to train the model and the test sets are used to evaluate the quality of the model's predictions. Multiple different groups of training and test sets can be obtained in this way. Simple cross-validation, S-fold cross-validation, and leave-one-out cross-validation are commonly used. Here, S-fold cross-validation is used to improve the generalization ability of the model and to find the optimal model parameters.
3) Model fusion
By fusing a plurality of different models, the performance of machine learning can be improved. Common model fusion methods are:
1. Averaging method: this includes the simple average and the weighted average. Averaging is generally used in regression prediction models, and weighted-average fusion is generally adopted in Boosting-family fusion models.
2. Voting method: this includes absolute majority voting (more than half of the votes), relative majority voting (the most votes), and weighted voting. Voting is generally used for classification models and is used in bagging models.
3. Learning method: a more powerful combination strategy is "learning", that is, combining through another learner. The individual learners are called primary learners, and the learner used for combining is called the secondary learner or meta-learner.
Here the voting method is chosen, which contributes about 2% to the AUC of the model.
4) Calibration
Calibration is carried out according to the real data distribution and the predicted data distribution.
FIG. 4 is a schematic diagram of a predicted distribution curve according to an embodiment of the present invention. FIG. 4 shows the predicted value distributions of two data sets; the distribution of predicted values depends strongly on the ratio of positive to negative samples, so a corresponding calibration is performed according to the real distribution, as shown in FIG. 5. FIG. 5 is a schematic diagram of a calibrated predicted distribution curve according to an embodiment of the present invention.
In this embodiment, the model application includes:
Labels are generated periodically for large-scale data to build a user portrait service, and users are divided into different levels: high-value users, medium-value users, low-value users, and non-value users. Different measures can subsequently be taken according to the value level.
The method and device facilitate the quantitative measurement of user value and subsequent differentiated user operations.
FIG. 6 is a schematic diagram of a crowd division apparatus according to an embodiment of the present invention. As shown in FIG. 6, according to another aspect of the embodiment of the present invention, there is further provided a crowd division apparatus including a receiving module 62, a scoring module 64, and a classification module 66, which are described in detail below.
The receiving module 62 is configured to receive behavior data of a crowd, where the behavior data is data of the crowd operating on delivered target information; a scoring module 64 connected to the receiving module 62, configured to input the behavior data into an evaluation model, and output a score of the behavior data by the evaluation model, where the evaluation model is a machine learning model, the evaluation model is obtained by training multiple sets of training data, and each set of training data includes historical behavior data and a corresponding score; a classification module 66, connected to the scoring module 64, for classifying the population according to the score.
With this apparatus, the receiving module 62 receives behavior data of a crowd, wherein the behavior data is data generated by the crowd operating on delivered target information; the scoring module 64 inputs the behavior data into an evaluation model, which outputs a score of the behavior data, wherein the evaluation model is a machine learning model obtained by training on multiple groups of training data, and each group of training data comprises historical behavior data and a corresponding score; and the classification module 66 classifies the crowd according to the score. Because the crowd's behavior data is scored by the evaluation model and the crowd is classified according to that score, the crowd is classified accurately. This improves the accuracy of crowd classification and solves the technical problem in the related art that crowd division based on labels lacks a definite standard and hardly reflects the actual condition of the crowd.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium, where the storage medium includes a stored program, and where the program, when run, controls a device in which the storage medium is located to perform any one of the crowd division methods described above.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to run a program, where the program, when run, performs any one of the crowd division methods described above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method of crowd division, comprising:
receiving behavior data of a crowd, wherein the behavior data is data on operations the crowd performs on delivered target information;
inputting the behavior data into an evaluation model, and outputting scores of the behavior data by the evaluation model, wherein the evaluation model is a machine learning model, the evaluation model is obtained by training multiple groups of training data, and each group of training data comprises historical behavior data and corresponding scores;
classifying the crowd according to the score.
2. The method of claim 1, wherein, before inputting the behavior data into the evaluation model and outputting the score of the behavior data by the evaluation model, the method further comprises:
selecting a model algorithm according to the data characteristics of the historical behavior data to construct a basic model;
establishing a plurality of groups of training data through the historical behavior data and the corresponding scores, and training the basic model;
and optimizing the trained basic model, and determining the evaluation model.
3. The method of claim 2, wherein optimizing the trained basic model and determining the evaluation model comprises at least one of:
optimizing and adjusting the parameters of the trained basic model through a model parameter tuning algorithm to determine the evaluation model;
splitting the training data and recombining it into different training data, and training the evaluation model again to determine the evaluation model;
and fusing a plurality of different models obtained from a plurality of training runs to determine the evaluation model.
4. The method of claim 3, wherein splitting the training data and recombining it into different training data, and training the evaluation model again, comprises:
splitting the training data by cross-validation and recombining it into different training data, and training the evaluation model again;
wherein the cross-validation comprises at least one of: simple cross-validation, S-fold cross-validation, and leave-one-out cross-validation.
5. The method of claim 3, wherein fusing the plurality of different models obtained from a plurality of training runs comprises at least one of:
fusing a plurality of different models in a weighted average mode;
fusing a plurality of different models in a weighted voting mode;
fusing a plurality of different models, serving as a plurality of primary learners, through a secondary learner.
6. The method of claim 1, wherein, after inputting the behavior data into the evaluation model and outputting the score of the behavior data by the evaluation model, the method further comprises:
determining a predictive scoring curve for the behavioral data;
and calibrating the prediction scoring curve according to the scoring curve of the historical behavior data, and determining the calibrated score of the behavior data.
7. The method of claim 1, wherein classifying the population according to the score comprises:
determining, according to the score and preset score levels, that the crowd belongs to the crowd category corresponding to the score level into which the score of the behavior data falls, wherein there are a plurality of preset score levels.
8. A crowd division device, comprising:
a receiving module, configured to receive behavior data of a crowd, wherein the behavior data is data on operations the crowd performs on delivered target information;
a scoring module, configured to input the behavior data into an evaluation model and output a score of the behavior data by the evaluation model, wherein the evaluation model is a machine learning model, the evaluation model is obtained by training a plurality of groups of training data, and each group of training data comprises historical behavior data and corresponding scores;
and a classification module, configured to classify the crowd according to the score.
9. A storage medium comprising a stored program, wherein the program, when executed, controls a device on which the storage medium is located to perform the crowd division method according to any one of claims 1 to 7.
10. A processor configured to run a program, wherein the program is configured to perform the crowd division method according to any one of claims 1 to 7 when running.
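As an illustration of the data splitting of claim 4, the weighted-average fusion of claim 5, and the score levels of claim 7, the following Python sketch uses only the standard library. It is offered under stated assumptions and is not the claimed implementation: the function names, the contiguous (unshuffled) folds, and the example boundaries are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, float]
Model = Callable[[Record], float]


def s_fold_splits(n_samples: int, s: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) for S-fold cross-validation.

    With s == n_samples every validation fold holds one sample, which is
    leave-one-out cross-validation. Folds are contiguous; shuffling the
    indices beforehand is left to the caller.
    """
    if not 2 <= s <= n_samples:
        raise ValueError("s must satisfy 2 <= s <= n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // s + (1 if i < n_samples % s else 0)
                  for i in range(s)]
    splits, start = [], 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits


def weighted_average_fusion(models: Sequence[Model],
                            weights: Sequence[float],
                            record: Record) -> float:
    """Fuse the scores of several models as a weighted average."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * m(record) for m, w in zip(models, weights)) / total


def score_level(score: float,
                boundaries: Sequence[float],
                categories: Sequence[str]) -> str:
    """Pick the category whose preset score level contains the score.

    boundaries must be ascending and len(categories) == len(boundaries) + 1;
    a score equal to a boundary falls into the higher level.
    """
    if len(categories) != len(boundaries) + 1:
        raise ValueError("need exactly one more category than boundaries")
    return categories[bisect_right(boundaries, score)]
```

A model would be retrained once per split returned by `s_fold_splits`, and the resulting models could then be passed to `weighted_average_fusion`; weighted voting or a secondary learner, the other options named in claim 5, are not shown here.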
CN202010383874.5A 2020-05-08 2020-05-08 Crowd division method and device Pending CN111563775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010383874.5A CN111563775A (en) 2020-05-08 2020-05-08 Crowd division method and device

Publications (1)

Publication Number Publication Date
CN111563775A true CN111563775A (en) 2020-08-21

Family

ID=72073343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010383874.5A Pending CN111563775A (en) 2020-05-08 2020-05-08 Crowd division method and device

Country Status (1)

Country Link
CN (1) CN111563775A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160300156A1 (en) * 2015-04-10 2016-10-13 Facebook, Inc. Machine learning model tracking platform
CN106412644A (en) * 2016-09-19 2017-02-15 北京喂呦科技有限公司 Accurate advertising method and system based on smart TV player
CN108230009A (en) * 2017-11-30 2018-06-29 北京三快在线科技有限公司 The Forecasting Methodology and device of a kind of user preference, electronic equipment
CN108416669A (en) * 2018-03-13 2018-08-17 腾讯科技(深圳)有限公司 User behavior data processing method, device, electronic equipment and computer-readable medium
CN108830645A (en) * 2018-05-31 2018-11-16 厦门快商通信息技术有限公司 A kind of visitor's attrition prediction method and system
US20190197361A1 (en) * 2017-12-27 2019-06-27 Marlabs Innovations Private Limited System and method for predicting and scoring a data model
CN110222762A (en) * 2019-06-04 2019-09-10 恒安嘉新(北京)科技股份公司 Object prediction method, apparatus, equipment and medium
CN110490625A (en) * 2018-05-11 2019-11-22 北京京东尚科信息技术有限公司 User preference determines method and device, electronic equipment, storage medium
WO2020022639A1 (en) * 2018-07-18 2020-01-30 한국과학기술정보연구원 Deep learning-based evaluation method and apparatus
CN110991875A (en) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Platform user quality evaluation system
CN111080397A (en) * 2019-11-18 2020-04-28 支付宝(杭州)信息技术有限公司 Credit evaluation method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
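Pan Shumin; Yan Na; Xie Jinkui: "Research on advertisement click-through rate prediction based on user similarity and feature differentiation", Computer Science (计算机科学), no. 02 *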

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781201A (en) * 2021-08-19 2021-12-10 支付宝(杭州)信息技术有限公司 Risk assessment method and device for electronic financial activity
CN113781201B (en) * 2021-08-19 2023-02-03 支付宝(杭州)信息技术有限公司 Risk assessment method and device for electronic financial activity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination