CN108280542B - User portrait model optimization method, medium and equipment

Info

Publication number: CN108280542B
Application number: CN201810035915.4A
Authority: CN
Inventors: 宋国庆; 罗伟东
Original assignee: Shenzhen Hexun Huagu Information Technology Co ltd
Current assignee: Shenzhen Hexun Huagu Information Technology Co ltd
Priority date: 2018-01-15
Filing date: 2018-01-15
Publication date: 2021-05-11
Anticipated expiration: 2038-01-15
Also published as: CN108280542A

Abstract

The invention provides a method, medium and equipment for optimizing a user portrait model. The method comprises the following steps: acquiring user behavior data; obtaining a first prediction result from the behavior data using a pre-established first prediction model; and training and optimizing a second prediction model according to the first prediction result and the behavior data. Because the first prediction result is added as a further type of input sample data for training and optimizing the second prediction model, the prediction accuracy of the second prediction model is improved. Moreover, because the second prediction model is trained and optimized using the prediction result of the first prediction model, the second prediction model can be trained and optimized automatically whenever the first prediction model changes, which saves time and reduces cost.

Description

User portrait model optimization method, medium and equipment

Technical Field

The invention relates to the field of big data machine learning, in particular to an optimization method, medium and equipment of a user portrait model.

Background

In the big-data context, targeted and categorized product advertisements need to be pushed according to labels such as a user's behavior, gender and age, so as to achieve fine-grained user portraits and precision marketing. As internet technology continues to advance, users receive all kinds of information at every moment and make different choices from one moment to the next, so when user data is used to build a user portrait, it is difficult to analyze and predict user behavior with a fixed model. A fixed model loses timeliness, which leads to lagging information and inaccurate prediction results, a poor marketing effect and a low advertising conversion rate. Manually modifying and updating the model each time incurs huge time and labor costs and still yields poor results.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides an optimization method, medium and equipment of a user portrait model, which can improve the accuracy of a prediction model, save the model optimization time and reduce the cost.

In a first aspect, the present invention provides a method for optimizing a user portrait model, including:

acquiring user behavior data;

obtaining a first prediction result based on a pre-established first prediction model according to the behavior data;

and training and optimizing a second prediction model according to the first prediction result and the behavior data.

Optionally, before the step of obtaining a first prediction result based on a pre-established first prediction model according to the behavior data, the method further includes:

acquiring sample data;

classifying the sample data;

performing data cleaning on the classified sample data to obtain characteristic sample data of the sample data;

and training a first prediction model according to the characteristic sample data.

Optionally, the classifying the sample data includes:

and classifying the sample data according to the behavior mode.

Optionally, before the step of training the first prediction model according to the feature sample data, the method further includes:

combining the characteristic sample data to obtain new characteristic sample data;

training a first prediction model according to the feature sample data, including:

and training a first prediction model according to the new characteristic sample data.

Optionally, after the step of combining the feature sample data to obtain new feature sample data, the method further includes:

carrying out normalization processing on the new characteristic sample data;

and training a first prediction model according to the new characteristic sample data after normalization processing.

Optionally, after the step of training the first prediction model, the method further includes:

acquiring test data;

calculating an accuracy score of the first predictive model from the test data;

judging whether the accuracy score is larger than a preset accuracy threshold value or not;

if so, executing the step of obtaining a first prediction result based on a pre-established first prediction model according to the behavior data;

if not, the step of training the first prediction model according to the characteristic sample data is executed again.

Optionally, before the step of training and optimizing the second prediction model according to the first prediction result and the behavior data, the method further includes:

obtaining a confidence level of the first prediction result based on the first prediction model;

judging whether the confidence level is smaller than a corresponding first threshold; if so, re-executing the step of training the first prediction model according to the characteristic sample data;

if not, outputting the first prediction result; judging whether the confidence coefficient is larger than a corresponding second threshold value;

if so, judging that the first prediction result can be used as a training feature; if not, judging that the first prediction result can not be used as a training feature;

training and optimizing a second prediction model according to the first prediction result and the behavior data, wherein the training and optimizing the second prediction model comprises the following steps:

and training and optimizing a second prediction model according to the first prediction result and the behavior data, wherein the confidence coefficient is larger than the second threshold value.

In a second aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for optimizing a user portrait model described above.

In a third aspect, the present invention provides an apparatus for optimizing a user portrait model, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for optimizing a user portrait model described above when executing the program.

The invention provides an optimization method of a user portrait model, which comprises the following steps: acquiring user behavior data; obtaining a first prediction result from the behavior data using a pre-established first prediction model; and training and optimizing a second prediction model according to the first prediction result and the behavior data. Because the first prediction result is added as a further type of input sample data for training and optimizing the second prediction model, the prediction accuracy of the second prediction model is improved. Moreover, because the second prediction model is trained and optimized using the prediction result of the first prediction model, the second prediction model can be trained and optimized automatically whenever the first prediction model changes, which saves time and reduces cost.

The invention provides a computer readable storage medium and a user portrait model optimization device, which have the same beneficial effects with the user portrait model optimization method based on the same inventive concept.

Drawings

In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.

FIG. 1 is a flow chart of a method for optimizing a user portrait model according to the present invention;

FIG. 2 is a schematic structural diagram of an optimizing apparatus for a user portrait model according to the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.

It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.

The invention provides a method, medium and equipment for optimizing a user portrait model. Embodiments of the present invention will be described below with reference to the drawings.

The first embodiment:

Referring to FIG. 1, which is a flow chart of an optimization method of a user portrait model according to an embodiment of the present invention, the optimization method of this embodiment includes:

Step S101: acquiring user behavior data.

Step S102: obtaining a first prediction result based on a pre-established first prediction model according to the behavior data.

Step S103: training and optimizing a second prediction model according to the first prediction result and the behavior data.

In this method, user behavior data is acquired, a first prediction result is obtained from a first prediction model, and a second prediction model is trained and optimized according to the first prediction result and the behavior data. Because the first prediction result is added as a further type of input sample data for training and optimizing the second prediction model, the prediction accuracy of the second prediction model is improved. Moreover, because the second prediction model is trained and optimized using the prediction result of the first prediction model, the second prediction model can be trained and optimized automatically whenever the first prediction model changes, which saves time and reduces cost.

In the invention, the first prediction model can be a plurality of prediction models or one prediction model; the second predictive model may be a plurality of predictive models or may be one predictive model. The number of the first prediction model and the second prediction model is not limited herein and is within the scope of the present invention.

For example, the first prediction results of the plurality of first prediction models may be used as training optimization data for the plurality of second prediction models.
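Steps S101 to S103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the feature name `shopping_app_opens`, the rule inside `toy_first_model` and the majority-vote "second model" are all hypothetical stand-ins chosen so the example runs with the standard library only.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List

Features = Dict[str, float]


def augment_with_first_prediction(
    behavior_rows: List[Features],
    first_model: Callable[[Features], float],
    feature_name: str = "first_prediction",
) -> List[Features]:
    """S102: run the first model on each behavior record and attach its
    output as an additional input feature for the second model."""
    augmented = []
    for row in behavior_rows:
        new_row = dict(row)  # copy so the raw behavior data is left untouched
        new_row[feature_name] = first_model(row)
        augmented.append(new_row)
    return augmented


def train_majority_by_feature(rows, labels, key="first_prediction"):
    """S103 stand-in: a toy 'second model' that stores the majority label
    observed for each value of the added feature."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[row[key]][label] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in votes.items()}


def toy_first_model(row: Features) -> float:
    # Hypothetical rule standing in for a trained first prediction model.
    return 1.0 if row["shopping_app_opens"] > 3 else 0.0


# S101: acquired behavior data (made-up values for illustration).
behavior = [
    {"shopping_app_opens": 5.0},
    {"shopping_app_opens": 1.0},
    {"shopping_app_opens": 7.0},
]
labels = ["fashion", "sports", "fashion"]

augmented = augment_with_first_prediction(behavior, toy_first_model)
second_model = train_majority_by_feature(augmented, labels)
```

Because the second model is rebuilt from whatever the first model currently outputs, rerunning these two calls after the first model changes retrains the second model without manual intervention.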

In a specific embodiment of the present invention, before the step of obtaining a first prediction result based on a pre-established first prediction model according to the behavior data, the method further includes: acquiring sample data; classifying the sample data; performing data cleaning on the classified sample data to obtain characteristic sample data of the sample data; and training a first prediction model according to the characteristic sample data.

Before the first prediction result is obtained by using the first prediction model, the method further includes: a first predictive model is trained.

The process of training the first predictive model is as follows:

First, sample data is acquired; the sample data includes input sample data and output sample data. The sample data may be user behavior data, e.g. which APPs are installed, which APPs are opened, which places are visited, the house prices of the user's residential community, etc.

Second, the sample data is classified. During classification, the sample data can be classified according to behavior pattern. For example, the sample data may be categorized by online and offline behavior, with users who purchase goods online placed in one category and users who purchase goods offline placed in another.

Third, data cleaning is performed on the classified sample data to obtain characteristic sample data. Data cleaning includes deleting null values, deleting erroneous data, and the like.

Fourth, the first prediction model is trained according to the characteristic sample data. The prediction model may be trained with a machine learning algorithm, for example a deep learning algorithm, a random forest algorithm, and the like, all of which are suitable for the invention.

When the random forest algorithm is used for training the model, parameter searching can be performed according to data dimensionality, and a plurality of models with different parameters can be trained simultaneously.
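The idea of deriving a parameter grid from the data dimensionality and training several candidate models at once can be sketched as below. To stay self-contained, this sketch swaps the random forest for a one-feature threshold rule; the function names and the thread pool are assumptions made for illustration, not details given in the source.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_score(params, samples, labels):
    """Stand-in for training one candidate model: a single-feature
    threshold rule. Returns (accuracy, params)."""
    feature, threshold = params
    predictions = [1 if row[feature] > threshold else 0 for row in samples]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels), params


def parameter_search(samples, labels, thresholds):
    """Build the grid from the data dimensionality, evaluate all candidates
    concurrently, and return the best (accuracy, params) pair."""
    n_features = len(samples[0])
    grid = list(product(range(n_features), thresholds))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda p: train_and_score(p, samples, labels), grid))
    return max(results, key=lambda result: result[0])


# Made-up two-dimensional data: only feature 0 separates the classes.
samples = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2]]
labels = [0, 0, 1, 1]

best_score, best_params = parameter_search(samples, labels, [0.25, 0.5, 0.75])
```

With a real random forest the grid would hold hyperparameters such as tree count or depth instead of `(feature, threshold)` pairs, but the search loop would have the same shape.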

Classifying the sample data avoids training on unnecessary sample data and saves computation. Cleaning the sample data keeps erroneous data, null data and the like out of the input and thereby improves the accuracy of the prediction model.
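The classification and cleaning steps might look like the following sketch. The field names (`user_id`, `channel`, `amount`) and the rule that a negative amount counts as erroneous data are assumptions introduced for the example; the source only says that null values and error data are deleted.

```python
from collections import defaultdict


def classify_by_behavior(records, key="channel"):
    """Group sample records by behavior pattern (here: online vs offline)."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(key)].append(record)
    return dict(groups)


def clean(records, required=("user_id", "amount")):
    """Drop records with null required fields or erroneous values."""
    cleaned = []
    for record in records:
        if any(record.get(field) is None for field in required):
            continue  # delete null values
        amount = record["amount"]
        if not isinstance(amount, (int, float)) or amount < 0:
            continue  # delete erroneous data (assumed rule: non-negative numbers)
        cleaned.append(record)
    return cleaned


# Made-up purchase records for illustration.
records = [
    {"user_id": 1, "channel": "online", "amount": 30.0},
    {"user_id": 2, "channel": "offline", "amount": None},
    {"user_id": 3, "channel": "online", "amount": -5.0},
    {"user_id": 4, "channel": "offline", "amount": 12.5},
]

groups = classify_by_behavior(records)
feature_samples = {channel: clean(rows) for channel, rows in groups.items()}
```

Only one record per channel survives cleaning in this example, so a model trained per category sees neither the null nor the negative amount.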

In an embodiment of the present invention, before the step of training the first prediction model according to the feature sample data, the method may further include: combining the characteristic sample data to obtain new characteristic sample data; training a first prediction model according to the feature sample data, including: and training a first prediction model according to the new characteristic sample data.

After the step of obtaining the feature sample data and before the step of training the first prediction model according to the feature sample data, the method may further include: and combining the characteristic sample data to obtain new characteristic sample data.

When the feature sample data is combined, the correlations among the features can be analyzed, the various features combined, multi-layer linear transformations applied and the feature space rotated, so that more valuable features are obtained. For example, a user may have many APPs installed, several of which belong to the same category: ofo, Mobike, Bluegogo and the like are all bike-sharing APPs. Although they are different APPs, they carry the same meaning, namely that they are installed by people who have a riding need. The invention can therefore combine and generalize features such as "ofo installed", "Mobike installed" and "Bluegogo installed" into a new feature, "installed by a person with a riding need". The same approach can be applied to financial APPs, which is also within the protection scope of the invention.

By combining the feature sample data, more new features can be obtained, and the prediction model is obtained by training the new feature sets, so that the accuracy and the real-time performance of the prediction model can be improved, the time and the labor cost can be obviously reduced, correct guidance is provided for operation and advertisement putting, and the operation and advertisement putting effect is greatly improved.
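A minimal sketch of the grouping part of feature combination, using the bike-sharing example (ofo, Mobike and Bluegogo are the names assumed here). It covers only the merging of same-meaning install features into one; the linear transformations and rotation of the feature space mentioned in the text are not shown. The group names and the finance APP identifier are hypothetical.

```python
# Hypothetical category definitions; only the bike-sharing group comes from the text.
APP_GROUPS = {
    "has_riding_need": {"ofo", "Mobike", "Bluegogo"},
    "uses_finance_apps": {"hypothetical_bank_app"},
}


def combine_app_features(installed_apps, groups=APP_GROUPS):
    """Collapse per-APP install flags into one binary feature per category:
    1 if the user has installed any APP in the category, else 0."""
    return {
        name: int(any(app in installed_apps for app in apps))
        for name, apps in groups.items()
    }


combined = combine_app_features({"ofo", "some_game"})
```

A user who installed any one of the bike-sharing APPs gets the same `has_riding_need` value, so the model no longer has to learn three separate features that mean the same thing.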

In a specific embodiment provided by the present invention, after the step of combining the feature sample data to obtain new feature sample data, the method may further include: carrying out normalization processing on the new characteristic sample data; training a first prediction model according to the feature sample data, including: and training a first prediction model according to the new characteristic sample data after normalization processing.

After the feature sample data is combined to obtain new feature sample data, the value ranges of the features may span widely. Each feature is therefore normalized using min-max normalization, $x' = \dfrac{x - \min(x)}{\max(x) - \min(x)}$, so that the values in each feature column are scaled to the range 0 to 1, which increases the calculation speed.

For example, consider two features: A, a wealthy user lives in a residence priced at 150,000 yuan per square meter; B, another user rents a home for only 200 yuan per month. The gap between the two values is so large that the prediction model would converge extremely slowly in subsequent calculation, harming efficiency; normalizing all the features improves the calculation speed.
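Min-max normalization of one feature column can be sketched as follows. The sample values other than 150,000 and 200 are made up, and returning zeros for a constant column is an assumed convention, since the source does not say how a zero range is handled.

```python
def min_max_normalize(values):
    """Scale a feature column to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the formula would divide by zero, so map to 0.0
        # (assumed convention, not specified in the source).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


house_price = [150000.0, 80000.0, 10000.0]  # yuan per square meter
monthly_rent = [200.0, 1100.0, 2000.0]      # yuan per month

normalized_price = min_max_normalize(house_price)
normalized_rent = min_max_normalize(monthly_rent)
```

After scaling, both columns lie in the same 0 to 1 range even though their raw values differ by orders of magnitude.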

In an embodiment of the present invention, after the step of training the first prediction model, the method may further include: acquiring test data; calculating an accuracy score of the first predictive model from the test data; judging whether the accuracy score is larger than a preset accuracy threshold value or not; if so, executing the step of obtaining a first prediction result based on a pre-established first prediction model according to the behavior data; if not, the step of training the first prediction model according to the characteristic sample data is executed again.

When the first prediction model is a binary classification model and is tested with the test data, the prediction result of each binary classification model is compared with the real data in the test data, and the quality of the model is evaluated by calculating the F1 score: $F_1 = \dfrac{2TP}{2TP + FP + FN}$. A higher F1 value indicates a better and more accurate model; the F1 value is the accuracy score of the binary classification model. Here, TP is the number of samples that are actually positive and predicted positive; FP is the number of samples that are actually negative and predicted positive; and FN is the number of samples that are actually positive and predicted negative.
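The F1 calculation can be written directly from the counts. In this sketch FN is taken in its standard sense (actually positive, predicted negative), and returning 0.0 when there are no positives at all is an assumed convention.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 = 2TP / (2TP + FP + FN) for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    # No true or predicted positives: F1 is undefined, report 0.0 (assumption).
    return 2 * tp / denominator if denominator else 0.0


# Made-up labels: TP = 3, FP = 1, FN = 1.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]

score = f1_score(y_true, y_pred)  # 2*3 / (2*3 + 1 + 1) = 0.75
```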

When the first prediction model is a multi-classification model, the accuracy score is the accuracy rate: the accuracy of the prediction model on a randomly sampled test can be used as the accuracy score of the multi-classification model.

The present invention is applicable to any prediction model, such as an incorporatability prediction model, a gender prediction model, an age prediction model, etc., but the accuracy threshold of each model is different, which is set according to the specific model.

When the calculated accuracy score is greater than the preset accuracy threshold, the first prediction model is usable, and the behavior data can be input into it to obtain the first prediction result. When the calculated accuracy score is not greater than the preset accuracy threshold, the first prediction model is not usable and must be trained again according to the characteristic sample data until the calculated accuracy score is greater than the accuracy threshold.
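The accept-or-retrain loop can be sketched as below. The `max_rounds` cap is an assumption added so the sketch always terminates; the source simply retrains until the threshold is exceeded. `train_fn` and `score_fn` are hypothetical callables supplied by the caller.

```python
def train_until_accurate(train_fn, score_fn, threshold, max_rounds=10):
    """Retrain the first model until its accuracy score exceeds `threshold`.

    Returns (model, score, rounds_used). Raises if no round passes within
    `max_rounds` (a safety cap assumed for this sketch).
    """
    for round_index in range(1, max_rounds + 1):
        model = train_fn(round_index)
        score = score_fn(model)
        if score > threshold:  # strictly greater, as in the text
            return model, score, round_index
    raise RuntimeError("accuracy threshold not reached within max_rounds")


# Toy stand-ins: each retraining round yields a better-scoring model.
round_scores = [0.70, 0.80, 0.90]
model, score, rounds = train_until_accurate(
    train_fn=lambda r: {"round": r},
    score_fn=lambda m: round_scores[m["round"] - 1],
    threshold=0.85,
)
```

Since each model type has its own threshold, the same loop can be reused with a different `threshold` and `score_fn` for, say, a gender model and an age model.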

In the invention, when the first prediction model is tested using the test data, it can be tested by sampling, so that the accuracy score is reliable.
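One way to read "testing by sampling" for a multi-classification model is to measure accuracy on a random subset of the test data, as in this sketch. The fixed seed, the age-bracket model and the sample values are assumptions for illustration only.

```python
import random


def sampled_accuracy(model, test_rows, test_labels, sample_size, seed=0):
    """Accuracy of `model` on a random sample of the test data."""
    if not test_rows:
        raise ValueError("test data must not be empty")
    rng = random.Random(seed)  # seeded so repeated runs pick the same sample
    size = min(sample_size, len(test_rows))
    indices = rng.sample(range(len(test_rows)), size)
    correct = sum(model(test_rows[i]) == test_labels[i] for i in indices)
    return correct / size


def age_bracket(age):
    # Hypothetical three-class model standing in for a trained classifier.
    if age < 30:
        return "young"
    return "middle" if age < 50 else "senior"


ages = [22, 35, 61, 28, 47]
true_brackets = ["young", "middle", "senior", "middle", "middle"]  # one mismatch (28)

full_accuracy = sampled_accuracy(age_bracket, ages, true_brackets, sample_size=5)
partial_accuracy = sampled_accuracy(age_bracket, ages, true_brackets, sample_size=3)
```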

By calculating the accuracy value of the first prediction model and judging the quality of the prediction model according to the accuracy value, a more accurate prediction result can be obtained when the prediction model is used.

In a specific embodiment of the present invention, before the step of training and optimizing the second prediction model according to the first prediction result and the behavior data, the method further includes: obtaining a confidence level of the first prediction result based on the first prediction model; judging whether the confidence level is smaller than a corresponding first threshold; if so, re-executing the step of training the first prediction model according to the characteristic sample data; if not, outputting the first prediction result; judging whether the confidence level is larger than a corresponding second threshold; if so, judging that the first prediction result can be used as a training feature; if not, judging that the first prediction result cannot be used as a training feature. Training and optimizing the second prediction model according to the first prediction result and the behavior data then includes: training and optimizing the second prediction model according to the behavior data and the first prediction result whose confidence level is larger than the second threshold.

After the first prediction result is obtained, the second prediction model can be trained and optimized by using the first prediction result and the behavior data of the user, so that the second prediction model can be trained and optimized by adopting more characteristics, and the real-time performance and the accuracy of the second prediction model can be improved.

Before training and optimizing the second prediction model, the method further comprises the following steps: based on the first prediction model, a confidence level of the first prediction result is obtained. When each prediction result is output, the confidence of the prediction result can be output. Confidence may be used to indicate the accuracy of each prediction, with higher confidence indicating more accurate prediction.

When the confidence level is smaller than the first threshold, the prediction result is unreliable and the first prediction model needs to be retrained; otherwise the result can be used, for example to guide advertisement promotion. A usable prediction result, however, does not necessarily qualify as training or optimization sample data for the second prediction model: when its confidence level is larger than the second threshold, the prediction result is judged usable as a training feature; when its confidence level is not larger than the second threshold, it cannot be used as a training feature.

If the prediction result can be used as a training feature of the second prediction model, then the second prediction model can be trained and optimized according to the prediction result and the behavior data. In this way, by ensuring the accuracy of the training data, the accuracy of the second prediction model can be improved.
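The two-threshold decision can be sketched as a small function. The enum names and the threshold values 0.6 and 0.9 are hypothetical, and the check that the first threshold does not exceed the second is an assumption implied by, but not stated in, the text.

```python
from enum import Enum


class Decision(Enum):
    RETRAIN_FIRST_MODEL = "retrain"
    OUTPUT_ONLY = "output_only"
    USE_AS_TRAINING_FEATURE = "training_feature"


def gate_prediction(confidence, first_threshold, second_threshold):
    """Decide what to do with a first-model prediction given its confidence."""
    if first_threshold > second_threshold:
        raise ValueError("first_threshold is assumed not to exceed second_threshold")
    if confidence < first_threshold:
        return Decision.RETRAIN_FIRST_MODEL      # result is unreliable
    if confidence > second_threshold:
        return Decision.USE_AS_TRAINING_FEATURE  # output and feed to second model
    return Decision.OUTPUT_ONLY                  # usable, but not as a training feature


# Hypothetical thresholds for illustration.
FIRST, SECOND = 0.6, 0.9
decisions = [gate_prediction(c, FIRST, SECOND) for c in (0.4, 0.75, 0.95)]
```

Only predictions in the last category are passed on, which keeps low-confidence outputs of the first model out of the second model's training data.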

For example, suppose the collected user behavior data contains no gender feature, but the first prediction model can predict gender, and verification of the first prediction model ensures the accuracy of its prediction results. The gender prediction result of the first prediction model can then be used to train and optimize the second prediction model. If the second prediction model predicts interest and preference features, adding the gender feature to its training and optimization can improve the accuracy of the interest and preference predictions.

By the method, a plurality of abundant characteristics can be obtained, and along with the improvement of the accuracy of a certain model, the accuracy of other models can be improved, so that other models can be optimized in real time.

The above is a method for optimizing a user portrait model according to the present invention.

Second embodiment:

The first embodiment provides an optimization method for a user portrait model. In combination with the first embodiment, a second embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the optimization method for a user portrait model provided by the first embodiment.

The third embodiment:

In combination with the method for optimizing a user portrait model provided by the first embodiment, the present invention further provides an apparatus for optimizing a user portrait model, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for optimizing a user portrait model provided by the first embodiment when executing the program. FIG. 2 is a schematic diagram illustrating a hardware structure of an optimization apparatus for a user portrait model according to an embodiment of the present invention.

Specifically, the processor 201 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more Integrated circuits implementing the embodiments of the present invention.

Memory 202 may include mass storage for data or instructions. By way of example, and not limitation, memory 202 may include a Hard Disk Drive (HDD), a floppy Disk Drive, flash memory, an optical Disk, a magneto-optical Disk, tape, or a Universal Serial Bus (USB) Drive or a combination of two or more of these. Memory 202 may include removable or non-removable (or fixed) media, where appropriate. The memory 202 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 202 is a non-volatile solid-state memory. In a particular embodiment, the memory 202 includes Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), electrically rewritable ROM (EAROM), or flash memory or a combination of two or more of these.

The processor 201 may implement any of the above-described embodiments of the method for optimizing a user portrait model by reading and executing computer program instructions stored in the memory 202.

In one example, the optimization apparatus for the user portrait model may also include a communication interface 203 and a bus 210. As shown in FIG. 2, the processor 201, the memory 202, and the communication interface 203 are connected via the bus 210 and communicate with one another over it.

The communication interface 203 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present invention.

Bus 210 includes hardware, software, or both to couple the components of the optimization device of the user representation model to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a Hypertransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an infiniband interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a video electronics standards association local (VLB) bus, or other suitable bus or a combination of two or more of these. Bus 210 may include one or more buses, where appropriate. Although specific buses have been described and shown in the embodiments of the invention, any suitable buses or interconnects are contemplated by the invention.

It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.

The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

1. A method for optimizing a user portrait model, comprising:

acquiring user behavior data;

training and optimizing a second prediction model according to the first prediction result and the behavior data;

before the step of obtaining a first prediction result based on a first pre-established prediction model according to the behavior data, the method further includes:

acquiring sample data;

classifying the sample data;

training a first prediction model according to the characteristic sample data;

before the step of training the first prediction model according to the feature sample data, the method further comprises:

training a first prediction model according to the new characteristic sample data;

when the feature sample data is combined, the relevance among the features is analyzed, various features are combined, multi-layer linear transformation is carried out, and the feature space is rotated;

after the step of combining the feature sample data to obtain new feature sample data, the method further includes: carrying out normalization processing on the new characteristic sample data; training a first prediction model according to the feature sample data, including: training a first prediction model according to the new characteristic sample data after normalization processing;

after the step of training the first prediction model, the method further comprises:

acquiring test data;

calculating an accuracy score of the first predictive model from the test data;

if not, re-executing the step of training the first prediction model according to the characteristic sample data;

before the step of training and optimizing a second prediction model according to the first prediction result and the behavior data, the method further comprises the following steps:

2. The method of claim 1, wherein said classifying said sample data comprises:

and classifying the sample data according to the behavior mode.

3. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method of any one of claims 1 to 2.

4. An apparatus for optimizing a user portrait model, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 2 when executing the program.