CN114693321A

CN114693321A - Portrait label prediction method, device, equipment and storage medium

Info

Publication number: CN114693321A
Application number: CN202011566723.XA
Authority: CN
Inventors: 张孟旭; 刘启明
Original assignee: Beijing Qianli Richeng Technology Co ltd
Current assignee: Beijing Qianli Richeng Technology Co ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2022-07-01

Abstract

The invention provides a portrait label prediction method, apparatus, device and storage medium. Advertisement monitoring data obtained by monitoring advertisements on a device to be predicted is acquired; target features in the advertisement monitoring data are extracted and converted based on Spark ML feature engineering to obtain a test set; the test set is input into a first user attribute prediction model and a second user attribute prediction model, each trained in advance as a random forest model on a different user attribute; and the resulting first and second user attribute prediction results are merged to serve as the portrait label of the device to be predicted. Because the user attribute prediction models are trained in advance from user attributes with a random forest model and can predict on massive data, and because feature processing is done with Spark ML feature engineering, accurate portrait labels can be obtained while massive data is processed.

Description

Portrait label prediction method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of data processing, in particular to a portrait label prediction method, a portrait label prediction device, portrait label prediction equipment and a storage medium.

Background

At present, under a scene of analyzing a user, a portrait label needs to be predicted for the user, so that the portrait label is utilized to accurately complete the analysis of the user.

In the prior art, portrait labels are predicted by machine learning, mainly in one of two ways: with the single-machine scikit-learn package, or with the classification or clustering models provided by Hadoop Mahout. In practice, targeted advertisement delivery based on portrait labels involves massive data. The single-machine scikit-learn package cannot meet the requirements of big-data computation. The classification or clustering models provided by Hadoop Mahout can process massive data, but the available algorithms are limited, so performance is relatively low on massive data and portrait labels cannot be predicted accurately.

Therefore, when portrait labels are predicted with the prior-art approaches, either massive data cannot be processed or the predicted portrait labels are inaccurate.

Disclosure of Invention

In view of the above, embodiments of the present invention provide a portrait label prediction method, apparatus, device and storage medium, which can both process massive data and predict portrait labels accurately.

In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

In a first aspect, a portrait label prediction method includes: acquiring advertisement monitoring data obtained by monitoring advertisements on a device to be predicted, wherein the advertisement monitoring data at least includes device data; extracting target features from the advertisement monitoring data based on Spark ML feature engineering, and converting the target features based on a preset format to obtain a test set; inputting the test set into a pre-constructed first user attribute prediction model and a pre-constructed second user attribute prediction model respectively for prediction, to obtain a first user attribute prediction result and a second user attribute prediction result, wherein the first user attribute prediction model is constructed by training a random forest model with a first user attribute, and the second user attribute prediction model is constructed by training a random forest model with a second user attribute; and merging the first user attribute prediction result and the second user attribute prediction result to serve as the portrait label of the device to be predicted.

In a second aspect, a portrait label prediction apparatus includes: an acquisition module, configured to acquire advertisement monitoring data obtained by monitoring advertisements on a device to be predicted, wherein the advertisement monitoring data at least includes device data; a feature processing module, configured to extract target features from the advertisement monitoring data based on Spark ML feature engineering, and convert the target features based on a preset format to obtain a test set; a prediction module, configured to input the test set into a pre-constructed first user attribute prediction model and a pre-constructed second user attribute prediction model respectively for prediction, to obtain a first user attribute prediction result and a second user attribute prediction result, wherein the first user attribute prediction model is constructed by training a random forest model with a first user attribute, and the second user attribute prediction model is constructed by training a random forest model with a second user attribute; and a merging module, configured to merge the first user attribute prediction result and the second user attribute prediction result to serve as the portrait label of the device to be predicted.

In a third aspect, a storage medium includes a stored program, where the program is executed to control a device in which the storage medium is located to execute the above-mentioned portrait label prediction method.

In a fourth aspect, an electronic device includes at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to invoke a program in the memory, the program being at least configured to implement the portrait label prediction method described above.

With the portrait label prediction method, apparatus, device and storage medium described above, advertisement monitoring data obtained by monitoring advertisements on a device to be predicted is acquired, wherein the advertisement monitoring data at least includes device data; target features are extracted from the advertisement monitoring data based on Spark ML feature engineering and converted based on a preset format to obtain a test set; the test set is input into a pre-constructed first user attribute prediction model and a pre-constructed second user attribute prediction model respectively for prediction, to obtain a first user attribute prediction result and a second user attribute prediction result, wherein the first user attribute prediction model is constructed by training a random forest model with a first user attribute, and the second user attribute prediction model is constructed by training a random forest model with a second user attribute; and the first user attribute prediction result and the second user attribute prediction result are merged to serve as the portrait label of the device to be predicted. In the invention, user attribute prediction models capable of predicting on massive data are trained in advance from user attributes with a random forest model; features are extracted from the advertisement monitoring data of the device to be predicted with Spark ML feature engineering and converted into a test set; and the test set is predicted with the user attribute prediction models, so that accurate portrait labels are obtained while massive data is processed.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a portrait label prediction method according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart illustrating a process of constructing a portrait label prediction model according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart illustrating the addition of data to a predictive model according to an embodiment of the present invention;

FIG. 4 is a block diagram of a portrait label prediction apparatus according to an embodiment of the present invention;

FIG. 5 is a block diagram of a data processing apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

As noted in the background, targeted advertisement delivery based on portrait labels involves massive data. The single-machine scikit-learn machine learning package cannot meet the requirements of big-data computation, while the classification or clustering models provided by Hadoop Mahout can process massive data but offer a limited set of algorithms, so performance is relatively low on massive data and portrait labels cannot be predicted accurately.

Therefore, in the embodiments of the present invention, user attribute prediction models capable of predicting on massive data are trained in advance from user attributes with a random forest model, and portrait label prediction is completed in combination with Spark ML feature engineering, so that accurate portrait labels are obtained while massive data is processed.

Referring to FIG. 1, a flowchart of a portrait label prediction method according to an embodiment of the present invention is shown. The portrait label prediction method includes the following steps:

Step S101: acquiring advertisement monitoring data obtained by monitoring advertisements on the device to be predicted.

In step S101, the advertisement monitoring data at least includes device data, and may further include advertisement exposure data and user behavior data of clicking on an advertisement. In the embodiment of the present invention, the advertisement exposure data and the user behavior data of clicking on the advertisement refer to the number of times or frequency of the user browsing and clicking on the advertisement. Device data includes, but is not limited to, device ID, device brand, and device type.

In step S101, the device to be predicted may be any of the various devices, terminals or terminal systems involved in advertisement delivery. Advertisement monitoring data is obtained by monitoring advertisements on these devices, terminals or terminal systems.

In the specific implementation process of step S101, advertisement monitoring data obtained when the advertisement monitoring system monitors advertisements for the device to be predicted in real time is received.

Step S102: extracting target features from the advertisement monitoring data based on Spark ML feature engineering, and converting the target features based on a preset format to obtain a test set.

In step S102, Spark ML feature engineering is an important part of the Spark MLlib machine learning workflow. Performing feature processing on the advertisement monitoring data with Spark ML feature engineering makes it possible to handle massive data and improves computing performance on such data.

The target features in the advertisement monitoring data may be set by a technician and include, but are not limited to, the device ID, device brand and device type of the device to be predicted, as well as feature information such as the time and geographic location at which the advertisement monitoring data was generated.

In the specific implementation of step S102, the target features to be extracted are determined, Spark ML feature engineering is used to extract them from the advertisement monitoring data, and the extracted target features are converted into data on which prediction can subsequently be performed. In general, a technician may determine in advance which feature conversion method to use according to the requirements of the subsequent prediction, so as to obtain features in the prediction format; the converted target features are then collected to obtain the test set.

Optionally, when converting the target features, a correspondence between target features and numbers may be constructed in advance; each target feature is converted into its corresponding number, and the numbers are collected to obtain the test set.

Table 1 shows the correspondence between target features and numbers.

Table 1:

Table 1 above is only an example; the specific numbers can be set by a technician.
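The feature-to-number conversion described above can be sketched in plain Python. This is only an illustration under stated assumptions: the column names (`device_brand`, `device_type`, `geo_location`), the sample records and the helper names `build_index` and `encode` are hypothetical, and numbers are assigned in order of first appearance rather than taken from Table 1. In a Spark ML pipeline a transformer such as `StringIndexer` plays a comparable role.

```python
from typing import Dict, List

# Hypothetical feature columns; the text names device brand, device type
# and geographic location as examples of target features.
FEATURE_COLUMNS = ["device_brand", "device_type", "geo_location"]


def build_index(records: List[dict], columns: List[str]) -> Dict[str, Dict[str, int]]:
    """Give each distinct value of each column a number, in order of first appearance."""
    index: Dict[str, Dict[str, int]] = {col: {} for col in columns}
    for record in records:
        for col in columns:
            # len(...) is evaluated before insertion, so numbering starts at 0.
            index[col].setdefault(record[col], len(index[col]))
    return index


def encode(record: dict, index: Dict[str, Dict[str, int]], unknown: int = -1) -> List[int]:
    """Convert one record into a numeric feature vector; unseen values map to `unknown`."""
    return [index[col].get(record[col], unknown) for col in index]


# Made-up advertisement monitoring records, for illustration only.
monitoring_data = [
    {"device_id": "d1", "device_brand": "BrandA", "device_type": "phone", "geo_location": "Beijing"},
    {"device_id": "d2", "device_brand": "BrandB", "device_type": "tablet", "geo_location": "Shanghai"},
    {"device_id": "d3", "device_brand": "BrandA", "device_type": "phone", "geo_location": "Shanghai"},
]

feature_index = build_index(monitoring_data, FEATURE_COLUMNS)
test_set = {r["device_id"]: encode(r, feature_index) for r in monitoring_data}
```

Keeping the correspondence in one place (`feature_index` here) matters because the same mapping has to be applied to both the training data and the data to be predicted.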

Step S103: inputting the test set into the pre-constructed first user attribute prediction model and second user attribute prediction model respectively for prediction, to obtain a first user attribute prediction result and a second user attribute prediction result.

In step S103, the first user attribute prediction model is constructed by training a random forest model with the first user attribute, and the second user attribute prediction model is constructed by training a random forest model with the second user attribute. The first user attribute and the second user attribute denote different user attributes. User attributes include, but are not limited to, user age, user gender, user education level, user nationality, and the like.

In the specific implementation of step S103, the first user attribute prediction model, constructed from the first user attribute and the random forest model provided by Spark MLlib machine learning, predicts on the input test set to obtain the first user attribute prediction result.

Similarly, the second user attribute prediction model, constructed from the second user attribute and the random forest model provided by Spark MLlib machine learning, predicts on the input test set to obtain the second user attribute prediction result.

It should be noted that if the first user attribute is user gender, the first user attribute prediction model is constructed from user gender and the binary-classification random forest model provided by Spark MLlib machine learning. If the second user attribute is user age, the second user attribute prediction model is constructed from user age and the multi-classification random forest model provided by Spark MLlib machine learning.

Step S104: merging the first user attribute prediction result and the second user attribute prediction result to serve as the portrait label of the device to be predicted.

In the specific implementation of step S104, the first user attribute prediction result and the second user attribute prediction result are combined into a set, and the resulting set is used as the portrait label of the device to be predicted.
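As a rough illustration of the merging in step S104, the sketch below combines two per-device prediction results into one portrait label per device. The function name `merge_predictions`, the attribute names `gender` and `age`, and the sample predictions are assumptions made for this example; the text does not prescribe a data structure for the merged label.

```python
from typing import Dict


def merge_predictions(
    first: Dict[str, str],
    second: Dict[str, str],
    first_name: str = "gender",
    second_name: str = "age",
) -> Dict[str, Dict[str, str]]:
    """Combine two per-device prediction results into one portrait label per device."""
    labels: Dict[str, Dict[str, str]] = {}
    # Only devices that have both predictions receive a merged label.
    for device_id in first.keys() & second.keys():
        labels[device_id] = {
            first_name: first[device_id],
            second_name: second[device_id],
        }
    return labels


# Made-up prediction results keyed by device ID, for illustration only.
gender_predictions = {"d1": "female", "d2": "male"}
age_predictions = {"d1": "25-34", "d2": "18-24"}

portrait_labels = merge_predictions(gender_predictions, age_predictions)
```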

In the portrait label prediction method provided by the embodiment of the invention, advertisement monitoring data obtained by monitoring advertisements on the device to be predicted is acquired; target features in the advertisement monitoring data are extracted and converted based on Spark ML feature engineering to obtain a test set; the test set is input into the first user attribute prediction model and the second user attribute prediction model, each trained in advance as a random forest model on a different user attribute; and the resulting first and second user attribute prediction results are merged to serve as the portrait label of the device to be predicted. In the embodiment of the invention, user attribute prediction models capable of predicting on massive data are trained in advance from user attributes with a random forest model; features are extracted from the advertisement monitoring data of the device to be predicted with Spark ML feature engineering and converted into a test set; and the test set is predicted with the user attribute prediction models, so that accurate portrait labels are obtained while massive data is processed.

The portrait label prediction method provided by the embodiment of the present invention relies, in step S103, on the pre-constructed first and second user attribute prediction models. FIG. 2 is a schematic flowchart of constructing the first user attribute prediction model and the second user attribute prediction model in advance according to an embodiment of the present invention, which mainly includes the following steps:

Step S201: acquiring advertisement monitoring data of a sample device.

In step S201, the sample device is a device for which portrait tags are known.

The advertisement monitoring data at least comprises equipment data, advertisement exposure data and user behavior data of clicking advertisements.

The device data includes, but is not limited to, the device ID, the device brand, and the device type of the sample device.

In the process of implementing step S201 specifically, advertisement monitoring data of a plurality of sample devices monitored by the advertisement monitoring system is obtained.

Step S202: associating, according to the device ID of the sample device, the advertisement monitoring data with the sample library data having the same device ID, to obtain original data of the sample device.

In step S202, the sample library is used to store user attributes, device information, and the like. Alternatively, it may be a database of the upper numbers.

In the sample library, each sample library data corresponds to a device ID to indicate its origin. Each sample library data includes at least a first user attribute and a second user attribute.

In the embodiment of the present invention, the first user attribute and the second user attribute are only used for distinguishing different user attributes, and the number of the sample library data attributes is not limited.

In the process of implementing step S202 specifically, for each sample device, sample library data that is the same as the device ID of the sample device is searched, and the sample library data is associated with advertisement monitoring data of the sample device having the same device ID, so as to obtain original data of the sample device. If there are multiple sample devices, raw data associated with the multiple sample devices is obtained.

For example, original data obtained by associating sample data corresponding to three sample devices with advertisement monitoring data of sample devices having the same device ID may be represented in a table manner, as shown in table 2:

Table 2:

Table 2 above is only an example.
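The association in step S202 is essentially a join on device ID. The following is a minimal sketch under assumed field names (`device_id`, `device_brand`, `gender`, `age`) and made-up records; the helper `associate` is hypothetical, and on massive data the same operation would normally be a distributed join rather than a Python loop.

```python
from typing import Dict, List


def associate(monitoring_rows: List[dict], sample_library: Dict[str, dict]) -> List[dict]:
    """Join monitoring rows with the sample library entry that has the same device ID."""
    original_data = []
    for row in monitoring_rows:
        attributes = sample_library.get(row["device_id"])
        if attributes is not None:  # devices absent from the sample library are skipped
            original_data.append({**row, **attributes})
    return original_data


# Made-up sample library keyed by device ID: known user attributes per device.
sample_library = {
    "d1": {"gender": "female", "age": "25-34"},
    "d2": {"gender": "male", "age": "18-24"},
}

# Made-up advertisement monitoring data; "d9" has no sample library entry.
monitoring_rows = [
    {"device_id": "d1", "device_brand": "BrandA", "device_type": "phone"},
    {"device_id": "d2", "device_brand": "BrandB", "device_type": "tablet"},
    {"device_id": "d9", "device_brand": "BrandC", "device_type": "phone"},
]

original_data = associate(monitoring_rows, sample_library)
```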

Step S203: extracting target features from the original data based on Spark ML feature engineering, and converting the target features based on a preset format to obtain a sample data set.

In step S203, the target features at least comprise device data of the sample device, each of the target features being associated with the first and second user attributes.

In the specific implementation of step S203, taking the original data shown in Table 2 as an example, the device brand, device type and geographic location may optionally be extracted from the original data based on Spark ML feature engineering, and these target features are converted according to the correspondence between target features and numbers shown in Table 1, to obtain the sample data set.

The sample data set can also be represented in the manner of table 3.

Table 3:

Step S204: dividing the sample data set into a training set and a validation set.

In step S204, the training set is used to train the user attribute prediction model. The validation set is used to validate the user attribute prediction model.

In the specific implementation of step S204, the sample data set is randomly divided into two parts according to the number of sample devices; one part is used as the training set and the other as the verification set.

Optionally, the sample data set is sampled without replacement and randomly divided into K parts; N of the K parts are used as the training set, and the remaining K-N parts, with the associated user attributes removed, are used as the verification set. K is a positive integer greater than 2, and N is a positive integer greater than 1 and smaller than K.
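The K-part division can be sketched as follows. The function name `split_k_parts`, the field names and the sample records are hypothetical. One design choice here goes beyond the text: besides the verification inputs with user attributes removed, the sketch also returns the untouched verification records, on the assumption that the actual attributes are needed later for comparison with the predictions.

```python
import random
from typing import List, Set, Tuple


def split_k_parts(
    samples: List[dict], k: int, n: int, label_keys: Set[str], seed: int = 0
) -> Tuple[List[dict], List[dict], List[dict]]:
    """Shuffle once (sampling without replacement), cut into k parts, use n for training."""
    if not (k > 2 and 1 < n < k):
        raise ValueError("expected k > 2 and 1 < n < k")
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::k] for i in range(k)]  # every sample lands in exactly one part
    training = [s for part in parts[:n] for s in part]
    validation_truth = [s for part in parts[n:] for s in part]
    # Verification inputs: the same records with the user attributes removed.
    validation_inputs = [
        {key: value for key, value in s.items() if key not in label_keys}
        for s in validation_truth
    ]
    return training, validation_inputs, validation_truth


# Made-up sample data set, for illustration only.
samples = [
    {"device_id": f"d{i}", "brand": i % 3, "gender": "female" if i % 2 else "male", "age": "25-34"}
    for i in range(10)
]

training, validation_inputs, validation_truth = split_k_parts(
    samples, k=5, n=4, label_keys={"gender", "age"}
)
```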

Taking table 3 as an example, tables 4 and 5 were obtained after the division. Where table 4 represents a training set and table 5 represents a validation set.

Table 4:

Table 5:

Step S205: dividing the training set, based on the first user attribute and the second user attribute, into a first training set labeled with the first user attribute and a second training set labeled with the second user attribute.

In the specific implementation of step S205, taking Table 4 as an example, Table 6 shows the fields that take part in training for the first training set, whose label is the first user attribute, and Table 7 shows the fields that take part in training for the second training set, whose label is the second user attribute.

It should be noted that if there are multiple training sets, a corresponding number of first training sets and second training sets are obtained.
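Step S205 amounts to pairing the same feature columns with a different label column for each model. A minimal sketch, assuming hypothetical field names and already-encoded feature values (the helper `split_by_label` is not from the source):

```python
from typing import List, Tuple


def split_by_label(
    training_set: List[dict], feature_keys: List[str], label_key: str
) -> Tuple[List[List[int]], List[str]]:
    """Return (feature vectors, labels) for one user attribute."""
    features = [[row[key] for key in feature_keys] for row in training_set]
    labels = [row[label_key] for row in training_set]
    return features, labels


# Made-up training rows: encoded target features plus both user attributes.
training_set = [
    {"brand": 0, "device_type": 0, "location": 1, "gender": "female", "age": "25-34"},
    {"brand": 1, "device_type": 0, "location": 0, "gender": "male", "age": "18-24"},
]
FEATURE_KEYS = ["brand", "device_type", "location"]

# First training set: labeled with the first user attribute (gender in this example).
first_training_set = split_by_label(training_set, FEATURE_KEYS, "gender")
# Second training set: labeled with the second user attribute (age in this example).
second_training_set = split_by_label(training_set, FEATURE_KEYS, "age")
```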

Table 6:

Table 7:

Step S206: performing random forest model training on the first training set and the second training set respectively, to obtain a first user attribute prediction model and a second user attribute prediction model.

In the specific implementation of step S206, random forest model parameters, which are used to construct a random forest model, are obtained; a random forest training model is constructed based on Spark MLlib machine learning and the random forest model parameters; and the random forest training model is trained with the first training set as its input, to obtain the first user attribute prediction model.

Similarly, a random forest training model is constructed based on Spark MLlib machine learning and the random forest model parameters, and is trained with the second training set as its input, to obtain the second user attribute prediction model.

It should be noted that if there are multiple first training sets and second training sets, multiple first user attribute prediction models and multiple second user attribute prediction models are obtained.

Here, the first user attribute prediction model is taken as a gender prediction model, and the second user attribute prediction model is taken as an age prediction model for example.

FIG. 3 is a schematic flowchart of training the gender prediction model and the age prediction model according to an embodiment of the present invention, which mainly includes the following steps:

Step S301: acquiring binary-classification random forest model parameters.

In step S301, the binary-classification random forest model parameters are used in the training of the gender prediction model.

Step S302: taking the first training set and the binary-classification random forest model parameters as the input of the Spark MLlib machine learning model to train a binary-classification random forest model, outputting the binary-classification random forest model corresponding to the first training set, and using it as the gender prediction model.

Step S303: acquiring multi-classification random forest model parameters.

In step S303, the multi-classification random forest model parameters are used in the training of the age prediction model.

Step S304: taking the second training set and the multi-classification random forest model parameters as the input of the Spark MLlib machine learning model to train a multi-classification random forest model, outputting the multi-classification random forest model corresponding to the second training set, and using it as the age prediction model.
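To make the random forest idea concrete, the sketch below is a deliberately tiny pure-Python stand-in, not the Spark MLlib implementation: each "tree" is a one-level lookup on a single randomly chosen feature, fitted on a bootstrap sample, and the forest predicts by majority vote. The function names, the parameters `num_trees` and `max_features`, and the toy data are all assumptions for illustration. The same code handles the two-class (gender) and multi-class (age) cases, since the vote is over arbitrary labels.

```python
import random
from collections import Counter, defaultdict


def train_stump(rows, labels, feature_ids):
    """Fit a one-level tree: choose the feature whose per-value majority label fits best."""
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for f in feature_ids:
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[f]][label] += 1
        table = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        hits = sum(table[row[f]] == label for row, label in zip(rows, labels))
        if best is None or hits > best[0]:
            best = (hits, f, table)
    _, feature, table = best
    return feature, table, default


def train_forest(rows, labels, num_trees=25, max_features=1, seed=0):
    """Train `num_trees` stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(num_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sampling with replacement
        feature_ids = rng.sample(range(n_features), max_features)
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature_ids)
        )
    return forest


def predict(forest, row):
    """Majority vote over all trees; unseen feature values fall back to the tree default."""
    votes = Counter(table.get(row[f], default) for f, table, default in forest)
    return votes.most_common(1)[0][0]


# Toy first training set (two classes) and second training set (three classes).
gender_rows = [[0, 0]] * 20 + [[1, 1]] * 20
gender_labels = ["female"] * 20 + ["male"] * 20
gender_model = train_forest(gender_rows, gender_labels)

age_rows = [[0, 0]] * 20 + [[1, 1]] * 20 + [[2, 2]] * 20
age_labels = ["18-24"] * 20 + ["25-34"] * 20 + ["35+"] * 20
age_model = train_forest(age_rows, age_labels)
```

A production system would use full-depth trees and a distributed trainer; the point here is only the structure of bootstrap sampling, random feature selection and voting.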

Step S207: verifying the first user attribute model and the second user attribute model based on the verification set; if the verification is passed, executing step S208; if not, returning to step S206.

In the specific implementation of step S207, the verification set is input into the multiple first user attribute models and the multiple second user attribute models. If the predicted first user attribute is the same as or close to the actual first user attribute of each sample device in the verification set, and the predicted second user attribute is the same as or close to the actual second user attribute of each sample device in the verification set, the verification is determined to be passed. Otherwise, if either of them differs, the verification is determined to be not passed, and the multiple first user attribute models and second user attribute models continue to be trained until the verification is passed.

Optionally, for the verified first user attribute model and the verified second user attribute model, the first user attribute model with the prediction result closest to or the same as the real first user attribute is selected as the final first user attribute model. And similarly, selecting a second user attribute model with the prediction result closest to or the same as the real second user attribute as a final second user attribute model.

In an optional implementation, the target features to be tested in the verification set are first used as the input of the first user attribute prediction model and of the second user attribute prediction model respectively, to obtain a first user attribute prediction result and a second user attribute prediction result.

Then, a deviation value between the first user attribute prediction result and the first user attribute associated with the target features to be tested is calculated, and a deviation value between the second user attribute prediction result and the second user attribute associated with the target features to be tested is calculated.

If the deviation value is smaller than the threshold, the verification is determined to be passed.

If the deviation value is not smaller than the threshold, the verification is determined to be not passed.

Finally, among the first user attribute prediction models and second user attribute prediction models that pass the verification, those with the smallest deviation value are selected as the final first user attribute prediction model and the final second user attribute prediction model.
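The threshold check and model selection can be sketched as below. The text does not define how the deviation value is computed; this example assumes it is the fraction of verification samples whose predicted attribute differs from the actual one. The candidate models, the threshold of 0.25 and all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def deviation(predictions: List[str], actual: List[str]) -> float:
    """Assumed deviation value: share of samples where prediction and actual attribute differ."""
    wrong = sum(p != a for p, a in zip(predictions, actual))
    return wrong / len(actual)


def select_model(
    candidates: Dict[str, Callable[[List[int]], str]],
    validation_rows: List[List[int]],
    actual: List[str],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return (name, deviation) of the passing candidate with the smallest deviation, or None."""
    best = None
    for name, predict_fn in candidates.items():
        dev = deviation([predict_fn(row) for row in validation_rows], actual)
        # A candidate passes verification only if its deviation is below the threshold.
        if dev < threshold and (best is None or dev < best[1]):
            best = (name, dev)
    return best


# Made-up verification data and two stand-in gender models.
validation_rows = [[0], [1], [0], [1]]
actual_gender = ["female", "male", "female", "male"]
candidates = {
    "model_a": lambda row: "female" if row[0] == 0 else "male",  # always right here
    "model_b": lambda row: "female",                              # right half the time
}

chosen = select_model(candidates, validation_rows, actual_gender, threshold=0.25)
```

If no candidate passes (`select_model` returns `None`), the flow described above returns to training.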

Step S208: determining that construction of the first user attribute model and the second user attribute model is complete.

In the embodiment of the present invention, user attribute prediction models capable of predicting on massive data are trained in advance from user attributes with a random forest model. When a portrait label is later predicted for a device to be predicted, feature processing is performed with Spark ML feature engineering and the test set is predicted with the user attribute prediction models, so that accurate portrait labels are obtained while massive data is processed.

Corresponding to the portrait label prediction method described in the above embodiment of the present invention, FIG. 4 shows a block diagram of a portrait label prediction apparatus according to an embodiment of the present invention. The portrait label prediction apparatus includes: an acquisition module 401, a feature processing module 402, a prediction module 403, and a merging module 404.

The acquisition module 401 is configured to acquire advertisement monitoring data obtained by monitoring advertisements on a device to be predicted, where the advertisement monitoring data at least includes device data.

The feature processing module 402 is configured to extract target features from the advertisement monitoring data based on Spark ML feature engineering, and convert the target features based on a preset format to obtain a test set.

The feature processing module 402 optionally includes:

and the extraction unit is used for extracting the target characteristics in the advertisement monitoring data based on spark ML characteristic engineering.

And the conversion unit is used for converting the target characteristics into corresponding numbers according to the pre-established corresponding relation between the target characteristics and the numbers and collecting the numbers to obtain a test set.

The prediction module 403 is configured to input the test set into a pre-constructed first user attribute prediction model and a pre-constructed second user attribute prediction model respectively for prediction, so as to obtain a first user attribute prediction result and a second user attribute prediction result, where the first user attribute prediction model is constructed by training a random forest model with a first user attribute, and the second user attribute prediction model is constructed by training a random forest model with a second user attribute.

The merging module 404 is configured to merge the first user attribute prediction result and the second user attribute prediction result as the portrait label of the device to be predicted.
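As a rough illustration of the merging step, the sketch below combines two per-device prediction results into one portrait label per device. It assumes that each result is a dictionary keyed by device ID and that the two attributes are gender and age band; the function name `merge_predictions` and the field names are hypothetical.

```python
def merge_predictions(first_results, second_results):
    """Combine two per-device prediction results into one portrait label per device.

    Only devices present in both results receive a label.
    """
    labels = {}
    for device_id in first_results.keys() & second_results.keys():
        labels[device_id] = {
            "gender": first_results[device_id],
            "age_band": second_results[device_id],
        }
    return labels


# Hypothetical outputs of the first and second user attribute prediction models.
gender_results = {"d1": "female", "d2": "male"}
age_results = {"d1": "25-34", "d2": "18-24"}

portrait_labels = merge_predictions(gender_results, age_results)
```

Keying both results by device ID means the merge does not depend on the order in which the two models return their predictions.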

Optionally, the portrait label prediction apparatus further includes a construction module, where the construction module includes:

an acquisition unit, configured to acquire advertisement monitoring data of a sample device, where the advertisement monitoring data at least includes device data, advertisement exposure data, and user behavior data on advertisement clicks, and the device data at least includes a device ID of the sample device;

an association unit, configured to associate, according to the device ID of the sample device, the advertisement monitoring data of the sample device with sample library data having the same device ID, to obtain original data of the sample device, where the sample library data at least includes a first user attribute and a second user attribute;

a feature processing unit, configured to extract target features from the original data based on Spark ML feature engineering, and to convert the target features based on a preset format to obtain a sample data set, where the target features at least include the device data of the sample device, and each target feature is associated with the first user attribute and the second user attribute;

a dividing unit, configured to divide the sample data set into a training set and a verification set, and, based on the first user attribute and the second user attribute, to divide the training set into a first training set labeled with the first user attribute and a second training set labeled with the second user attribute;

a training unit, configured to perform random forest model training on the first training set and the second training set respectively, to obtain the first user attribute prediction model and the second user attribute prediction model.

It should be noted that, if the first user attribute prediction model is a gender prediction model and the second user attribute prediction model is an age prediction model, the training unit is specifically configured to:

acquire binary-classification random forest model parameters; take the first training set and the binary-classification random forest model parameters as input to a Spark MLlib machine learning model to train a binary-classification random forest model, output the binary-classification random forest model corresponding to the first training set, and use it as the gender prediction model; acquire multi-classification random forest model parameters; and take the second training set and the multi-classification random forest model parameters as input to a Spark MLlib machine learning model to train a multi-classification random forest model, output the multi-classification random forest model corresponding to the second training set, and use it as the age prediction model.

a verification unit, configured to verify the first user attribute prediction model and the second user attribute prediction model based on the verification set; if the verification passes, to determine that construction of the first user attribute prediction model and the second user attribute prediction model is complete; and if the verification fails, to return to the training unit and continue training.

Optionally, the verification unit is specifically configured to: take the target features to be tested in the verification set as input to the first user attribute prediction model and the second user attribute prediction model respectively, to obtain a first user attribute prediction result and a second user attribute prediction result; calculate a deviation value between the first user attribute prediction result and the first user attribute associated with the target features to be tested, and a deviation value between the second user attribute prediction result and the second user attribute associated with the target features to be tested; determine that the verification passes if the deviation values are smaller than a threshold; and determine that the verification fails if a deviation value is not smaller than the threshold.
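The data handling around the construction module (association by device ID, division into training and verification sets, and the threshold check) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the text does not define the deviation value, so the sketch takes it to be the fraction of verification rows that are predicted incorrectly; the field names `device_id`, `gender`, and `age_band`, the 80/20 split, and all function names are hypothetical; and the model is represented by any callable rather than a trained random forest.

```python
import random


def associate_by_device_id(monitoring_rows, sample_library):
    """Join monitoring rows with sample-library rows that share a device ID."""
    library_by_id = {row["device_id"]: row for row in sample_library}
    joined = []
    for row in monitoring_rows:
        match = library_by_id.get(row["device_id"])
        if match is not None:  # devices without known attributes are dropped
            joined.append({**row, **match})
    return joined


def split_dataset(rows, train_fraction=0.8, seed=42):
    """Shuffle deterministically, then split into training and verification sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def label_training_sets(training_rows, feature_keys):
    """Build one (features, label) training set per user attribute."""
    first = [([r[k] for k in feature_keys], r["gender"]) for r in training_rows]
    second = [([r[k] for k in feature_keys], r["age_band"]) for r in training_rows]
    return first, second


def deviation(predict, rows, feature_keys, label_key):
    """Assumed deviation value: share of rows whose prediction differs from the label."""
    if not rows:
        raise ValueError("verification set is empty")
    wrong = sum(1 for r in rows if predict([r[k] for k in feature_keys]) != r[label_key])
    return wrong / len(rows)


def verification_passed(deviations, threshold):
    """Verification passes only if every deviation value is below the threshold."""
    return all(d < threshold for d in deviations)
```

If `verification_passed` returns `False`, the flow described above returns to training, for example with different random forest parameters, and the check is repeated.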

In the portrait label prediction apparatus provided by the invention, advertisement monitoring data obtained by monitoring advertisements on a device to be predicted is acquired, and target features in the advertisement monitoring data are extracted and converted based on Spark ML feature engineering to obtain a test set. The test set is input into a first user attribute prediction model and a second user attribute prediction model, each trained and constructed in advance from a different user attribute and a random forest model, and the resulting first and second user attribute prediction results are merged as the portrait label of the device to be predicted. In this embodiment of the invention, user attribute prediction models capable of handling massive data are trained in advance from user attributes and a random forest model, and portrait label prediction is completed in combination with Spark ML feature engineering, so that accurate portrait labels are obtained while massive data is processed.

Based on the apparatus disclosed in the above embodiment of the present invention, the above modules and units may be implemented by a hardware device composed of a processor and a memory. Specifically, the modules and units are stored in the memory as program units, and the processor executes the program units stored in the memory to realize data processing.

The processor includes a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and data processing is realized by adjusting kernel parameters.

An embodiment of the present invention provides a storage medium having a program stored thereon, where the program implements the portrait label prediction method when executed by a processor.

An embodiment of the present invention provides a processor configured to run a program, where the program, when running, executes the portrait label prediction method disclosed in any of fig. 1 to 2.

An embodiment of the present invention provides a data processing device 50; fig. 5 shows a schematic structural diagram of the data processing device 50.

The data processing device in the embodiment of the present invention may be a server, a PC, a PAD, a mobile phone, or the like.

The data processing device comprises at least one processor 501, at least one memory 502 connected to the processor, and a bus 503.

The processor 501 and the memory 502 communicate with each other via the bus 503. The processor 501 is configured to execute programs stored in the memory 502.

The memory 502 is configured to store a program that is at least used for: acquiring advertisement monitoring data obtained by monitoring advertisements on a device to be predicted, where the advertisement monitoring data at least includes device data; extracting target features from the advertisement monitoring data based on Spark ML feature engineering, and converting the target features based on a preset format to obtain a test set; inputting the test set into a pre-constructed first user attribute prediction model and a pre-constructed second user attribute prediction model respectively for prediction, to obtain a first user attribute prediction result and a second user attribute prediction result, where the first user attribute prediction model is constructed by training a random forest model with a first user attribute, and the second user attribute prediction model is constructed by training a random forest model with a second user attribute; and merging the first user attribute prediction result and the second user attribute prediction result as the portrait label of the device to be predicted.

The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:

acquiring advertisement monitoring data obtained by monitoring advertisements on a device to be predicted; extracting target features from the advertisement monitoring data based on Spark ML feature engineering, and converting the target features based on a preset format to obtain a test set; inputting the test set into a pre-constructed first user attribute prediction model and a pre-constructed second user attribute prediction model respectively for prediction, to obtain a first user attribute prediction result and a second user attribute prediction result; and merging the first user attribute prediction result and the second user attribute prediction result as the portrait label of the device to be predicted.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip. The memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.

It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the apparatus embodiments are basically similar to the method embodiments, they are described briefly; for the relevant points, reference may be made to the corresponding description of the method embodiments.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims

1. A portrait label prediction method, the method comprising:

acquiring advertisement monitoring data obtained by monitoring advertisements on a device to be predicted, wherein the advertisement monitoring data at least comprises device data;

extracting target features from the advertisement monitoring data based on Spark ML feature engineering, and converting the target features based on a preset format to obtain a test set;

inputting the test set into a pre-constructed first user attribute prediction model and a pre-constructed second user attribute prediction model respectively for prediction, to obtain a first user attribute prediction result and a second user attribute prediction result, wherein the first user attribute prediction model is constructed by training a random forest model with a first user attribute, and the second user attribute prediction model is constructed by training a random forest model with a second user attribute;

and merging the first user attribute prediction result and the second user attribute prediction result as the portrait label of the device to be predicted.

2. The method of claim 1, wherein the converting the target features based on a preset format to obtain a test set comprises:

converting each target feature into its corresponding number according to a correspondence between target features and numbers, and collecting the numbers to obtain the test set.

3. The method according to claim 1, wherein the pre-constructing of the first user attribute prediction model and the second user attribute prediction model comprises:

acquiring advertisement monitoring data of a sample device, wherein the advertisement monitoring data at least comprises device data, advertisement exposure data and user behavior data of clicking advertisements, and the device data at least comprises a device ID of the sample device;

associating the advertisement monitoring data of the sample device with sample library data with the same device ID according to the device ID of the sample device to obtain original data of the sample device, wherein the sample library data at least comprises a first user attribute and a second user attribute;

extracting target features from the original data based on Spark ML feature engineering, and converting the target features based on a preset format to obtain a sample data set, wherein the target features at least comprise the device data of the sample device, and each target feature is associated with the first user attribute and the second user attribute;

dividing the sample data set into a training set and a verification set;

dividing the training set into a first training set with labels as first user attributes and a second training set with labels as second user attributes based on the first user attributes and the second user attributes;

respectively carrying out random forest model training on the first training set and the second training set to obtain a first user attribute prediction model and a second user attribute prediction model;

verifying the first user attribute prediction model and the second user attribute prediction model based on the verification set;

if the verification passes, determining that construction of the first user attribute prediction model and the second user attribute prediction model is complete;

and if the verification fails, continuing to train the first user attribute prediction model and the second user attribute prediction model until the verification passes.

4. The method of claim 3, wherein the verifying the first user attribute model and the second user attribute model based on the verification set comprises:

respectively taking the target characteristics to be detected in the verification set as the input of the first user attribute prediction model and the second user attribute prediction model to predict, and obtaining a first user attribute prediction result and a second user attribute prediction result;

calculating a deviation value of the first user attribute prediction result and a first user attribute associated with the target feature to be detected, and calculating a deviation value of the second user attribute prediction result and a second user attribute associated with the target feature to be detected;

5. The method according to claim 3, wherein if the first user attribute prediction model is a gender prediction model and the second user attribute prediction model is an age prediction model, the performing random forest model training on the first training set and the second training set respectively to obtain a first user attribute prediction model and a second user attribute prediction model comprises:

acquiring binary-classification random forest model parameters;

taking the first training set and the binary-classification random forest model parameters as input to a Spark MLlib machine learning model to train a binary-classification random forest model, outputting the binary-classification random forest model corresponding to the first training set, and using the binary-classification random forest model as the gender prediction model;

acquiring multi-classification random forest model parameters;

and taking the second training set and the multi-classification random forest model parameters as input to a Spark MLlib machine learning model to train a multi-classification random forest model, outputting the multi-classification random forest model corresponding to the second training set, and using the multi-classification random forest model as the age prediction model.

6. A portrait label prediction apparatus, comprising:

an acquisition module, configured to acquire advertisement monitoring data obtained by monitoring advertisements on a device to be predicted, wherein the advertisement monitoring data at least comprises device data;

a feature processing module, configured to extract target features from the advertisement monitoring data based on Spark ML feature engineering, and to convert the target features based on a preset format to obtain a test set;

a prediction module, configured to input the test set into a pre-constructed first user attribute prediction model and a pre-constructed second user attribute prediction model respectively for prediction, to obtain a first user attribute prediction result and a second user attribute prediction result, wherein the first user attribute prediction model is constructed by training a random forest model with a first user attribute, and the second user attribute prediction model is constructed by training a random forest model with a second user attribute;

and a merging module, configured to merge the first user attribute prediction result and the second user attribute prediction result as the portrait label of the device to be predicted.

7. The apparatus of claim 6, wherein the feature processing module comprises:

an extraction unit, configured to extract target features from the advertisement monitoring data based on Spark ML feature engineering;

8. The apparatus of claim 6, further comprising:

a dividing unit, configured to divide the sample data set into a training set and a verification set, and divide the training set into a first training set labeled as a first user attribute and a second training set labeled as a second user attribute based on a first user attribute and a second user attribute;

the training unit is used for respectively carrying out random forest model training on the first training set and the second training set to obtain a first user attribute prediction model and a second user attribute prediction model;

9. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the portrait label prediction method according to any one of claims 1 to 5.

10. An electronic device, wherein the electronic device comprises at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; and the processor is configured to invoke a program in the memory, the program at least being configured to implement the portrait label prediction method of any one of claims 1 to 5.