CN112084408B

CN112084408B - List data screening method, device, computer equipment and storage medium

Info

Publication number: CN112084408B
Application number: CN202010936334.5A
Authority: CN
Inventors: 徐杰
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2020-09-08
Filing date: 2020-09-08
Publication date: 2023-11-21
Anticipated expiration: 2040-09-08
Also published as: CN112084408A

Abstract

The embodiments of the application belong to the technical field of big data and relate to a list data screening method comprising the following steps: acquiring a history information record, and determining corresponding initial list data according to the history information record; acquiring, based on a preset similarity model, the total similarity between the initial list data and sample data; screening the initial list data according to the total similarity to obtain invalid data and candidate data; when the occupation ratio of the invalid data is smaller than a preset occupation ratio, screening the candidate data again to obtain final selectable list data; and calculating, based on a preset stage model, a prediction score corresponding to the selectable list data, and selecting the selectable list data whose prediction score is greater than or equal to a preset score threshold as final list data. The application also provides a list data screening device, computer equipment and a storage medium. In addition, the application relates to blockchain technology, and the final list data may be stored in a blockchain. The application achieves accurate screening of user data.

Description

List data screening method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of big data technologies, and in particular, to a method and apparatus for screening list data, a computer device, and a storage medium.

Background

With the rapid development of information technology, a large amount of data is transmitted every day, both online and offline, and people receive a large amount of varied information from outside sources every day. How to push information to users in a targeted manner, so that the group receiving the information is the group that really needs it, is a hot topic of current research.

Traditional user screening generally relies on offline manual visits to collect data, followed by manual screening of users and then information recommendation to those users. This approach is generally inefficient and cannot accurately analyze and process large amounts of data. For online user populations, big data analysis can currently be used to obtain a user group that may need a certain kind of information. However, screening in this way still suffers from poor accuracy because of the large data volume: users who really need the currently recommended content cannot be screened out, and the screened users have low stickiness (retention). Therefore, how to achieve high-precision user data screening in a large amount of information is a technical problem to be solved.

Disclosure of Invention

The embodiment of the application aims to provide a list data screening method, a device, computer equipment and a storage medium, so as to solve the technical problem that high-precision data screening cannot be realized in a large amount of information at present.

In order to solve the above technical problems, the embodiment of the present application provides a method for screening list data, which adopts the following technical scheme:

acquiring a history information record, and determining corresponding initial list data according to the history information record;

based on a preset similarity model, acquiring the total similarity of the initial list data and the sample data;

screening the initial list data according to the total similarity to obtain invalid data and candidate data;

calculating a first occupation ratio of the invalid data in the initial list data, and when the first occupation ratio of the invalid data is smaller than a preset occupation ratio, rescreening the candidate data until a second occupation ratio of the total of the accumulated and screened invalid data in the initial list data is larger than or equal to the preset occupation ratio, so as to obtain final optional list data;

and calculating a prediction score corresponding to each piece of selectable list data based on a preset stage model, and selecting the selectable list data with the prediction score being greater than or equal to a preset score threshold as final list data.

Further, before the step of obtaining the total similarity between the initial list data and the sample data based on the preset similarity model, the method further includes:

pre-establishing a basic training model, and training the basic training model based on acquired training data, wherein the training data comprises positive sample data and negative sample data;

and determining that the basic training model is trained to obtain a corresponding similarity model when the recognition success rate of the basic training model to the positive sample data and the negative sample data reaches a preset success rate.

Further, the step of obtaining the total similarity between the initial list data and the sample data based on the preset similarity model specifically includes:

computing, through the preset similarity model, a sub-similarity between each item of data in the initial list data and the corresponding item of data in the sample data;

and calculating the sum of all the sub-similarities and the ratio of that sum to the total number of items in the initial list data, wherein the ratio is the total similarity of the initial list data and the sample data.

Further, the step of calculating the prediction score corresponding to each piece of selectable list data specifically includes:

acquiring the data category of the selectable list data;

service processing tracking is carried out on the selectable list data according to the data category, and the processing duration of the selectable list data is obtained;

and calculating a prediction score corresponding to the selectable list data through a preset stage model based on the processing time length.

Further, the step of performing service processing tracking on the selectable list data according to the data category to obtain the processing duration of the selectable list data specifically includes:

acquiring first processing time length of the first service processing when the first service processing is carried out on the selectable list data;

when the first processing time length is larger than or equal to a preset processing threshold value, re-tracking the selectable list data to obtain the re-tracking times and the re-tracking time length of each re-tracking;

and calculating the processing time length of the selectable list data according to the first processing time length, the tracking times and the re-tracking time length by weighted summation.

Further, before the step of collecting the first processing duration of the first service processing, the method further includes:

when service processing tracking is performed on the selectable list data, determining whether a service tracking failure exists, and when the number of service tracking failures reaches a preset number of times, canceling the service processing tracking of the selectable list data.

Further, the step of calculating, based on the processing duration, a prediction score corresponding to the selectable list data through a preset stage model specifically includes:

normalizing the processing time length through a preset stage model to obtain a basic score corresponding to the selectable list data;

and acquiring the historical scores of the selectable list data, and calculating the prediction scores corresponding to the selectable list data according to the historical scores and the basic scores.

In order to solve the technical problems, the embodiment of the application also provides a list data screening device, which adopts the following technical scheme:

the first acquisition module is used for acquiring a history information record and determining corresponding initial list data according to the history information record;

the second acquisition module is used for acquiring the total similarity of the initial list data and the sample data based on a preset similarity model;

the screening module is used for screening the initial list data according to the total similarity to obtain invalid data and candidate data;

the comparison module is used for calculating a first occupation ratio of the invalid data in the initial list data, and rescreening the candidate data when the first occupation ratio of the invalid data is smaller than a preset occupation ratio until a second occupation ratio of the total sum of the accumulated and screened invalid data in the initial list data is larger than or equal to the preset occupation ratio, so that final selectable list data is obtained;

and the confirmation module is used for calculating the prediction score corresponding to each piece of selectable list data based on a preset stage model, and selecting the selectable list data with the prediction score greater than or equal to a preset score threshold value as final list data.

In order to solve the technical problem, the embodiment of the application also provides a computer device, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the processor realizes the steps of the list data screening method when executing the computer readable instructions.

In order to solve the above technical problem, an embodiment of the present application further provides a computer readable storage medium, where computer readable instructions are stored on the computer readable storage medium, and the computer readable instructions implement the steps of the above list data screening method when executed by a processor.

According to the method, a history information record is acquired and corresponding initial list data, namely user data acquired in advance, is determined from it. Based on a preset similarity model, the total similarity of the initial list data and the sample data is acquired. The initial list data is then screened according to the total similarity to obtain invalid data and candidate data, where the invalid data is the initial list data whose total similarity is smaller than a preset screening threshold. A first occupation ratio of the invalid data in the initial list data is calculated. When the first occupation ratio is smaller than the preset occupation ratio, that is, when too little invalid data has been screened out, the candidate data is screened again, for example by adjusting the preset screening threshold, until the second occupation ratio of the total of the accumulated and screened invalid data in the initial list data is larger than or equal to the preset occupation ratio, so as to obtain final selectable list data. Based on a preset stage model, a prediction score corresponding to each piece of selectable list data is calculated, and the selectable list data whose prediction score is greater than or equal to a preset score threshold is selected as final list data. In this way, accurate screening of user data in a large amount of information is achieved, the accuracy of information pushing and the efficiency of information transfer are improved, and the transmission of a large amount of invalid information, together with the resource waste caused by invalid pushing, is avoided.

Drawings

In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow chart of one embodiment of a method of screening of list data in accordance with the present application;

FIG. 3 is a schematic diagram of an embodiment of a list data screening device in accordance with the present application;

FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.

Reference numerals: the list data screening apparatus 400 includes: a first acquisition module 401, a second acquisition module 402, a screening module 403, a comparison module 404, and a confirmation module 405.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that, the method for screening the list data provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the device for screening the list data is generally set in the server/terminal device.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to fig. 2, a flow chart of one embodiment of a method of screening of list data according to the present application is shown. The list data screening method comprises the following steps:

Step S201, acquiring a history information record, and determining corresponding initial list data according to the history information record;

In this embodiment, the history information record is a stored historical consumption record, and it can be obtained through event tracking (buried points). Once the history information record is obtained, it is analyzed to determine the initial list data, which is initial sample data comprising basic information such as the gender, age, and consumption status of the user.

Step S202, acquiring the total similarity of the initial list data and the sample data based on a preset similarity model;

In this embodiment, the similarity model is a model trained in advance for determining similarity. Taking a LightGBM model as an example, LightGBM is a gradient boosting framework that uses decision-tree-based learning algorithms and offers fast training speed and low memory usage. The LightGBM model can output the similarity between the acquired initial list data and the sample data. The sample data is positive sample data; for example, in a recommendation system, the sample data is the data of users who have accepted products recommended by the system. Specifically, based on the similarity model, the sub-similarity between each item of data in the initial list data and the corresponding item of data in the sample data is obtained, and the total similarity of the initial list data and the sample data is calculated from the sub-similarities.

Step S203, screening the initial list data according to the total similarity to obtain invalid data and candidate data;

In this embodiment, once the total similarity between the initial list data and the sample data is obtained from the similarity model, invalid data in the initial list data is filtered out according to the total similarity. Specifically, initial list data whose total similarity with the sample data is greater than or equal to a preset screening threshold is candidate data, and initial list data whose total similarity with the sample data is less than the preset screening threshold is invalid data.
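
The threshold rule of step S203 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_by_threshold`, the user identifiers, and the similarity values are all hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_threshold(
    similarities: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Partition list entries into (candidates, invalid) by total similarity."""
    # Entries at or above the screening threshold are kept as candidates.
    candidates = [key for key, sim in similarities.items() if sim >= threshold]
    # Entries strictly below the threshold are treated as invalid data.
    invalid = [key for key, sim in similarities.items() if sim < threshold]
    return candidates, invalid


# Hypothetical total similarities for four users.
scores = {"u1": 0.8, "u2": 0.4, "u3": 0.6, "u4": 0.2}
candidates, invalid = split_by_threshold(scores, threshold=0.6)
print(candidates, invalid)
```

Note that an entry whose total similarity equals the threshold is kept as a candidate, matching the "greater than or equal to" wording of the step.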

Step S204, calculating a first occupation ratio of the invalid data in the initial list data, and when the first occupation ratio of the invalid data is smaller than a preset occupation ratio, rescreening the candidate data until a second occupation ratio of the total of the accumulated and screened invalid data in the initial list data is larger than or equal to the preset occupation ratio, so as to obtain final optional list data;

When invalid data is obtained through the first screening, a first occupation ratio of the invalid data in the initial list data is calculated. If the first occupation ratio is greater than or equal to the preset occupation ratio, the candidate data obtained through the first screening is not rescreened and is the final selectable list data. If the first occupation ratio is smaller than the preset occupation ratio, the preset screening threshold is adjusted, for example increased; the larger the preset screening threshold, the more invalid data is screened out. The candidate data obtained by the previous screening is then rescreened according to the adjusted preset screening threshold, so that each round of screening operates on the candidate data obtained in the previous round. At the end of each round, the sum of all invalid data screened out so far is calculated. Screening continues until the second occupation ratio of this accumulated invalid data in the initial list data is greater than or equal to the preset occupation ratio. The invalid data found in the last round is removed from that round's candidate data, and the remaining candidate data is the final selectable list data.

Alternatively, when the invalid data in the initial list data is screened according to the total similarity, the first occupation ratio of the invalid data in the initial list data can be obtained after the initial list data is screened for the first time according to the preset screening threshold. If the first occupation ratio is smaller than the preset occupation ratio, the preset screening threshold is adjusted and the initial list data is screened again according to the adjusted threshold. In this variant, each round of screening operates on the full initial list data, and screening continues until the first occupation ratio of the screened invalid data in the initial list data is greater than or equal to the preset occupation ratio. The candidate data screened out of the initial list data in the last round is then determined to be the final selectable list data.
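
The cumulative rescreening loop of step S204 (the first of the two variants described above) might look like the sketch below. The text does not specify how the threshold is adjusted, so the fixed increment `step`, the upper bound `max_threshold`, and the function name `rescreen` are assumptions made only for illustration.

```python
from typing import Dict, List, Tuple


def rescreen(
    similarities: Dict[str, float],
    threshold: float,
    min_invalid_ratio: float,
    step: float = 0.1,           # assumed adjustment per round
    max_threshold: float = 1.0,  # assumed guard so the loop always ends
) -> Tuple[List[str], List[str]]:
    """Raise the threshold until enough of the initial list is invalid."""
    total = len(similarities)
    candidates = dict(similarities)  # shrinks from round to round
    invalid: List[str] = []          # accumulated over all rounds
    while True:
        # Screen only the candidates left over from the previous round.
        for key in [k for k, sim in candidates.items() if sim < threshold]:
            invalid.append(key)
            del candidates[key]
        # The ratio is always measured against the initial list size.
        if total == 0 or len(invalid) / total >= min_invalid_ratio:
            break
        if threshold >= max_threshold:
            break  # cannot raise the threshold any further
        threshold = min(threshold + step, max_threshold)
    return list(candidates), invalid


# Hypothetical total similarities for five users.
sims = {"u1": 0.95, "u2": 0.45, "u3": 0.65, "u4": 0.85, "u5": 0.55}
selectable, removed = rescreen(sims, threshold=0.5, min_invalid_ratio=0.6)
print(selectable, removed)
```

With these sample values, one user is removed in each of three rounds before the accumulated invalid ratio reaches 3/5. The second variant differs only in that every round screens the full initial list instead of the remaining candidates.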

Step S205, calculating a prediction score corresponding to each piece of selectable list data based on a preset stage model, and selecting the selectable list data with the prediction score greater than or equal to a preset score threshold as final list data.

In this embodiment, when the selectable list data is obtained, a prediction score corresponding to each piece of selectable list data is calculated according to a preset stage model. Specifically, the stage model is a classification model that can be used to estimate the probability of an event, such as a logistic regression model or a support vector machine (SVM) model. The stage model can be used to score the selectable list data predictively. The prediction score can be calculated from normalized data of the selectable list data in different dimensions, such as the service processing duration and the consumption records of the selectable list data; the normalized data in each dimension can be obtained by collecting statistics with preset Hive SQL statements and then applying normalization. Once the prediction scores corresponding to the selectable list data are calculated from the normalized data in the different dimensions, the selectable list data whose prediction score is greater than or equal to a preset score threshold is taken as the final list data.
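
As a rough sketch of step S205, the example below min-max normalizes two dimensions and feeds them to a logistic function, one of the model families named above. The weights, bias, feature values, and function names are hypothetical; in practice the coefficients would come from a trained stage model rather than being set by hand.

```python
import math
from typing import Dict, List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale one dimension to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def predict_scores(
    features: Dict[str, Sequence[float]], weights: Sequence[float], bias: float
) -> Dict[str, float]:
    """Logistic score per entry from per-dimension normalized features."""
    columns = [min_max(col) for col in zip(*features.values())]
    scores = {}
    for name, row in zip(features, zip(*columns)):
        z = bias + sum(w * x for w, x in zip(weights, row))
        scores[name] = 1.0 / (1.0 + math.exp(-z))
    return scores


# Hypothetical dimensions: (processing duration in minutes, purchase count).
raw = {"u1": (30.0, 5.0), "u2": (10.0, 1.0), "u3": (20.0, 3.0)}
scores = predict_scores(raw, weights=(2.0, 2.0), bias=-2.0)
score_threshold = 0.5
final_list = [name for name, s in scores.items() if s >= score_threshold]
print(final_list)
```

The normalization here is computed in memory for brevity; the text obtains the per-dimension statistics from Hive SQL statements instead.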

Furthermore, once the final list data is obtained, a preset return visit period can be acquired, and when the preset return visit period is reached, data backflow (refresh) processing is performed on the final list data. Specifically, when the preset return visit period is reached, the basic information of the final list data is acquired again, and the prediction score of the final list data is recalculated according to that basic information and the previous prediction score. When the recalculated prediction score is smaller than the preset score threshold, the corresponding final list data is deleted from all stored final list data, so that the final list data is updated regularly.
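
The periodic refresh can be illustrated with the sketch below. It assumes the return visit period is expressed in days and that rescoring is available as a callable; `refresh_final_list`, `rescore`, and the sample dates and scores are hypothetical names and values.

```python
from datetime import date, timedelta
from typing import Callable, List


def refresh_final_list(
    final_list: List[str],
    rescore: Callable[[str], float],  # hypothetical rescoring hook
    score_threshold: float,
    last_refresh: date,
    today: date,
    period_days: int,
) -> List[str]:
    """Drop entries whose recalculated score fell below the threshold."""
    # Do nothing until the preset return visit period has elapsed.
    if today - last_refresh < timedelta(days=period_days):
        return list(final_list)
    return [entry for entry in final_list if rescore(entry) >= score_threshold]


# Hypothetical recalculated scores.
new_scores = {"u1": 0.82, "u3": 0.41}
refreshed = refresh_final_list(
    ["u1", "u3"], lambda e: new_scores[e], 0.5,
    last_refresh=date(2020, 9, 8), today=date(2020, 10, 8), period_days=30,
)
not_due = refresh_final_list(
    ["u1", "u3"], lambda e: new_scores[e], 0.5,
    last_refresh=date(2020, 9, 8), today=date(2020, 9, 20), period_days=30,
)
print(refreshed, not_due)
```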

In the embodiment of the application, taking an insurance service scenario as an example, a history information record is obtained; in this scenario, the history information record includes information such as the gender, age, residence, and consumption records of users. Once the history information record is obtained, the users in it are counted to obtain the initial list data. The total similarity between the initial list data and the sample data is then calculated through a preset similarity model, where the sample data is pre-acquired list data of users who have purchased insurance and includes information such as the gender, age, and residence of those users. Initial list data whose total similarity with the sample data is greater than or equal to a preset screening threshold is determined to be candidate data, and initial list data whose total similarity is less than the preset screening threshold is determined to be invalid data. When the first occupation ratio of the invalid data in the initial list data is smaller than the preset occupation ratio, the candidate data is rescreened. For rescreening, the preset screening threshold is adjusted, for example raised, and invalid data is screened out of the candidate data according to the adjusted threshold. If the second occupation ratio of the accumulated invalid data is still smaller than the preset occupation ratio, the preset screening threshold is adjusted again and screening is performed on the candidate data obtained in the previous round, until the second occupation ratio of the total of all screened invalid data in the initial list data is greater than or equal to the preset occupation ratio. The invalid data screened out at that point is removed from the candidate data of the last round, and the remaining candidate data is the selectable list data. Once the selectable list data is obtained, its prediction score is calculated according to a preset stage model. In the insurance scenario, the prediction score of each user in the selectable list data can be calculated by the stage model from three dimensions: the service processing duration, the number of insurance purchases, and the accumulated purchase cost of each user with an agent. The selectable list data whose prediction score is greater than or equal to a preset score threshold is taken as the final list data.

It should be emphasized that, to further ensure the privacy and security of the final list data, the final list data may also be stored in a blockchain node.

A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

According to the application, through screening of different dimensions of the initial list data, accurate screening of user data in a large amount of information is realized, the accuracy rate of information pushing and the information transfer efficiency are improved, the transmission of a large amount of invalid information is avoided, and the resource waste caused by invalid pushing is further avoided.

In some embodiments of the present application, before the obtaining the total similarity between the initial list data and the sample data based on the preset similarity model, the method includes:

In this embodiment, the similarity model may be obtained by training the basic training model. Specifically, a basic training model is pre-established, and the basic training model is the basic model which is not trained by any training data. And acquiring a plurality of groups of training data in advance, wherein the training data comprises positive sample data and negative sample data, taking a recommendation system as an example, the positive sample data is the sample data successfully recommended before, and the negative sample data is the sample data failed in recommendation. And inputting the positive sample data and the negative sample data into a basic training model for training, and determining that the basic training model is trained when the recognition success rate of the basic training model to the positive sample data and the negative sample data reaches a preset success rate, so as to obtain a corresponding similarity model.
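
The "train until the recognition success rate reaches a preset success rate" criterion can be sketched as below. For brevity a simple perceptron stands in for the basic training model; this is a substitution for illustration only and is not the LightGBM model mentioned elsewhere in the text. The feature vectors, labels, and the `max_epochs` guard are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 (positive) or 0 (negative); weights[0] is the bias."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if z > 0 else 0


def train_until(
    samples: List[Sequence[float]],
    labels: List[int],
    target_rate: float,
    max_epochs: int = 100,  # assumed safety limit
) -> Tuple[List[float], float]:
    """Train a perceptron until its success rate reaches target_rate."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(max_epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, x)
            if error:
                weights[0] += error
                for i, xi in enumerate(x):
                    weights[i + 1] += error * xi
        # Recognition success rate over positive and negative samples.
        hits = sum(predict(weights, x) == y for x, y in zip(samples, labels))
        rate = hits / len(samples)
        if rate >= target_rate:
            return weights, rate
    raise RuntimeError("preset success rate not reached")


# Hypothetical positive (1) and negative (0) training samples.
X = [(2.0, 2.0), (0.5, 0.2), (3.0, 1.5), (0.1, 0.8)]
y = [1, 0, 1, 0]
model, success_rate = train_until(X, y, target_rate=0.9)
print(success_rate)
```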

According to the embodiment, the basic training model is trained to obtain the similarity model, so that the similarity of the initial list data and the sample data can be accurately obtained through the similarity model.

In some embodiments of the present application, the obtaining the total similarity between the initial list data and the sample data based on the preset similarity model includes:

In this embodiment, each item of data in the initial list data and the corresponding item of data in the sample data are evaluated by the similarity model to obtain an output result, where the output 0 represents dissimilar and 1 represents similar. When the output is similar, meaning the item of data in the initial list data is similar to the corresponding item in the sample data, the sub-similarity is 1; when the output is dissimilar, the sub-similarity is 0. Once the sub-similarity of every item of data in the initial list data has been calculated, the sum of all sub-similarities is computed, and the ratio of that sum to the total number of items in the initial list data is the total similarity of the initial list data and the sample data. For example, if the initial list data includes 5 items of data and the sub-similarity of 4 of them is 1, the sum of all sub-similarities is 4, and its ratio to the total number of items, that is, the total similarity, is 4/5 = 80%.

According to the embodiment, the total similarity is calculated according to the sub-similarity corresponding to each item of data in the initial list data, so that the calculated total similarity is more accurate, and the screening precision of the initial list data is further improved.
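
The total-similarity calculation reduces to an average of the 0/1 sub-similarities. The sketch below reproduces the five-item example from the text; the function name `total_similarity` is hypothetical, and the sub-similarities are supplied directly rather than produced by a similarity model.

```python
from typing import Sequence


def total_similarity(sub_similarities: Sequence[int]) -> float:
    """Sum of the 0/1 sub-similarities divided by the number of items."""
    if not sub_similarities:
        raise ValueError("at least one item is required")
    return sum(sub_similarities) / len(sub_similarities)


# Five items, four of which the similarity model judged similar.
total = total_similarity([1, 1, 1, 1, 0])
print(f"{total:.0%}")  # prints 80%
```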

In some embodiments of the present application, the calculating the prediction score corresponding to each of the selectable list data includes:

acquiring the data category of the selectable list data;

In this embodiment, the data can be divided into different categories according to the region to which the user belongs, and different service processing tracking is allocated to different data categories; for example, selectable list data belonging to region A is allocated the service processing tracking of region A according to a mapping relationship. Service processing tracking includes telephone tracking, network tracking, and the like. Service processing tracking is performed on the selectable list data and its processing duration is monitored; the processing durations corresponding to different categories of service processing tracking may differ. Once the processing duration corresponding to the selectable list data is obtained, the selectable list data is scored by the preset stage model to obtain the corresponding prediction score, where the longer the processing duration, the higher the prediction score calculated by the stage model.

The embodiment realizes different service processing tracking of the selectable list data under different data types, so that the processing time length of different selectable list data can be accurately tracked through the service processing tracking, the user data is judged according to the processing time length, and the accuracy rate of identifying the user data is further improved.

In some embodiments of the present application, the performing service processing tracking on the selectable list data according to the data category, and obtaining a processing duration of the selectable list data includes:

In this embodiment, when the first service processing is performed on the selectable list data, the first processing duration of the first service processing is acquired. If the first processing duration is greater than or equal to a preset processing threshold, the selectable list data is re-tracked; if the first processing duration is less than the preset processing threshold, service processing tracking of the selectable list data is stopped. When the tracking times, that is, the number of re-tracking operations, reach a preset number, re-tracking of the current selectable list data is stopped. A weighted summation is then performed on the acquired first processing duration, the tracking times, and the re-tracking durations to obtain the processing duration of the current selectable list data.
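A minimal Python sketch of this duration calculation follows. It assumes that, once the threshold is met, re-tracking simply repeats until the preset number is reached and that the re-tracking durations enter the weighted sum as their total; the embodiment does not fix either detail. The function name, the `retrack` callable, and the weight parameters are hypothetical.

```python
from typing import Callable, List


def processing_duration(
    first_duration: float,
    retrack: Callable[[], float],
    threshold: float,
    max_retracks: int,
    w_first: float = 1.0,
    w_count: float = 1.0,
    w_retrack: float = 1.0,
) -> float:
    """Weighted sum of first duration, re-tracking count and re-tracking time."""
    retrack_durations: List[float] = []
    # Re-track only when the first processing duration reaches the threshold.
    if first_duration >= threshold:
        # Assumption: keep re-tracking until the preset count is reached.
        while len(retrack_durations) < max_retracks:
            retrack_durations.append(retrack())
    return (
        w_first * first_duration
        + w_count * len(retrack_durations)
        + w_retrack * sum(retrack_durations)
    )


# Two re-tracking calls lasting 30 and 20 time units (illustrative values).
durations = iter([30.0, 20.0])
result = processing_duration(
    120.0, durations.__next__, threshold=60.0, max_retracks=2,
    w_first=0.5, w_count=10.0, w_retrack=0.25,
)
print(result)  # 92.5
```

With these illustrative weights the result is 0.5 × 120 + 10 × 2 + 0.25 × 50 = 92.5. A first duration below the threshold would skip re-tracking and contribute only the first term.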

The embodiment realizes the accurate calculation of the processing time length according to the tracking time length and the tracking times, and further improves the screening precision of the user data.

In some embodiments of the present application, before the collecting the first processing duration of the first service processing, the method further includes:

In this embodiment, before service processing tracking is performed on the selectable list data, it is determined whether service tracking failures occurred for the selectable list data before the first service processing. When the number of service tracking failures for the selectable list data reaches a preset number, service processing tracking of the selectable list data is canceled.
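As a brief illustration, the pre-check could be expressed as a filter over records that carry a count of earlier tracking failures. The record layout (identifier, failure count) and the function name `trackable` are assumptions made for this sketch.

```python
from typing import Iterable, List, Tuple


def trackable(records: Iterable[Tuple[str, int]], max_failures: int) -> List[str]:
    """Keep records whose earlier tracking failures are below the preset number."""
    # Tracking is canceled once the failure count reaches max_failures.
    return [record_id for record_id, failures in records if failures < max_failures]


print(trackable([("u1", 0), ("u2", 3), ("u3", 2)], max_failures=3))  # ['u1', 'u3']
```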

According to the embodiment, the selectable list data with multiple tracking failures is cancelled, so that the waste of service resources is avoided, and the utilization rate of the service resources is improved.

In some embodiments of the present application, calculating, based on the processing duration, a prediction score corresponding to the selectable list data through a preset stage model includes:

In this embodiment, the prediction score of the selectable list data may be calculated by combining the historical score of the current selectable list data with the basic score calculated from the processing duration by the stage model: preset weights corresponding to the historical score and the basic score are obtained, and the historical score and the basic score are weighted and summed according to the preset weights to obtain the prediction score. Specifically, when the processing duration of the selectable list data is obtained, the processing duration is normalized by the preset stage model to obtain a corresponding value, and this value is the basic score of the selectable list data. If, in addition to the processing duration, the selectable list data includes record data of other dimensions, such as consumption records, the record data of each dimension is normalized to obtain corresponding normalized data, and the normalized data of all dimensions are summed to obtain the basic score of the current selectable list data. If the selectable list data also has a corresponding historical score, the historical score and the basic score are weighted and summed according to the preset weights to obtain the prediction score of the selectable list data. If the selectable list data has no historical score, the basic score of the selectable list data is the prediction score.
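The scoring steps can be sketched in Python as follows. The embodiment does not state which normalization the stage model applies, so min-max normalization over an assumed value range is used here as a stand-in; all function names, ranges, and weights are hypothetical.

```python
from typing import Mapping, Optional, Tuple


def min_max(value: float, low: float, high: float) -> float:
    """Min-max normalize value into [0, 1] (assumed normalization)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = min(max(value, low), high)
    return (clipped - low) / (high - low)


def base_score(
    dimensions: Mapping[str, float],
    ranges: Mapping[str, Tuple[float, float]],
) -> float:
    """Sum the normalized values of all recorded dimensions."""
    return sum(min_max(v, *ranges[name]) for name, v in dimensions.items())


def prediction_score(
    base: float,
    history: Optional[float] = None,
    w_history: float = 0.5,
    w_base: float = 0.5,
) -> float:
    """Weighted sum of history and base score, or the base score alone."""
    if history is None:
        return base
    return w_history * history + w_base * base


base = base_score({"processing_duration": 150.0}, {"processing_duration": (0.0, 300.0)})
print(base)  # 0.5
print(prediction_score(base, history=1.0, w_history=0.25, w_base=0.75))  # 0.625
```

Because min-max normalization is monotonic, a longer processing duration gives a higher basic score, which matches the behavior the embodiment attributes to the stage model.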

The embodiment realizes the accurate scoring calculation of the user data, so that the user data can be more accurately screened through the scoring, and the screening efficiency and accuracy of the user data are further improved.

Those skilled in the art will appreciate that all or part of the processes of the methods of the embodiments described above may be implemented by computer-readable instructions stored on a computer-readable storage medium; when executed, the instructions may carry out the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or it may be a random access memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a list data screening apparatus, where an embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.

As shown in fig. 3, the list data screening apparatus 400 according to this embodiment includes: a first acquisition module 401, a second acquisition module 402, a screening module 403, a comparison module 404, and a confirmation module 405. Wherein:

The first acquisition module 401 is configured to obtain a history information record, and determine corresponding initial list data according to the history information record.

The second acquisition module 402 is configured to obtain the total similarity between the initial list data and the sample data based on a preset similarity model.

Wherein the second acquisition module 402 includes:

the first calculation unit is used for comparing each item of data in the initial list data with the corresponding item in the sample data through a preset similarity model to obtain the sub-similarity corresponding to each item of data in the initial list data;

and the second calculation unit is used for calculating the sum of all the sub-similarity and the ratio of the sum in the total number of items of the initial list data, wherein the ratio is the total similarity of the initial list data and the sample data.

The screening module 403 is configured to screen the initial list data according to the total similarity, so as to obtain invalid data and candidate data.

The comparison module 404 is configured to calculate a first proportion of the invalid data in the initial list data and, when the first proportion is smaller than a preset proportion, to re-screen the candidate data until a second proportion, namely the proportion of the accumulated invalid data in the initial list data, is greater than or equal to the preset proportion, so as to obtain the final selectable list data.

The confirmation module 405 is configured to calculate a prediction score corresponding to each piece of selectable list data based on a preset stage model, and select the selectable list data with the prediction score greater than or equal to a preset score threshold as final list data.

Wherein, the confirmation module 405 includes:

an acquisition unit, configured to acquire a data category of the selectable list data;

the tracking unit is used for carrying out service processing tracking on the selectable list data according to the data category to obtain the processing duration of the selectable list data;

and the third calculation unit is used for calculating the prediction score corresponding to the selectable list data through a preset stage model based on the processing duration.

Wherein the tracking unit comprises:

the acquisition subunit is used for acquiring first processing duration of the first service processing when the first service processing is carried out on the selectable list data;

the tracking subunit is used for re-tracking the selectable list data when the first processing time length is greater than or equal to a preset processing threshold value, and acquiring the re-tracking times and the re-tracking time length of each re-tracking;

and the first calculating subunit is used for calculating the processing time length of the selectable list data according to the first processing time length, the tracking times and the re-tracking time length by weighted summation.

And the confirmation subunit is used for determining whether service tracking failure exists when service processing tracking is carried out on the selectable list data, and canceling service processing tracking on the selectable list data when the number of times of service tracking failure reaches a preset number of times.

Wherein the third calculation unit further comprises:

the scoring subunit is used for carrying out normalization processing on the processing time length through a preset stage model to obtain a basic score corresponding to the selectable list data;

And the second calculating subunit is used for acquiring the historical scores of the selectable list data and calculating the prediction scores corresponding to the selectable list data according to the historical scores and the basic scores.
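The re-screening loop performed by the comparison module 404 can be illustrated with a short Python sketch. How each re-screening pass decides which candidates are invalid is not specified in this passage, so the sketch assumes a hypothetical per-record score and a threshold that rises by a fixed step on each pass; the function name and parameters are likewise assumptions.

```python
from typing import Dict, List, Tuple


def rescreen(
    scores: Dict[str, float],
    target_ratio: float,
    start_threshold: float,
    step: float,
) -> Tuple[List[str], List[str]]:
    """Drop low-scoring records until the invalid share reaches target_ratio."""
    if step <= 0:
        raise ValueError("step must be positive")
    total = len(scores)
    candidates = dict(scores)
    invalid: List[str] = []
    threshold = start_threshold
    while candidates:
        # One screening pass: records below the current threshold are invalid.
        dropped = [key for key, score in candidates.items() if score < threshold]
        for key in dropped:
            del candidates[key]
        invalid.extend(dropped)
        # Stop once the accumulated invalid share reaches the preset proportion.
        if len(invalid) / total >= target_ratio:
            break
        threshold += step  # assumption: tighten the criterion for the next pass
    return sorted(candidates), invalid


scores = {"a": 0.1, "b": 0.4, "c": 0.6, "d": 0.9}
selectable, invalid = rescreen(scores, target_ratio=0.5, start_threshold=0.3, step=0.2)
print(selectable, invalid)  # ['c', 'd'] ['a', 'b']
```

In this run the first pass removes one record (25% invalid, below the 50% target), so a second pass is made; after it the accumulated invalid share is 50% and the remaining records become the selectable list data.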

The list data screening device of the application further comprises:

the training module is used for pre-establishing a basic training model and training the basic training model based on acquired training data, wherein the training data comprises positive sample data and negative sample data;

and the identification module is used for determining that the basic training model is trained to obtain a corresponding similarity model when the success rate of identification of the basic training model on the positive sample data and the negative sample data reaches a preset success rate.
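The stopping rule used by the training and identification modules can be sketched as a loop that trains until the recognition success rate on the positive and negative samples reaches the preset rate. The one-dimensional `ThresholdModel` below is a toy stand-in invented for this sketch, not the basic training model of the application; the function name and the `max_epochs` safeguard are also assumptions.

```python
from typing import List, Sequence, Tuple


class ThresholdModel:
    """Toy 1-D classifier: predicts 1 (similar) when x >= threshold."""

    def __init__(self) -> None:
        self.threshold = 0.0

    def predict(self, x: float) -> int:
        return 1 if x >= self.threshold else 0

    def fit_epoch(self, samples: Sequence[Tuple[float, int]], lr: float = 0.1) -> None:
        for x, label in samples:
            # Lower the threshold on a missed positive, raise it on a false positive.
            self.threshold -= lr * (label - self.predict(x))


def train_similarity_model(
    model: ThresholdModel,
    positives: Sequence[float],
    negatives: Sequence[float],
    target_rate: float,
    max_epochs: int = 50,
) -> ThresholdModel:
    """Train until the success rate on both sample sets reaches target_rate."""
    samples: List[Tuple[float, int]] = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    for _ in range(max_epochs):
        model.fit_epoch(samples)
        correct = sum(model.predict(x) == label for x, label in samples)
        if correct / len(samples) >= target_rate:
            return model  # training is considered complete
    raise RuntimeError("target success rate not reached")


model = train_similarity_model(ThresholdModel(), [0.8, 0.9], [0.2, 0.3], target_rate=1.0)
print(model.predict(0.85), model.predict(0.25))  # 1 0
```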

The list data screening device provided by the application realizes accurate screening of user data in a large amount of information, improves the accuracy rate of information pushing and the information transmission efficiency, avoids transmission of a large amount of invalid information, and further avoids resource waste caused by invalid pushing.

In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.

The computer device 6 comprises a memory 61, a processor 62, and a network interface 63 communicatively connected to each other via a system bus. It is noted that only the computer device 6 having components 61-63 is shown in the figure, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculations and/or information processing in accordance with predetermined or stored instructions, the hardware of which includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.

The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.

The memory 61 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or an internal memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used to store an operating system and various application software installed on the computer device 6, such as computer readable instructions of the list data screening method. Further, the memory 61 may be used to temporarily store various types of data that have been output or are to be output.

The processor 62 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute computer readable instructions stored in the memory 61 or process data, such as computer readable instructions for executing the list data screening method.

The network interface 63 may comprise a wireless network interface or a wired network interface, which network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.

The computer equipment provided by the application realizes accurate screening of user data in a large amount of information, improves the accuracy rate of information pushing and the information transmission efficiency, avoids the transmission of a large amount of invalid information, and further avoids the resource waste caused by invalid pushing.

The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the method for screening list data as described above.

The computer readable storage medium provided by the application realizes accurate screening of user data in a large amount of information, improves the accuracy rate of information pushing and the information transmission efficiency, avoids the transmission of a large amount of invalid information, and further avoids the resource waste caused by invalid pushing.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.

It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application. The preferred embodiments of the present application are shown in the drawings, which do not limit the scope of the patent claims. The application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features. All equivalent structures made using the content of the specification and the drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.

Claims

1. The list data screening method is characterized by comprising the following steps of:

calculating a prediction score corresponding to each piece of selectable list data based on a preset stage model, and selecting the selectable list data with the prediction score being greater than or equal to a preset score threshold as final list data;

the step of obtaining the total similarity between the initial list data and the sample data based on the preset similarity model specifically comprises the following steps:

calculating the sum of all the sub-similarity and the ratio of the sum in the total item number of the initial list data, wherein the ratio is the total similarity of the initial list data and the sample data;

the step of calculating the prediction score corresponding to each piece of selectable list data specifically includes:

acquiring the data category of the selectable list data;

calculating a prediction score corresponding to the selectable list data through a preset stage model based on the processing time length;

the step of obtaining the processing duration of the selectable list data specifically includes:

calculating the processing time length of the selectable list data by weighted summation according to the first processing time length, the tracking times and the re-tracking time length;

the step of calculating the prediction score corresponding to the selectable list data through a preset stage model based on the processing time length specifically includes:

2. The method according to claim 1, wherein before the step of obtaining the total similarity between the initial list data and the sample data based on the preset similarity model, the method further comprises:

3. The method of claim 1, further comprising, prior to the step of collecting a first processing duration of the first service processing:

4. A list data screening apparatus, comprising:

The confirmation module is used for calculating a prediction score corresponding to each piece of selectable list data based on a preset stage model, and selecting the selectable list data with the prediction score being greater than or equal to a preset score threshold value as final list data;

wherein the second acquisition module includes:

the second calculation unit is used for calculating the sum of all the sub-similarity and the ratio of the sum in the total number of items of the initial list data, wherein the ratio is the total similarity of the initial list data and the sample data;

wherein, the confirmation module includes:

the third calculation unit is used for calculating a prediction score corresponding to the selectable list data through a preset stage model based on the processing time length;

Wherein the tracking unit comprises:

the first calculating subunit is used for calculating the processing time length of the selectable list data by weighted summation according to the first processing time length, the tracking times and the re-tracking time length;

wherein the third calculation unit further comprises:

5. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the list data screening method of any one of claims 1 to 3.

6. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the list data screening method of any one of claims 1 to 3.