CN112084408A - List data screening method and device, computer equipment and storage medium

Info

Publication number: CN112084408A
Application number: CN202010936334.5A
Authority: CN
Inventors: 徐杰
Original assignee: Ping An Property and Casualty Insurance Company of China Ltd
Current assignee: Ping An Property and Casualty Insurance Company of China Ltd
Priority date: 2020-09-08
Filing date: 2020-09-08
Publication date: 2020-12-15
Anticipated expiration: 2040-09-08
Also published as: CN112084408B

Abstract

The embodiments of the application belong to the technical field of big data and relate to a list data screening method comprising the following steps: acquiring a historical information record and determining corresponding initial list data from it; acquiring the total similarity between the initial list data and sample data based on a preset similarity model; screening the initial list data according to the total similarity to obtain invalid data and candidate data; when the proportion of invalid data is smaller than a preset proportion, re-screening the candidate data to obtain final selectable list data; and calculating a prediction score for each item of selectable list data based on a preset stage model, and selecting the selectable list data whose prediction score is greater than or equal to a preset score threshold as the final list data. The application also provides a list data screening device, computer equipment, and a storage medium. In addition, the application relates to blockchain technology: the final list data can be stored in a blockchain. The method and the device enable accurate screening of user data.

Description

List data screening method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of big data technologies, and in particular to a list data screening method and apparatus, a computer device, and a storage medium.

Background

With the rapid development of information technology, a large amount of data is transmitted every day, both online and offline, and people receive a large volume of information from many external sources. How to push information to users in a targeted manner, so that the group receiving the information is the group that actually needs it, is therefore a focus of current research.

Traditional user screening usually relies on offline, in-person visits to collect data, followed by manual screening of users and then information recommendation aimed at those users. This approach is usually inefficient and cannot accurately analyze and process large amounts of data. For online user groups, big data analysis can currently be used to identify a user group that may need a certain kind of information. However, because of the large data volume, the accuracy of such screening is still poor: users who really need the currently recommended content cannot be identified, and the stickiness (retention) of the screened users is low. Therefore, how to achieve high-precision user data screening within a large amount of information is one of the technical problems to be solved.

Disclosure of Invention

The embodiments of the application aim to provide a list data screening method and device, computer equipment, and a storage medium, so as to solve the technical problem that high-precision data screening cannot currently be achieved within a large amount of information.

In order to solve the above technical problem, an embodiment of the present application provides a list data screening method, which adopts the following technical scheme:

acquiring a history information record, and determining corresponding initial list data according to the history information record;

acquiring the total similarity of the initial list data and sample data based on a preset similarity model;

screening the initial list data according to the total similarity to obtain invalid data and candidate data;

calculating a first ratio of the invalid data in the initial list data, and when the first ratio of the invalid data is smaller than a preset ratio, re-screening the candidate data until a second ratio of the sum of the accumulated screened invalid data in the initial list data is greater than or equal to the preset ratio to obtain final selectable list data;

and calculating a prediction score corresponding to each selectable list data based on a preset stage model, and selecting the selectable list data with the prediction score larger than or equal to a preset score threshold as final list data.

Further, before the step of obtaining the total similarity between the initial list data and the sample data based on the preset similarity model, the method further includes:

establishing a basic training model in advance, and training the basic training model based on collected training data, wherein the training data comprises positive sample data and negative sample data;

and when the success rate of the basic training model for identifying the positive sample data and the negative sample data reaches a preset success rate, determining that the basic training model is trained to be finished, and obtaining a corresponding similarity model.

Further, the step of obtaining the total similarity between the initial list data and the sample data based on the preset similarity model specifically includes:

calculating each item of data in the initial list data and the data of the corresponding item in the sample data through a preset similarity model to obtain the sub-similarity corresponding to each item of data in the initial list data;

and calculating the sum of all the sub-similarities and the ratio of the sum to the total number of items of the initial list data, wherein the ratio is the total similarity of the initial list data and the sample data.

Further, the step of calculating the prediction score corresponding to each selectable list data specifically includes:

acquiring the data category of the selectable list data;

performing service processing tracking on the selectable list data according to the data category to obtain the processing duration of the selectable list data;

and calculating the prediction score corresponding to the selectable list data through a preset stage model based on the processing duration.

Further, the step of performing service processing tracking on the selectable list data according to the data category to obtain the processing duration of the selectable list data specifically includes:

when the selectable list data is subjected to first service processing, acquiring the first processing time length of the first service processing;

when the first processing time length is greater than or equal to a preset processing threshold value, re-tracking the selectable list data to obtain the tracking times of re-tracking and the re-tracking time length of each re-tracking;

and according to the first processing time length, the tracking times and the retracing time length, carrying out weighted summation to calculate the processing time length of the selectable list data.

Further, before the step of collecting the first processing duration of the first service processing, the method further includes:

and when the service processing tracking is carried out on the selectable list data, determining whether service tracking failure exists or not, and canceling the service processing tracking of the selectable list data when the frequency of the service tracking failure reaches a preset frequency.

Further, the step of calculating the prediction score corresponding to the selectable list data through a preset stage model based on the processing duration specifically includes:

carrying out normalization processing on the processing duration through a preset stage model to obtain a basic score corresponding to the selectable list data;

and acquiring the historical score of the selectable list data, and calculating the prediction score corresponding to the selectable list data according to the historical score and the basic score.

In order to solve the above technical problem, an embodiment of the present application further provides a list data screening apparatus, which adopts the following technical scheme:

the first acquisition module is used for acquiring a historical information record and determining corresponding initial list data according to the historical information record;

the second acquisition module is used for acquiring the total similarity between the initial list data and the sample data based on a preset similarity model;

the screening module is used for screening the initial list data according to the total similarity to obtain invalid data and candidate data;

the comparison module is used for calculating a first ratio of the invalid data in the initial list data, and when the first ratio of the invalid data is smaller than a preset ratio, re-screening the candidate data until a second ratio of the sum of the cumulatively screened invalid data in the initial list data is larger than or equal to the preset ratio to obtain final selectable list data;

and the confirmation module is used for calculating the prediction score corresponding to each selectable list data based on a preset stage model, and selecting the selectable list data with the prediction score being greater than or equal to a preset score threshold value as final list data.

In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which includes a memory and a processor, where the memory stores computer readable instructions, and the processor implements the steps of the above list data screening method when executing the computer readable instructions.

In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the steps of the above list data screening method are implemented.

In the method, a historical information record is obtained and corresponding initial list data, which is pre-collected user data, is determined from it. The total similarity between the initial list data and sample data is obtained based on a preset similarity model. The initial list data is screened according to the total similarity to obtain invalid data and candidate data, where the invalid data is the initial list data whose total similarity is smaller than a preset screening threshold. A first ratio of the invalid data in the initial list data is then calculated. When the first ratio is smaller than the preset ratio, too little invalid data has been screened out; the candidate data is then re-screened with an adjusted screening threshold until a second ratio, that of the cumulatively screened invalid data in the initial list data, is greater than or equal to the preset ratio, which yields the final selectable list data. Finally, a prediction score is calculated for each item of selectable list data based on a preset stage model, and the selectable list data whose prediction score is greater than or equal to a preset score threshold is selected as the final list data. This achieves accurate screening of user data within a large amount of information, improves the accuracy of information push and the efficiency of information transmission, avoids transmitting a large amount of invalid information, and thereby avoids the resource waste caused by invalid pushes.

Drawings

In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a list data screening method according to the present application;

FIG. 3 is a schematic structural diagram of an embodiment of a device for screening list data according to the present application;

FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.

Reference numerals: the list data screening apparatus 400 includes: a first obtaining module 401, a second obtaining module 402, a screening module 403, a comparing module 404 and a confirming module 405.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may have various communication client applications installed, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.

It should be noted that, the method for screening list data provided in the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the device for screening list data is generally disposed in the server/terminal device.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow diagram of one embodiment of a list data screening method according to the present application is shown. The list data screening method comprises the following steps:

step S201, obtaining a history information record, and determining corresponding initial list data according to the history information record;

In this embodiment, the history information record is a stored historical consumption record, which may be collected through event tracking (buried-point instrumentation). Once the history information record is obtained, it is analyzed to determine the initial list data. The initial list data is the initial data set to be screened and includes basic user information such as gender, age, and consumption status.

Step S202, acquiring the total similarity of the initial list data and sample data based on a preset similarity model;

In this embodiment, the similarity model is a model trained in advance to determine similarity. Taking the LightGBM model as an example, LightGBM is based on decision tree learning algorithms and offers fast running speed and low memory usage. The LightGBM model outputs the similarity between the acquired initial list data and the sample data. The sample data is positive sample data; for example, in a recommendation system, the sample data is the data of users who have accepted a product recommended by the system. Specifically, based on the similarity model, the sub-similarity between each item of data in the initial list data and the corresponding item of data in the sample data is obtained, and the total similarity between the initial list data and the sample data is calculated from the sub-similarities.

Step S203, screening the initial list data according to the total similarity to obtain invalid data and candidate data;

in this embodiment, when the total similarity between the initial list data and the sample data is obtained according to the similarity model, invalid data in the initial list data is screened according to the total similarity. Specifically, the initial list data with the total similarity to the sample data being greater than or equal to the preset screening threshold is the candidate data, and the initial list data with the total similarity to the sample data being smaller than the preset screening threshold is the invalid data.

Step S204, calculating a first ratio of the invalid data in the initial list data, and when the first ratio of the invalid data is smaller than a preset ratio, re-screening the candidate data until a second ratio of the sum of the cumulatively screened invalid data in the initial list data is larger than or equal to the preset ratio to obtain final selectable list data;

When invalid data is obtained from the first screening, a first ratio of the invalid data in the initial list data is calculated. If the first ratio is greater than or equal to the preset ratio, the candidate data from the first screening is not screened again and is taken as the final selectable list data. If the first ratio is smaller than the preset ratio, the preset screening threshold is adjusted, for example increased: the larger the preset screening threshold, the more invalid data is screened out. The candidate data is then re-screened according to the adjusted threshold. Each round of screening operates on the candidate data obtained in the previous round, and the sum of all cumulatively screened invalid data is calculated at the end of each round. Screening continues until the second ratio, that of the cumulatively screened invalid data in the initial list data, is greater than or equal to the preset ratio. The invalid data is removed from the candidate data of the last round, and the remaining candidate data is the final selectable list data.

In addition, when the invalid data in the initial list data are screened according to the total similarity, a first proportion value of the invalid data in the initial list data can be obtained after the initial list data are screened for the first time according to a preset screening threshold, and if the first proportion value is smaller than the preset proportion value, the preset screening threshold is adjusted; and re-screening the initial list data according to the adjusted preset screening threshold, wherein screening is performed on the basis of the initial list data each time until the first proportion value of the screened invalid data in the initial list data is greater than or equal to the preset proportion, and at this time, the candidate data screened from the initial list data for the last time is determined as final selectable list data.
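The cumulative re-screening variant of step S204 can be illustrated with a minimal Python sketch. The function name `screen_until_ratio`, the fixed increment `step`, and the example similarity values are assumptions made for illustration; the application only states that the preset screening threshold is adjusted (for example, increased) between rounds.

```python
from typing import Dict, List, Tuple


def screen_until_ratio(
    similarities: Dict[str, float],
    threshold: float,
    preset_ratio: float,
    step: float = 0.05,
) -> Tuple[List[str], List[str]]:
    """Return (selectable_ids, invalid_ids) after cumulative re-screening."""
    if step <= 0:
        raise ValueError("step must be positive so that the loop terminates")
    total = len(similarities)
    candidates = dict(similarities)
    invalid: List[str] = []
    while candidates:
        # Entries below the current threshold are invalid; the rest stay candidates.
        newly_invalid = [k for k, s in candidates.items() if s < threshold]
        invalid.extend(newly_invalid)
        for k in newly_invalid:
            del candidates[k]
        # Stop once the cumulative invalid share of the initial list reaches the preset ratio.
        if len(invalid) / total >= preset_ratio:
            break
        # Assumption: the threshold is raised by a fixed step before the next round.
        threshold += step
    return list(candidates), invalid


scores = {"u1": 0.9, "u2": 0.8, "u3": 0.6, "u4": 0.4, "u5": 0.2}
selectable, invalid = screen_until_ratio(scores, threshold=0.3, preset_ratio=0.4, step=0.2)
print(selectable, invalid)  # ['u1', 'u2', 'u3'] ['u5', 'u4']
```

In this example the first round removes only `u5` (20% of the initial list), so the threshold is raised and the second round removes `u4`, bringing the cumulative invalid share to 40%. If the preset ratio can never be reached, the sketch ends with every entry marked invalid.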

Step S205, calculating a prediction score corresponding to each selectable list data based on a preset stage model, and selecting the selectable list data with the prediction score greater than or equal to a preset score threshold as final list data.

In this embodiment, once the selectable list data is obtained, the prediction score corresponding to each item of selectable list data is calculated according to a preset stage model. Specifically, the stage model is a classification model that can be used to estimate the likelihood of an outcome for an object, such as a logistic regression model or a Support Vector Machine (SVM) model. The prediction score can be calculated from normalized data of the selectable list data on different dimensions, for example service processing duration and consumption records; the normalized data on each dimension can be produced by preset Hive SQL statements that perform the statistics and the normalization. Once the prediction score of each item of selectable list data has been calculated from the normalized data, the selectable list data whose prediction score is greater than or equal to a preset score threshold is taken as the final list data.
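As a rough illustration of step S205, the sketch below normalizes each dimension with min-max scaling, passes a weighted sum through a logistic function, and keeps the entries whose score reaches the threshold. The weights, the bias, the choice of min-max scaling, and the example values are all assumptions; the application does not specify the fitted parameters of the stage model or the normalization formula.

```python
import math
from typing import Dict, List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to 0.0."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def predict_scores(
    rows: Dict[str, List[float]], weights: List[float], bias: float
) -> Dict[str, float]:
    """Score each entry with a logistic function over its normalized dimensions."""
    ids = list(rows)
    # Normalize every dimension across all entries.
    columns = [min_max_normalize([rows[i][d] for i in ids]) for d in range(len(weights))]
    scores = {}
    for idx, entry_id in enumerate(ids):
        z = bias + sum(w * columns[d][idx] for d, w in enumerate(weights))
        scores[entry_id] = 1.0 / (1.0 + math.exp(-z))
    return scores


def select_final(scores: Dict[str, float], score_threshold: float) -> List[str]:
    """Keep entries whose prediction score is at or above the threshold."""
    return [k for k, s in scores.items() if s >= score_threshold]


# Hypothetical dimensions: processing duration (s), purchase count, cumulative spend.
rows = {"u1": [600.0, 3.0, 9000.0], "u2": [120.0, 0.0, 0.0], "u3": [300.0, 1.0, 2500.0]}
scores = predict_scores(rows, weights=[2.0, 1.0, 1.0], bias=-2.0)
print(select_final(scores, score_threshold=0.5))  # ['u1']
```

In practice the weights would come from a trained logistic regression or SVM rather than being set by hand.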

Further, once the final list data is obtained, a preset return visit period can be obtained, and when that period elapses, data backflow processing is performed on the final list data. Specifically, when the preset return visit period is reached, the basic information of the final list data is collected again, and a secondary prediction score is calculated for the final list data from the newly collected basic information and the previous prediction score. When the secondary prediction score is smaller than the preset score threshold, the corresponding final list data is deleted from the stored final list data, so that the final list data is updated regularly.
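The periodic update could look roughly like the following sketch. The `rescore` callback stands in for the recalculation of the score from newly collected basic information and the previous score; its form (an equal-weight blend here) is purely an assumption, as are the function and variable names.

```python
from typing import Callable, Dict


def refresh_final_list(
    final_scores: Dict[str, float],
    rescore: Callable[[str, float], float],
    score_threshold: float,
) -> Dict[str, float]:
    """Recompute each entry's score and drop entries that fall below the threshold."""
    refreshed = {}
    for entry_id, previous_score in final_scores.items():
        new_score = rescore(entry_id, previous_score)
        if new_score >= score_threshold:
            refreshed[entry_id] = new_score
    return refreshed


# Hypothetical scores derived from the re-collected basic information.
latest = {"u1": 0.8, "u2": 0.2}
final_scores = {"u1": 0.9, "u2": 0.6}
updated = refresh_final_list(
    final_scores,
    rescore=lambda entry_id, old: 0.5 * old + 0.5 * latest[entry_id],
    score_threshold=0.5,
)
print(list(updated))  # ['u1']
```

A scheduler would call `refresh_final_list` once per return visit period; that part is omitted here.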

Taking an insurance business scenario as an example, a historical information record is obtained; in this scenario the record includes information such as the user's gender, age, residence, and consumption records. Once the historical information record is obtained, the users in it are counted to obtain the initial list data. The total similarity between the initial list data and the sample data is then calculated with a preset similarity model, where the sample data is pre-collected list data of users who have purchased insurance and includes information such as the user's gender, age, and residence. Initial list data whose total similarity to the sample data is greater than or equal to a preset screening threshold is determined to be candidate data, and initial list data whose total similarity is smaller than the preset screening threshold is determined to be invalid data. When the first ratio of the invalid data in the initial list data is smaller than the preset ratio, the candidate data is re-screened: the preset screening threshold is adjusted, for example increased, and invalid data is screened out of the candidate data again according to the adjusted threshold.

If the second ratio obtained after re-screening is still smaller than the preset ratio, the preset screening threshold is adjusted again and screening is repeated on the candidate data from the previous round, until the second ratio of all cumulatively screened invalid data in the initial list data is greater than or equal to the preset ratio. The screened invalid data is removed from the candidate data of the last round, and the remaining candidate data is the selectable list data. Once the selectable list data is obtained, its prediction score is calculated according to a preset stage model. In the insurance business scenario, the prediction score of each user in the selectable list data can be calculated with the stage model from three dimensions: the agent's service processing duration for the user, the number of insurance purchases, and the cumulative purchase amount. The selectable list data whose prediction score is greater than or equal to the preset score threshold is taken as the final list data.

It is emphasized that, in order to further ensure the privacy and security of the final list data, the final list data may also be stored in a node of a blockchain.

The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

According to the method and the device, the initial list data are screened in different dimensions, so that the user data are accurately screened in a large amount of information, the accuracy rate of information pushing and the information transmission efficiency are improved, the transmission of a large amount of invalid information is avoided, and the resource waste caused by invalid pushing is further avoided.

In some embodiments of the present application, before the obtaining of the total similarity between the initial list data and the sample data based on the preset similarity model, the method includes:

In this embodiment, the similarity model may be obtained by training a basic training model. Specifically, a basic training model is established in advance; it is a basic model that has not yet been trained on any training data. Multiple groups of training data are collected in advance, where the training data comprises positive sample data and negative sample data. Taking a recommendation system as an example, the positive sample data is sample data for which a previous recommendation succeeded, and the negative sample data is sample data for which the recommendation failed. The positive and negative sample data are input into the basic training model for training. When the success rate of the basic training model in recognizing the positive and negative sample data reaches a preset success rate, training is determined to be complete and the corresponding similarity model is obtained.
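The stopping rule described above (train until the recognition success rate reaches a preset value) can be sketched as follows. To stay self-contained, the sketch swaps in a simple perceptron as a stand-in for the basic training model; the application itself names LightGBM as an example, and the feature vectors, learning rate, and epoch limit here are assumptions.

```python
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (features, label); 1 = positive, 0 = negative


def predict(weights: List[float], bias: float, features: List[float]) -> int:
    """Classify one sample with a linear decision rule."""
    return 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else 0


def success_rate(weights: List[float], bias: float, data: List[Sample]) -> float:
    """Fraction of samples whose label is recognized correctly."""
    return sum(predict(weights, bias, x) == y for x, y in data) / len(data)


def train_until(
    data: List[Sample], preset_success_rate: float, max_epochs: int = 100, lr: float = 0.1
) -> Tuple[List[float], float, float]:
    """Run perceptron updates until the success rate reaches the preset value."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        if success_rate(weights, bias, data) >= preset_success_rate:
            break  # training is considered complete
        for features, label in data:
            error = label - predict(weights, bias, features)
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias, success_rate(weights, bias, data)


training_data = [([1.0, 1.0], 1), ([0.9, 0.8], 1), ([0.1, 0.2], 0), ([0.0, 0.1], 0)]
weights, bias, rate = train_until(training_data, preset_success_rate=1.0)
print(rate)  # 1.0
```

A production setup would measure the success rate on held-out data rather than on the training set; the sketch uses the training set only to keep the example short.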

In this embodiment, the similarity model is obtained by training the basic training model, so that the similarity between the initial list data and the sample data can be accurately obtained through the similarity model.

In some embodiments of the application, the obtaining of the total similarity between the initial list data and the sample data based on the preset similarity model includes:

In this embodiment, each item of data in the initial list data and the corresponding item in the sample data are evaluated by the similarity model to obtain an output result, where 0 and 1 denote "not similar" and "similar" respectively. When the output is "similar", the item in the initial list data is similar to the corresponding item in the sample data, and the sub-similarity is 1; when the output is "not similar", the item is not similar to the corresponding item in the sample data, and the sub-similarity is 0. Once the sub-similarity of each item in the initial list data has been calculated, the sum of all sub-similarities is computed, and the ratio of that sum to the total number of items in the initial list data is the total similarity between the initial list data and the sample data. For example, if the initial list data includes 5 items and 4 of them have a sub-similarity of 1, the sum of the sub-similarities is 4 and its ratio to the total number of items is 80%, that is, the total similarity is 80%.
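The total similarity calculation can be written out as a short Python sketch. The `is_similar` callback stands in for the 0/1 output of the trained similarity model; the `exact_match` function and the field names are hypothetical and serve only to reproduce the 4-out-of-5 example.

```python
from typing import Callable, Mapping


def total_similarity(
    candidate: Mapping[str, object],
    sample: Mapping[str, object],
    is_similar: Callable[[str, object, object], bool],
) -> float:
    """Ratio of items in `candidate` judged similar to the same item in `sample`."""
    if not candidate:
        return 0.0
    # Each item contributes a sub-similarity of 1 (similar) or 0 (not similar).
    sub_similarities = [
        1 if key in sample and is_similar(key, value, sample[key]) else 0
        for key, value in candidate.items()
    ]
    return sum(sub_similarities) / len(candidate)


def exact_match(key: str, a: object, b: object) -> bool:
    # Hypothetical stand-in for the similarity model's per-item decision.
    return a == b


candidate = {"gender": "F", "age_band": "30-39", "city": "Shenzhen",
             "has_car": True, "income_band": "mid"}
sample = {"gender": "F", "age_band": "30-39", "city": "Shenzhen",
          "has_car": True, "income_band": "high"}
print(total_similarity(candidate, sample, exact_match))  # 0.8
```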

According to the embodiment, the total similarity is calculated according to the sub-similarities corresponding to the data in the initial list data, so that the calculated total similarity is more accurate, and the screening precision of the initial list data is further improved.

In some embodiments of the present application, the calculating the prediction score corresponding to each of the selectable list data includes:

acquiring the data category of the selectable list data;

In this embodiment, the selectable list data may be divided into data categories according to the region to which the user belongs, and a different form of service processing tracking is assigned to each data category. For example, selectable list data belonging to region A is assigned the service processing tracking of region A according to a mapping relationship. Service processing tracking includes telephone tracking, network tracking, and the like. Service processing tracking is performed on the selectable list data and its processing duration is monitored; the processing durations of different types of service processing tracking may differ. Once the processing duration of the selectable list data is obtained, the score is calculated according to the preset stage model to obtain the corresponding prediction score, where selectable list data with a longer processing duration receives a higher prediction score from the stage model.

According to the embodiment, different service processing traces are carried out on the selectable list data under different data types, so that the processing time lengths of different selectable list data can be accurately traced through the service processing traces, the user data is judged according to the processing time lengths, and the accuracy rate of user data identification is further improved.

In some embodiments of the present application, performing service processing tracking on the selectable list data according to the data category to obtain the processing duration of the selectable list data includes:

In this embodiment, when the first service processing is performed on the selectable list data, the first processing duration of that first service processing is collected. If the first processing duration is greater than or equal to a preset processing threshold, the selectable list data is re-tracked; if the first processing duration is less than the preset processing threshold, service processing tracking of the selectable list data is stopped. When the selectable list data is re-tracked, the number of re-tracking operations and the duration of each re-tracking operation are collected, and re-tracking of the current selectable list data stops when the number of re-tracking operations reaches a preset number. A weighted sum of the collected first processing duration, the number of tracking operations and the re-tracking durations gives the processing duration of the current selectable list data.

According to this embodiment, the processing duration is accurately calculated from the tracking durations and the number of tracking operations, and the screening precision of the user data is further improved.
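The weighted summation described above might be sketched as follows. The weights `w_first`, `w_count` and `w_retrack`, their default values, and the choice to count only re-tracking operations as the "number of tracking operations" are all assumptions for illustration; the application does not specify them.

```python
from typing import List, Sequence


def processing_duration(
    first_duration: float,
    retrack_durations: Sequence[float],
    threshold: float,
    max_retracks: int,
    w_first: float = 1.0,    # hypothetical weight of the first processing duration
    w_count: float = 1.0,    # hypothetical weight of the number of re-trackings
    w_retrack: float = 1.0,  # hypothetical weight of the re-tracking durations
) -> float:
    """Weighted sum of first duration, re-tracking count and re-tracking durations."""
    counted: List[float]
    if first_duration < threshold:
        # Tracking stops after the first processing, so nothing is re-tracked.
        counted = []
    else:
        # Re-tracking stops once the preset number of re-trackings is reached.
        counted = list(retrack_durations)[: max(max_retracks, 0)]
    return (
        w_first * first_duration
        + w_count * len(counted)
        + w_retrack * sum(counted)
    )
```

With all weights set to 1, a first duration of 10 that meets a threshold of 8, followed by re-trackings of 5 each and capped at two re-trackings, yields 10 + 2 + 10 = 22.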

In some embodiments of the present application, before the collecting the first processing duration of the first service processing, the method further includes:

In this embodiment, before service processing tracking is performed on the selectable list data and the first service processing is carried out, it is determined whether service tracking of the selectable list data has previously failed. When the number of service tracking failures for the selectable list data reaches a preset number, service processing tracking of that selectable list data is cancelled.

According to this embodiment, tracking is cancelled for selectable list data whose tracking has failed repeatedly, so that service resources are not wasted and their utilization rate is improved.
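A minimal sketch of this cancellation check is given below. The names `should_cancel_tracking` and `trackable_entries`, and the representation of failure counts as a dictionary keyed by entry identifier, are hypothetical and introduced only for illustration.

```python
from typing import Dict, List


def should_cancel_tracking(failure_count: int, max_failures: int) -> bool:
    """True when the number of tracking failures has reached the preset number."""
    return failure_count >= max_failures


def trackable_entries(failures_by_entry: Dict[str, int], max_failures: int) -> List[str]:
    """Keep only the entries whose tracking has not been cancelled."""
    return [
        entry
        for entry, failures in failures_by_entry.items()
        if not should_cancel_tracking(failures, max_failures)
    ]
```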

In some embodiments of the application, the calculating, based on the processing duration and through a preset stage model, a prediction score corresponding to the selectable list data includes:

In this embodiment, the prediction score of the selectable list data may be calculated from two quantities: the historical score corresponding to the current selectable list data, and a basic score calculated from the processing duration by the stage model. Preset weights corresponding to the historical score and the basic score are obtained, and the two scores are combined by weighted summation according to these weights to obtain the prediction score. Specifically, once the processing duration of the selectable list data is obtained, the preset stage model normalizes the processing duration, and the resulting value is the basic score of the selectable list data. If, besides the processing duration, the selectable list data includes record data in other dimensions, such as consumption records, each dimension is normalized separately and the normalized values over all dimensions are summed to obtain the basic score of the current selectable list data. If the selectable list data has a corresponding historical score, the historical score and the basic score are weighted and summed according to the preset weights to obtain the prediction score of the selectable list data. If the selectable list data has no historical score, its basic score is the prediction score.

According to the method and the device, accurate scoring calculation of the user data is achieved, the user data can be accurately screened through the scoring, and screening efficiency and accuracy of the user data are further improved.
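The scoring described above can be sketched as follows, under stated assumptions. The application does not say which normalization the stage model applies; min-max normalization with hypothetical bounds `min_d` and `max_d` is used here purely as an example, and the default weights are likewise illustrative.

```python
from typing import Optional


def base_score(duration: float, min_d: float, max_d: float) -> float:
    """Min-max normalize a processing duration into [0, 1] (assumed normalization)."""
    if max_d <= min_d:
        raise ValueError("max_d must be greater than min_d")
    clipped = min(max(duration, min_d), max_d)
    return (clipped - min_d) / (max_d - min_d)


def prediction_score(
    base: float,
    history: Optional[float] = None,
    w_history: float = 0.4,  # hypothetical preset weight of the historical score
    w_base: float = 0.6,     # hypothetical preset weight of the basic score
) -> float:
    """Weighted sum of historical and basic score; the basic score alone if no history."""
    if history is None:
        return base
    return w_history * history + w_base * base
```

A longer processing duration gives a higher basic score, which agrees with the statement that selectable list data with a longer processing duration receives a higher prediction score.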

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by computer readable instructions that instruct the relevant hardware. The instructions can be stored in a computer readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a business form data screening apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.

As shown in fig. 3, the apparatus 400 for screening list data according to this embodiment includes: a first obtaining module 401, a second obtaining module 402, a screening module 403, a comparing module 404 and a confirming module 405. Wherein:

the first obtaining module 401 is configured to obtain a history information record, and determine corresponding initial list data according to the history information record.

A second obtaining module 402, configured to obtain a total similarity between the initial list data and the sample data based on a preset similarity model.

Wherein the second obtaining module 402 comprises:

the first calculation unit is used for calculating each item of data in the initial list data and the data of the corresponding item in the sample data through a preset similarity model to obtain the sub-similarity corresponding to each item of data in the initial list data;

and the second calculating unit is used for calculating the sum of all the sub-similarities and the ratio of the sum to the total number of items of the initial list data, wherein the ratio is the total similarity of the initial list data and the sample data.

And a screening module 403, configured to screen the initial list data according to the total similarity, so as to obtain invalid data and candidate data.

A comparison module 404, configured to calculate a first ratio of the invalid data to the initial list data and, when the first ratio is smaller than a preset ratio, re-screen the candidate data until a second ratio, namely the ratio of all cumulatively screened-out invalid data to the initial list data, is greater than or equal to the preset ratio, so as to obtain final selectable list data.

The confirmation module 405 is configured to calculate a prediction score corresponding to each item of selectable list data based on a preset stage model, and to select the selectable list data whose prediction score is greater than or equal to a preset score threshold as final list data.

Wherein the confirmation module 405 comprises:

the acquisition unit is used for acquiring the data category of the selectable list data;

the tracking unit is used for carrying out service processing tracking on the selectable list data according to the data category to obtain the processing duration of the selectable list data;

and the third calculating unit is used for calculating the prediction score corresponding to the selectable list data through a preset stage model based on the processing time length.

Wherein the tracking unit comprises:

the acquisition subunit is used for acquiring the first-time processing duration of the first-time service processing when the first-time service processing is carried out on the selectable list data;

the tracking subunit is used for re-tracking the selectable list data when the first processing duration is greater than or equal to a preset processing threshold, and acquiring the number of re-tracking operations and the duration of each re-tracking operation;

the first calculating subunit is used for calculating the processing duration of the selectable list data by weighted summation of the first processing duration, the number of tracking operations and the re-tracking durations;

and the confirming subunit is used for determining whether a service tracking failure exists when service processing tracking is performed on the selectable list data, and canceling the service processing tracking of the selectable list data when the number of service tracking failures reaches a preset number.

Wherein the third computing unit further comprises:

the scoring subunit is used for carrying out normalization processing on the processing duration through a preset stage model to obtain a basic score corresponding to the selectable list data;

and the second calculating subunit is used for acquiring the historical score of the selectable list data and calculating the prediction score corresponding to the selectable list data according to the historical score and the basic score.

The list data screening apparatus of the present application further includes:

the training module is used for establishing a basic training model in advance and training the basic training model based on collected training data, wherein the training data comprises positive sample data and negative sample data;

and the identification module is used for determining that the basic training model is trained to be finished when the identification success rate of the basic training model on the positive sample data and the negative sample data reaches a preset success rate, so as to obtain a corresponding similarity model.
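The stopping criterion used by the identification module can be sketched as below. The `predict` callable is a hypothetical stand-in for the basic training model (returning True when it classifies a sample as positive); how the model is actually built and trained is not specified here and is not shown.

```python
from typing import Any, Callable, Sequence


def recognition_success_rate(
    predict: Callable[[Any], bool],
    positives: Sequence[Any],
    negatives: Sequence[Any],
) -> float:
    """Share of positive and negative samples that the model identifies correctly."""
    total = len(positives) + len(negatives)
    if total == 0:
        raise ValueError("at least one training sample is required")
    correct = sum(1 for s in positives if predict(s))
    correct += sum(1 for s in negatives if not predict(s))
    return correct / total


def training_finished(
    predict: Callable[[Any], bool],
    positives: Sequence[Any],
    negatives: Sequence[Any],
    preset_rate: float,
) -> bool:
    """Training is considered complete once the preset success rate is reached."""
    return recognition_success_rate(predict, positives, negatives) >= preset_rate
```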

The list data screening apparatus provided by the present application accurately screens user data out of a large amount of information, improves the accuracy of information push and the efficiency of information transmission, avoids the transmission of a large amount of invalid information, and further avoids the resource waste caused by invalid pushes.
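The re-screening loop performed by the comparison module 404 might look like the following sketch. This section does not state how each re-screening round decides which candidates are invalid; the sketch assumes, purely for illustration, that a similarity threshold is raised by a fixed `step` in each round. The function name, `threshold` and `step` are all hypothetical.

```python
from typing import List, Sequence


def screen_until_ratio(
    similarities: Sequence[float],
    threshold: float,
    preset_ratio: float,
    step: float = 0.05,  # assumed increment of the threshold per re-screening round
) -> List[int]:
    """Return the indices of the entries kept as selectable list data."""
    if step <= 0:
        raise ValueError("step must be positive so that the loop terminates")
    total = len(similarities)
    candidates = list(range(total))
    invalid_count = 0
    while candidates:
        # Entries below the current threshold are treated as invalid data.
        kept = [i for i in candidates if similarities[i] >= threshold]
        invalid_count += len(candidates) - len(kept)
        candidates = kept
        # Stop once the cumulative invalid share reaches the preset ratio.
        if invalid_count / total >= preset_ratio:
            break
        threshold += step
    return candidates
```

The cumulative count of invalid data is always divided by the size of the initial list data, not by the shrinking candidate set, which matches the definition of the second ratio.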

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.

The computer device 6 comprises a memory 61, a processor 62 and a network interface 63 communicatively connected to each other via a system bus. It is noted that only a computer device 6 having components 61-63 is shown; it should be understood that not all of the illustrated components need to be implemented, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.

The memory 61 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device thereof. In this embodiment, the memory 61 is generally used for storing an operating system installed in the computer device 6 and various types of application software, such as computer readable instructions of a list data screening method. Further, the memory 61 may also be used to temporarily store various types of data that have been output or are to be output.

The processor 62 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute computer readable instructions stored in the memory 61 or process data, such as computer readable instructions for executing the list data screening method.

The network interface 63 may comprise a wireless network interface or a wired network interface, and the network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.

The computer device provided by the present application accurately screens user data out of a large amount of information, improves the accuracy of information push and the efficiency of information transmission, avoids the transmission of a large amount of invalid information, and further avoids the resource waste caused by invalid pushes.

The present application further provides another embodiment, which is to provide a computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, which can be executed by at least one processor, so as to cause the at least one processor to execute the steps of the list data screening method as described above.

The computer-readable storage medium provided by the application realizes accurate screening of user data in a large amount of information, improves the accuracy rate of information push and the information transmission efficiency, avoids transmission of a large amount of invalid information, and further avoids resource waste caused by invalid push.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

It is to be understood that the above-described embodiments are only some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. The application can be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or replace some of their features with equivalents. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims

1. A business form data screening method is characterized by comprising the following steps:

2. The method for screening list data according to claim 1, wherein before the step of obtaining the total similarity between the initial list data and the sample data based on the preset similarity model, the method further comprises:

3. The method for screening list data according to claim 1, wherein the step of obtaining the total similarity between the initial list data and the sample data based on a preset similarity model specifically comprises:

4. The method for screening list data according to claim 1, wherein the step of calculating the prediction score corresponding to each item of selectable list data specifically comprises:

acquiring the data category of the selectable list data;

5. The method for screening list data according to claim 4, wherein the step of performing service processing tracking on the selectable list data according to the data category to obtain the processing duration of the selectable list data specifically includes:

6. The method of claim 5, wherein the step of collecting the first processing duration of the first service processing is preceded by the step of:

7. The method for screening list data according to claim 4, wherein the step of calculating the prediction score corresponding to the selectable list data through a preset stage model based on the processing duration specifically includes:

8. An apparatus for screening business form data, comprising:

9. A computer device comprising a memory having computer readable instructions stored therein and a processor which, when executing the computer readable instructions, implements the steps of the list data screening method of any one of claims 1 to 7.

10. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the list data screening method according to any one of claims 1 to 7.