CN115329280A

CN115329280A - Data screening method, device, equipment and medium

Info

Publication number: CN115329280A
Application number: CN202210990092.7A
Authority: CN
Inventors: 宋丹阳
Original assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Current assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Priority date: 2022-08-18
Filing date: 2022-08-18
Publication date: 2022-11-11
Anticipated expiration: 2042-08-18
Also published as: CN115329280B

Abstract

The application relates to the field of data processing, and in particular to a data screening method, apparatus, device and medium for improving the accuracy of data screening. In the embodiments of the application, sample information is first obtained; an upper limit value and a lower limit value of each initial screening interval of the sample information are then determined according to the number of initial screening intervals and the weights of the initial screening intervals; for any one initial screening interval, a designated information value of that interval is determined according to its upper limit value, lower limit value and weight; a target screening domain corresponding to the initial screening interval is determined according to the designated information value; and the sample information is screened according to the target screening domain to obtain target information. Because the initial screening intervals obtained in this way better meet user requirements, the accuracy of subsequent data screening is improved.

Description

Data screening method, device, equipment and medium

Technical Field

The present application relates to the field of data processing, and in particular, to a data screening method, apparatus, device, and medium.

Background

In the related art at the present stage, research on data screening is largely based on fuzzy mathematics, with membership functions preset and fine-tuned subjectively according to experience. With a small sample size, however, it is difficult to obtain a probability distribution type that truly matches a variable, and the sample statistics are not fully utilized. In addition, most data screening approaches set the same weight for all screening ranges, so the screening result lacks generality and is inaccurate.

Disclosure of Invention

The embodiment of the application provides a data screening method, a data screening device, data screening equipment and a data screening medium, which are used for improving the accuracy of data screening.

In a first aspect, the present application provides a data screening method, including:

obtaining sample information, wherein the sample information is an attribute value of an object associated with a user, and the attribute value obeys normal distribution;

determining an upper limit value and a lower limit value of each initial screening interval of the sample information according to the number of the initial screening intervals and the weight of the initial screening intervals; the number of the initial screening intervals is set according to the sample information; wherein the weight is set according to the sample information;

aiming at any one initial screening interval, determining a designated information value of the any one initial screening interval according to the upper limit value and the lower limit value of the any one initial screening interval and the weight of the any one initial screening interval; the specified information value represents the sample information with the maximum attribute value in any initial screening interval;

determining a target screening domain corresponding to any one of the initial screening intervals according to the designated information value;

and screening the sample information according to the target screening domain to obtain target information.

In the application, a user sets the number of the initial screening intervals and the weight of each initial screening interval according to sample information, a target screening domain can be finally determined based on the weight set by the user, and the sample information is screened based on the target screening domain, so that the requirements of the user are met, and the accuracy of data screening is improved.

In some possible embodiments, the determining an upper limit value and a lower limit value of each initial screening interval of the sample information according to the number of initial screening intervals and the weight of the initial screening interval includes:

based on the number of the initial screening intervals and the weight of the initial screening intervals, carrying out standardization processing on the initial screening intervals to obtain an upper limit value of the standardization processing and a lower limit value of the standardization processing;

and obtaining the upper limit value and the lower limit value of the initial screening interval according to the upper limit value of the standardization processing, the lower limit value of the standardization processing and the probability distribution function of the initial screening interval.

According to the method and the device, the upper limit value and the lower limit value of the initial screening interval are determined according to the number of the initial screening intervals and the weight of the initial screening interval, so that the obtained initial screening interval can better meet the user requirements, and the accuracy of subsequent data screening is improved.

In some possible embodiments, the carrying out of standardization processing on the initial screening intervals based on the number of the initial screening intervals and the weight of the initial screening intervals, to obtain an upper limit value of the standardization processing and a lower limit value of the standardization processing, includes:

adopting a standardization processing formula to carry out standardization processing on the initial screening interval, wherein the standardization processing formula is as follows:

wherein $A_s$ is the s-th initial screening interval; the upper limit of integration is the upper limit value of the standardization processing, and the lower limit of integration is the lower limit value of the standardization processing; $\omega_s$ is the weight of the s-th initial screening interval; $f(x)$ is the probability density function of the s-th initial screening interval; $D$ is the standard deviation of the s-th initial screening interval, and $E$ is the mean of the s-th initial screening interval.

In some possible embodiments, the obtaining the upper limit value and the lower limit value of the initial screening interval according to the upper limit value of the standardization processing, the lower limit value of the standardization processing, and the probability distribution function of the initial screening interval includes:

respectively substituting the upper limit value and the lower limit value of the standardization processing, which serve as the upper and lower limits of integration, into the probability distribution function to obtain the upper limit value and the lower limit value of the initial screening interval; wherein the probability distribution function is:

the obtained upper limit value of the initial screening interval is as follows:

the obtained lower limit value of the initial screening interval is as follows:

wherein: $A_s$ is the s-th initial screening interval, $\max(A_s)$ is the upper limit value of the s-th initial screening interval, and $\min(A_s)$ is the lower limit value of the s-th initial screening interval; $\omega_i$ is the weight of the i-th initial screening interval; $f(x)$ is the probability density function of the s-th initial screening interval; $\Phi(x)$ is the probability distribution function of the s-th initial screening interval; $D$ is the standard deviation of the s-th initial screening interval, and $E$ is the mean of the s-th initial screening interval.

In the application, the upper limit value and the lower limit value of the initial screening interval are determined through a standardized processing formula and a probability distribution function, so that the obtained initial screening interval is more accurate.
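The symbol definitions above suggest one possible form for these formulas, sketched here as a hedged reconstruction rather than as the formulas of the application. It assumes that the weights $\omega_i$ are probability shares summing to 1, that $f$ and $\Phi$ are the standard normal density and distribution functions, and that intervals are numbered in increasing order of attribute value. Under those assumptions:

```latex
\begin{align}
\omega_s &= \int_{\frac{\min(A_s)-E}{D}}^{\frac{\max(A_s)-E}{D}} f(x)\,dx,
&\Phi(x) &= \int_{-\infty}^{x} f(t)\,dt, \\
\Phi\!\left(\frac{\max(A_s)-E}{D}\right) &= \sum_{i=1}^{s}\omega_i,
&\Phi\!\left(\frac{\min(A_s)-E}{D}\right) &= \sum_{i=1}^{s-1}\omega_i, \\
\max(A_s) &= E + D\,\Phi^{-1}\!\left(\sum_{i=1}^{s}\omega_i\right),
&\min(A_s) &= E + D\,\Phi^{-1}\!\left(\sum_{i=1}^{s-1}\omega_i\right).
\end{align}
```

With $\Phi^{-1}(0) = -\infty$ and $\Phi^{-1}(1) = +\infty$, this reading gives an unbounded lower limit for the first interval, an unbounded upper limit for the last interval, and a shared boundary between adjacent intervals.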

In some possible embodiments, the determining the designated information value of the any one initial screening interval according to the upper limit value, the lower limit value, and the weight of the any one initial screening interval includes:

determining a probability mean line of any initial screening interval according to the upper limit value and the lower limit value of any initial screening interval and the weight of any initial screening interval;

determining the number of intersection points of the probability average line and the probability density function of any one initial screening interval;

and determining the designated information value of any one initial screening interval according to the number of the intersection points.

In the method and the device, the specified information value is determined according to the number of intersection points of the probability average line and the probability density function of the initial screening interval, so that the determined specified information value is more accurate.
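As a worked sketch of the intersection step, assume (the text above does not state this explicitly) that the probability density function is the normal density with mean $E$ and standard deviation $D$, and that the probability mean line is the horizontal line $y = c$. The symbol $c$ is introduced here for illustration only. Setting the density equal to $c$ gives:

```latex
\frac{1}{D\sqrt{2\pi}}\exp\!\left(-\frac{(x-E)^2}{2D^2}\right) = c
\quad\Longrightarrow\quad
x = E \pm D\sqrt{-2\ln\!\left(c\,D\sqrt{2\pi}\right)},
\qquad 0 < c \le \frac{1}{D\sqrt{2\pi}}.
```

Under this assumption the line meets the curve at up to two abscissas placed symmetrically about $E$, and the number of intersection points for an interval is the number of these abscissas that fall between its lower limit value and its upper limit value.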

In some possible embodiments, the determining the designated information value of the any one initial screening interval according to the number of the intersection points includes:

if the number of the intersection points is one, determining the designated information value according to a first designated information determination formula;

if the number of the intersection points is two, taking the mean of the any one initial screening interval as the designated information value;

wherein the first designated information determination formula is:

wherein: $\max^{T}(A_s)$ is the designated information value, $A_s$ is the s-th initial screening interval, $avg$ is the probability mean line of the s-th initial screening interval, $f^{-1}(avg)$ is the inverse function of the probability mean line of the s-th initial screening interval, $D$ is the standard deviation of the s-th initial screening interval, and $E$ is the mean of the s-th initial screening interval.

In the application, the designated information value is determined through the first designated information determination formula and the mean value of the initial screening interval, so that the accuracy of the determined designated information value is ensured.

In some possible embodiments, the determining, according to the specified information value, a target screening domain corresponding to the any one initial screening interval includes:

determining a range to which the specified information value belongs based on sample information with a minimum attribute value in the sample information, sample information with a maximum attribute value in the sample information, and the specified information value;

determining a target screening domain formula based on the belonged range;

and substituting the specified information value into the target screening domain formula to obtain a target screening domain corresponding to any one initial screening interval.

In the application, the target screening domain corresponding to the initial screening interval is determined based on the range of the specified information value, so that the accuracy of the target screening domain is ensured.

In a second aspect, the present application provides a data screening apparatus, the apparatus comprising:

an acquisition module, configured to obtain sample information, wherein the sample information is an attribute value of an object associated with a user, and the attribute value obeys a normal distribution;

an initial screening interval determining module, configured to determine an upper limit value and a lower limit value of each initial screening interval of the sample information according to the number of initial screening intervals and a weight of the initial screening interval; the number of the initial screening intervals is set according to the sample information; wherein the weight is set according to the sample information;

the designated information value determining module is used for determining the designated information value of any initial screening interval according to the upper limit value and the lower limit value of any initial screening interval and the weight of any initial screening interval aiming at any initial screening interval; the specified information value represents the sample information with the maximum attribute value in any initial screening interval;

a target screening domain determining module, configured to determine, according to the specified information value, a target screening domain corresponding to the any one initial screening interval;

and the target information determining module is used for screening the sample information according to the target screening domain to obtain target information.

In some possible embodiments, the initial screening interval determining module, when performing the determining of the upper limit value and the lower limit value of each initial screening interval of the sample information according to the number of initial screening intervals and the weight of the initial screening interval, is configured to:

In some possible embodiments, the initial filtering interval determining module, when performing normalization processing on the initial filtering interval based on the number of the initial filtering intervals and the weight of the initial filtering interval, and obtaining an upper limit value of the normalization processing and a lower limit value of the normalization processing, is configured to:

wherein $A_s$ is the s-th initial screening interval; the upper limit of integration is the upper limit value of the standardization processing, and the lower limit of integration is the lower limit value of the standardization processing; $\omega_s$ is the weight of the s-th initial screening interval; $f(x)$ is the probability density function of the s-th initial screening interval; $D$ is the standard deviation of the s-th initial screening interval, and $E$ is the mean of the s-th initial screening interval.

In some possible embodiments, the specifying information value determining module, when determining the specifying information value of the any one initial filtering interval according to the upper limit value, the lower limit value, and the weight of the any one initial filtering interval, is configured to:

In some possible embodiments, when the specifying information value determining module determines the specifying information value of the any one initial filtering interval according to the number of the intersection points, the specifying information value determining module is configured to:

if the number of the intersection points is one, determining the designated information value according to a first designated information determination formula;

wherein the first specifying information determination formula is:

wherein: $\max^{T}(A_s)$ is the designated information value, $A_s$ is the s-th initial screening interval, $avg$ is the probability mean line of the s-th initial screening interval, $f^{-1}(avg)$ is the inverse function of the probability mean line of the s-th initial screening interval, $D$ is the standard deviation of the s-th initial screening interval, and $E$ is the mean of the s-th initial screening interval.

In some possible embodiments, when the target screening domain determining module determines the target screening domain corresponding to the any one initial screening interval according to the designated information value, the target screening domain determining module is configured to:

determining a target screening domain formula based on the belonged range;

In a third aspect, the present application provides an electronic device, comprising:

a memory for storing program instructions;

a processor for calling the program instructions stored in the memory and executing the steps comprised in the method of any one of the first aspect according to the obtained program instructions.

In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method of any of the first aspects.

In a fifth aspect, the present application provides a computer program product comprising: computer program code which, when run on a computer, causes the computer to perform the method of any of the first aspects.

Drawings

Fig. 1 is a schematic view of an application scenario of a data screening method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of the overall flow of a data screening method according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of determining an initial screening interval in a data screening method according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of determining a designated information value in a data screening method according to an embodiment of the present application;

Fig. 5 is a schematic flowchart of determining a target screening domain in a data screening method according to an embodiment of the present application;

Fig. 6 is a schematic diagram of an apparatus for a data screening method according to an embodiment of the present application;

Fig. 7 is a schematic view of an electronic device for a data screening method according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.

The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the term "comprises" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. The term "plurality" in the present application means at least two, for example two, three or more, and the embodiments of the present application are not limited in this respect.

In the technical scheme, the data acquisition, transmission, use and the like meet the requirements of relevant national laws and regulations.

Before describing a data screening method provided in the embodiments of the present application, for ease of understanding, the following terms used in the embodiments of the present application are first explained:

1. Fuzzy sets and membership functions

A fuzzy subset $A$ on the universe of discourse $U$ means that for any $x \in U$ there is a number $\mu_A(x) \in [0, 1]$ corresponding to it, where $\mu_A(x)$ is called the membership function of the fuzzy subset $A$, or the degree of membership of $x$ in $A$.

That is, the mapping

$$\mu_A : U \to [0, 1], \quad x \mapsto \mu_A(x)$$

determines a fuzzy subset $A$ of the universe of discourse $U$.

The membership function is a quantitative description of a fuzzy concept, and determining it correctly is the basis for solving practical problems with fuzzy set theory. Typical fuzzy distributions include trapezoidal, exponential, normal, linear, power-function, and sinusoidal distributions.

2. α-cut set

The set of all elements $x$ in the universe of discourse $U$ whose membership value $\mu_A(x)$ is not less than $\alpha$ is called the α-cut set of the fuzzy set $A$:

$$A_\alpha = \{\, x \in U \mid \mu_A(x) \ge \alpha \,\}$$

where $\alpha$ is the confidence level (threshold). The α-cut set converts a fuzzy set into an ordinary set.
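The α-cut definition can be illustrated with a short Python sketch. The membership function `around_160` and the sample heights are hypothetical values chosen only for illustration; they do not come from the application.

```python
import math
from typing import Callable, Iterable, List


def alpha_cut(universe: Iterable[float],
              membership: Callable[[float], float],
              alpha: float) -> List[float]:
    """Return the elements whose membership degree is at least alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [x for x in universe if membership(x) >= alpha]


def around_160(height_cm: float) -> float:
    # Hypothetical normal-type membership function centred at 160 cm.
    return math.exp(-((height_cm - 160.0) / 10.0) ** 2)


heights = [140.0, 150.0, 160.0, 170.0, 180.0]
print(alpha_cut(heights, around_160, 0.3))  # [150.0, 160.0, 170.0]
```

Raising the threshold shrinks the resulting ordinary set: with `alpha = 0.5` only the 160 cm element remains.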

3. Normal distribution

If a random variable $X$ obeys a normal distribution with expectation $\mu$ and variance $\sigma^2$, written $X \sim N(\mu, \sigma^2)$, its probability density function is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

With the terminology introduced, the background of the embodiments of the present application is explained below for ease of understanding:

The inventor has found that, in the related art at the present stage, research on data screening is based on fuzzy mathematics, with membership functions preset and fine-tuned subjectively according to experience. On the one hand, this approach is highly subjective and arbitrary and has poor accuracy; especially with a small sample size, it is difficult to obtain a probability distribution type that truly conforms to the variable. On the other hand, the sample statistics are not fully utilized. To address the selection of the membership function, the most fundamental problem in data screening, the related art has studied a method that automatically matches membership functions based on the normal distribution probability density function. That method makes full use of the sample information and thus obtains a more reasonable membership function. However, it assumes by default that the probabilities of the different fuzzy intervals are equal, which amounts to studying a very special case, so its treatment of the maximum value point of the membership function lacks generality. As a result, the data screening result is not general and is inaccurate.

In view of the above, the present application provides a data screening method, an apparatus, an electronic device, and a storage medium to solve the above problems. The inventive concept of the present application can be summarized as follows: obtaining sample information, wherein the sample information is an attribute value of an object associated with a user, and the attribute value obeys a normal distribution; determining an upper limit value and a lower limit value of each initial screening interval of the sample information according to the number of the initial screening intervals and the weight of the initial screening intervals, wherein both the number of the initial screening intervals and the weight are set according to the sample information; for any one initial screening interval, determining the designated information value of the initial screening interval according to its upper limit value, lower limit value, and weight, wherein the designated information value represents the sample information with the maximum attribute value in the initial screening interval; determining a target screening domain corresponding to the initial screening interval according to the designated information value; and screening the sample information according to the target screening domain to obtain target information.

For convenience of understanding, a data screening method provided in the embodiments of the present application is described in detail below with reference to the accompanying drawings:

Fig. 1 is a diagram of an application scenario of a data screening method in the embodiment of the present application. The figure includes: a server 10, a memory 20, and a terminal device 30; wherein:

the server 10 first obtains sample information from the memory 20, where the sample information is an attribute value of an object associated with a user, and the attribute value obeys a normal distribution; the server 10 determines an upper limit value and a lower limit value of each initial screening interval of the sample information according to the number of the initial screening intervals and the weight of the initial screening intervals, where the number of the initial screening intervals is set by the user on the terminal device 30 according to the sample information, and the weight is likewise set according to the sample information; for any one initial screening interval, the server 10 determines the designated information value of the initial screening interval according to its upper limit value, lower limit value, and weight, where the designated information value represents the sample information with the maximum attribute value in the initial screening interval; the server 10 determines a target screening domain corresponding to the initial screening interval according to the designated information value, and screens the sample information according to the target screening domain to obtain target information.

The description in this application details only a single server, memory, and terminal device, but those skilled in the art should understand that the server, memory, and terminal device shown are intended to represent the operations of the servers, memories, and terminal devices involved in the technical solutions of this application, and do not imply any limitation on their number, type, or location. It should be noted that the underlying concepts of the example embodiments of the present application are not altered if modules are added to or removed from the illustrated environment.

It should be noted that the storage in the embodiment of the present application may be, for example, a cache system, or a hard disk storage, a memory storage, and the like. In addition, the data screening method provided by the application is not only suitable for the application scene shown in fig. 1, but also suitable for any device with data screening requirements.

As shown in fig. 2, an overall flow diagram of a data screening method provided in the embodiment of the present application is shown, where:

in step 201: obtaining sample information, wherein the sample information is an attribute value of an object associated with a user, and the attribute value obeys normal distribution;

in step 202: determining an upper limit value and a lower limit value of each initial screening interval of the sample information according to the number of the initial screening intervals and the weight of the initial screening intervals; the number of the initial screening intervals is set according to the sample information; wherein, the weight is set according to the sample information;

in step 203: for any one initial screening interval, determining the designated information value of the initial screening interval according to its upper limit value, lower limit value, and weight; the designated information value represents the sample information with the maximum attribute value in the initial screening interval;

in step 204: determining a target screening domain corresponding to any one initial screening interval according to the specified information value;

in step 205: and screening the sample information according to the target screening domain to obtain target information.

For ease of understanding, the steps in FIG. 2 are described in detail below:

in the embodiment of the present application, the sample information is an attribute value of an object associated with a user, and the attribute value obeys a normal distribution. For example: the height of all students in a class, the income of all employees in a company, etc. The specific type of the sample information is not limited, and data conforming to normal distribution can be used as the sample information in the application.

In some embodiments, determining the upper limit value and the lower limit value of each initial screening interval of the sample information according to the number of the initial screening intervals and the weight of the initial screening intervals may be implemented as the steps shown in fig. 3:

in step 301: based on the number of the initial screening intervals and the weight of the initial screening intervals, carrying out standardization processing on the initial screening intervals to obtain an upper limit value of the standardization processing and a lower limit value of the standardization processing;

In the present application, the standardization processing formula is shown in equation 1:

wherein: $A_s$ is the s-th initial screening interval; the upper limit of integration is the upper limit value of the standardization processing, and the lower limit of integration is the lower limit value of the standardization processing; $\omega_s$ is the weight of the s-th initial screening interval; $f(x)$ is the probability density function of the s-th initial screening interval; $D$ is the standard deviation of the s-th initial screening interval, and $E$ is the mean of the s-th initial screening interval.

For example, the sample information is the heights of the students in a class, and according to requirements the user divides this sample information into 5 initial screening intervals: less than 140 cm, 140 cm to 150 cm, 150 cm to 160 cm, 160 cm to 170 cm, and more than 170 cm. The weights corresponding to the initial screening intervals are $\omega_1$, $\omega_2$, $\omega_3$, $\omega_4$, and $\omega_5$. If the second initial screening interval is standardized, then in equation 1 above $A_s$ is $A_2$, $\omega_s$ is $\omega_2$, $D$ is the standard deviation of the 2nd initial screening interval, and $E$ is the mean of the 2nd initial screening interval.

In step 302: and obtaining the upper limit value and the lower limit value of the initial screening interval according to the upper limit value of the standardization processing, the lower limit value of the standardization processing and the probability distribution function of the initial screening interval.

In the present application, the probability distribution function is shown in equation 2:

wherein: f (x) is the probability density function of the s-th initial screening interval; Φ (x) is the probability distribution function of the s-th initial screening interval.

By substituting the upper and lower limits of equation 1 into equation 2, equations 3 and 4 can be obtained:

equation 5 and equation 6 can be obtained from equation 3 and equation 4, respectively:

The upper limit value of the initial screening interval is $\max(A_s)$ in equation 5, and the lower limit value is $\min(A_s)$ in equation 6.

In particular, it follows from equations 5 and 6 that: when $s = 1$, i.e., the initial screening interval is the first initial screening interval of the sample information, $\min(A_s) = -\infty$; when the initial screening interval is the last initial screening interval of the sample information, $\max(A_s) = +\infty$; for all other values of $s$, $\min(A_s) = \max(A_{s-1})$.

In the present application, the upper limit value and the lower limit value of the initial screening interval are determined according to the number of the initial screening intervals and the weight of the initial screening intervals, so that the obtained initial screening interval better meets user requirements and the accuracy of subsequent data screening is improved. Determining these limit values through the standardization processing formula and the probability distribution function also makes the obtained initial screening interval more accurate.
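Steps 301 and 302 can be sketched in Python under a few assumptions that the text does not confirm: the weights are treated as probability shares and normalized to sum to 1, the attribute follows a normal distribution with mean E and standard deviation D, and each interior boundary is placed at the quantile of the cumulative weight. The function name `interval_bounds` is hypothetical.

```python
import math
from itertools import accumulate
from statistics import NormalDist
from typing import List, Sequence, Tuple


def interval_bounds(weights: Sequence[float],
                    mean: float,
                    std: float) -> List[Tuple[float, float]]:
    """Return (lower, upper) limits for each interval, in ascending order.

    Assumption: interval s covers a probability mass equal to its
    normalized weight, so its upper limit is the quantile of the
    cumulative weight up to and including s.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty sequence of positive numbers")
    total = sum(weights)
    dist = NormalDist(mean, std)
    # Cumulative shares; the final share (1.0) maps to +infinity.
    cumulative = [c / total for c in accumulate(weights)]
    cuts = [dist.inv_cdf(p) for p in cumulative[:-1]]
    edges = [-math.inf] + cuts + [math.inf]
    return list(zip(edges[:-1], edges[1:]))
```

For example, `interval_bounds([0.1, 0.2, 0.4, 0.2, 0.1], mean=155.0, std=10.0)` (hypothetical weights and statistics for the height example) returns five intervals: the first starts at negative infinity, the last ends at positive infinity, and each interior interval starts where the previous one ends.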

In some possible embodiments, determining the designated information value of any one initial filtering interval according to the upper limit value and the lower limit value of any one initial filtering interval and the weight of any one initial filtering interval may be embodied as the steps shown in fig. 4, where:

in step 401: determining a probability mean line of any initial screening interval according to the upper limit value and the lower limit value of any initial screening interval and the weight of any initial screening interval;

in the embodiment of the present application, equation 7 may be used to determine the probability mean line of the initial screening interval:

wherein: avg(A_s) is the probability mean line of the s-th initial screening interval, ω_s is the weight of the s-th initial screening interval, max(A_s) is the upper limit value of the s-th initial screening interval, and min(A_s) is the lower limit value of the s-th initial screening interval.

Continuing with the example in which the sample information is the heights of the students in a class, if the probability mean line of the second initial screening interval is determined, then in formula 7 above A_s is A_2, ω_s is ω_2, max(A_s) is max(A_2), and min(A_s) is min(A_2).

In step 402: determining the number of intersection points between the probability mean line and the probability density function of any one initial screening interval;

in step 403: determining the designated information value of any one initial screening interval according to the number of the intersection points.

According to the definition of the maximum value point in the related art, the probability mean line y = avg(A_s) of the fuzzy interval A_s is intersected with the normal distribution probability density function. If there is only one intersection point, its abscissa is the maximum value point of the membership function of A_s; if there are two intersection points, the interval formed by their abscissas is taken as a new fuzzy interval, and the process is repeated until only one intersection point remains, whose abscissa is the maximum value point of the membership function of A_s.

It follows that, when there are two intersection points, the sample mean E must fall within the interval bounded by their abscissas, and, since the samples follow a normal distribution, the two intersection points must be symmetric about the line x = E. If the intersection is performed again on the initial screening interval of the present application according to the definition of the maximum value point in the related art, the two new intersection points are again necessarily symmetric, so that, when the process is carried through to the end, the final intersection point is necessarily the mean E. Thus, the definition of the specified information value in the present application can be derived from the definition of the maximum value point in the related art.

The specified information value is defined as follows: the probability mean line y = avg(A_s) of the initial screening interval A_s is intersected with the normal distribution probability density function. If there is only one intersection point, its abscissa is the specified information value of the target screening domain of A_s; if there are two intersection points, the mean of the sample information is taken as the specified information value of the target screening domain of A_s.

Therefore, according to the definition, the specified information value of any initial screening interval is determined according to the number of the intersection points, which includes the following two cases:

1) If the number of the intersection points is one, the designated information value is determined according to the first designated information determination formula;

2) If the number of the intersection points is two, the designated information value is the mean of that initial screening interval;

wherein the first designated information determination formula is as shown in formula 8:

wherein: max^T(A_s) is the designated information value, A_s is the s-th initial screening interval, avg is the probability mean line of the s-th initial screening interval, f^{-1}(avg) is the inverse function of the probability mean line of the s-th initial screening interval, D is the standard deviation in the s-th initial screening interval, and E is the mean of the s-th initial screening interval.

For example, continuing with the sample information being the heights of the students in a class, if the designated information value of the second initial screening interval is determined, then in formula 8 above A_s is A_2, avg is the probability mean line of the 2nd initial screening interval, f^{-1}(avg) is the inverse function of the probability mean line of the 2nd initial screening interval, D is the standard deviation in the 2nd initial screening interval, and E is the mean of the 2nd initial screening interval.
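Steps 402 and 403 can be sketched as follows. The sketch assumes the normal probability density f(x) with mean E and standard deviation D, and solves f(x) = avg in closed form, x = E ± D·sqrt(-2·ln(avg·D·sqrt(2π))), keeping only the roots that fall inside the interval. Since formula 8 is not reproduced in this text, this closed form and the names `intersections` and `designated_value` are illustrative assumptions rather than the application's own formula.

```python
import math


def intersections(avg, lower, upper, mean, std):
    """Abscissas in [lower, upper] where the normal pdf equals the line y = avg."""
    peak = 1.0 / (std * math.sqrt(2.0 * math.pi))  # pdf value at x = mean
    if avg <= 0.0 or avg > peak:
        return []  # the line lies outside the range of the pdf
    offset = std * math.sqrt(-2.0 * math.log(avg / peak))
    roots = {mean - offset, mean + offset}  # a set merges the double root at the peak
    return sorted(x for x in roots if lower <= x <= upper)


def designated_value(avg, lower, upper, mean, std):
    """One intersection: its abscissa. Two intersections: the mean."""
    points = intersections(avg, lower, upper, mean, std)
    if len(points) == 1:
        return points[0]
    if len(points) == 2:
        return mean
    raise ValueError("the probability mean line does not cross the pdf in this interval")


# Hypothetical example: mean 155 cm, std 10 cm, interval 140 cm-150 cm,
# with the line drawn at the pdf height reached at x = 145 cm.
line = math.exp(-0.5) / (10.0 * math.sqrt(2.0 * math.pi))
value = designated_value(line, 140.0, 150.0, mean=155.0, std=10.0)
```

An interval that lies entirely on one side of the mean can contain at most one root, whereas an interval that contains the mean may contain both symmetric roots, in which case the mean is returned.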

In this application, after the designated information value is determined, when the target screening domain corresponding to any one of the initial screening intervals is determined according to the designated information value, the steps shown in fig. 5 may be implemented, where:

in step 501: determining the range of the designated information value based on the sample information with the minimum attribute value in the sample information, the sample information with the maximum attribute value in the sample information and the designated information value;

in step 502: determining a target screening domain formula based on the belonged range;

in step 503: and substituting the specified information value into a target screening domain formula to obtain a target screening domain corresponding to any one initial screening interval.

For example, the sample information is divided into t initial screening intervals A_1, A_2, …, A_s, …, A_{t-1}, A_t, whose corresponding designated information values are max^T(A_1), max^T(A_2), …, max^T(A_s), …, max^T(A_{t-1}), max^T(A_t). The sample information with the minimum attribute value is U_min, and the sample information with the maximum attribute value is U_max. In the present application, a sine-cosine function is used to determine the target screening domain corresponding to each initial screening interval.

In the present application, the formula of the target screening domain and the corresponding affiliated range of each target screening domain are shown in formula 9:

wherein A_i(x) is the target screening domain of the i-th initial screening interval, x is the designated information value of the i-th initial screening interval, U_min is the sample information with the smallest attribute value, and U_max is the sample information with the largest attribute value.

From equation 9, when i =1, equation 9 can be derived as equation 10:

from equation 9, when i =2, equation 9 can be derived as equation 11:

from equation 9, when i = t-1, equation 9 can be derived as equation 12:

from equation 9, when i = t, equation 9 can be derived as equation 13:

In summary, the target screening domain corresponding to each initial screening interval can be determined from the number and weights of the initial screening intervals set by the user. Because the number and weights are set by the user as required, screening the sample information according to the resulting target screening domains yields a screening result that better meets the user's requirements.
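To illustrate how a sine-cosine function can turn the designated information values into target screening domains (steps 501 to 503), the following sketch uses a raised-cosine membership function. Formulas 9 to 13 are not reproduced in this text, so this construction is a swapped-in, commonly used form and not necessarily the one in the application. It assumes the designated information values (`peaks`) are strictly increasing and lie between U_min and U_max; the function name `membership` and the sample numbers are hypothetical.

```python
import math


def membership(x, i, peaks, u_min, u_max):
    """Raised-cosine membership of x in the i-th (0-based) target screening domain.

    Assumption: peaks is strictly increasing, u_min <= peaks[0], peaks[-1] <= u_max.
    The membership is 1 at peaks[i] and falls to 0 at the neighbouring peaks.
    """
    if x < u_min or x > u_max:
        return 0.0
    p = peaks[i]
    if x <= p:
        if i == 0:
            return 1.0  # the first domain is flat from u_min up to its peak
        left = peaks[i - 1]
        if x < left:
            return 0.0
        return 0.5 + 0.5 * math.cos(math.pi * (p - x) / (p - left))
    if i == len(peaks) - 1:
        return 1.0  # the last domain is flat from its peak up to u_max
    right = peaks[i + 1]
    if x > right:
        return 0.0
    return 0.5 + 0.5 * math.cos(math.pi * (x - p) / (right - p))


# Hypothetical example: three domains peaking at 140, 150 and 160 cm.
peaks = [140.0, 150.0, 160.0]
degree = membership(147.0, 1, peaks, u_min=130.0, u_max=175.0)
```

With this particular construction, the memberships of two adjacent domains sum to 1 between their peaks, so a sample can be screened by selecting the domain in which its membership is largest.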

As shown in fig. 6, based on the same inventive concept, a data filtering apparatus 600 is proposed, the apparatus comprising:

an obtaining module 6001, configured to obtain sample information, where the sample information is an attribute value of an object associated with a user, and the attribute value obeys normal distribution;

an initial screening section determining module 6002, configured to determine an upper limit value and a lower limit value of each initial screening section of the sample information according to the number of initial screening sections and the weight of the initial screening sections; the number of the initial screening intervals is set according to the sample information; wherein the weight is set according to the sample information;

a designation information value determination module 6003, configured to, for any one initial screening interval, determine a designation information value of the any one initial screening interval according to the upper limit value, the lower limit value, and the weight of the any one initial screening interval; the specified information value represents the sample information with the maximum attribute value in any initial screening interval;

a target screening domain determining module 6004, configured to determine, according to the specified information value, a target screening domain corresponding to the any one initial screening interval;

and a target information determining module 6005, configured to screen the sample information according to the target screening domain to obtain target information.

In some possible embodiments, the initial screening interval determination module, when determining the upper limit value and the lower limit value of each initial screening interval of the sample information according to the number of initial screening intervals and the weight of the initial screening interval, is configured to:

substituting the number of the initial screening intervals and the weight of the initial screening intervals into the standardized processing formula;

and obtaining an upper limit value and a lower limit value of the initial screening interval according to the substituted standardized processing formula and the probability distribution function.

In some possible embodiments, the normalization process formula is:

the probability distribution function is:

wherein: A_s is the s-th initial screening interval, max(A_s) is the upper limit value of the s-th initial screening interval, and min(A_s) is the lower limit value of the s-th initial screening interval; ω_s is the weight of the s-th initial screening interval; f(x) is the probability density function of the s-th initial screening interval; D is the standard deviation in the s-th initial screening interval, and E is the mean of the s-th initial screening interval; ω_i is the weight of the i-th initial screening interval.

In some possible embodiments, the designated information value determining module, when determining the designated information value of the any one initial filtering interval according to the upper limit value, the lower limit value and the weight of the any one initial filtering interval, is configured to:

wherein the first specifying information determination formula is:

wherein: max^T(A_s) is the designated information value, A_s is the s-th initial screening interval, avg is the probability mean line of the s-th initial screening interval, f^{-1}(avg) is the inverse function of the probability mean line of the s-th initial screening interval, D is the standard deviation in the s-th initial screening interval, and E is the mean of the s-th initial screening interval.

determining a target screening domain formula based on the belonged range;

Having described the data filtering method and apparatus according to an exemplary embodiment of the present application, an electronic device according to another exemplary embodiment of the present application is described next.

As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."

In some possible embodiments, an electronic device according to the present application may include at least one processor, and at least one memory. Wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the data screening method according to various exemplary embodiments of the present application described above in the present specification.

The electronic device 130 according to this embodiment of the present application is described below with reference to fig. 7. The electronic device 130 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 7, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).

Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.

The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.

Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which or some combination thereof may comprise an implementation of a network environment.

The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be understood that although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

In some possible embodiments, aspects of a data filtering method provided by the present application may also be implemented in the form of a program product including program code for causing a computer device to perform the steps of a data filtering method according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The program product for data screening of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be executable on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user electronic device, partly on the user electronic device, as a stand-alone software package, partly on the user electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of a remote electronic device, the remote electronic device may be connected to the user electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external electronic device (e.g., through the internet using an internet service provider).

It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided so as to be embodied by a plurality of units.

Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method of data screening, the method comprising:

determining a target screening domain corresponding to any one initial screening interval according to the specified information value;

2. The method of claim 1, wherein determining the upper limit value and the lower limit value of each initial screening interval of the sample information according to the number of the initial screening intervals and the weight of the initial screening interval comprises:

3. The method of claim 2, wherein normalizing the initial screening interval based on the number of initial screening intervals and the weight of the initial screening interval to obtain an upper normalized value and a lower normalized value comprises:

wherein A_s is the s-th initial screening interval,

is the upper limit value of the normalization process,

4. The method of claim 2, wherein obtaining the upper limit value and the lower limit value of the initial filtering interval according to an upper integration limit, a lower integration limit of the integration processing and a probability distribution function of the initial filtering interval comprises:

the obtained upper limit value of the initial screening interval is as follows:

the obtained lower limit value of the initial screening interval is as follows:

5. The method according to claim 1, wherein the determining the designated information value of the any one initial filtering interval according to the upper limit value and the lower limit value of the any one initial filtering interval and the weight of the any one initial filtering interval comprises:

6. The method according to claim 5, wherein said determining the designated information value of the any one initial screening interval according to the number of the intersection points comprises:

if the number of the intersection points is two, determining that the designated information value is the average value of any one initial screening interval;

wherein the first specifying information determination formula is:

7. The method according to claim 1, wherein the determining a target screening field corresponding to any one of the initial screening intervals according to the specified information value includes:

determining the range of the designated information value based on the sample information with the minimum attribute value in the sample information, the sample information with the maximum attribute value in the sample information and the designated information value;

determining a target screening domain formula based on the belonged range;

8. An apparatus for data screening, the apparatus comprising:

9. An electronic device, comprising:

a memory for storing program instructions;

a processor for calling program instructions stored in said memory and for executing the steps comprised by the method of any one of claims 1 to 7 in accordance with the obtained program instructions.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method of any of claims 1-7.

11. A computer program product, the computer program product comprising: computer program code which, when run on a computer, causes the computer to perform the method according to any of the preceding claims 1-7.