CN110210559B - Object screening method and device and storage medium - Google Patents

Info

Publication number
CN110210559B
Authority
CN
China
Prior art keywords
processed
objects
acquiring
degree
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910471428.7A
Other languages
Chinese (zh)
Other versions
CN110210559A (en)
Inventor
刘毅超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201910471428.7A priority Critical patent/CN110210559B/en
Publication of CN110210559A publication Critical patent/CN110210559A/en
Application granted granted Critical
Publication of CN110210559B publication Critical patent/CN110210559B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides an object screening method and device and a storage medium. The method comprises: acquiring the characteristic values of each first object to be processed in each preset category; for any first object to be processed, acquiring a first average value of its characteristic values in each preset category; acquiring the unbalance degree of the first object to be processed according to the first average values, the unbalance degree representing the distribution difference degree of the first object to be processed; and acquiring, from the first objects to be processed, the second objects to be processed whose unbalance degree is greater than or equal to a preset unbalance degree. The method can effectively screen objects in a multi-classification problem, helps avoid overfitting, and improves fitting precision.

Description

Object screening method and device and storage medium
Technical Field
The present disclosure relates to computer technologies, and in particular, to a method and an apparatus for screening objects, and a storage medium.
Background
In the multi-classification problem, especially the multi-classification problem with a large number of features, overfitting may be caused by an excessively large amount of data, and some unnecessary features may also cause a reduction in the accuracy of the fitting result. Therefore, it becomes important to reasonably screen the multi-class features.
In the prior art, discrete features are generally screened using mutual information, and the obviously sparse features among them are deleted directly; continuous features are either handled the same way or not screened at all.
Deleting only the obviously sparse features therefore leaves some redundant features in place, especially among continuous features. These features are not sparse but have no discriminative power, so they easily cause overfitting and reduce fitting accuracy.
Disclosure of Invention
The present disclosure provides an object screening method and apparatus, and a storage medium, which are used to effectively screen objects (which may be specifically features) in a multi-classification problem, avoid overfitting, and improve fitting accuracy.
In a first aspect, the present disclosure provides an object screening method, including:
acquiring characteristic values of a first object to be processed in each preset category respectively;
aiming at any first object to be processed, acquiring a first average value of the characteristic values in any preset category;
acquiring the unbalance degree of the first object to be processed according to the first average value; the unbalance degree is used for representing the distribution difference degree of the first object to be processed;
and acquiring a second object to be processed of which the unbalance degree is greater than or equal to a preset unbalance degree from the first object to be processed.
In a possible design, the obtaining, according to the first average value, an imbalance degree of the first object to be processed includes:
aiming at any first object to be processed, acquiring the variance of the first mean value of the first object to be processed in each preset category;
and taking the value of the variance as the value of the imbalance degree.
In another possible design, the acquiring, from the first objects to be processed, a second object to be processed whose unbalance degree is greater than or equal to a predetermined unbalance degree includes:
sequencing the first objects to be processed according to the sequence of the unbalance degrees from big to small;
and acquiring a preset number of second objects to be processed from the first objects to be processed according to the sequence from front to back after the sequencing.
In another possible design, the method further includes:
in a plurality of subsets in an initial object set, acquiring a characteristic value of each object in each subset;
for any object, obtaining a second average value of the characteristic values of the object in each subset;
and determining the object with the second average value larger than a preset characteristic threshold as the first object to be processed.
In another possible design, the method further includes:
acquiring the maximum characteristic value of each object in the initial object set;
and acquiring the product of the maximum characteristic value and the initial object set coefficient as the characteristic threshold value corresponding to each object.
In a second aspect, the present disclosure provides an object screening apparatus comprising:
the first acquisition module is used for acquiring the characteristic values of the first object to be processed in each preset category respectively;
the second acquisition module is used for acquiring a first average value of the characteristic values in any preset category aiming at any first object to be processed;
a third obtaining module, configured to obtain an imbalance degree of the first object to be processed according to the first average value; the unbalance degree is used for representing the distribution difference degree of the first object to be processed;
and the screening module is used for acquiring a second object to be processed of which the unbalance degree is greater than or equal to a preset unbalance degree from the first object to be processed.
In one possible design, the third obtaining module is configured to:
aiming at any first object to be processed, acquiring the variance of the first mean value of the first object to be processed in each preset category;
and taking the value of the variance as the value of the imbalance degree.
In another possible design, the screening module is configured to:
sequencing the first objects to be processed according to the sequence of the unbalance degrees from big to small;
and acquiring a preset number of second objects to be processed from the first objects to be processed according to the sequence from front to back after the sequencing.
In another possible design, the apparatus further includes:
a fourth obtaining module, configured to obtain, in a plurality of subsets in the initial object set, a feature value of each object in each of the subsets;
a fifth obtaining module, configured to obtain, for any object, a second average value of the feature values of the object in each subset;
and the determining module is used for determining the object with the second average value larger than a preset characteristic threshold as the first object to be processed.
In another possible design, the apparatus further includes:
the sixth acquisition module is used for acquiring the maximum characteristic value of each object in the initial object set;
a seventh obtaining module, configured to obtain, for each object, a product of the maximum feature value and an initial object set coefficient as the feature threshold corresponding to the object.
In a third aspect, the present disclosure provides an object screening apparatus, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having stored thereon a computer program, wherein the computer program is executed by a processor to implement the method described in the first aspect.
According to the object screening method and device and the storage medium provided by the present disclosure, the first average value of each first object to be processed in each preset category is calculated from its characteristic values in that category, and the unbalance degree of the first object to be processed is obtained from these first average values. Because the unbalance degree represents the distribution difference degree of an object, objects with a low unbalance degree are redundant in the fitting process, whereas objects with a high unbalance degree are more meaningful for it. Screening the objects by the unbalance degree of each first object to be processed therefore reduces redundant objects and suppresses overfitting to a certain extent. Since the redundant objects are absent from the set of second objects to be processed obtained after screening, the object dimension is reduced, which helps reduce the complexity of the fitting model and improve the fitting precision.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic flowchart of an object screening method according to an embodiment of the disclosure;
Fig. 2 is a schematic flow chart of another object screening method provided in the embodiments of the present disclosure;
Fig. 3 is a schematic flow chart of another object screening method provided in the embodiments of the present disclosure;
Fig. 4 is a schematic flow chart of another object screening method provided in the embodiments of the present disclosure;
Fig. 5 is a schematic flow chart of another object screening method provided in the embodiments of the present disclosure;
Fig. 6 is a functional block diagram of an object screening apparatus according to an embodiment of the present disclosure;
Fig. 7 is a schematic physical structure diagram of an object screening apparatus according to an embodiment of the present disclosure.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
An application scenario of the embodiments of the present disclosure is a feature screening process, and more specifically the screening of objects during fitting in a multi-classification problem.
The object screening approaches used in existing multi-classification fitting suffer from low precision and are prone to overfitting. The object screening method provided by the present disclosure aims to solve these technical problems in the prior art with the following idea: according to the preset categories, the characteristic values of each object in each category are averaged, and the unbalance degree of each object is calculated from its average values over the categories, so as to guide object screening.
The following describes the technical solutions of the present disclosure and how to solve the above technical problems. The several embodiments may be combined, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
The embodiment provides an object screening method. Referring to fig. 1, fig. 1 is a schematic flow chart of an object screening method according to an embodiment of the present disclosure, as shown in fig. 1, the method includes the following steps:
S102, obtaining characteristic values of the first object to be processed in each preset category respectively.
The embodiment of the present disclosure is not particularly limited to the object type of the first object to be processed. In a multi-classification scene, each first object to be processed may be embodied as a feature, that is, feature screening is performed for multi-classification features.
S104, aiming at any first object to be processed, acquiring a first average value of the characteristic values in any preset category.
The categories may be defined as required. For example, the first objects to be processed may be divided into two categories by gender: male objects and female objects; alternatively, they may be divided into three categories by age: elderly objects, middle-aged objects, and teenage objects.
S106, acquiring the unbalance degree of the first object to be processed according to the first average value; the unbalance degree is used for representing the distribution difference degree of the first object to be processed.
S108, acquiring a second object to be processed of which the unbalance degree is greater than or equal to a preset unbalance degree from the first object to be processed.
Through the processing mode shown in fig. 1, the imbalance degree of each first object to be processed is obtained to realize object screening, and the imbalance degree is used for representing the distribution difference degree of the objects, so that the object with the lower imbalance degree is a redundant object in the fitting process, and the object with the higher imbalance degree is more meaningful in the fitting process.
This embodiment provides another object screening method. This embodiment is a further extension and refinement of the steps in the above embodiments.
In the present embodiment, the type of the first object to be processed is determined by actual data. For ease of understanding, several possible scenarios are given as follows:
First, each of the preset categories may contain 0, 1 or more first objects to be processed. For example, under classification by gender, if all the first objects to be processed are female objects, then there are multiple female objects and 0 male objects.
Secondly, any given category may contain 0, 1 or more instances of the same first object to be processed. For example, category A contains 1 first object to be processed a, category B contains none, and category C contains several, such as 10. In addition, when one category contains several instances of a first object to be processed, their feature values may be the same or different.
Conversely, from the perspective of the first object to be processed, one first object to be processed may appear in one or more categories. In the example above, object a appears in both category A and category C.
Further, the embodiments of the present disclosure do not particularly limit the type of the first object to be processed. Since the object screening is implemented based on feature values, in one implementation scenario the first object to be processed may be a data-type object. For objects that are not of a data type, feature values may first be acquired according to a preset rule, after which the object screening method provided by the embodiments of the disclosure is executed. The embodiments of the present disclosure do not limit how the feature values of non-data-type objects are obtained. For example, in the gender example above, the feature value of a female object may be preset to 1 and the feature value of a male object to 0.
Hereinafter, for convenience of understanding, it is assumed that all the first objects to be processed relate to N categories (category 1 to category N) in total, and for any one of the determined first objects to be processed, the number of the first objects to be processed in each category is not limited.
Thus, the implementation of S104 will be described by taking the first average value of the first object to be processed a as an example.
If category 1 contains x1 first objects to be processed a, the first average value of the first object to be processed a in category 1 is the sum of the characteristic values of those x1 objects divided by x1, where x1 may be 1 or an integer greater than 1. If category 1 does not contain the first object to be processed a, the first average value of the first object to be processed a in category 1 may be recorded as 0.
Similarly, if category 2 contains x2 first objects to be processed a, the first average value of the first object to be processed a in category 2 is the sum of the characteristic values of those x2 objects divided by x2, where x2 may be 1 or an integer greater than 1. If category 2 does not contain the first object to be processed a, the first average value of the first object to be processed a in category 2 may be recorded as 0.
By analogy, a first average value of the characteristic value of the first object a to be processed in any preset category can be obtained.
Processing the other first objects to be processed in the same way yields, for every first object to be processed, its first average value in every preset category.
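The per-category averaging of S104 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout (object identifier, category, feature value) and the function name `first_averages` are assumptions introduced here. As in the text, a category in which an object does not occur contributes a first average of 0.

```python
from collections import defaultdict


def first_averages(records, categories):
    """Map each object to its per-category mean feature value.

    records: iterable of (object_id, category, feature_value) tuples
             (hypothetical layout, chosen for illustration).
    categories: ordered list of all preset categories.
    A category in which the object never occurs contributes 0.
    """
    sums = defaultdict(float)   # (object_id, category) -> sum of values
    counts = defaultdict(int)   # (object_id, category) -> number of occurrences
    object_ids = set()
    for object_id, category, value in records:
        sums[(object_id, category)] += value
        counts[(object_id, category)] += 1
        object_ids.add(object_id)

    return {
        object_id: [
            sums[(object_id, c)] / counts[(object_id, c)]
            if counts[(object_id, c)] else 0.0
            for c in categories
        ]
        for object_id in object_ids
    }


# Object "a" occurs once in category A, never in B, and twice in C.
records = [
    ("a", "A", 2.0),
    ("a", "C", 1.0), ("a", "C", 3.0),
    ("b", "A", 4.0), ("b", "B", 4.0), ("b", "C", 4.0),
]
averages = first_averages(records, ["A", "B", "C"])
```

Here `averages["a"]` holds one first average per category, in the order in which the categories were passed.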
Thereafter, S106 may be executed to obtain the degree of imbalance of each first object to be processed. As mentioned above, the degree of imbalance measures the distribution difference degree of the first object to be processed. Since variance measures the dispersion of a random variable or a set of data, the embodiments of the present disclosure may use the variance to characterize the degree of imbalance of the first object to be processed.
Referring to fig. 2, fig. 2 is a schematic flowchart of another object screening method provided in the embodiment of the present disclosure, and as shown in fig. 2, step S106 shown in fig. 1 may further include:
S202, aiming at any first object to be processed, the variance of the first mean value of the first object to be processed in each preset category is obtained.
And S204, taking the value of the variance as the value of the imbalance degree.
For example, still taking the first object to be processed a as an example, obtaining the imbalance degree of the first object to be processed a may be characterized by the following formula:
\[ score = \frac{1}{N} \sum_{i=1}^{N} \left( AVG(i) - AVG_{CATAGORY} \right)^{2} \]
where score represents the degree of imbalance of the first object to be processed a; AVG(i) represents the first average value of the first object to be processed a in the i-th category, with i ranging over [1, N]; N is the total number of categories; and AVG_{CATAGORY} is the average of the first average values over the N categories, that is, AVG_{CATAGORY} = SUM(AVG(1), AVG(2), ..., AVG(N)) / N.
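A minimal sketch of the score computation in S202 and S204 follows. It assumes the variance is taken in population form (division by N); the function name `imbalance_score` is hypothetical.

```python
def imbalance_score(category_means):
    """Population variance of an object's per-category first averages.

    category_means: list of AVG(1) ... AVG(N) for one object.
    Dividing by N (rather than N - 1) is an assumption of this sketch.
    """
    n = len(category_means)
    overall = sum(category_means) / n  # average of the N first averages
    return sum((m - overall) ** 2 for m in category_means) / n


# An object with identical averages in every category has score 0,
# while an object whose averages differ across categories scores higher.
flat = imbalance_score([4.0, 4.0, 4.0])
skewed = imbalance_score([2.0, 0.0, 4.0])
```

With the second input the overall average is 2, the squared deviations are 0, 4 and 4, and the score is 8/3.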
As indicated above, a value indicative of the degree of imbalance is initially obtained via S106, and thus, in one possible implementation, the degree of imbalance may be expressed directly as a value, or score (score). According to the implementation mode, other processing is not needed to be carried out on the processing result, so that resources are saved to a certain extent, and the processing efficiency is improved.
Or, in another implementation process, the numerical value of the imbalance degree may be further processed to obtain other expressions of the imbalance degree.
In one possible implementation, the value of the degree of imbalance is divided into a plurality of levels, and the degree of imbalance of an object is then expressed as a level. A level may be expressed in characters, symbols, and the like. For example, two levels may be defined according to the degree of imbalance: a first level and a second level, where the degree of imbalance of objects in the first level is greater than that of objects in the second level.
In addition, for any one of the first objects to be processed, the imbalance degree of each first object to be processed can be obtained by processing in the manner described above.
In this embodiment, a higher degree of imbalance indicates that the first object to be processed carries more practical meaning, so first objects to be processed with a higher degree of imbalance can be retained. Conversely, a lower degree of imbalance of a first object to be processed in the sample combination set indicates that it is sparser and more likely to cause overfitting; if such sparse objects participate in fitting, the accuracy of the fitting result is reduced, so first objects to be processed with a lower degree of imbalance can be deleted.
Referring to fig. 3, fig. 3 is a schematic flowchart of another object screening method provided in the embodiment of the present disclosure, and as shown in fig. 3, step S108 shown in fig. 1 may include the following steps:
S302, sorting the first objects to be processed according to the sequence of the unbalance degrees from big to small.
S304, acquiring a preset number of second objects to be processed from the first object to be processed according to the sequence from front to back after the sequencing.
That is, the first objects to be processed with a lower degree of imbalance are deleted. The number to keep may be preset. For example, if it is preset that y first objects to be processed are kept, then the y first objects to be processed ranked first in the foregoing sorting are retained as second objects to be processed, and the remaining first objects to be processed are deleted.
Alternatively, instead of presetting the number of objects to keep, the screening may preset the proportion of the total number of first objects to be processed that is kept. For example, if it is preset that 50% of the first objects to be processed are kept, the first objects to be processed ranked in the top 50% of the sorting are retained as second objects to be processed, and the rest are deleted.
Alternatively, a threshold for the degree of imbalance may be preset, and the first objects to be processed whose degree of imbalance is greater than or equal to this threshold are taken as the second objects to be processed.
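The three screening variants described for S108 (a preset count, a preset proportion, and a preset threshold) can be sketched as follows. The function names and the dictionary of scores are hypothetical, and rounding the proportion down to a whole number of objects is an assumption of this sketch.

```python
def select_top_k(scores, k):
    """Keep the k objects with the highest imbalance score (S302 + S304)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def select_top_fraction(scores, fraction):
    """Keep a preset proportion of the objects, rounded down."""
    return select_top_k(scores, int(len(scores) * fraction))


def select_by_threshold(scores, threshold):
    """Keep every object whose score is >= the preset threshold."""
    return [obj for obj, score in scores.items() if score >= threshold]


# Hypothetical imbalance scores for four first objects to be processed.
scores = {"a": 2.5, "b": 0.0, "c": 0.7, "d": 1.2}

by_count = select_top_k(scores, 2)
by_fraction = select_top_fraction(scores, 0.5)
by_threshold = select_by_threshold(scores, 0.7)
```

The count and proportion variants return objects in descending order of score, whereas the threshold variant preserves the input order, since no sorting is needed for it.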
Or, if the imbalance degree of the first object to be processed is characterized in a level manner, when the step is executed, the first object to be processed may be directly screened by taking the level as a unit, a part of the first object to be processed with a lower level is deleted, and a part of the first object to be processed with a higher level is reserved as the second object to be processed.
For example, if two levels are defined according to the degree of imbalance, a first level and a second level, where the degree of imbalance of the first objects to be processed in the first level is greater than that of those in the second level, then when step S108 is executed, all first objects to be processed whose degree of imbalance is at the second level may be deleted and all those at the first level retained, so as to obtain the screened second objects to be processed.
Similar to the foregoing implementation, a level that needs to be reserved may also be preset, or a mapping relationship between the level and whether to reserve may also be preset. For example, if the mapping relationship between the first level and the reservation and the mapping relationship between the second level and the deletion are preset, the level of the imbalance degree of each first object to be processed may be obtained after the imbalance degree of each first object to be processed is determined in S106, and then the object screening may be performed directly according to the mapping relationship in S108.
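Level-based screening with a preset mapping between level and retention might look like the sketch below. The two level names, the `boundary` value separating them, and the mapping `KEEP_BY_LEVEL` are all assumptions chosen for illustration; the source does not say how the levels are delimited.

```python
# Preset mapping between level and whether objects at that level are kept.
KEEP_BY_LEVEL = {"first": True, "second": False}


def to_level(score, boundary):
    """Assign the first level to scores at or above a hypothetical boundary."""
    return "first" if score >= boundary else "second"


def screen_by_level(scores, boundary):
    """Keep the objects whose level maps to retention."""
    return [
        obj for obj, score in scores.items()
        if KEEP_BY_LEVEL[to_level(score, boundary)]
    ]


scores = {"a": 2.5, "b": 0.0, "c": 0.7}
kept = screen_by_level(scores, boundary=1.0)
```

Changing which levels are retained only requires editing the mapping, which is the benefit of presetting it separately from the level assignment.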
This embodiment provides another object screening method. This embodiment is a further extension of the above-described embodiment.
In addition, the embodiment of the present disclosure further provides an implementation manner of obtaining the aforementioned first object to be processed by filtering the initial object set. Referring to fig. 4, fig. 4 is a schematic flow chart of another object screening method provided in the embodiment of the present disclosure, and as shown in fig. 4, the method may further include the following steps:
S402, in a plurality of subsets in the initial object set, obtaining the characteristic value of each object in each subset.
The initial object set consists of a plurality of subsets, and each subset consists of a plurality of objects. An object may take different forms in different implementation scenarios.
For example, in a scenario of research and screening of users of an application, the objects may be users, each of which has different feature values in different categories (or dimensions), for example, the categories may include: gender category, age category, occupation category, height category, and the like.
For another example, in an arbitrary information processing scenario, the object is information, and each piece of information has a feature value of a different category, and the category may include, for example: information description categories (e.g., whether to indicate class information or describe class information), storage location categories, length categories of information, and the like.
S404, aiming at any object, obtaining a second average value of the characteristic values of the object in each subset.
S406, determining the object with the second average value larger than a preset characteristic threshold as the first object to be processed.
For ease of understanding, the implementation shown in fig. 4 is described by taking an initial object set with K subsets (subset 1 to subset K) as an example. Note that the number of instances of a given object in each subset is not limited.
The implementation of step S404 is similar to that of step S104 and can be described as follows (taking the object b as an example):
If subset 1 contains z1 objects b, the second average value of the object b in subset 1 is the sum of the feature values of those z1 objects divided by z1, where z1 may be 1 or an integer greater than 1. If subset 1 does not contain the object b, the second average value of the feature values of the object b in subset 1 may be recorded as 0.
Similarly, if subset 2 contains z2 objects b, the second average value of the object b in subset 2 is the sum of the feature values of those z2 objects divided by z2, where z2 may be 1 or an integer greater than 1. If subset 2 does not contain the object b, the second average value of the feature values of the object b in subset 2 may be recorded as 0.
And so on, a second average of the feature values of the object b in each subset is obtained.
In addition, for each of the other objects in the initial object set, the processing is performed in the manner described above, so that a second average value of the feature value of each object in each subset can be obtained.
After the second average value of each object is obtained, the second average value may be compared with the feature threshold, so as to retain the objects larger than the feature threshold, delete the objects smaller than or equal to the feature threshold, thus obtain a first object to be processed, and then execute the object screening process as shown in fig. 1 and any implementation manner thereof with respect to the first object to be processed.
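The pre-screening of S402 to S406 can be sketched as follows. The source does not state how an object's K per-subset second averages are combined before the comparison with the feature threshold; this sketch assumes the largest of them is used and exposes that choice as the hypothetical `combine` parameter. The data layout and function names are likewise assumptions.

```python
def second_averages(subsets, object_id):
    """Per-subset mean of one object's feature values; 0 where it is absent."""
    means = []
    for subset in subsets:
        values = [value for obj, value in subset if obj == object_id]
        means.append(sum(values) / len(values) if values else 0.0)
    return means


def prefilter(subsets, thresholds, combine=max):
    """Return the objects whose combined second average exceeds their threshold.

    subsets: list of subsets, each a list of (object_id, feature_value).
    thresholds: {object_id: feature threshold}.
    combine: how the K second averages are reduced to one number
             (assumption: the maximum, since the source leaves this open).
    """
    object_ids = {obj for subset in subsets for obj, _ in subset}
    kept = [
        obj for obj in object_ids
        if combine(second_averages(subsets, obj)) > thresholds[obj]
    ]
    return sorted(kept)


subsets = [
    [("a", 0.5), ("a", 1.5), ("b", 0.001)],  # subset 1
    [("a", 2.0)],                             # subset 2: "b" is absent
]
thresholds = {"a": 0.1, "b": 0.1}
first_objects = prefilter(subsets, thresholds)
```

Objects equal to or below the threshold are dropped, matching the strict "larger than" comparison in S406.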
In the embodiment of the present disclosure, the feature threshold may be preset.
In an implementation scenario of the present embodiment, the feature threshold may be preset as a specific numerical value. In this case, the preset feature threshold may be stored in a designated location and called directly when step S404 is executed.
When the feature threshold is preset as a specific numerical value, the same feature threshold may be set for all objects, or a separate feature threshold may be set for each object; in the latter case, some objects may still share the same feature threshold.
In an implementation scenario of the present embodiment, the feature threshold may be preset as an algorithm. In this case, the following step is further included before step S404 is executed: acquiring the feature threshold of each object according to the preset algorithm.
The embodiment of the present disclosure provides the following method for obtaining the feature threshold of each object according to a preset algorithm. Please refer to fig. 5, which is a schematic flow diagram of another object screening method provided in the embodiment of the present disclosure. As shown in fig. 5, the method further includes the following steps:
S502, acquiring the maximum feature value of each object in the initial object set.
S504, for each object, obtaining a product of the maximum feature value and the initial object set coefficient as the feature threshold corresponding to the object.
In an implementation scenario of this embodiment, the initial object set coefficient may be a fixed value, for example, may be set to 0.0001.
Alternatively, the initial object set coefficient may be associated with a total number of objects in the initial object set, wherein the larger the total number of objects in the initial object set, the smaller the initial object set coefficient.
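Steps S502 and S504 can be sketched as follows. The input layout (a dictionary from object to its feature values), the function names, and in particular the rule used in `size_dependent_coefficient` are assumptions for illustration; the text only states that the coefficient decreases as the total number of objects grows and does not give a formula.

```python
def feature_thresholds(feature_values, coefficient=0.0001):
    """Compute a per-object feature threshold.

    `feature_values` maps each object to the list of its feature values
    in the initial object set (hypothetical layout).
    """
    thresholds = {}
    for obj, values in feature_values.items():
        max_value = max(values)                     # S502: maximum feature value
        thresholds[obj] = max_value * coefficient   # S504: product with coefficient
    return thresholds


def size_dependent_coefficient(total_objects, base=0.0001):
    """Hypothetical rule: the larger the initial object set, the smaller the coefficient."""
    return base / max(total_objects, 1)


feature_values = {"a": [200.0, 1000.0], "b": [50.0, 30.0]}
fixed = feature_thresholds(feature_values)  # fixed coefficient 0.0001
scaled = feature_thresholds(
    feature_values, coefficient=size_dependent_coefficient(len(feature_values))
)
```

Because the threshold is proportional to each object's own maximum, objects with very different value ranges are screened on a comparable relative scale.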
In the embodiment of the present disclosure, the first objects to be processed are obtained by screening the initial object set, and the second objects to be processed are then obtained by further screening the first objects to be processed in the manner shown in fig. 1. When the second objects to be processed are used for subsequent fitting or other data processing, the number of objects (that is, the dimension) is thereby further reduced, and the accuracy of fitting performed after object screening is improved.
The technical scheme provided by the embodiment of the disclosure at least has the following technical effects:
according to the technical scheme, the first average value of each first object to be processed in each preset category is calculated from the feature values of that object in the category, and the imbalance degree of the first object to be processed is obtained from these first average values. The imbalance degree represents the degree of difference in the distribution of the object, so that objects with a low imbalance degree are redundant objects in the fitting process, and objects with a high imbalance degree are more meaningful for the fitting process.
Based on the object screening method provided in the first embodiment, the embodiments of the present disclosure further provide embodiments of apparatuses for implementing the steps and methods in the first embodiment of the method.
Please refer to fig. 6, which is a functional block diagram of an object screening apparatus according to an embodiment of the present disclosure. As shown in fig. 6, the object screening apparatus 600 includes:
a first obtaining module 61, configured to obtain a feature value of the first object to be processed in each preset category;
a second obtaining module 62, configured to obtain, for any one of the first objects to be processed, a first average value of the feature values in any preset category;
a third obtaining module 63, configured to obtain an imbalance degree of the first object to be processed according to the first average value; the unbalance degree is used for representing the distribution difference degree of the first object to be processed;
and the screening module 64 is configured to obtain, from the first objects to be processed, a second object to be processed whose imbalance degree is greater than or equal to a predetermined imbalance degree.
In an implementation scenario of this embodiment, the third obtaining module 63 is configured to:
for any first object to be processed, acquiring the variance of the first average values of the first object to be processed in the preset categories;
and taking the value of the variance as the value of the imbalance degree.
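The variance-based imbalance degree can be sketched as follows. The text does not say whether the population or the sample variance is meant; this sketch assumes the population variance (`statistics.pvariance`) and assumes every preset category holds at least one feature value for the object. The function name and input layout are illustrative.

```python
from statistics import mean, pvariance


def imbalance_degree(values_by_category):
    """Imbalance degree of one first object to be processed.

    `values_by_category` maps each preset category to the feature values
    of the object in that category (hypothetical layout).
    """
    # First average value of the object in each preset category.
    first_averages = [mean(values) for values in values_by_category.values()]
    # Assumption: population variance of the first average values.
    return pvariance(first_averages)


balanced = {"cat1": [1.0, 3.0], "cat2": [2.0, 2.0]}  # first averages 2.0 and 2.0
skewed = {"cat1": [0.0, 2.0], "cat2": [5.0, 5.0]}    # first averages 1.0 and 5.0

low = imbalance_degree(balanced)
high = imbalance_degree(skewed)
```

An object whose first average values are identical across categories gets an imbalance degree of 0, while an object whose averages differ across categories gets a larger value.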
In an implementation scenario of this embodiment, the screening module 64 is configured to:
sorting the first objects to be processed in descending order of imbalance degree;
and acquiring a preset number of second objects to be processed from the first objects to be processed, starting from the front of the sorted order.
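The two screening variants described for the screening module, keeping objects at or above a predetermined imbalance degree, or keeping a preset number of objects with the highest imbalance degree, can be sketched as follows. Function names and the dictionary input are assumptions for illustration.

```python
def screen_by_threshold(imbalance, min_imbalance):
    """Keep objects whose imbalance degree is >= the predetermined imbalance degree."""
    return [obj for obj, degree in imbalance.items() if degree >= min_imbalance]


def screen_top_k(imbalance, k):
    """Sort by imbalance degree in descending order and keep the first k objects."""
    ranked = sorted(imbalance, key=imbalance.get, reverse=True)
    return ranked[:k]


# Hypothetical imbalance degrees of four first objects to be processed.
imbalance = {"a": 0.0, "b": 4.0, "c": 1.5, "d": 2.5}

by_threshold = screen_by_threshold(imbalance, 1.5)
top_two = screen_top_k(imbalance, 2)
```

The threshold variant yields a variable number of second objects to be processed, whereas the top-k variant fixes the output size in advance.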
Furthermore, in an implementation scenario of the present embodiment, the object screening apparatus 600 may further include (not shown in fig. 6):
a fourth obtaining module, configured to obtain, in a plurality of subsets in the initial object set, a feature value of each object in each of the subsets;
a fifth obtaining module, configured to obtain, for any object, a second average value of the feature values of the object in each subset;
and a determining module, configured to determine an object whose second average value is greater than a preset feature threshold as the first object to be processed.
Furthermore, in an implementation scenario of the present embodiment, the object screening apparatus 600 may further include (not shown in fig. 6):
a sixth obtaining module, configured to obtain the maximum feature value of each object in the initial object set;
a seventh obtaining module, configured to obtain, for each object, a product of the maximum feature value and an initial object set coefficient as the feature threshold corresponding to the object.
In addition, an embodiment of the present disclosure provides an object screening apparatus, please refer to fig. 7, where the object screening apparatus 700 includes:
a memory 710;
a processor 720; and
a computer program;
wherein the computer program is stored in the memory 710 and configured to be executed by the processor 720 to implement the methods as described in the above embodiments.
In addition, as shown in fig. 7, a transmitter 730 and a receiver 740 are further disposed in the object screening apparatus 700 for data transmission or communication with other devices, which is not described herein again.
Furthermore, embodiments of the present disclosure provide a readable storage medium having a computer program stored thereon,
wherein the computer program is executed by a processor to implement the method according to the first embodiment.
Since each module in this embodiment can execute the method shown in the first embodiment, reference may be made to the related description of the first embodiment for a part of this embodiment that is not described in detail.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. An object screening method, comprising:
in a plurality of subsets in an initial object set, acquiring a characteristic value of each object in each subset;
for any object, obtaining a second average value of the characteristic values of the object in each subset;
acquiring the maximum characteristic value of each object in the initial object set;
for each object, acquiring the product of the maximum characteristic value and the initial object set coefficient as a characteristic threshold corresponding to the object;
determining the object with the second average value larger than a preset characteristic threshold as a first object to be processed;
acquiring characteristic values of a first object to be processed in each preset category respectively;
aiming at any first object to be processed, acquiring a first average value of the characteristic values in any preset category;
acquiring the unbalance degree of the first object to be processed according to the first average value; the unbalance degree is used for representing the distribution difference degree of the first object to be processed;
and acquiring a second object to be processed of which the unbalance degree is greater than or equal to a preset unbalance degree from the first object to be processed.
2. The method according to claim 1, wherein the obtaining the imbalance degree of the first object to be processed according to the first average value includes:
aiming at any first object to be processed, acquiring the variance of the first mean value of the first object to be processed in each preset category;
and taking the value of the variance as the value of the imbalance degree.
3. The method according to claim 1 or 2, wherein the acquiring, of the first objects to be processed, a second object to be processed whose unbalance degree is greater than or equal to a predetermined unbalance degree includes:
sequencing the first objects to be processed according to the sequence of the unbalance degrees from big to small;
and acquiring a preset number of second objects to be processed from the first objects to be processed according to the sequence from front to back after the sequencing.
4. An object screening apparatus, comprising:
a fourth obtaining module, configured to obtain, in a plurality of subsets in the initial object set, a feature value of each object in each of the subsets;
a fifth obtaining module, configured to obtain, for any object, a second average value of the feature values of the object in each subset;
the sixth acquisition module is used for acquiring the maximum characteristic value of each object in the initial object set;
a seventh obtaining module, configured to obtain, for each object, a product of the maximum feature value and an initial object set coefficient as a feature threshold corresponding to the object;
the determining module is used for determining the object with the second average value larger than a preset characteristic threshold as a first object to be processed;
the first acquisition module is used for acquiring the characteristic values of the first object to be processed in each preset category respectively;
the second acquisition module is used for acquiring a first average value of the characteristic values in any preset category aiming at any first object to be processed;
a third obtaining module, configured to obtain an imbalance degree of the first object to be processed according to the first average value; the unbalance degree is used for representing the distribution difference degree of the first object to be processed;
and the screening module is used for acquiring a second object to be processed of which the unbalance degree is greater than or equal to a preset unbalance degree from the first object to be processed.
5. The apparatus of claim 4, wherein the third obtaining module is configured to:
aiming at any first object to be processed, acquiring the variance of the first mean value of the first object to be processed in each preset category;
and taking the value of the variance as the value of the imbalance degree.
6. The apparatus of claim 4 or 5, wherein the screening module is configured to:
sequencing the first objects to be processed according to the sequence of the unbalance degrees from big to small;
and acquiring a preset number of second objects to be processed from the first objects to be processed according to the sequence from front to back after the sequencing.
7. An object screening apparatus, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any of claims 1 to 3.
8. A computer-readable storage medium, having stored thereon a computer program,
the computer program is executed by a processor to implement the method of any one of claims 1 to 3.
CN201910471428.7A 2019-05-31 2019-05-31 Object screening method and device and storage medium Active CN110210559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910471428.7A CN110210559B (en) 2019-05-31 2019-05-31 Object screening method and device and storage medium

Publications (2)

Publication Number Publication Date
CN110210559A CN110210559A (en) 2019-09-06
CN110210559B true CN110210559B (en) 2021-10-08

Family

ID=67790194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910471428.7A Active CN110210559B (en) 2019-05-31 2019-05-31 Object screening method and device and storage medium

Country Status (1)

Country Link
CN (1) CN110210559B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant