CN108595497B - Data screening method, apparatus and terminal - Google Patents
- Publication number: CN108595497B
- Application number: CN201810220055.1A
- Authority: CN (China)
- Prior art keywords: data, sample data, target labels, probability, label
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/24 Classification techniques)
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/21 Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation)
Abstract
An embodiment of the invention provides a data screening method, apparatus and terminal. The data screening method includes: extracting multiple noisy data items from data to be screened as sample data; transforming each sample data item to obtain its transformed data; using a pre-trained image classification model to perform label prediction on each sample data item and each transformed data item, and determining the target label and target label probability of each sample data item; and screening the sample data according to the target label and target label probability of each sample data item to obtain a target database. With the data screening scheme provided by the embodiments of the invention, the user does not need to manually label and screen the data to be screened item by item; screening is carried out automatically by a computer program, which is convenient and fast, saves human resources, and improves data screening efficiency.
Description
Technical field
The present invention relates to the field of noisy data screening technology, and in particular to a data screening method, apparatus and terminal.
Background art
Recently, deep learning has made breakthrough progress in content-understanding fields such as natural language processing and text translation. However, these advances depend heavily on the scale of the training data, so data has become the most important bottleneck in applying these technologies to real production environments.
Taking current data classification tasks as an example, each label generally requires on the order of thousands of data items.
The traditional method trains the model on fully supervised data: a sufficient amount of labeled data must first be obtained, and the model is then trained on this labeled data. However, obtaining large-scale labeled data from internet data through manual labeling has the following shortcomings:
First, although thousands of data items may seem few, the amount of data that has to be labeled to obtain them is enormous. Typically, only about one in every 10 to 20 labeled items yields a usable training item, which means that the labeling cost for each label rises sharply.
Second, a general-purpose label system is very large, so manually labeling data for every label consumes a great deal of human resources. Moreover, data is generated continuously every day in the internet environment, so manually labeling all of it is practically impossible and the labeling difficulty is high.
Summary of the invention
Embodiments of the present invention provide a data screening method, apparatus and terminal to solve the problem in the prior art that screening the data generated daily in an internet environment, which requires labeling it first, is difficult and incurs high labor costs.
According to one aspect of the present invention, a data screening method is provided, the method including: extracting multiple noisy data items from data to be screened as sample data; transforming each sample data item to obtain its transformed data; using a pre-trained image classification model to perform label prediction on each sample data item and each transformed data item, and determining the target label and target label probability of each sample data item; and screening the sample data according to the target label and target label probability of each sample data item to obtain a target database.
Optionally, the step of screening the sample data according to the target label and target label probability of each sample data item to obtain a target database includes: grouping the sample data items by target label, where each group corresponds to one target label; sorting the sample data items in the same group by target label probability, where items ranked earlier have larger target label probabilities; and selecting the top preset number of sample data items in each group to generate the target database.
Optionally, the step of using the pre-trained image classification model to perform label prediction on each sample data item and each transformed data item and determining the target label and target label probability of each sample data item includes: using the pre-trained image classification model to perform label prediction on each sample data item and each transformed data item to obtain the label recognition result of each, where a label recognition result includes each label corresponding to the data and the probability of each label; and, for each sample data item, determining its target label and target label probability from its own label recognition result and the label recognition results of its transformed data.
Optionally, the step of determining the target label and target label probability of the sample data from the label recognition result of the sample data and the label recognition results of its transformed data includes: for each label, computing the weighted average of the probabilities of that label for the sample data and its transformed data to obtain the weighted average probability of the label; determining the maximum among the weighted average probabilities of all labels; determining the label with the maximum weighted average probability as the target label of the sample data; and determining that maximum weighted average probability as the target label probability of the sample data.
Optionally, the step of transforming each sample data item to obtain its transformed data includes: transforming each sample data item according to a preset transformation mode to obtain its transformed data, where the preset transformation mode includes at least one of rotation, translation and shearing.
According to another aspect of the present invention, a data screening apparatus is provided, the apparatus including: an extraction module configured to extract multiple noisy data items from data to be screened as sample data; a conversion module configured to transform each sample data item to obtain its transformed data; a determining module configured to use a pre-trained image classification model to perform label prediction on each sample data item and each transformed data item and determine the target label and target label probability of each sample data item; and a screening module configured to screen the sample data according to the target label and target label probability of each sample data item to obtain a target database.
Optionally, the screening module includes: a grouping submodule configured to group the sample data items by target label, where each group corresponds to one target label; a sorting submodule configured to sort the sample data items in the same group by target label probability, where items ranked earlier have larger target label probabilities; and a generating submodule configured to select the top preset number of sample data items in each group to generate the target database.
Optionally, the determining module includes: a recognition submodule configured to use the pre-trained image classification model to perform label prediction on each sample data item and each transformed data item to obtain the label recognition result of each, where a label recognition result includes each label corresponding to the data and the probability of each label; and a label-determining submodule configured to determine, for each sample data item, its target label and target label probability from its own label recognition result and the label recognition results of its transformed data.
Optionally, the label-determining submodule is specifically configured to: for each label, compute the weighted average of the probabilities of that label for the sample data and its transformed data to obtain the weighted average probability of the label; determine the maximum among the weighted average probabilities of all labels; determine the label with the maximum weighted average probability as the target label of the sample data; and determine that maximum weighted average probability as the target label probability of the sample data.
Optionally, the conversion module is specifically configured to transform each sample data item according to a preset transformation mode to obtain its transformed data, where the preset transformation mode includes at least one of rotation, translation and shearing.
According to a further aspect of the present invention, a terminal is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of any data screening method described in the present invention.
According to yet another aspect of the present invention, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the steps of any data screening method described in the present invention.
Compared with the prior art, the present invention has the following advantages:
The data screening scheme provided by the embodiments of the present invention performs data screening periodically. At each screening, sample data is extracted from the data to be screened, that is, the data generated by users between two consecutive screenings; each sample data item is transformed to augment the data; the target label and target label probability of each sample data item are determined from the augmented data and the sample data; and the sample data is screened according to these target labels and probabilities to obtain a target database. With this scheme, the user does not need to manually label and screen the data to be screened item by item: screening is carried out automatically by a computer program, which is convenient and fast, saves human resources, and improves data screening efficiency.
The above description is only an overview of the technical solution of the present invention. Specific embodiments of the present invention are given below so that the technical means of the present invention can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features and advantages of the present invention are more readily apparent.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for illustrating the preferred embodiments and are not to be construed as limiting the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 is a flowchart of the steps of a data screening method according to Embodiment one of the present invention;
Fig. 2 is a flowchart of the steps of a data screening method according to Embodiment two of the present invention;
Fig. 3 is a structural block diagram of a data screening apparatus according to Embodiment three of the present invention;
Fig. 4 is a structural block diagram of a terminal according to Embodiment four of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present invention can be understood more thoroughly and so that the scope of the present disclosure can be fully conveyed to those skilled in the art.
Embodiment one
Referring to Fig. 1, a flowchart of the steps of a data screening method according to Embodiment one of the present invention is shown.
The data screening method of this embodiment may include the following steps:
Step 101: Extract multiple noisy data items from the data to be screened as sample data.
The data screening approach provided by the embodiments of the present invention is suitable for screening the large-scale noisy data generated by users' historical operations; the noisy data may be images. For example, different users upload images to a platform, and the server periodically screens the images produced by users at a preset time interval; the images produced by user operations within the preset time interval are the data to be screened. The preset time interval may be one day, two days, 12 hours and so on, and is not specifically limited in the embodiments of the present invention. The embodiments describe a single data screening pass; in a specific implementation, the process described here may be executed at every screening.
In the embodiments of the present invention, an image classification model has been trained in advance; it contains multiple labels and the training data corresponding to each label. When a data screening operation is executed, the image classification model pre-trained by the management server is used to predict labels for the data.
Those skilled in the art may adjust the number of noisy data items extracted from the data to be screened according to actual needs; for example, noisy data on the order of ten million or one hundred million items may be extracted as sample data. The noisy data may be extracted from the data to be screened at random.
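The extraction in Step 101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each uploaded item carries an upload timestamp, and the `Upload` record, its fields and the `extract_sample_data` function are hypothetical names. It keeps the items uploaded within the preset time interval ending at the current screening and draws a uniform random sample of them without replacement.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Upload:
    """Hypothetical record for one user-uploaded item, e.g. an image."""
    item_id: str
    uploaded_at: datetime


def extract_sample_data(uploads: List[Upload],
                        now: datetime,
                        interval: timedelta,
                        sample_size: int,
                        seed: Optional[int] = None) -> List[Upload]:
    """Randomly sample the items uploaded in the window (now - interval, now].

    The items in that window are the data to be screened. If the window holds
    fewer than sample_size items, all of them are returned (in random order).
    """
    window_start = now - interval
    to_screen = [u for u in uploads if window_start < u.uploaded_at <= now]
    rng = random.Random(seed)
    k = min(sample_size, len(to_screen))
    # Random.sample draws k distinct items uniformly, without replacement.
    return rng.sample(to_screen, k)
```

Seeding the generator is optional; it only makes a screening pass reproducible, for example when debugging.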
Step 102: Transform each sample data item to obtain its transformed data.
The transformation applied to the sample data may include, but is not limited to, rotation, translation, shearing, or any other such operation.
Step 103: Use the pre-trained image classification model to perform label prediction on each sample data item and each transformed data item, and determine the target label and target label probability of each sample data item.
Each sample data item and its transformed data are separately input into the pre-trained image classification model for label prediction, yielding a label recognition result for each input item. For the specific way the trained image classification model predicts data labels, refer to the related art; it is not specifically limited in the embodiments of the present invention.
The label recognition result of each data item includes at least one label and the probability of each label; the higher a label's probability, the more likely the data belongs to the data category indicated by that label.
When determining the target label and target label probability of a sample data item, a single final label, the target label, can be determined by voting over the label recognition results of the sample data item and of its transformed data.
Step 104: Screen the sample data according to the target label and target label probability of each sample data item to obtain a target database.
When screening the sample data, the sample data items can be grouped by their target labels; a preset number of data items is then selected from each group, and the selected sample data items constitute the target database.
After this screening pass, only the sample data in the target database is retained; sample data that was screened out, and data in the data to be screened that was not extracted as sample data, are discarded. The sample data in the target database can then be used for further training of the image classification model.
The preset number can be configured by those skilled in the art according to actual needs and is not specifically limited in the embodiments of the present invention. The smaller the preset number, the more sample data items are screened out and the fewer are retained; correspondingly, the precision of the sample data in the target database is higher.
The data screening method provided by this embodiment performs data screening periodically. At each screening, sample data is extracted from the data to be screened, that is, the data generated by users between two consecutive screenings; each sample data item is transformed to augment the data; the target label and target label probability of each sample data item are determined from the augmented data and the sample data; and the sample data is screened according to these target labels and probabilities to obtain a target database. With this method, the user does not need to manually label and screen the data to be screened item by item: screening is carried out automatically by a computer program, which is convenient and fast, saves human resources, and improves data screening efficiency.
Embodiment two
Referring to Fig. 2, a flowchart of the steps of a data screening method according to Embodiment two of the present invention is shown.
The data screening method of this embodiment may specifically include the following steps:
Step 201: Extract multiple noisy data items from the data to be screened as sample data.
During their historical operations, users can upload noisy data such as images to the platform in real time, and the back-end server that manages the platform can periodically screen the noisy data generated by these operations. The screening period can be configured by those skilled in the art according to actual needs. The noisy data generated by user operations between two adjacent screenings is the data to be screened.
In a specific implementation, multiple noisy data items can be extracted at random from the data to be screened as sample data; the number of extracted sample data items may be on the order of ten million or one hundred million. For example, users may generate a very large number of noisy data items on the platform every day, but because database capacity is limited, several hundred million or several tens of millions of them are extracted as sample data and the remaining unextracted noisy data is discarded.
The extracted sample data may constitute a database, denoted $DB_{noise}$.
Step 202: Transform each sample data item to obtain its transformed data.
After transformation, each sample data item corresponds to one or more transformed data items.
A sample data item is denoted $sample_i^{ori}$, and its transformed data is denoted $sample_i^{trans}$.
Preferably, for each sample data item, the total number of items formed by the sample data item and its corresponding transformed data is odd.
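The transformations of Step 202 can be sketched with NumPy as follows, assuming single-channel images stored as 2-D arrays. The helper names, the nearest-neighbour sampling, the zero fill for pixels mapped from outside the image, and the default angles, shift and shear factor are all assumptions for illustration. The sketch also enforces the preference stated above that the sample plus its transformed copies form an odd number of items (presumably so that a vote over them cannot tie, though the text does not give a reason).

```python
import numpy as np


def affine_transform(image: np.ndarray, matrix: np.ndarray,
                     offset: np.ndarray) -> np.ndarray:
    """Warp a 2-D image by a linear map about its centre plus a translation.

    Output pixel p takes the value of source pixel A^-1 (p - c - offset) + c,
    rounded to the nearest pixel; A is `matrix` acting on (row, col)
    coordinates and c is the image centre. Unmapped pixels are set to 0.
    """
    h, w = image.shape
    centre = np.array([(h - 1) / 2.0, (w - 1) / 2.0])[:, None]
    rows, cols = np.indices((h, w))
    coords = np.stack([rows.ravel(), cols.ravel()]).astype(float)  # (2, H*W)
    # Inverse mapping: for every output pixel, find where it comes from.
    src = np.linalg.inv(matrix) @ (coords - centre - offset[:, None]) + centre
    src = np.rint(src).astype(int)
    valid = (src[0] >= 0) & (src[0] < h) & (src[1] >= 0) & (src[1] < w)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[src[0, valid], src[1, valid]]
    return out.reshape(h, w)


def make_transformed_data(image, angles=(-15.0, 15.0), shifts=((2, 2),),
                          shears=(0.2,)):
    """Return rotated, translated and sheared copies of `image`."""
    copies = []
    for deg in angles:
        t = np.deg2rad(deg)
        rotation = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        copies.append(affine_transform(image, rotation, np.zeros(2)))
    for d_row, d_col in shifts:
        copies.append(affine_transform(image, np.eye(2),
                                       np.array([d_row, d_col], dtype=float)))
    for k in shears:
        # col' = col + k * row, in coordinates centred on the image centre.
        shear = np.array([[1.0, 0.0], [k, 1.0]])
        copies.append(affine_transform(image, shear, np.zeros(2)))
    # Keep (original + copies) odd by dropping the last copy if needed.
    if (1 + len(copies)) % 2 == 0:
        copies.pop()
    return copies
```

A production system would more likely call an image library's affine warp with interpolation; the explicit inverse mapping here only keeps the sketch free of dependencies other than NumPy.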
Step 203: Use the pre-trained image classification model to perform label prediction on each sample data item and each transformed data item, obtaining the label recognition result of each.
Before the data screening process is executed, the image classification model needs to be trained in advance. The trained image classification model contains multiple labels and the training data corresponding to each label; the training data is clean data. For the specific way of training the image classification model on the training data, refer to the related art; it is not specifically limited in the embodiments of the present invention. Training the image classification model essentially consists of continuously updating the model parameters until the model converges to a preset standard.
For example, for the parameters $\theta$ of the image classification model, stochastic gradient descent can be used to compute the gradient $\nabla_\theta L(\theta)$ of the loss function $L(\theta)$, and this gradient is used to update the parameters repeatedly: $\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$, where $\eta$ is the learning rate, which controls the step size of each update of $\theta$.
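The update rule above can be illustrated on a toy problem. The following sketch is not the patent's training procedure: the loss is an assumed least-squares loss on a linear model rather than a classification loss, and `sgd` and `mse_grad` are hypothetical helpers. It applies $\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$ with the gradient estimated on shuffled minibatches.

```python
import numpy as np


def sgd(grad_fn, theta0, x, y, lr=0.05, epochs=100, batch_size=8, seed=0):
    """Minibatch SGD: repeatedly apply theta <- theta - lr * grad L(theta)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Gradient estimated on one minibatch, then one update step.
            theta -= lr * grad_fn(theta, x[idx], y[idx])
    return theta


def mse_grad(theta, x, y):
    """Gradient of L(theta) = mean((x @ theta - y) ** 2) w.r.t. theta."""
    residual = x @ theta - y
    return 2.0 * x.T @ residual / len(y)
```

With noise-free targets every minibatch gradient vanishes at the true parameters, so for a small enough learning rate the iterates converge to them; here the learning rate plays the role of $\eta$.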
The label recognition result includes each label corresponding to the data and the probability of each label. Sample data and transformed data are both referred to here as data; when data is input into the image classification model for label prediction, the model outputs the label recognition result corresponding to the input data.
The image classification model can perform label prediction on the input data as follows:
First, determine the feature map of the input data;
Second, perform dimension reduction on the feature map to obtain an intermediate feature map;
Third, apply average pooling to the intermediate feature map to obtain its corresponding feature vector. The feature vector contains multiple points, each corresponding to one label and one probability; labels with non-zero probability are output as the valid labels of the data, together with the probability of each valid label.
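The three prediction steps above can be sketched in NumPy as follows. Several details are assumptions not stated in the text: the dimension reduction is modelled as a 1x1 convolution whose output channel count equals the number of labels, average pooling is global average pooling, and a softmax converts the pooled vector into probabilities. Because a softmax never outputs an exact zero, the sketch exposes a hypothetical `min_prob` threshold in place of the "non-zero probability" test; `predict_labels` is likewise a hypothetical name.

```python
import numpy as np


def predict_labels(feature_map, reduce_weights, labels, min_prob=0.0):
    """Reduce -> average-pool -> per-label probabilities for one input item.

    feature_map:    (C, H, W) feature map of the input data.
    reduce_weights: (K, C) weights of an assumed 1x1 convolution that reduces
                    C channels to K = len(labels) channels.
    Returns {label: probability} for labels whose probability exceeds min_prob.
    """
    c, h, w = feature_map.shape
    # Dimension reduction: a 1x1 convolution is a matrix product over channels.
    intermediate = reduce_weights @ feature_map.reshape(c, h * w)  # (K, H*W)
    # Global average pooling: one value per channel of the intermediate map.
    pooled = intermediate.mean(axis=1)  # (K,), one point per label
    # Softmax (assumed) turns the pooled scores into probabilities.
    exp = np.exp(pooled - pooled.max())
    probs = exp / exp.sum()
    return {lab: float(p) for lab, p in zip(labels, probs) if p > min_prob}
```

Subtracting the maximum before exponentiating does not change the softmax result; it only avoids overflow for large scores.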
Step 204: For each sample data item, determine its target label and target label probability from its label recognition result and the label recognition results of its transformed data.
After label recognition by the image classification model, each sample data item corresponds to at least one label. This step determines, by voting, a unique target label and target label probability for each sample data item. A preferred way of determining them by voting is as follows:
First, for each label of each sample data item, compute the weighted average of the probabilities of that label for the sample data item and its transformed data to obtain the weighted average probability of the label;
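The weighted average probability of a single label for a single sample data item can be calculated by the following formula:
$$\bar{p}_{i,j} = \frac{1}{\#sample_i}\Big(p_j\big(sample_i^{ori}\big) + \sum_{k} p_j\big(sample_{i,k}^{trans}\big)\Big), \quad j \in S$$
where $i$ identifies the sample data item, $j$ identifies the label, $p_j(\cdot)$ is the probability of label $j$ in the label recognition result of an item, $k$ indexes the transformed data items of sample $i$, and $\bar{p}_{i,j}$ is the weighted average probability of label $j$ for sample data item $i$. The formula averages the probabilities of label $j$ over the sample data item and each of its transformed data items to give the weighted average probability of that label. $\#sample_i$ is the total number of items in $sample_i^{ori}$ and $sample_i^{trans}$, and $S$ is the set of labels contained in the image classification model.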
Second, determine the maximum among the weighted average probabilities of all labels;
Finally, determine the label with the maximum weighted average probability as the target label of the sample data item, and determine that maximum weighted average probability as its target label probability.
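The soft-voting rule of Step 204 can be sketched as below, assuming each label recognition result is a dictionary from label to probability and that a label absent from an item's result has probability 0 for that item. Every item gets the same weight, one over the total number of items $\#sample_i$; this equal weighting is an assumption, and unequal per-transform weights would be a straightforward variation not described in the text. `soft_vote` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def soft_vote(predictions: List[Dict[str, float]]) -> Tuple[str, float]:
    """Combine label predictions for one sample and its transformed copies.

    predictions: one {label: probability} dict per item (original + copies).
    A label missing from an item's dict counts as probability 0 for that item.
    Returns (target label, target label probability).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    totals: Dict[str, float] = defaultdict(float)
    for pred in predictions:
        for label, prob in pred.items():
            totals[label] += prob
    n = len(predictions)  # #sample_i: original plus transformed items
    averaged = {label: total / n for label, total in totals.items()}
    # The label with the largest averaged probability wins the vote.
    target = max(averaged, key=averaged.get)
    return target, averaged[target]
```

For example, with probabilities {cat: 0.6, dog: 0.4}, {cat: 0.3, dog: 0.7} and {cat: 0.9} for a sample and two transformed copies, the averaged probabilities are 0.6 for cat and about 0.367 for dog, so the target label is cat with target label probability 0.6.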
Repeating this procedure determines the target label and target label probability of every sample data item. The sample data is then screened according to these target labels and probabilities to obtain the target database, as described in steps 205 to 207 below.
Step 205: Group the sample data items by target label.
Each group corresponds to one target label and contains at least one sample data item. The transformed data of each sample data item is discarded directly and is not added to any group.
Step 206: Sort the sample data items in the same group by target label probability.
Items ranked earlier have larger target label probabilities.
Step 207: Select the top preset number of sample data items in each group to obtain the target database.
The preset number can be configured by those skilled in the art according to actual needs and is not specifically limited in the embodiments of the present invention.
In this step, the sample data items in the same group are sorted by target label probability, and the top-k sample data items in each group are selected to constitute the target database. Only the sample data in the target database is retained; sample data that was screened out, and noisy data in the data to be screened that was not extracted as sample data, are discarded. The sample data in the target database can then be used for further training of the image classification model.
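Steps 205 to 207 can be sketched as follows, assuming the target label and target label probability have already been computed for each original sample. `build_target_database` and the (sample id, label, probability) triple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_target_database(scored: Iterable[Tuple[str, str, float]],
                          top_k: int) -> Dict[str, List[Tuple[str, float]]]:
    """Group samples by target label and keep the top_k most probable per group.

    scored: (sample_id, target_label, target_label_probability) triples for the
    original samples only; transformed copies are not added to any group.
    Returns {target_label: [(sample_id, probability), ...]} sorted by
    descending probability, with at most top_k entries per label.
    """
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for sample_id, label, prob in scored:  # step 205: group by target label
        groups[label].append((sample_id, prob))
    database = {}
    for label, members in groups.items():
        members.sort(key=lambda m: m[1], reverse=True)  # step 206: sort descending
        database[label] = members[:top_k]  # step 207: keep the top-k
    return database
```

Sorting each whole group follows Steps 206 and 207 literally; for very large groups, `heapq.nlargest(top_k, members, key=...)` selects the same top-k items without a full sort.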
In addition to the beneficial effects of the data screening method shown in Embodiment one, the data screening method provided by this embodiment determines the target label and target label probability of each sample data item by soft voting over the probabilities of each label, which improves the accuracy of the target labels of the sample data.
Embodiment three
Referring to Fig. 3, a structural block diagram of a data screening apparatus according to Embodiment three of the present invention is shown.
The data screening apparatus of this embodiment may include: an extraction module 301 configured to extract multiple noisy data items from data to be screened as sample data; a conversion module 302 configured to transform each sample data item to obtain its transformed data; a determining module 303 configured to use a pre-trained image classification model to perform label prediction on each sample data item and each transformed data item and determine the target label and target label probability of each sample data item; and a screening module 304 configured to screen the sample data according to the target label and target label probability of each sample data item to obtain a target database.
Preferably, the screening module 304 may include: a grouping submodule 3041 configured to group the sample data items by target label, where each group corresponds to one target label; a sorting submodule 3042 configured to sort the sample data items in the same group by target label probability, where items ranked earlier have larger target label probabilities; and a generating submodule 3043 configured to select the top preset number of sample data items in each group to generate the target database.
Preferably, the determining module 303 may include: a recognition submodule 3031 configured to use the pre-trained image classification model to perform label prediction on each sample data item and each transformed data item to obtain the label recognition result of each, where a label recognition result includes each label corresponding to the data and the probability of each label; and a label-determining submodule 3032 configured to determine, for each sample data item, its target label and target label probability from its own label recognition result and the label recognition results of its transformed data.
Preferably, the label-determining submodule 3032 is specifically configured to: for each label, compute the weighted average of the probabilities of that label for the sample data and its transformed data to obtain the weighted average probability of the label; determine the maximum among the weighted average probabilities of all labels; determine the label with the maximum weighted average probability as the target label of the sample data; and determine that maximum weighted average probability as the target label probability of the sample data.
Preferably, the conversion module 302 is specifically configured to transform each sample data item according to a preset transformation mode to obtain its transformed data, where the preset transformation mode includes at least one of rotation, translation and shearing.
The data screening apparatus of this embodiment is used to implement the corresponding data screening methods of Embodiments one and two, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
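The module structure of this embodiment can be sketched as a class that wires the modules together as injected callables. This is only an illustrative architecture under assumptions: the class name, field names and callable signatures are hypothetical, and the image classification model is passed in as a plain prediction function.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple

Prediction = Dict[str, float]  # label recognition result: {label: probability}


@dataclass
class DataScreeningDevice:
    """Hypothetical wiring of modules 301-304 as injected callables."""
    extract: Callable[[Sequence[Any]], List[Any]]            # extraction module 301
    transform: Callable[[Any], List[Any]]                    # conversion module 302
    predict: Callable[[Any], Prediction]                     # pre-trained classifier (3031)
    vote: Callable[[List[Prediction]], Tuple[str, float]]    # label-determining submodule 3032
    screen: Callable[[List[Tuple[Any, str, float]]], Dict[str, list]]  # screening module 304

    def run(self, data_to_screen: Sequence[Any]) -> Dict[str, list]:
        scored = []
        for sample in self.extract(data_to_screen):
            # Determining module 303: predict on the sample and its copies, then vote.
            preds = [self.predict(x) for x in [sample, *self.transform(sample)]]
            label, prob = self.vote(preds)
            scored.append((sample, label, prob))
        return self.screen(scored)
```

Because each module is a separate field, one module (for example the transformation set) can be replaced without touching the others, mirroring the module and submodule split described above.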
Embodiment four
Referring to Fig. 4, a structural block diagram of a terminal for screening data according to Embodiment four of the present invention is shown.
The terminal of this embodiment may include a memory, a processor, and a computer program stored in the memory and executable on the processor; when executed by the processor, the computer program implements the steps of any data screening method described in the present invention.
Fig. 4 is a block diagram of a data screening terminal 600 according to an exemplary embodiment. For example, the terminal 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and so on.
Referring to Fig. 4, terminal 600 may include one or more of the following components: a processing component 602, a memory 604, a power supply component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 typically controls the overall operation of terminal 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 that execute instructions to perform all or some of the steps of the methods described above. In addition, the processing component 602 may include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation of terminal 600. Examples of such data include instructions for any application or method operating on terminal 600, contact data, phonebook data, messages, pictures, and videos. The memory 604 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 606 provides power to the various components of terminal 600. It may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for terminal 600.
The multimedia component 608 includes a screen that provides an output interface between terminal 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When terminal 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or may have focusing and optical zoom capabilities.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) configured to receive external audio signals when terminal 600 is in an operating mode such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 604 or sent via the communication component 616. In some embodiments, the audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a power button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of terminal 600. For example, the sensor component 614 may detect the open/closed state of terminal 600 and the relative positioning of components, such as the display and keypad of terminal 600. It may also detect a change in position of terminal 600 or of one of its components, the presence or absence of user contact with terminal 600, the orientation or acceleration/deceleration of terminal 600, and changes in the temperature of terminal 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. It may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between terminal 600 and other devices. Terminal 600 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, terminal 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the data screening method. Specifically, the data screening method includes: extracting multiple pieces of noisy data from the data to be screened as sample data; performing transformation processing on each sample to obtain the transformed data of each sample; performing label prediction on each sample and each piece of transformed data using a pre-trained image classification model, to determine the target label and target label probability of each sample; and screening the samples according to the target label and target label probability of each sample, to obtain a target database.
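The four steps of the method can be strung together as follows. This sketch makes several assumptions. The noisy samples are already extracted, since the text does not say how they are chosen. Every view is weighted equally in the average. `predict` is a stand-in for the pre-trained image classification model. The names `screen_data`, `transforms`, and `top_k` are hypothetical, and a dict of lists stands in for the target database.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
LabelProbs = Dict[str, float]  # label -> probability, as output by the model


def screen_data(
    samples: Sequence[T],
    transforms: Sequence[Callable[[T], T]],
    predict: Callable[[T], LabelProbs],
    top_k: int,
) -> Dict[str, List[Tuple[T, float]]]:
    """Illustrative end-to-end screening; not the patent's implementation."""
    scored: List[Tuple[T, str, float]] = []
    for sample in samples:
        # Transform the sample (data augmentation) and predict on every view.
        views = [sample] + [t(sample) for t in transforms]
        results = [predict(v) for v in views]
        # Equal-weight average per label (the weights are an assumption).
        labels = set().union(*results)
        avg = {c: sum(r.get(c, 0.0) for r in results) / len(results) for c in labels}
        target = max(sorted(avg), key=avg.__getitem__)
        scored.append((sample, target, avg[target]))

    # Screening: per target label, keep the top_k most probable samples.
    database: Dict[str, List[Tuple[T, float]]] = {}
    for sample, target, prob in sorted(scored, key=lambda s: s[2], reverse=True):
        bucket = database.setdefault(target, [])
        if len(bucket) < top_k:
            bucket.append((sample, prob))
    return database
```

Passing the model and transforms in as callables keeps the sketch independent of any particular deep learning framework.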
Preferably, the step of screening the samples according to the target label and target label probability of each sample to obtain the target database includes: grouping the samples by target label, where each group corresponds to one target label; sorting the samples within each group by target label probability, where samples ranked earlier have larger target label probabilities; and selecting the preset number of top-ranked samples in each group to generate the target database.
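The grouping, sorting, and selection step can be sketched as follows. The sketch assumes each sample has already been reduced to a `(sample id, target label, target label probability)` tuple. `build_target_database` and `preset_quantity` are illustrative names, and returning a dict of lists in place of a database is a simplification.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# (sample id, target label, target label probability)
Scored = Tuple[Hashable, str, float]


def build_target_database(
    scored: Sequence[Scored], preset_quantity: int
) -> Dict[str, List[Scored]]:
    """Group by target label, sort each group by probability (descending),
    and keep the first `preset_quantity` samples of every group."""
    if preset_quantity < 0:
        raise ValueError("preset_quantity must be non-negative")
    groups: Dict[str, List[Scored]] = defaultdict(list)
    for item in scored:
        groups[item[1]].append(item)  # one group per target label
    database: Dict[str, List[Scored]] = {}
    for label, items in groups.items():
        items.sort(key=lambda s: s[2], reverse=True)  # larger probability ranks earlier
        database[label] = items[:preset_quantity]     # top preset_quantity samples
    return database
```

For example, with `cat` samples scored 0.7, 0.95, and 0.8 and a preset quantity of 2, only the 0.95 and 0.8 samples are kept for `cat`.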
Preferably, the step of performing label prediction on each sample and each piece of transformed data using the pre-trained image classification model to determine the target label and target label probability of each sample includes: performing label prediction on each sample and each piece of transformed data using the pre-trained image classification model, to obtain a label recognition result for each sample and for each piece of transformed data, where a label recognition result includes each label corresponding to the data and the probability corresponding to each label; and, for each sample, determining the target label and target label probability of the sample according to the label recognition result of the sample and the label recognition results of its transformed data.
Preferably, the step of determining the target label and target label probability of the sample according to the label recognition result of the sample and the label recognition results of its transformed data includes: for each label, computing a weighted average of the probabilities of that label for the sample and for its transformed data, to obtain the weighted average probability of the label; determining the maximum among the weighted average probabilities of all labels; determining the label corresponding to the maximum weighted average probability as the target label of the sample; and determining the maximum weighted average probability as the target label probability of the sample.
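The weighted averaging step can be written compactly. The notation is introduced here for illustration and is not the patent's own. $x_0$ is the sample and $x_1, \dots, x_K$ are its transformed versions. $p_c(x_k)$ is the probability the model assigns to label $c$ for $x_k$. The weights $w_k \ge 0$, not all zero, are left unspecified by the text.

```latex
\bar{p}_c = \frac{\sum_{k=0}^{K} w_k \, p_c(x_k)}{\sum_{k=0}^{K} w_k},
\qquad
\hat{c} = \arg\max_{c} \bar{p}_c,
\qquad
\hat{p} = \max_{c} \bar{p}_c = \bar{p}_{\hat{c}}
```

Here $\hat{c}$ is the target label and $\hat{p}$ the target label probability. As a worked example, take equal weights, two transformed versions ($K = 2$), and two labels. Suppose $p_{\text{cat}} = 0.6, 0.8, 0.4$ and $p_{\text{dog}} = 0.4, 0.2, 0.6$ for $x_0, x_1, x_2$. Then $\bar{p}_{\text{cat}} = 1.8/3 = 0.6$ and $\bar{p}_{\text{dog}} = 1.2/3 = 0.4$, so $\hat{c} = \text{cat}$ and $\hat{p} = 0.6$.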
Preferably, the step of transforming each sample to obtain its transformed data includes: transforming each sample according to a preset transformation mode to obtain the transformed data of each sample, where the preset transformation mode includes at least one of rotation, translation, and shearing.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 604 including instructions, which can be executed by the processor 620 of terminal 600 to carry out the data screening method described above. For example, the non-transitory computer-readable storage medium may be a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, or the like. When the instructions in the storage medium are executed by the processor of the terminal, the terminal is enabled to perform the steps of any data screening method described in the present invention.
The terminal provided by this embodiment of the present invention performs data screening periodically. At each screening, sample data is extracted from the data generated by users in the interval since the previous screening, i.e., the data to be screened. Each sample is transformed for data augmentation. The target label and target label probability of each sample are determined from the sample together with its augmented data. The samples are then screened according to their target labels and target label probabilities to obtain the target database. With the data screening scheme provided by embodiments of the present invention, users do not need to manually label and screen the data one item at a time. Screening is instead performed automatically by a computer program, which is convenient and fast, saves human resources, and improves data screening efficiency.
Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for related details, refer to the description of the method embodiment.
The data screening scheme provided herein is not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct a system embodying the present scheme is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the invention described herein may be implemented in various programming languages, and the above description of a specific language is provided to disclose the best mode of carrying out the invention.
The description provided here sets forth numerous specific details. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that, in practice, a microprocessor or a digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components of the data screening scheme according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing some or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.
Claims (8)
1. A data screening method, characterized in that the method comprises:
extracting multiple pieces of noisy data from data to be screened as sample data;
performing transformation processing on each sample to obtain transformed data of each sample;
performing label prediction on each sample and each piece of transformed data using a pre-trained image classification model, to determine a target label and a target label probability of each sample;
screening the samples according to the target label and target label probability of each sample, to obtain a target database;
wherein the step of performing label prediction on each sample and each piece of transformed data using the pre-trained image classification model to determine the target label and target label probability of each sample comprises:
performing label prediction on each sample and each piece of transformed data using the pre-trained image classification model, to obtain a label recognition result of each sample and of each piece of transformed data, wherein a label recognition result comprises each label corresponding to the data and the probability corresponding to each label;
for each sample, determining the target label and target label probability of the sample according to the label recognition result of the sample and the label recognition results of the transformed data of the sample;
wherein the step of determining the target label and target label probability of the sample according to the label recognition result of the sample and the label recognition results of the transformed data of the sample comprises:
for each label, computing a weighted average of the probabilities of that label for the sample and for the transformed data of the sample, to obtain a weighted average probability of the label;
determining the maximum among the weighted average probabilities of the labels;
determining the label corresponding to the maximum weighted average probability as the target label of the sample; and determining the maximum weighted average probability as the target label probability of the sample.
2. The method according to claim 1, wherein the step of screening the samples according to the target label and target label probability of each sample to obtain the target database comprises:
grouping the samples by target label, wherein each group corresponds to one target label;
sorting the samples within each group by target label probability, wherein samples ranked earlier have larger target label probabilities;
selecting the preset number of top-ranked samples in each group to generate the target database.
3. The method according to claim 1, wherein the step of transforming each sample to obtain the transformed data of each sample comprises:
transforming each sample according to a preset transformation mode to obtain the transformed data of each sample, wherein the preset transformation mode comprises at least one of rotation, translation, and shearing.
4. A data screening device, characterized in that the device comprises:
an extraction module, configured to extract multiple pieces of noisy data from data to be screened as sample data;
a conversion module, configured to perform transformation processing on each sample to obtain transformed data of each sample;
a determination module, configured to perform label prediction on each sample and each piece of transformed data using a pre-trained image classification model, to determine a target label and a target label probability of each sample;
a screening module, configured to screen the samples according to the target label and target label probability of each sample, to obtain a target database;
wherein the determination module comprises:
a recognition submodule, configured to perform label prediction on each sample and each piece of transformed data using the pre-trained image classification model, to obtain a label recognition result of each sample and of each piece of transformed data, wherein a label recognition result comprises each label corresponding to the data and the probability corresponding to each label;
a label determination submodule, configured to, for each sample, determine the target label and target label probability of the sample according to the label recognition result of the sample and the label recognition results of the transformed data of the sample;
wherein the label determination submodule is specifically configured to:
for each label, compute a weighted average of the probabilities of that label for the sample and for the transformed data of the sample, to obtain a weighted average probability of the label; determine the maximum among the weighted average probabilities of the labels; determine the label corresponding to the maximum weighted average probability as the target label of the sample; and determine the maximum weighted average probability as the target label probability of the sample.
5. The device according to claim 4, wherein the screening module comprises:
a grouping submodule, configured to group the samples by target label, wherein each group corresponds to one target label;
a sorting submodule, configured to sort the samples within each group by target label probability, wherein samples ranked earlier have larger target label probabilities;
a generation submodule, configured to select the preset number of top-ranked samples in each group to generate the target database.
6. The device according to claim 4, wherein the conversion module is specifically configured to:
transform each sample according to a preset transformation mode to obtain the transformed data of each sample, wherein the preset transformation mode comprises at least one of rotation, translation, and shearing.
7. A terminal, characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the data screening method according to any one of claims 1 to 3.
8. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the data screening method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810220055.1A CN108595497B (en) | 2018-03-16 | 2018-03-16 | Data screening method, apparatus and terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108595497A | 2018-09-28 |
CN108595497B | 2019-09-27 |
Family ID: 63626547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810220055.1A Active CN108595497B (en) | 2018-03-16 | 2018-03-16 | Data screening method, apparatus and terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108595497B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109544150A (en) * | 2018-10-09 | 2019-03-29 | 阿里巴巴集团控股有限公司 | Classification model generation method and apparatus, computing device, and storage medium |
CN109598307B (en) * | 2018-12-06 | 2020-11-27 | 北京达佳互联信息技术有限公司 | Data screening method and device, server and storage medium |
CN109657710B (en) * | 2018-12-06 | 2022-01-21 | 北京达佳互联信息技术有限公司 | Data screening method and device, server and storage medium |
CN110147850B (en) * | 2019-05-27 | 2021-12-07 | 北京达佳互联信息技术有限公司 | Image recognition method, device, equipment and storage medium |
CN110348993B (en) * | 2019-06-28 | 2023-12-22 | 北京淇瑀信息科技有限公司 | Method and apparatus for determining labels for a risk assessment model, and electronic device |
CN110807767A (en) * | 2019-10-24 | 2020-02-18 | 北京旷视科技有限公司 | Target image screening method and target image screening device |
CN111507089B (en) * | 2020-06-09 | 2022-09-09 | 平安科技(深圳)有限公司 | Document classification method and device based on deep learning model and computer equipment |
CN113139628B (en) * | 2021-06-22 | 2021-09-17 | 腾讯科技(深圳)有限公司 | Sample image identification method, device and equipment and readable storage medium |
CN113837670A (en) * | 2021-11-26 | 2021-12-24 | 北京芯盾时代科技有限公司 | Risk recognition model training method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005509978A (en) * | 2001-11-16 | 2005-04-14 | チェン,ユアン,ヤン | Ambiguous neural network with supervised and unsupervised cluster analysis |
CN102880875A (en) * | 2012-10-12 | 2013-01-16 | 西安电子科技大学 | Semi-supervised learning face recognition method based on low-rank representation (LRR) graph |
CN106650721A (en) * | 2016-12-28 | 2017-05-10 | 吴晓军 | Industrial character identification method based on convolution neural network |
CN107526785A (en) * | 2017-07-31 | 2017-12-29 | 广州市香港科大霍英东研究院 | File classification method and device |
US9911033B1 (en) * | 2016-09-05 | 2018-03-06 | International Business Machines Corporation | Semi-supervised price tag detection |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7512273B2 (en) * | 2004-10-21 | 2009-03-31 | Microsoft Corporation | Digital ink labeling |
CN104463202B (en) * | 2014-11-28 | 2017-09-19 | 苏州大学 | Multi-class image semi-supervised classification method and system |
CN106960219B (en) * | 2017-03-10 | 2021-04-16 | 百度在线网络技术(北京)有限公司 | Picture identification method and device, computer equipment and computer readable medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |