CN109657710A - Data screening method, apparatus, server and storage medium - Google Patents

Data screening method, apparatus, server and storage medium

Info

Publication number
CN109657710A
CN109657710A (application number CN201811489982.XA; granted as CN109657710B)
Authority
CN
China
Prior art keywords
raw data
word vector
category label
data
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811489982.XA
Other languages
Chinese (zh)
Other versions
CN109657710B (en)
Inventor
张志伟
吴丽军
李铅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201811489982.XA priority Critical patent/CN109657710B/en
Publication of CN109657710A publication Critical patent/CN109657710A/en
Application granted granted Critical
Publication of CN109657710B publication Critical patent/CN109657710B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a data screening method, apparatus, server and storage medium, and belongs to the field of the Internet. The method includes: classifying multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data; obtaining, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data; and determining target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data. By introducing the word segmentation tool and the word vector model, the text information in the raw data is represented in a vector form that a computer can process, which reduces the cost of manual annotation and increases the utilization of the massive raw data.

Description

Data screening method, apparatus, server and storage medium
Technical field
The present disclosure relates to the field of the Internet, and in particular to a data screening method, apparatus, server and storage medium.
Background
In the related art, deep learning has been widely applied in fields such as natural language processing and text translation. The accuracy of a deep learning model depends on the scale of its training data, so training data needs to be obtained by screening raw data collected from the Internet.
Taking image classification as an example, when training a deep model, the raw data first has to be manually annotated to obtain enough labeled data, and the training data is then filtered out of the labeled data. Because multiple groups of training data covering multiple labels are required, and the amount of training data for each label has to reach the order of thousands, only then can model training be carried out.
However, in the above process, to obtain training data on the order of thousands, 10-20 labeled samples need to be prepared for each piece of training data, so the labor cost invested in data annotation is very high. Moreover, since human resources are limited, it is impossible to manually annotate as much data as needed, so the massive raw data from the Internet is under-utilized.
Summary of the invention
The present disclosure provides a data screening method, apparatus, server and storage medium, which can overcome the problems of the high labor cost of data annotation and the insufficient utilization of data.
According to a first aspect of the embodiments of the present disclosure, a data screening method is provided. The method includes:
classifying multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, where each piece of raw data includes text information and image information, the classification model is used to classify the image information, and the classification result includes at least one category label;
obtaining, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data;
determining target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, where a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label.
In a possible implementation, obtaining, based on the word segmentation tool and the word vector model, the first word vector of each category label and the second word vector of the text information in each piece of raw data includes:
for each piece of raw data, extracting at least one word from the text information of the raw data using the word segmentation tool;
inputting each category label and the at least one word into the word vector model, and outputting the first word vector and a word vector of each of the at least one word;
obtaining the average vector of the word vectors of the at least one word as the second word vector.
In a possible implementation, determining the target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data includes:
for each piece of raw data, obtaining the cosine distance between the second word vector of the raw data and the first word vector of each category label corresponding to the raw data;
determining the raw data corresponding to a cosine distance smaller than a preset value as the target data.
In a possible implementation, the classification result further includes at least one prediction probability, and each prediction probability is used to indicate the possibility that a piece of raw data belongs to a category label.
In a possible implementation, classifying the multiple pieces of raw data using the classification model to obtain the classification result of each piece of raw data includes:
for each piece of raw data, inputting the raw data into the classification model, and outputting the prediction probabilities that the raw data belongs to the respective category labels, where each prediction probability corresponds to one category label;
obtaining the category labels corresponding to the prediction probabilities that satisfy a second preset condition as the at least one category label of the raw data.
In a possible implementation, obtaining the category labels corresponding to the prediction probabilities that satisfy the second preset condition as the at least one category label of the raw data includes:
when the maximum of the prediction probabilities is greater than a probability threshold, obtaining the category labels corresponding to the prediction probabilities greater than the probability threshold as the at least one category label of the raw data; or,
when the maximum of the prediction probabilities is less than or equal to the probability threshold, obtaining the category label corresponding to the maximum of the prediction probabilities as the category label of the raw data.
According to a second aspect of the embodiments of the present disclosure, a data screening apparatus is provided. The apparatus includes:
a classification unit, configured to classify multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, where each piece of raw data includes text information and image information, the classification model is used to classify the image information, and the classification result includes at least one category label;
an obtaining unit, configured to obtain, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data;
a determination unit, configured to determine target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, where a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label.
In a possible implementation, the obtaining unit is further configured to:
for each piece of raw data, extract at least one word from the text information of the raw data using the word segmentation tool;
input each category label and the at least one word into the word vector model, and output the first word vector and a word vector of each of the at least one word;
obtain the average vector of the word vectors of the at least one word as the second word vector.
In a possible implementation, the determination unit is further configured to:
for each piece of raw data, obtain the cosine distance between the second word vector of the raw data and the first word vector of each category label corresponding to the raw data;
determine the raw data corresponding to a cosine distance smaller than a preset value as the target data.
In a possible implementation, the classification result further includes at least one prediction probability, and each prediction probability is used to indicate the possibility that a piece of raw data belongs to a category label.
In a possible implementation, the classification unit includes:
an output subunit, configured to, for each piece of raw data, input the raw data into the classification model and output the prediction probabilities that the raw data belongs to the respective category labels, where each prediction probability corresponds to one category label;
an obtaining subunit, configured to obtain the category labels corresponding to the prediction probabilities that satisfy a second preset condition as the at least one category label of the raw data.
In a possible implementation, the obtaining subunit is further configured to:
when the maximum of the prediction probabilities is greater than a probability threshold, obtain the category labels corresponding to the prediction probabilities greater than the probability threshold as the at least one category label of the raw data; or,
when the maximum of the prediction probabilities is less than or equal to the probability threshold, obtain the category label corresponding to the maximum of the prediction probabilities as the category label of the raw data.
According to a third aspect of the embodiments of the present disclosure, a server is provided. The server includes:
a processor;
a memory for storing instructions executable by the processor;
where the processor is configured to:
classify multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, where each piece of raw data includes text information and image information, the classification model is used to classify the image information, and the classification result includes at least one category label;
obtain, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data;
determine target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, where a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label.
According to a fourth aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided. When instructions in the storage medium are executed by a processor of a server, the server is enabled to perform a data screening method, the method including:
classifying multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, where each piece of raw data includes text information and image information, the classification model is used to classify the image information, and the classification result includes at least one category label;
obtaining, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data;
determining target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, where a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label.
According to a fifth aspect of the embodiments of the present disclosure, an application program is provided, including one or more instructions. When the one or more instructions are executed by a processor of a server, the server is enabled to perform a data screening method, the method including:
classifying multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, where each piece of raw data includes text information and image information, the classification model is used to classify the image information, and the classification result includes at least one category label;
obtaining, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data;
determining target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, where a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
The raw data is classified using a classification model to obtain the classification result of each piece of raw data; based on a word segmentation tool and a word vector model, the first word vector of each category label and the second word vector of each piece of raw data are obtained, so that the raw data satisfying the first preset condition is determined as the target data. Because the word segmentation tool and the word vector model are introduced, the text information in the raw data can be represented in a vector form that a computer can process, which reduces the cost of manual annotation, avoids the limitation on the utilization of raw data caused by limited human resources, and thus increases the utilization of the massive raw data.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a data screening method according to an exemplary embodiment.
Fig. 2 is a flowchart of a data screening method according to an exemplary embodiment.
Fig. 3 is a schematic diagram of a data screening method according to an exemplary embodiment.
Fig. 4 is a block diagram of the logical structure of a data screening apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram of the logical structure of a server according to an exemplary embodiment.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, and examples thereof are illustrated in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. On the contrary, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Fig. 1 is a flowchart of a data screening method according to an exemplary embodiment. As shown in Fig. 1, the data screening method is used in a server and includes the following steps:
In step 101, the server classifies multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, where each piece of raw data includes text information and image information, the classification model is used to classify the image information, and the classification result includes at least one category label.
In step 102, the server obtains, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data.
In step 103, the server determines target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, where a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label.
In the method provided by the embodiments of the present disclosure, the raw data is classified using a classification model to obtain the classification result of each piece of raw data; based on a word segmentation tool and a word vector model, the first word vector of each category label and the second word vector of each piece of raw data are obtained, so that the raw data satisfying the first preset condition is determined as the target data. Because the word segmentation tool and the word vector model are introduced, the text information in the raw data can be represented in a vector form that a computer can process, which reduces the cost of manual annotation, avoids the limitation on the utilization of raw data caused by limited human resources, and thus increases the utilization of the massive raw data.
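Purely as an illustration, the flow of steps 101-103 can be sketched in a few lines of Python. The helper names classify, segment and embed, the dictionary fields, and the preset value of 0.5 are assumptions made for this sketch and are not defined by the disclosure.

```python
# Minimal sketch of steps 101-103; every helper name here is a placeholder,
# not an API defined by the disclosure.
import numpy as np

def screen(raw_items, classify, segment, embed, preset_value=0.5):
    """raw_items: iterable of dicts with 'image' and 'text' fields."""
    targets = []
    for item in raw_items:
        labels = classify(item["image"])              # step 101: category labels
        words = segment(item["text"])                 # step 102: segment the text
        text_vec = np.mean([embed(w) for w in words], axis=0)
        for label in labels:                          # step 103: compare word vectors
            label_vec = embed(label)
            cos = float(np.dot(text_vec, label_vec) /
                        (np.linalg.norm(text_vec) * np.linalg.norm(label_vec)))
            # The disclosure computes cos(u, v), calls it the "cosine distance",
            # and keeps the item when that value is below the preset value.
            if cos < preset_value:
                targets.append((item, label))
                break
    return targets
```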
In a possible implementation, obtaining, based on the word segmentation tool and the word vector model, the first word vector of each category label and the second word vector of the text information in each piece of raw data includes:
for each piece of raw data, extracting at least one word from the text information of the raw data using the word segmentation tool;
inputting each category label and the at least one word into the word vector model, and outputting the first word vector and a word vector of each of the at least one word;
obtaining the average vector of the word vectors of the at least one word as the second word vector.
In a possible implementation, determining the target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data includes:
for each piece of raw data, obtaining the cosine distance between the second word vector of the raw data and the first word vector of each category label corresponding to the raw data;
determining the raw data corresponding to a cosine distance smaller than a preset value as the target data.
In a possible implementation, the classification result further includes at least one prediction probability, and each prediction probability is used to indicate the possibility that a piece of raw data belongs to a category label.
In a possible implementation, classifying the multiple pieces of raw data using the classification model to obtain the classification result of each piece of raw data includes:
for each piece of raw data, inputting the raw data into the classification model, and outputting the prediction probabilities that the raw data belongs to the respective category labels, where each prediction probability corresponds to one category label;
obtaining the category labels corresponding to the prediction probabilities that satisfy a second preset condition as the at least one category label of the raw data.
In a possible implementation, obtaining the category labels corresponding to the prediction probabilities that satisfy the second preset condition as the at least one category label of the raw data includes:
when the maximum of the prediction probabilities is greater than a probability threshold, obtaining the category labels corresponding to the prediction probabilities greater than the probability threshold as the at least one category label of the raw data; or,
when the maximum of the prediction probabilities is less than or equal to the probability threshold, obtaining the category label corresponding to the maximum of the prediction probabilities as the category label of the raw data.
All the above optional technical solutions may be combined in any manner to form optional embodiments of the present disclosure, which will not be repeated here one by one.
Fig. 2 is a flowchart of a data screening method according to an exemplary embodiment. As shown in Fig. 2, the data screening method is used in a server and includes the following steps:
In step 201, for each piece of raw data, the server inputs the raw data into a classification model and outputs the prediction probabilities that the raw data belongs to the respective category labels, where each prediction probability corresponds to one category label.
Each piece of raw data may include text information and image information. The data scale of the multiple pieces of raw data may be on the order of millions or even hundreds of millions; the embodiments of the present disclosure do not specifically limit the data scale of the multiple pieces of raw data. Optionally, the multiple pieces of raw data may be data obtained at random from a UGC (user generated content) website platform, or may be data extracted at random from an existing database; the embodiments of the present disclosure do not specifically limit the manner of obtaining the multiple pieces of raw data.
Optionally, the classification model may classify the image information input into the model through a convolutional neural network: a feature map of each piece of raw data is obtained through multiple convolutional layers, the feature map is processed non-linearly using an activation function, and the non-linearly processed output is then fed into a discrimination network to output the category labels and prediction probabilities. The activation function may be a sigmoid function, a tanh function or a ReLU function; the embodiments of the present disclosure do not limit the form of the activation function. For example, using sigmoid as the activation function maps the variable into the interval (0, 1), so that accurate category labels can be obtained from massive raw data whose features differ greatly.
The category label may indicate, in the form of a label, the category of the image information input into the model. For example, the category label may be "cat", "dog", "monkey" or "person", indicating the category of the image information. The prediction probability may indicate, as a numerical value, the possibility of belonging to a certain category label. For example, the prediction probability that a piece of raw data belongs to the category label "person" may be 0.8, which means the classification model predicts an 80% possibility that the raw data is a portrait.
Regarding step 201, Fig. 3 is a schematic diagram of a data screening method according to an exemplary embodiment. Referring to Fig. 3, assume that the classification model uses L category labels. Taking the i-th piece of raw data as an example, the i-th piece of raw data is input into the classification model, which outputs L prediction probabilities that the i-th piece of raw data belongs to the respective category labels, where each prediction probability indicates the possibility that the i-th piece of raw data belongs to one category label, L and i are positive integers, and the i-th piece of raw data is any piece of the multiple pieces of raw data. The above classification process can be carried out for each piece of raw data, which is not repeated here.
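The disclosure does not fix a particular network architecture or framework. The toy classifier below, written with PyTorch as an assumption, only illustrates how a convolutional model followed by a sigmoid activation can output L prediction probabilities in (0, 1) for one input image; the layer sizes and the value L = 4 are placeholders.

```python
# Illustrative only: a toy convolutional classifier that outputs L per-label
# prediction probabilities for one image. The architecture is a placeholder,
# not the model defined by the disclosure.
import torch
import torch.nn as nn

L = 4  # number of category labels, e.g. "cat", "dog", "monkey", "person"

class ToyClassifier(nn.Module):
    def __init__(self, num_labels=L):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),                # activation after the convolutional layer
            nn.AdaptiveAvgPool2d(1),  # collapse the feature map
        )
        self.head = nn.Linear(16, num_labels)

    def forward(self, x):
        h = self.features(x).flatten(1)
        # Sigmoid maps each logit into (0, 1), giving one prediction
        # probability per category label.
        return torch.sigmoid(self.head(h))

probs = ToyClassifier()(torch.randn(1, 3, 224, 224))  # shape: (1, L)
```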
In step 202, when the maximum of the prediction probabilities is greater than a probability threshold, the server obtains the category labels corresponding to the prediction probabilities greater than the probability threshold as the at least one category label of the raw data.
Optionally, the probability threshold may be a default value of the server, or may be a value obtained according to a preset rule. Based on the above example, the preset rule may be to take the median of the L prediction probabilities as the probability threshold, or to take the average of the L prediction probabilities as the probability threshold; the embodiments of the present disclosure do not specifically limit the manner of obtaining the probability threshold.
The above step 202 is one possible implementation of obtaining the at least one category label of any piece of raw data, namely using a classification model to filter out category labels with relatively large prediction probabilities and relatively high classification accuracy. In some embodiments, step 202 may be replaced in the following manner: when the maximum of the prediction probabilities is less than or equal to the probability threshold, the server obtains the category label corresponding to the maximum of the prediction probabilities as the category label of the raw data. That is, if all L prediction probabilities of a piece of raw data are less than or equal to the probability threshold, the category label corresponding to the maximum prediction probability is obtained, so as to avoid the raw data having no corresponding category label.
Therefore, the category labels obtained in step 202 can be expressed by the following expression: label_i = { prediction_i^l | prediction_i^l > prob_threshold }, where label_i is the at least one category label corresponding to the i-th piece of raw data, prediction_i^l is the l-th unscreened category label (i.e. its prediction probability) of the i-th piece of raw data, and prob_threshold is the probability threshold.
Correspondingly, the category label obtained in the alternative to step 202 can be expressed by the following expression: label_i = argmax(prediction_i), where label_i is the at least one category label corresponding to the i-th piece of raw data, prediction_i is any unscreened category label of the i-th piece of raw data, and the argmax(·) function indicates the index position of the maximum of its input.
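A small sketch of the selection rule of step 202 and its alternative, under the assumptions that the prediction probabilities arrive as a NumPy array and that the probability threshold is 0.5; both assumptions are illustrative only.

```python
# Sketch of step 202 and its fallback: keep every label index whose prediction
# probability exceeds the threshold; if none exceeds it, fall back to the
# single highest-probability label.
import numpy as np

def select_labels(predictions, prob_threshold=0.5):
    """predictions: 1-D array of L prediction probabilities for one item."""
    predictions = np.asarray(predictions)
    if predictions.max() > prob_threshold:
        keep = np.flatnonzero(predictions > prob_threshold)
    else:
        keep = [int(np.argmax(predictions))]  # argmax gives the index of the maximum
    return list(keep)

# e.g. select_labels([0.8, 0.6, 0.1], 0.5) -> [0, 1]
#      select_labels([0.4, 0.3, 0.2], 0.5) -> [0]
```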
In step 203, for each piece of raw data, the server extracts at least one word from the text information of the raw data using a word segmentation tool.
The word segmentation tool is used to extract the words in the text information. For example, the text information of the i-th piece of raw data is "我喜欢火锅" ("I like hot pot"); after the text information is processed by the word segmentation tool, the three words "我" ("I"), "喜欢" ("like") and "火锅" ("hot pot") can be extracted. The word segmentation tool may be jieba or the like; the embodiments of the present disclosure do not specifically limit the content of the text information or the implementation of the word segmentation tool.
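For illustration, the jieba tool mentioned above can be used as follows to segment the example text; the segmentation shown in the comment is the expected output for this input.

```python
# Example of extracting words from the text information with the jieba
# segmentation tool (pip install jieba).
import jieba

text = "我喜欢火锅"               # "I like hot pot"
words = list(jieba.cut(text))    # expected: ['我', '喜欢', '火锅']
```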
In step 204, the server inputs each category label and the at least one word into a word vector model, and outputs the first word vector and a word vector of each of the at least one word.
The word vector model (word embedding) can obtain the word vector of an input word through a word embedding, so that the text information is represented in a vector form that a computer can process. For example, the word vector model may be a Chinese word vector model such as ChineseWord2Vector. The first word vectors are the L word vectors corresponding to the L category labels, and the at least one word consists of the words that the server extracts from the text information of each piece of raw data using the word segmentation tool in the above step 203.
In step 205, the server obtains the average vector of the word vectors of the at least one word as the second word vector.
The second word vector is the word vector corresponding to the text information in each piece of raw data, and its expression can be: Vector_i^d = (1 / #Word_i) * Σ_word Embedding(word), where Vector_i^d is the second word vector of dimension d of the i-th piece of raw data, #Word_i is the number of words obtained after segmenting the text information of the i-th piece of raw data, Embedding is the word vector model, and d is the dimension of the word vectors.
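As a sketch of steps 204-205, the following code averages the word vectors of the segmented words, assuming a pretrained gensim KeyedVectors model; the model file name is a placeholder, and the disclosure does not prescribe gensim or any particular pretrained vectors.

```python
# Sketch of steps 204-205: look up a word vector for each segmented word and
# average them to obtain the item's second word vector.
import numpy as np
from gensim.models import KeyedVectors

wv = KeyedVectors.load("chinese_word2vec.kv")   # hypothetical pretrained model
words = ["我", "喜欢", "火锅"]                    # output of the segmentation step
vectors = [wv[w] for w in words if w in wv]      # d-dimensional word vectors
second_vector = np.mean(vectors, axis=0)         # Vector_i^d = average of word vectors
```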
Through the above steps 204-205, the word vector of each piece of raw data and the word vector of each category label can be obtained, so that the following step 206 can use the cosine distance to judge whether the first preset condition is satisfied between the first word vector and the second word vector. This avoids the labor cost consumed by manual annotation, and also avoids the limitation on the utilization of raw data caused by limited human resources, thereby increasing the utilization of the massive raw data.
In step 206, for each piece of raw data, the server obtains the cosine distance between the second word vector of the raw data and the first word vector of each category label corresponding to the raw data.
In the above process, taking the i-th piece of raw data as an example, suppose the category labels output for the i-th piece of raw data through steps 201-202 are "cat", "dog" and "monkey", the first word vector of the category label "cat" is denoted Cat_vector_i, the first word vector of the category label "dog" is denoted Dog_vector_i, the first word vector of the category label "monkey" is denoted Monkey_vector_i, and the second word vector of the i-th piece of raw data is denoted Vector_i. Then the cosine distances between the second word vector and the first word vectors of the three category labels are: distance_i = cos(Vector_i, Cat_vector_i) = 0.9, distance_i = cos(Vector_i, Dog_vector_i) = 0.6, distance_i = cos(Vector_i, Monkey_vector_i) = 0.3. The i-th piece of raw data is used here only as an illustration; in practice, similar steps can be carried out for each piece of raw data to obtain the cosine distances between the second word vector of the raw data and the first word vectors of each corresponding category label, which is not repeated here.
In step 207, the server determines the raw data corresponding to a cosine distance smaller than a preset value as the target data.
The preset value may be a default threshold of the server; for example, the preset value may be 0.5. Based on the above example, since the cosine distance between the second word vector and the category label "monkey" is smaller than the preset value, that is, distance_i = cos(Vector_i, Monkey_vector_i) = 0.3 < 0.5, the first preset condition is considered to be satisfied between the second word vector of the i-th piece of raw data and the first word vector of the category label "monkey", so the i-th piece of raw data is obtained as target data, with "monkey" as the category label corresponding to the target data. In the above process, if all the cosine distances of a piece of raw data are greater than or equal to the preset value, the raw data will not be obtained as target data; the classification result is considered to be mispredicted, and the raw data is noise data.
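The filtering of steps 206-207 can be sketched as follows, keeping the disclosure's convention that the value cos(u, v) itself is called the "cosine distance" and that a piece of raw data is kept when that value is below the preset value; the helper and parameter names are illustrative only.

```python
# Sketch of steps 206-207: compute the "cosine distance" between the item's
# second word vector and each label's first word vector, and keep the labels
# whose value falls below the preset value (items with no hit are noise data).
import numpy as np

def cos_value(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pick_target_labels(second_vector, label_vectors, preset_value=0.5):
    """label_vectors: dict mapping each category label to its first word vector."""
    distances = {lab: cos_value(second_vector, vec)
                 for lab, vec in label_vectors.items()}
    return [lab for lab, d in distances.items() if d < preset_value]
```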
In the method provided by the embodiments of the present disclosure, the raw data is classified using a classification model to obtain the classification result of each piece of raw data; based on a word segmentation tool and a word vector model, the first word vector of each category label and the second word vector of each piece of raw data are obtained, so that the raw data satisfying the first preset condition is determined as the target data. Because the word segmentation tool and the word vector model are introduced, the text information in the raw data can be represented in a vector form that a computer can process, which reduces the cost of manual annotation, avoids the limitation on the utilization of raw data caused by limited human resources, and thus increases the utilization of the massive raw data. Further, by obtaining the category labels whose prediction probabilities are greater than the probability threshold as the at least one category label of the raw data, noise data that is difficult to classify is filtered out of the raw data. In addition, by obtaining the average vector of the word vectors of the at least one word in the text information as the second word vector, each piece of raw data is described by a single word vector, and the target data is then determined according to the cosine distance between the first word vector and the second word vector, making the data screening more accurate.
All the above optional technical solutions may be combined in any manner to form optional embodiments of the present disclosure, which will not be repeated here one by one.
Fig. 4 is a block diagram of the logical structure of a data screening apparatus according to an exemplary embodiment. Referring to Fig. 4, the apparatus includes a classification unit 401, an obtaining unit 402 and a determination unit 403:
the classification unit 401 is configured to classify multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, where each piece of raw data includes text information and image information, the classification model is used to classify the image information, and the classification result includes at least one category label;
the obtaining unit 402 is configured to obtain, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data;
the determination unit 403 is configured to determine target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, where a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label.
In the apparatus provided by the embodiments of the present disclosure, the raw data is classified using a classification model to obtain the classification result of each piece of raw data; based on a word segmentation tool and a word vector model, the first word vector of each category label and the second word vector of each piece of raw data are obtained, so that the raw data satisfying the first preset condition is determined as the target data. Because the word segmentation tool and the word vector model are introduced, the text information in the raw data can be represented in a vector form that a computer can process, which reduces the cost of manual annotation, avoids the limitation on the utilization of raw data caused by limited human resources, and thus increases the utilization of the massive raw data.
In a possible implementation, the obtaining unit 402 is further configured to:
for each piece of raw data, extract at least one word from the text information of the raw data using the word segmentation tool;
input each category label and the at least one word into the word vector model, and output the first word vector and a word vector of each of the at least one word;
obtain the average vector of the word vectors of the at least one word as the second word vector.
In a possible implementation, the determination unit 403 is further configured to:
for each piece of raw data, obtain the cosine distance between the second word vector of the raw data and the first word vector of each category label corresponding to the raw data;
determine the raw data corresponding to a cosine distance smaller than a preset value as the target data.
In a possible implementation, the classification result further includes at least one prediction probability, and each prediction probability is used to indicate the possibility that a piece of raw data belongs to a category label.
In a possible implementation, based on the apparatus composition of Fig. 4, the classification unit 401 includes:
an output subunit, configured to, for each piece of raw data, input the raw data into the classification model and output the prediction probabilities that the raw data belongs to the respective category labels, where each prediction probability corresponds to one category label;
an obtaining subunit, configured to obtain the category labels corresponding to the prediction probabilities that satisfy a second preset condition as the at least one category label of the raw data.
In a possible implementation, the obtaining subunit is further configured to:
when the maximum of the prediction probabilities is greater than a probability threshold, obtain the category labels corresponding to the prediction probabilities greater than the probability threshold as the at least one category label of the raw data; or,
when the maximum of the prediction probabilities is less than or equal to the probability threshold, obtain the category label corresponding to the maximum of the prediction probabilities as the category label of the raw data.
All the above optional technical solutions may be combined in any manner to form optional embodiments of the present disclosure, which will not be repeated here one by one.
Regarding the apparatus in the above embodiments, the specific manner in which each unit performs its operations has been described in detail in the embodiments related to the method, and will not be elaborated here.
Fig. 5 is a block diagram of the logical structure of a server according to an exemplary embodiment. The server 500 may vary greatly due to differences in configuration or performance, and may include one or more processors (central processing units, CPU) 501 and one or more memories 502, where the memory 502 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 501 to implement the data screening method provided by each of the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard and an input/output interface for performing input and output, and the server may further include other components for implementing device functions, which will not be described here.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 502 including instructions, and the above instructions can be executed by the processor 501 of the server 500 to complete the above data screening method. The method includes: classifying multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, where each piece of raw data includes text information and image information, the classification model is used to classify the image information, and the classification result includes at least one category label; obtaining, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data; determining target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, where a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label. Optionally, the above instructions can also be executed by the processor 501 of the server 500 to complete other steps involved in the above exemplary embodiments. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, an application program is also provided, including one or more instructions, and the one or more instructions can be executed by the processor 501 of the server 500 to complete the above data screening method. The method includes: classifying multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, where each piece of raw data includes text information and image information, the classification model is used to classify the image information, and the classification result includes at least one category label; obtaining, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data; determining target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, where a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label. Optionally, the above instructions can also be executed by the processor 501 of the server 500 to complete other steps involved in the above exemplary embodiments.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or conventional techniques in the art not disclosed by the present disclosure. The specification and examples are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are pointed out by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A data screening method, characterized in that the method comprises:
classifying multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, wherein each piece of raw data comprises text information and image information, the classification model is used to classify the image information, and the classification result comprises at least one category label;
obtaining, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data;
determining target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, wherein a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label.
2. The data screening method according to claim 1, characterized in that obtaining, based on the word segmentation tool and the word vector model, the first word vector of each category label and the second word vector of the text information in each piece of raw data comprises:
for each piece of raw data, extracting at least one word from the text information of the raw data using the word segmentation tool;
inputting each category label and the at least one word into the word vector model, and outputting the first word vector and a word vector of each of the at least one word;
obtaining the average vector of the word vectors of the at least one word as the second word vector.
3. The data screening method according to claim 1, characterized in that determining the target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data comprises:
for each piece of raw data, obtaining the cosine distance between the second word vector of the raw data and the first word vector of each category label corresponding to the raw data;
determining the raw data corresponding to a cosine distance smaller than a preset value as the target data.
4. The data screening method according to claim 1, characterized in that the classification result further comprises at least one prediction probability, and each prediction probability is used to indicate the possibility that a piece of raw data belongs to a category label.
5. The data screening method according to claim 4, characterized in that classifying the multiple pieces of raw data using the classification model to obtain the classification result of each piece of raw data comprises:
for each piece of raw data, inputting the raw data into the classification model, and outputting the prediction probabilities that the raw data belongs to the respective category labels, wherein each prediction probability corresponds to one category label;
obtaining the category labels corresponding to the prediction probabilities that satisfy a second preset condition as the at least one category label of the raw data.
6. The data screening method according to claim 5, characterized in that obtaining the category labels corresponding to the prediction probabilities that satisfy the second preset condition as the at least one category label of the raw data comprises:
when the maximum of the prediction probabilities is greater than a probability threshold, obtaining the category labels corresponding to the prediction probabilities greater than the probability threshold as the at least one category label of the raw data; or,
when the maximum of the prediction probabilities is less than or equal to the probability threshold, obtaining the category label corresponding to the maximum of the prediction probabilities as the category label of the raw data.
7. A data screening apparatus, characterized in that the apparatus comprises:
a classification unit, configured to classify multiple pieces of raw data using a classification model to obtain a classification result for each piece of raw data, wherein each piece of raw data comprises text information and image information, the classification model is used to classify the image information, and the classification result comprises at least one category label;
an obtaining unit, configured to obtain, based on a word segmentation tool and a word vector model, a first word vector for each category label and a second word vector for the text information in each piece of raw data;
a determination unit, configured to determine target data from the multiple pieces of raw data based on the first word vector of each category label and the second word vector of the text information in each piece of raw data, wherein a first preset condition is satisfied between the second word vector of the text information of the target data and the first word vector of a category label.
8. The data screening apparatus according to claim 7, characterized in that the obtaining unit is further configured to:
for each piece of raw data, extract at least one word from the text information of the raw data using the word segmentation tool;
input each category label and the at least one word into the word vector model, and output the first word vector and a word vector of each of the at least one word;
obtain the average vector of the word vectors of the at least one word as the second word vector.
9. A server, characterized by comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the operations performed by the data screening method according to any one of claims 1 to 6.
10. A non-transitory computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor of a server, the server is enabled to perform the operations performed by the data screening method according to any one of claims 1 to 6.
CN201811489982.XA 2018-12-06 2018-12-06 Data screening method and device, server and storage medium Active CN109657710B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811489982.XA CN109657710B (en) 2018-12-06 2018-12-06 Data screening method and device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811489982.XA CN109657710B (en) 2018-12-06 2018-12-06 Data screening method and device, server and storage medium

Publications (2)

Publication Number Publication Date
CN109657710A true CN109657710A (en) 2019-04-19
CN109657710B CN109657710B (en) 2022-01-21

Family

ID=66112715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811489982.XA Active CN109657710B (en) 2018-12-06 2018-12-06 Data screening method and device, server and storage medium

Country Status (1)

Country Link
CN (1) CN109657710B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543920A (en) * 2019-09-12 2019-12-06 北京达佳互联信息技术有限公司 Performance detection method and device of image recognition model, server and storage medium
CN113159921A (en) * 2021-04-23 2021-07-23 上海晓途网络科技有限公司 Overdue prediction method and device, electronic equipment and storage medium

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140272822A1 (en) * 2013-03-14 2014-09-18 Canon Kabushiki Kaisha Systems and methods for generating a high-level visual vocabulary
CN105279517A (en) * 2015-09-30 2016-01-27 西安电子科技大学 Weak tag social image recognition method based on semi-supervision relation theme model
CN106529606A (en) * 2016-12-01 2017-03-22 中译语通科技(北京)有限公司 Method of improving image recognition accuracy
CN106528588A (en) * 2016-09-14 2017-03-22 厦门幻世网络科技有限公司 Method and apparatus for matching resources for text information
CN106971154A (en) * 2017-03-16 2017-07-21 天津大学 Pedestrian's attribute forecast method based on length memory-type recurrent neural network
US20170255840A1 (en) * 2014-11-26 2017-09-07 Captricity, Inc. Analyzing content of digital images
US20170330054A1 (en) * 2016-05-10 2017-11-16 Baidu Online Network Technology (Beijing) Co., Ltd. Method And Apparatus Of Establishing Image Search Relevance Prediction Model, And Image Search Method And Apparatus
CN107391703A (en) * 2017-07-28 2017-11-24 北京理工大学 The method for building up and system of image library, image library and image classification method
CN107563444A (en) * 2017-09-05 2018-01-09 浙江大学 A kind of zero sample image sorting technique and system
CN108197109A (en) * 2017-12-29 2018-06-22 北京百分点信息科技有限公司 A kind of multilingual analysis method and device based on natural language processing
CN108319672A (en) * 2018-01-25 2018-07-24 南京邮电大学 Mobile terminal malicious information filtering method and system based on cloud computing
CN108537240A (en) * 2017-03-01 2018-09-14 华东师范大学 Commodity image semanteme marking method based on domain body
CN108595497A (en) * 2018-03-16 2018-09-28 北京达佳互联信息技术有限公司 Data screening method, apparatus and terminal
CN108629043A (en) * 2018-05-14 2018-10-09 平安科技(深圳)有限公司 Extracting method, device and the storage medium of webpage target information
CN108664989A (en) * 2018-03-27 2018-10-16 北京达佳互联信息技术有限公司 Image tag determines method, apparatus and terminal
CN108734212A (en) * 2018-05-17 2018-11-02 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of determining classification results
CN108763325A (en) * 2018-05-04 2018-11-06 北京达佳互联信息技术有限公司 A kind of network object processing method and processing device

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140272822A1 (en) * 2013-03-14 2014-09-18 Canon Kabushiki Kaisha Systems and methods for generating a high-level visual vocabulary
US20170255840A1 (en) * 2014-11-26 2017-09-07 Captricity, Inc. Analyzing content of digital images
CN105279517A (en) * 2015-09-30 2016-01-27 西安电子科技大学 Weak tag social image recognition method based on semi-supervision relation theme model
US20170330054A1 (en) * 2016-05-10 2017-11-16 Baidu Online Network Technology (Beijing) Co., Ltd. Method And Apparatus Of Establishing Image Search Relevance Prediction Model, And Image Search Method And Apparatus
CN106528588A (en) * 2016-09-14 2017-03-22 厦门幻世网络科技有限公司 Method and apparatus for matching resources for text information
CN106529606A (en) * 2016-12-01 2017-03-22 中译语通科技(北京)有限公司 Method of improving image recognition accuracy
CN108537240A (en) * 2017-03-01 2018-09-14 华东师范大学 Commodity image semanteme marking method based on domain body
CN106971154A (en) * 2017-03-16 2017-07-21 天津大学 Pedestrian's attribute forecast method based on length memory-type recurrent neural network
CN107391703A (en) * 2017-07-28 2017-11-24 北京理工大学 The method for building up and system of image library, image library and image classification method
CN107563444A (en) * 2017-09-05 2018-01-09 浙江大学 A kind of zero sample image sorting technique and system
CN108197109A (en) * 2017-12-29 2018-06-22 北京百分点信息科技有限公司 A kind of multilingual analysis method and device based on natural language processing
CN108319672A (en) * 2018-01-25 2018-07-24 南京邮电大学 Mobile terminal malicious information filtering method and system based on cloud computing
CN108595497A (en) * 2018-03-16 2018-09-28 北京达佳互联信息技术有限公司 Data screening method, apparatus and terminal
CN108664989A (en) * 2018-03-27 2018-10-16 北京达佳互联信息技术有限公司 Image tag determines method, apparatus and terminal
CN108763325A (en) * 2018-05-04 2018-11-06 北京达佳互联信息技术有限公司 A kind of network object processing method and processing device
CN108629043A (en) * 2018-05-14 2018-10-09 平安科技(深圳)有限公司 Extracting method, device and the storage medium of webpage target information
CN108734212A (en) * 2018-05-17 2018-11-02 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of determining classification results

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALI DIBA 等: "Deep visual words: Improved fisher vector for image classification", 《2017 FIFTEENTH IAPR INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS》 *
QIMIN CHENG 等: "A survey and analysis on automatic image annotation", 《PATTERN RECOGNITION》 *
张震宇: "Research on diversification of image retrieval based on semantic distance", China Master's Theses Full-text Database, Information Science and Technology *
胡琦瑶: "Research on image retrieval methods based on weakly supervised deep learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543920A (en) * 2019-09-12 2019-12-06 北京达佳互联信息技术有限公司 Performance detection method and device of image recognition model, server and storage medium
CN110543920B (en) * 2019-09-12 2022-04-22 北京达佳互联信息技术有限公司 Performance detection method and device of image recognition model, server and storage medium
CN113159921A (en) * 2021-04-23 2021-07-23 上海晓途网络科技有限公司 Overdue prediction method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109657710B (en) 2022-01-21

Similar Documents

Publication Publication Date Title
KR102071582B1 (en) Method and apparatus for classifying a class to which a sentence belongs by using deep neural network
CN107391760B (en) User interest recognition methods, device and computer readable storage medium
CN107766929B (en) Model analysis method and device
CN109857860A (en) File classification method, device, computer equipment and storage medium
CN109471938A (en) A kind of file classification method and terminal
CN110309514A (en) A kind of method for recognizing semantics and device
CN107025284A (en) The recognition methods of network comment text emotion tendency and convolutional neural networks model
CN109299344A (en) The generation method of order models, the sort method of search result, device and equipment
Lei et al. Patent analytics based on feature vector space model: A case of IoT
CN113407694B (en) Method, device and related equipment for detecting ambiguity of customer service robot knowledge base
CN107436875A (en) File classification method and device
CN107168992A (en) Article sorting technique and device, equipment and computer-readable recording medium based on artificial intelligence
CN109598307A (en) Data screening method, apparatus, server and storage medium
CN109933660B (en) API information search method towards natural language form based on handout and website
CN109872162A (en) A kind of air control classifying identification method and system handling customer complaint information
CN110163647A (en) A kind of data processing method and device
CN104834940A (en) Medical image inspection disease classification method based on support vector machine (SVM)
CN109471944A (en) Training method, device and the readable storage medium storing program for executing of textual classification model
CN108090099B (en) Text processing method and device
CN108734212A (en) A kind of method and relevant apparatus of determining classification results
CN108959265A (en) Cross-domain texts sensibility classification method, device, computer equipment and storage medium
Li et al. Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes
CN110287311A (en) File classification method and device, storage medium, computer equipment
CN112836509A (en) Expert system knowledge base construction method and system
CN110458600A (en) Portrait model training method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant