CN109977400B - Verification processing method and device, computer storage medium and terminal - Google Patents

Publication number
CN109977400B
Authority
CN
China
Prior art keywords
parameter information
sample
calculating
processing
complexity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910176629.4A
Other languages
Chinese (zh)
Other versions
CN109977400A (en)
Inventor
王道广
于政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201910176629.4A
Publication of CN109977400A
Application granted
Publication of CN109977400B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A verification processing method and device, a computer storage medium and a terminal are provided. The method comprises: calculating related parameter information of each sample for a preset number of labeled samples; determining verification parameter information of each sample according to the calculated related parameter information; and selecting the samples to be verified according to the determined verification parameter information, so as to perform verification processing; wherein the related parameter information comprises one or more of the following: complexity, uncertainty, length. The embodiment of the invention reduces the number of samples that need to be verified and improves the efficiency of label verification.

Description

Verification processing method and device, computer storage medium and terminal
Technical Field
The present disclosure relates to, but is not limited to, information processing technology, and more particularly to a verification processing method and device, a computer storage medium, and a terminal.
Background
With the development of information technology, artificial intelligence is increasingly widely applied in production and daily life. Natural language processing (NLP) is one of the important fields of artificial intelligence and plays an important role in products and applications such as dialogue systems, knowledge graphs and case-handling assistance. NLP mainly deals with text data such as dialogue text, news, reviews and court judgment documents. Most NLP tasks, such as text classification and sequence labeling, are supervised learning tasks, so the text data must be labeled before a model can be trained.
Labeling is mainly performed by professionals. Because of the diversity of natural language itself, and because annotators may differ subjectively in how they understand the text and the task, labeling results may be inconsistent (where several results are acceptable) or wrong. Too many inconsistencies or errors affect model training and thus the final application effect, so the labeling results need to be verified. Current methods for verifying labeling results include cross-checking and random spot checks. In cross-checking, the same data is labeled by two or more people, and the inconsistent labeling results are then checked and verified. Cross-checking has the following disadvantages: 1. having two or more people label the same data greatly increases the workload; 2. consistent results are not necessarily correct, and data that is error-prone or has high labeling uncertainty cannot be specifically targeted for verification. In a random spot check, a certain proportion (e.g. 20%) of the data is randomly extracted from the labeling results and verified. The random spot check has the following disadvantages: 1. the result is random, and the verification result may not reflect the state of the overall labeling results; 2. because the extraction is random, data that is error-prone or has high labeling uncertainty cannot be specifically targeted for verification.
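The two existing approaches described above can be sketched as follows. This is a minimal illustration only: the function names `cross_check` and `random_spot_check`, the label lists and the fixed seed are assumptions introduced here, not part of the disclosure.

```python
import random


def cross_check(labels_a, labels_b):
    """Return the indices at which two annotators' labels disagree."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


def random_spot_check(num_samples, ratio=0.2, seed=0):
    """Randomly pick a fixed proportion of sample indices for verification."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    k = max(1, round(num_samples * ratio))
    return sorted(rng.sample(range(num_samples), k))


# Cross-checking only surfaces disagreements; samples on which both
# annotators agree but are wrong are never flagged.
disagreements = cross_check(["A", "B", "C"], ["A", "C", "C"])

# Random spot checking ignores how error-prone each sample is.
spot_checked = random_spot_check(10, ratio=0.2)
```

Neither function uses any property of the sample itself, which is the gap the weighted selection described later is meant to address.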
In summary, with current methods for verifying labeling results, both the verification results and the verification approach need further improvement.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
Embodiments of the present invention provide a verification processing method and device, a computer storage medium, and a terminal, which can reduce the number of labeled samples that need to be verified and improve the efficiency of label verification.
The embodiment of the invention provides a verification processing method, which comprises the following steps:
calculating related parameter information of each sample for a preset number of labeled samples;
determining verification parameter information of each sample according to the calculated related parameter information;
selecting the samples to be verified according to the determined verification parameter information, so as to perform verification processing;
wherein the related parameter information comprises one or more of the following information: complexity, uncertainty, length.
Optionally, the calculating the relevant parameter information of each sample includes:
when the related parameter information comprises complexity: performing word segmentation on a preset corpus through a preset word segmentation algorithm; training on the segmented words to obtain a word vector set containing a word vector for each segmented word; and, for each sample: performing word segmentation through the preset word segmentation algorithm; obtaining the word vector of each segmented word from the word vector set; calculating the variance of each vector dimension over the obtained word vectors; and calculating the complexity from the calculated per-dimension variances;
when the related parameter information comprises uncertainty: recording the labeling time spent labeling each sample; determining the labeling speed according to the complexity, the word count and the labeling time of each sample; and calculating the uncertainty according to the determined labeling speed.
Optionally, the determining of the verification parameter information of each sample includes:
setting a corresponding weighting ratio for each piece of related parameter information according to a preset strategy;
and, for each sample, multiplying each piece of related parameter information by its corresponding weighting ratio and summing the products to obtain the verification parameter information of the sample.
Optionally, before setting the corresponding weighting ratio for each piece of related parameter information according to the preset policy, the method further includes:
and carrying out normalization processing on the related parameter information.
Optionally, the selecting of the samples to be verified according to the determined verification parameter information includes:
sorting the determined verification parameter information of the samples by value, and selecting a preset number of samples with the largest values for verification.
In another aspect, an embodiment of the present invention further provides a verification processing device, including: an arithmetic unit, a determination unit and a selection processing unit; wherein,
the arithmetic unit is used for: calculating related parameter information of each sample for a preset number of labeled samples;
the determination unit is used for: determining verification parameter information of each sample according to the calculated related parameter information;
the selection processing unit is used for: selecting the samples to be verified according to the determined verification parameter information, so as to perform verification processing;
wherein the related parameter information comprises one or more of the following information: complexity, uncertainty, length.
Optionally, the operation unit is specifically configured to:
when the related parameter information comprises complexity: performing word segmentation on a preset corpus through a preset word segmentation algorithm; training on the segmented words to obtain a word vector set containing a word vector for each segmented word; and, for each sample: performing word segmentation through the preset word segmentation algorithm; obtaining the word vector of each segmented word from the word vector set; calculating the variance of each vector dimension over the obtained word vectors; and calculating the complexity from the calculated per-dimension variances;
when the related parameter information comprises uncertainty: recording the labeling time spent labeling each sample; determining the labeling speed according to the complexity, the word count and the labeling time of each sample; and calculating the uncertainty according to the determined labeling speed.
Optionally, the determining unit is specifically configured to:
setting a corresponding weighting ratio for each piece of related parameter information according to a preset strategy;
and, for each sample, multiplying each piece of related parameter information by its corresponding weighting ratio and summing the products to obtain the verification parameter information of the sample.
Optionally, the apparatus further includes a normalization unit, configured to:
and carrying out normalization processing on the related parameter information.
Optionally, the selection processing unit is specifically configured to:
sorting the determined verification parameter information of the samples by value, and selecting a preset number of samples with the largest values for verification.
In another aspect, an embodiment of the present invention further provides a computer storage medium, where computer-executable instructions are stored in the computer storage medium, and the computer-executable instructions are used to execute the verification processing method described above.
In another aspect, an embodiment of the present invention further provides a terminal, including: a memory and a processor; wherein,
the processor is configured to execute program instructions in the memory;
the program instructions, when read and executed by the processor, perform the following operations:
calculating related parameter information of each sample for a preset number of labeled samples;
determining verification parameter information of each sample according to the calculated related parameter information;
selecting the samples to be verified according to the determined verification parameter information, so as to perform verification processing;
wherein the related parameter information comprises one or more of the following information: complexity, uncertainty, length.
Compared with the related art, the technical solution of the present application comprises: calculating related parameter information of each sample for a preset number of labeled samples; determining verification parameter information of each sample according to the calculated related parameter information; and selecting the samples to be verified according to the determined verification parameter information, so as to perform verification processing; wherein the related parameter information comprises one or more of the following: complexity, uncertainty, length. The embodiment of the invention reduces the number of samples that need to be verified and improves the efficiency of label verification.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and, together with the embodiments, serve to explain the principles of the invention without limiting the invention.
FIG. 1 is a flow chart of a method of verification processing according to an embodiment of the invention;
FIG. 2 is a block diagram of an apparatus for verification processing according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the relationship between the complexity before and after normalization according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
The steps illustrated in the flow charts of the figures may be performed in a computer system, for example as a set of computer-executable instructions. Also, while a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in a different order.
Fig. 1 is a flowchart of a verification processing method according to an embodiment of the present invention, as shown in fig. 1, including:
step 101, calculating related parameter information of each sample for a preset number of labeled samples;
wherein the related parameter information comprises one or more of the following information: complexity, uncertainty, length.
It should be noted that the number of samples in the embodiments of the present invention can be determined through analysis by those skilled in the art according to the task and the amount of labeled data.
Optionally, the calculating of the relevant parameter information of each sample in the embodiment of the present invention includes:
when the related parameter information comprises complexity: performing word segmentation on a preset corpus through a preset word segmentation algorithm; training on the segmented words to obtain a word vector set containing a word vector for each segmented word; and, for each sample: performing word segmentation through the preset word segmentation algorithm; obtaining the word vector of each segmented word from the word vector set; calculating the variance of each vector dimension over the obtained word vectors; and calculating the complexity from the calculated per-dimension variances;
The computation of the complexity is briefly explained below by way of example:
Word segmentation is performed on a single sample to obtain the segmented word sequence $w_1 w_2 w_3 \ldots w_n$; the word vector sequence corresponding to the segmented word sequence is $v_1 v_2 v_3 \ldots v_n$. The word vectors may be trained in advance on a large-scale corpus, or an open word vector model may be used. Let $v_{i,j}$ denote the value of dimension $j$ of the word vector $v_i$; then:
the mean of the j-th dimension is:
Figure BDA0001989759990000061
the variance of the j-th dimension is:
Figure BDA0001989759990000062
the sample complexity is:
Figure BDA0001989759990000063
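The complexity computation described above can be sketched as follows. This is a hedged illustration under stated assumptions: the per-dimension variance is taken as the population variance, and the sample complexity is assumed to be the average of the per-dimension variances (the exact combining formula is given only in the figure above). The names `vectors_for`, `sample_complexity` and the toy `embeddings` table are hypothetical.

```python
from statistics import pvariance


def vectors_for(tokens, embeddings):
    """Look up the word vector of each segmented word; unknown words are skipped."""
    return [embeddings[t] for t in tokens if t in embeddings]


def sample_complexity(word_vectors):
    """Combine per-dimension variances of a sample's word vectors.

    Assumption: complexity = mean of the per-dimension population variances.
    """
    if not word_vectors:
        return 0.0
    dimensions = list(zip(*word_vectors))  # one tuple of values per dimension j
    variances = [pvariance(values) for values in dimensions]
    return sum(variances) / len(variances)


# Toy 2-dimensional word vectors (made up for illustration).
embeddings = {"court": [0.0, 1.0], "ruling": [2.0, 1.0]}
tokens = ["court", "ruling", "unknown-word"]
complexity = sample_complexity(vectors_for(tokens, embeddings))
```

With these toy vectors, dimension 0 has variance 1.0 and dimension 1 has variance 0.0, so the assumed complexity is their average.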
when the related parameter information comprises uncertainty: recording the labeling time spent labeling each sample; determining the labeling speed according to the complexity, the word count and the labeling time of each sample; and calculating the uncertainty according to the determined labeling speed.
The calculation of the uncertainty is briefly explained below by way of example:
In the embodiment of the invention, assume that the annotator of sample $i$ is $p_i$, the word count is $n_i$, the complexity is $c_i$, and the labeling time is $t_i$. The embodiment of the invention defines the uncertainty as the absolute value of the difference between the expected labeling time and the actual labeling time:
Figure BDA0001989759990000064
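One way to read the uncertainty definition above is sketched below. The assumptions are labeled loudly: each annotator's labeling speed is estimated as total work (word count times complexity) divided by total labeling time, and the expected labeling time of a sample is its work divided by that speed. These estimators, and the names `annotator_speeds` and `uncertainties`, are illustrative assumptions and not the formula of the embodiment.

```python
from collections import defaultdict


def annotator_speeds(records):
    """Estimate each annotator's speed as total (n_i * c_i) / total t_i (assumed)."""
    work = defaultdict(float)
    time_spent = defaultdict(float)
    for r in records:
        work[r["annotator"]] += r["n_words"] * r["complexity"]
        time_spent[r["annotator"]] += r["seconds"]
    return {p: work[p] / time_spent[p] for p in work if time_spent[p] > 0}


def uncertainties(records):
    """Uncertainty = |expected labeling time - actual labeling time| per sample."""
    speeds = annotator_speeds(records)
    result = []
    for r in records:
        speed = speeds.get(r["annotator"], 0.0)
        if speed <= 0:
            result.append(0.0)  # no usable speed estimate for this annotator
            continue
        expected = r["n_words"] * r["complexity"] / speed
        result.append(abs(expected - r["seconds"]))
    return result


# Made-up records: one annotator, two equally sized samples.
records = [
    {"annotator": "p1", "n_words": 10, "complexity": 1.0, "seconds": 10.0},
    {"annotator": "p1", "n_words": 10, "complexity": 1.0, "seconds": 30.0},
]
scores = uncertainties(records)
```

Here the estimated speed is 0.5 work units per second, so both samples are expected to take 20 seconds, and each deviates from that expectation by 10 seconds.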
step 102, determining verification parameter information of each sample according to the calculated related parameter information;
optionally, the determining of the verification parameter information of each sample in the embodiment of the present invention includes:
setting a corresponding weighting ratio for each piece of related parameter information according to a preset strategy;
and, for each sample, multiplying each piece of related parameter information by its corresponding weighting ratio and summing the products to obtain the verification parameter information of the sample.
According to the above settings, the embodiment of the invention may calculate the verification index of sample $i$ as:
$V_i = w_1 L_i + w_2 c_i + w_3 U_i$
where $L_i$, $c_i$ and $U_i$ denote the length, complexity and uncertainty of sample $i$. It should be noted that the weighting ratios in the embodiment of the present invention can be determined through analysis by those skilled in the art; for example, the weighting ratio corresponding to each piece of related parameter information may be set on the premise that $w_1 + w_2 + w_3 = 1$.
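The weighted sum can be sketched directly. The default weights below are arbitrary placeholders chosen only so that they sum to 1; the function name `verification_index` is hypothetical.

```python
def verification_index(length, complexity, uncertainty, weights=(0.2, 0.4, 0.4)):
    """Compute V_i = w1 * L_i + w2 * c_i + w3 * U_i for one sample."""
    w1, w2, w3 = weights
    # Enforce the example constraint w1 + w2 + w3 = 1 (up to rounding).
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weighting ratios are expected to sum to 1")
    return w1 * length + w2 * complexity + w3 * uncertainty


# A sample with (assumed, already normalized) length 0.5, complexity 0.25
# and uncertainty 0.75.
v = verification_index(0.5, 0.25, 0.75)
```

Raising the weight of one parameter shifts the selection toward samples that score high on that parameter, which is how a preset strategy could emphasize, for instance, uncertainty over length.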
Optionally, before setting the corresponding weighting ratio for each piece of related parameter information according to the preset policy, the method according to the embodiment of the present invention further includes:
and carrying out normalization processing on the related parameter information.
According to the embodiment of the invention, through normalization processing, the interference of abnormal data on the selection of the sample needing to be checked can be avoided.
Optionally, when the related parameter information includes complexity, normalization processing is performed on the determined complexity.
It should be noted that, the formula of the normalization process in the embodiment of the present invention may include:
Figure BDA0001989759990000071
α is a tunable factor, which can be determined analytically by those skilled in the art;
optionally, when the relevant parameter information includes the uncertainty, performing normalization processing on the uncertainty obtained by calculation;
here, the formula of the uncertainty information normalization process according to the embodiment of the present invention may include:
Figure BDA0001989759990000072
β has a meaning similar to α; it is an adjustable factor and can be determined through analysis by those skilled in the art.
Optionally, when the relevant parameter information includes the length, the length is normalized.
The number of words in sample $i$ is $n_i$ in the embodiment of the invention, and the normalized length obtained by adjusting the length may be:
Figure BDA0001989759990000073
γ has a meaning similar to α; it is an adjustable factor and can be determined through analysis by those skilled in the art.
$n_i$ includes, but is not limited to, the value obtained using the word segmentation method used in the embodiments of the present invention.
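Each of the three normalizations above depends on a single adjustable factor (α, β or γ). As a rough illustration only, the sketch below assumes a saturating map x / (x + factor), which maps non-negative values into [0, 1) and compresses large outliers. This specific form is an assumption made for illustration and is not taken from the embodiment; the name `saturating_normalize` is hypothetical.

```python
def saturating_normalize(value, factor):
    """Map a non-negative raw value into [0, 1) using value / (value + factor).

    Assumed stand-in for the normalization formulas; `factor` plays the role
    of the adjustable factor (alpha, beta or gamma).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if value < 0:
        raise ValueError("value must be non-negative")
    return value / (value + factor)


# Choosing the factor as the average raw value sends an average sample to 0.5.
raw_lengths = [10, 20, 30]
gamma = sum(raw_lengths) / len(raw_lengths)
normalized = [saturating_normalize(n, gamma) for n in raw_lengths]
```

Under this assumed form, an abnormally large raw value can never exceed 1, so a single outlier cannot dominate the weighted sum.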
step 103, selecting the samples to be verified according to the determined verification parameter information, so as to perform verification processing;
optionally, in the embodiment of the present invention, selecting the samples to be verified according to the determined verification parameter information includes: sorting the determined verification parameter information of the samples by value, and selecting a preset number of samples with the largest values for verification.
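The selection step amounts to a top-k ranking, sketched below. The function name `select_for_verification` and the score values are made up for illustration.

```python
def select_for_verification(scores, k):
    """Return the ids of the k samples with the largest verification index.

    scores: dict mapping sample id -> verification index V_i.
    """
    ranked = sorted(scores, key=lambda sample_id: scores[sample_id], reverse=True)
    return ranked[:k]


# Hypothetical verification indices for five samples.
scores = {"a": 0.35, "b": 0.80, "c": 0.10, "d": 0.65, "e": 0.50}
to_verify = select_for_verification(scores, 2)
```

Only the selected samples are passed on to manual verification, so the preset number k directly bounds the verification workload.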
The processing procedure of the embodiment of the present invention is illustrated below with samples numbered 0 to 9; Table 1 lists the basic parameters of the samples before processing:
Figure BDA0001989759990000081
TABLE 1
Table 2 lists the related parameter information (complexity, uncertainty and length) calculated according to the method of the embodiment of the present invention; note that the parameters shown in Table 2 are normalized. Table 2 corresponds to Table 1 by serial number.
Figure BDA0001989759990000091
TABLE 2
Based on the normalized related parameter information, if 4 samples are to be selected for verification in the embodiment of the present invention, the 4 samples ranked highest by verification parameter may be selected, that is, the data with serial numbers 3, 0, 5 and 7. The verification itself may be implemented with reference to methods known in the related art.
It should be noted that the number of samples to verify can be determined by those skilled in the art through analysis of the labeling task.
Compared with the related art, the technical solution of the present application comprises: calculating related parameter information of each sample for a preset number of labeled samples; determining verification parameter information of each sample according to the calculated related parameter information; and selecting the samples to be verified according to the determined verification parameter information, so as to perform verification processing; wherein the related parameter information comprises one or more of the following: complexity, uncertainty, length. The embodiment of the invention reduces the number of samples that need to be verified and improves the efficiency of label verification.
Fig. 2 is a block diagram of a structure of a device for verification processing according to an embodiment of the present invention, as shown in fig. 2, including: an arithmetic unit, a determination unit and a selection processing unit; wherein,
the arithmetic unit is used for: calculating relevant parameter information of all samples for a preset number of samples completing labeling;
wherein the related parameter information comprises one or more of the following information: complexity, uncertainty, length.
Optionally, the operation unit in the embodiment of the present invention is specifically configured to:
when the related parameter information comprises complexity: performing word segmentation on a preset corpus through a preset word segmentation algorithm; training on the segmented words to obtain a word vector set containing a word vector for each segmented word; and, for each sample: performing word segmentation through the preset word segmentation algorithm; obtaining the word vector of each segmented word from the word vector set; calculating the variance of each vector dimension over the obtained word vectors; and calculating the complexity from the calculated per-dimension variances;
The computation of the complexity is briefly explained below by way of example:
Word segmentation is performed on a single sample to obtain the segmented word sequence $w_1 w_2 w_3 \ldots w_n$; the word vector sequence corresponding to the segmented word sequence is $v_1 v_2 v_3 \ldots v_n$. The word vectors may be trained in advance on a large-scale corpus, or an open word vector model may be used. Let $v_{i,j}$ denote the value of dimension $j$ of the word vector $v_i$; then:
the mean of the j-th dimension is:
Figure BDA0001989759990000101
the variance of the j-th dimension is:
Figure BDA0001989759990000102
the sample complexity is:
Figure BDA0001989759990000103
when the related parameter information comprises uncertainty: recording the labeling time spent labeling each sample; determining the labeling speed according to the complexity, the word count and the labeling time of each sample; and calculating the uncertainty according to the determined labeling speed.
The calculation of the uncertainty is briefly explained below by way of example:
In the embodiment of the invention, assume that the annotator of sample $i$ is $p_i$, the word count is $n_i$, the complexity is $c_i$, and the labeling time is $t_i$. The embodiment of the invention defines the uncertainty as the absolute value of the difference between the expected labeling time and the actual labeling time:
Figure BDA0001989759990000104
the determination unit is used for: determining verification parameter information of each sample according to the calculated related parameter information;
optionally, the determination unit in the embodiment of the present invention is specifically configured to:
setting a corresponding weighting ratio for each piece of related parameter information according to a preset strategy;
and, for each sample, multiplying each piece of related parameter information by its corresponding weighting ratio and summing the products to obtain the verification parameter information of the sample.
According to the above settings, the embodiment of the invention may calculate the verification index of sample $i$ as:
$V_i = w_1 L_i + w_2 c_i + w_3 U_i$
where $L_i$, $c_i$ and $U_i$ denote the length, complexity and uncertainty of sample $i$. It should be noted that the weighting ratios in the embodiment of the present invention can be determined through analysis by those skilled in the art; for example, the weighting ratio corresponding to each piece of related parameter information may be set on the premise that $w_1 + w_2 + w_3 = 1$.
Optionally, the apparatus in this embodiment of the present invention further includes a normalization unit, configured to:
and carrying out normalization processing on the related parameter information.
According to the embodiment of the invention, through normalization processing, the interference of abnormal data on the selection of the sample needing to be checked can be avoided.
Optionally, when the related parameter information includes complexity, normalization processing is performed on the determined complexity.
It should be noted that, the formula of the normalization process in the embodiment of the present invention may include:
Figure BDA0001989759990000111
α is a tunable factor, which can be determined analytically by those skilled in the art; for example, the average of all c may be taken;
fig. 3 is a schematic diagram of a relationship before and after the complexity normalization processing according to an embodiment of the present invention, and as shown in fig. 3, there is a correlation shown in a graph between the complexity after the normalization processing and the complexity without the normalization processing; wherein x represents the complexity of the unnormalized process; y represents the complexity after the normalization process.
Optionally, when the relevant parameter information includes the uncertainty, performing normalization processing on the uncertainty obtained by calculation;
here, the formula of the uncertainty information normalization process according to the embodiment of the present invention may include:
Figure BDA0001989759990000112
β has a meaning similar to α; it is an adjustable factor and can be determined through analysis by those skilled in the art; for example, the average of all $D_i$ may be taken.
Optionally, when the relevant parameter information includes the length, the length is normalized.
The number of words in sample $i$ is $n_i$ in the embodiment of the invention, and the normalized length obtained by adjusting the length may be:
Figure BDA0001989759990000121
γ has a meaning similar to α; it is an adjustable factor and can be determined through analysis by those skilled in the art; for example, the average of all $n_i$ may be taken.
$n_i$ includes, but is not limited to, the value obtained using the word segmentation method used in the embodiments of the present invention.
The selection processing unit is used for: selecting the samples to be verified according to the determined verification parameter information, so as to perform verification processing;
optionally, the selection processing unit in the embodiment of the present invention is specifically configured to:
sorting the determined verification parameter information of the samples by value, and selecting a preset number of samples with the largest values for verification.
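The division of the device into an arithmetic unit, a determination unit and a selection processing unit can be pictured as three pluggable callables. The sketch below is a structural illustration only: the class name `VerificationDevice`, the field names, and the toy units (which use nothing but word count) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Params = Dict[str, float]


@dataclass
class VerificationDevice:
    arithmetic_unit: Callable[[str], Params]        # sample -> related parameters
    determination_unit: Callable[[Params], float]   # parameters -> verification index
    selection_unit: Callable[[Sequence[float], int], List[int]]  # indices -> chosen positions

    def run(self, samples: Sequence[str], k: int) -> List[int]:
        """Chain the three units and return positions of samples to verify."""
        params = [self.arithmetic_unit(s) for s in samples]
        indices = [self.determination_unit(p) for p in params]
        return self.selection_unit(indices, k)


def top_k(values: Sequence[float], k: int) -> List[int]:
    """Positions of the k largest values, largest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


# Toy units: the only related parameter is the whitespace word count.
device = VerificationDevice(
    arithmetic_unit=lambda text: {"length": float(len(text.split()))},
    determination_unit=lambda p: p["length"],
    selection_unit=top_k,
)
chosen = device.run(["a b c", "a", "a b"], k=2)
```

Keeping the units separate means a different complexity measure or weighting strategy can be swapped in without touching the selection logic.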
Compared with the related art, the technical solution of the present application comprises: calculating related parameter information of each sample for a preset number of labeled samples; determining verification parameter information of each sample according to the calculated related parameter information; and selecting the samples to be verified according to the determined verification parameter information, so as to perform verification processing; wherein the related parameter information comprises one or more of the following: complexity, uncertainty, length. The embodiment of the invention reduces the number of samples that need to be verified and improves the efficiency of label verification.
The embodiment of the invention also provides a computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions are used for executing the verification processing method.
An embodiment of the present invention further provides a terminal, including: a memory and a processor; wherein,
the processor is configured to execute program instructions in the memory;
the program instructions, when read by the processor, perform the following operations:
for a preset number of labeled samples, calculating relevant parameter information of each sample;
determining the checking parameter information of each sample according to the calculated relevant parameter information;
selecting the samples that need to be verified according to the determined checking parameter information, so as to perform verification processing;
wherein the relevant parameter information includes one or more of the following: complexity, uncertainty, and length.
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by a program instructing associated hardware (e.g., a processor) to perform the steps, and the program may be stored in a computer readable storage medium, such as a read only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in hardware, for example, by an integrated circuit to implement its corresponding function, or in software, for example, by a processor executing a program/instruction stored in a memory to implement its corresponding function. The present invention is not limited to any specific form of combination of hardware and software.
Although the embodiments of the present invention have been described above, the above description is only for the convenience of understanding the present invention, and is not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of verification processing, comprising:
for a preset number of labeled samples, calculating relevant parameter information of each sample;
determining the checking parameter information of each sample according to the calculated relevant parameter information;
selecting the samples that need to be verified according to the determined checking parameter information, so as to perform verification processing;
wherein the relevant parameter information comprises one or more of the following: complexity, uncertainty, and length;
the calculating relevant parameter information of each sample comprises:
when the relevant parameter information comprises complexity: performing word segmentation processing on a preset corpus through a preset word segmentation algorithm; training on the segmented words obtained from the word segmentation processing to obtain a word vector set covering the segmented words; and, for each sample: performing word segmentation processing through the word segmentation algorithm; obtaining the word vector of each segmented word from the word vector set; calculating, according to the obtained word vectors, the variance of each dimension of the word vectors of the segmented words; and calculating the complexity according to the calculated per-dimension variances;
when the relevant parameter information comprises uncertainty: recording the labeling time taken to label each sample; determining a labeling speed according to the complexity, the number of words, and the labeling time of each sample; and calculating the uncertainty according to the determined labeling speed.
2. The method of claim 1, wherein the determining the checking parameter information of each sample comprises:
setting corresponding weighting proportions for the relevant parameter information according to a preset strategy;
and for each sample, multiplying each piece of relevant parameter information by its corresponding weighting proportion and summing the products to obtain the checking parameter information of the sample.
3. The method of claim 2, wherein before setting the corresponding weighting proportions for the relevant parameter information according to the preset strategy, the method further comprises:
normalizing the relevant parameter information.
4. The method according to any one of claims 1 to 3, wherein the selecting the sample needing to be verified according to the determined verification parameter information comprises:
sorting the determined checking parameter information of the samples by value, and verifying a preset number of samples ranked first by value.
5. An apparatus for verification processing, comprising: an arithmetic unit, a determination unit and a selection processing unit; wherein,
the arithmetic unit is used for: for a preset number of labeled samples, calculating relevant parameter information of each sample, wherein the relevant parameter information comprises one or more of the following: complexity, uncertainty, and length;
when the relevant parameter information comprises complexity, the complexity is calculated by: performing word segmentation processing on a preset corpus through a preset word segmentation algorithm; training on the segmented words obtained from the word segmentation processing to obtain a word vector set covering the segmented words; and, for each sample: performing word segmentation processing through the word segmentation algorithm; obtaining the word vector of each segmented word from the word vector set; calculating, according to the obtained word vectors, the variance of each dimension of the word vectors of the segmented words; and calculating the complexity according to the calculated per-dimension variances;
when the relevant parameter information comprises uncertainty, the uncertainty is calculated by: recording the labeling time taken to label each sample; determining a labeling speed according to the complexity, the number of words, and the labeling time of each sample; and calculating the uncertainty according to the determined labeling speed;
the determination unit is used for: determining the checking parameter information of each sample according to the calculated relevant parameter information;
the selection processing unit is used for: selecting the samples that need to be verified according to the determined checking parameter information, so as to perform verification processing.
6. The apparatus according to claim 5, wherein the determining unit is specifically configured to:
setting corresponding weighting proportions for the relevant parameter information according to a preset strategy;
and for each sample, multiplying each piece of relevant parameter information by its corresponding weighting proportion and summing the products to obtain the checking parameter information of the sample.
7. The apparatus according to claim 6, further comprising a normalization unit configured to:
normalize the relevant parameter information.
8. The apparatus according to any one of claims 5 to 7, wherein the selection processing unit is specifically configured to:
sort the determined checking parameter information of the samples by value, and determine a preset number of samples ranked first by value for verification.
9. A computer storage medium having stored therein computer-executable instructions for performing the method of verification processing of any one of claims 1 to 4.
10. A terminal, comprising: a memory and a processor; wherein,
the processor is configured to execute program instructions in the memory;
the program instructions, when read by the processor, perform the following operations:
for a preset number of labeled samples, calculating relevant parameter information of each sample;
determining the checking parameter information of each sample according to the calculated relevant parameter information;
selecting the samples that need to be verified according to the determined checking parameter information, so as to perform verification processing;
wherein the relevant parameter information comprises one or more of the following: complexity, uncertainty, and length.
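The per-sample complexity calculation recited in claims 1 and 5 can be illustrated with a minimal sketch. Several points are assumptions rather than part of the claims: the word vector set is represented as a plain dictionary (the hypothetical `word_vectors`), segmented words missing from that set are skipped, the population variance is used, and the per-dimension variances are combined by taking their mean. The claims only state that the complexity is calculated from the per-dimension variances.

```python
from statistics import pvariance
from typing import Dict, Sequence


def complexity(tokens: Sequence[str],
               word_vectors: Dict[str, Sequence[float]]) -> float:
    """Complexity of one sample from the per-dimension variance of its word vectors."""
    # Look up the word vector of each segmented word (assumption: skip unknown words).
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return 0.0
    # zip(*vectors) groups the values of each dimension across all segmented words.
    per_dim_variance = [pvariance(dim_values) for dim_values in zip(*vectors)]
    # Assumption: aggregate the per-dimension variances by their mean.
    return sum(per_dim_variance) / len(per_dim_variance)


# Hypothetical two-dimensional word vectors for three segmented words.
word_vectors = {"a": [0.0, 1.0], "b": [2.0, 1.0], "c": [4.0, 1.0]}
print(complexity(["a", "b", "c"], word_vectors))
```

In this toy input the first dimension varies across the words while the second is constant, so only the first dimension contributes to the result. A sample whose words are spread widely in the vector space therefore receives a higher complexity than one whose words are close together.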
CN201910176629.4A 2019-03-08 2019-03-08 Verification processing method and device, computer storage medium and terminal Active CN109977400B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910176629.4A CN109977400B (en) 2019-03-08 2019-03-08 Verification processing method and device, computer storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910176629.4A CN109977400B (en) 2019-03-08 2019-03-08 Verification processing method and device, computer storage medium and terminal

Publications (2)

Publication Number Publication Date
CN109977400A CN109977400A (en) 2019-07-05
CN109977400B true CN109977400B (en) 2022-11-11

Family

ID=67078297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910176629.4A Active CN109977400B (en) 2019-03-08 2019-03-08 Verification processing method and device, computer storage medium and terminal

Country Status (1)

Country Link
CN (1) CN109977400B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580172B (en) * 2019-09-11 2022-12-09 北京明略软件系统有限公司 Configuration rule verification method and device, storage medium and electronic device
CN110750600A (en) * 2019-10-15 2020-02-04 北京明略软件系统有限公司 Information processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133202A (en) * 2017-06-01 2017-09-05 北京百度网讯科技有限公司 Text method of calibration and device based on artificial intelligence
CN108536666A (en) * 2017-03-03 2018-09-14 北京明略软件系统有限公司 A kind of short text information extracting method and device
CN109145303A (en) * 2018-09-06 2019-01-04 腾讯科技(深圳)有限公司 Name entity recognition method, device, medium and equipment
WO2019041865A1 (en) * 2017-08-30 2019-03-07 武汉斗鱼网络科技有限公司 Method and system for verifying request, and computer-readable storage medium

Also Published As

Publication number Publication date
CN109977400A (en) 2019-07-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant