CN106708806B - Sample confirmation method, device and system - Google Patents

Sample confirmation method, device and system

Publication number
CN106708806B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710031626.2A
Other languages
Chinese (zh)
Other versions
CN106708806A (en)
Inventor
方昕
刘俊华
魏思
胡国平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd
Priority to CN201710031626.2A
Publication of CN106708806A
Application granted
Publication of CN106708806B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/226 Validation

Abstract

The application provides a sample confirmation method, a sample confirmation device and a sample confirmation system, wherein the method comprises the following steps: obtaining the confidence of each sample to be confirmed in the sample set to be confirmed; determining a similar sample of each sample to be confirmed in a sample set to be confirmed; correcting the confidence coefficient of each sample to be confirmed according to the similar samples to obtain the corrected confidence coefficient of each sample to be confirmed; and confirming each sample to be confirmed according to the corrected confidence coefficient of each sample to be confirmed. The method can improve the accuracy of sample confirmation, and further improve the application effect.

Description

Sample confirmation method, device and system
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a method, an apparatus, and a system for sample confirmation.
Background
With the increasing maturity of artificial intelligence technology, more and more application systems use artificial intelligence related technologies, such as keyword retrieval, identity authentication, speaker separation, speaker gender confirmation, and the like, and all need to perform final confirmation on a candidate sample to confirm whether the candidate sample is a target sample, so that the accuracy of sample confirmation directly affects the final application effect.
In the related art, sample confirmation is generally performed directly on the confidence of the sample to be confirmed. Specifically, a confidence threshold is set, and it is judged whether the confidence of the sample to be confirmed exceeds the preset threshold: if it does, the sample confirmation succeeds; otherwise, the sample confirmation fails. This method considers only the information of a single sample. In practical applications, the sample to be confirmed is often interfered with by external factors such as the environment or the channel, so the information of a single sample changes easily. If only single-sample information is considered, samples are often confirmed incorrectly, that is, a target sample is incorrectly confirmed as a non-target sample, or a non-target sample is incorrectly confirmed as a target sample. This greatly reduces the accuracy of sample confirmation and affects the application effect.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present application is to provide a sample confirmation method, which can improve the accuracy of sample confirmation, and thus improve the application effect.
Another object of the present application is to provide a sample confirmation device.
Another object of the present application is to provide a sample confirmation system.
In order to achieve the above object, a sample confirmation method provided in an embodiment of the first aspect of the present application includes: obtaining the confidence of each sample to be confirmed in the sample set to be confirmed; determining a similar sample of each sample to be confirmed in a sample set to be confirmed; correcting the confidence coefficient of each sample to be confirmed according to the similar samples to obtain the corrected confidence coefficient of each sample to be confirmed; and confirming each sample to be confirmed according to the corrected confidence coefficient of each sample to be confirmed.
In order to achieve the above object, a sample confirmation device according to an embodiment of the second aspect of the present application includes: the acquisition module is used for acquiring the confidence coefficient of each sample to be confirmed in the sample set to be confirmed; the determining module is used for determining similar samples of each sample to be confirmed in the sample set to be confirmed; the correction module is used for correcting the confidence coefficient of each sample to be confirmed according to the similar samples to obtain the corrected confidence coefficient of each sample to be confirmed; and the confirmation module is used for confirming each sample to be confirmed according to the corrected confidence coefficient of each sample to be confirmed.
In order to achieve the above object, a sample confirmation system according to an embodiment of the third aspect of the present application includes: the client is used for receiving a sample to be confirmed input by a user; the server is used for receiving the samples to be confirmed sent by the client and acquiring the confidence coefficient of each sample to be confirmed in the sample set to be confirmed; determining a similar sample of each sample to be confirmed in a sample set to be confirmed; correcting the confidence coefficient of each sample to be confirmed according to the similar samples to obtain the corrected confidence coefficient of each sample to be confirmed; and confirming each sample to be confirmed according to the corrected confidence coefficient of each sample to be confirmed.
According to the embodiment of the application, the sample confirmation accuracy can be effectively improved by determining the similar sample of the sample to be confirmed, correcting the confidence coefficient of the sample to be confirmed according to the similar sample, and confirming the sample according to the corrected confidence coefficient, and particularly, the improvement effect is more obvious when the confidence coefficient before correction is close to the confidence coefficient threshold.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Fig. 1 is a schematic flow chart of a sample confirmation method according to an embodiment of the present application;
Fig. 2 is a schematic diagram comparing sample confirmation using the confidence before correction and the confidence after correction in an embodiment of the present application;
Fig. 3 is a schematic flow chart of a method for determining similar samples of a sample to be confirmed in an embodiment of the present application;
Fig. 4 is a schematic flow chart of a sample confirmation method according to another embodiment of the present application;
Fig. 5 is a schematic structural diagram of a sample confirmation apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a sample confirmation apparatus according to another embodiment of the present application;
Fig. 7 is a schematic structural diagram of a sample confirmation system according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar modules or modules having the same or similar functionality throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Fig. 1 is a schematic flow chart of a sample confirmation method according to an embodiment of the present application.
As shown in fig. 1, the method of the present embodiment includes:
S11: Obtain the confidence of each sample to be confirmed in the sample set to be confirmed.
The sample set to be confirmed is a set formed by a plurality of candidate samples that need to be confirmed. For example, in keyword retrieval, the sample set to be confirmed is formed by a plurality of candidate keywords, and the goal is to confirm whether each candidate keyword is the target keyword.
The confidence of a sample to be confirmed is generally obtained, according to application requirements, from the degree of matching between the sample to be confirmed and a target confirmation model. For example, in keyword retrieval, a keyword confirmation model can be built from keyword training data, and the degree of matching between each keyword to be confirmed and the keyword confirmation model is calculated in turn to obtain the confidence of each keyword to be confirmed. In speaker confirmation, the confidence of the speaker to be confirmed can be obtained by calculating the degree of matching between the voice data of the speaker to be confirmed and a speaker confirmation model. The confidence may be calculated using various related techniques, including the prior art; the application does not limit the confidence acquisition method.
S12: Determine similar samples of each sample to be confirmed in the sample set to be confirmed.
S13: Correct the confidence of each sample to be confirmed according to the confidence of its similar samples to obtain the corrected confidence of each sample to be confirmed.
The specific correction process is described below.
S14: Confirm each sample to be confirmed according to its corrected confidence.
In the specific confirmation, it is directly judged whether the corrected confidence of each sample to be confirmed is greater than a preset threshold. If so, the confirmation succeeds, that is, the sample to be confirmed is a target sample; otherwise, the confirmation fails, that is, the sample to be confirmed is not a target sample.
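The threshold decision in S14 can be sketched in a few lines of Python. This is only an illustration: the function name `confirm_samples` and the example values are assumptions, not taken from the application.

```python
def confirm_samples(corrected_confidences, threshold):
    """Return True for each sample whose corrected confidence exceeds the threshold."""
    return [g > threshold for g in corrected_confidences]


# Hypothetical corrected confidences for three samples to be confirmed.
decisions = confirm_samples([0.82, 0.47, 0.55], threshold=0.5)
print(decisions)  # [True, False, True]
```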
Fig. 2 is a schematic diagram comparing sample confirmation using the confidence before correction and the confidence after correction. The sample set to be confirmed contains 18 samples in total; the dotted line represents the confidence threshold boundary, a solid circle represents a positive sample, and a hollow circle represents a negative sample. A positive sample is a sample to be confirmed whose confidence is greater than the confidence threshold, and a negative sample is one whose confidence is less than the confidence threshold. Due to the influence of external factors such as the environment, confirmation errors occur easily when sample confirmation is performed with the existing method, such as sample 1 and sample 2, which are confirmed incorrectly in Fig. 2(a).
With the method of the present application, after the confidence of each sample to be confirmed is corrected according to its similar samples, the corrected confidence contains information from more samples. For example, the corrected confidence of sample 1 contains information from sample 1 and its 3 similar samples. When the corrected confidence is used for sample confirmation, samples that were previously confirmed incorrectly can be confirmed correctly, which improves the accuracy of sample confirmation.
As shown in Fig. 2(b), when sample confirmation is performed using the corrected confidence, sample 1 and sample 2, which were previously confirmed incorrectly, are confirmed correctly.
The determination of similar samples and the process of confidence correction are described below.
In order to improve the accuracy of sample confirmation, the method and the device find out similar samples of each sample to be confirmed from a sample set to be confirmed; and correcting the confidence coefficient of each sample to be confirmed by using the confidence coefficient of the similar sample of each sample to be confirmed, so that the corrected confidence coefficient uses the information of the similar sample of each sample to be confirmed, wherein the specific correction method is as follows.
As shown in fig. 3, the method for determining similar samples of each sample to be confirmed in the sample set to be confirmed includes:
S31: Calculate the similarity between each sample to be confirmed and the other samples to be confirmed in the sample set to be confirmed.
The similarity is generally described by the distance between samples to be confirmed, such as the Euclidean distance or the cosine distance. The distance may be calculated using various related techniques, including the prior art, and the specific calculation method is not limited. In keyword retrieval, the distance between the keyword to be confirmed and each of the other keywords is calculated; in speaker confirmation, the cosine distance between the voiceprint features of the speakers' voice data can be calculated, where the voiceprint features are i-vector features. Generally, the smaller the distance between samples, the greater the similarity.
Other methods may also be used to describe the similarity between the sample to be confirmed and the other samples. For example, the sample to be confirmed can be matched directly against the other samples to be confirmed, and the resulting degree of matching is used to describe the similarity between the sample to be confirmed and the other samples to be confirmed.
In the specific calculation, each sample to be confirmed in the sample set to be confirmed is selected in turn as the current sample to be confirmed. The similarity between the current sample to be confirmed and each of the other samples to be confirmed is then calculated in turn and denoted D(X, x_j), where X represents the current sample to be confirmed and x_j represents the j-th sample to be confirmed in the sample set, excluding the current sample to be confirmed. When the calculation is finished, the similarity between each sample to be confirmed and the other samples to be confirmed in the sample set is obtained.
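As a minimal sketch of step S31, the following Python computes D(X, x_j) for every pair of samples. It assumes each sample is represented by a non-zero feature vector and uses cosine similarity as the similarity measure; both the vector representation and the names `cosine_similarity` and `pairwise_similarity` are illustrative choices, since the text leaves the measure open.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero feature vectors (assumed representation)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarity(features):
    """Return D where D[i][j] is the similarity of sample i to every other sample j."""
    m = len(features)
    return {
        i: {j: cosine_similarity(features[i], features[j]) for j in range(m) if j != i}
        for i in range(m)
    }


# Three hypothetical samples described by 2-dimensional feature vectors.
D = pairwise_similarity([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each row `D[i]` omits the entry for sample `i` itself, matching the text's "other samples to be confirmed".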
S32: Determine the similar samples of each sample to be confirmed according to the similarity between each sample to be confirmed and the other samples to be confirmed.
Specifically, the similarity obtained by the calculation may be normalized, and then the similar sample of each sample to be confirmed is determined according to the normalized similarity.
In the specific normalization, the similarity between each sample to be confirmed and the other samples to be confirmed is normalized in turn according to the maximum and minimum values of the similarity between all samples to be confirmed in the sample set and the other samples to be confirmed, giving the normalized similarity between each sample to be confirmed and the other samples to be confirmed. The specific normalization method is shown in formula (1):
$$S(X, x_j) = \frac{D(X, x_j) - \min(D)}{\max(D) - \min(D)} \qquad (1)$$
where S(X, x_j) is the normalized similarity between the current sample to be confirmed X and the j-th sample x_j in the sample set after the current sample to be confirmed is removed, min(D) is the minimum value of the similarity between all samples to be confirmed in the sample set and the other samples to be confirmed, and max(D) is the maximum value of the similarity between all samples to be confirmed in the sample set and the other samples to be confirmed.
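The normalization step can be sketched as follows. The image of formula (1) is not available in this text, so the sketch assumes ordinary min-max normalization over all pairwise similarities, which is what the definitions of min(D) and max(D) suggest. The function name `normalize_similarities`, the nested-dictionary layout, and the handling of the case where all similarities are equal are illustrative assumptions.

```python
def normalize_similarities(D):
    """Min-max normalize a nested dict D[i][j] of pairwise similarities to [0, 1]."""
    values = [d for row in D.values() for d in row.values()]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption for this sketch: if all similarities are equal, map them all to 1.0.
        return {i: {j: 1.0 for j in row} for i, row in D.items()}
    return {i: {j: (d - lo) / span for j, d in row.items()} for i, row in D.items()}


# Hypothetical raw similarities among three samples to be confirmed.
D = {
    0: {1: 0.2, 2: 0.8},
    1: {0: 0.2, 2: 0.5},
    2: {0: 0.8, 1: 0.5},
}
S = normalize_similarities(D)
```

If the raw measure is a distance rather than a similarity, it would have to be converted first (for example by negating it) so that larger values mean more similar samples.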
After the normalized similarity is obtained, the other samples to be confirmed whose normalized similarity is greater than a preset threshold are determined as the similar samples of each sample to be confirmed. Alternatively, the normalized similarities are sorted in descending order, a preset number of the top-ranked normalized similarities are selected, and the other samples to be confirmed corresponding to the selected normalized similarities are determined as the similar samples of each sample to be confirmed.
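The two selection rules described above (a similarity threshold, or a fixed number of top-ranked samples) might look like this in Python. The function names and the example values are hypothetical; `S_row` stands for the normalized similarities of one current sample to every other sample.

```python
def similar_by_threshold(S_row, threshold):
    """Keep the samples whose normalized similarity exceeds the threshold."""
    return [j for j, s in S_row.items() if s > threshold]


def similar_by_top_k(S_row, k):
    """Keep the k samples with the largest normalized similarity."""
    ranked = sorted(S_row.items(), key=lambda item: item[1], reverse=True)
    return [j for j, _ in ranked[:k]]


# Normalized similarities of one current sample to samples 1..4 (made-up values).
row = {1: 0.9, 2: 0.1, 3: 0.6, 4: 0.75}
print(similar_by_threshold(row, 0.7))  # [1, 4]
print(similar_by_top_k(row, 3))        # [1, 4, 3]
```

The threshold rule yields a variable number of similar samples per sample (possibly none), while the top-k rule always yields the same number, so the choice affects how the later correction behaves for isolated samples.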
After determining the similar samples of each sample to be confirmed, the confidence of each sample to be confirmed can be corrected as follows.
Specifically, each sample to be confirmed is respectively used as a current sample to be confirmed, and the following steps are executed corresponding to the current sample to be confirmed: calculating the contribution rates of all similar samples of the current sample to be confirmed according to the normalized similarity of the current sample to be confirmed and each similar sample and the confidence coefficient of each similar sample;
carrying out weighted summation on the confidence coefficient of the current sample to be confirmed and the contribution rate to obtain the corrected confidence coefficient of the current sample to be confirmed;
The contribution rate is the ratio of the contribution degree of all similar samples of the current sample to be confirmed to the sum of the normalized similarities between the current sample to be confirmed and all of its similar samples. The contribution degree is the sum, over all similar samples, of the product of the normalized similarity between the current sample to be confirmed and each similar sample and the confidence of that similar sample.
This is expressed as formula (2):
g(X)=(1-α)c(X)+αT(X) (2)
wherein g (X) is the confidence coefficient after correction of the current sample to be confirmed, c (X) is the confidence coefficient before correction of the current sample to be confirmed, T (X) is the contribution rate of all similar samples of the current sample to be confirmed, and α is the weight of the contribution rate of all similar samples of the current sample to be confirmed, and can be set according to application requirements.
The contribution rate T(X) is calculated from the normalized similarity between the current sample to be confirmed and each similar sample and the confidence of each similar sample, as shown in formula (3):
$$T(X) = \frac{\sum_{i=1}^{N} S(X, x_i)\, c(x_i)}{\sum_{i=1}^{N} S(X, x_i)} \qquad (3)$$
where $\sum_{i=1}^{N} S(X, x_i)\, c(x_i)$ is the contribution degree of all similar samples of the current sample to be confirmed; $\sum_{i=1}^{N} S(X, x_i)$ is the sum of the normalized similarities between the current sample to be confirmed and all of its similar samples; S(X, x_i) is the normalized similarity between the current sample to be confirmed and the i-th similar sample; c(x_i) is the confidence of the i-th similar sample of the current sample to be confirmed; and N is the total number of similar samples of the current sample to be confirmed.
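The correction can be sketched directly from the verbal definitions: the contribution rate is a similarity-weighted average of the similar samples' confidences, and the corrected confidence is a weighted sum of the original confidence and that rate, as in formula (2). The function names, the example numbers, and the fallback used when a sample has no usable similar samples are assumptions made for illustration.

```python
def contribution_rate(similarities, confidences):
    """T(X): similarity-weighted average of the similar samples' confidences."""
    contribution = sum(s * c for s, c in zip(similarities, confidences))
    return contribution / sum(similarities)


def corrected_confidence(c_x, similarities, confidences, alpha):
    """g(X) = (1 - alpha) * c(X) + alpha * T(X)."""
    if not similarities or sum(similarities) == 0:
        # Assumption for this sketch: with no usable similar samples, keep c(X) unchanged.
        return c_x
    return (1 - alpha) * c_x + alpha * contribution_rate(similarities, confidences)


# A sample just below a 0.5 threshold, with three similar samples (made-up values).
sims = [1.0, 0.5, 0.5]
confs = [0.8, 0.6, 0.9]
t = contribution_rate(sims, confs)              # (0.8 + 0.3 + 0.45) / 2.0 = 0.775
g = corrected_confidence(0.45, sims, confs, alpha=0.4)  # 0.6 * 0.45 + 0.4 * 0.775, about 0.58
```

In this example the sample would be rejected on its own confidence of 0.45 but accepted after correction, which is the near-threshold case the description highlights. Setting `alpha` to 0 reduces the method to the single-sample baseline.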
The following describes a sample confirmation procedure by taking a keyword search as an example.
For example, in keyword retrieval, the sample set to be confirmed consists of all candidate keywords for the keyword "A", for example L = {a_1, a_2, ..., a_m}, where each element is a sample to be confirmed (a candidate keyword) and m is the total number of samples to be confirmed. It is required to confirm whether each sample to be confirmed is the keyword "A". The confidence of each sample to be confirmed can be obtained from the decoding result in keyword retrieval. The specific confirmation process is as follows:
firstly, obtaining the confidence of each sample to be confirmed in a sample set to be confirmed;
then determining a similar sample of each sample to be confirmed in the sample set to be confirmed;
specifically, the similar samples can be obtained by calculating the similarity between the samples to be confirmed, normalizing the similarity, and then either setting a corresponding threshold or selecting a fixed number of similar samples; for example, the similar samples of the sample to be confirmed a_1 are {a_3, a_6, a_7, a_10};
Correcting the confidence coefficient of each sample to be confirmed by using the similar sample of each sample to be confirmed, and when the specific correction is carried out, correcting the confidence coefficient of each sample to be confirmed according to the confidence coefficient of the similar sample of the sample to be confirmed and the normalized similarity of the sample to be confirmed and each similar sample to obtain the corrected confidence coefficient of each sample to be confirmed;
finally, sample confirmation is carried out according to the corrected confidence coefficient so as to judge whether each sample to be confirmed is the keyword 'A', specifically, whether the corrected confidence coefficient of each sample to be confirmed exceeds a preset threshold value can be judged, and if the corrected confidence coefficient of each sample to be confirmed exceeds the preset threshold value, the confirmation is successful, namely, the sample to be confirmed is the keyword 'A'; otherwise, the confirmation fails, namely the sample to be confirmed is not the keyword 'A';
in addition, the method of the present application can be used for applications requiring sample confirmation, such as speaker confirmation, gender confirmation, and the like, and is not particularly limited.
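The keyword-retrieval walk-through above can be put together as a small end-to-end sketch. Everything concrete here is an assumption for illustration: the function name `confirm`, the similarity matrix, the confidences, the thresholds, the value of alpha, and the use of min-max normalization and a similarity threshold for choosing similar samples.

```python
def confirm(confidences, similarity, alpha=0.5, sim_threshold=0.5, conf_threshold=0.5):
    """Return (corrected confidence, decision) for each sample to be confirmed."""
    m = len(confidences)
    # Min-max bounds over all pairwise similarities, excluding self-similarity.
    values = [similarity[i][j] for i in range(m) for j in range(m) if i != j]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all similarities are equal
    results = []
    for i in range(m):
        weighted_sum = weight_total = 0.0
        for j in range(m):
            if j == i:
                continue
            s = (similarity[i][j] - lo) / span
            if s > sim_threshold:  # sample j is a similar sample of sample i
                weighted_sum += s * confidences[j]
                weight_total += s
        if weight_total == 0:
            g = confidences[i]  # no similar samples: keep the original confidence
        else:
            g = (1 - alpha) * confidences[i] + alpha * weighted_sum / weight_total
        results.append((g, g > conf_threshold))
    return results


# Four hypothetical candidates for keyword "A"; candidate 2 is just below the threshold.
confidences = [0.9, 0.8, 0.45, 0.2]
similarity = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
results = confirm(confidences, similarity)
decisions = [ok for _, ok in results]
print(decisions)  # [True, True, True, False]
```

Candidate 2 would fail a direct threshold test at 0.45, but it is close to candidates 0 and 1, whose confidences are high, so its corrected confidence rises to about 0.65. Candidate 3 has no similar samples and keeps its original confidence.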
In specific implementation, a flow shown in fig. 4 is given by combining the client and the server:
S41: The client receives a sample to be confirmed input by a user.
For example, in keyword retrieval, keywords input by a user are received; alternatively, voice data input by the user is received upon speaker confirmation.
S42: The client sends the sample to be confirmed to the server.
The client side can send the samples to be confirmed to the server side through network connection with the server side.
S43: The server receives the sample to be confirmed sent by the client.
This embodiment takes as an example the case in which the server receives the sample to be confirmed from the client. It can be understood that the server may also obtain samples to be confirmed from its own database or through network crawling, or that some of the samples to be confirmed come from the client while others come from the server's database or from crawled network data.
S44: the server determines similar samples of each sample to be confirmed in the sample set to be confirmed.
S45: The server corrects the confidence of each sample to be confirmed according to the similar samples to obtain the corrected confidence of each sample to be confirmed.
S46: The server confirms each sample to be confirmed according to the corrected confidence of each sample to be confirmed.
S47: The server obtains a feedback result according to the confirmation result and sends the feedback result to the client.
The server may directly use the confirmation result as the feedback result, that is, send to the client whether the confirmation succeeded. Alternatively, the feedback result may be information related to the target sample; for example, when the user performs keyword retrieval, information related to the retrieved keyword is sent to the client. Alternatively, the feedback result may be information about subsequent processing performed directly after the sample is successfully confirmed. For example, when the user logs in by voice and the user's voice data is confirmed to be the target sample, the server may proceed directly with the subsequent processing without first sending a confirmation-success message to the client: the server allows the user to log in, obtains the user's personal page shown after a successful login, and sends the data of that page to the client, which renders the received page data to display the page. It can be understood that other situations are possible in different application scenarios, and the application is not limited in this respect.
S48: The client displays the feedback result to the user.
The details of the above steps can be referred to the related descriptions in the related embodiments, and are not described in detail here.
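The division of work between client and server in steps S41 to S48 can be sketched as two small classes. This is a schematic illustration only: the class and method names are hypothetical, the in-process method call stands in for whatever network transport is used, and `stub` is a placeholder for the real confirmation logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Server:
    # Hypothetical hook: maps a list of samples to one True/False decision per sample.
    confirm_fn: Callable[[List[str]], List[bool]]

    def handle(self, samples: List[str]) -> List[Tuple[str, bool]]:
        # S43 to S46: receive the samples and confirm each one.
        decisions = self.confirm_fn(samples)
        # S47: here the confirmation result itself serves as the feedback result.
        return list(zip(samples, decisions))


@dataclass
class Client:
    server: Server

    def submit(self, samples: List[str]) -> str:
        # S41 and S42: accept the user's samples and send them to the server.
        feedback = self.server.handle(samples)
        # S48: format the feedback for display to the user.
        return "\n".join(f"{s}: {'confirmed' if ok else 'rejected'}" for s, ok in feedback)


# Placeholder confirmation logic, used only to make the sketch runnable.
stub = lambda samples: [s.startswith("a") for s in samples]
client = Client(Server(confirm_fn=stub))
report = client.submit(["a1", "b2"])
print(report)
```

Passing the confirmation logic in as a callable keeps the transport and display code independent of how confidences are computed and corrected.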
It can be understood that the client and the server may be respectively located in different physical devices, for example, the client is located in a terminal device on a user side, the server is located in a server, and the terminal device and the server are connected through a network; alternatively, the client and the server may be located in the same physical device, for example, the functions of the client and the server are integrated in the terminal device, so that the sample confirmation may be done locally at the terminal device.
In the embodiment, the accuracy of sample confirmation can be effectively improved by determining the similar sample of the sample to be confirmed, correcting the confidence coefficient of the sample to be confirmed according to the similar sample, and confirming the sample according to the corrected confidence coefficient, and particularly, the improvement effect is more obvious for the sample with the confidence coefficient near the confidence coefficient threshold before correction.
Fig. 5 is a schematic structural diagram of a sample confirmation apparatus according to an embodiment of the present application.
As shown in fig. 5, the apparatus 50 of the present embodiment includes: an acquisition module 51, a determination module 52, a correction module 53 and a confirmation module 54.
An acquisition module 51, configured to obtain the confidence of each sample to be confirmed in the sample set to be confirmed;
a determination module 52, configured to determine similar samples of each sample to be confirmed in the sample set to be confirmed;
a correction module 53, configured to correct the confidence of each sample to be confirmed according to the similar samples, so as to obtain the corrected confidence of each sample to be confirmed;
and a confirmation module 54, configured to confirm each sample to be confirmed according to the corrected confidence of each sample to be confirmed.
In some embodiments, referring to Fig. 6, the determination module 52 includes:
the calculating submodule 521 is configured to calculate similarity between each sample to be confirmed in the sample set to be confirmed and other samples to be confirmed;
the determining submodule 522 is configured to determine a similar sample of each sample to be confirmed according to the similarity between each sample to be confirmed and other samples to be confirmed.
In some embodiments, the determining submodule 522 is specifically configured to:
normalize the similarity between each sample to be confirmed and the other samples to be confirmed to obtain the normalized similarity;
and determine the similar samples of each sample to be confirmed according to the normalized similarity.
In some embodiments, the determining submodule 522 determines the similar samples of each sample to be confirmed according to the normalized similarity by:
determining the other samples to be confirmed whose normalized similarity is greater than a preset threshold as the similar samples of each sample to be confirmed; or,
sorting the normalized similarities in descending order, selecting a preset number of the top-ranked normalized similarities, and determining the other samples to be confirmed corresponding to the selected normalized similarities as the similar samples of each sample to be confirmed.
In some embodiments, the correction module 53 is specifically configured to:
taking each sample to be confirmed as a current sample to be confirmed, and executing the following steps corresponding to the current sample to be confirmed:
calculating the contribution rates of all similar samples of the current sample to be confirmed according to the normalized similarity of the current sample to be confirmed and each similar sample and the confidence coefficient of each similar sample;
carrying out weighted summation on the confidence coefficient of the current sample to be confirmed and the contribution rate to obtain the corrected confidence coefficient of the current sample to be confirmed;
the contribution rate is the ratio of the contribution degree of all similar samples of the current sample to be confirmed to the sum of the normalized similarities between the current sample to be confirmed and all of its similar samples, and the contribution degree is the sum, over all similar samples, of the product of the normalized similarity between the current sample to be confirmed and each similar sample and the confidence of that similar sample.
It is understood that the apparatus of this embodiment corresponds to the method embodiment described above; for specific details, refer to the related description of the method embodiment, which is not repeated here.
In this embodiment, the accuracy of sample confirmation can be effectively improved by determining the similar samples of each sample to be confirmed, correcting the confidence of the sample to be confirmed according to the similar samples, and confirming the sample according to the corrected confidence. The improvement is especially pronounced for samples whose confidence before correction lies near the confidence threshold.
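The effect on samples near the threshold can be illustrated with a small sketch. It assumes, hypothetically, that a sample is confirmed when its confidence is greater than or equal to the threshold. The function names and the numbers in the usage note are illustrative values, not results from the application.

```python
from typing import List, Sequence


def confirm_samples(confidences: Sequence[float], threshold: float) -> List[bool]:
    """Confirm each sample whose confidence reaches the threshold
    (assumed decision rule)."""
    return [c >= threshold for c in confidences]


def changed_decisions(
    original: Sequence[float], corrected: Sequence[float], threshold: float
) -> List[int]:
    """Indices of samples whose confirmation decision differs between the
    original and the corrected confidence."""
    before = confirm_samples(original, threshold)
    after = confirm_samples(corrected, threshold)
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]
```

With illustrative original confidences `[0.95, 0.48, 0.10]`, corrected confidences `[0.93, 0.65, 0.12]`, and a threshold of 0.5, only the second sample, which sat just below the threshold, changes decision. Samples far from the threshold keep their original decision even though their confidence shifts slightly.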
Fig. 7 is a schematic structural diagram of a sample confirmation system according to an embodiment of the present application.
As shown in fig. 7, the system of the present embodiment includes: a client 71 and a server 72.
The client 71 is used for receiving a sample to be confirmed input by a user;
the server 72 is configured to: receive the samples to be confirmed sent by the client; obtain the confidence of each sample to be confirmed in the sample set to be confirmed; determine the similar samples of each sample to be confirmed in the sample set to be confirmed; correct the confidence of each sample to be confirmed according to the similar samples to obtain the corrected confidence of each sample to be confirmed; and confirm each sample to be confirmed according to its corrected confidence.
In some embodiments, the server 72 is further configured to: obtain a feedback result according to the confirmation result, and send the feedback result to the client;
the client 71 is further configured to: receive the feedback result sent by the server, and feed the feedback result back to the user.
Fig. 7 takes a wireless network connection between the client and the server as an example. It can be understood that the client and the server may also be connected through a wired network or, if the client and the server are integrated in the same device, through a bus inside the device.
It will be appreciated that the functions of the server are consistent with those of the apparatus described above; for the specific components of the server, refer to the apparatus shown in fig. 5 or fig. 6, which is not described in detail here.
In this embodiment, the accuracy of sample confirmation can be effectively improved by determining the similar samples of each sample to be confirmed, correcting the confidence of the sample to be confirmed according to the similar samples, and confirming the sample according to the corrected confidence. The improvement is especially pronounced for samples whose confidence before correction lies near the confidence threshold.
It is understood that the same or similar parts of the above embodiments may be cross-referenced; for content not described in detail in one embodiment, refer to the same or similar parts of the other embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (12)

1. A sample confirmation method, comprising:
obtaining the confidence of each sample to be confirmed in the sample set to be confirmed;
determining a similar sample of each sample to be confirmed in a sample set to be confirmed;
correcting the confidence coefficient of the corresponding sample to be confirmed according to the normalized similarity between each sample to be confirmed and the similar sample in the sample set to be confirmed and the confidence coefficient of the similar sample to obtain the corrected confidence coefficient of each sample to be confirmed;
and confirming each sample to be confirmed according to the corrected confidence coefficient of each sample to be confirmed.
2. The method of claim 1, wherein the determining similar samples for each sample to be confirmed in the set of samples to be confirmed comprises:
calculating the similarity between each sample to be confirmed and other samples to be confirmed in the sample set to be confirmed;
and determining the similar samples of each sample to be confirmed according to the similarity of each sample to be confirmed and other samples to be confirmed.
3. The method according to claim 2, wherein the determining similar samples of each sample to be confirmed according to the similarity of each sample to be confirmed and other samples to be confirmed comprises:
normalizing the similarity between each sample to be confirmed and the other samples to be confirmed to obtain the normalized similarity;
and determining the similar samples of each sample to be confirmed according to the normalized similarity.
4. The method according to claim 3, wherein the determining similar samples of each sample to be confirmed according to the normalized similarity comprises:
determining the other samples to be confirmed whose normalized similarity is greater than a preset threshold as the similar samples of each sample to be confirmed; or,
sorting the normalized similarities in descending order, selecting a preset number of the top-ranked normalized similarities, and determining the other samples to be confirmed corresponding to the selected normalized similarities as the similar samples of each sample to be confirmed.
5. The method according to claim 1, wherein the correcting the confidence of the corresponding sample to be confirmed according to the normalized similarity between each sample to be confirmed in the sample set to be confirmed and the similar sample and the confidence of the similar sample, to obtain the corrected confidence of each sample to be confirmed, comprises:
taking each sample to be confirmed as a current sample to be confirmed, and executing the following steps corresponding to the current sample to be confirmed:
calculating the contribution rate of all similar samples of the current sample to be confirmed according to the normalized similarity between the current sample to be confirmed and each similar sample and the confidence of each similar sample;
performing a weighted summation of the confidence of the current sample to be confirmed and the contribution rate to obtain the corrected confidence of the current sample to be confirmed;
wherein the contribution rate is the ratio of the contribution degree of all similar samples of the current sample to be confirmed to the sum of the normalized similarities between the current sample to be confirmed and all of its similar samples, and the contribution degree is the sum, over all similar samples, of the product of the normalized similarity between the current sample to be confirmed and each similar sample and the confidence of that similar sample.
6. A sample confirmation apparatus, comprising:
the acquisition module is used for acquiring the confidence coefficient of each sample to be confirmed in the sample set to be confirmed;
the determining module is used for determining similar samples of each sample to be confirmed in the sample set to be confirmed;
the correcting module is used for correcting the confidence coefficient of the corresponding sample to be confirmed according to the normalized similarity between each sample to be confirmed and the similar sample in the sample set to be confirmed and the confidence coefficient of the similar sample to obtain the corrected confidence coefficient of each sample to be confirmed;
and the confirmation module is used for confirming each sample to be confirmed according to the corrected confidence coefficient of each sample to be confirmed.
7. The apparatus of claim 6, wherein the determining module comprises:
the calculating submodule is used for calculating the similarity between each sample to be confirmed and other samples to be confirmed in the sample set to be confirmed;
and the determining submodule is used for determining the similar sample of each sample to be confirmed according to the similarity of each sample to be confirmed and other samples to be confirmed.
8. The apparatus of claim 7, wherein the determination submodule is specifically configured to:
normalizing the similarity between each sample to be confirmed and the other samples to be confirmed to obtain the normalized similarity;
and determining the similar samples of each sample to be confirmed according to the normalized similarity.
9. The apparatus of claim 8, wherein the determining submodule being configured to determine the similar samples of each sample to be confirmed according to the normalized similarity includes:
determining the other samples to be confirmed whose normalized similarity is greater than a preset threshold as the similar samples of each sample to be confirmed; or,
sorting the normalized similarities in descending order, selecting a preset number of the top-ranked normalized similarities, and determining the other samples to be confirmed corresponding to the selected normalized similarities as the similar samples of each sample to be confirmed.
10. The apparatus of claim 6, wherein the correcting module is specifically configured to:
taking each sample to be confirmed as a current sample to be confirmed, and executing the following steps corresponding to the current sample to be confirmed:
calculating the contribution rate of all similar samples of the current sample to be confirmed according to the normalized similarity between the current sample to be confirmed and each similar sample and the confidence of each similar sample;
performing a weighted summation of the confidence of the current sample to be confirmed and the contribution rate to obtain the corrected confidence of the current sample to be confirmed;
wherein the contribution rate is the ratio of the contribution degree of all similar samples of the current sample to be confirmed to the sum of the normalized similarities between the current sample to be confirmed and all of its similar samples, and the contribution degree is the sum, over all similar samples, of the product of the normalized similarity between the current sample to be confirmed and each similar sample and the confidence of that similar sample.
11. A sample confirmation system, comprising:
the client is used for receiving a sample to be confirmed input by a user;
the server is used for receiving the samples to be confirmed sent by the client and acquiring the confidence coefficient of each sample to be confirmed in the sample set to be confirmed; determining a similar sample of each sample to be confirmed in a sample set to be confirmed; correcting the confidence coefficient of the corresponding sample to be confirmed according to the normalized similarity between each sample to be confirmed and the similar sample in the sample set to be confirmed and the confidence coefficient of the similar sample to obtain the corrected confidence coefficient of each sample to be confirmed; and confirming each sample to be confirmed according to the corrected confidence coefficient of each sample to be confirmed.
12. The system of claim 11,
the server is further configured to: obtaining a feedback result according to the confirmation result, and sending the feedback result to the client;
the client is further configured to: and receiving a feedback result sent by the server side, and feeding the feedback result back to the user.
CN201710031626.2A 2017-01-17 2017-01-17 Sample confirmation method, device and system Active CN106708806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710031626.2A CN106708806B (en) 2017-01-17 2017-01-17 Sample confirmation method, device and system

Publications (2)

Publication Number Publication Date
CN106708806A CN106708806A (en) 2017-05-24
CN106708806B true CN106708806B (en) 2020-06-02

Family

ID=58908535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710031626.2A Active CN106708806B (en) 2017-01-17 2017-01-17 Sample confirmation method, device and system

Country Status (1)

Country Link
CN (1) CN106708806B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199728A (en) * 2018-10-31 2020-05-26 阿里巴巴集团控股有限公司 Training data acquisition method and device, intelligent sound box and intelligent television

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102760285A (en) * 2012-05-31 2012-10-31 河海大学 Image restoration method
CN103530604A (en) * 2013-09-27 2014-01-22 中国人民解放军空军工程大学 Robustness visual tracking method based on transductive effect
CN103984738A (en) * 2014-05-22 2014-08-13 中国科学院自动化研究所 Role labelling method based on search matching
CN104268900A (en) * 2014-09-26 2015-01-07 中安消技术有限公司 Motion object detection method and device
CN104392439A (en) * 2014-11-13 2015-03-04 北京智谷睿拓技术服务有限公司 Image similarity confirmation method and device

Also Published As

Publication number Publication date
CN106708806A (en) 2017-05-24

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant