CN109192194A - Voice data annotation method, device, computer device and storage medium - Google Patents

Voice data annotation method, device, computer device and storage medium

Info

Publication number
CN109192194A
Authority
CN
China
Prior art keywords
voice data
recognition result
voice
discrimination method
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810960792.5A
Other languages
Chinese (zh)
Inventor
高伟
陈泽明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810960792.5A priority Critical patent/CN109192194A/en
Publication of CN109192194A publication Critical patent/CN109192194A/en
Pending legal-status Critical Current

Links

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/01 — Assessment or evaluation of speech recognition systems
    • G10L15/06 — Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 — Training
    • G10L15/28 — Constructional details of speech recognition systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a voice data annotation method and apparatus, a computer device, and a storage medium. The method includes: obtaining voice data accumulated by an online service, where each piece of voice data includes a user's voice and a corresponding recognition result; and, according to at least one preset discrimination method, judging the accuracy of the recognition result of each piece of voice data, thereby labeling positive-sample voice data whose recognition results are judged accurate and negative-sample voice data whose recognition results are judged inaccurate. The solution of the present invention improves annotation efficiency and the accuracy of annotation results.

Description

Voice data annotation method, device, computer device and storage medium
[Technical field]
The present invention relates to computer application technology, and in particular to a voice data annotation method and apparatus, a computer device, and a storage medium.
[Background art]
With the rapid improvement of daily life and of science and technology, people increasingly wish to conduct human-computer dialogue and obtain information and services in a more natural way, so the requirements on the recognition accuracy of voice interaction are rising accordingly.
Accurate speech recognition requires an accurate speech recognition model, and the training, optimization, and evaluation of such a model all depend on a large amount of accurately annotated voice data.
At present, voice annotation is mainly performed manually, which in practice suffers from at least the following problems. Low efficiency: annotation is done largely by listening with the human ear and recording by hand; one person annotating several hours of high-accuracy voice data is already an exceptionally high output, while tens of thousands of hours of voice data are often actually required. Insufficient accuracy: manual annotation is repetitive and monotonous, and people engaged in it for a long time easily become fatigued, which leads to annotation errors.
[Summary of the invention]
In view of this, the present invention provides a voice data annotation method and apparatus, a computer device, and a storage medium.
The specific technical solution is as follows:
A voice data annotation method, comprising:
obtaining voice data accumulated by an online service, each piece of voice data including a user's voice and a corresponding recognition result;
according to at least one preset discrimination method, judging the accuracy of the recognition result of the voice data, and labeling positive-sample voice data whose recognition result is judged accurate and negative-sample voice data whose recognition result is judged inaccurate.
A voice data annotation apparatus, comprising an acquiring unit and an annotation unit;
the acquiring unit is configured to obtain voice data accumulated by an online service, each piece of voice data including a user's voice and a corresponding recognition result;
the annotation unit is configured to, according to at least one preset discrimination method, judge the accuracy of the recognition result of the voice data, and label positive-sample voice data whose recognition result is judged accurate and negative-sample voice data whose recognition result is judged inaccurate.
A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the method described above when executing the program.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described above.
As can be seen from the above, in the solution of the present invention, voice data accumulated by an online service can be automatically labeled as positive and negative samples, thereby overcoming the problems of manual annotation, improving annotation efficiency and the accuracy of annotation results, and making effective use of the voice data accumulated by the online service.
[Description of the drawings]
Fig. 1 is a flowchart of an embodiment of the voice data annotation method of the present invention.
Fig. 2 is a schematic diagram of the speech recognition improvement closed loop of the present invention.
Fig. 3 is a schematic structural diagram of an embodiment of the voice data annotation apparatus of the present invention.
Fig. 4 is a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention.
[Detailed description]
In order to make the technical solution of the present invention clearer, the solution is further described below with reference to the drawings and embodiments.
Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an embodiment of the voice data annotation method of the present invention. As shown in Fig. 1, the method includes the following implementation.
In 101, voice data accumulated by an online service is obtained, each piece of voice data including a user's voice and a corresponding recognition result.
In 102, according to at least one preset discrimination method, the accuracy of the recognition result of the voice data is judged, and positive-sample voice data whose recognition result is judged accurate and negative-sample voice data whose recognition result is judged inaccurate are labeled.
At present, many online services need to perform speech recognition and accordingly accumulate massive amounts of voice data, yet these data have never been effectively exploited.
Usually, annotated voice data can be used for training a speech recognition model only when its sentence accuracy rate is 95% or higher, but speech recognition results in real scenarios cannot reach such a level: affected by conditions such as environment, distance, reflection, and attenuation, a sentence accuracy rate of 85% is already relatively high. Voice data with 85% accuracy does not meet the 95% requirement, so high-purity (95% or higher) voice data needs to be filtered out of the low-purity (85%) voice data. Specifically, in this embodiment, the accuracy of the recognition results of the voice data accumulated by the online service can be judged according to at least one preset discrimination method, thereby labeling positive-sample voice data whose recognition results are judged accurate and negative-sample voice data whose recognition results are judged inaccurate.
Each piece of voice data may include a user's voice (such as a voice query) and the corresponding recognition result.
Positive-sample voice data can be used directly, for example as a training set for speech recognition model training. For negative-sample voice data, the recognition result can be corrected, either by manual annotation or by other technical means, and the corrected negative-sample voice data can then be used as positive-sample voice data, so that the optimization of the speech recognition model becomes more targeted and accurate.
The at least one discrimination method may include, but is not limited to, one or any combination of the following: a discrimination method based on fixed filtering rules, a discrimination method based on user behavior analysis, a discrimination method based on voice and text feature analysis, a discrimination method based on the user's accent, a discrimination method based on multi-model combination, and a discrimination method based on voiceprint.
The specific implementation of each discrimination method is described below.
1) Discrimination method based on fixed filtering rules
The fixed filtering rules mainly cover common speech recognition error cases.
For example, if the recognition result of any piece of voice data contains a single character or word repeated consecutively, the recognition result of that voice data can be judged inaccurate.
Consecutive repetition of a single character or word refers to filler content such as "uh uh uh"; if a recognition result contains such content, it is very likely a misrecognition, so the recognition result can be judged inaccurate.
For another example, if the voice length and the recognition result length of any piece of voice data do not match, the recognition result of that voice data can be judged inaccurate.
A normal speaker utters roughly 1 to 3 words per second; if the recognition result averages fewer than 1 word or more than 3 words per second of voice, it is very likely a misrecognition, so the recognition result can be judged inaccurate.
For another example, if the recognition result of any piece of voice data contains garbled characters, the recognition result of that voice data can be judged inaccurate.
For another example, if the recognition result of any piece of voice data belongs to a predetermined frequent-error case, the recognition result of that voice data can be judged inaccurate.
For example, a recognition result such as "using Baidu.com, you are known that" is very likely caused by user misoperation or other reasons, because it does not conform to normal user voice query content.
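To make the fixed filtering rules concrete, the following is a minimal Python sketch of such a filter. The repetition patterns, the words-per-second band, and the frequent-error set are all illustrative assumptions rather than values specified by the patent:

```python
import re

# Illustrative frequent-error cases; a real deployment would curate this set.
FREQUENT_ERRORS = {"use baidu.com and you will know"}

def rule_based_filter(text: str, audio_seconds: float) -> bool:
    """Return True when the recognition result looks inaccurate
    under the fixed filtering rules."""
    # Rule 1: a single character or word repeated consecutively
    # (filler such as "uh uh uh" usually indicates a misrecognition).
    if re.search(r"(\S)\1{3,}", text) or re.search(r"\b(\w+)( \1){2,}\b", text):
        return True
    # Rule 2: voice length and result length mismatch.
    # A speaker typically utters roughly 1-3 words per second.
    words = len(text.split())
    if audio_seconds > 0:
        rate = words / audio_seconds
        if rate < 1.0 or rate > 3.0:
            return True
    # Rule 3: garbled characters (e.g. the Unicode replacement character).
    if "\ufffd" in text:
        return True
    # Rule 4: known frequent-error cases.
    if text in FREQUENT_ERRORS:
        return True
    return False
```

A result failing any single rule would label its voice data as a negative sample.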
2) Discrimination method based on user behavior analysis
In this embodiment, besides obtaining the voice data accumulated by the online service, the user's online log data can also be obtained.
From the online log data, some behavior information of the user can be extracted, and this behavior information helps to judge the accuracy of recognition results.
For example, if it is determined from the online log data that the recognition result of any piece of voice data was modified by the user, the modified recognition result of that voice data can be judged accurate.
After inputting voice, if the user finds that the recognition result does not match what was expected, the user may actively modify it; the modified recognition result can be regarded as an error correction of the wrong result, so the modified recognition result can be judged accurate.
For another example, if it is determined from the online log data that the user performed a predetermined subsequent action on the recognition result of any piece of voice data, the recognition result of that voice data can be judged accurate.
If the recognition result is correct, the user usually performs a follow-up action, such as issuing a search instruction.
For another example, if it is determined from the online log data that, within a predetermined duration before inputting the voice of any piece of voice data, the user had input voice with similar pronunciation at least once, the recognition result of that voice data can be judged accurate.
If the voice input by the user is misrecognized, besides actively modifying the result, the user may also input the voice repeatedly until the recognition result is correct; that is, the recognition result corresponding to the last voice input by the user is usually accurate.
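The three behavior signals above could be combined as in the following sketch. The `LogEntry` fields (`corrected_text`, `followed_up`) and the 30-second retry window are hypothetical names and values chosen for illustration; the patent defines no log schema, and the pronunciation-similarity check on earlier inputs is deliberately simplified to a same-user time-window test:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogEntry:
    """A simplified online log record (field names are assumptions)."""
    user_id: str
    timestamp: float                       # seconds
    result_text: str
    corrected_text: Optional[str] = None   # set when the user edited the result
    followed_up: bool = False              # e.g. the user issued a search afterwards

def label_from_behavior(entry: LogEntry, history: list) -> Optional[bool]:
    """Return True (result accurate), False (inaccurate), or None (undecided)."""
    # A user correction is error correction: the corrected text is accurate,
    # which implies the originally logged result was not.
    if entry.corrected_text is not None:
        return False
    # A predetermined follow-up action suggests the user accepted the result.
    if entry.followed_up:
        return True
    # Earlier input from the same user within a short window: treat this
    # entry as the final retry, whose result is presumed accurate.
    retries = [e for e in history
               if e.user_id == entry.user_id
               and 0 < entry.timestamp - e.timestamp < 30]
    if retries:
        return True
    return None
```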
3) Discrimination method based on voice and text feature analysis
For example, if the signal-to-noise ratio of the voice of any piece of voice data is lower than a predetermined threshold, the recognition result of that voice data can be judged inaccurate.
A very low signal-to-noise ratio indicates that the voice is mixed with heavy noise, and recognition results in such cases are usually inaccurate.
For another example, if the voice of any piece of voice data contains long silence and/or long white noise, the recognition result of that voice data can be judged inaccurate.
Under normal circumstances, the voice input by a user is fluent; the occurrence of long silence and/or long white noise very likely indicates a problem, so the recognition result can be judged inaccurate.
For another example, if the grammar of the recognition result of any piece of voice data does not meet grammatical requirements, the recognition result of that voice data can be judged inaccurate.
If the recognition result does not meet grammatical requirements such as those of Chinese grammar, i.e., it is not a normal everyday expression, it is very likely a misrecognition, so the recognition result can be judged inaccurate.
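As an illustration of the signal-to-noise-ratio and long-silence checks, the following is a rough frame-energy sketch; the frame size, the 10%-quantile noise heuristic, and the silence threshold are assumptions for illustration, not part of the patent:

```python
import numpy as np

def estimate_snr_db(signal: np.ndarray, frame: int = 400) -> float:
    """Crude SNR estimate: quietest 10% of frames ~ noise floor,
    loudest 10% ~ speech."""
    n = len(signal) // frame
    energies = np.array([np.mean(signal[i * frame:(i + 1) * frame] ** 2)
                         for i in range(n)])
    energies.sort()
    k = max(1, n // 10)
    noise = energies[:k].mean() + 1e-12
    speech = energies[-k:].mean() + 1e-12
    return 10.0 * np.log10(speech / noise)

def long_silence_ratio(signal: np.ndarray, frame: int = 400,
                       thresh: float = 1e-4) -> float:
    """Fraction of frames whose energy falls below a silence threshold;
    a high ratio suggests long silence in the voice."""
    n = len(signal) // frame
    silent = sum(np.mean(signal[i * frame:(i + 1) * frame] ** 2) < thresh
                 for i in range(n))
    return silent / max(1, n)
```

A piece of voice data would be labeled negative when `estimate_snr_db` falls below the chosen threshold or `long_silence_ratio` is high.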
4) Discrimination method based on the user's accent
For example, if the accent of the user corresponding to any piece of voice data is standard Mandarin, the recognition result of that voice data can be judged accurate.
Compared with a local dialect or Mandarin with a regional accent, recognition results for standard Mandarin are generally more accurate. The user's accent can be determined by manual listening or other technical means; if a user's accent is standard Mandarin, the recognition results of that user's voice data can be judged accurate.
In addition, if the user's accent is a local dialect or Mandarin with a regional accent, the user's surrounding life community can be analyzed and accumulated to form a voice data set corresponding to the user's accent.
The surrounding life community of a user refers to the user's family members, such as parents and siblings. Their accents are usually the same, and when they use online services their Wi-Fi addresses and the like are usually identical, so the surrounding life community of a user can be determined accordingly.
The voice data set thus formed can be used to evaluate, train, or optimize a corresponding speech recognition model.
5) Discrimination method based on multi-model combination
For example, for any piece of voice data, at least two speech recognition models can each perform speech recognition on the voice of that voice data; the recognition result of each speech recognition model is compared with the recognition result in the voice data, and the proportion of speech recognition models whose recognition results are consistent with the recognition result in the voice data is counted. If the proportion is greater than or equal to a predetermined threshold, the recognition result of that voice data can be judged accurate.
The at least two speech recognition models may belong to two different types, namely statistical models and neural network models, which are the two types of speech recognition models commonly used at present.
The proportion being greater than or equal to the predetermined threshold may mean a proportion of 100%, or one lower than 100% but greater than, for example, 80%. Taking 100% as an example, if the results recognized by the different types of speech recognition models are all identical to the recognition result in the voice data, the recognition result can be judged accurate.
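The agreement-counting step of the multi-model method can be sketched as follows; the 0.8 default threshold mirrors the 80% figure mentioned above and is an assumed example value:

```python
def agreement_label(result: str, model_outputs: list[str],
                    threshold: float = 0.8) -> bool:
    """Judge the stored recognition result accurate when at least
    `threshold` of the models reproduce it exactly."""
    if not model_outputs:
        return False
    agree = sum(out == result for out in model_outputs)
    return agree / len(model_outputs) >= threshold
```

With `threshold=1.0` this reduces to the 100% case described above, where every model must agree with the stored result.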
6) Discrimination method based on voiceprint
A voiceprint database covering millions of user utterances can be accumulated in advance, recording each piece of voiceprint information together with its corresponding voice and recognition result.
In this way, for any piece of voice data, the voiceprint information of the voice in that voice data is first obtained, and then matched against the voiceprint information recorded in the database. If it successfully matches any recorded voiceprint information, it can be further determined whether the pronunciation of the recognition result of the voice data matches that of the recognition result corresponding to the matched voiceprint information; if so, the recognition result of that voice data can be judged accurate.
Whether the pronunciations of the recognition results match may mean whether the similarity of the pronunciations of the recognition results (in text form) is greater than a predetermined threshold.
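A possible shape of the voiceprint matching step is sketched below, assuming voiceprints are compared as embedding vectors under cosine similarity and pronunciation similarity is approximated by character overlap; the patent specifies neither representation, so both choices and both thresholds are assumptions:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def verify_by_voiceprint(embedding: np.ndarray, result: str,
                         database: list, vp_thresh: float = 0.8,
                         txt_thresh: float = 0.8) -> bool:
    """database entries are (stored_embedding, stored_result) pairs.
    The result is judged accurate only when both the voiceprint and
    the text similarity exceed their thresholds."""
    for stored_emb, stored_result in database:
        if cosine_sim(embedding, stored_emb) >= vp_thresh:
            # Pronunciation similarity approximated by character overlap;
            # a real system would compare phoneme sequences instead.
            a, b = set(result), set(stored_result)
            overlap = len(a & b) / max(1, len(a | b))
            if overlap >= txt_thresh:
                return True
    return False
```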
The specific values of the above thresholds can be determined according to actual needs.
The discrimination methods described above are merely examples and are not intended to limit the technical solution of the present invention. In addition, when multiple discrimination methods are used simultaneously: if any discrimination method judges the recognition result of a piece of voice data inaccurate, that voice data can be labeled as negative-sample voice data; if any discrimination method judges the recognition result of a piece of voice data accurate, that voice data can be labeled as positive-sample voice data; alternatively, a piece of voice data may be labeled as positive-sample voice data only when all the discrimination methods judge its recognition result accurate. The specific implementation is not limited.
In addition, the foregoing method embodiment is, for simplicity of description, presented as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention, certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In short, with the solution described in this method embodiment, voice data accumulated by an online service can be automatically labeled as positive and negative samples, thereby overcoming the problems of manual annotation, improving annotation efficiency and the accuracy of annotation results, and making effective use of the voice data accumulated by the online service.
In addition, with the solution described in this method embodiment, a speech recognition model can be trained or optimized based on the annotation results, and the speech recognition model can then be put online to serve users, so that the online service accumulates voice data, which in turn can be automatically annotated — realizing a speech recognition improvement closed loop, as shown in the schematic diagram of Fig. 2.
The above describes the method embodiment; the solution of the present invention is further described below through an apparatus embodiment.
Fig. 3 is a schematic structural diagram of an embodiment of the voice data annotation apparatus of the present invention. As shown in Fig. 3, it comprises an acquiring unit 301 and an annotation unit 302.
The acquiring unit 301 is configured to obtain voice data accumulated by an online service, each piece of voice data including a user's voice and a corresponding recognition result.
The annotation unit 302 is configured to, according to at least one preset discrimination method, judge the accuracy of the recognition result of the voice data, and label positive-sample voice data whose recognition result is judged accurate and negative-sample voice data whose recognition result is judged inaccurate.
The annotation unit 302 may also obtain the corrected recognition result of negative-sample voice data and use the corrected negative-sample voice data as positive-sample voice data.
The at least one discrimination method may include, but is not limited to, one or any combination of the following: a discrimination method based on fixed filtering rules, a discrimination method based on user behavior analysis, a discrimination method based on voice and text feature analysis, a discrimination method based on the user's accent, a discrimination method based on multi-model combination, and a discrimination method based on voiceprint.
The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discrimination method based on fixed filtering rules may include:
for any piece of voice data, if it is determined that the recognition result of the voice data contains a single character or word repeated consecutively, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if it is determined that the voice length and the recognition result length of the voice data do not match, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if it is determined that the recognition result of the voice data contains garbled characters, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if it is determined that the recognition result of the voice data belongs to a predetermined frequent-error case, judging the recognition result of the voice data inaccurate.
The acquiring unit 301 may also obtain the user's online log data. The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discrimination method based on user behavior analysis may include:
for any piece of voice data, if it is determined from the online log data that the recognition result of the voice data was modified by the user, judging the modified recognition result of the voice data accurate;
for any piece of voice data, if it is determined from the online log data that the user performed a predetermined subsequent action on the recognition result of the voice data, judging the recognition result of the voice data accurate;
for any piece of voice data, if it is determined from the online log data that, within a predetermined duration before inputting the voice of the voice data, the user had input voice with similar pronunciation at least once, judging the recognition result of the voice data accurate.
The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discrimination method based on voice and text feature analysis may include:
for any piece of voice data, if it is determined that the signal-to-noise ratio of the voice of the voice data is lower than a predetermined threshold, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if it is determined that the voice of the voice data contains long silence and/or long white noise, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if it is determined that the grammar of the recognition result of the voice data does not meet grammatical requirements, judging the recognition result of the voice data inaccurate.
The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discrimination method based on the user's accent may include: for any piece of voice data, if it is determined that the accent of the user corresponding to the voice data is standard Mandarin, judging the recognition result of the voice data accurate.
If the user's accent is a local dialect or Mandarin with a regional accent, the annotation unit 302 may also analyze and accumulate the user's surrounding life community to form a voice data set corresponding to the user's accent.
The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discrimination method based on multi-model combination may include: for any piece of voice data, using at least two speech recognition models to each perform speech recognition on the voice of the voice data, comparing the recognition result of each speech recognition model with the recognition result of the voice data, counting the proportion of speech recognition models whose recognition results are consistent with the recognition result of the voice data, and, if the proportion is greater than or equal to a predetermined threshold, judging the recognition result of the voice data accurate.
The at least two speech recognition models may belong to two different types, which may include statistical models and neural network models.
The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discrimination method based on voiceprint may include:
for any piece of voice data, obtaining the voiceprint information of the voice in the voice data;
matching the obtained voiceprint information against the voiceprint information recorded in a database, the database recording each piece of voiceprint information together with its corresponding voice and recognition result;
if the obtained voiceprint information successfully matches any voiceprint information recorded in the database, further determining whether the pronunciation of the recognition result of the voice data matches that of the recognition result corresponding to the matched voiceprint information, and if so, judging the recognition result of the voice data accurate.
For the specific workflow of the apparatus embodiment shown in Fig. 3, please refer to the related description in the foregoing method embodiment, which is not repeated here.
In short, with the solution described in this apparatus embodiment, voice data accumulated by an online service can be automatically labeled as positive and negative samples, thereby overcoming the problems of manual annotation, improving annotation efficiency and the accuracy of annotation results, and making effective use of the voice data accumulated by the online service.
Fig. 4 is a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 4 is only an example and should not impose any restrictions on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 4, the computer system/server 12 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors (processing units) 16, a memory 28, and a bus 18 connecting different system components (including the memory 28 and the processors 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 12 typically comprises a variety of computer-system-readable media. These media can be any available media accessible by the computer system/server 12, including volatile and non-volatile media and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 can be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 4, commonly referred to as a "hard disk drive"). Although not shown in Fig. 4, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading and writing a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media) can be provided. In these cases, each drive can be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods in the embodiments described in the present invention.
The computer system/server 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any device (such as a network card or modem) that enables the computer system/server 12 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 22. Moreover, the computer system/server 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in Fig. 4, the network adapter 20 communicates with the other modules of the computer system/server 12 through the bus 18. It should be understood that, although not shown in the drawings, other hardware and/or software modules can be used in conjunction with the computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Processor 16 performs various functional applications and data processing by running the programs stored in memory 28, for example implementing the method in the embodiment shown in Fig. 1.
The present invention also discloses a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method in the embodiment shown in Fig. 1 is implemented.
Any combination of one or more computer-readable media may be employed. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed devices, methods, and the like may be implemented in other ways. For example, the device embodiments described above are merely exemplary; the division into units is only one kind of logical functional division, and other divisions may be used in actual implementations.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (20)

1. A voice data annotation method, characterized by comprising:
obtaining voice data accumulated by an online service, each piece of voice data comprising a user's voice and a corresponding recognition result;
judging the accuracy of the recognition result of each piece of voice data according to at least one preset discriminant approach, and labeling the voice data whose recognition results are judged accurate as positive samples and the voice data whose recognition results are judged inaccurate as negative samples.
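As a non-authoritative illustration (no code forms part of the claims), the labeling flow of claim 1 might be sketched as below; `label_voice_data`, the item fields, and the toy discriminant are hypothetical names, and `discriminants` stands in for any combination of the approaches of claims 3 to 9:

```python
def label_voice_data(voice_data, discriminants):
    """Split accumulated (voice, recognition result) pairs into positive
    samples (result judged accurate) and negative samples (judged inaccurate)."""
    positive, negative = [], []
    for item in voice_data:
        # An item is labeled positive only if every discriminant accepts it.
        if all(check(item) for check in discriminants):
            positive.append(item)
        else:
            negative.append(item)
    return positive, negative

# Toy data and a toy discriminant that rejects empty recognition results.
data = [
    {"voice": b"...", "result": "turn on the light"},
    {"voice": b"...", "result": ""},
]
pos, neg = label_voice_data(data, [lambda item: bool(item["result"])])
```

In a real system the discriminant list would mix accuracy-confirming and inaccuracy-detecting checks rather than the simple all-accept rule used here.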
2. The method according to claim 1, characterized in that
the method further comprises: obtaining a corrected recognition result for a negative-sample piece of voice data, and using the corrected piece of voice data as a positive sample.
3. The method according to claim 1, characterized in that
the discriminant approach comprises a discriminant approach based on fixed filtering rules;
the discriminant approach based on fixed filtering rules comprises:
for any piece of voice data, if the recognition result contains a single word or phrase repeated consecutively, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if the voice length and the recognition result length do not match, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if the recognition result contains garbled characters, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if the recognition result belongs to a predetermined class of frequent error cases, judging the recognition result of the voice data inaccurate.
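The first three fixed filtering rules of claim 3 can be sketched as simple predicates; all function names and threshold values below are illustrative assumptions, not part of the claim (the predetermined frequent-error list of the fourth rule is omitted):

```python
def has_repeated_token(result, min_repeats=3):
    """Rule 1: a single word repeated many times in a row often
    indicates a decoding loop."""
    tokens = result.split()
    run = 1
    for prev, cur in zip(tokens, tokens[1:]):
        run = run + 1 if cur == prev else 1
        if run >= min_repeats:
            return True
    return False

def length_mismatch(audio_seconds, result, min_cps=0.5, max_cps=10.0):
    """Rule 2: a characters-per-second rate far outside a plausible
    speaking rate means the voice and text lengths do not match."""
    if audio_seconds <= 0:
        return True
    cps = len(result) / audio_seconds
    return not (min_cps <= cps <= max_cps)

def contains_error_code(result):
    """Rule 3: mojibake markers such as U+FFFD or raw backslash-u
    escapes indicate a garbled result."""
    return "\ufffd" in result or "\\u" in result

def passes_fixed_rules(audio_seconds, result):
    """A result survives only if no fixed rule flags it as inaccurate."""
    return not (has_repeated_token(result)
                or length_mismatch(audio_seconds, result)
                or contains_error_code(result))
```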
4. The method according to claim 1, characterized in that
the method further comprises: obtaining the user's online log data;
the discriminant approach comprises a discriminant approach based on user behavior analysis;
the discriminant approach based on user behavior analysis comprises:
for any piece of voice data, if it is determined from the online log data that the user modified the recognition result, judging the modified recognition result of the voice data accurate;
for any piece of voice data, if it is determined from the online log data that the user performed a predetermined follow-up action in response to the recognition result, judging the recognition result of the voice data accurate;
for any piece of voice data, if it is determined from the online log data that the user had input similar-sounding voice at least once within a predetermined duration before inputting the voice, judging the recognition result of the voice data accurate.
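The first two behavioral signals of claim 4 could be checked against a log stream roughly as follows; the event names, field names, and follow-up action set are hypothetical stand-ins for whatever a real online service logs:

```python
def judged_accurate_by_behavior(log_entries, voice_id,
                                follow_up_actions=frozenset({"play_music", "navigate"})):
    """Claim 4 sketch: a result is taken as accurate if the online logs
    show the user either corrected it into a final form or performed an
    expected follow-up action after seeing it."""
    for entry in log_entries:
        if entry.get("voice_id") != voice_id:
            continue
        if entry.get("event") == "user_corrected_result":
            return True  # the corrected text then serves as the label
        if entry.get("event") == "action" and entry.get("name") in follow_up_actions:
            return True
    return False

logs = [
    {"voice_id": "v1", "event": "action", "name": "play_music"},
    {"voice_id": "v2", "event": "display_result"},
]
```

The third signal (a similar-sounding repeat within a predetermined window) would additionally need timestamps and a phonetic-similarity measure, which are omitted here.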
5. The method according to claim 1, characterized in that
the discriminant approach comprises a discriminant approach based on voice and text feature analysis;
the discriminant approach based on voice and text feature analysis comprises:
for any piece of voice data, if the signal-to-noise ratio of the voice is below a predetermined threshold, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if the voice contains long silence and/or long white noise, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if the recognition result does not conform to grammatical requirements, judging the recognition result of the voice data inaccurate.
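The long-silence check of claim 5 can be sketched on frame energies; frame length, energy threshold, and run length below are illustrative values, not values fixed by the claim:

```python
def frame_energies(samples, frame_len=160):
    """Mean squared amplitude per frame (e.g. 10 ms frames at 16 kHz)."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def has_long_silence(energies, energy_threshold=1e-4, max_silent_frames=100):
    """Claim 5 sketch: flag a run of low-energy frames longer than about
    one second (100 frames of 10 ms each)."""
    run = 0
    for e in energies:
        run = run + 1 if e < energy_threshold else 0
        if run >= max_silent_frames:
            return True
    return False
```

An SNR check would compare a similar signal-power estimate against a noise-floor estimate, and the grammar check would typically score the text with a language model; both are left out to keep the sketch short.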
6. The method according to claim 1, characterized in that
the discriminant approach comprises a discriminant approach based on user accent;
the discriminant approach based on user accent comprises:
for any piece of voice data, if the accent of the corresponding user is standard Mandarin, judging the recognition result of the voice data accurate.
7. The method according to claim 6, characterized in that
the method further comprises: if the user's accent is a local dialect or Mandarin with a regional accent, analyzing and accumulating voice data from the population around the user, to form a voice data set corresponding to the user's accent.
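The accumulation step of claim 7 amounts to grouping non-standard-accent voice data by region; the field names and accent labels in this sketch are hypothetical:

```python
from collections import defaultdict

def build_accent_datasets(voice_items):
    """Claim 7 sketch: for users whose accent is a dialect or regionally
    accented Mandarin, pool voice data from users in the same region to
    accumulate accent-specific data sets."""
    datasets = defaultdict(list)
    for item in voice_items:
        if item["accent"] != "standard_mandarin":
            datasets[item["region"]].append(item)
    return dict(datasets)

items = [
    {"accent": "sichuan", "region": "sichuan", "voice": b"..."},
    {"accent": "standard_mandarin", "region": "beijing", "voice": b"..."},
    {"accent": "sichuan", "region": "sichuan", "voice": b"..."},
]
accent_sets = build_accent_datasets(items)
```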
8. The method according to claim 1, characterized in that
the discriminant approach comprises a multi-model joint discriminant approach;
the multi-model joint discriminant approach comprises:
for any piece of voice data, performing voice recognition on the voice with at least two voice recognition models respectively, comparing the recognition result of each voice recognition model with the recognition result of the voice data, and counting the proportion of voice recognition models whose recognition results agree with the recognition result of the voice data among all the voice recognition models; if the proportion is greater than or equal to a predetermined threshold, judging the recognition result of the voice data accurate;
wherein the at least two voice recognition models belong to two different types, the types comprising a statistical model and a neural network model.
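The agreement count of claim 8 reduces to a voting ratio; the 0.5 threshold below is an illustrative value, and in practice the outputs would come from models of different types (e.g. a statistical model and a neural network model) applied to the same voice:

```python
def majority_vote_accurate(logged_result, model_outputs, ratio_threshold=0.5):
    """Claim 8 sketch: judge the logged recognition result accurate when
    the fraction of models whose re-recognition output matches it meets
    the threshold."""
    if not model_outputs:
        return False
    agreeing = sum(1 for out in model_outputs if out == logged_result)
    return agreeing / len(model_outputs) >= ratio_threshold
```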
9. The method according to claim 1, characterized in that
the discriminant approach comprises a discriminant approach based on voiceprint;
the discriminant approach based on voiceprint comprises:
for any piece of voice data, obtaining voiceprint information of the voice in the voice data;
matching the obtained voiceprint information against the voiceprint information recorded in a database, the database recording each piece of voiceprint information together with its corresponding voice and recognition result;
if the obtained voiceprint information successfully matches a piece of voiceprint information recorded in the database, further determining whether the recognition result of the voice data matches the pronunciation of the recognition result corresponding to the matched voiceprint information, and if so, judging the recognition result of the voice data accurate.
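The voiceprint lookup of claim 9 could be sketched with embedding similarity; the embedding representation, cosine measure, 0.8 threshold, and database layout are all hypothetical choices, and the follow-up pronunciation comparison is left to the caller:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def match_voiceprint(query_embedding, db, sim_threshold=0.8):
    """Claim 9 sketch: find a stored voiceprint close enough to the query
    and return the recognition result recorded with it.  `db` maps speaker
    ids to (embedding, recognition_result) pairs."""
    for speaker_id, (embedding, result) in db.items():
        if cosine_similarity(query_embedding, embedding) >= sim_threshold:
            return speaker_id, result
    return None

db = {"u1": ([1.0, 0.0], "open the door")}
```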
10. A voice data annotation device, characterized by comprising an obtaining unit and a labeling unit;
the obtaining unit is configured to obtain voice data accumulated by an online service, each piece of voice data comprising a user's voice and a corresponding recognition result;
the labeling unit is configured to judge the accuracy of the recognition result of each piece of voice data according to at least one preset discriminant approach, labeling the voice data whose recognition results are judged accurate as positive samples and the voice data whose recognition results are judged inaccurate as negative samples.
11. The device according to claim 10, characterized in that
the labeling unit is further configured to obtain a corrected recognition result for a negative-sample piece of voice data, and to use the corrected piece of voice data as a positive sample.
12. The device according to claim 10, characterized in that
the discriminant approach comprises a discriminant approach based on fixed filtering rules;
the labeling unit judging the accuracy of the recognition result of the voice data according to the discriminant approach based on fixed filtering rules comprises:
for any piece of voice data, if the recognition result contains a single word or phrase repeated consecutively, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if the voice length and the recognition result length do not match, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if the recognition result contains garbled characters, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if the recognition result belongs to a predetermined class of frequent error cases, judging the recognition result of the voice data inaccurate.
13. The device according to claim 10, characterized in that
the obtaining unit is further configured to obtain the user's online log data;
the discriminant approach comprises a discriminant approach based on user behavior analysis;
the labeling unit judging the accuracy of the recognition result of the voice data according to the discriminant approach based on user behavior analysis comprises:
for any piece of voice data, if it is determined from the online log data that the user modified the recognition result, judging the modified recognition result of the voice data accurate;
for any piece of voice data, if it is determined from the online log data that the user performed a predetermined follow-up action in response to the recognition result, judging the recognition result of the voice data accurate;
for any piece of voice data, if it is determined from the online log data that the user had input similar-sounding voice at least once within a predetermined duration before inputting the voice, judging the recognition result of the voice data accurate.
14. The device according to claim 10, characterized in that
the discriminant approach comprises a discriminant approach based on voice and text feature analysis;
the labeling unit judging the accuracy of the recognition result of the voice data according to the discriminant approach based on voice and text feature analysis comprises:
for any piece of voice data, if the signal-to-noise ratio of the voice is below a predetermined threshold, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if the voice contains long silence and/or long white noise, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if the recognition result does not conform to grammatical requirements, judging the recognition result of the voice data inaccurate.
15. The device according to claim 10, characterized in that
the discriminant approach comprises a discriminant approach based on user accent;
the labeling unit judging the accuracy of the recognition result of the voice data according to the discriminant approach based on user accent comprises: for any piece of voice data, if the accent of the corresponding user is standard Mandarin, judging the recognition result of the voice data accurate.
16. The device according to claim 15, characterized in that
the labeling unit is further configured to, if the user's accent is a local dialect or Mandarin with a regional accent, analyze and accumulate voice data from the population around the user, to form a voice data set corresponding to the user's accent.
17. The device according to claim 10, characterized in that
the discriminant approach comprises a multi-model joint discriminant approach;
the labeling unit judging the accuracy of the recognition result of the voice data according to the multi-model joint discriminant approach comprises:
for any piece of voice data, performing voice recognition on the voice with at least two voice recognition models respectively, comparing the recognition result of each voice recognition model with the recognition result of the voice data, and counting the proportion of voice recognition models whose recognition results agree with the recognition result of the voice data among all the voice recognition models; if the proportion is greater than or equal to a predetermined threshold, judging the recognition result of the voice data accurate;
wherein the at least two voice recognition models belong to two different types, the types comprising a statistical model and a neural network model.
18. The device according to claim 10, characterized in that
the discriminant approach comprises a discriminant approach based on voiceprint;
the labeling unit judging the accuracy of the recognition result of the voice data according to the discriminant approach based on voiceprint comprises:
for any piece of voice data, obtaining voiceprint information of the voice in the voice data;
matching the obtained voiceprint information against the voiceprint information recorded in a database, the database recording each piece of voiceprint information together with its corresponding voice and recognition result;
if the obtained voiceprint information successfully matches a piece of voiceprint information recorded in the database, further determining whether the recognition result of the voice data matches the pronunciation of the recognition result corresponding to the matched voiceprint information, and if so, judging the recognition result of the voice data accurate.
19. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 9.
20. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 9.
CN201810960792.5A 2018-08-22 2018-08-22 Voice data mask method, device, computer equipment and storage medium Pending CN109192194A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810960792.5A CN109192194A (en) 2018-08-22 2018-08-22 Voice data mask method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810960792.5A CN109192194A (en) 2018-08-22 2018-08-22 Voice data mask method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN109192194A true CN109192194A (en) 2019-01-11

Family

ID=64919094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810960792.5A Pending CN109192194A (en) 2018-08-22 2018-08-22 Voice data mask method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109192194A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0924688A2 (en) * 1997-12-19 1999-06-23 Mitsubishi Denki Kabushiki Kaisha Markov model discriminator using negative examples
CN101807399A (en) * 2010-02-02 2010-08-18 华为终端有限公司 Voice recognition method and device
CN103198828A (en) * 2013-04-03 2013-07-10 中金数据系统有限公司 Method and system of construction of voice corpus
CN103680493A (en) * 2013-12-19 2014-03-26 百度在线网络技术(北京)有限公司 Voice data recognition method and device for distinguishing regional accents
CN105930432A (en) * 2016-04-19 2016-09-07 北京百度网讯科技有限公司 Training method and apparatus for sequence labeling tool
CN106228980A (en) * 2016-07-21 2016-12-14 百度在线网络技术(北京)有限公司 Data processing method and device
CN106971721A (en) * 2017-03-29 2017-07-21 沃航(武汉)科技有限公司 A kind of accent speech recognition system based on embedded mobile device
CN107544726A (en) * 2017-07-04 2018-01-05 百度在线网络技术(北京)有限公司 Method for correcting error of voice identification result, device and storage medium based on artificial intelligence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEI Xiangfeng et al., "An error correction method for Chinese speech recognition based on semantic analysis" (一种基于语义分析的汉语语音识别纠错方法), Computer Science (《计算机科学》) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109616101A (en) * 2019-02-12 2019-04-12 百度在线网络技术(北京)有限公司 Acoustic training model method, apparatus, computer equipment and readable storage medium storing program for executing
CN109949797A (en) * 2019-03-11 2019-06-28 北京百度网讯科技有限公司 A kind of generation method of training corpus, device, equipment and storage medium
US11348571B2 (en) 2019-03-11 2022-05-31 Beijing Baidu Netcom Science And Technology Co., Ltd. Methods, computing devices, and storage media for generating training corpus
CN110033765A (en) * 2019-04-11 2019-07-19 中国联合网络通信集团有限公司 A kind of method and terminal of speech recognition
CN110148402A (en) * 2019-05-07 2019-08-20 平安科技(深圳)有限公司 Method of speech processing, device, computer equipment and storage medium
CN110288976B (en) * 2019-06-21 2021-09-07 北京声智科技有限公司 Data screening method and device and intelligent sound box
CN110288976A (en) * 2019-06-21 2019-09-27 北京声智科技有限公司 Data screening method, apparatus and intelligent sound box
CN110503958A (en) * 2019-08-30 2019-11-26 厦门快商通科技股份有限公司 Audio recognition method, system, mobile terminal and storage medium
CN110737646A (en) * 2019-10-21 2020-01-31 北京明略软件系统有限公司 Data labeling method, device, equipment and readable storage medium
CN110838284A (en) * 2019-11-19 2020-02-25 大众问问(北京)信息科技有限公司 Method and device for processing voice recognition result and computer equipment
CN110838284B (en) * 2019-11-19 2022-06-14 大众问问(北京)信息科技有限公司 Method and device for processing voice recognition result and computer equipment
CN111510566A (en) * 2020-03-16 2020-08-07 深圳追一科技有限公司 Method and device for determining call label, computer equipment and storage medium
CN111510566B (en) * 2020-03-16 2021-05-28 深圳追一科技有限公司 Method and device for determining call label, computer equipment and storage medium
CN112241445A (en) * 2020-10-26 2021-01-19 竹间智能科技(上海)有限公司 Labeling method and device, electronic equipment and storage medium
CN112241445B (en) * 2020-10-26 2023-11-07 竹间智能科技(上海)有限公司 Labeling method and device, electronic equipment and storage medium
CN113380238A (en) * 2021-06-09 2021-09-10 阿波罗智联(北京)科技有限公司 Method for processing audio signal, model training method, apparatus, device and medium
CN114974228A (en) * 2022-05-24 2022-08-30 名日之梦(北京)科技有限公司 Rapid voice recognition method based on hierarchical recognition
CN115497453A (en) * 2022-08-31 2022-12-20 海尔优家智能科技(北京)有限公司 Identification model evaluation method and device, storage medium and electronic device

Similar Documents

Publication Publication Date Title
CN109192194A (en) Voice data mask method, device, computer equipment and storage medium
US10950241B2 (en) Diarization using linguistic labeling with segmented and clustered diarized textual transcripts
US7584103B2 (en) Automated extraction of semantic content and generation of a structured document from speech
US8666726B2 (en) Sample clustering to reduce manual transcriptions in speech recognition system
CN108986826A (en) Automatically generate method, electronic device and the readable storage medium storing program for executing of minutes
CN109686383B (en) Voice analysis method, device and storage medium
US11693988B2 (en) Use of ASR confidence to improve reliability of automatic audio redaction
US20100299135A1 (en) Automated Extraction of Semantic Content and Generation of a Structured Document from Speech
CN110148416A (en) Audio recognition method, device, equipment and storage medium
CN110457432A (en) Interview methods of marking, device, equipment and storage medium
JP2019053126A (en) Growth type interactive device
Levitan et al. Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection.
US20120245942A1 (en) Computer-Implemented Systems and Methods for Evaluating Prosodic Features of Speech
CN111177350A (en) Method, device and system for forming dialect of intelligent voice robot
CN111370030A (en) Voice emotion detection method and device, storage medium and electronic equipment
US20130030794A1 (en) Apparatus and method for clustering speakers, and a non-transitory computer readable medium thereof
CN111145903A (en) Method and device for acquiring vertigo inquiry text, electronic equipment and inquiry system
CN109872714A (en) A kind of method, electronic equipment and storage medium improving accuracy of speech recognition
CN115424618A (en) Electronic medical record voice interaction equipment based on machine learning
CN112015874A (en) Student mental health accompany conversation system
JP2004094257A (en) Method and apparatus for generating question of decision tree for speech processing
CN109074809A (en) Information processing equipment, information processing method and program
AU2020103587A4 (en) A system and a method for cross-linguistic automatic speech recognition
US20230402030A1 (en) Embedded Dictation Detection
Tao et al. The relationship between speech features changes when you get depressed: Feature correlations for improving speed and performance of depression detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190111)