CN109192194A - Voice data annotation method, device, computer equipment and storage medium
- Publication number: CN109192194A
- Application number: CN201810960792.5A
- Authority: CN (China)
- Prior art keywords: voice data, recognition result, voice, discriminant approach, user
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/01—Assessment or evaluation of speech recognition systems
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/28—Constructional details of speech recognition systems
Abstract
The invention discloses a voice data annotation method, device, computer equipment and storage medium. The method includes: obtaining voice data accumulated by an online service, where each piece of voice data includes a user's voice and the corresponding recognition result; and judging, according to at least one preset discriminant approach, the accuracy of the recognition result of the voice data, thereby labeling positive-sample voice data whose recognition result is judged accurate and negative-sample voice data whose recognition result is judged inaccurate. The scheme of the invention can improve annotation efficiency and the accuracy of the annotation results.
Description
[Technical field]
The present invention relates to computer application technology, and in particular to a voice data annotation method, device, computer equipment and storage medium.
[Background]
With rising living standards and rapid progress in science and technology, people want to obtain information and services through human-computer dialogue conducted in a more natural way, so the requirements on recognition accuracy for voice interaction keep increasing.
This calls for accurate speech recognition models, and the training, optimization and evaluation of such models all rely on large amounts of accurately annotated voice data.
At present, voice annotation is mainly done manually, which has at least the following problems in practice. Low efficiency: annotation is essentially carried out by listening and recording by hand; for one person, annotating a few hours of high-accuracy voice data is already a very high output, whereas tens of thousands of hours of voice data are often actually required. Insufficient accuracy: manual annotation is repetitive and monotonous, and people who do it for a long time tire easily, which leads to annotation errors.
[Summary of the invention]
In view of this, the present invention provides a voice data annotation method, device, computer equipment and storage medium.
The specific technical solution is as follows:
A voice data annotation method, comprising:
obtaining voice data accumulated by an online service, where each piece of voice data includes a user's voice and the corresponding recognition result;
judging, according to at least one preset discriminant approach, the accuracy of the recognition result of the voice data, and labeling positive-sample voice data whose recognition result is judged accurate and negative-sample voice data whose recognition result is judged inaccurate.
A voice data annotation device, comprising an acquiring unit and an annotation unit;
the acquiring unit is configured to obtain voice data accumulated by an online service, where each piece of voice data includes a user's voice and the corresponding recognition result;
the annotation unit is configured to judge, according to at least one preset discriminant approach, the accuracy of the recognition result of the voice data, and to label positive-sample voice data whose recognition result is judged accurate and negative-sample voice data whose recognition result is judged inaccurate.
Computer equipment, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method described above when executing the program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
As can be seen from the above, with the scheme of the present invention, voice data accumulated by an online service can be automatically labeled as positive samples and negative samples. This overcomes the problems of manual annotation, improves annotation efficiency and the accuracy of the annotation results, and makes effective use of the voice data accumulated by the online service.
[Brief description of the drawings]
Fig. 1 is a flow chart of an embodiment of the voice data annotation method of the present invention.
Fig. 2 is a schematic diagram of the closed loop for improving speech recognition performance according to the present invention.
Fig. 3 is a schematic structural diagram of an embodiment of the voice data annotation device of the present invention.
Fig. 4 is a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention.
[Detailed description of the embodiments]
To make the technical solution of the present invention clearer, the scheme of the present invention is further described below with reference to the drawings and embodiments.
Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a flow chart of an embodiment of the voice data annotation method of the present invention. As shown in Fig. 1, it includes the following specific implementation.
In 101, voice data accumulated by an online service is obtained, where each piece of voice data includes a user's voice and the corresponding recognition result.
In 102, according to at least one preset discriminant approach, the accuracy of the recognition result of the voice data is judged, and positive-sample voice data whose recognition result is judged accurate and negative-sample voice data whose recognition result is judged inaccurate are labeled.
At present, many online services need to perform speech recognition and therefore accumulate massive amounts of voice data, yet this data has never been effectively exploited.
Generally, annotated voice data must reach a sentence accuracy rate of 95% or more before it can be used for purposes such as training a speech recognition model. Speech recognition results in real scenarios cannot meet this requirement: under the influence of conditions such as environment, distance, reflection and attenuation, a sentence accuracy rate of 85% already counts as relatively high. Since voice data with a sentence accuracy rate of 85% does not meet the requirement of 95% or more, high-purity voice data (95% or more) has to be filtered out of the low-purity voice data (85%). Specifically, in this embodiment, the accuracy of the recognition results of the voice data accumulated by the online service can be judged according to at least one preset discriminant approach, so as to label positive-sample voice data whose recognition result is judged accurate and negative-sample voice data whose recognition result is judged inaccurate.
Each piece of voice data may include the user's voice (such as a voice query) and the corresponding recognition result.
Positive-sample voice data can be used directly, for example as a training set for training a speech recognition model. For negative-sample voice data, the recognition result can be corrected through manual annotation or other technical means, and the corrected negative-sample voice data can then be used as positive-sample voice data, so that the speech recognition model obtains more targeted and precise optimization.
The at least one discriminant approach may include, but is not limited to, one or any combination of the following: a discriminant approach based on fixed filtering rules, a discriminant approach based on user behavior analysis, a discriminant approach based on voice and text feature analysis, a discriminant approach based on user accent, a discriminant approach based on multi-model combination, and a discriminant approach based on voiceprint.
The specific implementation of each of these discriminant approaches is described below.
1) Discriminant approach based on fixed filtering rules
The fixed filtering rules mainly cover some common speech recognition error cases.
For example, if the recognition result of any piece of voice data contains a consecutive repetition of a single character or word, the recognition result of that voice data can be judged inaccurate.
A consecutive repetition of a single character or word may be something like "uh uh". If a recognition result contains such content, it is most likely a recognition error, so the recognition result can be judged inaccurate.
As another example, if the voice length of any piece of voice data does not match the length of its recognition result, the recognition result of that voice data can be judged inaccurate.
A person normally speaks 1 to 3 characters per second. If, on average, 1 second of voice corresponds to fewer than 1 character or more than 3 characters of recognition result, it is most likely a recognition error, so the recognition result can be judged inaccurate.
As another example, if the recognition result of any piece of voice data contains an error code, the recognition result of that voice data can be judged inaccurate.
As another example, if the recognition result of any piece of voice data belongs to a predetermined set of common error cases, the recognition result of that voice data can be judged inaccurate.
For instance, a recognition result such as "Baidu it, and you will know" is probably a misrecognition caused by a user's accidental operation or some other reason, because it does not look like the content of a normal user voice query.
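The four rules above can be sketched as follows. This is only a minimal illustration under stated assumptions: the error-code format, the known-error list, the requirement of at least three back-to-back repetitions, and all function names are hypothetical and do not come from the patent; only the range of 1 to 3 characters per second is taken from the text.

```python
import re

# All constants below are hypothetical placeholders, not values from the patent.
KNOWN_ERROR_CASES = {"<known bad transcript>"}
ERROR_CODE_PATTERN = re.compile(r"\bERR_\d+\b")   # assumed error-code format
MIN_CHARS_PER_SEC, MAX_CHARS_PER_SEC = 1.0, 3.0   # "1 to 3 characters per second"


def _compact(text: str) -> str:
    """Drop whitespace so length and repetition are counted per character."""
    return re.sub(r"\s+", "", text)


def has_consecutive_repeat(text: str, min_repeats: int = 3) -> bool:
    """True if a 1-2 character unit repeats back to back at least min_repeats times."""
    pattern = r"(.{1,2})\1{%d,}" % (min_repeats - 1)
    return re.search(pattern, _compact(text)) is not None


def length_mismatch(duration_sec: float, text: str) -> bool:
    """True if the characters-per-second rate falls outside the expected range."""
    if duration_sec <= 0:
        return True
    rate = len(_compact(text)) / duration_sec
    return rate < MIN_CHARS_PER_SEC or rate > MAX_CHARS_PER_SEC


def is_inaccurate_by_rules(text: str, duration_sec: float) -> bool:
    """Apply the four fixed filtering rules; any hit marks the result inaccurate."""
    return (
        has_consecutive_repeat(text)
        or length_mismatch(duration_sec, text)
        or ERROR_CODE_PATTERN.search(text) is not None
        or text.strip() in KNOWN_ERROR_CASES
    )
```

The repetition threshold is set to three here because doubled characters are common in ordinary Chinese speech; a real deployment would tune it, along with the other constants, to its own data.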
2) Discriminant approach based on user behavior analysis
In this embodiment, in addition to the voice data accumulated by the online service, the user's online log data can also be obtained.
From the online log data, some behavioral information about the user can be obtained, and this information helps in judging the accuracy of a recognition result.
For example, if it is determined from the online log data that the user modified the recognition result of any piece of voice data, the modified recognition result of that voice data can be judged accurate.
After inputting voice, a user who finds that the recognition result does not match what was intended may actively modify it. The modified recognition result can be regarded as a correction of the erroneous one, and accordingly the modified recognition result can be judged accurate.
As another example, if it is determined from the online log data that the user performed a predetermined follow-up action on the recognition result of any piece of voice data, the recognition result of that voice data can be judged accurate.
If the recognition result is correct, the user usually takes a follow-up action, such as issuing a search instruction.
As another example, if it is determined from the online log data that, within a predetermined duration before inputting the voice of any piece of voice data, the user had input similarly pronounced voice at least once, the recognition result of that voice data can be judged accurate.
If the voice a user inputs is misrecognized, then besides actively modifying the result, the user may also input the voice repeatedly until the recognition result is correct. In other words, the recognition result corresponding to the voice the user inputs last is usually accurate.
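The three behavioral signals can be sketched as a single function over one log record. The record layout (`LogEntry` and its fields) and the set of follow-up actions are hypothetical; the patent does not specify a log format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LogEntry:
    """Hypothetical shape of one online-log record for a voice query."""
    recognized_text: str
    edited_text: Optional[str] = None        # text after the user corrected it, if any
    follow_up_action: Optional[str] = None   # e.g. "search"
    similar_input_within_window: bool = False  # similar voice input shortly before


# Assumed set of follow-up actions that indicate the user accepted the result.
EXPECTED_ACTIONS = {"search"}


def judge_by_behavior(entry: LogEntry) -> Optional[Tuple[str, bool]]:
    """Return (text to keep as the label, is_accurate), or None if there is no signal."""
    if entry.edited_text is not None:
        # The user's own correction is taken as the accurate result.
        return entry.edited_text, True
    if entry.follow_up_action in EXPECTED_ACTIONS:
        return entry.recognized_text, True
    if entry.similar_input_within_window:
        # The last of several similar attempts is usually the correct one.
        return entry.recognized_text, True
    return None
```

Returning `None` rather than a negative verdict reflects that the text describes these behaviors only as evidence of accuracy, not their absence as evidence of error.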
3) Discriminant approach based on voice and text feature analysis
For example, if the signal-to-noise ratio of the voice of any piece of voice data is lower than a predetermined threshold, the recognition result of that voice data can be judged inaccurate.
A very low signal-to-noise ratio means the voice is mixed with heavy noise, and recognition results in that case are usually inaccurate.
As another example, if the voice of any piece of voice data contains long silence and/or long white noise, the recognition result of that voice data can be judged inaccurate.
Normally the voice a user inputs is fluent. If long silence and/or long white noise occurs, something has probably gone wrong, so the recognition result can be judged inaccurate.
As another example, if the grammar of the recognition result of any piece of voice data does not meet grammatical requirements, the recognition result of that voice data can be judged inaccurate.
If the recognition result does not conform to grammatical requirements, such as those of Chinese grammar, that is, it is not normal everyday language, then it is most likely a recognition error, so the recognition result can be judged inaccurate.
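The two audio checks can be sketched as below, assuming that signal power, noise power and per-frame energies have already been estimated by some upstream step. All thresholds and names are hypothetical, and the grammar check is left out because it depends on a language model the patent does not describe.

```python
import math
from typing import Sequence


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    if signal_power <= 0:
        return float("-inf")
    if noise_power <= 0:
        return float("inf")
    return 10.0 * math.log10(signal_power / noise_power)


def longest_silent_run(frame_energies: Sequence[float], silence_threshold: float) -> int:
    """Length, in frames, of the longest run whose energy stays below the threshold."""
    longest = current = 0
    for energy in frame_energies:
        current = current + 1 if energy < silence_threshold else 0
        longest = max(longest, current)
    return longest


def is_inaccurate_by_audio(
    signal_power: float,
    noise_power: float,
    frame_energies: Sequence[float],
    *,
    min_snr_db: float = 10.0,        # assumed threshold
    silence_threshold: float = 1e-4,  # assumed threshold
    max_silent_frames: int = 100,     # assumed threshold
) -> bool:
    """Flag the recognition result when the SNR is too low or silence runs too long."""
    if snr_db(signal_power, noise_power) < min_snr_db:
        return True
    return longest_silent_run(frame_energies, silence_threshold) > max_silent_frames
```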
4) Discriminant approach based on user accent
For example, if the accent of the user corresponding to any piece of voice data is standard Mandarin, the recognition result of that voice data can be judged accurate.
When the user's accent is standard Mandarin, recognition results are generally more accurate than for a local dialect or Mandarin with a regional accent. The user's accent can be determined by manual listening or other technical means; if a user's accent is standard Mandarin, the recognition results of the voice data corresponding to that user can be judged accurate.
In addition, if the user's accent is a local dialect or Mandarin with a regional accent, a voice data set corresponding to the user's accent can also be formed by analyzing and accumulating data from the people living around the user.
The people living around the user may be the user's family members, such as parents and siblings. Their accents are usually the same, and when they use the online service their WiFi address and similar attributes are usually identical, which makes it possible to identify them.
The voice data set thus formed can be used to evaluate, train or optimize the corresponding speech recognition model.
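Collecting an accent-specific data set from people who share a WiFi address could look like the sketch below. The data layout (a user-to-address mapping and utterance tuples) is hypothetical, and a shared address is only a rough proxy for a shared household.

```python
from typing import Dict, List, Tuple

# (user_id, audio_reference, recognition_result); a hypothetical record layout.
Utterance = Tuple[str, str, str]


def build_accent_dataset(
    seed_user: str,
    user_wifi: Dict[str, str],
    utterances: List[Utterance],
) -> List[Utterance]:
    """Collect utterances from every user who shares the seed user's WiFi address."""
    target = user_wifi.get(seed_user)
    if target is None:
        return []
    household = {user for user, wifi in user_wifi.items() if wifi == target}
    return [u for u in utterances if u[0] in household]
```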
5) Discriminant approach based on multi-model combination
For example, for any piece of voice data, at least two speech recognition models can each be used to recognize the voice in the voice data, and the recognition result of each speech recognition model can be compared with the recognition result of the voice data. The proportion of speech recognition models whose result is consistent with the recognition result of the voice data, out of all the speech recognition models, is then computed; if this ratio is greater than or equal to a predetermined threshold, the recognition result of the voice data can be judged accurate.
The at least two speech recognition models may belong to two different types, namely statistical models and neural network models, which are the two types of speech recognition model in common use today.
The ratio being greater than or equal to the predetermined threshold may mean that the ratio is 100%, or that it is below 100% but greater than, for example, 80%. Taking 100% as an example, if the results produced by the different types of speech recognition models are all identical to the recognition result in the voice data, the recognition result can be judged accurate.
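The agreement ratio described above amounts to a short computation. In this sketch the model outputs are passed in as plain strings, and the function names and default threshold are illustrative assumptions.

```python
from typing import Sequence


def agreement_ratio(logged_result: str, model_results: Sequence[str]) -> float:
    """Fraction of models whose output equals the logged recognition result."""
    if len(model_results) < 2:
        raise ValueError("at least two speech recognition models are expected")
    matches = sum(1 for result in model_results if result == logged_result)
    return matches / len(model_results)


def is_accurate_by_models(
    logged_result: str,
    model_results: Sequence[str],
    threshold: float = 1.0,  # 1.0 means every model must agree
) -> bool:
    """Judge the logged result accurate when enough models reproduce it."""
    return agreement_ratio(logged_result, model_results) >= threshold
```

With `threshold=1.0` this corresponds to the 100% case in the text; lowering it to 0.8 accepts a result that four out of five models agree on.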
6) Discriminant approach based on voiceprint
A voiceprint may also be referred to as a voice line. A voiceprint database covering millions of user utterances can be accumulated in advance, recording each piece of voiceprint information together with its corresponding voice and recognition result.
In this way, for any piece of voice data, the voiceprint information of the voice in the voice data is obtained first and then matched against each piece of voiceprint information recorded in the database. If it matches any voiceprint information recorded in the database, it is further determined whether the pronunciation of the recognition result of the voice data matches that of the recognition result corresponding to the matched voiceprint information; if so, the recognition result of the voice data can be judged accurate.
Whether the pronunciations of the recognition results match may refer to whether the similarity between the pronunciations of the recognition results (in text form) is greater than a predetermined threshold, and so on.
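The two-stage match can be sketched as follows. Several simplifications are assumed here and are not from the patent: voiceprints are represented as fixed-length embedding vectors compared by cosine similarity, pronunciation similarity is approximated by character-level similarity of the text using Python's `difflib.SequenceMatcher`, and both thresholds are placeholders. A real system would more likely compare phoneme or pinyin sequences.

```python
import math
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_accurate_by_voiceprint(
    voiceprint: Sequence[float],
    text: str,
    database: List[Tuple[Sequence[float], str]],  # (stored voiceprint, stored result)
    voiceprint_threshold: float = 0.9,  # assumed threshold
    text_threshold: float = 0.8,        # assumed threshold
) -> bool:
    """Accurate if a stored voiceprint matches and its stored result is similar enough."""
    for stored_voiceprint, stored_text in database:
        if cosine(voiceprint, stored_voiceprint) < voiceprint_threshold:
            continue  # stage 1: voiceprint does not match this record
        similarity = SequenceMatcher(None, text, stored_text).ratio()
        if similarity >= text_threshold:
            return True  # stage 2: recognition results are similar enough
    return False
```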
The specific value of each of the above thresholds can be determined according to actual needs.
The discriminant approaches described above are given by way of example only and are not intended to limit the technical solution of the present invention. In addition, when several discriminant approaches are used at the same time, a piece of voice data can be labeled as negative-sample voice data if any discriminant approach judges its recognition result inaccurate, and can be labeled as positive-sample voice data if any discriminant approach judges its recognition result accurate; alternatively, the voice data is labeled as positive-sample voice data only when every discriminant approach judges its recognition result accurate. The specific implementation is not limited.
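One possible way to combine several verdicts is sketched below. The patent leaves the combination policy open, so the choices made here are assumptions: each approach returns `True` (accurate), `False` (inaccurate) or `None` (no opinion), a negative verdict takes precedence over a positive one, and a flag switches between the "any approach" and "every approach" policies for positive samples.

```python
from typing import Iterable, Optional


def combine_verdicts(
    verdicts: Iterable[Optional[bool]],
    require_all_positive: bool = False,
) -> Optional[str]:
    """Return "negative", "positive", or None when the sample stays unlabeled."""
    verdict_list = list(verdicts)
    # Assumed precedence: a single "inaccurate" verdict makes the sample negative.
    if any(v is False for v in verdict_list):
        return "negative"
    if require_all_positive:
        all_true = bool(verdict_list) and all(v is True for v in verdict_list)
        return "positive" if all_true else None
    return "positive" if any(v is True for v in verdict_list) else None
```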
In addition, for simplicity of description, the foregoing method embodiment is presented as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In short, with the scheme of the method embodiment of the present invention, voice data accumulated by an online service can be automatically labeled as positive samples and negative samples. This overcomes the problems of manual annotation, improves annotation efficiency and the accuracy of the annotation results, and makes effective use of the voice data accumulated by the online service.
Furthermore, with the scheme of the method embodiment of the present invention, a speech recognition model can be trained or optimized based on the annotation results and then deployed online to serve users, which accumulates further voice data from the online service. The accumulated voice data can in turn be labeled automatically, forming a closed loop that improves speech recognition performance, as shown in Fig. 2, which is a schematic diagram of this closed loop.
The above describes the method embodiment. The scheme of the present invention is further explained below through a device embodiment.
Fig. 3 is a schematic structural diagram of an embodiment of the voice data annotation device of the present invention. As shown in Fig. 3, it includes an acquiring unit 301 and an annotation unit 302.
The acquiring unit 301 is configured to obtain voice data accumulated by an online service, where each piece of voice data includes a user's voice and the corresponding recognition result.
The annotation unit 302 is configured to judge, according to at least one preset discriminant approach, the accuracy of the recognition result of the voice data, and to label positive-sample voice data whose recognition result is judged accurate and negative-sample voice data whose recognition result is judged inaccurate.
The annotation unit 302 can also obtain the corrected recognition result of negative-sample voice data and use the corrected negative-sample voice data as positive-sample voice data.
The at least one discriminant approach may include, but is not limited to, one or any combination of the following: a discriminant approach based on fixed filtering rules, a discriminant approach based on user behavior analysis, a discriminant approach based on voice and text feature analysis, a discriminant approach based on user accent, a discriminant approach based on multi-model combination, and a discriminant approach based on voiceprint.
The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discriminant approach based on fixed filtering rules may include:
for any piece of voice data, if it is determined that the recognition result of the voice data contains a consecutive repetition of a single character or word, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if it is determined that the voice length of the voice data does not match the length of its recognition result, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if it is determined that the recognition result of the voice data contains an error code, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if it is determined that the recognition result of the voice data belongs to a predetermined set of common error cases, judging the recognition result of the voice data inaccurate.
The acquiring unit 301 can also obtain the user's online log data. The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discriminant approach based on user behavior analysis may include:
for any piece of voice data, if it is determined from the online log data that the user modified the recognition result of the voice data, judging the modified recognition result of the voice data accurate;
for any piece of voice data, if it is determined from the online log data that the user performed a predetermined follow-up action on the recognition result of the voice data, judging the recognition result of the voice data accurate;
for any piece of voice data, if it is determined from the online log data that, within a predetermined duration before inputting the voice of the voice data, the user had input similarly pronounced voice at least once, judging the recognition result of the voice data accurate.
The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discriminant approach based on voice and text feature analysis may include:
for any piece of voice data, if it is determined that the signal-to-noise ratio of the voice of the voice data is lower than a predetermined threshold, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if it is determined that the voice of the voice data contains long silence and/or long white noise, judging the recognition result of the voice data inaccurate;
for any piece of voice data, if it is determined that the grammar of the recognition result of the voice data does not meet grammatical requirements, judging the recognition result of the voice data inaccurate.
The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discriminant approach based on user accent may include: for any piece of voice data, if it is determined that the accent of the user corresponding to the voice data is standard Mandarin, judging the recognition result of the voice data accurate.
If the user's accent is a local dialect or Mandarin with a regional accent, the annotation unit 302 can also form a voice data set corresponding to the user's accent by analyzing and accumulating data from the people living around the user.
The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discriminant approach based on multi-model combination may include: for any piece of voice data, using at least two speech recognition models to each recognize the voice in the voice data, comparing the recognition result of each speech recognition model with the recognition result of the voice data, computing the proportion of speech recognition models whose result is consistent with the recognition result of the voice data out of all the speech recognition models, and, if the ratio is greater than or equal to a predetermined threshold, judging the recognition result of the voice data accurate.
The at least two speech recognition models may belong to two different types, which may include statistical models and neural network models.
The annotation unit 302 judging the accuracy of the recognition result of the voice data according to the discriminant approach based on voiceprint may include:
for any piece of voice data, obtaining the voiceprint information of the voice in the voice data;
matching the obtained voiceprint information against each piece of voiceprint information recorded in a database, where the database records each piece of voiceprint information together with its corresponding voice and recognition result;
if the obtained voiceprint information matches any voiceprint information recorded in the database, further determining whether the pronunciation of the recognition result of the voice data matches that of the recognition result corresponding to the matched voiceprint information, and if so, judging the recognition result of the voice data accurate.
For the specific workflow of the device embodiment shown in Fig. 3, refer to the related description in the foregoing method embodiment; it is not repeated here.
In short, with the scheme of the device embodiment of the present invention, voice data accumulated by an online service can be automatically labeled as positive samples and negative samples. This overcomes the problems of manual annotation, improves annotation efficiency and the accuracy of the annotation results, and makes effective use of the voice data accumulated by the online service.
Fig. 4 is a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in figure 4, computer system/server 12 is showed in the form of universal computing device.Computer system/service
The component of device 12 can include but is not limited to: one or more processor (processing unit) 16, memory 28, connect not homology
The bus 18 of system component (including memory 28 and processor 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer system/server 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 4, commonly referred to as a "hard disk drive"). Although not shown in Fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM or other optical medium) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
The computer system/server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer system/server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the computer system/server 12 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 20. As shown in Fig. 4, the network adapter 20 communicates with the other modules of the computer system/server 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processor 16 executes various functional applications and data processing by running programs stored in the memory 28, for example implementing the method in the embodiment shown in Fig. 1.
The present invention also discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method in the embodiment shown in Fig. 1.
Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
Program code embodied on a computer-readable medium may be transmitted over any suitable medium, including, but not limited to, wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed devices, methods and the like may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (20)
1. A voice data labeling method, characterized by comprising:
Obtaining voice data accumulated by an online service, each piece of voice data including a user's voice and the corresponding recognition result;
Discriminating the accuracy of the recognition result of the voice data according to at least one preset discriminant approach, and labeling positive sample voice data whose recognition result is determined to be accurate and negative sample voice data whose recognition result is determined to be inaccurate.
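The overall labeling flow of claim 1 can be sketched as below. This is an illustrative sketch only: the claim does not say how verdicts from several discriminant approaches are combined, so the rule used here (any "inaccurate" verdict wins, otherwise any "accurate" verdict labels the item positive, and items with no verdict stay unlabeled) is an assumption, as are the names `VoiceData`, `Discriminator` and `label_voice_data`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class VoiceData:
    audio: bytes               # the user's voice
    recognition_result: str    # the recognition result produced online


# A discriminant approach returns True (accurate), False (inaccurate),
# or None when it has no opinion about the item (assumed convention).
Discriminator = Callable[[VoiceData], Optional[bool]]


def label_voice_data(
    items: Sequence[VoiceData],
    discriminators: Sequence[Discriminator],
) -> Tuple[List[VoiceData], List[VoiceData]]:
    """Split items into (positive samples, negative samples)."""
    positives: List[VoiceData] = []
    negatives: List[VoiceData] = []
    for item in items:
        verdicts = [judge(item) for judge in discriminators]
        if any(v is False for v in verdicts):
            negatives.append(item)      # assumed: "inaccurate" takes priority
        elif any(v is True for v in verdicts):
            positives.append(item)
        # otherwise the item remains unlabeled
    return positives, negatives
```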
2. The method according to claim 1, characterized in that
The method further comprises: obtaining a corrected recognition result for the negative sample voice data, and using the corrected negative sample voice data as positive sample voice data.
3. The method according to claim 1, characterized in that
The discriminant approach includes: a discriminant approach based on fixed filtering rules;
The discriminant approach based on fixed filtering rules includes:
For any voice data, if it is determined that the recognition result of the voice data contains consecutive repetitions of a single character or word, determining that the recognition result of the voice data is inaccurate;
For any voice data, if it is determined that the voice length and the recognition result length of the voice data do not match, determining that the recognition result of the voice data is inaccurate;
For any voice data, if it is determined that the recognition result of the voice data contains an error code, determining that the recognition result of the voice data is inaccurate;
For any voice data, if it is determined that the recognition result of the voice data belongs to a predetermined common error case, determining that the recognition result of the voice data is inaccurate.
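The four fixed filtering rules can be sketched as simple predicates. Everything concrete here is an assumption made for illustration: the repetition count, the plausible characters-per-second range used to detect a length mismatch, the reading of "error code" as garbled characters (the Unicode replacement character or stray control characters), and all function names.

```python
import re
from typing import Set


def has_consecutive_repeat(text: str, min_repeats: int = 3) -> bool:
    """A character or word repeated back to back at least min_repeats times."""
    char_repeat = re.search(r"(\S)\1{%d,}" % (min_repeats - 1), text)
    word_repeat = re.search(r"\b(\w+)(?:\s+\1\b){%d,}" % (min_repeats - 1), text)
    return bool(char_repeat or word_repeat)


def length_mismatch(duration_s: float, text: str,
                    min_cps: float = 0.5, max_cps: float = 10.0) -> bool:
    """Text length is implausible for the audio duration (thresholds assumed)."""
    if duration_s <= 0:
        return True
    chars_per_second = len(text.replace(" ", "")) / duration_s
    return chars_per_second < min_cps or chars_per_second > max_cps


def has_garbled_chars(text: str) -> bool:
    """Assumed reading of 'error code': U+FFFD or stray control characters."""
    return any(ch == "\ufffd" or (ord(ch) < 32 and ch not in "\t\n\r")
               for ch in text)


def fixed_rules_say_inaccurate(duration_s: float, text: str,
                               known_errors: Set[str]) -> bool:
    """True if any of the four fixed filtering rules fires."""
    return (has_consecutive_repeat(text)
            or length_mismatch(duration_s, text)
            or has_garbled_chars(text)
            or text in known_errors)
```

In practice the list of predetermined common error cases would be curated from observed recognition failures; a plain set lookup is used here only to keep the sketch short.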
4. The method according to claim 1, characterized in that
The method further comprises: obtaining online log data of the user;
The discriminant approach includes: a discriminant approach based on user behavior analysis;
The discriminant approach based on user behavior analysis includes:
For any voice data, if it is determined from the online log data that the recognition result of the voice data was modified by the user, determining that the modified recognition result of the voice data is accurate;
For any voice data, if it is determined from the online log data that the user performed a predetermined follow-up action on the recognition result of the voice data, determining that the recognition result of the voice data is accurate;
For any voice data, if it is determined from the online log data that the user input similar-sounding voice at least once within a predetermined duration before inputting the voice of the voice data, determining that the recognition result of the voice data is accurate.
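The three behavior-based rules can be sketched against a simplified log record. The `LogEntry` fields, the set `EXPECTED_ACTIONS` and the function `behavior_verdict` are hypothetical; the claim does not specify the log format or which follow-up actions count as predetermined.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LogEntry:
    user_modified_text: Optional[str] = None   # text after the user's edit, if any
    follow_up_action: Optional[str] = None     # action taken on the result, if any
    similar_inputs_before: int = 0             # similar-sounding inputs in the window


# Hypothetical examples of "predetermined follow-up actions".
EXPECTED_ACTIONS = {"click_result", "send_message"}


def behavior_verdict(result: str, log: LogEntry) -> Optional[Tuple[bool, str]]:
    """Return (is_accurate, text_to_keep), or None if the log gives no signal."""
    if log.user_modified_text is not None:
        # Rule 1: the user-corrected text is taken as the accurate result.
        return True, log.user_modified_text
    if log.follow_up_action in EXPECTED_ACTIONS:
        # Rule 2: the user acted on the result as expected.
        return True, result
    if log.similar_inputs_before >= 1:
        # Rule 3: similar-sounding voice was input shortly before this one.
        return True, result
    return None
```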
5. The method according to claim 1, characterized in that
The discriminant approach includes: a discriminant approach based on voice and text feature analysis;
The discriminant approach based on voice and text feature analysis includes:
For any voice data, if it is determined that the signal-to-noise ratio of the voice of the voice data is lower than a predetermined threshold, determining that the recognition result of the voice data is inaccurate;
For any voice data, if it is determined that the voice of the voice data contains long silence and/or long white noise, determining that the recognition result of the voice data is inaccurate;
For any voice data, if it is determined that the syntax of the recognition result of the voice data does not meet grammatical requirements, determining that the recognition result of the voice data is inaccurate.
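The two audio-side checks can be sketched with frame energies. This is a rough illustration, not the method disclosed in the patent: the signal-to-noise ratio is estimated crudely by comparing the loudest and quietest frames, and the frame length, thresholds and function names are all assumptions. The grammar check is omitted because it would need a parser or language model that the claim does not specify.

```python
import math
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def estimate_snr_db(energies: Sequence[float], edge_fraction: float = 0.1) -> float:
    """Crude SNR: loudest frames as signal, quietest frames as noise."""
    ordered = sorted(energies)
    k = max(1, int(len(ordered) * edge_fraction))
    noise = sum(ordered[:k]) / k
    signal = sum(ordered[-k:]) / k
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def longest_quiet_run(energies: Sequence[float], quiet_energy: float) -> int:
    """Length, in frames, of the longest stretch below quiet_energy."""
    longest = current = 0
    for energy in energies:
        current = current + 1 if energy < quiet_energy else 0
        longest = max(longest, current)
    return longest


def audio_says_inaccurate(samples: Sequence[float], frame_len: int = 160,
                          min_snr_db: float = 10.0, quiet_energy: float = 1e-4,
                          max_quiet_frames: int = 100) -> bool:
    """True if the SNR is too low or a silence run is too long (assumed limits)."""
    energies = frame_energies(samples, frame_len)
    if not energies:
        return True
    return (estimate_snr_db(energies) < min_snr_db
            or longest_quiet_run(energies, quiet_energy) > max_quiet_frames)
```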
6. The method according to claim 1, characterized in that
The discriminant approach includes: a discriminant approach based on user accent;
The discriminant approach based on user accent includes:
For any voice data, if it is determined that the accent of the user corresponding to the voice data is standard Mandarin, determining that the recognition result of the voice data is accurate.
7. The method according to claim 6, characterized in that
The method further comprises: if the accent of the user is a local dialect or Mandarin with a regional accent, forming a voice data set corresponding to the accent of the user by analyzing and accumulating data from the people living around the user.
8. The method according to claim 1, characterized in that
The discriminant approach includes: a discriminant approach based on multi-model combination;
The discriminant approach based on multi-model combination includes:
For any voice data, performing speech recognition on the voice in the voice data with each of at least two speech recognition models, comparing the recognition result of each speech recognition model with the recognition result of the voice data, and computing the proportion of all the speech recognition models whose recognition result is consistent with the recognition result of the voice data; if the proportion is greater than or equal to a predetermined threshold, determining that the recognition result of the voice data is accurate;
Wherein the at least two speech recognition models belong to two different types, the types including: statistical models and neural network models.
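The multi-model check amounts to an agreement ratio compared against a threshold, which can be sketched as follows. The `Recognizer` callables stand in for real statistical and neural network speech recognition models, and `min_ratio` is a hypothetical value for the predetermined threshold.

```python
from typing import Callable, Sequence

# A recognizer maps raw audio to text; real models are assumed, not provided.
Recognizer = Callable[[bytes], str]


def multi_model_says_accurate(audio: bytes, online_result: str,
                              recognizers: Sequence[Recognizer],
                              min_ratio: float = 0.5) -> bool:
    """True if enough models reproduce the online recognition result."""
    if len(recognizers) < 2:
        raise ValueError("at least two speech recognition models are required")
    agreeing = sum(1 for recognize in recognizers
                   if recognize(audio) == online_result)
    return agreeing / len(recognizers) >= min_ratio
```

With three models of which two agree with the online result, the ratio is 2/3, so the result is judged accurate at a threshold of 0.5 but not at 0.7.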
9. The method according to claim 1, characterized in that
The discriminant approach includes: a discriminant approach based on voiceprint;
The discriminant approach based on voiceprint includes:
For any voice data, obtaining the voiceprint information of the voice in the voice data;
Matching the obtained voiceprint information against each piece of voiceprint information recorded in a database, where the database records each piece of voiceprint information together with its corresponding voice and recognition result;
If the obtained voiceprint information successfully matches any voiceprint information recorded in the database, further determining whether the pronunciation of the recognition result of the voice data matches that of the recognition result corresponding to the matched voiceprint information, and if so, determining that the recognition result of the voice data is accurate.
10. A voice data labeling device, characterized by comprising: an acquiring unit and a labeling unit;
The acquiring unit is configured to obtain voice data accumulated by an online service, each piece of voice data including a user's voice and the corresponding recognition result;
The labeling unit is configured to discriminate the accuracy of the recognition result of the voice data according to at least one preset discriminant approach, and to label positive sample voice data whose recognition result is determined to be accurate and negative sample voice data whose recognition result is determined to be inaccurate.
11. The device according to claim 10, characterized in that
The labeling unit is further configured to obtain a corrected recognition result for the negative sample voice data, and to use the corrected negative sample voice data as positive sample voice data.
12. The device according to claim 10, characterized in that
The discriminant approach includes: a discriminant approach based on fixed filtering rules;
The labeling unit discriminating the accuracy of the recognition result of voice data according to the discriminant approach based on fixed filtering rules includes:
For any voice data, if it is determined that the recognition result of the voice data contains consecutive repetitions of a single character or word, determining that the recognition result of the voice data is inaccurate;
For any voice data, if it is determined that the voice length and the recognition result length of the voice data do not match, determining that the recognition result of the voice data is inaccurate;
For any voice data, if it is determined that the recognition result of the voice data contains an error code, determining that the recognition result of the voice data is inaccurate;
For any voice data, if it is determined that the recognition result of the voice data belongs to a predetermined common error case, determining that the recognition result of the voice data is inaccurate.
13. The device according to claim 10, characterized in that
The acquiring unit is further configured to obtain online log data of the user;
The discriminant approach includes: a discriminant approach based on user behavior analysis;
The labeling unit discriminating the accuracy of the recognition result of voice data according to the discriminant approach based on user behavior analysis includes:
For any voice data, if it is determined from the online log data that the recognition result of the voice data was modified by the user, determining that the modified recognition result of the voice data is accurate;
For any voice data, if it is determined from the online log data that the user performed a predetermined follow-up action on the recognition result of the voice data, determining that the recognition result of the voice data is accurate;
For any voice data, if it is determined from the online log data that the user input similar-sounding voice at least once within a predetermined duration before inputting the voice of the voice data, determining that the recognition result of the voice data is accurate.
14. The device according to claim 10, characterized in that
The discriminant approach includes: a discriminant approach based on voice and text feature analysis;
The labeling unit discriminating the accuracy of the recognition result of voice data according to the discriminant approach based on voice and text feature analysis includes:
For any voice data, if it is determined that the signal-to-noise ratio of the voice of the voice data is lower than a predetermined threshold, determining that the recognition result of the voice data is inaccurate;
For any voice data, if it is determined that the voice of the voice data contains long silence and/or long white noise, determining that the recognition result of the voice data is inaccurate;
For any voice data, if it is determined that the syntax of the recognition result of the voice data does not meet grammatical requirements, determining that the recognition result of the voice data is inaccurate.
15. The device according to claim 10, characterized in that
The discriminant approach includes: a discriminant approach based on user accent;
The labeling unit discriminating the accuracy of the recognition result of voice data according to the discriminant approach based on user accent includes: for any voice data, if it is determined that the accent of the user corresponding to the voice data is standard Mandarin, determining that the recognition result of the voice data is accurate.
16. The device according to claim 15, characterized in that
The labeling unit is further configured to, if the accent of the user is a local dialect or Mandarin with a regional accent, form a voice data set corresponding to the accent of the user by analyzing and accumulating data from the people living around the user.
17. The device according to claim 10, characterized in that
The discriminant approach includes: a discriminant approach based on multi-model combination;
The labeling unit discriminating the accuracy of the recognition result of voice data according to the discriminant approach based on multi-model combination includes:
For any voice data, performing speech recognition on the voice in the voice data with each of at least two speech recognition models, comparing the recognition result of each speech recognition model with the recognition result of the voice data, and computing the proportion of all the speech recognition models whose recognition result is consistent with the recognition result of the voice data; if the proportion is greater than or equal to a predetermined threshold, determining that the recognition result of the voice data is accurate;
Wherein the at least two speech recognition models belong to two different types, the types including: statistical models and neural network models.
18. The device according to claim 10, characterized in that
The discriminant approach includes: a discriminant approach based on voiceprint;
The labeling unit discriminating the accuracy of the recognition result of voice data according to the discriminant approach based on voiceprint includes:
For any voice data, obtaining the voiceprint information of the voice in the voice data;
Matching the obtained voiceprint information against each piece of voiceprint information recorded in a database, where the database records each piece of voiceprint information together with its corresponding voice and recognition result;
If the obtained voiceprint information successfully matches any voiceprint information recorded in the database, further determining whether the pronunciation of the recognition result of the voice data matches that of the recognition result corresponding to the matched voiceprint information, and if so, determining that the recognition result of the voice data is accurate.
19. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 9.
20. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810960792.5A CN109192194A (en) | 2018-08-22 | 2018-08-22 | Voice data mask method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810960792.5A CN109192194A (en) | 2018-08-22 | 2018-08-22 | Voice data mask method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109192194A true CN109192194A (en) | 2019-01-11 |
Family
ID=64919094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810960792.5A Pending CN109192194A (en) | 2018-08-22 | 2018-08-22 | Voice data mask method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109192194A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109616101A (en) * | 2019-02-12 | 2019-04-12 | 百度在线网络技术(北京)有限公司 | Acoustic training model method, apparatus, computer equipment and readable storage medium storing program for executing |
CN109949797A (en) * | 2019-03-11 | 2019-06-28 | 北京百度网讯科技有限公司 | A kind of generation method of training corpus, device, equipment and storage medium |
CN110033765A (en) * | 2019-04-11 | 2019-07-19 | 中国联合网络通信集团有限公司 | A kind of method and terminal of speech recognition |
CN110148402A (en) * | 2019-05-07 | 2019-08-20 | 平安科技(深圳)有限公司 | Method of speech processing, device, computer equipment and storage medium |
CN110288976A (en) * | 2019-06-21 | 2019-09-27 | 北京声智科技有限公司 | Data screening method, apparatus and intelligent sound box |
CN110503958A (en) * | 2019-08-30 | 2019-11-26 | 厦门快商通科技股份有限公司 | Audio recognition method, system, mobile terminal and storage medium |
CN110737646A (en) * | 2019-10-21 | 2020-01-31 | 北京明略软件系统有限公司 | Data labeling method, device, equipment and readable storage medium |
CN110838284A (en) * | 2019-11-19 | 2020-02-25 | 大众问问(北京)信息科技有限公司 | Method and device for processing voice recognition result and computer equipment |
CN111510566A (en) * | 2020-03-16 | 2020-08-07 | 深圳追一科技有限公司 | Method and device for determining call label, computer equipment and storage medium |
CN112241445A (en) * | 2020-10-26 | 2021-01-19 | 竹间智能科技(上海)有限公司 | Labeling method and device, electronic equipment and storage medium |
CN113380238A (en) * | 2021-06-09 | 2021-09-10 | 阿波罗智联(北京)科技有限公司 | Method for processing audio signal, model training method, apparatus, device and medium |
CN114974228A (en) * | 2022-05-24 | 2022-08-30 | 名日之梦(北京)科技有限公司 | Rapid voice recognition method based on hierarchical recognition |
CN115497453A (en) * | 2022-08-31 | 2022-12-20 | 海尔优家智能科技(北京)有限公司 | Identification model evaluation method and device, storage medium and electronic device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0924688A2 (en) * | 1997-12-19 | 1999-06-23 | Mitsubishi Denki Kabushiki Kaisha | Markov model discriminator using negative examples |
CN101807399A (en) * | 2010-02-02 | 2010-08-18 | 华为终端有限公司 | Voice recognition method and device |
CN103198828A (en) * | 2013-04-03 | 2013-07-10 | 中金数据系统有限公司 | Method and system of construction of voice corpus |
CN103680493A (en) * | 2013-12-19 | 2014-03-26 | 百度在线网络技术(北京)有限公司 | Voice data recognition method and device for distinguishing regional accents |
CN105930432A (en) * | 2016-04-19 | 2016-09-07 | 北京百度网讯科技有限公司 | Training method and apparatus for sequence labeling tool |
CN106228980A (en) * | 2016-07-21 | 2016-12-14 | 百度在线网络技术(北京)有限公司 | Data processing method and device |
CN106971721A (en) * | 2017-03-29 | 2017-07-21 | 沃航(武汉)科技有限公司 | A kind of accent speech recognition system based on embedded mobile device |
CN107544726A (en) * | 2017-07-04 | 2018-01-05 | 百度在线网络技术(北京)有限公司 | Method for correcting error of voice identification result, device and storage medium based on artificial intelligence |
2018-08-22: CN application CN201810960792.5A (patent/CN109192194A/en); status: active, Pending
Non-Patent Citations (1)
Title |
---|
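WEI Xiangfeng et al. (韦向峰等): "A Chinese speech recognition error correction method based on semantic analysis" (一种基于语义分析的汉语语音识别纠错方法), Computer Science (《计算机科学》) * |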
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109616101A (en) * | 2019-02-12 | 2019-04-12 | 百度在线网络技术(北京)有限公司 | Acoustic training model method, apparatus, computer equipment and readable storage medium storing program for executing |
CN109949797A (en) * | 2019-03-11 | 2019-06-28 | 北京百度网讯科技有限公司 | A kind of generation method of training corpus, device, equipment and storage medium |
US11348571B2 (en) | 2019-03-11 | 2022-05-31 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Methods, computing devices, and storage media for generating training corpus |
CN110033765A (en) * | 2019-04-11 | 2019-07-19 | 中国联合网络通信集团有限公司 | A kind of method and terminal of speech recognition |
CN110148402A (en) * | 2019-05-07 | 2019-08-20 | 平安科技(深圳)有限公司 | Method of speech processing, device, computer equipment and storage medium |
CN110288976B (en) * | 2019-06-21 | 2021-09-07 | 北京声智科技有限公司 | Data screening method and device and intelligent sound box |
CN110288976A (en) * | 2019-06-21 | 2019-09-27 | 北京声智科技有限公司 | Data screening method, apparatus and intelligent sound box |
CN110503958A (en) * | 2019-08-30 | 2019-11-26 | 厦门快商通科技股份有限公司 | Audio recognition method, system, mobile terminal and storage medium |
CN110737646A (en) * | 2019-10-21 | 2020-01-31 | 北京明略软件系统有限公司 | Data labeling method, device, equipment and readable storage medium |
CN110838284A (en) * | 2019-11-19 | 2020-02-25 | 大众问问(北京)信息科技有限公司 | Method and device for processing voice recognition result and computer equipment |
CN110838284B (en) * | 2019-11-19 | 2022-06-14 | 大众问问(北京)信息科技有限公司 | Method and device for processing voice recognition result and computer equipment |
CN111510566A (en) * | 2020-03-16 | 2020-08-07 | 深圳追一科技有限公司 | Method and device for determining call label, computer equipment and storage medium |
CN111510566B (en) * | 2020-03-16 | 2021-05-28 | 深圳追一科技有限公司 | Method and device for determining call label, computer equipment and storage medium |
CN112241445A (en) * | 2020-10-26 | 2021-01-19 | 竹间智能科技(上海)有限公司 | Labeling method and device, electronic equipment and storage medium |
CN112241445B (en) * | 2020-10-26 | 2023-11-07 | 竹间智能科技(上海)有限公司 | Labeling method and device, electronic equipment and storage medium |
CN113380238A (en) * | 2021-06-09 | 2021-09-10 | 阿波罗智联(北京)科技有限公司 | Method for processing audio signal, model training method, apparatus, device and medium |
CN114974228A (en) * | 2022-05-24 | 2022-08-30 | 名日之梦(北京)科技有限公司 | Rapid voice recognition method based on hierarchical recognition |
CN115497453A (en) * | 2022-08-31 | 2022-12-20 | 海尔优家智能科技(北京)有限公司 | Identification model evaluation method and device, storage medium and electronic device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109192194A (en) | Voice data mask method, device, computer equipment and storage medium | |
US10950241B2 (en) | Diarization using linguistic labeling with segmented and clustered diarized textual transcripts | |
US7584103B2 (en) | Automated extraction of semantic content and generation of a structured document from speech | |
US8666726B2 (en) | Sample clustering to reduce manual transcriptions in speech recognition system | |
CN108986826A (en) | Automatically generate method, electronic device and the readable storage medium storing program for executing of minutes | |
CN109686383B (en) | Voice analysis method, device and storage medium | |
US11693988B2 (en) | Use of ASR confidence to improve reliability of automatic audio redaction | |
US20100299135A1 (en) | Automated Extraction of Semantic Content and Generation of a Structured Document from Speech | |
CN110148416A (en) | Audio recognition method, device, equipment and storage medium | |
CN110457432A (en) | Interview methods of marking, device, equipment and storage medium | |
JP2019053126A (en) | Growth type interactive device | |
Levitan et al. | Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection. | |
US20120245942A1 (en) | Computer-Implemented Systems and Methods for Evaluating Prosodic Features of Speech | |
CN111177350A (en) | Method, device and system for forming dialect of intelligent voice robot | |
CN111370030A (en) | Voice emotion detection method and device, storage medium and electronic equipment | |
US20130030794A1 (en) | Apparatus and method for clustering speakers, and a non-transitory computer readable medium thereof | |
CN111145903A (en) | Method and device for acquiring vertigo inquiry text, electronic equipment and inquiry system | |
CN109872714A (en) | Method, electronic device and storage medium for improving accuracy of speech recognition | |
CN115424618A (en) | Electronic medical record voice interaction equipment based on machine learning | |
CN112015874A (en) | Student mental health accompany conversation system | |
JP2004094257A (en) | Method and apparatus for generating question of decision tree for speech processing | |
CN109074809A (en) | Information processing equipment, information processing method and program | |
AU2020103587A4 (en) | A system and a method for cross-linguistic automatic speech recognition | |
US20230402030A1 (en) | Embedded Dictation Detection | |
Tao et al. | The relationship between speech features changes when you get depressed: Feature correlations for improving speed and performance of depression detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190111 |