CN108172219A - method and device for recognizing voice - Google Patents

method and device for recognizing voice

Info

Publication number
CN108172219A
CN108172219A
Authority
CN
China
Prior art keywords: voice, sound, section, identified, sections
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711133270.XA
Other languages
Chinese (zh)
Other versions
CN108172219B (en)
Inventor
徐夏伶
毛跃辉
梁博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN201711133270.XA priority Critical patent/CN108172219B/en
Publication of CN108172219A publication Critical patent/CN108172219A/en
Application granted granted Critical
Publication of CN108172219B publication Critical patent/CN108172219B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 - Noise filtering
    • G10L 21/0272 - Voice signal separating
    • G10L 21/028 - Voice signal separating using properties of sound source
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/27 - Speech or voice analysis techniques characterised by the analysis technique
    • G10L 25/30 - Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L 2015/088 - Word spotting
    • G10L 2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a method and a device for recognizing voice. The method comprises the following steps: determining a voice to be recognized; determining effective sound sections from a plurality of sound sections contained in the voice to be recognized; and acquiring a recognition result of the voice to be recognized according to the command words extracted from the effective sound sections. The invention solves the technical problem of low voice recognition efficiency in the prior art.

Description

Method and apparatus for recognizing voice
Technical field
The present invention relates to the field of intelligent control, and in particular to a method and apparatus for recognizing voice.
Background
With the rapid development of science and technology, controlling smart devices by voice has become widely popular in all walks of life. However, in a noisy environment a smart device's acceptance rate for voice is relatively low, and its accuracy in recognizing voice is also poor. In addition, when a command word for controlling the smart device happens to be contained in the user's speech, the smart device completes the corresponding operation according to that command word even though the user does not intend to control the device at that moment. For example, if the user says "the programs on CCTV1 are very good", the television extracts the keyword "CCTV1" from the user's speech and switches the current program to CCTV1. As can be seen from the above, the existing methods of controlling smart devices by voice suffer from misrecognition.
In addition, to solve the above problems, the prior art mainly performs category filtering on mixed sound sources (i.e., sound sources in which voice and non-voice are mixed together) through voiceprint recognition, and then performs selective single-source processing on the voice filtered out. A device executing this method has to perform a large amount of computation, which affects device performance and thus reduces the accuracy of voice recognition.
For the above problem of low voice recognition efficiency in the prior art, no effective solution has been proposed so far.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for recognizing voice, so as to at least solve the technical problem of low voice recognition efficiency in the prior art.
According to one aspect of the embodiments of the present invention, a method for recognizing voice is provided, including: determining a voice to be recognized; determining effective sound sections from a plurality of sound sections contained in the voice to be recognized; and obtaining a recognition result of the voice to be recognized according to the command words extracted from the effective sound sections.
According to another aspect of the embodiments of the present invention, an apparatus for recognizing voice is further provided, including: a first determining module, configured to determine a voice to be recognized; a second determining module, configured to determine effective sound sections from a plurality of sound sections contained in the voice to be recognized; and an acquisition module, configured to obtain a recognition result of the voice to be recognized according to the command words extracted from the effective sound sections.
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium includes a stored program, wherein the program executes the method for recognizing voice.
According to another aspect of the embodiments of the present invention, a processor is further provided. The processor is configured to run a program, wherein the method for recognizing voice is executed when the program runs.
In the embodiments of the present invention, voice is recognized by means of voiceprint recognition: a voice to be recognized is determined, effective sound sections are determined from the plurality of sound sections contained in the voice to be recognized, and a recognition result of the voice to be recognized is obtained according to the command words extracted from the effective sound sections. This achieves the purpose of accurately recognizing voice, realizes the technical effect of improving the accuracy of voice recognition, and thereby solves the technical problem of low voice recognition efficiency in the prior art.
Brief description of the drawings
The accompanying drawings described herein are used to provide a further understanding of the present invention and form a part of the present application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute improper limitations on the present invention. In the accompanying drawings:
Fig. 1 is a flowchart of a method for recognizing voice according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an optional classification of mixed sound according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an optional division into independent sound sections according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an optional determination of effective sound sections according to an embodiment of the present invention; and
Fig. 5 is a schematic structural diagram of an apparatus for recognizing voice according to an embodiment of the present invention.
Detailed description of the embodiments
In order to enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the specification, the claims and the above accompanying drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device containing a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product or device.
Embodiment 1
According to an embodiment of the present invention, a method embodiment for recognizing voice is provided. It should be noted that the steps illustrated in the flowchart of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the order herein.
Fig. 1 is a flowchart of the method for recognizing voice according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: determining the voice to be recognized.
It should be noted that in a noisy environment the sound in the environment includes both non-voice and voice, where non-voice is any sound other than a person's voice, for example the sound of a moving vehicle, while the above voice to be recognized is a person's voice. In order to accurately recognize a person's voice, the voice needs to be extracted from the sound of the noisy environment; here, the voice to be recognized can be extracted from the sound of the noisy environment using a voiceprint recognition method.
The process of performing voiceprint recognition on sound mainly includes feature extraction and voice recognition. The task of the feature-extraction stage is to extract and select acoustic or voice features of the speaker's voiceprint that have strong separability and high stability. Good acoustic or voice features not only effectively distinguish different speakers, but also remain relatively stable when the voice of the same speaker changes. After the acoustic or voice features of the sound are extracted, voice recognition is performed according to them. Voice recognition technology, which is widely used in the internet and smart-home industries, mainly converts voice signals into corresponding control instructions by means of recognition and parsing; it chiefly involves feature extraction, pattern matching and model training.
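To make the feature-extraction stage concrete, the following is a minimal sketch in Python, assuming the librosa library; the 16 kHz sampling rate, the 13 MFCC coefficients and the frame averaging are illustrative choices rather than anything specified by the patent.

```python
# Minimal sketch of voiceprint feature extraction (assumes librosa).
import librosa
import numpy as np

def extract_voiceprint_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Compute MFCCs, a common acoustic feature with good speaker
    separability, and average them over frames to get one vector."""
    y, sr = librosa.load(wav_path, sr=16000)           # resample to 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                           # (n_mfcc,) feature vector
```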
It should also be noted that, after the voice and the non-voice are separated from the sound of the noisy environment by voiceprint recognition, only the voice is processed subsequently, which improves the accuracy of voice recognition in a noisy environment.
Step S104: determining effective sound sections from the plurality of sound sections contained in the voice to be recognized.
It should be noted that the smart device completes the corresponding operation under the control of the command words contained in the above effective sound sections, whereas command words contained in non-effective sound sections cannot control the smart device. For example, if the voice content of a non-effective sound section is "whenever I turn off the air conditioner, my dog gets very irritable" and the voice content of an effective sound section is "turn on the air conditioner as soon as possible", the processor of the smart device starts the air conditioner according to the command word "turn on the air conditioner" in the effective sound section, but does not turn the air conditioner off according to the command word "turn off the air conditioner" in the non-effective sound section.
It should also be noted that effective sound sections are determined from the plurality of sound sections contained in the voice to be recognized, and only the command words in the effective sound sections control the smart device; command words in non-effective sound sections cannot, which effectively prevents the misrecognition of voice.
Step S106: obtaining the recognition result of the voice to be recognized according to the command words extracted from the effective sound sections.
It should be noted that a command word is a word used to control the smart device, for example "turn on the air conditioner" or "turn off the air conditioner".
Specifically, after a command word is extracted from the effective sound sections, the processor of the smart device still needs to process it further, for example by converting it into the corresponding control instruction; the processor then sends the control instruction to the other units of the smart device, which perform the operation corresponding to the control instruction.
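A small sketch of this conversion-and-dispatch step follows; the command table, the instruction names and send_to_unit() are hypothetical stand-ins for the smart device's internal interface, which the patent does not specify.

```python
# Hypothetical command-word -> control-instruction conversion and dispatch.
COMMAND_TABLE = {
    "turn on the air conditioner": "CMD_POWER_ON",
    "turn off the air conditioner": "CMD_POWER_OFF",
}

def send_to_unit(instruction: str) -> None:
    """Stand-in for forwarding an instruction to another device unit."""
    print(f"dispatching {instruction}")

def handle_command_word(word: str) -> None:
    instruction = COMMAND_TABLE.get(word)
    if instruction is not None:     # only known command words trigger actions
        send_to_unit(instruction)
```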
From the scheme defined by steps S102 to S106 above it can be seen that a voice to be recognized is determined, effective sound sections are determined from the plurality of sound sections contained in it, and the recognition result of the voice to be recognized is obtained according to the command words extracted from the effective sound sections.
It is easy to notice that, since the voice to be recognized contains multiple sound sections, command words may appear in several of them, so extracting command words from every sound section alone is likely to cause misrecognition. By first determining the effective sound sections among the multiple sound sections and then performing recognition only on the command words in the effective sound sections, the misrecognition caused by command words embedded in longer speech can be reduced, further improving the accuracy of voice recognition.
From the above, the present embodiment achieves the purpose of accurately recognizing voice, thereby realizing the technical effect of improving the accuracy of voice recognition and solving the technical problem of low voice recognition efficiency in the prior art.
In an optional embodiment, determining the voice to be recognized includes:
Step S1020: acquiring mixed sound, where the mixed sound includes at least voice and non-voice;
Step S1022: extracting at least one voice from the mixed sound using a voiceprint recognition algorithm;
Step S1024: classifying the at least one voice based on the voiceprint recognition algorithm to obtain a classification result;
Step S1026: determining the voice to be recognized from the classification result.
Specifically, Fig. 2 shows a schematic diagram of an optional classification of mixed sound. On the left of the arrow in Fig. 2 is the human voice extracted from the mixed sound; it contains two sound sources, which may come from different users, where the black rectangles represent the sound sections of sound source 1 and the white rectangles represent the sound sections of sound source 2. Since the voices of different users have different voice features, the voices can be distinguished according to each user's voice features, i.e. the at least one voice is classified. The classification result is shown on the right of the arrow in Fig. 2: sound source 1 consists of sound sections 1, 4, 5 and 6, and sound source 2 consists of sound sections 2, 3 and 7.
In addition, not every user can control the smart device by voice; only users with the required permission can do so. Therefore, after the voices have been separated, the voice features of the separated voices can be matched against the voice features pre-stored in the smart device. If the match succeeds, the matched voice is taken as the voice to be recognized; if the match fails, the unmatched voice is discarded.
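This permission check can be sketched as a nearest-match search over the pre-stored voice features; the cosine-similarity measure and the 0.8 threshold below are illustrative assumptions, since the patent does not name a matching metric.

```python
# Sketch: match a separated voice's features against pre-stored features.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_enrolled_user(feature, enrolled, threshold=0.8):
    """enrolled: {user_id: stored_feature}. Returns the best-matching
    user, or None when no match clears the threshold (voice discarded)."""
    best_user, best_score = None, threshold
    for user, stored in enrolled.items():
        score = cosine_similarity(feature, stored)
        if score > best_score:
            best_user, best_score = user, score
    return best_user
```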
It should be noted that the at least one voice can be extracted from the mixed sound by means of machine learning; the specific steps are as follows:
Step S1022a: analyzing the mixed sound using a preset model to determine at least one voice in the mixed sound, where the preset model is trained by machine learning using multiple groups of voice data, each group of data including a mixed sound and labels marking the voices in the mixed sound;
Step S1022b: extracting the determined at least one voice from the mixed sound.
Specifically, the at least one voice in the above mixed sound may come from different users, and the voices of different users have different voice features. Therefore, after the mixed sound in the noisy environment is acquired, it is analyzed using the preset model trained in advance by machine learning, and the voice features of the voices in the mixed sound are determined, for example the number of different frequencies in the mixed sound and the amplitude type of the sound. In order to extract voice from sound, a neural network model can be built: the voice features of multiple groups of mixed sound are acquired in advance, corresponding labels are set for the voice features in each group by manual annotation, and the network is then trained on the labelled mixed sound to obtain the preset model. Once the preset model is built, the mixed sound serves as its input, and its output is the at least one voice extracted from the mixed sound.
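As a rough sketch of how such a preset model might be trained, the following assumes PyTorch and a mask-based spectrogram formulation; the architecture, the MSE loss and the spectrogram representation are illustrative assumptions, since the patent only states that the model is trained by machine learning on manually labelled mixed sound.

```python
# Sketch (assumes PyTorch): train a network to recover voice from mixed sound.
import torch
import torch.nn as nn

class VoiceExtractor(nn.Module):
    """Predicts a 0..1 mask over spectrogram bins; mask * mixture ~= voice."""
    def __init__(self, n_bins: int = 257):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(),
            nn.Linear(256, n_bins), nn.Sigmoid(),
        )

    def forward(self, mixture_spec: torch.Tensor) -> torch.Tensor:
        return self.net(mixture_spec) * mixture_spec   # masked mixture

def train_step(model, optimizer, mixture_spec, labelled_voice_spec):
    """One update on a (mixed sound, labelled voice) training pair."""
    loss = nn.functional.mse_loss(model(mixture_spec), labelled_voice_spec)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```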
It should be noted that the at least one voice is classified by building a classification model using a machine-learning method; the specific steps are as follows:
Step S1024a: acquiring the at least one voice;
Step S1024b: analyzing the at least one voice using a classification model to determine the voice type of each of the at least one voice, where the classification model is trained by machine learning using multiple groups of voice data, each group including at least one voice and labels of the voice types of the at least one voice;
Step S1024c: classifying the at least one voice according to the voice type of each of the at least one voice to obtain the classification result.
Specifically, after the at least one voice is acquired, it is analyzed using the classification model trained in advance by machine learning, and the voice type of each voice is determined; the type of a voice can be determined from its voice features. For example, the voice features of a child differ from those of an adult, so the voice features can tell whether a voice belongs to an adult or a child. In order to determine the voice type from the voice features, a neural network model can be built: the voice features of multiple groups of voices are acquired in advance, corresponding labels are set for the voice types in each group by manual annotation, and the network is then trained on the labelled voices to obtain the classification model. Once the classification model is built, multiple voices serve as its input, and its output is the voice type of each voice.
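A minimal sketch of this classification step, with scikit-learn's MLPClassifier standing in for the neural network described above; the feature vectors and the 'adult'/'child' labels follow the example in the text and are assumptions about the data layout.

```python
# Sketch: classify voices by type from acoustic features (assumes scikit-learn).
from sklearn.neural_network import MLPClassifier

def train_voice_type_classifier(features, labels):
    """features: one acoustic feature vector per voice;
    labels: manually annotated voice types, e.g. 'adult' / 'child'."""
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    clf.fit(features, labels)
    return clf

# After training, the model outputs one voice type per input voice:
#   voice_types = clf.predict(new_features)
```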
It should be noted that, after the voice to be recognized has been determined, the effective sound sections still need to be extracted from the plurality of sound sections contained in it, and the smart device is controlled by the command words in the effective sound sections. Determining the effective sound sections from the sound sections contained in the voice to be recognized specifically includes the following steps:
Step S1040: determining the plurality of sound sections in the voice to be recognized;
Step S1042: dividing the plurality of sound sections into at least one independent sound section according to the duration parameters of the plurality of sound sections contained in the voice to be recognized;
Step S1044: determining the effective sound sections according to the durations of the at least one independent sound section.
In an optional embodiment, a user pauses between sentences while speaking, so the voice to be recognized can be divided into multiple sound sections according to the pause times. For example, suppose the user's voice lasts 10 seconds, with no pause in the first 2 seconds, a pause in the 3rd second, voice in seconds 4-7, a pause in the 8th second and voice in seconds 9-10; the voice to be recognized can then be divided into three sound sections, i.e. the sound sections corresponding to seconds 1-2, 4-7 and 9-10.
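This pause-based division can be sketched with a simple frame-energy detector; the 20 ms frame length and the energy threshold are illustrative assumptions, as the patent does not specify how pauses are detected.

```python
# Sketch: split a voice into sound sections at pauses using frame energy.
import numpy as np

def split_into_sound_sections(samples: np.ndarray, sr: int,
                              frame_len: float = 0.02,
                              energy_thresh: float = 1e-4):
    """Return (start_s, end_s) sound sections separated by pauses."""
    hop = int(frame_len * sr)
    sections, start = [], None
    for i in range(0, len(samples) - hop + 1, hop):
        voiced = float(np.mean(samples[i:i + hop] ** 2)) > energy_thresh
        t = i / sr
        if voiced and start is None:
            start = t                        # a sound section begins
        elif not voiced and start is not None:
            sections.append((start, t))      # a pause ends the section
            start = None
    if start is not None:
        sections.append((start, len(samples) / sr))
    return sections
```

On the 10-second example above, such a detector would return three sections roughly covering seconds 1-2, 4-7 and 9-10.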
In an optional embodiment, after the plurality of sound sections in the voice to be recognized are determined, they are divided into at least one independent sound section according to their duration parameters, where the duration parameters include at least one of the following: the start time of a sound section and the end time of a sound section. The specific method for dividing independent sound sections is as follows:
Step S1042a: determining the end time of a first sound section and the start time of a second sound section, where the first sound section is the preceding sound section adjacent to the second sound section, and the end time of the first sound section is earlier than the start time of the second sound section;
Step S1042b: calculating the first time difference between the end time of the first sound section and the start time of the second sound section;
Step S1042c: when the first time difference is less than a first preset threshold, determining that the first sound section and the second sound section belong to the same independent sound section;
Step S1042d: when the first time difference is greater than or equal to the first preset threshold, determining that the first sound section and the second sound section belong to different independent sound sections.
This is illustrated with Fig. 3, which shows a schematic diagram of an optional division into independent sound sections. Fig. 3 contains six sound sections, namely sound sections 1 to 6, whose start and end times are respectively t1 and t2, t3 and t4, t5 and t6, t7 and t8, t9 and t10, and t11 and t12. Assume the first preset threshold is δ. Since t3-t2 < δ and t5-t4 < δ, sound sections 1, 2 and 3 are grouped into independent sound section 1; since t7-t6 > δ and t9-t8 > δ, sound section 4 alone forms independent sound section 2; and since t11-t10 < δ, sound sections 5 and 6 are grouped into independent sound section 3.
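The grouping rule of steps S1042a to S1042d reduces to a single pass over time-sorted sound sections, sketched below; applied to the six sections of Fig. 3 with threshold δ, it reproduces the three independent sound sections described above.

```python
# Sketch of steps S1042a-S1042d: group sound sections into independent
# sound sections; adjacent sections with a gap below delta stay together.
def divide_independent_sections(sections, delta):
    """sections: time-sorted list of (start, end); returns a list of
    independent sound sections, each a list of (start, end) pairs."""
    if not sections:
        return []
    independent, current = [], [sections[0]]
    for prev, cur in zip(sections, sections[1:]):
        if cur[0] - prev[1] < delta:     # first time difference < delta
            current.append(cur)          # same independent sound section
        else:
            independent.append(current)  # gap >= delta starts a new one
            current = [cur]
    independent.append(current)
    return independent

# Fig. 3: sections 1-3 merge, section 4 stands alone, sections 5-6 merge.
# divide_independent_sections(
#     [(t1, t2), (t3, t4), (t5, t6), (t7, t8), (t9, t10), (t11, t12)], delta)
```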
After the independent sound sections have been determined, the effective sound sections are determined from them. Determining the effective sound sections according to the durations of the at least one independent sound section specifically includes the following steps:
Step S1044a: determining the duration of each of the at least one independent sound section;
Step S1044b: when the duration of an independent sound section is less than a second preset threshold, determining that the independent sound section is an effective sound section.
This is illustrated with Fig. 4, which shows a schematic diagram of an optional determination of effective sound sections. Fig. 4 contains three independent sound sections, namely independent sound sections 1 to 3. The start and end times of independent sound section 1 are T1 and T2, so its duration is λ1 = T2 - T1; those of independent sound section 2 are T3 and T4, so its duration is λ2 = T4 - T3; and those of independent sound section 3 are T5 and T6, so its duration is λ3 = T6 - T5. Assume the second preset threshold is λ. Since λ1 > λ and λ3 > λ while λ2 < λ, independent sound section 2 is taken as the effective sound section, and independent sound sections 1 and 3 are non-effective sound sections.
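Step S1044b then filters the independent sound sections by duration, as in this sketch, which operates on the grouped output of the previous sketch; short independent sections (duration below λ) are the effective ones, matching Fig. 4.

```python
# Sketch of step S1044b: keep independent sound sections shorter than lam.
def select_effective_sections(independent_sections, lam):
    """independent_sections: output of divide_independent_sections().
    An independent section spans from its first start to its last end."""
    effective = []
    for group in independent_sections:
        start, end = group[0][0], group[-1][1]
        if end - start < lam:            # duration < lam  ->  effective
            effective.append(group)
    return effective
```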
It should be noted that, after the effective sound sections in the voice to be recognized have been determined, the command words contained in the effective sound sections are extracted, and the smart device is controlled according to the command words to complete the corresponding operation. Obtaining the recognition result of the voice to be recognized according to the command words extracted from the effective sound sections specifically includes the following steps:
Step S1060: acquiring the command words in the effective sound sections;
Step S1062: determining the control instruction corresponding to the command word in a preset voice library;
Step S1064: controlling the device to complete the operation corresponding to the control instruction.
Specifically, after the effective sound sections are acquired, the processor of the smart device analyzes them using an extraction model and determines the command words in the effective sound sections, where the extraction model is trained by machine learning using multiple groups of effective sound sections, each group including effective sound sections and labels marking the command words in them. After a command word is extracted from the effective sound sections, it is matched against the keywords in the preset voice library. If the preset voice library contains a keyword matching the command word, the control instruction is determined according to the matched keyword, and the smart device is controlled according to the control instruction to complete the corresponding operation; if no matching keyword is found in the preset voice library, the smart device performs no operation.
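This final matching step can be sketched with the preset voice library modelled as a keyword-to-instruction table; the table contents are assumptions, and the no-match branch reflects the rule above that an unmatched command word triggers no operation.

```python
# Sketch: match extracted command words against a preset voice library.
PRESET_VOICE_LIBRARY = {               # keyword -> control instruction (assumed)
    "turn on the air conditioner": "POWER_ON",
    "turn off the air conditioner": "POWER_OFF",
}

def instructions_for(command_words):
    """Return control instructions for matched command words; words with
    no matching keyword yield no operation, avoiding misrecognition."""
    matched = []
    for word in command_words:
        instruction = PRESET_VOICE_LIBRARY.get(word)
        if instruction is not None:
            matched.append(instruction)
    return matched
```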
Embodiment 2
According to an embodiment of the present invention, an apparatus embodiment for recognizing voice is further provided. Fig. 5 is a schematic structural diagram of the apparatus for recognizing voice according to an embodiment of the present invention; as shown in Fig. 5, the apparatus includes: a first determining module 501, a second determining module 503 and an acquisition module 505.
The first determining module 501 is configured to determine a voice to be recognized; the second determining module 503 is configured to determine effective sound sections from the plurality of sound sections contained in the voice to be recognized; and the acquisition module 505 is configured to obtain the recognition result of the voice to be recognized according to the command words extracted from the effective sound sections.
It should be noted that the first determining module 501, the second determining module 503 and the acquisition module 505 above correspond to steps S102 to S106 in Embodiment 1; the three modules realize the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1 above.
In an optional embodiment, the first determining module includes: a first acquisition module, an extraction module, a first processing module and a third determining module. The first acquisition module is configured to acquire mixed sound, where the mixed sound includes at least voice and non-voice; the extraction module is configured to extract at least one voice from the mixed sound using a voiceprint recognition algorithm; the first processing module is configured to classify the at least one voice based on the voiceprint recognition algorithm to obtain a classification result; and the third determining module is configured to determine the voice to be recognized from the classification result.
It should be noted that the first acquisition module, the extraction module, the first processing module and the third determining module above correspond to steps S1020 to S1026 in Embodiment 1; the four modules realize the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1 above.
In an optional embodiment, the extraction module includes: a fourth determining module and a fifth determining module. The fourth determining module is configured to analyze the mixed sound using a preset model and determine at least one voice in the mixed sound, where the preset model is trained by machine learning using multiple groups of voice data, each group including a mixed sound and labels marking the voices in the mixed sound; the fifth determining module is configured to extract the determined at least one voice from the mixed sound.
It should be noted that the fourth determining module and the fifth determining module above correspond to steps S1022a to S1022b in Embodiment 1; the two modules realize the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1 above.
In an optional embodiment, the processing module includes: a second acquisition module, a sixth determining module and a second processing module. The second acquisition module is configured to acquire the at least one voice; the sixth determining module is configured to analyze the at least one voice using a classification model and determine the voice type of each of the at least one voice, where the classification model is trained by machine learning using multiple groups of voice data, each group including at least one voice and labels of the voice types of the at least one voice; and the second processing module is configured to classify the at least one voice according to the voice type of each of the at least one voice to obtain the classification result.
It should be noted that the second acquisition module, the sixth determining module and the second processing module above correspond to steps S1024a to S1024c in Embodiment 1; the three modules realize the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1 above.
In an optional embodiment, the second determining module includes: a seventh determining module, a division module and an eighth determining module. The seventh determining module is configured to determine the plurality of sound sections in the voice to be recognized; the division module is configured to divide the plurality of sound sections into at least one independent sound section according to the duration parameters of the plurality of sound sections contained in the voice to be recognized; and the eighth determining module is configured to determine the effective sound sections according to the durations of the at least one independent sound section.
It should be noted that the seventh determining module, the division module and the eighth determining module above correspond to steps S1040 to S1044 in Embodiment 1; the three modules realize the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1 above.
In an optional embodiment, the duration parameters include at least one of the following: the start time of a sound section and the end time of a sound section, and the division module includes: a ninth determining module, a calculation module, a tenth determining module and an eleventh determining module. The ninth determining module is configured to determine the end time of a first sound section and the start time of a second sound section, where the first sound section is the preceding sound section adjacent to the second sound section, and the end time of the first sound section is earlier than the start time of the second sound section; the calculation module is configured to calculate the first time difference between the end time of the first sound section and the start time of the second sound section; the tenth determining module is configured to determine that the first sound section and the second sound section belong to the same independent sound section when the first time difference is less than a first preset threshold; and the eleventh determining module is configured to determine that the first sound section and the second sound section belong to different independent sound sections when the first time difference is greater than or equal to the first preset threshold.
It should be noted that the ninth determining module, the calculation module, the tenth determining module and the eleventh determining module above correspond to steps S1042a to S1042d in Embodiment 1; the four modules realize the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1 above.
In an optional embodiment, the eighth determining module includes: a twelfth determining module and a thirteenth determining module. The twelfth determining module is configured to determine the duration of each of the at least one independent sound section; the thirteenth determining module is configured to determine that an independent sound section is an effective sound section when its duration is less than a second preset threshold.
It should be noted that the twelfth determining module and the thirteenth determining module above correspond to steps S1044a to S1044b in Embodiment 1; the two modules realize the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1 above.
In an optional embodiment, the acquisition module includes: a third acquisition module, a fourteenth determining module and a control module. The third acquisition module is configured to acquire the command words in the effective sound sections; the fourteenth determining module is configured to determine the control instruction corresponding to the command word in the preset voice library; and the control module is configured to control the device to complete the operation corresponding to the control instruction.
It should be noted that the third acquisition module, the fourteenth determining module and the control module above correspond to steps S1060 to S1064 in Embodiment 1; the three modules realize the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1 above.
Embodiment 3
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium includes a stored program, wherein the program executes the method for recognizing voice in Embodiment 1 above.
Embodiment 4
According to another aspect of the embodiments of the present invention, a processor is further provided. The processor is configured to run a program, wherein the method for recognizing voice in Embodiment 1 above is executed when the program runs.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the superiority or inferiority of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for a part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units may be a division of logical functions, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disk.
The above are only the preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.

Claims (11)

  1. A method for recognizing voice, characterized by comprising:
    determining a voice to be recognized;
    determining effective sound sections from a plurality of sound sections contained in the voice to be recognized;
    obtaining a recognition result of the voice to be recognized according to command words extracted from the effective sound sections.
  2. The method according to claim 1, characterized in that determining the voice to be recognized comprises:
    acquiring mixed sound, wherein the mixed sound includes at least voice and non-voice;
    extracting at least one voice from the mixed sound using a voiceprint recognition algorithm;
    classifying the at least one voice based on the voiceprint recognition algorithm to obtain a classification result;
    determining the voice to be recognized from the classification result.
  3. The method according to claim 2, characterized in that extracting at least one voice from the mixed sound using the voiceprint recognition algorithm comprises:
    analyzing the mixed sound using a preset model to determine at least one voice in the mixed sound, wherein the preset model is trained by machine learning using multiple groups of voice data, each group of data in the multiple groups of voice data including: a mixed sound and labels marking the voices in the mixed sound;
    extracting the determined at least one voice from the mixed sound.
  4. The method according to claim 2, characterized in that classifying the at least one voice based on the voiceprint recognition algorithm to obtain the classification result comprises:
    acquiring the at least one voice;
    analyzing the at least one voice using a classification model to determine a voice type of each of the at least one voice, wherein the classification model is trained by machine learning using multiple groups of voice data, each group of voice data in the multiple groups of voice data including: at least one voice and labels of the voice types of the at least one voice;
    classifying the at least one voice according to the voice type of each of the at least one voice to obtain the classification result.
  5. The method according to claim 1, characterized in that determining the effective sound sections from the sound sections contained in the voice to be recognized comprises:
    determining a plurality of sound sections in the voice to be recognized;
    dividing the plurality of sound sections into at least one independent sound section according to duration parameters of the plurality of sound sections contained in the voice to be recognized;
    determining the effective sound sections according to durations of the at least one independent sound section.
  6. The method according to claim 5, characterized in that the duration parameters include at least one of the following: a start time of a sound section and an end time of a sound section, wherein dividing the plurality of sound sections into at least one independent sound section according to the duration parameters of the plurality of sound sections contained in the voice to be recognized comprises:
    determining an end time of a first sound section and a start time of a second sound section, wherein the first sound section is the preceding sound section adjacent to the second sound section, and the end time of the first sound section is earlier than the start time of the second sound section;
    calculating a first time difference between the end time of the first sound section and the start time of the second sound section;
    determining that the first sound section and the second sound section belong to the same independent sound section when the first time difference is less than a first preset threshold;
    determining that the first sound section and the second sound section belong to different independent sound sections when the first time difference is greater than or equal to the first preset threshold.
  7. The method according to claim 5, characterized in that determining the effective sound sections according to the durations of the at least one independent sound section comprises:
    determining a duration of each of the at least one independent sound section;
    determining an independent sound section to be an effective sound section when the duration of the independent sound section is less than a second preset threshold.
  8. The method according to claim 1, characterized in that obtaining the recognition result of the voice to be recognized according to the command words extracted from the effective sound sections comprises:
    acquiring the command words in the effective sound sections;
    determining a control instruction corresponding to the command word in a preset voice library;
    controlling a device to complete an operation corresponding to the control instruction.
  9. An apparatus for recognizing voice, characterized by comprising:
    a first determining module, configured to determine a voice to be recognized;
    a second determining module, configured to determine effective sound sections from a plurality of sound sections contained in the voice to be recognized;
    an acquisition module, configured to obtain a recognition result of the voice to be recognized according to command words extracted from the effective sound sections.
  10. A storage medium, characterized in that the storage medium includes a stored program, wherein the program executes the method for recognizing voice according to any one of claims 1 to 8.
  11. A processor, characterized in that the processor is configured to run a program, wherein the method for recognizing voice according to any one of claims 1 to 8 is executed when the program runs.
CN201711133270.XA 2017-11-14 2017-11-14 Method and device for recognizing voice Active CN108172219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711133270.XA CN108172219B (en) 2017-11-14 2017-11-14 Method and device for recognizing voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711133270.XA CN108172219B (en) 2017-11-14 2017-11-14 Method and device for recognizing voice

Publications (2)

Publication Number Publication Date
CN108172219A true CN108172219A (en) 2018-06-15
CN108172219B CN108172219B (en) 2021-02-26

Family

ID=62527360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711133270.XA Active CN108172219B (en) 2017-11-14 2017-11-14 Method and device for recognizing voice

Country Status (1)

Country Link
CN (1) CN108172219B (en)



Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1017041B1 (en) * 1995-11-15 2005-05-11 Hitachi, Ltd. Voice recognizing and translating system
US8412455B2 (en) * 2010-08-13 2013-04-02 Ambit Microsystems (Shanghai) Ltd. Voice-controlled navigation device and method
CN102655002A (en) * 2011-03-01 2012-09-05 株式会社理光 Audio processing method and audio processing equipment
US9324319B2 (en) * 2013-05-21 2016-04-26 Speech Morphing Systems, Inc. Method and apparatus for exemplary segment classification
US20140350918A1 (en) * 2013-05-24 2014-11-27 Tencent Technology (Shenzhen) Co., Ltd. Method and system for adding punctuation to voice files
CN103489454A (en) * 2013-09-22 2014-01-01 浙江大学 Voice endpoint detection method based on waveform morphological characteristic clustering
CN104143326A (en) * 2013-12-03 2014-11-12 腾讯科技(深圳)有限公司 Voice command recognition method and device
CN104021785A (en) * 2014-05-28 2014-09-03 华南理工大学 Method of extracting speech of most important guest in meeting
CN104536978A (en) * 2014-12-05 2015-04-22 奇瑞汽车股份有限公司 Voice data identifying method and device
CN104464723A (en) * 2014-12-16 2015-03-25 科大讯飞股份有限公司 Voice interaction method and system
CN105280183A (en) * 2015-09-10 2016-01-27 百度在线网络技术(北京)有限公司 Voice interaction method and system
CN105575394A (en) * 2016-01-04 2016-05-11 北京时代瑞朗科技有限公司 Voiceprint identification method based on global change space and deep learning hybrid modeling
CN106782506A (en) * 2016-11-23 2017-05-31 语联网(武汉)信息技术有限公司 A kind of method that recorded audio is divided into section
CN106601233A (en) * 2016-12-22 2017-04-26 北京元心科技有限公司 Voice command recognition method and device and electronic equipment
CN106601241A (en) * 2016-12-26 2017-04-26 河南思维信息技术有限公司 Automatic time correcting method for recording file
CN106683680A (en) * 2017-03-10 2017-05-17 百度在线网络技术(北京)有限公司 Speaker recognition method and device and computer equipment and computer readable media

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吕西宝 (Lü Xibao): "移动机器人语音控制技术研究" [Research on Voice Control Technology for Mobile Robots], 《中国优秀硕士学位论文全文数据库》 (China Master's Theses Full-text Database) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145124A (en) * 2018-08-16 2019-01-04 格力电器(武汉)有限公司 Information storage method and device, storage medium and electronic device
CN109145124B (en) * 2018-08-16 2022-02-25 格力电器(武汉)有限公司 Information storage method and device, storage medium and electronic device
WO2020042603A1 (en) * 2018-08-27 2020-03-05 广东美的制冷设备有限公司 Voice control method, module, household appliance, system and computer storage medium
CN111951786A (en) * 2019-05-16 2020-11-17 武汉Tcl集团工业研究院有限公司 Training method and device of voice recognition model, terminal equipment and medium
CN110310657A (en) * 2019-07-10 2019-10-08 北京猎户星空科技有限公司 A kind of audio data processing method and device
CN110310657B (en) * 2019-07-10 2022-02-08 北京猎户星空科技有限公司 Audio data processing method and device
RU2735363C1 (en) * 2019-08-16 2020-10-30 Бейджин Сяоми Мобайл Софтвеа Ко., Лтд. Method and device for sound processing and data medium
US11264027B2 (en) 2019-08-16 2022-03-01 Beijing Xiaomi Mobile Software Co., Ltd. Method and apparatus for determining target audio data during application waking-up
WO2021031811A1 (en) * 2019-08-21 2021-02-25 华为技术有限公司 Method and device for voice enhancement
CN112420063A (en) * 2019-08-21 2021-02-26 华为技术有限公司 Voice enhancement method and device
CN112420063B (en) * 2019-08-21 2024-10-18 华为技术有限公司 Voice enhancement method and device

Also Published As

Publication number Publication date
CN108172219B (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN108172219A (en) method and device for recognizing voice
WO2021012734A1 (en) Audio separation method and apparatus, electronic device and computer-readable storage medium
CN110021308B (en) Speech emotion recognition method and device, computer equipment and storage medium
CN109523986B (en) Speech synthesis method, apparatus, device and storage medium
CN103700370B (en) A kind of radio and television speech recognition system method and system
CN110334201A (en) A kind of intension recognizing method, apparatus and system
CN110136749A (en) The relevant end-to-end speech end-point detecting method of speaker and device
DE112021001064T5 (en) Device-directed utterance recognition
CN108305643A (en) The determination method and apparatus of emotion information
CN108447471A (en) Audio recognition method and speech recognition equipment
CN110379441B (en) Voice service method and system based on countermeasure type artificial intelligence network
CN111524527A (en) Speaker separation method, device, electronic equipment and storage medium
CN110148399A (en) A kind of control method of smart machine, device, equipment and medium
CN111901627B (en) Video processing method and device, storage medium and electronic equipment
CN113223560A (en) Emotion recognition method, device, equipment and storage medium
CN109376363A (en) A kind of real-time voice interpretation method and device based on earphone
CN109065051A (en) Voice recognition processing method and device
CN106531195B (en) A kind of dialogue collision detection method and device
CN107274900A (en) Information processing method and its system for control terminal
Alghifari et al. On the use of voice activity detection in speech emotion recognition
CN109830240A (en) Method, apparatus and system based on voice operating instruction identification user's specific identity
CN106205610B (en) A kind of voice information identification method and equipment
CN111128240B (en) Voice emotion recognition method based on anti-semantic-erasure
CN112667787A (en) Intelligent response method, system and storage medium based on phonetics label
CN111462762A (en) Speaker vector regularization method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant