WO2017024835A1 - Voice recognition method and device - Google Patents

Voice recognition method and device

Info

Publication number
WO2017024835A1
WO2017024835A1 PCT/CN2016/082079 CN2016082079W WO2017024835A1 WO 2017024835 A1 WO2017024835 A1 WO 2017024835A1 CN 2016082079 W CN2016082079 W CN 2016082079W WO 2017024835 A1 WO2017024835 A1 WO 2017024835A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice information
module
voice
determining
predetermined user
Prior art date
Application number
PCT/CN2016/082079
Other languages
French (fr)
Chinese (zh)
Inventor
曾一庭
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017024835A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques

Definitions

  • This application relates to, but is not limited to, the field of communication technology.
  • Speech recognition in the related art usually relies on a confidence score: the recording of the user's speech is matched against standard data that the engine has preset after training on a large amount of data, and a higher confidence indicates a more accurate match.
  • A voice application chooses a confidence level suited to its own situation as a standard. A result above this standard is treated as correctly recognized; a result below it is treated as incorrectly recognized.
  • When the confidence threshold is set low, recognition is easy: the user's command does not need to be spoken in a very standard way or very loudly to produce a recognition result. However, surrounding noise is also more easily recognized as the user's voice, which leads to misrecognition and reduces the recognition rate.
  • When the confidence threshold is set high, recognition is precise and little affected by noise, but the user's command must be spoken in a very standard way and loudly to be recognized. Often the user has spoken clearly but still does not pass the confidence threshold, and recognition fails.
  • This paper provides a speech recognition method and device to solve the problem that the speech recognition is affected by other sounds and the false recognition rate is high in the related art.
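  • A speech recognition method, comprising:
  • acquiring and recognizing voice information;
  • judging whether the voice information matches a pre-extracted sound feature of a predetermined user;
  • when the judgment result is that the voice information matches the sound feature, determining that the voice information is voice information of the predetermined user.
  • Optionally, after determining that the voice information is voice information of the predetermined user, the method further includes:
  • judging whether the confidence of the voice information is greater than a preset threshold;
  • when the judgment result is that the confidence of the voice information is greater than the preset threshold, determining that the voice information is an instruction issued by the predetermined user;
  • when the judgment result is that the confidence of the voice information is less than or equal to the preset threshold, discarding the voice information.
  • Optionally, after determining that the voice information is an instruction issued by the predetermined user, the method further includes: executing the instruction corresponding to the voice information.
  • Optionally, before judging whether the voice information matches the pre-extracted sound feature of the predetermined user, the method further includes:
  • extracting the sound feature of a recording by repeatedly acquiring the same recording;
  • saving the extracted sound feature.
  • Optionally, before saving the extracted sound feature, the method further includes: determining that the confidence of the sound feature exceeds a preset threshold.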
  • A speech recognition device, comprising:
  • an acquisition module, configured to acquire and recognize voice information;
  • a judging module, configured to judge whether the voice information acquired by the acquisition module matches a pre-extracted sound feature of a predetermined user;
  • a determining module, configured to determine, when the judgment result of the judging module is that the voice information matches the sound feature, that the voice information is voice information of the predetermined user.
  • Optionally, the device further includes a discarding module.
  • The judging module is further configured to judge whether the confidence of the voice information is greater than a preset threshold.
  • The determining module is further configured to determine, when the judgment result of the judging module is that the confidence of the voice information is greater than the preset threshold, that the voice information is an instruction issued by the predetermined user.
  • The discarding module is configured to discard the voice information when the judgment result of the judging module is that the confidence of the voice information is less than or equal to the preset threshold.
  • Optionally, the device further includes an executing module, configured to execute the instruction corresponding to the voice information after the determining module determines that the voice information is an instruction issued by the predetermined user.
  • Optionally, the device further includes a repeated acquisition module, configured to extract the sound feature of a recording by repeatedly acquiring the same recording, before the judging module judges whether the voice information matches the pre-extracted sound feature of the predetermined user;
  • and a saving module, configured to save the sound feature extracted by the repeated acquisition module.
  • Optionally, the determining module is further configured to determine, before the saving module saves the extracted sound feature, that the confidence of the sound feature is greater than the preset threshold.
  • The voice recognition method and device provided by the embodiments of the present invention acquire and recognize voice information, judge whether the voice information matches a pre-extracted sound feature of a predetermined user, and, when the judgment result is that the voice information matches the sound feature, determine that the voice information is the voice information of the predetermined user. The embodiments thereby address the problem in the related art that speech recognition may be affected by other sounds, which leads to a high false recognition rate, and reduce the false recognition rate.
  • FIG. 1 is a flowchart of a voice recognition method according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a voice recognition apparatus according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of another voice recognition apparatus according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of still another voice recognition apparatus according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of extracting a sound feature in a voice recognition method according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of voice recognition in a voice recognition method according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a voice recognition method according to an embodiment of the present invention.
  • The voice recognition method provided in this embodiment includes the following steps, namely steps 101 to 103:
  • Step 101: Acquire and recognize voice information.
  • Step 102: Judge whether the voice information matches the pre-extracted sound feature of the predetermined user.
  • Step 103: When the judgment result is that the voice information matches the sound feature, determine that the voice information is the voice information of the predetermined user.
  • Through the steps above, the method provided in this embodiment addresses the problem in the related art that speech recognition may be affected by other voices, which leads to a high false recognition rate, and reduces the false recognition rate.
  • The method further includes: judging whether the confidence of the voice information is greater than a preset threshold; when the judgment result is that the confidence of the voice information is greater than the preset threshold, determining that the voice information is an instruction issued by the predetermined user; and when the judgment result is that the confidence of the voice information is less than or equal to the preset threshold, discarding the voice information.
  • The method further includes: executing the instruction corresponding to the voice information, for example triggering an application according to the instruction.
  • Before judging whether the voice information matches the pre-extracted sound feature of the predetermined user, the method further includes: extracting the sound feature of a recording by repeatedly acquiring the same recording, and saving the extracted sound feature.
  • Before saving the extracted sound feature, the method further includes: determining that the confidence of the sound feature is greater than the preset threshold.
  • FIG. 2 is a schematic structural diagram of a voice recognition device according to an embodiment of the present invention.
  • The voice recognition device 20 provided in this embodiment may include an acquisition module 21, a judging module 22, and a determining module 23.
  • The acquisition module 21 is configured to acquire and recognize voice information.
  • The judging module 22 is configured to judge whether the voice information acquired by the acquisition module 21 matches the pre-extracted sound feature of the predetermined user.
  • The determining module 23 is configured to determine, when the judgment result of the judging module 22 is that the voice information matches the sound feature, that the voice information is the voice information of the predetermined user.
  • FIG. 3 is a schematic structural diagram of another voice recognition apparatus according to an embodiment of the present invention.
  • The voice recognition device 20 provided in this embodiment may further include a discarding module 24.
  • The judging module 22 is further configured to judge whether the confidence of the voice information is greater than a preset threshold.
  • The determining module 23 is further configured to determine, when the judgment result of the judging module 22 is that the confidence of the voice information is greater than the preset threshold, that the voice information is an instruction issued by the predetermined user.
  • The discarding module 24 is configured to discard the voice information when the judgment result of the judging module 22 is that the confidence of the voice information is less than or equal to the preset threshold.
  • The device further includes an executing module 25, configured to execute the instruction corresponding to the voice information after the determining module 23 determines that the voice information is an instruction issued by the predetermined user.
  • The device further includes a repeated acquisition module 26, configured to extract the sound feature of a recording by repeatedly acquiring the same recording, before the judging module 22 judges whether the voice information matches the pre-extracted sound feature of the predetermined user.
  • The saving module 27 is configured to save the sound feature extracted by the repeated acquisition module 26.
  • The determining module 23 is further configured to determine, before the saving module 27 saves the extracted sound feature, that the confidence of the sound feature is greater than the preset threshold.
  • FIG. 4 is a schematic structural diagram of still another voice recognition apparatus according to an embodiment of the present invention.
  • The voice recognition device 30 provided in this embodiment includes a voiceprint extraction module 31, a voiceprint feature library 32, a voiceprint discrimination module 33, a voice recognition module 34, a control module 35, and a recording management module 36.
  • Together, these modules implement the functions of some or all of the acquisition module 21, the judging module 22, the determining module 23, the discarding module 24, the executing module 25, the repeated acquisition module 26, and the saving module 27.
  • The voiceprint extraction module 31 is configured to extract the user's voice features while the user trains the voiceprint.
  • the voiceprint feature library 32 is configured to: store the voice features of the user extracted by the voiceprint extraction module 31, and provide them to subsequent modules for use.
  • the voiceprint discriminating module 33 is configured to determine whether it is the voice of the current user according to the user voice data provided by the recording management module 36.
  • the voice recognition module 34 is configured to perform corresponding voice recognition according to the user voice data provided by the recording management module 36, and convert the voice into characters.
  • the control module 35 is configured to: control the entire logic.
  • the recording management module 36 is configured to: manage the system recordings, and provide them to the voiceprint discriminating module 33 and the voice recognition module 34, respectively.
  • the embodiment of the invention further provides a usage manner for improving the speech recognition rate by using the voiceprint, and the usage mode is described below through an alternative embodiment.
  • FIG. 5 is a schematic diagram of extracting a sound feature in a voice recognition method according to an embodiment of the present invention.
  • The voiceprint extraction module 31 is used to extract the user's voice features. Because the voice features must be extracted from the user's own speech, the user is asked to read a certain piece of text repeatedly, and the extracted sound features are then saved to the voiceprint feature library 32.
  • FIG. 6 is a schematic diagram of voice recognition in a voice recognition method according to an embodiment of the present invention.
  • The user triggers speech recognition, and the recording management module 36 sends the system recording to the voiceprint discrimination module 33 and to the voice recognition module 34 separately. The results of both modules are provided to the control module 35, which makes the decision.
  • The control module 35 first determines whether the voiceprint discrimination result conforms to the user's voice feature. If it does, the control module 35 then determines whether the confidence reported by the voice recognition module 34 is greater than a threshold. If the confidence is less than or equal to the threshold, the sound is the user's voice but not necessarily a voice command, so the control module 35 discards the result and notifies the recording management module 36 to continue recording. If both checks pass, the correct result is returned to the subsequent application process.
  • When the user has pre-trained the voiceprint, the system records the voiceprint feature.
  • The recording management module 36 starts recording and distributes the recording to the speech recognition module 34 and the voiceprint discrimination module 33, and the control module 35 waits for each of them to return its result.
  • The control module 35 determines whether the voiceprint matching degree reaches a threshold, for example 80%; the threshold can be set by the user or preset by the system. If the control module 35 determines that the voiceprint matching degree does not exceed the threshold, it discards the result returned by the voice recognition module 34 and notifies the recording management module 36 to continue recording and wait for a correct result.
  • If the control module 35 determines that the voiceprint matching degree exceeds the threshold, it goes on to determine whether the confidence of the voice recognition result exceeds its threshold. If it does not, the result is still discarded. If it does, the result is returned to the back-end process or module for use.
  • the user connects the wireless router to the wired network and turns on the power switch.
  • the user opens the setting program in the mobile phone, and sets the hotspot, encryption mode, password, and access mode of the Wide Area Network (WAN) port.
  • All or part of the steps of the above embodiments may also be implemented using integrated circuits. The steps may be fabricated as individual integrated circuit modules, or multiple modules or steps may be fabricated as a single integrated circuit module.
  • the devices/function modules/functional units in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
  • When the device/function module/functional unit in the above embodiments is implemented in the form of a software function module and sold or used as a stand-alone product, it can be stored in a computer readable storage medium.
  • the above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
  • The embodiments of the present invention acquire and recognize voice information, judge whether the voice information matches the pre-extracted voice feature of the predetermined user, and, when the judgment result is that the voice information matches the voice feature, determine that the voice information is the voice information of the predetermined user. The embodiments thereby address the problem in the related art that speech recognition may be affected by other voices, which leads to a high false recognition rate, and reduce the false recognition rate.
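
The two-stage decision that the control module 35 makes for FIG. 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `VoiceprintResult`, `RecognitionResult`, and `control_decision` are hypothetical, the voiceprint threshold of 0.8 mirrors the 80% example in the text, and the confidence threshold of 0.7 is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceprintResult:
    match_degree: float  # assumed to lie in [0, 1]; 0.8 corresponds to 80%


@dataclass
class RecognitionResult:
    text: str
    confidence: float  # assumed to lie in [0, 1]


def control_decision(
    voiceprint: VoiceprintResult,
    recognition: RecognitionResult,
    voiceprint_threshold: float = 0.8,   # example value from the text
    confidence_threshold: float = 0.7,   # hypothetical value
) -> Optional[str]:
    """Return the recognized text, or None when the result is discarded.

    None stands for "discard the result and keep recording".
    """
    # Stage 1: the speaker must match the enrolled user.
    if voiceprint.match_degree <= voiceprint_threshold:
        return None
    # Stage 2: the recognition itself must be confident enough.
    if recognition.confidence <= confidence_threshold:
        return None
    return recognition.text
```

A caller would treat `None` as the signal to notify the recording manager to continue recording, and pass any returned text to the back-end process.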

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Telephonic Communication Services (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A voice recognition method and device. The method comprises: acquiring and recognizing voice information (101); judging whether the voice information is consistent with a pre-extracted sound feature of a predetermined user (102); and when a judging result is that the voice information is consistent with the sound feature, determining the voice information to be the voice information of the predetermined user (103).

Description

Speech recognition method and device
Technical field
This application relates to, but is not limited to, the field of communication technology.
Background
With the release of Apple's voice assistant (Siri), intelligent voice applications have developed explosively. For a voice application, the recognition success rate is an important metric. In the related art, speech recognition takes sound as input and recognizes it accordingly. However, voice applications in the related art cannot distinguish the user's speech from surrounding environmental noise or from other people's voices. As a result, the recognition success rate is high in a quiet environment, but in some real usage scenarios sudden ambient noise or other people's voices trigger the voice application to start recognizing, which causes false triggering and a sharp drop in the recognition success rate.
Speech recognition in the related art usually relies on a confidence score: the recording of the user's speech is matched against standard data that the engine has preset after training on a large amount of data, and a higher confidence indicates a more accurate match. A voice application chooses a confidence level suited to its own situation as a standard. A result above this standard is treated as correctly recognized; a result below it is treated as incorrectly recognized.
Because a voice application judges recognition success or failure by confidence, the threshold involves a trade-off. When the confidence threshold is set low, recognition is easy: the user's command does not need to be spoken in a very standard way or very loudly to produce a recognition result. However, surrounding noise is also more easily recognized as the user's voice, which leads to misrecognition and reduces the recognition rate. When the confidence threshold is set high, recognition is precise and little affected by noise, but the user's command must be spoken in a very standard way and loudly to be recognized. Often the user has spoken clearly but still does not pass the confidence threshold, and recognition fails.
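
As an illustration of this trade-off, the following minimal Python sketch applies two thresholds to the same set of recognition results. The `Hypothesis` type, the sample confidences, and the threshold values 0.5 and 0.8 are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed to lie in [0, 1]
    is_noise: bool     # ground-truth label, used only for this illustration


def accepted(hypotheses: List[Hypothesis], threshold: float) -> List[Hypothesis]:
    """Keep the hypotheses whose confidence is strictly greater than the threshold."""
    return [h for h in hypotheses if h.confidence > threshold]


samples = [
    Hypothesis("call home", 0.92, False),
    Hypothesis("play music", 0.61, False),  # a real command, spoken quietly
    Hypothesis("open map", 0.55, True),     # background noise misheard as a command
]

low = accepted(samples, 0.5)   # accepts everything, including the noise
high = accepted(samples, 0.8)  # rejects the noise, but also the quiet command
```

With the low threshold the noise is accepted as a command; with the high threshold the noise is rejected but so is the genuine, quietly spoken command.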
Speech recognition based on confidence alone cannot distinguish a command spoken by the user from other people's voices. In real usage scenarios, such as a driving environment, other people talking can easily cause the voice application to misrecognize, which lowers the recognition rate.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
No effective solution has yet been proposed for the problem in the related art that speech recognition is affected by other sounds, which results in a high false recognition rate.
This document provides a speech recognition method and device to address the problem in the related art that speech recognition is affected by other sounds, which results in a high false recognition rate.
A speech recognition method, comprising:
acquiring and recognizing voice information;
judging whether the voice information matches a pre-extracted sound feature of a predetermined user;
when the judgment result is that the voice information matches the sound feature, determining that the voice information is voice information of the predetermined user.
Optionally, after determining that the voice information is voice information of the predetermined user, the method further includes:
judging whether the confidence of the voice information is greater than a preset threshold;
when the judgment result is that the confidence of the voice information is greater than the preset threshold, determining that the voice information is an instruction issued by the predetermined user;
when the judgment result is that the confidence of the voice information is less than or equal to the preset threshold, discarding the voice information.
Optionally, after determining that the voice information is an instruction issued by the predetermined user, the method further includes: executing the instruction corresponding to the voice information.
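
The optional confidence check and instruction execution described above might look like the following Python sketch, which is assumed to run only after the speaker check has passed. The function name `handle_recognized_speech`, the `handlers` mapping, and the threshold value 0.7 are hypothetical; returning `None` for an unknown command is also an assumption, since the text does not cover that case.

```python
from typing import Callable, Dict, Optional


def handle_recognized_speech(
    text: str,
    confidence: float,
    handlers: Dict[str, Callable[[], str]],
    confidence_threshold: float = 0.7,  # hypothetical preset threshold
) -> Optional[str]:
    """Execute the instruction for `text`, or return None if it is discarded."""
    # Confidence less than or equal to the threshold: discard the voice information.
    if confidence <= confidence_threshold:
        return None
    handler = handlers.get(text)
    if handler is None:
        return None  # assumption: unknown commands are ignored
    # Confidence above the threshold: treat the text as an instruction and execute it.
    return handler()


handlers = {"open camera": lambda: "camera launched"}
```

For example, `handle_recognized_speech("open camera", 0.9, handlers)` runs the handler, while the same text at confidence 0.7 is discarded because the threshold must be strictly exceeded.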
Optionally, before judging whether the voice information matches the pre-extracted sound feature of the predetermined user, the method further includes:
extracting the sound feature of a recording by repeatedly acquiring the same recording;
saving the extracted sound feature.
Optionally, before saving the extracted sound feature, the method further includes: determining that the confidence of the sound feature exceeds a preset threshold.
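
One way to picture this enrollment step is the Python sketch below. It assumes that each repetition of the recording yields a fixed-length feature vector together with a confidence value, keeps only the repetitions whose confidence exceeds the threshold, and averages them. The averaging, the function name `enroll`, and the threshold 0.7 are assumptions made for illustration; the application does not specify how repeated recordings are combined.

```python
from typing import List, Optional, Sequence


def enroll(
    recordings: Sequence[Sequence[float]],
    confidences: Sequence[float],
    threshold: float = 0.7,  # hypothetical preset threshold
) -> Optional[List[float]]:
    """Return the feature vector to save, or None if no repetition qualifies."""
    # Keep only features whose confidence exceeds the preset threshold.
    kept = [f for f, c in zip(recordings, confidences) if c > threshold]
    if not kept:
        return None
    # Assumption: combine the repetitions by element-wise averaging.
    count = len(kept)
    return [sum(column) / count for column in zip(*kept)]
```

The returned vector would then be stored as the predetermined user's sound feature for later matching.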
A speech recognition device, comprising:
an acquisition module, configured to acquire and recognize voice information;
a judging module, configured to judge whether the voice information acquired by the acquisition module matches a pre-extracted sound feature of a predetermined user;
a determining module, configured to determine, when the judgment result of the judging module is that the voice information matches the sound feature, that the voice information is voice information of the predetermined user.
Optionally, the device further includes a discarding module;
wherein the judging module is further configured to judge whether the confidence of the voice information is greater than a preset threshold;
the determining module is further configured to determine, when the judgment result of the judging module is that the confidence of the voice information is greater than the preset threshold, that the voice information is an instruction issued by the predetermined user;
the discarding module is configured to discard the voice information when the judgment result of the judging module is that the confidence of the voice information is less than or equal to the preset threshold.
Optionally, the device further includes an executing module, configured to execute the instruction corresponding to the voice information after the determining module determines that the voice information is an instruction issued by the predetermined user.
Optionally, the device further includes a repeated acquisition module, configured to extract the sound feature of a recording by repeatedly acquiring the same recording, before the judging module judges whether the voice information matches the pre-extracted sound feature of the predetermined user;
and a saving module, configured to save the sound feature extracted by the repeated acquisition module.
Optionally, the determining module is further configured to determine, before the saving module saves the extracted sound feature, that the confidence of the sound feature is greater than the preset threshold.
The speech recognition method and device provided by the embodiments of the present invention acquire and recognize voice information, judge whether the voice information matches a pre-extracted sound feature of a predetermined user, and, when the judgment result is that the voice information matches the sound feature, determine that the voice information is the voice information of the predetermined user. The embodiments thereby address the problem in the related art that speech recognition may be affected by other sounds, which leads to a high false recognition rate, and reduce the false recognition rate.
Other aspects will be apparent upon reading and understanding the drawings and the detailed description.
Brief description of the drawings
FIG. 1 is a flowchart of a voice recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a voice recognition apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another voice recognition apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of still another voice recognition apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of extracting a sound feature in a voice recognition method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of voice recognition in a voice recognition method according to an embodiment of the present invention.
Embodiments of the invention
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments herein and the features in the embodiments may be combined with one another arbitrarily.
The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
本发明实施例提供了一种语音识别方法,图1为本发明实施例提供的一种语音识别方法的流程图,如图1所示,本实施例提供的语音识别方法包括以下步骤,即步骤101~步骤103:The embodiment of the present invention provides a voice recognition method. FIG. 1 is a flowchart of a voice recognition method according to an embodiment of the present invention. As shown in FIG. 1 , the voice recognition method provided in this embodiment includes the following steps, that is, steps. 101 to step 103:
Step 101: acquire and recognize voice information;
Step 102: determine whether the voice information matches a pre-extracted voice feature of a predetermined user;
Step 103: when the voice information is determined to match the voice feature, determine that the voice information is the voice information of the predetermined user.
Through the steps of the flow shown in FIG. 1, voice information is acquired and recognized, and it is determined whether the voice information matches the pre-extracted voice feature of the predetermined user. When it matches, the voice information is determined to be the voice information of the predetermined user. The method of this embodiment thereby addresses the problem in the related art that speech recognition may be affected by other sounds, which leads to a high false recognition rate, and it reduces the false recognition rate.
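Steps 102 and 103 can be illustrated with a minimal sketch. The embodiment does not specify how voice features are represented or compared, so the sketch assumes, purely for illustration, that a voice feature is a numeric vector and that matching means a cosine similarity above a threshold. The names `cosine_similarity`, `is_predetermined_user`, and `match_threshold` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_predetermined_user(
    voice_features: Sequence[float],
    enrolled_features: Sequence[float],
    match_threshold: float = 0.8,  # assumed value, not fixed by the embodiment
) -> bool:
    """Step 102: compare the features; step 103: accept only on a match."""
    return cosine_similarity(voice_features, enrolled_features) > match_threshold
```

Any other feature representation or similarity measure could be substituted without changing the three-step flow.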
Optionally, in another embodiment of the present invention, after the voice information is determined to be the voice information of the predetermined user, the method further includes: determining whether the confidence of the voice information is greater than a preset threshold; when the confidence is greater than the preset threshold, determining that the voice information is an instruction issued by the predetermined user; and when the confidence is less than or equal to the preset threshold, discarding the voice information.
Optionally, in yet another embodiment of the present invention, after the voice information is determined to be an instruction issued by the predetermined user, the method further includes: executing the instruction corresponding to the voice information, for example launching an application according to the instruction.
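The confidence check and the instruction execution described above might be sketched as follows. The command table, the function name `handle_user_speech`, and the default threshold of 0.5 are illustrative assumptions; the embodiment only requires that voice information at or below the preset threshold be discarded and that an accepted instruction be executed.

```python
from typing import Callable, Dict, Optional


def handle_user_speech(
    text: str,
    confidence: float,
    commands: Dict[str, Callable[[], str]],  # hypothetical instruction table
    threshold: float = 0.5,  # assumed preset threshold
) -> Optional[str]:
    """Execute the instruction for `text` only if its confidence exceeds the threshold."""
    if confidence <= threshold:
        return None  # confidence too low: discard the voice information
    action = commands.get(text)
    if action is None:
        return None  # no instruction corresponds to the recognized text
    return action()  # e.g. launch an application
```

For example, with `commands = {"open camera": start_camera}`, a recognized "open camera" at confidence 0.9 would call `start_camera`, while the same text at confidence 0.5 would be dropped.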
Optionally, in a further embodiment of the present invention, before it is determined whether the voice information matches the pre-extracted voice feature of the predetermined user, the method further includes: extracting the voice feature of a recording by repeatedly acquiring the same recording; and saving the extracted voice feature.
Optionally, before the extracted voice feature is saved, this embodiment further includes: determining that the confidence of the voice feature is greater than the preset threshold.
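A possible enrollment routine for this optional embodiment is sketched below. The embodiment does not define how the confidence of a voice feature is computed. The sketch assumes, as one plausible choice, that each repetition yields a feature vector, that the saved template is their element-wise mean, and that the confidence is the lowest cosine similarity between any repetition and that template. The names `enroll` and `store` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def enroll(
    user_id: str,
    repetitions: Sequence[Sequence[float]],  # one feature vector per repeated recording
    store: Dict[str, List[float]],           # stands in for the saved voice features
    threshold: float = 0.8,                  # assumed preset threshold
) -> bool:
    """Save the averaged feature only if the repetitions agree closely enough."""
    if not repetitions:
        return False
    n, dim = len(repetitions), len(repetitions[0])
    template = [sum(rep[i] for rep in repetitions) / n for i in range(dim)]
    # Assumed confidence measure: worst agreement between a repetition and the template.
    confidence = min(_cosine(rep, template) for rep in repetitions)
    if confidence > threshold:
        store[user_id] = template
        return True
    return False
```

Rejecting inconsistent repetitions keeps a noisy or interrupted enrollment from producing a template that later matches the wrong speaker.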
An embodiment of the present invention further provides a voice recognition device. FIG. 2 is a schematic structural diagram of the voice recognition device according to this embodiment. As shown in FIG. 2, the voice recognition device 20 may include an acquisition module 21, a judging module 22, and a determining module 23.
The acquisition module 21 is configured to acquire and recognize voice information;
the judging module 22 is configured to determine whether the voice information acquired by the acquisition module 21 matches a pre-extracted voice feature of a predetermined user;
the determining module 23 is configured to determine, when the judging module 22 finds that the voice information matches the voice feature, that the voice information is the voice information of the predetermined user.
Optionally, FIG. 3 is a schematic structural diagram of another voice recognition device according to an embodiment of the present invention. On the basis of the device shown in FIG. 2, the voice recognition device 20 of this embodiment may further include a discarding module 24.
The judging module 22 is further configured to determine whether the confidence of the voice information is greater than a preset threshold;
the determining module 23 is further configured to determine, when the judging module 22 finds that the confidence of the voice information is greater than the preset threshold, that the voice information is an instruction issued by the predetermined user;
the discarding module 24 is configured to discard the voice information when the judging module 22 finds that the confidence of the voice information is less than or equal to the preset threshold.
Optionally, the device further includes an execution module 25, configured to execute the instruction corresponding to the voice information after the determining module 23 determines that the voice information is an instruction issued by the predetermined user.
Optionally, the device further includes: a repeated acquisition module 26, configured to extract the voice feature of a recording by repeatedly acquiring the same recording, before the judging module 22 determines whether the voice information matches the pre-extracted voice feature of the predetermined user; and a saving module 27, configured to save the voice feature extracted by the repeated acquisition module 26.
Optionally, the determining module 23 is further configured to determine, before the saving module 27 saves the extracted voice feature, that the confidence of the voice feature is greater than the preset threshold.
To address the above problems in the related art, a specific optional embodiment is described below. It combines the optional embodiments described above and their optional implementations.
Optionally, FIG. 4 is a schematic structural diagram of yet another voice recognition device according to an embodiment of the present invention. The voice recognition device 30 of this embodiment comprises a voiceprint extraction module 31, a voiceprint feature library 32, a voiceprint discrimination module 33, a speech recognition module 34, a control module 35, and a recording management module 36. Their functions are implemented jointly by some or all of the acquisition module 21, the judging module 22, the determining module 23, the discarding module 24, the execution module 25, the repeated acquisition module 26, and the saving module 27 described above.
The voiceprint extraction module 31 is configured to train the user's voiceprint and extract the user's voice features.
The voiceprint feature library 32 is configured to store the user's voice features extracted by the voiceprint extraction module 31 and provide them to subsequent modules.
The voiceprint discrimination module 33 is configured to determine, from the user voice data provided by the recording management module 36, whether the voice is that of the current user.
The speech recognition module 34 is configured to perform speech recognition on the user voice data provided by the recording management module 36 and convert the speech into text.
The control module 35 is configured to control the overall logic.
The recording management module 36 is configured to manage the system recording and provide it to the voiceprint discrimination module 33 and the speech recognition module 34 respectively.
An embodiment of the present invention further provides a way of using voiceprints to improve the speech recognition rate, which is described below through an optional embodiment.
FIG. 5 is a schematic diagram of voice feature extraction in the voice recognition method according to an embodiment of the present invention. As shown in FIG. 5, when the user uses the system for the first time, the voiceprint extraction module 31 extracts the user's voice features. For example, because the user's voice features need to be extracted, the user is asked to read a given passage of text aloud repeatedly, and the extracted voice features are then saved to the voiceprint feature library 32.
FIG. 6 is a schematic diagram of speech recognition in the voice recognition method according to an embodiment of the present invention. As shown in FIG. 6, the user triggers speech recognition, and the recording management module 36 sends the system recording to both the voiceprint discrimination module 33 and the speech recognition module 34. The results of both modules are provided to the control module 35, which makes the decision. The control module 35 first determines whether the voiceprint discrimination result matches the user's voice features. If it does not, the system recording is noise or the voice of someone nearby; the speech recognition result is discarded and the recording management module 36 is notified to continue recording. If the voiceprint check passes, the control module 35 then determines whether the confidence reported by the speech recognition module 34 is greater than the threshold. If the confidence is less than or equal to the threshold, this indicates that the sound is the user's voice but not necessarily a spoken voice command; the control module 35 discards the result and notifies the recording management module 36 to continue recording. If both checks pass, the correct result is returned to the subsequent application flow.
Once the user has trained the voiceprint in advance, the system records the voiceprint features. When the application is triggered by the user's voice or by noise, the recording management module 36 starts recording and distributes the recording to the speech recognition module 34 and the voiceprint discrimination module 33. The control module 35 waits for the speech recognition module 34 and the voiceprint discrimination module 33 to each return a result.
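The fan-out of one recording to two modules, followed by waiting for both results, could be sketched with a thread pool as below. Running the two modules concurrently is an assumption of this sketch; the embodiment only states that both modules receive the recording and that the control module waits for both results. The callables `recognize` and `match_voiceprint` are hypothetical stand-ins for the speech recognition module 34 and the voiceprint discrimination module 33.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def run_both(
    recording: bytes,
    recognize: Callable[[bytes], Tuple[str, float]],  # returns (text, confidence)
    match_voiceprint: Callable[[bytes], float],       # returns a matching degree
) -> Tuple[Tuple[str, float], float]:
    """Send the same recording to both modules and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        asr_future = pool.submit(recognize, recording)
        voiceprint_future = pool.submit(match_voiceprint, recording)
        # result() blocks, so this returns only after both modules have answered.
        return asr_future.result(), voiceprint_future.result()
```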
When the control module 35 receives the result returned by the voiceprint discrimination module 33, it determines whether the voiceprint matching degree reaches the threshold, for example 80%. This threshold may be set by the user or preset by the system. If the control module 35 determines that the voiceprint matching degree does not exceed the threshold, it discards the result returned by the speech recognition module 34 and notifies the recording management module 36 to continue recording and wait for a correct result.
When the control module 35 determines that the voiceprint matching degree is greater than the threshold, it goes on to determine whether the confidence of the speech recognition result is greater than its threshold. If not, the result is still discarded. If it passes, the result is returned to other back-end processes or modules for use.
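The two-stage decision made by the control module 35 can be summarized in a short sketch. The 0.8 voiceprint threshold mirrors the 80% example in the text; the 0.5 confidence threshold, the `Decision` enum, and the function name `decide` are assumptions introduced for illustration.

```python
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    REJECT_VOICEPRINT = "not the enrolled user; keep recording"
    REJECT_CONFIDENCE = "user's voice, but not a confident command; keep recording"
    ACCEPT = "return result to the application flow"


def decide(
    voiceprint_score: float,
    text: str,
    confidence: float,
    voiceprint_threshold: float = 0.8,  # the 80% example from the description
    confidence_threshold: float = 0.5,  # assumed value
) -> Tuple[Decision, Optional[str]]:
    """Check the voiceprint first, then the recognition confidence."""
    if voiceprint_score <= voiceprint_threshold:
        return Decision.REJECT_VOICEPRINT, None  # noise or a bystander's voice
    if confidence <= confidence_threshold:
        return Decision.REJECT_CONFIDENCE, None  # user spoke, but not a clear command
    return Decision.ACCEPT, text
```

Checking the voiceprint before the confidence means a bystander's clearly spoken command is still rejected, which is the behavior the description relies on to lower the false recognition rate.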
The user connects the wireless routing device to a wired network and turns on the power switch. The user then searches for the wireless routing device's Bluetooth from a mobile phone and pairs with it. After pairing is complete, the user opens the settings program on the phone and sets the router's hotspot, encryption mode, password, and the access mode of the wide area network (WAN) port. Once the settings are applied successfully, the router takes effect. This device is generally intended for business travelers who change hotels frequently and need a wireless routing device that is portable and easy to set up.
Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented as a computer program flow. The computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, equipment, apparatus, or device); when executed, it performs one or a combination of the steps of the method embodiments.
Optionally, all or some of the steps of the above embodiments may also be implemented with integrated circuits. These steps may each be made into a separate integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module.
The devices, functional modules, and functional units in the above embodiments may be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network of multiple computing devices.
When the devices, functional modules, and functional units in the above embodiments are implemented as software functional modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
Industrial Applicability
The embodiments of the present invention acquire and recognize voice information and determine whether the voice information matches a pre-extracted voice feature of a predetermined user. When the voice information matches the voice feature, the voice information is determined to be the voice information of the predetermined user. The embodiments thereby address the problem in the related art that speech recognition may be affected by other sounds, which leads to a high false recognition rate, and they reduce the false recognition rate.

Claims (10)

  1. A voice recognition method, comprising:
    acquiring and recognizing voice information;
    determining whether the voice information matches a pre-extracted voice feature of a predetermined user;
    when the voice information is determined to match the voice feature, determining that the voice information is voice information of the predetermined user.
  2. The method according to claim 1, wherein, after the determining that the voice information is voice information of the predetermined user, the method further comprises:
    determining whether a confidence of the voice information is greater than a preset threshold;
    when the confidence of the voice information is determined to be greater than the preset threshold, determining that the voice information is an instruction issued by the predetermined user;
    when the confidence of the voice information is determined to be less than or equal to the preset threshold, discarding the voice information.
  3. The method according to claim 2, wherein, after the determining that the voice information is an instruction issued by the predetermined user, the method further comprises:
    executing the instruction corresponding to the voice information.
  4. The method according to claim 1, wherein, before the determining whether the voice information matches a pre-extracted voice feature of a predetermined user, the method further comprises:
    extracting a voice feature of a recording by repeatedly acquiring the same recording;
    saving the extracted voice feature.
  5. The method according to claim 4, wherein, before the saving the extracted voice feature, the method further comprises:
    determining that a confidence of the voice feature is greater than the preset threshold.
  6. A voice recognition device, comprising:
    an acquisition module, configured to acquire and recognize voice information;
    a judging module, configured to determine whether the voice information acquired by the acquisition module matches a pre-extracted voice feature of a predetermined user;
    a determining module, configured to determine, when the judging module finds that the voice information matches the voice feature, that the voice information is voice information of the predetermined user.
  7. The device according to claim 6, further comprising a discarding module;
    wherein the judging module is further configured to determine whether a confidence of the voice information is greater than a preset threshold;
    the determining module is further configured to determine, when the judging module finds that the confidence of the voice information is greater than the preset threshold, that the voice information is an instruction issued by the predetermined user;
    the discarding module is configured to discard the voice information when the judging module finds that the confidence of the voice information is less than or equal to the preset threshold.
  8. The device according to claim 7, further comprising:
    an execution module, configured to execute the instruction corresponding to the voice information after the determining module determines that the voice information is an instruction issued by the predetermined user.
  9. The device according to claim 6, further comprising:
    a repeated acquisition module, configured to extract a voice feature of a recording by repeatedly acquiring the same recording, before the judging module determines whether the voice information matches the pre-extracted voice feature of the predetermined user;
    a saving module, configured to save the voice feature extracted by the repeated acquisition module.
  10. The device according to claim 9, wherein
    the determining module is further configured to determine, before the saving module saves the extracted voice feature, that a confidence of the voice feature is greater than the preset threshold.
PCT/CN2016/082079 2015-08-13 2016-05-13 Voice recognition method and device WO2017024835A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510496816.2A CN106469553A (en) 2015-08-13 2015-08-13 Audio recognition method and device
CN201510496816.2 2015-08-13

Publications (1)

Publication Number Publication Date
WO2017024835A1 true WO2017024835A1 (en) 2017-02-16

Family

ID=57984626

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/082079 WO2017024835A1 (en) 2015-08-13 2016-05-13 Voice recognition method and device

Country Status (2)

Country Link
CN (1) CN106469553A (en)
WO (1) WO2017024835A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107742516B (en) * 2017-09-29 2020-11-17 上海望潮数据科技有限公司 Intelligent recognition method, robot and computer readable storage medium
CN108231082A (en) * 2017-12-29 2018-06-29 广州势必可赢网络科技有限公司 A kind of update method and device of self study Application on Voiceprint Recognition
CN108259801A (en) * 2018-01-19 2018-07-06 广州视源电子科技股份有限公司 Audio, video data display methods, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000080828A (en) * 1998-09-07 2000-03-21 Denso Corp Vehicle control device
CN101441869A (en) * 2007-11-21 2009-05-27 联想(北京)有限公司 Method and terminal for speech recognition of terminal user identification
CN103811003A (en) * 2012-11-13 2014-05-21 联想(北京)有限公司 Voice recognition method and electronic equipment
CN103943110A (en) * 2013-01-21 2014-07-23 联想(北京)有限公司 Control method, device and electronic equipment
CN104078045A (en) * 2013-03-26 2014-10-01 联想(北京)有限公司 Identifying method and electronic device
CN104092932A (en) * 2013-12-03 2014-10-08 腾讯科技(深圳)有限公司 Acoustic control shooting method and device
US20140358535A1 (en) * 2013-05-28 2014-12-04 Samsung Electronics Co., Ltd. Method of executing voice recognition of electronic device and electronic device using the same

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680505A (en) * 2013-09-03 2014-03-26 安徽科大讯飞信息科技股份有限公司 Voice recognition method and voice recognition system
CN104468020B (en) * 2013-09-13 2018-03-23 成都鼎桥通信技术有限公司 Processing method, sending ending equipment and the receiving device of voice mistake

Also Published As

Publication number Publication date
CN106469553A (en) 2017-03-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16834460

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16834460

Country of ref document: EP

Kind code of ref document: A1