WO2021000068A1 - Speech recognition method and apparatus used by non-native speaker - Google Patents

Speech recognition method and apparatus used by non-native speaker

Info

Publication number
WO2021000068A1
WO2021000068A1 (PCT/CN2019/093947; CN2019093947W)
Authority
WO
WIPO (PCT)
Prior art keywords
language
accent
module
decoded
standard
Prior art date
Application number
PCT/CN2019/093947
Other languages
French (fr)
Chinese (zh)
Inventor
郑小龙 (ZHENG Xiaolong)
Original Assignee
播闪机械人有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 播闪机械人有限公司
Priority to PCT/CN2019/093947
Publication of WO2021000068A1

Classifications

    • G - Physics
    • G10 - Musical instruments; Acoustics
    • G10L - Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 - Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G - Physics
    • G10 - Musical instruments; Acoustics
    • G10L - Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The present invention relates to a speech recognition method and device, and in particular to a speech recognition method and device for use by non-native speakers.
  • Speech recognition technology is now widely used in many different fields, such as home control devices, smart speakers, personal assistants, and telephone interfaces.
  • Commonly used speech recognition solutions and products include Amazon's Alexa, Google Assistant, and Apple's Siri.
  • Companies such as IBM, Apple, Amazon, and Google provide developers with powerful speech recognition APIs for developing further applications.
  • A powerful speech recognition engine requires substantial resources to develop, and it takes a long time to tune the engine to provide a high-quality speech-to-text function. Handling audio signals affected by noise, echo, and reverberation requires a broad range of engineering techniques. Developing a high-quality speech recognition module is therefore costly.
  • Non-native speakers in different regions each have an accent based on their own native language.
  • Collecting pronunciation samples from all non-native speakers would cost too much, and many new pronunciation modules would have to be developed to cover non-native speakers in many different regions.
  • A low-cost, efficient method is therefore required to adapt a speech recognition engine to decode different accents.
  • Long short-term memory (LSTM) and other recurrent neural networks have been applied in many deep learning fields, such as translation, natural language processing, and speech recognition. When an LSTM is used for translation, it usually performs letter-based or word-based sequence-to-sequence translation.
  • For adapting to accents, neither letter-based nor word-based methods are effective, because they do not correlate fully with pronunciation.
  • The present invention proposes using an existing speech recognition module to obtain a decoding result for the accented speech of non-native speakers, even though that decoding result may be incorrect.
  • An accent module is derived from the decoding results of a non-native accent group, and the accent module is used to translate the decoded output of the speech recognition module.
  • The present invention provides the following technical scheme:
  • A speech recognition method for use by non-native speakers includes the following steps:
  • Step S10: the speech recognition module converts the received speech of the non-native speaker into a decoded language and transmits the decoded language to the language matching module and the non-native speech translation module respectively;
  • Step S11: according to the received decoded language, the language matching module retrieves the standard language corresponding to the decoded language stored in the language matching module; the standard language and the decoded language received from the speech recognition module form a decoded-language/standard-language pair, which is transmitted to the accent analyzer;
  • Step S12: the accent analyzer compares the received decoded-language/standard-language pair with the accent categories stored in the accent analyzer to determine the accent category corresponding to the pair, and sends that accent category to the accent module database;
  • Step S13: according to the accent category, the accent module database retrieves the accent module corresponding to the category and transmits the accent module to the non-native speech translation module;
  • Step S14: the non-native speech translation module uses the accent module to translate the decoded language from the speech recognition module into a standard sentence for output.
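The S10-S14 flow can be sketched as a chain of lookups. Everything below is a hypothetical, toy illustration of the data flow only: the module names mirror the steps, but the sample decoding, the stored pair, the accent category, and the string-replacement "accent module" are invented stand-ins for the patent's real components.

```python
# Toy sketch of steps S10-S14; all names and sample data are hypothetical.

# S10: an existing engine returns a (possibly wrong) decoding of accented speech
def speech_recognition_module(audio):
    return {"accented_audio_01": "sank you"}[audio]

# S11: retrieve the stored standard language and form a (decoded, standard) pair
STANDARD_LANGUAGE = {"sank you": "thank you"}
def language_matching_module(decoded):
    return (decoded, STANDARD_LANGUAGE[decoded])

# S12: map the pair to a stored accent category
ACCENT_CATEGORIES = {("sank you", "thank you"): "th-to-s substitution"}
def accent_analyzer(pair):
    return ACCENT_CATEGORIES[pair]

# S13: the database returns the accent module for that category
# (here a plain function; in the patent it is an LSTM-based translator)
ACCENT_MODULES = {"th-to-s substitution": lambda text: text.replace("sank", "thank")}
def accent_module_database(category):
    return ACCENT_MODULES[category]

# S14: the translation module applies the accent module to the decoded language
def recognize(audio):
    decoded = speech_recognition_module(audio)
    pair = language_matching_module(decoded)
    accent_module = accent_module_database(accent_analyzer(pair))
    return accent_module(decoded)

print(recognize("accented_audio_01"))  # -> thank you
```

The point of the sketch is the division of labor: the existing recognition engine is never retrained; only the per-accent translation step (S14) corrects its output.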
  • The decoded language in step S10 is decoded text or decoded phonemes.
  • The accent module in step S13 is obtained from decoded language received from non-native speakers together with manually added decoded language.
  • The accent module contains multiple sequentially connected LSTM layers and one dense layer, with the last LSTM layer connected to the dense layer.
  • A speech recognition device for use by non-native speakers includes:
  • the speech recognition module, configured to convert the received speech of the non-native speaker into a decoded language and to transmit the decoded language to the language matching module and the non-native speech translation module respectively;
  • the language matching module, configured to retrieve, according to the received decoded language, the standard language corresponding to the decoded language stored in the language matching module; the standard language and the decoded language received from the speech recognition module form a decoded-language/standard-language pair, which is transmitted to the accent analyzer;
  • the accent analyzer, configured to compare the received decoded-language/standard-language pair with the accent categories stored in the accent analyzer to determine the accent category corresponding to the pair, and to send that accent category to the accent module database;
  • the accent module database, configured to retrieve, according to the accent category, the accent module corresponding to the category and to transmit the accent module to the non-native speech translation module;
  • the non-native speech translation module, configured to use the accent module to translate the decoded language from the speech recognition module into a standard sentence for output.
  • Compared with the prior art, the present invention has the following beneficial effects: in the speech recognition method for use by non-native speakers, the accent analyzer compares the received decoded-language/standard-language pair with the pairs stored in the accent analyzer to determine the accent category to which the decoded text belongs; this decoded-language/standard-language pairing converts the speech of non-native speakers into the corresponding standard native-speaker language more accurately.
  • Collecting manually added language for the accent module saves a great deal of time and removes the time cost of collecting large amounts of speech from non-native speakers.
  • Figure 1 is a flowchart of the speech recognition method for use by non-native speakers according to the present invention;
  • Figure 2 is a flowchart of accent module collection in the speech recognition method for use by non-native speakers according to the present invention;
  • Figure 3 is a flowchart of the non-native speech translation module in the speech recognition method for use by non-native speakers according to the present invention.
  • Step S10: the speech recognition module 101 converts the received speech of the non-native speaker into a decoded language and transmits the decoded language to the language matching module 111 and the non-native speech translation module 104 respectively; the decoded language may be decoded text or decoded phonemes.
  • The speech recognition module 101 also handles all noise, echo, and reverberation problems in speech signal processing.
  • Step S11: according to the received decoded language, the language matching module 111 retrieves the standard language corresponding to the decoded language stored in the language matching module 111; the standard language and the decoded language received from the speech recognition module 101 form a decoded-language/standard-language pair, which is transmitted to the accent analyzer 102. The language matching module 111 stores a large number of standard languages.
  • Step S12: the accent analyzer 102 compares the received decoded-language/standard-language pair with the accent categories stored in the accent analyzer 102 to determine the accent category corresponding to the pair, and sends that accent category to the accent module database 103.
  • Each accent category 303 stored in the analyzer 102 contains many pairs of decoded language 313 and standard language 323, which likewise come from the speech recognition module 101 and the language matching module 111 respectively.
  • During a preliminary storage stage before use, the analyzer 102 collects and stores a large number of accent categories.
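The patent does not specify how the analyzer 102 compares an incoming pair with its stored categories 303. One plausible realization, sketched here with invented categories and example pairs, is to summarize each (decoded, standard) pair by its divergence pattern and pick the stored category whose example pairs share the most patterns:

```python
from difflib import SequenceMatcher

# Hypothetical stored accent categories (303): each maps to previously
# collected (decoded language 313, standard language 323) example pairs.
ACCENT_CATEGORIES = {
    "th-to-s": [("sank you", "thank you"), ("sree", "three")],
    "r-to-l": [("flied lice", "fried rice")],
}

def pair_signature(decoded, standard):
    # Describe how the decoded text diverges from the standard text
    ops = SequenceMatcher(a=decoded, b=standard).get_opcodes()
    return {(tag, decoded[i1:i2], standard[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"}

def classify(pair):
    # Pick the category whose examples share the most divergence patterns
    sig = pair_signature(*pair)
    def score(category):
        return sum(len(sig & pair_signature(d, s))
                   for d, s in ACCENT_CATEGORIES[category])
    return max(ACCENT_CATEGORIES, key=score)

print(classify(("sing", "thing")))  # -> th-to-s
```

A production analyzer would likely operate on phoneme sequences rather than characters, but the pattern-matching idea carries over directly.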
  • Step S13: according to the accent category, the accent module database 103 retrieves the accent module 416 corresponding to the category from the accent module database 103 and transmits the accent module 416 to the non-native speech translation module 104; the accent module database 103 stores multiple accent modules 416.
  • The accent module 416 is collected as follows: a small amount of speech from non-native speakers and a large amount of manually added language are received through the speech recognition module 101, converted into decoded language, and transmitted to the data processing unit 404.
  • The data processing unit 404 receives the decoded language from the small number of non-native speakers via the speech recognition module 101 together with the large amount of manually added decoded language, normalizes the format of the decoded language, and then sends the format-processed decoded language to the data embedding module 405; the data embedding module 405 converts the format-processed decoded language into digital codes and sends them to the accent module 416.
  • Each accent module 416 contains multiple sequentially connected LSTM layers 406 (long short-term memory layers) and one dense layer 408, with the last LSTM layer 406 connected to the dense layer 408.
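As a concrete illustration of the data processing (404) and data embedding (405) stages, the sketch below normalizes a decoded phoneme string and converts it into fixed-length digital codes. The phoneme inventory, the normalization rule, and the padding length are all assumptions made for the example, not details given in the patent.

```python
# Hypothetical phoneme vocabulary; 0 doubles as padding / unknown
PHONEME_VOCAB = {"<pad>": 0, "s": 1, "th": 2, "ae": 3, "ng": 4, "k": 5, "y": 6, "uw": 7}

def process_format(decoded):
    # Data processing unit 404: unify the format of the decoded language
    # (here: lowercase and split into phoneme tokens)
    return decoded.lower().split()

def embed(tokens, max_len=8):
    # Data embedding module 405: map tokens to digital codes, padded to max_len
    codes = [PHONEME_VOCAB.get(tok, 0) for tok in tokens]
    return (codes + [0] * max_len)[:max_len]

digital_codes = embed(process_format("S AE NG K"))
print(digital_codes)  # -> [1, 3, 4, 5, 0, 0, 0, 0]
```

The fixed-length integer sequence is what the LSTM layers of the accent module would consume.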
  • Step S14: the non-native speech translation module 104 uses the accent module 416 to translate the decoded language from the speech recognition module 101 into standard sentences for output; as shown in Figure 3, the non-native speech translation module 104 includes the data processing unit 404, the data embedding module 405, and the received accent module 416; the data processing unit 404 is connected to the data embedding module 405, and the data embedding module 405 is connected to the accent module 416.
  • The data processing unit 404 normalizes the format of the received decoded language and then sends the format-processed decoded language to the data embedding module 405.
  • The data embedding module 405 converts the format-processed decoded language into digital codes and sends them to the accent module 416.
  • Each accent module 416 contains multiple sequentially connected LSTM layers 406 (long short-term memory layers) and one dense layer 408.
  • The last LSTM layer 406 is connected to the dense layer 408.
  • After receiving the digital codes, the sequentially connected LSTM layers 406 convert them into a two-dimensional code and pass it to the dense layer 408.
  • The dense layer 408 converts the two-dimensional code into standard sentences and outputs them.
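To make the layer wiring above concrete, here is a minimal pure-Python forward pass through two stacked LSTM layers followed by a dense softmax layer. The weights are random and untrained, the dimensions are tiny, and the output vocabulary is invented, so this only illustrates the shape of the computation (digital codes in, per-step LSTM outputs as the "two-dimensional code", one word distribution out), not a working translator.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_lstm(n_in, n_hid, rng):
    # One weight row per hidden unit and gate, acting on [input; previous hidden]
    return {gate: [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hid)]
                   for _ in range(n_hid)]
            for gate in ("f", "i", "o", "g")}

def dot(row, vec):
    return sum(w * v for w, v in zip(row, vec))

def run_lstm(layer, inputs, n_hid):
    h, c = [0.0] * n_hid, [0.0] * n_hid
    outputs = []
    for x in inputs:
        z = x + h  # concatenate input with previous hidden state
        f = [sigmoid(dot(r, z)) for r in layer["f"]]    # forget gate
        i = [sigmoid(dot(r, z)) for r in layer["i"]]    # input gate
        o = [sigmoid(dot(r, z)) for r in layer["o"]]    # output gate
        g = [math.tanh(dot(r, z)) for r in layer["g"]]  # candidate cell state
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs.append(h)  # per-step hidden states form the "two-dimensional code"
    return outputs

def dense_softmax(weights, h):
    logits = [dot(row, h) for row in weights]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
digital_codes = [1, 3, 4, 5]                 # codes from the embedding step
inputs = [[float(code)] for code in digital_codes]
layer1 = make_lstm(1, 4, rng)                # first LSTM layer
layer2 = make_lstm(4, 4, rng)                # last LSTM layer, feeds the dense layer
code_2d = run_lstm(layer2, run_lstm(layer1, inputs, 4), 4)
VOCAB = ["thank", "you", "three", "rice"]    # hypothetical output vocabulary
dense_weights = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in VOCAB]
probs = dense_softmax(dense_weights, code_2d[-1])
predicted = VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]
```

A trained implementation would normally use a framework such as TensorFlow or PyTorch, but the structure here (stacked LSTM layers whose final output feeds a dense layer) matches the arrangement described for the accent module 416.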
  • A speech recognition device for use by non-native speakers includes:
  • the speech recognition module 101, configured to convert the received speech of the non-native speaker into a decoded language and to transmit the decoded language to the language matching module 111 and the non-native speech translation module 104 respectively;
  • the language matching module 111, configured to retrieve, according to the received decoded language, the standard language corresponding to the decoded language stored in the language matching module 111; the standard language and the decoded language received from the speech recognition module 101 form a decoded-language/standard-language pair, which is transmitted to the accent analyzer 102;
  • the accent analyzer 102, configured to compare the received decoded-language/standard-language pair with the accent categories stored in the accent analyzer 102 to determine the accent category corresponding to the pair, and to send that accent category to the accent module database 103;
  • the accent module database 103, configured to retrieve, according to the accent category, the accent module 416 corresponding to the category from the accent module database 103 and to transmit the accent module 416 to the non-native speech translation module 104;
  • the non-native speech translation module 104, configured to use the accent module 416 to translate the decoded language from the speech recognition module 101 into standard sentences for output.
  • The decoded language is decoded text or decoded phonemes; the accent module 416 is obtained from the decoded language of non-native speakers together with manually added decoded language; the accent module 416 contains multiple sequentially connected LSTM layers and one dense layer, with the last LSTM layer connected to the dense layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

Provided are a speech recognition method and apparatus used by a non-native speaker. The speech recognition apparatus comprises: a speech recognition module (101) for converting a received language of a non-native speaker into a decoded language, and transmitting the decoded language to a language matching module (111) and a non-native speech translation module (104) respectively; the language matching module (111) for invoking, according to the received decoded language, a standard language corresponding to the decoded language and stored in the language matching module (111), wherein the standard language and the decoded language received from the speech recognition module (101) form a decoded language and standard language pair, and the decoded language and standard language pair is transmitted to an accent analyzer (102); the accent analyzer (102) for comparing the received decoded language and standard language pair with accent types stored in the accent analyzer (102), obtaining, through analysis, an accent type corresponding to the decoded language and standard language pair, and sending the accent type to an accent module database (103); the accent module database (103) for invoking, according to the accent type and from the accent module database (103), an accent module (416) corresponding to the accent type, and transmitting the accent module (416) to the non-native speech translation module (104); and the non-native speech translation module (104) for translating the decoded language from the speech recognition module (101) into a standard sentence by means of the accent module (416) and outputting the standard sentence. By means of the decoded language and standard language pair, the language of the non-native speaker can be converted into a corresponding standard language of a native speaker more accurately, and an accent module collection method can save a great amount of time.

Description

Speech recognition method and device for use by non-native speakers

Technical Field

The present invention relates to a speech recognition method and device, and in particular to a speech recognition method and device for use by non-native speakers.

Background Art
At present, speech recognition technology is widely used in many different fields, such as home control devices, smart speakers, personal assistants, and telephone interfaces. Commonly used speech recognition solutions and products include Amazon's Alexa, Google Assistant, and Apple's Siri. In addition, companies such as IBM, Apple, Amazon, and Google provide developers with powerful speech recognition APIs for developing further applications.

A powerful speech recognition engine requires substantial resources to develop, and it takes a long time to tune the engine to provide a high-quality speech-to-text function. Handling audio signals affected by noise, echo, and reverberation requires a broad range of engineering techniques. Developing a high-quality speech recognition module is therefore costly.

Most speech recognition engines are developed and tuned on native speakers: to build a speech recognition engine, speech samples must be collected from a large number of native speakers for the engine to achieve its best performance. When such an engine is used by non-native speakers who cannot pronounce the language as native speakers do, it does not work properly; when a non-native speaker speaks with an accent, the engine returns incorrect decoded text.

For any language, English for example, there are many non-native speakers who speak with different accents. Non-native speakers in different regions each have an accent based on their own native language. Collecting pronunciation samples from all non-native speakers would cost too much, and many new pronunciation modules would have to be developed to cover non-native speakers in many different regions. A low-cost, efficient method is therefore required to adapt a speech recognition engine to decode different accents.

Deep learning methods are commonly used to solve language problems. Long short-term memory (LSTM) and other recurrent neural networks have been applied in many deep learning fields, such as translation, natural language processing, and speech recognition. When an LSTM is used for translation, it usually performs letter-based or word-based sequence-to-sequence translation. For adapting to accents, neither letter-based nor word-based methods are effective, because they do not correlate fully with pronunciation.
Summary of the Invention

To solve the above problems, the present invention proposes using an existing speech recognition module to obtain a decoding result for the accented speech of non-native speakers, even though that decoding result may be incorrect. An accent module is derived from the decoding results of a non-native accent group, and the accent module is used to translate the decoded output of the speech recognition module.

To achieve the above objectives, the present invention provides the following technical scheme:
A speech recognition method for use by non-native speakers, the method including the following steps:

Step S10: the speech recognition module converts the received speech of the non-native speaker into a decoded language and transmits the decoded language to the language matching module and the non-native speech translation module respectively;

Step S11: according to the received decoded language, the language matching module retrieves the standard language corresponding to the decoded language stored in the language matching module; the standard language and the decoded language received from the speech recognition module form a decoded-language/standard-language pair, which is transmitted to the accent analyzer;

Step S12: the accent analyzer compares the received decoded-language/standard-language pair with the accent categories stored in the accent analyzer to determine the accent category corresponding to the pair, and sends that accent category to the accent module database;

Step S13: according to the accent category, the accent module database retrieves the accent module corresponding to the category and transmits the accent module to the non-native speech translation module;

Step S14: the non-native speech translation module uses the accent module to translate the decoded language from the speech recognition module into a standard sentence for output.

Preferably, the decoded language in step S10 is decoded text or decoded phonemes.

Preferably, the accent module in step S13 is obtained from decoded language received from non-native speakers together with manually added decoded language.

Preferably, the accent module contains multiple sequentially connected LSTM layers and one dense layer, with the last LSTM layer connected to the dense layer.
A speech recognition device for use by non-native speakers, including:

the speech recognition module, configured to convert the received speech of the non-native speaker into a decoded language and to transmit the decoded language to the language matching module and the non-native speech translation module respectively;

the language matching module, configured to retrieve, according to the received decoded language, the standard language corresponding to the decoded language stored in the language matching module; the standard language and the decoded language received from the speech recognition module form a decoded-language/standard-language pair, which is transmitted to the accent analyzer;

the accent analyzer, configured to compare the received decoded-language/standard-language pair with the accent categories stored in the accent analyzer to determine the accent category corresponding to the pair, and to send that accent category to the accent module database;

the accent module database, configured to retrieve, according to the accent category, the accent module corresponding to the category and to transmit the accent module to the non-native speech translation module;

the non-native speech translation module, configured to use the accent module to translate the decoded language from the speech recognition module into a standard sentence for output.

Compared with the prior art, the present invention has the following beneficial effects: in the speech recognition method for use by non-native speakers, the accent analyzer compares the received decoded-language/standard-language pair with the pairs stored in the accent analyzer to determine the accent category to which the decoded text belongs; this decoded-language/standard-language pairing converts the speech of non-native speakers into the corresponding standard native-speaker language more accurately. Collecting manually added language for the accent module saves a great deal of time and removes the time cost of collecting large amounts of speech from non-native speakers.
Brief Description of the Drawings

Figure 1 is a flowchart of the speech recognition method for use by non-native speakers according to the present invention;

Figure 2 is a flowchart of accent module collection in the speech recognition method for use by non-native speakers according to the present invention;

Figure 3 is a flowchart of the non-native speech translation module in the speech recognition method for use by non-native speakers according to the present invention.
Detailed Description of Embodiments

For a better understanding of the technical scheme of the present invention, the embodiments provided by the present invention are described in detail below with reference to the accompanying drawings. As shown in Figures 1-3, the method includes the following steps:
Step S10: the speech recognition module 101 converts the received speech of the non-native speaker into a decoded language and transmits the decoded language to the language matching module 111 and the non-native speech translation module 104 respectively. The decoded language may be decoded text or decoded phonemes, and the speech recognition module 101 also handles all noise, echo, and reverberation problems in speech signal processing.

Step S11: according to the received decoded language, the language matching module 111 retrieves the standard language corresponding to the decoded language stored in the language matching module 111; the standard language and the decoded language received from the speech recognition module 101 form a decoded-language/standard-language pair, which is transmitted to the accent analyzer 102. The language matching module 111 stores a large number of standard languages.

Step S12: the accent analyzer 102 compares the received decoded-language/standard-language pair with the accent categories stored in the accent analyzer 102 to determine the accent category corresponding to the pair, and sends that accent category to the accent module database 103. Each accent category 303 stored in the analyzer 102 contains many pairs of decoded language 313 and standard language 323, which likewise come from the speech recognition module 101 and the language matching module 111 respectively. During a preliminary storage stage before use, the analyzer 102 collects and stores a large number of accent categories.

Step S13: according to the accent category, the accent module database 103 retrieves the accent module 416 corresponding to the category from the accent module database 103 and transmits the accent module 416 to the non-native speech translation module 104. The accent module database 103 stores multiple accent modules 416.

As shown in Figure 2, the accent module 416 is collected as follows: a small amount of speech from non-native speakers and a large amount of centrally, manually added language are received through the speech recognition module 101, converted into decoded language, and transmitted to the data processing unit 404. The data processing unit 404 receives the decoded language from the small number of non-native speakers via the speech recognition module 101 together with the large amount of manually added decoded language, normalizes the format of the decoded language, and then sends the format-processed decoded language to the data embedding module 405. The data embedding module 405 converts the format-processed decoded language into digital codes and sends them to the accent module 416. Each accent module 416 contains multiple sequentially connected LSTM layers 406 (long short-term memory layers) and one dense layer 408, with the last LSTM layer 406 connected to the dense layer 408.

Step S14: the non-native speech translation module 104 uses the accent module 416 to translate the decoded language from the speech recognition module 101 into standard sentences for output. As shown in Figure 3, the non-native speech translation module 104 includes the data processing unit 404, the data embedding module 405, and the received accent module 416; the data processing unit 404 is connected to the data embedding module 405, and the data embedding module 405 is connected to the accent module 416. The data processing unit 404 normalizes the format of the received decoded language and then sends the format-processed decoded language to the data embedding module 405. The data embedding module 405 converts the format-processed decoded language into digital codes and sends them to the accent module 416. Each accent module 416 contains multiple sequentially connected LSTM layers 406 (long short-term memory layers) and one dense layer 408, with the last LSTM layer 406 connected to the dense layer 408. After receiving the digital codes, the sequentially connected LSTM layers 406 convert them into a two-dimensional code and pass it to the dense layer 408, and the dense layer 408 converts the two-dimensional code into standard sentences and outputs them.
A speech recognition apparatus for use by non-native speakers, comprising:
The speech recognition module 101 is configured to convert received speech from a non-native speaker into a decoded language, and to transmit the decoded language to the language matching module 111 and the non-native speech translation module 104 respectively.
The language matching module 111 is configured to retrieve, according to the received decoded language, the standard language stored in the language matching module 111 that corresponds to that decoded language; the standard language and the decoded language received from the speech recognition module 101 form a decoded-language/standard-language pair, which is transmitted to the accent analyzer 102.
The accent analyzer 102 is configured to compare the received decoded-language/standard-language pair against the accent categories stored in the accent analyzer 102, to determine the accent category corresponding to the pair, and to send the accent category to the accent module database 103.
The accent module database 103 is configured to retrieve, according to the accent category, the accent module 416 corresponding to that accent category from the accent module database 103, and to transmit the accent module 416 to the non-native speech translation module 104.
The non-native speech translation module 104 is configured to use the accent module 416 to translate the decoded language from the speech recognition module 101 into standard sentences for output.
The decoded language is decoded text or decoded phonemes. The accent module 416 is obtained by receiving the decoded language of non-native speakers together with manually added decoded language. The accent module 416 contains a plurality of sequentially connected LSTM layers and one dense layer, the last LSTM layer being connected to the dense layer.
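Taken together, the apparatus above can be sketched as a chain of five cooperating components. The class names, method names, and toy lookup tables below are assumptions made for illustration; the patent specifies only the modules' responsibilities and reference numerals, not any concrete interface.

```python
# Illustrative end-to-end sketch of modules 101, 111, 102, 103, 104.
# All lookup tables and the string-replacement "accent module" are toy
# stand-ins for the stored standard languages and trained LSTM models.

class SpeechRecognitionModule:            # module 101
    def recognize(self, audio):
        return audio.lower()              # stand-in for real decoding

class LanguageMatchingModule:             # module 111
    def __init__(self, store):
        self.store = store                # decoded language -> standard language
    def match(self, decoded):
        return decoded, self.store[decoded]

class AccentAnalyzer:                     # module 102
    def analyze(self, decoded, standard):
        # toy rule: any mismatch is classed as one accent category
        return "accent_A" if decoded != standard else "standard"

class AccentModuleDatabase:               # module 103
    def __init__(self, modules):
        self.modules = modules            # accent category -> accent module 416
    def fetch(self, category):
        return self.modules[category]

class NonNativeTranslationModule:         # module 104
    def translate(self, decoded, accent_module):
        return accent_module(decoded)     # accent module 416 does the mapping

def pipeline(audio):
    recog = SpeechRecognitionModule()
    matcher = LanguageMatchingModule({"zis is good": "this is good"})
    analyzer = AccentAnalyzer()
    db = AccentModuleDatabase({"accent_A": lambda s: s.replace("zis", "this"),
                               "standard": lambda s: s})
    translator = NonNativeTranslationModule()

    decoded = recog.recognize(audio)                     # step S10
    pair = matcher.match(decoded)                        # step S11
    category = analyzer.analyze(*pair)                   # step S12
    accent_module = db.fetch(category)                   # step S13
    return translator.translate(decoded, accent_module)  # step S14

print(pipeline("Zis is good"))  # -> this is good
```

The sketch mirrors steps S10 through S14 one-to-one; in the described apparatus, the database lookup in step S13 would return a trained per-accent model rather than a string replacement.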
The speech recognition method for use by non-native speakers provided by the embodiments of the present invention has been described in detail above. A person of ordinary skill in the art may, following the ideas of the embodiments of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (8)

  1. A speech recognition method for use by non-native speakers, the method comprising the following steps:
    Step S10: the speech recognition module (101) converts received speech from a non-native speaker into a decoded language, and transmits the decoded language to the language matching module (111) and the non-native speech translation module (104) respectively;
    Step S11: the language matching module (111) retrieves, according to the received decoded language, the standard language stored in the language matching module (111) that corresponds to that decoded language; the standard language and the decoded language received from the speech recognition module (101) form a decoded-language/standard-language pair, which is transmitted to the accent analyzer (102);
    Step S12: the accent analyzer (102) compares the received decoded-language/standard-language pair against the accent categories stored in the accent analyzer (102) to determine the accent category corresponding to the pair, and sends the accent category to the accent module database (103);
    Step S13: the accent module database (103) retrieves, according to the accent category, the accent module (416) corresponding to that accent category from the accent module database (103), and transmits the accent module (416) to the non-native speech translation module (104);
    Step S14: the non-native speech translation module (104) uses the accent module (416) to translate the decoded language from the speech recognition module (101) into standard sentences for output.
  2. The speech recognition method for use by non-native speakers according to claim 1, wherein the decoded language in step S10 is decoded text or decoded phonemes.
  3. The speech recognition method for use by non-native speakers according to claim 1, wherein the accent module (416) in step S13 is obtained by receiving the decoded language of non-native speakers together with manually added decoded language.
  4. The speech recognition method for use by non-native speakers according to claim 1, wherein each accent module (416) contains a plurality of sequentially connected LSTM layers and one dense layer, the last LSTM layer being connected to the dense layer.
  5. A speech recognition apparatus for use by non-native speakers, comprising:
    the speech recognition module (101), configured to convert received speech from a non-native speaker into a decoded language, and to transmit the decoded language to the language matching module (111) and the non-native speech translation module (104) respectively;
    the language matching module (111), configured to retrieve, according to the received decoded language, the standard language stored in the language matching module (111) that corresponds to that decoded language, the standard language and the decoded language received from the speech recognition module (101) forming a decoded-language/standard-language pair, which is transmitted to the accent analyzer (102);
    the accent analyzer (102), configured to compare the received decoded-language/standard-language pair against the accent categories stored in the accent analyzer (102), to determine the accent category corresponding to the pair, and to send the accent category to the accent module database (103);
    the accent module database (103), configured to retrieve, according to the accent category, the accent module (416) corresponding to that accent category from the accent module database (103), and to transmit the accent module (416) to the non-native speech translation module (104);
    the non-native speech translation module (104), configured to use the accent module (416) to translate the decoded language from the speech recognition module (101) into standard sentences for output.
  6. The speech recognition apparatus for use by non-native speakers according to claim 5, wherein the decoded language is decoded text or decoded phonemes.
  7. The speech recognition apparatus for use by non-native speakers according to claim 5, wherein the accent module (416) is obtained by receiving the decoded language of non-native speakers together with manually added decoded language.
  8. The speech recognition apparatus for use by non-native speakers according to claim 5, wherein the accent module (416) contains a plurality of sequentially connected LSTM layers and one dense layer, the last LSTM layer being connected to the dense layer.
PCT/CN2019/093947 2019-06-29 2019-06-29 Speech recognition method and apparatus used by non-native speaker WO2021000068A1 (en)


Publications (1)

Publication Number Publication Date
WO2021000068A1 true WO2021000068A1 (en) 2021-01-07

Family

ID=74100097


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080147404A1 (en) * 2000-05-15 2008-06-19 Nusuara Technologies Sdn Bhd System and methods for accent classification and adaptation
CN101650943A (en) * 2008-12-19 2010-02-17 中国科学院声学研究所 Non-native speech recognition system and method thereof
CN105408952A (en) * 2013-02-21 2016-03-16 谷歌技术控股有限责任公司 Recognizing accented speech
CN107452379A (en) * 2017-08-17 2017-12-08 广州腾猴科技有限公司 The identification technology and virtual reality teaching method and system of a kind of dialect language
CN108346426A (en) * 2018-02-01 2018-07-31 威盛电子股份有限公司 Speech recognition equipment and audio recognition method
CN108682420A (en) * 2018-05-14 2018-10-19 平安科技(深圳)有限公司 A kind of voice and video telephone accent recognition method and terminal device
CN109785832A (en) * 2018-12-20 2019-05-21 安徽声讯信息技术有限公司 A kind of old man's set-top box Intelligent voice recognition method suitable for accent again



Legal Events

121  Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19936516; Country of ref document: EP; Kind code of ref document: A1)

NENP  Non-entry into the national phase (Ref country code: DE)

32PN  Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.05.2022))

122  Ep: pct application non-entry in european phase (Ref document number: 19936516; Country of ref document: EP; Kind code of ref document: A1)