WO2017166631A1 - Voice signal processing method, apparatus and electronic device


Info

Publication number
WO2017166631A1
Authority
WO
WIPO (PCT)
Prior art keywords
language model
information string
scene
recognized
word sequence
Application number
PCT/CN2016/096828
Other languages
French (fr)
Chinese (zh)
Inventor
王彪
Original Assignee
乐视控股(北京)有限公司
乐视致新电子科技(天津)有限公司
Application filed by 乐视控股(北京)有限公司 and 乐视致新电子科技(天津)有限公司
Publication of WO2017166631A1


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/28 - Constructional details of speech recognition systems

Definitions

  • the embodiments of the present invention relate to the field of voice recognition technologies, and in particular, to a voice signal processing method, apparatus, and electronic device.
  • Speech recognition technology has developed rapidly in recent years, enabling users to interact with smart devices via voice.
  • Speech recognition technology is a technique for transforming a speech signal into a corresponding text or command through an identification and parsing process.
  • The process of recognizing and parsing speech signals relies on a language model (Language Model, LM).
  • The purpose of a language model is to establish a distribution that describes the probability that a given word sequence appears in the language.
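  • For illustration only, the following is a minimal sketch of what such a distribution can look like in practice: a toy bigram language model whose corpus, vocabulary, and probabilities are purely illustrative assumptions, not taken from the patent.
```python
from collections import defaultdict

# Toy bigram language model: P(w_i | w_{i-1}) estimated from a tiny illustrative corpus.
corpus = [
    ["i", "want", "to", "call", "mom"],
    ["i", "want", "to", "play", "music"],
    ["please", "call", "mom"],
]

bigram_counts = defaultdict(lambda: defaultdict(int))
unigram_counts = defaultdict(int)
for sentence in corpus:
    tokens = ["<s>"] + sentence
    for prev, cur in zip(tokens, tokens[1:]):
        bigram_counts[prev][cur] += 1
        unigram_counts[prev] += 1

def sequence_probability(words):
    """Probability the toy model assigns to a word sequence (zero if unseen)."""
    prob = 1.0
    tokens = ["<s>"] + words
    for prev, cur in zip(tokens, tokens[1:]):
        if unigram_counts[prev] == 0:
            return 0.0
        prob *= bigram_counts[prev][cur] / unigram_counts[prev]
    return prob

print(sequence_probability(["i", "want", "to", "call", "mom"]))
```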
  • A general language model mainly includes general word sequences and the probabilities that those word sequences appear in the language, and is used to recognize speech signals in the general domain.
  • However, as application scenarios multiply and users' language habits keep changing, existing general language models cannot meet these application requirements, which reduces the accuracy of speech recognition.
  • Embodiments of the present invention provide a voice signal processing method, apparatus, and electronic device for performing speech recognition, which improve the accuracy of speech signal recognition.
  • An embodiment of the invention provides a voice signal processing method, including: acquiring an information string corresponding to a speech signal to be recognized; determining, according to the information string, a scene language model corresponding to the speech signal to be recognized; determining whether a word sequence corresponding to the information string exists in the scene language model; if so, increasing the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and performing speech recognition on the speech signal to be recognized according to the enhanced scene language model.
  • An embodiment of the present invention provides a voice signal processing apparatus, including:
  • an acquiring module, configured to acquire an information string corresponding to the speech signal to be recognized;
  • a determining module, configured to determine, according to the information string, a scene language model corresponding to the speech signal to be recognized;
  • a judging module, configured to determine whether a word sequence corresponding to the information string exists in the scene language model;
  • an enhancement module, configured to, if the determination result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model;
  • an identifying module, configured to perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
  • Embodiments of the present invention further provide an electronic device including at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: acquire an information string corresponding to the speech signal to be recognized; determine, according to the information string, a scene language model corresponding to the speech signal to be recognized; determine whether a word sequence corresponding to the information string exists in the scene language model; if so, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
  • Embodiments of the present invention also provide a non-volatile computer storage medium storing computer-executable instructions that, when executed by an electronic device, enable the electronic device to: acquire an information string corresponding to the speech signal to be recognized; determine, according to the information string, a scene language model corresponding to the speech signal to be recognized; determine whether a word sequence corresponding to the information string exists in the scene language model; if so, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
  • Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions that, when executed by a computer, cause the computer to execute the above speech signal processing method.
  • The voice signal processing method, apparatus, and electronic device provided by the embodiments determine a scene language model corresponding to the speech signal to be recognized according to the information string corresponding to that signal; when a word sequence corresponding to the information string exists in the scene language model, the probability of that word sequence appearing in the language is increased to obtain an enhanced scene language model, and speech recognition is performed on the speech signal to be recognized based on the enhanced model. Compared with speech recognition schemes based on a general language model, the embodiments of the present invention can improve the accuracy of speech recognition.
  • FIG. 1 is a schematic flowchart of a voice signal processing method according to an embodiment of the present invention;
  • FIG. 2 is a schematic flowchart of a voice signal processing method according to another embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a voice signal processing apparatus according to another embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of hardware of an electronic device according to an embodiment of the present invention.
  • In the description of the present invention, it should be noted that the terms "installation", "connected", and "connection" are to be understood broadly unless otherwise explicitly specified and defined: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, indirect through an intermediate medium, or internal between two components; and it may be a wired or a wireless connection.
  • The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
  • In the field of speech recognition, a general language model is mostly used; it mainly includes general word sequences and the probabilities that those word sequences appear in the language, and is used to recognize speech signals in the general domain.
  • However, as application scenarios multiply and users' language habits keep changing, existing general language models cannot meet these application requirements, which reduces the accuracy of speech recognition.
  • The main principle is: determine a scene language model corresponding to the speech signal to be recognized, increase the probability that the corresponding word sequence in the scene language model appears in the language to obtain an enhanced scene language model, and perform speech recognition on the speech signal to be recognized based on the enhanced scene language model.
  • Compared with a general language model, the scene language model contains more word sequences related to the application scene (also called specific word sequences), and the probability that word sequences related to the speech signal to be recognized appear in the language is increased in advance, so recognizing the speech signal based on the enhanced scene language model can improve the accuracy of speech recognition.
  • FIG. 1 is a schematic flowchart of a voice signal processing method according to an embodiment of the present invention. As shown in FIG. 1, the method includes steps 101 to 105.
  • Determine whether a word sequence corresponding to the information string exists in the scene language model; if the determination result is yes, execute step 104; if the determination result is no, optionally, end the operation or perform speech recognition on the speech signal to be recognized according to the scene language model.
  • the embodiment provides a voice signal processing method, which can be executed by a voice signal processing device to improve the accuracy of voice signal recognition.
  • the voice signal processing device first acquires the information string corresponding to the voice signal to be recognized.
  • the information string refers to a string of information that can reflect the speech signal to be recognized to a certain extent, and may be, for example, a Pinyin string corresponding to the speech signal to be recognized, or an initial text string obtained by performing initial speech recognition on the speech signal to be recognized.
  • the speech signal processing device determines a scene language model corresponding to the speech signal to be recognized according to the information string, so as to perform speech recognition on the speech signal to be recognized based on the scene language model.
  • Optionally, determining the scene language model corresponding to the speech signal to be recognized according to the information string may be implemented as follows: semantically parse the information string and determine the grammatical sentence pattern and the entity word in it; determine, according to the sentence pattern and the entity word, the user intent expressed by the speech signal; and determine, according to the user intent, the scene language model corresponding to the speech signal to be recognized.
  • For example, if the information string corresponding to the speech signal to be recognized is "I want to call Xiao Li", semantic parsing determines that the sentence pattern is "I want to call ..." and the entity word is "Xiao Li". From the sentence pattern and the entity word it can be determined that the user's intent is to call someone, so the scene language model corresponding to the speech signal can be determined to be the phone scene language model rather than the search scene language model.
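  • As a hedged illustration of this step (not the patent's own implementation), the sketch below maps an information string to a scene by matching a fixed sentence pattern and extracting the entity word; the patterns and scene names are assumptions made for demonstration only.
```python
import re

# Illustrative grammar patterns mapping a fixed sentence pattern to a scene.
SCENE_PATTERNS = [
    (re.compile(r"^i want to call (?P<entity>.+)$"), "phone_scene"),
    (re.compile(r"^please play (?P<entity>.+)$"), "music_scene"),
    (re.compile(r"^search for (?P<entity>.+)$"), "search_scene"),
]

def select_scene_model(info_string):
    """Return (scene_name, entity_word) for the information string, or (None, None)."""
    text = info_string.strip().lower()
    for pattern, scene in SCENE_PATTERNS:
        match = pattern.match(text)
        if match:
            return scene, match.group("entity")
    return None, None

# "I want to call Xiao Li" -> the phone scene model, entity word "xiao li"
print(select_scene_model("I want to call Xiao Li"))
```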
  • After the scene language model corresponding to the speech signal to be recognized is determined, speech recognition is not performed directly based on that model; instead, the probability that the corresponding word sequence in the scene language model appears in the language is first increased, in order to improve recognition accuracy.
  • Because the information string reflects the speech signal to be recognized to a certain extent, the speech signal is more likely to be recognized as the word sequence corresponding to the information string than as other word sequences; based on this, the word sequence corresponding to the information string is taken as the word sequence in the scene language model whose probability needs to be increased.
  • Before increasing that probability, it is first determined whether a word sequence corresponding to the information string exists in the scene language model corresponding to the speech signal to be recognized; if the determination result is yes, that is, the word sequence corresponding to the information string exists in the scene language model, the probability that this word sequence appears in the language in the scene language model is increased to obtain an enhanced scene language model, and speech recognition is then performed on the speech signal to be recognized based on the enhanced scene language model.
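  • A minimal sketch of the "increase the probability" step, assuming the scene language model can be viewed as a mapping from word sequences to probabilities; the boost factor and the renormalization strategy are illustrative assumptions, not prescribed by the patent.
```python
def enhance_scene_model(scene_model, target_sequence, boost=5.0):
    """
    scene_model: dict mapping a word sequence (tuple of words) to its probability
    (a toy fragment of a real model). Returns a copy in which the probability of
    target_sequence is increased and all entries are renormalized to sum to 1.
    """
    if target_sequence not in scene_model:
        return dict(scene_model)  # nothing to enhance
    enhanced = dict(scene_model)
    enhanced[target_sequence] *= boost
    total = sum(enhanced.values())
    return {seq: p / total for seq, p in enhanced.items()}

scene_model = {
    ("i", "want", "to", "call", "xiao", "li"): 0.02,
    ("i", "want", "to", "call", "xiao", "wang"): 0.02,
    ("please", "play", "a", "song"): 0.10,
}
enhanced = enhance_scene_model(scene_model, ("i", "want", "to", "call", "xiao", "li"))
print(enhanced[("i", "want", "to", "call", "xiao", "li")])
```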
  • In an optional embodiment, the scene language model corresponding to the speech signal to be recognized includes a grammar file and a scene dictionary.
  • The grammar file stores the various grammatical sentence patterns used in the application scene corresponding to the scene language model, that is, fixed expressions such as "please call ...", "please play the song ...", or "please search for the lyrics of ...".
  • The scene dictionary stores the entity words commonly used in that application scene; for example, in the phone application scene the entity words may be the names of the contacts in the address book, and in a voice-controlled music playback scene they may be the names of the songs in the music library.
  • Based on the above, determining whether a word sequence corresponding to the information string exists in the scene language model corresponding to the speech signal to be recognized may be implemented as follows: semantically parse the information string and determine the grammatical sentence pattern and the entity word in it; judge whether the fixed sentence pattern is contained in the grammar file of the scene language model and whether the entity word is contained in its scene dictionary; if both judgment results are yes, a word sequence corresponding to the information string exists in the scene language model, and the word sequence composed of the fixed sentence pattern and the entity word is that word sequence.
  • It is worth noting that both the process of determining the scene language model corresponding to the signal to be recognized and the process of judging whether a word sequence corresponding to the information string exists in that model include semantically parsing the information string and determining its grammatical sentence pattern and entity word; in a specific implementation, this operation may be performed only once, or once in each of the two processes.
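  • A sketch of this existence check under the assumption that the grammar file is a set of fixed sentence patterns with an entity slot and the scene dictionary is a set of entity words; the file contents and the "{entity}" placeholder convention are assumptions for illustration.
```python
# Toy "phone scene" model: a grammar file of fixed sentence patterns and a
# scene dictionary of entity words (contact names). Contents are illustrative.
PHONE_GRAMMAR = {"i want to call {entity}", "please call {entity}"}
PHONE_DICTIONARY = {"xiao li", "xiao wang", "mom"}

def word_sequence_for(sentence_pattern, entity):
    """If the pattern is in the grammar file and the entity is in the scene
    dictionary, return the word sequence they compose; otherwise None."""
    if sentence_pattern in PHONE_GRAMMAR and entity in PHONE_DICTIONARY:
        return tuple(sentence_pattern.replace("{entity}", entity).split())
    return None

print(word_sequence_for("i want to call {entity}", "xiao li"))
# ('i', 'want', 'to', 'call', 'xiao', 'li')
```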
  • As can be seen from the above, the scene language model corresponding to the speech signal to be recognized in this embodiment includes word sequences related to the application scene, and the probability that the word sequences which may be the recognition result of the speech signal appear in the language is further increased; therefore, recognizing the speech signal based on the enhanced scene language model can improve the accuracy of speech recognition.
  • In an optional embodiment, a general language model may first be used to perform speech recognition on the speech signal to be recognized; when the general language model cannot identify the word sequence corresponding to the signal, the method provided by the embodiment of the present invention is then used to recognize it. The flow of this embodiment is shown in FIG. 2 and includes the following steps:
  • Step 201: Determine whether the general language model has identified the word sequence corresponding to the speech signal to be recognized; if the determination result is yes, end the operation; if the determination result is no, perform step 202.
  • Step 204: Determine whether a word sequence corresponding to the information string exists in the scene language model; if the determination result is yes, perform step 205; if the determination result is no, optionally, perform step 207.
  • The general language model may also be called a large language model, and the scene language model may also be called a small language model.
  • In one optional embodiment, speech recognition may be performed on the speech signal to be recognized based on the enhanced scene language model alone.
  • In another optional embodiment, the speech signal to be recognized may be recognized by combining the general language model and the enhanced scene language model.
  • It is worth noting that, whether the general language model or the enhanced scene language model is used, the process of performing speech recognition on the speech signal to be recognized is similar to the prior-art process of performing speech recognition based on a general language model, and is not described in detail here.
  • One implementation of combining the general language model and the enhanced scene language model to recognize the speech signal is as follows:
  • the enhanced scene language model can be superimposed onto the general language model to generate a composite language model (in effect a larger language model), and speech recognition is then performed on the speech signal to be recognized based on the composite language model.
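  • One plausible (assumed) reading of "superimposing" the two models is linear interpolation of the probabilities they assign; the sketch below shows that idea over flat sequence-to-probability mappings with an arbitrary interpolation weight. A production system would more likely merge full n-gram models.
```python
def composite_probability(general_model, scene_model, sequence, scene_weight=0.3):
    """Linearly interpolate the probabilities the two models assign to a sequence.
    Both models are flat dicts mapping word-sequence tuples to probabilities."""
    p_general = general_model.get(sequence, 0.0)
    p_scene = scene_model.get(sequence, 0.0)
    return (1.0 - scene_weight) * p_general + scene_weight * p_scene

general_model = {("call", "xiao", "li"): 0.01}
enhanced_scene_model = {("call", "xiao", "li"): 0.20}
print(composite_probability(general_model, enhanced_scene_model, ("call", "xiao", "li")))
```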
  • Another implementation of recognizing the speech signal to be recognized includes:
  • first using the general language model to perform speech recognition on the speech signal, obtaining the candidate word sequences corresponding to it and the first probability that each candidate word sequence appears in the language according to the general language model; obtaining from the enhanced scene language model a second probability that each candidate word sequence appears in the language; weighting the first probability and the second probability of each candidate word sequence; and, according to the weighted results, selecting from the candidate word sequences the word sequence finally corresponding to the speech signal to be recognized.
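  • A sketch of this rescoring variant, assuming the general model has already produced candidate word sequences with their first probabilities; the weights and candidates are illustrative.
```python
def rescore_candidates(candidates, scene_model, w_general=0.6, w_scene=0.4):
    """
    candidates: list of (word_sequence, first_probability) pairs produced by the
    general language model. Returns the word sequence with the best weighted score.
    """
    def score(item):
        sequence, p_general = item
        p_scene = scene_model.get(sequence, 0.0)  # second probability
        return w_general * p_general + w_scene * p_scene
    return max(candidates, key=score)[0]

candidates = [
    (("i", "want", "to", "call", "xiao", "li"), 0.10),
    (("i", "want", "to", "call", "xiao", "lee"), 0.12),
]
scene_model = {("i", "want", "to", "call", "xiao", "li"): 0.80}
print(rescore_candidates(candidates, scene_model))
```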
  • Yet another implementation of recognizing the speech signal to be recognized includes:
  • using the general language model to perform speech recognition on the speech signal and obtain the first candidate word sequences and the probability that each of them appears in the language; using the enhanced scene language model to perform speech recognition on the speech signal and obtain the second candidate word sequences and the probability that each of them appears in the language; and, according to these probabilities, selecting from the first and second candidate word sequences the word sequence finally corresponding to the speech signal to be recognized.
  • For a candidate word sequence that appears in both the first and the second candidate lists, its two probabilities may be weighted and summed as its final probability.
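  • A sketch of this variant under the same flat-dictionary assumption: candidates proposed by both models get a weighted sum of their two probabilities, while candidates proposed by only one model keep that model's probability.
```python
def merge_candidate_lists(general_candidates, scene_candidates,
                          w_general=0.5, w_scene=0.5):
    """general_candidates / scene_candidates: dicts mapping a candidate word
    sequence to the probability the respective model assigns it. Returns the
    best-scoring candidate overall."""
    final_scores = {}
    for sequence in set(general_candidates) | set(scene_candidates):
        p1 = general_candidates.get(sequence)
        p2 = scene_candidates.get(sequence)
        if p1 is not None and p2 is not None:
            # Same candidate in both lists: weighted sum as its final probability.
            final_scores[sequence] = w_general * p1 + w_scene * p2
        else:
            final_scores[sequence] = p1 if p1 is not None else p2
    return max(final_scores, key=final_scores.get)

first = {("call", "xiao", "li"): 0.10, ("call", "xiao", "lee"): 0.12}
second = {("call", "xiao", "li"): 0.60}
print(merge_candidate_lists(first, second))  # ('call', 'xiao', 'li')
```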
  • Compared with using only a general language model, combining the general language model and the enhanced scene language model to recognize the speech signal makes full use of the fact that the general language model contains more general word sequences while the enhanced scene language model contains more word sequences related to the application scene, improving the accuracy of speech recognition.
  • FIG. 3 is a schematic structural diagram of a voice signal processing apparatus according to still another embodiment of the present invention. As shown in FIG. 3, the apparatus includes: an obtaining module 31, a determining module 32, a judging module 33, an enhancement module 34, and an identification module 35.
  • the obtaining module 31 is configured to acquire a string of information corresponding to the voice signal to be identified.
  • the determining module 32 is configured to determine a scene language model corresponding to the to-be-identified voice signal according to the information string corresponding to the to-be-identified voice signal.
  • The judging module 33 is configured to determine whether a word sequence corresponding to the information string exists in the scene language model corresponding to the speech signal to be recognized.
  • the enhancement module 34 is configured to: if the determination result is yes, increase a probability that a word sequence corresponding to the information string appears in the language in the scene language model corresponding to the to-be-identified voice signal, to obtain an enhanced scene language model.
  • the identification module 35 is configured to perform voice recognition on the voice signal to be recognized according to the enhanced scene language model.
  • In an optional embodiment, the determining module 32 is specifically configured to:
  • semantically parse the information string corresponding to the speech signal to be recognized, determine the grammatical sentence pattern and the entity word in the information string, determine the user intent expressed by the speech signal according to the sentence pattern and the entity word, and determine the scene language model corresponding to the speech signal to be recognized according to the user intent.
  • In an optional embodiment, the scene language model corresponding to the speech signal to be recognized includes a grammar file and a scene dictionary. Based on this, the judging module 33 is specifically configured to:
  • semantically parse the information string corresponding to the speech signal to be recognized, determine the grammatical sentence pattern and the entity word in the information string, and judge whether the fixed sentence pattern is contained in the grammar file and whether the entity word is contained in the scene dictionary;
  • if both judgment results are yes, determine that a word sequence corresponding to the information string exists in the scene language model, the word sequence composed of the fixed sentence pattern and the entity word being that word sequence.
  • In an optional embodiment, the obtaining module 31 is specifically configured to:
  • acquire a pinyin string corresponding to the speech signal to be recognized, or perform initial speech recognition on the speech signal to obtain an initial text string, as the information string corresponding to the speech signal to be recognized.
  • In an optional embodiment, the identification module 35 is specifically configured to:
  • perform speech recognition on the speech signal to be recognized based on the enhanced scene language model alone.
  • Alternatively, the identification module 35 is specifically configured to: first use the general language model to perform speech recognition on the speech signal to be recognized, obtaining the candidate word sequences corresponding to it (usually multiple groups) and the first probability that each candidate word sequence appears in the language according to the general language model; obtain from the enhanced scene language model a second probability that each candidate word sequence appears in the language; weight the first probability and the second probability of each candidate word sequence; and, according to the weighted results, select from the candidate word sequences the word sequence finally corresponding to the speech signal to be recognized.
  • In an optional embodiment, the identification module 35 is specifically configured to: use the general language model to perform speech recognition on the speech signal to be recognized, obtaining the first candidate word sequences and the probability that each of them appears in the language; use the enhanced scene language model to perform speech recognition on the speech signal, obtaining the second candidate word sequences and the probability that each of them appears in the language; and, according to these probabilities, select from the first and second candidate word sequences the word sequence finally corresponding to the speech signal to be recognized.
  • For a candidate word sequence that appears in both the first and the second candidate lists, its two probabilities may be weighted and summed as its final probability.
  • The voice signal processing apparatus provided by this embodiment determines a scene language model corresponding to the speech signal to be recognized according to the information string corresponding to that signal, increases the probability that the word sequence corresponding to the information string appears in the language in the scene language model to obtain an enhanced scene language model, and performs speech recognition on the speech signal based on the enhanced scene language model rather than using a general language model as in the prior art, which can improve the accuracy of speech recognition.
  • An embodiment of the present invention further provides an electronic device including at least one processor 810 and a memory 800 communicably connected to the at least one processor 810. The memory 800 stores instructions executable by the at least one processor 810,
  • and the instructions are executed by the at least one processor 810 so that the at least one processor 810 can: acquire an information string corresponding to the speech signal to be recognized; determine, according to the information string, the scene language model corresponding to the speech signal to be recognized; determine whether a word sequence corresponding to the information string exists in the scene language model; if the determination result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
  • The electronic device also includes an input device 830 and an output device 840 that are electrically connected to the memory 800 and the processor 810; the electrical connections are preferably implemented via a bus.
  • Embodiments of the present invention also provide a non-volatile computer storage medium storing computer-executable instructions that, when executed by an electronic device, enable the electronic device to: acquire an information string corresponding to the speech signal to be recognized; determine, according to the information string, a scene language model corresponding to the speech signal to be recognized; determine whether a word sequence corresponding to the information string exists in the scene language model; if so, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
  • Embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Abstract

A voice signal processing method, an apparatus, and an electronic device. The voice signal processing method comprises: acquiring an information string corresponding to a voice signal to be recognized (101); determining, according to the information string, a scene language model corresponding to the voice signal to be recognized (102); determining whether a word sequence corresponding to the information string exists in the scene language model (103); if the determination result is yes, increasing the probability that the word sequence corresponding to the information string appears in the language in the scene language model, so as to obtain an enhanced scene language model (104); and performing voice recognition on the voice signal to be recognized according to the enhanced scene language model (105). The present embodiments can improve the accuracy of voice signal recognition.

Description

Voice signal processing method, apparatus and electronic device
Cross Reference
This application claims priority to Chinese Patent Application No. 201610195611.5, filed with the Chinese Patent Office on March 30, 2016 and entitled "A Voice Signal Processing Method and Apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present invention relate to the field of voice recognition technology, and in particular to a voice signal processing method, apparatus, and electronic device.
Background
Speech recognition technology has developed rapidly in recent years, enabling users to interact with smart devices by voice. Speech recognition technology transforms a speech signal into corresponding text or commands through a process of recognition and parsing. This process is inseparable from the language model (Language Model, LM); the purpose of a language model is to establish a distribution that describes the probability that a given word sequence appears in the language.
In the field of speech recognition, a general language model is mostly used; it mainly includes general word sequences and the probabilities that those word sequences appear in the language, and is used to recognize speech signals in the general domain. However, with the development of the times, the increase in application scenarios, and the constant changes in users' language habits, existing general language models cannot meet these application requirements, which reduces the accuracy of speech recognition.
Summary of the Invention
Embodiments of the present invention provide a voice signal processing method, apparatus, and electronic device for performing speech recognition, which improve the accuracy of speech signal recognition.
An embodiment of the invention provides a voice signal processing method, including:
acquiring an information string corresponding to a speech signal to be recognized;
determining, according to the information string, a scene language model corresponding to the speech signal to be recognized;
determining whether a word sequence corresponding to the information string exists in the scene language model;
if the determination result is yes, increasing the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and
performing speech recognition on the speech signal to be recognized according to the enhanced scene language model.
An embodiment of the present invention provides a voice signal processing apparatus, including:
an acquiring module, configured to acquire an information string corresponding to the speech signal to be recognized;
a determining module, configured to determine, according to the information string, a scene language model corresponding to the speech signal to be recognized;
a judging module, configured to determine whether a word sequence corresponding to the information string exists in the scene language model;
an enhancement module, configured to, if the determination result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and
an identifying module, configured to perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
An embodiment of the present invention further provides an electronic device including at least one processor and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can: acquire an information string corresponding to the speech signal to be recognized; determine, according to the information string, a scene language model corresponding to the speech signal to be recognized; determine whether a word sequence corresponding to the information string exists in the scene language model; if the determination result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
Embodiments of the present invention also provide a non-volatile computer storage medium storing computer-executable instructions that, when executed by an electronic device, enable the electronic device to: acquire an information string corresponding to the speech signal to be recognized; determine, according to the information string, a scene language model corresponding to the speech signal to be recognized; determine whether a word sequence corresponding to the information string exists in the scene language model; if the determination result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions that, when executed by a computer, cause the computer to execute the above speech signal processing method.
The voice signal processing method, apparatus, and electronic device provided by the embodiments of the present invention determine a scene language model corresponding to the speech signal to be recognized according to the information string corresponding to that signal, and, when a word sequence corresponding to the information string exists in the scene language model, increase the probability that this word sequence appears in the language to obtain an enhanced scene language model, and then perform speech recognition on the speech signal based on the enhanced scene language model. Compared with speech recognition schemes based on a general language model in the prior art, the embodiments of the present invention can improve the accuracy of speech recognition based on the enhanced scene language model.
Brief Description of the Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a voice signal processing method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a voice signal processing method according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a voice signal processing apparatus according to yet another embodiment of the present invention;
FIG. 4 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.
In the description of the present invention, it should be noted that orientation or position terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings, are used only to facilitate and simplify the description of the invention, and do not indicate or imply that the device or component referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the invention. Moreover, the terms "first", "second", and "third" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance.
In the description of the present invention, it should also be noted that the terms "installation", "connected", and "connection" are to be understood broadly unless otherwise explicitly specified and defined: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, indirect through an intermediate medium, or internal between two components; and it may be a wired or a wireless connection. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict.
In the field of speech recognition, a general language model is mostly used; it mainly includes general word sequences and the probabilities that those word sequences appear in the language, and is used to recognize speech signals in the general domain. However, with the development of the times, the increase in application scenarios, and the constant changes in users' language habits, existing general language models cannot meet these application requirements, which reduces the accuracy of speech recognition.
In view of the problems existing in the prior art, the present invention provides a solution whose main principle is: determine a scene language model corresponding to the speech signal to be recognized, increase the probability that the corresponding word sequence in the scene language model appears in the language to obtain an enhanced scene language model, and perform speech recognition on the speech signal to be recognized based on the enhanced scene language model. Compared with a general language model, the scene language model contains more word sequences related to the application scene (also called specific word sequences), and the probability that word sequences related to the speech signal to be recognized appear in the language is increased in advance, so recognizing the speech signal based on the enhanced scene language model can improve the accuracy of speech recognition.
The technical solution of the present invention is described in detail below through specific embodiments.
FIG. 1 is a schematic flowchart of a voice signal processing method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
101. Acquire an information string corresponding to the speech signal to be recognized.
102. Determine, according to the information string, a scene language model corresponding to the speech signal to be recognized.
103. Determine whether a word sequence corresponding to the information string exists in the scene language model; if the determination result is yes, execute step 104; if the determination result is no, optionally, end the operation or perform speech recognition on the speech signal to be recognized according to the scene language model.
104. Increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model.
105. Perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
This embodiment provides a voice signal processing method, which can be executed by a voice signal processing device, to improve the accuracy of voice signal recognition.
Specifically, before recognizing the speech signal to be recognized, the voice signal processing device first acquires the information string corresponding to that signal. The information string is a string of information that reflects the speech signal to be recognized to a certain extent; it may be, for example, a pinyin string corresponding to the speech signal, or an initial text string obtained by performing initial speech recognition on it. The voice signal processing device then determines, according to the information string, a scene language model corresponding to the speech signal to be recognized, so that speech recognition can be performed on that signal based on the scene language model.
Optionally, determining the scene language model corresponding to the speech signal to be recognized according to the information string may be implemented as follows:
semantically parse the information string corresponding to the speech signal to be recognized and determine the grammatical sentence pattern and the entity word in it; determine, according to the sentence pattern and the entity word, the user intent expressed by the speech signal; and determine, according to the user intent, the scene language model corresponding to the speech signal to be recognized. For example, if the information string corresponding to the speech signal to be recognized is "I want to call Xiao Li", semantic parsing determines that the sentence pattern is "I want to call ..." and the entity word is "Xiao Li". From the sentence pattern and the entity word it can be determined that the user's intent is to call someone, and according to this intent the scene language model corresponding to the speech signal can be determined to be the phone scene language model rather than the search scene language model.
After the scene language model corresponding to the speech signal to be recognized is determined, speech recognition is not performed directly based on that model; instead, the probability that the corresponding word sequence in the scene language model appears in the language is first increased, in order to improve recognition accuracy. Because the information string reflects the speech signal to be recognized to a certain extent, the speech signal is more likely to be recognized as the word sequence corresponding to the information string than as other word sequences; based on this, the word sequence corresponding to the information string is taken as the word sequence in the scene language model whose probability needs to be increased. Of course, before increasing the probability that the word sequence corresponding to the information string appears in the language, it is first determined whether such a word sequence exists in the scene language model corresponding to the speech signal to be recognized; if the determination result is yes, that is, the word sequence corresponding to the information string exists in the scene language model, the probability that this word sequence appears in the language in the scene language model is increased to obtain an enhanced scene language model, and speech recognition is then performed on the speech signal to be recognized based on the enhanced scene language model.
In an optional embodiment, the scene language model corresponding to the speech signal to be recognized includes a grammar file and a scene dictionary. The grammar file stores the various grammatical sentence patterns used in the application scene corresponding to the scene language model, that is, fixed expressions such as "please call ...", "please play the song ...", or "please search for the lyrics of the song ...". The scene dictionary stores the entity words commonly used in that application scene; for example, in the phone application scene the entity words may be the names of the contacts in the address book, and in a voice-controlled music playback scene they may be the names of the songs in the music library.
Based on the above, determining whether a word sequence corresponding to the information string exists in the scene language model corresponding to the speech signal to be recognized may be implemented as follows:
semantically parse the information string corresponding to the speech signal to be recognized and determine the grammatical sentence pattern and the entity word in it; judge whether the fixed sentence pattern in the information string is contained in the grammar file of the scene language model, and judge whether the entity word in the information string is contained in the scene dictionary of the scene language model; if both judgment results are yes, it is determined that a word sequence corresponding to the information string exists in the scene language model, and the word sequence composed of the fixed sentence pattern and the entity word in the information string is that word sequence.
It is worth noting that both the process of determining the scene language model corresponding to the signal to be recognized and the process of judging whether a word sequence corresponding to the information string exists in that scene language model include semantically parsing the information string and determining the grammatical sentence pattern and entity word in it; in a specific implementation, this operation may be performed only once, or once in each of the two processes.
As can be seen from the above, the scene language model corresponding to the speech signal to be recognized in this embodiment includes word sequences related to the application scene, and the probability that the word sequences in the scene language model which may be the recognition result of the speech signal appear in the language is further increased; therefore, recognizing the speech signal based on the enhanced scene language model can improve the accuracy of speech recognition.
In an optional embodiment, a general language model may first be used to perform speech recognition on the speech signal to be recognized; when the general language model cannot identify the word sequence corresponding to the signal, the method provided by the embodiment of the present invention is then used to recognize it. The flow of this embodiment is shown in FIG. 2 and includes the following steps (a control-flow sketch in code follows the step list):
200. Use the general language model to perform speech recognition on the speech signal to be recognized.
201. Determine whether the general language model has identified the word sequence corresponding to the speech signal to be recognized; if the determination result is yes, end the operation; if the determination result is no, perform step 202.
202. Acquire the information string corresponding to the speech signal to be recognized.
203. Determine, according to the information string, a scene language model corresponding to the speech signal to be recognized.
204. Determine whether a word sequence corresponding to the information string exists in the scene language model; if the determination result is yes, perform step 205; if the determination result is no, optionally, perform step 207.
205. Increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model.
206. Perform speech recognition on the speech signal to be recognized according to the enhanced scene language model, and end the operation.
207. End the operation, or perform speech recognition on the speech signal to be recognized according to the scene language model, and end the operation.
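For illustration only, the sketch below mirrors the control flow of steps 200 to 207; the individual steps are passed in as callables and stubbed with trivial behavior, since the patent does not prescribe a concrete implementation.
```python
def run_recognition_flow(speech_signal, general_recognize, get_info_string,
                         select_scene_model, enhance_model, recognize_with):
    """Control flow of FIG. 2 (steps 200-207); concrete steps are injected as callables."""
    # Steps 200-201: try the general language model first.
    result = general_recognize(speech_signal)
    if result is not None:
        return result
    # Steps 202-203: obtain the information string and pick a scene language model.
    info_string = get_info_string(speech_signal)
    scene_model = select_scene_model(info_string)
    # Steps 204-205: if the matching word sequence exists, enhance the scene model.
    target = tuple(info_string.split())
    if target in scene_model:
        scene_model = enhance_model(scene_model, target)
    # Step 206 (or the optional branch of step 207): recognize with the scene model.
    return recognize_with(scene_model, speech_signal)

# Toy stubs: the general model fails, the scene model "recognizes" the target sequence.
result = run_recognition_flow(
    speech_signal=b"...",
    general_recognize=lambda signal: None,
    get_info_string=lambda signal: "i want to call xiao li",
    select_scene_model=lambda info: {("i", "want", "to", "call", "xiao", "li"): 0.02},
    enhance_model=lambda model, seq: {**model, seq: model[seq] * 5},
    recognize_with=lambda model, signal: max(model, key=model.get),
)
print(result)
```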
The general language model may also be called a large language model, and the scene language model may also be called a small language model.
In an optional embodiment, in step 105 or step 206 above, speech recognition may be performed on the speech signal to be recognized based on the enhanced scene language model alone.
In another optional embodiment, in step 105 or step 206 above, speech recognition may be performed on the speech signal to be recognized by combining the general language model with the enhanced scene language model.
It should be noted that, in the embodiments of the present invention, the process of performing speech recognition on the speech signal to be recognized with the general language model or with the enhanced scene language model is similar to the prior-art process of performing speech recognition on a speech signal based on a general language model, and is therefore not described in detail here.
One implementation of combining the general language model with the enhanced scene language model to perform speech recognition on the speech signal to be recognized is as follows:
the enhanced scene language model may be superimposed onto the general language model to generate a composite language model (in effect, a larger language model), and speech recognition is then performed on the speech signal to be recognized based on this composite language model.
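The superposition itself is not specified in detail here; one common way to realize it is linear interpolation of the two models' probabilities, sketched below with an assumed interpolation weight. A larger scene_weight makes the composite model favor scenario-specific phrasing; the 0.3 default is purely illustrative.

from typing import Dict, Tuple

LM = Dict[Tuple[str, ...], float]

def compose_lms(general_lm: LM, scene_lm: LM, scene_weight: float = 0.3) -> LM:
    """Merge two language models into one larger composite model by mixing
    their probabilities for every word sequence either of them contains."""
    composite: LM = {}
    for seq in set(general_lm) | set(scene_lm):
        p_general = general_lm.get(seq, 0.0)
        p_scene = scene_lm.get(seq, 0.0)
        composite[seq] = (1.0 - scene_weight) * p_general + scene_weight * p_scene
    return composite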
Another implementation of combining the general language model with the enhanced scene language model to perform speech recognition on the speech signal to be recognized is as follows:
speech recognition is first performed with the general language model to obtain candidate word sequences corresponding to the speech signal to be recognized, together with the first probability with which each candidate word sequence appears in the language according to the general language model; the second probability with which each candidate word sequence appears in the language is then obtained from the enhanced scene language model; the first probability and the second probability of each candidate word sequence are weighted, and the word sequence that finally corresponds to the speech signal to be recognized is selected from the candidate word sequences according to the weighting result.
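A minimal sketch of this weighted rescoring is shown below, assuming the candidates arrive as (word sequence, first probability) pairs from the general model and that equal weights are used; both assumptions are illustrative only.

from typing import Dict, List, Tuple

def rescore_candidates(candidates: List[Tuple[Tuple[str, ...], float]],
                       enhanced_scene_lm: Dict[Tuple[str, ...], float],
                       w_general: float = 0.5,
                       w_scene: float = 0.5) -> Tuple[str, ...]:
    """Select the final word sequence by weighting the first probability (from
    the general model) with the second probability (from the scene model)."""
    def weighted_score(item: Tuple[Tuple[str, ...], float]) -> float:
        seq, p_general = item
        p_scene = enhanced_scene_lm.get(seq, 0.0)    # second probability
        return w_general * p_general + w_scene * p_scene
    return max(candidates, key=weighted_score)[0]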
A further implementation of combining the general language model with the enhanced scene language model to perform speech recognition on the speech signal to be recognized is as follows:
speech recognition is performed with the general language model to obtain first candidate word sequences corresponding to the speech signal to be recognized and the probabilities with which they appear in the language; speech recognition is also performed with the enhanced scene language model to obtain second candidate word sequences corresponding to the speech signal to be recognized and the probabilities with which they appear in the language; the word sequence that finally corresponds to the speech signal to be recognized is then selected from the first and second candidate word sequences according to these probabilities. For a candidate word sequence that appears in both the first and the second candidate word sequences, its two probabilities may be combined by a weighted sum to give its final probability.
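The sketch below illustrates this merging of the two candidate lists, with each list assumed to be a mapping from word sequences to probabilities; sequences present in both lists receive a weighted sum of their two probabilities, and the weights are again illustrative assumptions.

from typing import Dict, Tuple

def merge_candidate_lists(first_candidates: Dict[Tuple[str, ...], float],
                          second_candidates: Dict[Tuple[str, ...], float],
                          w_first: float = 0.5,
                          w_second: float = 0.5) -> Tuple[str, ...]:
    """first_candidates come from the general model, second_candidates from the
    enhanced scene model; the sequence with the highest final score is returned."""
    scores: Dict[Tuple[str, ...], float] = {}
    for seq, p in first_candidates.items():
        scores[seq] = w_first * p
    for seq, p in second_candidates.items():
        # A sequence found in both lists gets the weighted sum of both probabilities.
        scores[seq] = scores.get(seq, 0.0) + w_second * p
    return max(scores, key=scores.get)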
In the above embodiments, besides the fact that enhancing the scene language model itself helps improve recognition accuracy, combining the general language model with the enhanced scene language model makes full use of the fact that the general language model contains many general word sequences while the enhanced scene language model contains many word sequences related to the application scenario, which further improves the accuracy of speech recognition.
FIG. 3 is a schematic structural diagram of a speech signal processing apparatus according to a further embodiment of the present invention. As shown in FIG. 3, the apparatus includes an acquiring module 31, a determining module 32, a judging module 33, an enhancing module 34 and a recognition module 35.
The acquiring module 31 is configured to acquire the information string corresponding to the speech signal to be recognized.
The determining module 32 is configured to determine, according to the information string corresponding to the speech signal to be recognized, the scene language model corresponding to the speech signal to be recognized.
The judging module 33 is configured to judge whether a word sequence corresponding to the information string exists in the scene language model corresponding to the speech signal to be recognized.
The enhancing module 34 is configured to, if the judgment result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model corresponding to the speech signal to be recognized, to obtain an enhanced scene language model.
The recognition module 35 is configured to perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
In an optional embodiment, the determining module 32 is specifically configured to:
perform semantic parsing on the information string corresponding to the speech signal to be recognized, and determine the grammatical sentence pattern and the entity word in the information string;
determine, according to the grammatical sentence pattern and the entity word, the user intent expressed by the speech signal to be recognized;
determine, according to the user intent, the scene language model corresponding to the speech signal to be recognized (an illustrative selection sketch follows).
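As a rough illustration of how the parsed grammatical sentence pattern and entity word might be mapped to a user intent and then to a scene language model, consider the hypothetical sketch below; the intent labels and pattern tests are invented for this example and are not part of the disclosure.

def pick_scene_language_model(grammar_pattern: str, entity_word: str,
                              scene_lms: dict):
    """Map the parsed (grammar pattern, entity word) pair to a user intent and
    return the scene language model registered for that intent, if any."""
    if grammar_pattern.startswith("navigate to") and entity_word:
        intent = "navigation"        # e.g. "navigate to <place>" + "the office"
    elif grammar_pattern.startswith("play"):
        intent = "media_playback"    # e.g. "play <song title>"
    else:
        intent = "general"
    return scene_lms.get(intent)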
In an optional embodiment, the scene language model corresponding to the speech signal to be recognized includes a grammar file and a scene dictionary. On this basis, the judging module 33 is specifically configured to:
perform semantic parsing on the information string corresponding to the speech signal to be recognized, and determine the fixed sentence pattern and the entity word in the information string;
judge whether the fixed sentence pattern is contained in the grammar file, and judge whether the entity word is contained in the scene dictionary;
if both judgment results are yes, determine that a word sequence corresponding to the information string exists in the scene language model, the word sequence formed by combining the fixed sentence pattern and the entity word being the word sequence corresponding to the information string (an illustrative sketch of this check follows).
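The check above can be pictured with the following sketch, which assumes the grammar file is a set of fixed sentence patterns containing a slot and the scene dictionary is a set of entity words; both data structures are assumptions for illustration only.

from typing import Optional, Set, Tuple

def match_in_scene_lm(grammar_file: Set[str], scene_dictionary: Set[str],
                      fixed_pattern: str, entity_word: str) -> Optional[Tuple[str, ...]]:
    """Return the word sequence obtained by combining the fixed sentence pattern
    with the entity word when both are covered by the scene model, else None."""
    if fixed_pattern in grammar_file and entity_word in scene_dictionary:
        # e.g. "navigate to <slot>" + "the office" -> ("navigate", "to", "the", "office")
        return tuple(fixed_pattern.replace("<slot>", entity_word).split())
    return None

# Example usage with made-up contents.
grammar = {"navigate to <slot>"}
dictionary = {"the office"}
print(match_in_scene_lm(grammar, dictionary, "navigate to <slot>", "the office"))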
In an optional embodiment, the acquiring module 31 is specifically configured to:
acquire the information string corresponding to the speech signal to be recognized when the general language model fails to recognize the word sequence corresponding to the speech signal to be recognized.
In an optional embodiment, the recognition module 35 is specifically configured to:
perform speech recognition on the speech signal to be recognized according to the general language model and the enhanced scene language model.
Further, the recognition module 35 is specifically configured to: first perform speech recognition on the speech signal to be recognized with the general language model, to obtain the candidate word sequences corresponding to the speech signal to be recognized (usually several groups) and the first probability with which each candidate word sequence appears in the language according to the general language model; obtain from the enhanced scene language model the second probability with which each candidate word sequence appears in the language; weight the first probability and the second probability of each candidate word sequence; and select, according to the weighting result, the word sequence that finally corresponds to the speech signal to be recognized from the candidate word sequences.
Further, the recognition module 35 is specifically configured to: perform speech recognition on the speech signal to be recognized with the general language model, to obtain first candidate word sequences corresponding to the speech signal to be recognized and the probabilities with which they appear in the language; perform speech recognition on the speech signal to be recognized with the enhanced scene language model, to obtain second candidate word sequences corresponding to the speech signal to be recognized and the probabilities with which they appear in the language; and select, according to these probabilities, the word sequence that finally corresponds to the speech signal to be recognized from the first and second candidate word sequences. For a candidate word sequence that appears in both the first and the second candidate word sequences, its two probabilities may be combined by a weighted sum to give its final probability.
The speech signal processing apparatus provided in this embodiment determines, according to the information string corresponding to the speech signal to be recognized, the scene language model corresponding to that speech signal, and, when a word sequence corresponding to the information string exists in that scene language model, increases the probability that the word sequence appears in the language to obtain an enhanced scene language model. Speech recognition is performed on the speech signal to be recognized based on the enhanced scene language model, rather than based on a general language model alone as in the prior art, which improves the accuracy of speech recognition.
An embodiment of the present invention further provides an electronic device, including at least one processor 810 and a memory 800 communicatively connected to the at least one processor 810. The memory 800 stores instructions executable by the at least one processor 810, and the instructions are executed by the at least one processor 810 so that the at least one processor 810 can: acquire the information string corresponding to the speech signal to be recognized; determine, according to the information string, the scene language model corresponding to the speech signal to be recognized; judge whether a word sequence corresponding to the information string exists in the scene language model; if the judgment result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and perform speech recognition on the speech signal to be recognized according to the enhanced scene language model. The electronic device further includes an input device 830 and an output device 840 electrically connected to the memory 800 and the processor, the electrical connection preferably being a bus connection.
An embodiment of the present invention further provides a non-volatile computer storage medium storing computer-executable instructions which, when executed by an electronic device, enable the electronic device to: acquire the information string corresponding to the speech signal to be recognized; determine, according to the information string, the scene language model corresponding to the speech signal to be recognized; judge whether a word sequence corresponding to the information string exists in the scene language model; if the judgment result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model; and perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, the above embodiments are merely examples given for clarity of description and are not intended to limit the implementations. Those of ordinary skill in the art may make other changes or modifications in different forms on the basis of the above description. It is neither necessary nor possible to exhaust all implementations here, and obvious changes or modifications derived therefrom remain within the protection scope of the present invention.

Claims (13)

  1. A speech signal processing method, characterized by comprising:
    acquiring an information string corresponding to a speech signal to be recognized;
    determining, according to the information string, a scene language model corresponding to the speech signal to be recognized;
    judging whether a word sequence corresponding to the information string exists in the scene language model;
    if the judgment result is yes, increasing the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model;
    performing speech recognition on the speech signal to be recognized according to the enhanced scene language model.
  2. The method according to claim 1, characterized in that determining, according to the information string, the scene language model corresponding to the speech signal to be recognized comprises:
    performing semantic parsing on the information string, and determining a grammatical sentence pattern and an entity word in the information string;
    determining, according to the grammatical sentence pattern and the entity word, a user intent expressed by the speech signal to be recognized;
    determining, according to the user intent, the scene language model corresponding to the speech signal to be recognized.
  3. The method according to claim 1, characterized in that the scene language model comprises a grammar file and a scene dictionary;
    judging whether a word sequence corresponding to the information string exists in the scene language model comprises:
    performing semantic parsing on the information string, and determining a fixed sentence pattern and an entity word in the information string;
    judging whether the fixed sentence pattern is contained in the grammar file, and judging whether the entity word is contained in the scene dictionary;
    if both judgment results are yes, determining that a word sequence corresponding to the information string exists in the scene language model, the word sequence formed by combining the fixed sentence pattern and the entity word being the word sequence corresponding to the information string.
  4. The method according to any one of claims 1 to 3, characterized in that acquiring the information string corresponding to the speech signal to be recognized comprises:
    acquiring the information string corresponding to the speech signal to be recognized when the word sequence corresponding to the speech signal to be recognized cannot be recognized using a general language model.
  5. The method according to claim 4, characterized in that performing speech recognition on the speech signal to be recognized according to the enhanced scene language model comprises:
    performing speech recognition on the speech signal to be recognized according to the general language model and the enhanced scene language model.
  6. A speech signal processing apparatus, characterized by comprising:
    an acquiring module, configured to acquire an information string corresponding to a speech signal to be recognized;
    a determining module, configured to determine, according to the information string, a scene language model corresponding to the speech signal to be recognized;
    a judging module, configured to judge whether a word sequence corresponding to the information string exists in the scene language model;
    an enhancing module, configured to, if the judgment result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model;
    a recognition module, configured to perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
  7. The apparatus according to claim 6, characterized in that the determining module is specifically configured to:
    perform semantic parsing on the information string, and determine a grammatical sentence pattern and an entity word in the information string;
    determine, according to the grammatical sentence pattern and the entity word, a user intent expressed by the speech signal to be recognized;
    determine, according to the user intent, the scene language model corresponding to the speech signal to be recognized.
  8. The apparatus according to claim 6, characterized in that the scene language model comprises a grammar file and a scene dictionary;
    the judging module is specifically configured to:
    perform semantic parsing on the information string, and determine a fixed sentence pattern and an entity word in the information string;
    judge whether the fixed sentence pattern is contained in the grammar file, and judge whether the entity word is contained in the scene dictionary;
    if both judgment results are yes, determine that a word sequence corresponding to the information string exists in the scene language model, the word sequence formed by combining the fixed sentence pattern and the entity word being the word sequence corresponding to the information string.
  9. The apparatus according to any one of claims 6 to 8, characterized in that the acquiring module is specifically configured to:
    acquire the information string corresponding to the speech signal to be recognized when the word sequence corresponding to the speech signal to be recognized cannot be recognized using a general language model.
  10. The apparatus according to claim 9, characterized in that the recognition module is specifically configured to:
    perform speech recognition on the speech signal to be recognized according to the general language model and the enhanced scene language model.
  11. An electronic device, characterized by comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can:
    acquire an information string corresponding to a speech signal to be recognized;
    determine, according to the information string, a scene language model corresponding to the speech signal to be recognized;
    judge whether a word sequence corresponding to the information string exists in the scene language model;
    if the judgment result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model;
    perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
  12. A non-volatile computer storage medium, characterized in that the storage medium stores computer-executable instructions which, when executed by an electronic device, enable the electronic device to:
    acquire an information string corresponding to a speech signal to be recognized;
    determine, according to the information string, a scene language model corresponding to the speech signal to be recognized;
    judge whether a word sequence corresponding to the information string exists in the scene language model;
    if the judgment result is yes, increase the probability that the word sequence corresponding to the information string appears in the language in the scene language model, to obtain an enhanced scene language model;
    perform speech recognition on the speech signal to be recognized according to the enhanced scene language model.
  13. A computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to execute the method according to any one of claims 1 to 5.
PCT/CN2016/096828 2016-03-30 2016-08-26 Voice signal processing method, apparatus and electronic device WO2017166631A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610195611.5A CN105845133A (en) 2016-03-30 2016-03-30 Voice signal processing method and apparatus
CN201610195611.5 2016-03-30

Publications (1)

Publication Number Publication Date
WO2017166631A1 true WO2017166631A1 (en) 2017-10-05

Family

ID=56596271

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/096828 WO2017166631A1 (en) 2016-03-30 2016-08-26 Voice signal processing method, apparatus and electronic device

Country Status (2)

Country Link
CN (1) CN105845133A (en)
WO (1) WO2017166631A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110992937A (en) * 2019-12-06 2020-04-10 广州国音智能科技有限公司 Language offline recognition method, terminal and readable storage medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105845133A (en) * 2016-03-30 2016-08-10 乐视控股(北京)有限公司 Voice signal processing method and apparatus
CN106328148B (en) * 2016-08-19 2019-12-31 上汽通用汽车有限公司 Natural voice recognition method, device and system based on local and cloud hybrid recognition
CN108241678B (en) * 2016-12-26 2021-10-15 北京搜狗信息服务有限公司 Method and device for mining point of interest data
CN110070859B (en) * 2018-01-23 2023-07-14 阿里巴巴集团控股有限公司 Voice recognition method and device
CN110287209A (en) * 2019-06-10 2019-09-27 北京百度网讯科技有限公司 Question and answer processing method, device, equipment and storage medium
CN112509573A (en) * 2020-11-19 2021-03-16 北京蓦然认知科技有限公司 Voice recognition method and device
CN112669845B (en) * 2020-12-25 2024-04-12 竹间智能科技(上海)有限公司 Speech recognition result correction method and device, electronic equipment and storage medium
CN113920999A (en) * 2021-10-29 2022-01-11 科大讯飞股份有限公司 Voice recognition method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060015326A1 (en) * 2004-07-14 2006-01-19 International Business Machines Corporation Word boundary probability estimating, probabilistic language model building, kana-kanji converting, and unknown word model building
CN101593518A (en) * 2008-05-28 2009-12-02 中国科学院自动化研究所 The balance method of actual scene language material and finite state network language material
CN102074231A (en) * 2010-12-30 2011-05-25 万音达有限公司 Voice recognition method and system
JP2013142870A (en) * 2012-01-12 2013-07-22 Nippon Telegr & Teleph Corp <Ntt> Specific situation model database creating device and method thereof, specific element sound model database creating device, situation estimation device, call suitability notification device and program
US20140025380A1 (en) * 2012-07-18 2014-01-23 International Business Machines Corporation System, method and program product for providing automatic speech recognition (asr) in a shared resource environment
CN105845133A (en) * 2016-03-30 2016-08-10 乐视控股(北京)有限公司 Voice signal processing method and apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007083496A1 (en) * 2006-01-23 2007-07-26 Nec Corporation Speech recognition language model making system, method, and program, and speech recognition system
JP5276610B2 (en) * 2010-02-05 2013-08-28 日本放送協会 Language model generation apparatus, program thereof, and speech recognition system
CN101923854B (en) * 2010-08-31 2012-03-28 中国科学院计算技术研究所 Interactive speech recognition system and method
US9043205B2 (en) * 2012-06-21 2015-05-26 Google Inc. Dynamic language model
CN105869629B (en) * 2016-03-30 2018-03-20 乐视控股(北京)有限公司 Audio recognition method and device


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110992937A (en) * 2019-12-06 2020-04-10 广州国音智能科技有限公司 Language offline recognition method, terminal and readable storage medium
CN110992937B (en) * 2019-12-06 2022-06-28 广州国音智能科技有限公司 Language off-line identification method, terminal and readable storage medium

Also Published As

Publication number Publication date
CN105845133A (en) 2016-08-10

Similar Documents

Publication Publication Date Title
WO2017166631A1 (en) Voice signal processing method, apparatus and electronic device
CN107016994B (en) Voice recognition method and device
US8589163B2 (en) Adapting language models with a bit mask for a subset of related words
US9805718B2 (en) Clarifying natural language input using targeted questions
US8914288B2 (en) System and method for advanced turn-taking for interactive spoken dialog systems
CN109710727B (en) System and method for natural language processing
US10811005B2 (en) Adapting voice input processing based on voice input characteristics
US20140379334A1 (en) Natural language understanding automatic speech recognition post processing
KR102390940B1 (en) Context biasing for speech recognition
US9589578B1 (en) Invoking application programming interface calls using voice commands
KR102413616B1 (en) On-device speech synthesis of text segments for training on-device speech recognition models
US10242670B2 (en) Syntactic re-ranking of potential transcriptions during automatic speech recognition
EP3826007B1 (en) Method and apparatus with speech processing
CN111566638B (en) Adding descriptive metadata to an application programming interface for use by intelligent agents
WO2014183373A1 (en) Systems and methods for voice identification
CN112331206A (en) Speech recognition method and equipment
CN109616096A (en) Construction method, device, server and the medium of multilingual tone decoding figure
WO2017016126A1 (en) Picture composition method and apparatus for speech recognition syntax tree, terminal device and storage medium
JP2020042257A (en) Voice recognition method and device
KR20200084260A (en) Electronic apparatus and controlling method thereof
CN111312230B (en) Voice interaction monitoring method and device for voice conversation platform
JP5231484B2 (en) Voice recognition apparatus, voice recognition method, program, and information processing apparatus for distributing program
KR102536944B1 (en) Method and apparatus for speech signal processing
JP6275569B2 (en) Dialog apparatus, method and program
US11211056B1 (en) Natural language understanding model generation

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16896401

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16896401

Country of ref document: EP

Kind code of ref document: A1