WO2017166631A1

WO2017166631A1 - Voice signal processing method, apparatus and electronic device

Info

Publication number: WO2017166631A1
Application number: PCT/CN2016/096828
Authority: WO
Inventors: 王彪
Original assignee: 乐视控股（北京）有限公司; 乐视致新电子科技（天津）有限公司
Priority date: 2016-03-30
Filing date: 2016-08-26
Publication date: 2017-10-05
Also published as: CN105845133A

Abstract

A voice signal processing method, an apparatus and an electronic device. The voice signal processing method comprises: acquiring an information string corresponding to a voice signal to be recognized (101); according to the information string, determining a scenarized linguistic model corresponding to the voice signal to be recognized (102); determining whether there is a word sequence corresponding to the information string in the scenarized linguistic model (103); if a determination result is yes, increasing the appearance probability in the language of a word sequence corresponding to the information string in the scenarized linguistic model so as to obtain an enhanced scenarized linguistic model (104); and according to the enhanced scenarized linguistic model, performing voice recognition on the voice signal to be recognized (105). By adoption of the present embodiment, voice recognition can improve the accuracy rate of voice signal recognition.

Description

Voice signal processing method, device and electronic device

cross reference

The present application claims priority to Chinese Patent Application No. 20161019561, the entire disclosure of which is incorporated herein by reference. in.

Technical field

The embodiments of the present invention relate to the field of voice recognition technologies, and in particular, to a voice signal processing method, apparatus, and electronic device.

Background technique

Speech recognition technology has developed rapidly in recent years, enabling users to interact with smart devices via voice. Speech recognition technology is a technique for transforming a speech signal into a corresponding text or command through an identification and parsing process. Among them, the process of recognizing and parsing speech signals is inseparable from the language model (Language Model, LM). The purpose of the language model is to establish a distribution that can describe the probability that a given word sequence appears in the language.

In the field of speech recognition, most of the common language models are used. The general language model mainly includes the general word sequence and the probability that the general word sequence appears in the language for identifying the speech signal in the general domain. However, with the development of the times, the increase of application scenarios and the changing language habits of users, the existing common language models obviously cannot meet these application requirements, which will reduce the accuracy of speech recognition.

Summary of the invention

Embodiments of the present invention provide a voice signal processing method, apparatus, and electronic device for performing Speech recognition improves the accuracy of speech signal recognition.

The embodiment of the invention provides a voice signal processing method, including:

Obtaining a string of information corresponding to the voice signal to be identified;

Determining, according to the information string, a scene language model corresponding to the to-be-identified voice signal;

Determining whether there is a word sequence corresponding to the information string in the scene language model;

If the determination result is yes, increasing a probability that the word sequence corresponding to the information string appears in the language in the scene language model to obtain an enhanced scened language model;

Performing voice recognition on the to-be-recognized speech signal according to the enhanced scened language model.

An embodiment of the present invention provides a voice signal processing apparatus, including:

An acquiring module, configured to acquire a string of information corresponding to the voice signal to be identified;

a determining module, configured to determine, according to the information string, a scene language model corresponding to the to-be-identified voice signal;

a determining module, configured to determine whether a word sequence corresponding to the information string exists in the scened language model;

And an enhancement module, configured to: if the determination result is yes, increase a probability that the word sequence corresponding to the information string appears in the language in the scene language model to obtain an enhanced scene language model;

And an identifying module, configured to perform voice recognition on the to-be-identified voice signal according to the enhanced scened language model.

Embodiments of the present invention further provide an electronic device including at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor The instruction is executed by the at least one processor, so that the at least one processor can acquire an information string corresponding to the to-be-identified voice signal; and according to the information string, determine a scene language corresponding to the to-be-identified voice signal a model; determining whether there is a word sequence corresponding to the information string in the scene language model; if the determination result is yes, increasing the scene a probability of a word sequence corresponding to the information string appearing in the language in the language model to obtain an enhanced scened language model; and performing speech recognition on the to-be-recognized speech signal according to the enhanced scened language model .

Embodiments of the present invention also provide a non-volatile computer storage medium storing the computer-executable instructions of computer-executable instructions that, when executed by an electronic device, enable an electronic device to acquire a voice signal to be recognized Corresponding information string; determining, according to the information string, a scene language model corresponding to the to-be-identified speech signal; determining whether there is a word sequence corresponding to the information string in the scene language model; And increasing a probability that a sequence of words corresponding to the information string appears in the language in the scened language model to obtain an enhanced scened language model; according to the enhanced scened language model, The speech signal to be recognized is subjected to speech recognition.

Embodiments of the present invention also provide a computer program product, the computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions, when the program instructions are executed by a computer, The computer is caused to execute the above-described speech signal processing method.

The voice signal processing method, apparatus, and electronic device provided by the embodiment of the present invention determine a scene language model corresponding to the voice signal to be recognized according to the information string corresponding to the voice signal to be recognized, and the information string corresponding to the scene language model exists. When the word sequence is used, the probability of the word sequence appearing in the language is increased to obtain an enhanced scened language model, and the speech signal to be recognized is recognized based on the enhanced scene language model. Compared with the speech recognition scheme based on the common language model in the prior art, the embodiment of the present invention can improve the accuracy of speech recognition based on the enhanced scene language model.

DRAWINGS

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description of the drawings used in the embodiments or the prior art description will be briefly described below. Obviously, the drawings in the following description It is a certain embodiment of the present invention, and other drawings can be obtained from those skilled in the art without any creative work.

1 is a schematic flowchart of a voice signal processing method according to an embodiment of the present invention;

2 is a schematic flowchart of a voice signal processing method according to another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a voice signal processing apparatus according to another embodiment of the present invention; FIG.

FIG. 4 is a schematic structural diagram of hardware of an electronic device according to an embodiment of the present invention.

detailed description

The technical solutions of the present invention will be clearly and completely described in the following with reference to the accompanying drawings. It is obvious that the described embodiments are a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

In the description of the present invention, it is to be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inside", "outside", etc. The orientation or positional relationship of the indications is based on the orientation or positional relationship shown in the drawings, and is merely for the convenience of the description of the invention and the simplified description, rather than indicating or implying that the device or component referred to has a specific orientation, in a specific orientation. The construction and operation are therefore not to be construed as limiting the invention. Moreover, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that the terms "installation", "connected", and "connected" are to be understood broadly, and may be fixed or detachable, for example, unless otherwise explicitly defined and defined. Connected, or connected integrally; can be mechanical or electrical; can be directly connected, indirectly connected through an intermediate medium, or can be connected inside the two components, It is either a wireless connection or a wired connection. The specific meaning of the above terms in the present invention can be understood in a specific case by those skilled in the art.

Further, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not constitute a conflict with each other.

In view of the problems existing in the prior art, the present invention provides a solution, the main principle is: determining a scene language model corresponding to a speech signal to be recognized, and increasing a probability that a sequence of corresponding words in the scene language model appears in a language To obtain an enhanced scened language model, based on the enhanced scened language model for speech recognition of the speech signal to be recognized. Compared with the general language model, the scene language model contains more word sequences related to the application scene (also called a specific word sequence), and the word sequence related to the speech signal to be recognized in the scene language model is pre-increased. The probability of occurrence in the language, so based on the enhanced scened language model to recognize the speech signal for speech recognition, can improve the accuracy of speech recognition.

The technical solution of the present invention will be described in detail below through specific embodiments.

FIG. 1 is a schematic flowchart diagram of a voice signal processing method according to an embodiment of the present invention. As shown in Figure 1, the method includes:

101. Acquire a string of information corresponding to the voice signal to be identified.

102. Determine, according to the information string, a scene language model corresponding to the to-be-identified voice signal.

103. Determine whether there is a sequence of words corresponding to the information string in the scenario language model; if the determination result is yes, execute step 104, if the determination result is no, optionally, end the operation or treat according to the scene language model Identify voice models for speech recognition.

104. Increase a probability that a sequence of words corresponding to the information string appears in the language in the scened language model to obtain an enhanced scened language model.

105. Perform speech recognition on the recognized speech signal according to the enhanced scene language model.

The embodiment provides a voice signal processing method, which can be executed by a voice signal processing device to improve the accuracy of voice signal recognition.

Specifically, before the recognition of the voice signal to be recognized, the voice signal processing device first acquires the information string corresponding to the voice signal to be recognized. The information string refers to a string of information that can reflect the speech signal to be recognized to a certain extent, and may be, for example, a Pinyin string corresponding to the speech signal to be recognized, or an initial text string obtained by performing initial speech recognition on the speech signal to be recognized. Then, the speech signal processing device determines a scene language model corresponding to the speech signal to be recognized according to the information string, so as to perform speech recognition on the speech signal to be recognized based on the scene language model.

Optionally, the foregoing implementation manner of determining a scenario language model corresponding to the to-be-identified voice signal according to the information string includes:

Semantically parsing the information string corresponding to the recognized speech signal, determining a grammatical sentence and an entity word in the information string; determining a user intention of the speech signal to be recognized according to the grammatical sentence and the entity word; determining according to the user intention A scened language model corresponding to the speech signal to be recognized. For example, the information string corresponding to the speech signal to be recognized is “I want to call Xiao Li”. After semantic analysis, it can be determined that the grammar sentence in the information string is “I want to call...” and the entity word is “ Xiao Li", according to the sentence sentence and the entity word, it can be determined that the user's intention is to call someone. According to the user's intention, the scene language model corresponding to the voice signal to be recognized can be determined as the phone scene language model instead of Search for a scene language model.

After determining the scene language model corresponding to the speech signal to be recognized, the speech signal to be recognized is not directly based on the scene language model, but the probability that the corresponding word sequence in the scene language model appears in the language. Increase to improve speech recognition accuracy. By The information string reflects the speech signal to be recognized to a certain extent, so that the speech signal to be recognized is recognized as the word sequence corresponding to the information string is larger than other word sequences, and based on this, the information corresponding to the information may be The sequence of words of the string is used as a sequence of corresponding words in the scene language model corresponding to the speech signal to be recognized that requires an increased probability. Of course, before increasing the probability that the word sequence corresponding to the information string appears in the language, it is first determined whether there is a word sequence corresponding to the information string in the scene language model corresponding to the speech signal to be recognized; if the judgment result is yes, That is, the word sequence corresponding to the information string exists in the scene language model, and the probability that the word sequence corresponding to the information string appears in the language in the scene language model is increased to obtain an enhanced scene language model, and then Based on the enhanced scene language model, speech recognition is performed on the speech signal to be recognized.

In an optional implementation manner, the scene language model corresponding to the to-be-identified voice signal includes a grammar file and a scene dictionary. The grammar file stores various grammatical expressions in the application scenario corresponding to the scene language model, that is, some fixed expressions, such as "please call us", "play songs...", "search for songs... lyrics", etc. . The scene dictionary stores the entity words commonly used in the application scenario corresponding to the scene language model. For example, in the phone application scenario, the entity word may be the name of the contact in the address book, or the entity word may be in the application scenario of the voice control playing music. The name of the song in the music library, etc.

Based on the above, an implementation manner of determining whether a word sequence corresponding to the information string exists in the scene language model corresponding to the to-be-identified voice signal includes:

Semantically parsing the information string corresponding to the recognized speech signal, determining the grammatical sentence and the entity word in the information string; determining whether the fixed sentence pattern in the information string is included in the grammar file of the scene language model, and determining the information string Whether the entity word is included in the scene dictionary of the scene language model; if the judgment result is yes, it is determined that there is a word sequence corresponding to the information string in the scene language model, and the fixed sentence pattern in the information string and The sequence of words into which the entity words are combined is the sequence of words corresponding to the information string.

It is worth noting that the process and judgment of determining the scene language model corresponding to the signal to be identified In the process of disconnecting the word sequence corresponding to the information string in the scene language model, the operations of semantically parsing the information string and determining the grammatical sentence form and the entity word in the information string are specifically implemented, the operation is It can be executed only once or once in two processes.

As can be seen from the above, the scene language model corresponding to the speech signal to be recognized in the embodiment includes a sequence of words related to the application scenario, and the sequence of words in the scene language model that may be the recognition result of the speech signal to be recognized is in the language. The probability of occurrence is increased again, so the recognition of the speech signal based on the enhanced scened language model can improve the accuracy of speech recognition.

In an optional implementation manner, the universal speech model may be used to perform speech recognition on the speech signal to be recognized. When the common language model is used to identify the word sequence corresponding to the speech signal to be recognized, the method provided by the embodiment of the present invention is used to treat The speech signal is recognized for speech recognition. The flow of this embodiment is shown in FIG. 2 and includes the following steps:

200. Using a universal language model to perform speech recognition on the recognized speech signal;

201. Determine whether the common language model identifies the word sequence corresponding to the to-be-identified voice signal; if the determination result is yes, the operation ends; if the determination result is no, step 202 is performed.

202. Acquire a sequence of information corresponding to the voice signal to be identified.

203. Determine, according to the information string, a scene language model corresponding to the to-be-identified voice signal;

204. Determine whether there is a word sequence corresponding to the information string in the scene language model. If the determination result is yes, execute step 205. If the determination result is no, if yes, perform step 207.

205. Increase a probability that a sequence of words corresponding to the information string appears in the language in the scened language model to obtain an enhanced scened language model.

206. Perform speech recognition on the recognized speech signal according to the enhanced scenario language model, and end the operation.

207. End the operation or perform speech recognition according to the scene language model to identify the voice model, and end the operation.

Among them, the general language model can be called a large language model, and the scene language model can also be called a small language model.

In an optional implementation manner, in step 105 or step 206 above, voice recognition may be performed on the voice signal to be recognized based on the enhanced scened language model.

In another optional implementation manner, in the above step 105 or step 206, the speech signal to be recognized may be voice-recognized in combination with the general language model and the enhanced scene language model.

It should be noted that, in the embodiment of the present invention, a general language model or an enhanced scene language model is adopted, a process of performing speech recognition on a speech signal to be recognized, and a process of performing speech recognition on a speech signal based on a common language model in the prior art. Similar, it will not be described in detail here.

In addition, in combination with the above-mentioned common language model and the enhanced scenario language model, an implementation manner of performing speech recognition on a speech signal to be recognized includes:

The enhanced scened language model can be superimposed into the general language model to generate a compound language model (actually a larger language model), and then the speech signal to be recognized based on the composite language model for speech recognition.

The above-mentioned combined implementation of the common language model and the enhanced scenario language model, another embodiment of speech recognition for the speech signal to be recognized includes:

Firstly, the universal speech model is used to perform speech recognition on the speech signal, and the candidate word sequence corresponding to the speech signal to be recognized and the first probability that the candidate word sequence appears in the language in the common language model are obtained from the enhanced scene language model. A second probability of occurrence of the candidate word sequence in the language is obtained, and the first probability and the second probability of the candidate word sequence are weighted, and the word sequence corresponding to the to-be-recognized speech signal is obtained from the candidate word sequence according to the weighting processing result.

Using a universal language model to perform speech recognition on the speech signal to be recognized, obtaining a first candidate word sequence corresponding to the speech signal to be recognized and a probability of occurrence of the first candidate word sequence in the language; The post-scene language model performs speech recognition on the recognized speech signal, and acquires a probability that the second candidate word sequence and the second candidate word sequence corresponding to the to-be-recognized speech signal appear in the language; and appears in the language according to the first candidate word sequence The probabilities and the probability that the second candidate word sequence appears in the language, and the sequence of words corresponding to the speech signal to be recognized is finally obtained from the first candidate word sequence and the second candidate word sequence. Wherein, for the same candidate word sequence in the first candidate word sequence and the second candidate word sequence, the corresponding two probabilities may be weighted and summed as their final probability.

In the above embodiment, in addition to enhancing the contextual language model, the speech recognition accuracy is improved, and the common language model and the enhanced scene language model are combined to perform speech recognition on the speech signal to be recognized, and the general language model can be fully utilized. It contains more general word sequences, and the enhanced scene language model contains more features of word sequences related to the application scene, improving the accuracy of speech recognition.

FIG. 3 is a schematic structural diagram of a voice signal processing apparatus according to still another embodiment of the present invention. As shown in FIG. 3, the apparatus includes: an acquisition module 31, a determination module 32, a determination module 33, an enhancement module 34, and an identification module 35.

The obtaining module 31 is configured to acquire a string of information corresponding to the voice signal to be identified.

The determining module 32 is configured to determine a scene language model corresponding to the to-be-identified voice signal according to the information string corresponding to the to-be-identified voice signal.

The determining module 33 is configured to determine whether a word sequence corresponding to the information string exists in the scene language model corresponding to the to-be-identified voice signal.

The enhancement module 34 is configured to: if the determination result is yes, increase a probability that a word sequence corresponding to the information string appears in the language in the scene language model corresponding to the to-be-identified voice signal, to obtain an enhanced scene language model.

The identification module 35 is configured to perform voice recognition on the voice signal to be recognized according to the enhanced scene language model.

In an optional implementation, the determining module 32 is specifically configured to:

Semantically parsing the information string corresponding to the recognized speech signal, and determining a grammatical sentence and an entity word in the information string;

Determining the user intent of the speech signal to be recognized according to the grammatical sentence and the entity word;

The scened language model corresponding to the to-be-recognized speech signal is determined according to the user's intention.

In an optional implementation manner, the scene language model corresponding to the to-be-identified voice signal includes a grammar file and a scene dictionary. Based on this, the determining module 33 is specifically used to:

Semantic analysis of the information string corresponding to the recognized speech signal, determining the grammatical sentence and the entity word in the information string

Determining whether the fixed sentence is included in the grammar file and determining whether the entity word is included in the scene dictionary;

If the determination result is yes, it is determined that there is a word sequence corresponding to the information string in the scene language model, and the word sequence composed of the fixed sentence pattern and the entity word is a word sequence corresponding to the information string.

In an optional implementation, the obtaining module 31 is specifically configured to:

When the word sequence corresponding to the to-be-recognized speech signal cannot be identified by using the general language model, the information string corresponding to the to-be-recognized speech signal is obtained.

In an optional implementation, the identification module 35 is specifically configured to:

According to the general language model and the enhanced scene language model, speech recognition is performed on the speech signal to be recognized.

Further, the identifying module 35 is specifically configured to: first use the general language model to perform speech recognition on the speech signal to be recognized, obtain a candidate word sequence corresponding to the to-be-recognized speech signal, and a first probability that the candidate word sequence appears in the language in the universal language model, Obtaining a second probability of the candidate word sequence appearing in the language from the enhanced scene language model, weighting the first probability and the second probability of the candidate word sequence, and obtaining the to-be-identified from the candidate word sequence according to the weighting processing result The final sequence of words corresponding to the speech signal.

Further, the identification module 35 is specifically configured to: first use a common language model to identify a voice message No. Perform speech recognition, obtain candidate word sequences corresponding to the speech signal to be recognized (usually multiple groups) and first probability of occurrence of candidate word sequences in the language in the common language model, and obtain candidates from the enhanced scene language model The second probability of the word sequence appearing in the language, the first probability and the second probability of the candidate word sequence are weighted, and the word sequence corresponding to the speech signal to be recognized is obtained from the candidate word sequence according to the weighting processing result.

Further, the identification module 35 is specifically configured to: perform speech recognition on the speech signal to be recognized by using the universal language model, and acquire a probability that the first candidate word sequence and the first candidate word sequence corresponding to the to-be-recognized speech signal appear in the language; The scene language model performs speech recognition on the recognized speech signal, and acquires a probability that the second candidate word sequence and the second candidate word sequence corresponding to the to-be-recognized speech signal appear in the language; and the probability and occurrence in the language according to the first candidate word sequence The probability that the second candidate word sequence appears in the language, and the word sequence corresponding to the speech signal to be recognized is finally obtained from the first candidate word sequence and the second candidate word sequence. Wherein, for the same candidate word sequence in the first candidate word sequence and the second candidate word sequence, the corresponding two probabilities may be weighted and summed as their final probability.

The voice signal processing apparatus provided in this embodiment determines a scene language model corresponding to the voice signal to be recognized according to the information string corresponding to the voice signal to be recognized, and increases the word sequence corresponding to the information string in the scene language model. The probability that the sequence of words appears in the language to obtain an enhanced scened language model, based on the enhanced scened language model for speech recognition of the speech signal to be recognized, rather than using a common language model for speech as in the prior art Recognition can improve the accuracy of speech recognition.

An embodiment of the present invention further provides an electronic device including at least one processor 810; and a memory 800 communicably coupled to the at least one processor 810; wherein the memory 800 is stored for processing by the at least one The instructions are executed by the at least one processor 810 to enable the at least one processor 810 to acquire a string of information corresponding to the voice signal to be identified; and determine the to-be-identified according to the string of information Scenario corresponding to speech signal a language model; determining whether there is a word sequence corresponding to the information string in the scene language model; if the determination result is yes, increasing a word sequence corresponding to the information string in the scene language model in a language The probability of occurrence is obtained to obtain an enhanced scened language model; and the speech signal to be recognized is subjected to speech recognition according to the enhanced scened language model. The electronic device also includes an input device 830 and an output device 840 that are electrically coupled to the memory 800 and the processor, the electrical connections preferably being connected by a bus.

Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (system), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or FIG. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine for the execution of instructions for execution by a processor of a computer or other programmable data processing device. The work specified in one or more blocks of a flow or a flow and/or a block diagram of a flowchart Able device.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device. The apparatus implements the functions specified in one or more blocks of a flow or a flow and/or block diagram of the flowchart.

These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on a computer or other programmable device to produce computer-implemented processing for execution on a computer or other programmable device. The instructions provide steps for implementing the functions specified in one or more of the flow or in a block or blocks of a flow diagram.

It is apparent that the above-described embodiments are merely illustrative of the examples, and are not intended to limit the embodiments. Other variations or modifications of the various forms may be made by those skilled in the art in light of the above description. There is no need and no way to exhaust all of the implementations. Obvious changes or variations resulting therefrom are still within the scope of the invention.

Claims

A voice signal processing method, comprising:

Obtaining a string of information corresponding to the voice signal to be identified;

Determining, according to the information string, a scene language model corresponding to the to-be-identified voice signal;

Determining whether there is a word sequence corresponding to the information string in the scene language model;

If the determination result is yes, increasing a probability that the word sequence corresponding to the information string appears in the language in the scene language model to obtain an enhanced scened language model;

Performing voice recognition on the to-be-recognized speech signal according to the enhanced scened language model.
The method according to claim 1, wherein the determining a scene language model corresponding to the to-be-identified speech signal according to the information string comprises:

Performing semantic analysis on the information string to determine a grammatical sentence and an entity word in the information string;

Determining, according to the grammatical sentence formula and the entity word, the user intention of the speech signal to be recognized;

And determining, according to the user intention, a scene language model corresponding to the to-be-identified voice signal.
The method of claim 1, wherein the scened language model comprises a grammar file and a scene dictionary;

Determining whether there is a word sequence corresponding to the information string in the scened language model includes:

Performing semantic analysis on the information string to determine grammatical sentences and entity words in the information string

Determining whether the fixed sentence is included in the grammar file, and determining whether the entity word is included in the scene dictionary;

If the determination result is yes, determining that there is a word sequence corresponding to the information string in the scene language model, and the word sequence formed by the fixed sentence pattern and the entity word is corresponding to the information string The sequence of words.
The method according to any one of claims 1 to 3, wherein the obtaining is to be identified The information string corresponding to the voice signal includes:

When the word sequence corresponding to the to-be-recognized speech signal is not recognized by using the common language model, the information string corresponding to the to-be-recognized speech signal is acquired.
The method according to claim 4, wherein the performing voice recognition on the to-be-recognized speech signal according to the enhanced scened language model comprises:

Performing speech recognition on the to-be-recognized speech signal according to the universal language model and the enhanced scened language model.
A voice signal processing device, comprising:

An acquiring module, configured to acquire a string of information corresponding to the voice signal to be identified;

a determining module, configured to determine, according to the information string, a scene language model corresponding to the to-be-identified voice signal;

a determining module, configured to determine whether a word sequence corresponding to the information string exists in the scened language model;

And an enhancement module, configured to: if the determination result is yes, increase a probability that the word sequence corresponding to the information string appears in the language in the scene language model to obtain an enhanced scene language model;

And an identifying module, configured to perform voice recognition on the to-be-identified voice signal according to the enhanced scened language model.
The apparatus according to claim 6, wherein the determining module is specifically configured to:

Performing semantic analysis on the information string to determine a grammatical sentence and an entity word in the information string;

Determining, according to the grammatical sentence formula and the entity word, the user intention of the speech signal to be recognized;

And determining, according to the user intention, a scene language model corresponding to the to-be-identified voice signal.
The apparatus according to claim 6, wherein the scene language model comprises a grammar file and a scene dictionary;

The determining module is specifically configured to:

Performing semantic analysis on the information string to determine grammatical sentences and entity words in the information string

Determining whether the fixed sentence is included in the grammar file, and determining whether the entity word is included in the scene dictionary;

If the determination result is yes, determining that there is a word sequence corresponding to the information string in the scene language model, and the word sequence formed by the fixed sentence pattern and the entity word is corresponding to the information string The sequence of words.
The device according to any one of claims 6-8, wherein the obtaining module is specifically configured to:

When the word sequence corresponding to the to-be-recognized speech signal is not recognized by using the common language model, the information string corresponding to the to-be-recognized speech signal is acquired.
The device according to claim 9, wherein the identification module is specifically configured to:

Performing speech recognition on the to-be-recognized speech signal according to the universal language model and the enhanced scened language model.
An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions Executed by the at least one processor to enable the at least one processor to

Obtaining a string of information corresponding to the voice signal to be identified;

Determining, according to the information string, a scene language model corresponding to the to-be-identified voice signal;

Determining whether there is a word sequence corresponding to the information string in the scene language model;

If the determination result is yes, increasing a probability that the word sequence corresponding to the information string appears in the language in the scene language model to obtain an enhanced scened language model;

Performing voice recognition on the to-be-recognized speech signal according to the enhanced scened language model.
A non-volatile computer storage medium characterized in that: the storage medium is stored The computer executable instructions of computer executable instructions, when executed by an electronic device, enable the electronic device to:

Obtaining a string of information corresponding to the voice signal to be identified;

Determining, according to the information string, a scene language model corresponding to the to-be-identified voice signal;

Determining whether there is a word sequence corresponding to the information string in the scene language model;

If the determination result is yes, increasing a probability that the word sequence corresponding to the information string appears in the language in the scene language model to obtain an enhanced scened language model;

Performing voice recognition on the to-be-recognized speech signal according to the enhanced scened language model.
A computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to execute The method of claims 1-5.