CN107170450B - Voice recognition method and device - Google Patents

Voice recognition method and device

Info

Publication number
CN107170450B
Authority
CN
China
Prior art keywords
voice recognition
recognition engine
speech recognition
client
online
Prior art date
Legal status
Active
Application number
CN201710448331.5A
Other languages
Chinese (zh)
Other versions
CN107170450A (en)
Inventor
蒋化冰
蔡汉嘉
张海建
谭舟
王振超
梁兰
徐志强
严婷
郦莉
Current Assignee
Shanghai Noah Wood Robot Technology Co.,Ltd.
Original Assignee
Shanghai Zhihuilin Medical Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Shanghai Zhihuilin Medical Technology Co., Ltd.
Priority to CN201710448331.5A
Publication of CN107170450A
Application granted
Publication of CN107170450B

Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L 15/00 Speech recognition
                    • G10L 15/26 Speech to text systems
                    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L 43/00 Arrangements for monitoring or testing data switching networks
                    • H04L 43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
                        • H04L 43/0805 by checking availability
                            • H04L 43/0811 by checking connectivity
                        • H04L 43/0876 Network utilisation, e.g. volume of load or congestion level
                • H04L 67/00 Network arrangements or protocols for supporting network services or applications
                    • H04L 67/01 Protocols
                        • H04L 67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Abstract

An embodiment of the invention provides a voice recognition method and a voice recognition device. The method includes: in response to a user's operation of inputting audio information on an audio input interface of a voice recognition client, acquiring voice recognition engine indication information determined in advance according to the network condition between the voice recognition client and an online voice recognition engine; and sending the audio information to the voice recognition engine indicated by the indication information, so that the indicated voice recognition engine recognizes the audio information. The voice recognition method and device provided by the embodiment of the invention can improve the real-time performance of voice recognition and reduce wasted time.

Description

Voice recognition method and device
Technical Field
The present application relates to the field of speech recognition technologies, and in particular, to a speech recognition method and apparatus.
Background
With the development of speech recognition technology, speech recognition engines that can convert audio into text have emerged. Such speech recognition engines include online speech recognition engines and offline speech recognition engines.
In the prior art, in order to perform voice recognition smoothly under different network conditions, methods have appeared that use an online voice recognition engine and an offline voice recognition engine together. Specifically, after the user inputs audio information, the client sends the audio information to both the offline speech recognition engine and the online speech recognition engine at the same time. If the client receives text information returned by the online voice recognition engine within a preset time, the online voice recognition engine is used for voice recognition; otherwise, the offline voice recognition engine is used.
In this speech recognition method, after the user inputs the audio information, the client must wait out the preset time before the speech recognition engine to be used is determined, so the real-time performance of speech recognition is poor.
Disclosure of Invention
Aspects of the present disclosure provide a speech recognition method and apparatus for improving real-time performance of speech recognition.
The embodiment of the application provides a voice recognition method, which comprises the following steps:
responding to the operation of inputting audio information on an audio input interface of a voice recognition client by a user, and acquiring voice recognition engine indication information which is determined in advance according to the network condition between the voice recognition client and an online voice recognition engine;
and sending the audio information to the voice recognition engine indicated by the voice recognition engine indication information so as to recognize the audio information through the voice recognition engine indicated by the voice recognition engine indication information.
Optionally, before responding to an operation of a user inputting audio information on an audio input interface of the speech recognition client, the method further comprises:
responding to the operation of entering the audio input interface or starting the voice recognition client, and detecting the network condition between the voice recognition client and the online voice recognition engine;
and determining the indication information of the voice recognition engine according to the detected network condition between the voice recognition client and the online voice recognition engine.
Optionally, before responding to an entry into the audio input interface or an operation of the speech recognition client being turned on, the method further comprises:
responding to the detection configuration request of the user, and displaying a network setting interface for the user to configure a detection period and a detection website;
and responding to the setting operation of the user on the network setting interface, and acquiring the detection period and the detection website configured by the user.
Optionally, the detecting a network condition between the speech recognition client and the online speech recognition engine in response to an operation of entering the audio input interface or starting the speech recognition client includes:
responding to the operation of entering the audio input interface or starting the voice recognition client, and periodically sending a detection request to the online voice recognition engine corresponding to the detection website according to the detection period;
and determining the network condition between the voice recognition client and the online voice recognition engine according to the response condition of the online voice recognition engine to the detection request.
Optionally, determining the indication information of the speech recognition engine according to detecting a network condition between the speech recognition client and the online speech recognition engine, including:
if the network condition between the voice recognition client and the online voice recognition engine meets the set network requirement, determining voice recognition engine indication information indicating the online voice recognition engine;
and if the network condition between the voice recognition client and the online voice recognition engine does not meet the set network requirement, determining voice recognition engine indication information indicating an offline voice recognition engine.
Optionally, after determining the indication information of the speech recognition engine according to the detected network condition between the speech recognition client and the online speech recognition engine, the method further includes:
storing the voice recognition engine indication information to a local memory;
the acquiring, in response to an operation of a user inputting audio information on an audio input interface provided by a speech recognition client, speech recognition engine indication information determined in advance according to a network condition between the speech recognition client and an online speech recognition engine includes:
and responding to the operation of inputting audio information on the audio input interface by the user, and acquiring the latest stored voice recognition engine indication information from the local memory.
Optionally, the method further comprises:
detecting a network condition between the voice recognition client and an online voice recognition engine in the process that the voice recognition engine indicated by the voice recognition engine indication information identifies the audio information;
and updating the indication information of the voice recognition engine when the network condition between the voice recognition client and the online voice recognition engine changes.
Optionally, the method further comprises:
and stopping detecting the network condition between the voice recognition client and an online voice recognition engine in response to the operation of exiting the audio input interface or closing the voice recognition client.
An embodiment of the present application further provides a speech recognition apparatus, including:
a first acquisition module, used for responding to an operation of a user inputting audio information on an audio input interface of a voice recognition client and acquiring voice recognition engine indication information which is determined in advance according to the network condition between the voice recognition client and an online voice recognition engine;
and the sending module is used for sending the audio information to the voice recognition engine indicated by the voice recognition engine indication information so as to recognize the audio information through the voice recognition engine indicated by the voice recognition engine indication information.
Optionally, the apparatus further comprises:
the detection module is used for responding to the operation of entering the audio input interface or starting the voice recognition client and detecting the network condition between the voice recognition client and the online voice recognition engine;
and the determining module is used for determining the indication information of the voice recognition engine according to the detected network condition between the voice recognition client and the online voice recognition engine.
In the embodiment of the application, before the user performs the operation of inputting the audio information, the indication information of the speech recognition engine is determined in advance according to the network condition; when the user executes the operation of inputting the audio information, the indication information of the speech recognition engine can be immediately acquired, and an available speech recognition engine is determined; furthermore, after the user inputs the audio information, the voice recognition can be immediately carried out through the determined voice recognition engine, so that the real-time performance of the voice recognition is improved, and the waste of time is reduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a schematic flowchart of a speech recognition method according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating a speech recognition method according to another embodiment of the present application;
Fig. 3 is a block diagram of a speech recognition apparatus according to another embodiment of the present application;
Fig. 4 is a block diagram of a speech recognition apparatus according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a speech recognition method according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
s101: and responding to the operation of inputting audio information on an audio input interface of the voice recognition client by a user, and acquiring voice recognition engine indication information which is determined in advance according to the network condition between the voice recognition client and the online voice recognition engine.
S102: and sending the audio information to the voice recognition engine indicated by the voice recognition engine indication information so as to recognize the audio information through the voice recognition engine indicated by the voice recognition engine indication information.
When the user has the requirement of audio recognition, the audio information to be recognized can be input to the voice recognition client, and then the voice recognition client sends the received audio information to the voice recognition engine for voice recognition.
For the speech recognition client, an audio input interface is generally provided, and a user can perform an operation of inputting audio information on the audio input interface.
Optionally, an audio input control may be disposed on the audio input interface, and the user may perform the input of the audio information by triggering the audio input control. For example, the audio input control may be a microphone icon control, and the user may input audio information by touching the microphone icon control.
In order to determine the speech recognition engine immediately after the user inputs audio information, the present embodiment determines the speech recognition engine indication information in advance. The predetermined indication information can then be obtained in response to the user's operation of inputting audio information on the audio input interface, and the speech recognition engine that will recognize the input audio information is determined from it. The speech recognition engine indication information is determined in advance according to the network condition between the speech recognition client and the online speech recognition engine. For convenience of description, the network condition between the speech recognition client and the online speech recognition engine is simply referred to as the network condition in the following embodiments.
Optionally, the speech recognition engine indication information includes indication information indicating an online speech recognition engine and indication information indicating an offline speech recognition engine.
Optionally, if the network condition is available for the online speech recognition engine to perform speech recognition, determining indication information indicating the online speech recognition engine; if the network condition is not available for the online speech recognition engine to perform speech recognition, determining indication information indicating the offline speech recognition engine. Then, an online speech recognition engine or an offline speech recognition engine may be determined according to the speech recognition engine indication information.
The speech recognition client may then send the audio information to the speech recognition engine indicated by the speech recognition engine indication information. And the voice recognition engine recognizes the audio information after receiving the audio information.
After the speech recognition engine finishes the speech recognition, the recognition result can be returned to the speech recognition client side, and the speech recognition client side displays the recognition result to the user.
In the embodiment, the indication information of the voice recognition engine is determined in advance according to the network condition, so that when a user executes the operation of inputting audio information, the indication information of the voice recognition engine can be immediately acquired, and an available voice recognition engine is determined; further, after the user inputs the audio information, the voice recognition can be performed by the determined voice recognition engine, thereby improving the real-time performance of the voice recognition.
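As an illustrative sketch only (names such as SpeechDispatcher, SpeechEngine and EngineIndication are hypothetical and not defined by this application), steps S101 and S102 amount to reading a pre-stored engine choice and forwarding the audio to that engine, with no waiting once the user has input the audio information:

import java.util.concurrent.atomic.AtomicReference;

/** Minimal sketch of steps S101/S102; all names here are illustrative assumptions. */
public class SpeechDispatcher {

    /** The two possible values of the speech recognition engine indication information. */
    public enum EngineIndication { CLOUD, LOCAL }

    /** Both engines expose the same recognition call; concrete implementations are out of scope. */
    public interface SpeechEngine {
        String recognize(byte[] audio);
    }

    // Written in advance by the network detection logic (see the later sketches),
    // so recognize() never waits for a detection round after the audio arrives.
    private final AtomicReference<EngineIndication> indication =
            new AtomicReference<>(EngineIndication.LOCAL);

    private final SpeechEngine onlineEngine;
    private final SpeechEngine offlineEngine;

    public SpeechDispatcher(SpeechEngine online, SpeechEngine offline) {
        this.onlineEngine = online;
        this.offlineEngine = offline;
    }

    /** Called whenever the indication information is (re)determined from the network condition. */
    public void updateIndication(EngineIndication value) {
        indication.set(value);
    }

    /** S101 + S102: the user has input audio information on the audio input interface. */
    public String recognize(byte[] audio) {
        return indication.get() == EngineIndication.CLOUD
                ? onlineEngine.recognize(audio)     // online speech recognition engine
                : offlineEngine.recognize(audio);   // offline speech recognition engine
    }
}

The point of the sketch is only that the engine selection is a read of pre-computed state rather than a timed race between two engines, which is the difference from the prior-art approach described in the background.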
In the above-described embodiment or the following-described embodiment, the speech recognition engine indication information may be determined in advance by a network condition before responding to an operation of a user inputting audio information on an audio input interface of the speech recognition client.
Optionally, the network condition between the speech recognition client and the online speech recognition engine may be detected in response to an operation of entering the audio input interface or starting the speech recognition client, and the speech recognition engine indication information may be determined according to the detected network condition between the speech recognition client and the online speech recognition engine.
When a user has a voice recognition requirement, the voice recognition client must first be started. After the voice recognition client is started, the page it displays may or may not be the audio input interface; if it is not, the user needs to enter the audio input interface. The user then inputs audio information on the audio input interface by triggering the audio input control.
Based on the above analysis, when the user enters the audio input interface or turns on the speech recognition client, it means that the user is about to perform an input operation of audio information. At this time, the network condition between the speech recognition client and the online speech recognition engine can be detected in response to the operation of entering the audio input interface or starting the speech recognition client. And determining the indication information of the voice recognition engine according to the detected network condition between the voice recognition client and the online voice recognition engine. When the user inputs the audio information, the determined indication information of the speech recognition engine can be directly obtained, and the speech recognition engine required by the speech recognition can be determined in time.
Fig. 2 is a flowchart illustrating a speech recognition method according to another embodiment of the present application. As shown in fig. 2, the following steps are included.
S201: and responding to the detection configuration request of the user, and displaying a network setting interface for the user to configure the detection period and detect the website.
S202: and responding to the setting operation of the user on the network setting interface, and acquiring the detection period configured by the user and the detection website.
S203: and responding to the operation of entering an audio input interface or starting a voice recognition client, and periodically sending a detection request to an online voice recognition engine corresponding to the detected website according to a detection period.
S204: and determining the network condition between the voice recognition client and the online voice recognition engine according to the response condition of the online voice recognition engine to the detection request.
S205: and judging whether the network condition between the voice recognition client and the online voice recognition engine meets the set network requirement. If yes, jumping to step S206; if not, go to step S207.
S206: voice recognition engine indication information indicating an online voice recognition engine is determined and it jumps to step S209.
S207: speech recognition engine indication information indicating an offline speech recognition engine is determined, and the process proceeds to step S208.
S208: and sending the audio information to an offline speech recognition engine indicated by the speech recognition engine indication information indicating the offline speech recognition engine so as to recognize the audio information through the offline speech recognition engine and finish the operation.
S209: and sending the audio information to an online voice recognition engine indicated by voice recognition engine indication information indicating the online voice recognition engine so as to recognize the audio information through the online voice recognition engine and end the operation.
Since detecting the network condition takes a certain amount of time, and the speech recognition engine indication information should be available immediately when the user inputs audio information, the network condition may optionally be detected in response to the operation of entering the audio input interface or starting the speech recognition client. Accordingly, the parameters required for detecting the network condition can be configured in advance, before the audio input interface is entered or the speech recognition client is started. Optionally, these parameters may include, but are not limited to, the detection period and the detection website.
Based on this, the detection period and the detection website can be preset before the user enters the audio input interface or starts the voice recognition client. Optionally, the user may send a detection configuration request to the speech recognition client; the speech recognition client may respond to the user's detection configuration request by displaying a network setting interface on which the user configures the detection period and the detection website (i.e., step S201).
The detection website is the website to which the voice recognition client sends detection requests. The detection website in this embodiment may include an IP address and/or a domain name address; for example, the domain name address may be www.baidu.com and the IP address may be 61.135.169.121. In order to successfully detect the network condition between the voice recognition client and the online voice recognition engine, the detection website should be the IP address and/or domain name address of the online voice recognition engine, or of a transit server on the communication link between the voice recognition client and the online voice recognition engine.
The detection period is the interval, such as 1 s or 2 s, at which the speech recognition client sends detection requests. A detection request is a request signal sent by the voice recognition client to the server corresponding to the detection website.
Optionally, the network setting interface may provide preset options for the detection period and the detection website, or provide input boxes, so that the user can select or enter the detection period and the detection website.
After the user sets the detection period and the detection website on the network setting interface, the speech recognition client may respond to the setting operation of the user on the network setting interface to obtain the detection period and the detection website configured by the user (i.e., step S202).
Optionally, after the user completes the setting on the network setting interface, a submit operation or a save operation may be performed, for example, triggering a submit control or a save control. The voice recognition client can respond to the submitting operation or the saving operation of the user on the network setting interface, and acquire the detection period and the detection website set by the user.
The user does not need to set the detection period and detection website every time before entering the audio input interface or starting the voice recognition client. Optionally, once the detection period and detection website set by the user have been obtained, they may be stored in a local memory; the next time the user needs voice recognition, the voice recognition client can read the previously configured detection website and detection period directly from the local memory, and the user does not need to send another detection configuration request.
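Purely as an illustration of the persistence just described (the file name and property keys are assumptions, not values taken from this application), the configured detection period and detection website could be stored and reloaded like this:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

/** Sketch of persisting the user-configured detection period and detection website locally. */
public class ProbeSettings {

    private static final File SETTINGS_FILE = new File("probe-settings.properties"); // assumed location

    /** Save the values obtained from the network setting interface (step S202). */
    public static void save(int periodSeconds, String detectionWebsite) throws IOException {
        Properties props = new Properties();
        props.setProperty("detection.period.seconds", Integer.toString(periodSeconds));
        props.setProperty("detection.website", detectionWebsite);
        try (OutputStream out = new FileOutputStream(SETTINGS_FILE)) {
            props.store(out, "speech recognition detection settings");
        }
    }

    /** Reload previously configured values so the user need not configure them again. */
    public static Properties load() throws IOException {
        Properties props = new Properties();
        if (SETTINGS_FILE.exists()) {
            try (InputStream in = new FileInputStream(SETTINGS_FILE)) {
                props.load(in);
            }
        }
        return props;
    }
}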
After the detection website and the detection period are obtained, the voice recognition client can detect the network condition. Optionally, in response to the operation of entering the audio input interface or starting the speech recognition client, a detection request is periodically sent to the online speech recognition engine corresponding to the detection website according to the detection period (i.e., step S203). The network condition between the speech recognition client and the online speech recognition engine is then determined according to the online speech recognition engine's response to the detection request (i.e., step S204).
Optionally, in order to obtain the speech recognition engine indication information determined according to the network condition as soon as possible, a detection request may be sent to the online speech recognition engine corresponding to the detection website immediately when the user enters the audio input interface or starts the speech recognition client, with subsequent detection requests sent according to the detection period.
Optionally, this embodiment may use the Hypertext Transfer Protocol (HTTP) to detect the network condition.
The HTTP protocol uses a request/response model: the voice recognition client sends a request message to the online voice recognition engine corresponding to the detection website, and the online voice recognition engine replies with a response message.
For example, when the detection website is entered in a browser address bar and the enter key is pressed, the browser establishes a connection with the online voice recognition engine corresponding to that website. The browser then sends an HTTP request for reading a file, i.e., a detection request, to the online voice recognition engine; the online voice recognition engine responds to the browser's HTTP request by returning the corresponding html response message, which the browser can then display.
Depending on the return time and content of the html response message, the online voice recognition engine's response to the HTTP request falls into one of the following three cases:
The first response case: no html response message is returned within the specified time. This may be because the browser failed to establish a connection with the online voice recognition engine corresponding to the detection website, so that no html response message exists, or because network congestion delayed the return of the html response message beyond the specified time.
The second response case: the html response message is returned within the specified time, but its content is not the HTTP response 200 OK that indicates a normal request; for example, the html response message contains 403 Forbidden or 404 Not Found.
The third response case: the html response message is returned within the specified time, and its content is the HTTP response 200 OK indicating that the request is normal.
The different response cases of the online voice recognition engine to the detection request correspond to different determined network conditions. Based on this, an html response message with HTTP 200 OK content returned within the specified time can be taken as the set network requirement, and it can then be judged whether the network condition between the voice recognition client and the online voice recognition engine meets this requirement (i.e., step S205).
Obviously, for the first response case and the second response case, the determined network condition does not meet the set network requirement; for the third response case, the determined network condition satisfies the set network requirement.
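Only as a hedged illustration of the classification above (the class name, method name and timeout parameter are assumptions), the three response cases collapse into a single boolean check in code: the set network requirement is met exactly when a 200 OK response arrives within the specified time. A plain java.net.HttpURLConnection is enough for such a detection request:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Sketch of one detection request and the three-case classification described above. */
public class NetworkProbe {

    /** Returns true only for the third response case: 200 OK within the specified time. */
    public static boolean meetsNetworkRequirement(String detectionWebsite, int specifiedTimeMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(detectionWebsite).openConnection();
            conn.setConnectTimeout(specifiedTimeMillis); // first case: connection cannot be established in time
            conn.setReadTimeout(specifiedTimeMillis);    // first case: response delayed beyond the specified time
            conn.setRequestMethod("GET");
            int code = conn.getResponseCode();           // second case: e.g. 403 or 404 instead of 200
            conn.disconnect();
            return code == HttpURLConnection.HTTP_OK;    // third case: the set network requirement is met
        } catch (IOException timeoutOrConnectFailure) {
            return false;                                // first response case
        }
    }
}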
If the network condition between the speech recognition client and the online speech recognition engine meets the set network requirement, meaning that speech recognition can currently be performed by the online speech recognition engine, the speech recognition engine indication information indicating the online speech recognition engine is determined (i.e., step S206). The audio information is then sent to the online speech recognition engine indicated by that indication information so that the online speech recognition engine recognizes it, and the process ends (i.e., step S209).
If the network condition between the speech recognition client and the online speech recognition engine does not meet the set network requirement, meaning that speech recognition cannot currently be performed by the online speech recognition engine, the speech recognition engine indication information indicating the offline speech recognition engine is determined (i.e., step S207). The audio information is then sent to the offline speech recognition engine indicated by that indication information so that the offline speech recognition engine recognizes it, and the process ends (i.e., step S208).
Optionally, after the speech recognition engine indication information is determined and before the audio information is sent to the corresponding speech recognition engine, the microphone device may be turned on, the audio information input by the user on the audio input interface may be received through the microphone device, and the received audio information may then be sent to the corresponding speech recognition engine.
Optionally, the speech recognition engine may divide the received audio information into multiple audio segments. After each audio segment is recognized, the recognition result of that segment, such as a run of text, is returned to the speech recognition client, and the next audio segment is then recognized, until all of the audio segments into which the audio information was divided have been recognized. The speech recognition client can thus present the recognition results to the user segment by segment.
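For illustration only, the segment-by-segment return of results described above could be surfaced to the client through a callback interface like the hypothetical one below; this application does not define such an API.

/** Hypothetical callback for receiving recognition results one audio segment at a time. */
public interface SegmentRecognitionListener {

    /** Called once per recognized audio segment, e.g. with one run of text to display immediately. */
    void onSegmentRecognized(String segmentText);

    /** Called after the last segment of the submitted audio information has been recognized. */
    void onRecognitionFinished();
}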
To make it easy for the speech recognition client to acquire the speech recognition engine indication information, the indication information may optionally be saved to a local memory once it has been determined.
In practical applications, the speech recognition engine indication information may be embodied as a speech recognition engine variable, such as mEngineType. If the network condition meets the set network requirement, the variable mEngineType is assigned the value "cloud" to indicate the online speech recognition engine; if the network condition does not meet the set network requirement, the variable mEngineType may be assigned the value "local" to indicate the offline speech recognition engine.
Based on this, in response to the operation of inputting audio information on the audio input interface provided by the voice recognition client by the user, acquiring the voice recognition engine indication information determined in advance according to the network condition between the voice recognition client and the online voice recognition engine, including: and responding to the operation of inputting audio information on the audio input interface by the user, and acquiring the latest stored voice recognition engine indication information from the local memory.
The most recently stored speech recognition engine indication information was determined from the most recent network condition, which is generally close to the network condition at the time the speech recognition is performed, so recognition by the indicated speech recognition engine can usually be carried out successfully. Optionally, the speech recognition engine indication information may be embodied as the most recently assigned value of the speech recognition engine variable, such as "cloud" or "local".
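A minimal sketch of the local storage just described, assuming an ordinary key-value store (java.util.prefs is used here only as a stand-in); the key name mirrors the mEngineType example above and the "cloud"/"local" values match the preceding paragraph:

import java.util.prefs.Preferences;

/** Sketch of saving and reading back the speech recognition engine indication information. */
public class EngineIndicationStore {

    private static final Preferences PREFS =
            Preferences.userRoot().node("speech-recognition-client"); // assumed node name
    private static final String KEY = "mEngineType";

    /** Store "cloud" (online engine) or "local" (offline engine) whenever it is (re)determined. */
    public static void save(String engineType) {
        PREFS.put(KEY, engineType);
    }

    /** Read the most recently stored indication; fall back to the offline engine if none is stored. */
    public static String latest() {
        return PREFS.get(KEY, "local");
    }
}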
In some cases, the user inputs audio information to the speech recognition client only once and then no longer uses the audio input interface. In other cases, the user may need to input audio information multiple times; for example, in a chat scenario, the user may input chat speech repeatedly. In the latter case, if the network condition changes after one input, the speech recognition engine indication information obtained for that input is no longer appropriate when the user inputs audio information the next time.
Based on this analysis, the network condition between the speech recognition client and the online speech recognition engine can continue to be detected while the speech recognition engine indicated by the indication information recognizes the audio information, and the speech recognition engine indication information is updated when the network condition changes. The updated indication information can then be obtained the next time the user inputs audio information.
A change in the network condition means that it changes from meeting the set network requirement to not meeting it, or from not meeting the set network requirement to meeting it.
Based on this, updating the speech recognition engine indication information includes updating the current indication information indicating the online speech recognition engine to indication information indicating the offline speech recognition engine, for example updating "cloud" to "local", or updating the current indication information indicating the offline speech recognition engine to indication information indicating the online speech recognition engine, for example updating "local" to "cloud".
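As a final hedged sketch, the continuous detection and update described in the last few paragraphs can be pictured as a periodic task that re-probes at the configured detection period and rewrites the stored indication only when the result flips between "cloud" and "local". The start() and stop() methods correspond to entering and exiting the audio input interface (or opening and closing the client) as discussed here and in the next paragraph; the sketch reuses the hypothetical NetworkProbe and EngineIndicationStore classes introduced above.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Sketch of periodic network detection that keeps the stored engine indication up to date. */
public class IndicationMonitor {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final String detectionWebsite;
    private final int detectionPeriodSeconds;
    private final int specifiedTimeMillis;
    private ScheduledFuture<?> task;

    public IndicationMonitor(String detectionWebsite, int detectionPeriodSeconds, int specifiedTimeMillis) {
        this.detectionWebsite = detectionWebsite;
        this.detectionPeriodSeconds = detectionPeriodSeconds;
        this.specifiedTimeMillis = specifiedTimeMillis;
    }

    /** Begin detection when the audio input interface is entered or the client is started. */
    public synchronized void start() {
        if (task != null) {
            return; // already running
        }
        task = scheduler.scheduleAtFixedRate(() -> {
            String detected = NetworkProbe.meetsNetworkRequirement(detectionWebsite, specifiedTimeMillis)
                    ? "cloud"   // network requirement met: indicate the online engine
                    : "local";  // otherwise: indicate the offline engine
            if (!detected.equals(EngineIndicationStore.latest())) {
                EngineIndicationStore.save(detected); // update only when the network condition changes
            }
        }, 0, detectionPeriodSeconds, TimeUnit.SECONDS);
    }

    /** Stop detection when the audio input interface is exited or the client is closed. */
    public synchronized void stop() {
        if (task != null) {
            task.cancel(true);
            task = null;
        }
    }
}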
When the user exits the audio input interface or closes the speech recognition client, it means that the user does not input audio information any more, and it is not necessary to detect the network status. Based on this, the voice recognition client can stop detecting the network condition between the voice recognition client and the online voice recognition engine in response to the operation of exiting the audio input interface or closing the voice recognition client.
The embodiment of the present application further provides a speech recognition apparatus 300, as shown in fig. 3, including a first obtaining module 301 and a sending module 302.
The first obtaining module 301 is configured to, in response to an operation of a user inputting audio information on an audio input interface of a speech recognition client, obtain speech recognition engine indication information determined in advance according to a network condition between the speech recognition client and an online speech recognition engine.
A sending module 302, configured to send the audio information to the speech recognition engine indicated by the speech recognition engine indication information obtained by the first obtaining module 301, so as to recognize the audio information through the speech recognition engine indicated by the speech recognition engine indication information.
In the embodiment, the indication information of the voice recognition engine is determined in advance according to the network condition, so that when a user executes the operation of inputting audio information, the indication information of the voice recognition engine can be immediately acquired, and an available voice recognition engine is further determined; therefore, after the user inputs the audio information, the voice recognition can be carried out through the determined voice recognition engine, and the real-time performance of the voice recognition is improved.
Optionally, as shown in fig. 4, the speech recognition apparatus 300 further includes a detection module 303 and a determination module 304.
The detecting module 303 is configured to detect a network condition between the speech recognition client and the online speech recognition engine in response to an operation of entering the audio input interface or starting the speech recognition client.
A determining module 304, configured to determine the indication information of the speech recognition engine according to the network condition between the speech recognition client and the online speech recognition engine detected by the detecting module 303.
Optionally, as shown in fig. 4, the speech recognition apparatus 300 further includes a presentation module 305 and a second obtaining module 306.
The display module 305 is specifically configured to, before the operation of entering the audio input interface or starting the voice recognition client, respond to the user's detection configuration request and display a network setting interface for the user to configure a detection period and a detection website.
A second obtaining module 306, configured to obtain the detection period and the detection website configured by the user in response to a setting operation of the user on a network setting interface displayed by the displaying module 305.
Optionally, the detecting module 303 is further configured to, when detecting a network condition between the speech recognition client and the online speech recognition engine in response to an operation of entering the audio input interface or starting the speech recognition client: and responding to the operation of entering the audio input interface or starting the voice recognition client, and periodically sending a detection request to the online voice recognition engine corresponding to the detection website according to the detection period.
Based on this, the determining module 304 is further configured to determine a network condition between the speech recognition client and the online speech recognition engine according to a response condition of the online speech recognition engine to the probing request.
Optionally, when the determining module 304 determines the indication information of the speech recognition engine according to detecting a network condition between the speech recognition client and the online speech recognition engine, the determining module is further configured to: if the network condition between the voice recognition client and the online voice recognition engine meets the set network requirement, determining voice recognition engine indication information indicating the online voice recognition engine; and if the network condition between the voice recognition client and the online voice recognition engine does not meet the set network requirement, determining voice recognition engine indication information indicating an offline voice recognition engine.
Optionally, the speech recognition apparatus 300 further includes a saving module, configured to save the speech recognition engine indication information to a local memory after determining the speech recognition engine indication information according to the detected network condition between the speech recognition client and the online speech recognition engine.
Based on this, when the first obtaining module 301, in response to an operation of a user inputting audio information on an audio input interface provided by a speech recognition client, obtains speech recognition engine indication information determined in advance according to a network condition between the speech recognition client and an online speech recognition engine, specifically, the first obtaining module is further configured to:
and responding to the operation of inputting audio information on the audio input interface by the user, and acquiring the latest stored voice recognition engine indication information from the local memory.
Optionally, the detecting module 303 is further configured to: detecting a network condition between the speech recognition client and an online speech recognition engine in the process of recognizing the audio information by the speech recognition engine indicated by the speech recognition engine indication information.
Based on the above, the saving module is further configured to update the indication information of the speech recognition engine when a network condition between the speech recognition client and the online speech recognition engine changes.
Optionally, the detecting module 303 is further configured to: and stopping detecting the network condition between the voice recognition client and an online voice recognition engine in response to the operation of exiting the audio input interface or closing the voice recognition client.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (8)

1. A speech recognition method, comprising:
responding to the operation of inputting audio information on an audio input interface of a voice recognition client by a user, and acquiring voice recognition engine indication information which is determined in advance according to the network condition between the voice recognition client and an online voice recognition engine;
sending the audio information to a speech recognition engine indicated by the speech recognition engine indication information to recognize the audio information by the speech recognition engine indicated by the speech recognition engine indication information;
prior to responding to an operation of a user inputting audio information on an audio input interface of a speech recognition client, the method further comprises:
responding to the operation of entering the audio input interface or starting the voice recognition client, and detecting the network condition between the voice recognition client and the online voice recognition engine;
determining the indication information of the voice recognition engine according to the detected network condition between the voice recognition client and the online voice recognition engine;
storing the voice recognition engine indication information to a local memory;
the acquiring, in response to an operation of a user inputting audio information on an audio input interface provided by a speech recognition client, speech recognition engine indication information determined in advance according to a network condition between the speech recognition client and an online speech recognition engine includes:
and responding to the operation of inputting audio information on the audio input interface by the user, and acquiring the latest stored voice recognition engine indication information from the local memory.
2. The method of claim 1, wherein prior to responding to an entry into the audio input interface or an operation to turn on the speech recognition client, the method further comprises:
responding to the detection configuration request of the user, and displaying a network setting interface for the user to configure a detection period and a detection website;
and responding to the setting operation of the user on the network setting interface, and acquiring the detection period and the detection website configured by the user.
3. The method of claim 2, wherein detecting a network condition between the speech recognition client and the online speech recognition engine in response to an operation of entering the audio input interface or turning on the speech recognition client comprises:
responding to the operation of entering the audio input interface or starting the voice recognition client, and periodically sending a detection request to the online voice recognition engine corresponding to the detection website according to the detection period;
and determining the network condition between the voice recognition client and the online voice recognition engine according to the response condition of the online voice recognition engine to the detection request.
4. The method of claim 3, wherein determining the speech recognition engine indication information based on detecting a network condition between the speech recognition client and the online speech recognition engine comprises:
if the network condition between the voice recognition client and the online voice recognition engine meets the set network requirement, determining voice recognition engine indication information indicating the online voice recognition engine;
and if the network condition between the voice recognition client and the online voice recognition engine does not meet the set network requirement, determining voice recognition engine indication information indicating an offline voice recognition engine.
5. The method according to any one of claims 1-4, further comprising:
detecting a network condition between the voice recognition client and an online voice recognition engine in the process that the voice recognition engine indicated by the voice recognition engine indication information identifies the audio information;
and updating the indication information of the voice recognition engine when the network condition between the voice recognition client and the online voice recognition engine changes.
6. The method of claim 5, further comprising:
and stopping detecting the network condition between the voice recognition client and an online voice recognition engine in response to the operation of exiting the audio input interface or closing the voice recognition client.
7. A speech recognition apparatus, comprising:
a first acquisition module, configured to, in response to an operation of a user inputting audio information on an audio input interface of a voice recognition client, acquire voice recognition engine indication information which is determined in advance according to the network condition between the voice recognition client and an online voice recognition engine;
a sending module, configured to send the audio information to a speech recognition engine indicated by the speech recognition engine indication information, so as to recognize the audio information by the speech recognition engine indicated by the speech recognition engine indication information;
the storage module is used for storing the voice recognition engine indication information to a local memory;
the first obtaining module is further configured to: and responding to the operation of inputting audio information on the audio input interface by the user, and acquiring the latest stored voice recognition engine indication information from the local memory.
8. The apparatus of claim 7, further comprising:
the detection module is used for responding to the operation of entering the audio input interface or starting the voice recognition client and detecting the network condition between the voice recognition client and the online voice recognition engine;
and the determining module is used for determining the indication information of the voice recognition engine according to the detected network condition between the voice recognition client and the online voice recognition engine.
CN201710448331.5A 2017-06-14 2017-06-14 Voice recognition method and device Active CN107170450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710448331.5A CN107170450B (en) 2017-06-14 2017-06-14 Voice recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710448331.5A CN107170450B (en) 2017-06-14 2017-06-14 Voice recognition method and device

Publications (2)

Publication Number Publication Date
CN107170450A CN107170450A (en) 2017-09-15
CN107170450B true CN107170450B (en) 2021-03-12

Family

ID=59818623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710448331.5A Active CN107170450B (en) 2017-06-14 2017-06-14 Voice recognition method and device

Country Status (1)

Country Link
CN (1) CN107170450B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091819A (en) * 2018-10-08 2020-05-01 蔚来汽车有限公司 Voice recognition device and method, voice interaction system and method
CN109658929A (en) * 2018-12-17 2019-04-19 北京车和家信息技术有限公司 A kind of vehicle-mounted voice recognition methods, device and engine end
CN109493862B (en) * 2018-12-24 2021-11-09 深圳Tcl新技术有限公司 Terminal, voice server determination method, and computer-readable storage medium
CN109949802A (en) * 2019-01-25 2019-06-28 广州灵聚信息科技有限公司 It is a kind of from the NLP system and its working method unified online, device
CN110246501B (en) * 2019-07-02 2022-02-01 思必驰科技股份有限公司 Voice recognition method and system for conference recording

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101197733A (en) * 2007-12-18 2008-06-11 中兴通讯股份有限公司 Automatic detection method and device for network connectivity
US10204622B2 (en) * 2015-09-10 2019-02-12 Crestron Electronics, Inc. Acoustic sensory network
CN104916283A (en) * 2015-06-11 2015-09-16 百度在线网络技术(北京)有限公司 Voice recognition method and device
CN106373565A (en) * 2016-08-24 2017-02-01 安徽咪鼠科技有限公司 Method for automatically switching speech recognition engines

Also Published As

Publication number Publication date
CN107170450A (en) 2017-09-15

Similar Documents

Publication Publication Date Title
CN107170450B (en) Voice recognition method and device
EP3002693A1 (en) Service recommendation method and device having intelligent assistant
JP5958475B2 (en) Voice recognition terminal device, voice recognition system, and voice recognition method
KR102141116B1 (en) Interface device and method supporting speech dialogue survice
CN106559313B (en) Car sharing method and server
CN105869635B (en) Voice recognition method and system
CN110196927B (en) Multi-round man-machine conversation method, device and equipment
CN110097884B (en) Voice interaction method and device
CN106162805B (en) WIFI hotspot service access control method and device
CN105677678B (en) Method and system for determining first screen position of webpage and displaying webpage information
CN109324815A (en) Monitoring method, device, equipment and the computer readable storage medium of system upgrade
CN102932358A (en) Third-party document-rewriting and rapid distribution method and device based on content distribution network
US10594776B2 (en) Information publishing method, device and server
CN110677506A (en) Network access method, device, computer equipment and storage medium
CN111756825A (en) Real-time cloud voice translation processing method and system
CN110929014B (en) Information processing method, information processing device, electronic equipment and storage medium
CN111261149B (en) Voice information recognition method and device
CN109491748B (en) Wearable device control method and control terminal based on small program
CN110891090A (en) Request method, device, server, system and storage medium
JP4394033B2 (en) Bus waiting time display system, bus waiting time display device, bus waiting time distribution server, and bus waiting time display program
CN107967363B (en) Data processing method and device and electronic equipment
CN106371905B (en) Application program operation method and device and server
JP5698864B2 (en) Navigation device, server, navigation method and program
CN115249474A (en) Method, system, device and storage medium for recognizing voice information
CN112364219A (en) Content distribution method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 200336 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai

Applicant after: SHANGHAI MROBOT TECHNOLOGY Co.,Ltd.

Address before: 200336 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai

Applicant before: SHANGHAI MUYE ROBOT TECHNOLOGY Co.,Ltd.

Address after: 200336 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai

Applicant after: Shanghai zhihuilin Medical Technology Co.,Ltd.

Address before: 200336 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai

Applicant before: Shanghai Zhihui Medical Technology Co.,Ltd.

Address after: 200336 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai

Applicant after: Shanghai Zhihui Medical Technology Co.,Ltd.

Address before: 200336 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai

Applicant before: SHANGHAI MROBOT TECHNOLOGY Co.,Ltd.

GR01 Patent grant
CP03 Change of name, title or address

Address after: 202150 room 205, zone W, second floor, building 3, No. 8, Xiushan Road, Chengqiao Town, Chongming District, Shanghai (Shanghai Chongming Industrial Park)

Patentee after: Shanghai Noah Wood Robot Technology Co.,Ltd.

Address before: 200336 402 rooms, No. 33, No. 33, Guang Shun Road, Shanghai

Patentee before: Shanghai zhihuilin Medical Technology Co.,Ltd.