CN112068793A

CN112068793A - Voice input method and device

Info

Publication number: CN112068793A
Application number: CN201910501519.0A
Authority: CN
Inventors: 胡伟; 胡妙丽; 吴永波; 吕崇; 马传兴; 张小贝; 吴军
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2019-06-11
Filing date: 2019-06-11
Publication date: 2020-12-11

Abstract

The embodiments of the application disclose a voice input method and a voice input device. An input method client first acquires the input box attribute of the current input box and/or the input mode of the input method. A voice recognition mode is then determined according to the input box attribute and/or the input mode of the input method. A recognition result of performing voice recognition on the acquired voice data according to the voice recognition mode is then acquired and displayed. Thus, when voice recognition is performed, the voice recognition mode, i.e. the Chinese recognition mode or the non-Chinese recognition mode, can first be determined based on the input box attribute and/or the input mode of the input method. When the voice recognition mode is the Chinese recognition mode, the acquired voice data is recognized as a Chinese recognition result; when the voice recognition mode is the non-Chinese recognition mode, the acquired voice data is recognized as a non-Chinese recognition result, so that the accuracy of voice input is improved.

Description

Voice input method and device

Technical Field

The application relates to the technical field of internet, in particular to a voice input method and device.

Background

With the popularization of terminal devices, users perform a large number of input operations on them. Input is commonly done through an input method, i.e. a coding method used to enter various symbols into a computer or other device. In the prior art, an input method can complete input by performing voice recognition on voice data. However, voice recognition during voice input is sometimes inaccurate.

Disclosure of Invention

In view of this, embodiments of the present application provide a voice input method and device to solve the technical problem of inaccurate voice input in the prior art.

In order to solve the above problem, the technical solution provided by the embodiment of the present application is as follows:

In a first aspect of embodiments of the present application, a voice input method is provided, where the method includes:

acquiring the input box attribute of the current input box and/or acquiring the input mode of an input method;

determining a voice recognition mode according to the attribute of the input box and/or the input mode of the input method, wherein the voice recognition mode comprises a Chinese recognition mode and a non-Chinese recognition mode;

acquiring a recognition result of performing voice recognition on the acquired voice data according to the voice recognition mode;

and displaying the recognition result.

In a possible implementation manner, when only the attribute of the input box is obtained, the determining a speech recognition mode according to the attribute of the input box and/or the input mode of the input method includes:

when the input box attribute is at least one of only allowing input of characters, only allowing input of letters, only allowing input of symbols and only allowing input of numbers, determining that the voice recognition mode is a non-Chinese recognition mode;

and when the attribute of the input box is that the text is allowed to be input, determining that the voice recognition mode is a Chinese recognition mode.

In a possible implementation manner, when only the input mode of the input method is obtained, the determining the voice recognition mode according to the attribute of the input box and/or the input mode of the input method includes:

when the input mode of the input method is a letter input mode, a symbol input mode or a number input mode, determining that a voice recognition mode is a non-Chinese recognition mode;

and when the input mode of the input method is a Chinese input mode, determining that the voice recognition mode is a Chinese recognition mode.

In a possible implementation manner, when the attribute of the input box and the input manner of the input method are obtained, the determining the voice recognition mode according to the attribute of the input box and/or the input manner of the input method includes:

when the attribute of the input box is at least one of only allowing characters to be input, only allowing letters to be input, only allowing symbols to be input and only allowing numbers to be input, and the input mode of the input method is a letter input mode, a symbol input mode or a number input mode, determining that the voice recognition mode is a non-Chinese recognition mode;

when the attribute of the input box is at least one of only allowing characters to be input, only allowing letters to be input, only allowing symbols to be input and only allowing numbers to be input, and the input mode of the input method is a Chinese input mode, determining that the voice recognition mode is a non-Chinese recognition mode;

when the attribute of the input box is that a text is allowed to be input and the input mode of the input method is a letter input mode, a symbol input mode or a number input mode, determining that the voice recognition mode is a non-Chinese recognition mode;

and when the attribute of the input box is that the text is allowed to be input and the input mode of the input method is a Chinese input mode, determining that the voice recognition mode is a Chinese recognition mode.
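The four combined cases above can be written as a small truth table. The following is a minimal sketch in Python, not the application's implementation: the names `RecognitionMode`, `DECISION_TABLE` and `determine_mode`, and the reduction of each signal to a boolean flag, are assumptions made here for illustration.

```python
from enum import Enum


class RecognitionMode(Enum):
    CHINESE = "chinese"
    NON_CHINESE = "non_chinese"


# Keys are (box_allows_text, input_mode_is_chinese); both flags are
# hypothetical simplifications of the attributes and modes named above.
DECISION_TABLE = {
    (False, False): RecognitionMode.NON_CHINESE,  # restricted box, letter/symbol/number mode
    (False, True): RecognitionMode.NON_CHINESE,   # restricted box, Chinese mode
    (True, False): RecognitionMode.NON_CHINESE,   # text box, letter/symbol/number mode
    (True, True): RecognitionMode.CHINESE,        # text box, Chinese mode
}


def determine_mode(box_allows_text: bool, input_mode_is_chinese: bool) -> RecognitionMode:
    """Look up the recognition mode for one combination of the two signals."""
    return DECISION_TABLE[(box_allows_text, input_mode_is_chinese)]
```

Written this way, it is easy to see that the Chinese recognition mode is chosen only when both signals point to Chinese; any restriction on either side yields the non-Chinese recognition mode.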

In a possible implementation manner, the obtaining a recognition result of performing speech recognition on the speech data according to the speech recognition mode includes:

sending the voice recognition mode to a voice recognition module so that the voice recognition module performs voice recognition on the voice data to generate a recognition result;

and acquiring the recognition result sent by the voice recognition module.

In a possible implementation manner, the speech recognition module is specifically configured to: when the speech recognition mode is a Chinese recognition mode, input the speech data into a Chinese speech recognition model to obtain a Chinese recognition result; and when the speech recognition mode is a non-Chinese recognition mode, input the speech data into a non-Chinese speech recognition model to obtain a character or character string recognition result;

the obtaining of the recognition result sent by the voice recognition module includes:

acquiring a Chinese recognition result sent by the voice recognition module;

or acquiring a character or character string recognition result sent by the voice recognition module.

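In a possible implementation manner, the obtaining a recognition result of performing speech recognition on the acquired speech data according to the speech recognition mode includes:

performing speech recognition on the speech data according to the speech recognition mode to obtain a recognition result.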

In a possible implementation manner, the performing speech recognition on the speech data according to the speech recognition mode to obtain a recognition result includes:

when the voice recognition mode is a Chinese recognition mode, inputting the voice data into a Chinese voice recognition model to obtain a Chinese recognition result;

and when the voice recognition mode is a non-Chinese recognition mode, inputting the voice data into the non-Chinese voice recognition model to obtain a character or character string recognition result.

In a possible implementation manner, when a character or character string recognition result is obtained, the displaying the recognition result includes:

acquiring a caps lock state of the input method;

when the caps lock state of the input method is locked, displaying letters in the character or character string recognition result as uppercase;

and when the caps lock state of the input method is unlocked, displaying letters in the character or character string recognition result as lowercase.
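The caps lock rule above amounts to a case conversion applied just before display. A minimal sketch in Python, assuming a hypothetical helper `render_result` and a boolean `caps_lock_on` flag (neither name comes from the application):

```python
def render_result(result: str, caps_lock_on: bool) -> str:
    """Apply the caps lock state to the letters of a character-string result."""
    # str.upper() and str.lower() leave digits and symbols unchanged,
    # so only the letters in the result are affected.
    return result.upper() if caps_lock_on else result.lower()
```

For example, `render_result("ab1,", True)` gives `"AB1,"`, while `render_result("Ab1,", False)` gives `"ab1,"`.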

In a possible implementation manner, the acquiring the input mode of the input method includes:

acquiring the input mode that the input method was in before entering speech recognition.

In a second aspect of embodiments of the present application, there is provided a voice input apparatus, including:

the first acquisition unit is used for the input method client to acquire the input box attribute of the current input box and/or acquire the input mode of the input method;

the determining unit is used for determining a voice recognition mode according to the attribute of the input box and/or the input mode of the input method, wherein the voice recognition mode comprises a Chinese recognition mode and a non-Chinese recognition mode;

the second acquisition unit is used for acquiring a recognition result of performing voice recognition on the acquired voice data according to the voice recognition mode;

and the display unit is used for displaying the recognition result.

In a possible implementation manner, when only the input box attribute is obtained, the determining unit includes:

a first determining subunit, configured to determine that the speech recognition mode is a non-Chinese recognition mode when the input box attribute is at least one of only allowing input of characters, only allowing input of letters, only allowing input of symbols, and only allowing input of numbers;

and the second determining subunit is used for determining that the voice recognition mode is the Chinese recognition mode when the attribute of the input box is that the input of the text is allowed.

In a possible implementation manner, when only the input manner of the input method is obtained, the determining unit includes:

the third determining subunit is used for determining that the voice recognition mode is a non-Chinese recognition mode when the input mode of the input method is a letter input mode, a symbol input mode or a number input mode;

and the fourth determining subunit is used for determining that the voice recognition mode is the Chinese recognition mode when the input mode of the input method is the Chinese input mode.

In a possible implementation manner, when acquiring the attribute of the input box and the input manner of the input method, the determining unit includes:

a fifth determining subunit, configured to determine that the speech recognition mode is a non-Chinese recognition mode when the attribute of the input box is at least one of only allowing input of characters, only allowing input of letters, only allowing input of symbols, and only allowing input of numbers, and the input mode of the input method is a letter input mode, a symbol input mode, or a number input mode;

a sixth determining subunit, configured to determine that the speech recognition mode is a non-Chinese recognition mode when the attribute of the input box is at least one of only allowing input of characters, only allowing input of letters, only allowing input of symbols, and only allowing input of numbers, and the input mode of the input method is a Chinese input mode;

a seventh determining subunit, configured to determine that the speech recognition mode is a non-Chinese recognition mode when the attribute of the input box is that text is allowed to be input, and the input mode of the input method is a letter input mode, a symbol input mode, or a number input mode;

and the eighth determining subunit is used for determining that the voice recognition mode is the Chinese recognition mode when the attribute of the input box is that the text is allowed to be input and the input mode of the input method is the Chinese input mode.

In a possible implementation manner, the second obtaining unit includes:

the sending subunit is used for sending the voice recognition mode to a voice recognition module so that the voice recognition module performs voice recognition on the voice data to generate a recognition result;

and the acquisition subunit is used for acquiring the recognition result sent by the voice recognition module.

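In a possible implementation manner, the acquiring subunit is specifically configured to acquire a Chinese recognition result sent by the speech recognition module, or to acquire a character or character string recognition result sent by the speech recognition module.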

In a possible implementation manner, the second obtaining unit is specifically configured to perform speech recognition on the speech data according to the speech recognition mode to obtain a recognition result.

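In a possible implementation manner, the second obtaining unit is specifically configured to: when the speech recognition mode is a Chinese recognition mode, input the speech data into a Chinese speech recognition model to obtain a Chinese recognition result; and when the speech recognition mode is a non-Chinese recognition mode, input the speech data into a non-Chinese speech recognition model to obtain a character or character string recognition result.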

In one possible implementation, the apparatus further includes:

the third acquisition unit is used for acquiring the caps lock state of the input method;

the display unit is specifically configured to display letters in the character or character string recognition result as uppercase when the caps lock state of the input method is locked;

the display unit is specifically configured to display letters in the character or character string recognition result as lowercase when the caps lock state of the input method is unlocked.

In a third aspect of embodiments herein, there is provided an apparatus for speech input, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and wherein the one or more programs configured to be executed by the one or more processors include instructions for:

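acquiring, by the input method client, the input box attribute of the current input box and/or the input mode of the input method;

determining a voice recognition mode according to the input box attribute and/or the input mode of the input method, wherein the voice recognition mode comprises a Chinese recognition mode and a non-Chinese recognition mode;

acquiring a recognition result of performing voice recognition on the acquired voice data according to the voice recognition mode;

and displaying the recognition result.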

In a fourth aspect of embodiments herein, there is provided a computer-readable medium having stored thereon instructions, which, when executed by one or more processors, cause an apparatus to perform the method of speech input of the first aspect.

Therefore, the embodiment of the application has the following beneficial effects:

According to the embodiments of the application, the input method client first acquires the input box attribute of the current input box and/or the input mode of the input method. The voice recognition mode is then determined according to the input box attribute and/or the input mode of the input method. That is, when only the input box attribute or only the input mode of the input method is acquired, the voice recognition mode is determined according to that input box attribute or that input mode; when both are acquired, the voice recognition mode is determined according to both. A recognition result of performing voice recognition on the acquired voice data according to the voice recognition mode is then acquired and displayed. Thus, with the method provided by the embodiments of the application, the voice recognition mode, namely the Chinese recognition mode or the non-Chinese recognition mode, can first be determined at voice input time based on the input box attribute and/or the input mode of the input method. When the voice recognition mode is the Chinese recognition mode, the acquired voice data is recognized as a Chinese recognition result; when the voice recognition mode is the non-Chinese recognition mode, the acquired voice data is recognized as a non-Chinese recognition result, so that the accuracy of voice input is improved.

Drawings

Fig. 1 is a schematic diagram of a framework of an exemplary application scenario provided in an embodiment of the present application;

Fig. 2 is a flowchart of a voice input method according to an embodiment of the present application;

Fig. 3a is an exemplary diagram of the input method in the letter input mode;

Fig. 3b is an exemplary diagram of the input method in the symbol input mode;

Fig. 3c is an exemplary diagram of the input method in the number input mode;

Fig. 3d is an exemplary diagram of the input method in the Chinese input mode;

Fig. 4 is a structural diagram of a voice input device according to an embodiment of the present application;

Fig. 5 is a block diagram of another voice input device according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a server device according to an embodiment of the present application.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanying the drawings are described in detail below.

To facilitate understanding of the technical solutions provided in the present application, the background of the present application is first described below.

In studying conventional voice recognition schemes, the inventors found that a conventional voice input method recognizes the acquired voice data directly, and the recognition result is often not accurate enough. For example, a user wants to input the letter "B" through the voice input function of the input method client. Because the letter "B" is pronounced the same as the pinyin "bi", a conventional voice recognition scheme may recognize the speech as a Chinese character pronounced "bi" rather than as the letter "B", giving an inaccurate result. The problem is particularly prominent for a television input method client: because input is costly for the user on a television, the user is more likely to use the voice input function, and the inaccuracy of existing voice input schemes stands out when content such as account names and passwords is entered.

Based on this, the embodiments of the present application provide a voice input method. Specifically, when a user performs voice input, the input method client first acquires the input box attribute of the current input box and/or the input mode of the input method. The voice recognition mode is then determined according to the input box attribute and/or the input mode of the input method; that is, the current input environment determines whether the Chinese recognition mode or the non-Chinese recognition mode is used when voice recognition is performed on the voice data. This yields a more accurate recognition result, which is then displayed to the user.

To facilitate understanding of the embodiments of the present application, reference is made to fig. 1, which is a schematic diagram of a framework of an exemplary application scenario provided by the embodiments of the present application. The voice input method provided by the embodiment of the present application can be applied to the input method client 10.

In practical application, the input method client 10 acquires the input box attribute of the current input box and/or the input mode of the input method, and determines the speech recognition mode accordingly, that is, determines whether speech recognition on the speech data is performed in the Chinese recognition mode or the non-Chinese recognition mode. Once the speech recognition mode corresponding to the current input environment is determined, a recognition result of performing speech recognition on the acquired speech data according to that mode is acquired and displayed.

It should be noted that the voice recognition of the acquired voice data according to the voice recognition mode may be carried out in several ways. The input method client 10 may itself perform voice recognition according to the voice recognition mode to obtain the recognition result. Alternatively, the input method client 10 may send the voice recognition mode to the corresponding input method server 20 for voice recognition and obtain the recognition result from the input method server 20. The input method client 10 may also send the voice recognition mode to a voice recognition module of another client or device and then obtain the recognition result from that module. The other client may be any client different from the input method client; the other device may be any device independent of the input method server 20 that has a voice recognition function, whether existing, under development, or developed in the future.
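The three placements described above (in the client, on the input method server, or in another module) can sit behind one common interface so that the client code does not change. A minimal sketch in Python; `Recognizer`, `LocalRecognizer`, `RemoteRecognizer` and `get_recognition_result` are hypothetical names, and the returned strings are placeholders rather than real recognition output.

```python
from typing import Protocol


class Recognizer(Protocol):
    """Anything that can turn voice data plus a recognition mode into text."""

    def recognize(self, voice_data: bytes, mode: str) -> str: ...


class LocalRecognizer:
    """Stands in for recognition performed by the input method client itself."""

    def recognize(self, voice_data: bytes, mode: str) -> str:
        return f"local:{mode}:{len(voice_data)} bytes"


class RemoteRecognizer:
    """Stands in for an input method server or another device's module."""

    def __init__(self, name: str) -> None:
        self.name = name

    def recognize(self, voice_data: bytes, mode: str) -> str:
        # A real implementation would send the mode and audio over some
        # transport; this placeholder only echoes what it was given.
        return f"{self.name}:{mode}:{len(voice_data)} bytes"


def get_recognition_result(recognizer: Recognizer, voice_data: bytes, mode: str) -> str:
    """Obtain a result from whichever recognizer the client is configured with."""
    return recognizer.recognize(voice_data, mode)
```

In each case the client passes the determined recognition mode along with the voice data, which matches the description that the mode is sent to whichever module performs the recognition.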

Those skilled in the art will appreciate that the block diagram shown in fig. 1 is only one example in which embodiments of the present application may be implemented. The scope of applicability of the embodiments of the present application is not limited in any way by this framework.

It should be noted that the client 10 may be hosted by a terminal, which may be any user equipment, whether existing, under development, or developed in the future, that is capable of interacting over any form of wired and/or wireless connection (e.g., Wi-Fi, LAN, cellular, coaxial cable, etc.), including but not limited to: smart wearable devices, smart phones, non-smart phones, tablets, laptop personal computers, desktop personal computers, minicomputers, midrange computers, mainframe computers, and the like. The embodiments of the present application are not limited in any way in this respect. It should also be noted that the server 20 in the embodiments of the present application may be any device, whether existing, under development, or developed in the future, that is capable of providing a voice recognition service. The embodiments of the present application are not limited in any way in this respect.

In order to facilitate understanding of the technical solutions provided by the embodiments of the present application, a speech input method provided by the embodiments of the present application will be described below with reference to the accompanying drawings.

Referring to fig. 2, which is a flowchart of a voice input method provided in an embodiment of the present application, as shown in fig. 2, the method may include:

S201: The input method client acquires the input box attribute of the current input box and/or the input mode of the input method.

In this embodiment, after the user inputs voice data through the input method client and before the input method client obtains a voice recognition result, the input method client first acquires the input box attribute of the current input box and/or the input mode of the input method, that is, acquires the environment of the current voice recognition.

The current input box refers to the input box where the input cursor is currently located, that is, the input box into which data is to be entered. The input box attribute is a preset attribute, defined by the page in which the input box is located, that specifies what data can be entered into the input box. It may be that only characters are allowed, only letters are allowed, only symbols are allowed, only numbers are allowed, or text is allowed. Here, characters may include letters, symbols, and numbers; text may include Chinese as well as letters, symbols, and numbers.

After triggering the input method, the user can switch the input mode of the input method, which may include a Chinese input mode, a letter input mode, a symbol input mode, a number input mode, and the like. The user can also trigger the voice input mode after switching to a certain input mode, in which case the input mode of the input method can be acquired before the voice input mode is entered. Accordingly, in some possible implementations, acquiring the input mode of the input method may be acquiring the input mode that the input method was in before entering speech recognition.
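Acquiring the input mode "before entering speech recognition" implies that the client remembers which mode was active at the moment voice input was triggered. A minimal sketch in Python, assuming a hypothetical `InputMethodState` class, string mode labels, and a default starting mode of `"chinese"` (all of which are illustrative choices, not taken from the application):

```python
class InputMethodState:
    """Tracks the current input mode and remembers it when voice input starts."""

    def __init__(self, initial_mode: str = "chinese") -> None:
        self.current_mode = initial_mode
        self.mode_before_voice = None  # set when voice input is entered

    def switch_mode(self, new_mode: str) -> None:
        """The user presses an input mode key (letter, symbol, number, ...)."""
        self.current_mode = new_mode

    def enter_voice_input(self) -> str:
        """Snapshot the mode that was active just before voice input began."""
        self.mode_before_voice = self.current_mode
        self.current_mode = "voice"
        return self.mode_before_voice
```

A user who switches to the number keyboard and then starts voice input would therefore have `"number"` reported as the input mode used to determine the recognition mode.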

S202: Determine a voice recognition mode according to the input box attribute and/or the input mode of the input method.

In this embodiment, once the input box attribute and/or the input mode of the input method has been acquired, the speech recognition mode is determined according to the input box attribute and/or the input mode of the input method. The voice recognition mode comprises a Chinese recognition mode and a non-Chinese recognition mode.

It can be understood that when the input method client acquires only the input box attribute, the voice recognition mode is determined according to the input box attribute alone; when the input method client acquires only the input mode of the input method, the voice recognition mode is determined according to the input mode alone; and when the input method client acquires both, the voice recognition mode is determined according to both. The input method client may acquire either one of the two or both at the same time, which may be set according to actual requirements and is not limited herein. Specific implementations of determining the speech recognition mode according to the input box attribute and/or the input mode of the input method are described in the following embodiments.
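The three situations (only the input box attribute, only the input mode, or both) can be handled by one function that accepts whichever signals were acquired. A minimal sketch in Python; the string labels and the function name `determine_recognition_mode` are hypothetical, and treating unrecognized labels as permitting Chinese is a simplification made here, not something the application specifies.

```python
from typing import Optional

# Hypothetical labels; the application names these categories but not identifiers.
RESTRICTED_ATTRIBUTES = {"characters_only", "letters_only", "symbols_only", "numbers_only"}
NON_CHINESE_INPUT_MODES = {"letter", "symbol", "number"}


def determine_recognition_mode(box_attribute: Optional[str] = None,
                               input_mode: Optional[str] = None) -> str:
    """Return "chinese" or "non_chinese" from whichever signals were acquired."""
    if box_attribute is None and input_mode is None:
        raise ValueError("at least one of box_attribute or input_mode is required")
    # A signal that was not acquired (None) simply does not take part.
    if box_attribute in RESTRICTED_ATTRIBUTES:
        return "non_chinese"
    if input_mode in NON_CHINESE_INPUT_MODES:
        return "non_chinese"
    # Assumption: any other label (e.g. "text_allowed", "chinese") permits Chinese.
    return "chinese"
```

For instance, `determine_recognition_mode(box_attribute="text_allowed")` returns `"chinese"`, whereas adding `input_mode="letter"` to the same call returns `"non_chinese"`.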

S203: Acquire a recognition result of performing voice recognition on the acquired voice data according to the voice recognition mode.

In this embodiment, after the voice recognition mode is determined, a recognition result of performing voice recognition on the acquired voice data according to the voice recognition mode is acquired. That is, voice recognition on the voice data follows the determined voice recognition mode, which improves the accuracy of voice input. For example, a user inputs voice data pronounced like the pinyin "bi" through a voice acquisition module. If the voice recognition mode is the Chinese recognition mode, the recognition result may be the Chinese character meaning "must" (必); if the voice recognition mode is the non-Chinese recognition mode, the recognition result obtained by the input method client is "b". As another example, the user inputs the voice "dou hao" through the voice acquisition module. If the voice recognition mode is the Chinese recognition mode, the recognition result is the Chinese word for "comma" (逗号); if the voice recognition mode is the non-Chinese recognition mode, the recognition result may be the symbol ",".
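The two examples above can be mimicked with toy lookup tables, which makes the effect of the recognition mode concrete. This is only an illustration in Python: real recognition models take audio rather than pinyin strings, the names `CHINESE_MODEL`, `NON_CHINESE_MODEL` and `recognize` are hypothetical, and the character 必 is assumed here to be the Chinese result that the text renders as "must".

```python
# Toy stand-ins for the two recognition models, keyed by pronunciation.
# Assumption: "必" is the character behind the translated result "must".
CHINESE_MODEL = {"bi": "必", "dou hao": "逗号"}
NON_CHINESE_MODEL = {"bi": "b", "dou hao": ","}


def recognize(pronunciation: str, mode: str) -> str:
    """Return the result the selected model gives for one pronunciation."""
    model = CHINESE_MODEL if mode == "chinese" else NON_CHINESE_MODEL
    return model[pronunciation]
```

The same utterance thus produces different output depending solely on the mode: `recognize("dou hao", "chinese")` yields the Chinese word, while `recognize("dou hao", "non_chinese")` yields the punctuation mark.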

It should be noted that the operation of performing speech recognition on the acquired speech data according to the speech recognition mode may be performed by the input method client, or the input method client may send the determined speech recognition mode to a speech recognition module independent of the input method client, where the speech recognition module performs speech recognition on the speech data according to the speech recognition mode to obtain a recognition result, and sends the recognition result to the input method client. Here, an operation of performing speech recognition on speech data according to a speech recognition mode will be described in the following embodiments.

S204: Display the recognition result.

After the recognition result is obtained, it is displayed to the user so that it can be committed to the screen upon a triggering operation by the user.

As can be seen from this embodiment, the input method client first acquires the input box attribute of the current input box and/or the input mode of the input method. The voice recognition mode is then determined according to the input box attribute and/or the input mode. That is, when only the input box attribute or only the input mode of the input method is acquired, the voice recognition mode is determined according to that input box attribute or that input mode; when both are acquired, the voice recognition mode is determined according to both. A recognition result of performing voice recognition on the acquired voice data according to the voice recognition mode is then acquired and displayed. Thus, with the method provided by the embodiments of the application, the voice recognition mode, namely the Chinese recognition mode or the non-Chinese recognition mode, can first be determined based on the input box attribute and/or the input mode of the input method when voice recognition is performed. When the voice recognition mode is the Chinese recognition mode, the acquired voice data is recognized as a Chinese recognition result; when the voice recognition mode is the non-Chinese recognition mode, the acquired voice data is recognized as a non-Chinese recognition result, so that the accuracy of voice input is improved.

In a possible implementation manner of the embodiment of the present application, an implementation manner of determining a speech recognition mode according to an attribute of an input box and/or an input manner of an input method in the above embodiment is described.

First, when only the input box attribute is acquired, determining the voice recognition mode according to the input box attribute and/or the input mode of the input method includes: when the input box attribute is at least one of only allowing input of characters, only allowing input of letters, only allowing input of symbols and only allowing input of numbers, determining that the voice recognition mode is a non-Chinese recognition mode; and when the input box attribute is that text is allowed to be input, determining that the voice recognition mode is a Chinese recognition mode.

In a specific implementation, when only the input box attribute is acquired, the speech recognition mode is determined according to the input box attribute alone. Specifically, when the input box attribute is at least one of only allowing input of characters, only allowing input of letters, only allowing input of symbols and only allowing input of numbers, the speech recognition mode is determined to be the non-Chinese recognition mode, so that the acquired speech data is recognized as non-Chinese during speech recognition. For example, if the current input box is a password input box whose input box attribute is that only characters are allowed to be input, the speech recognition mode is determined to be the non-Chinese recognition mode. If the input box attribute is that text is allowed to be input, that is, Chinese is allowed to be input, the speech recognition mode is determined to be the Chinese recognition mode, so that the acquired speech data is recognized as Chinese during speech recognition. For example, when the current input box is a name input box whose input box attribute is that text is allowed to be input, the speech recognition mode is determined to be the Chinese recognition mode so that Chinese is recognized.

Second, when only the input mode of the input method is acquired, determining the speech recognition mode according to the input box attribute and/or the input mode of the input method includes: when the input mode of the input method is a letter input mode, a symbol input mode, or a number input mode, determining that the speech recognition mode is the non-Chinese recognition mode. For example, fig. 3a shows the input method in the letter input mode, i.e. the English input mode; fig. 3b shows the symbol input mode; and fig. 3c shows the number input mode. The user can switch the input mode by triggering the corresponding input mode key. When the input method is in any of these modes, the speech recognition mode is the non-Chinese recognition mode.

When the input mode of the input method is the Chinese input mode, the speech recognition mode is determined to be the Chinese recognition mode. For example, as shown in fig. 3d, if the input method is in the Chinese input mode, the speech recognition mode is the Chinese recognition mode.

In a specific implementation, when only the input mode of the input method is obtained, the speech recognition mode is determined from that input mode alone. Specifically, when the input mode is the letter input mode, the symbol input mode, or the number input mode, the speech recognition mode is determined to be the non-Chinese recognition mode, so that the acquired speech data is recognized as non-Chinese during speech recognition. When the input mode is the Chinese input mode, the speech recognition mode is determined to be the Chinese recognition mode, so that the speech data is recognized as Chinese.
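A minimal sketch of the input-mode-only rule, assuming the four input modes shown in figs. 3a to 3d. The names `InputMode`, `NON_CHINESE_INPUT_MODES`, and `mode_from_input_mode`, and the string mode labels, are hypothetical.

```python
from enum import Enum


class InputMode(Enum):
    """Hypothetical labels for the input modes of figs. 3a-3d."""
    LETTER = "letter"    # fig. 3a
    SYMBOL = "symbol"    # fig. 3b
    NUMBER = "number"    # fig. 3c
    CHINESE = "chinese"  # fig. 3d


NON_CHINESE_INPUT_MODES = frozenset(
    {InputMode.LETTER, InputMode.SYMBOL, InputMode.NUMBER}
)


def mode_from_input_mode(input_mode: InputMode) -> str:
    """Pick the recognition mode when only the input mode is known."""
    if input_mode in NON_CHINESE_INPUT_MODES:
        return "non_chinese"
    return "chinese"


modes = {m.value: mode_from_input_mode(m) for m in InputMode}
```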

Third, when both the input box attribute and the input mode of the input method are acquired, the speech recognition mode is determined according to both. This situation divides into the following four cases.

1) When the input box attribute is at least one of "only characters allowed", "only letters allowed", "only symbols allowed", and "only numbers allowed", and the input mode of the input method is the letter input mode, the symbol input mode, or the number input mode, the speech recognition mode is determined to be the non-Chinese recognition mode.

As the two situations described above show, when each factor is considered independently, a restricted input box attribute yields the non-Chinese recognition mode, and a letter, symbol, or number input mode also yields the non-Chinese recognition mode. When the input box attribute and the input mode are acquired together and the modes they indicate agree, the speech recognition mode is determined to be the non-Chinese recognition mode.

2) When the input box attribute is at least one of "only characters allowed", "only letters allowed", "only symbols allowed", and "only numbers allowed", and the input mode of the input method is the Chinese input mode, the speech recognition mode is determined to be the non-Chinese recognition mode.

Here the two factors, considered independently, contradict each other: the input box attribute indicates the non-Chinese recognition mode, while the input mode of the input method indicates the Chinese recognition mode. However, since the input box does not allow Chinese to be input, the input box attribute takes precedence and the speech recognition mode is determined to be the non-Chinese recognition mode.

3) When the input box attribute allows text and the input mode of the input method is the letter input mode, the symbol input mode, or the number input mode, the speech recognition mode is determined to be the non-Chinese recognition mode.

In a specific implementation, the two factors again contradict each other when considered independently: the input box attribute indicates the Chinese recognition mode, while the input mode of the input method indicates the non-Chinese recognition mode. In this scenario, the input mode of the input method takes precedence and the speech recognition mode is determined to be the non-Chinese recognition mode. A user usually selects the input mode according to the content to be input, so a letter, symbol, or number input mode indicates that the user is more likely to input letters, symbols, or numbers. Two different cases are described below.

In one case, when the attribute of the input box is that text is allowed to be input, it represents that non-Chinese data such as letters, symbols, and numbers can be input in the input box, and if the input mode of the input method selected by the user is the letter input mode, the symbol input mode, or the number input mode, it represents that the user wishes to input letters, symbols, or numbers in the input box that allows text to be input, it is determined that the speech recognition mode is the non-Chinese recognition mode.

In another case, the input box attribute may be labeled incorrectly. When the attribute is wrongly labeled as "text allowed", the user does not know the attribute but may still switch the input mode according to the prompt for the required content. If the input mode selected by the user is the letter input mode, the symbol input mode, or the number input mode, the user wishes to input letters, symbols, or numbers in the input box, and the speech recognition mode is determined to be the non-Chinese recognition mode. For example, suppose the input box where the cursor is located is a mobile phone number input box. Its correct attribute would be "only numbers allowed", but a developer has mistakenly labeled it as "text allowed". In actual use, a prompt such as "please input the mobile phone number" leads the user to switch the input method to the number input mode in order to enter digits, and the speech recognition mode is then determined to be the non-Chinese recognition mode according to the input mode of the input method.

Giving priority to the input mode of the input method in this scenario reflects the user's input intention, which makes recognition more accurate.

4) When the input box attribute allows text and the input mode of the input method is the Chinese input mode, the speech recognition mode is determined to be the Chinese recognition mode.

In this embodiment, the two factors agree when considered independently: the input box attribute indicates the Chinese recognition mode, and the input mode of the input method also indicates the Chinese recognition mode. When the speech recognition mode is determined according to both, the two results are the same, and the speech recognition mode is the Chinese recognition mode.
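The four cases reduce to one rule: the Chinese recognition mode is chosen only when neither the input box attribute nor the input mode points to non-Chinese input. The sketch below illustrates that rule under stated assumptions: the string labels and the function name `determine_recognition_mode` are hypothetical, and passing `None` stands for a factor that was not acquired.

```python
from typing import Optional

# Hypothetical labels; the application does not define concrete identifiers.
RESTRICTED_BOX_ATTRS = {
    "characters_only", "letters_only", "symbols_only", "numbers_only",
}
NON_CHINESE_INPUT_MODES = {"letter", "symbol", "number"}


def determine_recognition_mode(
    box_attr: Optional[str], input_mode: Optional[str]
) -> str:
    """Combine the input box attribute and the input mode (cases 1 to 4)."""
    if box_attr is None and input_mode is None:
        raise ValueError("at least one of box_attr and input_mode is required")
    # Cases 1 and 2: the box forbids Chinese, so the attribute wins.
    box_forbids_chinese = box_attr in RESTRICTED_BOX_ATTRS
    # Cases 1 and 3: the user switched to a non-Chinese input mode.
    user_chose_non_chinese = input_mode in NON_CHINESE_INPUT_MODES
    if box_forbids_chinese or user_chose_non_chinese:
        return "non_chinese"
    # Case 4, or a single factor that permits Chinese.
    return "chinese"


case_1 = determine_recognition_mode("numbers_only", "number")
case_2 = determine_recognition_mode("numbers_only", "chinese")
case_3 = determine_recognition_mode("text", "number")
case_4 = determine_recognition_mode("text", "chinese")
```

Case 3 also covers the mislabeled mobile phone number box: the attribute says "text", but the number input mode selected by the user decides the outcome.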

In a possible implementation manner of the embodiment of the present application, two manners of performing speech recognition on speech data according to a speech recognition mode are further provided, and the two manners of performing speech recognition on speech data will be described below.

In the first way, the speech recognition mode is sent to a speech recognition module so that the speech recognition module performs speech recognition on the speech data to generate a recognition result, and the recognition result sent by the speech recognition module is then acquired.

In this embodiment, after determining the speech recognition mode, the input method client sends the speech recognition mode to the speech recognition module. The speech recognition module performs speech recognition on the speech data according to that mode, generates a recognition result, and sends the result to the input method client for display. The speech recognition module is a separate application, independent of the input method client, that is used for recognizing speech data.

It can be understood that the speech recognition mode is either the Chinese recognition mode or the non-Chinese recognition mode, and in practical application the speech recognition module generates the corresponding recognition result according to the current mode. Specifically, the speech recognition module is configured to: when the speech recognition mode is the Chinese recognition mode, input the speech data into the Chinese speech recognition model to obtain a Chinese recognition result; and when the speech recognition mode is the non-Chinese recognition mode, input the speech data into the non-Chinese speech recognition model to obtain a character or character string recognition result. Obtaining the recognition result sent by the speech recognition module then includes: acquiring the Chinese recognition result sent by the speech recognition module, or acquiring the character or character string recognition result sent by the speech recognition module.

That is, the speech recognition module includes a pre-trained Chinese speech recognition model and a pre-trained non-Chinese speech recognition model. When the speech recognition mode is the Chinese recognition mode, the speech data is input into the Chinese speech recognition model to obtain a Chinese recognition result. When the speech recognition mode is the non-Chinese recognition mode, the speech data is input into the non-Chinese speech recognition model to obtain a non-Chinese recognition result, so that the input method client obtains an accurate recognition result. For example, in the application scenario shown in fig. 3a, the user inputs voice data (pronounced like the pinyin "bi") through the voice acquisition module of the input method client. The input method is in the letter input mode, so the speech recognition mode is the non-Chinese recognition mode and the recognition result is the English letter "b". As another example, in the scenario shown in fig. 3d, the user inputs the same pronunciation while the input method is in the Chinese input mode, so the speech recognition mode is the Chinese recognition mode and the recognition result should be one of the Chinese characters pronounced "bi" (for example, the characters meaning "must", "close", or "complete").
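The routing of speech data to one of two pre-trained models can be sketched as a small dispatcher. Everything here is an assumption for illustration: `make_dispatcher`, the mode labels, and the stub models (which ignore the audio and return fixed placeholder strings) stand in for real recognition models that the application does not specify.

```python
from typing import Callable, Dict

Recognizer = Callable[[bytes], str]


def make_dispatcher(
    chinese_model: Recognizer, non_chinese_model: Recognizer
) -> Callable[[bytes, str], str]:
    """Build a function that routes speech data by recognition mode."""
    models: Dict[str, Recognizer] = {
        "chinese": chinese_model,
        "non_chinese": non_chinese_model,
    }

    def recognize(voice_data: bytes, mode: str) -> str:
        if mode not in models:
            raise ValueError(f"unknown recognition mode: {mode!r}")
        return models[mode](voice_data)

    return recognize


# Stub models for the "bi" example; real models would decode the audio.
recognize = make_dispatcher(
    chinese_model=lambda audio: "<Chinese character pronounced bi>",
    non_chinese_model=lambda audio: "b",
)

letter_result = recognize(b"\x00\x01", "non_chinese")
chinese_result = recognize(b"\x00\x01", "chinese")
```

The same audio yields different results depending only on the mode passed in, which is the behavior the fig. 3a and fig. 3d examples describe.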

Alternatively, speech recognition is performed on the speech data according to a speech recognition mode to obtain a recognition result.

In this embodiment, after the input method client determines the voice recognition mode, the input method client may perform voice recognition on the voice data to obtain a recognition result. In the specific implementation, the voice recognition can be performed locally on the equipment where the input method client is located, or the voice recognition mode can be sent to the corresponding input method server for voice recognition, so that the voice recognition of voice data by the input method client according to the voice recognition mode is realized.

It can be understood that the speech recognition mode is either the Chinese recognition mode or the non-Chinese recognition mode, and in actual application the input method client or the input method server generates the corresponding recognition result according to the current mode. Specifically, when the speech recognition mode is the Chinese recognition mode, the speech data is input into the Chinese speech recognition model to obtain a Chinese recognition result; and when the speech recognition mode is the non-Chinese recognition mode, the speech data is input into the non-Chinese speech recognition model to obtain a character or character string recognition result.

That is, a Chinese speech recognition model and a non-Chinese speech recognition model are trained in advance. When the speech recognition mode is the Chinese recognition mode, the speech data is input into the Chinese speech recognition model to obtain a Chinese recognition result. When the speech recognition mode is the non-Chinese recognition mode, the speech data is input into the non-Chinese speech recognition model to obtain a non-Chinese recognition result, so that the input method client obtains either a Chinese recognition result or a non-Chinese recognition result.

In addition, in practical application, the user can set upper case or lower case at the input method client in order to input accordingly. Therefore, when voice data is recognized, the result can be displayed according to whether capitalization is currently locked. Specifically, a capitalization locking state of the input method is acquired; when the state is capitalization locked, letters in the character or character string recognition result are displayed in upper case; and when the state is capitalization unlocked, letters in the character or character string recognition result are displayed in lower case.

It can be understood that this embodiment mainly applies when the recognition result is a character or character string recognition result. For example, if the recognition result is the letter b, it is displayed as "B" when the input method is in the capitalization locked state and as "b" when it is in the capitalization unlocked state. The recognition result is thus entered correctly according to the capitalization locking state of the input method, which reduces the corrections the user has to make.
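A minimal sketch of the capitalization step, assuming the recognition result is a plain string and the locking state is available as a boolean. The function name `apply_caps_lock` is hypothetical.

```python
def apply_caps_lock(result: str, caps_locked: bool) -> str:
    """Display letters in upper case when capitalization is locked, else lower.

    str.upper() and str.lower() change only cased letters, so digits and
    symbols in a character string result pass through unchanged.
    """
    return result.upper() if caps_locked else result.lower()


locked = apply_caps_lock("b", caps_locked=True)
unlocked = apply_caps_lock("B", caps_locked=False)
mixed = apply_caps_lock("a1b2", caps_locked=True)
```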

By the scheme provided by the embodiment of the application, when the voice recognition is carried out, the voice recognition mode, namely the Chinese recognition mode or the non-Chinese recognition mode, can be determined firstly based on the attribute of the input box and/or the current input mode of the input method. When the voice recognition mode is the Chinese recognition mode, the acquired voice data is recognized as a Chinese recognition result, and when the voice recognition mode is the non-Chinese recognition mode, the acquired voice data is recognized as a non-Chinese recognition result, so that the accuracy of voice input is improved.

Based on the above method embodiment, the present application further provides a voice input device, which will be described below with reference to the accompanying drawings.

Referring to fig. 4, which is a block diagram of a voice input device provided in an embodiment of the present application, as shown in fig. 4, the device may include:

a first obtaining unit 401, configured to obtain, by an input method client, an input box attribute of a current input box, and/or obtain an input mode of an input method;

a determining unit 402, configured to determine a speech recognition mode according to the attribute of the input box and/or an input manner of the input method, where the speech recognition mode includes a chinese recognition mode and a non-chinese recognition mode;

a second obtaining unit 403, configured to obtain a recognition result of performing voice recognition on the obtained voice data according to the voice recognition mode;

a display unit 404, configured to display the recognition result.
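The cooperation of units 401 to 404 can be pictured as a simple pipeline. This is an illustrative sketch only: the class `VoiceInputDevice`, its field names, and the stub callables are hypothetical, and the application does not prescribe any particular software structure for the units.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class VoiceInputDevice:
    # Unit 401: returns (input box attribute, input mode); either may be None.
    obtain_context: Callable[[], Tuple[Optional[str], Optional[str]]]
    # Unit 402: maps the two factors to a recognition mode.
    determine_mode: Callable[[Optional[str], Optional[str]], str]
    # Unit 403: obtains the recognition result for the speech data and mode.
    obtain_result: Callable[[bytes, str], str]
    # Unit 404: displays the recognition result.
    display: Callable[[str], None]

    def handle(self, voice_data: bytes) -> str:
        box_attr, input_mode = self.obtain_context()
        mode = self.determine_mode(box_attr, input_mode)
        result = self.obtain_result(voice_data, mode)
        self.display(result)
        return result


shown = []  # stands in for the screen
device = VoiceInputDevice(
    obtain_context=lambda: ("numbers_only", "chinese"),
    determine_mode=lambda box, mode: (
        "chinese" if box == "text" and mode == "chinese" else "non_chinese"
    ),
    obtain_result=lambda audio, mode: f"<{mode} result>",
    display=shown.append,
)
final = device.handle(b"\x00")
```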

In a possible implementation manner, the second obtaining unit includes:

In one possible implementation, the apparatus further includes:

In a possible implementation manner, the first obtaining unit is specifically configured to:

the input method client side obtains the input box attribute of the current input box and/or obtains the input mode of the input method before entering the voice recognition.

It should be noted that, for specific implementation of each unit in this embodiment, reference may be made to the above method embodiment, and this embodiment is not described herein again.

Fig. 5 shows a block diagram of an input device 600. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 5, the apparatus 600 may include one or more of the following components: processing component 602, memory 604, power component 606, multimedia component 608, audio component 610, input/output (I/O) interface 612, sensor component 614, and communication component 616.

The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is configured to store various types of data to support operation at the device 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power component 606 provides power to the various components of the device 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.

The multimedia component 608 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 610 is configured to output and/or input audio signals. For example, audio component 610 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.

The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 614 includes one or more sensors for providing status assessment of various aspects of the apparatus 600. For example, the sensor component 614 may detect an open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the apparatus 600. The sensor component 614 may also detect a change in position of the apparatus 600 or of a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in temperature of the apparatus 600. The sensor component 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 616 is configured to facilitate communications between the apparatus 600 and other devices in a wired or wireless manner. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the following methods:

and displaying the recognition result.

Optionally, when only the attribute of the input box is obtained, the determining the voice recognition mode according to the attribute of the input box and/or the input mode of the input method includes:

Optionally, when only the input mode of the input method is acquired, the determining the voice recognition mode according to the attribute of the input box and/or the input mode of the input method includes:

Optionally, when the attribute of the input box and the input mode of the input method are obtained, the determining the voice recognition mode according to the attribute of the input box and/or the input mode of the input method includes:

Optionally, the obtaining a recognition result of performing voice recognition on the voice data according to the voice recognition mode includes:

and acquiring the recognition result sent by the voice recognition module.

Optionally, the speech recognition module is specifically configured to, when the speech recognition mode is a chinese recognition mode, input the speech data into a chinese speech recognition model to obtain a chinese recognition result; when the voice recognition mode is a non-Chinese recognition mode, inputting the voice data into the non-Chinese voice recognition model to obtain a character or character string recognition result;

acquiring a Chinese recognition result sent by the voice recognition module;

Optionally, the performing voice recognition on the voice data according to the voice recognition mode to obtain a recognition result includes:

Optionally, when a character or character string recognition result is obtained, the displaying the recognition result includes:

acquiring a capitalization locking state of an input method;

Optionally, the obtaining the input mode of the input method includes:

A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of a mobile terminal, enable the mobile terminal to perform a display method, the method comprising:

and displaying the recognition result.

and acquiring the recognition result sent by the voice recognition module.

acquiring a Chinese recognition result sent by the voice recognition module;

acquiring a capitalization locking state of an input method;

Optionally, the obtaining the input mode of the input method includes:

Fig. 6 is a schematic structural diagram of a server in an embodiment of the present invention. The server 700 may vary significantly depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 722 (e.g., one or more processors) and memory 732, one or more storage media 730 (e.g., one or more mass storage devices) storing applications 742 or data 744. Memory 732 and storage medium 730 may be, among other things, transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processor 722 may be configured to communicate with the storage medium 730, and execute a series of instruction operations in the storage medium 730 on the server 700.

The server 700 can also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input-output interfaces 756, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the system or the device disclosed by the embodiment, the description is simple because the system or the device corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.

It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
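The reading of "at least one of a, b, or c" given above corresponds to every non-empty combination of the listed items. The short sketch below enumerates them with Python's standard `itertools.combinations`; the helper name `at_least_one_of` is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def at_least_one_of(items: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every non-empty combination of items, smallest first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


options = at_least_one_of(["a", "b", "c"])
for combo in options:
    print(" and ".join(combo))
```

For three items this yields seven combinations, the same seven listed in the paragraph above.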

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of speech input, the method comprising:

and displaying the recognition result.

2. The method according to claim 1, wherein when only the input box attribute is acquired, the determining a speech recognition mode according to the input box attribute and/or the input mode of the input method includes:

3. The method according to claim 1, wherein when only the input mode of the input method is obtained, the determining the voice recognition mode according to the input box attribute and/or the input mode of the input method includes:

4. The method according to claim 1, wherein when the input box attribute and the input mode of the input method are acquired, the determining the voice recognition mode according to the input box attribute and/or the input mode of the input method comprises:

5. The method according to claim 1, wherein the obtaining a recognition result of the voice recognition of the voice data according to the voice recognition mode comprises:

and acquiring the recognition result sent by the voice recognition module.

6. The method according to claim 5, wherein the speech recognition module is specifically configured to input the speech data into a Chinese speech recognition model when the speech recognition mode is a Chinese recognition mode, to obtain a Chinese recognition result; when the voice recognition mode is a non-Chinese recognition mode, inputting the voice data into the non-Chinese voice recognition model to obtain a character or character string recognition result;

acquiring a Chinese recognition result sent by the voice recognition module;

7. The method according to claim 1, wherein the obtaining a recognition result of the voice recognition of the voice data according to the voice recognition mode comprises:

8. The method of claim 7, wherein performing speech recognition on the speech data according to the speech recognition mode to obtain a recognition result comprises:

9. The method according to claim 6 or 8, wherein when obtaining a character or character string recognition result, the displaying the recognition result comprises:

acquiring a capitalization locking state of an input method;

10. The method of claim 1, wherein the obtaining the input mode of the input method comprises:

11. A speech input apparatus, characterized in that the apparatus comprises:

and a display unit, configured to display the recognition result.

12. An apparatus for speech input comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for:

and displaying the recognition result.

13. A computer-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform a method of speech input as recited in one or more of claims 1-10.