CN106531168B - Voice recognition method and device

Info

Publication number: CN106531168B
Application number: CN201611035487.2A
Authority: CN
Inventors: 赵东阳
Original assignee: Beijing Yunzhisheng Information Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd
Priority date: 2016-11-18
Filing date: 2016-11-18
Publication date: 2020-04-28
Anticipated expiration: 2036-11-18
Also published as: CN106531168A

Abstract

The invention discloses a voice recognition method and a voice recognition device. The method comprises: when a voice input trigger operation is detected, controlling a recording thread in the terminal to remain in a started state throughout a preset process in which a preset voice input method is enabled, so that input first voice information is recorded by the recording thread; and starting a first recognition thread to recognize the first voice information as first text information. Because the recording thread remains started throughout the preset process, the first voice information input during recording can be recorded, and the recording thread does not need to be restarted each time the user presses the recording key. The terminal therefore stays in the recording state throughout the preset process, which avoids, as far as possible, the recording delay found in the related art.

Description

Voice recognition method and device

Technical Field

The present invention relates to the field of terminal technologies, and in particular, to a voice recognition method and apparatus.

Background

At present, when a voice input method is used for voice recognition, recording starts when a recording key is pressed, and the recorded voice is then recognized as text and output. The recording scheme in the related art uses two threads, a recording thread and a recognition thread, and works as follows: when the recording key is pressed, the recording thread and the recognition thread of the voice input method are started. The recording thread is responsible for starting recording on the terminal. The recognition thread is responsible for fetching the recorded data from the recording thread, sending it to a server for recognition and, once recognition is finished, obtaining the recognized text from the server and outputting it. However, the recording thread needs a certain amount of time to start recording on the terminal, so this scheme causes a recording delay. In addition, because of the processing delay of the recognition thread, the current recognition process cannot end immediately when the recording key is released, and there is only one recognition thread. As a result, when the user releases and presses the recording key repeatedly within a short time, the recognition thread is likely to be unable to handle the next recording process immediately, because the data of the previous recording process has not yet been recognized. Either a serious recognition lag occurs, or the recognition thread, in order to reduce the lag as much as possible, discards part of the voice information without recognizing it. Both outcomes degrade the user's voice input experience and cause great inconvenience.

Disclosure of Invention

The invention provides a voice recognition method and a voice recognition device in which, when a voice input trigger operation is detected, a recording thread in the terminal is controlled to remain in a started state throughout a preset process in which a preset voice input method is enabled. The first voice information input during recording can thus be recorded by the recording thread, and the recording thread does not need to be restarted each time the user presses the recording key, so recording on the terminal does not need to be restarted either. The terminal is always in the recording state during the preset process, which avoids, as far as possible, the delay incurred when the recording thread starts recording on the terminal, and hence the recording delay found in the related art.

The invention provides a voice recognition method, which comprises the following steps:

when a voice input trigger operation is detected, controlling a recording thread in the terminal to remain in a started state throughout a preset process in which a preset voice input method is enabled, so that input first voice information is recorded by the recording thread;

and starting a first identification thread to identify the first voice information as first text information through the first identification thread.

In one embodiment, the method further comprises:

in the preset process that the preset voice input method is started, if the recording triggering operation on the preset recording key is detected again, a second identification thread is started;

and identifying the second voice information recorded by the recording thread as second character information through a second identification thread.

In one embodiment, the terminal includes: at least two identification threads, wherein the at least two identification threads comprise at least the first identification thread and the second identification thread;

in the preset process that the preset voice input method is started, if the recording triggering operation on the preset recording key is detected again, starting a second identification thread, including:

in a preset process that the preset voice input method is started, if the recording triggering operation is detected again, determining an idle identification thread from the at least two identification threads;

determining the idle identification thread as the second identification thread.

In an embodiment, in a preset process in which the preset speech input method is enabled, if a recording trigger operation on a preset recording key is detected again, starting a second identification thread includes:

when the recording triggering operation is detected again, determining whether the first identification thread stops running or not;

and starting the second identification thread when the first identification thread does not stop running, wherein the first identification thread and the second identification thread are different identification threads.

In one embodiment, the voice input trigger operation comprises: enabling the preset voice input method, or the first recording trigger operation on a preset recording key after the preset voice input method is enabled.

The present invention also provides a speech recognition apparatus comprising:

the control module is used for controlling a recording thread in the terminal to keep a starting state in a preset process that a preset voice input method is started when voice input triggering operation is detected, so that first voice information input is recorded through the recording thread;

the first starting module is used for starting a first recognition thread so as to recognize the first voice information as first character information through the first recognition thread.

In one embodiment, the apparatus further comprises:

the second starting module is used for starting a second identification thread if the recording triggering operation of the preset recording key is detected again in the preset process of starting the preset voice input method;

and the recognition module is used for recognizing the second voice information recorded by the recording thread into second text information through a second recognition thread.

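In one embodiment, the terminal includes at least two identification threads, wherein the at least two identification threads comprise at least the first identification thread and the second identification thread, and the second starting module comprises: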

the first determining submodule is used for determining an idle identification thread from the at least two identification threads if the recording triggering operation is detected again in the preset process that the preset voice input method is started;

a second determining submodule, configured to determine the idle identification thread as the second identification thread.

In one embodiment, the second starting module comprises:

a third determining submodule, configured to determine whether the first identification thread has stopped running when the recording trigger operation is detected again;

and the starting module is used for starting the second identification thread when the first identification thread does not stop running, wherein the first identification thread and the second identification thread are different identification threads.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:

when a voice input trigger operation is detected, the recording thread in the terminal is controlled to remain in a started state throughout the preset process in which the preset voice input method is enabled. The first voice information input during recording can thus be recorded by the recording thread, and the recording thread does not need to be restarted each time the user presses the recording key, so recording on the terminal does not need to be restarted either. The terminal is always in the recording state during the preset process, which avoids, as far as possible, the delay incurred when the recording thread starts recording on the terminal, and hence the recording delay found in the related art.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow diagram illustrating a method of speech recognition according to an example embodiment.

FIG. 2 is a flow diagram illustrating another method of speech recognition according to an example embodiment.

FIG. 3 is a block diagram illustrating a speech recognition apparatus according to an example embodiment.

FIG. 4 is a block diagram illustrating another speech recognition apparatus according to an example embodiment.

FIG. 5 is a block diagram illustrating yet another speech recognition apparatus according to an example embodiment.

FIG. 6 is a block diagram illustrating yet another speech recognition apparatus according to an example embodiment.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

In the related art, when a voice input method is used for voice recognition, recording starts when the recording key is pressed, and the recorded voice is then recognized as text and output. The recording scheme in the related art uses two threads, a recording thread and a recognition thread, and works as follows: when the recording key is pressed, the recording thread and the recognition thread of the voice input method are started. The recording thread is responsible for starting recording on the terminal. The recognition thread is responsible for fetching the recorded data from the recording thread, sending it to a server for recognition and, once recognition is finished, obtaining the recognized text from the server and outputting it. However, the recording thread needs a certain amount of time to start recording on the terminal, so this scheme causes a recording delay. In addition, because of the processing delay of the recognition thread, the current recognition process cannot end immediately when the recording key is released, and there is only one recognition thread. As a result, when the user releases and presses the recording key repeatedly within a short time, the recognition thread is likely to be unable to handle the next recording process immediately, because the data of the previous recording process has not yet been recognized. Either a serious recognition lag occurs, or the recognition thread, in order to reduce the lag as much as possible, discards part of the voice information without recognizing it. For example, if the user presses the recording key immediately after releasing it, the recognition thread often cannot handle the next recording process immediately because it is still recognizing the voice information of the previous recording process, and it begins preparing for the next recording process only after that recognition is complete. The recording thread is then started late and part of the voice information is lost. Alternatively, even if the recording thread is started as soon as the recording key is pressed, the recognition thread, in order to keep recognition synchronized with voice input as far as possible, begins with the voice information received at the current moment and discards the part already recorded by the recording thread. Or, if the recognition thread recognizes the voice information from the point at which recording began, a serious recognition lag, such as a lag of 1 or 2 seconds, is common. All of these cases cause great inconvenience to the user.

To solve the above technical problem, the present disclosure provides a voice recognition method suitable for an information input program, system, or device. The execution body of the method has installed a preset voice input method capable of converting recorded voice information into text information, such as Unisound or iFlytek. As shown in FIG. 1, the method includes step S101 and step S102, wherein,

in step S101, when a voice input trigger operation is detected, a recording thread in the terminal is controlled to remain in a started state throughout a preset process in which a preset voice input method is enabled, so that input first voice information is recorded by the recording thread;

the preset voice input method may be any input method that can convert recorded voice information into text information, such as Unisound or iFlytek, and the preset voice input method being enabled means that the input method currently in use has been switched to the preset voice input method.

In addition, the voice input trigger operation may include: enabling the preset voice input method. In this case the recording thread is started when the current input method is switched to the preset voice input method and remains started for as long as the preset voice input method stays in use, unaffected by the user's repeated trigger operations on the preset recording key during the preset process (a trigger operation here turns the preset recording key on or off, such as pressing or releasing the preset recording key);

or

The voice input trigger operation may further include: the first recording trigger operation performed on the preset recording key after the preset voice input method is enabled. In this case the recording thread remains started from the moment the user first performs the recording trigger operation (for example, presses the preset recording key) after the preset voice input method is enabled. It is unaffected by the user's repeated trigger operations on the preset recording key and stops running only when, for example, the current input method is switched to another input method. The preset process then starts at the time the user's first recording trigger operation on the preset recording key is received after the preset voice input method is enabled, and still ends at the time the current input method is switched from the preset voice input method to another, non-voice input method.
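The lifecycle described for step S101 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the class `VoiceInputSession`, its method names, and the counter `recorder_starts` are hypothetical, and the microphone capture loop is replaced by a thread that simply waits until it is told to stop.

```python
import threading


class VoiceInputSession:
    """Hypothetical model of one preset process of a voice input method."""

    def __init__(self):
        self.recorder_starts = 0      # how often the recording thread was started
        self._stop = threading.Event()
        self._recorder = None

    def on_voice_input_trigger(self):
        """Called when the input method is enabled or the recording key is pressed."""
        if self._recorder is not None and self._recorder.is_alive():
            return                    # already recording: nothing to restart
        self._stop.clear()
        self._recorder = threading.Thread(target=self._record, daemon=True)
        self._recorder.start()
        self.recorder_starts += 1

    def _record(self):
        # Placeholder for the microphone capture loop (an assumption):
        # it simply runs until the session is told to stop.
        self._stop.wait()

    def is_recording(self):
        return self._recorder is not None and self._recorder.is_alive()

    def on_switched_to_other_input_method(self):
        """End of the preset process: only now is the recording thread stopped."""
        self._stop.set()
        if self._recorder is not None:
            self._recorder.join()


session = VoiceInputSession()
for _ in range(3):                    # three presses of the recording key
    session.on_voice_input_trigger()
still_recording = session.is_recording()
session.on_switched_to_other_input_method()
```

In this sketch, repeated key presses find the recording thread already alive and return immediately, which is the property the text relies on to avoid the start-up delay on every press.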

In step S102, a first recognition thread is started to recognize the first voice information as first text information through the first recognition thread.

In the recording process, a first recognition thread can be started to recognize the first voice information input by the user at this time as the first character information so as to complete the voice recognition process.

As shown in fig. 2, in an embodiment, the method shown in fig. 1 may further include step S201 and step S202:

in step S201, in a preset process in which the preset voice input method is activated, if a recording trigger operation on the preset recording key is detected again, a second identification thread is started;

the preset recording key may be a recording key on an external microphone connected to an execution main body in which the preset voice input method is installed, or

The preset recording key can be a preset key on a peripheral keyboard connected with an execution main body provided with a preset voice input method, or

The preset recording key can also be a virtual recording key arranged on an execution main body of the preset voice input method.

Because the terminal is in the recording state throughout the preset process and can record the user's voice input at any time, the recording trigger operation indicates that the user wishes to start a valid recording process. The recording trigger operation may be clicking or long-pressing the preset recording key. Detecting the recording trigger operation on the preset recording key again means that, after the previous valid recording process intended by the user has ended, the recording trigger operation is performed on the preset recording key once more. For example, if the recording trigger operation is pressing the preset recording key, detecting it again means that the user has released the preset recording key and then pressed it again.

In addition, detecting the recording trigger operation on the preset recording key again means that the user performs the recording trigger operation on the preset recording key for the Nth time during the preset process in which the preset voice input method is enabled, where N is a positive integer greater than or equal to 2.

In step S202, the second voice information recorded by the recording thread is recognized as second text information by the second recognition thread. Of course, after the second recognition thread has been started to recognize the second voice information, the first recognition thread may not yet have stopped running: it may still be performing operations such as ending and resetting the recording and processing the first voice information, and it may initialize itself once the first voice information has been processed. This processing is similar to the processing steps in the related art and is not repeated here.

If the recording trigger operation is detected again during the preset process in which the preset voice input method is enabled, the user wishes to start the next valid recording process. By starting the second recognition thread, another recognition thread (namely the second recognition thread) can promptly recognize the complete second voice information recorded by the recording thread as second text information, which completes the voice recognition process. This also avoids the situation in which the same recognition thread (namely the first recognition thread) is used for two adjacent recording processes and cannot handle the new recording process immediately because the first voice information of the previous recording process has not been fully processed. A serious recognition lag is thereby avoided, as is the alternative in which the first recognition thread, in order to reduce the lag as much as possible, discards the beginning of the second voice information without recognizing it.

In addition, when the recording trigger operation received again is the Nth recording trigger operation received, the first voice information is the voice information recorded in the period from the time the (N-1)th recording trigger operation is received to the time the (N-1)th recording stop operation is received. Here the recording trigger operation indicates the start of the current recording process, that is, the (N-1)th recording process, and the recording stop operation indicates its end; when the recording trigger operation is pressing the preset recording key, the recording stop operation may be releasing the preset recording key. Of course, because the recording thread remains started throughout the preset process, the terminal is still recording between the time the (N-1)th recording stop operation is received and the time the Nth recording trigger operation is received; the data recorded in that interval simply may not be voice information the user needs, whereas the first voice information and the second voice information are. The second voice information is the voice information recorded in the period from the time the Nth recording trigger operation is received to the time the Nth recording stop operation is received, that is, the voice information of the Nth recording process.
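The relationship between the continuous recording and the per-press voice information can be sketched as follows. This is a hedged illustration only: the `Frame` type, the timestamps, and the function `utterance` are assumptions introduced here, and real audio frames are replaced by short strings.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    t: float    # capture time in seconds (hypothetical clock)
    data: str   # stand-in for a chunk of audio samples


def utterance(frames, trigger_t, stop_t):
    """Return the data recorded between a recording trigger operation and
    the matching recording stop operation (both ends inclusive)."""
    return [f.data for f in frames if trigger_t <= f.t <= stop_t]


# The recording thread captures continuously, including between presses.
frames = [Frame(0.0, "a"), Frame(0.5, "b"), Frame(1.0, "c"),
          Frame(1.5, "d"), Frame(2.0, "e")]

# (trigger time, stop time) of the (N-1)th and the Nth recording process.
events = [(0.0, 0.5), (1.5, 2.0)]

first_voice = utterance(frames, *events[0])
second_voice = utterance(frames, *events[1])
```

The frame captured at 1.0 s is recorded but belongs to neither segment, matching the remark that data recorded between a stop operation and the next trigger operation may not be voice information the user needs.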

In one embodiment, a terminal includes: at least two identification threads, wherein the at least two identification threads include at least a first identification thread and a second identification thread, for example: the at least two identification threads may comprise 3 identification threads, etc.;

by opening, in the terminal, a plurality of identification threads that can work together with the recording thread, the preset voice input method makes it easy to switch between different identification threads for two adjacent recording processes when the user repeatedly performs the recording trigger operation on the preset recording key, starting several distinct valid recording processes, during the preset process in which the preset voice input method is enabled.

The step S201 shown in fig. 2 may be performed as follows:

in the preset process that the preset voice input method is started, if the recording triggering operation is detected again, determining an idle identification thread from at least two identification threads;

the idle identified thread is determined to be the second identified thread.

During the preset process in which the preset voice input method is enabled, if the recording trigger operation is detected again, an idle identification thread can be determined automatically from the at least two identification threads according to whether each of them is in an idle state, and that idle identification thread is taken as the second identification thread. The idle identification thread then handles the new recording process promptly. This avoids the serious identification lag caused by using the same identification thread for two adjacent recording processes, and also avoids discarding part of the recorded voice information without identifying it in order to reduce that lag.
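A minimal sketch of choosing an idle identification thread from a pool is given below. The names `Recognizer` and `pick_idle` are hypothetical, and each worker is modelled only by a lock that marks it busy or idle; the source does not say what should happen when no thread is idle, so returning `None` in that case is an assumption of this sketch.

```python
import threading


class Recognizer:
    """Hypothetical stand-in for one identification thread in the pool."""

    def __init__(self, name):
        self.name = name
        self._busy = threading.Lock()   # held while the worker is processing

    def try_acquire(self):
        # Non-blocking: returns True only if the worker was idle.
        return self._busy.acquire(blocking=False)

    def release(self):
        # Called when the worker has finished its recording process.
        self._busy.release()


def pick_idle(recognizers):
    """Return the first idle recognizer, marking it busy, or None."""
    for r in recognizers:
        if r.try_acquire():
            return r
    return None


pool = [Recognizer("r0"), Recognizer("r1"), Recognizer("r2")]

first = pick_idle(pool)    # handles the first recording process
second = pick_idle(pool)   # key pressed again while `first` is still busy
first.release()            # the first recording process has been processed
third = pick_idle(pool)    # the now idle worker can be reused
```

Taking the lock inside `pick_idle` makes the idle check and the claim a single step, so two key presses cannot both be handed the same worker.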

In one embodiment, step S201 shown in fig. 2 may be performed as follows:

in determining whether the first identified thread has stopped running, it may be determined whether the state of the first identified thread is marked as a stopped running state, indicating that the first identified thread has stopped running when the state of the first identified thread is marked as a stopped running state, and indicating that the first identified thread has not stopped running when the state of the first identified thread is not marked as a stopped running state.

When the first identification thread has not stopped running, a second identification thread is started, wherein the first identification thread and the second identification thread are different identification threads. The preset time interval can be customized.

When the recording trigger operation on the preset recording key is detected again, the user wishes to start a new recording process. However, the previous recording process may already have been fully processed, that is, the first identification thread may already have stopped running, in which case no other identification thread needs to be started. It is therefore necessary to determine whether the first identification thread has stopped running. If it has not, the previous recording process has not finished being processed, so, to avoid the first identification thread being unable to handle the new recording process in time, the second identification thread is called and handles the new recording process promptly once it is running. This avoids, as far as possible, losing recorded data of the new recording process or incurring a serious identification lag.
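The decision described above can be sketched in a few lines of Python. The sketch models each identification thread only by a "stopped running" state flag, as the text suggests; `RecognitionThread`, `begin`, `finish`, and `on_record_trigger_again` are hypothetical names, and no real recognition is performed.

```python
class RecognitionThread:
    """Hypothetical identification thread, reduced to its state marker."""

    def __init__(self, name):
        self.name = name
        self.stopped = True    # marked as "stopped running" until it gets work

    def begin(self):
        self.stopped = False   # starts processing a recording process

    def finish(self):
        self.stopped = True    # processing done, marked as stopped again


def on_record_trigger_again(first, second):
    """Choose the thread that handles the new recording process."""
    if first.stopped:
        # The previous recording process is fully processed: reuse `first`.
        first.begin()
        return first
    # `first` is still busy: start a different thread instead.
    second.begin()
    return second


first = RecognitionThread("first")
second = RecognitionThread("second")

first.begin()                                      # first recording process
chosen_busy = on_record_trigger_again(first, second)

first.finish()
second.finish()
chosen_idle = on_record_trigger_again(first, second)
```

Here the second thread is used only when the first has not stopped, which mirrors the reasoning that starting another thread is unnecessary once the previous recording process has been handled.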

In one embodiment, the voice input trigger operation includes: enabling the preset voice input method, or the first recording trigger operation on the preset recording key after the preset voice input method is enabled.

The voice input trigger operation may include: the preset voice input method is started or after the preset voice input method is started, the recording triggering operation of the preset recording key is carried out for the first time, so that the recording thread can be started when the current input method is switched to the preset voice input method and is always in the starting state in the process of staying in the preset voice input method, or the recording thread can be always in the starting state when a user carries out the recording triggering operation of the preset recording key for the first time after the preset voice input method is started.

As shown in fig. 3, the present invention also provides a speech recognition apparatus, including:

the control module 301 is configured to, when a voice input trigger operation is detected, control a recording thread in the terminal to keep a start state all the time in a preset process in which a preset voice input method is enabled, so as to record input first voice information through the recording thread;

the first starting module 302 is configured to start a first recognition thread to recognize the first voice message as a first text message through the first recognition thread.

As shown in fig. 4, in an embodiment, the apparatus shown in fig. 3 may further include a second starting module 401 and an identification module 402:

a second starting module 401, configured to start a second identification thread if a recording triggering operation on the preset recording key is detected again in a preset process in which the preset voice input method is started;

and the recognition module 402 is configured to recognize the second voice information recorded by the recording thread as the second text information through the second recognition thread.

As shown in FIG. 5, in one embodiment, the terminal includes at least two identification threads, wherein the at least two identification threads include at least a first identification thread and a second identification thread;

the second starting module 401 shown in fig. 4 may include a first determining sub-module 4011 and a second determining sub-module 4012:

the first determining sub-module 4011 is configured to, in a preset process in which the preset voice input method is enabled, determine an idle recognition thread from the at least two recognition threads if a recording trigger operation is detected again;

a second determination sub module 4012 configured to determine the idle identification thread as a second identification thread.

As shown in fig. 6, in one embodiment, the second start module 401 shown in fig. 4 above may include a third determination sub-module 4013 and a start module 4014:

a third determining sub-module 4013 configured to determine whether the first identifying thread has stopped running when the recording trigger operation is detected again;

and the starting module 4014 is configured to start a second identification thread when the first identification thread does not stop running, wherein the first identification thread and the second identification thread are different identification threads.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Finally, the voice recognition device of the invention is suitable for terminal equipment. For example, it may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, etc.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A speech recognition method, comprising:

starting a first identification thread to identify the first voice information as first text information through the first identification thread;

the method further comprises the following steps:

identifying second voice information recorded by the recording thread as second character information through a second identification thread;

2. The method of claim 1,

the terminal includes: at least two identification threads, wherein the at least two identification threads comprise at least the first identification thread and the second identification thread;

determining the idle identification thread as the second identification thread.

3. The method according to claim 1 or 2,

the voice input trigger operation comprises: and starting the preset voice input method or triggering the recording of a preset recording key for the first time after the preset voice input method is started.

4. A speech recognition apparatus, comprising:

the first starting module is used for starting a first recognition thread so as to recognize the first voice information as first character information through the first recognition thread;

the device further comprises:

the recognition module is used for recognizing the second voice information recorded by the recording thread into second character information through a second recognition thread;

the second activation module comprises:

5. The apparatus of claim 4,

the second activation module comprises:

6. The apparatus according to claim 4 or 5,