CN110751948A - Voice recognition method, device, storage medium and voice equipment

Info

Publication number: CN110751948A
Application number: CN201910993460.1A
Authority: CN
Inventors: 毛跃辉; 文皓; 汪进; 王慧君; 梁博; 陶梦春
Original assignee: Gree Electric Appliances Inc of Zhuhai
Current assignee: Gree Electric Appliances Inc of Zhuhai; Gree Green Refrigeration Technology Center Co Ltd of Zhuhai
Priority date: 2019-10-18
Filing date: 2019-10-18
Publication date: 2020-02-04

Abstract

The invention provides a voice recognition method, a voice recognition apparatus, a storage medium and a voice device. The method comprises: after the voice device is awakened, receiving a voice instruction within a preset time; if a valid voice instruction is received within the preset time, judging, according to the valid voice instruction, whether multiple rounds of interactive dialogue are currently needed; and if multiple rounds of interactive dialogue are currently needed, executing a multi-round interactive dialogue mode. With this scheme, when multiple rounds of interactive dialogue are needed, the voice device can acquire voice instructions several times in succession after being awakened only once.

Description

Voice recognition method, device, storage medium and voice equipment

Technical Field

The present invention relates to the field of voice control, and in particular, to a voice recognition method, apparatus, storage medium, and voice device.

Background

During voice recognition, a voice device needs to remain in a working state at all times so that it picks up sound continuously and does not miss a valid command from the user. However, continuously collecting and recognizing the user's voice instructions increases false recognition; for example, the more voice skills the device supports, the more false recognitions occur. When the voice device continuously collects the user's voice information and ambient noise or other utterances resembling command words (for example, similar-sounding words) are mixed in, that voice information may be recognized as an action instruction and cause a false trigger. Moreover, the user may simply be chatting or talking at that moment and may not want the conversation to be uploaded to the cloud for recognition; because the voice device continuously collects the user's voice information, the privacy of the user's data cannot be guaranteed.

Disclosure of Invention

The main object of the present invention is to overcome the above drawbacks of the prior art by providing a voice recognition method, a voice recognition apparatus, a storage medium and a voice device, so as to solve the prior-art problem that continuous collection and recognition of voice instructions by a voice device increases false recognition.

One aspect of the present invention provides a speech recognition method, including: after the voice device is awakened, receiving a voice instruction within a preset time; if a valid voice instruction is received within the preset time, judging, according to the valid voice instruction, whether multiple rounds of interactive dialogue are currently needed; and if multiple rounds of interactive dialogue are currently needed, executing a multi-round interactive dialogue mode.

Optionally, the method further comprises: after the voice device is awakened, if no valid voice instruction is received within the preset time, exiting the voice recognition state; and/or, if it is judged that multiple rounds of interactive dialogue are not currently needed, executing the control operation corresponding to the valid voice instruction and exiting the voice recognition state.

Optionally, judging whether multiple rounds of interactive dialogues are needed currently according to the valid voice instruction includes: identifying a control intention corresponding to the effective voice instruction; and judging whether a plurality of rounds of interactive conversations are needed currently according to the control intention.

Optionally, executing the multi-round interactive dialogue mode comprises: conducting multiple rounds of interactive dialogue with the user according to preset interactive dialogue logic corresponding to the control intention of the valid voice instruction.

Another aspect of the present invention provides a speech recognition apparatus, including: a receiving unit configured to receive a voice instruction within a preset time after the voice device is awakened; a judging unit configured to judge, if a valid voice instruction is received within the preset time, whether multiple rounds of interactive dialogue are currently needed according to the valid voice instruction; and an execution unit configured to execute a multi-round interactive dialogue mode if it is judged that multiple rounds of interactive dialogue are currently needed.

Optionally, the execution unit is further configured to: after the voice device is awakened, if no valid voice instruction is received within the preset time, exit the voice recognition state; and/or, if it is judged that multiple rounds of interactive dialogue are not currently needed, execute the control operation corresponding to the valid voice instruction and exit the voice recognition state.

Optionally, the determining unit determines whether multiple rounds of interactive dialogues are currently required according to the valid voice instruction, including: identifying a control intention corresponding to the effective voice instruction; and judging whether a plurality of rounds of interactive conversations are needed currently according to the control intention.

Optionally, the execution unit executing the multi-round interactive dialogue mode includes: conducting multiple rounds of interactive dialogue with the user according to preset interactive dialogue logic corresponding to the control intention of the valid voice instruction.

A further aspect of the invention provides a storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of any of the methods described above.

A further aspect of the invention provides a speech device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods described above when executing the program.

In another aspect, the present invention provides a speech device, including any one of the speech recognition apparatuses described above.

According to the technical scheme of the invention, when the user needs multiple rounds of interactive dialogue, the voice device can acquire the user's voice instructions several times in succession after being awakened once. When the user does not need multiple rounds of interactive dialogue, the voice device exits voice recognition immediately after being awakened once and acquiring one voice instruction. This meets the user's need for multi-round interactive dialogue in certain scenarios while protecting the user's privacy and reducing the false recognition rate.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1 is a schematic diagram of a speech recognition method according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating one embodiment of the steps for determining whether multiple interactive sessions are currently required based on the active voice command;

FIG. 3 is a schematic diagram of a speech recognition method according to another embodiment of the present invention;

FIG. 4 is a schematic diagram of a speech recognition method according to a specific embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an embodiment of a speech recognition apparatus provided in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Currently, existing voice devices mainly handle voice recognition in one of two ways. In the first, if no valid instruction from the user is received within a preset time after wake-up, the voice device exits the voice recognition state. In the second, once a valid voice instruction from the user is received within the preset time after wake-up, the voice device immediately exits the voice recognition state and carries out the control intention of that instruction. These two ways of entering and exiting voice recognition have the following advantages: 1. False recognition is greatly reduced. 2. The user's privacy is protected, because the user's voice is collected only when the user needs it (the user signals a voice recognition need by waking up the voice device). The disadvantage is that each time the user wakes up the voice device, the device can execute only one valid voice instruction.

The invention provides a voice recognition method. The voice recognition method can be used in voice equipment with a voice recognition interaction function, such as electric appliances with the voice recognition interaction function, for example, a voice air conditioner.

Fig. 1 is a schematic method diagram of an embodiment of a speech recognition method provided by the present invention.

As shown in fig. 1, according to an embodiment of the present invention, the speech recognition method includes at least step S110, step S120, and step S130.

Step S110, after the voice device is awakened, receiving a voice command within a preset time.

Specifically, the user wakes up the voice device by speaking a preset wake-up word. On receiving the preset wake-up word, the voice device enters the voice recognition state and receives a voice instruction within a preset time. Optionally, if no valid voice instruction is received within the preset time after the voice device is awakened, the voice device exits the voice recognition state. For example, if no valid voice instruction from the user is received within 10 seconds of wake-up, the voice device exits the voice recognition state and must be awakened again before it can re-enter that state.
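The preset-time window of step S110 can be sketched as follows. This is only an illustrative sketch, not the implementation disclosed here: the function name `receive_within_window`, the polling callback and the injectable clock are assumptions introduced for the example, and the 10-second default simply reuses the example value from the text.

```python
import time
from typing import Callable, Optional

# Example window length taken from the text; a real device would make it configurable.
PRESET_WINDOW_SECONDS = 10.0


def receive_within_window(
    poll_instruction: Callable[[], Optional[str]],
    window_seconds: float = PRESET_WINDOW_SECONDS,
    clock: Callable[[], float] = time.monotonic,
    poll_interval: float = 0.0,
) -> Optional[str]:
    """Return the first valid instruction heard before the window closes.

    `poll_instruction` is a hypothetical callback that returns a valid
    instruction, or None when nothing valid has been heard yet.
    Returning None means the device should exit the recognition state.
    """
    deadline = clock() + window_seconds
    while clock() < deadline:
        instruction = poll_instruction()
        if instruction is not None:
            return instruction
        if poll_interval > 0:
            time.sleep(poll_interval)
    return None
```

A monotonic clock is used so that wall-clock adjustments cannot lengthen or shorten the window; the clock is a parameter only to make the timeout path easy to exercise.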

Step S120, if a valid voice instruction is received within the preset time, judging, according to the valid voice instruction, whether multiple rounds of interactive dialogue are currently needed. A valid voice instruction contains a preset voice command word, for example a command word for controlling the air conditioner, or a command word for querying the weather, playing music, and the like.

FIG. 2 is a flowchart illustrating an embodiment of the step of determining whether multiple interactive dialogs are currently required according to the valid voice command. As shown in fig. 2, in a specific embodiment, step S120 includes step S121 and step S122.

Step S121, identifying a control intention corresponding to the valid voice instruction.

In a specific implementation manner, a preset keyword included in the effective voice instruction is extracted, and a control intention corresponding to the effective voice instruction is determined according to the preset keyword. For example, keywords corresponding to different control intentions are configured in advance, so that the control intentions corresponding to the effective voice instructions are determined according to preset keywords contained in the extracted effective voice instructions. In another specific implementation manner, semantic recognition is performed on the effective voice instruction, and the control intention corresponding to the effective voice instruction is determined according to the semantic corresponding to the effective voice instruction. For example, the voice command of the user is "please help me to turn on the air conditioner", and the control intention of the user is determined to be to turn on the air conditioner through semantic recognition.
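The keyword-based variant of step S121 can be illustrated with a small lookup. The keyword table, the intent names and the function `identify_intent` below are hypothetical and chosen only to match the examples in the text; the semantic-recognition variant is not shown.

```python
from typing import Optional

# Hypothetical keyword table: each control intention is configured in advance
# with the keywords that indicate it.
INTENT_KEYWORDS = {
    "set_alarm": ("alarm",),
    "broadcast_news": ("news",),
    "turn_on_air_conditioner": ("turn on the air conditioner",),
}


def identify_intent(instruction: str) -> Optional[str]:
    """Return the first intention whose keyword occurs in the instruction."""
    text = instruction.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None  # no preset keyword found
```

Under these assumptions, `identify_intent("Please help me to turn on the air conditioner")` yields `"turn_on_air_conditioner"`.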

Step S122, judging whether multiple rounds of interactive dialogue are currently needed according to the control intention.

Specifically, control scenes that require multiple rounds of interactive dialogue are preset, and it is judged whether the control intention corresponding to the valid voice instruction belongs to one of these preset scenes. For example, the received voice instruction of the user is "set alarm clock"; analysis shows that the user's control intention is to set an alarm clock, and setting an alarm clock is determined to be a control intention that requires multiple rounds of interactive dialogue.
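Step S122 then reduces to a membership test against the preset scenes. In this sketch the set contents follow the examples given in this description (setting an alarm clock and turning on the air conditioner need multiple rounds, broadcasting news does not); the identifier names are assumptions.

```python
# Hypothetical preset set of control intentions that need multi-round dialogue.
MULTI_ROUND_INTENTS = frozenset({"set_alarm", "turn_on_air_conditioner"})


def needs_multi_round(intent: str) -> bool:
    """Judge whether the identified control intention is a preset multi-round scene."""
    return intent in MULTI_ROUND_INTENTS
```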

Step S130, if multiple rounds of interactive dialogue are currently needed, executing a multi-round interactive dialogue mode.

Specifically, multiple rounds of interactive dialogue are conducted with the user according to preset interactive dialogue logic corresponding to the control intention of the valid voice instruction. That is, following the preset interactive dialogue logic, the device converses with the user several times in succession and collects the voice instructions the user gives in response.

For example, the voice device receives the user's voice instruction "set an alarm clock" and determines that the user's control intention is to set an alarm clock. It then starts an interactive dialogue by asking the user "What time would you like to set the alarm for?" The voice device remains in the voice recognition state, so the user can go on to state the alarm time, for example "8 am". The device recognizes that the user wants the alarm set for eight in the morning and then asks "Should the alarm repeat?" The user answers "repeat every day", and the device sets an alarm that goes off at 8 am every day.
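The alarm-clock exchange can be read as a scripted sequence of prompts, each collecting one answer while the device stays in the recognition state. The sketch below assumes that reading; the script format, the prompt wording and the `ask_user` callback (which would speak a prompt and return the recognized reply) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical dialogue logic for the "set alarm clock" intention:
# (slot name, prompt spoken to the user) in the order they are asked.
ALARM_DIALOG: List[Tuple[str, str]] = [
    ("time", "What time would you like to set the alarm for?"),
    ("repeat", "Should the alarm repeat?"),
]


def run_multi_round_dialog(
    script: List[Tuple[str, str]],
    ask_user: Callable[[str], str],
) -> Dict[str, str]:
    """Ask each prompt in turn and collect the user's replies by slot.

    No new wake-up is needed between rounds; the loop itself models the
    device remaining in the recognition state.
    """
    slots: Dict[str, str] = {}
    for slot, prompt in script:
        slots[slot] = ask_user(prompt)
    return slots
```

With replies "8 am" and "repeat every day", the collected slots would be `{"time": "8 am", "repeat": "repeat every day"}`, which is enough to set the daily alarm in the example.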

Fig. 3 is a schematic method diagram of another embodiment of the speech recognition method provided by the present invention. As shown in fig. 3, according to an embodiment of the present invention, the speech recognition method further includes step S140.

Step S140, if it is determined that there is no need to perform multiple rounds of interactive dialogues currently, executing a control operation corresponding to the valid voice instruction, and exiting from the voice recognition state.

Specifically, if it is judged that multiple rounds of interactive dialogue are not currently needed, the control operation corresponding to the identified control intention of the valid voice instruction is executed. For example, the voice instruction received by the voice device is "broadcast news", which does not require the multi-round dialogue recognition mode. The voice device requests news resources from the cloud, broadcasts the news and at the same time exits the voice recognition state; it can re-enter the voice recognition state only after being awakened again.

For clearly explaining the technical solution of the present invention, the following describes an execution flow of the speech recognition method provided by the present invention with a specific embodiment.

FIG. 4 is a schematic diagram of a speech recognition method according to an embodiment of the present invention. The embodiment shown in fig. 4 includes steps S201 to S208.

Step S201, the user wakes up the voice device.

Step S202, after the voice device is awakened, the voice device is in a voice recognition state.

Step S203, receiving a voice instruction within a preset time; if no valid voice instruction is received, executing step S204; if a valid voice instruction is received, executing step S205.

Step S204, if no valid voice instruction is received within the preset time, the voice device exits the voice recognition state.

Step S205, if a valid voice instruction is received within the preset time, analyzing the user's control intention from the received voice instruction.

Step S206, determining, according to the analyzed control intention, whether multi-round dialogue needs to be entered; if so, executing step S207; if not, executing step S208.

Step S207, if multi-round dialogue is currently needed, the voice device enters the multi-round dialogue mode.

Step S208, if multi-round dialogue is not currently needed, the voice device carries out the user's control intention and at the same time exits the voice recognition state.
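The branching of steps S203 to S208 can be summarized in one function. This is a sketch under stated assumptions rather than the disclosed implementation: every collaborator (receiving, intent identification, the multi-round judgment, execution and the dialogue itself) is passed in as a hypothetical callback, and the `Outcome` enum exists only to make the three exits visible.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    TIMED_OUT = "exited: no valid instruction"   # S204
    MULTI_ROUND = "entered multi-round dialogue"  # S207
    SINGLE_SHOT = "executed and exited"           # S208


def handle_wake_up(
    receive: Callable[[], Optional[str]],
    identify_intent: Callable[[str], str],
    needs_multi_round: Callable[[str], bool],
    execute: Callable[[str], None],
    run_dialog: Callable[[str], None],
) -> Outcome:
    """Run one pass of the flow that follows a wake-up (S201/S202)."""
    instruction = receive()                # S203: listen within the preset time
    if instruction is None:
        return Outcome.TIMED_OUT           # S204: exit the recognition state
    intent = identify_intent(instruction)  # S205: analyze the control intention
    if needs_multi_round(intent):          # S206: multi-round needed?
        run_dialog(intent)                 # S207: stay in recognition, converse
        return Outcome.MULTI_ROUND
    execute(intent)                        # S208: act once, then exit
    return Outcome.SINGLE_SHOT
```

Passing the steps in as callbacks keeps the decision logic independent of any particular audio or recognition backend.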

The voice recognition method of the present invention is described below by taking a voice control scenario of a voice air conditioner as an example.

Scene 1: after the user wakes up the voice air conditioner, the voice air conditioner is in the voice recognition state. It receives no valid voice instruction from the user within the preset time (for example, 10 seconds), so it exits the voice recognition state and can re-enter that state only after being awakened again.

Scene 2: after the user wakes up the voice air conditioner, the voice air conditioner is in the voice recognition state. Within the preset time (for example, 10 seconds) it receives the user's voice instruction "broadcast news", which does not require the multi-round dialogue recognition mode. The voice air conditioner requests news resources from the cloud, broadcasts the news and at the same time exits the voice recognition state; it can re-enter that state only after being awakened again.

Scene 3: after the user wakes up the voice air conditioner, the voice air conditioner is in the voice recognition state. Within the preset time (for example, 10 seconds) it receives the user's voice instruction "turn on the air conditioner", which requires the multi-round dialogue recognition mode. After carrying out the instruction to turn on the air conditioner, the voice air conditioner remains in the recognition state, and the user can go on to give further voice instructions, for example "cooling mode" or "25 degrees". The voice air conditioner does not need to be awakened again for these.

Scene 4: after the user wakes up the voice air conditioner, the voice air conditioner is in the voice recognition state. Within the preset time (for example, 10 seconds) it receives the user's voice instruction "set alarm clock". Having determined that the user's intention is to set an alarm clock, the voice air conditioner can guide the user by voice to state the specific alarm time, for example by asking "What time would you like to set the alarm for?" The voice air conditioner remains in the voice recognition state, and the user can go on to state the specific alarm settings. Once the voice air conditioner has received the user's complete intention, it performs the corresponding operation and response and exits the voice recognition state. It must then be awakened again before it can re-enter the voice recognition state.

The invention also provides a voice recognition device. The voice recognition device can be used in voice equipment with a voice recognition interaction function, such as electric appliances with the voice recognition interaction function, for example, a voice air conditioner.

Fig. 5 is a schematic structural diagram of an embodiment of a speech recognition apparatus provided in the present invention. As shown in fig. 5, the speech recognition apparatus 100 includes a receiving unit 110, a judging unit 120, and an executing unit 130.

The receiving unit 110 is configured to receive a voice instruction within a preset time after the voice device is awakened; the judging unit 120 is configured to, if an effective voice instruction is received within a preset time, judge whether a multi-round interactive conversation needs to be performed currently according to the effective voice instruction; the execution unit 130 is configured to execute a multi-round interactive dialog mode if it is determined that multiple rounds of interactive dialogs are currently required.

When the voice device is awakened, the receiving unit 110 receives the voice command within a preset time.

Specifically, the user wakes up the voice device by speaking the preset wake-up word, and when the voice device receives the preset wake-up word, the voice device enters a voice recognition state, and the receiving unit 110 receives a voice command within a preset time.

Optionally, the execution unit 130 is further configured to exit the voice recognition state if no valid voice instruction is received within the preset time after the voice device is awakened. For example, if no valid voice instruction from the user is received within 10 seconds of wake-up, the execution unit 130 exits the voice recognition state, and the voice device must be awakened again before it can re-enter that state.

If the receiving unit 110 receives an effective voice command within a preset time, the determining unit 120 determines whether multiple rounds of interactive conversations are currently required according to the effective voice command. The effective voice instruction comprises preset voice command words, such as command words for controlling air conditioning or command words for inquiring weather, playing music and the like.

Specifically, the determining unit 120 identifies a control intention corresponding to the valid voice command, and determines whether multiple rounds of interactive dialogues are currently required according to the control intention.

Control scenes that require multiple rounds of interactive dialogue are preset, and the judging unit 120 judges whether the control intention corresponding to the valid voice instruction belongs to one of these preset scenes. For example, the received voice instruction of the user is "set alarm clock"; analysis shows that the user's control intention is to set an alarm clock, and setting an alarm clock is determined to be a control intention that requires multiple rounds of interactive dialogue.

If the determining unit 120 determines that multiple rounds of interactive dialogues are currently required, the executing unit 130 executes a multiple round of interactive dialog mode. Specifically, the execution unit 130 performs multiple rounds of interactive dialogues with the user according to the interactive dialog logic corresponding to the control intention corresponding to the preset effective voice instruction. That is, according to the preset interactive dialogue logic, dialogue is continuously performed with the user for multiple times and voice instructions sent by the user based on the dialogue are collected.

Optionally, if the judging unit 120 determines that multiple rounds of interactive dialogue are not currently needed, the execution unit 130 executes the control operation corresponding to the valid voice instruction and exits the voice recognition state.

Specifically, if the judging unit 120 determines that multiple rounds of interactive dialogue are not currently needed, the execution unit 130 executes the control operation corresponding to the identified control intention of the valid voice instruction. For example, the voice instruction received by the voice device is "broadcast news", which does not require the multi-round dialogue recognition mode. The voice device requests news resources from the cloud, broadcasts the news and at the same time exits the voice recognition state; it can re-enter the voice recognition state only after being awakened again.
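The division of labour between the receiving unit 110, the judging unit 120 and the execution unit 130 can be mirrored in code as three small classes composed into one apparatus. The class and method names below are assumptions made for illustration, the judging unit is simplified to treat the instruction text itself as the control intention, and the execution unit only records what it would do.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class ReceivingUnit:  # corresponds to unit 110
    listen: Callable[[], Optional[str]]  # hypothetical: None means timeout

    def receive(self) -> Optional[str]:
        return self.listen()


@dataclass
class JudgingUnit:  # corresponds to unit 120
    multi_round_intents: Set[str]

    def needs_multi_round(self, instruction: str) -> bool:
        # Simplification: the instruction text stands in for the intention.
        return instruction in self.multi_round_intents


@dataclass
class ExecutionUnit:  # corresponds to unit 130
    log: List[str] = field(default_factory=list)

    def run_dialog(self, instruction: str) -> None:
        self.log.append(f"dialog:{instruction}")

    def execute(self, instruction: str) -> None:
        self.log.append(f"execute:{instruction}")

    def exit_recognition(self) -> None:
        self.log.append("exit")


@dataclass
class SpeechRecognitionApparatus:  # corresponds to apparatus 100
    receiving: ReceivingUnit
    judging: JudgingUnit
    execution: ExecutionUnit

    def on_wake(self) -> None:
        instruction = self.receiving.receive()
        if instruction is None:
            self.execution.exit_recognition()
        elif self.judging.needs_multi_round(instruction):
            self.execution.run_dialog(instruction)
        else:
            self.execution.execute(instruction)
            self.execution.exit_recognition()
```

As the description notes later, this split is a logical one; the same behaviour could be integrated into a single processing unit.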

The invention also provides a storage medium corresponding to the speech recognition method, on which a computer program is stored which, when being executed by a processor, carries out the steps of any of the methods described above.

The invention also provides a speech device corresponding to the speech recognition method, comprising a processor, a memory and a computer program stored in the memory and operable on the processor, wherein the processor executes the program to implement the steps of any of the methods.

The invention also provides a voice device corresponding to the voice recognition device, which comprises any one of the voice recognition devices.

Therefore, according to the scheme provided by the invention, when the user needs multiple rounds of interactive dialogue, the voice device can acquire the user's voice instructions several times in succession (multiple rounds of dialogue) after being awakened once; when the user does not need multiple rounds of interactive dialogue, the voice device exits voice recognition immediately after being awakened once and acquiring one voice instruction.

The functions described herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. If implemented in software executed by a processor, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Other examples and implementations are within the scope and spirit of the invention and the following claims. For example, due to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hardwired, or a combination of any of these. In addition, each functional unit may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and the parts serving as the control device may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims

1. A speech recognition method, comprising:

after the voice device is awakened, receiving a voice instruction within a preset time;

if a valid voice instruction is received within the preset time, judging, according to the valid voice instruction, whether multiple rounds of interactive dialogue are currently needed;

and if multiple rounds of interactive dialogue are currently needed, executing a multi-round interactive dialogue mode.

2. The method of claim 1, further comprising:

after the voice device is awakened, if no valid voice instruction is received within the preset time, exiting the voice recognition state;

and/or,

if it is judged that multiple rounds of interactive dialogue are not currently needed, executing the control operation corresponding to the valid voice instruction and exiting the voice recognition state.

3. The method of claim 1 or 2, wherein determining whether multiple rounds of interactive dialogs are currently required according to the valid voice instruction comprises:

identifying a control intention corresponding to the effective voice instruction;

and judging whether a plurality of rounds of interactive conversations are needed currently according to the control intention.

4. The method of any one of claims 1-3, wherein performing a plurality of rounds of interactive dialog mode comprises:

and carrying out multiple rounds of interactive dialogue with the user according to interactive dialogue logic corresponding to the preset control intention corresponding to the effective voice instruction.

5. A speech recognition apparatus, comprising:

the receiving unit is used for receiving a voice instruction within preset time after the voice equipment is awakened;

the judging unit is used for judging whether a plurality of rounds of interactive conversations are needed at present according to an effective voice instruction if the effective voice instruction is received within a preset time;

and the execution unit is used for executing a multi-round interactive dialogue mode if judging that the multi-round interactive dialogue is needed currently.

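6. The apparatus of claim 5, wherein the execution unit is further configured to:

after the voice device is awakened, if no valid voice instruction is received within the preset time, exit the voice recognition state;

and/or,

if it is judged that multiple rounds of interactive dialogue are not currently needed, execute the control operation corresponding to the valid voice instruction and exit the voice recognition state.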

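7. The apparatus according to claim 5 or 6, wherein the judging unit judging whether multiple rounds of interactive dialogue are currently needed according to the valid voice instruction includes:

identifying a control intention corresponding to the valid voice instruction;

and judging whether multiple rounds of interactive dialogue are currently needed according to the control intention.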

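8. The apparatus according to any one of claims 5-7, wherein the execution unit executing the multi-round interactive dialogue mode includes:

conducting multiple rounds of interactive dialogue with the user according to preset interactive dialogue logic corresponding to the control intention of the valid voice instruction.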

9. A storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.

10. A speech device, comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1-4 when executing the program; or comprising the speech recognition apparatus according to any one of claims 5-8.