CN110992955A - Voice operation method, device, equipment and storage medium of intelligent equipment

Info

Publication number: CN110992955A
Application number: CN201911359467.4A
Authority: CN
Inventors: 李勇; 甘津瑞; 徐大光
Original assignee: AI Speech Ltd
Current assignee: AI Speech Ltd
Priority date: 2019-12-25
Filing date: 2019-12-25
Publication date: 2020-04-10

Abstract

The invention discloses a voice operation method, apparatus, device and storage medium for an intelligent device. The method comprises the following steps: a control terminal receives voice information input by a user and performs voice recognition on it to generate a voice recognition result; semantic analysis is performed on the voice recognition result to obtain a semantic analysis result; an operation control instruction matching the voice information is generated according to the semantic analysis result; and the operation control instruction is sent to the controlled intelligent device through a server to instruct the controlled intelligent device to execute it. In the embodiments of the invention, the user operates the intelligent device by inputting voice information at the control terminal; the voice information is converted into a text-format operation control instruction and sent to the intelligent device through the server over a wide area network, which avoids the limitation of a local area network and reduces the requirement on the network state. The real-time performance of the user's voice operation of the intelligent device is ensured, and the user experience is improved.

Description

Voice operation method, device, equipment and storage medium of intelligent equipment

Technical Field

The embodiment of the invention relates to a data acquisition and processing technology, in particular to a voice operation method, a voice operation device, voice operation equipment and a storage medium of intelligent equipment.

Background

Operating an intelligent device by voice through a mobile terminal is now common on the market. Existing voice operation methods for intelligent devices fall mainly into two types. In the first, a long connection is established between the mobile terminal and the intelligent device within the same local area network; the mobile terminal records in real time and sends the audio data to the intelligent device, which converts the audio data into text data in real time and operates the corresponding module according to the text data. In the second, a complete audio file is recorded in advance on the mobile terminal and then sent to the intelligent device through a network interface; the intelligent device recognizes the whole audio file to obtain the complete text data and finally operates the corresponding module according to that text data.

Although both methods enable the intelligent device to complete the operation matching the user's voice information, the first method, which transmits the audio data stream in real time, requires the mobile terminal to be in the same local area network as the intelligent device and is severely limited by network conditions: when the network state is poor, audio transmission is slow and conversion errors easily result. The second method cannot ensure the real-time performance of the user's voice operation of the intelligent device, and the user experience is poor.

Disclosure of Invention

The embodiment of the invention provides a voice operation method, a voice operation device, equipment and a storage medium of intelligent equipment, which ensure the real-time performance of voice operation of the intelligent equipment by a user, avoid the limitation of a local area network, reduce the requirement on a network state and improve the experience of the user.

In a first aspect, an embodiment of the present invention provides a voice operation method for an intelligent device, where the method includes:

the control terminal receives voice information input by a user, and performs voice recognition on the voice information to generate a voice recognition result when the voice information is determined to be human voice information;

the control terminal carries out semantic analysis on the voice recognition result to obtain a semantic analysis result;

the control terminal generates an operation control instruction matched with the voice information according to the semantic analysis result;

and the control terminal sends the operation control instruction to the controlled intelligent equipment through the server so as to instruct the controlled intelligent equipment to execute the operation control instruction.

In a second aspect, an embodiment of the present invention further provides a voice operation method for an intelligent device, where the method includes:

the server receives an operation control instruction that is sent by the control terminal and matches the voice information input by a user;

and the server sends the operation control instruction to the controlled intelligent equipment so as to instruct the controlled intelligent equipment to execute the operation control instruction.

In a third aspect, an embodiment of the present invention provides a voice operation apparatus for an intelligent device, where the apparatus includes:

the voice recognition result generation module is used for receiving voice information input by a user, and carrying out voice recognition on the voice information to generate a voice recognition result when the voice information is determined to be the human voice information;

the semantic analysis result acquisition module is used for performing semantic analysis on the voice recognition result to acquire a semantic analysis result;

the operation control instruction generating module is used for generating an operation control instruction matched with the voice information according to the semantic analysis result;

and the operation control instruction sending module is used for sending the operation control instruction to the controlled intelligent equipment through the server so as to instruct the controlled intelligent equipment to execute the operation control instruction.

In a fourth aspect, an embodiment of the present invention further provides a voice operating apparatus for an intelligent device, where the apparatus includes:

the operation control instruction receiving module is used by the server for receiving an operation control instruction that is sent by the control terminal and matches the voice information input by the user;

and the operation control instruction sending module is used for sending the operation control instruction to the controlled intelligent equipment by the server so as to instruct the controlled intelligent equipment to execute the operation control instruction.

In a fifth aspect, an embodiment of the present invention further provides a computing device, where the computing device includes:

one or more processors;

storage means for storing one or more programs;

when the one or more programs are executed by the one or more processors, the one or more processors implement a voice operation method of an intelligent device provided by any embodiment of the invention.

In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program, when executed by a processor, implements a voice operation method of an intelligent device according to any embodiment of the present invention.

According to the technical scheme of the embodiments of the invention, after the control terminal receives the voice information input by a user, it generates the voice recognition result corresponding to the voice information, obtains the semantic analysis result corresponding to the voice recognition result, generates the operation control instruction matching the voice information according to the semantic analysis result, and finally sends the operation control instruction to the controlled intelligent device through the server to instruct the controlled intelligent device to execute it. In the embodiments of the invention, the user operates the intelligent device by inputting voice information at the control terminal; the voice information is converted into a text-format operation control instruction and sent to the intelligent device through the server over a wide area network, which avoids the limitation of a local area network. Compared with transmitting an audio data stream, transmitting the operation control instruction in text format reduces the requirement on the network state. The real-time performance of the user's voice operation of the intelligent device is thereby ensured, and the user experience is improved.

Drawings

Fig. 1 is a flowchart of a voice operation method of an intelligent device according to a first embodiment of the present invention;

Fig. 2 is a flowchart of a voice operation method of an intelligent device in an embodiment of the present invention;

Fig. 3 is a flowchart of a voice operation method of an intelligent device according to a second embodiment of the present invention;

Fig. 4 is a structural diagram of a voice operation apparatus of an intelligent device according to a third embodiment of the present invention;

Fig. 5 is a structural diagram of a voice operation apparatus of an intelligent device according to a fourth embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a computing device according to a fifth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a voice operation method of an intelligent device according to the first embodiment of the present invention. This embodiment is applicable to the case where a control terminal receives voice information input by a user in real time, converts it into an operation control instruction in text format, and sends the instruction to a server, so that the server forwards the instruction to a controlled intelligent device and the device is thereby operated, as shown in Fig. 2. The method may be executed by a voice operation apparatus of the intelligent device, which may be implemented in software and/or hardware and is generally integrated in the control terminal; the control terminal is able to communicate with the server. The method specifically includes the following steps:

Step 110, the control terminal receives voice information input by a user and, when the voice information is determined to be human voice information, performs voice recognition on the voice information to generate a voice recognition result.

In this embodiment, the control terminal may be a mobile phone terminal or a computer terminal. The user inputs voice information by opening a recording module of the control terminal. When the control terminal detects a trigger request of the recording module, it checks the voice information input through the recording module using Voice Activity Detection (VAD). If the voice information is determined to be human voice information, the control terminal performs voice recognition on it using Automatic Speech Recognition (ASR) to generate a voice recognition result. For example, suppose the user inputs the voice information "I want to watch Chinese Captain" through the recording module of the control terminal. The control terminal uses VAD to detect that the voice information is human voice information, then uses ASR to convert it into the text information "I want to watch Chinese Captain", and takes this text information as the voice recognition result matching the voice information input by the user.
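The gating described in step 110 (run ASR only after the VAD check passes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the embodiment: the energy threshold is a toy stand-in for a real VAD, and `is_human_voice`, `recognize_if_voice`, `fake_asr` and `ENERGY_THRESHOLD` are hypothetical names that do not appear in the original text.

```python
from typing import Callable, Optional, Sequence

# Hypothetical threshold (assumption). A real VAD would use a trained model
# or spectral features; mean amplitude alone cannot tell speech from noise.
ENERGY_THRESHOLD = 500.0


def is_human_voice(samples: Sequence[int], threshold: float = ENERGY_THRESHOLD) -> bool:
    """Toy stand-in for VAD: mean absolute amplitude of 16-bit PCM samples."""
    if not samples:
        return False
    return sum(abs(s) for s in samples) / len(samples) >= threshold


def recognize_if_voice(
    samples: Sequence[int], asr: Callable[[Sequence[int]], str]
) -> Optional[str]:
    """Run ASR only when the VAD check passes; otherwise return None."""
    if not is_human_voice(samples):
        return None
    return asr(samples)


def fake_asr(samples: Sequence[int]) -> str:
    """Placeholder ASR engine (assumption); always returns the example utterance."""
    return "I want to watch Chinese Captain"


silence = [0] * 160            # all-zero frame: rejected by the toy VAD
speech = [1000, -1200] * 80    # loud frame: accepted by the toy VAD
print(recognize_if_voice(silence, fake_asr))
print(recognize_if_voice(speech, fake_asr))
```

Returning `None` for non-voice input lets the caller skip the later semantic analysis and transmission steps entirely.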

After this step, the control terminal sends the voice recognition result to the controlled intelligent device through the server, so as to instruct the controlled intelligent device to synchronously display the voice recognition result in its display interface.

In a specific embodiment, as shown in Fig. 2, the control terminal generates a voice recognition result for the voice information input by the user, such as "I want to watch Chinese Captain", and sends it to the server; the server forwards the voice recognition result to the controlled intelligent device, which, after receiving it, synchronously displays "I want to watch Chinese Captain" in its display interface.

The controlled intelligent device may be an intelligent device equipped with a display module, such as a television or a speaker with a display screen.
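Because both the voice recognition result and the later operation control instruction travel to the device as text, one way to let the device tell them apart is a tagged message. The sketch below assumes a JSON envelope with a `type` field; the field names, the `Device` class and the `make_message` helper are illustrative assumptions and are not specified in the original text.

```python
import json
from typing import List


def make_message(kind: str, device_id: str, text: str) -> str:
    """Serialize a text message; `kind` is 'asr_result' or 'instruction' (assumed tags)."""
    return json.dumps(
        {"type": kind, "device_id": device_id, "text": text}, ensure_ascii=False
    )


class Device:
    """Hypothetical controlled device with a display and an instruction log."""

    def __init__(self) -> None:
        self.screen: List[str] = []    # text shown in the display interface
        self.executed: List[str] = []  # instructions handed to the device for execution

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg["type"] == "asr_result":
            self.screen.append(msg["text"])    # synchronously display the recognized text
        elif msg["type"] == "instruction":
            self.executed.append(msg["text"])  # execute the operation control instruction


tv = Device()
tv.handle(make_message("asr_result", "tv-1", "I want to watch Chinese Captain"))
print(tv.screen)
```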

Step 120, the control terminal performs semantic analysis on the voice recognition result to obtain a semantic analysis result.

In this step, the control terminal judges whether a voice instruction mapping relation matching the voice recognition result is prestored. If not, the control terminal performs semantic analysis on the voice recognition result to obtain a semantic analysis result; if so, the control terminal takes the operation control instruction in the matched voice instruction mapping relation as the operation control instruction matching the voice information.

In this embodiment, a voice instruction mapping relation includes a voice recognition result and the operation control instruction corresponding to it. For example, the voice recognition result is "I want to watch Chinese Captain" and the corresponding operation control instruction is "trigger the movie module and play the movie Chinese Captain".

After generating a voice recognition result from the voice information input by the user, the control terminal searches its local memory for a voice instruction mapping relation matching the voice recognition result. If one exists, the control terminal extracts the operation control instruction in that mapping relation as the operation control instruction matching the voice information. If not, the control terminal performs semantic analysis on the voice recognition result using Natural Language Understanding (NLU), converting it into text information that the intelligent device can understand, and takes this text information as the semantic analysis result. For example, for the voice recognition result "I want to watch Chinese Captain", semantic analysis using NLU yields the semantic analysis result "play Chinese Captain".

Step 130, the control terminal generates an operation control instruction matching the voice information according to the semantic analysis result.

In this step, after obtaining the semantic analysis result matching the voice information input by the user, the control terminal uses a Dialog Manager (DM) to generate, from the semantic analysis result, an operation control instruction that the controlled intelligent device can understand. For example, if the semantic analysis result is "play Chinese Captain", the corresponding operation control instruction may be "trigger the movie module and play the movie Chinese Captain".
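As a rough illustration of step 130, the sketch below maps a structured semantic result to a text-format operation control instruction using a small rule table. The structured `SemanticResult` form, the intent names and the `INTENT_TO_MODULE` table are assumptions made for this example; the original text only states that a DM technique generates the instruction and does not describe its internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticResult:
    """Assumed structured form of a semantic analysis result."""
    intent: str  # e.g. "play_movie" (hypothetical intent name)
    title: str   # e.g. "Chinese Captain"


# Hypothetical rule table from intent to the device module that handles it.
INTENT_TO_MODULE = {"play_movie": "movie", "play_music": "music"}


def generate_instruction(sem: SemanticResult) -> str:
    """Build a text-format operation control instruction from a semantic result."""
    module = INTENT_TO_MODULE.get(sem.intent)
    if module is None:
        raise ValueError(f"no module registered for intent {sem.intent!r}")
    return f"trigger the {module} module and play the {module} {sem.title}"


print(generate_instruction(SemanticResult("play_movie", "Chinese Captain")))
```

Raising an error for an unknown intent keeps an unrecognized request from being sent to the device as a malformed instruction.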

After this step, the control terminal stores the voice recognition result and the operation control instruction as a new voice instruction mapping relation. The purpose is that, when a later voice recognition result is identical to the current one, the control terminal can directly take the operation control instruction in the locally stored mapping relation as the matching instruction without performing semantic analysis again, which saves steps in the voice operation of the intelligent device.

Step 140, the control terminal sends the operation control instruction to the controlled intelligent device through the server to instruct the controlled intelligent device to execute the operation control instruction.
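The lookup, fallback and store behaviour described for steps 120 and 130 amounts to caching instructions keyed by the voice recognition result. A minimal sketch follows, assuming an in-memory dictionary for the mapping relations and toy string-based `toy_nlu` and `toy_dm` functions; all names here are hypothetical and the real NLU and DM components are not specified in the original text.

```python
from typing import Callable, Dict


class VoiceCommandResolver:
    """Resolve a recognition result to an instruction, reusing stored mappings."""

    def __init__(self, nlu: Callable[[str], str], dm: Callable[[str], str]) -> None:
        self._nlu = nlu
        self._dm = dm
        self._mappings: Dict[str, str] = {}  # voice recognition result -> instruction
        self.nlu_calls = 0                   # counts how often semantic analysis runs

    def resolve(self, asr_text: str) -> str:
        cached = self._mappings.get(asr_text)
        if cached is not None:
            return cached                    # mapping found: skip semantic analysis
        self.nlu_calls += 1
        instruction = self._dm(self._nlu(asr_text))
        self._mappings[asr_text] = instruction  # store as a new mapping relation
        return instruction


def toy_nlu(text: str) -> str:
    """Toy semantic analysis (assumption): rewrite the example phrasing."""
    prefix = "I want to watch "
    return "play " + text[len(prefix):] if text.startswith(prefix) else text


def toy_dm(semantic: str) -> str:
    """Toy instruction generation (assumption)."""
    prefix = "play "
    if semantic.startswith(prefix):
        return "trigger the movie module and play the movie " + semantic[len(prefix):]
    return semantic


resolver = VoiceCommandResolver(toy_nlu, toy_dm)
first = resolver.resolve("I want to watch Chinese Captain")
second = resolver.resolve("I want to watch Chinese Captain")
print(first == second, resolver.nlu_calls)
```

The second call returns the stored instruction without invoking the NLU function, which is the saving the paragraph above describes.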

In this step, as shown in fig. 2, the control terminal sends the operation control instruction to the server, and the server sends the operation control instruction to the controlled intelligent device, and the controlled intelligent device executes a corresponding operation after receiving the operation control instruction.

According to this embodiment of the invention, after the control terminal receives the voice information input by a user, it generates the voice recognition result corresponding to the voice information, performs semantic analysis on the voice recognition result to obtain a semantic analysis result, generates the operation control instruction matching the voice information according to the semantic analysis result, and finally sends the operation control instruction to the controlled intelligent device through the server to instruct the controlled intelligent device to execute it. In the embodiments of the invention, the user operates the intelligent device by inputting voice information at the control terminal; the voice information is converted into a text-format operation control instruction and sent to the intelligent device through the server over a wide area network, which avoids the limitation of a local area network. Compared with transmitting an audio data stream, transmitting the operation control instruction in text format reduces the requirement on the network state. The real-time performance of the user's voice operation of the intelligent device is thereby ensured, and the user experience is improved.
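The claim that text transmission lowers the requirement on the network state can be made concrete with a rough size comparison. The audio parameters below (16 kHz, 16-bit, mono, uncompressed PCM, a 3-second utterance) are assumptions chosen for illustration and are not given in the original text; compressed audio would be smaller, but typically still far larger than a short text instruction.

```python
# Assumed audio format for illustration only: 16 kHz, 16-bit, mono, uncompressed PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2


def pcm_bytes(seconds: float) -> int:
    """Size in bytes of uncompressed mono PCM audio of the given duration."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


instruction = "trigger the movie module and play the movie Chinese Captain"
text_bytes = len(instruction.encode("utf-8"))
audio_bytes = pcm_bytes(3.0)  # a 3-second utterance (assumed duration)

print(f"text instruction: {text_bytes} bytes")
print(f"raw audio:        {audio_bytes} bytes")
print(f"ratio:            {audio_bytes / text_bytes:.0f}x")
```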

Example two

Fig. 3 is a flowchart of a voice operation method of an intelligent device according to the second embodiment of the present invention. This embodiment is applicable to the case where a server receives an operation control instruction in text format sent by a control terminal and forwards it to a controlled intelligent device so that the device is operated, as shown in Fig. 2. The method may be executed by a voice operation apparatus of the intelligent device, which may be implemented in software and/or hardware and is generally integrated in the server; the server is able to communicate with the control terminal. The method specifically includes the following steps:

Step 310, the server receives an operation control instruction that is sent by the control terminal and matches the voice information input by the user.

In this step, the control terminal transmits the operation control instruction matching the voice information input by the user to the server through a long-distance wireless communication mechanism (e.g., Wi-Fi or mobile data). For example, if the voice information input by the user is "I want to watch Chinese Captain", the operation control instruction matching the voice information may be "trigger the movie module and play the movie Chinese Captain".

Before this step, the server receives the voice recognition result sent by the control terminal and sends it to the controlled intelligent device, so as to instruct the controlled intelligent device to synchronously display the voice recognition result in its display interface.

In a specific embodiment, the control terminal generates a voice recognition result in text format for the voice information input by the user, such as "I want to watch Chinese Captain", and sends it to the server; the server forwards the voice recognition result to the controlled intelligent device, which, after receiving it, synchronously displays "I want to watch Chinese Captain" in its display interface.

Step 320, the server sends the operation control instruction to the controlled intelligent device to instruct the controlled intelligent device to execute the operation control instruction.

In this step, the server sends the operation control instruction to the controlled intelligent device through a long-distance wireless communication mechanism (e.g., Wi-Fi or mobile data), and the controlled intelligent device executes the corresponding operation after receiving the operation control instruction.
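Steps 310 and 320 describe the server as a relay between the control terminal and the controlled device. The sketch below models this in memory, assuming each device registers a delivery callback under a device identifier; the `RelayServer` class, its methods and the identifier scheme are hypothetical, and a real deployment would use network connections rather than direct function calls.

```python
from typing import Callable, Dict, List


class RelayServer:
    """In-memory stand-in for the server that forwards text messages to devices."""

    def __init__(self) -> None:
        self._devices: Dict[str, Callable[[str], None]] = {}

    def register(self, device_id: str, deliver: Callable[[str], None]) -> None:
        """Record how to reach a controlled device (assumed registration step)."""
        self._devices[device_id] = deliver

    def forward(self, device_id: str, message: str) -> bool:
        """Step 310: receive a message for a device. Step 320: pass it on."""
        deliver = self._devices.get(device_id)
        if deliver is None:
            return False  # unknown or offline device: nothing is delivered
        deliver(message)
        return True


received: List[str] = []
server = RelayServer()
server.register("tv-1", received.append)
ok = server.forward("tv-1", "trigger the movie module and play the movie Chinese Captain")
print(ok, received)
```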

According to this embodiment of the invention, the server receives the operation control instruction that is sent by the control terminal and matches the voice information input by the user, and then sends the operation control instruction to the controlled intelligent device to instruct the controlled intelligent device to execute it. In the embodiments of the invention, the user operates the intelligent device by inputting voice information at the control terminal; the voice information is converted into a text-format operation control instruction and sent to the intelligent device through the server over a wide area network, which avoids the limitation of a local area network. Compared with transmitting an audio data stream, transmitting the operation control instruction in text format reduces the requirement on the network state. The real-time performance of the user's voice operation of the intelligent device is thereby ensured, and the user experience is improved.

Example three

Fig. 4 is a structural diagram of a voice operation apparatus of an intelligent device according to a third embodiment of the present invention, where the apparatus includes: a voice recognition result generating module 410, a semantic analysis result acquiring module 420, an operation control instruction generating module 430 and an operation control instruction transmitting module 440.

The voice recognition result generating module 410 is configured to receive voice information input by a user, perform voice recognition on the voice information when it is determined that the voice information is human voice information, and generate a voice recognition result; a semantic analysis result obtaining module 420, configured to perform semantic analysis on the voice recognition result to obtain a semantic analysis result; an operation control instruction generating module 430, configured to generate an operation control instruction matched with the voice information according to the semantic analysis result; an operation control instruction sending module 440, configured to send the operation control instruction to the controlled intelligent device via the server, so as to instruct the controlled intelligent device to execute the operation control instruction.
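The four modules of this apparatus form a linear pipeline. The sketch below wires them together as interchangeable callables so that each module can be replaced independently; the `VoiceOperationApparatus` class and its field names are assumptions made for illustration, with the module numbers from Fig. 4 noted in comments.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VoiceOperationApparatus:
    """Hypothetical composition of the four modules described for Fig. 4."""
    recognize: Callable[[bytes], Optional[str]]  # module 410: returns None if not human voice
    analyze: Callable[[str], str]                # module 420: semantic analysis
    generate: Callable[[str], str]               # module 430: instruction generation
    send: Callable[[str], None]                  # module 440: send via the server

    def run(self, audio: bytes) -> Optional[str]:
        text = self.recognize(audio)
        if text is None:
            return None  # no human voice detected: nothing is sent
        instruction = self.generate(self.analyze(text))
        self.send(instruction)
        return instruction


sent: List[str] = []
apparatus = VoiceOperationApparatus(
    recognize=lambda audio: "I want to watch Chinese Captain" if audio else None,
    analyze=lambda text: "play Chinese Captain",
    generate=lambda sem: "trigger the movie module and " + sem.replace("play", "play the movie", 1),
    send=sent.append,
)
print(apparatus.run(b"\x01\x02"))
```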

On the basis of the foregoing embodiments, the semantic analysis result obtaining module 420 may include:

the judging unit is used for judging whether a voice instruction mapping relation matched with the voice recognition result is prestored;

the semantic analysis unit is used for performing semantic analysis on the voice recognition result to acquire a semantic analysis result when the voice command mapping relation matched with the voice recognition result is not prestored in the control terminal;

and the operation control instruction acquisition unit is used for acquiring the operation control instruction in the matched voice instruction mapping relation as the operation control instruction matched with the voice information when the control terminal stores the voice instruction mapping relation matched with the voice recognition result in advance.

The operation control instruction generation module 430 may include:

and the voice instruction mapping relation storage unit is used for storing the voice recognition result and the operation control instruction as a new voice instruction mapping relation.

The speech recognition result generating module 410 may include:

and the voice recognition result sending unit is used for sending the voice recognition result to the controlled intelligent equipment through the server so as to instruct the controlled intelligent equipment to synchronously display the voice recognition result in an equipment display interface.

The voice operation device of the intelligent device provided by the embodiment of the invention can execute the voice operation method of the intelligent device provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

Example four

Fig. 5 is a structural diagram of a voice operation apparatus of an intelligent device according to a fourth embodiment of the present invention, where the apparatus includes: an operation control instruction receiving module 510 and an operation control instruction transmitting module 520.

The operation control instruction receiving module 510 is configured to receive an operation control instruction which is sent by the control terminal and matches with the voice information input by the user; an operation control instruction sending module 520, configured to send the operation control instruction to the controlled intelligent device, so as to instruct the controlled intelligent device to execute the operation control instruction.

On the basis of the foregoing embodiments, the voice operation apparatus of the smart device may further include:

and the voice recognition result receiving module is used for receiving the voice recognition result sent by the control terminal and sending the voice recognition result to the controlled intelligent equipment so as to indicate the controlled intelligent equipment to synchronously display the voice recognition result in an equipment display interface.

Example five

Fig. 6 is a schematic structural diagram of a computing apparatus according to a fifth embodiment of the present invention, as shown in fig. 6, the computing apparatus includes a processor 610, a memory 620, an input device 630, and an output device 640; the number of processors 610 in the computing device may be one or more, and one processor 610 is taken as an example in fig. 6; the processor 610, memory 620, input device 630, and output device 640 in the computing device may be connected by a bus or other means, such as by a bus in fig. 6.

The memory 620, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the voice operation method of an intelligent device in the first embodiment of the present invention (for example, the voice recognition result generating module 410, the semantic analysis result acquiring module 420, the operation control instruction generating module 430, and the operation control instruction transmitting module 440 in the voice operation apparatus of an intelligent device). By running the software programs, instructions and modules stored in the memory 620, the processor 610 executes the various functional applications and data processing of the computing device, that is, implements the voice operation method of the intelligent device described above. In other words, when executed by the processor, the program implements steps 110 to 140 of the first embodiment.

The memory 620, as a computer-readable storage medium, can also be used to store the program instructions/modules corresponding to the voice operation method of an intelligent device in the second embodiment of the present invention (for example, the operation control instruction receiving module 510 and the operation control instruction sending module 520 in the voice operation apparatus of an intelligent device). By running the software programs, instructions and modules stored in the memory 620, the processor 610 executes the various functional applications and data processing of the computing device, that is, implements the voice operation method of the intelligent device described above. In other words, when executed by the processor, the program implements steps 310 and 320 of the second embodiment.

The memory 620 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 620 may further include memory located remotely from the processor 610, which may be connected to a computing device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 630 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computing device, and may include a keyboard and a mouse, etc. The output device 640 may include a display device such as a display screen.

Example six

The sixth embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method according to any embodiment of the present invention. The computer-readable storage medium provided in this embodiment may perform the related operations of the voice operation method of the intelligent device provided in the first embodiment of the present invention. That is, when executed by the processor, the program implements steps 110 to 140 of the first embodiment.

The computer-readable storage medium provided in this embodiment may further perform the related operations of the voice operation method of the intelligent device provided in the second embodiment of the present invention. That is, when executed by the processor, the program implements steps 310 and 320 of the second embodiment.

From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.

It should be noted that, in the embodiment of the voice operation apparatus of the intelligent device, the included units and modules are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

It is to be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the spirit of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A voice operation method of an intelligent device is characterized by comprising the following steps:

2. The method of claim 1, wherein the semantic analysis of the voice recognition result by the control terminal to obtain a semantic analysis result comprises:

the control terminal judges whether a voice instruction mapping relation matched with the voice recognition result is prestored;

if not, the control terminal performs semantic analysis on the voice recognition result to obtain a semantic analysis result;

after the control terminal generates an operation control instruction matched with the voice information according to the semantic analysis result, the method further comprises the following steps:

and the control terminal stores the voice recognition result and the operation control instruction as a new voice instruction mapping relation.

3. The method according to claim 2, wherein after the control terminal judges whether a voice instruction mapping relation matched with the voice recognition result is prestored, the method further comprises:

and if so, the control terminal acquires the operation control instruction in the matched voice instruction mapping relation as the operation control instruction matched with the voice information.
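The lookup recited in claims 2 and 3 behaves like a cache keyed on the voice recognition result. A minimal sketch follows, assuming a hypothetical `InstructionResolver` class and an injected `analyze` callable that stands in for semantic analysis plus instruction generation; neither name comes from the claims.

```python
from typing import Callable, Dict


class InstructionResolver:
    """Resolves a voice recognition result to an operation control instruction.

    Hypothetical helper: `analyze` stands in for semantic analysis followed by
    instruction generation, which is assumed to be the expensive path.
    """

    def __init__(self, analyze: Callable[[str], str]):
        self._analyze = analyze
        # Voice instruction mapping relations: recognition result -> instruction.
        self._mappings: Dict[str, str] = {}

    def resolve(self, recognition_result: str) -> str:
        cached = self._mappings.get(recognition_result)
        if cached is not None:
            # A matching mapping relation is prestored: reuse its instruction.
            return cached
        # No mapping relation is prestored: analyze, then store the new mapping.
        instruction = self._analyze(recognition_result)
        self._mappings[recognition_result] = instruction
        return instruction


calls = []


def fake_analyze(text: str) -> str:
    calls.append(text)
    return "VOLUME_UP"


resolver = InstructionResolver(fake_analyze)
first = resolver.resolve("louder")
second = resolver.resolve("louder")
```

In this sketch the second call with the same recognition result skips the analysis path entirely, which is the saving the prestored mapping relation is meant to provide.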

4. The method according to any one of claims 1-3, wherein after the control terminal performs voice recognition on the voice information to generate a voice recognition result, the method further comprises:

and the control terminal sends the voice recognition result to the controlled intelligent equipment through the server so as to instruct the controlled intelligent equipment to synchronously display the voice recognition result in an equipment display interface.

5. A voice operation method of an intelligent device is characterized by comprising the following steps:

6. The method of claim 5, wherein before the server receives the operation control instruction which is sent by the control terminal and matched with the voice information input by the user, the method further comprises:

and the server receives the voice recognition result sent by the control terminal and sends the voice recognition result to the controlled intelligent equipment so as to instruct the controlled intelligent equipment to synchronously display the voice recognition result in an equipment display interface.

7. A voice-operated apparatus of an intelligent device, comprising:

8. A voice-operated apparatus of an intelligent device, comprising:

9. A computing device, wherein the computing device comprises:

one or more processors;

storage means for storing one or more programs;

when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the voice operation method of an intelligent device as recited in any one of claims 1-6.

10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the voice operation method of an intelligent device as claimed in any one of claims 1 to 6.