CN111897916A - Voice instruction recognition method and device, terminal equipment and storage medium - Google Patents
- Publication number
- CN111897916A (application CN202010722276.6A)
- Authority
- CN
- China
- Prior art keywords
- voice
- text
- input
- information slot
- voice text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The embodiment of the invention discloses a voice instruction recognition method and device, a terminal device, and a storage medium. The voice instruction recognition method provided by the embodiment of the invention comprises: recognizing a voice instruction input by a user and generating a voice text; acquiring an information slot value of the voice text; and judging whether to execute the voice instruction according to the information slot value, so that the user can conveniently and quickly stop voice input when the terminal device receives an abnormal voice instruction.
Description
Technical Field
The invention relates to the technical field of mobile communication, and in particular to a voice instruction recognition method and device, a terminal device, and a storage medium.
Background
ASR (Automatic Speech Recognition) is a technology for converting human speech into text. With the development of science and technology, it has been applied to various terminal devices, such as smart phones, notebook computers, tablet computers, and vehicle-mounted terminals. As users increasingly demand convenience in daily life, more and more terminal devices provide functions for collecting user speech and performing speech recognition. A terminal device collects the voice instruction input by the user with a microphone and converts it into a voice text using automatic speech recognition technology, so that the terminal device can perform the corresponding system action according to the voice text. The user can therefore control the terminal by voice instead of by text input or other operations and achieve the same purpose. In the research and practice of the prior art, the inventor of the present invention found that the background can be noisy when the user inputs voice, so the terminal may receive a number of disordered voice instructions, and the user cannot conveniently and quickly close the voice input function of the terminal device.
Disclosure of Invention
The embodiment of the invention provides a voice instruction recognition method and device, a terminal device, and a storage medium, which judge whether a voice text is abnormal according to the domain, the expressed intention, and the information slot value of the voice text, so that when the terminal device receives an abnormal voice instruction, the user can conveniently and quickly stop voice input.
The embodiment of the invention provides a voice instruction identification method, which comprises the following steps:
recognizing a voice instruction input by a user and generating a voice text;
acquiring an information slot value of the voice text;
and judging whether to execute the voice command according to the information slot value.
Optionally, in some embodiments of the present invention, the acquiring the information slot value of the voice text includes:
determining the domain and the expressed intention according to the content of the voice text;
acquiring an information slot of the voice text according to the domain and the expressed intention;
and filling the information slot to generate an information slot value.
Optionally, in some embodiments of the present invention, the determining whether to execute the voice instruction according to the slot value includes:
and judging whether to execute the voice command according to the field of the voice text.
Optionally, in some embodiments of the present invention, the judging whether to execute the voice instruction according to the domain of the voice text includes:
determining that the voice text comprises at least two different domains, not executing the voice instruction, and stopping voice input;
and determining that the voice text comprises one domain, and judging whether to execute the voice instruction according to the expressed intention of the voice text.
Optionally, in some embodiments of the present invention, the judging whether to execute the voice instruction according to the expressed intention includes:
determining that the voice text comprises at least two different expressed intentions, not executing the voice instruction, and stopping voice input;
and determining that the voice text comprises one expressed intention, and judging whether to execute the voice instruction according to the information slot value.
Optionally, in some embodiments of the present invention, the judging whether to execute the voice instruction according to the information slot value includes:
determining that the voice text comprises at least two different information slot values, not executing the voice instruction, and stopping voice input;
and determining that the voice text comprises one information slot value, and executing the voice instruction.
Optionally, in some embodiments of the present invention, the stopping voice input includes:
adding a stop button for the voice input.
Correspondingly, an embodiment of the present invention further provides a voice instruction recognition apparatus, including:
the recognition unit is used for recognizing a voice instruction input by a user and generating a voice text;
the acquisition unit is used for acquiring the information slot value of the voice text;
and the judging unit is used for judging whether to execute the voice command according to the information slot value.
Similarly, an embodiment of the present invention further provides a terminal device, including:
a memory for storing an application program;
a processor for implementing the steps of any of the speech instruction recognition methods when executing the application program.
In addition, an embodiment of the present invention further provides a storage medium, where an application program is stored on the storage medium, and the application program, when executed by a processor, implements the steps of any one of the voice instruction recognition methods.
The embodiment of the invention provides a voice instruction recognition method. A user inputs a voice instruction; the terminal device collects the voice instruction with a microphone and recognizes it using automatic speech recognition technology to generate a voice text. The terminal device then analyzes the voice text using NLU technology, determines the domain, the expressed intention, and the information slot of the voice text, and fills the information slot to generate an information slot value. The terminal first judges the domain of the voice text: if the voice text includes at least two different domains, the voice text is determined to be abnormal, the voice instruction is not executed, and the terminal lets the user stop voice input; if the voice text includes one domain, the voice text is determined to be normal. The terminal then judges the expressed intention of the voice text: if the voice text includes at least two different expressed intentions, the voice text is determined to be abnormal, the voice instruction is not executed, and the terminal stops voice input; if the voice text includes one expressed intention, the voice text is determined to be normal. The terminal then judges the information slot value of the voice text: if the voice text includes at least two different information slot values, the voice text is determined to be abnormal, the voice instruction is not executed, and the terminal stops voice input; if the voice text includes one information slot value, the voice text is determined to be normal and the terminal executes the voice instruction. When the voice text is abnormal, the terminal adds a stop button and/or a pause button for voice input, or changes the position of the stop button and/or pause button by moving it from the edge of the terminal display screen to the middle, so that the user can conveniently and quickly stop voice input.
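The three-stage abnormality check described above can be sketched in Python (a minimal illustration only; the function name, the pair-based slot representation, and the return labels are assumptions for this sketch, not part of the patent):

```python
def judge_voice_text(domains, intents, slots):
    """Decide whether to execute a recognized voice instruction.

    domains: domains detected in the voice text
    intents: expressed intentions detected in the voice text
    slots:   (slot_name, slot_value) pairs filled from the text

    The text is judged abnormal -- and voice input is stopped -- as soon
    as any stage finds more than one distinct candidate: two domains,
    two intentions, or two conflicting values for the same slot.
    """
    if len(set(domains)) != 1:
        return "stop_input"           # zero or two-plus domains: abnormal
    if len(set(intents)) != 1:
        return "stop_input"           # zero or two-plus intentions: abnormal
    filled = {}
    for name, value in slots:
        if name in filled and filled[name] != value:
            return "stop_input"       # two different values for one slot
        filled[name] = value
    return "execute"

# A single-domain, single-intent query with consistent slots executes;
# a text mixing two domains stops voice input instead.
print(judge_voice_text(["weather"], ["query_weather"],
                       [("date", "today"), ("place", "Shenzhen")]))
print(judge_voice_text(["weather", "music"], ["query_weather"], []))
```

Note that a query may legitimately fill several *different* slots (date and place above); only conflicting values for the *same* slot are treated as abnormal in this sketch, which is one reasonable reading of "at least two different information slot values".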
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a schematic diagram of a speech command recognition scenario provided by an embodiment of the present invention;
FIG. 2 is a flow chart of a method for recognizing a voice command according to an embodiment of the present invention;
FIG. 3 is another flow chart of a method for recognizing a voice command according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a voice command recognition apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a voice instruction identification method, a voice instruction identification device, terminal equipment and a storage medium. The device can be integrated in a terminal, and the terminal can be a mobile phone, a tablet computer, a notebook computer, a vehicle-mounted terminal and other equipment.
For example, as shown in fig. 1, the voice input function of the terminal device is turned on and the terminal device turns on its microphone. The user inputs a voice instruction to the terminal device, which collects the voice instruction with the microphone and transmits it to the recognition unit of the terminal system. The recognition unit recognizes the voice instruction using automatic speech recognition technology and converts it into a voice text, which is then transmitted to the acquisition unit of the terminal system. The acquisition unit analyzes the voice text using NLU technology, classifies it so as to recognize the expressed intention and the domain of the voice text, determines the information slot of the voice text, performs sequence labeling on the voice text, and fills the information slot to generate an information slot value. Finally, the terminal device transmits the voice text to the judgment unit of the terminal system. The terminal first judges the domain of the voice text: if the voice text comprises at least two different domains, the voice text is determined to be abnormal, the voice instruction is not executed, and the terminal lets the user stop voice input; if the voice text comprises one domain, the voice text is determined to be normal. The terminal then judges the expressed intention of the voice text: if the voice text comprises at least two different expressed intentions, the voice text is determined to be abnormal, the voice instruction is not executed, and the terminal stops voice input; if the voice text comprises one expressed intention, the voice text is determined to be normal. The terminal then judges the information slot value of the voice text: if the voice text comprises at least two different information slot values, the voice text is determined to be abnormal, the voice instruction is not executed, and the terminal stops voice input; if the voice text comprises one information slot value, the voice text is determined to be normal and the terminal executes the voice instruction. When the voice text is abnormal, the terminal adds a stop button and/or a pause button for voice input, or changes the position of the stop button and/or pause button by moving it from the edge of the terminal display screen to the middle, so that the user can conveniently click to stop voice input.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
The embodiment will be described from the perspective of a voice instruction recognition device, which may be specifically integrated in a terminal device, where the terminal device may include a notebook computer, a tablet computer, a smart phone, a vehicle-mounted terminal, and the like.
A voice instruction recognition method, comprising: recognizing a voice instruction input by a user and generating a voice text; acquiring an information slot value of the voice text; and judging whether to execute the voice command according to the information slot value.
As shown in fig. 2, the specific flow of the voice command recognition method is as follows:
For example, referring to fig. 3 together, on a terminal device with a voice input function, the user turns on the voice input function. Step 301 is executed: the terminal device turns on a recording device, such as a microphone, and the user inputs a voice instruction to the terminal device. Step 302 is executed: the terminal device collects the voice instruction input by the user with the recording device. Step 303 is executed: the terminal device recognizes the voice instruction using automatic speech recognition technology and converts it into a voice text.
The voice instruction is the sound collected by the recording device after the recording function of the terminal device is turned on. It includes the user's words expressing the user's requirement of the terminal device, such as "how is the weather in Shenzhen today", as well as any background noise present when the user inputs the voice instruction.
For example, referring to fig. 3, after the terminal device obtains the voice text input by the user, it analyzes the voice text using NLU technology. Step 304 is executed to obtain the domain, expressed intention, and information slot of the voice text, and step 305 is executed to fill the information slot of the voice text and generate an information slot value.
Here, NLU (Natural Language Understanding) refers to technology for communicating with a computer in natural language. Since the key to processing natural language is to make the computer "understand" it, natural language processing is also called natural language understanding, and is likewise known as computational linguistics. It is a branch of artificial intelligence that studies how an electronic computer can simulate the human process of language communication, so that the computer can understand and use the natural languages of human society, such as Chinese and English, realize natural-language communication between human and computer, and replace part of human mental labor, including querying data, answering questions, extracting documents, assembling data, and processing all related natural-language information.
Optionally, referring to fig. 3, the terminal device classifies the voice text using natural language processing technology so as to identify the domain and the expressed intention of the voice text, then determines the information slot of the voice text according to the domain and the expressed intention, performs sequence labeling on the voice text, and executes step 305, filling the information slot of the voice text and generating an information slot value. For example, if the voice text is "how is the weather in Shenzhen today", the terminal device classifies the voice text, recognizes that its domain is weather and its expressed intention is querying the weather, and determines that the information slots included in the voice text are date and place. Sequence labeling assigns the corresponding information slot to each character of the voice text: the two characters of "today" (今天) correspond to the information slot date, the two characters of "Shenzhen" (深圳) correspond to the information slot place, and the remaining characters ("how is the weather") have no corresponding information slot. The information slot value corresponding to the information slot date is therefore "today", and the information slot value corresponding to the information slot place is "Shenzhen".
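The per-character sequence labeling and slot filling just described can be sketched as follows (a simplified illustration; the label names and the naive concatenation of same-labeled characters are assumptions for this example, not the patent's actual labeling scheme):

```python
def fill_slots(chars, labels):
    """Fill information slots from per-character sequence labels.

    Characters labeled "O" carry no slot; characters sharing a slot
    label are concatenated, in order, into that slot's value.
    """
    slots = {}
    for ch, label in zip(chars, labels):
        if label != "O":
            slots[label] = slots.get(label, "") + ch
    return slots

# "How is the weather in Shenzhen today" -- one label per character.
chars = list("今天深圳天气怎么样")
labels = ["date", "date", "place", "place", "O", "O", "O", "O", "O"]
print(fill_slots(chars, labels))   # {'date': '今天', 'place': '深圳'}
```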
Step 203: judging whether to execute the voice instruction according to the information slot value.
For example, referring to fig. 3 together, after the terminal device obtains the domain, the expressed intention, the information slot, and the information slot value of the voice text, it judges whether the voice text is abnormal according to them. Step 306 is executed: the terminal device first judges the domain of the voice text. If the terminal device determines that the voice text includes at least two different domains, the voice text is abnormal, and step 309 is executed to stop voice input: the terminal device adds a stop button and/or a pause button for voice input, and may move the stop button and/or pause button from an edge corner of the display screen of the terminal device to the middle, so that the user can conveniently click to stop voice input.
Optionally, if it is determined that the voice text includes at least two different domains, the voice text is abnormal. If the terminal device uses a gesture or facial expression to stop the voice input function, the terminal device lowers the requirement for the gesture or facial expression to control the voice input function, for example by reducing the required duration or complexity of the gesture or facial expression, so that the terminal device responds more sensitively to the gesture or facial expression that stops the voice input function and the user can conveniently stop the voice input of the terminal device.
Optionally, referring to fig. 3 together, when it is determined that the voice text includes one domain, the terminal device determines that the voice text is normal and executes step 307: the terminal device judges the expressed intention included in the voice text. If the terminal device determines that the voice text includes at least two different expressed intentions, the voice text is abnormal, and step 309 is executed to stop voice input: the terminal device adds a stop button and/or a pause button for voice input, and may move the stop button and/or pause button from an edge corner of the display screen to the middle, so that the user can click to stop voice input.
Optionally, if the terminal device determines that the voice text includes at least two different expressed intentions, the voice text is abnormal. If the terminal device uses a gesture or facial expression to stop the voice input function, the terminal device lowers the requirement for the gesture or facial expression to control the voice input function, for example by reducing its required duration or complexity, so that the terminal device responds more sensitively to the gesture or facial expression that stops the voice input function and the user can conveniently stop the voice input of the terminal device.
Optionally, referring to fig. 3 together, when it is determined that the voice text includes one expressed intention, the terminal device determines that the voice text is normal and executes step 308: the terminal device judges the information slot value included in the voice text. If the terminal device determines that the voice text includes at least two different information slot values, the voice text is abnormal, and step 309 is executed to stop voice input: the terminal device adds a stop button and/or a pause button for voice input, and may move it from an edge corner of the display screen to the middle, so that the user can click to stop voice input. Otherwise the voice text is normal, and step 310 is executed to execute the voice instruction.
Optionally, if the terminal device determines that the voice text includes at least two different information slot values, the voice text is abnormal. If the terminal device uses a gesture or facial expression to stop the voice input function, the terminal device lowers the requirement for the gesture or facial expression to control the voice input function, for example by reducing its required duration or complexity, so that the terminal device responds more sensitively to the gesture or facial expression that stops the voice input function and the user can conveniently stop voice input.
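Lowering the gesture requirement after an abnormal text can be sketched as a simple threshold adjustment (the threshold values and names below are illustrative assumptions, not values from the patent):

```python
NORMAL_HOLD_SECONDS = 1.5    # assumed default hold time for a stop gesture
RELAXED_HOLD_SECONDS = 0.5   # assumed lowered threshold after an abnormal text

def gesture_stops_input(hold_seconds, last_text_abnormal):
    """Return True when a held stop gesture is long enough to stop voice input.

    After the last voice text was judged abnormal, the required hold time
    is lowered so that a shorter, simpler gesture suffices.
    """
    threshold = RELAXED_HOLD_SECONDS if last_text_abnormal else NORMAL_HOLD_SECONDS
    return hold_seconds >= threshold
```

With these assumed values, a one-second gesture stops voice input only after an abnormal text has made the terminal more sensitive.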
In order to better implement the above method, an embodiment of the present invention may further provide a voice instruction recognition apparatus, where the voice instruction recognition apparatus may be specifically integrated in a network device, and the network device may be a terminal or another device.
For example, as shown in fig. 4, the voice instruction recognition apparatus may include a recognition unit 401, an acquisition unit 402, and a determination unit 403 as follows:
(1) recognition unit 401
The recognition unit 401 is configured to recognize a voice instruction input by a user and generate a voice text.
For example, a user turns on a voice input function of the terminal device, the recognition unit 401 of the terminal device turns on a recording device such as a microphone, the recording device collects a voice instruction input by the user, and the recognition unit 401 recognizes the voice instruction by using an automatic voice recognition technology to convert the voice instruction into a voice text.
(2) Acquisition unit 402
An obtaining unit 402, configured to obtain an information slot value of the speech text.
For example, after the terminal device acquires the voice text, the recognition unit 401 of the terminal device transmits the voice text to the acquisition unit 402, and the acquisition unit 402 analyzes the voice text by using the NLU technology to acquire the domain, the expression intention, the information slot, and the information slot value of the voice text.
Optionally, the obtaining unit 402 classifies the voice text by using a natural language processing technology, so as to identify an expression intention and a domain of the voice text, and determine an information slot of the voice text, and the obtaining unit 402 performs sequence labeling on the voice text, fills the information slot of the voice text, and generates an information slot value.
(3) Judging unit 403
A judging unit 403, configured to judge whether to execute the voice instruction according to the information slot value.
For example, the obtaining unit 402 transmits the expressed intention, the domain, the information slot, and the information slot value of the voice text to the judging unit 403. The judging unit 403 first judges the domain: if the voice text comprises at least two different domains, the voice text is abnormal; the terminal device adds a stop button and/or a pause button for voice input, and may move it from an edge corner of the display screen to the middle, so that the user can click to stop voice input. If the judging unit 403 determines that the voice text comprises one domain, the voice text is normal, and the terminal device then judges the expressed intention comprised by the voice text: if the judging unit 403 determines that the voice text comprises at least two different expressed intentions, the voice text is abnormal and the terminal stops voice input; if the voice text includes one expressed intention, the voice text is normal. The terminal device then judges the information slot value included in the voice text: if the judging unit 403 determines that the voice text includes at least two different information slot values, the voice text is abnormal and the terminal stops voice input; if the voice text includes one information slot value, the voice text is normal and the terminal executes the voice instruction.
Optionally, when the judging unit 403 determines that the voice text includes at least two different domains, the voice text is abnormal. When the terminal device uses a gesture or facial expression to stop the voice input function, the terminal device lowers the requirement for the gesture or facial expression to control the voice input function, for example by reducing its required duration or complexity, so that the terminal device responds more sensitively to the gesture or facial expression that stops the voice input function and the user can conveniently stop the voice input of the terminal device.
Accordingly, an embodiment of the present invention further provides a terminal. As shown in fig. 5, the terminal may include a Radio Frequency (RF) circuit 501, a memory 502 including one or more computer-readable storage media, an input unit 503, a display unit 504, a sensor 505, an audio circuit 506, a Wireless Fidelity (WiFi) module 507, a processor 508 including one or more processing cores, and a power supply 509. Those skilled in the art will appreciate that the terminal structure shown in fig. 5 does not constitute a limitation of the terminal; the terminal may include more or fewer components than those shown, combine some components, or arrange the components differently. Wherein:
The RF circuit 501 may be used for receiving and transmitting signals during information transmission and reception or during a call; in particular, it receives downlink information from a base station and delivers it to the one or more processors 508 for processing, and sends uplink data to the base station. In general, the RF circuit 501 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 501 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), and the like.
The memory 502 may be used to store software programs and modules, and the processor 508 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the terminal, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 508 and the input unit 503 with access to the memory 502.
The input unit 503 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, in one specific embodiment, the input unit 503 may include a touch-sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations by a user on or near it (e.g., operations by a user on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or attachment) and drive the corresponding connection device according to a predetermined program. Alternatively, the touch-sensitive surface may comprise two parts: a touch detection device and a touch controller. The touch detection device detects the position and orientation of the user's touch, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends the coordinates to the processor 508; it can also receive and execute commands sent by the processor 508. In addition, the touch-sensitive surface may be implemented as a resistive, capacitive, infrared, or surface acoustic wave type. The input unit 503 may include other input devices in addition to the touch-sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 504 may be used to display information input by or provided to the user and various graphical user interfaces of the terminal, which may be made up of graphics, text, icons, video, and any combination thereof. The Display unit 504 may include a Display panel, and optionally, the Display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch-sensitive surface may overlay the display panel, and when a touch operation is detected on or near the touch-sensitive surface, the touch operation is transmitted to the processor 508 to determine the type of touch event, and then the processor 508 provides a corresponding visual output on the display panel according to the type of touch event. Although in FIG. 5 the touch-sensitive surface and the display panel are two separate components to implement input and output functions, in some embodiments the touch-sensitive surface may be integrated with the display panel to implement input and output functions.
The terminal may also include at least one sensor 505, such as light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel according to the brightness of ambient light, and a proximity sensor that may turn off the display panel and/or the backlight when the terminal is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when the mobile phone is stationary, and can be used for applications of recognizing the posture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured in the terminal, detailed description is omitted here.
WiFi is a short-range wireless transmission technology. Through the WiFi module 507, the terminal can help a user to receive and send e-mails, browse webpages, access streaming media, and the like, providing the user with wireless broadband internet access. Although fig. 5 shows the WiFi module 507, it is understood that the module is not an essential part of the terminal and may be omitted entirely as needed within a scope that does not change the essence of the invention.
The processor 508 is the control center of the terminal. It connects the various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing the software programs and/or modules stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the terminal as a whole. Optionally, the processor 508 may include one or more processing cores; preferably, the processor 508 may integrate an application processor, which primarily handles the operating system, user interfaces, application programs, and the like, and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor may alternatively not be integrated into the processor 508.
The terminal also includes a power supply 509 (e.g., a battery) for powering the various components. Preferably, the power supply is logically connected to the processor 508 via a power management system, so that charging, discharging, and power consumption can be managed through the power management system. The power supply 509 may further include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and any other such component.
Although not shown, the terminal may further include a camera, a bluetooth module, and the like, which will not be described herein. Specifically, in this embodiment, the processor 508 in the terminal loads the executable file corresponding to the process of one or more application programs into the memory 502 according to the following instructions, and the processor 508 runs the application programs stored in the memory 502, thereby implementing various functions: recognizing a voice instruction input by a user, generating a voice text, acquiring an information slot value of the voice text, and judging whether to execute the voice instruction according to the information slot value.
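The three functions named above (recognize the voice instruction, acquire the information slot value, judge whether to execute) form a pipeline that can be sketched as follows; the keyword rules stand in for a real speech recognizer and NLU model and are purely illustrative:

```python
from typing import Dict, Optional

def recognize(audio: bytes) -> str:
    """Stand-in ASR step: convert the user's voice instruction to text."""
    # A real terminal would invoke a speech-recognition engine here.
    return "navigate to the airport"

def acquire_slot_value(voice_text: str) -> Dict[str, str]:
    """Stand-in NLU step: infer the belonging field and expression intention
    from the text, derive the information slot, and fill in its value."""
    if "navigate" in voice_text:
        return {"field": "navigation", "intent": "navigate",
                "destination": voice_text.rsplit(" ", 1)[-1]}
    return {}

def handle(audio: bytes) -> Optional[Dict[str, str]]:
    """Run the pipeline; return the slots to act on, or None to stop input."""
    slots = acquire_slot_value(recognize(audio))
    return slots or None
```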
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The voice instruction recognition method, the voice instruction recognition device, the terminal device, and the storage medium provided by the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the technical solution and core idea of the present invention. Those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A voice instruction recognition method, comprising:
recognizing a voice instruction input by a user and generating a voice text;
acquiring an information slot value of the voice text;
and judging whether to execute the voice instruction according to the information slot value.
2. The method of claim 1, wherein the acquiring the information slot value of the voice text comprises:
determining the belonging field and the expression intention according to the content of the voice text;
acquiring an information slot of the voice text according to the belonging field and the expression intention;
and filling the information slot to generate an information slot value.
3. The method of claim 1, wherein the judging whether to execute the voice instruction according to the information slot value comprises:
judging whether to execute the voice instruction according to the belonging field of the voice text.
4. The method of claim 3, wherein the judging whether to execute the voice instruction according to the belonging field of the voice text comprises:
determining that the voice text comprises at least two different belonging fields, not executing the voice instruction, and stopping the voice input;
and determining that the voice text comprises one belonging field, and judging whether to execute the voice instruction according to the expression intention of the voice text.
5. The method of claim 4, wherein the judging whether to execute the voice instruction according to the expression intention comprises:
determining that the voice text comprises at least two different expression intentions, not executing the voice instruction, and stopping the voice input;
and determining that the voice text comprises one expression intention, and judging whether to execute the voice instruction according to the information slot value.
6. The method of claim 5, wherein the judging whether to execute the voice instruction according to the information slot value comprises:
determining that the voice text comprises at least two different information slot values, not executing the voice instruction, and stopping the voice input;
and determining that the voice text comprises one information slot value, and executing the voice instruction.
7. The method of any of claims 4-6, wherein the stopping the voice input comprises:
adding a stop button for the voice input.
8. A voice instruction recognition apparatus, comprising:
the recognition unit is used for recognizing a voice instruction input by a user and generating a voice text;
the acquisition unit is used for acquiring the information slot value of the voice text;
and the judging unit is used for judging whether to execute the voice instruction according to the information slot value.
9. A terminal device, comprising:
a memory for storing an application program;
a processor for implementing the steps in the speech instruction recognition method according to any one of claims 1 to 7 when executing the application program.
10. A storage medium having an application program stored thereon, wherein the application program, when executed by a processor, implements the steps of the speech instruction recognition method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010722276.6A CN111897916B (en) | 2020-07-24 | 2020-07-24 | Voice instruction recognition method, device, terminal equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111897916A true CN111897916A (en) | 2020-11-06 |
CN111897916B CN111897916B (en) | 2024-03-19 |
Family
ID=73190897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010722276.6A Active CN111897916B (en) | 2020-07-24 | 2020-07-24 | Voice instruction recognition method, device, terminal equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111897916B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115588432A (en) * | 2022-11-23 | 2023-01-10 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and computer readable storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140207453A1 (en) * | 2013-01-22 | 2014-07-24 | Electronics And Telecommunications Research Institute | Method and apparatus for editing voice recognition results in portable device |
CN105206266A (en) * | 2015-09-01 | 2015-12-30 | 重庆长安汽车股份有限公司 | Vehicle-mounted voice control system and method based on user intention guess |
CN105320726A (en) * | 2014-05-30 | 2016-02-10 | 苹果公司 | Reducing the need for manual start/end-pointing and trigger phrases |
CN109543192A (en) * | 2018-11-30 | 2019-03-29 | 北京羽扇智信息科技有限公司 | Natural language analytic method, device, equipment and storage medium |
CN109616111A (en) * | 2018-12-24 | 2019-04-12 | 北京恒泰实达科技股份有限公司 | A kind of scene interactivity control method based on speech recognition |
CN109800407A (en) * | 2017-11-15 | 2019-05-24 | 腾讯科技(深圳)有限公司 | Intension recognizing method, device, computer equipment and storage medium |
US20190370385A1 (en) * | 2018-06-04 | 2019-12-05 | International Business Machines Corporation | Generation of domain specific type system |
CN110659970A (en) * | 2018-06-12 | 2020-01-07 | 百度在线网络技术(北京)有限公司 | Account information processing method and device based on voice recognition and electronic equipment |
CN110827816A (en) * | 2019-11-08 | 2020-02-21 | 杭州依图医疗技术有限公司 | Voice instruction recognition method and device, electronic equipment and storage medium |
CN111124121A (en) * | 2019-12-24 | 2020-05-08 | 腾讯科技(深圳)有限公司 | Voice interaction information processing method and device, storage medium and computer equipment |
CN111261157A (en) * | 2020-01-03 | 2020-06-09 | 苏州思必驰信息科技有限公司 | Control method, device and equipment for short video and storage medium |
CN111341311A (en) * | 2020-02-21 | 2020-06-26 | 深圳前海微众银行股份有限公司 | Voice conversation method and device |
CN111373473A (en) * | 2018-03-05 | 2020-07-03 | 华为技术有限公司 | Electronic equipment and method for performing voice recognition by using same |
Non-Patent Citations (2)
Title |
---|
MEHMET BERKEHAN AKÇAY et al.: "Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers", SPEECH COMMUNICATION, pages 56 - 76 *
WANG Dongsheng et al.: "A Survey of Natural Language Understanding Methods for Restricted-Domain Question Answering Systems", Computer Science, pages 1 - 8 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108549519B (en) | Split screen processing method and device, storage medium and electronic equipment | |
CN106293308B (en) | Screen unlocking method and device | |
CN108712566B (en) | Voice assistant awakening method and mobile terminal | |
CN106528545B (en) | Voice information processing method and device | |
US9921735B2 (en) | Apparatuses and methods for inputting a uniform resource locator | |
CN108958606B (en) | Split screen display method and device, storage medium and electronic equipment | |
CN106445596B (en) | Method and device for managing setting items | |
CN109284144B (en) | Fast application processing method and mobile terminal | |
CN108958629B (en) | Split screen quitting method and device, storage medium and electronic equipment | |
WO2015043200A1 (en) | Method and apparatus for controlling applications and operations on a terminal | |
CN112230877A (en) | Voice operation method and device, storage medium and electronic equipment | |
CN109669662A (en) | A kind of pronunciation inputting method, device, storage medium and mobile terminal | |
CN110335629B (en) | Pitch recognition method and device of audio file and storage medium | |
US20150088525A1 (en) | Method and apparatus for controlling applications and operations on a terminal | |
CN109688611B (en) | Frequency band parameter configuration method, device, terminal and storage medium | |
CN110688051A (en) | Screen recording operation method and device, computer readable storage medium and terminal | |
CN113393838A (en) | Voice processing method and device, computer readable storage medium and computer equipment | |
CN109062643A (en) | A kind of display interface method of adjustment, device and terminal | |
CN111897916B (en) | Voice instruction recognition method, device, terminal equipment and storage medium | |
CN111580911A (en) | Operation prompting method and device for terminal, storage medium and terminal | |
CN108920086B (en) | Split screen quitting method and device, storage medium and electronic equipment | |
CN115116434A (en) | Application implementation method and device, storage medium and electronic equipment | |
CN112367425B (en) | Volume adjusting method and device and terminal | |
CN111027406B (en) | Picture identification method and device, storage medium and electronic equipment | |
CN109032482B (en) | Split screen control method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||