CN111897916B - Voice instruction recognition method, device, terminal equipment and storage medium - Google Patents

Voice instruction recognition method, device, terminal equipment and storage medium

Info

Publication number
CN111897916B
CN111897916B (application CN202010722276.6A)
Authority
CN
China
Prior art keywords
voice
text
information slot
voice text
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010722276.6A
Other languages
Chinese (zh)
Other versions
CN111897916A (en)
Inventor
Wang Lu (王璐)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huizhou TCL Mobile Communication Co Ltd
Original Assignee
Huizhou TCL Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huizhou TCL Mobile Communication Co Ltd filed Critical Huizhou TCL Mobile Communication Co Ltd
Priority to CN202010722276.6A
Publication of CN111897916A
Application granted
Publication of CN111897916B


Classifications

    • G06F 40/103 Formatting, i.e. changing of presentation of documents
    • G06F 40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G06F 16/3343 Query execution using phonetics
    • G06F 16/3344 Query execution using natural language analysis
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26 Speech to text systems
    • G10L 2015/223 Execution procedure of a spoken command

Abstract

An embodiment of the invention discloses a voice instruction recognition method, a voice instruction recognition apparatus, a terminal device and a storage medium. The voice instruction recognition method provided by the embodiment comprises: recognizing a voice instruction input by a user and generating a voice text; acquiring an information slot value of the voice text; and judging, according to the information slot value, whether to execute the voice instruction, so that the user can conveniently and quickly stop voice input when the terminal device receives an abnormal voice instruction.

Description

Voice instruction recognition method, device, terminal equipment and storage medium
Technical Field
The present invention relates to the field of mobile communications technologies, and in particular, to a method and apparatus for recognizing a voice command, a terminal device, and a storage medium.
Background
ASR (Automatic Speech Recognition) is a technology for converting human speech into text, and it is applied in various terminal devices such as smart phones, notebook computers, tablet computers and vehicle-mounted terminals. With the development and progress of science and technology, users' demand for convenience in daily life has gradually increased, and more and more terminal devices can collect user speech and perform speech recognition. Such a terminal device collects a voice instruction input by the user with a microphone and converts it into a voice text using automatic speech recognition technology, so that the device can perform the corresponding system action according to the voice text. This spares the user from controlling the terminal system through text input or other operations, so the user can control the terminal more conveniently through voice. However, in the research and practice of the prior art, the inventor of the present invention found that the background can be noisy when the user inputs speech, so the terminal may receive many noisy voice instructions, and the user cannot conveniently and quickly close the voice input function of the terminal device.
Disclosure of Invention
The embodiments of the invention provide a voice instruction recognition method, an apparatus, a terminal device and a storage medium, which judge whether a voice text is abnormal according to the domain, the expression intention and the information slot values of the voice text, so that the user can conveniently and quickly stop voice input when the terminal device receives an abnormal voice instruction.
The embodiment of the invention provides a voice instruction recognition method, which comprises the following steps:
recognizing a voice instruction input by a user, and generating a voice text;
acquiring an information slot value of the voice text;
and judging whether to execute the voice instruction according to the information slot value.
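The three steps above can be sketched as a minimal pipeline. This is an illustration only: the patent does not name a concrete ASR or NLU engine, so `recognize_speech` and `extract_slot_values` below are hypothetical stubs.

```python
def recognize_speech(audio: bytes) -> str:
    """Step 1: convert a voice instruction into voice text (ASR stub)."""
    # A real implementation would call an ASR engine here; as a stand-in,
    # the audio bytes are assumed to already hold the transcribed text.
    return audio.decode("utf-8")

def extract_slot_values(text: str) -> dict:
    """Step 2: acquire information slot values from the voice text (NLU stub)."""
    # A real implementation would run domain/intent classification and
    # sequence labeling; one fixed result is faked here for illustration.
    return {"date": ["today"], "place": ["Shenzhen"]}

def should_execute(slot_values: dict) -> bool:
    """Step 3: execute only if no slot carries conflicting values."""
    return all(len(set(values)) == 1 for values in slot_values.values())

text = recognize_speech(b"how is the weather in Shenzhen today")
slots = extract_slot_values(text)
print(should_execute(slots))  # a single value per slot -> True
```

A real terminal would replace the two stubs with its ASR engine and NLU classifier; only the final judgment step is what the method itself prescribes.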
Optionally, in some embodiments of the present invention, the obtaining an information slot value of the voice text includes:
determining the domain and the expression intention according to the content of the voice text;
acquiring an information slot of the voice text according to the belonging field and the expression intention;
and filling the information slot to generate an information slot value.
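One way to read these three sub-steps: the (domain, expression intention) pair selects a slot schema, and the schema is then filled from the labeled text. A sketch under that assumption — the schema table and function names are invented for illustration:

```python
# Hypothetical slot schemas keyed by (domain, expression intention).
SLOT_SCHEMAS = {
    ("weather", "query_weather"): ["date", "place"],
    ("alarm", "set_alarm"): ["time"],
}

def acquire_slots(domain: str, intent: str) -> list:
    """Return the information slots defined for this domain and intention."""
    return SLOT_SCHEMAS.get((domain, intent), [])

def fill_slots(slots: list, labeled_tokens: list) -> dict:
    """Fill each slot from (token, slot_label) pairs produced by labeling."""
    values = {slot: "" for slot in slots}
    for token, label in labeled_tokens:
        if label in values:
            values[label] += token
    return values

labeled = [("今", "date"), ("天", "date"), ("深", "place"), ("圳", "place")]
print(fill_slots(acquire_slots("weather", "query_weather"), labeled))
# {'date': '今天', 'place': '深圳'}
```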
Optionally, in some embodiments of the present invention, the determining whether to execute the voice command according to the information slot value includes:
judging whether to execute the voice instruction according to the field of the voice text.
Optionally, in some embodiments of the present invention, the determining whether to execute the voice instruction according to the domain of the voice text includes:
determining that the voice text comprises at least two different domains, not executing the voice instruction, and stopping voice input;
determining that the voice text comprises one domain, and judging whether to execute the voice instruction according to the expression intention of the voice text.
Optionally, in some embodiments of the present invention, the determining whether the voice text is normal according to the expression intention includes:
determining that the voice text comprises at least two different expression intents, not executing the voice instruction, and stopping voice input;
determining that the voice text comprises the expression intention, and judging whether to execute the voice instruction according to the information slot value.
Optionally, in some embodiments of the present invention, the determining whether to execute the voice command according to the information slot value includes:
determining that the voice text comprises at least two different information slot values, not executing the voice instruction, and stopping voice input;
determining that the voice text comprises the information slot value and executing the voice instruction.
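Taken together, the optional embodiments above describe a cascade: domain first, then expression intention, then information slot values, rejecting the instruction as soon as any level is ambiguous. A minimal sketch, where `stop_voice_input` is a hypothetical callback standing in for the UI changes (stop button, gesture sensitivity) described elsewhere in this document:

```python
def judge(domains, intents, slot_values, stop_voice_input):
    """Return True to execute the voice instruction, False to reject it.

    The voice text is abnormal (and voice input is stopped) as soon as
    any level of the cascade finds two or more different candidates.
    """
    if len(set(domains)) >= 2:        # at least two different domains
        stop_voice_input()
        return False
    if len(set(intents)) >= 2:        # at least two different intentions
        stop_voice_input()
        return False
    for values in slot_values.values():
        if len(set(values)) >= 2:     # conflicting values in one slot
            stop_voice_input()
            return False
    return True                       # normal text: execute the instruction

stopped = []
ok = judge(["weather"], ["query_weather"],
           {"date": ["today"], "place": ["Shenzhen", "Beijing"]},
           lambda: stopped.append(True))
print(ok, stopped)  # False [True] -> conflicting places stop voice input
```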
Optionally, in some embodiments of the present invention, the stopping the voice input includes:
adding a stop button for voice input.
Correspondingly, the embodiment of the invention also provides a voice instruction recognition device, which comprises:
the recognition unit is used for recognizing the voice instruction input by the user and generating a voice text;
the acquisition unit is used for acquiring the information slot value of the voice text;
and the judging unit is used for judging whether to execute the voice instruction according to the information slot value.
Also, an embodiment of the present invention further provides a terminal device, including:
a memory for storing an application program;
a processor for implementing the steps of any one of the voice instruction recognition methods when executing the application.
In addition, the embodiment of the invention also provides a storage medium, wherein an application program is stored on the storage medium, and the application program realizes any step of the voice instruction recognition method when being executed by a processor.
An embodiment of the invention provides a voice instruction recognition method. The user inputs a voice instruction; the terminal device collects it with a microphone, recognizes it using automatic speech recognition technology, and generates a voice text. The terminal device then analyzes the voice text using NLU technology to determine its domain, expression intention and information slots, fills the information slots, and generates information slot values. The terminal first judges the domain of the voice text: if the text comprises at least two different domains, the text is determined to be abnormal, the voice instruction is not executed, and the terminal can stop the user's voice input; if the text comprises one domain, the text is determined to be normal at this level. The terminal then judges the expression intention: if the text comprises at least two different expression intentions, the text is abnormal, the instruction is not executed, and the terminal stops voice input; if the text comprises one expression intention, the terminal goes on to judge the information slot values. If the text comprises at least two different information slot values, the text is abnormal, the instruction is not executed, and the terminal stops voice input; if the text comprises one information slot value, the text is normal and the terminal executes the voice instruction. When the voice text is abnormal, the terminal can add a stop button and/or a pause button for voice input, or change the position of these buttons, moving them from the edge of the terminal display screen to a middle position, so that the user can conveniently and quickly stop the voice input.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a speech instruction recognition scenario provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a voice command recognition method according to an embodiment of the present invention;
FIG. 3 is another flow chart of a voice command recognition method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a voice command recognition device according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides a voice instruction recognition method, a voice instruction recognition device, terminal equipment and a storage medium. The device can be integrated in a terminal, and the terminal can be a mobile phone, a tablet computer, a notebook computer, a vehicle-mounted terminal and other devices.
For example, as shown in fig. 1, when the voice input function of the terminal device is turned on, the terminal device opens a microphone and the user inputs a voice instruction to the terminal device. The terminal device collects the voice instruction with the microphone and transmits it to the recognition unit of the terminal system, which recognizes the voice instruction using automatic speech recognition technology and converts it into a voice text. The terminal device then transmits the voice text to the acquisition unit of the terminal system, which analyzes the voice text using NLU technology and classifies it, thereby recognizing the expression intention and the domain of the voice text; the acquisition unit determines the information slots of the voice text, performs sequence labeling on the text, fills the information slots, and generates information slot values. Finally, the terminal device transmits the voice text to the judgment unit of the terminal system. The terminal first judges the domain of the voice text: if the voice text comprises at least two different domains, the voice text is determined to be abnormal, the terminal does not execute the voice instruction, and voice input is stopped; if the voice text comprises one domain, the terminal then judges the expression intention. If the voice text comprises at least two different expression intentions, the voice text is abnormal and the instruction is not executed; if it comprises one expression intention, the terminal judges the information slot values. If the voice text comprises at least two different information slot values, the voice text is abnormal and the instruction is not executed; if it comprises one information slot value, the voice text is determined to be normal and the terminal executes the voice instruction. When the voice text is abnormal, the terminal can add a stop button and/or a pause button for voice input, or change the position of the stop button and/or pause button, moving it from the edge of the terminal display screen to a middle position so that the user can click it easily and stop voice input. If the terminal system uses a gesture or a facial expression to control stopping of the voice input function, the terminal device lowers the requirement of the gesture or facial expression, so that the terminal device responds more sensitively to the gesture or facial expression that stops the voice input function, and the convenience of stopping the voice input function is improved.
The following will describe the invention in detail. The order of description of the following embodiments is not intended as a limitation on the preferred order of the embodiments.
The present embodiment will be described from the viewpoint of a voice instruction recognition apparatus, which may be integrated in a terminal device, which may include a notebook computer, a tablet computer, a smart phone, a vehicle-mounted terminal, and the like.
A voice command recognition method, comprising: recognizing a voice instruction input by a user, and generating a voice text; acquiring an information slot value of the voice text; and judging whether to execute the voice instruction according to the information slot value.
As shown in fig. 2, the specific flow of the voice command recognition method is as follows:
step 201, recognizing a voice command input by a user, and generating a voice text.
For example, referring to fig. 3 together, in a terminal device having a voice input function, a user turns on the voice input function of the terminal device, performs step 301, the terminal device turns on a recording device, such as a microphone, etc., the user inputs a voice command to the terminal device, performs step 302, the terminal device collects the voice command input by the user using the recording device, performs step 303, and then the terminal device recognizes the voice command using an automatic voice recognition technology, and converts the voice command into a voice text.
The voice instruction is the sound collected by the recording device after the recording function of the terminal device is turned on. The sound includes the words the user speaks to the terminal device to express a requirement, for example "how is the weather today", and may also include background noise present while the user is speaking. The content, format and length of the voice instruction are not limited, and the user can input voice instructions flexibly according to the actual situation.
Step 202, obtaining an information slot value of the voice text.
For example, referring to fig. 3 together, after the terminal device obtains the voice text input by the user, the voice text is analyzed by using NLU technology, step 304 is performed to obtain the domain, the expression intention and the information slot of the voice text, step 305 is performed to fill the information slot of the voice text, and an information slot value is generated.
NLU (Natural Language Understanding) is a technology for communicating with a computer using natural language. Because the key is to process natural language so that the computer "understands" it, the field is also called natural language processing and computational linguistics. NLU is a branch of artificial intelligence that uses computers to simulate the human process of language interaction, so that computers can understand and use natural human languages such as Chinese and English, enabling natural language communication between humans and machines and replacing part of human mental labor, including querying data, answering questions, retrieving documents, compiling data and processing all related natural language information.
Optionally, referring to fig. 3 together, the terminal device classifies the voice text using natural language processing technology, thereby identifying the domain and the expression intention of the voice text, then determines the information slots of the voice text according to the domain and the expression intention, performs sequence labeling on the voice text, and executes step 305 to fill the information slots of the voice text and generate information slot values. For example, for the voice text "今天深圳天气怎么样" ("How is the weather in Shenzhen today"), the terminal device recognizes after classification that the domain of the voice text is weather, that the expression intention is to query the weather, and that the voice text includes the information slots date and place. Sequence labeling determines a corresponding information slot for each character of the voice text: the characters "今" and "天" correspond to the slot date, the characters "深" and "圳" correspond to the slot place, and the remaining characters "天", "气", "怎", "么" and "样" have no corresponding information slot. The information slot values are thereby obtained: the value of the slot date is "今天" (today) and the value of the slot place is "深圳" (Shenzhen).
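The sequence labeling in this example can be sketched as assigning one slot label per character and concatenating characters that share a label. A real system would use a trained sequence-labeling model; the hand-written label list below is a stand-in for illustration only.

```python
# Hand-built per-character labels for the example sentence
# "今天深圳天气怎么样" ("How is the weather in Shenzhen today").
# Note "天" appears twice; a trained tagger disambiguates by context,
# so this toy version labels by position rather than by character alone.
SENTENCE = "今天深圳天气怎么样"
LABELS = ["date", "date", "place", "place",
          None, None, None, None, None]  # one label per character

def fill_slots(sentence: str, labels: list) -> dict:
    """Concatenate characters that share a slot label into slot values."""
    slots = {}
    for ch, label in zip(sentence, labels):
        if label is not None:
            slots[label] = slots.get(label, "") + ch
    return slots

print(fill_slots(SENTENCE, LABELS))  # {'date': '今天', 'place': '深圳'}
```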
Step 203, judging whether to execute the voice instruction according to the information slot value.
For example, referring to fig. 3 together, after the terminal device obtains the domain of the voice text, the expression intention, the information slot and the information slot value, it is determined whether the voice text is abnormal according to the expression intention, the domain and the information slot value, step 306 is performed, the terminal device first determines the domain of the voice text, if the terminal device determines that the voice text includes at least two different domains, it indicates that the voice text is abnormal, step 309 is performed, the voice input is stopped, the terminal device increases the stop button and/or the pause button of the voice input, and the stop button and/or the pause button of the voice input can be selectively moved from the edge corner of the display screen of the terminal device to the middle, so that the user can click conveniently to stop the voice input.
Optionally, if it is determined that the voice text includes at least two different fields, the voice text is indicated to be abnormal, and if the terminal device uses the gesture or the facial expression to control the stop of the voice input function, the terminal device reduces the requirement of the gesture or the facial expression to control the voice input function, for example, reduces the duration or the complexity of the gesture or the facial expression, so that the terminal device is more sensitive to the gesture or the facial expression stopping the voice input function, and the user is facilitated to stop the voice input of the terminal device.
Optionally, referring to fig. 3, when it is determined that the voice text includes one of the fields, the terminal device determines that the voice text is normal, performs step 307, then determines that the voice text includes an expression intention, if the terminal device determines that the voice text includes at least two different expression intentions, indicates that the voice text is abnormal, performs step 309, stops voice input, increases a stop button and/or a pause button of voice input, and may select to move the stop button and/or the pause button of voice input from an edge corner of a display screen of the terminal device to the middle, so as to facilitate clicking by a user, and stops voice input.
Optionally, if the terminal device determines that the voice text includes at least two different expression intentions, the voice text is indicated to be abnormal, and if the terminal device uses the gesture or the facial expression to control the stop of the voice input function, the terminal device reduces the requirement of the gesture or the facial expression to control the voice input function, for example, reduces the duration or the complexity of the gesture or the facial expression, so that the terminal device is more sensitive to the gesture or the facial expression stopping the voice input function, and the user is convenient to stop the voice input of the terminal device.
Optionally, referring to fig. 3, when determining that the voice text includes the expression intention, the terminal device determines that the voice text is normal, performs step 308, then the terminal device determines an information slot value included in the voice text, if the terminal device determines that the voice text includes at least two different information slot values, it indicates that the voice text is abnormal, performs step 309, stops voice input, increases a stop button and/or a pause button of voice input, and may select to move the stop button and/or the pause button of voice input from an edge corner of a display screen of the terminal device to the middle, so as to facilitate clicking by a user, stop voice input, otherwise, the voice text is normal, performs step 310, and performs the voice instruction.
Optionally, if the terminal device determines that the voice text includes at least two different information slot values, the voice text is abnormal. If the terminal device uses a gesture or a facial expression to control stopping of the voice input function, the terminal device lowers the requirement of the gesture or facial expression that controls the voice input function, for example by reducing the required duration or complexity of the gesture or facial expression, so that the terminal device responds more sensitively to the gesture or facial expression that stops the voice input function, and the user can stop voice input of the terminal device more easily.
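The adjustment described here amounts to relaxing the thresholds of the gesture/expression recognizer when the voice text is abnormal. A minimal sketch, assuming hypothetical configuration fields (the patent names no concrete API):

```python
from dataclasses import dataclass

@dataclass
class GestureStopConfig:
    # Hypothetical knobs for the stop-voice-input gesture.
    min_hold_seconds: float = 1.0   # how long the gesture must be held
    min_confidence: float = 0.9     # recognizer confidence required

def on_abnormal_text(cfg: GestureStopConfig) -> GestureStopConfig:
    """Relax the gesture requirements so stopping voice input is easier."""
    return GestureStopConfig(
        min_hold_seconds=cfg.min_hold_seconds / 2,
        min_confidence=max(0.5, cfg.min_confidence - 0.2),
    )

relaxed = on_abnormal_text(GestureStopConfig())
print(relaxed)  # shorter hold time and lower confidence threshold
```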
In order to better implement the above method, the embodiment of the present invention may further provide a voice command recognition device, where the voice command recognition device may be specifically integrated in a network device, and the network device may be a device such as a terminal.
For example, as shown in fig. 4, the voice instruction recognition apparatus may include a recognition unit 401, an acquisition unit 402, and a judgment unit 403, as follows:
(1) Identification unit 401
The recognition unit 401 is configured to recognize a voice instruction input by a user and generate a voice text.
For example, the user turns on the voice input function of the terminal device, the recognition unit 401 of the terminal device turns on a recording device such as a microphone, the recording device collects a voice instruction input by the user, and the recognition unit 401 recognizes the voice instruction by using an automatic voice recognition technology and converts the voice instruction into a voice text.
(2) Acquisition unit 402
An obtaining unit 402, configured to obtain an information slot value of the voice text.
For example, after the terminal device acquires the voice text, the recognition unit 401 of the terminal device transmits the voice text to the acquisition unit 402, and the acquisition unit 402 analyzes the voice text using NLU technology to acquire the domain, the expression intention, the information slot, and the information slot value to which the voice text belongs.
Optionally, the obtaining unit 402 classifies the voice text by using a natural language processing technology, so as to identify the expression intention and the belonging field of the voice text, and determines the information slot of the voice text, and the obtaining unit 402 marks the voice text in sequence, fills the information slot of the voice text, and generates an information slot value.
(3) Judgment unit 403
A judging unit 403, configured to judge whether to execute the voice instruction according to the information slot value.
For example, the obtaining unit 402 transmits the expression intention, the domain, the information slots and the information slot values of the voice text to the judging unit 403. If the judging unit 403 determines that the voice text includes at least two different domains, the voice text is abnormal: the terminal device adds a stop button and/or a pause button for voice input, and may move the button from the edge of the display screen of the terminal device to the middle so that the user can click it conveniently and stop voice input. If the judging unit 403 determines that the voice text includes one domain, the voice text is normal at this level, and the judging unit next examines the expression intention: if the voice text includes at least two different expression intentions, the voice text is abnormal; if it includes one expression intention, it is normal, and the judging unit then examines the information slot values. If the judging unit 403 determines that the voice text includes at least two different information slot values, the voice text is abnormal and the instruction is not executed; otherwise the voice text is normal and the terminal device executes the voice instruction.
Optionally, if the judging unit 403 determines that the voice text includes at least two different domains, the voice text is abnormal. When the terminal device uses a gesture or a facial expression to control stopping of the voice input function, the terminal device lowers the requirement of the gesture or facial expression, for example by reducing its required duration or complexity, so that the terminal device senses the gesture or facial expression for stopping the voice input function more sensitively, and the user can stop voice input more conveniently.
Accordingly, embodiments of the present invention also provide a terminal, as shown in fig. 5, which may include a Radio Frequency (RF) circuit 501, a memory 502 including one or more computer readable storage media, an input unit 503, a display unit 504, a sensor 505, an audio circuit 506, a wireless fidelity (WiFi, wireless Fidelity) module 507, a processor 508 including one or more processing cores, and a power supply 509. It will be appreciated by those skilled in the art that the terminal structure shown in fig. 5 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
the RF circuit 501 may be configured to receive and send information or signals during a call, and in particular, after receiving downlink information of a base station, the downlink information is processed by one or more processors 508; in addition, data relating to uplink is transmitted to the base station. Typically, RF circuitry 501 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM, subscriber Identity Module) card, a transceiver, a coupler, a low noise amplifier (LNA, low Noise Amplifier), a duplexer, and the like. In addition, RF circuitry 501 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol including, but not limited to, global system for mobile communications (GSM, global System of Mobile communication), general packet radio service (GPRS, general Packet Radio Service), code division multiple access (CDMA, code Division Multiple Access), wideband code division multiple access (WCDMA, wideband Code Division Multiple Access), long term evolution (LTE, long Term Evolution), email, short message service (SMS, short Messaging Service), and the like.
The memory 502 may be used to store software programs and modules; the processor 508 performs various functional applications and data processing by running the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the terminal (such as audio data or a phonebook), and the like. In addition, the memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 508 and the input unit 503 with access to the memory 502.
The input unit 503 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. In particular, in one embodiment, the input unit 503 may include a touch-sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations performed on or near it by a user (for example, operations performed on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program. Optionally, the touch-sensitive surface may comprise two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position touched by the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into touch point coordinates, and sends the coordinates to the processor 508; it can also receive and execute commands sent by the processor 508. In addition, the touch-sensitive surface may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave types. Besides the touch-sensitive surface, the input unit 503 may also include other input devices, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a switch key), a trackball, a mouse, a joystick, and the like.
The display unit 504 may be used to display information input by the user or information provided to the user, as well as the various graphical user interfaces of the terminal, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 504 may include a display panel, which may optionally be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. Further, the touch-sensitive surface may overlay the display panel; after detecting a touch operation on or near it, the touch-sensitive surface passes the operation to the processor 508 to determine the type of the touch event, and the processor 508 then provides a corresponding visual output on the display panel according to the type of the touch event. Although in fig. 5 the touch-sensitive surface and the display panel are implemented as two separate components for the input and output functions, in some embodiments the touch-sensitive surface and the display panel may be integrated to implement the input and output functions.
The terminal may also include at least one sensor 505, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel according to the brightness of the ambient light, and a proximity sensor, which may turn off the display panel and/or the backlight when the terminal moves close to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), and can detect the magnitude and direction of gravity when at rest; it can be used for applications that recognize the posture of the mobile phone (such as landscape/portrait switching, related games, and magnetometer posture calibration), vibration-recognition-related functions (such as a pedometer and tapping), and the like. Other sensors that may also be configured in the terminal, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described in detail herein.
The audio circuit 506, a speaker, and a microphone may provide an audio interface between the user and the terminal. The audio circuit 506 may convert received audio data into an electrical signal and transmit it to the speaker, which converts it into a sound signal for output; on the other hand, the microphone converts a collected sound signal into an electrical signal, which is received by the audio circuit 506 and converted into audio data; after the audio data is processed by the processor 508, it is sent, for example, to another terminal via the RF circuit 501, or output to the memory 502 for further processing. The audio circuit 506 may also include an earphone jack to provide communication between a peripheral earphone and the terminal.
WiFi is a short-range wireless transmission technology. Through the WiFi module 507, the terminal can help the user send and receive e-mail, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 5 shows the WiFi module 507, it is understood that the module is not an essential component of the terminal and may be omitted as required without changing the essence of the invention.
The processor 508 is the control center of the terminal; it connects the various parts of the entire mobile phone using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing the software programs and/or modules stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the mobile phone as a whole. Optionally, the processor 508 may include one or more processing cores; preferably, the processor 508 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 508.
The terminal also includes a power supply 509 (such as a battery) for powering the various components. Preferably, the power supply may be logically connected to the processor 508 via a power management system, so that charging, discharging, and power-consumption management are handled by the power management system. The power supply 509 may also include one or more of a direct-current or alternating-current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the terminal may further include a camera, a Bluetooth module, and the like, which are not described herein. Specifically, in this embodiment, the processor 508 in the terminal loads the executable files corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and the processor 508 runs the application programs stored in the memory 502 so as to implement various functions: recognizing a voice instruction input by a user, generating a voice text, acquiring an information slot value of the voice text, and judging whether to execute the voice instruction according to the information slot value.
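The decision flow implemented by these instructions can be sketched as follows. This is a minimal illustration only: the function name `should_execute` and the way the parse results are passed in are assumptions made for the sketch, not an API defined by this patent.

```python
# Hypothetical sketch of the execution check: a voice instruction is
# executed only when the parsed voice text is unambiguous at every
# level (belonging field, expression intention, information slot value).
def should_execute(fields, intentions, slot_values):
    # Anything other than exactly one belonging field is treated as abnormal.
    if len(set(fields)) != 1:
        return False
    # Likewise, two or more different expression intentions abort execution.
    if len(set(intentions)) != 1:
        return False
    # Conflicting values for the same information slot also abort execution.
    for values in slot_values.values():
        if len(set(values)) != 1:
            return False
    return True
```

For example, a text that parses into a single music field, a single play intention, and one value per slot would be executed, while a text that parses into both a music field and a weather field would not.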
In the foregoing embodiments, each embodiment has its own emphasis in description; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The voice instruction recognition method, apparatus, terminal device, and storage medium provided by the embodiments of the present invention have been described in detail above, and specific examples have been used herein to illustrate the principles and implementations of the present invention; the foregoing description of the embodiments is only intended to help understand the technical solution and core idea of the present invention. Those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (8)

1. A method for recognizing a voice command, comprising:
recognizing a voice instruction input by a user, and generating a voice text;
acquiring an information slot value of the voice text;
judging whether to execute the voice command according to the information slot value;
the judging whether to execute the voice command according to the information slot value comprises the following steps:
determining that the voice text comprises at least two different belonging fields, not executing the voice instruction, and stopping voice input;
determining that the voice text comprises a belonging field, and judging whether to execute the voice instruction according to the expression intention of the voice text;
the stopping of the voice input comprises: reducing the requirement for the gesture or the facial expression to control the voice input function, so that the terminal device is more sensitive in sensing the gesture or the facial expression for stopping the voice input function.
2. The method of claim 1, wherein the obtaining the information slot value of the phonetic text comprises:
determining the belonging field and the expression intention according to the content of the voice text;
acquiring an information slot of the voice text according to the belonging field and the expression intention;
and filling the information slot to generate an information slot value.
3. The method of claim 1, wherein said judging whether to execute the voice instruction according to the expression intention of the voice text comprises:
determining that the voice text comprises at least two different expression intentions, not executing the voice instruction, and stopping voice input;
determining that the voice text comprises one expression intention, and judging whether to execute the voice instruction according to the information slot value.
4. The method of claim 3, wherein said determining whether to execute the voice command based on the information slot value comprises:
determining that the voice text comprises at least two different information slot values, not executing the voice instruction, and stopping voice input;
determining that the voice text comprises the information slot value and executing the voice instruction.
5. The method of any of claims 1-4, wherein the stopping the voice input further comprises:
adding a stop button for the voice input.
6. A voice command recognition apparatus, comprising:
the recognition unit is used for recognizing the voice instruction input by the user and generating a voice text;
the acquisition unit is used for acquiring the information slot value of the voice text;
the judging unit is used for judging whether to execute the voice instruction according to the information slot value;
the judging unit is specifically configured to: determining that the voice text comprises at least two different belonging fields, not executing the voice instruction, and reducing the requirement for the gesture or the facial expression to control the voice input function, so that the terminal device is more sensitive in sensing the gesture or the facial expression for stopping the voice input function;
determining that the voice text comprises a belonging field, and judging whether to execute the voice instruction according to the expression intention of the voice text.
7. A terminal device, comprising:
a memory for storing an application program;
a processor for implementing the steps in the speech instruction recognition method according to any one of claims 1 to 5 when executing said application program.
8. A storage medium having stored thereon an application program which when executed by a processor performs the steps of the speech instruction recognition method according to any of claims 1 to 5.
CN202010722276.6A 2020-07-24 2020-07-24 Voice instruction recognition method, device, terminal equipment and storage medium Active CN111897916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010722276.6A CN111897916B (en) 2020-07-24 2020-07-24 Voice instruction recognition method, device, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111897916A CN111897916A (en) 2020-11-06
CN111897916B true CN111897916B (en) 2024-03-19

Family

ID=73190897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010722276.6A Active CN111897916B (en) 2020-07-24 2020-07-24 Voice instruction recognition method, device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111897916B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115588432B (en) * 2022-11-23 2023-03-31 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105206266A (en) * 2015-09-01 2015-12-30 重庆长安汽车股份有限公司 Vehicle-mounted voice control system and method based on user intention guess
CN105320726A (en) * 2014-05-30 2016-02-10 苹果公司 Reducing the need for manual start/end-pointing and trigger phrases
CN109543192A (en) * 2018-11-30 2019-03-29 北京羽扇智信息科技有限公司 Natural language analytic method, device, equipment and storage medium
CN109616111A (en) * 2018-12-24 2019-04-12 北京恒泰实达科技股份有限公司 A kind of scene interactivity control method based on speech recognition
CN109800407A (en) * 2017-11-15 2019-05-24 腾讯科技(深圳)有限公司 Intension recognizing method, device, computer equipment and storage medium
CN110659970A (en) * 2018-06-12 2020-01-07 百度在线网络技术(北京)有限公司 Account information processing method and device based on voice recognition and electronic equipment
CN110827816A (en) * 2019-11-08 2020-02-21 杭州依图医疗技术有限公司 Voice instruction recognition method and device, electronic equipment and storage medium
CN111124121A (en) * 2019-12-24 2020-05-08 腾讯科技(深圳)有限公司 Voice interaction information processing method and device, storage medium and computer equipment
CN111261157A (en) * 2020-01-03 2020-06-09 苏州思必驰信息科技有限公司 Control method, device and equipment for short video and storage medium
CN111341311A (en) * 2020-02-21 2020-06-26 深圳前海微众银行股份有限公司 Voice conversation method and device
CN111373473A (en) * 2018-03-05 2020-07-03 华为技术有限公司 Electronic equipment and method for performing voice recognition by using same

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140094744A (en) * 2013-01-22 2014-07-31 한국전자통신연구원 Method and apparatus for post-editing voice recognition results in portable device
US11100140B2 (en) * 2018-06-04 2021-08-24 International Business Machines Corporation Generation of domain specific type system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers; Mehmet Berkehan Akçay et al.; Speech Communication; 56-76 *
A survey of natural language understanding methods for restricted-domain question answering systems; Wang Dongsheng et al.; Computer Science; 1-8+41 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant