CN111627442A - Speech recognition method, processor, system, computer equipment and readable storage medium - Google Patents


Info

Publication number
CN111627442A
Authority
CN
China
Prior art keywords: action, voice signal, scene, user, determining
Prior art date
Legal status: Pending
Application number
CN202010462534.1A
Other languages
Chinese (zh)
Inventor
葛友杰
Current Assignee: Xingluo Intelligent Technology Co Ltd
Original Assignee
Xingluo Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xingluo Intelligent Technology Co Ltd filed Critical Xingluo Intelligent Technology Co Ltd
Priority claimed from CN202010462534.1A
Publication of CN111627442A

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/08 — Speech classification or search
    • G10L 15/18 — Speech classification or search using natural language modelling
    • G10L 15/1815 — Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L 2015/223 — Execution procedure of a spoken command

Abstract

The invention provides a speech recognition method, a processor, a system, a computer device and a computer-readable storage medium. The method comprises the following steps: receiving and parsing a first voice signal input by a user, and determining the first action the first voice signal expects to be executed, or the first object that the first action operates on; acquiring a second voice signal input by the user within a set time before the first voice signal was received, and determining, from the second voice signal, the first scene in which the user is currently located; and determining, from the first voice signal and the first scene, both the first action expected to be executed in the first voice signal and the first object of the first action operation, and sending a control instruction that controls the first action to be executed on the first object. The method can recognize the user's speech even when the user's semantics are incomplete.

Description

Speech recognition method, processor, system, computer equipment and readable storage medium
Technical Field
The present invention relates to the field of speech recognition technology, and in particular, to a speech recognition method, processor, system, computer device, and readable storage medium.
Background
Existing speech recognition and analysis can only handle instructions whose intent is fully determined by the speech, such as "turn on the light in the living room"; instructions with an unclear intent are often difficult to recognize. For example, short, semantically ambiguous commands, such as a user saying only "open" or "brighten", cannot be processed. A semantic analysis system is therefore urgently needed to remedy these defects in the prior art.
Disclosure of Invention
The present invention aims to overcome the drawbacks of the prior art by providing a speech recognition method, processor, system, computer device and readable storage medium, so as to solve the prior-art problem that speech with an ambiguous intent is difficult to recognize.
To achieve this purpose, the following technical solution is adopted:
a first aspect of the present invention provides a speech recognition method, including:
receiving and analyzing a first voice signal input by a user, and determining a first action expected to be executed by the first voice signal or a first object of the first action operation;
acquiring a second voice signal input by a user within a set time before receiving the first voice signal, and determining a first scene where the user is currently located according to the second voice signal;
and determining a first action expected to be executed in the first voice signal and a first object of the first action operation according to the first voice signal and the first scene, and sending a control instruction, wherein the control instruction is used for controlling the first action to be executed on the first object.
In a specific embodiment, the obtaining a second voice signal input by a user within a preset time before receiving the first voice signal and determining, according to the second voice signal, a first scene where the user is currently located specifically includes:
acquiring a second voice signal input by a user within a set time before the first voice signal is received, wherein the second voice signal comprises a second action expected to be executed and a second object of the second action operation;
determining a second scene in which the user is located when inputting the second voice signal according to the second action and the second object;
determining the second scene as the first scene.
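As a hedged illustration of this embodiment, the second scene might be inferred from the parsed second action and second object via a lookup table; the table contents, scene labels and function name below are illustrative assumptions, not the patent's actual implementation.

```python
# Illustrative scene lookup: map a (second action, second object) pair to a
# scene label. The entries are invented examples.
SCENE_TABLE = {
    ("turn on", "light"): "smart home",
    ("play", "song"): "music playback",
}

def determine_first_scene(second_action, second_object):
    # The second scene is taken as the first scene, because the two signals
    # arrive within the set time window.
    second_scene = SCENE_TABLE.get((second_action, second_object), "unknown")
    return second_scene
```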
In a specific embodiment, the obtaining a second voice signal input by a user within a preset time before receiving the first voice signal and determining, according to the second voice signal, a first scene where the user is currently located specifically includes:
acquiring a second voice signal input by a user within a set time before the first voice signal is received, wherein the second voice signal only comprises a second action expected to be executed or only comprises a second object of a second action operation expected to be executed;
determining a second scene where the user is currently located according to the second action, or determining the second scene where the user is currently located according to the second object;
determining the second scene as the first scene.
In a specific embodiment, the determining, according to the first speech signal and the current first scene, a first action and a first object of the first action operation expected to be performed in the first speech signal specifically includes:
if the first voice signal only comprises the first action, acquiring an operation object of the first action in the first scene and a first probability of the operation object being executed from an established user scene data stack;
determining a priority of the operation object according to the first probability;
and executing the first action on the operation object with the highest priority.
In a specific embodiment, the determining, according to the first speech signal and the current first scene, a first action and a first object of the first action operation expected to be performed in the first speech signal specifically includes:
if the first voice signal only comprises the first object, acquiring an operation action matched with the first object in the first scene and a second probability of executing the operation action from the established user scene data stack;
determining the priority of the operation action matched with the first object according to the second probability;
performing the operation action with the highest priority on the first object.
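The object-only branch above can be sketched as follows. The layout of the user scene data stack, the probabilities, and the function name are illustrative assumptions made for the sketch, not the patent's implementation.

```python
# Hypothetical stack layout for the object-only case:
# (scene, object) -> {operation action: second probability}.
OBJECT_ACTION_STACK = {
    ("smart home", "light"): {"turn on": 0.7, "turn off": 0.2},
}

def resolve_action(first_scene, first_object):
    """Return the matching operation actions ranked by probability,
    highest priority first; the first entry is the one to execute."""
    candidates = OBJECT_ACTION_STACK.get((first_scene, first_object), {})
    return sorted(candidates, key=candidates.get, reverse=True)
```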
In a specific embodiment, the establishing a user context data stack specifically includes:
acquiring historical voice information input by a user, analyzing the historical voice information, and acquiring a third scene where the historical voice information input by the user is located, a third action expected to be executed by the historical voice information and a third object of the third action operation;
and storing the third scene, the third action, the third object and the correspondence among them to form the user scene data stack.
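One plausible way to build such a user scene data stack from parsed historical records is sketched below; the record format (scene, action, object triples) and the conversion of counts into execution probabilities are assumptions for illustration only.

```python
from collections import defaultdict

def build_scene_stack(history):
    """history: iterable of (third_scene, third_action, third_object)
    tuples parsed from the user's historical voice information."""
    counts = defaultdict(lambda: defaultdict(int))
    for scene, action, obj in history:
        counts[(scene, action)][obj] += 1
    # Convert raw counts to per-(scene, action) execution probabilities.
    stack = {}
    for key, objs in counts.items():
        total = sum(objs.values())
        stack[key] = {o: c / total for o, c in objs.items()}
    return stack
```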
A second aspect of the present invention provides a speech recognition processor, the processor comprising:
a receiving recognition unit for receiving and recognizing a first voice signal input by a user, the first voice signal including a first action or a first object of the first action operation expected to be performed;
a first scene determining unit, configured to acquire a second voice signal input by a user within a set time before receiving the first voice signal, and determine a first scene where the user is currently located according to the second voice signal;
a first action and first object determination unit for determining a first action to be performed in the first voice signal and a first object of the first action operation according to the first voice signal and the first scene, and sending a control instruction which controls the first action to be performed on the first object.
In a specific embodiment, the first scenario determination unit is specifically configured to:
acquiring a second voice signal input by a user within a set time before the first voice signal is received, wherein the second voice signal comprises a second action expected to be executed and a second object of the second action operation;
determining a second scene in which the user is located when inputting the second voice signal according to the second action and the second object;
determining the second scene as the first scene.
In a specific embodiment, the first scenario determination unit is specifically configured to:
acquiring a second voice signal input by a user within a set time before the first voice signal is received, wherein the second voice signal only comprises a second action expected to be executed or only comprises a second object of a second action operation expected to be executed;
determining a second scene where the user is currently located according to the second action, or determining the second scene where the user is currently located according to the second object;
determining the second scene as the first scene.
In a particular embodiment, the first action and first object determination unit is particularly configured to:
if the first voice signal only comprises the first action, acquiring an operation object of the first action in the first scene and a first probability of the operation object being executed from an established user scene data stack;
determining a priority of the operation object according to the first probability;
and executing the first action on the operation object with the highest priority.
In a particular embodiment, the first action and first object determination unit is particularly configured to:
if the first voice signal only comprises the first object, acquiring an operation action matched with the first object in the first scene and a second probability of executing the operation action from the established user scene data stack;
determining the priority of the operation action matched with the first object according to the second probability;
performing the operation action with the highest priority on the first object.
In a specific embodiment, the processor further comprises a user scene data stack establishing unit, configured for:
acquiring historical voice information input by a user, analyzing the historical voice information, and acquiring a third scene where the historical voice information input by the user is located, a third action expected to be executed by the historical voice information and a third object of the third action operation;
and storing the third scene, the third action, the third object and the correspondence among them to form the user scene data stack.
A third aspect of the present invention provides a speech recognition processing system comprising a sound pickup apparatus, an execution apparatus, and the aforementioned processor, wherein,
the pickup equipment is used for collecting a first voice signal input by a user and sending the first voice signal to the processor;
the execution device is used for receiving the control instruction and executing the first action on the first object.
A fourth aspect of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a computer device, carries out the aforementioned method steps.
A fifth aspect of the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the computer program to cause the computer device to perform the steps of the method.
The beneficial effects of the invention are as follows: the speech recognition method receives and parses a first voice signal input by a user, determines a first scene from a second voice signal input by the user within a set time before the first voice signal was received, determines, from the first voice signal and the first scene, the first action expected to be executed in the first voice signal and the first object it operates on, and executes the first action on the first object. With the method provided by the embodiments of the invention, even when the first voice signal input by the user is semantically incomplete, the first voice signal can be analyzed and the operation the user expects can be executed. This overcomes the prior-art defect that semantically incomplete voice input from a user cannot be acted upon.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope of the present invention.
Fig. 1 is a schematic flowchart of a speech signal recognition method according to a first embodiment of the present invention;
fig. 2 is a schematic structural diagram of a speech signal recognition processor according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a speech recognition system according to a third embodiment of the present invention.
Detailed Description
Hereinafter, various embodiments of the present invention are described more fully. The invention admits of various embodiments, modifications and variations. However, it should be understood that there is no intention to limit the various embodiments of the invention to the specific embodiments disclosed herein; on the contrary, the intention is to cover all modifications, equivalents and/or alternatives falling within the spirit and scope of the various embodiments of the invention.
Hereinafter, the terms "includes" or "may include" as used in various embodiments of the present invention indicate the presence of the disclosed functions, operations or elements, and do not preclude the addition of one or more further functions, operations or elements. Furthermore, as used in various embodiments of the present invention, the terms "comprises", "comprising", "includes", "including", "has", "having" and their derivatives indicate only that the specified features, numbers, steps, operations, elements, components or combinations thereof are present, and are not to be understood as excluding the existence, or the possible addition, of one or more other features, numbers, steps, operations, elements, components or combinations thereof.
In various embodiments of the invention, the expression "a or/and B" includes any or all combinations of the words listed simultaneously, e.g., may include a, may include B, or may include both a and B.
Expressions (such as "first", "second", and the like) used in various embodiments of the present invention may modify various constituent elements in various embodiments, but may not limit the respective constituent elements. For example, the above description does not limit the order and/or importance of the elements described. The foregoing description is for the purpose of distinguishing one element from another. For example, the first user device and the second user device indicate different user devices, although both are user devices. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of various embodiments of the present invention.
It should be noted that: in the present invention, unless otherwise explicitly stated or defined, the terms "mounted," "connected," "fixed," and the like are to be construed broadly, e.g., as being fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium; there may be communication between the interiors of the two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, it should be understood by those skilled in the art that the terms indicating an orientation or a positional relationship herein are based on the orientations and the positional relationships shown in the drawings and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation and operate, and thus, should not be construed as limiting the present invention.
The terminology used in the various embodiments of the present invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the various embodiments of the present invention. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
An embodiment of the present invention provides a speech recognition method, as shown in fig. 1, the method includes the following steps:
s1, receiving and analyzing a first voice signal input by a user, and determining a first action or a first object of the first action operation which is expected to be executed by the first voice signal.
Specifically, a first voice signal input by a user is received, the first voice signal is recognized, and a first action or a first object of the first action operation expected to be executed by the first voice signal is determined.
In a specific embodiment, the first speech signal is semantically incomplete, i.e. the first speech signal comprises only the first action expected to be performed or only the first object of the first action operation.
For example, take "turn on the light" as a semantically complete signal: the corresponding first voice signal may include only the first action expected to be performed ("turn on"), or only the first object of the first action operation ("the light").
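Step S1 might be sketched as a simple classifier that extracts whichever of the action and object is present. The vocabularies, substring matching, and function name below are illustrative assumptions, not the patent's recognition method.

```python
# Minimal sketch of step S1: classify an utterance as action-only,
# object-only, or semantically complete. Vocabularies are invented.
ACTIONS = ("turn on", "turn off", "brighten", "dim")
OBJECTS = ("light", "air conditioner", "curtain")

def parse_first_signal(text):
    """Return (first_action, first_object); either may be None when
    the first voice signal is semantically incomplete."""
    action = next((a for a in ACTIONS if a in text), None)
    obj = next((o for o in OBJECTS if o in text), None)
    return action, obj
```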
S2, acquiring a second voice signal input by the user within the set time before the first voice signal is received, and determining the first scene where the user is currently located according to the second voice signal.
After the first voice signal is received, a second voice signal input by the user within a set time before the first voice signal was received is acquired. The set time may be, for example, 3 minutes; that is, after receiving the first voice signal input by the user, the second voice signal input by the user within the 3 minutes before the first voice signal was received is examined. The second voice signal may be a semantically complete voice signal, i.e. it includes both the second action expected to be performed and the object of the second action operation.
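The look-back in this step can be sketched as follows; the history structure, function name, and the 3-minute default (taken from the example above) are illustrative assumptions.

```python
from datetime import datetime, timedelta

SET_TIME = timedelta(minutes=3)  # the description's example window

def find_second_signal(history, first_time, window=SET_TIME):
    """history: list of (timestamp, utterance) pairs, oldest first.
    Return the most recent utterance within the set time before the
    first voice signal, or None if there is none."""
    in_window = [u for t, u in history
                 if first_time - window <= t < first_time]
    return in_window[-1] if in_window else None
```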
Alternatively, the second voice signal may itself be semantically incomplete, i.e. an action-only voice signal or an object-only voice signal as described above.
When the second voice signal is a semantically complete voice signal, the scene in which the user input the second voice signal, the second action the second voice signal expects to be executed, and the second object of the second action operation can all be determined from the second voice signal.
When the second voice signal is an action-only voice signal, the final second scene, second action and second object are determined from the second voice signal.
When the second voice signal is an object-only voice signal, the scene finally determined from the second voice signal and the action finally executed are acquired, thereby determining the second scene, the second action and the second object.
After the second scene is determined, because the time difference between receiving the first voice signal and receiving the second voice signal is short, the first scene in which the user is currently located can be taken to be the same as the scene in which the user input the second voice signal; that is, the first scene and the second scene are the same scene.
S3, determining a first action expected to be executed in the first voice signal and a first object of the first action operation according to the first voice signal and the current first scene, and executing the first action on the first object.
Specifically, if the first voice signal is an action-only voice signal, the operation objects of the first action in the first scene are determined, the probability corresponding to each operation object is acquired, the operation objects are ranked by priority according to these probabilities, and the first action is executed on the operation objects in priority order.
For example, assume the first voice signal is an action-only voice signal whose first action is "turn on", and assume the second scene in which the user input the second voice signal is known to be a smart-home scene; the scene in which the user input the first voice signal is then also taken to be the smart-home scene. In the smart-home scene, all operation objects corresponding to the "turn on" action and their probabilities are acquired. Assuming the probability of turning on the light is 0.6 and the probability of turning on the air conditioner is 0.3, the operation objects are ranked in descending order of probability: the higher the probability, the higher the priority, so turning on the light has a higher priority than turning on the air conditioner. The turn-on action is then executed on the operation objects according to this priority.
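The worked example above can be sketched as code. The data-stack layout ((scene, action) mapped to object probabilities) and the function name are illustrative assumptions; the probabilities are the ones from the example.

```python
# Hypothetical user scene data stack for the action-only case:
# (scene, action) -> {operation object: first probability}.
SCENE_STACK = {
    ("smart home", "turn on"): {"light": 0.6, "air conditioner": 0.3},
}

def rank_objects(first_scene, first_action):
    """Rank candidate operation objects in descending probability:
    the higher the probability, the higher the priority."""
    candidates = SCENE_STACK.get((first_scene, first_action), {})
    return sorted(candidates, key=candidates.get, reverse=True)
```

With the example probabilities, the light outranks the air conditioner, so the turn-on action would be executed on the light first.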
The speech recognition method receives and parses a first voice signal input by a user, determines a first scene from a second voice signal input by the user within a set time before the first voice signal was received, determines, from the first voice signal and the first scene, the first action expected to be executed in the first voice signal and the first object it operates on, and executes the first action on the first object. With the method provided by this embodiment of the invention, even when the first voice signal input by the user is semantically incomplete, the first voice signal can be analyzed and the operation the user expects can be executed. This overcomes the prior-art defect that semantically incomplete voice input from a user cannot be acted upon.
Based on the first embodiment of the present invention, the second embodiment provides a speech recognition processor. As shown in fig. 2, the processor 1 includes a reception recognition unit 10, a first scene determination unit 11, and a first action and first object determination unit 12. The reception recognition unit 10 is configured to receive a first voice signal input by a user, the first voice signal including a first action expected to be executed or a first object of the first action operation. The first scene determination unit 11 is configured to acquire a second voice signal input by the user within a set time before the first voice signal was received, and to determine, from the second voice signal, the first scene in which the user is currently located. The first action and first object determination unit 12 is configured to determine, from the first voice signal and the first scene, the first action expected to be performed in the first voice signal and the first object of the first action operation, and to send a control instruction that controls the first action to be executed on the first object.
The first scene determining unit 11 is specifically configured to acquire a second voice signal input by a user within a set time before the first voice signal is received, where the second voice signal includes a second action expected to be performed and a second object of the second action operation, determine, according to the second action and the second object, a second scene where the user is located when the second voice signal is input, and determine, according to the second scene, the first scene.
The first scene determining unit 11 is specifically configured to acquire a second voice signal input by a user within a set time before the first voice signal is received, where the second voice signal only includes a second action expected to be performed or only includes a second object of a second action operation expected to be performed, determine a second scene where the user is currently located according to the second action operation, or determine the second scene where the user is currently located according to the second object, and determine the first scene according to the second scene.
The first action and first object determining unit 12 is specifically configured to, if only the first action is included in the first speech signal, obtain, from an established user scene data stack, the operation objects of the first action in the first scene and the first probability corresponding to each operation object, determine the priority of each operation object according to the first probability, and execute the first action on the operation object with the highest priority.
The first action and first object determining unit 12 is specifically configured to, if only the first object is included in the first speech signal, obtain, from an established user scene data stack, an operation action in the first scene that matches the first object and a second probability that the operation action is performed, determine, according to the second probability, a priority of the operation action that matches the first object, and perform, on the first object, the operation action with a highest priority.
The processor 1 further includes a user scene data stack establishing unit, which is configured to obtain historical voice information input by the user, analyze the historical voice information, obtain the third scene in which the user input the historical voice information, the third action the historical voice information expects to be executed, and the third object of the third action operation, and store the third scene, the third action, the third object and the correspondence among them to form the user scene data stack.
Based on the second embodiment of the present invention, the third embodiment of the present invention provides a voice recognition system, as shown in fig. 3, the voice recognition system 100 includes a sound pickup apparatus 2, an execution apparatus 3, and the aforementioned processor 1, where the sound pickup apparatus 2 is configured to collect a first voice signal input by a user and send the first voice signal to the processor 1, and the execution apparatus 3 is configured to receive the control instruction and execute the first action on the first object.
Based on the first embodiment of the present invention, a computer device is provided in the fourth embodiment of the present invention, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to make the computer device execute the steps of the foregoing method.
Based on the first embodiment of the present invention, a fifth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a computer device, implements the foregoing method steps.
It will be understood by those skilled in the art that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The above-described embodiments merely illustrate several implementations of the present invention, and while they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various further changes and modifications based on the above technical solutions and concepts, and all such changes and modifications fall within the protection scope of the present invention.

Claims (15)

1. A speech recognition method, comprising:
receiving and analyzing a first voice signal input by a user, and determining a first action expected to be executed by the first voice signal or a first object of the first action operation;
acquiring a second voice signal input by a user within a set time before receiving the first voice signal, and determining a first scene where the user is currently located according to the second voice signal;
and determining a first action expected to be executed in the first voice signal and a first object of the first action operation according to the first voice signal and the first scene, and sending a control instruction, wherein the control instruction is used for controlling the first action to be executed on the first object.
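As an illustration only (not part of the claims), the three steps of claim 1 can be sketched in Python. The toy parser, the vocabulary, the scene rule, and all names below are assumptions made for demonstration, not the patented implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParsedSignal:
    action: Optional[str]  # first action expected to be executed, if recognized
    target: Optional[str]  # first object of the action operation, if recognized

def parse_voice_signal(text: str) -> ParsedSignal:
    """Toy stand-in for speech recognition: extract an action and/or an object."""
    actions = {"turn_on", "turn_off", "play"}
    objects = {"light", "tv", "music"}
    words = set(text.split())
    action = next((a for a in actions if a in words), None)
    target = next((o for o in objects if o in words), None)
    return ParsedSignal(action, target)

def infer_scene(second_signal: str) -> str:
    """Determine the first scene from a second signal received within the set time."""
    parsed = parse_voice_signal(second_signal)
    if parsed.target == "tv" or parsed.action == "play":  # illustrative rule
        return "living_room"
    return "unknown"

def handle_first_signal(first_signal: str, second_signal: str) -> dict:
    """Claim 1 pipeline: parse, determine scene, resolve, send a control instruction."""
    parsed = parse_voice_signal(first_signal)
    scene = infer_scene(second_signal)
    # In the full method the scene disambiguates a partial first signal;
    # here everything is simply packaged as a control instruction.
    return {"scene": scene, "action": parsed.action, "object": parsed.target}
```

For example, `handle_first_signal("turn_on light", "play music")` yields an instruction whose scene was inferred from the earlier second signal.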
2. The method according to claim 1, wherein the acquiring a second voice signal input by the user within a set time before receiving the first voice signal, and the determining a first scene in which the user is currently located according to the second voice signal specifically comprises:
acquiring a second voice signal input by a user within a set time before the first voice signal is received, wherein the second voice signal comprises a second action expected to be executed and a second object of the second action operation;
determining a second scene in which the user is located when inputting the second voice signal according to the second action and the second object;
determining the second scene as the first scene.
3. The method according to claim 1, wherein the acquiring a second voice signal input by the user within a set time before receiving the first voice signal, and the determining a first scene in which the user is currently located according to the second voice signal specifically comprises:
acquiring a second voice signal input by a user within a set time before the first voice signal is received, wherein the second voice signal only comprises a second action expected to be executed or only comprises a second object of a second action operation expected to be executed;
determining a second scene where the user is currently located according to the second action, or determining the second scene where the user is currently located according to the second object;
determining the second scene as the first scene.
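As an illustrative sketch only, claims 2 and 3 reduce to inferring a scene from whichever of the second action and second object the second voice signal contains. The scene mappings below are made-up assumptions, not part of the patent:

```python
# Illustrative mappings; the real method would learn or configure these.
ACTION_TO_SCENE = {"cook": "kitchen", "watch": "living_room"}
OBJECT_TO_SCENE = {"oven": "kitchen", "tv": "living_room"}

def determine_scene(second_action=None, second_object=None) -> str:
    # Claim 2: both the second action and the second object are present.
    if second_action and second_object:
        scene = ACTION_TO_SCENE.get(second_action) or OBJECT_TO_SCENE.get(second_object)
        return scene or "unknown"
    # Claim 3: only one of the two clues is present.
    if second_action:
        return ACTION_TO_SCENE.get(second_action, "unknown")
    if second_object:
        return OBJECT_TO_SCENE.get(second_object, "unknown")
    return "unknown"
```

The second scene returned here is then taken as the first scene, as both claims conclude.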
4. The method according to claim 2 or 3, wherein the determining, from the first speech signal and the current first scene, a first action expected to be performed in the first speech signal and a first object of the first action operation specifically comprises:
if the first voice signal only comprises the first action, acquiring an operation object of the first action in the first scene and a first probability of the operation object being executed from an established user scene data stack;
determining a priority of the operation object according to the first probability;
and executing the first action on the operation object with the highest priority.
5. The method according to claim 2 or 3, wherein the determining, from the first speech signal and the current first scene, a first action expected to be performed in the first speech signal and a first object of the first action operation specifically comprises:
if the first voice signal only comprises the first object, acquiring an operation action matched with the first object in the first scene and a second probability of executing the operation action from the established user scene data stack;
determining the priority of the operation action matched with the first object according to the second probability;
performing the operation action with the highest priority on the first object.
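As an illustration only, claims 4 and 5 are symmetric look-ups into the user scene data stack: given a scene plus the one element the first voice signal did contain, retrieve the candidates with their execution probabilities and pick the highest-priority one. The stack contents below are made-up example data, not taken from the patent:

```python
# Hypothetical user scene data stack: (scene, known element) -> {candidate: probability}.
SCENE_STACK = {
    ("living_room", "turn_on"): {"tv": 0.7, "light": 0.3},  # claim 4: action -> objects
    ("living_room", "tv"): {"turn_on": 0.6, "mute": 0.4},   # claim 5: object -> actions
}

def resolve_by_priority(scene: str, known: str) -> str:
    """Return the candidate with the highest probability (i.e. highest priority)."""
    candidates = SCENE_STACK.get((scene, known), {})
    if not candidates:
        raise KeyError(f"no entry for {known!r} in scene {scene!r}")
    return max(candidates, key=candidates.get)
```

So in this toy stack, a bare "turn on" in the living room resolves to the TV, and a bare "TV" resolves to turning it on.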
6. The method according to claim 4 or 5, wherein the establishing a user context data stack specifically comprises:
acquiring historical voice information input by a user, analyzing the historical voice information, and acquiring a third scene where the historical voice information input by the user is located, a third action expected to be executed by the historical voice information and a third object of the third action operation;
and storing the third scene, the third action, the third object, and the correspondence among the three to form the user scene data stack.
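As a hedged sketch of claim 6, the user scene data stack can be built by counting, over historical voice records, how often each third object was operated in each (third scene, third action) pair, then normalising the counts into probabilities. The record format and all names here are assumptions for illustration:

```python
from collections import defaultdict

def build_scene_stack(history):
    """history: iterable of (third_scene, third_action, third_object) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    for scene, action, obj in history:
        # Store the correspondence among scene, action, and object as a count.
        counts[(scene, action)][obj] += 1
    stack = {}
    for key, objs in counts.items():
        total = sum(objs.values())
        # Normalise counts into the probabilities used for priority ranking.
        stack[key] = {o: n / total for o, n in objs.items()}
    return stack
```

With two "turn on oven" records and one "turn on light" record in the kitchen, the stack would rank the oven above the light for a bare "turn on" in that scene.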
7. A speech recognition processor, the processor comprising:
the voice recognition device comprises a receiving and recognizing unit, a processing unit and a processing unit, wherein the receiving and recognizing unit is used for receiving and recognizing a first voice signal input by a user, and determining a first action expected to be executed by the first voice signal or a first object of the first action operation;
a first scene determining unit, configured to acquire a second voice signal input by a user within a set time before receiving the first voice signal, and determine a first scene where the user is currently located according to the second voice signal;
a first action and first object determination unit, configured to determine a first action expected to be performed in the first voice signal and a first object of the first action operation according to the first voice signal and the first scene, and to send a control instruction, wherein the control instruction is used for controlling the first action to be performed on the first object.
8. The processor of claim 7, wherein the first scenario determination unit is specifically configured to:
acquiring a second voice signal input by a user within a set time before the first voice signal is received, wherein the second voice signal comprises a second action expected to be executed and a second object of the second action operation;
determining a second scene in which the user is located when inputting the second voice signal according to the second action and the second object;
determining the second scene as the first scene.
9. The processor according to claim 7, wherein the first scene determining unit is specifically configured to:
acquiring a second voice signal input by a user within a set time before the first voice signal is received, wherein the second voice signal only comprises a second action expected to be executed or only comprises a second object of a second action operation expected to be executed;
determine a second scene where the user is currently located according to the second action, or determine the second scene where the user is currently located according to the second object;
determining the second scene as the first scene.
10. The processor according to claim 8 or 9, wherein the first action and first object determination unit is specifically configured to:
if the first voice signal only comprises the first action, acquiring an operation object of the first action in the first scene and a first probability of the operation object being executed from an established user scene data stack;
determining a priority of the operation object according to the first probability;
and execute the first action on the operation object with the highest priority.
11. The processor according to claim 8 or 9, wherein the first action and first object determination unit is specifically configured to:
if the first voice signal only comprises the first object, acquiring an operation action matched with the first object in the first scene and a second probability of executing the operation action from the established user scene data stack;
determining the priority of the operation action matched with the first object according to the second probability;
performing the operation action with the highest priority on the first object.
12. The processor according to claim 10 or 11, characterized in that the processor further comprises a user scene data stack establishing unit configured to:
acquiring historical voice information input by a user, analyzing the historical voice information, and acquiring a third scene where the historical voice information input by the user is located, a third action expected to be executed by the historical voice information and a third object of the third action operation;
store the third scene, the third action, the third object, and the correspondence among the three to form the user scene data stack.
13. A speech recognition processing system, comprising a sound pick-up device, an execution device, and a processor according to any one of claims 7 to 12, wherein
the pickup equipment is used for collecting a first voice signal input by a user and sending the first voice signal to the processor;
the execution device is used for receiving the control instruction and executing the first action on the first object.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a computer device, carries out the method steps of any one of the preceding claims 1 to 6.
15. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor executes the computer program to cause the computer device to perform the steps of the method of any of claims 1 to 6.
CN202010462534.1A 2020-05-27 2020-05-27 Speech recognition method, processor, system, computer equipment and readable storage medium Pending CN111627442A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010462534.1A CN111627442A (en) 2020-05-27 2020-05-27 Speech recognition method, processor, system, computer equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN111627442A true CN111627442A (en) 2020-09-04

Family

ID=72272554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010462534.1A Pending CN111627442A (en) 2020-05-27 2020-05-27 Speech recognition method, processor, system, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111627442A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107533542A (en) * 2015-01-23 2018-01-02 微软技术许可有限责任公司 Method for understanding incomplete natural language querying
CN107908116A (en) * 2017-10-20 2018-04-13 深圳市艾特智能科技有限公司 Sound control method, intelligent domestic system, storage medium and computer equipment
CN108197213A (en) * 2017-12-28 2018-06-22 中兴通讯股份有限公司 Action performs method, apparatus, storage medium and electronic device
CN108306797A (en) * 2018-01-30 2018-07-20 百度在线网络技术(北京)有限公司 Sound control intelligent household device, method, system, terminal and storage medium
CN109690672A (en) * 2016-07-15 2019-04-26 搜诺思公司 Contextualization of voice inputs
CN110109596A (en) * 2019-05-08 2019-08-09 芋头科技(杭州)有限公司 Recommended method, device and the controller and medium of interactive mode
CN111128168A (en) * 2019-12-30 2020-05-08 斑马网络技术有限公司 Voice control method, device and storage medium
CN111177338A (en) * 2019-12-03 2020-05-19 北京博瑞彤芸科技股份有限公司 Context-based multi-turn dialogue method

Similar Documents

Publication Publication Date Title
WO2020014899A1 (en) Voice control method, central control device, and storage medium
US10643605B2 (en) Automatic multi-performance evaluation system for hybrid speech recognition
US11823658B2 (en) Trial-based calibration for audio-based identification, recognition, and detection system
US11869487B1 (en) Allocation of local and remote resources for speech processing
KR101828273B1 (en) Apparatus and method for voice command recognition based on combination of dialog models
CN108470568B (en) Intelligent device control method and device, storage medium and electronic device
CN110782885B (en) Voice text correction method and device, computer equipment and computer storage medium
WO2020006878A1 (en) Voice recognition test method and apparatus, computer device and storage medium
CN112446218A (en) Long and short sentence text semantic matching method and device, computer equipment and storage medium
CN110570867A (en) Voice processing method and system for locally added corpus
CN113571096B (en) Speech emotion classification model training method and device, computer equipment and medium
CN111627442A (en) Speech recognition method, processor, system, computer equipment and readable storage medium
CN111028841B (en) Method and device for awakening system to adjust parameters, computer equipment and storage medium
CN115083412B (en) Voice interaction method and related device, electronic equipment and storage medium
CN111883109B (en) Voice information processing and verification model training method, device, equipment and medium
CN109410928B (en) Denoising method and chip based on voice recognition
CN111652083B (en) Weak supervision time sequence action detection method and system based on self-adaptive sampling
WO2022266825A1 (en) Speech processing method and apparatus, and system
CN111599377A (en) Equipment state detection method and system based on audio recognition and mobile terminal
KR20210130465A (en) Dialogue system and method for controlling the same
CN114203178B (en) Intelligent voice system rejection method and device and computer equipment
CN114783419B (en) Text recognition method and device combined with priori knowledge and computer equipment
CN117301074B (en) Control method and chip of intelligent robot
CN113095110B (en) Method, device, medium and electronic equipment for dynamically warehousing face data
US20230300578A1 (en) V2x communication method and apparatus using human language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200904