CN111477217A - Command word recognition method and device - Google Patents

Info

Publication number
CN111477217A
Authority
CN
China
Prior art keywords
state node
state
command word
degree
command
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010268839.9A
Other languages
Chinese (zh)
Other versions
CN111477217B (en)
Inventor
张猛
冯大航
陈孝良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd
Priority to CN202010268839.9A
Publication of CN111477217A
Application granted
Publication of CN111477217B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/083 Recognition networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a command word recognition method and device. The method comprises: acquiring a voice frame to be recognized; and decoding the voice frame to be recognized based on a decoding network and a preset decoding algorithm, so as to recognize the command word corresponding to the voice frame to be recognized. In each state node group of the decoding network, each path from the head state node to the tail state node uniquely corresponds to one command word. The state node groups include a composite state node group; the composite state node group includes a composite state node whose out-degree and/or in-degree is not less than 2, so that a plurality of paths of the composite state node group correspond to a plurality of command words.

Description

Command word recognition method and device
Technical Field
The invention relates to the field of voice recognition, in particular to a command word recognition method and a command word recognition device.
Background
Speech recognition technology is widely used across industries, and command word recognition is an important branch of its application. Command word recognition means recognizing an instruction that a person speaks to a device. For example, in an intelligent elevator scenario, a person tells the device to go to the fifth floor, the device recognizes the command word "go to the fifth floor", and a program in the device executes the instruction. At present, command words are generally recognized by applying a decoding algorithm on a decoding network. The decoding network is a plurality of groups of logically directed state nodes in a program, and each command word corresponds to one group of state nodes.
At present, increasingly sophisticated decoding algorithms make it possible to recognize command words on a wide variety of devices. However, storage space is a scarce resource, and storing the state nodes of every command word in the decoding network takes up a noticeable share of it. As the number of command words grows, the storage space can no longer hold the state nodes of the command words, which is a problem to be solved urgently.
Disclosure of Invention
The invention provides a command word recognition method and a command word recognition device, which solve the problem that in the prior art, storage space is difficult to support state node storage of command words.
In a first aspect, the present invention provides a command word recognition method, including: acquiring a voice frame to be recognized; and decoding the voice frame to be recognized based on a decoding network and a preset decoding algorithm, so as to recognize the command word corresponding to the voice frame to be recognized. In each state node group of the decoding network, each path from the head state node to the tail state node uniquely corresponds to one command word. The state node groups include a composite state node group; the composite state node group includes a composite state node whose out-degree and/or in-degree is not less than 2, so that a plurality of paths of the composite state node group correspond to a plurality of command words.
In this method, because the out-degree and/or in-degree of the composite state node is not less than 2, a plurality of paths of the composite state node group correspond to a plurality of command words. That is, the composite state node is a state node multiplexed by several command words, which saves state nodes and therefore storage space.
Optionally, the composite state node is a first state node; the in-degree of the first state node is 1, and the out-degree of the first state node is not less than 2; the first N characters of the plurality of command words are the same, where N is a positive integer; the state nodes from the head state node of the composite state node group to the first state node correspond to the first N characters of the plurality of command words; and the preset decoding algorithm is provided with a judgment control condition at the first state node.
In this way, the state nodes from the head state node to the first state node are multiplexed, so the first N characters of the plurality of command words are stored only once. The judgment control condition at the first state node decides which state node to jump to next, so that each branch corresponds to one of the command words, and storage space is saved.
Optionally, the composite state node is a second state node; the in-degree of the second state node is not less than 2, and the out-degree of the second state node is 1; the last M characters of the plurality of command words are the same, where M is a positive integer; and the state nodes from the second state node to the tail state node of the composite state node group correspond to the last M characters of the plurality of command words.
In this way, the state nodes from the second state node to the tail state node are multiplexed, so the last M characters of the plurality of command words are stored only once, and storage space is saved.
Optionally, the composite state node group includes a first state node and a second state node; the in-degree of the first state node is 1, and the out-degree of the first state node is not less than 2; the in-degree of the second state node is not less than 2, and the out-degree of the second state node is 1; the second state node precedes the first state node; K consecutive characters in the middle of the plurality of command words are the same, where K is an integer; the state nodes from the second state node to the first state node of the composite state node group correspond to the K characters of the plurality of command words; and the preset decoding algorithm is provided with a judgment control condition at the first state node.
In this way, the state nodes from the second state node to the first state node are multiplexed, so the K consecutive characters in the middle of the plurality of command words are stored only once. The judgment control condition at the first state node decides which state node to jump to next, so that each branch corresponds to one of the command words, and storage space is saved.
Optionally, the command word recognition method is executed by a terminal device, and the terminal device includes a single-chip microcomputer device.
In this way, because a single-chip microcomputer device is generally allocated little storage space, applying the command word recognition method to it reduces storage use by a larger proportion, and the saving is more noticeable.
Optionally, decoding the voice frame to be recognized based on a decoding network and a preset decoding algorithm, so as to recognize the command word corresponding to the voice frame to be recognized, includes: identifying, based on the decoding network and the preset decoding algorithm, the state score on each path from the head state node to the tail state node in each state node group; and determining the command word corresponding to the path with the highest state score as the command word corresponding to the voice frame to be recognized.
In this way, the state score on each path is identified based on the decoding network and the preset decoding algorithm, and the command word corresponding to the path with the highest state score is selected, which provides a way of determining the command word from state scores.
Optionally, before determining the command word corresponding to the path with the highest state score as the command word corresponding to the voice frame to be recognized, the method further includes: determining that the state score of at least one of the paths is greater than a preset threshold. Determining the command word corresponding to the path with the highest state score then includes: determining the command word corresponding to the path with the highest state score among the at least one path whose state score is greater than the preset threshold.
In this way, requiring the state score of at least one path to exceed a preset threshold ensures that at least one candidate command word meets a certain precision. The command word corresponding to the path with the highest state score can then be chosen directly from those candidates, so the command word of the voice frame to be recognized is determined with that precision guaranteed.
In a second aspect, the present invention provides a command word recognition apparatus, including: an acquisition module, configured to acquire a voice frame to be recognized; and a decoding module, configured to decode the voice frame to be recognized based on a decoding network and a preset decoding algorithm, so as to recognize the command word corresponding to the voice frame to be recognized. In each state node group of the decoding network, each path from the head state node to the tail state node uniquely corresponds to one command word. The state node groups include a composite state node group; the composite state node group includes a composite state node whose out-degree and/or in-degree is not less than 2, so that a plurality of paths of the composite state node group correspond to a plurality of command words.
Optionally, the composite state node is a first state node; the in-degree of the first state node is 1, and the out-degree of the first state node is not less than 2; the first N characters of the plurality of command words are the same, where N is a positive integer; the state nodes from the head state node of the composite state node group to the first state node correspond to the first N characters of the plurality of command words; and the preset decoding algorithm is provided with a judgment control condition at the first state node.
Optionally, the composite state node is a second state node; the in-degree of the second state node is not less than 2, and the out-degree of the second state node is 1; the last M characters of the plurality of command words are the same, where M is a positive integer; and the state nodes from the second state node to the tail state node of the composite state node group correspond to the last M characters of the plurality of command words.
Optionally, the composite state node group includes a first state node and a second state node; the in-degree of the first state node is 1, and the out-degree of the first state node is not less than 2; the in-degree of the second state node is not less than 2, and the out-degree of the second state node is 1; the second state node precedes the first state node; K consecutive characters in the middle of the plurality of command words are the same, where K is an integer; the state nodes from the second state node to the first state node of the composite state node group correspond to the K characters of the plurality of command words; and the preset decoding algorithm is provided with a judgment control condition at the first state node.
Optionally, the apparatus is a single-chip microcomputer device.
Optionally, the decoding module is specifically configured to: identify, based on the decoding network and the preset decoding algorithm, the state score on each path from the head state node to the tail state node in each state node group; and determine the command word corresponding to the path with the highest state score as the command word corresponding to the voice frame to be recognized.
Optionally, the decoding module is further configured to determine that the state score of at least one of the paths is greater than a preset threshold, and is specifically configured to determine the command word corresponding to the path with the highest state score among the at least one path whose state score is greater than the preset threshold.
The advantageous effects of the second aspect and the various optional apparatuses of the second aspect may refer to the advantageous effects of the first aspect and the various optional methods of the first aspect, and are not described herein again.
In a third aspect, the present invention provides a computer device comprising a program or instructions which, when executed, perform the method of the first aspect and the alternatives of the first aspect.
In a fourth aspect, the present invention provides a storage medium comprising a program or instructions which, when executed, perform the method of the first aspect and the alternatives of the first aspect.
Drawings
Fig. 1 is a schematic flowchart illustrating steps of a command word recognition method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a command word recognition apparatus according to an embodiment of the present application.
Detailed Description
To make the technical solutions easier to understand, they are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples of the present application describe the technical solutions in detail rather than limit them, and that the technical features in the embodiments and examples may be combined with each other where they do not conflict.
Command word (command word):
A command word is an instruction that a person speaks to a device; a program in the device recognizes the instruction and takes the corresponding action. Command words include offline command words (offline command word) and online command words (online command word). After a set of command words is generated, not every device necessarily deploys all of them. An offline command word is one whose recognition function is not essential to the device: the dynamic library and model related to the command words need not exist when the system starts, and their absence does not affect normal use of the system's other functions. The system initializes the offline command word recognition function if the corresponding dynamic library and model exist, and does not initialize it otherwise. In contrast, an online command word is one whose function must exist, for example the command word for the wake-up function; without it the system cannot work normally.
Decoding network (decoding network):
A command word must be decoded in order to be recognized: a decoding algorithm is applied on a decoding network to finally recognize the command word. The decoding network is a plurality of logically directed state nodes in a program. These nodes form a graph, and decoding jumps between the different state nodes of that graph.
At present, increasingly sophisticated decoding algorithms for recognizing command words in voice frames make command word recognition possible on a wide variety of devices. However, as the number of command words grows, the storage space can no longer hold the state nodes of the command words, which is a problem to be solved.
To this end, as shown in fig. 1, the present application provides a command word recognition method.
Step 101: acquire a voice frame to be recognized.
Step 102: decode the voice frame to be recognized based on a decoding network and a preset decoding algorithm, so as to recognize the command word corresponding to the voice frame to be recognized.
In steps 101 to 102, it should be noted that a segment of speech to be recognized consists of multiple speech frames. Several frames correspond to one state, several states (for example, three) combine into one phoneme, and several phonemes combine into one English word (or one Chinese character or phrase). In other words, the recognition result for a segment of speech can be obtained by matching each frame to a state. The decoding network is essentially a graph of state nodes, each of which is a phoneme (or a state of a phoneme) of a word. The decoding network contains state node groups (each in effect a path formed by several state nodes), and each state node group corresponds to a word (or a Chinese character or phrase), so the decoding network is the basis of speech recognition. Given the decoding network and the acquired voice frames, the rule for judging whether each frame matches a state is the decoding algorithm. Speech recognition is then the process of searching the decoding network, according to the decoding algorithm, for the best matching path (that is, the matching state node group), namely the path to which the voice frames correspond with the highest probability. This process is called "decoding".
In each state node group of the decoding network, each path from the head state node to the tail state node uniquely corresponds to one command word. Each state node group comprises a plurality of connected state nodes. The degree of each state node consists of an in-degree and an out-degree, and the degree of each state node is not 0: the in-degree of the head state node is 0 but its out-degree is not 0, and the out-degree of the tail state node is 0 but its in-degree is not 0. Each character in a command word may correspond to one or more state nodes; for example, each character corresponds to 3 state nodes.
Each state node group may have only one path or a plurality of paths. If the in-degree and out-degree of every state node in a state node group are not more than 1, the state node group obviously has only one path (and can be called a single state node group). The state node groups may also include a composite state node group; the composite state node group includes a composite state node whose out-degree and/or in-degree is not less than 2, so that a plurality of paths of the composite state node group correspond to a plurality of command words.
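The degree definitions above can be sketched as follows. This is a minimal illustration, assuming a state node group is stored as a list of directed edges between node labels; the labels and the helper names `degrees` and `count_paths` are hypothetical and do not come from the application.

```python
from collections import defaultdict

def degrees(edges):
    """Return (in_degree, out_degree) mappings for a list of directed edges."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg

def count_paths(edges):
    """Count paths from nodes with in-degree 0 to nodes with out-degree 0.

    Assumes the graph is acyclic (self-loops used during decoding are ignored).
    """
    succ = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        nodes.update((src, dst))
    indeg, outdeg = degrees(edges)

    def walk(node):
        if outdeg[node] == 0:          # reached a tail state node
            return 1
        return sum(walk(nxt) for nxt in succ[node])

    return sum(walk(n) for n in nodes if indeg[n] == 0)

# A single state node group: every degree is at most 1, so there is one path.
single = [("a", "b"), ("b", "c")]
# A composite state node group: node "b" has out-degree 2, so there are two paths.
composite = [("a", "b"), ("b", "c"), ("b", "d")]

print(count_paths(single), count_paths(composite))
```

Here each counted path would stand for one command word, so the composite group covers two command words while storing the nodes "a" and "b" only once.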
In an alternative embodiment (hereinafter referred to as embodiment (1)) for the case where the out-degree of the composite state node is not less than 2, the composite state node is a first state node; the in-degree of the first state node is 1, and the out-degree of the first state node is not less than 2; the first N characters of the plurality of command words are the same, where N is a positive integer; the state nodes from the head state node of the composite state node group to the first state node correspond to the first N characters of the plurality of command words; and the preset decoding algorithm is provided with a judgment control condition at the first state node.
For example, each character in a command word may occupy three state nodes when the decoding network is constructed. Take the two command words "call Zhang San" and "call Li Si": their first 2 characters, "call", are the same, so N is 2. The 6 state nodes formed by the two characters of "call" can be multiplexed. The 3rd state node of the second character of "call", that is, the last of these 6 state nodes, can be set as the first state node, with out-degree 2 and in-degree 1. One jump direction of the first state node leads to the 6 state nodes formed by "Zhang San", and the other leads to the 6 state nodes formed by "Li Si"; which jump is taken is decided by the judgment control condition at the first state node. Supporting more command words only requires increasing the out-degree of the first state node, and so on, which is not described again here.
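The node saving in this example can be checked with a short sketch. It assumes three state nodes per character, as in the example, and identifies a node by the prefix it completes; the tokens `call_1`, `call_2`, `Zhang`, `San`, `Li`, `Si` are placeholders for the individual characters, and `build_prefix_shared` is a hypothetical helper, not the application's implementation.

```python
STATES_PER_CHAR = 3  # assumption taken from the example: three state nodes per character

def build_prefix_shared(commands):
    """Build state nodes so that commands with a common prefix reuse the same nodes.

    A node is identified by (characters read so far, state index within the
    current character), so identical prefixes map to identical nodes.
    """
    nodes, edges = set(), set()
    for chars in commands:
        prev = None
        for i in range(len(chars)):
            for s in range(STATES_PER_CHAR):
                node = (tuple(chars[: i + 1]), s)
                nodes.add(node)
                if prev is not None:
                    edges.add((prev, node))
                prev = node
    return nodes, edges

commands = [
    ("call_1", "call_2", "Zhang", "San"),
    ("call_1", "call_2", "Li", "Si"),
]
nodes, edges = build_prefix_shared(commands)

# The last state node of the shared prefix plays the role of the first state node.
first_state_node = (("call_1", "call_2"), STATES_PER_CHAR - 1)
out_degree = sum(1 for src, _ in edges if src == first_state_node)
in_degree = sum(1 for _, dst in edges if dst == first_state_node)
unshared = STATES_PER_CHAR * sum(len(c) for c in commands)

print(len(nodes), unshared, in_degree, out_degree)
```

Under these assumptions the shared network needs 18 state nodes instead of 24, and the branching node has in-degree 1 and out-degree 2, matching the description above.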
In an alternative embodiment (hereinafter referred to as embodiment (2)) for the case where the in-degree of the composite state node is not less than 2, the composite state node is a second state node; the in-degree of the second state node is not less than 2, and the out-degree of the second state node is 1; the last M characters of the plurality of command words are the same, where M is a positive integer; and the state nodes from the second state node to the tail state node of the composite state node group correspond to the last M characters of the plurality of command words.
For example, each character in a command word occupies three state nodes when the decoding network is constructed. There are two command words: "read information" and "send information". The last 2 characters of the two command words, "information", are the same, so M is 2, and the 6 state nodes formed by the two characters of "information" can be multiplexed. The first of these 6 state nodes can be taken as the second state node, with in-degree 2 and out-degree 1. One direction of jumping into the second state node comes from the 6 state nodes formed by "read", and the other comes from the 6 state nodes formed by "send". The second state node does not need to decide which direction it was entered from, so it need not be provided with a judgment control condition. Supporting more command words only requires increasing the in-degree of the second state node, and so on, which is not described again here.
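The suffix case can be sketched in the mirror-image way. This is an illustration only, assuming three state nodes per character and that "read", "send", and "information" are each two characters; the tokens `read_1`, `info_1`, and so on are placeholders, and `build_suffix_shared` is a hypothetical helper.

```python
STATES_PER_CHAR = 3  # assumption taken from the example: three state nodes per character

def build_suffix_shared(commands):
    """Build state nodes so that commands with a common suffix reuse the same nodes.

    A node is identified by (characters still to be read, including the current
    one, and the state index), so identical endings map to identical nodes.
    """
    nodes, edges = set(), set()
    for chars in commands:
        prev = None
        for i in range(len(chars)):
            for s in range(STATES_PER_CHAR):
                node = (tuple(chars[i:]), s)
                nodes.add(node)
                if prev is not None:
                    edges.add((prev, node))
                prev = node
    return nodes, edges

commands = [
    ("read_1", "read_2", "info_1", "info_2"),
    ("send_1", "send_2", "info_1", "info_2"),
]
nodes, edges = build_suffix_shared(commands)

# The first state node of the shared suffix plays the role of the second state node.
second_state_node = (("info_1", "info_2"), 0)
in_degree = sum(1 for _, dst in edges if dst == second_state_node)
out_degree = sum(1 for src, _ in edges if src == second_state_node)

print(len(nodes), in_degree, out_degree)
```

With these placeholder commands, 18 state nodes are stored instead of 24, and the merging node has in-degree 2 and out-degree 1, so no branching decision is needed at that node.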
It should be noted that the cases covered by embodiment (1) and embodiment (2) may occur at the same time, in which case the two embodiments are combined. For example, in an elevator scenario the command words have the form "go to floor XX", such as "go to the first floor" and "go to the second floor". "Go" and "floor" are repeated in every command word, so the state nodes of "go" and "floor" need to be constructed only once in the decoding network; here N is 1 and M is 1. The first state node is the last of the state nodes of "go", and the second state node is the first of the state nodes of "floor".
Further, "XX" itself may contain repeated characters. For example, "ten" is repeated in "go to the tenth floor" and "go to the eleventh floor", so the state nodes of "ten" also need to be constructed only once. In this case the last of the state nodes of "ten" can also serve as a first state node, and the first of the state nodes of "one" can also serve as a second state node. The last state node of "ten" can therefore jump to two targets, the first of the state nodes of "floor" or the first of the state nodes of "one". This only requires a judgment control condition in the program and consumes no extra storage space.
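The elevator example can be made concrete with a small hand-built graph. The sketch below works at the character level and assumes three state nodes per character; the three commands (first, tenth, and eleventh floor, written character by character as "go one floor", "go ten floor", "go ten one floor") and the dictionary layout are illustrative assumptions.

```python
STATES_PER_CHAR = 3  # assumption: three state nodes per character

# Character-level graph: each key is a character whose state nodes are built once,
# and each value lists the characters that may follow it.
char_edges = {
    "go": ["one", "ten"],      # out-degree 2: acts as a first state node
    "ten": ["one", "floor"],   # out-degree 2: acts as a first state node
    "one": ["floor"],          # in-degree 2 (from "go" and "ten")
    "floor": [],               # in-degree 2 (from "one" and "ten")
}

def enumerate_paths(node, graph):
    """Yield every character sequence from node to a character with no successor."""
    if not graph[node]:
        yield [node]
        return
    for nxt in graph[node]:
        for rest in enumerate_paths(nxt, graph):
            yield [node] + rest

paths = [" ".join(p) for p in enumerate_paths("go", char_edges)]
shared_nodes = STATES_PER_CHAR * len(char_edges)
unshared_nodes = STATES_PER_CHAR * sum(len(p.split()) for p in paths)

print(sorted(paths), shared_nodes, unshared_nodes)
```

In this sketch the three command words are covered by 12 state nodes, whereas building each command word separately would take 30, and every path through the graph still spells exactly one command word.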
It should be noted that, in addition to embodiments (1) and (2) and their combinations, in an alternative embodiment (hereinafter referred to as embodiment (3)) the composite state node group includes a first state node and a second state node; the in-degree of the first state node is 1, and the out-degree of the first state node is not less than 2; the in-degree of the second state node is not less than 2, and the out-degree of the second state node is 1; the second state node precedes the first state node; K consecutive characters in the middle of the plurality of command words are the same, where K is an integer; the state nodes from the second state node to the first state node of the composite state node group correspond to the K characters of the plurality of command words; and the preset decoding algorithm is provided with a judgment control condition at the first state node.
For example, each character in a command word occupies three state nodes when the decoding network is constructed. There are two command words: "shutdown after printing file" and "bright screen before playing file". The middle 2 characters of the two command words, "file", are the same, so K is 2, and the 6 state nodes formed by the two characters of "file" can be multiplexed. The first of these 6 state nodes can be taken as the second state node, with in-degree 2 and out-degree 1, and the last of them can be taken as the first state node, with in-degree 1 and out-degree 2. On the one hand, one direction of jumping into the second state node comes from the 6 state nodes formed by "print", and the other comes from the 6 state nodes formed by "play". On the other hand, one jump direction of the first state node leads to the 6 state nodes formed by "shutdown", and the other leads to the 6 state nodes formed by "bright screen"; which jump is taken is decided by the judgment control condition at the first state node. The second state node does not need to decide which direction it was entered from, so it need not be provided with a judgment control condition. Supporting more command words only requires increasing the out-degree of the first state node and/or the in-degree of the second state node, and so on, which is not described again here.
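One point worth illustrating is why a judgment control condition matters when a middle segment is shared: the shared graph by itself also admits the crossed combinations. The sketch below is one possible reading, assuming the condition at the first state node depends on which branch the path entered through; the `allowed_exit` table is hypothetical and is not specified in the application.

```python
from itertools import product

prefixes = ["print", "play"]
shared = "file"
suffixes = ["shutdown", "bright screen"]

# Without any control condition, every entry branch can be followed by every exit branch.
all_paths = [(p, shared, s) for p, s in product(prefixes, suffixes)]

# Hypothetical judgment control condition at the first state node:
# the exit branch that is allowed for each entry branch.
allowed_exit = {"print": "shutdown", "play": "bright screen"}
valid_paths = [path for path in all_paths if allowed_exit[path[0]] == path[2]]

print(len(all_paths), valid_paths)
```

Under this assumption the graph has four structural paths but only two of them pass the condition, so each remaining path still corresponds to exactly one of the two command words.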
The command word recognition method of steps 101 to 102 may be performed by a terminal device, including one with little storage space (for example, little memory); a typical example is a single-chip microcomputer device. On such a device each state node has a noticeable effect on the storage space of the whole terminal, so after redundant state nodes are removed the storage space is reduced by a larger proportion, the saving is more noticeable, and multiplexing composite state nodes matters more.
Further, from the viewpoint of cost control, some low-end devices are cheap and are configured with little memory, for example only 512 KB, so optimizing memory use is very meaningful. Low-end devices usually ship in huge quantities: without optimization the hardware might need 1 MB of memory, whereas 512 KB might be enough after optimization. If this saves, say, 1 yuan per device, the total saving is large at high shipment volumes.
It should be noted that, in an alternative embodiment, step 102 may be performed as follows:
Step (a): identify, based on the decoding network and the preset decoding algorithm, the state score on each path from the head state node to the tail state node in each state node group.
Step (b): determine the command word corresponding to the path with the highest state score as the command word corresponding to the voice frame to be recognized.
The decoding network in the above embodiment may be the decoding network of any of the alternative embodiments of steps 101 to 102 in this application, or a combination of them, and the preset decoding algorithm may be the Viterbi algorithm or the like.
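Steps (a) and (b) with a Viterbi-style search can be sketched as follows. This is a simplified illustration, not the application's implementation: it assumes per-frame log scores for every state node are already available (the `emissions` values are made up), that every state node may repeat across consecutive frames, and that a path score is the sum of the log scores along the alignment.

```python
import math

def viterbi(emissions, succ, heads, tails):
    """Return (best score, best node sequence) through a state-node graph.

    emissions: list over frames of {node: log score of that frame at that node}
    succ: {node: [successor nodes]}; each node may also stay on itself.
    heads / tails: nodes where a path may start / must end.
    Assumes at least one tail is reachable within the given number of frames.
    """
    nodes = list(emissions[0])
    score = {n: (emissions[0][n] if n in heads else -math.inf) for n in nodes}
    back = []
    for frame in emissions[1:]:
        new = {n: -math.inf for n in nodes}
        ptr = {n: None for n in nodes}
        for n in nodes:
            if score[n] == -math.inf:
                continue
            for m in [n] + succ.get(n, []):   # stay on n or advance to a successor
                cand = score[n] + frame[m]
                if cand > new[m]:
                    new[m], ptr[m] = cand, n
        back.append(ptr)
        score = new
    best_tail = max(tails, key=lambda n: score[n])
    path = [best_tail]
    for ptr in reversed(back):                # follow back-pointers to the first frame
        path.append(ptr[path[-1]])
    path.reverse()
    return score[best_tail], path

# Tiny composite group: shared node "a" branches to tail "b" or tail "c".
succ = {"a": ["b", "c"]}
emissions = [
    {"a": -0.1, "b": -3.0, "c": -3.0},
    {"a": -0.5, "b": -0.2, "c": -2.0},
    {"a": -3.0, "b": -0.1, "c": -1.5},
]
best_score, best_path = viterbi(emissions, succ, heads={"a"}, tails=["b", "c"])
print(best_score, best_path)
```

The tail node at the end of the best path identifies which branch of the composite state node group was taken, and therefore which command word would be reported.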
It should be noted that, before step (b), it may first be determined that the state score of at least one of the paths is greater than a preset threshold; the state scores of those paths are then compared, and the path with the highest state score is found. That is, after determining that the state score of at least one of the paths is greater than the preset threshold, step (b) may be performed as follows: determine the command word corresponding to the path with the highest state score among the at least one path whose state score is greater than the preset threshold.
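The threshold check followed by the selection of the highest score can be sketched as below. The score values, the threshold, and the function name `pick_command` are hypothetical; returning `None` when no path passes the threshold is also an assumption, since the text does not say what happens in that case.

```python
def pick_command(path_scores, threshold):
    """Return the command word with the highest score among paths above threshold.

    path_scores: {command word: state score of its path}
    Returns None when no path exceeds the threshold (an assumed behavior).
    """
    passing = {cmd: s for cmd, s in path_scores.items() if s > threshold}
    if not passing:
        return None
    return max(passing, key=passing.get)

scores = {"call Zhang San": 0.82, "call Li Si": 0.41}
print(pick_command(scores, threshold=0.6))
print(pick_command({"call Zhang San": 0.30, "call Li Si": 0.20}, threshold=0.6))
```

Filtering first means a low-scoring best path, such as background speech that matches no command well, is not reported as a command word in this sketch.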
As shown in FIG. 2, the present invention provides a command word recognition apparatus, including: an obtaining module 201, configured to obtain a speech frame to be recognized; and a decoding module 202, configured to decode the speech frame to be recognized based on a decoding network and a preset decoding algorithm, so as to recognize the command word corresponding to the speech frame to be recognized. Each path from the head state node to the tail state node in each state node group of the decoding network uniquely corresponds to one command word. The state node groups include a composite state node group; the composite state node group includes a composite state node whose out-degree and/or in-degree is not less than 2, so that a plurality of paths of the composite state node group correspond to a plurality of command words.
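The structure of a composite state node group can be made concrete with a small sketch that stores the group as a list of directed edges, finds the nodes whose in-degree or out-degree is at least 2, and lists the head-to-tail paths. The edge-list layout and the function names are assumptions for illustration, and the sketch ignores self-loops, so the graph is treated as acyclic.

```python
from collections import defaultdict


def composite_nodes(edges):
    """Return the nodes whose in-degree or out-degree is not less than 2."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = {node for edge in edges for node in edge}
    return {n for n in nodes if in_deg[n] >= 2 or out_deg[n] >= 2}


def enumerate_paths(edges, head, tails):
    """List every path from the head node to any tail node (acyclic graph)."""
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
    paths, stack = [], [[head]]
    while stack:
        path = stack.pop()
        if path[-1] in tails:
            paths.append(path)
        for nxt in successors[path[-1]]:
            stack.append(path + [nxt])
    return paths
```

For the edges `[("head", "turn"), ("turn", "on"), ("turn", "off")]`, the node `"turn"` has out-degree 2, so the group yields two paths and therefore two command words while storing `"head"` and `"turn"` only once.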
Optionally, the composite state node is a first state node; the in-degree of the first state node is 1, and the out-degree of the first state node is not less than 2; the first N words of the plurality of command words are the same; N is a positive integer; a plurality of state nodes from the head state node of the composite state node group to the first state node correspond to the first N words of the plurality of command words; the preset decoding algorithm is provided with a judgment control condition at the first state node.
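Sharing the first N words amounts to storing the command words in a prefix tree. The following sketch assumes one node per word, which is a simplification (a word may map to several state nodes), and uses hypothetical names; it only shows how many nodes the sharing removes.

```python
def build_prefix_network(commands):
    """Build a nested-dict prefix tree from word sequences.

    commands: list of word lists, one list per command word.
    Assumption of this sketch: one node per word.
    """
    root = {}
    for words in commands:
        node = root
        for word in words:
            node = node.setdefault(word, {})  # reuse an existing prefix node
    return root


def count_nodes(tree):
    """Count all nodes stored in the prefix tree."""
    return sum(1 + count_nodes(child) for child in tree.values())
```

With `[["open", "the", "window"], ["open", "the", "door"]]`, N is 2: the node for `"the"` has in-degree 1 and out-degree 2, and the tree stores 4 nodes instead of the 6 needed when each command word has its own path.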
Optionally, the composite state node is a second state node; the in-degree of the second state node is not less than 2, and the out-degree of the second state node is 1; the last M words of the plurality of command words are the same; M is a positive integer; a plurality of state nodes from the second state node to the tail state node of the composite state node group correspond to the last M words of the plurality of command words.
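Sharing the last M words can be sketched in the same spirit. The code below finds the longest common suffix of the command words and builds edges in which the suffix nodes are shared. The node labels, the `"*"` tag for shared nodes, and the function names are assumptions for illustration; the sketch uses one node per word and omits any common head state node.

```python
def common_suffix_len(seqs):
    """Length M of the longest suffix shared by all word sequences."""
    m = 0
    shortest = min(len(s) for s in seqs)
    while m < shortest and len({s[-1 - m] for s in seqs}) == 1:
        m += 1
    return m


def build_suffix_shared_edges(seqs):
    """Return the edge set of a network whose last M nodes are shared."""
    m = common_suffix_len(seqs)
    edges = set()
    for idx, seq in enumerate(seqs):
        split = len(seq) - m
        # private nodes carry the command index, shared nodes carry "*"
        nodes = [(idx, i, word) for i, word in enumerate(seq[:split])]
        nodes += [("*", j, word) for j, word in enumerate(seq[split:])]
        edges.update(zip(nodes, nodes[1:]))
    return edges
```

For `[["open", "the", "window"], ["close", "the", "window"]]`, M is 2 and the shared node for `"the"` has in-degree 2 and out-degree 1, matching the second state node described above.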
Optionally, the composite state node group includes a first state node and a second state node; the in-degree of the first state node is 1, and the out-degree of the first state node is not less than 2; the in-degree of the second state node is not less than 2, and the out-degree of the second state node is 1; the second state node precedes the first state node; K consecutive words in the middle of the plurality of command words are the same; K is an integer; a plurality of state nodes from the second state node to the first state node of the composite state node group correspond to the K words of the plurality of command words; the preset decoding algorithm is provided with a judgment control condition at the first state node.
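When the middle K words are shared, every way into the shared segment can in principle be combined with every way out of it. The sketch below illustrates one plausible reading of the judgment control condition at the first state node: it restricts which outgoing branch may follow which incoming path. This reading, the `allowed` parameter, and the function name are assumptions of the sketch and are not confirmed by this section.

```python
def paths_through_shared_middle(prefixes, middle, suffixes, allowed=None):
    """List head-to-tail paths of a network whose middle nodes are shared.

    prefixes / suffixes: word lists before / after the shared middle words.
    allowed: optional set of (prefix index, suffix index) pairs that the
    branching node may connect; None means the branch is unconstrained.
    """
    paths = []
    for i, pre in enumerate(prefixes):
        for j, suf in enumerate(suffixes):
            if allowed is None or (i, j) in allowed:
                paths.append(pre + middle + suf)
    return paths
```

With prefixes `[["open"], ["close"]]`, middle `["the"]`, and suffixes `[["door"], ["window"]]`, the unconstrained network contains four paths, including "open the window". Passing `allowed={(0, 0), (1, 1)}` leaves only "open the door" and "close the window", so each remaining path again corresponds to exactly one command word.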
Optionally, the device is a single chip microcomputer device.
Optionally, the decoding module 202 is specifically configured to: identify each state score on each path from the head state node to the tail state node in each state node group, based on the decoding network and the preset decoding algorithm; and determine the command word corresponding to the path with the highest state score among the paths as the command word corresponding to the speech frame to be recognized.
Optionally, the decoding module 202 is further configured to: determine that the state score of at least one of the paths is greater than a preset threshold; the decoding module 202 is specifically configured to: determine the command word corresponding to the path with the highest state score from among the at least one path whose state score is greater than the preset threshold.
An embodiment of the application provides a computer device comprising a program or instructions which, when executed, perform the command word recognition method and any optional method provided by the embodiments of the application.
An embodiment of the application provides a storage medium comprising a program or instructions which, when executed, perform the command word recognition method and any optional method provided by the embodiments of the application.
Finally, it should be noted that, as will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A command word recognition method, the method comprising:
acquiring a speech frame to be recognized;
decoding the speech frame to be recognized based on a decoding network and a preset decoding algorithm, so as to recognize a command word corresponding to the speech frame to be recognized;
wherein each path from the head state node to the tail state node in each state node group of the decoding network uniquely corresponds to one command word; the state node groups comprise a composite state node group; the composite state node group comprises a composite state node, and the out-degree and/or in-degree of the composite state node is not less than 2, so that a plurality of paths of the composite state node group correspond to a plurality of command words.
2. The method of claim 1, wherein the composite state node is a first state node; the in-degree of the first state node is 1, and the out-degree of the first state node is not less than 2; the first N words of the plurality of command words are the same; N is a positive integer; a plurality of state nodes from the head state node of the composite state node group to the first state node correspond to the first N words of the plurality of command words; the preset decoding algorithm is provided with a judgment control condition at the first state node.
3. The method of claim 1, wherein the composite state node is a second state node; the in-degree of the second state node is not less than 2, and the out-degree of the second state node is 1; the last M words of the plurality of command words are the same; M is a positive integer; a plurality of state nodes from the second state node to the tail state node of the composite state node group correspond to the last M words of the plurality of command words.
4. The method of claim 1, wherein the composite state node group comprises a first state node and a second state node; the in-degree of the first state node is 1, and the out-degree of the first state node is not less than 2; the in-degree of the second state node is not less than 2, and the out-degree of the second state node is 1; the second state node precedes the first state node; K consecutive words in the middle of the plurality of command words are the same; K is an integer; a plurality of state nodes from the second state node to the first state node of the composite state node group correspond to the K words of the plurality of command words; the preset decoding algorithm is provided with a judgment control condition at the first state node.
5. The method of any one of claims 1 to 4, wherein the command word recognition method is performed by a terminal device, the terminal device comprising a single chip microcomputer device.
6. The method according to any one of claims 1 to 4, wherein decoding the speech frame to be recognized based on a decoding network and a preset decoding algorithm, so as to recognize a command word corresponding to the speech frame to be recognized, comprises:
identifying each state score on each path from the head state node to the tail state node in each state node group based on the decoding network and the preset decoding algorithm;
and determining the command word corresponding to the path with the highest state score among the paths as the command word corresponding to the speech frame to be recognized.
7. The method of claim 6, wherein before determining the command word corresponding to the path with the highest state score among the paths as the command word corresponding to the speech frame to be recognized, the method further comprises:
determining that the state score of at least one of the paths is greater than a preset threshold;
and wherein determining the command word corresponding to the path with the highest state score among the paths as the command word corresponding to the speech frame to be recognized comprises:
determining the command word corresponding to the path with the highest state score from among the at least one path whose state score is greater than the preset threshold.
8. A command word recognition apparatus, comprising:
an acquisition module, configured to acquire a speech frame to be recognized;
a decoding module, configured to decode the speech frame to be recognized based on a decoding network and a preset decoding algorithm, so as to recognize the command word corresponding to the speech frame to be recognized; wherein each path from the head state node to the tail state node in each state node group of the decoding network uniquely corresponds to one command word; the state node groups comprise a composite state node group; the composite state node group comprises a composite state node, and the out-degree and/or in-degree of the composite state node is not less than 2, so that a plurality of paths of the composite state node group correspond to a plurality of command words.
9. A computer device comprising a program or instructions that, when executed, perform the method of any of claims 1 to 7.
10. A storage medium comprising a program or instructions which, when executed, perform the method of any one of claims 1 to 7.
CN202010268839.9A 2020-04-08 2020-04-08 Command word recognition method and device Active CN111477217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010268839.9A CN111477217B (en) 2020-04-08 2020-04-08 Command word recognition method and device

Publications (2)

Publication Number Publication Date
CN111477217A true CN111477217A (en) 2020-07-31
CN111477217B CN111477217B (en) 2023-10-10

Family

ID=71750190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010268839.9A Active CN111477217B (en) 2020-04-08 2020-04-08 Command word recognition method and device

Country Status (1)

Country Link
CN (1) CN111477217B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014101717A1 (en) * 2012-12-28 2014-07-03 安徽科大讯飞信息科技股份有限公司 Voice recognizing method and system for personalized user information
US20140236591A1 (en) * 2013-01-30 2014-08-21 Tencent Technology (Shenzhen) Company Limited Method and system for automatic speech recognition
CN105321518A (en) * 2014-08-05 2016-02-10 中国科学院声学研究所 Rejection method for low-resource embedded speech recognition
CN110046276A (en) * 2019-04-19 2019-07-23 北京搜狗科技发展有限公司 The search method and device of keyword in a kind of voice
CN110322884A (en) * 2019-07-09 2019-10-11 科大讯飞股份有限公司 A kind of slotting word method, apparatus, equipment and the storage medium of decoding network
CN110827802A (en) * 2019-10-31 2020-02-21 苏州思必驰信息科技有限公司 Speech recognition training and decoding method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘加; 陈谐; 单煜翔; 史永哲: "Construction of a compact dynamic network for a large-vocabulary continuous speech recognition engine" *

Also Published As

Publication number Publication date
CN111477217B (en) 2023-10-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant