CN117542347A - Command word recognition method and device, computer readable storage medium and terminal - Google Patents

Command word recognition method and device, computer readable storage medium and terminal

Info

Publication number
CN117542347A
CN117542347A (application number CN202311517838.3A)
Authority
CN
China
Prior art keywords
command word
prefix
word recognition
words
candidate command
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311517838.3A
Other languages
Chinese (zh)
Inventor
刘志忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd
Priority to CN202311517838.3A priority Critical patent/CN117542347A/en
Publication of CN117542347A publication Critical patent/CN117542347A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 Speech or voice analysis techniques characterised by the analysis technique
    • G10L 25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The disclosure provides a command word recognition method and device, a computer readable storage medium and a terminal. The command word recognition method includes: performing voice recognition on command word voice data to obtain N candidate command word recognition results, where N is a positive integer greater than 1; counting the confidence of each prefix word according to the occurrence probabilities of the prefix words of the N candidate command word recognition results; and rejecting the N candidate command word recognition results in response to the prefix word with the highest confidence being smaller than a threshold. The scheme can suppress or prune command word recognition results whose confidence is below the threshold, correcting the command word recognition result and reducing the false alarm rate.

Description

Command word recognition method and device, computer readable storage medium and terminal
Technical Field
The embodiment of the invention relates to the technical field of voice recognition, in particular to a command word recognition method and device, a computer readable storage medium and a terminal.
Background
With the development of speech processing technology, keyword recognition is applied in more and more interactive scenarios. A terminal with a command word recognition function generates pulse code modulation (Pulse Code Modulation, PCM) data through a series of audio preprocessing and encoding/decoding operations, recognizes keywords through a keyword recognition module, and converts the keywords into corresponding control signals through post-processing. For example, in an intelligent cockpit scenario, a user issues a voice control command such as "make a call"; after keyword recognition, the command is converted into a corresponding control signal to realize the calling function, providing a convenient and quick experience for the user while driving.
At present, the Viterbi algorithm is generally used for speech recognition decoding in combination with an acoustic model, a language model and a pronunciation dictionary; a plurality of best (N-best) paths are obtained, each corresponding to a speech recognition candidate, and the most reasonable path is selected from among them.
However, due to the influence of different application scenarios, environmental noise, similar-sounding words and the like, false recognition of command words easily occurs, and the false alarm rate of command word results is high.
Disclosure of Invention
The technical problem solved by the embodiments of the invention is that false recognition easily occurs in existing command word recognition, resulting in a high false alarm rate of command word results.
In order to solve the above technical problem, an embodiment of the present invention provides a command word recognition method, including: performing voice recognition on command word voice data to obtain N candidate command word recognition results, where N is a positive integer greater than 1; counting the confidence of each prefix word according to the occurrence probabilities of the prefix words of the N candidate command word recognition results; and rejecting the N candidate command word recognition results in response to the prefix word with the highest confidence being smaller than a threshold.
Optionally, counting the confidence of each prefix word according to the occurrence probabilities of the prefix words of the N candidate command word recognition results includes: calculating, according to the recognition output probability of each candidate command word recognition result, the occurrence probability of its prefix word among the prefix words of the N candidate command word recognition results; performing an alignment operation on the prefix words of the N candidate command word recognition results to obtain an alignment result, where the alignment result represents whether the prefix words of the candidate command word recognition results are identical; and accumulating the occurrence probabilities of identical prefix words according to the alignment result and the occurrence probabilities of the prefix words, and obtaining the confidence of each prefix word based on the accumulated result.
Optionally, aligning the prefix words of the N candidate command word recognition results includes: in response to the prefix word of a first candidate command word recognition result being contained by the prefix word of a second candidate command word recognition result, taking the prefix word of the first candidate and the words after it as a first phrase, taking the prefix word of the second candidate and the words after it as a second phrase, and aligning the first phrase with the second phrase; and in response to the first phrase and the second phrase being completely aligned, judging that the prefix word of the first candidate command word recognition result is identical to the prefix word of the second candidate command word recognition result.
Optionally, aligning the prefix words of the N candidate command word recognition results includes: performing the alignment operation on the prefix words of the N candidate command word recognition results using a confusion-network construction algorithm or a dynamic programming algorithm.
Optionally, calculating, according to the recognition output probability of each candidate command word recognition result, the occurrence probability of its prefix word among the prefix words of the N candidate command word recognition results includes calculating with the following formula: P_j = exp(score_j) / sum(exp(score_j)); where P_j is the occurrence probability of the prefix word of the j-th candidate command word recognition result among the prefix words of the N candidate command word recognition results; exp(score_j) is the occurrence probability of the prefix word of the j-th candidate command word recognition result, with 1 ≤ j ≤ N; and sum(exp(score_j)) is the sum of the occurrence probabilities of the prefix words of the N candidate command word recognition results.
Optionally, the threshold includes: a set threshold, or the maximum occurrence probability among the prefix words of the N candidate command word recognition results.
The embodiment of the invention also provides a command word recognition device, including: a voice recognition unit, configured to perform voice recognition on command word voice data to obtain N candidate command word recognition results, where N is a positive integer greater than 1; a confidence determining unit, configured to count the confidence of each prefix word according to the occurrence probabilities of the prefix words of the N candidate command word recognition results; and a rejecting unit, configured to reject the N candidate command word recognition results in response to the prefix word with the highest confidence being smaller than a threshold.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, performs the steps of any of the command word recognition methods described above.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of any command word recognition method when running the computer program.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
For the N candidate command word recognition results obtained by performing voice recognition on the command word voice data, the confidence of each prefix word is counted according to the occurrence probabilities of the prefix words of the N candidate command word recognition results, and the N candidate command word recognition results are suppressed or pruned according to the relationship between the prefix word with the highest confidence and a threshold. If the prefix word with the highest confidence is smaller than the threshold, the ambiguity of the N candidate command word recognition results is large, and the N candidate command word recognition results are rejected. Thus, command word recognition results whose confidence is below the threshold can be suppressed or pruned, correcting the command word recognition result and reducing the false alarm rate.
Drawings
FIG. 1 is a flow chart of a command word recognition method in an embodiment of the invention;
FIG. 2 is a flow chart of one embodiment of step 12;
fig. 3 is a schematic structural diagram of a command word recognition device in an embodiment of the present invention.
Detailed Description
As described above, speech recognition decoding is conventionally performed by the Viterbi algorithm in combination with an acoustic model, a language model and a pronunciation dictionary. To ensure decoding efficiency, the decoding graph is generally pruned; a common method is to obtain the N-best paths by beam search. The weighted finite-state transducer (WFST) obtained by determinizing and similarly processing these N-best paths is the so-called word graph (Lattice). Like the decoding graph HCLG, the input of the Lattice is a state of a hidden Markov model (Hidden Markov Model, HMM) and the output is a word sequence. In different application scenarios, the optimal path of the Viterbi algorithm is not necessarily reasonable. By storing multiple optimal speech recognition paths in the Lattice, the most reasonable path can be selected through rescoring schemes.
When recognizing command words, it must be ensured that the keyword uttered by the user is accurately recognized while false recognition is suppressed. False recognition of command words usually includes in-set false recognition and out-of-set false recognition: in-set false recognition refers to command words being mistaken for one another, while out-of-set false recognition refers to false recognition caused by environmental noise or similar-sounding words in complex environments. For example, a user saying "previous channel" may be recognized as "next channel", and "on" may be recognized as "off".
In summary, due to the influence of different application scenarios, environmental noise, similar-sounding words and the like, false recognition of command words easily occurs, resulting in a high false alarm rate and affecting the accuracy of the command word recognition result.
In order to solve the above problems, in the embodiment of the present invention, for the N candidate command word recognition results obtained by performing speech recognition on command word speech data, the confidence of each prefix word is counted according to the occurrence probabilities of the prefix words of the N candidate command word recognition results, and the N candidate command word recognition results are suppressed or pruned according to the relationship between the prefix word with the highest confidence and a threshold. If the prefix word with the highest confidence is smaller than the threshold, the ambiguity of the N candidate command word recognition results is large, and they are rejected. Thus, command word recognition results whose confidence is below the threshold can be suppressed or pruned, correcting the command word recognition result and reducing the false alarm rate.
In order to make the above objects, features and advantages of the embodiments of the present invention more comprehensible, the following detailed description of the embodiments of the present invention refers to the accompanying drawings.
The embodiment of the invention provides a command word recognition method, which can be executed by a terminal, by a chip or chip module with a command word recognition function in the terminal, or by a chip or chip module with a data processing function used in command word recognition. The terminal can be a mobile phone, a computer, a tablet computer, smart home equipment, an intelligent cockpit, an edge platform, and the like.
Fig. 1 shows a command word recognition method according to an embodiment of the present invention; the method may specifically include the following steps 11 to 13.
Step 11, performing voice recognition on the command word voice data to obtain N candidate command word recognition results, where N is a positive integer greater than 1.
Step 12, counting the confidence of each prefix word according to the occurrence probabilities of the prefix words of the N candidate command word recognition results.
Step 13, rejecting the N candidate command word recognition results in response to the prefix word with the highest confidence being smaller than a threshold.
As can be seen from the above, for the N candidate command word recognition results obtained by performing voice recognition on the command word voice data, the confidence of each prefix word is counted according to the occurrence probabilities of the prefix words, and the N candidate command word recognition results are suppressed or pruned according to the relationship between the prefix word with the highest confidence and the threshold. If the prefix word with the highest confidence is smaller than the threshold, the ambiguity of the N candidate command word recognition results is large, and they are rejected. Thus, with relatively little loss of recognition accuracy, command word recognition results whose confidence is below the threshold can be suppressed or pruned, correcting the command word recognition result and reducing the false alarm rate.
In one implementation of step 11, the command word speech data may be recognized as follows to obtain the N candidate command word recognition results. Specifically, the command word speech data is input to a decoder composed of an acoustic model, a language model and a pronunciation dictionary, and the decoder performs speech decoding to obtain multiple candidate command word recognition results together with their probabilities (also referred to as scores); the decoder's specific recognition scheme can refer to the prior art and is not repeated here. The recognition output probabilities of the command word recognition results are sorted from high to low, and the top-N results are taken as the N candidate command word recognition results. In practical applications, due to factors such as training data distribution and noise, the correct command word recognition result may appear among candidates other than the one ranked first by recognition output probability; selecting N candidates therefore improves the probability of obtaining the correct result.
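The N-best selection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function and data names are assumptions, and the decoder itself is abstracted as a list of (hypothesis, score) pairs.

```python
# Sketch of N-best candidate selection. Assumes the decoder has already
# produced (hypothesis, score) pairs; names here are illustrative only.
def select_nbest(decoder_output, n):
    """Sort hypotheses by recognition output probability (score), high to
    low, and keep the top-N as candidate command word recognition results."""
    ranked = sorted(decoder_output, key=lambda hyp: hyp[1], reverse=True)
    return ranked[:n]

# Hypothetical decoder output (log-domain scores):
candidates = select_nbest(
    [("previous channel", -3.2), ("next channel", -5.1),
     ("previous channels", -4.0), ("previous frequency", -6.3)],
    n=3)
```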
In some embodiments, the recognition output probability of each candidate command word recognition result can be obtained, in the log domain, from the acoustic posterior probability and the language model n-gram probability.
Taking the acoustic posterior probability and the language model n-gram probability as an example, they can be weighted and combined to obtain the recognition output probability of the command word recognition result, for example using the following formula (1).
score = amscore/amscale + lmscore/lmscale; (1)
where score is the recognition output probability of the command word recognition result, amscore is the acoustic posterior probability, amscale is the weight factor of the acoustic model, lmscore is the language model n-gram probability, and lmscale is the weight factor of the language model.
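Formula (1) is a simple weighted combination and can be sketched directly. The numeric values below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Illustrative computation of formula (1): the recognition output probability
# combines the acoustic and language model scores with weight factors.
def recognition_output_probability(amscore, amscale, lmscore, lmscale):
    # score = amscore/amscale + lmscore/lmscale
    return amscore / amscale + lmscore / lmscale

# Hypothetical log-domain scores and scales:
score = recognition_output_probability(amscore=-12.0, amscale=3.0,
                                       lmscore=-4.0, lmscale=2.0)
# -12/3 + -4/2 = -6.0
```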
In a specific implementation, the specific value of N may be configured according to actual requirements, which is not limited herein.
Fig. 2 is a flowchart of an embodiment of step 12, and in an embodiment of step 12, the confidence of each prefix word may be counted in the following manner, which may specifically include the following steps 121 to 123.
Step 121, calculating the occurrence probability of the prefix word of each candidate command word recognition result in the prefix words of the N candidate command word recognition results according to the recognition output probability of each candidate command word recognition result.
The prefix word may be the first word of the sentence in a candidate command word recognition result, or a specified number of leading characters.
In a specific implementation, the recognition output probability (also referred to as a joint score) of each candidate command word recognition result may be taken as the occurrence probability of its prefix word. The occurrence probability of the prefix word of each candidate among the prefix words of the N candidates is then calculated from that probability and the sum of the occurrence probabilities of the prefix words of all N candidates.
Specifically, the occurrence probability of the prefix word of each candidate command word recognition result among the prefix words of the N candidate command word recognition results may be calculated using the following formula (2).
P_j = exp(score_j) / sum(exp(score_j)); (2)
where P_j is the occurrence probability of the prefix word of the j-th candidate command word recognition result among the prefix words of the N candidate command word recognition results; exp(score_j) is the occurrence probability of the prefix word of the j-th candidate command word recognition result, with 1 ≤ j ≤ N; and sum(exp(score_j)) is the sum of the occurrence probabilities of the prefix words of the N candidate command word recognition results.
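Formula (2) is a softmax over the candidates' log-domain scores. The sketch below illustrates it with hypothetical scores; the function name is not from the patent.

```python
import math

# Formula (2): normalize the recognition output probabilities of the N
# candidates with a softmax, giving each prefix word's occurrence probability.
def prefix_occurrence_probabilities(scores):
    """scores: list of log-domain recognition output probabilities score_j."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidates:
probs = prefix_occurrence_probabilities([-6.0, -6.5, -8.0])
# The probabilities sum to 1; the highest-scoring candidate gets the
# largest share.
```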
Step 122, performing alignment operation on the prefix words of the N candidate command word recognition results to obtain an alignment result, where the alignment result is used to represent whether the prefix words of the candidate command word recognition results are the same.
In some embodiments, a confusion-network construction algorithm may be employed: a confusion network is constructed based on the word graph (Lattice). When the WFST is constructed, operations such as determinization and minimization may be performed, which can cause some words to appear earlier; the Lattice therefore needs to be pruned to facilitate a fast alignment operation.
In other embodiments, a dynamic programming algorithm performs the alignment operation on the prefix words of the N candidate command word recognition results. Alignment based on dynamic programming is generally performed using edit distance.
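A standard edit-distance computation over token sequences is one way to realize the dynamic-programming alignment mentioned above. The patent does not specify the exact variant; this is a textbook Levenshtein sketch for illustration.

```python
# Minimal dynamic-programming (edit-distance) alignment over two token
# sequences; a distance of 0 means the sequences align exactly.
def edit_distance(a, b):
    # dp[i][j] = minimum edits to turn a[:i] into b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(a)][len(b)]
```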
Some misalignment may occur during the alignment of prefix words. To improve alignment accuracy, a secondary alignment may be performed as follows.
Specifically, when aligning the prefix words of the N candidate command word recognition results, in response to the prefix word of a first candidate being contained by the prefix word of a second candidate, the prefix word of the first candidate and the words after it are taken as a first phrase, the prefix word of the second candidate and the words after it are taken as a second phrase, and the first phrase is aligned with the second phrase; in response to the first phrase and the second phrase being completely aligned, the prefix word of the first candidate command word recognition result is judged to be identical to the prefix word of the second candidate command word recognition result.
For example, suppose the prefix word of the first candidate command word recognition result is "previous" and the prefix word of the second candidate command word recognition result is "previous channel", so that the first prefix word is contained by the second. Taking the prefix word of the first candidate together with the words after it gives the first phrase "previous channel", and taking the prefix word of the second candidate together with the words after it gives the second phrase "previous channel". The two phrases are then aligned; since the first phrase "previous channel" is completely aligned with the second phrase "previous channel", the prefix word of the first candidate is judged to be identical to the prefix word of the second candidate. It should be noted that the foregoing example is provided merely for understanding and is not intended to limit the scope of the present invention.
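The secondary alignment can be sketched as a containment check followed by a full-phrase comparison. This is a simplified illustration under the assumption that each candidate is a single string and its phrase is the whole candidate; names are not from the patent.

```python
# Sketch of the secondary alignment: when one prefix word is contained in
# the other, compare the full phrases (prefix word plus the words after it)
# instead of the bare prefix words.
def prefixes_match(cand1, cand2, prefix1, prefix2):
    """cand1/cand2: full candidate strings; prefix1/prefix2: their prefix words."""
    if prefix1 in prefix2 or prefix2 in prefix1:
        # First and second phrase are the prefix word plus the following
        # words, i.e. the whole candidate strings here; judge identity by
        # complete alignment of the phrases.
        return cand1 == cand2
    return prefix1 == prefix2

# "previous" is contained in "previous channel"; the full phrases align,
# so the two prefix words are judged identical.
same = prefixes_match("previous channel", "previous channel",
                      "previous", "previous channel")
```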
Step 123, accumulating the occurrence probabilities of identical prefix words according to the alignment result and the occurrence probabilities of the prefix words of each candidate command word recognition result, and obtaining the confidence of each prefix word based on the accumulated result.
That is, the occurrence probabilities of the same prefix word are accumulated, and the accumulated result is used as the confidence of the prefix word.
For example, the prefix word of the candidate command word recognition result ranked first by probability, "previous channel", is "previous", with occurrence probability P1; the prefix word of the second-ranked candidate "next channel" is "next", with occurrence probability P2; the prefix word of the third-ranked candidate "previous channel" (an alternative hypothesis) is "previous", with occurrence probability P3; and the prefix word of the fourth-ranked candidate "previous frequency" is "previous", with occurrence probability P4. The confidence of the prefix word "previous" is then P1+P3+P4, and the confidence of the prefix word "next" is P2.
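The accumulation in step 123 can be sketched as a simple sum over identical prefix words. The probability values below are hypothetical placeholders for P1 to P4.

```python
from collections import defaultdict

# Accumulate the occurrence probabilities of identical prefix words; the
# accumulated sum is that prefix word's confidence. Illustrative names only.
def prefix_confidences(candidates):
    """candidates: list of (prefix_word, occurrence_probability) pairs."""
    conf = defaultdict(float)
    for prefix, prob in candidates:
        conf[prefix] += prob
    return dict(conf)

# Hypothetical P1..P4 matching the worked example above:
conf = prefix_confidences([("previous", 0.4), ("next", 0.2),
                           ("previous", 0.25), ("previous", 0.15)])
# conf["previous"] ≈ 0.4 + 0.25 + 0.15 = 0.8; conf["next"] == 0.2
```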
In a specific implementation of step 13, the threshold includes: a set threshold, or the maximum occurrence probability among the prefix words of the N candidate command word recognition results.
In some embodiments, the N candidate command word recognition results are rejected in response to the prefix word with the highest confidence level being less than a set threshold.
In other embodiments, the N candidate command word recognition results are rejected in response to the highest prefix-word confidence being less than the maximum occurrence probability among the prefix words of the N candidates. In general, the more often a prefix word appears among the candidate recognition results and the greater its confidence, the more likely the candidates are correct. Conversely, if the prefix word with the highest confidence is smaller than the maximum occurrence probability of the prefix words of the N candidates, the ambiguity of the N candidates obtained this time is large, that is, the probability that they are misrecognized is high. Rejecting the N candidate command word recognition results then screens out false recognition and improves the probability that the final command word recognition result is correct.
In still other embodiments, in response to the prefix word with the highest confidence level being greater than or equal to a threshold value, a correct recognition result is selected from the N candidate command word recognition results.
For example, among the N candidate command word recognition results, the candidate whose prefix word is the same as the prefix word with the highest confidence and whose prefix word has the highest occurrence probability is selected as the correct command word recognition result corresponding to the command word voice data.
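The rejection and selection logic of step 13 can be sketched end to end. This is a minimal illustration under the set-threshold variant; data structures and names are assumptions, not the patent's API.

```python
# Sketch of step 13: reject all N candidates when the highest prefix-word
# confidence falls below the threshold; otherwise pick, among candidates
# sharing the most-confident prefix word, the one whose prefix word has the
# highest occurrence probability. Illustrative names only.
def decide(candidates, confidences, threshold):
    """candidates: list of (text, prefix_word, occurrence_probability);
    confidences: dict mapping prefix word to accumulated confidence."""
    best_prefix = max(confidences, key=confidences.get)
    if confidences[best_prefix] < threshold:
        return None  # reject the N candidate recognition results
    matching = [c for c in candidates if c[1] == best_prefix]
    return max(matching, key=lambda c: c[2])[0]

# Hypothetical candidates and confidences:
result = decide(
    [("previous channel", "previous", 0.4),
     ("next channel", "next", 0.2),
     ("previous frequency", "previous", 0.15)],
    {"previous": 0.55, "next": 0.2},
    threshold=0.5)
# "previous" has the highest confidence (0.55 >= 0.5); among its candidates,
# "previous channel" has the highest prefix occurrence probability.
```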
The embodiment of the invention also provides a command word recognition device which can be used for realizing the command word recognition method.
Referring to fig. 3, the command word recognition device 30 may include: a voice recognition unit 31, configured to perform voice recognition on the command word voice data to obtain N candidate command word recognition results, where N is a positive integer greater than 1; a confidence determining unit 32, configured to count the confidence of each prefix word according to the occurrence probabilities of the prefix words of the N candidate command word recognition results; and the rejecting unit 33 is configured to reject the N candidate command word recognition results in response to the prefix word with the highest confidence level being smaller than a threshold.
In a specific implementation, the command word recognition device 30 may correspond to a chip with a command word recognition function in a terminal, such as an SOC (System-On-a-Chip); to a chip module including such a chip; to a chip module including a chip with a data processing function; or to a terminal.
In specific implementation, the specific working principle and workflow of the command word recognition device 30 may be referred to the description of the command word recognition method in the above embodiment, and will not be repeated here.
The embodiment of the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the command word recognition method provided by any of the above embodiments of the present invention.
The computer readable storage medium may include non-volatile memory (non-volatile) or non-transitory memory, and may also include optical disks, mechanical hard disks, solid state disks, and the like.
Specifically, in the embodiment of the present invention, the processor may be a central processing unit (central processing unit, abbreviated as CPU), and the processor may also be other general purpose processors, digital signal processors (digital signal processor, abbreviated as DSP), application specific integrated circuits (application specific integrated circuit, abbreviated as ASIC), field programmable gate arrays (field programmable gate array, abbreviated as FPGA), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of the command word recognition method provided by any embodiment when running the computer program.
The memory is coupled to the processor and may be located within the terminal or external to the terminal. The memory and the processor may be connected by a communication bus.
The terminal can include, but is not limited to, a mobile phone, a computer, a tablet computer and other terminal equipment, and can also be a server, a cloud platform and the like.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer program may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, by wired or wireless means from one website, computer, server, or data center to another.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and system may be implemented in other manners. For example, the device embodiments described above are merely illustrative: the division of the units is only one logical function division, and other division manners may be adopted in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically separately, or two or more units may be integrated in one unit. The integrated unit may be implemented in hardware or as a hardware-plus-software functional unit. For a device or product applied to or integrated on a chip, each module/unit it contains may be implemented in hardware such as a circuit, or at least some of the modules/units may be implemented as a software program running on a processor integrated inside the chip, with the remaining (if any) modules/units implemented in hardware such as a circuit. For a device or product applied to or integrated in a chip module, each module/unit it contains may be implemented in hardware such as a circuit, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the chip module; alternatively, at least some of the modules/units may be implemented as a software program running on a processor integrated inside the chip module, with the remaining (if any) modules/units implemented in hardware such as a circuit. For a device or product applied to or integrated in a terminal, each module/unit it contains may be implemented in hardware such as a circuit, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components within the terminal; alternatively, at least some of the modules/units may be implemented as a software program running on a processor integrated inside the terminal, with the remaining (if any) modules/units implemented in hardware such as a circuit.
It should be understood that the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In this context, the character "/" indicates an "or" relationship between the associated objects.
The term "plurality" as used in the embodiments herein refers to two or more.
The terms "first", "second", and the like in the embodiments of the present application are only used to distinguish the objects being described and do not imply any order; nor do they particularly limit the number of devices in the embodiments of the present application, and they should not be construed as limiting the embodiments of the present application.
It should be noted that the serial numbers of the steps in the present embodiment do not represent a limitation on the execution sequence of the steps.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention should therefore be determined by the appended claims.

Claims (9)

1. A command word recognition method, comprising:
performing voice recognition on the command word voice data to obtain N candidate command word recognition results, wherein N is a positive integer greater than 1;
counting the confidence of each prefix word according to the occurrence probabilities of the prefix words of the N candidate command word recognition results;
and rejecting the N candidate command word recognition results in response to the highest confidence among the prefix words being smaller than a threshold value.
2. The command word recognition method according to claim 1, wherein the counting the confidence of each prefix word according to the occurrence probabilities of the prefix words of the N candidate command word recognition results comprises:
calculating the occurrence probability of the prefix word of each candidate command word recognition result in the prefix words of the N candidate command word recognition results according to the recognition output probability of each candidate command word recognition result;
performing alignment operation on prefix words of the N candidate command word recognition results to obtain alignment results, wherein the alignment results are used for representing whether the prefix words of the candidate command word recognition results are identical;
and accumulating the occurrence probabilities of identical prefix words according to the alignment result and the occurrence probability of the prefix word of each candidate command word recognition result among the prefix words of the N candidate command word recognition results, and obtaining the confidence of each prefix word based on the accumulation result.
3. The method for recognizing command words according to claim 2, wherein said aligning prefix words of N candidate command word recognition results comprises:
when aligning the prefix words of the N candidate command word recognition results, in response to the prefix word of a first candidate command word recognition result being contained in the prefix word of a second candidate command word recognition result, taking the prefix word of the first candidate command word recognition result and the word following it as a first phrase, taking the prefix word of the second candidate command word recognition result and the word following it as a second phrase, and aligning the first phrase with the second phrase;
and in response to the first phrase and the second phrase being completely aligned, judging that the prefix word of the first candidate command word recognition result is identical to the prefix word of the second candidate command word recognition result.
4. The method for recognizing command words according to claim 2, wherein the aligning prefix words of the N candidate command word recognition results comprises:
and performing the alignment operation on the prefix words of the N candidate command word recognition results by adopting a confusion network construction algorithm or a dynamic programming algorithm.
5. The method for recognizing command words according to claim 2, wherein calculating the occurrence probability of the prefix word of each candidate command word recognition result in the prefix words of the N candidate command word recognition results based on the recognition output probability of each candidate command word recognition result comprises:
calculating the occurrence probability of the prefix word of each candidate command word recognition result in the prefix words of the N candidate command word recognition results by adopting the following formula;
P_j = exp(score_j) / sum(exp(score_j));
wherein P_j is the occurrence probability of the prefix word of the j-th candidate command word recognition result among the prefix words of the N candidate command word recognition results; exp(score_j) is the occurrence probability of the prefix word of the j-th candidate command word recognition result, where 1 ≤ j ≤ N; and sum(exp(score_j)) is the sum of the occurrence probabilities of the prefix words of the N candidate command word recognition results.
6. The command word recognition method of claim 1, wherein the threshold comprises: a set threshold value, or the maximum occurrence probability among the prefix words of the N candidate command word recognition results.
7. A command word recognition apparatus, comprising:
the voice recognition unit is used for carrying out voice recognition on the command word voice data to obtain N candidate command word recognition results, wherein N is a positive integer greater than 1;
the confidence determining unit is used for counting the confidence of each prefix word according to the occurrence probabilities of the prefix words of the N candidate command word recognition results;
and the rejecting unit is used for rejecting the N candidate command word recognition results in response to the highest confidence among the prefix words being smaller than a threshold value.
8. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when run by a processor performs the steps of the command word recognition method according to any of claims 1 to 6.
9. A terminal comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor executes the steps of the command word recognition method according to any of claims 1 to 6 when the computer program is executed.
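The containment-based alignment test of claim 3 can be illustrated with a short sketch; the token-list representation and the concatenation-based phrase comparison are assumptions made for this illustration, not the claimed implementation:

```python
def prefixes_identical(words1, words2):
    """words1/words2: token lists of two candidate recognition results.

    If the prefix word of the first result is contained in the prefix
    word of the second, form a phrase from each prefix word plus the
    word that follows it, and judge the prefix words identical only
    when the two phrases align completely (the test of claim 3)."""
    p1, p2 = words1[0], words2[0]
    if p1 == p2:
        return True
    if p1 in p2:
        # Segmentation ambiguity: e.g. "ab"+"cd" and "abc"+"d" both
        # spell "abcd", so their prefix words are treated as identical.
        return "".join(words1[:2]) == "".join(words2[:2])
    return False
```

This captures the case where two N-best hypotheses segment the same utterance differently, so a naive first-token comparison would wrongly treat their prefix words as distinct.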
CN202311517838.3A 2023-11-14 2023-11-14 Command word recognition method and device, computer readable storage medium and terminal Pending CN117542347A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311517838.3A CN117542347A (en) 2023-11-14 2023-11-14 Command word recognition method and device, computer readable storage medium and terminal


Publications (1)

Publication Number Publication Date
CN117542347A true CN117542347A (en) 2024-02-09

Family

ID=89795230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311517838.3A Pending CN117542347A (en) 2023-11-14 2023-11-14 Command word recognition method and device, computer readable storage medium and terminal

Country Status (1)

Country Link
CN (1) CN117542347A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination