CN112669848A - Offline voice recognition method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112669848A (CN202011474804.7A)
- Authority
- CN
- China
- Prior art keywords
- command
- words
- processed
- word
- voiceprint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The embodiments of the present application disclose an offline speech recognition method and apparatus, an electronic device, and a storage medium. The method comprises: selecting a command word to be processed from at least two command words according to the number of stressed syllables of the at least two command words; determining interfering words of the command word to be processed from candidate words according to the voiceprint features of the command word to be processed and the voiceprint features of the candidate words; and optimizing an offline speech recognition model according to the at least two command words and the interfering words of the command word to be processed. This improves the accuracy of single-microphone offline speech recognition, which in turn improves product stability and reliability and the user experience.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an offline speech recognition method and apparatus, an electronic device, and a storage medium.
Background
Many household devices and toys require an offline speech recognition function, and for cost-performance reasons a single-microphone offline speech scheme is usually chosen for development and design. However, because a single-microphone offline speech scheme lacks features such as noise reduction and interference suppression, misrecognition often occurs. This causes devices with an offline speech recognition function, such as robots, to operate incorrectly, makes their functions unstable, and degrades the user experience.
Disclosure of Invention
The application provides an offline voice recognition method, an offline voice recognition device, electronic equipment and a storage medium, so that the accuracy of single-microphone offline voice recognition is improved.
In a first aspect, an embodiment of the present application provides an offline speech recognition method, where the method includes:
selecting a command word to be processed from at least two command words according to the number of stressed syllables of the at least two command words;
determining interfering words of the command word to be processed from candidate words according to the voiceprint features of the command word to be processed and the voiceprint features of the candidate words;
and optimizing an offline speech recognition model according to the at least two command words and the interfering words of the command word to be processed.
In a second aspect, an embodiment of the present application provides an offline speech recognition apparatus, including:
a command word selection module, configured to select a command word to be processed from at least two command words according to the number of stressed syllables of the at least two command words;
an interfering word determination module, configured to determine interfering words of the command word to be processed from candidate words according to the voiceprint features of the command word to be processed and the voiceprint features of the candidate words;
and a model optimization module, configured to optimize an offline speech recognition model according to the at least two command words and the interfering words of the command word to be processed.
In a third aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the offline speech recognition method of any embodiment of the present application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements an offline speech recognition method according to any of the embodiments of the present application.
According to the technical solution of the embodiments of the present application, the number of stressed syllables is introduced to select a command word to be processed from at least two command words; interfering words of the command word to be processed are determined from candidate words according to the voiceprint features of the command word to be processed and the voiceprint features of the candidate words; and the offline speech recognition model is optimized according to the at least two command words and the interfering words of the command word to be processed. This technical solution improves the accuracy of single-microphone offline speech recognition, which in turn improves product stability and reliability and the user experience.
Drawings
Fig. 1 is a flowchart of an offline speech recognition method according to a first embodiment of the present application;
Fig. 2 is a flowchart of an offline speech recognition method according to a second embodiment of the present application;
Fig. 3 is a flowchart of an offline speech recognition method according to a third embodiment of the present application;
Fig. 4 is a schematic structural diagram of an offline speech recognition apparatus according to a fourth embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
The embodiments of the present application will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the embodiments of the application and are not limiting of the application. It should be further noted that, for convenience of description, only some structures related to the embodiments of the present application are shown in the drawings, not all of the structures are shown.
Example one
Fig. 1 is a flowchart of an offline speech recognition method according to a first embodiment of the present application. This embodiment is applicable to improving the accuracy of single-microphone offline speech recognition. The method can be executed by the offline speech recognition apparatus provided by the embodiments of the present application; the apparatus can be implemented in hardware and/or software and can be integrated into an electronic device that carries an offline speech recognition function, such as a robot. Referring to Fig. 1, the method may specifically include:
S110, selecting a command word to be processed from the at least two command words according to the number of stressed syllables of the at least two command words.
The command words are words used to wake up and control, by voice, an electronic device having an offline speech recognition function, such as "cup", "wrap around", "go forward", "go backward", "stop", "Show time", and "Listen to me". A stressed syllable is a syllable in a word that is pronounced with relatively more force or prominence.
In actual interaction scenarios, words with few stressed syllables are easily misrecognized. To improve the accuracy of offline speech recognition, this embodiment introduces the number of stressed syllables and selects, from the command words, those with relatively few stressed syllables for processing. That is, a command word to be processed is a command word that has few stressed syllables and therefore needs to be processed. Optionally, there may be one or more command words to be processed.
Optionally, this embodiment may determine the number of stressed syllables of each command word with a command-word stressed-syllable generator, and then select the command words to be processed from the command words according to those numbers. For example, for each command word, if the number of stressed syllables of the command word is less than the set number, the command word is taken as a command word to be processed. The set number is chosen according to the actual situation and can be adjusted flexibly. For example, if the set number is 1 and the number of stressed syllables of "stop" in the above example is 1, "stop" is determined as a command word to be processed.
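The selection rule of S110 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the stressed-syllable generator is not modeled, so the counts are passed in as a precomputed mapping; the counts other than that of "stop" are illustrative only; and the `inclusive` flag is a hypothetical addition, because the text states a strict "less than" rule while its example (set number 1, "stop" with 1 stressed syllable) reads as if an equal count also qualifies.

```python
from typing import Dict, List


def select_pending_words(stress_counts: Dict[str, int],
                         set_number: int,
                         inclusive: bool = False) -> List[str]:
    """Return the command words whose stressed-syllable count is small.

    stress_counts: hypothetical output of the stressed-syllable generator,
                   mapping each command word to its stressed-syllable count.
    set_number:    the tunable "set number" from the text.
    inclusive:     assumption, not from the source; when True, a count equal
                   to set_number also qualifies.
    """
    if len(stress_counts) < 2:
        raise ValueError("the method expects at least two command words")
    selected = []
    for word, count in stress_counts.items():
        if count < set_number or (inclusive and count == set_number):
            selected.append(word)
    return selected


# Illustrative counts only; the source gives a count for "stop" alone.
counts = {"stop": 1, "go forward": 2, "go backward": 2, "Listen to me": 3}
print(select_pending_words(counts, set_number=2))
print(select_pending_words(counts, set_number=1, inclusive=True))
```

Both calls select only "stop" for these illustrative counts; with the strict rule and a set number of 1, nothing would be selected.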
S120, determining interfering words of the command word to be processed from the candidate words according to the voiceprint features of the command word to be processed and the voiceprint features of the candidate words.
A voiceprint feature is a sound-wave spectrum, displayed by an electroacoustic instrument, that carries speech information. Different words have different voiceprint features. The candidate words are one or more words selected from a pre-constructed word library; in this embodiment there are preferably multiple candidate words. Optionally, the candidate words may be selected by taking words in the word library whose voiceprints are similar to that of the command word to be processed but whose meanings differ. The interfering words are one or more of the candidate words. For example, an interfering word may be the candidate word whose voiceprint features are most similar to those of the command word to be processed, or a candidate word whose voiceprint similarity to the command word to be processed is greater than a set value.
Optionally, according to the voiceprint feature of the command word to be processed and the voiceprint feature of the candidate word, determining the interfering word of the command word to be processed from the candidate word may be: determining the voiceprint similarity between the voiceprint features of the command words to be processed and the voiceprint features of the candidate words; and determining interference words of the command words to be processed from the candidate words according to the voiceprint similarity. In this embodiment, the voiceprint similarity between the voiceprint feature of the command word to be processed and the voiceprint feature of the candidate word may be characterized by the cosine similarity between the voiceprint feature of the command word to be processed and the voiceprint feature of the candidate word.
Specifically, for each candidate word, the voiceprint similarity between the voiceprint features of the command word to be processed and the voiceprint features of that word is calculated; if the voiceprint similarity falls within a preset numerical range, the word is taken as an interfering word of the command word to be processed. The preset numerical range is set according to the actual situation. For example, if the command word to be processed is "stop", and the candidate word "store" has a voiceprint similar to that of "stop" but a different meaning, then "store" is used as the interfering word for "stop".
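The per-candidate comparison described above can be sketched as follows, assuming each voiceprint feature has already been extracted as a fixed-length numeric vector. The feature extraction itself, the function names, the example vectors, and the range bounds are all hypothetical; only the use of cosine similarity and of a preset numerical range comes from the text.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def find_interfering_words(command_vec: Sequence[float],
                           candidates: Dict[str, Sequence[float]],
                           low: float,
                           high: float) -> List[str]:
    """Keep the candidates whose similarity to the command word lies in
    the preset range [low, high]."""
    return [word for word, vec in candidates.items()
            if low <= cosine_similarity(command_vec, vec) <= high]


# Toy vectors standing in for real voiceprint features (illustrative only).
stop_vec = [1.0, 0.0, 1.0]
candidate_vecs = {"store": [1.0, 0.2, 0.9], "unrelated": [0.0, 1.0, 0.0]}
print(find_interfering_words(stop_vec, candidate_vecs, low=0.8, high=0.99))
```

An upper bound is included in the range so that a candidate that is nearly indistinguishable from the command word can be excluded, in line with the later concern about interfering words that are too similar.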
S130, optimizing the off-line voice recognition model according to the at least two command words and the interference words of the command words to be processed.
Offline speech recognition is a technology by which a machine converts a speech signal into corresponding text or a command through recognition and understanding, and the offline speech recognition model is the recognition strategy used to implement offline speech recognition.
In this embodiment, the at least two command words and the interfering words of the command word to be processed are introduced into the offline speech recognition model, and the model is continuously optimized through actual speech tests. Command words whose recognition accuracy does not reach the preset accuracy are optimized; the specific optimization process is described in the subsequent embodiments.
It should be noted that, in the actual test and optimization process, two voiceprints are judged to be highly similar whenever their similarity is higher than 80%. The command word with the highest similarity is then selected from the highly similar command words as the final recognition result. Suppose the offline voice module receives an interfering voiceprint whose similarity to "stop" is 85.6%. Before the interfering word is added, "stop" is the most similar entry, so the machine mistakenly concludes that "stop" was recognized. After the interfering word "store" is added, the interfering voiceprint matches "store" better, at 90.6%, so the recognition result is "store". No action or feedback is configured for "store"; the machine acts or gives feedback only when "stop" is recognized, which reduces the probability of misoperation or erroneous feedback.
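The decision rule in this paragraph can be sketched as follows. It is an illustration only: the function name and the representation of the model's output as a word-to-similarity mapping are assumptions, while the 80% threshold and the 85.6% and 90.6% figures are the ones quoted in the text.

```python
from typing import Dict, Optional, Set

HIGH_SIMILARITY = 0.80  # "highly similar" threshold quoted in the text


def recognize(similarities: Dict[str, float],
              interfering_words: Set[str],
              threshold: float = HIGH_SIMILARITY) -> Optional[str]:
    """Return the command word to act on, or None when no action is due.

    similarities: similarity of the received voiceprint to every word in
                  the model (command words and interfering words alike).
    """
    highly_similar = {w: s for w, s in similarities.items() if s > threshold}
    if not highly_similar:
        return None  # nothing is similar enough
    best = max(highly_similar, key=lambda w: highly_similar[w])
    # An interfering word absorbs the match, so the machine stays idle.
    return None if best in interfering_words else best


# Before the interfering word is added, the noise is taken for "stop".
print(recognize({"stop": 0.856}, interfering_words=set()))
# After "store" is added, it wins at 90.6% and no action is triggered.
print(recognize({"stop": 0.856, "store": 0.906}, interfering_words={"store"}))
```

The first call returns "stop" and the second returns None, which mirrors the reduction in false triggers described above.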
According to the technical solution of the embodiments of the present application, the number of stressed syllables is introduced to select a command word to be processed from at least two command words; interfering words of the command word to be processed are determined from candidate words according to the voiceprint features of the command word to be processed and the voiceprint features of the candidate words; and the offline speech recognition model is optimized according to the at least two command words and the interfering words of the command word to be processed. This technical solution improves the accuracy of single-microphone offline speech recognition, which in turn improves product stability and reliability and the user experience.
Example two
Fig. 2 is a flowchart of an offline speech recognition method according to a second embodiment of the present application. This embodiment refines the above embodiment: before the operation of "selecting a command word to be processed from at least two command words according to the number of stressed syllables of the at least two command words", the command words themselves are optimized, so as to avoid machine misrecognition caused by "retroflex words" and "soft words".
As shown in fig. 2, the method specifically includes:
S210, determining a first compliant command word and a non-compliant command word from the at least two command words according to pinyin features and/or voiceprint features.
A non-compliant command word is a command word that contains a "retroflex word" and/or a "soft word", such as "cup" or "wrap around". A compliant command word is a command word that contains neither a "retroflex word" nor a "soft word".
In this embodiment, the first compliant command words and the non-compliant command words are identified among the at least two command words according to the pinyin features and/or voiceprint features of those command words. For example, among the command words "cup", "wrap around", "go forward", "go backward", "stop", "Show time", and "Listen to me" above, "cup" and "wrap around" are non-compliant command words, while "go forward", "go backward", "stop", "Show time", and "Listen to me" are first compliant command words.
S220, performing synonymous conversion on the non-compliant command word to obtain a second compliant command word associated with the non-compliant command word.
In this embodiment, the second compliance command word is a word having the same meaning as the non-compliance command word, and the second compliance command word does not include the "retroflex word" and/or the "soft word".
During offline speech recognition, non-compliant command words, such as "soft words", can cause machine misrecognition. Similarly, a command word with a retroflex sound is not pronounced distinctly enough, so its voiceprint features are not distinct, and machine misrecognition easily occurs. For example, when the machine receives "quilt" during offline speech recognition, it may recognize it as "cup". To avoid this problem, this embodiment performs a synonymous conversion on the non-compliant command word to obtain a second compliant command word associated with it. For example, "cup" is changed to "paper cup", which prevents the machine from recognizing "cup" when the received speech is "quilt".
S230, combining the first compliance command word and the second compliance command word into at least two new command words.
In this embodiment, the first compliant command words and the second compliant command words are combined into a new set of at least two command words, which are the optimized command words actually used. For example, the command words in the above example are optimized as follows: paper cup, turn, go forward, go backward, stop, store, Show time, Listen to me.
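Steps S210 to S230 can be sketched as follows. This is a minimal illustration under stated assumptions: the detection of "retroflex words" and "soft words" from pinyin or voiceprint features is not modeled, so the set of non-compliant words is supplied by the caller, and the synonym table is a hypothetical lookup containing only the "cup" to "paper cup" conversion given in the text.

```python
from typing import Dict, List, Set, Tuple


def split_by_compliance(words: List[str],
                        non_compliant: Set[str]) -> Tuple[List[str], List[str]]:
    """S210: separate first compliant words from non-compliant words."""
    first_compliant = [w for w in words if w not in non_compliant]
    flagged = [w for w in words if w in non_compliant]
    return first_compliant, flagged


def rebuild_command_words(words: List[str],
                          non_compliant: Set[str],
                          synonyms: Dict[str, str]) -> List[str]:
    """S220 and S230: convert flagged words and merge the two groups."""
    first_compliant, flagged = split_by_compliance(words, non_compliant)
    second_compliant = []
    for word in flagged:
        if word not in synonyms:
            raise KeyError(f"no compliant synonym known for {word!r}")
        second_compliant.append(synonyms[word])
    return second_compliant + first_compliant


words = ["cup", "go forward", "go backward", "stop", "Show time", "Listen to me"]
new_words = rebuild_command_words(words, non_compliant={"cup"},
                                  synonyms={"cup": "paper cup"})
print(new_words)
```

Raising an error for a flagged word without a known synonym is a design choice of this sketch; it keeps a non-compliant word from silently reaching the model.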
S240, selecting a command word to be processed from the at least two command words according to the number of stressed syllables of the at least two command words.
S250, determining interfering words of the command word to be processed from the candidate words according to the voiceprint features of the command word to be processed and the voiceprint features of the candidate words.
S260, optimizing the offline speech recognition model according to the at least two command words and the interfering words of the command word to be processed.
According to this technical solution, the command words are optimized before the command word to be processed is determined: non-compliant command words are converted into synonymous compliant command words. This avoids the loss of offline speech recognition accuracy caused by command words containing "retroflex words" and/or "soft words", further improves the offline speech recognition accuracy, improves product stability and reliability, and improves the user experience.
Example three
Fig. 3 is a flowchart of an offline speech recognition method according to a third embodiment of the present application. This embodiment refines the above embodiments by adding, after the offline speech recognition model is optimized, a process for optimizing the command words and the interfering words of the command words.
As shown in fig. 3, the method may specifically include:
S310, selecting a command word to be processed from the at least two command words according to the number of stressed syllables of the at least two command words.
S320, determining interfering words of the command word to be processed from the candidate words according to the voiceprint features of the command word to be processed and the voiceprint features of the candidate words.
S330, optimizing the offline speech recognition model according to the at least two command words and the interfering words of the command word to be processed.
S340, if the recognition accuracy of the offline voice recognition model is lower than a set value, optimizing at least two command words and/or interference words of the command words to be processed.
When the offline speech recognition model is actually tested, its recognition accuracy may be low for two reasons: the stressed syllables of a command word may not be distinct enough, or an interfering word may be so similar to its command word that the machine recognizes the command word as the interfering word and does not respond. To solve this problem, in this embodiment, if the recognition accuracy of the offline speech recognition model is found during offline speech recognition to be lower than a set value, the at least two command words and/or the interfering words of the command word to be processed are optimized. The set value is chosen according to the actual situation, for example 95%.
Optionally, the interfering words of the command word to be processed may be optimized as follows: if the voiceprint similarity between the command word to be processed and its interfering word is greater than a set threshold, the interfering word of the command word to be processed is re-determined from the candidate words. The threshold is set according to the actual situation, for example 95%.
Specifically, calculating the voiceprint similarity between the command words to be processed and the interference words of the command words to be processed; and if the voiceprint similarity between the command word to be processed and the interference word of the command word to be processed is greater than a set threshold value, re-determining the interference word of the command word to be processed from the candidate words according to the voiceprint characteristics of the command word to be processed and the voiceprint characteristics of the words in the candidate words.
Optionally, a command word may be optimized through synonymous conversion, by selecting from the candidate words another word with the same meaning to replace the current command word. Alternatively, a new command word with the same meaning may be generated by collecting a human voice and performing speech synthesis (that is, synthesizing the voiceprint of the command word with the voiceprint of the collected human voice).
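The check in S340 and the re-determination of an interfering word can be sketched as follows. This is an illustration under stated assumptions: accuracy is measured as correct recognitions over total test utterances, the similarities of the candidates to the command word are precomputed, and the replacement rule (pick the most similar candidate that does not exceed the threshold) is a hypothetical choice, since the text only says the interfering word is re-determined from the candidate words. The candidate names and similarity values are illustrative.

```python
from typing import Dict, Optional


def needs_optimization(correct: int, total: int,
                       set_value: float = 0.95) -> bool:
    """True when the measured recognition accuracy is below the set value."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total < set_value


def redetermine_interfering_word(current: str,
                                 similarities: Dict[str, float],
                                 threshold: float = 0.95) -> Optional[str]:
    """Keep the current interfering word unless it is too similar to the
    command word; otherwise pick a replacement from the candidates.

    similarities: voiceprint similarity of each candidate (including the
                  current interfering word) to the command word.
    """
    if similarities[current] <= threshold:
        return current
    eligible = {w: s for w, s in similarities.items() if s <= threshold}
    if not eligible:
        return None  # no usable candidate; the caller must widen the search
    return max(eligible, key=lambda w: eligible[w])


sims = {"store": 0.97, "candidate_a": 0.90, "candidate_b": 0.85}
if needs_optimization(correct=90, total=100):
    print(redetermine_interfering_word("store", sims))
```

With these illustrative values the accuracy of 90% is below the 95% set value, and "store" at 97% is replaced by "candidate_a".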
According to this technical solution, the recognition accuracy of the offline speech recognition model is monitored; if it is lower than the set value, the at least two command words and/or the interfering words of the command word to be processed are optimized. This improves the offline speech recognition accuracy, improves product stability and reliability, and improves the user experience.
Example four
Fig. 4 is a schematic structural diagram of an offline speech recognition apparatus according to a fourth embodiment of the present application; the embodiment can be applied to the situation of how to improve the accuracy of single-microphone offline voice recognition. The device can be realized by adopting a hardware and/or software mode, and can be integrated in electronic equipment bearing an offline voice recognition function, such as a robot and the like.
As shown in fig. 4, the apparatus may include a command term selection module 410, an interfering term determination module 420, and a model optimization module 430, wherein,
the command word selection module 410 is configured to select a command word to be processed from the at least two command words according to the number of stressed syllables of the at least two command words;
the interfering word determination module 420 is configured to determine interfering words of the command word to be processed from the candidate words according to the voiceprint features of the command word to be processed and the voiceprint features of the candidate words;
and the model optimization module 430 is configured to optimize the offline speech recognition model according to the at least two command words and the interfering words of the command word to be processed.
According to the technical solution of the embodiments of the present application, the number of stressed syllables is introduced to select a command word to be processed from at least two command words; interfering words of the command word to be processed are determined from candidate words according to the voiceprint features of the command word to be processed and the voiceprint features of the candidate words; and the offline speech recognition model is optimized according to the at least two command words and the interfering words of the command word to be processed. This technical solution improves the accuracy of single-microphone offline speech recognition, which in turn improves product stability and reliability and the user experience.
Further, the apparatus also includes a command word screening module, a second compliant command word determination module, and a new command word determination module, wherein,
the command word screening module is configured to determine a first compliant command word and a non-compliant command word from the at least two command words according to pinyin features and/or voiceprint features;
the second compliant command word determination module is configured to perform synonymous conversion on the non-compliant command word to obtain a second compliant command word associated with the non-compliant command word;
and the new command word determination module is configured to combine the first compliant command word and the second compliant command word into a new set of at least two command words.
Further, the command word selection module 410 is specifically configured to:
for each command word, if the number of stressed syllables of the command word is less than the set number, take the command word as a command word to be processed.
Further, the interfering word determining module 420 includes a voiceprint similarity determining unit and an interfering word determining unit, wherein,
the voiceprint similarity determining unit is used for determining the voiceprint similarity between the voiceprint characteristics of the command words to be processed and the voiceprint characteristics of the candidate words;
and the interference word determining unit is used for determining the interference words of the command words to be processed from the candidate words according to the voiceprint similarity.
Further, the apparatus also includes an optimization module, configured to:
optimize the at least two command words and/or the interfering words of the command word to be processed if the recognition accuracy of the offline speech recognition model is lower than a set value.
Further, the optimization module comprises an interference word optimization unit, and the interference word optimization unit is used for re-determining the interference words of the command words to be processed from the candidate words if the voiceprint similarity between the command words to be processed and the interference words of the command words to be processed is greater than a set threshold.
The offline speech recognition device provided by the above embodiment can execute the offline speech recognition method provided by any embodiment of the present application, and has corresponding functional modules and beneficial effects for executing the method.
Example five
Fig. 5 is a schematic structural diagram of an electronic device provided in the fifth embodiment of the present application, and fig. 5 shows a block diagram of an exemplary electronic device suitable for implementing the embodiments of the present application. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application. Alternatively, the electronic device in this embodiment may typically be an electronic device with an offline voice recognition function, such as a robot.
As shown in FIG. 5, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described herein.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes programs stored in the system memory 28 to perform various functional applications and data processing, for example to implement the offline speech recognition method provided by the embodiments of the present application.
Embodiment Six
A sixth embodiment of the present application further provides a computer-readable storage medium on which a computer program (also referred to as computer-executable instructions) is stored. When executed by a processor, the computer program performs the offline speech recognition method provided in the embodiments of the present application, the method including:
selecting a command word to be processed from at least two command words according to the number of stressed syllables of the at least two command words;
determining interference words of the command words to be processed from candidate words according to the voiceprint features of the command words to be processed and the voiceprint features of the candidate words;
and optimizing the offline speech recognition model according to the at least two command words and the interference words of the command words to be processed.
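The first two steps listed above can be illustrated with a minimal Python sketch. It is not the implementation described in this application. It assumes that stressed-syllable counts are already available per command word, that voiceprint features are fixed-length numeric vectors, that similarity is measured with cosine similarity, and that the most similar candidates are kept as interference words. The function names, the `top_k` parameter, and the sample words and vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def select_words_to_process(stress_counts: Dict[str, int], set_number: int) -> List[str]:
    """Step 1: keep command words with fewer stressed syllables than set_number."""
    return [word for word, count in stress_counts.items() if count < set_number]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_interference_words(
    command_vec: Sequence[float],
    candidate_vecs: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[str]:
    """Step 2 (assumed rule): rank candidates by similarity and keep the top_k."""
    ranked = sorted(
        candidate_vecs,
        key=lambda word: cosine_similarity(command_vec, candidate_vecs[word]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical data: the counts and vectors are made up for illustration only.
stress_counts = {"open the door": 3, "stop": 1}
to_process = select_words_to_process(stress_counts, set_number=2)

command_vec = [1.0, 0.0]  # assumed voiceprint feature of "stop"
candidate_vecs = {"shop": [1.0, 0.1], "lamp": [0.0, 1.0], "stock": [1.0, 1.0]}
interference = pick_interference_words(command_vec, candidate_vecs)

print(to_process)    # ['stop']
print(interference)  # ['shop', 'stock']
```

The third step, optimizing the model with the command words and the interference words, depends on the recognition model in use and is therefore left out of the sketch.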
The computer storage media of the embodiments of the present application may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of embodiments of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing only illustrates the preferred embodiments of the present application and the technical principles employed. Those skilled in the art will understand that the present application is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made by those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in some detail through the above embodiments, it is not limited to those embodiments and may include many other equivalent embodiments without departing from the spirit of the present application; the scope of the present application is determined by the scope of the appended claims.
Claims (10)
1. An off-line speech recognition method, comprising:
selecting a command word to be processed from at least two command words according to the number of stressed syllables of the at least two command words;
determining interference words of the command words to be processed from the candidate words according to the voiceprint features of the command words to be processed and the voiceprint features of the candidate words;
and optimizing the off-line voice recognition model according to the at least two command words and the interference words of the command words to be processed.
2. The method of claim 1, further comprising, prior to selecting a command word to be processed from at least two command words based on the number of stressed syllables of the at least two command words:
determining a first compliant command word and a non-compliant command word from the at least two command words based on pinyin features and/or voiceprint features;
performing synonym conversion on the non-compliant command word to obtain a second compliant command word associated with the non-compliant command word;
combining the first and second compliant command words into a new set of at least two command words.
3. The method of claim 1, wherein selecting a command word to be processed from at least two command words according to their number of stressed syllables comprises:
for each command word, if the number of stressed syllables of the command word is less than a set number, taking the command word as a command word to be processed.
4. The method of claim 1, wherein determining interference words of the command words to be processed from the candidate words according to the voiceprint features of the command words to be processed and the voiceprint features of the candidate words comprises:
determining the voiceprint similarity between the voiceprint features of the command words to be processed and the voiceprint features of the candidate words;
and determining interference words of the command words to be processed from the candidate words according to the voiceprint similarity.
5. The method of claim 1, further comprising, after optimizing the offline speech recognition model according to the at least two command words and the interference words of the command words to be processed:
and if the recognition accuracy rate of the offline voice recognition model is lower than a set value, optimizing the at least two command words and/or the interference words of the command words to be processed.
6. The method of claim 5, wherein optimizing the interference words of the command words to be processed comprises:
and if the voiceprint similarity between the command word to be processed and the interference word of the command word to be processed is greater than a set threshold value, re-determining the interference word of the command word to be processed from the candidate words.
7. An off-line speech recognition apparatus, comprising:
a to-be-processed command word selection module, configured to select command words to be processed from at least two command words according to the number of stressed syllables of the at least two command words;
the interference word determining module is used for determining the interference words of the command words to be processed from the candidate words according to the voiceprint characteristics of the command words to be processed and the voiceprint characteristics of the candidate words;
and the model optimization module is used for optimizing the offline speech recognition model according to the at least two command words and the interference words of the command words to be processed.
8. The apparatus of claim 7, further comprising:
a command word screening module, configured to determine a first compliant command word and a non-compliant command word from the at least two command words according to pinyin features and/or voiceprint features;
a compliant command word determining module, configured to perform synonym conversion on the non-compliant command word to obtain a second compliant command word associated with the non-compliant command word;
a new command word determining module, configured to combine the first and second compliant command words into a new set of at least two command words.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the offline speech recognition method of any of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the offline speech recognition method according to any one of claims 1 to 6.
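The re-determination condition recited in claims 5 and 6 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: voiceprint similarities between the command word and each candidate are taken as precomputed values in [0, 1], and the replacement rule (choose the most similar candidate that does not exceed the threshold) is an assumption, since the claims do not state how the new interference word is chosen. The function name, threshold value, and sample data are hypothetical.

```python
from typing import Dict, Optional


def redetermine_interference_word(
    current: str,
    similarities: Dict[str, float],
    threshold: float,
) -> Optional[str]:
    """Keep `current` unless its similarity to the command word exceeds the threshold.

    When it does, return the most similar candidate whose similarity is at or
    below the threshold (assumed selection rule), or None if no candidate fits.
    """
    if similarities[current] <= threshold:
        return current
    allowed = {word: sim for word, sim in similarities.items() if sim <= threshold}
    if not allowed:
        return None
    return max(allowed, key=allowed.get)


# Hypothetical similarities between one command word and three candidate words.
similarities = {"shop": 0.97, "stock": 0.80, "lamp": 0.50}

print(redetermine_interference_word("shop", similarities, threshold=0.90))   # stock
print(redetermine_interference_word("stock", similarities, threshold=0.90))  # stock
```

In a full loop, this check would run only after the recognition accuracy of the optimized model has been measured and found to be below the set value.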
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011474804.7A CN112669848B (en) | 2020-12-14 | 2020-12-14 | Offline voice recognition method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112669848A (en) | 2021-04-16 |
CN112669848B (en) | 2023-12-01 |
Family
ID=75404477
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011474804.7A Active CN112669848B (en) | 2020-12-14 | 2020-12-14 | Offline voice recognition method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112669848B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1998034217A1 (en) * | 1997-01-30 | 1998-08-06 | Dragon Systems, Inc. | Speech recognition using multiple recognizors |
JP2000020087A (en) * | 1998-06-30 | 2000-01-21 | Brother Ind Ltd | Speech recognition apparatus and memory medium |
US6684185B1 (en) * | 1998-09-04 | 2004-01-27 | Matsushita Electric Industrial Co., Ltd. | Small footprint language and vocabulary independent word recognizer using registration by word spelling |
US20110224982A1 (en) * | 2010-03-12 | 2011-09-15 | c/o Microsoft Corporation | Automatic speech recognition based upon information retrieval methods |
US20150066921A1 (en) * | 2013-08-28 | 2015-03-05 | AV Music Group, LLC | Systems and methods for identifying word phrases based on stress patterns |
US9286888B1 (en) * | 2014-11-13 | 2016-03-15 | Hyundai Motor Company | Speech recognition system and speech recognition method |
CN108831458A (en) * | 2018-05-29 | 2018-11-16 | 广东声将军科技有限公司 | A kind of offline voice is to order transform method and system |
CN109410927A (en) * | 2018-11-29 | 2019-03-01 | 北京蓦然认知科技有限公司 | Offline order word parses the audio recognition method combined, device and system with cloud |
CN109710929A (en) * | 2018-12-18 | 2019-05-03 | 金蝶软件(中国)有限公司 | A kind of bearing calibration, device, computer equipment and the storage medium of speech recognition text |
US20190385599A1 (en) * | 2017-06-29 | 2019-12-19 | Tencent Technology (Shenzhen) Company Limited | Speech recognition method and apparatus, and storage medium |
Non-Patent Citations (3)
Title |
---|
YING CUI; K. TAKAYA: "Recognition of syllables in a continuous stream of speech by PARCOR parameters of linear predictive vocoder", 《CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, 2005.》, pages 852 - 855 * |
YONG XU; WU GUO; LIRONG DAI: "《A hybrid fragment / syllable-based system for improved OOV term detectio》", 《2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING》, pages 378 - 382 * |
张策,韦鹏程,石熙: "《小语料库重庆话语音识别的研究》", 《计算机测量与控制》, vol. 26, no. 11, pages 252 - 255 * |
Also Published As
Publication number | Publication date |
---|---|
CN112669848B (en) | 2023-12-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||