CN105913841B - Voice recognition method, device and terminal - Google Patents

Voice recognition method, device and terminal

Info

Publication number
CN105913841B
Authority
CN
China
Prior art keywords
voice
calibration
letter
recognized
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610509372.6A
Other languages
Chinese (zh)
Other versions
CN105913841A (en)
Inventor
伍亮雄
刘鸣
王乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201610509372.6A
Publication of CN105913841A
Application granted
Publication of CN105913841B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 - Adaptation
    • G10L15/26 - Speech to text systems

Abstract

The disclosure relates to a voice recognition method, a voice recognition device and a terminal. The method comprises the following steps: acquiring an input voice to be recognized; and recognizing the voice to be recognized according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces the system's default letter standard voice. By applying the scheme of the embodiments of the disclosure, the voice of the user can be recognized more accurately.

Description

Voice recognition method, device and terminal
Technical Field
The present disclosure relates to the field of mobile communications technologies, and in particular, to a voice recognition method, an apparatus, and a terminal.
Background
Speech recognition technology is now widely used, with the aim of converting the vocabulary content of human speech into computer-readable input such as keystrokes, binary codes or character sequences. Applications of speech recognition technology include voice dialing, voice navigation, indoor device control, voice document retrieval, simple dictation data entry, and the like.
To meet the different requirements of users, dialect adaptation has begun to be added to speech recognition technology, for example for Cantonese, Sichuanese, and so on. However, for languages with a standard phonetic letter structure, such as Mandarin and English, a default letter standard voice is set in the voice recognition system; if the voice uttered by the user is spelled with a strong local accent, the voice recognition rate is very low and the voice recognition function is almost unusable.
Disclosure of Invention
The disclosure provides a voice recognition method, a voice recognition device and a terminal, which can more accurately recognize voice of a user.
According to a first aspect of the embodiments of the present disclosure, there is provided a speech recognition method, including:
acquiring input voice to be recognized;
and recognizing the voice to be recognized according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces the system's default letter standard voice.
Optionally, recognizing the voice to be recognized according to the character calibration voice includes:
composing a new character calibration voice using the letter calibration voice;
and recognizing the input voice to be recognized according to the character calibration voice.
Optionally, recognizing the voice to be recognized according to the character calibration voice includes:
acquiring a stored character calibration voice, wherein the stored character calibration voice is a new character calibration voice composed of the recognized voice obtained after a historical voice to be recognized was recognized according to the letter calibration voice;
and recognizing the input voice to be recognized according to the acquired character calibration voice.
Optionally, replacing the system's default letter standard voice with the letter calibration voice includes:
collecting the letter calibration voice by recording the pronunciations of all letters of the alphabet;
and replacing the system's default letter standard voice with the collected letter calibration voice.
Optionally, recognizing the input voice to be recognized according to the character calibration voice includes:
acquiring the character calibration voice and the voice feature information of the voice to be recognized;
and recognizing the input voice to be recognized according to the matching relation between the character calibration voice and the voice feature information of the voice to be recognized.
Optionally, the voice feature information may include one or more of the following items: timbre, pitch, duration, and intensity of the voice.
Optionally, composing a new character calibration voice using the letter calibration voice includes:
obtaining a new character calibration voice by spelling with a single letter calibration voice; or, alternatively,
combining a plurality of letter calibration voices and spelling according to a continuous reading rule to obtain a new character calibration voice.
Optionally, fuzzy approximation relations are set between specified letters in the letter calibration voice.
According to a second aspect of the embodiments of the present disclosure, there is provided a speech recognition apparatus including:
the acquisition module is used for acquiring input voice to be recognized;
and the voice recognition module is used for recognizing the voice to be recognized acquired by the acquisition module according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces the system's default letter standard voice.
Optionally, the voice recognition module includes:
a first recognition submodule, used for composing a new character calibration voice using the letter calibration voice and recognizing the input voice to be recognized according to the character calibration voice; or, alternatively,
a second recognition submodule, used for acquiring a stored character calibration voice, wherein the stored character calibration voice is a new character calibration voice composed of the recognized voice obtained after a historical voice to be recognized was recognized according to the letter calibration voice, and recognizing the input voice to be recognized according to the acquired character calibration voice.
Optionally, the apparatus further comprises:
and the letter voice replacing module is used for collecting the letter calibration voice by recording the pronunciations of all letters of the alphabet and replacing the system's default letter standard voice with the collected letter calibration voice.
Optionally, the voice recognition module acquires the character calibration voice and the voice feature information of the voice to be recognized, and recognizes the input voice to be recognized according to the matching relation between the character calibration voice and the voice feature information of the voice to be recognized.
Optionally, the first recognition sub-module obtains a new character calibration voice by spelling with a single letter calibration voice, or obtains a new character calibration voice by combining a plurality of letter calibration voices and spelling according to a continuous reading rule.
Optionally, the apparatus further comprises:
and the fuzzy setting module is used for setting fuzzy approximation relations between specified letters in the letter calibration voice.
According to a third aspect of the embodiments of the present disclosure, there is provided a mobile terminal including:
a processor and a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring input voice to be recognized;
and recognizing the voice to be recognized according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces the system's default letter standard voice.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the method and the device, after the input voice to be recognized is obtained, the voice to be recognized can be recognized according to letter calibration voice or character calibration voice, wherein the letter calibration voice replaces system default letter standard voice, so that a user can be accurately recognized even if the voice has local accent, and the voice recognition capability is improved.
Furthermore, the method can also have two processing modes, namely, the letter calibration voice is used for forming a new character calibration voice, and the input voice to be recognized is recognized according to the character calibration voice; the method can also be used for acquiring stored character calibration voice, wherein the stored character calibration voice is new character calibration voice formed by recognized voice after historical voice to be recognized is recognized according to the letter calibration voice; and recognizing the input voice to be recognized according to the acquired character calibration voice, so that the input voice to be recognized can be recognized according to the character calibration voice, and the voice recognition capability and the recognition efficiency can be improved.
Further, the present disclosure may calibrate the speech by recording the pronunciation of all the letters of the alphabet as letters.
Furthermore, the method and the device can identify the input voice to be identified according to the matching relation between the character calibration voice and the voice feature information of the voice to be identified.
Further, the present disclosure may obtain a new character calibration voice by spelling with a single letter calibration voice, or obtain a new character calibration voice by combining a plurality of letter calibration voices and spelling according to a continuous reading rule.
Furthermore, the fuzzy approximation relation can be set between the set letters in the letter calibration voice, and the problem that the pronunciation of individual letters of accents in some places is similar can be solved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating a method of speech recognition according to an exemplary embodiment of the present disclosure.
FIG. 2 is a flow chart illustrating another speech recognition method according to an exemplary embodiment of the present disclosure.
FIG. 3 is a block diagram illustrating a speech recognition apparatus according to an example embodiment of the present disclosure.
FIG. 4 is a block diagram illustrating another speech recognition apparatus according to an example embodiment of the present disclosure.
Fig. 5 is a block diagram illustrating a mobile terminal according to an exemplary embodiment of the present disclosure.
Fig. 6 is a block diagram illustrating an arrangement of a device according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon", "when", or "in response to a determination", depending on the context.
The disclosure provides a voice recognition method, a voice recognition device and a terminal, which can more accurately recognize voice of a user.
FIG. 1 is a flow chart illustrating a method of speech recognition according to an exemplary embodiment of the present disclosure.
The method can be applied to a terminal, and as shown in fig. 1, the method can include the following steps:
in step 101, an input speech to be recognized is acquired.
In step 102, the voice to be recognized is recognized according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces the system's default letter standard voice.
In this step, the letter calibration voice can be used to compose a new character calibration voice, and the input voice to be recognized is recognized according to that character calibration voice; alternatively, a stored character calibration voice can be acquired, wherein the stored character calibration voice is a new character calibration voice composed of the recognized voice obtained after a historical voice to be recognized was recognized according to the letter calibration voice, and the input voice to be recognized is recognized according to the acquired character calibration voice. A rough sketch of the two modes is given below.
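As a rough illustration only, the two modes can be pictured as a single lookup-or-compose helper, as in the following sketch; the function and parameter names (get_character_voice, stored_character_voices, and so on) are assumptions rather than terms of the disclosure, and a plain concatenation of letter recordings stands in for spelling according to the continuous reading rule.

```python
from typing import Dict, List, Optional, Sequence

def get_character_voice(
    text: str,
    spelling: Sequence[str],
    letter_voices: Dict[str, List[float]],
    stored_character_voices: Dict[str, List[float]],
) -> List[float]:
    """Return the character calibration voice for `text`, reusing a stored one if present."""
    # Mode 2: a character calibration voice saved after an earlier recognition.
    cached: Optional[List[float]] = stored_character_voices.get(text)
    if cached is not None:
        return cached
    # Mode 1: compose a new character calibration voice from the letter calibration voices
    # (plain concatenation stands in for spelling per the continuous reading rule).
    voice: List[float] = [sample for unit in spelling for sample in letter_voices[unit]]
    stored_character_voices[text] = voice
    return voice
```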
In this step, the letter calibration voice can be collected by recording the pronunciations of all letters of the alphabet, and the system's default letter standard voice is replaced with the collected letter calibration voice.
This step can also acquire the character calibration voice and the voice feature information of the voice to be recognized, and recognize the input voice to be recognized according to the matching relation between the character calibration voice and the voice feature information of the voice to be recognized.
Composing a new character calibration voice using the letter calibration voice may include: obtaining a new character calibration voice by spelling with a single letter calibration voice; or combining a plurality of letter calibration voices and spelling according to the continuous reading rule to obtain a new character calibration voice.
The voice feature information may include one or more of the following items: timbre, pitch, duration, and intensity of the voice.
As can be seen from this embodiment, the technical solution provided by the embodiments of the present disclosure can have the following beneficial effect: after the input voice to be recognized is acquired, the voice to be recognized can be recognized according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces the system's default letter standard voice, so that the user's voice can be accurately recognized even with a local accent, and the voice recognition capability is improved.
FIG. 2 is a flow diagram illustrating another speech recognition method according to an example embodiment of the present disclosure.
The method can be applied to a terminal. This embodiment describes the technical scheme of the present disclosure in more detail than the embodiment of fig. 1.
The technical solution is described in detail below with reference to fig. 2. As shown in fig. 2, the method may include the steps of:
in step 201, the letter alignment voices obtained by the user recording all letter sounds of the alphabet one by one are collected.
This disclosure refers to all letter sounds recorded by a user as letter alignment voices. The method and the device have the advantages that the self-entry function aiming at the standard pronunciation letters is provided, a user records all the pronunciation letters once by self to obtain letter calibration voice, and the letter calibration voice is used as the standard subsequently, so that the accent problem of pronunciation with a standard pronunciation letter structure can be solved.
The letters may be, for example, english letters, chinese letters, or letters of other languages.
In step 202, the acquired user recorded letter calibration voice is substituted for the original default letter standard voice of the system.
Because the default letter standard voice of the system is difficult to recognize letter pronunciation with local accent, the acquired letter calibration voice recorded by the user replaces the original default letter standard voice of the system, so that the letter pronunciation standard set by the system takes the acquired letter calibration voice as a recognition standard, and the letter pronunciation with the local accent is easy to recognize.
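A minimal sketch of steps 201 and 202, assuming the letter voices are kept as simple per-letter audio buffers; the names LetterVoiceStore, record_letter and active_voice are illustrative and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LetterVoiceStore:
    # letter (or letter combination) -> raw audio samples; a plain float list is a placeholder
    default_voices: Dict[str, List[float]]
    calibration_voices: Dict[str, List[float]] = field(default_factory=dict)

    def record_letter(self, letter: str, samples: List[float]) -> None:
        """Step 201: keep the user's own recording of one letter."""
        self.calibration_voices[letter] = samples

    def active_voice(self, letter: str) -> List[float]:
        """Step 202: the user recording, if present, replaces the system default."""
        return self.calibration_voices.get(letter, self.default_voices[letter])

# Step 201 in use: the user records every letter of the alphabet once, e.g.
#   for letter in "abcdefghijklmnopqrstuvwxyz":
#       store.record_letter(letter, capture_from_microphone(letter))  # capture_... is assumed
```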
In step 203, a new character calibration voice is composed using the letter calibration voice.
A speech recognition system can also imitate human continuous-reading behavior: when a human reads the pronunciation of any character or word, the pronunciation is given either by a single letter or by combining multiple letters and pronouncing them according to the corresponding continuous reading rule. Therefore, in this step a new character calibration voice can be obtained by spelling with a single letter calibration voice, or by combining a plurality of letter calibration voices and spelling according to the continuous reading rule.
For example, the pinyin of "apple" is "pingguo"; the single letters or letter combinations p, ing, g, u and o can be combined and spelled in the continuous-reading order p-ing-g-u-o to obtain a new character calibration voice. That is, after the voice recognition system replaces the default letter standard voice with the self-recorded letter calibration voice, the same continuous reading rules are used to recombine a plurality of letter calibration voices, or a single letter calibration voice is used directly (for example, when a single letter forms a character), to obtain a new character calibration voice, which can replace the character voice previously derived by the system from the letter standard voice.
Here, continuous reading means that, for example, within the same sense group in English, when the former word ends with a consonant phoneme and the latter word begins with a vowel phoneme, it is customary when speaking or reading to run the two phonemes together naturally; this speech phenomenon is called continuous reading (liaison). Syllables formed by continuous reading are not stressed; they only need to be carried along naturally and must not be read too heavily. A continuous reading rule describes this habit; for example, for "consonant + vowel" continuous reading, the rule is that if the former of two adjacent words ends with a consonant and the latter begins with a vowel, the consonant and the vowel are joined and read continuously.
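The composition of step 203 could be sketched as follows; the spelling sequence, the raw-sample representation and the small cross-fade at each joint are assumptions that stand in for spelling according to the continuous reading rule, not a prescribed implementation.

```python
from typing import Dict, List, Sequence

def compose_character_voice(
    spelling: Sequence[str],                # e.g. ["p", "ing", "g", "u", "o"]
    letter_voices: Dict[str, List[float]],  # letter or letter-combination -> audio samples
    crossfade: int = 80,                    # samples blended at each joint (assumed)
) -> List[float]:
    """Concatenate letter calibration segments in continuous-reading order."""
    out: List[float] = []
    for unit in spelling:
        seg = letter_voices[unit]
        if out and crossfade:
            # Blend the tail of the previous unit with the head of the next one,
            # a crude stand-in for natural continuous reading.
            n = min(crossfade, len(out), len(seg))
            for i in range(n):
                w = (i + 1) / (n + 1)
                out[-n + i] = (1 - w) * out[-n + i] + w * seg[i]
            out.extend(seg[n:])
        else:
            out.extend(seg)
    return out
```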
It should be noted that a system can generally use a speech library containing characters to spell and store some common characters or words according to the default letter standard voice. The present disclosure can re-spell all the character voices carried by the system using the letter calibration voice and replace the original character voices with the new character calibration voices obtained in this way.
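A sketch of how the system's library of common words might be re-spelled wholesale with the letter calibration voices; the lexicon format (word mapped to spelling units) and the plain concatenation are assumptions for illustration.

```python
from typing import Dict, List, Sequence

def rebuild_character_library(
    lexicon: Dict[str, Sequence[str]],      # e.g. {"pingguo": ["p", "ing", "g", "u", "o"]}
    letter_voices: Dict[str, List[float]],  # user-recorded letter calibration voices
) -> Dict[str, List[float]]:
    """Return a new word -> character-calibration-voice table replacing the old one."""
    library: Dict[str, List[float]] = {}
    for word, spelling in lexicon.items():
        # Simple concatenation stands in for spelling per the continuous reading rule.
        library[word] = [s for unit in spelling for s in letter_voices[unit]]
    return library
```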
In step 204, the voice to be recognized spoken by the user is recognized according to the character calibration voice.
In this step, the character calibration voice and the voice feature information of the voice to be recognized are acquired, and the input voice to be recognized is recognized according to the matching relation between the character calibration voice and the voice feature information of the voice to be recognized. The voice feature information may include one or more of the following items: timbre, pitch, duration, and intensity of the voice.
It should be noted that recognizing the input voice to be recognized according to the voice feature information may use any existing recognition algorithm, which is not limited in the present disclosure.
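Because the disclosure leaves the matching algorithm open, the following sketch uses a deliberately simple distance over the four feature types named above; the VoiceFeatures fields and the Euclidean distance are stand-ins for whatever existing recognition algorithm is used, not the method of the disclosure.

```python
from dataclasses import dataclass
from typing import Dict
import math

@dataclass
class VoiceFeatures:
    timbre: float     # e.g. a spectral summary value (assumed representation)
    pitch: float      # Hz
    duration: float   # seconds
    intensity: float  # dB

def feature_distance(a: VoiceFeatures, b: VoiceFeatures) -> float:
    """Euclidean distance over the four named feature types."""
    return math.sqrt(
        (a.timbre - b.timbre) ** 2
        + (a.pitch - b.pitch) ** 2
        + (a.duration - b.duration) ** 2
        + (a.intensity - b.intensity) ** 2
    )

def recognize(
    input_features: VoiceFeatures,
    calibration: Dict[str, VoiceFeatures],  # text -> features of its character calibration voice
) -> str:
    """Return the text whose character calibration voice matches the input best."""
    return min(calibration, key=lambda text: feature_distance(input_features, calibration[text]))
```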
It should be further noted that, considering that some pronunciations may be confused in local accents, the present disclosure may set fuzzy approximation relations between specified letters in the letter calibration voice, associating pronunciations that are easily confused, for example by setting the letter sounds s/sh, c/ch, and so on as approximations of each other.
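One way such fuzzy approximation relations could be applied, purely as an assumed example, is to expand a spelling into all of its accent-equivalent variants before matching:

```python
from itertools import product
from typing import Dict, List, Sequence, Set

# Assumed fuzzy approximation table: letters a local accent tends to merge.
FUZZY: Dict[str, Set[str]] = {
    "s": {"s", "sh"}, "sh": {"s", "sh"},
    "c": {"c", "ch"}, "ch": {"c", "ch"},
}

def fuzzy_spellings(spelling: Sequence[str]) -> List[List[str]]:
    """Expand one spelling (e.g. ["sh", "i"]) into all accent-equivalent variants."""
    choices = [sorted(FUZZY.get(unit, {unit})) for unit in spelling]
    return [list(combo) for combo in product(*choices)]

# fuzzy_spellings(["sh", "i"]) -> [["s", "i"], ["sh", "i"]], so an utterance spelled
# either way can be matched against the same character calibration voice.
```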
According to the above scheme, the user sets the letter calibration voice according to his or her own accent, records all the pronunciation letters as the letter calibration voice, and replaces the system's letter standard voice with it; the letter calibration voice is then used to compose new character calibration voices with which the input voice to be recognized is recognized. This solves the accent problem for languages with a standard phonetic letter structure and improves the recognition rate of voice input.
It should be further noted that the present disclosure may also store, as a new character calibration voice, the recognized voice obtained after a historical voice to be recognized was recognized according to the letter calibration voice; a stored character calibration voice can then be acquired directly, and the input voice to be recognized is recognized according to the acquired character calibration voice.
Corresponding to the embodiment of the application function implementation method, the disclosure also provides a voice recognition device, a terminal and a corresponding embodiment.
FIG. 3 is a block diagram illustrating a speech recognition apparatus according to an example embodiment of the present disclosure.
The apparatus may be provided in a terminal. As shown in fig. 3, a speech recognition apparatus may include: an acquisition module 31 and a voice recognition module 32.
And the obtaining module 31 is configured to obtain the input speech to be recognized.
And the voice recognition module 32 is configured to recognize the voice to be recognized acquired by the acquisition module according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces the system's default letter standard voice.
The voice recognition module 32 may use the letter calibration voice to compose a new character calibration voice and recognize the input voice to be recognized according to that character calibration voice; or it may acquire a stored character calibration voice, wherein the stored character calibration voice is a new character calibration voice composed of the recognized voice obtained after a historical voice to be recognized was recognized according to the letter calibration voice, and recognize the input voice to be recognized according to the acquired character calibration voice.
According to this embodiment, after the input voice to be recognized is acquired, the voice to be recognized can be recognized according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces the system's default letter standard voice, so that the user's voice can be accurately recognized even with a local accent, and the voice recognition capability is improved.
FIG. 4 is a block diagram illustrating another speech recognition apparatus according to an exemplary embodiment of the present disclosure.
The apparatus may be provided in a terminal. As shown in fig. 4, a speech recognition apparatus may include: an acquisition module 31, a voice recognition module 32, a letter voice replacement module 33, and a fuzzy setting module 34.
The functions of the acquisition module 31 and the speech recognition module 32 can be referred to the description in fig. 3.
The voice recognition module 32 may include: a first identification submodule 321 or a second identification submodule 322.
The first recognition submodule 321 is configured to use the letter calibration voice to form a new character calibration voice, and recognize the input voice to be recognized according to the character calibration voice.
And the second recognition submodule 322 is configured to acquire a stored character calibration voice, wherein the stored character calibration voice is a new character calibration voice composed of the recognized voice obtained after a historical voice to be recognized was recognized according to the letter calibration voice, and to recognize the input voice to be recognized according to the acquired character calibration voice.
Recognizing the input voice to be recognized according to the character calibration voice may include: acquiring the character calibration voice and the voice feature information of the voice to be recognized; and recognizing the input voice to be recognized according to the matching relation between the character calibration voice and the voice feature information of the voice to be recognized. The voice feature information may include one or more of the following items: timbre, pitch, duration, and intensity of the voice.
Wherein the apparatus may further comprise: an alphabetical speech replacement module 33.
And the letter voice replacing module 33 is configured to collect the letter calibration voice by recording the pronunciations of all letters of the alphabet, and to replace the system's default letter standard voice with the collected letter calibration voice. Because the system's default letter standard voice has difficulty recognizing letter pronunciations with a local accent, replacing it with the collected letter calibration voice recorded by the user means that the letter pronunciation standard set by the system takes the collected letter calibration voice as the recognition standard, so letter pronunciations with the local accent become easy to recognize.
The voice recognition module 32 obtains the character calibration voice and the voice feature information of the voice to be recognized, and recognizes the input voice to be recognized according to the matching relationship between the character calibration voice and the voice feature information of the voice to be recognized.
The first recognition sub-module 321 obtains a new character calibration voice by spelling with a single letter calibration voice, or obtains a new character calibration voice by combining a plurality of letter calibration voices and spelling according to a continuous reading rule.
Wherein the apparatus may further comprise: an ambiguity setting module 34.
And the fuzzy setting module 34 is configured to set fuzzy approximation relations between specified letters in the letter calibration voice. Considering that local accents may confuse some pronunciations, the present disclosure may set fuzzy approximation relations between specified letters in the letter calibration voice, associating pronunciations that are easily confused, for example by setting the letter sounds s/sh, c/ch, and so on as approximations of each other.
Therefore, according to this scheme, the user sets the letter calibration voice according to his or her own accent, records all the pronunciation letters once as the letter calibration voice, and replaces the system's letter standard voice with it; the letter calibration voice is then used to compose new character calibration voices with which the input voice to be recognized is recognized. This solves the accent problem for languages with a standard phonetic letter structure and improves the recognition rate of voice input.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed solution. One of ordinary skill in the art can understand and implement it without inventive effort.
Fig. 5 is a block diagram illustrating a mobile terminal according to an exemplary embodiment of the present disclosure.
As shown in fig. 5, the mobile terminal includes: a processor 501 and a memory 502 for storing processor-executable instructions;
wherein the processor 501 is configured to:
acquiring input voice to be recognized;
and recognizing the voice to be recognized according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces the system's default letter standard voice.
It should be further noted that, for the other programs stored in the memory 502, reference is made to the description in the foregoing method flow, which is not repeated here; the processor 501 is also configured to execute the other programs stored in the memory 502.
Fig. 6 is a block diagram illustrating an arrangement of a device according to an exemplary embodiment of the present disclosure.
For example, the device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, device 600 may include one or more of the following components: processing component 602, memory 604, power component 606, multimedia component 608, audio component 610, input/output (I/O) interface 612, sensor component 614, and communication component 616.
The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation at the device 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
A power supply component 606 provides power to the various components of the device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a Microphone (MIC) configured to receive external audio signals when the device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessment of various aspects of the device 600. For example, the sensor component 614 may detect an open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the device 600; the sensor component 614 may also detect a change in position of the device 600 or of a component of the device 600, the presence or absence of user contact with the device 600, the orientation or acceleration/deceleration of the device 600, and a change in temperature of the device 600. The sensor component 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communications between the device 600 and other devices in a wired or wireless manner. The device 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the device 600 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
There is also provided a non-transitory computer readable storage medium; when instructions in the storage medium are executed by a processor of a terminal device, the terminal device is enabled to perform a speech recognition method, the method comprising:
acquiring input voice to be recognized;
and recognizing the voice to be recognized according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces the system's default letter standard voice.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

1. A speech recognition method, comprising:
acquiring input voice to be recognized;
recognizing the voice to be recognized according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces a system default letter standard voice, and fuzzy approximation relations are set between specified letters in the letter calibration voice;
wherein recognizing the voice to be recognized according to the character calibration voice comprises:
acquiring a stored character calibration voice, wherein the stored character calibration voice is a new character calibration voice composed of the recognized voice obtained after a historical voice to be recognized was recognized according to the letter calibration voice;
and recognizing the input voice to be recognized according to the acquired character calibration voice.
2. The method of claim 1, wherein recognizing the voice to be recognized according to the character calibration voice comprises:
composing a new character calibration voice using the letter calibration voice;
and recognizing the input voice to be recognized according to the character calibration voice.
3. The method of claim 1, wherein replacing the system default letter standard voice with the letter calibration voice comprises:
collecting the letter calibration voice by recording the pronunciations of all letters of the alphabet;
and replacing the system default letter standard voice with the collected letter calibration voice.
4. The method of claim 1, wherein recognizing the input voice to be recognized according to the character calibration voice comprises:
acquiring the character calibration voice and the voice feature information of the voice to be recognized;
and recognizing the input voice to be recognized according to the matching relation between the character calibration voice and the voice feature information of the voice to be recognized.
5. The method of claim 4, wherein the voice feature information comprises one or more of: timbre, pitch, duration, and intensity of the voice.
6. The method of claim 2, wherein composing a new character calibration voice using the letter calibration voice comprises:
obtaining a new character calibration voice by spelling with a single letter calibration voice; or, alternatively,
combining a plurality of letter calibration voices and spelling according to a continuous reading rule to obtain a new character calibration voice.
7. A speech recognition apparatus, comprising:
the acquisition module is used for acquiring input voice to be recognized;
the voice recognition module is used for recognizing the voice to be recognized acquired by the acquisition module according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces a system default letter standard voice, and fuzzy approximation relations are set between specified letters in the letter calibration voice;
the voice recognition module includes: a second recognition submodule, used for acquiring a stored character calibration voice, wherein the stored character calibration voice is a new character calibration voice composed of the recognized voice obtained after a historical voice to be recognized was recognized according to the letter calibration voice, and for recognizing the input voice to be recognized according to the acquired character calibration voice.
8. The speech recognition device of claim 7, wherein the speech recognition module further comprises:
and the first recognition submodule is used for forming new character calibration voice by using the letter calibration voice and recognizing the input voice to be recognized according to the character calibration voice.
9. The apparatus of claim 7, further comprising:
and the letter voice replacing module is used for acquiring letter calibration voice by recording pronunciations of all letters of the alphabet and replacing the acquired letter standard voice with default letter standard voice of the system.
10. The apparatus of claim 7, wherein:
the voice recognition module acquires the character calibration voice and the voice feature information of the voice to be recognized, and recognizes the input voice to be recognized according to the matching relation between the character calibration voice and the voice feature information of the voice to be recognized.
11. The apparatus of claim 8, wherein:
and the first recognition submodule obtains new character calibration voice by spelling through single letter calibration voice or obtains new character calibration voice by combining a plurality of letter calibration voices and spelling according to a continuous reading rule.
12. A mobile terminal, comprising:
a processor and a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring input voice to be recognized;
recognizing the voice to be recognized according to a letter calibration voice or a character calibration voice, wherein the letter calibration voice replaces a system default letter standard voice, and fuzzy approximation relations are set between specified letters in the letter calibration voice;
wherein recognizing the voice to be recognized according to the character calibration voice comprises:
acquiring a stored character calibration voice, wherein the stored character calibration voice is a new character calibration voice composed of the recognized voice obtained after a historical voice to be recognized was recognized according to the letter calibration voice;
and recognizing the input voice to be recognized according to the acquired character calibration voice.
13. A computer-readable storage medium, having stored thereon a computer program which, when executed by one or more processors, causes the processors to perform the speech recognition method of any of claims 1 to 6.
CN201610509372.6A 2016-06-30 2016-06-30 Voice recognition method, device and terminal Active CN105913841B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610509372.6A CN105913841B (en) 2016-06-30 2016-06-30 Voice recognition method, device and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610509372.6A CN105913841B (en) 2016-06-30 2016-06-30 Voice recognition method, device and terminal

Publications (2)

Publication Number Publication Date
CN105913841A CN105913841A (en) 2016-08-31
CN105913841B true CN105913841B (en) 2020-04-03

Family

ID=56753927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610509372.6A Active CN105913841B (en) 2016-06-30 2016-06-30 Voice recognition method, device and terminal

Country Status (1)

Country Link
CN (1) CN105913841B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710484B (en) * 2018-03-12 2021-09-21 西安艾润物联网技术服务有限责任公司 Method, storage medium and device for modifying license plate number through voice
EP3544001B8 (en) * 2018-03-23 2022-01-12 Articulate.XYZ Ltd Processing speech-to-text transcriptions
CN111540353B (en) * 2020-04-16 2022-11-15 重庆农村商业银行股份有限公司 Semantic understanding method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1465042A (en) * 2001-05-02 2003-12-31 索尼公司 Robot device, character recognizing apparatus and character reading method, and control program and recording medium
CN101141508A (en) * 2006-09-05 2008-03-12 美商富迪科技股份有限公司 Communication system and voice recognition method
CN101958118A (en) * 2003-03-31 2011-01-26 索尼电子有限公司 Implement the system and method for speech recognition dictionary effectively
CN103594085A (en) * 2012-08-16 2014-02-19 百度在线网络技术(北京)有限公司 Method and system providing speech recognition result
CN104282302A (en) * 2013-07-04 2015-01-14 三星电子株式会社 Apparatus and method for recognizing voice and text
CN105096945A (en) * 2015-08-31 2015-11-25 百度在线网络技术(北京)有限公司 Voice recognition method and voice recognition device for terminal
CN105355195A (en) * 2015-09-25 2016-02-24 小米科技有限责任公司 Audio frequency recognition method and audio frequency recognition device
CN105513594A (en) * 2015-11-26 2016-04-20 许传平 Voice control system


Also Published As

Publication number Publication date
CN105913841A (en) 2016-08-31

Similar Documents

Publication Publication Date Title
CN110210310B (en) Video processing method and device for video processing
KR101819458B1 (en) Voice recognition apparatus and system
CN107945806B (en) User identification method and device based on sound characteristics
CN109961791B (en) Voice information processing method and device and electronic equipment
JP7116088B2 (en) Speech information processing method, device, program and recording medium
CN107274903B (en) Text processing method and device for text processing
CN111831806B (en) Semantic integrity determination method, device, electronic equipment and storage medium
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN108628819B (en) Processing method and device for processing
CN105913841B (en) Voice recognition method, device and terminal
CN112735396A (en) Speech recognition error correction method, device and storage medium
CN111640452B (en) Data processing method and device for data processing
CN113539233A (en) Voice processing method and device and electronic equipment
CN112133295B (en) Speech recognition method, device and storage medium
CN110930977B (en) Data processing method and device and electronic equipment
CN109887492B (en) Data processing method and device and electronic equipment
CN111679746A (en) Input method and device and electronic equipment
CN109979435B (en) Data processing method and device for data processing
CN112331194A (en) Input method and device and electronic equipment
CN113409765B (en) Speech synthesis method and device for speech synthesis
CN114550691A (en) Multi-tone word disambiguation method and device, electronic equipment and readable storage medium
CN113674731A (en) Speech synthesis processing method, apparatus and medium
CN112837668A (en) Voice processing method and device for processing voice
CN113345451B (en) Sound changing method and device and electronic equipment
CN116705015A (en) Equipment wake-up method, device and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant