Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that these descriptions are merely exemplary and are not intended to limit the scope of the present disclosure. In addition, descriptions of well-known structures and technologies are omitted below so as not to unnecessarily obscure the concepts of the present disclosure.
The terms used herein are for the purpose of describing specific embodiments only and are not intended to limit the present disclosure. The terms "include", "comprise" and the like, as used herein, indicate the presence of the stated features, steps, operations and/or components, but do not exclude the presence or addition of one or more other features, steps, operations or components.
Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification, and should not be interpreted in an idealized or overly rigid manner.
Where an expression such as "at least one of A, B and C, etc." is used, it should generally be interpreted in the sense in which those skilled in the art commonly understand it (for example, "a system having at least one of A, B and C" should include, but is not limited to, a system having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C). Where an expression such as "at least one of A, B or C, etc." is used, it should likewise be interpreted in the sense in which those skilled in the art commonly understand it (for example, "a system having at least one of A, B or C" should include, but is not limited to, a system having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C). Those skilled in the art should also understand that virtually any disjunctive conjunction and/or phrase presenting two or more alternative items, whether in the specification, the claims or the drawings, should be understood to contemplate the possibility of including one of the items, either of the items, or both items. For example, the phrase "A or B" should be understood to include the possibilities of "A", "B", or "A and B".
Embodiments of the present disclosure provide a speech recognition method and system. The speech recognition method includes: obtaining a text file obtained by preprocessing a voice file; converting the text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file; matching the first conversion code file with a second conversion code file to obtain a matching result, wherein the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule; and recognizing the voice file according to the matching result.
Fig. 1 schematically illustrates a system architecture to which the speech recognition method and system according to embodiments of the present disclosure can be applied. It should be noted that Fig. 1 shows only an example of such a system architecture, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments or scenarios.
As shown in Fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web browsers, search applications, instant messaging tools, mailbox clients and social platform software (examples only).
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.
The server 105 may be a server providing various services, for example a back-end management server (example only) that supports websites browsed by users with the terminal devices 101, 102, 103. The back-end management server may analyze and otherwise process received data such as user requests, and feed the processing results (for example, web pages, information or data obtained or generated according to the user request) back to the terminal devices.
It should be noted that the speech recognition method provided by embodiments of the present disclosure may generally be executed by the server 105. Accordingly, the speech recognition system provided by embodiments of the present disclosure may generally be provided in the server 105. The speech recognition method may also be executed by a server or server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the speech recognition system may also be provided in a server or server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
In embodiments of the present disclosure, the speech recognition method may also be executed by any one or more of the terminal devices 101, 102, 103. Accordingly, the speech recognition system provided by embodiments of the present disclosure may be provided in any one or more of the terminal devices 101, 102, 103. The speech recognition method may also be executed by a terminal device or terminal device cluster that is different from the terminal devices 101, 102, 103 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the speech recognition system may also be provided in a terminal device or terminal device cluster that is different from the terminal devices 101, 102, 103 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
Fig. 2A schematically illustrates a flowchart of the speech recognition method according to an embodiment of the present disclosure.
As shown in Fig. 2A, the method may include operations S201 to S204, in which:
In operation S201, a text file obtained by preprocessing a voice file is obtained.
In embodiments of the present disclosure, the voice file may be a file containing sound information. Preprocessing may mean converting the sound in the voice file into text in advance, before the voice file is recognized.
According to an embodiment of the present disclosure, after the sound in the voice file is preprocessed, a corresponding text file can be obtained; the text in this text file is the text obtained by recognizing the sound in the above voice file.
In operation S202, the text file is converted according to the predetermined pinyin coding rule to obtain the corresponding first conversion code file.
In embodiments of the present disclosure, the pinyin coding rule may encode the pinyin of Chinese characters. Since a pinyin syllable may include an initial and a final, encoding the pinyin may include encoding the initial in the pinyin and encoding the final in the pinyin, as shown in Table 1 and Table 2.
Table 1 Initial coding rule
| Initial | Code | Initial | Code | Initial | Code | Initial | Code | Initial | Code |
| b | 1 | p | 2 | m | 3 | f | 4 | d | 5 |
| t | 6 | n | 7 | l | 7 | g | 8 | k | 9 |
| h | 4 | j | B | q |  | x |  | zh |  |
| ch | F | sh | G | r | H | z | E | c | F |
| s | G | y | I | w | J | (none) | Z |  |  |
Table 2 Final coding rule
| Final | Code | Final | Code | Final | Code | Final | Code | Final | Code |
| a | 1 | o | 2 | e | 3 | i | 4 | u | 5 |
| v | 6 | ai | 7 | ei | 7 | ui | 8 | ao | 9 |
| ou | A | iu | B | ie | C | ue | D | er | E |
| an | F | en | G | in | H | un | I | ven | J |
| ang | F | eng | G | ing | H | ong | K | ian | L |
| uan | M | iang | N | uang | O | iong | P | iao | Q |
| ia | R | uo | S | ua | T | ve | U | iou | V |
| uai | W | uei | X |  |  |  |  |  |  |
It should be noted that, in embodiments of the present disclosure, the pinyin coding rule may be preset and is not limited here.
According to an embodiment of the present disclosure, converting the text file according to the predetermined pinyin coding rule may consist of converting the initials and finals contained in the pinyin of the text in the text file according to the coding rules in Table 1 and Table 2, respectively. Initials contained in the pinyin of similar-sounding text may also be converted into the same code, and finals contained in the pinyin of similar-sounding text may likewise be converted into the same code. For example, initial n and initial l may both be converted to 7, initial s and initial sh to G, finals ai and ei to 7, and finals en and eng to G.
In embodiments of the present disclosure, converting the text file according to the predetermined pinyin coding rule yields the first conversion code file corresponding to the text file.
For example, suppose the text contained in the obtained text file is the Chinese for "hello", whose pinyin is "ni hao". In Table 1, the initial "n" corresponds to the code "7" and the initial "h" corresponds to the code "4"; in Table 2, the final "i" corresponds to the code "4" and the final "ao" corresponds to the code "9". After the text file is converted according to the predetermined pinyin coding rule, the resulting first conversion code file may therefore be "7449".
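The conversion described above can be sketched in Python as follows. This is a minimal illustration rather than the disclosed implementation: the function names are hypothetical, the input is assumed to be a list of toneless pinyin syllables, and the codes for q, x and zh (blank in Table 1 as reproduced here) are assumptions inferred from the worked examples "DD47" and "I4C4" and from the z/c/s pattern.

```python
# Table 1: initial -> code. The codes for "q", "x" and "zh" are assumptions
# (blank in the table as reproduced); they are inferred from the examples.
INITIALS = {
    "b": "1", "p": "2", "m": "3", "f": "4", "d": "5",
    "t": "6", "n": "7", "l": "7", "g": "8", "k": "9",
    "h": "4", "j": "B", "q": "C", "x": "D", "zh": "E",
    "ch": "F", "sh": "G", "r": "H", "z": "E", "c": "F",
    "s": "G", "y": "I", "w": "J",
}
EMPTY_INITIAL = "Z"  # code used when a syllable has no initial

# Table 2: final -> code.
FINALS = {
    "a": "1", "o": "2", "e": "3", "i": "4", "u": "5",
    "v": "6", "ai": "7", "ei": "7", "ui": "8", "ao": "9",
    "ou": "A", "iu": "B", "ie": "C", "ue": "D", "er": "E",
    "an": "F", "en": "G", "in": "H", "un": "I", "ven": "J",
    "ang": "F", "eng": "G", "ing": "H", "ong": "K", "ian": "L",
    "uan": "M", "iang": "N", "uang": "O", "iong": "P", "iao": "Q",
    "ia": "R", "uo": "S", "ua": "T", "ve": "U", "iou": "V",
    "uai": "W", "uei": "X",
}


def encode_syllable(syllable):
    """Encode one toneless pinyin syllable as initial code + final code."""
    initial = ""
    for length in (2, 1):  # try "zh", "ch", "sh" before single letters
        if syllable[:length] in INITIALS:
            initial = syllable[:length]
            break
    final = syllable[len(initial):]
    # A final that is not in Table 2 raises KeyError in this sketch.
    return INITIALS.get(initial, EMPTY_INITIAL) + FINALS[final]


def encode_text(syllables):
    """Concatenate the two-character codes of all syllables."""
    return "".join(encode_syllable(s) for s in syllables)


print(encode_text(["ni", "hao"]))  # 7449
```

Because n and l share the code 7, `encode_text(["hu", "lan"])` and `encode_text(["hu", "nan"])` produce the same code, which is the behaviour the similar-pronunciation rule is meant to give.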
It should be noted that, before the text file is converted according to the predetermined pinyin coding rule, the text in the text file needs to be converted into pinyin. This can be done with the pypinyin library (an open-source Python library that converts Chinese characters to pinyin) in Python (an object-oriented, interpreted programming language).
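As a hedged illustration of this step, the sketch below uses `lazy_pinyin` from pypinyin, which returns toneless syllables by default. The wrapper function and the small fallback table are hypothetical; the fallback exists only so the sketch still runs where the library is not installed.

```python
try:
    from pypinyin import lazy_pinyin  # third-party: pip install pypinyin
except ImportError:  # keep the sketch runnable without the library
    lazy_pinyin = None

# Hypothetical fallback table, only large enough for the example below.
_FALLBACK = {"你": "ni", "好": "hao"}


def text_to_pinyin(text):
    """Return one toneless pinyin syllable per Chinese character."""
    if lazy_pinyin is not None:
        return lazy_pinyin(text)
    return [_FALLBACK[ch] for ch in text]


print(text_to_pinyin("你好"))  # ['ni', 'hao']
```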
Through embodiments of the present disclosure, encoding the text file corresponding to the voice file facilitates subsequent operations (such as operation S203) and can also, to some extent, reduce errors caused by the voice file having been converted into the wrong text.
In operation S203, the first conversion code file is matched with the second conversion code file to obtain a matching result, wherein the second conversion code file is obtained by converting each keyword in the keyword text set according to the predetermined pinyin coding rule.
In embodiments of the present disclosure, the keyword text set may include multiple keywords. Converting each of these keywords according to the above predetermined pinyin coding rule yields the second conversion code file.
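Building the second conversion code file can be sketched as an index from code to keywords, using keyword examples that appear in this description. The tables below are only an excerpt of Tables 1 and 2; the codes for "x" and "q" are inferred from the examples "DD47" and "I4C4", and the pinyin "yi qi" for the last two keywords is an assumption. All names are hypothetical.

```python
from collections import defaultdict

# Excerpt of Tables 1 and 2 covering only the syllables used below.
# The codes for "x" and "q" are assumptions inferred from the examples.
INITIALS = {"x": "D", "f": "4", "y": "I", "q": "C"}
FINALS = {"u": "5", "ue": "D", "ei": "7", "i": "4"}


def encode(syllables):
    """Encode syllables that each start with a one-letter initial."""
    return "".join(INITIALS[s[0]] + FINALS[s[1:]] for s in syllables)


# keyword -> toneless pinyin syllables ("yi qi" is an assumed reading)
KEYWORDS = {
    "renewal": ["xu", "fei"],
    "tuition fee": ["xue", "fei"],
    "instrument": ["yi", "qi"],
    "abandonment": ["yi", "qi"],
}


def build_keyword_index(keywords):
    """Map each conversion code to the keywords that share it."""
    index = defaultdict(list)
    for word, syllables in keywords.items():
        index[encode(syllables)].append(word)
    return dict(index)


print(build_keyword_index(KEYWORDS))
```

Keywords that differ only in tone collapse to the same code in this index, so a lookup may return more than one candidate.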
It should be noted that, in embodiments of the present disclosure, since the effect of the tone of the text is far less significant than that of its pinyin, the influence of tone is not considered for the time being.
For example, the keyword "renewal" may be converted to "D547", the keyword "tuition fee" to "DD47", the keyword "instrument" to "I4C4", and the keyword "abandonment" also to "I4C4".
According to an embodiment of the present disclosure, matching the first conversion code file with the second conversion code file may consist of matching the first conversion code file against each converted keyword (which may be called a keyword conversion code) in the second conversion code file. The matching may be performed in various modes. In mode one, the first conversion code file and the second conversion code file are matched exactly. In mode two, word segmentation matching is performed between the first conversion code file and the second conversion code file. In mode three, the initials of the first conversion code file and the initials of the second conversion code file are matched exactly, while the finals of the first conversion code file and the finals of the second conversion code file are matched fuzzily. Any one of the three modes may be selected, or the modes may be freely combined, which is not limited here.
For example, converting the text in the text file into pinyin yields a text string, which may be denoted Str; a keyword may be denoted Keyword; the first conversion code file may be denoted StrEncoding; and the second conversion code file may be denoted KeywordEncoding. When mode one is used to match the first conversion code file StrEncoding with the second conversion code file KeywordEncoding, the conversion codes of the keywords Keyword contained in the second conversion code file KeywordEncoding may be searched for in the first conversion code file StrEncoding. If the conversion code of any one or more of these keywords is found in the first conversion code file StrEncoding, the text file contains the keyword Keyword; if the conversion code of none of these keywords can be found in the first conversion code file StrEncoding, the text file does not contain the keyword Keyword.
In operation S204, the voice file is recognized according to the matching result.
In embodiments of the present disclosure, the voice file may be recognized according to the matching result obtained by matching the first conversion code file with the second conversion code file as described above. Specifically, if the text file contains a keyword, the keyword may be used to replace the corresponding text in the text file.
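One way to picture the replacement step is shown below. It is a sketch under stated assumptions, not the disclosed implementation: the text is held as a list of syllables with one two-character code per syllable, and the function name and sample data are hypothetical.

```python
def replace_with_keyword(syllables, codes, keyword, keyword_code):
    """Replace the first run of syllables whose codes spell keyword_code."""
    n = len(keyword_code) // 2  # two code characters per syllable
    for i in range(len(codes) - n + 1):
        if "".join(codes[i:i + n]) == keyword_code:
            return syllables[:i] + [keyword] + syllables[i + n:]
    return list(syllables)  # no match: text is left unchanged


# "hu lan" was recognized, but its code equals that of the keyword "hunan".
text = ["wo", "zai", "hu", "lan"]
codes = ["J2", "E7", "45", "7F"]
print(replace_with_keyword(text, codes, "hunan", "457F"))
# ['wo', 'zai', 'hunan']
```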
According to an embodiment of the present disclosure, as shown in Fig. 2B, after a user 210 inputs a segment of speech through an electronic device 220, the electronic device 220 executes operations S301 to S312, as shown in Fig. 2C, which may include: first, processing voice files into a model file; second, preprocessing the voice sample to be recognized into a result file (which may be called the text file) according to the model file; third, converting the result file and the keyword text separately according to the predetermined pinyin coding rule; and then matching the converted result file with the converted keyword text and recognizing the voice sample according to the matching result. The specific steps are as follows. First, voice files are generated by executing operation S301; operation S302 is executed to process a large number of voice files; operation S303 is executed to extract the speech features of the processed voice files; operation S304 is executed to perform model training; and operation S305 is executed to generate the corresponding model file. Second, the voice sample to be recognized is obtained by executing operation S306, and operation S307 is executed to preprocess the voice sample into a result text using the model file. Third, operation S308 is executed to convert the result text according to the predetermined pinyin coding rule, yielding the converted result text (which may be called the first conversion code file); further, operation S309 is executed to obtain the keyword text, and operation S310 is executed to convert the keyword text according to the predetermined pinyin coding rule, yielding the converted keyword text (which may be called the second conversion code file). Then, operation S311 is executed to match the converted result text with the converted keyword text; if the match succeeds, operation S312 may be executed to return the keyword.
In contrast, one existing speech recognition scheme recognizes continuous speech by means of machine learning and deep learning. Specifically, as shown in Fig. 2D, executing operations S401 to S409 includes: first, processing a voice file into a model file; second, matching the model file with a processed keyword voice file; third, returning the matching result. The specific steps are as follows. First, a voice file is obtained by executing operation S401; operation S402 is executed to denoise the voice file; operation S403 is executed to extract speech features; and operation S404 is executed to perform model training on the extracted speech features to obtain a model file. Second, a keyword voice file is obtained by executing operation S405; operation S406 is executed to extract the keyword speech features of the keyword voice file; and operation S407 is executed to match the model file with the extracted keyword speech features. Third, if the match succeeds, operation S408 is executed to indicate that the voice file contains the keyword; if the match fails, operation S409 is executed to indicate that the voice file does not contain the keyword. When a new speech model is passed in, it can be recognized using the speech model trained in operation S404 and converted into text, and keyword matching is then performed on that text.
In contrast, another existing speech recognition scheme incorporates keyword recognition into the training stage of speech recognition. As shown in Fig. 2E, executing operations S501 to S512 includes: first, processing a keyword voice file and a voice file separately; second, performing model training on the processed keyword voice file and the processed voice file to obtain a model file; third, matching the model file with the extracted keyword speech features; and then returning the matching result. The specific steps are as follows. First, a keyword voice file is obtained by executing operation S501; operation S502 is executed to denoise the keyword voice file; operation S503 is executed to extract the keyword speech features of the keyword voice file; further, operation S504 is executed to obtain a voice file; operation S505 is executed to denoise the voice file; and operation S506 is executed to extract speech features. Second, operation S507 is executed to perform model training on the extracted keyword speech features and the extracted speech features to obtain a model file. Third, operation S508 is executed to obtain a keyword voice file; operation S509 is executed to extract the keyword speech features of the keyword voice file; and operation S510 is executed to match the model file with the extracted keyword speech features. Then, if the match succeeds, operation S511 is executed to indicate that the voice file contains the keyword; if the match fails, operation S512 is executed to indicate that the voice file does not contain the keyword.
However, the speech training set on which a speech recognition system is trained usually consists of standard Mandarin spoken at a moderate speed in a noise-free environment, whereas the voice files on which such a system performs recognition often contain non-standard Mandarin or even dialect, widely varying speaking speeds, and noisy environments. As a result, the accuracy of speech recognition is very low.
In embodiments of the present disclosure, by contrast, the voice file is processed into a text file in advance, the text file undergoes initial and final conversion, the keywords undergo the same initial and final conversion, and the converted text file is then matched with the converted keywords in multiple modes, so that some text misrecognized from the speech can still be matched correctly.
Through embodiments of the present disclosure, a text file is obtained by preprocessing the voice file, the text file and the keywords are converted according to the predetermined pinyin coding rule, and the converted text file is matched with the converted keywords. This wholly or partly solves the technical problem of low speech recognition accuracy in the related art and improves the accuracy of speech recognition.
The method shown in Fig. 2A to Fig. 2C is further described below with reference to Fig. 3A to Fig. 3D in conjunction with specific embodiments.
Fig. 3A schematically illustrates a flowchart of converting the text file according to the predetermined pinyin coding rule to obtain the corresponding first conversion code file, according to an embodiment of the present disclosure.
When the first conversion code file and the second conversion code file are matched exactly using mode one, a complete word may be forcibly split, causing a keyword to be misrecognized. For example, suppose the converted text file is "A1B2C3D4" and the converted keyword is "1B2C", where "B2C3" is one word; matching the converted text file against the converted keyword then easily leads to the keyword being misrecognized. To further overcome the problem of keywords being misrecognized, the present disclosure also provides an optional embodiment that uses mode two to perform word segmentation matching between the first conversion code file and the second conversion code file. In this embodiment, operation S202 described with reference to Fig. 2A (converting the text file according to the predetermined pinyin coding rule to obtain the corresponding first conversion code file) may include operations S601 to S603. As shown in Fig. 3A:
In operation S601, word segmentation is performed on the text file to obtain one or more words belonging to the text file.
In operation S602, the one or more words are converted into the corresponding pinyin.
In operation S603, the pinyin obtained from the one or more words is converted according to the predetermined pinyin coding rule to obtain the first conversion code file.
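The difference between matching on the whole code string and matching on per-word codes can be illustrated with the "A1B2C3D4" example. In the sketch below the segmentation of the text into three words is a hypothetical choice (only "B2C3" is stated to be a word), and both function names are illustrative.

```python
def substring_match(text_code, keyword_code):
    """Mode one: the keyword code may start anywhere in the text code."""
    return keyword_code in text_code


def word_match(word_codes, keyword_code):
    """Mode two: the keyword code must equal the code of a whole word."""
    return keyword_code in word_codes


text_code = "A1B2C3D4"
word_codes = ["A1", "B2C3", "D4"]  # hypothetical result of S601 to S603
keyword_code = "1B2C"

print(substring_match(text_code, keyword_code))  # True: straddles word boundaries
print(word_match(word_codes, keyword_code))      # False: no whole word has this code
```

Comparing whole-word codes prevents a hit that is formed only by gluing together the tail of one word and the head of the next.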
In embodiments of the present disclosure, the word segmentation may be performed by, but is not limited to, Jieba segmentation, where Jieba is a word segmentation tool that can split a passage of text into one or more individual words.
According to an embodiment of the present disclosure, as shown in Fig. 3B, when word segmentation is performed on the text file, executing operations S701 to S709 may include: first, converting the text word set [..., guardrail, ...] and the keyword [Hunan] separately according to the predetermined pinyin coding rule; second, matching the converted text word set [..., guardrail, ...] with the converted keyword [Hunan]; third, returning the matching result. The specific steps are as follows. First, one or more words belonging to the text file are obtained by executing operation S701, where the one or more words may be expressed as the text word set [..., guardrail, ...]. Operation S702 is then executed to process the text word set [..., guardrail, ...] and obtain the text word pinyin set [..., hulan, ...], which may include the pinyin obtained by converting the one or more words. Operation S703 is then executed to convert the text word pinyin set [..., hulan, ...] according to the predetermined pinyin coding rule, yielding the text word code set [..., 457F, ...], which may be called the first conversion code file. Further, operation S704 is executed to obtain the keyword [Hunan]; operation S705 is executed to process the keyword [Hunan] and obtain the corresponding keyword pinyin [hunan]; and operation S706 is executed to convert the keyword pinyin [hunan] according to the predetermined pinyin coding rule to obtain the corresponding keyword code [457F]. Second, operation S707 is executed to match the text word code set [..., 457F, ...] with the keyword code [457F] to obtain the matching result. Third, since the text word code set [..., 457F, ...] contains the keyword code [457F], operation S708 may be executed to return the keyword [Hunan]; if the text word code set did not contain the keyword code, operation S709 could be executed to indicate that the text word code set contains no keyword.
It should be noted that the above text word code set may be denoted KeywordsEncoding. Matching the text word code set KeywordsEncoding with the second conversion code file KeywordEncoding may consist of searching the text word code set KeywordsEncoding for the keyword code KeywordEncoding.
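The lookup in operations S707 to S709 can be sketched as follows, reusing the names KeywordsEncoding and KeywordEncoding from the text. The function name, the code-to-keyword dictionary layout and the surrounding word codes are hypothetical.

```python
def match_segmented(keywords_encoding, keyword_encoding):
    """Return keywords whose code equals the code of some segmented word."""
    hits = []
    for code in keywords_encoding:
        if code in keyword_encoding:
            hits.append(keyword_encoding[code])
    return hits


# Text word code set; "457F" is the code of the word read as "hulan".
KeywordsEncoding = ["J2", "E7", "457F"]
# Second conversion code file as a code -> keyword mapping ("hunan" -> 457F).
KeywordEncoding = {"457F": "Hunan"}

print(match_segmented(KeywordsEncoding, KeywordEncoding))  # ['Hunan']
```

An empty list corresponds to operation S709, where the text word code set contains no keyword code.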
Through embodiments of the present disclosure, the text file is segmented into words, the resulting words are converted to obtain the first conversion code file, and the first conversion code file is then matched with the second conversion code file. This reduces the cases in which a keyword is misrecognized because of a combination of adjacent codes, and can thus further improve the accuracy of speech recognition.
Fig. 3C schematically illustrates a flowchart of matching the first conversion code file with the second conversion code file to obtain the matching result, according to an embodiment of the present disclosure.
Since similar-sounding words may have the same initials but different finals, performing word segmentation matching between the first conversion code file and the second conversion code file using mode two may miss similar-sounding words, causing keywords to be missed. To further overcome the problem of keywords being missed, the present disclosure also provides an optional embodiment that uses mode three to match the initials of the first conversion code file and the initials of the second conversion code file exactly, and to match the finals of the first conversion code file and the finals of the second conversion code file fuzzily. In this embodiment, operation S203 described with reference to Fig. 2A and Fig. 3A (matching the first conversion code file with the second conversion code file to obtain the matching result) may include operations S801 to S804. As shown in Fig. 3C:
In operation S801, for each word of the one or more words, the conversion code corresponding to the initial part of the pinyin of the word is obtained.
In operation S802, it is determined whether this conversion code is identical to the conversion code corresponding to the initial part of the pinyin of each keyword.
In operation S803, if they are identical, it is further determined whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword satisfies a preset condition.
In operation S804, if the preset condition is satisfied, the matching result is generated according to the keyword.
In embodiments of the present disclosure, a conversion code may be the code obtained by converting the pinyin corresponding to a piece of text or a word according to the predetermined pinyin coding rule. The conversion code corresponding to the initial part may be the code of the initial part extracted after the pinyin corresponding to the text or word has been converted according to the predetermined pinyin coding rule.
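Assuming, as in the examples of this description, that every syllable contributes exactly one initial code character followed by one final code character, the initial part and the final part of a conversion code can be separated by taking alternate characters. The function name is hypothetical.

```python
def split_code(code):
    """Split a conversion code into (initial part, final part).

    Assumes the code alternates initial and final characters,
    one pair per syllable.
    """
    return code[0::2], code[1::2]


print(split_code("DD47"))  # ('D4', 'D7')
print(split_code("D547"))  # ('D4', '57')
```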
In accordance with an embodiment of the present disclosure, as shown in Figure 3D, executing operation S901~S914 may include: firstly, mentioning respectively
Keyword [continuing to pay dues] after the initial consonant part of textual phrase set [..., tuition fee ...] after taking conversion, simple or compound vowel of a Chinese syllable part and conversion
Initial consonant part, simple or compound vowel of a Chinese syllable part;Secondly, by after the initial consonant part of the textual phrase set [..., tuition fee ...] after conversion and conversion
The initial consonant part of keyword [continuing to pay dues] is matched;Again, by the simple or compound vowel of a Chinese syllable of the textual phrase set [..., tuition fee ...] after conversion
Part is matched with the simple or compound vowel of a Chinese syllable part of the keyword [continuing to pay dues] after conversion;Then, matching result is returned.Specific step is as follows:
First, operation S901 is executed to obtain the text phrase set [..., tuition fee, ...]. Operation S902 then processes the text phrase set to obtain the text phrase pinyin set [..., xuefei, ...], and operation S903 converts this pinyin set according to the predetermined pinyin coding rule to obtain the text phrase code set [..., DD47, ...], which may be referred to as the first conversion code file. Further, operation S904 extracts the initial (consonant) part of the text phrase code set [..., DD47, ...] to obtain the text phrase initial code set [..., D4, ...], and operation S905 extracts the final (vowel) part of the text phrase code set [..., DD47, ...] to obtain the text phrase final code set [..., D7, ...]. For the keyword, operation S906 obtains the keyword [continuing to pay dues]; operation S907 processes the keyword to obtain the corresponding keyword pinyin [xufei]; operation S908 converts the keyword pinyin [xufei] according to the predetermined pinyin coding rule to obtain the corresponding keyword code [D547]; operation S909 extracts the initial part of the keyword code [D547] to obtain the keyword initial code [D4]; and operation S910 extracts the final part of the keyword code [D547] to obtain the keyword final code [57]. Second, operation S911 judges whether the text phrase initial code set [..., D4, ...] includes the keyword initial code [D4]. Next, because the text phrase initial code set [..., D4, ...] does include the keyword initial code [D4], operation S912 can be executed to judge whether the difference between the conversion code of the phrase in the text phrase final code set [..., D7, ...] and the keyword final code [57] meets the preset condition. If it does, operation S913 can be executed to return the keyword [continuing to pay dues]. If the text phrase initial code set does not include the keyword initial code, or the difference between the conversion code of the phrase in the text phrase final code set [..., D7, ...] and the keyword final code [57] does not meet the preset condition, operation S914 can be executed, indicating that the text phrase set does not include the keyword.
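The flow of operations S904 to S914 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation. It assumes, based only on the example codes DD47 and D547, that each syllable contributes one initial character followed by one final character, so that initials occupy the even positions of a code and finals the odd positions. The names `split_code`, `match_keyword`, and `finals_match` are hypothetical, and the preset condition of operation S912 is passed in as a predicate rather than fixed.

```python
from typing import Callable, Iterable, Optional, Tuple


def split_code(code: str) -> Tuple[str, str]:
    """Split a phrase code into (initial part, final part).

    Assumption: characters alternate initial, final, initial, final, ...
    as in the example DD47 -> initials "D4", finals "D7".
    """
    return code[0::2], code[1::2]


def match_keyword(
    phrase_codes: Iterable[str],
    keyword: str,
    keyword_code: str,
    finals_match: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the keyword if some phrase matches it, otherwise None."""
    kw_initials, kw_finals = split_code(keyword_code)   # S909, S910
    for code in phrase_codes:
        initials, finals = split_code(code)             # S904, S905
        if initials != kw_initials:                     # S911
            continue  # initials differ, so the finals are not compared
        if finals_match(finals, kw_finals):             # S912
            return keyword                              # S913
    return None                                         # S914
```

With the codes from the example, `split_code("DD47")` gives `("D4", "D7")` and `split_code("D547")` gives `("D4", "57")`, so the initials agree and the outcome depends only on the predicate supplied for operation S912.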
According to the embodiments of the present disclosure, when the conversion code corresponding to the initial part of a word is judged to be inconsistent with the conversion code corresponding to the initial part of the keyword, subsequent operations are no longer executed, which improves the efficiency of speech recognition.
As an optional embodiment, judging whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of the keyword meets the preset condition may include: judging whether the minimum Hamming distance between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword is less than a preset value.
It should be noted that, before the embodiments of the present disclosure are explained, the meanings of several terms can first be clarified, where:
Hamming distance: the number of positions at which two strings of equal length differ. For example, the Hamming distance between "1011101" and "1001001" is 2.
Minimum Hamming distance: the minimum value among multiple Hamming distances.
Edit distance: also known as Levenshtein distance, the minimum number of edit operations required to change one string into another. The permitted edit operations are substituting one character for another, inserting a character, and deleting a character. In general, the smaller the edit distance, the greater the similarity between the two strings.
Recall: also known as recall ratio, the ratio of the number of relevant documents retrieved to the number of all relevant documents in the document library; it measures how completely the retrieval system finds relevant documents. In contrast, precision is the ratio of the number of relevant documents retrieved to the total number of documents retrieved; it measures how exact the retrieval system is.
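The definitions above can be made concrete with a short sketch. The function names are illustrative only and do not come from the disclosure; the edit distance uses the standard dynamic-programming formulation with a rolling row.

```python
from typing import Iterable


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def min_hamming_distance(code: str, candidates: Iterable[str]) -> int:
    """Smallest Hamming distance between `code` and any candidate."""
    return min(hamming_distance(code, c) for c in candidates)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Relevant documents retrieved / all relevant documents."""
    return relevant_retrieved / relevant_total


def precision(relevant_retrieved: int, retrieved_total: int) -> float:
    """Relevant documents retrieved / all documents retrieved."""
    return relevant_retrieved / retrieved_total
```

For the strings in the definition, `hamming_distance("1011101", "1001001")` evaluates to 2, in agreement with the text.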
In the embodiments of the present disclosure, the preset value may be set as follows, without limitation: 1 for a two-character keyword, 2 for a three-character keyword, and 3 for a keyword of more than three characters.
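The mapping from keyword length to preset value can be written as a small lookup. This is a sketch under two assumptions that are not stated in the text: keyword length is counted with `len()`, one unit per character, and keywords shorter than two characters are rejected because the text gives no value for them. The name `preset_value` is hypothetical.

```python
def preset_value(keyword: str) -> int:
    """Preset value chosen by keyword length in characters."""
    n = len(keyword)
    if n < 2:
        # Assumption: the text does not cover one-character keywords.
        raise ValueError("no preset value is given for keywords under two characters")
    if n == 2:
        return 1
    if n == 3:
        return 2
    return 3  # more than three characters
```

Since the text presents these values as one possible setting, a deployment could just as well read them from configuration.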
According to an embodiment of the present disclosure, as shown in Fig. 3D, operation S912 is executed to judge whether the minimum Hamming distance between the conversion code of the phrase in the text phrase final code set [..., D7, ...] and the keyword final code [57] is less than the preset value; if it is, operation S913 is further executed to return the keyword [continuing to pay dues].
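The check in operation S912 can be sketched as below. The function follows the wording of the text and uses a strict "less than" comparison by default. Note that D7 and 57 differ in one position, so their Hamming distance is 1, which equals the preset value of 1 given for a two-character keyword; under the strict reading this pair would not pass. The `inclusive` flag is a hypothetical addition that shows the alternative "less than or equal" reading and is not taken from the disclosure. Skipping final codes whose length differs from the keyword's is also an assumption, made because the Hamming distance is defined only for equal lengths.

```python
from typing import Iterable


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def finals_within_preset(
    phrase_finals: Iterable[str],
    keyword_finals: str,
    preset: int,
    inclusive: bool = False,
) -> bool:
    """S912: compare the minimum Hamming distance with the preset value."""
    # Assumption: only final codes of the same length are comparable.
    comparable = [f for f in phrase_finals if len(f) == len(keyword_finals)]
    if not comparable:
        return False
    distance = min(hamming_distance(f, keyword_finals) for f in comparable)
    return distance <= preset if inclusive else distance < preset
```

Calling `finals_within_preset(["D7"], "57", 1)` applies the strict reading, and adding `inclusive=True` applies the alternative one.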
According to the embodiments of the present disclosure, when the minimum Hamming distance between the conversion codes is less than the preset value, the matching result is generated according to the keyword, so that misrecognized homophones and near-homophones can be recognized as the keyword, which improves recognition accuracy and recall.
As an optional embodiment, the above speech recognition method may further include: if the conversion code corresponding to the initial part of the pinyin of the word is judged to be different from the conversion code corresponding to the initial part of the pinyin of the keyword, recognizing the voice file directly according to the text file.
In the embodiments of the present disclosure, as shown in Fig. 3D, if operation S911 determines that the text phrase initial code set [..., D4, ...] does not include the keyword initial code [D4], or operation S912 determines that the difference between the conversion code of the phrase in the text phrase final code set [..., D7, ...] and the keyword final code [57] does not meet the preset condition, operation S914 can be executed and the text file is considered to contain no keyword. The voice file can then be recognized directly according to the text file.
According to the embodiments of the present disclosure, when the conversion code corresponding to the initial part of a word is judged to be inconsistent with the conversion code corresponding to the initial part of the keyword, subsequent operations are no longer executed, which improves the efficiency of speech recognition.
Fig. 4 schematically illustrates a block diagram of a speech recognition system according to an embodiment of the present disclosure.
As shown in Fig. 4, the speech recognition system 400 may include an acquisition module 410, a conversion module 420, a matching module 430, and a first identification module 440, wherein:
The acquisition module 410 is configured to obtain a text file produced by preprocessing a voice file.
The conversion module 420 is configured to convert the text file according to the predetermined pinyin coding rule to obtain a corresponding first conversion code file.
The matching module 430 is configured to match the first conversion code file against a second conversion code file to obtain a matching result, where the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule.
The first identification module 440 is configured to recognize the voice file according to the matching result.
According to the embodiments of the present disclosure, a text file is obtained by preprocessing a voice file, the text file and the keywords are converted according to the predetermined pinyin coding rule, and the converted text file is matched against the converted keywords. This wholly or partly solves the technical problem of low speech recognition accuracy in the related art and improves the accuracy of speech recognition.
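The division of work among modules 410 to 440 can be pictured as a pipeline of four stages. The sketch below is only one way to wire such stages together; the class name, the field names, and the callable signatures are all hypothetical, and the bodies of the stages are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SpeechRecognitionSystem:
    # Module 410: voice file -> text file obtained by preprocessing.
    acquire: Callable[[bytes], str]
    # Module 420: text file -> first conversion code file (one code per word).
    convert: Callable[[str], List[str]]
    # Module 430: codes -> matched keyword, or None when nothing matches.
    match: Callable[[List[str]], Optional[str]]
    # Module 440: voice file and matching result -> recognition result.
    identify: Callable[[bytes, Optional[str]], str]

    def run(self, voice: bytes) -> str:
        """Pass the voice file through the four stages in order."""
        text = self.acquire(voice)
        codes = self.convert(text)
        matching_result = self.match(codes)
        return self.identify(voice, matching_result)
```

Keeping each stage behind a plain callable mirrors the statement in the description that the modules may be merged or split without changing the overall flow.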
Fig. 5A schematically illustrates a block diagram of a conversion module according to an embodiment of the present disclosure.
In this embodiment, the conversion module 420 described with reference to Fig. 4 may include a processing unit 421, a first converting unit 422, and a second converting unit 423, as shown in Fig. 5A, wherein:
The processing unit 421 is configured to perform word segmentation on the text file to obtain one or more words belonging to the text file.
The first converting unit 422 is configured to convert the one or more words into corresponding pinyin.
The second converting unit 423 is configured to convert the pinyin obtained from the one or more words according to the predetermined pinyin coding rule to obtain the first conversion code file.
According to the embodiments of the present disclosure, after the text file is segmented, the resulting words are converted into the first conversion code file, and the first conversion code file is then matched against the second conversion code file. This reduces the cases in which a keyword is misrecognized because the codes of adjacent words combine, and can thereby further improve the accuracy of speech recognition.
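The work of the second converting unit 423 can be illustrated with a toy encoder. The sketch assumes that word segmentation and pinyin conversion (units 421 and 422) have already produced, for each word, a list of (initial, final) pairs. The two tables contain only the mappings that can be read off the example xuefei -> DD47 and xufei -> D547; the complete predetermined pinyin coding rule is not reproduced here, and the names `INITIAL_CODES`, `FINAL_CODES`, `encode_word`, and `first_conversion_codes` are hypothetical.

```python
from typing import List, Tuple

Syllable = Tuple[str, str]  # (initial, final), e.g. ("x", "ue")

# Partial tables inferred from the example codes only.
INITIAL_CODES = {"x": "D", "f": "4"}
FINAL_CODES = {"ue": "D", "u": "5", "ei": "7"}


def encode_word(syllables: List[Syllable]) -> str:
    """Encode one segmented word, one initial and one final character per syllable."""
    return "".join(INITIAL_CODES[i] + FINAL_CODES[f] for i, f in syllables)


def first_conversion_codes(words: List[List[Syllable]]) -> List[str]:
    """Encode each word separately so codes of adjacent words never merge."""
    return [encode_word(word) for word in words]
```

Because every word keeps its own code, a keyword code can only be compared with whole-word codes, which is the property the paragraph above relies on to avoid matches that straddle a word boundary.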
Fig. 5B schematically illustrates a block diagram of a matching module according to an embodiment of the present disclosure.
In this embodiment, the matching module 430 described with reference to Fig. 4 and Fig. 5A may include an acquiring unit 431, a first judging unit 432, a second judging unit 433, and a matching unit 434, as shown in Fig. 5B, wherein:
The acquiring unit 431 is configured to obtain, for each of the one or more words, the conversion code corresponding to the initial part of the pinyin of that word.
The first judging unit 432 is configured to judge whether that conversion code is identical to the conversion code corresponding to the initial part of the pinyin of each keyword.
The second judging unit 433 is configured to, when that conversion code is judged to be identical to the conversion code corresponding to the initial part of the pinyin of a keyword, further judge whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword meets a preset condition.
The matching unit 434 is configured to generate the matching result according to each keyword when the preset condition is met.
According to the embodiments of the present disclosure, when the conversion code corresponding to the initial part of a word is judged to be inconsistent with the conversion code corresponding to the initial part of the keyword, subsequent operations are no longer executed, which improves the efficiency of speech recognition.
As an optional embodiment, the second judging unit is further configured to judge whether the minimum Hamming distance between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword is less than the preset value.
According to the embodiments of the present disclosure, when the minimum Hamming distance between the conversion codes is less than the preset value, the matching result is generated according to the keyword, so that misrecognized homophones and near-homophones can be recognized as the keyword, which improves recognition accuracy and recall.
Fig. 5C schematically illustrates a block diagram of a speech recognition system according to another embodiment of the present disclosure.
In this embodiment, in addition to the modules and units described with reference to Fig. 4 and Fig. 5B, the speech recognition system 400 may further include a second identification module 450, as shown in Fig. 5C, wherein:
The second identification module 450 is configured to recognize the voice file directly according to the text file when the conversion code corresponding to the initial part of the pinyin of the word is judged to be different from the conversion code corresponding to the initial part of the pinyin of the keyword.
According to the embodiments of the present disclosure, when the conversion code corresponding to the initial part of a word is judged to be inconsistent with the conversion code corresponding to the initial part of the keyword, subsequent operations are no longer executed, which improves the efficiency of speech recognition.
It can be understood that the acquisition module 410, the conversion module 420, the matching module 430, the first identification module 440, the second identification module 450, and so on may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module. According to an embodiment of the present invention, at least one of the acquisition module 410, the conversion module 420, the matching module 430, the first identification module 440, and the second identification module 450 may be implemented at least partly as a hardware circuit, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system in package, or an application-specific integrated circuit (ASIC); or may be implemented in hardware or firmware by any other reasonable way of integrating or packaging circuits; or may be implemented by an appropriate combination of the three implementation forms of software, hardware, and firmware. Alternatively, at least one of the acquisition module 410, the conversion module 420, the matching module 430, the first identification module 440, and the second identification module 450 may be implemented at least partly as a computer program module which, when run by a computer, performs the function of the corresponding module.
It should be noted that the speech recognition system part of the embodiments of the present disclosure corresponds to the speech recognition method part of the embodiments of the present disclosure. For the description of the speech recognition system part, refer to the speech recognition method part; it is not repeated here.
In another aspect, the present disclosure further provides a computer system, which may include: one or more processors; and a storage device for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the speech recognition method described above.
Fig. 6 schematically illustrates a block diagram of a computer system suitable for implementing the speech recognition method according to an embodiment of the present disclosure. The computer system shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the computer system 1000 according to an embodiment of the present disclosure includes a processor 1001, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage section 1008 into a random access memory (RAM) 1003. The processor 1001 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), and so on. The processor 1001 may also include on-board memory for caching purposes. The processor 1001 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to the embodiments of the present disclosure described with reference to Figs. 2A to 2C and Figs. 3A to 3D.
The RAM 1003 stores various programs and data required for the operation of the computer system 1000. The processor 1001, the ROM 1002, and the RAM 1003 are connected to one another through a bus 1004. The processor 1001 performs the various operations described above with reference to Figs. 2A to 2C and Figs. 3A to 3D by executing programs in the ROM 1002 and/or the RAM 1003. It should be noted that the programs may also be stored in one or more memories other than the ROM 1002 and the RAM 1003. The processor 1001 may also perform the various operations described above with reference to Figs. 2A to 2C and Figs. 3A to 3D by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the computer system 1000 may further include an input/output (I/O) interface 1005, which is also connected to the bus 1004. The computer system 1000 may also include one or more of the following components connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a cathode-ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it can be installed into the storage section 1008 as needed.
According to an embodiment of the present disclosure, the method described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When the computer program is executed by the processor 1001, the above functions defined in the system of the embodiments of the present disclosure are performed. According to an embodiment of the present disclosure, the systems, devices, apparatuses, modules, units, and the like described above may be implemented by computer program modules.
It should be noted that the computer-readable medium shown in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and such a medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above. According to an embodiment of the present disclosure, the computer-readable medium may include the ROM 1002 and/or the RAM 1003 described above and/or one or more memories other than the ROM 1002 and the RAM 1003.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
In another aspect, the present disclosure further provides a computer-readable medium storing executable instructions which, when executed by a processor, cause the processor to implement the speech recognition method described above. The computer-readable medium may be included in the device described in the above embodiments, or it may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain a text file produced by preprocessing a voice file; convert the text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file; match the first conversion code file against a second conversion code file to obtain a matching result, where the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule; and recognize the voice file according to the matching result.
The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the different embodiments cannot be advantageously used in combination. The scope of the present disclosure is defined by the appended claims and their equivalents. Without departing from the scope of the present disclosure, those skilled in the art can make various substitutions and modifications, all of which should fall within the scope of the present disclosure.