CN109949814A

CN109949814A - Speech recognition method, system, computer system and computer readable storage medium

Info

Publication number: CN109949814A
Application number: CN201711388674.3A
Authority: CN
Inventors: 臧瑞瑞; 牛慧倩; 师超鹏
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Huijun Technology Co.,Ltd.
Priority date: 2017-12-20
Filing date: 2017-12-20
Publication date: 2019-06-28

Abstract

The present disclosure provides a speech recognition method. The method includes: obtaining a text file produced by preprocessing a voice file; converting the text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file; matching the first conversion code file against a second conversion code file to obtain a matching result, where the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule; and recognizing the voice file according to the matching result. The present disclosure also provides a speech recognition system, a computer system, and a computer readable storage medium.

Description

Speech recognition method, system, computer system and computer readable storage medium

Technical field

The present disclosure relates to the technical field of speech recognition, and more particularly to a speech recognition method, a speech recognition system, a computer system, and a computer readable storage medium.

Background art

At present, speech recognition generally first recognizes a voice file as a text file and then performs keyword matching on the text file; if a keyword can be matched in the text file, the corresponding voice file is determined to contain that keyword. However, noise in the voice file and even a slight dialect accent can cause speech to be recognized as other words. For example, in customer service calls, customer service staff usually speak Mandarin, whereas many users speak a dialect. In this situation, the subsequent keyword matching can easily fail to find the corresponding keyword.

In the course of realizing the concept of the present disclosure, the inventors found at least the following problem in the related art. The situation above is quite common, because the speech training set used to train a speech recognition system usually consists of standard Mandarin spoken at a moderate pace in a noise-free environment, whereas the voice files on which such a system actually performs recognition usually contain non-standard Mandarin or even dialect, widely varying speaking rates, and background noise. As a result, the accuracy of speech recognition is very low.

Summary of the invention

In view of this, the present disclosure provides a speech recognition method and system that convert a preprocessed voice file and keywords separately according to a predetermined pinyin coding rule and then match them, so as to improve the accuracy of speech recognition.

One aspect of the present disclosure provides a speech recognition method, including: obtaining a text file produced by preprocessing a voice file; converting the text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file; matching the first conversion code file against a second conversion code file to obtain a matching result, where the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule; and recognizing the voice file according to the matching result.

According to an embodiment of the present disclosure, converting the text file according to the predetermined pinyin coding rule to obtain the corresponding first conversion code file includes: performing word segmentation on the text file to obtain one or more words belonging to the text file; converting the one or more words into corresponding pinyin; and converting the pinyin obtained from the one or more words according to the predetermined pinyin coding rule to obtain the first conversion code file.

According to an embodiment of the present disclosure, matching the first conversion code file against the second conversion code file to obtain the matching result includes: for each of the one or more words, obtaining the conversion code corresponding to the initial part of the pinyin of the word; judging whether that conversion code is identical to the conversion code corresponding to the initial part of the pinyin of each keyword; if identical, further judging whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of the keyword meets a preset condition; and if the preset condition is met, generating the matching result according to the keyword.

According to an embodiment of the present disclosure, judging whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of the keyword meets the preset condition includes: judging whether the minimum Hamming distance between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword is less than a preset value.

According to an embodiment of the present disclosure, the speech recognition method further includes: if the conversion code corresponding to the initial part of the pinyin of the word is judged to differ from the conversion code corresponding to the initial part of the pinyin of the keyword, recognizing the voice file directly according to the text file.
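The initial-then-final comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes each syllable is encoded as two characters (initial code followed by final code), that the Hamming distance is taken over the string of final codes of a whole word, and that the function names and the `preset_value` default are hypothetical.

```python
def split_codes(code):
    """Split a conversion code such as 'D547' into initial codes 'D4' and final codes '57'.

    Assumes two characters per syllable: initial code, then final code.
    """
    return code[0::2], code[1::2]


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def match_word(word_code, keyword_codes, preset_value=2):
    """Return the best-matching keyword for one word, or None.

    Initial codes must be identical; among such keywords, the one whose
    final codes have the minimum Hamming distance is accepted only if
    that distance is less than preset_value (a hypothetical default).
    """
    word_initials, word_finals = split_codes(word_code)
    best_keyword, best_distance = None, None
    for keyword, keyword_code in keyword_codes.items():
        if len(keyword_code) != len(word_code):
            continue  # different syllable count, cannot be compared
        keyword_initials, keyword_finals = split_codes(keyword_code)
        if keyword_initials != word_initials:
            continue  # initials differ: no match for this keyword
        distance = hamming_distance(word_finals, keyword_finals)
        if best_distance is None or distance < best_distance:
            best_keyword, best_distance = keyword, distance
    if best_distance is not None and best_distance < preset_value:
        return best_keyword
    return None  # caller falls back to the plain text file


keywords = {"renewal": "D547", "instrument": "I4C4"}
print(match_word("DD47", keywords))  # finals differ in one position
print(match_word("7449", keywords))  # no keyword shares the initials
```

A `None` result corresponds to the fallback case in which the voice file is recognized directly from the text file.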

Another aspect of the present disclosure provides a speech recognition system, including: an acquisition module for obtaining a text file produced by preprocessing a voice file; a conversion module for converting the text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file; a matching module for matching the first conversion code file against a second conversion code file to obtain a matching result, where the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule; and a first recognition module for recognizing the voice file according to the matching result.

According to an embodiment of the present disclosure, the conversion module includes: a processing unit for performing word segmentation on the text file to obtain one or more words belonging to the text file; a first converting unit for converting the one or more words into corresponding pinyin; and a second converting unit for converting the pinyin obtained from the one or more words according to the predetermined pinyin coding rule to obtain the first conversion code file.

According to an embodiment of the present disclosure, the matching module includes: an acquiring unit for obtaining, for each of the one or more words, the conversion code corresponding to the initial part of the pinyin of the word; a first judging unit for judging whether that conversion code is identical to the conversion code corresponding to the initial part of the pinyin of each keyword; a second judging unit for further judging, when the two conversion codes are identical, whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of the keyword meets a preset condition; and a matching unit for generating the matching result according to the keyword when the preset condition is met.

According to an embodiment of the present disclosure, the second judging unit includes: a judging subunit for judging whether the minimum Hamming distance between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword is less than a preset value.

According to an embodiment of the present disclosure, the speech recognition system further includes: a second recognition module for recognizing the voice file directly according to the text file when the conversion code corresponding to the initial part of the pinyin of the word is judged to differ from the conversion code corresponding to the initial part of the pinyin of the keyword.

Another aspect of the present disclosure provides a computer system, including: one or more processors; and a memory for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the speech recognition method described above.

Another aspect of the present disclosure provides a computer readable storage medium storing executable instructions that, when executed by a processor, cause the processor to implement the speech recognition method described above.

According to embodiments of the present disclosure, because the preprocessed voice file and the keywords are each converted according to the predetermined pinyin coding rule and then matched, the technical problem of low speech recognition accuracy in the related art can be at least partially solved, and the accuracy of speech recognition can therefore be improved.

Brief description of the drawings

The above and other objects, features, and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:

Fig. 1 schematically shows a system architecture to which the speech recognition method and system according to an embodiment of the present disclosure can be applied;

Fig. 2A schematically shows a flowchart of the speech recognition method according to an embodiment of the present disclosure;

Fig. 2B schematically shows a scenario in which the speech recognition method according to an embodiment of the present disclosure can be applied;

Fig. 2C schematically shows a flowchart of the speech recognition method according to a preferred embodiment of the present disclosure;

Fig. 2D schematically shows a flowchart of a speech recognition method in the related art;

Fig. 2E schematically shows a flowchart of another speech recognition method in the related art;

Fig. 3A schematically shows a flowchart of converting a text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file according to an embodiment of the present disclosure;

Fig. 3B schematically shows a flowchart of converting a text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file according to another embodiment of the present disclosure;

Fig. 3C schematically shows a flowchart of matching the first conversion code file against the second conversion code file to obtain a matching result according to an embodiment of the present disclosure;

Fig. 3D schematically shows a flowchart of judging whether the difference between the conversion code corresponding to the final part of the pinyin of a word and the conversion code corresponding to the final part of the pinyin of a keyword meets a preset condition according to an embodiment of the present disclosure;

Fig. 4 schematically shows a block diagram of a speech recognition system according to an embodiment of the present disclosure;

Fig. 5A schematically shows a block diagram of a conversion module according to an embodiment of the present disclosure;

Fig. 5B schematically shows a block diagram of a conversion module according to an embodiment of the present disclosure;

Fig. 5C schematically shows a block diagram of a speech recognition system according to another embodiment of the present disclosure; and

Fig. 6 schematically shows a block diagram of a computer system suitable for implementing the speech recognition method according to an embodiment of the present disclosure.

Detailed description of the embodiments

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that these descriptions are merely exemplary and are not intended to limit the scope of the present disclosure. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the present disclosure.

The terms used herein are only for describing specific embodiments and are not intended to limit the present disclosure. The terms "include", "comprise", and the like used herein indicate the presence of the stated features, steps, operations, and/or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.

Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification, and should not be interpreted in an idealized or overly rigid manner.

Where an expression similar to "at least one of A, B, and C, etc." is used, it should generally be interpreted according to the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B, and C" includes but is not limited to systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, and C). Where an expression similar to "at least one of A, B, or C, etc." is used, it should likewise be interpreted according to the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B, or C" includes but is not limited to systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, and C). Those skilled in the art should also understand that virtually any disjunctive conjunction and/or phrase presenting two or more alternative items, whether in the description, the claims, or the drawings, should be understood to contemplate the possibility of including one of the items, either of the items, or both items. For example, the phrase "A or B" should be understood to include the possibility of "A", "B", or "A and B".

Embodiments of the present disclosure provide a speech recognition method and system. The speech recognition method includes: obtaining a text file produced by preprocessing a voice file; converting the text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file; matching the first conversion code file against a second conversion code file to obtain a matching result, where the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule; and recognizing the voice file according to the matching result.

Fig. 1 schematically shows a system architecture to which the speech recognition method and system according to an embodiment of the present disclosure can be applied. It should be noted that Fig. 1 is only an example of a system architecture to which embodiments of the present disclosure can be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios.

As shown in Fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (examples only).

The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.

The server 105 may be a server providing various services, such as a background management server (example only) that supports websites browsed by users with the terminal devices 101, 102, 103. The background management server may analyze and otherwise process received data such as user requests, and feed the processing result (for example, a web page, information, or data obtained or generated according to the user request) back to the terminal device.

It should be noted that the speech recognition method provided by embodiments of the present disclosure may generally be executed by the server 105. Accordingly, the speech recognition system provided by embodiments of the present disclosure may generally be deployed in the server 105. The speech recognition method may also be executed by a server or server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the speech recognition system may also be deployed in such a server or server cluster.

In embodiments of the present disclosure, the speech recognition method may also be executed by any one or more of the terminal devices 101, 102, 103. Accordingly, the speech recognition system may be deployed in any one or more of the terminal devices 101, 102, 103. The speech recognition method may also be executed by a terminal device or terminal device cluster that is different from the terminal devices 101, 102, 103 and can communicate with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the speech recognition system may also be deployed in such a terminal device or terminal device cluster.

Fig. 2A schematically shows a flowchart of the speech recognition method according to an embodiment of the present disclosure.

As shown in Fig. 2A, the method may include operations S201 to S204, in which:

In operation S201, a text file produced by preprocessing a voice file is obtained.

In embodiments of the present disclosure, the voice file may be a file containing sound information. Preprocessing may refer to converting the sound in the voice file into text in advance, before the voice file is recognized.

According to an embodiment of the present disclosure, after the sound in the voice file is preprocessed, a corresponding text file is obtained; the text in this text file is obtained by recognizing the sound in the voice file.

In operation S202, the text file is converted according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file.

In embodiments of the present disclosure, the pinyin coding rule may encode the pinyin of Chinese characters. Because pinyin consists of initials and finals, encoding pinyin may include encoding the initial and encoding the final, as shown in Table 1 and Table 2.

Table 1 Initial coding rule

| Initial | Code | Initial | Code | Initial | Code | Initial | Code | Initial | Code |
|---|---|---|---|---|---|---|---|---|---|
| b | 1 | p | 2 | m | 3 | f | 4 | d | 5 |
| t | 6 | n | 7 | l | 7 | g | 8 | k | 9 |
| h | 4 | j | B | q | C | x | D | zh | |
| ch | F | sh | G | r | H | z | E | c | F |
| s | G | y | I | w | J | (none) | Z | | |

Table 2 Final coding rule

| Final | Code | Final | Code | Final | Code | Final | Code | Final | Code |
|---|---|---|---|---|---|---|---|---|---|
| a | 1 | o | 2 | e | 3 | i | 4 | u | 5 |
| v | 6 | ai | 7 | ei | 7 | ui | 8 | ao | 9 |
| ou | A | iu | B | ie | C | ue | D | er | E |
| an | F | en | G | in | H | un | I | ven | J |
| ang | F | eng | G | ing | H | ong | K | ian | L |
| uan | M | iang | N | uang | O | iong | P | iao | Q |
| ia | R | uo | S | ua | T | ve | U | iou | V |
| uai | W | uei | X | | | | | | |

It should be noted that, in embodiments of the present disclosure, the pinyin coding rule may be set in advance and is not limited herein.

According to an embodiment of the present disclosure, converting the text file according to the predetermined pinyin coding rule may mean converting the initials and finals of the pinyin of the text in the text file according to the coding rules in Table 1 and Table 2, respectively. Initials of similar-sounding text may be converted to the same code, and finals of similar-sounding text may likewise be converted to the same code. For example, initials n and l may both be converted to 7, initials s and sh to G, finals ai and ei to 7, and finals en and eng to G.

In embodiments of the present disclosure, once the text file has been converted according to the predetermined pinyin coding rule, the first conversion code file corresponding to the text file is obtained.

For example, suppose the text in the obtained text file is "hello" (你好), whose pinyin is "ni hao". In Table 1, the initial "n" corresponds to code "7" and the initial "h" to code "4"; in Table 2, the final "i" corresponds to code "4" and the final "ao" to code "9". After the text file is converted according to the predetermined pinyin coding rule, the resulting first conversion code file may therefore be "7449".
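The conversion in this example can be sketched in Python as below. This is a minimal sketch rather than the disclosed implementation: the function names are hypothetical, the codes for q and x are taken from the keyword examples in this description ("I4C4", "D547"), and the code for zh is an assumption chosen to match z because it is not legible in Table 1.

```python
# Code tables transcribed from Table 1 and Table 2.
INITIAL_CODES = {
    "b": "1", "p": "2", "m": "3", "f": "4", "d": "5",
    "t": "6", "n": "7", "l": "7", "g": "8", "k": "9",
    "h": "4", "j": "B",
    "q": "C", "x": "D",  # inferred from the keyword examples in the description
    "zh": "E",           # assumption: missing in Table 1, chosen to match "z"
    "ch": "F", "sh": "G", "r": "H", "z": "E", "c": "F",
    "s": "G", "y": "I", "w": "J",
    "": "Z",             # syllable with no initial
}
FINAL_CODES = {
    "a": "1", "o": "2", "e": "3", "i": "4", "u": "5",
    "v": "6", "ai": "7", "ei": "7", "ui": "8", "ao": "9",
    "ou": "A", "iu": "B", "ie": "C", "ue": "D", "er": "E",
    "an": "F", "en": "G", "in": "H", "un": "I", "ven": "J",
    "ang": "F", "eng": "G", "ing": "H", "ong": "K", "ian": "L",
    "uan": "M", "iang": "N", "uang": "O", "iong": "P", "iao": "Q",
    "ia": "R", "uo": "S", "ua": "T", "ve": "U", "iou": "V",
    "uai": "W", "uei": "X",
}


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in ("zh", "ch", "sh"):  # two-letter initials first
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    first = syllable[:1]
    if first and first in INITIAL_CODES:
        return first, syllable[1:]
    return "", syllable  # no initial, e.g. "an" or "er"


def encode_syllable(syllable):
    """Encode one syllable as initial code followed by final code."""
    initial, final = split_syllable(syllable)
    return INITIAL_CODES[initial] + FINAL_CODES[final]


def encode_pinyin(pinyin_text):
    """Encode space-separated toneless pinyin into a conversion code string."""
    return "".join(encode_syllable(s) for s in pinyin_text.split())


print(encode_pinyin("ni hao"))  # 7449
```

Because n and l share code 7, "li hao" would produce the same code as "ni hao", which is how the scheme tolerates confusion between similar-sounding initials.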

It should be noted that, before the text file is converted according to the predetermined pinyin coding rule, the text in the text file must first be converted into pinyin. This can be done with pypinyin, an open-source Python library that converts Chinese characters to pinyin (Python being an object-oriented, interpreted programming language).
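A minimal sketch of this text-to-pinyin step is given below. It assumes the third-party pypinyin package and its `lazy_pinyin` function, which returns toneless syllables; the small fallback table and the `to_pinyin` helper are hypothetical and exist only so the example still runs where pypinyin is not installed.

```python
try:
    from pypinyin import lazy_pinyin  # third-party: pip install pypinyin
except ImportError:
    lazy_pinyin = None

# Hypothetical fallback used only when pypinyin is unavailable.
_FALLBACK_PINYIN = {"你": "ni", "好": "hao"}


def to_pinyin(text):
    """Return a list of toneless pinyin syllables for a Chinese string."""
    if lazy_pinyin is not None:
        return lazy_pinyin(text)
    return [_FALLBACK_PINYIN[char] for char in text]


print(" ".join(to_pinyin("你好")))
```

Toneless output fits the scheme described here, since the coding rule does not take tone into account.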

According to embodiments of the present disclosure, encoding the text file corresponding to the voice file facilitates subsequent operations (such as operation S203) and can also, to some extent, reduce errors caused by the voice file being converted into the wrong text.

In operation S203, the first conversion code file is matched against the second conversion code file to obtain a matching result, where the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule.

In embodiments of the present disclosure, the keyword text set may include multiple keywords; converting each of these keywords according to the predetermined pinyin coding rule yields the second conversion code file.

It should be noted that, because the tone of a character has far less effect than its pinyin, embodiments of the present disclosure do not consider tone for the time being.

For example, the keyword "renewal" (pinyin "xu fei") may be converted to "D547", the keyword "tuition fee" (pinyin "xue fei") to "DD47", the keyword "instrument" (pinyin "yi qi") to "I4C4", and the keyword "abandonment" (pinyin "yi qi") to "I4C4".
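Building the second conversion code file from a keyword set can be sketched as follows. This is an illustration only: the English glosses stand in for the original Chinese keywords, their pinyin is inferred from the codes given above, the tables contain just the entries these four keywords need, and `build_keyword_index` is a hypothetical helper.

```python
from collections import defaultdict

# Only the table entries needed for the four example keywords.
INITIAL_CODES = {"x": "D", "f": "4", "y": "I", "q": "C"}
FINAL_CODES = {"u": "5", "ue": "D", "ei": "7", "i": "4"}

# Placeholder English labels mapped to pinyin inferred from the codes.
KEYWORD_PINYIN = {
    "renewal": "xu fei",
    "tuition fee": "xue fei",
    "instrument": "yi qi",
    "abandonment": "yi qi",
}


def encode(pinyin):
    """Encode space-separated pinyin whose initials are single letters."""
    codes = []
    for syllable in pinyin.split():
        initial, final = syllable[0], syllable[1:]
        codes.append(INITIAL_CODES[initial] + FINAL_CODES[final])
    return "".join(codes)


def build_keyword_index(keyword_pinyin):
    """Map each conversion code to every keyword that produces it."""
    index = defaultdict(list)
    for keyword, pinyin in keyword_pinyin.items():
        index[encode(pinyin)].append(keyword)
    return dict(index)


for code, words in build_keyword_index(KEYWORD_PINYIN).items():
    print(code, words)
```

Grouping by code makes homophones visible: "instrument" and "abandonment" share "I4C4", so a match on that code alone cannot tell them apart.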

According to an embodiment of the present disclosure, matching the first conversion code file against the second conversion code file may mean matching the first conversion code file against each converted keyword in the second conversion code file (which may be called a keyword conversion code). Matching can be done in several ways. In mode one, the first conversion code file and the second conversion code file are matched exactly. In mode two, the first conversion code file and the second conversion code file are matched after word segmentation. In mode three, the initials of the first conversion code file are matched exactly against the initials of the second conversion code file, and the finals of the first conversion code file are matched fuzzily against the finals of the second conversion code file. Any one of these three modes may be selected, or they may be freely combined; this is not limited herein.

For example, converting the text in the text file into pinyin yields a text string, denoted Str; a keyword is denoted Keyword; the first conversion code file is denoted StrEncoding; and the second conversion code file is denoted KeywordEncoding. When mode one is used to match StrEncoding against KeywordEncoding, the conversion code of each keyword Keyword contained in KeywordEncoding is searched for in StrEncoding. If the conversion code of any one or more keywords is found in StrEncoding, the text file contains the keyword Keyword; if no keyword conversion code can be found in StrEncoding, the text file does not contain the keyword Keyword.
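Mode one can be sketched in Python as a search for each keyword code inside StrEncoding. This is a hedged illustration: the function names are hypothetical, and the two-character step is an added assumption (one initial code plus one final code per syllable) that keeps a match from starting in the middle of a syllable.

```python
def contains_code(str_encoding, keyword_code, width=2):
    """Check whether keyword_code occurs in str_encoding on a syllable boundary.

    width is the assumed number of code characters per syllable.
    """
    last_start = len(str_encoding) - len(keyword_code)
    for start in range(0, last_start + 1, width):
        if str_encoding[start:start + len(keyword_code)] == keyword_code:
            return True
    return False


def find_keywords(str_encoding, keyword_encoding):
    """Return the keywords whose conversion codes appear in str_encoding."""
    return [
        keyword
        for keyword, code in keyword_encoding.items()
        if contains_code(str_encoding, code)
    ]


# Hypothetical input: "wo yao xu fei" encoded as J2 I9 D5 47.
StrEncoding = "J2I9D547"
KeywordEncoding = {"renewal": "D547", "instrument": "I4C4"}
print(find_keywords(StrEncoding, KeywordEncoding))
```

An empty result corresponds to the case in which the text file contains none of the keywords.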

In operation S204, the voice file is recognized according to the matching result.

In embodiments of the present disclosure, the voice file may be recognized according to the matching result obtained by matching the first conversion code file against the second conversion code file. Specifically, if the text file contains a keyword, the keyword may be used to replace the corresponding text in the text file.

According to an embodiment of the present disclosure, as shown in Fig. 2B, after a user 210 inputs a segment of speech through an electronic device 220, the electronic device 220 performs operations S301 to S312 as shown in Fig. 2C. In outline: first, voice files are processed into a model file; second, the speech sample to be recognized is preprocessed into a result file (which may be called the text file) according to the model file; third, the result file and the keyword text are each converted according to the predetermined pinyin coding rule; finally, the converted result file is matched against the converted keyword text, and the speech sample is recognized according to the matching result.

The specific steps are as follows. First, operation S301 generates voice files, operation S302 processes a large number of voice files, operation S303 extracts speech features from the processed voice files, operation S304 performs model training, and operation S305 generates the corresponding model file. Second, operation S306 obtains the speech sample to be recognized, and operation S307 uses the model file to preprocess the speech sample into a result text. Third, operation S308 converts the result text according to the predetermined pinyin coding rule to obtain the converted result text (which may be called the first conversion code file); operation S309 obtains the keyword text, and operation S310 converts the keyword text according to the predetermined pinyin coding rule to obtain the converted keyword text (which may be called the second conversion code file). Finally, operation S311 matches the converted result text against the converted keyword text, and if the match succeeds, operation S312 returns the keyword.
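The recognition-time part of this flow (operations S306 to S312) can be sketched as a small pipeline. This is only a sketch under stated assumptions: the trained model is represented by a caller-supplied `transcribe` function, the coding rule by a caller-supplied `to_codes` function, and every name, the toy code table, and the sample transcript are hypothetical stand-ins.

```python
def recognize(speech_sample, transcribe, to_codes, keyword_text):
    """Return the keywords found in a speech sample.

    transcribe: callable standing in for the trained model (S307)
    to_codes:   callable applying the pinyin coding rule (S308, S310)
    """
    result_text = transcribe(speech_sample)                      # S306-S307
    first_codes = to_codes(result_text)                          # S308
    keyword_codes = {kw: to_codes(kw) for kw in keyword_text}    # S309-S310
    return [kw for kw, code in keyword_codes.items()
            if code in first_codes]                              # S311-S312


# Hypothetical stand-ins so the sketch runs end to end.
TOY_CODES = {"wo": "J2", "yao": "I9", "xu": "D5", "fei": "47"}


def toy_transcribe(speech_sample):
    """Pretend the model transcribed the audio to this pinyin string."""
    return "wo yao xu fei"


def toy_to_codes(pinyin):
    """Look up each syllable in the toy code table."""
    return "".join(TOY_CODES[s] for s in pinyin.split())


print(recognize(b"raw-audio-bytes", toy_transcribe, toy_to_codes, ["xu fei"]))
```

Passing the model and the coding rule in as functions keeps the matching step independent of how the model file from operations S301 to S305 was trained.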

Unlike this, a kind of existing speech recognition schemes are by machine learning and deep learning come to continuous language What sound was identified.Specifically, as shown in Figure 2 D, executing operation S401~S409 includes: firstly, voice document is processed into Model file；Secondly, Key word voice file matches with treated by model file；Again；Return to matching result.Tool Steps are as follows for body: firstly, obtaining voice document by executing operation S401, then executing operation S402 and carries out the voice document Denoising then executes operation S403 and extracts phonetic feature, and the phonetic feature of extraction is carried out model by the S404 that redos Training, to obtain a model file；Secondly, obtaining Key word voice file by executing operation S405, operation is then executed S406 extracts the Key word voice feature of Key word voice file, further, executes operation S407 for model file and extraction Key word voice feature is matched；Again, it if successful match, executes operation S408 and shows in voice document containing key Word executes operation S409 and shows in voice document without containing keyword if it fails to match.Wherein, there is new voice mould When type is passed to, identification can be carried out to the new speech model according to the speech model that trains of operation S404 and by the new language Sound model conversion is text, then carries out keyword match to the text again.

In contrast, another existing speech recognition scheme incorporates keyword recognition into the training stage of speech recognition. As shown in Fig. 2E, operations S501 to S512 include: first, processing the keyword voice file and the voice file separately; second, performing model training on the processed keyword voice file and the processed voice file to obtain a model file; third, matching the model file against the extracted keyword speech features; and then returning the matching result. The specific steps are as follows. First, a keyword voice file is obtained by executing operation S501, operation S502 denoises the keyword voice file, and operation S503 extracts its keyword speech features; further, operation S504 obtains a voice file, operation S505 denoises the voice file, and operation S506 extracts its speech features. Second, operation S507 performs model training on the extracted keyword speech features and the extracted speech features to obtain a model file. Third, operation S508 obtains a keyword voice file, operation S509 extracts the keyword speech features of that keyword voice file, and operation S510 matches the model file against the extracted keyword speech features. Then, if the match succeeds, operation S511 indicates that the voice file contains the keyword; if the match fails, operation S512 indicates that the voice file does not contain the keyword.

However, the speech training set used to train such a speech recognition system usually consists of standard Mandarin spoken at a moderate speed in a noise-free environment, whereas the voice files on which the system performs recognition are usually ordinary calls in non-standard Mandarin or even dialect, with widely varying speaking speeds and background noise. As a result, the accuracy of speech recognition is very low.

In the embodiments of the present disclosure, by contrast, the voice file is processed into a text file in advance, the text file is converted into codes for its initials and finals, the keywords are likewise converted into codes for their initials and finals, and the converted text file is then matched against the converted keywords in one of several modes, so that text misrecognized from speech can still be matched correctly.

According to the embodiments of the present disclosure, a text file is obtained by pre-processing the voice file, the text file and the keywords are converted according to the predetermined pinyin coding rule, and the converted text file is matched against the converted keywords. This wholly or partly solves the technical problem of low speech recognition accuracy in the related art and improves the accuracy of speech recognition.

The method shown in Figs. 2A to 2C is further described below with reference to Figs. 3A to 3D and specific embodiments.

Fig. 3A schematically shows a flow chart of converting the text file according to the predetermined pinyin coding rule to obtain the corresponding first conversion code file, according to an embodiment of the present disclosure.

When mode one is used to perform full matching between the first conversion code file and the second conversion code file, a complete word may be forcibly split, causing a keyword to be misrecognized. For example, suppose the converted text file is "A1B2C3D4" and the converted keyword is "1B2C", where "B2C3" is one word; matching the converted text file against the converted keyword then easily leads to a false keyword match. To further overcome this problem, the present disclosure also provides an optional embodiment in which mode two is used to perform word-segmented matching between the first conversion code file and the second conversion code file. In this embodiment, operation S202 described with reference to Fig. 2A (that is, converting the text file according to the predetermined pinyin coding rule to obtain the corresponding first conversion code file) may include operations S601 to S603, as shown in Fig. 3A, in which:

In operation S601, word segmentation is performed on the text file to obtain one or more words belonging to the text file.

In operation S602, the one or more words are converted into the corresponding pinyin.

In operation S603, the pinyin obtained from the one or more words is converted according to the predetermined pinyin coding rule to obtain the first conversion code file.
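Operations S601 to S603 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the lexicon, the forward-maximum-matching segmenter and the function names are hypothetical stand-ins for a real segmenter and pinyin dictionary, and the code table contains only the entries that can be inferred from the worked examples in this description (hulan and hunan both map to 457F), not the full predetermined pinyin coding rule.

```python
# Hypothetical toy lexicon: word -> pinyin syllables. A real system would use a
# word segmenter and a complete pinyin dictionary instead.
LEXICON = {"护栏": ["hu", "lan"], "湖南": ["hu", "nan"]}

# Partial code tables inferred from the examples in the text (assumption):
# one code character per initial and one per final.
INITIAL_CODE = {"h": "4", "l": "7", "n": "7"}
FINAL_CODE = {"u": "5", "an": "F"}

# Two-letter initials are listed first so that "zh" is tried before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def segment(text, lexicon):
    """S601 (toy): forward maximum matching against a small lexicon."""
    words, i = [], 0
    max_len = max(map(len, lexicon))
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon or size == 1:
                words.append(piece)
                i += size
                break
    return words


def split_syllable(syllable):
    """Split one pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def encode_word(syllables):
    """S603: encode each syllable as initial code followed by final code."""
    parts = []
    for syllable in syllables:
        initial, final = split_syllable(syllable)
        parts.append(INITIAL_CODE[initial] + FINAL_CODE[final])
    return "".join(parts)


def convert(text):
    """Run S601 to S603; words missing from the toy lexicon are skipped."""
    words = [w for w in segment(text, LEXICON) if w in LEXICON]  # S601
    pinyin = [LEXICON[w] for w in words]                          # S602
    return [encode_word(p) for p in pinyin]                       # S603
```

In this sketch the output is kept as a list with one code per word, so that later matching can work on whole words rather than on one concatenated string.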

In the embodiments of the present disclosure, the word segmentation may be performed by, but is not limited to, Jieba segmentation, where Jieba is a word segmentation tool that can split a passage of text into one or more individual words.

According to an embodiment of the present disclosure, as shown in Fig. 3B, when word segmentation is performed on the text file, executing operations S701 to S709 may include: first, converting the textual phrase set [..., guardrail, ...] and the keyword [Hunan] respectively according to the predetermined pinyin coding rule; second, matching the converted textual phrase set against the converted keyword; third, returning the matching result. The specific steps are as follows. First, operation S701 obtains the one or more words belonging to the text file, which may be expressed as the textual phrase set [..., guardrail, ...]; operation S702 processes the textual phrase set to obtain the textual phrase pinyin set [..., hulan, ...], which may include the pinyin corresponding to each of the one or more words; and operation S703 converts the textual phrase pinyin set according to the predetermined pinyin coding rule to obtain the textual phrase code set [..., 457F, ...], which may be called the first conversion code file. Further, operation S704 obtains the keyword [Hunan]; operation S705 processes the keyword to obtain the corresponding keyword pinyin [hunan]; and operation S706 converts the keyword pinyin according to the predetermined pinyin coding rule to obtain the corresponding keyword code [457F]. Second, operation S707 matches the textual phrase code set [..., 457F, ...] against the keyword code [457F] to obtain the matching result. Third, since the textual phrase code set [..., 457F, ...] contains the keyword code [457F], operation S708 may be executed to return the keyword [Hunan]; if the textual phrase code set did not contain the keyword code, operation S709 may be executed to indicate that there is no keyword in the textual phrase code set.

It should be noted that the above textual phrase code set may be denoted KeywordsEncoding. Matching the textual phrase code set KeywordsEncoding against the second conversion code file KeywordEncoding may consist of searching for the keyword code KeywordEncoding in the textual phrase code set KeywordsEncoding.

According to the embodiments of the present disclosure, the text file is segmented, the resulting words are converted into the first conversion code file, and the first conversion code file is then matched against the second conversion code file. This reduces the cases in which a keyword is misrecognized because the codes of adjacent words combine, and can thus further improve the accuracy of speech recognition.
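The difference between full matching (mode one) and word-segmented matching (mode two) can be illustrated with the "A1B2C3D4" example given earlier. The sketch below is only an illustration: the function names are hypothetical, and the segmentation of the text code into "A1", "B2C3" and "D4" is an assumed one that is consistent with "B2C3" being a single word.

```python
def full_match(text_code, keyword_code):
    """Mode one: search for the keyword code anywhere in the concatenated text code."""
    return keyword_code in text_code


def word_match(word_codes, keyword_code):
    """Mode two: search for the keyword code among the per-word codes only."""
    return keyword_code in word_codes


text_code = "A1B2C3D4"             # concatenated code of the whole text
word_codes = ["A1", "B2C3", "D4"]  # assumed segmentation; "B2C3" is one word
keyword_code = "1B2C"

false_hit = full_match(text_code, keyword_code)   # keyword straddles word boundaries
no_hit = word_match(word_codes, keyword_code)     # per-word search ignores it
```

Here `word_codes` plays the role of the textual phrase code set KeywordsEncoding and `keyword_code` the role of KeywordEncoding.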

Fig. 3C schematically shows a flow chart of matching the first conversion code file against the second conversion code file to obtain the matching result, according to an embodiment of the present disclosure.

Because words with similar pronunciation may share the same initials but differ in their finals, performing word-segmented matching between the first conversion code file and the second conversion code file with mode two may miss words with similar pronunciation, so that keywords are missed. To further overcome this problem, the present disclosure also provides an optional embodiment in which mode three is used: the initials of the first conversion code file are matched exactly against the initials of the second conversion code file, and the finals of the first conversion code file are matched fuzzily against the finals of the second conversion code file. In this embodiment, operation S203 described with reference to Figs. 2A and 3A (that is, matching the first conversion code file against the second conversion code file to obtain the matching result) may include operations S801 to S804, as shown in Fig. 3C, in which:

In operation S801, for each word among the one or more words, the conversion code corresponding to the initial part of the pinyin of that word is obtained.

In operation S802, it is judged whether that conversion code is identical to the conversion code corresponding to the initial part of the pinyin of each keyword.

In operation S803, if they are identical, it is further judged whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword satisfies a preset condition.

In operation S804, if the preset condition is satisfied, the matching result is generated according to that keyword.
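Operations S801 to S804 can be sketched as a two-stage filter: an exact comparison of initial codes followed by a pluggable test on final codes. The sketch makes two assumptions that the text does not state in this form: each word code alternates one initial character and one final character (which is consistent with the worked examples, where DD47 splits into D4 and D7), and the preset condition is passed in as a function. All names are hypothetical.

```python
def split_code(code):
    """Split a word code into (initial part, final part).

    Assumes the code alternates initial, final, initial, final, ...
    """
    return code[0::2], code[1::2]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_keywords(word_codes, keyword_codes, condition):
    """Return the keyword codes matched by any word code."""
    matches = []
    for word_code in word_codes:
        word_initial, word_final = split_code(word_code)        # S801
        for keyword_code in keyword_codes:
            kw_initial, kw_final = split_code(keyword_code)
            if word_initial != kw_initial:                      # S802
                continue  # initials differ: skip the final comparison
            if condition(word_final, kw_final):                 # S803
                matches.append(keyword_code)                    # S804
    return matches


def at_most_one_difference(word_final, kw_final):
    """Illustrative preset condition only; the actual condition is configurable."""
    return hamming(word_final, kw_final) <= 1
```

Because the initial comparison runs first, a word whose initials differ from the keyword's never reaches the more expensive final comparison.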

In the embodiments of the present disclosure, a conversion code may be the code obtained by converting the pinyin corresponding to a character or word according to the predetermined pinyin coding rule; the conversion code corresponding to the initial part may be the code of the initial part extracted after the pinyin corresponding to a character or word has been converted according to the predetermined pinyin coding rule.

According to an embodiment of the present disclosure, as shown in Fig. 3D, executing operations S901 to S914 may include: first, extracting the initial part and the final part of the converted textual phrase set [..., tuition fee, ...] and of the converted keyword [continuing to pay dues]; second, matching the initial part of the converted textual phrase set against the initial part of the converted keyword; third, matching the final part of the converted textual phrase set against the final part of the converted keyword; and then returning the matching result. The specific steps are as follows. First, operation S901 obtains the textual phrase set [..., tuition fee, ...]; operation S902 processes the textual phrase set to obtain the textual phrase pinyin set [..., xuefei, ...]; operation S903 converts the textual phrase pinyin set according to the predetermined pinyin coding rule to obtain the textual phrase code set [..., DD47, ...], which may be called the first conversion code file; operation S904 extracts the initial part of the textual phrase code set to obtain the textual phrase initial code set [..., D4, ...]; and operation S905 extracts the final part of the textual phrase code set to obtain the textual phrase final code set [..., D7, ...]. For the keyword, operation S906 obtains the keyword [continuing to pay dues]; operation S907 processes the keyword to obtain the corresponding keyword pinyin [xufei]; operation S908 converts the keyword pinyin according to the predetermined pinyin coding rule to obtain the corresponding keyword code [D547]; operation S909 extracts the initial part of the keyword code [D547] to obtain the keyword initial code [D4]; and operation S910 extracts the final part of the keyword code [D547] to obtain the keyword final code [57]. Second, operation S911 judges whether the textual phrase initial code set [..., D4, ...] contains the keyword initial code [D4]. Third, since the textual phrase initial code set does contain the keyword initial code [D4], operation S912 is further executed to judge whether the difference between the conversion code of the phrase in the textual phrase final code set [..., D7, ...] and the keyword final code [57] satisfies the preset condition. Then, if that difference satisfies the preset condition, operation S913 may be executed to return the keyword [continuing to pay dues]; if the textual phrase initial code set does not contain the keyword initial code, or if the difference between the conversion code of the phrase in the textual phrase final code set [..., D7, ...] and the keyword final code [57] does not satisfy the preset condition, operation S914 may be executed to indicate that the textual phrase set does not contain the keyword.
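The intermediate values of the Fig. 3D example can be traced with a few lines of code. This is a sketch that assumes each code alternates one initial character and one final character, which reproduces the values given above (D4 and D7 for DD47, D4 and 57 for D547); the helper names are hypothetical.

```python
def initial_part(code):
    """Characters at even positions: the initial codes (assumed layout)."""
    return code[0::2]


def final_part(code):
    """Characters at odd positions: the final codes (assumed layout)."""
    return code[1::2]


def differing_positions(a, b):
    """Count positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


text_code = "DD47"     # xuefei, result of S903
keyword_code = "D547"  # xufei, result of S908

text_initial = initial_part(text_code)        # S904
text_final = final_part(text_code)            # S905
keyword_initial = initial_part(keyword_code)  # S909
keyword_final = final_part(keyword_code)      # S910

initials_equal = text_initial == keyword_initial                    # S911
final_difference = differing_positions(text_final, keyword_final)  # input to S912
```

The two words share the same initial code and differ in exactly one position of the final code, which is the situation the fuzzy comparison of finals is meant to handle.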

According to the embodiments of the present disclosure, when the conversion code corresponding to the initial part of a word is found to differ from the conversion code corresponding to the initial part of the keyword, the subsequent operations are not executed, which improves the efficiency of speech recognition.

As an optional embodiment, judging whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of the keyword satisfies the preset condition may include: judging whether the minimum Hamming distance between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword is less than a preset value.

It should be noted that, before the embodiments of the present disclosure are explained, the meanings of several terms may first be clarified, wherein:

Hamming distance: the number of positions at which two strings of equal length differ. For example, the Hamming distance between "1011101" and "1001001" is 2.

Minimum Hamming distance: the minimum value among multiple Hamming distances.

Edit distance: also known as the Levenshtein distance, the minimum number of edit operations required to change one of two strings into the other. The permitted edit operations are replacing one character with another, inserting a character, and deleting a character. In general, the smaller the edit distance, the greater the similarity of the two strings.

Recall rate: also known as recall ratio, the ratio of the number of relevant documents retrieved to the number of all relevant documents in the document library; it measures how completely the retrieval system finds the relevant documents. In contrast, precision is the ratio of the number of relevant documents retrieved to the total number of documents retrieved; it measures how accurate the retrieval system's results are.
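The terms defined above correspond to short, standard computations. The sketch below is illustrative; the function and parameter names are hypothetical, and the edit distance uses the usual dynamic-programming formulation over the three permitted operations.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_hamming_distance(code, candidates):
    """Smallest Hamming distance between one code and several candidates."""
    return min(hamming_distance(code, candidate) for candidate in candidates)


def edit_distance(a, b):
    """Levenshtein distance: fewest substitutions, insertions and deletions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute or keep
            ))
        previous = current
    return previous[-1]


def recall(relevant_retrieved, relevant_total):
    """Relevant documents retrieved divided by all relevant documents."""
    return relevant_retrieved / relevant_total


def precision(relevant_retrieved, retrieved_total):
    """Relevant documents retrieved divided by all documents retrieved."""
    return relevant_retrieved / retrieved_total
```

For the pinyin strings used in this description, `edit_distance("xuefei", "xufei")` is 1, since deleting one letter turns the first string into the second.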

In the embodiments of the present disclosure, the preset value may be set as follows, without being limited thereto: 1 for a two-character keyword, 2 for a three-character keyword, and 3 for a keyword of more than three characters.
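The example preset values can be written as a small lookup and combined with a Hamming-distance check. This is a sketch under stated assumptions: the function names are hypothetical, single-character keywords are rejected because no value is given for them, and the `inclusive` flag is an added parameter rather than part of the described method.

```python
def preset_value(keyword_length):
    """Example preset values: 1 for two characters, 2 for three, 3 for more."""
    if keyword_length < 2:
        # No value is given for single-character keywords (assumption: reject).
        raise ValueError("no preset value defined for this keyword length")
    if keyword_length == 2:
        return 1
    if keyword_length == 3:
        return 2
    return 3


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def within_preset(word_final, keyword_final, keyword_length, inclusive=False):
    """Check the final codes against the preset value for this keyword length.

    With inclusive=False the distance must be strictly less than the preset
    value; inclusive=True also accepts a distance equal to it.
    """
    distance = hamming_distance(word_final, keyword_final)
    limit = preset_value(keyword_length)
    return distance <= limit if inclusive else distance < limit
```

Under the strict comparison, a two-character pair whose final codes differ in one position (distance 1) does not pass a preset value of 1, so the choice between a strict and an inclusive comparison matters in practice and is left configurable in this sketch.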

According to an embodiment of the present disclosure, as shown in Fig. 3D, operation S912 is executed to judge whether the minimum Hamming distance between the conversion code of the phrase in the textual phrase final code set [..., D7, ...] and the keyword final code [57] is less than the preset value; if it is, operation S913 is further executed to return the keyword [continuing to pay dues].

According to the embodiments of the present disclosure, when the minimum Hamming distance between conversion codes is less than the preset value, the matching result is generated according to the keyword, so that misrecognized homophones and near-homophones can be recognized as the keyword, which improves recognition accuracy and recall.

As an optional embodiment, the above speech recognition method may further include: if the conversion code corresponding to the initial part of the pinyin of the word is judged to differ from the conversion code corresponding to the initial part of the pinyin of the keyword, recognizing the voice file directly according to the text file.

In the embodiments of the present disclosure, as shown in Fig. 3D, if operation S911 determines that the textual phrase initial code set [..., D4, ...] does not contain the keyword initial code [D4], or operation S912 determines that the difference between the conversion code of the phrase in the textual phrase final code set [..., D7, ...] and the keyword final code [57] does not satisfy the preset condition, operation S914 may be executed to conclude that there is no keyword in the text file. The voice file can then be recognized directly according to the text file.

Fig. 4 diagrammatically illustrates the block diagram of speech recognition system according to an embodiment of the present disclosure.

As shown in figure 4, the speech recognition system 400 may include obtaining module 410, conversion module 420, matching module 430 and first identification module 440, in which:

The obtaining module 410 is configured to obtain the text file produced by pre-processing the voice file.

The conversion module 420 is configured to convert the text file according to the predetermined pinyin coding rule to obtain the corresponding first conversion code file.

The matching module 430 is configured to match the first conversion code file against the second conversion code file to obtain the matching result, where the second conversion code file is obtained by converting each keyword in the keyword text set according to the predetermined pinyin coding rule.

First identification module 440 is for identifying voice document according to matching result.
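The module structure of Fig. 4 can be pictured as four pluggable components called in sequence. The sketch below is one possible arrangement, not the disclosed implementation: the class name, field names, function signatures and the stub behaviours (including the file name "call.wav") are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SpeechRecognitionSystem:
    """Four modules wired in the order described for Fig. 4."""
    obtain: Callable[[str], str]                            # module 410: voice file -> text
    convert: Callable[[str], List[str]]                     # module 420: text -> word codes
    match: Callable[[List[str], Dict[str, str]], List[str]] # module 430: codes -> keywords
    identify: Callable[[str, List[str]], dict]              # module 440: final result

    def run(self, voice_file: str, keyword_codes: Dict[str, str]) -> dict:
        text = self.obtain(voice_file)
        word_codes = self.convert(text)
        keywords = self.match(word_codes, keyword_codes)
        return self.identify(voice_file, keywords)


# Stub modules for illustration only; each stands in for real processing.
system = SpeechRecognitionSystem(
    obtain=lambda path: "护栏",
    convert=lambda text: ["457F"],
    match=lambda codes, table: [table[c] for c in codes if c in table],
    identify=lambda path, keywords: {"file": path, "keywords": keywords},
)

result = system.run("call.wav", {"457F": "湖南"})
```

Keeping each module behind a plain callable mirrors the later statement that the modules may be merged, split or implemented in hardware, firmware or software.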

Fig. 5 A diagrammatically illustrates the block diagram of conversion module according to an embodiment of the present disclosure.

In this embodiment, the conversion module 420 described with reference to Fig. 4 may include a processing unit 421, a first converting unit 422 and a second converting unit 423, as shown in Fig. 5A, in which:

The processing unit 421 is configured to perform word segmentation on the text file to obtain one or more words belonging to the text file.

The first converting unit 422 is configured to convert the one or more words into the corresponding pinyin.

The second converting unit 423 is configured to convert the pinyin obtained from the one or more words according to the predetermined pinyin coding rule to obtain the first conversion code file.

Fig. 5B schematically shows a block diagram of the matching module according to an embodiment of the present disclosure.

In this embodiment, the matching module 430 described with reference to Figs. 4 and 5A may include an acquiring unit 431, a first judging unit 432, a second judging unit 433 and a matching unit 434, as shown in Fig. 5B, in which:

The acquiring unit 431 is configured to obtain, for each word among the one or more words, the conversion code corresponding to the initial part of the pinyin of that word.

The first judging unit 432 is configured to judge whether that conversion code is identical to the conversion code corresponding to the initial part of the pinyin of each keyword.

The second judging unit 433 is configured to, when that conversion code is judged to be identical to the conversion code corresponding to the initial part of the pinyin of a keyword, further judge whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword satisfies the preset condition.

The matching unit 434 is configured to generate the matching result according to each keyword when the preset condition is satisfied.

As an optional embodiment, the second judging unit is further configured to judge whether the minimum Hamming distance between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword is less than the preset value.

Fig. 5 C diagrammatically illustrates the block diagram of speech recognition system according to another embodiment of the present disclosure.

In this embodiment, in addition to the corresponding modules and units described with reference to Figs. 4 and 5B, the speech recognition system 400 may further include a second identification module 450.

As shown in Fig. 5C, the speech recognition system 400 may further include the second identification module 450, in which:

The second identification module 450 is configured to recognize the voice file directly according to the text file when the conversion code corresponding to the initial part of the pinyin of the word is judged to differ from the conversion code corresponding to the initial part of the pinyin of the keyword.

It can be understood that the obtaining module 410, the conversion module 420, the matching module 430, the first identification module 440 and the second identification module 450 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module. According to the embodiments of the present disclosure, at least one of the obtaining module 410, the conversion module 420, the matching module 430, the first identification module 440 and the second identification module 450 may be implemented at least partly as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system in package or an application-specific integrated circuit (ASIC); or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit; or may be implemented by an appropriate combination of the three implementation forms of software, hardware and firmware. Alternatively, at least one of the obtaining module 410, the conversion module 420, the matching module 430, the first identification module 440 and the second identification module 450 may be implemented at least partly as a computer program module which, when run by a computer, performs the functions of the corresponding module.

It should be noted that the speech recognition system part of the embodiments of the present disclosure corresponds to the speech recognition method part; for a description of the speech recognition system part, refer to the speech recognition method part, which is not repeated here.

In another aspect, the present disclosure also provides a computer system, which may include: one or more processors; and a storage device for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the speech recognition method described above.

Fig. 6 schematically shows a block diagram of a computer system suitable for implementing the speech recognition method according to an embodiment of the present disclosure. The computer system shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in Fig. 6, the computer system 1000 according to an embodiment of the present disclosure includes a processor 1001, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage section 1008 into a random access memory (RAM) 1003. The processor 1001 may include, for example, a general-purpose microprocessor (such as a CPU), an instruction set processor and/or a related chipset, and/or a special-purpose microprocessor (such as an application-specific integrated circuit (ASIC)), and so on. The processor 1001 may also include on-board memory for caching purposes. The processor 1001 may include a single processing unit or multiple processing units for executing the different actions of the method flows according to the embodiments of the present disclosure described with reference to Figs. 2A to 2C and Figs. 3A to 3D.

The RAM 1003 stores various programs and data required for the operation of the computer system 1000. The processor 1001, the ROM 1002 and the RAM 1003 are connected to one another through a bus 1004. The processor 1001 performs the various operations described above with reference to Figs. 2A to 2C and Figs. 3A to 3D by executing programs in the ROM 1002 and/or the RAM 1003. It should be noted that the programs may also be stored in one or more memories other than the ROM 1002 and the RAM 1003. The processor 1001 may also perform the various operations described above with reference to Figs. 2A to 2C and Figs. 3A to 3D by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the computer system 1000 may further include an input/output (I/O) interface 1005, which is also connected to the bus 1004. The computer system 1000 may further include one or more of the following components connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse and the like; an output section 1007 including a cathode ray tube (CRT) or liquid crystal display (LCD) and a loudspeaker and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it can be installed into the storage section 1008 as needed.

According to an embodiment of the present disclosure, the method described above with reference to the flow charts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for executing the method shown in the flow charts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009, and/or installed from the removable medium 1011. When the computer program is executed by the processor 1001, the above functions defined in the system of the embodiments of the present disclosure are performed. According to the embodiments of the present disclosure, the systems, devices, modules, units and the like described above may be implemented by computer program modules.

It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF and the like, or any suitable combination of the above. According to an embodiment of the present disclosure, the computer-readable storage medium may include the ROM 1002 and/or the RAM 1003 described above and/or one or more memories other than the ROM 1002 and the RAM 1003.

The flow charts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flow chart or block diagram may represent a module, a program segment or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flow charts, and combinations of blocks in the block diagrams or flow charts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

In another aspect, the present disclosure further provides a computer-readable medium storing executable instructions that, when executed by a processor, cause the processor to implement the speech recognition method described above. The computer-readable medium may be included in the device described in the above embodiments, or it may exist separately without being assembled into that device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain a text file produced by pre-processing a voice file; convert the text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file; match the first conversion code file against a second conversion code file to obtain a matching result, where the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule; and identify the voice file according to the matching result.
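The four steps listed above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the table `TOY_PINYIN`, the functions `encode`, `build_keyword_codes`, and `recognize`, and the placeholder coding rule (using the toneless pinyin itself as the code) are all assumptions introduced for illustration, and the transcript is assumed to be already segmented into words.

```python
from typing import Dict, List, Tuple

Code = Tuple[str, ...]

# Hypothetical word -> toneless pinyin table. It stands in for a real word
# segmenter plus pinyin dictionary, which this sketch does not include.
TOY_PINYIN: Dict[str, List[str]] = {
    "退货": ["tui", "huo"],   # keyword: "return goods"
    "推货": ["tui", "huo"],   # plausible misrecognition of the keyword
    "发票": ["fa", "piao"],   # keyword: "invoice"
    "你好": ["ni", "hao"],
}


def encode(word: str) -> Code:
    """Placeholder coding rule (assumption): the code is the pinyin itself."""
    return tuple(TOY_PINYIN[word])


def build_keyword_codes(keywords: List[str]) -> Dict[Code, str]:
    """Build the second code file: a map from code to keyword."""
    return {encode(keyword): keyword for keyword in keywords}


def recognize(words: List[str], keywords: List[str]) -> List[str]:
    """Return the keywords whose codes match a word of the transcript."""
    keyword_codes = build_keyword_codes(keywords)
    first_codes = [encode(word) for word in words]  # first code file
    return [keyword_codes[code] for code in first_codes if code in keyword_codes]


hits = recognize(["你好", "推货"], ["退货", "发票"])
print(hits)
```

Because the comparison is made on pinyin-derived codes rather than on characters, the misrecognized word "推货" still resolves to the keyword "退货" in this toy example.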

Embodiments of the present disclosure have been described above. These embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately, this does not mean that measures from different embodiments cannot be used in combination to advantage. The scope of the present disclosure is defined by the appended claims and their equivalents. Those skilled in the art may make various substitutions and modifications without departing from the scope of the present disclosure, and all such substitutions and modifications shall fall within the scope of the present disclosure.

Claims

1. A speech recognition method, comprising:

obtaining a text file produced by pre-processing a voice file;

converting the text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file;

matching the first conversion code file against a second conversion code file to obtain a matching result, wherein the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule; and

identifying the voice file according to the matching result.

2. The method according to claim 1, wherein converting the text file according to the predetermined pinyin coding rule to obtain the corresponding first conversion code file comprises:

performing word segmentation on the text file to obtain one or more words belonging to the text file;

converting the one or more words into corresponding pinyin; and

converting the pinyin obtained from the one or more words according to the predetermined pinyin coding rule to obtain the first conversion code file.

3. The method according to claim 2, wherein matching the first conversion code file against the second conversion code file to obtain the matching result comprises:

for each word of the one or more words, obtaining the conversion code corresponding to the initial part of the pinyin of the word;

judging whether that conversion code is identical to the conversion code corresponding to the initial part of the pinyin of each keyword;

if identical, further judging whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword satisfies a preset condition; and

if the preset condition is satisfied, generating the matching result according to each keyword.

4. The method according to claim 3, wherein judging whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of the keyword satisfies the preset condition comprises:

judging whether the minimum Hamming distance between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword is less than a preset value.

5. The method according to claim 3, wherein the method further comprises:

if it is judged that the conversion code corresponding to the initial part of the pinyin of the word differs from the conversion code corresponding to the initial part of the pinyin of the keyword, identifying the voice file directly according to the text file.

6. A speech recognition system, comprising:

an obtaining module, configured to obtain a text file produced by pre-processing a voice file;

a conversion module, configured to convert the text file according to a predetermined pinyin coding rule to obtain a corresponding first conversion code file;

a matching module, configured to match the first conversion code file against a second conversion code file to obtain a matching result, wherein the second conversion code file is obtained by converting each keyword in a keyword text set according to the predetermined pinyin coding rule; and

a first identification module, configured to identify the voice file according to the matching result.

7. The system according to claim 6, wherein the conversion module comprises:

a processing unit, configured to perform word segmentation on the text file to obtain one or more words belonging to the text file;

a first converting unit, configured to convert the one or more words into corresponding pinyin; and

a second converting unit, configured to convert the pinyin obtained from the one or more words according to the predetermined pinyin coding rule to obtain the first conversion code file.

8. The system according to claim 7, wherein the matching module comprises:

an acquiring unit, configured to obtain, for each word of the one or more words, the conversion code corresponding to the initial part of the pinyin of the word;

a first judging unit, configured to judge whether that conversion code is identical to the conversion code corresponding to the initial part of the pinyin of each keyword;

a second judging unit, configured to, in the case where that conversion code is judged to be identical to the conversion code corresponding to the initial part of the pinyin of each keyword, further judge whether the difference between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword satisfies a preset condition; and

a matching unit, configured to generate the matching result according to each keyword in the case where the preset condition is satisfied.

9. The system according to claim 8, wherein the second judging unit is further configured to judge whether the minimum Hamming distance between the conversion code corresponding to the final part of the pinyin of the word and the conversion code corresponding to the final part of the pinyin of each keyword is less than a preset value.

10. The system according to claim 8, wherein the system further comprises:

a second identification module, configured to identify the voice file directly according to the text file in the case where the conversion code corresponding to the initial part of the pinyin of the word is judged to differ from the conversion code corresponding to the initial part of the pinyin of the keyword.

11. A computer system, comprising:

one or more processors; and

a memory, configured to store one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the speech recognition method according to any one of claims 1 to 5.

12. A computer-readable storage medium storing executable instructions that, when executed by a processor, cause the processor to implement the speech recognition method according to any one of claims 1 to 5.
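By way of illustration only, the two-stage comparison recited in claims 3 to 5 (identical initial codes first, then a Hamming-distance test on the final codes, with a fallback when nothing matches) might be sketched in Python as follows. The code tables `INITIAL_CODE` and `FINAL_CODE`, the function names, the default preset value of 2, and the restriction to single-syllable words given as (initial, final) pairs are all assumptions made for this sketch; the actual predetermined pinyin coding rule is not reproduced here.

```python
from typing import Optional, Sequence, Tuple

Syllable = Tuple[str, str]  # (initial, final), e.g. ("x", "ing")

# Hypothetical fixed-width codes, chosen only so that similar-sounding
# finals ("in" / "ing") differ in a single bit.
INITIAL_CODE = {"x": "0001", "t": "0010", "h": "0011"}
FINAL_CODE = {"in": "0100", "ing": "0101", "an": "1000",
              "ui": "0010", "uo": "0011"}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_keyword(word: Syllable,
                  keywords: Sequence[Syllable],
                  preset: int = 2) -> Optional[Syllable]:
    """Return the closest keyword that passes both tests, or None.

    None corresponds to the fallback case, where the voice file would be
    identified directly from the text file.
    """
    best: Optional[Syllable] = None
    best_distance = preset
    for keyword in keywords:
        # Stage 1: the initial codes must be identical.
        if INITIAL_CODE[word[0]] != INITIAL_CODE[keyword[0]]:
            continue
        # Stage 2: the final codes must be closer than the preset value.
        distance = hamming(FINAL_CODE[word[1]], FINAL_CODE[keyword[1]])
        if distance < best_distance:
            best, best_distance = keyword, distance
    return best


print(match_keyword(("x", "ing"), [("x", "in"), ("t", "ui")]))
```

Checking the initial code before computing any distance lets most keywords be rejected with a single equality test, which is one plausible reason for ordering the two stages this way.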