CN109410918A

CN109410918A - Method and device for obtaining information

Info

Publication number: CN109410918A
Application number: CN201811198500.5A
Authority: CN
Inventors: 钱胜; 王知践; 李俊博
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2018-10-15
Filing date: 2018-10-15
Publication date: 2019-03-01
Anticipated expiration: 2038-10-15
Also published as: CN109410918B

Abstract

Embodiments of the present application disclose a method for obtaining information. One specific implementation of the method includes: obtaining a speech feature audio sequence from a speech signal to be processed, the speech feature audio sequence being used to characterize the text corresponding to the speech signal to be processed; importing the speech feature audio sequence into a pinyin recognition model to obtain pinyin information corresponding to the speech feature audio sequence, where the pinyin recognition model matches the pinyin information of the speech feature audio sequence by means of a pinyin unit set, each pinyin unit being used to recognize a single character; and looking up text information corresponding to the speech signal to be processed according to the pinyin information. This implementation reduces the amount of data processing and storage space required to obtain pinyin information, and improves the accuracy of the obtained text information.

Description

Method and device for obtaining information

Technical field

Embodiments of the present application relate to the field of speech recognition technology, and in particular to a method and device for obtaining information.

Background

Speech recognition technology converts speech signals into text information, which can then be processed further to carry out the corresponding data processing. Using speech signals, a user can remotely control a smart device that has a speech recognition function. Especially where manual input is inconvenient or impossible for the user, speech recognition technology greatly improves the efficiency of information exchange.

Summary of the invention

Embodiments of the present application propose a method and device for obtaining information.

In a first aspect, an embodiment of the present application provides a method for obtaining information, the method including: obtaining a speech feature audio sequence from a speech signal to be processed, the speech feature audio sequence being used to characterize the text corresponding to the speech signal to be processed; importing the speech feature audio sequence into a pinyin recognition model to obtain pinyin information corresponding to the speech feature audio sequence, where the pinyin recognition model is used to match the pinyin information of the speech feature audio sequence by means of a pinyin unit set, each pinyin unit being used to recognize a single character; and looking up text information corresponding to the speech signal to be processed according to the pinyin information.

In some embodiments, importing the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence includes: extracting one initial speech frame from the speech feature audio sequence at intervals of a first set number of frames, to obtain an initial speech frame sequence; and merging each group of a second set number of adjacent initial speech frames in the initial speech frame sequence, to obtain a secondary speech frame sequence.

In some embodiments, each pinyin unit includes an initial phoneme, a final phoneme matching the initial phoneme, and a tone identifier, the tone identifier indicating the pronunciation characteristic of the pinyin information composed of the initial phoneme and the final phoneme; and importing the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence includes: obtaining a speech amplitude waveform of the secondary speech frame sequence; selecting, from the speech amplitude waveform, the peak speech frames corresponding to amplitude extrema, to obtain a peak speech frame sequence; for each peak speech frame in the peak speech frame sequence, matching a target pinyin unit corresponding to the peak speech frame from the pinyin unit set, and determining target pinyin information corresponding to the peak speech frame from the target pinyin unit; and sorting the target pinyin information according to the order of the corresponding peak speech frames in the peak speech frame sequence, to obtain the pinyin information corresponding to the speech feature audio sequence.

In some embodiments, the pinyin unit set is constructed by the following steps: obtaining an initial phoneme set and a final phoneme set; and, for each initial phoneme in the initial phoneme set, selecting from the final phoneme set the final phonemes that match the initial phoneme, to obtain the pinyin units corresponding to that initial phoneme.

In some embodiments, selecting from the final phoneme set the final phonemes that match the initial phoneme, to obtain the pinyin units corresponding to the initial phoneme, includes: selecting from the final phoneme set the final phonemes that match the initial phoneme, to obtain a final phoneme subset; determining the tone identifiers of the pinyin information composed of the initial phoneme and each final phoneme in the final phoneme subset, to obtain a tone identifier set; and combining the initial phoneme, the final phonemes in the final phoneme subset, and the tone identifiers in the tone identifier set into the pinyin units corresponding to the initial phoneme.

In a second aspect, an embodiment of the present application provides a device for obtaining information, the device including: a speech feature audio sequence obtaining unit, configured to obtain a speech feature audio sequence from a speech signal to be processed, the speech feature audio sequence being used to characterize the text corresponding to the speech signal to be processed; a pinyin information obtaining unit, configured to import the speech feature audio sequence into a pinyin recognition model to obtain pinyin information corresponding to the speech feature audio sequence, where the pinyin recognition model is used to match the pinyin information of the speech feature audio sequence by means of a pinyin unit set, each pinyin unit being used to recognize a single character; and a text information obtaining unit, configured to look up text information corresponding to the speech signal to be processed according to the pinyin information.

In some embodiments, the pinyin information obtaining unit includes: an initial speech frame sequence obtaining subunit, configured to extract one initial speech frame from the speech feature audio sequence at intervals of a first set number of frames, to obtain an initial speech frame sequence; and a secondary speech frame sequence obtaining subunit, configured to merge each group of a second set number of adjacent initial speech frames in the initial speech frame sequence, to obtain a secondary speech frame sequence.

In some embodiments, each pinyin unit includes an initial phoneme, a final phoneme matching the initial phoneme, and a tone identifier, the tone identifier indicating the pronunciation characteristic of the pinyin information composed of the initial phoneme and the final phoneme; and the pinyin information obtaining unit includes: a speech amplitude waveform obtaining subunit, configured to obtain a speech amplitude waveform of the secondary speech frame sequence; a peak speech frame sequence obtaining subunit, configured to select, from the speech amplitude waveform, the peak speech frames corresponding to amplitude extrema, to obtain a peak speech frame sequence; a target pinyin information obtaining subunit, configured to, for each peak speech frame in the peak speech frame sequence, match a target pinyin unit corresponding to the peak speech frame from the pinyin unit set, and determine target pinyin information corresponding to the peak speech frame from the target pinyin unit; and a pinyin information obtaining subunit, configured to sort the target pinyin information according to the order of the corresponding peak speech frames in the peak speech frame sequence, to obtain the pinyin information corresponding to the speech feature audio sequence.

In some embodiments, the device further includes a pinyin unit set construction unit configured to construct the pinyin unit set, the pinyin unit set construction unit including: a phoneme set obtaining subunit, configured to obtain an initial phoneme set and a final phoneme set; and a pinyin unit obtaining subunit, configured to, for each initial phoneme in the initial phoneme set, select from the final phoneme set the final phonemes that match the initial phoneme, to obtain the pinyin units corresponding to that initial phoneme.

In some embodiments, the pinyin unit obtaining subunit includes: a final phoneme subset obtaining module, configured to select from the final phoneme set the final phonemes that match the initial phoneme, to obtain a final phoneme subset; a tone identifier set obtaining module, configured to determine the tone identifiers of the pinyin information composed of the initial phoneme and each final phoneme in the final phoneme subset, to obtain a tone identifier set; and a pinyin unit obtaining module, configured to combine the initial phoneme, the final phonemes in the final phoneme subset, and the tone identifiers in the tone identifier set into the pinyin units corresponding to the initial phoneme.

In a third aspect, an embodiment of the present application provides a server, including: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the method for obtaining information according to the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method for obtaining information according to the first aspect.

In the method and device for obtaining information provided by the embodiments of the present application, a speech feature audio sequence is first extracted from the speech signal to be processed; the speech feature audio sequence is then imported into the pinyin recognition model to obtain the corresponding pinyin information; finally, the text information corresponding to the speech signal to be processed is looked up according to the pinyin information. This reduces the amount of data processing and storage space required to obtain pinyin information, and improves the accuracy of the obtained text information.

Brief description of the drawings

Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

Fig. 1 is a diagram of an exemplary system architecture to which an embodiment of the present application can be applied;

Fig. 2 is a flowchart of one embodiment of the method for obtaining information according to the present application;

Fig. 3 is a schematic diagram of an application scenario of the method for obtaining information according to the present application;

Fig. 4 is a flowchart of one embodiment of the pinyin unit set construction method according to the present application;

Fig. 5 is a structural schematic diagram of one embodiment of the device for obtaining information according to the present application;

Fig. 6 is a structural schematic diagram of a computer system suitable for implementing the server of an embodiment of the present application.

Detailed description of embodiments

The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.

It should be noted that, where there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

Fig. 1 shows an exemplary system architecture 100 to which the method for obtaining information or the device for obtaining information of an embodiment of the present application can be applied.

As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104, and a server 105. The network 104 serves as the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various speech processing applications may be installed on the terminal devices 101, 102, 103, such as audio capture applications, audio filtering applications, audio recognition applications, audio playback applications, and audio transmission tools.

The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices that have a display screen and support audio capture, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, and desktop computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.

The server 105 may be a server providing various services, for example a server that performs speech processing on the speech signals to be processed that are sent by the terminal devices 101, 102, 103. The server may analyze and otherwise process received data such as the speech signal to be processed, remove the noise from that signal, and feed the speech recognition result back to the terminal device.

It should be noted that the method for obtaining information provided by the embodiments of the present application may be performed solely by the terminal devices 101, 102, 103, or jointly by the terminal devices 101, 102, 103 and the server 105. Accordingly, the device for obtaining information may be provided in the terminal devices 101, 102, 103 or in the server 105.

It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.

It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks and servers according to implementation needs.

With continued reference to Fig. 2, a flow 200 of one embodiment of the method for obtaining information according to the present application is shown. The method for obtaining information includes the following steps:

Step 201: obtain a speech feature audio sequence from the speech signal to be processed.

In this embodiment, the executing body of the method for obtaining information (for example, the terminal devices 101, 102, 103 or the server 105 shown in Fig. 1) may obtain the speech signal to be processed through a wired or wireless connection. The speech signal to be processed is an audio signal containing captured speech, and may be, for example, any of various analog audio signals. It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra-wideband), and other wireless connection types now known or developed in the future.

Existing speech recognition methods consume a large amount of computing resources and storage space when converting speech signals into text information, and often require the smart device to be connected to a network. As a result, existing methods reduce the data processing capacity available to the smart device during recognition, are not easily applied to smart devices with limited processing capability or memory (for example, embedded systems), and produce text information of limited accuracy.

To address this, the executing body of the present application first extracts the speech feature audio sequence from the speech signal to be processed, by means such as speech recognition. The speech feature audio sequence may be used to characterize the text corresponding to the speech signal to be processed. It may be a sequence of audio frames carrying time information and amplitude information, and is usually a digital audio signal.
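As a minimal sketch of what such a sequence of audio frames might look like, the Python below splits a digitized signal into fixed-length frames, each tagged with its start time and carrying its amplitude samples. The `AudioFrame` type, the `frame_signal` function, and the 400-sample frame with a 160-sample hop (25 ms and 10 ms at 16 kHz) are illustrative assumptions; the application does not specify how frames are formed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioFrame:
    """One frame of a speech feature audio sequence (hypothetical layout)."""
    start_time: float     # seconds from the start of the signal
    samples: List[float]  # amplitude values of the samples in this frame


def frame_signal(samples: List[float], sample_rate: int,
                 frame_len: int = 400, hop: int = 160) -> List[AudioFrame]:
    """Split a digitized signal into overlapping frames with time and amplitude info.

    frame_len=400 and hop=160 are 25 ms and 10 ms at 16 kHz: illustrative
    defaults, not values taken from the application.
    """
    frames: List[AudioFrame] = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(AudioFrame(start_time=start / sample_rate,
                                 samples=samples[start:start + frame_len]))
    return frames


if __name__ == "__main__":
    one_second = [0.0] * 16000
    frames = frame_signal(one_second, sample_rate=16000)
    print(len(frames), frames[1].start_time)  # 98 0.01
```

For one second of 16 kHz audio this yields 98 overlapping frames, the second of which starts at 0.01 s.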

Step 202: import the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence.

After obtaining the speech feature audio sequence, the executing body may import it into the pinyin recognition model to obtain its pinyin information. The pinyin recognition model may be used to match the pinyin information of the speech feature audio sequence by means of the pinyin unit set. In existing speech recognition methods, identifying which characters a speech signal corresponds to first requires recognizing the speech feature audio sequence as multiple possible initial phonemes and final phonemes. Multiple rounds of matching and correction are then performed on adjacent initial and final phonemes before the corresponding pinyin information can finally be determined. For example, suppose the speech feature audio sequence contains audio whose actual text is "01" (read "líng yī"). When an existing speech recognition method processes this audio, it may obtain multiple basic pronunciation units such as "sil", "l", "ing", "sil", "y", "i" and "sil". The method then has to arrange and combine these basic pronunciation units in many ways to identify the real pronunciation as precisely as possible. The resulting combinations may include, for example, "sil-l+ing", "l-ing+sil", "ing-l+ing", "l-ing+l", "i-l+ing", "l-ing+y", "i-y+i", "y-i+y", "ing-y+i", "y-i+l", "sil-y+i" and "y-i+sil".

Here, "sil" denotes a pause that may exist in the actual speech; "-" indicates that the basic pronunciation units on its left and right may be considered as a whole; and "+" indicates that the basic pronunciation units on its left and right are in a combination relationship. Existing methods therefore have to match every one of these combinations before a result can be obtained. They extract a large amount of information from the speech feature audio sequence and must process all of it. Consequently, existing methods occupy a large share of the executing body's storage space and processing capacity, and the recognition result may still be inaccurate.
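To show how such labels arise from a single phone hypothesis, the sketch below builds context-dependent labels for one phone sequence. It assumes the common HTK-style convention, in which `a-b+c` denotes phone `b` with left context `a` and right context `c`; the function name and the silence-padding assumption are illustrative, not taken from the application.

```python
from typing import List


def triphone_labels(phones: List[str], silence: str = "sil") -> List[str]:
    """Build HTK-style 'left-center+right' labels for each non-silence phone.

    Assumes the sequence is padded with silence at both ends, so the first and
    last positions only serve as context.
    """
    labels: List[str] = []
    for i in range(1, len(phones) - 1):
        if phones[i] == silence:
            continue  # pauses act only as context, not as centers, here
        labels.append(f"{phones[i - 1]}-{phones[i]}+{phones[i + 1]}")
    return labels


if __name__ == "__main__":
    # One hypothesis for the audio of "01" (ling yi), as listed in the text.
    print(triphone_labels(["sil", "l", "ing", "sil", "y", "i", "sil"]))
    # ['sil-l+ing', 'l-ing+sil', 'sil-y+i', 'y-i+sil']
```

Each alternative phone hypothesis contributes its own set of labels, so the number of candidates a conventional decoder must score grows quickly.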

To address this, the executing body of the present application may obtain the pinyin information of the speech feature audio sequence through the pinyin recognition model. The pinyin unit set in the pinyin recognition model may contain multiple pinyin units, each of which may be used to recognize a single character. For example, words and phrases such as those meaning "trees", "mansion", "we are going traveling" and "listening to crosstalk" consist of several characters and are not single characters, whereas words written with one character, such as those meaning "person", "flower" and "sea", are single-character text (i.e. single characters) in the sense of the present application. The pinyin units may cover all actually occurring combinations of initial phonemes and final phonemes. The pinyin recognition model can therefore use the pinyin unit set to match the pinyin information of the speech feature audio sequence.

In some optional implementations of this embodiment, importing the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence may include the following steps:

In the first step, one initial speech frame is extracted from the speech feature audio sequence at intervals of the first set number of frames, to obtain an initial speech frame sequence.

To further reduce the amount of data processing needed to obtain text information from the speech signal to be processed, and to reduce storage usage, the executing body of the present application may extract one initial speech frame from the speech feature audio sequence at intervals of the first set number of frames, to obtain the initial speech frame sequence. In this way, enough useful information is retained while the data size shrinks, which speeds up obtaining text information from the speech signal to be processed.

In the second step, each group of the second set number of adjacent initial speech frames in the initial speech frame sequence is merged, to obtain a secondary speech frame sequence.

Further, the executing body may merge each group of the second set number of adjacent frames in the initial speech frame sequence into a single frame, to obtain the secondary speech frame sequence and further reduce the data volume.
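The two steps above can be sketched as follows. The application does not state how adjacent frames are merged; this sketch assumes element-wise averaging, reads "at intervals of the first set number of frames" as keeping every N-th frame, and uses hypothetical function names `decimate` and `merge_adjacent`.

```python
from typing import List, Sequence

Frame = List[float]


def decimate(frames: Sequence[Frame], first_n: int) -> List[Frame]:
    """Keep one frame out of every `first_n` frames (the initial speech frame sequence)."""
    return [list(frame) for frame in frames[::first_n]]


def merge_adjacent(frames: Sequence[Frame], second_n: int) -> List[Frame]:
    """Merge each run of `second_n` adjacent frames into one frame.

    Element-wise averaging is an assumed merge rule, not one stated in the
    application. A shorter trailing run is averaged over the frames it has.
    """
    merged: List[Frame] = []
    for start in range(0, len(frames), second_n):
        group = frames[start:start + second_n]
        merged.append([sum(values) / len(group) for values in zip(*group)])
    return merged


if __name__ == "__main__":
    frames = [[float(i)] for i in range(12)]        # 12 one-sample frames
    initial = decimate(frames, first_n=2)           # frames 0, 2, 4, 6, 8, 10
    secondary = merge_adjacent(initial, second_n=3)
    print(secondary)                                # [[2.0], [8.0]]
```

With 12 one-sample frames, a first set number of 2 keeps six frames, and a second set number of 3 merges them into two.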

In some optional implementations of this embodiment, the pinyin unit may include an initial phoneme, a final phoneme matching the initial phoneme, and a tone identifier. The tone identifier may be used to indicate the pronunciation characteristic of the pinyin information composed of the initial phoneme and the final phoneme. Typically, the tone identifiers "1", "2", "3" and "4" indicate that the pinyin is pronounced in the first, second, third and fourth tone, respectively. The tone identifier may also take other forms, which are not enumerated here.

Accordingly, importing the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence may also include the following steps:

In the first step, a speech amplitude waveform of the secondary speech frame sequence is obtained.

The executing body may obtain the speech amplitude waveform of the secondary speech frame sequence. The speech amplitude waveform may be a curve of multiple audio amplitudes, or an audio amplitude chart composed of multiple rectangular bars.

In the second step, the peak speech frames corresponding to amplitude extrema are selected from the speech amplitude waveform, to obtain a peak speech frame sequence.

Generally, the pronunciation of each character spans a certain duration of audio. Going from the speech signal to be processed to the secondary speech frame sequence reduces the data volume, but each character still spans a certain duration. In practice, the audio with the largest amplitude for each character best represents that character's pronunciation. To accurately determine the audio of each character, the executing body may first obtain the speech amplitude waveform of the secondary speech frame sequence, and then find the extrema of the waveform to determine the peak speech frames it contains, obtaining the peak speech frame sequence. Each peak speech frame may be regarded as carrying the most important audio information of one character.
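A minimal sketch of the first two steps, assuming the per-frame amplitude is the frame's maximum absolute sample value and that "amplitude extrema" means local maxima of that envelope. Both choices, the `min_height` threshold, and the function names are assumptions for illustration. Comparing strictly on the left and non-strictly on the right resolves a flat-topped peak to its first frame.

```python
from typing import List, Sequence


def amplitude_envelope(frames: Sequence[Sequence[float]]) -> List[float]:
    """One waveform point per frame: the frame's maximum absolute sample value."""
    return [max(abs(x) for x in frame) for frame in frames]


def peak_frame_indices(envelope: Sequence[float],
                       min_height: float = 0.0) -> List[int]:
    """Indices of local maxima of the envelope, i.e. the peak speech frames.

    Strict comparison on the left and non-strict on the right keeps only the
    first frame of a flat-topped peak. The end frames are never reported.
    """
    peaks: List[int] = []
    for i in range(1, len(envelope) - 1):
        if (envelope[i] > envelope[i - 1]
                and envelope[i] >= envelope[i + 1]
                and envelope[i] > min_height):
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    frames = [[0.1, -0.2], [0.5, 0.3], [-0.9, 0.1], [0.2, 0.2],
              [0.1, 0.0], [0.6, -0.7], [0.3, 0.1]]
    env = amplitude_envelope(frames)   # [0.2, 0.5, 0.9, 0.2, 0.1, 0.7, 0.3]
    print(peak_frame_indices(env))     # [2, 5]
```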

In the third step, for each peak speech frame in the peak speech frame sequence, a target pinyin unit corresponding to the peak speech frame is matched from the pinyin unit set, and the target pinyin information corresponding to the peak speech frame is determined from the target pinyin unit.

After obtaining the peak speech frame sequence, the executing body may match, from the pinyin unit set, the target pinyin unit corresponding to each peak speech frame. Because each pinyin unit itself contains an initial phoneme, a final phoneme matching the initial phoneme, and a tone identifier, the peak speech frame sequence can be recognized character by character, with high recognition accuracy. Since each pinyin unit carries its own initial phoneme, matching final phoneme and tone identifier, the executing body can determine the target pinyin information corresponding to the peak speech frame directly from the target pinyin unit.
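The application does not specify how a peak speech frame is matched to a pinyin unit. The sketch below assumes, purely for illustration, that each pinyin unit has a precomputed template feature vector and picks the unit whose template is nearest in Euclidean distance. `PinyinUnit`, `match_unit` and the template dictionary are hypothetical; only the unit's three fields (initial phoneme, final phoneme, tone identifier) come from the text.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class PinyinUnit:
    initial: str  # initial phoneme, e.g. "l"
    final: str    # final phoneme that matches the initial, e.g. "ing"
    tone: int     # tone identifier, 1 to 4

    def pinyin(self) -> str:
        """Target pinyin information in tone-number form, e.g. "ling2"."""
        return f"{self.initial}{self.final}{self.tone}"


def match_unit(peak_feature: Sequence[float],
               templates: Dict[PinyinUnit, Tuple[float, ...]]) -> PinyinUnit:
    """Pick the unit whose (hypothetical) template vector is nearest to the peak frame."""
    return min(templates, key=lambda unit: math.dist(peak_feature, templates[unit]))


if __name__ == "__main__":
    templates = {
        PinyinUnit("l", "ing", 2): (1.0, 0.0),
        PinyinUnit("y", "i", 1): (0.0, 1.0),
    }
    print(match_unit((0.9, 0.2), templates).pinyin())  # ling2
```

Nearest-template matching is only a stand-in here; any classifier that maps a peak frame to one unit of the set would fit the step as described. `math.dist` requires Python 3.8 or later.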

In the fourth step, the target pinyin information is sorted according to the order of the corresponding peak speech frames in the peak speech frame sequence, to obtain the pinyin information corresponding to the speech feature audio sequence.

The executing body may sort the target pinyin information according to the order of the corresponding peak speech frames in the peak speech frame sequence, and then match adjacent pinyin information syllable by syllable, to obtain the pinyin information of the speech feature audio sequence. This greatly improves the accuracy of the obtained pinyin information.
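The ordering step amounts to a sort on each peak frame's position. A minimal sketch, assuming each match is recorded as a hypothetical `(peak_frame_index, pinyin)` pair; the subsequent matching of adjacent syllables is not modeled here.

```python
from typing import List, Tuple


def order_pinyin(matches: List[Tuple[int, str]]) -> List[str]:
    """Sort (peak_frame_index, pinyin) pairs by frame position and keep the pinyin."""
    return [pinyin for _, pinyin in sorted(matches, key=lambda m: m[0])]


if __name__ == "__main__":
    # Matches are listed out of order here purely for illustration.
    print(order_pinyin([(5, "yi1"), (2, "ling2")]))  # ['ling2', 'yi1']
```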

Step 203: look up the text information corresponding to the speech signal to be processed according to the pinyin information.

After obtaining the pinyin information, the executing body may query an electronic dictionary or another application that looks up text by pinyin, to obtain the text information corresponding to the pinyin information.
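As an illustration of pinyin-to-text lookup, the sketch below performs a greedy longest match of toned syllables against a small pinyin-keyed lexicon, so that multi-syllable words are preferred over single characters. The lexicon format (space-joined syllables as keys), the `max_word_len` limit and the use of "?" for unknown syllables are assumptions; the application only says that an electronic dictionary or similar application is queried.

```python
from typing import Dict, List, Sequence


def lookup_text(pinyin_seq: Sequence[str], lexicon: Dict[str, str],
                max_word_len: int = 4) -> str:
    """Greedy longest-match lookup of toned pinyin in a pinyin-keyed lexicon.

    Keys are space-joined syllables, e.g. "ni3 hao3". Syllables with no
    entry are rendered as "?".
    """
    out: List[str] = []
    i = 0
    while i < len(pinyin_seq):
        # Try the longest candidate word starting at position i first.
        for n in range(min(max_word_len, len(pinyin_seq) - i), 0, -1):
            key = " ".join(pinyin_seq[i:i + n])
            if key in lexicon:
                out.append(lexicon[key])
                i += n
                break
        else:
            out.append("?")  # no entry for this syllable
            i += 1
    return "".join(out)


if __name__ == "__main__":
    lexicon = {"ni3 hao3": "你好", "ni3": "你", "hao3": "好", "ling2": "零"}
    print(lookup_text(["ni3", "hao3", "ling2"], lexicon))  # 你好零
```

Greedy longest match is a simple baseline; it ignores homophone ambiguity, which a real system would resolve with a word lexicon plus a language model.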

With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for obtaining information according to this embodiment. In the application scenario of Fig. 3, a user utters a speech signal to the terminal device 102. After receiving the speech signal, the terminal device 102 first extracts the speech feature audio sequence from it; then imports the speech feature audio sequence into the pinyin recognition model to obtain pinyin information; and finally looks up the text corresponding to the pinyin information, obtaining the text information corresponding to the speech signal. The terminal device 102 may then display the text information on its screen.

The method provided by the above embodiment of the present application first extracts a speech feature audio sequence from the speech signal to be processed; then imports the speech feature audio sequence into the pinyin recognition model to obtain the corresponding pinyin information; and finally looks up the text information corresponding to the speech signal to be processed according to the pinyin information. This reduces the amount of data processing and storage space needed to obtain pinyin information, and improves the efficiency and accuracy of obtaining text information.

With further reference to Fig. 4, a flow 400 of another embodiment, the pinyin unit set construction method, is shown. The flow 400 of the pinyin unit set construction method includes the following steps:

Step 401: obtain an initial phoneme set and a final phoneme set (that is, the initials and the finals of Mandarin syllables).

In the present embodiment, the execution body on which the pinyin unit set construction method runs (for example, server 105 shown in Fig. 1) can obtain the initial phoneme set and the final phoneme set through a wired or wireless connection.

The execution body of the present embodiment can query applications such as an electronic dictionary to obtain the complete initial phoneme set and final phoneme set. The initial phoneme set contains every initial phoneme that may occur in pinyin information, and the final phoneme set contains every final phoneme that may occur in pinyin information.
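
As an illustration of the two sets obtained in Step 401, the sketch below hard-codes the 21 standard Mandarin initials and the 24 basic finals of the common school table, writing ü as "v" as many pinyin input methods do. The set contents, the ASCII spelling, and the helper split_syllable are assumptions for illustration. Compound finals such as "ia", "ian" and "uang" are deliberately omitted, so a real system would load a complete table instead.

```python
# 21 standard Mandarin initials (the semi-vowels y and w are not included).
INITIALS = {
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "zh", "ch", "sh", "r", "z", "c", "s",
}

# The 24 basic finals; "v" stands for ü. Compound finals such as ia, ian,
# uang are omitted in this sketch.
FINALS = {
    "a", "o", "e", "i", "u", "v",
    "ai", "ei", "ui", "ao", "ou", "iu", "ie", "ve", "er",
    "an", "en", "in", "un", "vn", "ang", "eng", "ing", "ong",
}


def split_syllable(syllable: str) -> tuple:
    """Split a toneless pinyin syllable into (initial, final) using the sets above.

    Two-letter initials (zh, ch, sh) are tried before one-letter initials.
    """
    for size in (2, 1):
        initial, final = syllable[:size], syllable[size:]
        if initial in INITIALS and final in FINALS:
            return initial, final
    if syllable in FINALS:
        return "", syllable  # zero-initial syllable, e.g. "ai"
    raise ValueError(f"cannot split {syllable!r} with the sets above")


print(len(INITIALS), len(FINALS))  # 21 24
print(split_syllable("zhong"))     # ('zh', 'ong')
```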

Step 402: for each initial phoneme in the initial phoneme set, select from the final phoneme set the final phonemes that match the initial phoneme, to obtain the pinyin units corresponding to that initial phoneme.

Pinyin information consists of an initial phoneme followed by a final phoneme. The execution body can therefore take each initial phoneme in the initial phoneme set as a starting point and determine the final phonemes that can follow it, thereby obtaining all possible pinyin units corresponding to that initial phoneme.

In some optional implementations of the present embodiment, selecting from the final phoneme set the final phonemes that match the initial phoneme, to obtain the pinyin units corresponding to the initial phoneme, may include the following steps:

First step: select from the final phoneme set the final phonemes that match the initial phoneme, to obtain a final phoneme subset.

In practice, a given initial phoneme can usually combine with multiple final phonemes to form pinyin information. For example, if the initial phoneme is "zh", the corresponding final phonemes may be "i", "a", "e", "ong", "ui", "eng", and so on. The execution body can collect the final phonemes that match the initial phoneme into a final phoneme subset.
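
The first step can be sketched as a filter over the final phoneme set. Here a small hypothetical set VALID_SYLLABLES stands in for the syllable inventory that the execution body would look up in an electronic dictionary. Its contents, and the function name final_subset, are assumptions for illustration.

```python
# Abridged final phoneme set ("v" stands for ü).
FINALS = {"a", "o", "e", "i", "u", "v", "ai", "ei", "ui", "ao", "ou", "iu",
          "ie", "ve", "er", "an", "en", "in", "un", "vn", "ang", "eng",
          "ing", "ong"}

# Hypothetical stand-in for an electronic dictionary's syllable inventory.
VALID_SYLLABLES = {"zhi", "zha", "zhe", "zhong", "zhui", "zheng",
                   "ba", "bi", "bo", "bai", "ban", "beng"}


def final_subset(initial, finals=FINALS, valid=VALID_SYLLABLES):
    """Return the finals that form an attested syllable with `initial`."""
    return {f for f in finals if initial + f in valid}


print(sorted(final_subset("zh")))  # ['a', 'e', 'eng', 'i', 'ong', 'ui']
```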

Second step: determine the tone identifiers of the pinyin information formed by the initial phoneme and each final phoneme in the final phoneme subset, to obtain a tone identifier set.

Pinyin information can be pronounced with the first, second, third, or fourth tone. The execution body can determine, for example by querying an electronic dictionary, which tones occur for the pinyin information formed by the initial phoneme and the final phoneme, and then assign tone identifiers to that pinyin information. A tone identifier indicates the pronunciation (tone) of the pinyin information.
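
A sketch of the second step, assuming the tone lookup is a dictionary from toneless syllables to the tones (1 to 4) that occur for them. The TONE_TABLE contents are a small illustrative excerpt rather than data from the application, and tone_identifier_set is a hypothetical helper name.

```python
# Illustrative excerpt of tone information from an electronic dictionary:
# toneless syllable -> tones that occur for it (1 = first tone ... 4 = fourth tone).
TONE_TABLE = {
    "zhi": {1, 2, 3, 4},   # e.g. 知 zhī, 直 zhí, 只 zhǐ, 至 zhì
    "zhong": {1, 3, 4},    # e.g. 中 zhōng, 肿 zhǒng, 重 zhòng
}


def tone_identifier_set(initial, finals, tone_table=TONE_TABLE):
    """Map each syllable initial+final to its sorted list of tone identifiers."""
    result = {}
    for final in sorted(finals):
        syllable = initial + final
        tones = tone_table.get(syllable)
        if tones:  # skip syllables for which the table lists no tones
            result[syllable] = sorted(tones)
    return result


print(tone_identifier_set("zh", {"i", "ong"}))
# {'zhi': [1, 2, 3, 4], 'zhong': [1, 3, 4]}
```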

Third step: combine the initial phoneme, the final phonemes in the final phoneme subset, and the tone identifiers in the tone identifier set into the pinyin units corresponding to the initial phoneme.

After obtaining, for each initial phoneme in the initial phoneme set, the final phoneme subset and the tone identifier set of the pinyin information formed by the initial phoneme and the final phonemes, the execution body can combine the initial phoneme, the final phonemes in the final phoneme subset, and the tone identifiers in the tone identifier set in various ways to obtain the pinyin units corresponding to the initial phoneme. In this way, the pinyin units cover all pinyin information together with its possible pronunciations.
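
The third step amounts to enumerating combinations of the initial, the finals in the subset, and the tone identifiers. A minimal sketch using itertools.product follows. The tone table and the name build_pinyin_units are hypothetical, and keeping only combinations that the table attests is an assumption about how the "various combinations" are narrowed down.

```python
from itertools import product

# Hypothetical tone table: toneless syllable -> tones that occur for it.
TONE_TABLE = {"zhi": {1, 2, 3, 4}, "zhong": {1, 3, 4}}
TONE_IDS = (1, 2, 3, 4)  # tone identifiers for the four Mandarin tones


def build_pinyin_units(initial, finals, tone_table=TONE_TABLE, tone_ids=TONE_IDS):
    """Combine initial x final x tone and keep the attested combinations.

    Each unit is returned as a numbered-pinyin string such as "zhong1".
    """
    units = []
    for final, tone in product(sorted(finals), tone_ids):
        if tone in tone_table.get(initial + final, ()):
            units.append(f"{initial}{final}{tone}")
    return units


print(build_pinyin_units("zh", {"i", "ong"}))
# ['zhi1', 'zhi2', 'zhi3', 'zhi4', 'zhong1', 'zhong3', 'zhong4']
```

Returning plain strings keeps the sketch short. A structured tuple (initial, final, tone) would preserve the three components that make up a pinyin unit.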

With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a device for obtaining information. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied in various electronic devices.

As shown in Fig. 5, the device 500 for obtaining information of the present embodiment may include a speech feature audio sequence acquisition unit 501, a pinyin information acquisition unit 502, and a text information acquisition unit 503. The speech feature audio sequence acquisition unit 501 is configured to obtain a speech feature audio sequence from the to-be-processed speech signal, the speech feature audio sequence being used to characterize the text corresponding to the to-be-processed speech signal. The pinyin information acquisition unit 502 is configured to import the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence, where the pinyin recognition model matches the pinyin information of the corresponding speech feature audio sequence through the pinyin unit set, and each pinyin unit identifies a single character. The text information acquisition unit 503 is configured to look up the text information corresponding to the to-be-processed speech signal according to the pinyin information.
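
The structure of device 500 can be sketched as three callables chained in order. The class and field names below are hypothetical, and the lambdas in the usage lines are stubs standing in for units 501 to 503, not real feature extraction or recognition.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class InformationDevice:
    """Hypothetical sketch of device 500 as a chain of three units."""
    feature_unit: Callable[[Sequence[float]], List[List[float]]]  # unit 501
    pinyin_unit: Callable[[List[List[float]]], List[str]]         # unit 502
    text_unit: Callable[[List[str]], str]                         # unit 503

    def process(self, signal: Sequence[float]) -> str:
        features = self.feature_unit(signal)  # speech feature audio sequence
        pinyin = self.pinyin_unit(features)   # pinyin information
        return self.text_unit(pinyin)         # text information


# Usage with stub units (illustrative only).
device = InformationDevice(
    feature_unit=lambda signal: [[x] for x in signal],
    pinyin_unit=lambda features: ["ni3", "hao3"],
    text_unit=lambda pinyin: {"ni3 hao3": "你好"}[" ".join(pinyin)],
)
print(device.process([0.1, 0.5, 0.2]))  # 你好
```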

In some optional implementations of the present embodiment, the pinyin information acquisition unit 502 may include an initial speech frame sequence acquisition subunit (not shown) and a secondary speech frame sequence acquisition subunit (not shown). The initial speech frame sequence acquisition subunit is configured to extract one initial speech frame from the speech feature audio sequence at every interval of a first set number of frames, to obtain an initial speech frame sequence. The secondary speech frame sequence acquisition subunit is configured to merge each group of a second set number of adjacent initial speech frames in the initial speech frame sequence, to obtain a secondary speech frame sequence.
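
A minimal sketch of the two subunits, representing each frame as a list of feature values. It makes several assumptions that the application does not fix. Extracting "one frame at every interval of the first set number of frames" is read as keeping frames 0, n1, 2*n1, and so on. Merging is read as concatenating the feature vectors of n2 adjacent, non-overlapping initial frames, and an incomplete trailing group is dropped. The function names are hypothetical.

```python
def subsample_frames(frames, first_n):
    """Keep one frame out of every `first_n` frames: indices 0, first_n, 2*first_n, ..."""
    return frames[::first_n]


def merge_frames(frames, second_n):
    """Concatenate each group of `second_n` adjacent frames into one secondary frame.

    Groups do not overlap; a trailing group shorter than `second_n` is dropped.
    """
    merged = []
    for start in range(0, len(frames) - second_n + 1, second_n):
        group = frames[start:start + second_n]
        merged.append([value for frame in group for value in frame])
    return merged


frames = [[float(i)] for i in range(10)]  # ten 1-dimensional feature frames
initial = subsample_frames(frames, 2)     # [[0.0], [2.0], [4.0], [6.0], [8.0]]
secondary = merge_frames(initial, 2)      # [[0.0, 2.0], [4.0, 6.0]]
print(secondary)
```

With n1 = 2 and n2 = 2, ten input frames become two secondary frames. Each secondary frame carries the features of two retained frames, so the model has fewer frames to score.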

In some optional implementations of the present embodiment, the pinyin unit may include an initial phoneme, a final phoneme matching the initial phoneme, and a tone identifier. The tone identifier indicates the pronunciation feature of the pinyin information formed by the initial phoneme and the final phoneme. The pinyin information acquisition unit 502 may include a speech amplitude waveform acquisition subunit (not shown), a peak speech frame sequence acquisition subunit (not shown), a target pinyin information acquisition subunit (not shown), and a pinyin information acquisition subunit (not shown). The speech amplitude waveform acquisition subunit is configured to obtain the speech amplitude waveform of the secondary speech frame sequence. The peak speech frame sequence acquisition subunit is configured to select from the speech amplitude waveform the peak speech frames corresponding to amplitude extrema, to obtain a peak speech frame sequence. The target pinyin information acquisition subunit is configured to, for each peak speech frame in the peak speech frame sequence, match a target pinyin unit corresponding to the peak speech frame from the pinyin unit set, and determine, through the target pinyin unit, the target pinyin information corresponding to the peak speech frame. The pinyin information acquisition subunit is configured to sort the target pinyin information according to the order of the corresponding peak speech frames in the peak speech frame sequence, to obtain the pinyin information corresponding to the speech feature audio sequence.
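
The four subunits can be sketched as follows. Several choices here are assumptions rather than details from the application. The amplitude of a secondary frame is taken as its root-mean-square value, and a "peak" is a strict local maximum of that amplitude curve. Matching a peak frame to a pinyin unit uses nearest-template search over a hypothetical TEMPLATES table, which stands in for whatever matcher the pinyin recognition model actually uses.

```python
import math

# Hypothetical templates: pinyin unit -> reference feature vector.
TEMPLATES = {"ni3": [1.0, 1.0], "hao3": [0.0, 1.0]}


def frame_amplitudes(frames):
    """Amplitude curve: root-mean-square value of each secondary frame."""
    return [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]


def peak_indices(amps, threshold=0.0):
    """Indices of strict local maxima of the amplitude curve above `threshold`."""
    return [i for i in range(1, len(amps) - 1)
            if amps[i] > amps[i - 1] and amps[i] > amps[i + 1] and amps[i] > threshold]


def match_unit(frame, templates):
    """Return the template key nearest to `frame` by squared Euclidean distance."""
    return min(templates,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(frame, templates[u])))


def frames_to_pinyin(frames, templates=TEMPLATES, threshold=0.0):
    amps = frame_amplitudes(frames)
    matches = [(i, match_unit(frames[i], templates))
               for i in peak_indices(amps, threshold)]
    matches.sort(key=lambda pair: pair[0])  # order by peak position in the sequence
    return [unit for _, unit in matches]


secondary = [[0.1, 0.1], [1.0, 0.9], [0.2, 0.1], [0.1, 0.1], [0.2, 1.0], [0.1, 0.1]]
print(frames_to_pinyin(secondary))  # ['ni3', 'hao3']
```

Because the peak indices are already collected in time order, the explicit sort only mirrors the fourth subunit. It would matter if peaks were matched out of order, for example in parallel.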

In some optional implementations of the present embodiment, the device further includes a pinyin unit set construction unit (not shown) configured to construct the pinyin unit set. The pinyin unit set construction unit may include a phoneme set acquisition subunit (not shown) and a pinyin unit acquisition subunit (not shown). The phoneme set acquisition subunit is configured to obtain the initial phoneme set and the final phoneme set. The pinyin unit acquisition subunit is configured to, for each initial phoneme in the initial phoneme set, select from the final phoneme set the final phonemes that match the initial phoneme, to obtain the pinyin units corresponding to that initial phoneme.

In some optional implementations of the present embodiment, the pinyin unit acquisition subunit may include a final phoneme subset acquisition module (not shown), a tone identifier set acquisition module (not shown), and a pinyin unit acquisition module (not shown). The final phoneme subset acquisition module is configured to select from the final phoneme set the final phonemes that match the initial phoneme, to obtain a final phoneme subset. The tone identifier set acquisition module is configured to determine the tone identifiers of the pinyin information formed by the initial phoneme and each final phoneme in the final phoneme subset, to obtain a tone identifier set. The pinyin unit acquisition module is configured to combine the initial phoneme, the final phonemes in the final phoneme subset, and the tone identifiers in the tone identifier set into the pinyin units corresponding to the initial phoneme.

The present embodiment further provides a server, including one or more processors and a memory storing one or more programs. When the one or more programs are executed by the one or more processors, they cause the one or more processors to perform the above method for obtaining information.

The present embodiment further provides a computer-readable medium storing a computer program which, when executed by a processor, implements the above method for obtaining information.

Referring now to Fig. 6, a structural schematic diagram is shown of a computer system 600 of a server suitable for implementing embodiments of the present application (for example, server 105 in Fig. 1). The server shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in Fig. 6, computer system 600 includes a central processing unit (CPU) 601. CPU 601 can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 602 or a program loaded from storage section 608 into random access memory (RAM) 603. RAM 603 also stores the various programs and data required for the operation of system 600. CPU 601, ROM 602, and RAM 603 are connected to each other by bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

The following components are connected to I/O interface 605: an input section 606 including a keyboard, a mouse, etc.; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, etc.; a storage section 608 including a hard disk, etc.; and a communication section 609 including a network interface card such as a LAN card or a modem. Communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to I/O interface 605 as needed. Removable media 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on drive 610 as needed, so that a computer program read from it can be installed into storage section 608 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via communication section 609, and/or installed from removable media 611. When the computer program is executed by central processing unit (CPU) 601, the above functions defined in the method of the present application are performed.

It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

The flowcharts and block diagrams in the drawings illustrate possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a speech feature audio sequence acquisition unit, a pinyin information acquisition unit, and a text information acquisition unit. In some cases the names of these units do not limit the units themselves; for example, the text information acquisition unit may also be described as "a unit for obtaining text information from pinyin information".

As another aspect, the present application further provides a computer-readable medium, which may be included in the device described in the above embodiments or may exist separately without being assembled into that device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain a speech feature audio sequence from a to-be-processed speech signal, the speech feature audio sequence being used to characterize the text corresponding to the to-be-processed speech signal; import the speech feature audio sequence into a pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence, the pinyin recognition model being used to match the pinyin information of the corresponding speech feature audio sequence through a pinyin unit set, each pinyin unit identifying a single character; and look up the text information corresponding to the to-be-processed speech signal according to the pinyin information.

The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims

1. A method for obtaining information, comprising:

obtaining a speech feature audio sequence from a to-be-processed speech signal, the speech feature audio sequence being used to characterize the text corresponding to the to-be-processed speech signal;

importing the speech feature audio sequence into a pinyin recognition model to obtain pinyin information corresponding to the speech feature audio sequence, the pinyin recognition model being used to match the pinyin information of the corresponding speech feature audio sequence through a pinyin unit set, each pinyin unit identifying a single character; and

looking up the text information corresponding to the to-be-processed speech signal according to the pinyin information.

2. The method according to claim 1, wherein importing the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence comprises:

extracting one initial speech frame from the speech feature audio sequence at every interval of a first set number of frames, to obtain an initial speech frame sequence; and

merging each group of a second set number of adjacent initial speech frames in the initial speech frame sequence, to obtain a secondary speech frame sequence.

3. The method according to claim 2, wherein the pinyin unit comprises an initial phoneme, a final phoneme matching the initial phoneme, and a tone identifier, the tone identifier being used to indicate the pronunciation feature of the pinyin information formed by the initial phoneme and the final phoneme; and

importing the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence comprises:

obtaining a speech amplitude waveform of the secondary speech frame sequence;

selecting, from the speech amplitude waveform, the peak speech frames corresponding to amplitude extrema, to obtain a peak speech frame sequence;

for each peak speech frame in the peak speech frame sequence, matching a target pinyin unit corresponding to the peak speech frame from the pinyin unit set, and determining, through the target pinyin unit, the target pinyin information corresponding to the peak speech frame; and

sorting the target pinyin information according to the order of the corresponding peak speech frames in the peak speech frame sequence, to obtain the pinyin information corresponding to the speech feature audio sequence.

4. The method according to claim 1, wherein the pinyin unit set is constructed by the following steps:

obtaining an initial phoneme set and a final phoneme set; and

for each initial phoneme in the initial phoneme set, selecting from the final phoneme set the final phonemes that match the initial phoneme, to obtain the pinyin units corresponding to the initial phoneme.

5. The method according to claim 4, wherein selecting from the final phoneme set the final phonemes that match the initial phoneme, to obtain the pinyin units corresponding to the initial phoneme, comprises:

selecting from the final phoneme set the final phonemes that match the initial phoneme, to obtain a final phoneme subset;

determining the tone identifiers of the pinyin information formed by the initial phoneme and the final phonemes in the final phoneme subset, to obtain a tone identifier set; and

combining the initial phoneme, the final phonemes in the final phoneme subset, and the tone identifiers in the tone identifier set into the pinyin units corresponding to the initial phoneme.

6. A device for obtaining information, comprising:

a speech feature audio sequence acquisition unit, configured to obtain a speech feature audio sequence from a to-be-processed speech signal, the speech feature audio sequence being used to characterize the text corresponding to the to-be-processed speech signal;

a pinyin information acquisition unit, configured to import the speech feature audio sequence into a pinyin recognition model to obtain pinyin information corresponding to the speech feature audio sequence, the pinyin recognition model being used to match the pinyin information of the corresponding speech feature audio sequence through a pinyin unit set, each pinyin unit identifying a single character; and

a text information acquisition unit, configured to look up the text information corresponding to the to-be-processed speech signal according to the pinyin information.

7. The device according to claim 6, wherein the pinyin information acquisition unit comprises:

an initial speech frame sequence acquisition subunit, configured to extract one initial speech frame from the speech feature audio sequence at every interval of a first set number of frames, to obtain an initial speech frame sequence; and

a secondary speech frame sequence acquisition subunit, configured to merge each group of a second set number of adjacent initial speech frames in the initial speech frame sequence, to obtain a secondary speech frame sequence.

8. The device according to claim 7, wherein the pinyin unit comprises an initial phoneme, a final phoneme matching the initial phoneme, and a tone identifier, the tone identifier being used to indicate the pronunciation feature of the pinyin information formed by the initial phoneme and the final phoneme; and

the pinyin information acquisition unit comprises:

a speech amplitude waveform acquisition subunit, configured to obtain a speech amplitude waveform of the secondary speech frame sequence;

a peak speech frame sequence acquisition subunit, configured to select, from the speech amplitude waveform, the peak speech frames corresponding to amplitude extrema, to obtain a peak speech frame sequence;

a target pinyin information acquisition subunit, configured to, for each peak speech frame in the peak speech frame sequence, match a target pinyin unit corresponding to the peak speech frame from the pinyin unit set, and determine, through the target pinyin unit, the target pinyin information corresponding to the peak speech frame; and

a pinyin information acquisition subunit, configured to sort the target pinyin information according to the order of the corresponding peak speech frames in the peak speech frame sequence, to obtain the pinyin information corresponding to the speech feature audio sequence.

9. The device according to claim 6, wherein the device further comprises a pinyin unit set construction unit configured to construct the pinyin unit set, the pinyin unit set construction unit comprising:

a phoneme set acquisition subunit, configured to obtain an initial phoneme set and a final phoneme set; and

a pinyin unit acquisition subunit, configured to, for each initial phoneme in the initial phoneme set, select from the final phoneme set the final phonemes that match the initial phoneme, to obtain the pinyin units corresponding to the initial phoneme.

10. The device according to claim 9, wherein the pinyin unit acquisition subunit comprises:

a final phoneme subset acquisition module, configured to select from the final phoneme set the final phonemes that match the initial phoneme, to obtain a final phoneme subset;

a tone identifier set acquisition module, configured to determine the tone identifiers of the pinyin information formed by the initial phoneme and the final phonemes in the final phoneme subset, to obtain a tone identifier set; and

a pinyin unit acquisition module, configured to combine the initial phoneme, the final phonemes in the final phoneme subset, and the tone identifiers in the tone identifier set into the pinyin units corresponding to the initial phoneme.

11. A server, comprising:

one or more processors; and

a memory storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 5.

12. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 5.