CN109410918A - Method and device for obtaining information - Google Patents
Method and device for obtaining information
- Publication number
- CN109410918A (application number CN201811198500.5A)
- Authority
- CN
- China
- Prior art keywords
- phonetic
- phoneme
- chinese syllable
- compound vowel
- initial consonant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1807—Speech classification or search using natural language modelling using prosody or stress
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
An embodiment of the present application discloses a method for obtaining information. One specific implementation of the method includes: obtaining a speech feature audio sequence from a speech signal to be processed, where the speech feature audio sequence characterizes the text corresponding to the speech signal to be processed; importing the speech feature audio sequence into a pinyin recognition model to obtain pinyin information corresponding to the speech feature audio sequence, where the pinyin recognition model matches the pinyin information corresponding to the speech feature audio sequence against a pinyin unit set, each pinyin unit being used to recognize a single character; and looking up, according to the pinyin information, the text information corresponding to the speech signal to be processed. This implementation reduces the amount of data processing and the storage space needed to obtain pinyin information, and improves the accuracy of the text information obtained.
Description
Technical field
The embodiments of the present application relate to the technical field of speech recognition, and in particular to a method and device for obtaining information.
Background
Speech recognition technology converts speech signals into text information and then processes the text information to carry out the corresponding data processing. A user can remotely control a smart device that has a speech recognition function by means of speech signals. Particularly in situations where manual input is inconvenient or the user cannot input information, speech recognition technology greatly improves the efficiency of information exchange.
Summary of the invention
The embodiments of the present application propose a method and device for obtaining information.
In a first aspect, an embodiment of the present application provides a method for obtaining information, the method including: obtaining a speech feature audio sequence from a speech signal to be processed, where the speech feature audio sequence characterizes the text corresponding to the speech signal to be processed; importing the speech feature audio sequence into a pinyin recognition model to obtain pinyin information corresponding to the speech feature audio sequence, where the pinyin recognition model matches the pinyin information corresponding to the speech feature audio sequence against a pinyin unit set, each pinyin unit being used to recognize a single character; and looking up, according to the pinyin information, the text information corresponding to the speech signal to be processed.
In some embodiments, importing the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence includes: extracting one initial speech frame from the speech feature audio sequence at intervals of a first set number of frames, to obtain an initial speech frame sequence; and merging each second set number of adjacent initial speech frames in the initial speech frame sequence, to obtain a secondary speech frame sequence.
In some embodiments, the pinyin unit includes an initial phoneme, a final phoneme matching the initial phoneme, and a tone identifier, where the tone identifier indicates the pronunciation feature of the pinyin information formed by the initial phoneme and the final phoneme; and importing the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence includes: obtaining a speech amplitude waveform diagram of the secondary speech frame sequence; selecting, from the speech amplitude waveform diagram, the peak speech frames corresponding to amplitude extrema, to obtain a peak speech frame sequence; for each peak speech frame in the peak speech frame sequence, matching a target pinyin unit corresponding to that peak speech frame from the pinyin unit set, and determining the target pinyin information corresponding to that peak speech frame from the target pinyin unit; and sorting the target pinyin information according to the order of the corresponding peak speech frames in the peak speech frame sequence, to obtain the pinyin information corresponding to the speech feature audio sequence.
In some embodiments, the pinyin unit set is constructed by the following steps: obtaining an initial phoneme set and a final phoneme set; and, for each initial phoneme in the initial phoneme set, selecting from the final phoneme set the final phonemes that match that initial phoneme, to obtain the pinyin units corresponding to that initial phoneme.
In some embodiments, selecting from the final phoneme set the final phonemes that match the initial phoneme, to obtain the pinyin units corresponding to that initial phoneme, includes: selecting from the final phoneme set the final phonemes that match the initial phoneme, to obtain a final phoneme subset; determining the tone identifiers of the pinyin information formed by the initial phoneme and the final phonemes in the final phoneme subset, to obtain a tone identifier set; and combining the initial phoneme, the final phonemes in the final phoneme subset, and the tone identifiers in the tone identifier set into the pinyin units corresponding to that initial phoneme.
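The construction steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `PinyinUnit`, `build_pinyin_units`, and `ATTESTED_TONES` are hypothetical, and the small phoneme inventories and tone table are toy data chosen only to make the example run.

```python
from typing import Dict, List, NamedTuple, Sequence, Set, Tuple


class PinyinUnit(NamedTuple):
    initial: str  # initial phoneme, e.g. "l"
    final: str    # final phoneme matched to the initial, e.g. "ing"
    tone: int     # tone identifier, 1 to 4


# Toy inventories (assumption); a real system would use the full sets.
INITIALS = ["l", "y", "h"]
FINALS = ["ing", "i", "ua", "ai"]

# Hypothetical table: which (initial, final) pairs form a syllable, and
# with which tone identifiers. Illustrative data only.
ATTESTED_TONES: Dict[Tuple[str, str], Set[int]] = {
    ("l", "ing"): {2, 3, 4},
    ("l", "i"): {2, 3, 4},
    ("y", "i"): {1, 2, 3, 4},
    ("h", "ua"): {1, 2, 4},
    ("h", "ai"): {2, 3, 4},
}


def build_pinyin_units(
    initials: Sequence[str],
    finals: Sequence[str],
    attested: Dict[Tuple[str, str], Set[int]],
) -> Dict[str, List[PinyinUnit]]:
    """Return, for each initial phoneme, its list of pinyin units."""
    units: Dict[str, List[PinyinUnit]] = {}
    for ini in initials:
        # Select the finals that match this initial (final phoneme subset).
        final_subset = [f for f in finals if (ini, f) in attested]
        # Combine initial, matching finals, and their tone identifiers.
        units[ini] = [
            PinyinUnit(ini, f, tone)
            for f in final_subset
            for tone in sorted(attested[(ini, f)])
        ]
    return units


units = build_pinyin_units(INITIALS, FINALS, ATTESTED_TONES)
```

Because only matching initial and final pairs are kept, the set stays far smaller than the full cross product of initials, finals, and tones.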
In a second aspect, an embodiment of the present application provides a device for obtaining information, the device including: a speech feature audio sequence obtaining unit, configured to obtain a speech feature audio sequence from a speech signal to be processed, where the speech feature audio sequence characterizes the text corresponding to the speech signal to be processed; a pinyin information obtaining unit, configured to import the speech feature audio sequence into a pinyin recognition model to obtain pinyin information corresponding to the speech feature audio sequence, where the pinyin recognition model matches the pinyin information corresponding to the speech feature audio sequence against a pinyin unit set, each pinyin unit being used to recognize a single character; and a text information obtaining unit, configured to look up, according to the pinyin information, the text information corresponding to the speech signal to be processed.
In some embodiments, the pinyin information obtaining unit includes: an initial speech frame sequence obtaining subunit, configured to extract one initial speech frame from the speech feature audio sequence at intervals of a first set number of frames, to obtain an initial speech frame sequence; and a secondary speech frame sequence obtaining subunit, configured to merge each second set number of adjacent initial speech frames in the initial speech frame sequence, to obtain a secondary speech frame sequence.
In some embodiments, the pinyin unit includes an initial phoneme, a final phoneme matching the initial phoneme, and a tone identifier, where the tone identifier indicates the pronunciation feature of the pinyin information formed by the initial phoneme and the final phoneme; and the pinyin information obtaining unit includes: a speech amplitude waveform diagram obtaining subunit, configured to obtain a speech amplitude waveform diagram of the secondary speech frame sequence; a peak speech frame sequence obtaining subunit, configured to select, from the speech amplitude waveform diagram, the peak speech frames corresponding to amplitude extrema, to obtain a peak speech frame sequence; a target pinyin information obtaining subunit, configured to, for each peak speech frame in the peak speech frame sequence, match a target pinyin unit corresponding to that peak speech frame from the pinyin unit set, and determine the target pinyin information corresponding to that peak speech frame from the target pinyin unit; and a pinyin information obtaining subunit, configured to sort the target pinyin information according to the order of the corresponding peak speech frames in the peak speech frame sequence, to obtain the pinyin information corresponding to the speech feature audio sequence.
In some embodiments, a pinyin unit set construction unit is configured to construct the pinyin unit set, the pinyin unit set construction unit including: a phoneme set obtaining subunit, configured to obtain an initial phoneme set and a final phoneme set; and a pinyin unit obtaining subunit, configured to, for each initial phoneme in the initial phoneme set, select from the final phoneme set the final phonemes that match that initial phoneme, to obtain the pinyin units corresponding to that initial phoneme.
In some embodiments, the pinyin unit obtaining subunit includes: a final phoneme subset obtaining module, configured to select from the final phoneme set the final phonemes that match the initial phoneme, to obtain a final phoneme subset; a tone identifier set obtaining module, configured to determine the tone identifiers of the pinyin information formed by the initial phoneme and the final phonemes in the final phoneme subset, to obtain a tone identifier set; and a pinyin unit obtaining module, configured to combine the initial phoneme, the final phonemes in the final phoneme subset, and the tone identifiers in the tone identifier set into the pinyin units corresponding to that initial phoneme.
In a third aspect, an embodiment of the present application provides a server, including: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the method for obtaining information of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method for obtaining information of the first aspect.
In the method and device for obtaining information provided by the embodiments of the present application, the technical solution first extracts a speech feature audio sequence from the speech signal to be processed; then imports the speech feature audio sequence into a pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence; and finally looks up, according to the pinyin information, the text information corresponding to the speech signal to be processed. The technical solution reduces the amount of data processing and the storage space needed to obtain pinyin information, and improves the accuracy of the text information obtained.
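The final step, looking up text from pinyin information, might be sketched as a dictionary search. The source does not say how the lookup is performed, so the greedy longest-match strategy, the function name `lookup_text`, and the toy `LEXICON` below are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon mapping toned pinyin sequences to text (toy data).
LEXICON: Dict[Tuple[str, ...], str] = {
    ("ling2",): "0",
    ("yi1",): "1",
    ("ling2", "yi1"): "01",
}


def lookup_text(pinyin: List[str], lexicon: Dict[Tuple[str, ...], str]) -> str:
    """Greedy longest-match lookup of text for a pinyin sequence."""
    max_len = max(len(key) for key in lexicon)
    out: List[str] = []
    i = 0
    while i < len(pinyin):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(pinyin) - i), 0, -1):
            key = tuple(pinyin[i:i + n])
            if key in lexicon:
                out.append(lexicon[key])
                i += n
                break
        else:
            # No entry covers this syllable; mark it and move on.
            out.append("?")
            i += 1
    return "".join(out)
```

A call such as `lookup_text(["ling2", "yi1"], LEXICON)` returns the entry for the two-syllable key, since the longest span is tried first.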
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
Fig. 2 is a flowchart of one embodiment of the method for obtaining information according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for obtaining information according to the present application;
Fig. 4 is a flowchart of one embodiment of the pinyin unit set construction method according to the present application;
Fig. 5 is a structural schematic diagram of one embodiment of the device for obtaining information according to the present application;
Fig. 6 is a structural schematic diagram of a computer system suitable for implementing the server of the embodiments of the present application.
Detailed description of embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the related invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for obtaining information or the device for obtaining information of the embodiments of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, to receive or send messages and the like. Various speech processing applications may be installed on the terminal devices 101, 102, 103, such as audio capture applications, audio filtering applications, audio recognition applications, audio playback applications, and audio transmission tools.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have a display screen and support audio capture, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.
The server 105 may be a server providing various services, for example a server that performs speech processing on the speech signals to be processed sent by the terminal devices 101, 102, 103. The server may analyze and otherwise process received data such as the speech signal to be processed, remove the noise signal from the speech signal to be processed, and feed the speech recognition result back to the terminal device.
It should be noted that the method for obtaining information provided by the embodiments of the present application may be performed by the terminal devices 101, 102, 103 alone, or jointly by the terminal devices 101, 102, 103 and the server 105. Correspondingly, the device for obtaining information may be provided in the terminal devices 101, 102, 103 or in the server 105.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for obtaining information according to the present application is shown. The method for obtaining information includes the following steps:
Step 201: obtain a speech feature audio sequence from the speech signal to be processed.
In this embodiment, the executing subject of the method for obtaining information (for example, the terminal devices 101, 102, 103 or the server 105 shown in Fig. 1) may obtain the speech signal to be processed through a wired or wireless connection. The speech signal to be processed is an audio signal containing captured speech, for example any of various analog audio signals. It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra wideband) connections, as well as other wireless connection methods now known or developed in the future.
Existing speech recognition methods consume a large amount of computing resources and storage space when converting speech signals into text information, and often work only when the smart device is networked. As a result, existing speech recognition methods reduce the data-processing capability of the smart device during recognition and are not readily applicable to smart devices with limited processing power or little memory (for example, embedded systems); moreover, the text information they produce is not very accurate.
For this reason, the executing subject of the present application first extracts a speech feature audio sequence from the speech signal to be processed, by means such as speech recognition. The speech feature audio sequence can be used to characterize the text corresponding to the speech signal to be processed. The speech feature audio sequence may be a sequence of audio frames carrying time information and amplitude information, and is usually a digital audio signal.
Step 202: import the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence.
After the executing subject obtains the speech feature audio sequence by a speech recognition method, it may import the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information of the speech feature audio sequence. The pinyin recognition model can be used to match the pinyin information corresponding to the speech feature audio sequence against the pinyin unit set. In existing speech recognition methods, to recognize which character a speech signal corresponds to, the speech feature audio sequence must first be recognized as a variety of possible initial phonemes and final phonemes. Adjacent initial phonemes and final phonemes are then matched and corrected in many ways before the corresponding pinyin information can finally be determined. For example, suppose the speech feature audio sequence includes audio information whose actual text is "01". When an existing speech recognition method processes the "01" audio information in the speech feature audio sequence, it may obtain a variety of basic pronunciation information such as "sil", "l", "ing", "sil", "y", "i" and "sil". The existing method then needs to arrange and combine this basic pronunciation information in various ways, to identify the real pronunciation as precisely as possible. For example, the results of arranging and combining the basic pronunciation information may be: "sil-l+ing", "l-ing+sil", "ing-l+ing", "l-ing+l", "i-l+ing", "l-ing+y", "i-y+i", "y-i+y", "ing-y+i", "y-i+l", "sil-y+i", "y-i+sil", and so on. Here, "sil" can indicate a pause that may exist in the actual speech; "-" can indicate that the basic pronunciation information on either side of the "-" is considered as a whole; and "+" can indicate that the basic pronunciation information on either side of the "+" is in a combination relationship. As can be seen, an existing speech recognition method must match every one of these combinations before it can finally obtain a result. It extracts a large amount of information from the speech feature audio sequence and performs the corresponding data processing on all of it. Existing speech recognition methods therefore require a large amount of storage space and data processing on the executing subject, and the speech recognition result may still be inaccurate.
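To make the growth in candidates concrete, the twelve labels listed above can be reproduced with a short enumeration. This is one reading of the pattern in the listed labels, not a description of any particular recognizer: each initial is paired with every possible left context ("sil" or any final), and each final with every possible right context ("sil" or any initial). The function name `context_labels` is hypothetical.

```python
from typing import List, Set, Tuple


def context_labels(syllables: List[Tuple[str, str]]) -> Set[str]:
    """Enumerate 'left-center+right' labels for (initial, final) syllables."""
    initials = [ini for ini, _ in syllables]
    finals = [fin for _, fin in syllables]
    labels: Set[str] = set()
    for ini, fin in syllables:
        # The initial is preceded by a pause or by any final.
        for left in ["sil"] + finals:
            labels.add(f"{left}-{ini}+{fin}")
        # The final is followed by a pause or by any initial.
        for right in ["sil"] + initials:
            labels.add(f"{ini}-{fin}+{right}")
    return labels


labels = context_labels([("l", "ing"), ("y", "i")])
```

For the two syllables of "01" this yields twelve candidate labels, whereas matching on whole pinyin units needs only one candidate per syllable; the gap widens as the number of syllables grows.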
For this reason, the executing subject of the present application may obtain the pinyin information corresponding to the speech feature audio sequence through the pinyin recognition model. The pinyin unit set in the pinyin recognition model may contain multiple pinyin units, and each pinyin unit can be used to recognize a single character. For example, "trees", "mansion", "we go traveling", and "listening to crosstalk" are not single characters, whereas "person", "flower", and "sea", each written with one character, are single characters in the sense of the present application. The multiple pinyin units may contain all the actual combinations of initial phonemes and final phonemes. The pinyin recognition model can therefore use the pinyin unit set to match the pinyin information corresponding to the speech feature audio sequence.
In some optional implementations of this embodiment, importing the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence may include the following steps:
First step: extract one initial speech frame from the speech feature audio sequence at intervals of a first set number of frames, to obtain an initial speech frame sequence.
To further reduce the amount of data processing needed to obtain text information from the speech signal to be processed, and to reduce the storage space occupied, the executing subject of the present application may also extract one initial speech frame from the speech feature audio sequence at intervals of a first set number of frames, obtaining an initial speech frame sequence. This retains enough useful information while reducing the data size, so the executing subject obtains text information from the speech signal to be processed more efficiently.
Second step: merge each second set number of adjacent initial speech frames in the initial speech frame sequence, to obtain a secondary speech frame sequence.
Further, the executing subject may merge each second set number of adjacent initial speech frames in the initial speech frame sequence into one frame, obtaining a secondary speech frame sequence and reducing the amount of data still further.
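The two reduction steps can be sketched as below. The source does not specify the exact sampling convention or how frames are merged, so this sketch assumes that "at intervals of N frames" means keeping one frame and skipping the next N, and that merging means element-wise averaging; `subsample_frames` and `merge_adjacent` are hypothetical names.

```python
from typing import List, Sequence

Frame = List[float]


def subsample_frames(frames: Sequence[Frame], interval: int) -> List[Frame]:
    """Keep one frame, then skip `interval` frames (assumed convention)."""
    return list(frames[:: interval + 1])


def merge_adjacent(frames: Sequence[Frame], group_size: int) -> List[Frame]:
    """Merge each run of `group_size` adjacent frames into one frame.

    Merging is done by element-wise averaging, which is an assumption;
    concatenating the frames would be another plausible choice.
    """
    merged: List[Frame] = []
    for start in range(0, len(frames), group_size):
        group = frames[start:start + group_size]
        merged.append([sum(col) / len(group) for col in zip(*group)])
    return merged


# Twelve one-sample frames: 0.0, 1.0, ..., 11.0
frames = [[float(i)] for i in range(12)]
initial_seq = subsample_frames(frames, interval=2)       # frames 0, 3, 6, 9
secondary_seq = merge_adjacent(initial_seq, group_size=2)
```

With these settings, twelve input frames shrink to four initial frames and then to two secondary frames.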
In some optional implementations of this embodiment, the pinyin unit may include an initial phoneme, a final phoneme matching the initial phoneme, and a tone identifier. The tone identifier can be used to indicate the pronunciation feature of the pinyin information formed by the initial phoneme and the final phoneme. Typically, the tone identifiers "1", "2", "3" and "4" indicate that the pinyin information is pronounced with the first, second, third and fourth tone, respectively. The tone identifier may also take other forms, which are not enumerated here. In addition, importing the speech feature audio sequence into the pinyin recognition model to obtain the pinyin information corresponding to the speech feature audio sequence may further include the following steps:
First step: obtain the speech amplitude waveform diagram of the secondary speech frame sequence.
The executing subject may obtain the speech amplitude waveform diagram of the secondary speech frame sequence. The speech amplitude waveform diagram may be a curve containing multiple audio amplitudes, or an audio amplitude chart made up of multiple rectangular bars.
Second step: select, from the speech amplitude waveform diagram, the peak speech frames corresponding to amplitude extrema, to obtain a peak speech frame sequence.
In general, the pronunciation of each character spans audio information of a certain duration. Going from the speech signal to be processed to the secondary speech frame sequence reduces the amount of data, but the sequence still contains audio information of a certain duration. In practice, the maximum audio information corresponding to each character best represents the pronunciation of that character. To accurately determine the audio information of each character, the executing subject may first obtain the speech amplitude waveform diagram of the secondary speech frame sequence and then find the extrema of the waveform diagram, determining the multiple peak speech frames it contains and obtaining a peak speech frame sequence. A peak speech frame can be regarded as the most important audio information of a character.
Third step: for each spike speech frame in the spike speech frame sequence, match a target phonetic unit corresponding to the spike speech frame from the phonetic unit set, and determine the target Pinyin information corresponding to the spike speech frame from the target phonetic unit.
After obtaining the spike speech frame sequence, the executing subject may match, from the phonetic unit set, the target phonetic unit corresponding to each spike speech frame. Because a phonetic unit itself contains an initial consonant phoneme, the final phoneme matched with it, and a tone mark, the spike speech frame sequence can be recognized character by character, with high recognition accuracy. Because every phonetic unit has its own initial consonant phoneme, matched final phoneme and tone mark, the executing subject can determine the target Pinyin information corresponding to the spike speech frame from the target phonetic unit.
Fourth step: sort the target Pinyin information according to the order of the corresponding spike speech frames in the spike speech frame sequence, obtaining the Pinyin information corresponding to the phonetic feature audio sequence.
The executing subject may sort the target Pinyin information according to the order of the corresponding spike speech frames in the spike speech frame sequence and then, taking each piece of Pinyin information as a unit, match adjacent pieces of Pinyin information to obtain the Pinyin information of the phonetic feature audio sequence. This substantially improves the accuracy of the obtained Pinyin information.
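As a non-limiting illustration, the four steps above can be sketched in Python as follows. The embodiment does not specify how a spike speech frame is compared with a phonetic unit, so the per-unit template vectors, the nearest-template rule in `match_unit`, and all class and function names are assumptions made only for this sketch. Each frame's amplitude is likewise assumed to be its largest absolute sample value, and the matching of adjacent Pinyin information mentioned in the fourth step is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class PhoneticUnit:
    """Hypothetical container: initial consonant phoneme, final phoneme, tone mark."""
    initial: str
    final: str
    tone: int

    def pinyin(self) -> str:
        return f"{self.initial}{self.final}{self.tone}"


def frame_amplitudes(frames: Sequence[Sequence[float]]) -> List[float]:
    """Step 1: one amplitude per frame (largest absolute sample; an assumption)."""
    return [max(abs(sample) for sample in frame) for frame in frames]


def spike_indices(amplitudes: Sequence[float]) -> List[int]:
    """Step 2: indices of strict local maxima of the amplitude curve."""
    return [
        i for i in range(1, len(amplitudes) - 1)
        if amplitudes[i - 1] < amplitudes[i] > amplitudes[i + 1]
    ]


def match_unit(frame: Sequence[float],
               templates: Dict[PhoneticUnit, Sequence[float]]) -> PhoneticUnit:
    """Step 3: pick the unit whose (assumed) template is closest to the frame."""
    def squared_distance(unit: PhoneticUnit) -> float:
        return sum((a - b) ** 2 for a, b in zip(frame, templates[unit]))
    return min(templates, key=squared_distance)


def recognise(frames: Sequence[Sequence[float]],
              templates: Dict[PhoneticUnit, Sequence[float]]) -> List[str]:
    """Steps 1-4: amplitudes, spikes, unit matching, ordering by frame position."""
    amplitudes = frame_amplitudes(frames)
    matched = [(i, match_unit(frames[i], templates).pinyin())
               for i in spike_indices(amplitudes)]
    matched.sort(key=lambda pair: pair[0])  # Step 4: keep spike-frame order
    return [pinyin for _, pinyin in matched]
```

Because the spikes are located by index, sorting on that index is enough to restore the order in which the characters were spoken.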
Step 203: search for the text information corresponding to the voice signal to be processed according to the Pinyin information.
After obtaining the Pinyin information, the executing subject may use it to query an application that can look up text, such as an electronic dictionary, to obtain the text information corresponding to the Pinyin information.
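A minimal Python sketch of such a lookup is given below, for illustration only. The embodiment says only that an application such as an electronic dictionary is queried; the tiny `LEXICON` table, the greedy longest-match strategy and the function name `lookup_text` are assumptions introduced for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature dictionary: tone-marked Pinyin sequences -> text.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("zhong1", "wen2"): "中文",
    ("zhong1",): "中",
    ("wen2",): "文",
    ("ni3", "hao3"): "你好",
}


def lookup_text(pinyin: List[str],
                lexicon: Dict[Tuple[str, ...], str] = LEXICON) -> str:
    """Greedy longest-match lookup of a Pinyin sequence in the lexicon."""
    longest = max(len(key) for key in lexicon)
    pieces: List[str] = []
    i = 0
    while i < len(pinyin):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(longest, len(pinyin) - i), 0, -1):
            key = tuple(pinyin[i:i + n])
            if key in lexicon:
                pieces.append(lexicon[key])
                i += n
                break
        else:
            pieces.append("?")  # no entry for this syllable
            i += 1
    return "".join(pieces)
```

A real dictionary would return several candidate characters for most syllables; this sketch keeps one entry per key so the control flow stays visible.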
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for obtaining information according to this embodiment. In the application scenario of Fig. 3, a user issues a voice signal to terminal device 102. After receiving the voice signal, terminal device 102 first extracts the phonetic feature audio sequence from the voice signal; then imports the phonetic feature audio sequence into the phonetic identification model to obtain Pinyin information; and finally looks up the text corresponding to the Pinyin information to obtain the text information corresponding to the voice signal. Afterwards, terminal device 102 may also display the text information on its screen.
The method provided by the above embodiment of the application first extracts the phonetic feature audio sequence from the voice signal to be processed; then imports the phonetic feature audio sequence into the phonetic identification model to obtain the Pinyin information corresponding to the phonetic feature audio sequence; and finally searches for the text information corresponding to the voice signal to be processed according to the Pinyin information. This technical solution reduces the amount of data processing and the storage space needed to obtain Pinyin information, and improves the efficiency and accuracy of obtaining text information.
With further reference to Fig. 4, a process 400 of another embodiment, a phonetic unit set construction method, is shown. The process 400 of the phonetic unit set construction method includes the following steps:
Step 401: obtain an initial consonant phoneme set and a final phoneme set (a final being the simple or compound vowel of a syllable).
In this embodiment, the executing subject on which the phonetic unit set construction method runs (for example, the server 105 shown in Fig. 1) may obtain the initial consonant phoneme set and the final phoneme set through a wired or wireless connection.
The executing subject of this embodiment may query applications such as electronic dictionaries to obtain the complete initial consonant phoneme set and final phoneme set. The initial consonant phoneme set contains all initial consonant phonemes that can occur in Pinyin information. The final phoneme set contains all final phonemes that can occur in Pinyin information.
Step 402: for each initial consonant phoneme in the initial consonant phoneme set, select from the final phoneme set the final phonemes matched with the initial consonant phoneme, obtaining the phonetic units corresponding to the initial consonant phoneme.
Pinyin information includes an initial consonant phoneme and a final phoneme, with the initial consonant phoneme preceding the final phoneme. The executing subject may therefore take each initial consonant phoneme in the initial consonant phoneme set as a starting point and determine the final phonemes that match it, and thereby obtain all possible phonetic units corresponding to the initial consonant phoneme.
In some optional implementations of this embodiment, selecting from the final phoneme set the final phonemes matched with the initial consonant phoneme, obtaining the phonetic units corresponding to the initial consonant phoneme, may include the following steps:
First step: select from the final phoneme set the final phonemes matched with the initial consonant phoneme, obtaining a final phoneme subset.
In practice, a given initial consonant phoneme can usually be matched with multiple final phonemes to form Pinyin information. For example, if the initial consonant phoneme is "zh", the corresponding final phonemes may be "i", "a", "e", "ong", "ui", "eng", and so on. The executing subject may combine the final phonemes that have a matching relationship with the initial consonant phoneme into the final phoneme subset.
Second step: determine the tone marks of the Pinyin information composed of the initial consonant phoneme and the final phonemes in the final phoneme subset, obtaining a tone mark set.
Pinyin information may be pronounced in the first, second, third or fourth tone. The executing subject may determine, for example by querying an electronic dictionary, which pronunciations the Pinyin information composed of an initial consonant phoneme and a final phoneme has, and then set tone marks for the Pinyin information. A tone mark may be used to indicate the pronunciation of the Pinyin information.
Third step: combine the initial consonant phoneme, the final phonemes in the final phoneme subset and the tone marks in the tone mark set into the phonetic units corresponding to the initial consonant phoneme.
After obtaining, for each initial consonant phoneme in the initial consonant phoneme set, the final phoneme subset and the tone mark set of the Pinyin information composed of the initial consonant phoneme and the final phonemes, the executing subject may combine the initial consonant phoneme, the final phonemes in the final phoneme subset and the tone marks in the tone mark set in every possible way, obtaining the phonetic units corresponding to the initial consonant phoneme. The phonetic unit set can thus cover all Pinyin information and its pronunciations.
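The construction in steps 401 and 402 can be sketched in Python as follows. This is a hedged illustration, not the embodiment's implementation: the names `PhoneticUnit` and `build_units`, the representation of the matching relationship as a set of (initial, final) pairs, and the small tables in the usage example are all assumptions, and the sample data is illustrative rather than an authoritative Pinyin table.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class PhoneticUnit(NamedTuple):
    """Hypothetical phonetic unit: initial consonant phoneme, final phoneme, tone mark."""
    initial: str
    final: str
    tone: int


def build_units(initials: Iterable[str],
                finals: Iterable[str],
                valid: Set[Tuple[str, str]],
                tones: Dict[Tuple[str, str], Set[int]]) -> List[PhoneticUnit]:
    """Build the phonetic unit set from the two phoneme sets.

    `valid` lists which (initial, final) pairs form Pinyin information, and
    `tones` lists the tone marks each pair can take; both stand in for the
    electronic-dictionary queries described in the text.
    """
    finals = list(finals)
    units: List[PhoneticUnit] = []
    for initial in initials:
        # First sub-step: the final phoneme subset matched with this initial.
        final_subset = [f for f in finals if (initial, f) in valid]
        for final in final_subset:
            # Second sub-step: the tone mark set of this Pinyin information.
            tone_marks = sorted(tones.get((initial, final), ()))
            # Third sub-step: combine initial, final and tone mark.
            units.extend(PhoneticUnit(initial, final, t) for t in tone_marks)
    return units
```

For example, with initials `["zh", "b"]`, finals `["a", "ong"]`, the valid pairs `zh+a`, `zh+ong` and `b+a`, and tone sets `{1, 2, 3, 4}`, `{1, 3, 4}` and `{1, 2, 3, 4}` respectively, the function yields eleven units and no unit for the unmatched pair `b+ong`.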
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the application provides an embodiment of a device for obtaining information. The device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied to various electronic devices.
As shown in Fig. 5, the device 500 for obtaining information of this embodiment may include: a phonetic feature audio sequence acquiring unit 501, a Pinyin information acquiring unit 502 and a text information acquiring unit 503. The phonetic feature audio sequence acquiring unit 501 is configured to obtain a phonetic feature audio sequence from the voice signal to be processed, the phonetic feature audio sequence being used to characterize the text corresponding to the voice signal to be processed. The Pinyin information acquiring unit 502 is configured to import the phonetic feature audio sequence into the phonetic identification model to obtain the Pinyin information corresponding to the phonetic feature audio sequence, the phonetic identification model being used to match, through the phonetic unit set, the Pinyin information corresponding to the phonetic feature audio sequence, and each phonetic unit being used to identify a single character. The text information acquiring unit 503 is configured to search for the text information corresponding to the voice signal to be processed according to the Pinyin information.
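The composition of device 500 can be pictured as three callables chained in order. The following Python sketch is illustrative only: the class name `InformationDevice`, the field names and the type signatures are assumptions and are not part of the embodiment, which leaves the implementation of each unit open.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class InformationDevice:
    """Hypothetical stand-in for device 500, built from its three units."""
    # Unit 501: voice signal -> phonetic feature audio sequence (list of frames).
    extract_features: Callable[[Sequence[float]], List[List[float]]]
    # Unit 502: phonetic feature audio sequence -> Pinyin information.
    recognise_pinyin: Callable[[List[List[float]]], List[str]]
    # Unit 503: Pinyin information -> text information.
    lookup_text: Callable[[List[str]], str]

    def run(self, signal: Sequence[float]) -> str:
        """Pass the signal through units 501, 502 and 503 in that order."""
        features = self.extract_features(signal)
        pinyin = self.recognise_pinyin(features)
        return self.lookup_text(pinyin)
```

Keeping the units as injected callables mirrors the statement that each unit may be realized in software or in hardware, since any implementation with a compatible interface can be substituted.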
In some optional implementations of this embodiment, the Pinyin information acquiring unit 502 may include an initial speech frame sequence acquiring subunit (not shown) and a secondary speech frame sequence acquiring subunit (not shown). The initial speech frame sequence acquiring subunit is configured to extract one initial speech frame from the phonetic feature audio sequence at intervals of a first set number of frames, obtaining an initial speech frame sequence. The secondary speech frame sequence acquiring subunit is configured to merge every second set number of adjacent initial speech frames in the initial speech frame sequence, obtaining a secondary speech frame sequence.
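The two subunits can be sketched in Python as below. The sketch rests on two assumptions that the text does not settle: "at intervals of a first set number of frames" is read as keeping one frame out of every `step` consecutive frames, and "merging" adjacent frames is read as concatenating their samples. The function names are likewise hypothetical.

```python
from typing import List, Sequence

Frame = List[float]


def sample_frames(frames: Sequence[Sequence[float]], step: int) -> List[Frame]:
    """Keep one frame out of every `step` frames (initial speech frame sequence)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return [list(frame) for frame in frames[::step]]


def merge_frames(frames: Sequence[Sequence[float]], group: int) -> List[Frame]:
    """Concatenate each run of `group` adjacent frames (secondary speech frame sequence)."""
    if group < 1:
        raise ValueError("group must be at least 1")
    merged: List[Frame] = []
    for start in range(0, len(frames), group):
        chunk = frames[start:start + group]
        merged.append([sample for frame in chunk for sample in frame])
    return merged
```

With ten one-sample frames numbered 0 to 9, `sample_frames(frames, 3)` keeps frames 0, 3, 6 and 9, and `merge_frames` with a group size of 2 then yields two secondary frames, so the amount of data handled by the later steps shrinks at each stage.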
In some optional implementations of this embodiment, the phonetic unit may include an initial consonant phoneme, a final phoneme (the simple or compound vowel of the syllable) matched with the initial consonant phoneme, and a tone mark, where the tone mark may be used to indicate the pronunciation feature of the Pinyin information composed of the initial consonant phoneme and the final phoneme. In addition, the Pinyin information acquiring unit 502 may include: a voice amplitude waveform diagram acquiring subunit (not shown), a spike speech frame sequence acquiring subunit (not shown), a target Pinyin information acquiring subunit (not shown) and a Pinyin information acquiring subunit (not shown). The voice amplitude waveform diagram acquiring subunit is configured to obtain the voice amplitude waveform diagram of the secondary speech frame sequence. The spike speech frame sequence acquiring subunit is configured to select, from the voice amplitude waveform diagram, the spike speech frames corresponding to amplitude extrema, obtaining a spike speech frame sequence. The target Pinyin information acquiring subunit is configured to, for each spike speech frame in the spike speech frame sequence, match a target phonetic unit corresponding to the spike speech frame from the phonetic unit set, and determine the target Pinyin information corresponding to the spike speech frame from the target phonetic unit. The Pinyin information acquiring subunit is configured to sort the target Pinyin information according to the order of the corresponding spike speech frames in the spike speech frame sequence, obtaining the Pinyin information corresponding to the phonetic feature audio sequence.
In some optional implementations of this embodiment, the device further includes a phonetic unit set construction unit (not shown) configured to construct the phonetic unit set. The phonetic unit set construction unit may include a phoneme set acquiring subunit (not shown) and a phonetic unit acquiring subunit (not shown). The phoneme set acquiring subunit is configured to obtain an initial consonant phoneme set and a final phoneme set. The phonetic unit acquiring subunit is configured to, for each initial consonant phoneme in the initial consonant phoneme set, select from the final phoneme set the final phonemes matched with the initial consonant phoneme, obtaining the phonetic units corresponding to the initial consonant phoneme.
In some optional implementations of this embodiment, the phonetic unit acquiring subunit may include: a final phoneme subset acquiring module (not shown), a tone mark set acquiring module (not shown) and a phonetic unit acquiring module (not shown). The final phoneme subset acquiring module is configured to select from the final phoneme set the final phonemes matched with the initial consonant phoneme, obtaining a final phoneme subset. The tone mark set acquiring module is configured to determine the tone marks of the Pinyin information composed of the initial consonant phoneme and the final phonemes in the final phoneme subset, obtaining a tone mark set. The phonetic unit acquiring module is configured to combine the initial consonant phoneme, the final phonemes in the final phoneme subset and the tone marks in the tone mark set into the phonetic units corresponding to the initial consonant phoneme.
This embodiment further provides a server, including: one or more processors; and a memory storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the above method for obtaining information.
This embodiment further provides a computer-readable medium storing a computer program which, when executed by a processor, implements the above method for obtaining information.
Referring now to Fig. 6, a structural schematic diagram of a computer system 600 of a server (for example, the server 105 in Fig. 1) suitable for implementing the embodiments of the application is shown. The server shown in Fig. 6 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 602 or a program loaded from a storage section 608 into random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, ROM 602 and RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, and a loudspeaker and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the application are performed.
It should be noted that the computer-readable medium of the application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus or device. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF, and so on, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment or a part of code, and the module, program segment or part of code contains one or more executable instructions for implementing the specified logic function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a phonetic feature audio sequence acquiring unit, a Pinyin information acquiring unit and a text information acquiring unit. The names of these units do not, in some cases, limit the units themselves; for example, the text information acquiring unit may also be described as "a unit for obtaining text information from Pinyin information".
In another aspect, the application further provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain a phonetic feature audio sequence from a voice signal to be processed, the phonetic feature audio sequence being used to characterize the text corresponding to the voice signal to be processed; import the phonetic feature audio sequence into a phonetic identification model to obtain the Pinyin information corresponding to the phonetic feature audio sequence, the phonetic identification model being used to match, through a phonetic unit set, the Pinyin information corresponding to the phonetic feature audio sequence, and each phonetic unit being used to identify a single character; and search for the text information corresponding to the voice signal to be processed according to the Pinyin information.
The above description is only of the preferred embodiments of the application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the application.
Claims (12)
1. A method for obtaining information, comprising:
obtaining a phonetic feature audio sequence from a voice signal to be processed, the phonetic feature audio sequence being used to characterize the text corresponding to the voice signal to be processed;
importing the phonetic feature audio sequence into a phonetic identification model to obtain Pinyin information corresponding to the phonetic feature audio sequence, the phonetic identification model being used to match, through a phonetic unit set, the Pinyin information corresponding to the phonetic feature audio sequence, and each phonetic unit being used to identify a single character;
searching for the text information corresponding to the voice signal to be processed according to the Pinyin information.
2. The method according to claim 1, wherein importing the phonetic feature audio sequence into the phonetic identification model to obtain the Pinyin information corresponding to the phonetic feature audio sequence comprises:
extracting one initial speech frame from the phonetic feature audio sequence at intervals of a first set number of frames, obtaining an initial speech frame sequence;
merging every second set number of adjacent initial speech frames in the initial speech frame sequence, obtaining a secondary speech frame sequence.
3. The method according to claim 2, wherein the phonetic unit comprises an initial consonant phoneme, a final phoneme matched with the initial consonant phoneme, and a tone mark, the tone mark being used to indicate the pronunciation feature of the Pinyin information composed of the initial consonant phoneme and the final phoneme, and
importing the phonetic feature audio sequence into the phonetic identification model to obtain the Pinyin information corresponding to the phonetic feature audio sequence comprises:
obtaining a voice amplitude waveform diagram of the secondary speech frame sequence;
selecting, from the voice amplitude waveform diagram, the spike speech frames corresponding to amplitude extrema, obtaining a spike speech frame sequence;
for each spike speech frame in the spike speech frame sequence, matching a target phonetic unit corresponding to the spike speech frame from the phonetic unit set, and determining the target Pinyin information corresponding to the spike speech frame from the target phonetic unit;
sorting the target Pinyin information according to the order of the corresponding spike speech frames in the spike speech frame sequence, obtaining the Pinyin information corresponding to the phonetic feature audio sequence.
4. The method according to claim 1, wherein the phonetic unit set is constructed by the following steps:
obtaining an initial consonant phoneme set and a final phoneme set;
for each initial consonant phoneme in the initial consonant phoneme set, selecting from the final phoneme set the final phonemes matched with the initial consonant phoneme, obtaining the phonetic units corresponding to the initial consonant phoneme.
5. The method according to claim 4, wherein selecting from the final phoneme set the final phonemes matched with the initial consonant phoneme, obtaining the phonetic units corresponding to the initial consonant phoneme, comprises:
selecting from the final phoneme set the final phonemes matched with the initial consonant phoneme, obtaining a final phoneme subset;
determining the tone marks of the Pinyin information composed of the initial consonant phoneme and the final phonemes in the final phoneme subset, obtaining a tone mark set;
combining the initial consonant phoneme, the final phonemes in the final phoneme subset and the tone marks in the tone mark set into the phonetic units corresponding to the initial consonant phoneme.
6. A device for obtaining information, comprising:
a phonetic feature audio sequence acquiring unit, configured to obtain a phonetic feature audio sequence from a voice signal to be processed, the phonetic feature audio sequence being used to characterize the text corresponding to the voice signal to be processed;
a Pinyin information acquiring unit, configured to import the phonetic feature audio sequence into a phonetic identification model to obtain Pinyin information corresponding to the phonetic feature audio sequence, the phonetic identification model being used to match, through a phonetic unit set, the Pinyin information corresponding to the phonetic feature audio sequence, and each phonetic unit being used to identify a single character;
a text information acquiring unit, configured to search for the text information corresponding to the voice signal to be processed according to the Pinyin information.
7. The device according to claim 6, wherein the Pinyin information acquiring unit comprises:
an initial speech frame sequence acquiring subunit, configured to extract one initial speech frame from the phonetic feature audio sequence at intervals of a first set number of frames, obtaining an initial speech frame sequence;
a secondary speech frame sequence acquiring subunit, configured to merge every second set number of adjacent initial speech frames in the initial speech frame sequence, obtaining a secondary speech frame sequence.
8. The device according to claim 7, wherein the phonetic unit comprises an initial consonant phoneme, a final phoneme matched with the initial consonant phoneme, and a tone mark, the tone mark being used to indicate the pronunciation feature of the Pinyin information composed of the initial consonant phoneme and the final phoneme, and
the Pinyin information acquiring unit comprises:
a voice amplitude waveform diagram acquiring subunit, configured to obtain a voice amplitude waveform diagram of the secondary speech frame sequence;
a spike speech frame sequence acquiring subunit, configured to select, from the voice amplitude waveform diagram, the spike speech frames corresponding to amplitude extrema, obtaining a spike speech frame sequence;
a target Pinyin information acquiring subunit, configured to, for each spike speech frame in the spike speech frame sequence, match a target phonetic unit corresponding to the spike speech frame from the phonetic unit set, and determine the target Pinyin information corresponding to the spike speech frame from the target phonetic unit;
a Pinyin information acquiring subunit, configured to sort the target Pinyin information according to the order of the corresponding spike speech frames in the spike speech frame sequence, obtaining the Pinyin information corresponding to the phonetic feature audio sequence.
9. The device according to claim 6, wherein the device further comprises a phonetic unit set construction unit configured to construct the phonetic unit set, the phonetic unit set construction unit comprising:
a phoneme set acquiring subunit, configured to obtain an initial consonant phoneme set and a final phoneme set;
a phonetic unit acquiring subunit, configured to, for each initial consonant phoneme in the initial consonant phoneme set, select from the final phoneme set the final phonemes matched with the initial consonant phoneme, obtaining the phonetic units corresponding to the initial consonant phoneme.
10. The device according to claim 9, wherein the phonetic unit acquisition subunit comprises:
a final phoneme subset acquisition module, configured to select from the final (simple or compound vowel) phoneme set the final phonemes matching the initial consonant phoneme, obtaining a final phoneme subset;
a tone identifier set acquisition module, configured to determine the tone identifiers of the Pinyin information formed by the initial consonant phoneme and the final phonemes in the final phoneme subset, obtaining a tone identifier set;
a phonetic unit acquisition module, configured to combine the initial consonant phoneme, the final phonemes in the final phoneme subset, and the tone identifiers in the tone identifier set into the phonetic units corresponding to the initial consonant phoneme.
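The construction of the phonetic unit set described in claims 9 and 10 can be sketched as below. The claims do not state how "matching" finals or valid tones are determined, so this sketch assumes a caller-supplied lookup table `valid_syllables` mapping each legal (initial, final) pair to its permitted tone identifiers; that table, the function name, and the tuple layout are all hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical representation of a phonetic unit: (initial, final, tone identifier).
PhoneticUnit = Tuple[str, str, int]


def build_phonetic_units(
    initials: Sequence[str],
    finals: Sequence[str],
    valid_syllables: Dict[Tuple[str, str], Set[int]],
) -> Dict[str, List[PhoneticUnit]]:
    """Build, for each initial, the phonetic units it can form.

    Assumption: a final "matches" an initial when the pair appears in
    `valid_syllables`, whose values are the tone identifiers allowed
    for that pair.
    """
    units: Dict[str, List[PhoneticUnit]] = {}
    for initial in initials:
        # Step 1: final phoneme subset for this initial.
        final_subset = [f for f in finals if (initial, f) in valid_syllables]
        # Steps 2 and 3: look up the tone identifiers for each pair and
        # combine initial, final, and tone into phonetic units.
        units[initial] = [
            (initial, final, tone)
            for final in final_subset
            for tone in sorted(valid_syllables[(initial, final)])
        ]
    return units
```

For example, with toy data in which only `("b", "a")` and `("j", "ia")` are listed as valid pairs with tones 1 to 4, the initial `b` yields the four units `ba1` to `ba4` and never pairs with `ia`.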
11. A server, comprising:
one or more processors; and
a memory storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 5.
12. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811198500.5A CN109410918B (en) | 2018-10-15 | 2018-10-15 | Method and device for acquiring information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811198500.5A CN109410918B (en) | 2018-10-15 | 2018-10-15 | Method and device for acquiring information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109410918A (en) | 2019-03-01 |
CN109410918B (en) | 2020-01-24 |
Family
ID=65468027
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811198500.5A Active CN109410918B (en) | 2018-10-15 | 2018-10-15 | Method and device for acquiring information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109410918B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1996023298A2 (en) * | 1995-01-26 | 1996-08-01 | Apple Computer, Inc. | System and method for generating and using context dependent sub-syllable models to recognize a tonal language |
CN1609828A (en) * | 2003-10-22 | 2005-04-27 | 无敌科技股份有限公司 | System and method for synthetizing English words through base phoneme |
CN102200839A (en) * | 2010-03-25 | 2011-09-28 | 阿里巴巴集团控股有限公司 | Method and system for processing pinyin string in process of inputting Chinese characters |
CN102208186A (en) * | 2011-05-16 | 2011-10-05 | 南宁向明信息科技有限责任公司 | Chinese phonetic recognition method |
CN107621892A (en) * | 2017-10-18 | 2018-01-23 | 北京百度网讯科技有限公司 | For obtaining the method and device of information |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111161724A (en) * | 2019-12-16 | 2020-05-15 | 爱驰汽车有限公司 | Method, system, equipment and medium for Chinese audio-visual combined speech recognition |
CN111161724B (en) * | 2019-12-16 | 2022-12-13 | 爱驰汽车有限公司 | Method, system, equipment and medium for Chinese audio-visual combined speech recognition |
CN111192572A (en) * | 2019-12-31 | 2020-05-22 | 斑马网络技术有限公司 | Semantic recognition method, device and system |
CN114125506A (en) * | 2020-08-28 | 2022-03-01 | 上海哔哩哔哩科技有限公司 | Voice auditing method and device |
CN114125506B (en) * | 2020-08-28 | 2024-03-19 | 上海哔哩哔哩科技有限公司 | Voice auditing method and device |
CN112541957A (en) * | 2020-12-09 | 2021-03-23 | 北京百度网讯科技有限公司 | Animation generation method, animation generation device, electronic equipment and computer readable medium |
CN112541957B (en) * | 2020-12-09 | 2024-05-21 | 北京百度网讯科技有限公司 | Animation generation method, device, electronic equipment and computer readable medium |
CN113327576A (en) * | 2021-06-03 | 2021-08-31 | 多益网络有限公司 | Speech synthesis method, apparatus, device and storage medium |
CN113327576B (en) * | 2021-06-03 | 2024-04-23 | 多益网络有限公司 | Speech synthesis method, device, equipment and storage medium |
CN117116267A (en) * | 2023-10-24 | 2023-11-24 | 科大讯飞股份有限公司 | Speech recognition method and device, electronic equipment and storage medium |
CN117116267B (en) * | 2023-10-24 | 2024-02-13 | 科大讯飞股份有限公司 | Speech recognition method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109410918B (en) | 2020-01-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109410918A (en) | For obtaining the method and device of information | |
CN107623614A (en) | Method and apparatus for pushed information | |
CN107657017A (en) | Method and apparatus for providing voice service | |
CN109272984A (en) | Method and apparatus for interactive voice | |
CN110223705A (en) | Phonetics transfer method, device, equipment and readable storage medium storing program for executing | |
CN110288980A (en) | Audio recognition method, the training method of model, device, equipment and storage medium | |
CN108022586A (en) | Method and apparatus for controlling the page | |
CN107844586A (en) | News recommends method and apparatus | |
CN107767869A (en) | Method and apparatus for providing voice service | |
CN108428446A (en) | Audio recognition method and device | |
CN109545192A (en) | Method and apparatus for generating model | |
CN108648756A (en) | Voice interactive method, device and system | |
CN107808007A (en) | Information processing method and device | |
CN111798821B (en) | Sound conversion method, device, readable storage medium and electronic equipment | |
CN109102802A (en) | System for handling user spoken utterances | |
CN108877782A (en) | Audio recognition method and device | |
CN109145148A (en) | Information processing method and device | |
CN106921749A (en) | For the method and apparatus of pushed information | |
CN109739605A (en) | The method and apparatus for generating information | |
CN107463700A (en) | For obtaining the method, apparatus and equipment of information | |
CN109545193A (en) | Method and apparatus for generating model | |
CN108701127A (en) | Electronic equipment and its operating method | |
CN109754783A (en) | Method and apparatus for determining the boundary of audio sentence | |
CN109308901A (en) | Chanteur's recognition methods and device | |
CN107943877A (en) | The generation method and device of content of multimedia to be played | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||