CN108682423A - Speech recognition method and device - Google Patents
- Publication number
- CN108682423A CN201810504702.1A
- Authority
- CN
- China
- Prior art keywords
- information
- distance
- character string
- section
- pinyin character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
Abstract
The present invention provides a speech recognition method and device. The technical solution is: receive a voice signal; determine the original text information corresponding to the voice signal, and obtain the multiple segments of text information included in the current information display interface; perform a pinyin-based distance calculation between the original text information and each segment of text information included in the current information display interface; determine the segment of text information in the current information display interface that has the shortest distance to the original text information, and determine the final text information corresponding to the user's voice signal according to the distance between that segment and the original text information. The present invention can adaptively adjust the speech recognition result according to the scene the user is in, improving the accuracy of speech recognition.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and device.
Background art
Speech recognition technology, also called automatic speech recognition (Automatic Speech Recognition, ASR), aims to convert the lexical content of human speech into computer-readable input such as key presses, binary codes or character strings.
Current speech recognition technology mainly analyzes the input signal with tools such as an acoustic model, a language model and a dictionary, and searches for the word string that outputs the signal with the maximum probability (weight). For example, if the voice signal input by a user sounds like "liudehua", and the word string most likely to produce that voice signal is "Liu Dehua", the user's voice signal is finally converted to the text "Liu Dehua" rather than to another word string with the same or a similar pronunciation.
Existing speech recognition technology can already meet the vast majority of application needs, but problems still arise in specific scenes or contexts. For example, a user is watching a film whose lead actor's name is pronounced "liudehua" but is written differently from the most probable recognition result "Liu Dehua", and wants to find information related to that actor (videos, documents, web pages and so on). Because "Liu Dehua" has the highest output probability for the pronunciation "liudehua", it is used as the final speech conversion result, and this result cannot meet the user's need in this scene.
Summary of the invention
In view of this, the purpose of the present invention is to provide a speech recognition method that can adaptively adjust the speech recognition result according to the scene the user is in, improving the accuracy of speech recognition.
In order to achieve the above object, the present invention provides the following technical solutions:
A speech recognition method, the method comprising:
receiving a voice signal;
determining the original text information corresponding to the voice signal;
obtaining the multiple segments of text information included in the current information display interface;
performing a pinyin-based distance calculation between the original text information and each segment of text information included in the current information display interface;
determining the segment of text information in the current information display interface that has the shortest distance to the original text information, and determining the final text information corresponding to the user's voice signal according to the distance between that segment and the original text information.
A speech recognition device, comprising: a receiving unit, a recognition unit, an acquiring unit and a processing unit;
the receiving unit is configured to receive the user's voice signal;
the recognition unit is configured to determine, when the receiving unit receives the user's voice signal, the original text information corresponding to the user's voice signal;
the acquiring unit is configured to obtain from the information display module, when the receiving unit receives the user's voice signal, the multiple segments of text information included in the current information display interface;
the processing unit is configured to perform a pinyin-based distance calculation between the original text information and each segment of text information included in the current information display interface; and to determine the segment of text information in the current information display interface that has the shortest distance to the original text information, and determine the final text information corresponding to the user's voice signal according to the distance between that segment and the original text information.
As can be seen from the above technical solution, the present invention first uses prior-art speech recognition technology to identify the original text information corresponding to the user's voice signal. The original text information and each segment of text information contained in the information display interface the user is currently browsing are then converted into pinyin character strings of matching form and compared, and the segment in the current information display interface that is nearest to the original text information is found. The final text information corresponding to the user's voice signal is determined according to the distance between that segment and the original text information. The present invention can adaptively adjust the speech recognition result according to the scene the user is in, improving the accuracy of speech recognition.
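The flow summarized above can be sketched end to end in Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names, the `(text, syllables)` input shape and the coefficients `a` and `b` are hypothetical, each text is assumed to have already been converted to `(pinyin, tone)` syllables by some dictionary or library, and the distance is taken as the sum of the toneless and toned edit distances.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum edit distance (insert, delete, substitute; each costs 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def pinyin_forms(syllables):
    """Return (toneless, toned) strings from a list of (pinyin, tone) pairs."""
    toneless = "".join(p for p, _ in syllables)
    toned = "".join(f"{p}{t}" for p, t in syllables)
    return toneless, toned


def recognize(original, segments, a=0.25, b=0.0):
    """original: (text, syllables); segments: list of (text, syllables).

    a and b are assumed, illustrative coefficients of the threshold a*x + b.
    """
    orig_text, orig_syllables = original
    o_plain, o_toned = pinyin_forms(orig_syllables)
    best = None  # (distance, text, toneless length) of the nearest segment
    for text, syllables in segments:
        s_plain, s_toned = pinyin_forms(syllables)
        dist = edit_distance(s_plain, o_plain) + edit_distance(s_toned, o_toned)
        if best is None or dist < best[0]:
            best = (dist, text, len(s_plain))
    if best is None:  # nothing on screen: keep the recognizer's own result
        return orig_text
    threshold = a * max(best[2], len(o_plain)) + b
    return best[1] if best[0] < threshold else orig_text


asr = ("ASR result", [("liu", 2), ("de", 2), ("hua", 2)])
screen = [
    ("on-screen name", [("liu", 2), ("de", 2), ("hua", 2)]),
    ("other title", [("liu", 2), ("xue", 2)]),
]
print(recognize(asr, screen))
```

With the on-screen homophone present, the sketch returns the on-screen text; when only the poorly matching segment is shown, it falls back to the recognizer's own result.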
Brief description of the drawings
Fig. 1 is a flow chart of the speech recognition method of an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the speech recognition device of an embodiment of the present invention.
Detailed description
In order to make the purpose, technical solution and advantages of the present invention clearer, the technical solution of the present invention is described in detail below with reference to the accompanying drawings and embodiments.
The present invention is applicable to a device that includes a speech recognition module and an information display application module. The information display application module can be an application module with a video playback function, an audio playback function and/or a text display function, for example a video player. The information display application module can provide the text information contained in its current information display interface to the speech recognition module at the request of the speech recognition module, or it can actively push that text information to the speech recognition module. The information display application module and the speech recognition module can be different function modules of the same application, or function modules belonging to different applications.
The method of the present invention implements the functions of the above speech recognition module.
The implementation of the present invention is described in detail below.
Referring to Fig. 1, which is a flow chart of the speech recognition method of an embodiment of the present invention, the method includes the following steps:
Step 101: receive a voice signal.
In the embodiment of the present invention, the user's voice signal is picked up by a sound pickup device and sent to the speech recognition module. The sound pickup device can be a microphone. The user inputs speech through the microphone, which converts the sound made by the user into a voice signal and sends it to the speech recognition module of the device.
Step 102: determine the original text information corresponding to the voice signal.
In the embodiment of the present invention, existing speech recognition technology is used to determine the original text information corresponding to the user's voice signal.
Many speech recognition technologies exist, and any of them can be used to recognize the user's voice signal and determine the corresponding text information. For ease of distinction, the text information determined by this recognition is called the original text information corresponding to the user's voice signal, because a more accurate text (called the final text information) is subsequently derived from it in combination with the specific scene the user is in.
Step 103: obtain the multiple segments of text information included in the current information display interface.
In the embodiment of the present invention, the information display interface can be provided by the information display application module. The information display application module can provide the text information contained in its current information display interface to the speech recognition module at the request of the speech recognition module, or it can actively push that text information to the speech recognition module. The speech recognition module can therefore obtain the multiple segments of text information included in the current information display interface either by sending a request to the information display application module or by receiving the module's active push.
It should be noted that the information display interface in the present invention contains a large amount of text information, which is displayed in a certain format that divides it into multiple segments. For example, when multiple video titles are shown as a list, each list item corresponds to one video title, and that title is one independent segment of text information. As another example, when multiple video titles are separated by separators, the text between two separators corresponds to one video title, and that title is one independent segment of text information. In the present invention, the segments of text information in the current information display interface are independent of one another. The speech recognition module needs to obtain these independent segments and perform the pinyin-based distance calculation between each of them and the original text information corresponding to the voice signal.
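The segmentation described above can be illustrated with a short Python sketch. The function name `split_segments` and the `|` separator are assumptions chosen for illustration; the patent does not specify a particular separator or data format.

```python
def split_segments(display_text: str, separator: str = "|") -> list[str]:
    """Split separator-delimited interface text into independent segments.

    The "|" default is an assumed separator, not one named by the source.
    """
    # Trim whitespace around each piece and drop empty pieces.
    return [piece.strip() for piece in display_text.split(separator) if piece.strip()]


# A list-style interface already yields one segment per list item.
list_segments = ["Video A", "Video B"]

# A separator-style interface has to be split first.
separated_segments = split_segments("Video A | Video B | Video C")
print(separated_segments)  # ['Video A', 'Video B', 'Video C']
```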
Taking a television system as an example, assume that the information display application module is a video player and the speech recognition module is a function module embedded in the video player. When the user browses TV or film information in the main interface of the video player, the player can package all the data contained in the current information display interface into a map and hand the map to the binder of the television system, which forwards the data to the speech recognition module, so that the speech recognition module obtains the multiple segments of text information included in the current information display interface. In addition, information about the current information display interface can be obtained from the stack information of the operating system, specifically from the entry at the top of the stack: the package name of the application is parsed from the stack-top information (each package name in the operating system is unique), and the file indicated by this package name stores the text information of the current information display interface.
It should be noted that step 103 can also be performed before step 101 or step 102.
Step 104: perform a pinyin-based distance calculation between the original text information and each segment of text information included in the current information display interface.
In the embodiment of the present invention, the original text information and each segment of text information included in the current information display interface are converted into pinyin character strings, and the distance is calculated on the converted pinyin character strings.
There are two ways to convert text information into a pinyin character string. One converts it into a toneless pinyin character string, for example "Liu Dehua" is converted to "liudehua". The other converts it into a pinyin character string with tones, written as pinyin plus tone (tones are denoted by the digits 1, 2, 3 and 4, representing the first, second, third and fourth tones respectively), for example "Liu Dehua" is converted to "liu2de2hua2", where "liu2" indicates that the tone of "liu" is 2, "de2" indicates that the tone of "de" is 2, and "hua2" indicates that the tone of "hua" is 2.
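The two string forms can be illustrated with a minimal Python sketch. It assumes the character-to-syllable lookup has already been done by some dictionary or library (not shown), so the input is a list of `(pinyin, tone)` pairs; the function name `to_pinyin_strings` is hypothetical.

```python
def to_pinyin_strings(syllables):
    """Build the toneless and toned pinyin strings.

    syllables: list of (pinyin, tone) pairs with tone in 1..4,
    assumed to come from a separate character-to-pinyin lookup.
    """
    toneless = "".join(pinyin for pinyin, _ in syllables)
    toned = "".join(f"{pinyin}{tone}" for pinyin, tone in syllables)
    return toneless, toned


name = [("liu", 2), ("de", 2), ("hua", 2)]
print(to_pinyin_strings(name))  # ('liudehua', 'liu2de2hua2')
```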
Based on these two ways of converting text information into pinyin character strings, there are at least the following three methods for performing the pinyin-based distance calculation between the original text information and each segment of text information included in the current information display interface:
Method 1:
Convert the original text information into a toneless first pinyin character string and a second pinyin character string with tones;
convert each segment of text information included in the current information display interface into a pinyin character string to be matched, where the pinyin character string to be matched is a toneless pinyin character string;
calculate the distance between the pinyin character string to be matched of each segment and the first pinyin character string, and the distance between it and the second pinyin character string, then sum the two distances and use the sum as the distance between that segment of text information and the original text information.
Method 2:
Convert the original text information into a toneless first pinyin character string and a second pinyin character string with tones;
convert each segment of text information included in the current information display interface into a pinyin character string to be matched, where the pinyin character string to be matched is a pinyin character string with tones;
calculate the distance between the pinyin character string to be matched of each segment and the first pinyin character string, and the distance between it and the second pinyin character string, then sum the two distances and use the sum as the distance between that segment of text information and the original text information.
Method 3:
Convert the original text information into a toneless first pinyin character string and a second pinyin character string with tones;
convert each segment of text information included in the current information display interface into a toneless first pinyin character string to be matched and a second pinyin character string to be matched with tones;
calculate the distance between the first pinyin character string to be matched of each segment and the first pinyin character string, and the distance between the second pinyin character string to be matched and the second pinyin character string, then sum the two distances and use the sum as the distance between that segment of text information and the original text information.
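The three methods differ only in which candidate string is compared with which form of the original text. The Python sketch below illustrates this under assumptions: the function names are hypothetical, and the distance between two strings is taken to be the minimum edit distance with unit costs.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum edit distance (insert, delete, substitute; each costs 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def distance_single_form(candidate: str, first: str, second: str) -> int:
    """Methods 1 and 2: one candidate string is compared with both forms.

    Pass a toneless candidate for Method 1, a toned candidate for Method 2.
    """
    return edit_distance(candidate, first) + edit_distance(candidate, second)


def distance_both_forms(cand_toneless: str, cand_toned: str,
                        first: str, second: str) -> int:
    """Method 3: toneless vs. toneless plus toned vs. toned."""
    return edit_distance(cand_toneless, first) + edit_distance(cand_toned, second)


# first = toneless original, second = toned original
first, second = "liudehua", "liu2de2hua2"
print(distance_single_form("liudehua", first, second))                # Method 1 -> 3
print(distance_single_form("liu2de2hua2", first, second))             # Method 2 -> 3
print(distance_both_forms("liudehua", "liu2de2hua2", first, second))  # Method 3 -> 0
```

In this example a perfect homophone still scores 3 under Methods 1 and 2, because the three tone digits count as edits when a toneless and a toned string are compared, while Method 3 scores it 0.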
In the above three methods, the distance between two pinyin character strings can be calculated with the minimum edit distance algorithm: the minimum edit distance of the two pinyin character strings is calculated with the algorithm and used as the distance between them.
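For reference, a minimal Python sketch of the minimum edit distance follows. It assumes the common Levenshtein formulation in which insertion, deletion and substitution each cost 1; the source does not state the edit costs, so this is one reasonable reading rather than the patent's exact definition.

```python
def min_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, using two rolling rows."""
    # previous[j] = distance between the first i-1 chars of a and first j chars of b
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(min_edit_distance("liudehua", "liuxue"))  # 4
```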
Step 105: determine the segment of text information in the current information display interface that has the shortest distance to the original text information, and determine the final text information corresponding to the user's voice signal according to the distance between that segment and the original text information.
From all the distance results calculated in step 104, that is, the distance between each segment of text information in the current information display interface and the original text information, the segment with the shortest distance to the original text information can be found. The pinyin character string of this segment is the most similar to the pinyin character string of the original text information, so this segment is the most likely text conversion result the user intended for the input voice signal.
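Selecting the shortest-distance segment is a simple minimum search, as the Python sketch below shows. The function name and the sample distances are hypothetical; the distances are assumed to have been computed already in step 104.

```python
def nearest_segment(distances: dict[str, int]) -> tuple[str, int]:
    """Return (segment, distance) for the segment with the smallest distance.

    distances maps each on-screen segment to its precomputed distance
    from the original text information. Ties keep the first segment seen.
    """
    return min(distances.items(), key=lambda item: item[1])


# Hypothetical distances for three on-screen segments (labelled by pinyin).
sample = {"liuxue": 7, "liudehua": 0, "liuxing": 9}
print(nearest_segment(sample))  # ('liudehua', 0)
```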
A shortest-distance segment is always found in the current information display interface, however large the distances are, so the distance between that segment and the original text information may still be large. For example, the segment "Liu Xue" is converted to "liuxue" and the original text information "Liu Dehua" is converted to "liudehua". In this case the segment and the original text information do not actually match, and using "Liu Xue" as the text information corresponding to the user's voice signal would clearly be wrong.
To solve this problem, in the present invention, after the segment of text information with the shortest distance to the original text information is found, the final text information corresponding to the user's voice signal is further determined according to the distance between that segment and the original text information. The specific method is: determine a distance metric value (a threshold) according to that segment of text information and the original text information; if the distance between the segment and the original text information is less than the distance metric value, use the segment as the final text information corresponding to the user's voice signal; otherwise, use the original text information as the final text information corresponding to the user's voice signal.
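The decision rule above reduces to a single comparison. The following Python sketch is illustrative only; the function name and the sample values are assumptions.

```python
def choose_final_text(original_text: str, nearest_text: str,
                      distance: float, metric_value: float) -> str:
    """Accept the on-screen segment only if it is closer than the metric value."""
    # Strict "less than", as stated in the text; otherwise keep the ASR result.
    return nearest_text if distance < metric_value else original_text


# Hypothetical values: a close match is accepted, a distant one is rejected.
print(choose_final_text("ASR result", "on-screen segment", distance=0, metric_value=2))
print(choose_final_text("ASR result", "on-screen segment", distance=4, metric_value=2))
```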
In the present invention, a function f(x) relating string length to distance metric value can be set in advance according to the principle that the longer the string, the larger the distance metric value. For example, f(x) = ax + b, where a and b are preset adjustment coefficients that can be set according to actual needs or experience. In the simplest case, f(x) can be defined by enumeration, with one corresponding distance metric value set for each specific string length.
The distance metric value can be determined from that segment of text information and the original text information as follows: determine the length L1 of the toneless pinyin character string converted from the segment and the length L2 of the toneless pinyin character string converted from the original text information, take the larger of L1 and L2, determine the distance metric value corresponding to this maximum length according to the function, and use it as the distance metric value.
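A short Python sketch of both variants of f(x) follows. The coefficient values `a=0.25` and `b=0.0` and the entries of the enumerated table are purely illustrative assumptions; the source only says they are set according to need or experience.

```python
def linear_metric_value(len_segment: int, len_original: int,
                        a: float = 0.25, b: float = 0.0) -> float:
    """f(x) = a*x + b evaluated at x = max(L1, L2).

    a and b are assumed example coefficients, not values from the source.
    """
    return a * max(len_segment, len_original) + b


# Enumerated variant: one metric value per string length (hypothetical table).
ENUMERATED_METRIC_VALUES = {4: 1, 6: 2, 8: 2, 10: 3}


def enumerated_metric_value(len_segment: int, len_original: int) -> int:
    """Look up the metric value for max(L1, L2) in the enumerated table."""
    return ENUMERATED_METRIC_VALUES[max(len_segment, len_original)]


# "liuxue" has length 6, "liudehua" has length 8, so x = 8.
print(linear_metric_value(len("liuxue"), len("liudehua")))      # 2.0
print(enumerated_metric_value(len("liuxue"), len("liudehua")))  # 2
```

With these example coefficients the toneless distance of 4 between "liuxue" and "liudehua" is not below the metric value of 2.0, so the mismatched segment would be rejected.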
The speech recognition method of the embodiment of the present invention has been described in detail above. The present invention also provides a speech recognition device, which is described in detail below with reference to Fig. 2:
Referring to Fig. 2, which is a structural schematic diagram of the speech recognition device of an embodiment of the present invention, the equipment in which the speech recognition device is located further includes an information display application module. As shown in Fig. 2, the device includes: a receiving unit 201, a recognition unit 202, an acquiring unit 203 and a processing unit 204; wherein,
the receiving unit 201 is configured to receive the user's voice signal;
the recognition unit 202 is configured to determine, when the receiving unit 201 receives a voice signal, the original text information corresponding to the voice signal;
the acquiring unit 203 is configured to obtain, when the receiving unit 201 receives a voice signal, the multiple segments of text information included in the current information display interface;
the processing unit 204 is configured to perform a pinyin-based distance calculation between the original text information and each segment of text information included in the current information display interface; and to determine the segment of text information in the current information display interface that has the shortest distance to the original text information, and determine the final text information corresponding to the user's voice signal according to the distance between that segment and the original text information.
In the device shown in Fig. 2,
the processing unit 204, when performing the pinyin-based distance calculation between the original text information and each segment of text information included in the current information display interface, is configured to:
convert the original text information into a toneless first pinyin character string and a second pinyin character string with tones;
convert each segment of text information included in the current information display interface into a pinyin character string to be matched, where the pinyin character string to be matched is a toneless pinyin character string or a pinyin character string with tones;
calculate the distance between the pinyin character string to be matched of each segment and the first pinyin character string, and the distance between it and the second pinyin character string, then sum the two distances and use the sum as the distance between that segment of text information and the original text information.
In the device shown in Fig. 2,
the processing unit 204, when performing the pinyin-based distance calculation between the original text information and each segment of text information included in the current information display interface, is configured to:
convert the original text information into a toneless first pinyin character string and a second pinyin character string with tones;
convert each segment of text information included in the current information display interface into a toneless first pinyin character string to be matched and a second pinyin character string to be matched with tones;
calculate the distance between the first pinyin character string to be matched of each segment and the first pinyin character string, and the distance between the second pinyin character string to be matched and the second pinyin character string, then sum the two distances and use the sum as the distance between that segment of text information and the original text information.
In the device shown in Fig. 2,
the processing unit 204 calculates the distance between two pinyin character strings based on the minimum edit distance algorithm, specifically: the minimum edit distance of the two pinyin character strings is calculated with the minimum edit distance algorithm and used as the distance between them.
In the device shown in Fig. 2,
the processing unit 204, when determining the final text information corresponding to the user's voice signal according to the distance between that segment of text information and the original text information, is configured to: determine a distance metric value according to that segment of text information and the original text information; if the distance between the segment and the original text information is less than the distance metric value, use the segment as the final text information corresponding to the user's voice signal; otherwise, use the original text information as the final text information corresponding to the user's voice signal.
The device shown in Fig. 2 further includes a configuration unit 205;
the configuration unit 205 is configured to set in advance a function relating string length to distance metric value, according to the principle that the longer the string, the larger the distance metric value;
the processing unit 204, when determining a distance metric value according to that segment of text information and the original text information, is configured to: determine the length L1 of the toneless pinyin character string converted from the segment and the length L2 of the toneless pinyin character string converted from the original text information, take the larger of L1 and L2, determine the distance metric value corresponding to this maximum length according to the function, and use it as the distance metric value.
Practice has shown that applying the method of the present invention can greatly improve the success rate of speech recognition. In particular, because the method brings the user's specific environment into the speech recognition process, it can improve the user experience.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent substitution or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.
Claims (12)
1. A speech recognition method, characterized in that the method comprises:
receiving a voice signal;
determining the original text information corresponding to the voice signal;
obtaining the multiple segments of text information included in the current information display interface;
performing a pinyin-based distance calculation between the original text information and each segment of text information included in the current information display interface;
determining the segment of text information in the current information display interface that has the shortest distance to the original text information, and determining the final text information corresponding to the user's voice signal according to the distance between that segment and the original text information.
2. The method according to claim 1, characterized in that
the method of performing the pinyin-based distance calculation between the original text information and each segment of text information included in the current information display interface is:
converting the original text information into a toneless first pinyin character string and a second pinyin character string with tones;
converting each segment of text information included in the current information display interface into a pinyin character string to be matched, where the pinyin character string to be matched is a toneless pinyin character string or a pinyin character string with tones;
calculating the distance between the pinyin character string to be matched of each segment and the first pinyin character string, and the distance between it and the second pinyin character string, then summing the two distances and using the sum as the distance between that segment of text information and the original text information.
3. The method according to claim 1, characterized in that,
The method of performing the pinyin-based distance calculation between the original character information and each segment of text information included in the current information display interface is:
converting the original character information into a toneless first pinyin character string and a toned second pinyin character string;
converting each segment of text information included in the current information display interface into a toneless first pinyin character string to be matched and a toned second pinyin character string to be matched;
calculating the distance between the first pinyin character string to be matched that corresponds to each segment of text information included in the current information display interface and the first pinyin character string, and the distance between the corresponding second pinyin character string to be matched and the second pinyin character string, and calculating the sum of these two distances, the sum of the distances being taken as the distance between that segment of text information and the original character information.
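In this variant the toneless strings are compared with each other and the toned strings are compared with each other. A compact restatement, using notation introduced here for illustration only (s is the original character information, t is one segment of displayed text, p_0 and p_\tau convert text to toneless and toned pinyin strings, d is a string distance):

```latex
D(t, s) = d\bigl(p_0(t),\, p_0(s)\bigr) + d\bigl(p_\tau(t),\, p_\tau(s)\bigr)
```

For instance, assuming tones are written as trailing digits and d is a minimum edit distance with unit costs, an original string zhang1san1 and a candidate zhang1shan1 give d(zhangshan, zhangsan) = 1 and d(zhang1shan1, zhang1san1) = 1, so D = 2. Under the same assumptions, a candidate whose syllables and tones match the original exactly gives D = 0.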
4. The method according to claim 2 or 3, characterized in that,
The distance between two pinyin character strings is calculated based on a minimum edit distance algorithm, which specifically comprises: calculating the minimum edit distance between the two pinyin character strings using the minimum edit distance algorithm, and taking that minimum edit distance as the distance between the two pinyin character strings.
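The claim names a minimum edit distance algorithm without fixing its details. A minimal sketch, assuming the common Levenshtein formulation in which each insertion, deletion, or substitution of one character costs 1, could look like this (the function name `min_edit_distance` is hypothetical):

```python
def min_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (unit costs, an assumption)."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Toneless pinyin strings that differ by one letter.
    print(min_edit_distance("zhangshan", "zhangsan"))
```

Keeping only two rows of the dynamic-programming table keeps memory proportional to the shorter dimension, which is sufficient here because only the final distance is needed, not the edit script.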
5. The method according to claim 1, 2 or 3, characterized in that,
The method of determining the final text information corresponding to the user's voice signal according to the distance between that segment of text information and the original character information is: determining a distance metric value according to that segment of text information and the original character information; if the distance between that segment of text information and the original character information is less than the distance metric value, taking that segment of text information as the final text information corresponding to the user's voice signal; otherwise, taking the original character information as the final text information corresponding to the user's voice signal.
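The decision rule in this claim is a strict comparison against the distance metric value, which acts as a threshold. A minimal sketch, in which the function and parameter names are hypothetical and the distance and threshold are assumed to have been computed elsewhere:

```python
def choose_final_text(original_text: str, nearest_segment: str,
                      distance: int, threshold: int) -> str:
    """Return the displayed segment only if it is close enough to the
    recognizer output; otherwise keep the recognizer output unchanged."""
    # The claim requires "less than", so equality keeps the original text.
    if distance < threshold:
        return nearest_segment
    return original_text


if __name__ == "__main__":
    print(choose_final_text("recognized text", "displayed text", 1, 2))
```

Falling back to the original character information when the threshold is not met means that text on screen only overrides the recognizer when the two are phonetically close.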
6. The method according to claim 5, characterized in that,
A function relating string length to the distance metric value is set in advance according to the principle that the longer the string, the larger the distance metric value;
The method of determining a distance metric value according to that segment of text information and the original character information is: determining the length L1 of the toneless pinyin character string converted from that segment of text information and the length L2 of the toneless pinyin character string converted from the original character information, taking the larger of L1 and L2 as the maximum length value, determining the distance metric value corresponding to the maximum length value according to the function, and taking that distance metric value as said distance metric value.
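The claim only requires that the function grow with string length; it does not give a formula. The sketch below therefore uses a made-up rule (about one allowed edit per four characters, with a minimum of 1) purely as an assumption, and both function names are hypothetical:

```python
def threshold_for_length(length: int) -> int:
    """Hypothetical non-decreasing mapping from string length to the
    distance metric value; the 1-per-4-characters ratio is an assumption."""
    return max(1, length // 4)


def distance_metric_value(segment_toneless: str, original_toneless: str) -> int:
    """Apply the mapping to max(L1, L2), as described in the claim."""
    l1 = len(segment_toneless)   # L1: toneless pinyin of the displayed segment
    l2 = len(original_toneless)  # L2: toneless pinyin of the recognizer output
    return threshold_for_length(max(l1, l2))


if __name__ == "__main__":
    print(distance_metric_value("zhangshan", "zhangsan"))
```

Using the longer of the two strings makes the threshold depend on the pair being compared, so short commands tolerate fewer edits than long sentences.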
7. A speech recognition device, characterized in that the device comprises: a receiving unit, a recognition unit, an acquiring unit, and a processing unit;
The receiving unit is configured to receive a user's voice signal;
The recognition unit is configured to, when the receiving unit receives the user's voice signal, determine the original character information corresponding to the user's voice signal;
The acquiring unit is configured to, when the receiving unit receives the user's voice signal, obtain from an information display module the multiple segments of text information included in the current information display interface;
The processing unit is configured to perform a pinyin-based distance calculation between the original character information and each segment of text information included in the current information display interface; and to determine the segment of text information, among those included in the current information display interface, that has the shortest distance to the original character information, and determine the final text information corresponding to the user's voice signal according to the distance between that segment of text information and the original character information.
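The division of work among the four units can be sketched as a small pipeline. This is an illustrative arrangement only: the class name, field names, and the use of injected callables are assumptions, not something the claim prescribes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechRecognitionDevice:
    # Recognition unit: voice signal -> original character information.
    recognize: Callable[[bytes], str]
    # Acquiring unit: returns the text segments currently displayed.
    get_displayed_segments: Callable[[], List[str]]
    # Processing unit, part 1: pinyin-based distance(segment, original).
    distance: Callable[[str, str], int]
    # Processing unit, part 2: decide(original, nearest, distance) -> final text.
    decide: Callable[[str, str, int], str]

    def receive(self, voice_signal: bytes) -> str:
        """Receiving unit entry point: run the whole chain for one utterance."""
        original = self.recognize(voice_signal)
        segments = self.get_displayed_segments()
        if not segments:
            return original  # nothing on screen to compare against
        distances = {seg: self.distance(seg, original) for seg in segments}
        nearest = min(distances, key=distances.get)
        return self.decide(original, nearest, distances[nearest])
```

Passing the recognizer, the display query, and the distance function in as callables keeps the sketch independent of any particular speech engine or pinyin converter.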
8. The speech recognition device according to claim 7, characterized in that,
The processing unit, when performing the pinyin-based distance calculation between the original character information and each segment of text information included in the current information display interface, is configured to:
convert the original character information into a toneless first pinyin character string and a toned second pinyin character string;
convert each segment of text information included in the current information display interface into a pinyin character string to be matched, the pinyin character string to be matched being either a toneless pinyin character string or a toned pinyin character string;
calculate the distance between the pinyin character string to be matched that corresponds to each segment of text information included in the current information display interface and each of the first pinyin character string and the second pinyin character string, and calculate the sum of the distance between the pinyin character string to be matched and the first pinyin character string and the distance between the pinyin character string to be matched and the second pinyin character string, the sum of the distances being taken as the distance between that segment of text information and the original character information.
9. The speech recognition device according to claim 7, characterized in that,
The processing unit, when performing the pinyin-based distance calculation between the original character information and each segment of text information included in the current information display interface, is configured to:
convert the original character information into a toneless first pinyin character string and a toned second pinyin character string;
convert each segment of text information included in the current information display interface into a toneless first pinyin character string to be matched and a toned second pinyin character string to be matched;
calculate the distance between the first pinyin character string to be matched that corresponds to each segment of text information included in the current information display interface and the first pinyin character string, and the distance between the corresponding second pinyin character string to be matched and the second pinyin character string, and calculate the sum of these two distances, the sum of the distances being taken as the distance between that segment of text information and the original character information.
10. The speech recognition device according to claim 8 or 9, characterized in that,
The processing unit calculates the distance between two pinyin character strings based on a minimum edit distance algorithm, which specifically comprises: calculating the minimum edit distance between the two pinyin character strings using the minimum edit distance algorithm, and taking that minimum edit distance as the distance between the two pinyin character strings.
11. The speech recognition device according to claim 7, 8 or 9, characterized in that,
The processing unit, when determining the final text information corresponding to the user's voice signal according to the distance between that segment of text information and the original character information, is configured to: determine a distance metric value according to that segment of text information and the original character information; if the distance between that segment of text information and the original character information is less than the distance metric value, take that segment of text information as the final text information corresponding to the user's voice signal; otherwise, take the original character information as the final text information corresponding to the user's voice signal.
12. The speech recognition device according to claim 11, characterized in that it further comprises a configuration unit;
The configuration unit is configured to set in advance, according to the principle that the longer the string, the larger the distance metric value, a function relating string length to the distance metric value;
The processing unit, when determining a distance metric value according to that segment of text information and the original character information, is configured to: determine the length L1 of the toneless pinyin character string converted from that segment of text information and the length L2 of the toneless pinyin character string converted from the original character information, take the larger of L1 and L2 as the maximum length value, determine the distance metric value corresponding to the maximum length value according to the function, and take that distance metric value as said distance metric value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810504702.1A CN108682423A (en) | 2018-05-24 | 2018-05-24 | A kind of audio recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810504702.1A CN108682423A (en) | 2018-05-24 | 2018-05-24 | A kind of audio recognition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108682423A (en) | 2018-10-19 |
Family
ID=63807951
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810504702.1A Pending CN108682423A (en) | 2018-05-24 | 2018-05-24 | A kind of audio recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108682423A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109948124A (en) * | 2019-03-15 | 2019-06-28 | 腾讯科技(深圳)有限公司 | Voice document cutting method, device and computer equipment |
CN110767217A (en) * | 2019-10-30 | 2020-02-07 | 爱驰汽车有限公司 | Audio segmentation method, system, electronic device and storage medium |
CN112767923A (en) * | 2021-01-05 | 2021-05-07 | 上海微盟企业发展有限公司 | Voice recognition method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050131704A1 (en) * | 1997-04-14 | 2005-06-16 | At&T Corp. | System and method for providing remote automatic speech recognition and text to speech services via a packet network |
CN102148031A (en) * | 2011-04-01 | 2011-08-10 | 无锡大核科技有限公司 | Voice recognition and interaction system and method |
CN106297799A (en) * | 2016-08-09 | 2017-01-04 | 乐视控股(北京)有限公司 | Voice recognition processing method and device |
CN107659847A (en) * | 2016-09-22 | 2018-02-02 | 腾讯科技(北京)有限公司 | Voice interface method and apparatus |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109948124A (en) * | 2019-03-15 | 2019-06-28 | 腾讯科技(深圳)有限公司 | Voice document cutting method, device and computer equipment |
CN109948124B (en) * | 2019-03-15 | 2022-12-23 | 腾讯科技(深圳)有限公司 | Voice file segmentation method and device and computer equipment |
CN110767217A (en) * | 2019-10-30 | 2020-02-07 | 爱驰汽车有限公司 | Audio segmentation method, system, electronic device and storage medium |
CN110767217B (en) * | 2019-10-30 | 2022-04-12 | 爱驰汽车有限公司 | Audio segmentation method, system, electronic device and storage medium |
CN112767923A (en) * | 2021-01-05 | 2021-05-07 | 上海微盟企业发展有限公司 | Voice recognition method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105869629B (en) | Audio recognition method and device | |
US9047868B1 (en) | Language model data collection | |
TWI427620B (en) | A speech recognition result correction device and a speech recognition result correction method, and a speech recognition result correction system | |
CN111261144B (en) | Voice recognition method, device, terminal and storage medium | |
JP5042194B2 (en) | Apparatus and method for updating speaker template | |
JP2001273283A (en) | Method for identifying language and controlling audio reproducing device and communication device | |
WO2020087655A1 (en) | Translation method, apparatus and device, and readable storage medium | |
CN104168353A (en) | Bluetooth earphone and voice interaction control method thereof | |
CN106713111B (en) | Processing method for adding friends, terminal and server | |
WO2020155490A1 (en) | Method and apparatus for managing music based on speech analysis, and computer device | |
CN201919034U (en) | Network-based voice prompt system | |
CN106710585B (en) | Polyphone broadcasting method and system during interactive voice | |
CN108682423A (en) | A kind of audio recognition method and device | |
CN103794211B (en) | A kind of audio recognition method and system | |
US20180288109A1 (en) | Conference support system, conference support method, program for conference support apparatus, and program for terminal | |
US11714973B2 (en) | Methods and systems for control of content in an alternate language or accent | |
CN109710949A (en) | A kind of interpretation method and translator | |
TW200304638A (en) | Network-accessible speaker-dependent voice models of multiple persons | |
CN112201275A (en) | Voiceprint segmentation method, voiceprint segmentation device, voiceprint segmentation equipment and readable storage medium | |
JP5112978B2 (en) | Speech recognition apparatus, speech recognition system, and program | |
CN106899486A (en) | A kind of message display method and device | |
CN114550718A (en) | Hot word speech recognition method, device, equipment and computer readable storage medium | |
CN102571882A (en) | Network-based voice reminding method and system | |
JP2013050605A (en) | Language model switching device and program for the same | |
JP2003131700A (en) | Voice information outputting device and its method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
AD01 | Patent right deemed abandoned | Effective date of abandoning: 20210827 |