CN105845138A - Voice signal processing method and apparatus - Google Patents
- Publication number
- CN105845138A (application CN201610179999.XA)
- Authority
- CN
- China
- Prior art keywords
- voice
- dropout
- threshold value
- signal
- voice segments
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42204—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
- H04N21/42206—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
- H04N21/42222—Additional components integrated in the remote control device, e.g. timer, speaker, sensors for detecting position, direction or movement of the remote control, microphone or battery charging device
Abstract
Embodiments of the invention provide a voice signal processing method and apparatus. The method includes: receiving a voice signal that contains at least one voice segment; obtaining signal-loss information of the at least one voice segment; determining the signal-loss degree of the voice signal according to the signal-loss information of the at least one voice segment; and performing speech recognition processing on the voice signal according to its signal-loss degree. Because the processing applied is matched to the signal-loss degree of the voice signal, the accuracy of voice signal recognition is improved.
Description
Technical field
The embodiments of the present invention relate to the technical field of speech recognition, and in particular to a voice signal processing method and apparatus.
Background art
With the development of smart-TV technology, voice television services have appeared, allowing users to interact with a television by voice. To support these services, voice remote controls have been built on the basis of traditional remote controls, and users carry out voice interaction with the television through them.
Specifically, the voice remote control records the user's speech to produce an analog voice signal, performs analog-to-digital conversion to obtain a digital voice signal, and transmits the digital voice signal to the television terminal. The television terminal recognizes the digital voice signal and performs the corresponding operation according to the recognition result, thereby realizing human-machine interaction.
In the prior art, wireless transmission technologies in the 2.4 GHz band, such as Wi-Fi and Bluetooth, are mainly used for communication between the voice remote control and the television terminal. Because these technologies are highly susceptible to interference from external factors, signal loss is likely to occur during transmission of the voice signal, which reduces the accuracy of speech recognition and degrades the user experience.
Summary of the invention
The embodiments of the present invention provide a voice signal processing method and apparatus for performing speech recognition with improved accuracy of voice signal recognition.
An embodiment of the present invention provides a voice signal processing method, including:
receiving a voice signal, where the voice signal includes at least one voice segment;
obtaining signal-loss information of the at least one voice segment;
determining the signal-loss degree of the voice signal according to the signal-loss information of the at least one voice segment; and
performing speech recognition processing on the voice signal according to its signal-loss degree.
An embodiment of the present invention provides a voice signal processing apparatus, including:
a receiving module, configured to receive a voice signal, where the voice signal includes at least one voice segment;
an obtaining module, configured to obtain signal-loss information of the at least one voice segment;
a determining module, configured to determine the signal-loss degree of the voice signal according to the signal-loss information of the at least one voice segment; and
a processing module, configured to perform speech recognition processing on the voice signal according to its signal-loss degree.
In the voice signal processing method and apparatus provided by the embodiments of the present invention, the signal-loss information of each voice segment contained in a voice signal is obtained, the signal-loss degree of the voice signal is determined according to that information, and speech recognition processing is performed on the voice signal based on its signal-loss degree. The embodiments fully account for the effect of signal loss on the subsequent processing of the voice signal, and can adopt a processing strategy matched to the signal-loss degree, which helps improve the accuracy of voice signal recognition.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for the embodiments are briefly introduced below. The drawings described below illustrate some embodiments of the present invention; persons of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the voice signal processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the voice signal processing apparatus provided by another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the voice signal processing apparatus provided by a further embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of the voice signal processing method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
101: Receive a voice signal, where the voice signal includes at least one voice segment.
102: Obtain the signal-loss information of the at least one voice segment.
103: Determine the signal-loss degree of the voice signal according to the signal-loss information of the at least one voice segment.
104: Perform speech recognition processing on the voice signal according to its signal-loss degree.
This embodiment provides a voice signal processing method that can be performed by a voice signal processing apparatus in order to improve the accuracy of voice signal recognition.
The method is applicable to any application scenario that requires voice signal recognition. It is especially suited to scenarios in which voice signals are transmitted over 2.4 GHz wireless technologies such as Wi-Fi or Bluetooth: because those technologies are highly susceptible to interference from external factors, signal loss is more likely to occur during transmission, so the method of this embodiment is particularly applicable. For example, in a voice television scenario, the voice signal processing apparatus may be implemented in the television terminal or in the server corresponding to the television terminal, so that the method of this embodiment performs speech recognition on the voice signals sent by the voice remote control, improving the accuracy of speech recognition.
The principle and flow of the method in this embodiment are described in detail below.
Specifically, the voice signal processing apparatus receives a voice signal. For example, it may receive the voice signal sent by a voice-capture device (such as a voice remote control or a smartphone) in a given application scenario. The capture device collects an analog voice signal, performs analog-to-digital conversion on it, and sends the converted voice signal to the voice signal processing apparatus.
Optionally, before sending the voice signal to the voice signal processing apparatus, the capture device may also encode and compress it. If the received voice signal has been encoded and compressed, the voice signal processing apparatus must decompress and decode it after reception.
Because a voice signal is a short-term stationary signal, the voice signal processing apparatus can divide it into at least one voice segment, for example by weighting the signal with a movable window of finite length. Each voice segment contains multiple signal points. This embodiment does not limit the length of a voice segment, which is determined by the number of signal points the segment contains and can be set adaptively for the application scenario, for example 256 or 1024. In addition, adjacent voice segments may be contiguous or overlapping. Preferably, overlapping segmentation is used, that is, consecutive voice segments share some overlapping points; this ensures a smooth transition between voice segments and preserves continuity.
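The overlapping segmentation described above can be sketched as follows; the function name, frame length, and hop size are illustrative choices, not values taken from the patent:

```python
def frame_signal(signal, frame_len=256, hop=128):
    """Split a signal (sequence of samples) into overlapping voice
    segments. With hop < frame_len, consecutive segments share
    frame_len - hop points, preserving continuity between segments."""
    if len(signal) < frame_len:
        return [list(signal)]
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [list(signal[i * hop:i * hop + frame_len])
            for i in range(n_frames)]

# 1000 samples, 256-point segments with 50% overlap
frames = frame_signal(range(1000), frame_len=256, hop=128)
print(len(frames), len(frames[0]))  # 6 256
```

With a hop of half the frame length, the last 128 points of each segment reappear as the first 128 points of the next, which is the overlap the embodiment prefers.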
After the voice signal is divided into at least one voice segment, the voice signal processing apparatus can obtain the signal-loss information of the at least one voice segment. The signal-loss information of a voice segment mainly includes information reflecting how signal points were lost in that segment, for example which signal points were lost and how many were lost consecutively. For ease of description, this embodiment treats a run of consecutively lost signal points as one fragment, called a signal-loss fragment, and takes the number of consecutively lost signal points it contains as the length of that signal-loss fragment.
Based on the above, in an optional embodiment, for each of the at least one voice segment, the voice signal processing apparatus may multiply the amplitudes of every two adjacent signal points in the segment and treat adjacent points whose product is greater than or equal to 0 as lost signal points of the segment; adjacent points whose product is greater than or equal to 0 are points at which the signal does not cross zero. The apparatus then counts the lengths of the signal-loss fragments formed by the consecutively lost signal points in the segment. It should be noted that a voice segment may contain one or more signal-loss fragments. For example, suppose a voice segment contains 200 signal points, of which the 20th through 40th are all lost, forming one signal-loss fragment of length 21, and the 80th through 120th are all lost, forming another signal-loss fragment of length 41.
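A minimal sketch of this zero-crossing heuristic is given below. The function name and the `min_len` filter are assumptions: the patent does not specify a minimum run length, but ordinary speech contains many same-sign neighbouring samples, so a literal reading would over-flag without one.

```python
def loss_fragments(segment, min_len=1):
    """Find signal-loss fragments in one voice segment.

    A signal point is flagged as lost when the product of its amplitude
    with the next point's amplitude is >= 0 (no zero crossing there).
    Consecutive flagged points form a signal-loss fragment, whose length
    is the number of consecutively lost points. min_len (an assumption,
    not from the patent) filters out short same-sign runs.
    """
    flagged = [a * b >= 0 for a, b in zip(segment, segment[1:])]
    fragments, run = [], 0
    for f in flagged + [False]:     # sentinel flushes the last run
        if f:
            run += 1
        else:
            if run >= min_len:
                fragments.append(run)
            run = 0
    return fragments

# A dropout of zeros inside an alternating-sign signal
seg = [1, -1, 1, -1, 0, 0, 0, 0, 1, -1, 1]
print(loss_fragments(seg, min_len=3))  # [5]
```

The returned list holds one length per signal-loss fragment, matching the worked example above where one segment yields fragments of lengths 21 and 41.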
Because the voice signal is divided into at least one voice segment, the signal-loss information of the at least one voice segment reflects the signal-loss situation of the whole voice signal. Therefore, after obtaining the signal-loss information of the voice segments, the voice signal processing apparatus can determine the signal-loss degree of the voice signal according to it. The signal-loss degree reflects how severely the voice signal has lost signal, for example zero loss (i.e. no loss), slight loss, or severe loss.
In an optional embodiment, the voice signal processing apparatus may, according to the signal-loss information of the at least one voice segment, count the number of lost voice segments among them, compare that number with a preset segment-count threshold, and determine the signal-loss degree of the voice signal according to the comparison result.
Here, a lost voice segment is a voice segment in which signal points have been lost and the number of lost points meets a specified condition. For example, when the signal-loss information of a voice segment shows that signal points have indeed been lost, and the lost points meet the specified condition, that voice segment is determined to be a lost voice segment. The specified condition may be that the total number of lost signal points exceeds a first specified number, say 100, in which case the apparatus marks voice segments whose total number of lost signal points exceeds 100 as lost voice segments; or the specified condition may be that the number of consecutively lost signal points exceeds a second specified number, say 60, in which case the apparatus marks voice segments containing more than 60 consecutively lost signal points as lost voice segments.
It should be noted that this embodiment does not limit the value of the segment-count threshold, which can be set adaptively for the application scenario.
Further optionally, if there is a single segment-count threshold, the voice signal processing apparatus compares the number of lost voice segments with it. If the number of lost voice segments exceeds the preset threshold, the voice signal is determined to have severe signal loss, meaning the loss of signal points is relatively serious. Otherwise, if the number of lost voice segments is less than or equal to the threshold but not 0, the voice signal is determined to have slight signal loss, meaning the loss of signal points is relatively minor. If the number of lost voice segments is 0, the voice signal is determined to have zero signal loss, meaning no signal loss occurred. For example, suppose the segment-count threshold is 10 and the voice signal is divided into 60 voice segments in total. If more than 10 of the 60 segments suffered signal loss, the voice signal has severe signal loss; if some segments suffered signal loss but no more than 10, the voice signal has slight signal loss; if no segment suffered signal loss, the voice signal has zero signal loss.
Further optionally, the segment-count threshold may include a first segment-count threshold and a second segment-count threshold, the second being larger than the first. Based on this, the apparatus compares the number of lost voice segments with the first and second thresholds respectively: if the number of lost voice segments is less than or equal to the first threshold, the voice signal has zero signal loss; if it is greater than the first threshold but less than or equal to the second, the voice signal has slight signal loss; if it is greater than the second threshold, the voice signal has severe signal loss.
It should be noted that the segment-count threshold may also include more than two thresholds, dividing the signal-loss situation of the voice signal into more degrees, for example first, second, third, fourth, and fifth degree.
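The two-threshold classification by lost-segment count can be sketched as follows; the threshold values and labels are illustrative, not taken from the patent:

```python
def loss_degree_by_count(n_lost_segments, t1=2, t2=10):
    """Classify the signal-loss degree of a voice signal from the number
    of lost voice segments, using two segment-count thresholds (t2 > t1).
    The threshold values here are illustrative assumptions."""
    if n_lost_segments <= t1:
        return "zero"
    if n_lost_segments <= t2:
        return "slight"
    return "severe"

print(loss_degree_by_count(0))   # zero
print(loss_degree_by_count(5))   # slight
print(loss_degree_by_count(15))  # severe
```

With more thresholds the same cascade extends to finer degrees (first, second, third degree, and so on), as the embodiment notes.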
In another optional embodiment, the voice signal processing apparatus may, according to the signal-loss information of the at least one voice segment, count the lengths of the signal-loss fragments formed by consecutively lost signal points in each voice segment, and determine the signal-loss degree of the voice signal according to the comparison of those lengths with a preset point-count threshold.
Here, a signal-loss fragment is the fragment formed by consecutively lost signal points in a voice segment, and its length is the number of consecutively lost signal points it contains.
If many signal points are lost consecutively in a particular voice segment, the voice signal may fail to be recognized correctly even if the other segments suffer little or no signal loss. The voice signal processing apparatus therefore counts the lengths of the signal-loss fragments in each segment according to the signal-loss information of the at least one voice segment, compares them with the preset point-count threshold, and determines the signal-loss degree of the voice signal according to the comparison result.
Further optionally, if there is a single point-count threshold, the voice signal processing apparatus compares the lengths of the signal-loss fragments in each voice segment with it. If some segment contains a signal-loss fragment longer than the threshold, the voice signal is determined to have severe signal loss, meaning the loss of signal points is relatively serious; otherwise, if no segment contains a signal-loss fragment longer than the threshold, the voice signal is determined to have slight signal loss.
Further optionally, the point-count threshold may include a first point-count threshold and a second point-count threshold, the second being larger than the first. Based on this, the apparatus compares the fragment lengths in each segment with the first and second thresholds respectively: if no voice segment contains a signal-loss fragment longer than the first threshold, the voice signal has zero signal loss; if some segment contains a fragment longer than the first threshold but none contains a fragment longer than the second, the voice signal has slight signal loss; if some segment contains a fragment longer than the second threshold, the voice signal has severe signal loss.
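The two-threshold classification by fragment length reduces to comparing the longest fragment across all segments; a sketch with illustrative threshold values follows:

```python
def loss_degree_by_fragment(fragment_lengths, p1=30, p2=80):
    """Classify the signal-loss degree from the lengths of all
    signal-loss fragments across the voice segments, using two
    point-count thresholds (p2 > p1; values are illustrative)."""
    longest = max(fragment_lengths, default=0)
    if longest <= p1:
        return "zero"
    if longest <= p2:
        return "slight"
    return "severe"

print(loss_degree_by_fragment([]))         # zero
print(loss_degree_by_fragment([21, 41]))   # slight
print(loss_degree_by_fragment([21, 120]))  # severe
```

Only the longest fragment matters because the embodiment's conditions are existence tests ("some segment contains a fragment longer than ...").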
In yet another optional embodiment, the voice signal processing apparatus may determine the signal-loss degree of the voice signal using both comparisons at once: the number of lost voice segments against the preset segment-count threshold, and the lengths of the signal-loss fragments in each segment against the preset point-count threshold.
For example, suppose the segment-count threshold includes a first and a second segment-count threshold, and the point-count threshold includes a first and a second point-count threshold. If the number of lost voice segments exceeds the second segment-count threshold and some voice segment contains a signal-loss fragment longer than the second point-count threshold, the voice signal has severe signal loss. If the number of lost voice segments is less than or equal to the first segment-count threshold and no segment contains a fragment longer than the first point-count threshold, the voice signal has zero signal loss. In all remaining cases, the voice signal has slight signal loss.
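A sketch of this combined criterion, with the same illustrative thresholds as before (none of the values are from the patent):

```python
def loss_degree_combined(n_lost_segments, fragment_lengths,
                         t1=2, t2=10, p1=30, p2=80):
    """Combine both criteria: severe requires the lost-segment count
    above t2 AND some fragment longer than p2; zero requires the count
    at or below t1 AND no fragment longer than p1; otherwise slight.
    Threshold values are illustrative assumptions."""
    longest = max(fragment_lengths, default=0)
    if n_lost_segments > t2 and longest > p2:
        return "severe"
    if n_lost_segments <= t1 and longest <= p1:
        return "zero"
    return "slight"

print(loss_degree_combined(15, [120]))  # severe
print(loss_degree_combined(0, []))      # zero
print(loss_degree_combined(5, [40]))    # slight
```

Requiring both conditions for "severe" makes the combined test stricter than either criterion alone, which matches the embodiment's "remaining cases are slight" rule.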
It should be noted that, based on the various embodiments for determining the signal-loss degree of the voice signal provided here, those skilled in the art can conceive similar extension schemes, all of which fall within the protection scope of the present invention; the extension schemes are not described one by one here.
After determining the signal-loss degree of the voice signal, the voice signal processing apparatus can perform speech recognition processing on the voice signal according to that degree.
If the voice signal is determined to have zero signal loss, i.e. no signal loss occurred, the apparatus can perform speech recognition processing directly, which ensures both the efficiency and the accuracy of speech recognition and improves the user experience.
If the voice signal is determined to have slight signal loss, signal loss has occurred but is not severe, and the signal is still within the range that can be recognized correctly. The apparatus can then, for each lost voice segment among the at least one voice segment, compensate the lost signal points using the signal points of that segment that were not lost, and perform speech recognition processing on the compensated segments, ensuring recognition accuracy and improving the user experience.
It should be noted that this embodiment does not limit the specific way in which the non-lost signal points of a lost voice segment are used to compensate its lost signal points. Preferably, the lost signal points of the whole segment can be split into two parts, and for each part the lost points are compensated using the non-lost points of that same part. Because the points within a part are relatively close to one another, compensating from nearby points keeps the compensated signal close to a signal in which no loss occurred, which helps improve the accuracy of speech recognition performed on the compensated signal.
Further, in the compensation process, it is preferable to compensate each lost signal point using the non-lost signal point nearest to it.
Usually, the number of non-lost signal points in each part exceeds the number of lost signal points, so the lost points of each part can be fully compensated. In some special cases, however, the number of non-lost points in a part may not exceed the number of lost points; then only some of the lost points in that part can be compensated from it, and the remaining uncompensated points can be compensated one by one using the nearest signal points of the other part.
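A minimal sketch of nearest-point compensation follows. It implements only the "nearest non-lost point" preference; the two-part split and the cross-part fallback are omitted for brevity, and the function name is an assumption:

```python
def compensate_nearest(segment, lost_mask):
    """Fill each lost signal point with the value of the nearest
    non-lost point in the same segment (a sketch of one compensation
    strategy consistent with the embodiment's preference)."""
    kept = [i for i, lost in enumerate(lost_mask) if not lost]
    if not kept:
        return list(segment)        # nothing to compensate from
    out = list(segment)
    for i, lost in enumerate(lost_mask):
        if lost:
            nearest = min(kept, key=lambda j: abs(j - i))
            out[i] = segment[nearest]
    return out

seg = [3, 5, 0, 0, 0, 8, 9]
mask = [False, False, True, True, True, False, False]
print(compensate_nearest(seg, mask))  # [3, 5, 5, 5, 8, 8, 9]
```

Interpolation between the nearest points on each side would be another reasonable choice; the patent leaves the concrete method open.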
If the voice signal is determined to have severe signal loss, the signal loss is serious enough to exceed the range of correct recognition. The voice signal processing apparatus then outputs prompt information to the user, indicating that the voice signal has severe signal loss and cannot be recognized normally. The apparatus may output the prompt information as text or as voice, for example a text prompt on the interactive interface or a voice prompt. The user can then take corresponding measures in time according to the prompt, such as re-entering the voice signal, so as to obtain the required voice service promptly, improving the user experience.
Fig. 2 is a schematic structural diagram of the voice signal processing apparatus provided by another embodiment of the present invention. As shown in Fig. 2, the apparatus includes: a receiving module 21, an obtaining module 22, a determining module 23, and a processing module 24.
The receiving module 21 is configured to receive a voice signal, where the voice signal includes at least one voice segment.
The obtaining module 22 is configured to obtain the signal-loss information of the at least one voice segment.
The determining module 23 is configured to determine the signal-loss degree of the voice signal according to the signal-loss information of the at least one voice segment.
The processing module 24 is configured to perform speech recognition processing on the voice signal according to its signal-loss degree.
In an optional embodiment, the obtaining module 22 is specifically configured to: for each of the at least one voice segment, multiply the amplitudes of every two adjacent signal points in the segment, treat adjacent points whose product is greater than or equal to 0 as lost signal points of the segment, and count the lengths of the signal-loss fragments formed in the segment by the consecutively lost signal points.
In an optional embodiment, as shown in Fig. 3, one implementation of the determining module 23 includes at least one of a first determining unit 231 and a second determining unit 232.
The first determining unit 231 is configured to count, according to the signal-loss information of the at least one voice segment, the number of lost voice segments among them, and to determine the signal-loss degree of the voice signal according to the comparison of that number with a preset segment-count threshold.
The second determining unit 232 is configured to count, according to the signal-loss information of the at least one voice segment, the lengths of the signal-loss fragments formed by consecutively lost signal points in each voice segment, and to determine the signal-loss degree of the voice signal according to the comparison of those lengths with a preset point-count threshold.
Further optionally, the segment-count threshold includes a first segment-count threshold and a second segment-count threshold larger than the first; correspondingly, the point-count threshold includes a first point-count threshold and a second point-count threshold larger than the first.
Based on the above, the first determining unit 231 is specifically configured to:
determine that the voice signal has zero signal loss if the number of lost voice segments is less than or equal to the first segment-count threshold;
determine that the voice signal has slight signal loss if the number of lost voice segments is greater than the first segment-count threshold but less than or equal to the second segment-count threshold; and
determine that the voice signal has severe signal loss if the number of lost voice segments is greater than the second segment-count threshold.
Correspondingly, the second determining unit 232 is specifically configured to:
determine that the voice signal has zero signal loss if no voice segment contains a signal-loss fragment longer than the first point-count threshold;
determine that the voice signal has slight signal loss if some voice segment contains a signal-loss fragment longer than the first point-count threshold but no segment contains a fragment longer than the second point-count threshold; and
determine that the voice signal has severe signal loss if some voice segment contains a signal-loss fragment longer than the second point-count threshold.
In an optional embodiment, the processing module 24 is specifically configured to:
if the voice signal has zero-degree signal loss, directly perform speech recognition processing on the voice signal;
if the voice signal has slight signal loss, for each lost voice segment of the at least one voice segment, compensate the lost signal points in the lost voice segment by using the signal points that are not lost in the lost voice segment, and perform speech recognition processing on the compensated voice segments;
if the voice signal has severe signal loss, output prompt information to the user to indicate that the voice signal has severe signal loss and cannot be recognized normally.
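The embodiment does not fix a concrete compensation formula — it only states that lost signal points are filled in from the points that were not lost in the same segment. A minimal sketch using linear interpolation between surviving neighbours (one plausible choice, using NumPy; not specified by the patent) could look like:

```python
import numpy as np

def compensate_segment(segment, lost):
    """Fill lost signal points from the surviving points in the same segment.

    `segment` holds sample amplitudes; `lost` is a parallel boolean mask.
    Linear interpolation is an illustrative assumption, not the patent's method.
    """
    seg = np.asarray(segment, dtype=float)
    lost = np.asarray(lost, dtype=bool)
    if lost.all():
        # Nothing survived to interpolate from; return the segment unchanged.
        return seg
    idx = np.arange(len(seg))
    seg[lost] = np.interp(idx[lost], idx[~lost], seg[~lost])
    return seg
```

At segment edges, `np.interp` holds the nearest surviving value rather than extrapolating, which is a reasonable default for short dropouts.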
The voice signal processing apparatus provided in this embodiment fully considers the impact of signal loss on subsequent processing of the voice signal and adopts a processing strategy matched to the signal loss degree, which helps improve the accuracy of voice signal recognition.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solutions, or the part thereof contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A voice signal processing method, characterized by comprising:
receiving a voice signal, the voice signal comprising at least one voice segment;
obtaining signal loss information of the at least one voice segment;
determining a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and
performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
2. The method according to claim 1, characterized in that the obtaining the signal loss information of the at least one voice segment comprises:
for each voice segment of the at least one voice segment, multiplying the amplitudes of each two adjacent signal points in the voice segment, taking adjacent signal points whose product is greater than or equal to 0 as lost signal points of the voice segment, and counting the length of each signal-loss fragment formed in the voice segment by consecutively lost signal points.
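The detection rule of claim 2 — mark each two adjacent signal points whose amplitude product is greater than or equal to 0 as lost, then measure runs of consecutively lost points — can be sketched as follows (Python/NumPy; the code itself is illustrative, not part of the claim):

```python
import numpy as np

def find_lost_points(segment):
    """Mark signal points as lost when the product of each two adjacent
    amplitudes is >= 0, per claim 2 (e.g. a zero-filled run never
    crosses zero, so its adjacent products stay non-negative)."""
    seg = np.asarray(segment, dtype=float)
    lost = np.zeros(len(seg), dtype=bool)
    nonneg = seg[:-1] * seg[1:] >= 0
    # Both members of a non-negative adjacent pair are marked lost.
    lost[:-1] |= nonneg
    lost[1:] |= nonneg
    return lost

def dropout_fragment_lengths(lost):
    """Lengths of maximal runs of consecutively lost points."""
    lengths, run = [], 0
    for flag in lost:
        if flag:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths
```

Note that by this rule, genuine speech that happens not to cross zero between two samples is also flagged; the thresholds of claims 3 and 4 are what keep such isolated flags from being treated as meaningful loss.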
3. The method according to claim 1, characterized in that the determining the signal loss degree of the voice signal according to the signal loss information of the at least one voice segment comprises:
according to the signal loss information of the at least one voice segment, counting the number of lost voice segments in the at least one voice segment, and determining the signal loss degree of the voice signal according to a comparison result between the number of lost voice segments and a preset segment-count threshold; and/or
according to the signal loss information of the at least one voice segment, counting, in each voice segment, the length of each signal-loss fragment formed by consecutively lost signal points, and determining the signal loss degree of the voice signal according to a comparison result between the length of the signal-loss fragment in each voice segment and a preset point-count threshold.
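The two statistics of claim 3 — the number of lost voice segments, and the longest run of consecutively lost points per segment — might be gathered as follows (illustrative Python; the per-segment boolean lost/not-lost flags are assumed to come from a detection step such as the one in claim 2):

```python
def loss_statistics(segments_lost_flags):
    """Per-segment loss statistics for claim 3.

    `segments_lost_flags` is a list of boolean sequences, one per voice
    segment, where True marks a lost signal point. Returns the number of
    voice segments containing any lost point, plus the longest run of
    consecutively lost points in each segment.
    """
    num_lost_segments = 0
    max_fragment_lengths = []
    for flags in segments_lost_flags:
        run = best = 0
        for flag in flags:
            run = run + 1 if flag else 0
            best = max(best, run)
        max_fragment_lengths.append(best)
        if best > 0:
            num_lost_segments += 1
    return num_lost_segments, max_fragment_lengths
```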
4. The method according to claim 3, characterized in that the segment-count threshold comprises: a first segment-count threshold and a second segment-count threshold greater than the first segment-count threshold; the point-count threshold comprises: a first point-count threshold and a second point-count threshold greater than the first point-count threshold;
the determining the signal loss degree of the voice signal according to the comparison result between the number of lost voice segments and the preset segment-count threshold comprises:
if the number of lost voice segments is less than or equal to the first segment-count threshold, determining that the voice signal has zero-degree signal loss;
if the number of lost voice segments is greater than the first segment-count threshold but less than or equal to the second segment-count threshold, determining that the voice signal has slight signal loss;
if the number of lost voice segments is greater than the second segment-count threshold, determining that the voice signal has severe signal loss;
the determining the signal loss degree of the voice signal according to the comparison result between the length of the signal-loss fragment in each voice segment and the preset point-count threshold comprises:
if no voice segment of the at least one voice segment contains a signal-loss fragment whose length is greater than the first point-count threshold, determining that the voice signal has zero-degree signal loss;
if a voice segment of the at least one voice segment contains a signal-loss fragment whose length is greater than the first point-count threshold, but no voice segment contains a signal-loss fragment whose length is greater than the second point-count threshold, determining that the voice signal has slight signal loss;
if a voice segment of the at least one voice segment contains a signal-loss fragment whose length is greater than the second point-count threshold, determining that the voice signal has severe signal loss.
5. The method according to any one of claims 1-4, characterized in that the performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal comprises:
if the voice signal has zero-degree signal loss, directly performing speech recognition processing on the voice signal;
if the voice signal has slight signal loss, for each lost voice segment of the at least one voice segment, compensating the lost signal points in the lost voice segment by using the signal points that are not lost in the lost voice segment, and performing speech recognition processing on the compensated voice segments;
if the voice signal has severe signal loss, outputting prompt information to a user to indicate that the voice signal has severe signal loss and cannot be recognized normally.
6. A voice signal processing apparatus, characterized by comprising:
a receiving module, configured to receive a voice signal, the voice signal comprising at least one voice segment;
an obtaining module, configured to obtain signal loss information of the at least one voice segment;
a determining module, configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and
a processing module, configured to perform speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
7. The apparatus according to claim 6, characterized in that the obtaining module is specifically configured to:
for each voice segment of the at least one voice segment, multiply the amplitudes of each two adjacent signal points in the voice segment, take adjacent signal points whose product is greater than or equal to 0 as lost signal points of the voice segment, and count the length of each signal-loss fragment formed in the voice segment by consecutively lost signal points.
8. The apparatus according to claim 6, characterized in that the determining module comprises:
a first determining unit, configured to count, according to the signal loss information of the at least one voice segment, the number of lost voice segments in the at least one voice segment, and determine the signal loss degree of the voice signal according to a comparison result between the number of lost voice segments and a preset segment-count threshold; and/or
a second determining unit, configured to count, according to the signal loss information of the at least one voice segment, the length of each signal-loss fragment formed by consecutively lost signal points in each voice segment, and determine the signal loss degree of the voice signal according to a comparison result between the length of the signal-loss fragment in each voice segment and a preset point-count threshold.
9. The apparatus according to claim 8, characterized in that the segment-count threshold comprises: a first segment-count threshold and a second segment-count threshold greater than the first segment-count threshold; the point-count threshold comprises: a first point-count threshold and a second point-count threshold greater than the first point-count threshold;
the first determining unit is specifically configured to:
if the number of lost voice segments is less than or equal to the first segment-count threshold, determine that the voice signal has zero-degree signal loss;
if the number of lost voice segments is greater than the first segment-count threshold but less than or equal to the second segment-count threshold, determine that the voice signal has slight signal loss;
if the number of lost voice segments is greater than the second segment-count threshold, determine that the voice signal has severe signal loss;
the second determining unit is specifically configured to:
if no voice segment of the at least one voice segment contains a signal-loss fragment whose length is greater than the first point-count threshold, determine that the voice signal has zero-degree signal loss;
if a voice segment of the at least one voice segment contains a signal-loss fragment whose length is greater than the first point-count threshold, but no voice segment contains a signal-loss fragment whose length is greater than the second point-count threshold, determine that the voice signal has slight signal loss;
if a voice segment of the at least one voice segment contains a signal-loss fragment whose length is greater than the second point-count threshold, determine that the voice signal has severe signal loss.
10. The apparatus according to any one of claims 6-9, characterized in that the processing module is specifically configured to:
if the voice signal has zero-degree signal loss, directly perform speech recognition processing on the voice signal;
if the voice signal has slight signal loss, for each lost voice segment of the at least one voice segment, compensate the lost signal points in the lost voice segment by using the signal points that are not lost in the lost voice segment, and perform speech recognition processing on the compensated voice segments;
if the voice signal has severe signal loss, output prompt information to a user to indicate that the voice signal has severe signal loss and cannot be recognized normally.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610179999.XA CN105845138A (en) | 2016-03-25 | 2016-03-25 | Voice signal processing method and apparatus |
PCT/CN2016/096988 WO2017161829A1 (en) | 2016-03-25 | 2016-08-26 | Voice signal information processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610179999.XA CN105845138A (en) | 2016-03-25 | 2016-03-25 | Voice signal processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105845138A true CN105845138A (en) | 2016-08-10 |
Family
ID=56583905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610179999.XA Pending CN105845138A (en) | 2016-03-25 | 2016-03-25 | Voice signal processing method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105845138A (en) |
WO (1) | WO2017161829A1 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106856093A (en) * | 2017-02-23 | 2017-06-16 | 海信集团有限公司 | Audio-frequency information processing method, intelligent terminal and Voice command terminal |
CN107170451A (en) * | 2017-06-27 | 2017-09-15 | 乐视致新电子科技(天津)有限公司 | Audio signal processing method and device |
WO2017161829A1 (en) * | 2016-03-25 | 2017-09-28 | 乐视控股(北京)有限公司 | Voice signal information processing method and device |
CN107316638A (en) * | 2017-06-28 | 2017-11-03 | 北京粉笔未来科技有限公司 | A kind of poem recites evaluating method and system, a kind of terminal and storage medium |
CN107990908A (en) * | 2017-11-20 | 2018-05-04 | 广东欧珀移动通信有限公司 | A kind of phonetic navigation method and device based on Bluetooth communication |
CN108831438A (en) * | 2018-07-24 | 2018-11-16 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN108965562A (en) * | 2018-07-24 | 2018-12-07 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN109003619A (en) * | 2018-07-24 | 2018-12-14 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN109065017A (en) * | 2018-07-24 | 2018-12-21 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN109121042A (en) * | 2018-07-26 | 2019-01-01 | Oppo广东移动通信有限公司 | Voice data processing method and Related product |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113270096A (en) * | 2021-05-13 | 2021-08-17 | 前海七剑科技(深圳)有限公司 | Voice response method and device, electronic equipment and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002297180A (en) * | 2001-03-29 | 2002-10-11 | Sanyo Electric Co Ltd | Voice recognizing device |
CN1604572A (en) * | 2004-11-09 | 2005-04-06 | 北京中星微电子有限公司 | A semantic integrity ensuring method under IP network environment |
CN1731718A (en) * | 2004-08-06 | 2006-02-08 | 北京中星微电子有限公司 | Noise reduction method and device concerning IP network voice data packet lost |
CN103632679A (en) * | 2012-08-21 | 2014-03-12 | 华为技术有限公司 | An audio stream quality assessment method and an apparatus |
CN107170451A (en) * | 2017-06-27 | 2017-09-15 | 乐视致新电子科技(天津)有限公司 | Audio signal processing method and device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2517393C2 (en) * | 2008-06-11 | 2014-05-27 | Ниппон Телеграф Энд Телефон Корпорейшн | Method of estimating quality of audio signal, apparatus and computer-readable medium storing programme |
CN102568470B (en) * | 2012-01-11 | 2013-12-25 | 广州酷狗计算机科技有限公司 | Acoustic fidelity identification method and system for audio files |
CN105845138A (en) * | 2016-03-25 | 2016-08-10 | 乐视控股(北京)有限公司 | Voice signal processing method and apparatus |
- 2016-03-25 CN CN201610179999.XA patent/CN105845138A/en active Pending
- 2016-08-26 WO PCT/CN2016/096988 patent/WO2017161829A1/en active Application Filing
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017161829A1 (en) * | 2016-03-25 | 2017-09-28 | 乐视控股(北京)有限公司 | Voice signal information processing method and device |
CN106856093A (en) * | 2017-02-23 | 2017-06-16 | 海信集团有限公司 | Audio-frequency information processing method, intelligent terminal and Voice command terminal |
CN107170451A (en) * | 2017-06-27 | 2017-09-15 | 乐视致新电子科技(天津)有限公司 | Audio signal processing method and device |
CN107316638A (en) * | 2017-06-28 | 2017-11-03 | 北京粉笔未来科技有限公司 | A kind of poem recites evaluating method and system, a kind of terminal and storage medium |
CN107990908A (en) * | 2017-11-20 | 2018-05-04 | 广东欧珀移动通信有限公司 | A kind of phonetic navigation method and device based on Bluetooth communication |
CN108965562A (en) * | 2018-07-24 | 2018-12-07 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN108831438A (en) * | 2018-07-24 | 2018-11-16 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN109003619A (en) * | 2018-07-24 | 2018-12-14 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN109065017A (en) * | 2018-07-24 | 2018-12-21 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN108831438B (en) * | 2018-07-24 | 2021-01-08 | Oppo(重庆)智能科技有限公司 | Voice data generation method and device, electronic device and computer readable storage medium |
CN108965562B (en) * | 2018-07-24 | 2021-04-13 | Oppo(重庆)智能科技有限公司 | Voice data generation method and related device |
CN109065017B (en) * | 2018-07-24 | 2021-04-16 | Oppo(重庆)智能科技有限公司 | Voice data generation method and related device |
CN109121042A (en) * | 2018-07-26 | 2019-01-01 | Oppo广东移动通信有限公司 | Voice data processing method and Related product |
CN109121042B (en) * | 2018-07-26 | 2020-12-08 | Oppo广东移动通信有限公司 | Voice data processing method and related product |
Also Published As
Publication number | Publication date |
---|---|
WO2017161829A1 (en) | 2017-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105845138A (en) | Voice signal processing method and apparatus | |
JP6820360B2 (en) | Signal classification methods and signal classification devices, as well as coding / decoding methods and coding / decoding devices. | |
KR102037195B1 (en) | Voice detection methods, devices and storage media | |
CN102056026B (en) | Audio/video synchronization detection method and system, and voice detection method and system | |
JP2019531494A (en) | Voice quality evaluation method and apparatus | |
CN103035238B (en) | Encoding method and decoding method of voice frequency data | |
CN107808670A (en) | Voice data processing method, device, equipment and storage medium | |
CN109036412A (en) | voice awakening method and system | |
CN107533850B (en) | Audio content identification method and device | |
CN102714034B (en) | Signal processing method, device and system | |
US9424743B2 (en) | Real-time traffic detection | |
CN110265065B (en) | Method for constructing voice endpoint detection model and voice endpoint detection system | |
CN106356077B (en) | A kind of laugh detection method and device | |
CN107644643A (en) | A kind of voice interactive system and method | |
KR20140031790A (en) | Robust voice activity detection in adverse environments | |
CN105847252B (en) | A kind of method and device of more account switchings | |
US20160323687A1 (en) | Stereo decoding method and apparatus | |
CN107705791A (en) | Caller identity confirmation method, device and Voiceprint Recognition System based on Application on Voiceprint Recognition | |
CN109410956A (en) | A kind of object identifying method of audio data, device, equipment and storage medium | |
CN104978955A (en) | Voice control method and system | |
CN103050116A (en) | Voice command identification method and system | |
CN106782529A (en) | The wake-up selected ci poem selection method and device of speech recognition | |
CN102376306B (en) | Method and device for acquiring level of speech frame | |
CN109524013A (en) | A kind of method of speech processing, device, medium and smart machine | |
CN107957861B (en) | Method and device for instantly playing audio data in sound card signal input channel |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20160810 |