CN105845138A - Voice signal processing method and apparatus - Google Patents

Voice signal processing method and apparatus

Info

Publication number
CN105845138A
Authority
CN
China
Prior art keywords
voice
dropout
threshold value
signal
voice segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610179999.XA
Other languages
Chinese (zh)
Inventor
王永庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leshi Zhixin Electronic Technology Tianjin Co Ltd
LeTV Holding Beijing Co Ltd
Original Assignee
Leshi Zhixin Electronic Technology Tianjin Co Ltd
LeTV Holding Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leshi Zhixin Electronic Technology Tianjin Co Ltd, LeTV Holding Beijing Co Ltd filed Critical Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority to CN201610179999.XA priority Critical patent/CN105845138A/en
Publication of CN105845138A publication Critical patent/CN105845138A/en
Priority to PCT/CN2016/096988 priority patent/WO2017161829A1/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41: Structure of client; Structure of client peripherals
    • H04N21/422: Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41: Structure of client; Structure of client peripherals
    • H04N21/422: Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204: User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • H04N21/42206: User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
    • H04N21/42222: Additional components integrated in the remote control device, e.g. timer, speaker, sensors for detecting position, direction or movement of the remote control, microphone or battery charging device

Abstract

An embodiment of the invention provides a voice signal processing method and apparatus. The voice signal processing method includes the steps of: receiving a voice signal that includes at least one voice segment; obtaining signal loss information of the at least one voice segment; determining the signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal. Because the processing applied to the voice signal is chosen according to its signal loss degree, the accuracy of voice signal recognition is improved.

Description

Voice signal processing method and apparatus
Technical field
Embodiments of the present invention relate to the technical field of speech recognition, and in particular to a voice signal processing method and apparatus.
Background technology
With the development of smart television technology, voice television services have emerged, which allow a user to interact with a television by voice. To support voice television services, voice remote controllers have appeared alongside traditional remote controllers, and the user interacts with the television by voice through the voice remote controller.
Specifically, the voice remote controller records the user's speech to generate an analog voice signal, performs analog-to-digital conversion on the analog voice signal to obtain a digital voice signal, and then transmits the digital voice signal to the television terminal. The television terminal recognizes the digital voice signal and performs a corresponding operation according to the recognition result, thereby realizing human-machine interaction.
In the prior art, the voice remote controller and the television terminal mainly communicate through wireless transmission technologies in the 2.4 GHz band, such as Wi-Fi and Bluetooth. Because such wireless transmission technologies are highly susceptible to external interference, signal loss is likely to occur during transmission of the voice signal, which reduces the accuracy of speech recognition and degrades the user experience.
Summary of the invention
Embodiments of the present invention provide a voice signal processing method and apparatus for performing speech recognition while improving the accuracy of voice signal recognition.
An embodiment of the present invention provides a voice signal processing method, including:
receiving a voice signal, the voice signal including at least one voice segment;
obtaining signal loss information of the at least one voice segment;
determining a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and
performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
An embodiment of the present invention provides a voice signal processing apparatus, including:
a receiving module configured to receive a voice signal, the voice signal including at least one voice segment;
an obtaining module configured to obtain signal loss information of the at least one voice segment;
a determining module configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and
a processing module configured to perform speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
In the voice signal processing method and apparatus provided by the embodiments of the present invention, the signal loss information of each voice segment included in a voice signal is obtained, the signal loss degree of the voice signal is determined according to the signal loss information of the voice segments, and speech recognition processing is performed on the voice signal based on its signal loss degree. The embodiments of the present invention fully consider the influence of signal loss on subsequent processing of the voice signal and adopt a processing approach that matches the signal loss degree of the voice signal, which helps improve the accuracy of voice signal recognition.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a voice signal processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a voice signal processing apparatus provided by another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a voice signal processing apparatus provided by a further embodiment of the present invention.
Detailed description of the invention
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a voice signal processing method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
101: receiving a voice signal, the voice signal including at least one voice segment.
102: obtaining signal loss information of the at least one voice segment.
103: determining a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment.
104: performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
This embodiment provides a voice signal processing method which may be executed by a voice signal processing apparatus in order to improve the accuracy of voice signal recognition.
The method provided by this embodiment is applicable to various application scenarios that require voice signal recognition. It is particularly suitable for scenarios in which the voice signal is transmitted using wireless transmission technologies in the 2.4 GHz band, such as Wi-Fi and Bluetooth: because such technologies are highly susceptible to external interference, signal loss is more likely to occur during transmission of the voice signal. For example, in a voice television scenario, the voice signal processing apparatus may be implemented in the television terminal or in a server corresponding to the television terminal, so that the method provided by this embodiment is used to perform speech recognition processing on the voice signal sent by the voice remote controller, thereby improving the accuracy of speech recognition.
The principle and flow of the method of this embodiment are described in detail below.
Specifically, the voice signal processing apparatus receives a voice signal. For example, the voice signal processing apparatus may receive a voice signal sent by a voice acquisition device (such as a voice remote controller or a smartphone) in each application scenario. The voice acquisition device collects an analog voice signal, performs analog-to-digital conversion on it, and then sends the converted voice signal to the voice signal processing apparatus.
Optionally, before sending the voice signal to the voice signal processing apparatus, the voice acquisition device may also encode and/or compress the voice signal. If the voice signal received by the voice signal processing apparatus has been encoded or compressed, the voice signal processing apparatus also decompresses and/or decodes the voice signal after receiving it.
Since a voice signal is a short-time stationary signal, the voice signal processing apparatus may segment the voice signal to obtain at least one voice segment. The segmentation may be implemented by weighting the signal with a movable window of finite length. Each voice segment includes a plurality of signal points. This embodiment does not limit the length of a voice segment; the length of a voice segment is determined by the number of signal points it contains and can be set adaptively according to the application scenario, for example to 256 or 1024 points.
In addition, adjacent voice segments may be either contiguous or overlapping. Preferably, overlapping segmentation is used, that is, a preceding voice segment and the following voice segment share some signal points, which ensures a smooth transition between voice segments and preserves their continuity.
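For illustration only, the following Python sketch shows one way such overlapping segmentation could be carried out. The function name, the 256-point segment length and the 50% overlap are assumptions chosen for the example rather than values prescribed by this embodiment, and the window weighting mentioned above is omitted.

    def split_into_segments(samples, segment_len=256, overlap=128):
        """Split a sequence of signal points into overlapping voice segments.

        segment_len and overlap are illustrative; the embodiment only requires
        that adjacent segments be contiguous or overlapping. Any incomplete
        tail shorter than segment_len is dropped here for simplicity.
        """
        step = segment_len - overlap
        segments = []
        for start in range(0, max(len(samples) - segment_len + 1, 1), step):
            segments.append(list(samples[start:start + segment_len]))
        return segments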
After the voice signal is divided into at least one voice segment, the voice signal processing apparatus may obtain the signal loss information of the at least one voice segment. The signal loss information of a voice segment mainly includes information that reflects how signal points are lost in the voice segment, for example which signal points are lost and how many signal points are lost consecutively. For ease of description, this embodiment treats a run of consecutively lost signal points as one fragment, referred to as a signal loss fragment, and takes the number of consecutively lost signal points contained in a signal loss fragment as the length of that signal loss fragment.
Based on the above, in an optional embodiment, for each voice segment of the at least one voice segment, the voice signal processing apparatus may multiply the amplitudes of every two adjacent signal points in the voice segment, take the adjacent signal points whose product is greater than or equal to 0 as the lost signal points of the voice segment (adjacent signal points whose product is greater than or equal to 0 are signal points that do not cross zero), and count the lengths of the signal loss fragments formed in the voice segment by the consecutively lost signal points. It should be noted that a voice segment may include one or more signal loss fragments. For example, suppose a voice segment includes 200 signal points, of which the 20th to the 40th are all lost, forming one signal loss fragment of length 21, and the 80th to the 120th are also all lost, forming another signal loss fragment of length 41.
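A minimal Python reading of this detection step might look as follows: adjacent points whose product is greater than or equal to 0 (i.e. no zero crossing) are flagged as lost, and runs of consecutively flagged points are collected as signal loss fragments. Whether both points of a qualifying pair are flagged is not spelled out above, so that detail is an assumption of this sketch.

    def find_loss_fragments(segment):
        """Return the lengths of the signal loss fragments in one voice segment.

        A point is flagged as lost when its product with a neighbouring point
        is >= 0, i.e. the pair does not cross zero. Both members of such a
        pair are flagged here, which is an assumption.
        """
        lost = [False] * len(segment)
        for i in range(len(segment) - 1):
            if segment[i] * segment[i + 1] >= 0:
                lost[i] = True
                lost[i + 1] = True

        fragments, run = [], 0
        for flag in lost:
            if flag:
                run += 1
            elif run:
                fragments.append(run)
                run = 0
        if run:
            fragments.append(run)
        return fragments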
Since the voice signal is divided into at least one voice segment, the signal loss information of the at least one voice segment reflects the signal loss of the voice signal. Therefore, after obtaining the signal loss information of the voice segments, the voice signal processing apparatus may determine the signal loss degree of the voice signal according to the signal loss information of the at least one voice segment. The signal loss degree of the voice signal reflects how much signal the voice signal has lost, and may be, for example, zero-degree loss (i.e. no loss), slight loss or severe loss.
In an optional embodiment, the voice signal processing apparatus may count, according to the signal loss information of the at least one voice segment, the number of lost voice segments among the at least one voice segment, compare the number of lost voice segments with a preset segment-count threshold, and determine the signal loss degree of the voice signal according to the comparison result.
Here, a lost voice segment refers to a voice segment in which signal points are lost and the lost signal points satisfy a specified condition. That is, when the signal loss information of a voice segment indicates that signal points have indeed been lost, and the lost signal points satisfy the specified condition, the voice segment is determined to be a lost voice segment. The specified condition may be that the total number of lost signal points is greater than a first specified number, for example 100, in which case the voice signal processing apparatus identifies voice segments whose total number of lost signal points exceeds the first specified number as lost voice segments. Alternatively, the specified condition may be that the number of consecutively lost signal points is greater than a second specified number, for example 60, in which case the voice signal processing apparatus identifies voice segments in which more than 60 signal points are lost consecutively as lost voice segments.
It should be noted that this embodiment does not limit the value of the above segment-count threshold, which may be set adaptively according to the application scenario.
Further optionally, when a single segment-count threshold is used, the voice signal processing apparatus may compare the number of lost voice segments with the preset segment-count threshold. If the number of lost voice segments is greater than the preset segment-count threshold, the voice signal is determined to have severe signal loss, meaning that the loss of signal points is relatively serious; otherwise, if the number of lost voice segments is less than or equal to the preset segment-count threshold but not 0, the voice signal is determined to have slight signal loss, meaning that the loss of signal points is relatively minor; and if the number of lost voice segments is 0, the voice signal is determined to have zero-degree signal loss, meaning that no signal loss has occurred. For example, suppose the segment-count threshold is 10 and the voice signal is divided into 60 voice segments. If more than 10 of the 60 voice segments suffer signal loss, the voice signal has severe signal loss; if some of the 60 voice segments suffer signal loss but no more than 10 of them do, the voice signal has slight signal loss; and if none of the 60 voice segments suffers signal loss, the voice signal has zero-degree signal loss.
Further optionally, the segment-count threshold may include a first segment-count threshold and a second segment-count threshold greater than the first. In this case, the voice signal processing apparatus may compare the number of lost voice segments with the first and second segment-count thresholds respectively: if the number of lost voice segments is less than or equal to the first segment-count threshold, the voice signal is determined to have zero-degree signal loss; if the number of lost voice segments is greater than the first segment-count threshold but less than or equal to the second segment-count threshold, the voice signal is determined to have slight signal loss; and if the number of lost voice segments is greater than the second segment-count threshold, the voice signal is determined to have severe signal loss.
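A sketch of this two-threshold classification in Python, assuming placeholder threshold values (the embodiment leaves the actual values to the application scenario):

    def classify_by_lost_segment_count(num_lost_segments,
                                       first_threshold=5,
                                       second_threshold=10):
        """Map the number of lost voice segments to a signal loss degree.

        The threshold values are placeholders for illustration only.
        """
        if num_lost_segments <= first_threshold:
            return "zero-degree loss"
        if num_lost_segments <= second_threshold:
            return "slight loss"
        return "severe loss"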
It should be noted that the segment-count threshold may also include more than two thresholds, so that the signal loss of the voice signal is divided into more signal loss degrees, for example first degree, second degree, third degree, fourth degree, fifth degree and so on.
In another optional embodiment, the voice signal processing apparatus may count, according to the signal loss information of the at least one voice segment, the lengths of the signal loss fragments formed in each voice segment by the consecutively lost signal points, and determine the signal loss degree of the voice signal according to the comparison result of the lengths of the signal loss fragments in each voice segment with a preset point-count threshold.
Here, a signal loss fragment refers to a fragment formed by consecutively lost signal points in a voice segment, and the length of a signal loss fragment refers to the number of consecutively lost signal points it contains.
If many signal points are lost consecutively in a particular voice segment, the voice signal may fail to be recognized correctly even if the other voice segments suffer no signal loss or only a small degree of it. Therefore, the voice signal processing apparatus may count the lengths of the signal loss fragments in each voice segment according to the signal loss information of the at least one voice segment, compare the lengths of the signal loss fragments in each voice segment with the preset point-count threshold, and determine the signal loss degree of the voice signal according to the comparison result.
Further optionally, when a single point-count threshold is used, the voice signal processing apparatus may compare the lengths of the signal loss fragments in each voice segment with the point-count threshold. If there is a voice segment containing a signal loss fragment longer than the point-count threshold, the voice signal is determined to have severe signal loss, meaning that the loss of signal points is relatively serious; otherwise, the voice signal is determined to have slight signal loss.
Further optionally, the point-count threshold may include a first point-count threshold and a second point-count threshold greater than the first. In this case, the voice signal processing apparatus may compare the lengths of the signal loss fragments in each voice segment with the first and second point-count thresholds respectively: if none of the at least one voice segment contains a signal loss fragment longer than the first point-count threshold, the voice signal is determined to have zero-degree signal loss; if at least one voice segment contains a signal loss fragment longer than the first point-count threshold but no voice segment contains a signal loss fragment longer than the second point-count threshold, the voice signal is determined to have slight signal loss; and if at least one voice segment contains a signal loss fragment longer than the second point-count threshold, the voice signal is determined to have severe signal loss.
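The corresponding classification by fragment length can be sketched in the same spirit. Only the longest fragment over all voice segments matters for the three-way decision described above, and the threshold values are again placeholders; fragment_lengths_per_segment is assumed to hold, for each voice segment, the lengths produced by a detection step such as find_loss_fragments above.

    def classify_by_fragment_length(fragment_lengths_per_segment,
                                    first_point_threshold=60,
                                    second_point_threshold=100):
        """Map per-segment loss-fragment lengths to a signal loss degree.

        The threshold values are placeholders for illustration only.
        """
        longest = max((length
                       for fragments in fragment_lengths_per_segment
                       for length in fragments),
                      default=0)
        if longest <= first_point_threshold:
            return "zero-degree loss"
        if longest <= second_point_threshold:
            return "slight loss"
        return "severe loss"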
In yet another optional embodiment, the voice signal processing apparatus may determine the signal loss degree of the voice signal from both the comparison result of the number of lost voice segments with the preset segment-count threshold and the comparison result of the lengths of the signal loss fragments in each voice segment with the preset point-count threshold.
For example, take the case where the segment-count threshold includes a first segment-count threshold and a second segment-count threshold, and the point-count threshold includes a first point-count threshold and a second point-count threshold: if the number of lost voice segments is greater than the second segment-count threshold and at least one voice segment contains a signal loss fragment longer than the second point-count threshold, the voice signal is determined to have severe signal loss; if the number of lost voice segments is less than or equal to the first segment-count threshold and no voice segment contains a signal loss fragment longer than the first point-count threshold, the voice signal is determined to have zero-degree signal loss; in the remaining cases, the voice signal is determined to have slight signal loss.
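Expressed as code, this combined criterion could read as follows; the threshold values are again placeholders, and longest_fragment stands for the length of the longest signal loss fragment over all voice segments.

    def classify_combined(num_lost_segments, longest_fragment,
                          segment_thresholds=(5, 10),
                          point_thresholds=(60, 100)):
        """Combine the segment-count and fragment-length criteria as described:
        severe only when both indicate severe loss, zero-degree only when both
        indicate no loss, slight otherwise."""
        seg_first, seg_second = segment_thresholds
        pt_first, pt_second = point_thresholds
        if num_lost_segments > seg_second and longest_fragment > pt_second:
            return "severe loss"
        if num_lost_segments <= seg_first and longest_fragment <= pt_first:
            return "zero-degree loss"
        return "slight loss"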
It should be noted that, based on the various ways of determining the signal loss degree of the voice signal provided by this embodiment, those skilled in the art can conceive similar extended solutions; these extended solutions all fall within the protection scope of the present invention and are not described one by one here.
After determining the signal loss degree of the voice signal, the voice signal processing apparatus may perform speech recognition processing on the voice signal according to its signal loss degree.
If the voice signal is determined to have zero-degree signal loss, i.e. no signal loss has occurred, the voice signal processing apparatus may perform speech recognition processing on the voice signal directly, which ensures both the efficiency of the speech recognition processing and the accuracy of speech recognition, thereby improving the user experience.
If the voice signal is determined to have slight signal loss, signal loss has occurred but is not serious, and the voice signal is still within the range that can be recognized correctly. In this case, for each lost voice segment among the at least one voice segment, the voice signal processing apparatus may use the signal points that are not lost in the lost voice segment to compensate for the signal points that are lost in that segment, and then perform speech recognition processing on the compensated voice segments, so as to ensure the accuracy of speech recognition and improve the user experience.
It should be noted that this embodiment does not limit the specific way in which the non-lost signal points in a lost voice segment are used to compensate for the lost signal points in that segment. More preferably, the lost signal points of the whole voice segment may be divided into two parts, and for each part the non-lost signal points within that part are used to compensate for the lost signal points within that part. Since the signal points within each part are relatively close to one another, compensating with nearby signal points ensures that the compensated voice signal is closer to a voice signal in which no loss has occurred, which helps improve the accuracy of speech recognition performed on the compensated voice signal.
Further, in the above signal compensation process, each lost signal point is preferably compensated with the non-lost signal point nearest to it.
In general, the number of non-lost signal points in each part is greater than the number of lost signal points, so the lost signal points in each part can be fully compensated. Of course, in some special cases the number of non-lost signal points in a part may be no greater than the number of lost signal points in that part; in that case only some of the lost signal points in that part can be compensated from within the part, and the remaining uncompensated signal points may be compensated one by one with the nearest signal points in the other part.
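As an illustration of the compensation idea, the Python sketch below replaces each lost signal point with the value of the nearest non-lost point. It collapses the two-part scheme and its fallback into a single nearest-neighbour search, so it is a simplification rather than a literal implementation of the preferred embodiment; lost_flags is assumed to come from a detection step such as the adjacent-point product test above.

    def compensate_lost_points(segment, lost_flags):
        """Replace each lost point with the value of the nearest non-lost point.

        A simplified stand-in for the two-part compensation scheme described
        in the text.
        """
        kept = [i for i, lost in enumerate(lost_flags) if not lost]
        if not kept:
            return list(segment)   # nothing usable to compensate from
        repaired = list(segment)
        for i, lost in enumerate(lost_flags):
            if lost:
                nearest = min(kept, key=lambda k: abs(k - i))
                repaired[i] = segment[nearest]
        return repaired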
If the voice signal is determined to have severe signal loss, the signal loss is relatively serious and exceeds the range that can be recognized correctly. In this case, the voice signal processing apparatus outputs prompt information to the user to indicate that the voice signal cannot be recognized normally because of severe signal loss. The voice signal processing apparatus may output the prompt information as text or as speech, for example by displaying a text prompt on an interactive interface or by outputting a voice prompt. Based on the prompt information, the user can take corresponding measures in time, for example re-enter the voice signal, so as to obtain the required voice service promptly, which improves the user experience.
Fig. 2 is a schematic structural diagram of a voice signal processing apparatus provided by another embodiment of the present invention. As shown in Fig. 2, the apparatus includes a receiving module 21, an obtaining module 22, a determining module 23 and a processing module 24.
The receiving module 21 is configured to receive a voice signal, the voice signal including at least one voice segment.
The obtaining module 22 is configured to obtain signal loss information of the at least one voice segment.
The determining module 23 is configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment.
The processing module 24 is configured to perform speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
In an optional embodiment, the obtaining module 22 is specifically configured to:
for each voice segment of the at least one voice segment, multiply the amplitudes of every two adjacent signal points in the voice segment, take the adjacent signal points whose product is greater than or equal to 0 as the lost signal points of the voice segment, and count the lengths of the signal loss fragments formed in the voice segment by the consecutively lost signal points.
In an optional embodiment, as shown in Fig. 3, one implementation structure of the determining module 23 includes at least one of a first determining unit 231 and a second determining unit 232.
The first determining unit 231 is configured to count, according to the signal loss information of the at least one voice segment, the number of lost voice segments among the at least one voice segment, and determine the signal loss degree of the voice signal according to the comparison result of the number of lost voice segments with a preset segment-count threshold.
The second determining unit 232 is configured to count, according to the signal loss information of the at least one voice segment, the lengths of the signal loss fragments formed in each voice segment by the consecutively lost signal points, and determine the signal loss degree of the voice signal according to the comparison result of the lengths of the signal loss fragments in each voice segment with a preset point-count threshold.
Further optionally, the segment-count threshold includes a first segment-count threshold and a second segment-count threshold greater than the first segment-count threshold; correspondingly, the point-count threshold includes a first point-count threshold and a second point-count threshold greater than the first point-count threshold.
Based on the above, the first determining unit 231 is specifically configured to:
determine that the voice signal has zero-degree signal loss if the number of lost voice segments is less than or equal to the first segment-count threshold;
determine that the voice signal has slight signal loss if the number of lost voice segments is greater than the first segment-count threshold but less than or equal to the second segment-count threshold; and
determine that the voice signal has severe signal loss if the number of lost voice segments is greater than the second segment-count threshold.
Correspondingly, the second determining unit 232 is specifically configured to:
determine that the voice signal has zero-degree signal loss if none of the at least one voice segment contains a signal loss fragment longer than the first point-count threshold;
determine that the voice signal has slight signal loss if at least one voice segment contains a signal loss fragment longer than the first point-count threshold but no voice segment contains a signal loss fragment longer than the second point-count threshold; and
determine that the voice signal has severe signal loss if at least one voice segment contains a signal loss fragment longer than the second point-count threshold.
In an optional embodiment, the processing module 24 is specifically configured to:
perform speech recognition processing on the voice signal directly if the voice signal has zero-degree signal loss;
if the voice signal has slight signal loss, for each lost voice segment among the at least one voice segment, use the signal points that are not lost in the lost voice segment to compensate for the signal points that are lost in that segment, and perform speech recognition processing on the compensated voice segments; and
output prompt information to the user if the voice signal has severe signal loss, to indicate that the voice signal cannot be recognized normally because of severe signal loss.
The voice signal processing apparatus provided by this embodiment fully considers the influence of signal loss on subsequent processing of the voice signal and can adopt a processing approach that matches the signal loss degree of the voice signal, which helps improve the accuracy of voice signal recognition.
The apparatus embodiments described above are only illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on such an understanding, the part of the above technical solutions that in essence contributes to the prior art may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the method described in each embodiment or in some part of an embodiment.
Finally, it should be noted that the above embodiments are merely intended to illustrate, rather than to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A voice signal processing method, characterized by comprising:
receiving a voice signal, the voice signal comprising at least one voice segment;
obtaining signal loss information of the at least one voice segment;
determining a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and
performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
2. The method according to claim 1, characterized in that the obtaining signal loss information of the at least one voice segment comprises:
for each voice segment of the at least one voice segment, multiplying the amplitudes of every two adjacent signal points in the voice segment, taking the adjacent signal points whose product is greater than or equal to 0 as the lost signal points of the voice segment, and counting the lengths of the signal loss fragments formed in the voice segment by the consecutively lost signal points.
3. The method according to claim 1, characterized in that the determining a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment comprises:
counting, according to the signal loss information of the at least one voice segment, the number of lost voice segments among the at least one voice segment, and determining the signal loss degree of the voice signal according to the comparison result of the number of lost voice segments with a preset segment-count threshold; and/or
counting, according to the signal loss information of the at least one voice segment, the lengths of the signal loss fragments formed in each voice segment by the consecutively lost signal points, and determining the signal loss degree of the voice signal according to the comparison result of the lengths of the signal loss fragments in each voice segment with a preset point-count threshold.
4. The method according to claim 3, characterized in that the segment-count threshold comprises a first segment-count threshold and a second segment-count threshold greater than the first segment-count threshold, and the point-count threshold comprises a first point-count threshold and a second point-count threshold greater than the first point-count threshold;
the determining the signal loss degree of the voice signal according to the comparison result of the number of lost voice segments with the preset segment-count threshold comprises:
determining that the voice signal has zero-degree signal loss if the number of lost voice segments is less than or equal to the first segment-count threshold;
determining that the voice signal has slight signal loss if the number of lost voice segments is greater than the first segment-count threshold but less than or equal to the second segment-count threshold; and
determining that the voice signal has severe signal loss if the number of lost voice segments is greater than the second segment-count threshold;
and the determining the signal loss degree of the voice signal according to the comparison result of the lengths of the signal loss fragments in each voice segment with the preset point-count threshold comprises:
determining that the voice signal has zero-degree signal loss if none of the at least one voice segment contains a signal loss fragment longer than the first point-count threshold;
determining that the voice signal has slight signal loss if at least one voice segment contains a signal loss fragment longer than the first point-count threshold but no voice segment contains a signal loss fragment longer than the second point-count threshold; and
determining that the voice signal has severe signal loss if at least one voice segment contains a signal loss fragment longer than the second point-count threshold.
5. The method according to any one of claims 1 to 4, characterized in that the performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal comprises:
performing speech recognition processing on the voice signal directly if the voice signal has zero-degree signal loss;
if the voice signal has slight signal loss, for each lost voice segment among the at least one voice segment, using the signal points that are not lost in the lost voice segment to compensate for the signal points that are lost in the lost voice segment, and performing speech recognition processing on the compensated voice segments; and
outputting prompt information to a user if the voice signal has severe signal loss, to indicate that the voice signal cannot be recognized normally because of severe signal loss.
6. A voice signal processing apparatus, characterized by comprising:
a receiving module configured to receive a voice signal, the voice signal comprising at least one voice segment;
an obtaining module configured to obtain signal loss information of the at least one voice segment;
a determining module configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and
a processing module configured to perform speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
7. The apparatus according to claim 6, characterized in that the obtaining module is specifically configured to:
for each voice segment of the at least one voice segment, multiply the amplitudes of every two adjacent signal points in the voice segment, take the adjacent signal points whose product is greater than or equal to 0 as the lost signal points of the voice segment, and count the lengths of the signal loss fragments formed in the voice segment by the consecutively lost signal points.
8. The apparatus according to claim 6, characterized in that the determining module comprises:
a first determining unit configured to count, according to the signal loss information of the at least one voice segment, the number of lost voice segments among the at least one voice segment, and determine the signal loss degree of the voice signal according to the comparison result of the number of lost voice segments with a preset segment-count threshold; and/or
a second determining unit configured to count, according to the signal loss information of the at least one voice segment, the lengths of the signal loss fragments formed in each voice segment by the consecutively lost signal points, and determine the signal loss degree of the voice signal according to the comparison result of the lengths of the signal loss fragments in each voice segment with a preset point-count threshold.
9. The apparatus according to claim 8, characterized in that the segment-count threshold comprises a first segment-count threshold and a second segment-count threshold greater than the first segment-count threshold, and the point-count threshold comprises a first point-count threshold and a second point-count threshold greater than the first point-count threshold;
the first determining unit is specifically configured to:
determine that the voice signal has zero-degree signal loss if the number of lost voice segments is less than or equal to the first segment-count threshold;
determine that the voice signal has slight signal loss if the number of lost voice segments is greater than the first segment-count threshold but less than or equal to the second segment-count threshold; and
determine that the voice signal has severe signal loss if the number of lost voice segments is greater than the second segment-count threshold;
and the second determining unit is specifically configured to:
determine that the voice signal has zero-degree signal loss if none of the at least one voice segment contains a signal loss fragment longer than the first point-count threshold;
determine that the voice signal has slight signal loss if at least one voice segment contains a signal loss fragment longer than the first point-count threshold but no voice segment contains a signal loss fragment longer than the second point-count threshold; and
determine that the voice signal has severe signal loss if at least one voice segment contains a signal loss fragment longer than the second point-count threshold.
10. The apparatus according to any one of claims 6 to 9, characterized in that the processing module is specifically configured to:
perform speech recognition processing on the voice signal directly if the voice signal has zero-degree signal loss;
if the voice signal has slight signal loss, for each lost voice segment among the at least one voice segment, use the signal points that are not lost in the lost voice segment to compensate for the signal points that are lost in the lost voice segment, and perform speech recognition processing on the compensated voice segments; and
output prompt information to a user if the voice signal has severe signal loss, to indicate that the voice signal cannot be recognized normally because of severe signal loss.
CN201610179999.XA 2016-03-25 2016-03-25 Voice signal processing method and apparatus Pending CN105845138A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610179999.XA CN105845138A (en) 2016-03-25 2016-03-25 Voice signal processing method and apparatus
PCT/CN2016/096988 WO2017161829A1 (en) 2016-03-25 2016-08-26 Voice signal information processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610179999.XA CN105845138A (en) 2016-03-25 2016-03-25 Voice signal processing method and apparatus

Publications (1)

Publication Number Publication Date
CN105845138A true CN105845138A (en) 2016-08-10

Family

ID=56583905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610179999.XA Pending CN105845138A (en) 2016-03-25 2016-03-25 Voice signal processing method and apparatus

Country Status (2)

Country Link
CN (1) CN105845138A (en)
WO (1) WO2017161829A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113270096A (en) * 2021-05-13 2021-08-17 前海七剑科技(深圳)有限公司 Voice response method and device, electronic equipment and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2517393C2 (en) * 2008-06-11 2014-05-27 Ниппон Телеграф Энд Телефон Корпорейшн Method of estimating quality of audio signal, apparatus and computer-readable medium storing programme
CN102568470B (en) * 2012-01-11 2013-12-25 广州酷狗计算机科技有限公司 Acoustic fidelity identification method and system for audio files
CN105845138A (en) * 2016-03-25 2016-08-10 乐视控股(北京)有限公司 Voice signal processing method and apparatus

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002297180A (en) * 2001-03-29 2002-10-11 Sanyo Electric Co Ltd Voice recognizing device
CN1731718A (en) * 2004-08-06 2006-02-08 北京中星微电子有限公司 Noise reduction method and device concerning IP network voice data packet lost
CN1604572A (en) * 2004-11-09 2005-04-06 北京中星微电子有限公司 A semantic integrity ensuring method under IP network environment
CN103632679A (en) * 2012-08-21 2014-03-12 华为技术有限公司 An audio stream quality assessment method and an apparatus
CN107170451A (en) * 2017-06-27 2017-09-15 乐视致新电子科技(天津)有限公司 Audio signal processing method and device

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017161829A1 (en) * 2016-03-25 2017-09-28 乐视控股(北京)有限公司 Voice signal information processing method and device
CN106856093A (en) * 2017-02-23 2017-06-16 海信集团有限公司 Audio-frequency information processing method, intelligent terminal and Voice command terminal
CN107170451A (en) * 2017-06-27 2017-09-15 乐视致新电子科技(天津)有限公司 Audio signal processing method and device
CN107316638A (en) * 2017-06-28 2017-11-03 北京粉笔未来科技有限公司 A kind of poem recites evaluating method and system, a kind of terminal and storage medium
CN107990908A (en) * 2017-11-20 2018-05-04 广东欧珀移动通信有限公司 A kind of phonetic navigation method and device based on Bluetooth communication
CN108965562A (en) * 2018-07-24 2018-12-07 Oppo(重庆)智能科技有限公司 Voice data generation method and relevant apparatus
CN108831438A (en) * 2018-07-24 2018-11-16 Oppo(重庆)智能科技有限公司 Voice data generation method and relevant apparatus
CN109003619A (en) * 2018-07-24 2018-12-14 Oppo(重庆)智能科技有限公司 Voice data generation method and relevant apparatus
CN109065017A (en) * 2018-07-24 2018-12-21 Oppo(重庆)智能科技有限公司 Voice data generation method and relevant apparatus
CN108831438B (en) * 2018-07-24 2021-01-08 Oppo(重庆)智能科技有限公司 Voice data generation method and device, electronic device and computer readable storage medium
CN108965562B (en) * 2018-07-24 2021-04-13 Oppo(重庆)智能科技有限公司 Voice data generation method and related device
CN109065017B (en) * 2018-07-24 2021-04-16 Oppo(重庆)智能科技有限公司 Voice data generation method and related device
CN109121042A (en) * 2018-07-26 2019-01-01 Oppo广东移动通信有限公司 Voice data processing method and Related product
CN109121042B (en) * 2018-07-26 2020-12-08 Oppo广东移动通信有限公司 Voice data processing method and related product

Also Published As

Publication number Publication date
WO2017161829A1 (en) 2017-09-28

Similar Documents

Publication Publication Date Title
CN105845138A (en) Voice signal processing method and apparatus
JP6820360B2 (en) Signal classification methods and signal classification devices, as well as coding / decoding methods and coding / decoding devices.
KR102037195B1 (en) Voice detection methods, devices and storage media
CN102056026B (en) Audio/video synchronization detection method and system, and voice detection method and system
JP2019531494A (en) Voice quality evaluation method and apparatus
CN103035238B (en) Encoding method and decoding method of voice frequency data
CN107808670A (en) Voice data processing method, device, equipment and storage medium
CN109036412A (en) voice awakening method and system
CN107533850B (en) Audio content identification method and device
CN102714034B (en) Signal processing method, device and system
US9424743B2 (en) Real-time traffic detection
CN110265065B (en) Method for constructing voice endpoint detection model and voice endpoint detection system
CN106356077B (en) A kind of laugh detection method and device
CN107644643A (en) A kind of voice interactive system and method
KR20140031790A (en) Robust voice activity detection in adverse environments
CN105847252B (en) A kind of method and device of more account switchings
US20160323687A1 (en) Stereo decoding method and apparatus
CN107705791A (en) Caller identity confirmation method, device and Voiceprint Recognition System based on Application on Voiceprint Recognition
CN109410956A (en) A kind of object identifying method of audio data, device, equipment and storage medium
CN104978955A (en) Voice control method and system
CN103050116A (en) Voice command identification method and system
CN106782529A (en) The wake-up selected ci poem selection method and device of speech recognition
CN102376306B (en) Method and device for acquiring level of speech frame
CN109524013A (en) A kind of method of speech processing, device, medium and smart machine
CN107957861B (en) Method and device for instantly playing audio data in sound card signal input channel

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160810