CN107819964A

CN107819964A - Method, apparatus, terminal and computer-readable storage medium for improving call quality

Info

Publication number: CN107819964A
Application number: CN201711125884.3A
Authority: CN
Inventors: 杨宗业
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2017-11-10
Filing date: 2017-11-10
Publication date: 2018-03-20
Anticipated expiration: 2037-11-10
Also published as: CN107819964B

Abstract

The invention belongs to the technical field of voice calls, and in particular relates to a method, apparatus, terminal and computer-readable storage medium for improving call quality. The method includes: when a user terminal is in a hands-free call state, obtaining the uplink signal of the uplink channel and the downlink signal of the downlink channel, and determining the call mode of the user terminal from the uplink signal and the downlink signal; if the user terminal is in double-talk mode, identifying whether the downlink signal is a non-speech signal; and if the downlink signal is identified as a non-speech signal, attenuating the gain of the downlink channel. Echo is thereby suppressed, and both the sound quality of the uplink voice and the call quality are improved.

Description

Method, apparatus, terminal and computer-readable storage medium for improving call quality

Technical field

The invention belongs to the technical field of voice calls, and in particular relates to a method, apparatus, terminal and computer-readable storage medium for improving call quality.

Background art

In the prior art, echo problems often arise in double-talk voice scenarios such as hands-free calls, video chat and WeChat voice. Echo here means that the near-end party's microphone, while capturing the near-end party's voice signal, also captures the non-speech signal that was transmitted from the far-end party and played by the loudspeaker. When the near-end terminal sends everything the microphone captured to the far-end party, the far-end party hears the near-end party's voice signal and also hears that non-speech signal, that is, the echo. To keep the non-speech signal from degrading call quality, echo suppression must be applied to all sound captured by the microphone. However, prior-art echo suppression also affects the near-end party's voice signal: while removing the echo it damages that voice signal to some extent and distorts it, so that the far-end party hears choppy, intermittent sound, which degrades call quality.

Summary of the invention

Embodiments of the present invention provide a method, apparatus, terminal and computer-readable storage medium for improving call quality, which can avoid the influence of non-speech signals on call quality and improve the sound quality of the uplink voice.

A first aspect of the embodiments of the present invention provides a method for improving call quality, including:

when a user terminal is in a hands-free call state, obtaining the uplink signal of the uplink channel and the downlink signal of the downlink channel, and determining the call mode of the user terminal from the uplink signal and the downlink signal;

if the user terminal is in double-talk mode, identifying whether the downlink signal is a non-speech signal;

if the downlink signal is identified as a non-speech signal, attenuating the gain of the downlink channel.

A second aspect of the embodiments of the present invention provides an apparatus for improving call quality, including:

a call mode determination unit, configured to, when a user terminal is in a hands-free call state, obtain the uplink signal of the uplink channel and the downlink signal of the downlink channel, and determine the call mode of the user terminal from the uplink signal and the downlink signal;

an identification unit, configured to identify whether the downlink signal is a non-speech signal if the user terminal is in double-talk mode;

an attenuation unit, configured to attenuate the gain of the downlink channel if the downlink signal is identified as a non-speech signal.

A third aspect of the embodiments of the present invention provides a terminal including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the above method when executed by a processor.

In the embodiments of the present invention, when the user terminal is in double-talk mode, it is identified whether the downlink signal is a non-speech signal, and the gain of the downlink channel is attenuated when it is. As a result, the near-end party's microphone does not capture the non-speech signal played by the loudspeaker while capturing the near-end party's voice signal, so echo is suppressed and the sound quality of the uplink voice is improved. In addition, because the gain of the downlink channel is attenuated only when the downlink signal is a non-speech signal, the near-end party does not hear choppy, intermittent sound, and call quality is improved.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.

Fig. 1 is a schematic structural diagram of a user terminal provided by an embodiment of the present invention;

Fig. 2 is a flowchart of an implementation of a method for improving call quality provided by Embodiment one of the present invention;

Fig. 3 is a flowchart of a specific implementation of step S202 of the method for improving call quality provided by Embodiment two of the present invention;

Fig. 4 is a flowchart of a specific implementation of step S202 of the method for improving call quality provided by Embodiment three of the present invention;

Fig. 5 is a schematic structural diagram of the apparatus for improving call quality provided by Embodiment four of the present invention;

Fig. 6 is a schematic structural diagram of a terminal provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and do not limit it. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Fig. 1 shows a schematic structural diagram of a user terminal according to an embodiment of the present invention. The user terminal may be a smartphone or a tablet and includes dual microphones and a loudspeaker 103. The dual microphones comprise a main microphone 101 and a secondary microphone 102. During a call, the main microphone 101 is mainly used to capture the user's voice, and the secondary microphone 102 is mainly used to capture external noise. After the processor of the smartphone or tablet processes the different sounds captured by the main microphone 101 and the secondary microphone 102, it can generate a sound wave opposite to the ambient noise to cancel the ambient noise.

However, when the user terminal is in a hands-free call state, for example in double-talk voice scenarios such as hands-free calls, video chat, WeChat voice or in-game voice, echo problems often arise. That is, the near-end party's microphone, while capturing the near-end party's voice signal, also captures the non-speech signal transmitted from the far-end party and played by the loudspeaker 103. When the near-end terminal sends everything the microphone captured to the far-end party, the far-end party hears the near-end party's voice signal together with that non-speech signal, that is, the echo, which degrades call quality.

Because hands-free downlink volume keeps getting louder, the requirements on echo suppression keep rising as well. In the prior art, applying echo suppression to all sound captured by the near-end party's microphone affects the near-end party's voice signal: removing the echo also damages that voice signal to some extent and distorts it, so that the far-end party hears choppy, intermittent sound, which degrades call quality.

To solve the above problem, the method for improving call quality provided by the embodiments of the present invention identifies, when the user terminal is in double-talk mode, whether the downlink signal is a non-speech signal, and attenuates the gain of the downlink channel when it is. As a result, the near-end party's microphone does not capture the non-speech signal played by the loudspeaker while capturing the near-end party's voice signal, so echo is suppressed and the sound quality of the uplink voice is improved. In addition, because the gain of the downlink channel is attenuated only when the downlink signal is a non-speech signal, the near-end party's voice signal is not adversely affected; the far-end party therefore does not hear choppy, intermittent sound, and call quality is improved.

Embodiment one

Fig. 2 shows a flowchart of an implementation of a method for improving call quality provided by an embodiment of the present invention. The method is applied to a user terminal and includes steps S201 to S203.

In S201, when the user terminal is in a hands-free call state, the uplink signal of the uplink channel and the downlink signal of the downlink channel are obtained, and the call mode of the user terminal is determined from the uplink signal and the downlink signal.

Obtaining the uplink signal of the uplink channel and the downlink signal of the downlink channel when the user terminal is in a hands-free call state means obtaining the uplink signal and the downlink signal of one party to the call. In the embodiments of the present invention, obtaining the uplink signal of the uplink channel and the downlink signal of the downlink channel of the near-end party is taken as the example.

Determining the call mode of the user terminal from the uplink signal and the downlink signal means checking whether the uplink channel and the downlink channel are transmitting signals at the same time, in order to decide whether the user terminal is in double-talk mode or in single-talk mode. Double-talk mode is the call mode in which the uplink channel and the downlink channel transmit signals simultaneously; single-talk mode is the call mode in which only one of the uplink channel and the downlink channel transmits a signal.

In some embodiments of the present invention, determining the call mode of the user terminal from the uplink signal and the downlink signal includes: obtaining the uplink decibel value of the uplink signal and the downlink decibel value of the downlink signal; if the uplink decibel value is greater than a first preset threshold and the downlink decibel value is greater than a second preset threshold, determining that the user terminal is in double-talk mode; if the uplink decibel value is less than or equal to the first preset threshold, or the downlink decibel value is less than or equal to the second preset threshold, determining that the user terminal is in single-talk mode.

The first preset threshold and the second preset threshold may be equal or unequal. Their values are chosen according to the practical application, for example 5 to 10 decibels, and are not limited here.
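The threshold comparison described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `level_db` and `call_mode`, the RMS-based level measure, the reference level `ref`, and the default thresholds of 5 dB are all assumptions chosen for the example; the source only states that two decibel values are compared against two preset thresholds.

```python
import math


def level_db(samples, ref=1.0):
    """Return the RMS level of one frame in decibels relative to `ref`.

    The RMS measure and the reference level are assumptions; the source
    does not say how the decibel values are computed.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * math.log10(rms / ref)


def call_mode(uplink_db, downlink_db, first_threshold=5.0, second_threshold=5.0):
    """Classify the call mode from the uplink and downlink decibel values."""
    # Double-talk requires BOTH levels to exceed their thresholds.
    if uplink_db > first_threshold and downlink_db > second_threshold:
        return "double-talk"
    # Otherwise at least one level is at or below its threshold.
    return "single-talk"
```

A caller would compute `level_db` once per frame for each channel and pass both values to `call_mode`; the two thresholds can be set independently, matching the statement that they may be equal or unequal.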

Because echo arises only in double-talk mode, the embodiments of the present invention perform echo suppression when the user terminal is in double-talk mode.

In S202, if the user terminal is in double-talk mode, it is identified whether the downlink signal is a non-speech signal.

The non-speech signal refers to a downlink signal that does not include the far-end party's voice signal, that is, all sound signals in the downlink signal other than the far-end party's voice signal. For example, when the far-end party is not speaking but the near-end party still receives a sound signal transmitted from the far end, that sound signal is a non-speech signal. As another example, the non-speech signal includes the voice signal of the near-end party, animal sounds, and sound from devices such as a TV.

In S203, if the downlink signal is identified as a non-speech signal, the gain of the downlink channel is attenuated.

While the near-end party's microphone captures the near-end party's voice signal, it also captures the non-speech signal played by the loudspeaker. As a result, when the near-end terminal sends all of that sound to the far-end party, the far-end party hears the non-speech signal along with the near-end party's voice signal. The non-speech signal is therefore a sound signal that should not be sent back to the far-end party. In addition, because the non-speech signal does not include the far-end party's voice signal, it is noise from the near-end party's point of view.

Therefore, by attenuating the gain of the downlink channel when the downlink signal is determined to be a non-speech signal, the present invention ensures that the near-end party's microphone does not capture the non-speech signal played by the loudspeaker while capturing the near-end party's voice signal. The far-end party then does not hear the non-speech signal while hearing the near-end party's voice signal, which improves the sound quality of the uplink voice.
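As a rough illustration of step S203, attenuating the downlink gain can be sketched as scaling each downlink sample by a factor below one before it reaches the loudspeaker. The function name `attenuate_downlink` and the default attenuation of 20 dB are assumptions; the source does not specify how much the gain is reduced.

```python
def attenuate_downlink(samples, attenuation_db=20.0):
    """Scale downlink samples down by `attenuation_db` decibels.

    The 20 dB default is a hypothetical value chosen for the example.
    """
    if attenuation_db < 0:
        raise ValueError("attenuation_db must be non-negative")
    # Convert the decibel attenuation to a linear amplitude factor.
    gain = 10.0 ** (-attenuation_db / 20.0)
    return [s * gain for s in samples]
```

With 20 dB of attenuation the linear factor is 0.1, so the loudspeaker output, and hence what the microphone can pick up from it, drops to one tenth of its original amplitude.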

Embodiment two

This embodiment further specifies how step S202 of Embodiment one identifies whether the downlink signal is a non-speech signal.

In this embodiment, the non-speech signal refers to a downlink signal that does not include the far-end party's voice signal. For example, when the far-end party is not speaking but the near-end party still receives a sound signal transmitted from the far end, that sound signal is a non-speech signal. As another example, the non-speech signal includes the voice signal of the near-end party, animal sounds, and sound from devices such as a TV.

As shown in Fig. 3, identifying whether the downlink signal is a non-speech signal includes steps S301 to S303.

In S301, the sound spectra from the different sound sources contained in the downlink signal are separated.

A sound spectrum is a physical parameter that distinguishes different sound sources and can be displayed using an electroacoustic instrument.

Because the non-speech signal includes sound produced by people, animals, TVs and any other person or thing that can produce sound, the sound spectra from the different sound sources can be separated out of the signal; these sound spectra are used to determine whether the downlink signal is a non-speech signal.

In S302, it is determined whether the sound spectra include the far-end party's voice spectrum.

Optionally, before determining whether the sound spectra include the far-end party's voice spectrum, the method includes: collecting and storing the far-end party's voice spectrum.

For example, the far-end party's voice spectrum is collected and stored when the voice call has just started; during the rest of the call, the stored far-end party voice spectrum is used to identify whether the sound spectra include the far-end party's voice spectrum.

It should be noted that the far-end party's voice spectrum may be stored in the near-end terminal's memory together with the far-end party's identity information, so that in the next call the pre-stored far-end party voice spectrum can be used to identify whether the sound spectra include the voice spectrum of the far-end party of the current call. The identity information of the far-end party includes the far-end party's name, telephone number, game ID, WeChat ID or another identifier, and is not limited here.

However, to save storage space in the near-end terminal's memory, the far-end party voice spectrum need not be stored for a long time. For example, the storage space holding it may be released when the current call ends, or after a set duration; the set duration may be one week, one month, one quarter, and so on, and is not limited here.
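The storage policy described above (keep a voice spectrum keyed by identity, then release it at the end of the call or after a set duration) could be sketched as a small in-memory store. The class name `VoiceSpectrumStore`, its methods, and the one-week default are hypothetical; the source describes the policy but no data structure.

```python
import time


class VoiceSpectrumStore:
    """Keep far-end voice spectra keyed by identity, with an expiry time."""

    def __init__(self, ttl_seconds=7 * 24 * 3600):
        # One week is one of the durations the text mentions as an example.
        self._ttl = ttl_seconds
        self._entries = {}

    def put(self, identity, spectrum, now=None):
        """Store a spectrum under an identity such as a phone number."""
        now = time.time() if now is None else now
        self._entries[identity] = (spectrum, now)

    def get(self, identity, now=None):
        """Return the stored spectrum, or None if absent or expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(identity)
        if entry is None:
            return None
        spectrum, stored_at = entry
        if now - stored_at > self._ttl:
            # Release the space once the set duration has elapsed.
            del self._entries[identity]
            return None
        return spectrum

    def release(self, identity):
        """Release the entry explicitly, for example when the call ends."""
        self._entries.pop(identity, None)
```

Passing `now` explicitly is only a convenience for testing; in normal use the wall-clock time is taken from `time.time()`.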

In some embodiments of the present invention, determining whether the sound spectra include the far-end party's voice spectrum includes: matching each sound spectrum against the far-end party voice spectrum; if any sound spectrum matches the far-end party voice spectrum, determining that the sound spectra include the far-end party's voice spectrum; if no sound spectrum matches the far-end party voice spectrum, determining that the sound spectra do not include the far-end party's voice spectrum.

Matching the sound spectra against the far-end party voice spectrum includes matching the spectral features of each sound spectrum against the spectral features of the far-end party voice spectrum, which improves matching efficiency.
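One way to picture the feature matching is to treat each spectrum's features as a numeric vector and compare vectors by cosine similarity. This is only a sketch under stated assumptions: the source does not say which spectral features or which similarity measure are used, so the cosine measure, the threshold of 0.9, and the function names below are all illustrative choices.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def contains_far_end_voice(source_features, far_end_features, threshold=0.9):
    """Return True if any separated source matches the stored far-end features.

    `threshold` is a hypothetical match criterion, not a value from the source.
    """
    return any(
        cosine_similarity(features, far_end_features) >= threshold
        for features in source_features
    )
```

If `contains_far_end_voice` returns False for the separated sources of a frame, the far-end party is taken not to be speaking in that frame.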

In S303, if the sound spectra do not include the far-end party's voice spectrum, the downlink signal is identified as a non-speech signal.

When the sound spectra do not include the far-end party's voice spectrum, the far-end party is not speaking. It can then be determined that the downlink signal carries no information the near-end party wants, so the downlink signal can be judged to be a non-speech signal. That is, gain attenuation can be applied to the downlink signal, for example by attenuating its amplitude.

In this way, the near-end party's microphone does not capture the non-speech signal played by the loudspeaker while capturing the near-end party's voice signal, so the far-end party does not hear the non-speech signal while hearing the near-end party's voice signal, which improves the sound quality of the uplink voice.

Embodiment three

As shown in Fig. 4, identifying whether the downlink signal is a non-speech signal includes steps S401 to S404.

In S401, the sound spectra from the different sound sources contained in the downlink signal are separated.

It should be noted that S401 is implemented in the same way as S301 in Embodiment two, and is not described again here.

In S402, it is determined whether the sound spectra include a human voice spectrum.

Because human voice spectra are more periodic than noise such as machine noise, it is easy to determine whether the sound spectra include a human voice spectrum.
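The periodicity cue mentioned above can be illustrated with a normalized autocorrelation check in the time domain. This is a crude sketch, not the detection method of the source: the function names, the 8000 Hz sample rate, the assumed pitch range of 60 to 400 Hz, and the threshold of 0.5 are all assumptions, and a steady machine tone in that range would also pass this simple test.

```python
import math


def periodicity(samples, min_lag, max_lag):
    """Peak autocorrelation over the lag range, normalized by frame energy."""
    energy = sum(s * s for s in samples)
    if energy == 0.0:
        return 0.0
    best = 0.0
    last_lag = min(max_lag, len(samples) - 1)
    for lag in range(min_lag, last_lag + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        best = max(best, corr / energy)
    return best


def looks_like_human_voice(samples, sample_rate=8000, threshold=0.5):
    """Flag a frame as voiced if it is strongly periodic at speech-like pitch.

    60 to 400 Hz is an assumed pitch range; `threshold` is hypothetical.
    """
    min_lag = sample_rate // 400   # shortest pitch period considered
    max_lag = sample_rate // 60    # longest pitch period considered
    return periodicity(samples, min_lag, max_lag) >= threshold
```

For example, a 200 Hz tone sampled at 8000 Hz repeats every 40 samples and scores high at a lag of 40, while a single click or silence has no repeating structure and scores zero.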

In S403, if the sound spectra do not include a human voice spectrum, the downlink signal is identified as a non-speech signal.

Because the non-speech signal refers to a signal that does not include the far-end party's voice signal, that is, all sound signals other than the far-end party's voice signal, the downlink signal can be judged to be a non-speech signal when the sound spectra include no human voice spectrum.

For example, when neither the near-end party nor the far-end party is speaking, the channel carries only sound signals that contain no human voice spectrum.

In S404, if the sound spectra include human voice spectra, it is determined whether each human voice spectrum is the near-end party's voice spectrum; if every human voice spectrum is the near-end party's voice spectrum, the downlink signal is identified as a non-speech signal; if any human voice spectrum is not the near-end party's voice spectrum, the downlink signal is identified as not being a non-speech signal.

Determining whether each human voice spectrum is the near-end party's voice spectrum means identifying whether the human voice spectra include only the near-end party's voice spectrum. If they include only the near-end party's voice spectrum, the downlink signal is judged to be a non-speech signal. If they include other human voice spectra in addition to the near-end party's voice spectrum, the downlink signal is not a non-speech signal.

When the human voice spectra include other human voice spectra in addition to the near-end party's voice spectrum, the downlink signal may contain the far-end party's voice spectrum. In that case, to avoid a misjudgment, the downlink signal must not be identified as a non-speech signal; otherwise the near-end party would be unable to hear the far-end party's voice. That is, the downlink signal is then identified as a speech signal and no gain attenuation is applied to it.

Optionally, before identifying whether the human voice spectrum is the near-end party's voice spectrum, the method includes: collecting and storing the near-end party's voice spectrum.

For example, the near-end party records his or her voice signal on the user terminal in advance, and the corresponding near-end party voice spectrum is stored; during subsequent calls, the stored near-end party voice spectrum is used to identify whether the sound spectra include the near-end party's voice spectrum.

In some embodiments of the present invention, identifying whether the human voice spectrum is the near-end party's voice spectrum includes: matching each human voice spectrum against the near-end party voice spectrum; if every human voice spectrum matches the near-end party voice spectrum, identifying the downlink signal as a non-speech signal; if any human voice spectrum fails to match the near-end party voice spectrum, identifying the downlink signal as not being a non-speech signal.
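The decision logic of steps S402 to S404 can be summarized in a few lines. In this sketch the two predicates `is_human` and `matches_near_end` are hypothetical callables supplied by the caller; how they are implemented (for example, by periodicity analysis and by spectrum matching) is outside what this example shows.

```python
def downlink_is_non_speech(sources, is_human, matches_near_end):
    """Apply the S402 to S404 decision to the separated sound sources.

    `sources` is any list of per-source descriptors; `is_human` and
    `matches_near_end` are caller-supplied predicates (assumptions here).
    """
    human_sources = [s for s in sources if is_human(s)]
    # S403: no human voice at all means the downlink is non-speech.
    if not human_sources:
        return True
    # S404: non-speech only if EVERY human voice is the near-end party's own;
    # any other human voice might be the far-end party, so do not attenuate.
    return all(matches_near_end(s) for s in human_sources)
```

The conservative `all(...)` test reflects the stated goal of avoiding misjudgment: a single unmatched human voice is enough to keep the downlink at full gain.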

Because the identity of the near-end party is relatively fixed, the embodiments of the present invention first determine whether the sound spectra of the downlink signal include a human voice spectrum and, when they do, determine whether each human voice spectrum is the near-end party's voice spectrum; this is enough to decide whether the downlink signal is a non-speech signal.

Compared with Embodiment two, this embodiment does not need to re-collect the far-end party voice spectrum whenever the far-end party changes; however, the decision accuracy of Embodiment two is higher.

Embodiment four

As shown in Fig. 5, an embodiment of the present invention provides an apparatus 500 for improving call quality, including:

a call mode determination unit 501, configured to, when a user terminal is in a hands-free call state, obtain the uplink signal of the uplink channel and the downlink signal of the downlink channel, and determine the call mode of the user terminal from the uplink signal and the downlink signal;

an identification unit 502, configured to identify whether the downlink signal is a non-speech signal if the user terminal is in double-talk mode;

an attenuation unit 503, configured to attenuate the gain of the downlink channel if the downlink signal is identified as a non-speech signal.
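The cooperation of the three units can be sketched as a small class that wires them together. The class name `CallQualityDevice`, the `process` method, and the representation of each unit as a plain callable are assumptions made for illustration; the source defines the units only by their function.

```python
class CallQualityDevice:
    """Chain the mode determination, identification and attenuation units."""

    def __init__(self, judge_mode, is_non_speech, attenuate):
        self.judge_mode = judge_mode        # role of unit 501
        self.is_non_speech = is_non_speech  # role of unit 502
        self.attenuate = attenuate          # role of unit 503

    def process(self, uplink, downlink):
        """Return the downlink frame to play, attenuated only when warranted."""
        # Unit 501: only double-talk frames are candidates for attenuation.
        if self.judge_mode(uplink, downlink) != "double-talk":
            return downlink
        # Unit 502: leave the frame untouched if it may carry far-end speech.
        if not self.is_non_speech(downlink):
            return downlink
        # Unit 503: attenuate the downlink gain.
        return self.attenuate(downlink)
```

Frames that fail either check pass through unchanged, which mirrors the point that attenuation is applied only when the downlink signal is a non-speech signal.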

It should be noted that, for convenience and brevity of description, the specific working process of the apparatus 500 for improving call quality described above may refer to the corresponding processes of the methods described in Fig. 2 and Fig. 4, and is not repeated here.

Fig. 6 is a schematic diagram of a terminal provided by an embodiment of the present invention. As shown in Fig. 6, the terminal of this embodiment may include a processor 601, a memory 602, an input device 603 and an output device 604, which are connected by a bus 605. The input device 603 may include a keyboard, a touchpad, a fingerprint sensor, a microphone, and so on; the output device 604 may include a display, a loudspeaker, and so on.

The terminal also includes a computer program stored in the memory 602 and executable on the processor 601, such as a program for improving call quality. When executing the computer program, the processor 601 implements the steps of the above method embodiments for improving call quality, for example steps S201 to S203 shown in Fig. 2; alternatively, when executing the computer program, the processor 601 implements the functions of the modules/units in the above apparatus embodiments, for example the functions of units 501 to 503 shown in Fig. 5.

By way of example, the computer program may be divided into one or more modules/units, which are stored in the memory 602 and executed by the processor 601 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions; the instruction segments describe the execution of the computer program in the terminal. For example, the computer program may be divided into a call mode determination unit, an identification unit and an attenuation unit, whose functions are as follows: the call mode determination unit is configured to, when the user terminal is in a hands-free call state, obtain the uplink signal of the uplink channel and the downlink signal of the downlink channel, and determine the call mode of the user terminal from the uplink signal and the downlink signal; the identification unit is configured to identify whether the downlink signal is a non-speech signal if the user terminal is in double-talk mode; the attenuation unit is configured to attenuate the gain of the downlink channel if the downlink signal is identified as a non-speech signal.

The terminal may be a mobile terminal such as a smartphone, or a computing device such as a desktop computer, notebook, palmtop computer or cloud server. The terminal may include, but is not limited to, the processor 601 and the memory 602. Those skilled in the art will understand that Fig. 6 is merely an example of the terminal and does not limit it; the terminal may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal may also include input/output devices, network access devices, a bus, and so on.

The processor 601 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and so on. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 602 may be an internal storage unit of the terminal, such as the terminal's hard disk or internal memory. The memory 602 may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card fitted to the terminal. Further, the memory 602 may include both an internal storage unit of the terminal and an external storage device. The memory 602 is used to store the computer program and the other programs and data the terminal needs. The memory 602 may also be used to temporarily store data that has been output or is about to be output.

It is apparent to those skilled in the art that for convenience of description and succinctly, only with above-mentioned each work( Can unit, module division progress for example, in practical application, can be as needed and by above-mentioned function distribution by different Functional unit, module are completed, i.e., the internal structure of described device are divided into different functional units or module, more than completion The all or part of function of description.Each functional unit, module in embodiment can be integrated in a processing unit, also may be used To be that unit is individually physically present, can also two or more units it is integrated in a unit, it is above-mentioned integrated Unit can both be realized in the form of hardware, can also be realized in the form of SFU software functional unit.In addition, each function list Member, the specific name of module are not limited to the protection domain of the application also only to facilitate mutually distinguish.Said system The specific work process of middle unit, module, the corresponding process in preceding method embodiment is may be referred to, will not be repeated here.

In the embodiments above, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. A skilled person may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal and method may be implemented in other ways. For example, the device/terminal embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.

If the integrated module/unit is implemented as a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the flow of the method embodiments above may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

The embodiments above are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of protection of the present invention.

Claims

1. A method for improving speech quality, characterized by comprising:

when a user terminal is in a hands-free call state, obtaining an uplink signal of an uplink channel and a downlink signal of a downlink channel, and judging a call mode of the user terminal according to the uplink signal and the downlink signal;

if the user terminal is in a double-talk call mode, identifying whether the downlink signal is a non-speech signal;

if the downlink signal is identified as a non-speech signal, attenuating the gain of the downlink channel.
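
For illustration only, the control flow of the three steps above might be sketched as follows. The claim does not state how the call mode is judged, so the per-channel energy threshold used here is an assumption, as are all names (`CallMode`, `judge_call_mode`, `next_downlink_gain`), the threshold value, and the attenuation factor of 0.5. The non-speech test is passed in as a caller-supplied predicate.

```python
from enum import Enum
from typing import Callable, Sequence

Frame = Sequence[float]  # one frame of audio samples


class CallMode(Enum):
    SILENCE = "silence"              # neither channel is active
    UPLINK_ONLY = "uplink_only"      # only the local end is active
    DOWNLINK_ONLY = "downlink_only"  # only the opposite end is active
    DOUBLE_TALK = "double_talk"      # both channels are active at once


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of a frame; 0.0 for an empty frame."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def judge_call_mode(uplink: Frame, downlink: Frame,
                    threshold: float = 1e-3) -> CallMode:
    """Classify the call mode from channel energies (assumed criterion)."""
    up_active = frame_energy(uplink) > threshold
    down_active = frame_energy(downlink) > threshold
    if up_active and down_active:
        return CallMode.DOUBLE_TALK
    if up_active:
        return CallMode.UPLINK_ONLY
    if down_active:
        return CallMode.DOWNLINK_ONLY
    return CallMode.SILENCE


def next_downlink_gain(hands_free: bool, uplink: Frame, downlink: Frame,
                       is_non_speech: Callable[[Frame], bool],
                       gain: float = 1.0, attenuation: float = 0.5) -> float:
    """Return the downlink gain to apply after the three claimed steps."""
    if not hands_free:
        return gain  # the method only applies in the hands-free state
    if judge_call_mode(uplink, downlink) is not CallMode.DOUBLE_TALK:
        return gain  # only double-talk triggers the non-speech check
    if is_non_speech(downlink):
        return gain * attenuation  # attenuate the downlink channel gain
    return gain
```

In this sketch the gain is left unchanged unless all three conditions hold: hands-free state, double-talk mode, and a downlink signal identified as non-speech.
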
2. The method of claim 1, characterized in that identifying whether the downlink signal is a non-speech signal comprises:

separating the sound spectra from different sound sources contained in the downlink signal;

judging whether the sound spectra include an opposite-end caller voice spectrum;

if the sound spectra do not include the opposite-end caller voice spectrum, identifying the downlink signal as a non-speech signal.
3. The method of claim 2, characterized in that, before judging whether the sound spectra include the opposite-end caller voice spectrum, the method comprises:

collecting and storing the opposite-end caller voice spectrum.
4. The method of claim 2 or 3, characterized in that judging whether the sound spectra include the opposite-end caller voice spectrum comprises:

matching each sound spectrum against the opposite-end caller voice spectrum;

if any sound spectrum matches the opposite-end caller voice spectrum, judging that the sound spectra include the opposite-end caller voice spectrum;

if no sound spectrum matches the opposite-end caller voice spectrum, judging that the sound spectra do not include the opposite-end caller voice spectrum.
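
As a hedged illustration of the matching in claims 2 to 4, the sketch below compares each separated sound spectrum with a previously stored opposite-end caller voice spectrum. The claims do not name a matching criterion; cosine similarity with a threshold of 0.9 is an assumption made here, and the function names and the representation of a spectrum as a list of magnitudes are hypothetical. Source separation itself is outside the scope of the sketch.

```python
import math
from typing import Sequence

Spectrum = Sequence[float]  # magnitude per frequency bin


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Cosine of the angle between two spectra; 0.0 if either is all zero."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bins")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def includes_caller_voice(spectra: Sequence[Spectrum], stored: Spectrum,
                          threshold: float = 0.9) -> bool:
    """True if at least one separated spectrum matches the stored one."""
    return any(cosine_similarity(s, stored) >= threshold for s in spectra)


def downlink_is_non_speech(spectra: Sequence[Spectrum], stored: Spectrum,
                           threshold: float = 0.9) -> bool:
    """Claim 2: non-speech when no spectrum matches the stored caller voice."""
    return not includes_caller_voice(spectra, stored, threshold)
```

Cosine similarity ignores overall level, so a quieter or louder copy of the stored spectrum still matches; a real implementation might prefer a speaker-verification feature instead of raw magnitudes.
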
5. The method of claim 1, characterized in that identifying whether the downlink signal is a non-speech signal comprises:

separating the sound spectra from different sound sources contained in the downlink signal;

judging whether the sound spectra include a human voice spectrum;

if the sound spectra do not include a human voice spectrum, identifying the downlink signal as a non-speech signal;

if the sound spectra include human voice spectra, judging whether each human voice spectrum is a local-end caller voice spectrum; if every human voice spectrum is judged to be the local-end caller voice spectrum, identifying the downlink signal as a non-speech signal; if any human voice spectrum is judged not to be the local-end caller voice spectrum, identifying that the downlink signal is not a non-speech signal.
6. The method of claim 5, characterized in that, before judging whether each human voice spectrum is the local-end caller voice spectrum, the method comprises:

collecting and storing the local-end caller voice spectrum.
7. The method of claim 5 or 6, characterized in that judging whether each human voice spectrum is the local-end caller voice spectrum comprises:

matching each human voice spectrum against the local-end caller voice spectrum;

if every human voice spectrum matches the local-end caller voice spectrum, identifying the downlink signal as a non-speech signal; if any human voice spectrum fails to match the local-end caller voice spectrum, identifying that the downlink signal is not a non-speech signal.
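
The decision logic of claims 5 to 7 can be sketched, under assumptions, as a small function. The two predicates are hypothetical placeholders supplied by the caller: `is_human_voice` stands for whatever test decides that a separated spectrum is human speech, and `matches_local_caller` stands for the comparison with the stored local-end caller voice spectrum. Neither test is specified in the claims.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # any representation of a separated sound spectrum


def identify_non_speech(spectra: Sequence[S],
                        is_human_voice: Callable[[S], bool],
                        matches_local_caller: Callable[[S], bool]) -> bool:
    """Return True if the downlink signal is identified as non-speech."""
    human = [s for s in spectra if is_human_voice(s)]
    if not human:
        # No human voice spectrum at all: non-speech signal.
        return True
    # Every human voice spectrum is the local-end caller's own voice
    # (that is, echo): non-speech. Any other human voice: not non-speech.
    return all(matches_local_caller(s) for s in human)
```

For example, with spectra labelled by source for readability, a downlink containing only music and the local-end caller's returned voice is treated as non-speech, while one that also contains another speaker is not.
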
8. A device for improving speech quality, characterized by comprising:

a call mode judging unit, configured to, when a user terminal is in a hands-free call state, obtain an uplink signal of an uplink channel and a downlink signal of a downlink channel, and judge a call mode of the user terminal according to the uplink signal and the downlink signal;

a recognition unit, configured to identify whether the downlink signal is a non-speech signal if the user terminal is in a double-talk call mode;

an attenuation unit, configured to attenuate the gain of the downlink channel if the downlink signal is identified as a non-speech signal.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.