CN108053833A - Method, apparatus, electronic device and storage medium for processing voice howling - Google Patents
- Publication number
- CN108053833A CN108053833A CN201711243578.XA CN201711243578A CN108053833A CN 108053833 A CN108053833 A CN 108053833A CN 201711243578 A CN201711243578 A CN 201711243578A CN 108053833 A CN108053833 A CN 108053833A
- Authority
- CN
- China
- Prior art keywords
- signal
- frequency
- voice
- howling
- current frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Abstract
The present application discloses a method, an apparatus, an electronic device and a computer-readable storage medium for processing voice howling. The method includes: detecting whether the current scene is a close-range multi-person hands-free voice chat scene; if so, collecting the voice signal in the chat scene and converting the voice signal of the collected current frame into a frequency-domain signal; extracting features of the frequency-domain signal; determining from the features of the frequency-domain signal whether the voice signal of the current frame is a howling signal; and, if so, removing the voice signal of the current frame. The method avoids the problem of incomplete echo suppression caused by excessive howling, and thus the feedback self-oscillation being amplified, improves double-talk performance, and improves the user experience.
Description
Technical field
The present application relates to the field of speech processing technologies, and in particular to a method, an apparatus, an electronic device and a computer-readable storage medium for processing voice howling.
Background technology
With the development of digital networks, numerous mobile-phone voice intercom systems have appeared: a traditional walkie-talkie can easily be reproduced with phone software. The transmitting phone captures the voice signal through its microphone and transmits it over the data network to the receiving end, which plays the received voice signal through its loudspeaker; this forms a basic voice intercom system. In practice, however, for example during games, several people may talk hands-free to one another indoors or at close range. Because the loudspeaker of the receiving end keeps producing sound that is in turn picked up by the microphone of the transmitting end, the sound loops continuously, self-excitation often occurs, and a howling problem arises, giving the user a poor experience.
In the related art, the close-range howling problem is usually handled by strengthening echo suppression. However, strengthening echo suppression in this way makes double-talk performance very poor in normal background-sound scenes and causes intermittent audio drop-outs, degrading the user experience.
Summary of the invention
The present application aims to solve at least one of the above technical problems to at least some extent.
To this end, a first objective of the present application is to provide a method for processing voice howling. The method avoids the problem of incomplete echo suppression caused by excessive howling, and thus the feedback self-oscillation being amplified, improves double-talk performance, and improves the user experience.
A second objective of the present application is to provide an apparatus for processing voice howling.
A third objective of the present application is to provide an electronic device.
A fourth objective of the present application is to provide a computer-readable storage medium.
To achieve the above objectives, an embodiment of a first aspect of the present application provides a method for processing voice howling, including: detecting whether the current scene is a close-range multi-person hands-free voice chat scene; if so, collecting the voice signal in the chat scene, and converting the voice signal of the collected current frame into a frequency-domain signal; extracting features of the frequency-domain signal; determining from the features of the frequency-domain signal whether the voice signal of the current frame is a howling signal; and, if so, removing the voice signal of the current frame.
To achieve the above objectives, an embodiment of a second aspect of the present application provides an apparatus for processing voice howling, including: a detection module, configured to detect whether the current scene is a close-range multi-person hands-free voice chat scene; a collection module, configured to collect the voice signal in the chat scene when the current scene is detected to be a close-range multi-person hands-free voice chat scene; a signal conversion module, configured to convert the voice signal of the collected current frame into a frequency-domain signal; a feature extraction module, configured to extract features of the frequency-domain signal; a judgment module, configured to determine from the features of the frequency-domain signal whether the voice signal of the current frame is a howling signal; and a removal module, configured to remove the voice signal of the current frame when it is judged to be a howling signal.
To achieve the above objectives, an embodiment of a third aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the method for processing voice howling described in the embodiment of the first aspect of the present application.
To achieve the above objectives, an embodiment of a fourth aspect of the present application provides a non-transitory computer-readable storage medium having a computer program stored thereon, where the program, when executed by a processor, implements the method for processing voice howling described in the embodiment of the first aspect of the present application.
According to the method, apparatus, electronic device and computer-readable storage medium for processing voice howling of the embodiments of the present application, it can be detected whether the current scene is a close-range multi-person hands-free voice chat scene; if so, the voice signal in the chat scene is collected, the voice signal of the collected current frame is converted into a frequency-domain signal, features of the frequency-domain signal are extracted, and it is determined from those features whether the voice signal of the current frame is a howling signal; if so, the voice signal of the current frame is removed. That is, in a close-range multi-person hands-free chat scene there is an excessive amount of echo because of the close-range hands-free use, while background sound and human-voice signals are generally small; when a howling signal is detected from the features of the frequency-domain signal, that signal can be removed, avoiding the problem of incomplete echo suppression caused by excessive howling, and thus the feedback self-oscillation being amplified, and improving the user experience.
Additional aspects and advantages of the present application will be set forth in part in the following description, and in part will become apparent from the description or be learned by practice of the application.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those of ordinary skill in the art may obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for processing voice howling according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an apparatus for processing voice howling according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an apparatus for processing voice howling according to a specific embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed description of the embodiments
Embodiments of the present application are described in detail below; examples of the embodiments are shown in the drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the application, and are not to be construed as limiting the application.
The method, apparatus, electronic device and computer-readable storage medium for processing voice howling of the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for processing voice howling according to an embodiment of the present application. It should be noted that the method of the embodiments of the present application can be applied to the apparatus for processing voice howling of the embodiments of the present application, and the apparatus can be configured in an electronic device. In the embodiments of the present application, the electronic device may be a mobile terminal (for example, a hardware device with an operating system, such as a mobile phone, a tablet computer or a personal digital assistant).
As shown in Fig. 1, the method for processing voice howling may include:
S110: detect whether the current scene is a close-range multi-person hands-free voice chat scene. In the embodiments of the present application, a close-range multi-person hands-free voice chat scene is a scene in which, at close range, several people chat by voice in hands-free mode.
It should be noted that the method for processing voice howling of the embodiments of the present application is suited to close-range multi-person hands-free chat scenes, for example the close-range hands-free scene of a multi-player competitive game. Because hands-free use at close range produces an excessive amount of echo, self-excitation and howling can occur in such a scene.
In this step, it can be detected whether the current scene is a close-range multi-person hands-free voice chat scene; if so, step S120 is performed, that is, howling processing is carried out on the collected voice signal. As an exemplary implementation, a test audio signal can be emitted into the current scene, and the microphone can then be monitored to detect whether, within a certain time window, it receives that test audio after another device has received it and played it back through its loudspeaker; if so, the current scene can be determined to be a close-range multi-person hands-free voice chat scene.
S120: if so, collect the voice signal in the chat scene, and convert the voice signal of the collected current frame into a frequency-domain signal.
Optionally, when the current scene is detected to be a close-range multi-person hands-free voice chat scene, the voice signal in the chat scene can be collected through a microphone, and a discrete Fourier transform, a discrete cosine transform or a modified discrete cosine transform can be used to convert the voice signal of the collected current frame into a frequency-domain signal. In the embodiments of the present application, the current frame is the frame signal obtained for the current moment after the received voice signal has been divided into frames.
S130: extract features of the frequency-domain signal.
As an exemplary implementation, the frequency bins in the frequency-domain signal and the single-frequency energy of each bin can be extracted. Optionally, multiple sampling bins in the frequency-domain signal can be determined according to a preset sampling frequency, where each sampling bin corresponds to one frequency.
For example, the collected voice signal is divided into frames; at the preset sampling frequency, each frame contains multiple sampling bins. Here a bin refers to a specific absolute frequency value: after each frame signal has been sampled at the preset sampling frequency and all the frequencies collected in the frame have been sorted, each bin is the resulting index, so that each sampling bin corresponds to one frequency.
After the bins in the frequency-domain signal have been extracted, the single-frequency energy of each bin can be determined. For example, since each sampling bin corresponds to one specific signal frequency, the single-frequency energy of a sampling bin is simply the energy of the signal frequency corresponding to that bin (that is, the amplitude of the signal frequency corresponding to that bin).
S140: determine from the features of the frequency-domain signal whether the voice signal of the current frame is a howling signal.
Optionally, the bins in the frequency-domain signal and their single-frequency energies can be used to determine whether the voice signal of the current frame is a howling signal. As an exemplary implementation, it can be determined whether the single-frequency energy of a bin in the frequency-domain signal rises exponentially within a preset time period; if so, it can further be determined whether the single-frequency energy value after the exponential rise exceeds a preset threshold; if so, the voice signal of the current frame can be determined to be a howling signal.
That is, the single-frequency energy of the bin within the preset time period can be tracked, and it can be judged whether that energy rises exponentially within the preset time period and exceeds the preset threshold; if so, the voice signal of the current frame can be determined to be a howling signal.
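The two-stage test of S140 (exponential rise, then amplitude threshold) can be sketched as a check over one bin's per-frame energy history. The growth factor and threshold values here are illustrative assumptions, not values specified by the patent:

```python
def is_howling_bin(energies, growth_factor=2.0, threshold=1e4):
    """Heuristic howling test on one bin's energy history: the energy must
    grow at least `growth_factor`-fold frame over frame (a rough proxy for
    an exponential rise) and the latest value must exceed `threshold`."""
    if len(energies) < 2 or energies[-1] <= threshold:
        return False
    return all(curr >= growth_factor * prev and prev > 0
               for prev, curr in zip(energies, energies[1:]))
```

The current frame would be flagged as a howling signal when `is_howling_bin` is true for any bin's history over the preset time period.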
S150: if so, remove the voice signal of the current frame.
Optionally, when the voice signal of the current frame is judged to be a howling signal, that signal can be removed. As an example, the frequency-domain signal can be zeroed out and converted back into the time domain to eliminate the howling signal. In this way, the problem of incomplete echo suppression caused by excessive howling can be avoided.
In conclusion the processing method that the voice of the embodiment of the present application is uttered long and high-pitched sounds, is chatted for closely multi-person speech is hands-free
Its scene, due to closely it is hands-free there are echo amount it is excessive the problem of, in general background sound and human voice signal are smaller, work as detection
The signal risen to single-frequency nergy Index type, and when the signal amplitude is more than certain threshold value, it can be by the target signal filter, to reduce back
The input quantity of sound avoids winding self-excitation, while improves double stress results.
According to the method for processing voice howling of the embodiments of the present application, it can be detected whether the current scene is a close-range multi-person hands-free voice chat scene; if so, the voice signal in the chat scene is collected, the voice signal of the collected current frame is converted into a frequency-domain signal, features of the frequency-domain signal are extracted, and it is determined from those features whether the voice signal of the current frame is a howling signal; if so, the voice signal of the current frame is removed. That is, in a close-range multi-person hands-free chat scene there is an excessive amount of echo while background sound and human-voice signals are generally small; when a howling signal is detected from the features of the frequency-domain signal, that signal can be removed, avoiding the problem of incomplete echo suppression caused by excessive howling, and thus the feedback self-oscillation being amplified, improving double-talk performance and the user experience.
Corresponding to the methods for processing voice howling provided by the above embodiments, an embodiment of the present application further provides an apparatus for processing voice howling. Since the apparatus provided by the embodiments of the present application corresponds to the methods provided by the above embodiments, the embodiments of the method are also applicable to the apparatus provided in this embodiment and are not described in detail here. Fig. 2 is a schematic structural diagram of an apparatus for processing voice howling according to an embodiment of the present application. It should be noted that the apparatus of the embodiments of the present application can be configured in an electronic device. In the embodiments of the present application, the electronic device may be a mobile terminal (for example, a hardware device with an operating system, such as a mobile phone, a tablet computer or a personal digital assistant).
As shown in Fig. 2, the apparatus 200 for processing voice howling may include: a detection module 210, a collection module 220, a signal conversion module 230, a feature extraction module 240, a judgment module 250 and a removal module 260.
Specifically, the detection module 210 is configured to detect whether the current scene is a close-range multi-person hands-free voice chat scene.
The collection module 220 is configured to collect the voice signal in the chat scene when the current scene is detected to be a close-range multi-person hands-free voice chat scene.
The signal conversion module 230 is configured to convert the voice signal of the collected current frame into a frequency-domain signal.
The feature extraction module 240 is configured to extract features of the frequency-domain signal. As an example, the feature extraction module 240 can extract the frequency bins in the frequency-domain signal and the single-frequency energy of each bin.
The judgment module 250 is configured to determine from the features of the frequency-domain signal whether the voice signal of the current frame is a howling signal.
As an example, as shown in Fig. 3, the judgment module 250 may include: a first judging unit 251, a second judging unit 252 and a determining unit 253.
The first judging unit 251 is configured to judge whether the single-frequency energy of a bin in the frequency-domain signal rises exponentially within a preset time period; the second judging unit 252 is configured to judge, when the single-frequency energy of the bin does rise exponentially within the preset time period, whether the single-frequency energy value after the exponential rise exceeds a preset threshold; and the determining unit 253 is configured to determine the voice signal of the current frame to be a howling signal when the single-frequency energy value after the exponential rise exceeds the preset threshold.
The removal module 260 is configured to remove the voice signal of the current frame when it is judged to be a howling signal. As an example, the removal module can zero out the frequency-domain signal and convert it back into the time domain to eliminate the howling signal.
According to the apparatus for processing voice howling of the embodiments of the present application, the detection module can detect whether the current scene is a close-range multi-person hands-free voice chat scene; if so, the collection module collects the voice signal in the chat scene, the signal conversion module converts the voice signal of the collected current frame into a frequency-domain signal, the feature extraction module extracts features of the frequency-domain signal, and the judgment module determines from those features whether the voice signal of the current frame is a howling signal; if so, the removal module removes the voice signal of the current frame. That is, in a close-range multi-person hands-free chat scene there is an excessive amount of echo while background sound and human-voice signals are generally small; when a howling signal is detected from the features of the frequency-domain signal, that signal can be removed, avoiding the problem of incomplete echo suppression caused by excessive howling, and thus the feedback self-oscillation being amplified, improving double-talk performance and the user experience.
To realize the above embodiments, the present application further provides an electronic device.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. It should be noted that, in the embodiments of the present application, the electronic device may be a mobile terminal (for example, a hardware device with an operating system, such as a mobile phone, a tablet computer or a personal digital assistant).
As shown in Fig. 4, the electronic device 400 may include: a memory 410, a processor 420, and a computer program 430 stored on the memory 410 and executable on the processor 420. When the processor 420 executes the program 430, the method for processing voice howling described in any of the above embodiments of the present application is implemented.
To realize the above embodiments, the present application further provides a non-transitory computer-readable storage medium having a computer program stored thereon; when the program is executed by a processor, the method for processing voice howling described in any of the above embodiments of the present application is implemented.
In the description of the present application, it should be understood that the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "multiple" means at least two, for example two or three, unless otherwise specifically defined.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, the schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where there is no mutual contradiction, those skilled in the art may combine different embodiments or examples described in this specification, as well as features of different embodiments or examples.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process, and the scope of the preferred embodiments of the present application includes other implementations, in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, as should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in combination with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wirings, a portable computer disk cartridge (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically by, for example, optically scanning the paper or other medium and then editing, interpreting or, if necessary, processing it in another suitable way, and then stored in a computer memory.
It should be understood that the parts of the present application may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented with any one or a combination of the following technologies known in the art: a discrete logic circuit with logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art can understand that all or some of the steps of the above method embodiments can be completed by a program instructing relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, performs one of, or a combination of, the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or the units may exist physically separately, or two or more units may be integrated into one module. The above integrated module may be implemented either in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and cannot be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present application.
Claims (10)
1. A method for processing voice howling, characterized in that it comprises the following steps:
detecting whether the current scene matches a close-range, hands-free multi-person voice chat scenario;
if so, capturing the voice signal in the chat scenario, and converting the captured voice signal of the current frame into a frequency-domain signal;
extracting features of the frequency-domain signal;
judging, according to the features of the frequency-domain signal, whether the voice signal of the current frame is a howling signal;
if so, removing the voice signal of the current frame.
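For illustration only (this is not part of the claims), the per-frame steps above can be sketched in Python with NumPy. The frame length, Hann window, "at least doubling per frame" rise test, and energy threshold are all assumptions made for the sketch:

```python
import numpy as np

FRAME = 512        # samples per frame (assumed)
THRESHOLD = 1e6    # preset single-frequency energy threshold (assumed)

def process_frame(frame, history):
    """Illustrative pass of the claimed per-frame pipeline (after scene detection)."""
    # Convert the captured time-domain frame into a frequency-domain signal.
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    # Feature extraction: per-frequency-point ("single-frequency") energy.
    energy = np.abs(spectrum) ** 2
    history.append(energy)
    # Judgment: a bin is howling if its energy kept rising steeply over
    # recent frames (an assumed proxy for "rises exponentially") and the
    # latest value exceeds the preset threshold.
    if len(history) >= 3:
        e0, e1, e2 = history[-3], history[-2], history[-1]
        rising = (e1 > 2.0 * e0) & (e2 > 2.0 * e1)
        howling = rising & (e2 > THRESHOLD)
        if howling.any():
            # Removal: zero the offending bins and convert back to the time domain.
            spectrum[howling] = 0.0
            return np.fft.irfft(spectrum, n=len(frame))
    # Not judged as howling: return the (windowed) frame in the time domain.
    return np.fft.irfft(spectrum, n=len(frame))
```

In a real device this loop would run only after the scene detection of the first step succeeds, and the history buffer would be bounded to the preset time period.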
2. The method according to claim 1, characterized in that extracting the features of the frequency-domain signal comprises:
extracting the frequency points in the frequency-domain signal and the single-frequency energy of each frequency point.
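As an illustration, the two features named here (the frequency points and each point's single-frequency energy) can be read off an FFT of the frame; the 16 kHz sampling rate and 512-sample frame length are assumptions:

```python
import numpy as np

FS = 16000   # sampling rate in Hz (assumed)
N = 512      # frame length in samples (assumed)

def bin_energies(frame):
    """Return the frequency points of a frame and each point's single-frequency energy."""
    spectrum = np.fft.rfft(frame * np.hanning(N))   # frequency-domain signal
    freqs = np.fft.rfftfreq(N, d=1.0 / FS)          # frequency point (Hz) of each bin
    energy = np.abs(spectrum) ** 2                  # per-bin ("single-frequency") energy
    return freqs, energy
```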
3. The method according to claim 2, characterized in that judging, according to the features of the frequency-domain signal, whether the voice signal of the current frame is a howling signal comprises:
judging whether the single-frequency energy of a frequency point in the frequency-domain signal rises exponentially within a preset time period;
if so, determining whether the single-frequency energy value after the exponential rise exceeds a preset threshold;
if so, judging the voice signal of the current frame to be a howling signal.
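By way of illustration, this two-stage judgment could be written as follows; the "at least doubling per frame" criterion and the threshold value are assumptions standing in for "rises exponentially" and the preset threshold:

```python
import numpy as np

THRESHOLD = 1e6   # preset single-frequency energy threshold (assumed)

def is_howling(energy_history):
    """Judge a frame from a list of per-bin energy arrays (oldest first).

    Stage 1: does any bin's energy rise steeply over the window
    (here: at least doubling frame over frame, an assumed proxy
    for exponential growth)?
    Stage 2: if so, does its latest value exceed the preset threshold?
    """
    e = np.asarray(energy_history)                     # shape: (frames, bins)
    rising = np.all(e[1:] > 2.0 * e[:-1], axis=0)      # stage 1, per bin
    return bool(np.any(rising & (e[-1] > THRESHOLD)))  # stage 2
```

Requiring both conditions avoids flagging loud but steady speech (fails stage 1) and quiet transients (fail stage 2).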
4. The method according to claim 1, characterized in that removing the voice signal of the current frame comprises:
performing reset processing on the frequency-domain signal and converting it back into the time domain to eliminate the howling signal.
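A minimal sketch of this removal step; the claim does not spell out the "reset processing", so zeroing the flagged bins before the inverse transform is an assumption:

```python
import numpy as np

def remove_howling(frame, howling_bins):
    """Reset (zero) the flagged frequency bins, then convert back to the time domain."""
    spectrum = np.fft.rfft(frame)
    spectrum[howling_bins] = 0.0               # eliminate the howling components
    return np.fft.irfft(spectrum, n=len(frame))
```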
5. An apparatus for processing voice howling, characterized in that it comprises:
a detection module, configured to detect whether the current scene matches a close-range, hands-free multi-person voice chat scenario;
a capture module, configured to capture the voice signal in the chat scenario when the current scene is detected to match that scenario;
a signal conversion module, configured to convert the captured voice signal of the current frame into a frequency-domain signal;
a feature extraction module, configured to extract features of the frequency-domain signal;
a judgment module, configured to judge, according to the features of the frequency-domain signal, whether the voice signal of the current frame is a howling signal;
a removal module, configured to remove the voice signal of the current frame when it is judged to be a howling signal.
6. The apparatus according to claim 5, characterized in that the feature extraction module is specifically configured to:
extract the frequency points in the frequency-domain signal and the single-frequency energy of each frequency point.
7. The apparatus according to claim 6, characterized in that the judgment module comprises:
a first judging unit, configured to judge whether the single-frequency energy of a frequency point in the frequency-domain signal rises exponentially within a preset time period;
a second judging unit, configured to, when the single-frequency energy of the frequency point rises exponentially within the preset time period, judge whether the single-frequency energy value after the exponential rise exceeds a preset threshold;
a determining unit, configured to, when the single-frequency energy value after the exponential rise exceeds the preset threshold, determine the voice signal of the current frame to be a howling signal.
8. The apparatus according to claim 5, characterized in that the removal module is specifically configured to:
perform reset processing on the frequency-domain signal and convert it back into the time domain to eliminate the howling signal.
9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that, when the processor executes the program, the method for processing voice howling according to any one of claims 1 to 4 is implemented.
10. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method for processing voice howling according to any one of claims 1 to 4 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711243578.XA CN108053833A (en) | 2017-11-30 | 2017-11-30 | Method and apparatus for processing voice howling, electronic device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711243578.XA CN108053833A (en) | 2017-11-30 | 2017-11-30 | Method and apparatus for processing voice howling, electronic device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108053833A true CN108053833A (en) | 2018-05-18 |
Family
ID=62121759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711243578.XA Pending CN108053833A (en) | 2017-11-30 | 2017-11-30 | Method and apparatus for processing voice howling, electronic device, and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108053833A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108429858A (en) * | 2018-03-12 | 2018-08-21 | 广东欧珀移动通信有限公司 | Voice communication data processing method, device, storage medium and mobile terminal |
CN109831732A (en) * | 2019-02-25 | 2019-05-31 | 天津大学 | Smartphone-based intelligent howling suppression device and method |
CN112750461A (en) * | 2020-02-26 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Voice communication optimization method and device, electronic equipment and readable storage medium |
WO2021203603A1 (en) * | 2020-04-10 | 2021-10-14 | 南京拓灵智能科技有限公司 | Howling suppression method and apparatus, and electronic device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011135485A (en) * | 2009-12-25 | 2011-07-07 | Audio Technica Corp | Howling suppression apparatus |
CN204790994U (en) * | 2015-07-17 | 2015-11-18 | 廖加斌 | Multi-function display for tuning |
CN105812993A (en) * | 2014-12-29 | 2016-07-27 | 联芯科技有限公司 | Howling detection and suppression method and device |
CN105895115A (en) * | 2016-04-01 | 2016-08-24 | 北京小米移动软件有限公司 | Squeal determining method and squeal determining device |
CN106303878A (en) * | 2015-05-22 | 2017-01-04 | 成都鼎桥通信技术有限公司 | Howling detection and suppression method |
CN106453762A (en) * | 2016-11-02 | 2017-02-22 | 上海数果科技有限公司 | A method and system for processing voice howling in an audio system |
2017-11-30: application CN201711243578.XA filed in China; published as CN108053833A; legal status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108053833A (en) | Method and apparatus for processing voice howling, electronic device, and storage medium | |
CN107910014B (en) | Echo cancellation test method, device and test equipment | |
CN103348730B (en) | Measuring the quality of experience of a voice service | |
CN104980337B (en) | Audio processing performance improvement method and device | |
CN108022591A (en) | Speech recognition processing method and device in an in-vehicle environment, and electronic device | |
CN106911996A (en) | Microphone state detection method and device, and terminal device | |
CN108159702B (en) | Multi-player voice game processing method and device | |
CN109036412A (en) | Voice wake-up method and system | |
MX2008016354A (en) | Detecting an answering machine using speech recognition. | |
CN108305637A (en) | Earphone speech processing method, terminal device and storage medium | |
CN109036393A (en) | Wake-up word training method and device for household appliances, and household appliance | |
CN110113497A (en) | Voice outbound-call method, device and terminal based on voice interaction | |
CN110956976B (en) | Echo cancellation method, device and equipment, and readable storage medium | |
CN110705309B (en) | Service quality evaluation method and system | |
CN106847305A (en) | Method and device for processing recorded data of service calls | |
CN105118522A (en) | Noise detection method and device | |
CN113241085B (en) | Echo cancellation method, device, equipment and readable storage medium | |
CN109785845A (en) | Speech processing method, device and equipment | |
CN109166571A (en) | Wake-up word training method and device for household appliances, and household appliance | |
CN107566663A (en) | Detecting dial tones on a telephone line | |
CN110299144A (en) | Audio mixing method, server and client | |
CN112242135A (en) | Voice data processing method and intelligent customer service device | |
CN114694678A (en) | Sound quality detection model training method, sound quality detection method, electronic device, and medium | |
CN107452398A (en) | Echo acquisition method, electronic device and computer-readable storage medium | |
CN108540680B (en) | Speaking-state switching method and device, and conversation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | |
Address after: No. 18 Wusha Beach Road, Chang'an Town, Dongguan, Guangdong 523860
Applicant after: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.
Address before: No. 18 Wusha Beach Road, Chang'an Town, Dongguan, Guangdong 523860
Applicant before: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.
RJ01 | Rejection of invention patent application after publication | |
Application publication date: 2018-05-18