CN108152788A - Sound-source follow-up method, sound-source follow-up equipment and computer readable storage medium - Google Patents

Info

Publication number
CN108152788A
Authority
CN
China
Prior art keywords
sound
audio signal
zero
source follow
crossing rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711416776.1A
Other languages
Chinese (zh)
Inventor
田拓
来意哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian TCL Software Development Co Ltd
Original Assignee
Xian TCL Software Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian TCL Software Development Co Ltd
Priority to CN201711416776.1A
Publication of CN108152788A

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00 - Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80 - Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/802 - Systems for determining direction or deviation from predetermined direction

Abstract

The invention discloses a sound source tracking method, a sound source tracking device and a computer-readable storage medium. The method includes: obtaining an energy threshold and a zero-crossing rate threshold; detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold; parsing the burst audio signal to obtain the sound source direction information of the burst audio signal; and determining the sound collection direction of the terminal according to the sound source direction information. By applying thresholds to burst audio signals, the invention adds detection of burst speech endpoints, so that the device can react to sudden events and avoid interference from noise sources. This improves the accuracy and real-time performance of voice tracking and speech recognition, reduces the influence of noise, enables direction finding for multiple sound sources, effectively locates and extracts the audio information of a sound source, and greatly improves the working efficiency of the sound source tracking device.

Description

Sound source tracking method, sound source tracking device and computer-readable storage medium
Technical field
The present invention relates to the technical field of sound source tracking, and in particular to a sound source tracking method, a sound source tracking device and a computer-readable storage medium.
Background
At present, in many spatial scenes such as hotel security monitoring, large report venues and news scenes, a microphone array is usually needed for far-field sound pickup in order to track the voice of the speaker in the scene.
However, existing microphone arrays have the following defects: they lack burst speech endpoint detection and so cannot react to sudden events, and they are easily disturbed by noise from other sound sources, which degrades far-field pickup. As a result, the accuracy and real-time performance of the microphone array in locating and tracking speech are reduced to some degree, the array cannot correctly capture the speaker's voice information, and its working efficiency drops significantly.
Summary of the invention
The main object of the present invention is to provide a sound source tracking method, a sound source tracking device and a computer-readable storage medium, intended to solve the technical problem that microphone arrays track and locate with low accuracy and poor real-time performance in far-field pickup.
To achieve the above object, an embodiment of the present invention provides a sound source tracking method applied to a sound source tracking terminal, the sound source tracking method including:
Obtaining an energy threshold and a zero-crossing rate threshold;
Detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold;
Parsing the burst audio signal to obtain the sound source direction information of the burst audio signal;
Determining the sound collection direction of the terminal according to the sound source direction information.
Preferably, the step of detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold includes:
Obtaining a live audio signal and parsing it to obtain the energy value and the zero-crossing rate of the live audio signal;
Setting as burst audio signals those live audio signals whose energy value is greater than the energy threshold and whose zero-crossing rate is greater than the zero-crossing rate threshold.
Preferably, the step of parsing the burst audio signal to obtain the sound source direction information of the burst audio signal includes:
Obtaining, among all burst audio signals, the maximal audio signal with the largest energy value, and determining the signal time delay value of the maximal audio signal according to the energy value;
Obtaining all time-frequency points in the burst audio signal according to the signal time delay value;
Clustering all time-frequency points to obtain the sound source direction information.
Preferably, the step of clustering all time-frequency points includes:
Performing noise reduction on all time-frequency points to obtain noise-reduced time-frequency points;
Clustering all noise-reduced time-frequency points to obtain the sound source direction information.
Preferably, the step of determining the sound collection direction of the terminal according to the sound source direction information includes:
When multiple pieces of sound source direction information are detected, obtaining the beam energy of each piece of sound source direction information;
Determining the direction of the sound source direction information with the largest beam energy as the sound collection direction of the terminal.
Preferably, the step of obtaining the energy threshold and the zero-crossing rate threshold includes:
Collecting sample audio signals within a preset acquisition range under a preset test condition;
Calculating the energy threshold and the zero-crossing rate threshold from the sample audio signals.
In addition, to achieve the above object, the present invention also provides a sound source tracking device, which includes: a memory, a processor, a communication bus, and a sound source tracking program stored in the memory,
The communication bus is used to implement the communication connection between the processor and the memory;
The processor is used to execute the sound source tracking program to implement the following steps:
Obtaining an energy threshold and a zero-crossing rate threshold;
Detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold;
Parsing the burst audio signal to obtain the sound source direction information of the burst audio signal;
Determining the sound collection direction of the terminal according to the sound source direction information.
Preferably, the step of detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold includes:
Obtaining a live audio signal and parsing it to obtain the energy value and the zero-crossing rate of the live audio signal;
Setting as burst audio signals those live audio signals whose energy value is greater than the energy threshold and whose zero-crossing rate is greater than the zero-crossing rate threshold.
Preferably, the step of parsing the burst audio signal to obtain the sound source direction information of the burst audio signal includes:
Obtaining, among all burst audio signals, the maximal audio signal with the largest energy value, and determining the signal time delay value of the maximal audio signal according to the energy value;
Obtaining all time-frequency points in the burst audio signal according to the signal time delay value;
Clustering all time-frequency points to obtain the sound source direction information.
Preferably, the step of clustering all time-frequency points includes:
Performing noise reduction on all time-frequency points to obtain noise-reduced time-frequency points;
Clustering all noise-reduced time-frequency points to obtain the sound source direction information.
Preferably, the step of determining the sound collection direction of the terminal according to the sound source direction information includes:
When multiple pieces of sound source direction information are detected, obtaining the beam energy of each piece of sound source direction information;
Determining the direction of the sound source direction information with the largest beam energy as the sound collection direction of the terminal.
Preferably, the step of obtaining the energy threshold and the zero-crossing rate threshold includes:
Collecting sample audio signals within a preset acquisition range under a preset test condition;
Calculating the energy threshold and the zero-crossing rate threshold from the sample audio signals.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the following:
Obtaining an energy threshold and a zero-crossing rate threshold;
Detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold;
Parsing the burst audio signal to obtain the sound source direction information of the burst audio signal;
Determining the sound collection direction of the terminal according to the sound source direction information.
The present invention obtains an energy threshold and a zero-crossing rate threshold; detects and acquires a burst audio signal according to the energy threshold and the zero-crossing rate threshold; parses the burst audio signal to obtain the sound source direction information of the burst audio signal; and determines the sound collection direction of the terminal according to the sound source direction information. By applying thresholds to burst audio signals, the invention adds detection of burst speech endpoints, so that the device can react to sudden events and avoid interference from noise sources. This improves the accuracy and real-time performance of voice tracking and speech recognition, reduces the influence of noise, enables direction finding for multiple sound sources, effectively locates and extracts the audio information of a sound source, and greatly improves the working efficiency of the sound source tracking device.
Description of the drawings
Fig. 1 is a schematic flowchart of a preferred embodiment of the sound source tracking method of the present invention;
Fig. 2 is a detailed flowchart of step S40 in Fig. 1;
Fig. 3 is a detailed flowchart of step S20 in Fig. 1;
Fig. 4 is a schematic structural diagram of the device in the hardware operating environment involved in the method of the embodiments of the present invention;
Fig. 5 is the near-field spherical wave model of the sound source tracking terminal of the present invention.
The realization of the object, the functional features and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
The present invention provides a sound source tracking method applied to a sound source tracking terminal. In a first embodiment of the sound source tracking method, with reference to Fig. 1, the sound source tracking method includes:
Step S10, obtaining an energy threshold and a zero-crossing rate threshold;
An audio signal has a short-time average energy, which can be used for voiced/unvoiced analysis of a speech signal (the short-time average energy of voiced sound is greater than that of unvoiced sound). It can also be used to distinguish the boundary between initials and finals, the boundary between silence and sound, and so on. The energy threshold is a threshold defined on the short-time average energy; with it, the sound signal can be screened and filtered so that the audio signal subsequently acquired by the sound source tracking terminal is an accurate and clear information signal.
The zero-crossing rate threshold relates to the following: in a discrete-time speech signal, a zero crossing is said to occur when adjacent samples have different algebraic signs, and the number of zero crossings per unit time is called the short-time zero-crossing rate. The short-time zero-crossing rate is the rate at which the sign of a signal changes and is a simple measure of signal frequency. The zero-crossing rate threshold is a threshold defined on the zero-crossing rate; with it, the sound signal can be screened and filtered so that the audio signal subsequently acquired by the sound source tracking terminal is not an invalid, irregular information signal.
It can be understood that recognition using the short-time average energy is more efficient when the background noise is low, and recognition using the zero-crossing rate is more efficient when the background noise is high; in most cases, however, the two parameters are used jointly. Both the energy threshold and the zero-crossing rate threshold are used to judge, according to actual conditions, whether a collected audio signal is acceptable.
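The two per-frame quantities described above can be sketched as follows. This is a minimal illustration, not the terminal's actual implementation: the function names are hypothetical, the energy is taken as a plain sum of squares (a mean would differ only by a constant factor), and a sample equal to zero is treated as non-negative.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose algebraic signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Illustrative frames (made-up values): one loud and slowly varying,
# one quiet with a sign change at every sample.
loud_slow = [0.8, 0.9, 0.7, -0.6, -0.9, -0.8, 0.7, 0.9]
quiet_fast = [0.05, -0.04, 0.06, -0.05, 0.04, -0.06, 0.05, -0.04]

print(short_time_energy(loud_slow), zero_crossing_rate(loud_slow))
print(short_time_energy(quiet_fast), zero_crossing_rate(quiet_fast))
```

The first frame has much higher energy but few sign changes, while the second has low energy and a sign change between every pair of samples, which is why the two measures complement each other.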
The step of obtaining the energy threshold and the zero-crossing rate threshold includes:
Step S11, collecting sample audio signals within a preset acquisition range under a preset test condition;
The energy threshold and the zero-crossing rate threshold may be set manually by the user, preset by the manufacturer of the sound source tracking terminal, or tuned according to actual conditions. Of these, tuning according to actual conditions best guarantees quality. Specifically, the sound source tracking terminal is usually used in important spatial scenes such as security monitoring, large report venues, video conference sites, news scenes and lecture venues, so the requirements on sound source acquisition quality are high and timely adjustments must be made according to the on-site environment.
Different on-site environments have different environmental influence factors, of which the largest and most important is the sound pickup effect. Suppose the sound source tracking terminal is far from the sound source (a speaker, a playback device, etc.); then in the current scene the terminal will inevitably be affected by other sound sources, which degrades the pickup effect. The tuning process therefore mainly simulates the difference between normal pickup and disturbed pickup and adapts accordingly. Before normal sound pickup is carried out, an ideal environment can be set up in advance for best-effect sampling. Specifically, sample audio signals within the preset acquisition range are collected under the preset test condition. The preset test condition may generally be a quiet, interference-free on-site environment; it may also be a slightly noisy environment in which normal pickup is still possible, and so on. Audio signals are sampled within the preset acquisition range under the preset test condition. The preset acquisition range is the pickup range set to guarantee the pickup effect of the sound source tracking terminal. The farther the pickup range of the terminal, the higher the demands on its ability to parse and recognize audio signals and the greater the hardware specification requirements, which increases the size of the terminal. A preset acquisition range therefore needs to be set as the reasonable pickup range of the sound source tracking terminal, which guarantees pickup quality while avoiding excessive demands on the terminal hardware.
Step S12, calculating the energy threshold and the zero-crossing rate threshold from the sample audio signals.
After the sample audio signals are collected, the sound source tracking terminal performs data calculation on them. It should be noted that, to ensure that the energy threshold and the zero-crossing rate threshold are stable and highly reliable as references, the more sample audio signals there are and the higher their density, the more accurate the calculated energy threshold and zero-crossing rate threshold will be.
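One possible way to turn sample audio into the two thresholds is sketched below. The text does not state the aggregation rule, so averaging the per-frame values is purely an assumption made for illustration, and `split_frames` and `calibrate_thresholds` are hypothetical names.

```python
def frame_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def frame_zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    pairs = list(zip(frame, frame[1:]))
    return sum((a >= 0) != (b >= 0) for a, b in pairs) / max(len(pairs), 1)


def split_frames(samples, frame_len):
    """Cut a sample list into non-overlapping frames, dropping any short tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def calibrate_thresholds(samples, frame_len):
    """Assumed rule: average the per-frame energy and zero-crossing rate."""
    frames = split_frames(samples, frame_len)
    if not frames:
        raise ValueError("need at least one full frame of sample audio")
    energy_threshold = sum(frame_energy(f) for f in frames) / len(frames)
    zcr_threshold = sum(frame_zcr(f) for f in frames) / len(frames)
    return energy_threshold, zcr_threshold


# Made-up sample audio collected under the preset test condition.
sample = [0.1, -0.1, 0.2, -0.2, 0.1, 0.1, -0.1, -0.1]
energy_threshold, zcr_threshold = calibrate_thresholds(sample, frame_len=4)
print(energy_threshold, zcr_threshold)
```

Because the result is an average over frames, collecting more sample audio smooths out frame-to-frame variation, which matches the remark that more and denser samples give more reliable thresholds.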
Optionally, a preferred embodiment of obtaining the energy threshold and the zero-crossing rate threshold is as follows:
The sound source tracking terminal converts the analog sample audio signal into a digital signal through a built-in multi-channel synchronous sound acquisition module and passes it to a DSP (digital signal processing) chip, which calculates the short-time energy and the short-time zero-crossing rate of the sample audio signal. Within each frame, n = 1, 2, ..., N is the discrete time index of the audio signal, N is the frame length, and i denotes the frame number. The energy threshold Ei and the zero-crossing rate threshold Zi of each frame of the audio signal are then calculated from the samples of that frame.
That is, Ei and Zi are respectively the energy threshold and the zero-crossing rate threshold of the sample audio signal.
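The expressions for Ei and Zi are not reproduced in this text. As a hedged illustration only, the conventional textbook definitions of short-time energy and short-time zero-crossing rate are given below, writing the n-th sample of frame i as x_i(n); this notation and the absence of a 1/N normalization are assumptions, not taken from the source.

```latex
E_i = \sum_{n=1}^{N} x_i^{2}(n),
\qquad
Z_i = \frac{1}{2} \sum_{n=2}^{N}
      \left| \operatorname{sgn}\bigl(x_i(n)\bigr) - \operatorname{sgn}\bigl(x_i(n-1)\bigr) \right|,
\qquad
\operatorname{sgn}(x) =
\begin{cases}
  1, & x \ge 0 \\
 -1, & x < 0
\end{cases}
```

Each sign change contributes 2 to the sum of absolute differences, so the factor 1/2 makes Z_i count the zero crossings in frame i.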
Step S20, detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold;
After obtaining the energy threshold and the zero-crossing rate threshold, the sound source tracking terminal can use them as accurate reference data to screen and judge subsequent audio signals. With the energy threshold and the zero-crossing rate threshold, the terminal can detect in real time all burst audio signals it currently receives. In practice, burst audio signals are often detected while detecting, identifying and tracking sound sources. For example, if audio suddenly appears in a previously quiet spatial scene, the sound source tracking terminal needs to track and detect the audio signal to determine whether the audio that has appeared is valid information. Whether it is valid information can be determined by checking it against the energy threshold and the zero-crossing rate threshold.
With reference to Fig. 2, the step of detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold includes:
Step S21, obtaining a live audio signal and parsing it to obtain the energy value and the zero-crossing rate of the live audio signal;
The live audio signal is the audio signal obtained by the sound source tracking terminal on site. The terminal mainly collects the on-site audio information, converts it into a live audio signal, and parses the live audio signal to obtain its energy value and zero-crossing rate. As described above, the energy value and the zero-crossing rate are the short-time average energy and the short-time zero-crossing rate of the audio signal.
Step S22, setting as burst audio signals those live audio signals whose energy value is greater than the energy threshold and whose zero-crossing rate is greater than the zero-crossing rate threshold.
After obtaining the energy value and the zero-crossing rate of the live audio signal, the sound source tracking terminal performs threshold detection on both. If the energy value is greater than the energy threshold, the energy value of the current live audio signal meets the standard; if the zero-crossing rate is greater than the zero-crossing rate threshold, the zero-crossing rate of the current live audio signal meets the standard. In this decision process, however, a live audio signal is confirmed as a burst audio signal only when its energy value and its zero-crossing rate both meet the standard; otherwise it is an invalid audio signal. That is, if the energy value of the live audio signal is greater than the energy threshold but the zero-crossing rate is not greater than the zero-crossing rate threshold, or if the zero-crossing rate is greater than the zero-crossing rate threshold but the energy value is not greater than the energy threshold, the sound source tracking terminal treats the live audio signal as an unqualified, invalid audio signal. With the dual criteria of the energy threshold and the zero-crossing rate threshold, the terminal can effectively filter out unclear, invalid and irregular noise and obtain only the burst audio signals that are really needed, avoiding the situation in which the acquired audio signal is unusable.
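The dual-threshold decision of steps S21 and S22 can be sketched as follows. The helper names and the threshold values are hypothetical; the only rule taken from the text is that a frame qualifies as burst audio when both its energy value and its zero-crossing rate exceed their thresholds.

```python
def frame_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def frame_zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    pairs = list(zip(frame, frame[1:]))
    return sum((a >= 0) != (b >= 0) for a, b in pairs) / max(len(pairs), 1)


def is_burst(frame, energy_threshold, zcr_threshold):
    """A frame is burst audio only if BOTH values exceed their thresholds."""
    return (frame_energy(frame) > energy_threshold
            and frame_zcr(frame) > zcr_threshold)


def detect_bursts(frames, energy_threshold, zcr_threshold):
    """Return the indices of the frames accepted as burst audio."""
    return [i for i, frame in enumerate(frames)
            if is_burst(frame, energy_threshold, zcr_threshold)]


frames = [
    [0.01, 0.02, 0.01, 0.02],    # quiet, no sign change: fails both tests
    [0.9, 0.8, 0.9, 0.8],        # loud, no sign change: fails the zero-crossing test
    [0.01, -0.01, 0.01, -0.01],  # quiet, many sign changes: fails the energy test
    [0.9, -0.8, 0.9, -0.8],      # loud, many sign changes: accepted
]
burst_indices = detect_bursts(frames, energy_threshold=0.5, zcr_threshold=0.5)
print(burst_indices)
```

Only the last frame passes, since the two conditions are combined with a logical AND rather than an OR.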
Step S30, parsing the burst audio signal to obtain the sound source direction information of the burst audio signal;
A burst audio signal contains a great deal of useful information, including audio intensity, audio frequency and audio semantics, but all of it can only be obtained by parsing. Parsing the burst audio signal yields a large amount of parsed data, and these data can point specifically to the source of that burst audio signal.
Specifically, with reference to Fig. 3, the parsing is explained below by way of example. The step of parsing the burst audio signal to obtain the sound source direction information of the burst audio signal includes:
Step S31, obtaining, among all burst audio signals, the maximal audio signal with the largest energy value, and determining the signal time delay value of the maximal audio signal according to the energy value;
The burst audio signals acquired by the sound source tracking terminal may be discontinuous, so the energy values in the burst audio signals may also be discontinuous and differ from one another. That is, in the acquired burst audio signals, the frame energy of each frame varies with how the sound source produces sound, and the audio signal with the largest energy value is usually the main sound source information of the current application scene. At a news conference, for example, the speaker's speech is the focus and is the loudest audio signal in the venue (that is, it has the largest energy value). The sound source tracking terminal needs to obtain the maximal audio signal with the largest energy value among all burst audio signals. From the frame energy of each frame of the signal, the terminal can determine the time delay value of the maximal audio signal, because the time delay value is reflected in the trend of the frame energy. In the application scenes of the sound source tracking terminal, the speaker's voice is generally fairly stable, so the energy value of the audio signal is also stable.
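The text does not spell out how the time delay value is computed. As an illustration only, the sketch below substitutes a common technique, cross-correlation between two microphone channels, and converts the resulting delay to an arrival angle under a far-field two-microphone assumption. The function names, the sample rate, the microphone spacing and the speed of sound are all assumed values.

```python
import math


def estimate_delay(reference, delayed, max_lag):
    """Return the lag in samples that maximizes the cross-correlation.

    A positive result means `delayed` lags behind `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(reference):
            m = n + lag
            if 0 <= m < len(delayed):
                score += x * delayed[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle_deg(lag, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Far-field arrival angle from broadside: sin(theta) = c * tau / d."""
    ratio = speed_of_sound * (lag / sample_rate) / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))


pulse = [0.0, 0.0, 1.0, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
late = [0.0] * 3 + pulse[:-3]   # the same pulse arriving 3 samples later
lag = estimate_delay(pulse, late, max_lag=5)
angle = delay_to_angle_deg(lag, sample_rate=48000, mic_spacing=0.1)
print(lag, angle)
```

The brute-force search is quadratic in the window length, which is acceptable for short frames; a production system would more likely use an FFT-based correlation.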
Step S32, obtaining all time-frequency points in the burst audio signal according to the signal time delay value;
Step S33, clustering all time-frequency points to obtain the sound source direction information.
From the signal time delay value, the terminal can determine all time-frequency points in the burst audio signal and hence their exact positions. The terminal can then cluster the time-frequency points to judge the direction from which it detected the burst audio signal, thereby obtaining the sound source direction information of the burst audio signal. Clustering mainly performs statistical processing on the different time-frequency points to determine whether the signal frames at different frame times are valid or clear signal frames; the signal strength in a valid signal frame identifies the direction from which that frame came. By clustering the multiple signal frames in a burst audio signal, the terminal can compute accurate sound source direction information.
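The clustering algorithm is not specified in the text. A minimal sketch, assuming each time-frequency point has already been turned into a direction estimate in degrees, is a fixed-width histogram that keeps only well-populated bins. The function name, the bin width and the `min_share` cutoff are hypothetical choices.

```python
from collections import defaultdict


def cluster_directions(angles_deg, bin_width=10.0, min_share=0.2):
    """Group angle estimates into fixed-width bins and return the mean angle
    of every bin holding at least `min_share` of all points, largest first."""
    bins = defaultdict(list)
    for angle in angles_deg:
        bins[int(angle // bin_width)].append(angle)
    total = len(angles_deg)
    clusters = [
        (len(members), sum(members) / len(members))
        for members in bins.values()
        if total and len(members) / total >= min_share
    ]
    clusters.sort(reverse=True)
    return [mean for _, mean in clusters]


# Made-up per-point direction estimates: two real sources and one stray point.
estimates = [31.0, 33.5, 32.5, 34.0, 32.0, 122.0, 125.0, 123.0, 75.0]
directions = cluster_directions(estimates)
print(directions)
```

A fixed histogram splits a cluster that straddles a bin edge and ignores wrap-around at 0/360 degrees, so this should be read as the simplest possible stand-in for a real clustering step.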
The step of clustering all time-frequency points to obtain the sound source direction information includes:
Step S331, performing noise reduction on all time-frequency points to obtain noise-reduced time-frequency points;
Step S332, clustering all noise-reduced time-frequency points to obtain the sound source direction information.
The time-frequency points may include some stray invalid signal points. To prevent these from interfering with obtaining the sound source direction information, this embodiment performs noise reduction on the time-frequency points to obtain noise-reduced time-frequency points. Stray invalid time-frequency points among all time-frequency points are filtered out, isolated or attenuated, which reduces or eliminates discrete time-frequency points, displays the time-frequency features intuitively, improves the distinguishability of the time-frequency points, and facilitates subsequent processing.
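One simple way to drop stray time-frequency points is sketched below. The representation of a point as a (frame index, frequency bin) pair and the neighbor-count rule are assumptions made for illustration; the text only says that stray invalid points are filtered out, isolated or attenuated.

```python
def remove_isolated_points(points, radius=1, min_neighbors=1):
    """Keep only points that have enough neighbors within `radius` in both
    the time (frame) and frequency (bin) directions."""
    point_set = set(points)
    kept = []
    for t, f in points:
        neighbors = sum(
            1
            for dt in range(-radius, radius + 1)
            for df in range(-radius, radius + 1)
            if (dt, df) != (0, 0) and (t + dt, f + df) in point_set
        )
        if neighbors >= min_neighbors:
            kept.append((t, f))
    return kept


# Made-up points: four form a connected ridge, (7, 40) stands alone.
points = [(0, 5), (1, 5), (1, 6), (2, 6), (7, 40)]
clean = remove_isolated_points(points)
print(clean)
```

Raising `min_neighbors` or lowering `radius` makes the filter stricter, at the cost of possibly discarding the edges of a genuine sound event.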
Step S40, determining the sound collection direction of the terminal according to the sound source direction information.
After obtaining the sound source direction information, the terminal can use it to determine the position range of the sound source and then accurately orient the receiving antenna or acquisition module with which it collects audio signals, so as to filter out noise that may exist in the surroundings and prevent interfering factors from affecting the pickup effect. In this embodiment, the receiving antenna or acquisition module of the sound source tracking terminal may be a rotatable acquisition device; after the sound source direction information is obtained, the acquisition device can be moved so that audio signals are collected more conveniently and effectively. In a debate, for example, the sound source tracking terminal can track the speech of both the affirmative side and the negative side. When it is the affirmative side's turn to speak, the terminal quickly determines the affirmative side's sound source direction information and turns its acquisition device toward that direction; when it is the negative side's turn to speak, the terminal quickly determines the negative side's sound source direction information by analysis and turns its acquisition device toward that direction, thereby acquiring the speaking party's audio signal with high precision.
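For a rotatable acquisition device, the turn needed to face a newly determined direction can be sketched as below. The function name and the convention of headings in degrees are hypothetical; the text does not describe how the rotation is commanded.

```python
def rotation_to_target(current_deg, target_deg):
    """Signed rotation in degrees, in the range [-180, 180), that turns the
    acquisition device from its current heading to the target direction."""
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0


# Made-up headings for the two sides of a debate.
affirmative_deg, negative_deg = 350.0, 10.0
turn_to_negative = rotation_to_target(affirmative_deg, negative_deg)
turn_to_affirmative = rotation_to_target(negative_deg, affirmative_deg)
print(turn_to_negative, turn_to_affirmative)
```

Taking the rotation modulo 360 degrees means the device turns 20 degrees across the 0 degree mark instead of sweeping 340 degrees the long way round.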
The present invention obtains an energy threshold and a zero-crossing rate threshold; detects and acquires a burst audio signal according to the energy threshold and the zero-crossing rate threshold; parses the burst audio signal to obtain the sound source direction information of the burst audio signal; and determines the sound collection direction of the terminal according to the sound source direction information. By applying thresholds to burst audio signals, the invention adds detection of burst speech endpoints, so that the device can react to sudden events and avoid interference from noise sources. This improves the accuracy and real-time performance of voice tracking and speech recognition, reduces the influence of noise, enables direction finding for multiple sound sources, effectively locates and extracts the audio information of a sound source, and greatly improves the working efficiency of the sound source tracking device.
Further, on the basis of sound-source follow-up method first embodiment of the present invention, sound-source follow-up side of the present invention is proposed Method second embodiment, with reference to Fig. 2, with previous embodiment difference lies in, it is described according to sound bearing information, determine the sound of terminal The step of sound acquisition direction, includes:
Step S41 when detecting multi-acoustical azimuth information, obtains the beam energy of each sound bearing information;
The direction of the sound bearing information of beam energy maximum is determined as the sound collection direction of terminal by step S42.
Suppose the terminal obtains multiple pieces of sound bearing information at the same time; each of these bearings is then a candidate for the sound collection direction. For example, at a press conference the sound-source follow-up terminal may detect two or more sound sources simultaneously, such as a speech in Chinese and its English interpretation. In this scene both the Chinese speech and the English interpretation are valid sound sources and, according to the procedure, should both be collected. In this embodiment, however, the English interpretation is a rendering of the Chinese speech, and its volume (i.e. energy value) is slightly lower than that of the original Chinese speech. So that the sound-source follow-up terminal can track the dominant source, the terminal goes through an energy decision process to determine the direction to track.
Specifically, after detecting multiple pieces of sound bearing information, the terminal directly obtains the beam energy of each one. The beam energy is the largest sound source power detected in a piece of sound bearing information. The larger the beam energy, the larger the corresponding energy value, which means that the sound source currently collected in that direction is the main sound source.
Once the beam energies are determined, the terminal can identify the most important sound source in the current scene, and it therefore determines the direction of the sound bearing information with the maximum beam energy as the sound collection direction of the terminal.
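The energy decision in steps S41 and S42 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each detected bearing already has a beamformed signal, and it takes the beam energy to be the mean power of that signal. The function names, the bearings, and the test tones are hypothetical.

```python
import numpy as np

def beam_energy(beam_signal):
    """Mean power of a beamformed signal (assumed definition of beam energy)."""
    beam_signal = np.asarray(beam_signal, dtype=float)
    return float(np.mean(beam_signal ** 2))

def select_collection_direction(beams_by_bearing):
    """Return (bearing, energy) for the beam with the largest energy."""
    if not beams_by_bearing:
        raise ValueError("no sound bearing information detected")
    energies = {b: beam_energy(s) for b, s in beams_by_bearing.items()}
    best = max(energies, key=energies.get)
    return best, energies[best]

# Hypothetical scene: a louder source at 30 degrees, a quieter one at 120 degrees.
t = np.arange(1600) / 16000.0
beams = {
    30.0: 0.8 * np.sin(2 * np.pi * 220 * t),
    120.0: 0.5 * np.sin(2 * np.pi * 330 * t),
}
bearing, energy = select_collection_direction(beams)
```

Here the 30 degree beam wins because its mean power is higher, which mirrors the case of the original speech being louder than its interpretation.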
Optionally, in this embodiment the sound source is far from the sound-source follow-up terminal, so the difference in amplitude attenuation between the terminal's microphones is very small and the amplitudes can be treated as approximately equal; this is the plane wave (far field) model. When the source is close to the sound-source follow-up terminal, the far field model based on a plane wavefront no longer applies, and the more accurate but more complex near field model based on a spherical wavefront must be used. A sound wave is attenuated in amplitude as it propagates, and the amplitude attenuation factor is proportional to the propagation distance (that is, the received amplitude falls as the distance grows). The distance from the source to each array element (microphone) of the sound-source follow-up terminal is different, so the amplitude of the wavefront also differs when it reaches each array element.
The most important difference between the near field model and the far field model is whether the effect of the different attenuation of the signal amplitude received at each array element is taken into account. In the far field model, the differences in distance from the source to the array elements are very small compared with the overall propagation distance and can be neglected. With reference to Fig. 5, which shows the near field spherical wave model of the sound-source follow-up terminal of the present invention, in the near field model the differences in distance from the source to the array elements are large compared with the overall propagation distance, so the amplitude differences between the signals received at the array elements must be considered.
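The difference between the two models can be made concrete with a small numerical sketch. It assumes a free-field spherical wave whose amplitude falls off as 1/r, and a hypothetical four-microphone linear array with 5 cm spacing; neither the geometry nor the distances come from the patent.

```python
import math

def mic_amplitudes(source_xy, mic_positions, ref_amplitude=1.0):
    """Spherical-wave amplitude at each microphone, assuming a 1/r falloff."""
    amps = []
    for mx, my in mic_positions:
        r = math.hypot(source_xy[0] - mx, source_xy[1] - my)
        amps.append(ref_amplitude / r)
    return amps

def amplitude_spread(source_xy, mic_positions):
    """Ratio of largest to smallest amplitude across the array (1.0 means equal)."""
    amps = mic_amplitudes(source_xy, mic_positions)
    return max(amps) / min(amps)

# Hypothetical linear array centred on the origin, 5 cm between neighbours.
mics = [(-0.075, 0.0), (-0.025, 0.0), (0.025, 0.0), (0.075, 0.0)]
near = amplitude_spread((0.20, 0.0), mics)  # source 20 cm from the array centre
far = amplitude_spread((5.00, 0.0), mics)   # source 5 m from the array centre
```

With the source 20 cm away, the nearest microphone receives more than twice the amplitude of the farthest one, so the amplitude differences cannot be ignored. At 5 m the spread is only a few percent, which is why the plane wave approximation is reasonable there.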
With reference to Fig. 4, Fig. 4 is a schematic structural diagram of the device in the hardware operating environment involved in the method of the embodiments of the present invention.
The terminal of the embodiments of the present invention may be a PC, or a terminal device such as a smartphone, tablet computer, e-book reader, MP3 (Moving Picture Experts Group Audio Layer III) player, MP4 (Moving Picture Experts Group Audio Layer IV) player, or portable computer.
As shown in Fig. 4, the sound-source follow-up equipment may include a processor 1001 (for example a CPU), a memory 1005 and a communication bus 1002. The communication bus 1002 implements the connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Optionally, the sound-source follow-up equipment may also include a user interface, a network interface, a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module and so on. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface may also include standard wired and wireless interfaces. The network interface may optionally include standard wired and wireless interfaces (such as a WI-FI interface).
Those skilled in the art will understand that the structure of the sound-source follow-up equipment shown in Fig. 4 does not limit the sound-source follow-up equipment, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.
As shown in Fig. 4, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module and a sound-source follow-up program. The operating system is a program that manages and controls the hardware and software resources of the sound-source follow-up equipment and supports the running of the sound-source follow-up program and other software and/or programs. The network communication module implements communication between the components inside the memory 1005, as well as communication with other hardware and software in the sound-source follow-up equipment.
In the sound-source follow-up equipment shown in Fig. 4, the processor 1001 is used to execute the sound-source follow-up program stored in the memory 1005 to implement the following steps:
Obtain an energy threshold and a zero-crossing rate threshold;
Detect and acquire a burst audio signal according to the energy threshold and the zero-crossing rate threshold;
Parse the burst audio signal to obtain the sound bearing information of the burst audio signal;
Determine the sound collection direction of the terminal according to the sound bearing information.
Further, the step of detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold includes:
Obtain a live audio signal and parse it to obtain the energy value and zero-crossing rate of the live audio signal;
Among all live audio signals, set those whose energy value is greater than the energy threshold and whose zero-crossing rate is greater than the zero-crossing rate threshold as burst audio signals.
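The double-threshold test can be sketched as below. This is an illustration under assumptions: the live audio signal is processed in short overlapping frames, the energy value is the sum of squared samples in a frame, and the zero-crossing rate is the fraction of adjacent sample pairs that change sign. The frame length, hop size, threshold values and function names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing samples are dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames.astype(float) ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def detect_burst_frames(x, energy_threshold, zcr_threshold, frame_len=256, hop=128):
    """Boolean mask of frames that exceed BOTH thresholds."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    return (energy > energy_threshold) & (zcr > zcr_threshold)

# Hypothetical input: silence followed by a 1 kHz tone sampled at 16 kHz.
tone = np.sin(2 * np.pi * 1000 * np.arange(1024) / 16000)
signal = np.concatenate([np.zeros(1024), tone])
burst_mask = detect_burst_frames(signal, energy_threshold=1.0, zcr_threshold=0.05)
```

Frames that lie entirely in the silent half are rejected, and frames that lie entirely in the tone are flagged as burst audio.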
Further, the step of parsing the burst audio signal to obtain the sound bearing information of the burst audio signal includes:
Obtain the maximal audio signal, i.e. the signal with the largest energy value among all burst audio signals, and determine the signal time delay value of the maximal audio signal according to the energy value;
Obtain all time-frequency points in the burst audio signal according to the signal time delay value;
Cluster all time-frequency points to obtain the sound bearing information.
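The patent does not name a clustering algorithm or say how a delay maps to a bearing, so the following is only one plausible sketch. It assumes that each time-frequency point already carries an inter-microphone delay estimate for a two-microphone pair, that the number of sources is known, that a plain 1-D k-means is an acceptable clustering step, and that the far field relation sin(theta) = c * tau / d applies. All names and values are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def kmeans_1d(values, k, n_iter=50):
    """Plain 1-D k-means; centres start at evenly spaced quantiles."""
    values = np.asarray(values, dtype=float)
    centres = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        new_centres = np.array([
            values[labels == j].mean() if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels

def delays_to_bearings(delays, mic_spacing):
    """Far-field bearing in degrees from broadside for each delay in seconds."""
    sin_theta = np.clip(SPEED_OF_SOUND * np.asarray(delays) / mic_spacing, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))

# Hypothetical data: noisy per-point delays from sources at -45 and 30 degrees.
rng = np.random.default_rng(0)
spacing = 0.1  # metres between the two microphones
true_delays = spacing * np.sin(np.radians([-45.0, 30.0])) / SPEED_OF_SOUND
point_delays = np.concatenate([
    d + 5e-6 * rng.standard_normal(200) for d in true_delays
])
centres, labels = kmeans_1d(point_delays, k=2)
bearings = np.sort(delays_to_bearings(centres, spacing))
```

Each cluster centre is an averaged delay for one source, so converting the centres rather than the individual points gives one bearing per source.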
Further, the step of clustering all time-frequency points includes:
Perform noise reduction on all time-frequency points to obtain noise-reduced time-frequency points;
Cluster all noise-reduced time-frequency points to obtain the sound bearing information.
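The noise reduction method is not specified in the text. One simple possibility, shown here purely as an assumption, is to discard time-frequency points whose magnitude is small relative to the strongest point before clustering. The short-time Fourier transform parameters, the relative threshold and the function names are hypothetical.

```python
import numpy as np

def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude of a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len // 2 + 1)

def denoise_time_frequency_points(magnitude, rel_threshold=0.1):
    """Boolean mask keeping points at or above rel_threshold times the peak magnitude."""
    return magnitude >= rel_threshold * magnitude.max()

# Hypothetical input: a 1 kHz tone at 16 kHz sampling with weak white noise.
rng = np.random.default_rng(0)
n = np.arange(4096)
x = np.sin(2 * np.pi * 1000 * n / 16000) + 0.01 * rng.standard_normal(n.size)
magnitude = stft_magnitude(x)
mask = denoise_time_frequency_points(magnitude)
```

Only the points around the tone's frequency bin survive the mask, so the clustering step would operate on far fewer, cleaner points.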
Further, the step of determining the sound collection direction of the terminal according to the sound bearing information includes:
When multiple pieces of sound bearing information are detected, obtain the beam energy of each piece of sound bearing information;
Determine the direction of the sound bearing information with the maximum beam energy as the sound collection direction of the terminal.
Further, the step of obtaining the energy threshold and the zero-crossing rate threshold includes:
Collect sample audio signals within a preset collection range according to preset test conditions;
Perform a calculation on the sample audio signals to obtain the energy threshold and the zero-crossing rate threshold.
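The text does not say which calculation turns the sample audio into thresholds. A common heuristic, used here only as an assumption, is to take the mean plus a multiple of the standard deviation of each per-frame statistic over an ambient recording. The multiplier `k`, the frame parameters and the function names are hypothetical.

```python
import numpy as np

def _frames(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing samples are dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

def estimate_thresholds(sample, frame_len=256, hop=128, k=3.0):
    """Return (energy_threshold, zcr_threshold) as mean + k * std over the frames."""
    frames = _frames(np.asarray(sample, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    signs = np.where(frames >= 0, 1, -1)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    energy_threshold = float(energy.mean() + k * energy.std())
    zcr_threshold = float(min(1.0, zcr.mean() + k * zcr.std()))  # a rate cannot exceed 1
    return energy_threshold, zcr_threshold

# Hypothetical ambient recording: one second of weak white noise at 16 kHz.
rng = np.random.default_rng(0)
ambient = 0.01 * rng.standard_normal(16000)
energy_threshold, zcr_threshold = estimate_thresholds(ambient)
```

In practice the sample would be recorded under the preset test conditions, and the choice of statistic would depend on the ambient noise of the deployment scene.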
The specific embodiments of the sound-source follow-up equipment of the present invention are essentially the same as the embodiments of the sound-source follow-up method described above and are not repeated here.
The present invention also provides a computer readable storage medium. The computer readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to:
Obtain an energy threshold and a zero-crossing rate threshold;
Detect and acquire a burst audio signal according to the energy threshold and the zero-crossing rate threshold;
Parse the burst audio signal to obtain the sound bearing information of the burst audio signal;
Determine the sound collection direction of the terminal according to the sound bearing information.
The specific embodiments of the computer readable storage medium of the present invention are essentially the same as the embodiments of the sound-source follow-up method described above and are not repeated here.
It should be noted that, herein, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
The above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

  1. A sound-source follow-up method, characterized in that the sound-source follow-up method is applied to a sound-source follow-up terminal, and the sound-source follow-up method includes:
    Obtaining an energy threshold and a zero-crossing rate threshold;
    Detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold;
    Parsing the burst audio signal to obtain the sound bearing information of the burst audio signal;
    Determining the sound collection direction of the terminal according to the sound bearing information.
  2. The sound-source follow-up method as described in claim 1, characterized in that the step of detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold includes:
    Obtaining a live audio signal and parsing it to obtain the energy value and zero-crossing rate of the live audio signal;
    Among all live audio signals, setting those whose energy value is greater than the energy threshold and whose zero-crossing rate is greater than the zero-crossing rate threshold as burst audio signals.
  3. The sound-source follow-up method as claimed in claim 2, characterized in that the step of parsing the burst audio signal to obtain the sound bearing information of the burst audio signal includes:
    Obtaining the maximal audio signal, i.e. the signal with the largest energy value among all burst audio signals, and determining the signal time delay value of the maximal audio signal according to the energy value;
    Obtaining all time-frequency points in the burst audio signal according to the signal time delay value;
    Clustering all time-frequency points to obtain the sound bearing information.
  4. The sound-source follow-up method as claimed in claim 3, characterized in that the step of clustering all time-frequency points to obtain the sound bearing information includes:
    Performing noise reduction on all time-frequency points to obtain noise-reduced time-frequency points;
    Clustering all noise-reduced time-frequency points to obtain the sound bearing information.
  5. The sound-source follow-up method as claimed in claim 4, characterized in that the step of determining the sound collection direction of the terminal according to the sound bearing information includes:
    When multiple pieces of sound bearing information are detected, obtaining the beam energy of each piece of sound bearing information;
    Determining the direction of the sound bearing information with the maximum beam energy as the sound collection direction of the terminal.
  6. The sound-source follow-up method as described in claim 1, characterized in that the step of obtaining the energy threshold and the zero-crossing rate threshold includes:
    Collecting sample audio signals within a preset collection range according to preset test conditions;
    Performing a calculation on the sample audio signals to obtain the energy threshold and the zero-crossing rate threshold.
  7. A sound-source follow-up equipment, characterized in that the sound-source follow-up equipment includes a memory, a processor, a communication bus and a sound-source follow-up program stored on the memory, and the sound-source follow-up program, when executed by the processor, implements the following steps:
    Obtaining an energy threshold and a zero-crossing rate threshold;
    Detecting and acquiring a burst audio signal according to the energy threshold and the zero-crossing rate threshold;
    Parsing the burst audio signal to obtain the sound bearing information of the burst audio signal;
    Determining the sound collection direction of the terminal according to the sound bearing information.
  8. The sound-source follow-up equipment as claimed in claim 7, characterized in that the sound-source follow-up program, when executed by the processor, also implements the following steps:
    Obtaining a live audio signal and parsing it to obtain the energy value and zero-crossing rate of the live audio signal;
    Among all live audio signals, setting those whose energy value is greater than the energy threshold and whose zero-crossing rate is greater than the zero-crossing rate threshold as burst audio signals.
  9. The sound-source follow-up equipment as claimed in claim 7, characterized in that the sound-source follow-up program, when executed by the processor, also implements the steps of the sound-source follow-up method as described in any one of claims 3 to 6.
  10. A computer readable storage medium, characterized in that a sound-source follow-up program is stored on the computer readable storage medium, and the sound-source follow-up program, when executed by a processor, implements the steps of the sound-source follow-up method according to any one of claims 1 to 6.
CN201711416776.1A 2017-12-22 2017-12-22 Sound-source follow-up method, sound-source follow-up equipment and computer readable storage medium Pending CN108152788A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711416776.1A CN108152788A (en) 2017-12-22 2017-12-22 Sound-source follow-up method, sound-source follow-up equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711416776.1A CN108152788A (en) 2017-12-22 2017-12-22 Sound-source follow-up method, sound-source follow-up equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN108152788A true CN108152788A (en) 2018-06-12

Family

ID=62465492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711416776.1A Pending CN108152788A (en) 2017-12-22 2017-12-22 Sound-source follow-up method, sound-source follow-up equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN108152788A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102854494A (en) * 2012-08-08 2013-01-02 Tcl集团股份有限公司 Sound source locating method and device
CN105403860A (en) * 2014-08-19 2016-03-16 中国科学院声学研究所 Multi-sparse-sound-source positioning method based on predomination correlation
CN104538041A (en) * 2014-12-11 2015-04-22 深圳市智美达科技有限公司 Method and system for detecting abnormal sounds
CN105467364A (en) * 2015-11-20 2016-04-06 百度在线网络技术(北京)有限公司 Method and apparatus for localizing target sound source
CN106371057A (en) * 2016-09-07 2017-02-01 北京声智科技有限公司 Voice source direction finding method and apparatus
CN106960672A (en) * 2017-03-30 2017-07-18 国家计算机网络与信息安全管理中心 The bandwidth expanding method and device of a kind of stereo audio

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
代勇等: "基于时频域的具有延迟的欠定盲分离", 《四川大学学报 (工程科学版)》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110797045A (en) * 2018-08-01 2020-02-14 北京京东尚科信息技术有限公司 Sound processing method, system, electronic device and computer readable medium
CN109192219A (en) * 2018-09-11 2019-01-11 四川长虹电器股份有限公司 The method for improving microphone array far field pickup based on keyword
CN109192219B (en) * 2018-09-11 2021-12-17 四川长虹电器股份有限公司 Method for improving far-field pickup of microphone array based on keywords
CN109270493A (en) * 2018-10-16 2019-01-25 苏州思必驰信息科技有限公司 Sound localization method and device
CN109709518A (en) * 2018-12-25 2019-05-03 北京猎户星空科技有限公司 Sound localization method, device, smart machine and storage medium
CN109709518B (en) * 2018-12-25 2021-07-20 北京猎户星空科技有限公司 Sound source positioning method and device, intelligent equipment and storage medium
CN110335313A (en) * 2019-06-17 2019-10-15 腾讯科技(深圳)有限公司 Audio collecting device localization method and device, method for distinguishing speek person and system
CN110335313B (en) * 2019-06-17 2022-12-09 腾讯科技(深圳)有限公司 Audio acquisition equipment positioning method and device and speaker identification method and system
CN113542863A (en) * 2020-04-14 2021-10-22 深圳Tcl数字技术有限公司 Sound processing method, storage medium and smart television
CN111640437A (en) * 2020-05-25 2020-09-08 中国科学院空间应用工程与技术中心 Voiceprint recognition method and system based on deep learning
CN112533070A (en) * 2020-11-18 2021-03-19 深圳Tcl新技术有限公司 Video sound and picture adjusting method, terminal and computer readable storage medium
CN112533070B (en) * 2020-11-18 2024-02-06 深圳Tcl新技术有限公司 Video sound and picture adjusting method, terminal and computer readable storage medium
CN113223548A (en) * 2021-05-07 2021-08-06 北京小米移动软件有限公司 Sound source positioning method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180612