CN102625006B

CN102625006B - Method and system for synchronization and alignment of echo cancellation data and audio communication equipment

Info

Publication number: CN102625006B
Application number: CN201110033889XA
Authority: CN
Inventors: 邹连平; 张海东; 陈剑勇
Original assignee: Individual
Current assignee: Chen Jianyong
Priority date: 2011-01-31
Filing date: 2011-01-31
Publication date: 2013-12-04
Anticipated expiration: 2031-01-31
Also published as: CN102625006A

Abstract

The invention, which applies to the field of audio processing, provides a method and system for synchronizing and aligning echo cancellation data, and an audio communication device. The method comprises the following steps: audio data played by an audio playback device is written into a preset echo reference queue, and the number of consecutive times that the frame count of the audio data in the echo reference queue is less than or equal to a preset minimum frame count, or greater than or equal to a preset maximum frame count, is obtained; the frame count of the current audio data in the echo reference queue is dynamically adjusted according to the preset average frame count of the audio data in the queue; and the audio data in the echo reference queue, together with the audio data containing the echo to be cancelled, which is captured by an audio capture device, is sent to an echo cancellation module. By dynamically adjusting the audio data in the echo reference queue, the invention synchronizes and aligns the audio data containing the echo to be cancelled with the reference audio data before echo cancellation, thereby reducing the probability that the echo cancellation module compares mismatched audio data and improving both the efficiency of echo cancellation and the quality of the voice call.

Description

Method and system for synchronous alignment of echo cancellation data, and audio communication device

Technical field

The invention belongs to the field of audio processing, and in particular relates to a method and system for synchronous alignment of echo cancellation data, and to an audio communication device.

Background art

In recent years, with the development of network technology, Voice over Internet Protocol (VoIP) telephony has overcome its earlier bottlenecks and won users over with its low price and high call quality. However, when a call terminal plays the other party's voice through speakers or a loudspeaker, the sound is reflected by the surrounding environment and picked up by the microphone, forming an acoustic echo and degrading call quality.

The echo cancellation module is therefore indispensable in an audio session: if echo cancellation performs poorly, call quality suffers badly, and whether the data fed into the echo cancellation module is synchronized determines whether echo cancellation can work at all. The prior art, however, merely buffers the played audio data and the captured audio data in a fairly simple way. It cannot synchronize and align the buffered data in a targeted, reasonably accurate manner; the data is passed straight to the echo cancellation module, which wastes considerable CPU resources and results in poor and unstable echo cancellation.

Summary of the invention

An object of the embodiments of the present invention is to provide a method for synchronous alignment of echo cancellation data, intended to solve the following problem: because the prior art merely buffers the played audio data and the captured audio data in a fairly simple way, the buffered data cannot be synchronized and aligned in a targeted, reasonably accurate manner, which results in poor and unstable echo cancellation.

The embodiments of the present invention are achieved as follows: a method for synchronous alignment of echo cancellation data, the method comprising the following steps:

Initializing a preset echo reference queue according to the minimum delay, maximum delay and average delay of echo formation obtained in advance;

Writing the audio data played by an audio playback device into the preset echo reference queue, and obtaining the number of consecutive times that the frame count of the audio data in the echo reference queue is less than or equal to a preset minimum frame count or greater than or equal to a preset maximum frame count;

When the number of times exceeds a preset number, dynamically adjusting the frame count of the current audio data in the echo reference queue according to the preset average frame count of the audio data in the echo reference queue;

Sending the audio data in the echo reference queue and the audio data containing the echo to be cancelled, captured by an audio capture device, to an echo cancellation module;

The step of dynamically adjusting the frame count of the current audio data in the echo reference queue according to the preset average frame count, when the number of times exceeds the preset number, comprises the following steps:

When the number of consecutive times that the frame count of the audio data in the echo reference queue is less than or equal to the minimum frame count exceeds the preset number, adding empty frames to the echo reference queue so that the frame count of the current audio data in the queue equals the preset average frame count;

When the number of consecutive times that the frame count of the audio data in the echo reference queue is greater than or equal to the maximum frame count exceeds the preset number, removing audio data frames from the echo reference queue so that the frame count of the current audio data in the queue equals the preset maximum frame count.

Another object of the embodiments of the present invention is to provide a system for synchronous alignment of echo cancellation data, the system comprising:

An echo reference queue initialization unit, configured to initialize the preset echo reference queue according to the minimum delay, maximum delay and average delay of echo formation obtained in advance;

A count information acquisition unit, configured to write the audio data played by the audio playback device into the preset echo reference queue, and to obtain the number of consecutive times that the frame count of the audio data in the echo reference queue is less than or equal to the preset minimum frame count or greater than or equal to the preset maximum frame count;

An echo reference queue adjustment unit, configured to dynamically adjust the frame count of the current audio data in the echo reference queue according to the preset average frame count of the audio data in the queue, when the number of times obtained by the count information acquisition unit exceeds the preset number; and

An audio data sending unit, configured to send the audio data in the echo reference queue and the audio data frames containing the echo to be cancelled, captured by the audio capture device, to the echo cancellation module;

The echo reference queue adjustment unit comprises:

A minimum frame adjustment unit, configured to add empty frames to the echo reference queue, so that the frame count of the current audio data in the queue equals the preset average frame count, when the number of consecutive times that the frame count of the audio data in the queue is less than or equal to the minimum frame count exceeds the preset number; and

A maximum frame adjustment unit, configured to remove audio data frames from the echo reference queue, so that the frame count of the current audio data in the queue equals the preset maximum frame count, when the number of consecutive times that the frame count of the audio data in the queue is greater than or equal to the maximum frame count exceeds the preset number.

Another object of the embodiments of the present invention is to provide an audio communication device comprising the above system for synchronous alignment of echo cancellation data.

In the embodiments of the present invention, when the number of consecutive times that the frame count of the audio data in the echo reference queue is less than or equal to the preset minimum frame count, or greater than or equal to the preset maximum frame count, exceeds the preset number, the frame count of the current audio data in the echo reference queue is dynamically adjusted according to the preset average frame count of the audio data frames in the queue. The audio data in the echo reference queue and the audio data containing the echo to be cancelled, captured by the audio capture device, are then sent to the echo cancellation module. The audio data containing the echo to be cancelled is thus synchronized and aligned with the reference audio data before echo cancellation, which reduces mismatched comparisons of audio data in the echo cancellation module and improves both the efficiency of echo cancellation and the quality of the voice call.

Brief description of the drawings

Fig. 1 is a flowchart of the method for synchronous alignment of echo cancellation data provided by the first embodiment of the present invention;

Fig. 2 is a flowchart of the method for synchronous alignment of echo cancellation data provided by the second embodiment of the present invention;

Fig. 3 is a diagram of a specific example of the synchronous alignment of echo cancellation data provided by the second embodiment of the present invention;

Fig. 4 is a structural diagram of the system for synchronous alignment of echo cancellation data provided by the third embodiment of the present invention;

Fig. 5 is a structural diagram of the system for synchronous alignment of echo cancellation data provided by the third embodiment of the present invention;

Fig. 6 is a structural diagram of the system for synchronous alignment of echo cancellation data provided by the fourth embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

The embodiments of the present invention provide a method for synchronous alignment of echo cancellation data, the method comprising the following steps:

Sending the audio data in the echo reference queue and the audio data containing the echo to be cancelled, captured by the audio capture device, to the echo cancellation module.

The embodiments of the present invention also provide a system for synchronous alignment of echo cancellation data, the system comprising:

An audio data sending unit, configured to send the audio data in the echo reference queue and the audio data frames containing the echo to be cancelled, captured by the audio capture device, to the echo cancellation module.

The embodiments of the present invention also provide an audio communication device comprising the above system for synchronous alignment of echo cancellation data.

The specific implementation of the present invention is described in detail below with reference to specific embodiments:

Embodiment 1:

Computer sound cards come in many varieties and differ in the degree of clock drift; in addition, network communication delay is non-deterministic, different environments reflect sound differently, and Windows itself is a multitasking, non-real-time operating system. As a result, it is difficult for the echo cancellation module to capture input audio data in real time, to synchronize and align the audio data played by the audio playback device with the audio data captured by the audio capture device, and to cancel the echo.

In the embodiments of the present invention, the number of consecutive times that the frame count of the audio data in the echo reference queue (that is, the audio data frames played by the audio playback device) is less than or equal to the preset minimum frame count or greater than or equal to the preset maximum frame count is obtained; this is the number of times the queue is out of sync with the audio data captured by the audio capture device. The frame count of the current audio data in the echo reference queue is then dynamically adjusted according to the preset average frame count of the audio data in the queue, so that the audio data played by the audio playback device is synchronized and aligned with the audio data captured by the audio capture device, in preparation for echo cancellation.

Fig. 1 shows the implementation flow of the method for synchronous alignment of echo cancellation data provided by the first embodiment of the present invention, detailed as follows:

In step S101, the audio data played by the audio playback device is written into the preset echo reference queue, and the number of consecutive times that the frame count of the audio data in the echo reference queue is less than or equal to the preset minimum frame count or greater than or equal to the preset maximum frame count is obtained.

In the embodiments of the present invention, the operating system calls a first thread (the playback thread) to play audio data through the audio playback device and a second thread (the capture thread) to capture audio data through the audio capture device. The two threads are independent of each other. At the same time, the audio data played by the audio playback device is saved as the reference audio data for echo cancellation; the audio data is organized in frames. In a specific implementation, a queue data structure is used to store the audio data: audio data is written at the tail of the queue and read from the head. Specifically, the echo reference queue stores the audio data played by the audio playback device. When the frame count of the audio data in the echo reference queue is repeatedly less than or equal to the preset minimum frame count, the capture thread is being scheduled frequently by the operating system while the playback thread has gone unscheduled for a relatively long time. When the frame count is repeatedly greater than or equal to the preset maximum frame count, the playback thread is being scheduled frequently while the capture thread has gone unscheduled for a relatively long time. In either case, the frame count of the audio data in the echo reference queue needs to be adjusted.

The echo reference queue should therefore be initialized in advance, for example by setting its minimum frame count, maximum frame count and average frame count, and by setting the preset number of consecutive times that the frame count of the audio data in the queue may be less than or equal to the preset minimum frame count, and the preset number of consecutive times that it may be greater than or equal to the preset maximum frame count.

In step S102, when the number of times exceeds the preset number, the frame count of the current audio data in the echo reference queue is dynamically adjusted according to the preset average frame count of the audio data in the queue.

In the embodiments of the present invention, a counter may be set up to count the number of consecutive times that the frame count of the audio data in the echo reference queue is less than or equal to the preset minimum frame count. When this number exceeds the preset number, empty frames are inserted at the tail of the echo reference queue so that the frame count of the audio data in the queue reaches the preset average frame count, and the counter is reset to zero and counting restarts.

In addition, a counter may be set up to count the number of consecutive times that the frame count of the audio data in the echo reference queue is greater than or equal to the preset maximum frame count. When this number exceeds the preset number, audio data frames at the head of the echo reference queue are removed from the queue so that the frame count of the audio data frames in the queue equals the preset maximum frame count, and the counter is reset to zero and counting restarts. In a specific implementation, when the frame count of the audio data in the echo reference queue lies between the preset minimum and maximum frame counts, no adjustment is needed: sending the audio data frames in the echo reference queue, together with the audio data frames containing the echo to be cancelled captured by the audio capture device, to the echo cancellation module continuously consumes the audio data frames in the queue, so the audio data frames in the echo reference queue adjust automatically.
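The counting and adjustment described in steps S101 and S102 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `EchoReferenceQueue`, its field names, the default `frame_size`, and the choice to check the bounds on every write are assumptions made for illustration, as is the representation of an empty frame as a list of zeros.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EchoReferenceQueue:
    """Illustrative echo reference queue; all names here are hypothetical."""
    min_frames: int          # preset minimum frame count
    max_frames: int          # preset maximum frame count
    avg_frames: int          # preset average frame count
    max_run: int             # preset number of consecutive occurrences
    frame_size: int = 160    # samples per frame (assumed value)
    frames: deque = field(default_factory=deque)
    low_run: int = 0         # consecutive observations at or below the minimum
    high_run: int = 0        # consecutive observations at or above the maximum

    def write(self, frame):
        """Playback side: append a played frame at the tail, then check bounds."""
        self.frames.append(frame)
        self._check()

    def read(self):
        """Capture side: take the oldest reference frame from the head, if any."""
        return self.frames.popleft() if self.frames else None

    def _check(self):
        n = len(self.frames)
        # An in-range observation breaks the run, so only consecutive hits count.
        self.low_run = self.low_run + 1 if n <= self.min_frames else 0
        self.high_run = self.high_run + 1 if n >= self.max_frames else 0
        if self.low_run > self.max_run:
            # Under-run: pad the tail with empty frames up to the average count.
            while len(self.frames) < self.avg_frames:
                self.frames.append([0] * self.frame_size)
            self.low_run = 0
        elif self.high_run > self.max_run:
            # Over-run: drop the oldest frames from the head down to the maximum.
            while len(self.frames) > self.max_frames:
                self.frames.popleft()
            self.high_run = 0
```

In this sketch the two counters are kept separately, mirroring the two counters in the text; a real implementation would also need locking, because the playback and capture threads access the queue concurrently.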

In step S103, the audio data in the echo reference queue and the audio data containing the echo to be cancelled, captured by the audio capture device, are sent to the echo cancellation module.

In the embodiments of the present invention, once the audio data frames in the echo reference queue have been adjusted in step S102, the audio data in the echo reference queue is synchronized and aligned with the audio data containing the echo to be cancelled captured by the audio capture device, and the synchronized and aligned audio data frames from the echo reference queue are sent to the echo cancellation module together with the captured audio data frames containing the echo to be cancelled. In a specific implementation, the echo cancellation module may use an echo cancellation algorithm such as acoustic echo cancellation (AEC) to cancel the echo. In the embodiments of the present invention, before step S101, the delay between the audio data containing the echo to be cancelled, captured by the audio capture device, and the audio data played by the audio playback device needs to be estimated accurately, so that the parameters of the echo reference queue can be initialized and a basis is provided for the synchronous alignment of the audio data.
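Step S103 amounts to pairing each captured frame with the oldest frame in the echo reference queue and handing both to the echo cancellation module. The following Python sketch illustrates that pairing only. The function names are hypothetical, `toy_canceller` is a stand-in and not a real AEC algorithm, and the silent fallback for an empty queue is an assumption that the source does not specify.

```python
from collections import deque


def process_captured_frame(captured, reference_queue, cancel_echo):
    """Pair one captured frame with the oldest reference frame and cancel the echo."""
    if reference_queue:
        reference = reference_queue.popleft()   # read from the head of the queue
    else:
        reference = [0] * len(captured)         # assumed fallback: silent reference
    return cancel_echo(captured, reference)


def toy_canceller(captured, reference):
    """Placeholder for a real AEC: subtracts the reference sample by sample."""
    return [c - r for c, r in zip(captured, reference)]
```

Passing the canceller in as a callable keeps the pairing logic independent of whichever echo cancellation algorithm is actually used.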

Embodiment 2:

During echo cancellation, two audio signals must be passed to the echo cancellation module: the audio data containing the echo to be cancelled, captured by the audio capture device, and the audio data played by the audio playback device. The two must be well synchronized to obtain good echo cancellation. Synchronization here means that, although there is a delay between the two signals, the delay must be relatively fixed and the signals must remain coherent in time. If the two audio signals fed into the echo cancellation module are not synchronized, their frames become misaligned and echo cancellation cannot be performed.

In the embodiments of the present invention, the delay between the audio data containing the echo to be cancelled, captured by the audio capture device, and the audio data played by the audio playback device is accurately estimated, and the audio data frames input to the echo cancellation module are adjusted accordingly, achieving synchronous alignment between the two and improving the efficiency of echo cancellation.

Fig. 2 shows the implementation flow of the method for synchronous alignment of echo cancellation data provided by the second embodiment of the present invention, detailed as follows:

In the embodiments of the present invention, to estimate accurately the delay between the audio data containing the echo to be cancelled, captured by the audio capture device, and the audio data played by the audio playback device, the delay with which audio data played by the audio playback device is captured by the audio capture device may be calculated in advance, before the voice call. In a specific implementation, sample audio data is set in advance and played in a loop by the audio playback device, and the played audio data is captured by the audio capture device, as shown in Fig. 3.

In step S201, preset audio data is played in a loop by the audio playback device.

In step S202, it is determined whether the number of times the audio playback device has played the audio data exceeds a preset value; if so, step S206 is executed; otherwise step S203 is executed.

In the embodiments of the present invention, to keep voice communication responsive, the amount of preset audio data played should not be too large, and the preset value, that is, the preset number of times the audio playback device plays the audio data, should not be too large either, which reduces the complexity of the delay calculation.

In step S203, the audio data played this time by the audio playback device is captured synchronously by the audio capture device, and the captured audio data is output.

In the embodiments of the present invention, when the number of times the audio playback device has played the audio data does not exceed the preset value, the audio data played this time by the audio playback device is captured synchronously by the audio capture device, and the captured audio data is output.

In step S204, the autocorrelation function values between the audio data frame in the audio data played this time by the audio playback device and the audio data frames within a preset range of the captured audio data are calculated.

In step S205, the maximum of the autocorrelation function values is obtained and saved, together with the delay information of the captured audio data frame at which the maximum occurs.

An autocorrelation function characterizes the degree of correlation between the states of a random process at two different times T₁ and T₂. In the embodiments of the present invention, the autocorrelation function values between the audio data frame in the audio data played this time by the audio playback device and the audio data frames within a preset range of the captured audio data are calculated. When the autocorrelation function value reaches its maximum, the time information of the corresponding audio data frame in the captured audio data gives the time interval after which the audio data played this time is captured by the audio capture device, that is, the delay of echo formation, which is obtained and saved.

In step S206, the minimum delay, maximum delay and average delay are obtained from all the saved delay information.

In the embodiments of the present invention, when the number of times the audio playback device has played the audio data exceeds the preset value, the minimum delay, maximum delay and average delay are obtained from the saved delay information. The average delay characterizes the average time from each frame of the played audio data being played to being captured again, and thus provides a basis for the subsequent echo cancellation.
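The loop formed by steps S201 to S206 can be sketched as follows. This is only an outline of the control flow under stated assumptions: `play_and_capture` and `estimate_delay` are hypothetical callables standing in for the device I/O and for the correlation-based delay estimate, neither of which is defined by name in the source.

```python
def calibrate(play_and_capture, estimate_delay, rounds):
    """Run `rounds` play/capture cycles and return (min, max, average) delay."""
    if rounds <= 0:
        raise ValueError("rounds must be positive")
    delays = []
    for _ in range(rounds):                      # S201/S202: loop up to the preset count
        played, captured = play_and_capture()    # S203: capture what was just played
        delays.append(estimate_delay(played, captured))  # S204/S205: save this delay
    # S206: summarize all saved delays
    return min(delays), max(delays), sum(delays) / len(delays)
```

Because the device access and the estimator are injected, the loop can be exercised with recorded or synthetic data before being connected to real audio hardware.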

In an embodiment of the present invention, the preset sample audio data to be played may be in an audio format such as pcm, avi or wav. Its duration is t₁ seconds, and it is played once every t₂ seconds, n times in a loop. An independent playback thread controls its playback, and an independent capture thread captures the sound played back; that is, every t₂ seconds, sample audio data of duration t₁ seconds is played while t₂ seconds of echo audio data of the sample audio data is captured. Let Count denote the number of sample points of the sample audio data used in each calculation of the autocorrelation function value. To speed up the calculation of the autocorrelation function value and improve the accuracy of the echo delay information, the sampling point with the largest energy value may be located in the sample audio data, and a data frame of Count sample points in total, centered on this sampling point, is taken as the sample frame. At the same time, starting from a preset start position in the captured t₂ seconds of echo audio data, a data frame containing Count sampling points of the echo audio data is taken as the echo frame. The echo frame is convolved with the sample frame to calculate their autocorrelation function value, using the following formula:

R_{xy} = \int_{t_s}^{t_e} x(t)\, y(\tau - t)\, dt \qquad (1)

where t_s is the start time of the sample frame, t_e is the end time of the sample frame, t is the time corresponding to each sampling point, x(t) is the energy value of sampling point t in the sample frame, τ is the offset of the sampling point in the echo frame that is convolved with x(t), and y(τ - t) is the energy value of the sampling point in the echo frame that is convolved with x(t). The autocorrelation function value may also be normalized directly by the following formula to obtain the normalized autocorrelation function value:

\rho_{xy} = \frac{\int_{t_s}^{t_e} x(t)\, y(\tau - t)\, dt}{\left[ \int_{t_s}^{t_e} x^{2}(t)\, dt \int_{t_s}^{t_e} y^{2}(t)\, dt \right]^{1/2}} \qquad (2)

The echo frame is then moved back by one sample point to form a new audio data frame, and its autocorrelation function value with the sample frame is calculated again according to formula (2). This process is repeated, saving each calculated autocorrelation function value, until the preset end position is reached. In the embodiments of the present invention, the preset start and end positions may be set according to the shortest and longest times that the played audio data takes to travel from the audio playback device, via reflection from the environment, to the audio capture device.
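The sample-frame selection and the sliding search can be sketched in Python as below. Several points are assumptions rather than statements of the source: the function names are hypothetical, the energy of a sample is taken to be its squared amplitude, and each window is scored by a sample-by-sample product normalized as in formula (2), following the sliding procedure in the text rather than reproducing the index convention y(τ - t) of formula (1). The quantity the text calls an autocorrelation function value is computed between two different signals, which is more commonly called a cross-correlation.

```python
import math


def peak_frame(samples, count):
    """Return (start, frame): `count` samples centred on the highest-energy sample."""
    if count <= 0 or count > len(samples):
        raise ValueError("count must be between 1 and len(samples)")
    # Energy is assumed to be the squared amplitude of each sample.
    peak = max(range(len(samples)), key=lambda i: samples[i] * samples[i])
    # Clamp the window so it stays inside the signal.
    start = min(max(peak - count // 2, 0), len(samples) - count)
    return start, samples[start:start + count]


def find_best_offset(sample_frame, captured, start, end):
    """Slide over offsets start..end and return (offset, rho) of the best match."""
    count = len(sample_frame)
    energy_x = sum(v * v for v in sample_frame)
    best_offset, best_rho = None, float("-inf")
    last = min(end, len(captured) - count)
    for offset in range(start, last + 1):       # move the echo frame one sample at a time
        echo_frame = captured[offset:offset + count]
        energy_y = sum(v * v for v in echo_frame)
        denom = math.sqrt(energy_x * energy_y)
        if denom == 0.0:
            continue                            # skip silent windows (undefined ratio)
        rho = sum(a * b for a, b in zip(sample_frame, echo_frame)) / denom
        if rho > best_rho:
            best_offset, best_rho = offset, rho
    return best_offset, best_rho
```

The returned offset is in samples; dividing it by the sampling rate gives the delay in seconds. For long signals an FFT-based correlation would be faster, but the direct form above matches the step-by-step description more closely.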

Finally, the maximum of all the saved autocorrelation function values is obtained, together with the delay information, that is, the offset, of the captured audio data frame (the echo frame) at which the maximum occurs. This gives the time interval after which the audio data played this time is captured again, that is, the delay of echo formation. After the sample audio data has been played n times in a loop, the delays saved for all the plays are obtained, yielding the minimum delay, maximum delay and average delay, which provide the basis for the synchronous alignment of echo cancellation data. During synchronous alignment of echo cancellation data, the preset minimum, maximum and average frame counts of the echo reference queue are respectively: minimum frame count = minimum delay of echo formation / duration of an audio data frame; maximum frame count = maximum delay of echo formation / duration of an audio data frame; average frame count = average delay of echo formation / duration of an audio data frame, where the duration of an audio data frame is the frame length used in each calculation of the autocorrelation function value.
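The three ratios above can be computed as in the following short sketch. The function name `frame_counts` and the millisecond units are assumptions, and so is the rounding to the nearest whole frame, since the source does not say how fractional results are handled.

```python
def frame_counts(delays_ms, frame_ms):
    """Return (min_frames, max_frames, avg_frames) for the echo reference queue."""
    if not delays_ms or frame_ms <= 0:
        raise ValueError("need at least one delay and a positive frame duration")
    average = sum(delays_ms) / len(delays_ms)

    def to_frames(delay_ms):
        # Round half up to a whole number of frames (assumed rounding rule).
        return int(delay_ms / frame_ms + 0.5)

    return to_frames(min(delays_ms)), to_frames(max(delays_ms)), to_frames(average)
```

For example, measured delays of 40, 60 and 80 ms with 20 ms frames give a minimum of 2 frames, a maximum of 4 frames and an average of 3 frames.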

A person of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disc.

Embodiment 3:

Fig. 4 shows the structure of the system for synchronous alignment of echo cancellation data provided by the third embodiment of the present invention; for ease of description, only the parts relevant to the embodiments of the present invention are shown.

The system for synchronous alignment of echo cancellation data may be a software unit running on an audio communication device, or may be integrated as an independent plug-in into such audio communication devices or run within their application systems, wherein:

The echo reference queue initialization unit 41 initializes the preset echo reference queue according to the minimum delay, maximum delay and average delay of echo formation obtained in advance.

In the embodiments of the present invention, the echo reference queue should be initialized in advance, for example by setting its minimum frame count, maximum frame count and average frame count, and by setting the preset number of consecutive times that the frame count of the audio data in the queue may be less than or equal to the preset minimum frame count, and the preset number of consecutive times that it may be greater than or equal to the preset maximum frame count.

The count information acquisition unit 42 writes the audio data played by the audio playback device into the preset echo reference queue, and obtains the number of consecutive times that the frame count of the audio data in the echo reference queue is less than or equal to the preset minimum frame count or greater than or equal to the preset maximum frame count.

When the number of times obtained by the count information acquisition unit 42 exceeds the preset number, the echo reference queue adjustment unit 43 dynamically adjusts the frame count of the current audio data in the echo reference queue according to the preset average frame count of the audio data in the queue.

In addition, can the number of times of default maximum frame number appear being more than or equal to continuously by counters count echo is set with reference to the frame number of queue sound intermediate frequency data, when the number of times that default maximum frame number occurs continuously being more than or equal to reference to the frame number of queue sound intermediate frequency data when echo surpasses default number of times, echo is shifted out to queue with reference to some audio data frames of queue squadron head, making echo is default maximum frame number with reference to the frame number of queue sound intermediate frequency Frame, and, by calculator zero setting, restart counting.In specific implementation process, when echo with reference to the frame number of queue sound intermediate frequency data between default minimum frame number and maximum frame number the time, need not be adjusted it, by echo is sent to echo cancellation module with reference to the audio data frame of the echo to be eliminated of queue sound intermediate frequency Frame and audio collecting device collection, continuous consumes audio data frame, thus realize the automatic adjustment of echo with reference to queue sound intermediate frequency Frame.

The audio data sending unit 44 sends the audio data in the echo reference queue, together with the to-be-cancelled audio data frames captured by the audio capture device, to the echo cancellation module.

In this embodiment of the present invention, once the audio data frames in the echo reference queue have been adjusted, the audio data in the echo reference queue is synchronized and aligned with the to-be-cancelled audio data captured by the audio capture device, and the aligned audio data frames from the echo reference queue and from the audio capture device are sent to the echo cancellation module. In a specific implementation, the echo cancellation module can use an echo cancellation algorithm such as acoustic echo cancellation (AEC) to cancel the echo.

In a specific implementation, the echo reference queue adjustment unit 43 comprises a minimum frame adjustment unit 431 and a maximum frame adjustment unit 432, as shown in Fig. 5, wherein:

The minimum frame adjustment unit 431 is configured to add empty frames to the echo reference queue when the number of consecutive times that the number of audio data frames in the echo reference queue is less than or equal to the minimum frame number exceeds the preset count, so that the number of frames of the current audio data in the echo reference queue equals the preset average frame number; and

The maximum frame adjustment unit 432 is configured to remove a number of audio data frames from the echo reference queue when the number of consecutive times that the number of audio data frames in the echo reference queue is greater than or equal to the maximum frame number exceeds the preset count, so that the number of frames of the current audio data in the echo reference queue equals the preset maximum frame number.
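The behaviour of the counting and adjustment units described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the class `EchoReferenceQueue`, its attribute names, the choice to sample the queue length on every write, and the choice to insert empty (all-zero) frames at the head of the queue are all hypothetical, since the text does not specify them.

```python
from collections import deque


class EchoReferenceQueue:
    """Hypothetical echo reference queue with min/max frame adjustment."""

    def __init__(self, min_frames, max_frames, avg_frames, max_count, frame_size):
        self.frames = deque()
        self.min_frames = min_frames    # preset minimum frame number
        self.max_frames = max_frames    # preset maximum frame number
        self.avg_frames = avg_frames    # preset average frame number
        self.max_count = max_count      # preset count threshold
        self.frame_size = frame_size    # samples per frame (assumed fixed)
        self.low_count = 0              # consecutive observations <= min_frames
        self.high_count = 0             # consecutive observations >= max_frames

    def write(self, frame):
        """Queue a played frame, update both counters, adjust if needed."""
        self.frames.append(frame)
        n = len(self.frames)
        # A counter grows only while its condition holds consecutively.
        self.low_count = self.low_count + 1 if n <= self.min_frames else 0
        self.high_count = self.high_count + 1 if n >= self.max_frames else 0

        if self.low_count > self.max_count:
            # Minimum frame adjustment: pad with empty frames up to the average.
            # Padding at the head is an assumption made for this sketch.
            while len(self.frames) < self.avg_frames:
                self.frames.appendleft([0] * self.frame_size)
            self.low_count = 0
        elif self.high_count > self.max_count:
            # Maximum frame adjustment: drop head frames down to the maximum.
            while len(self.frames) > self.max_frames:
                self.frames.popleft()
            self.high_count = 0

    def read(self):
        """Take the next reference frame for the echo canceller, if any."""
        return self.frames.popleft() if self.frames else None
```

In this sketch each call to `read` would be paired with one captured frame and both passed to the echo cancellation module; while the queue length stays strictly between the two thresholds, neither adjustment branch runs.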

Embodiment 4:

In this embodiment of the present invention, in order to accurately estimate the delay between the to-be-cancelled audio data captured by the audio capture device and the audio data played by the audio playback device, the delay with which audio data played by the audio playback device is captured by the audio capture device can be calculated in advance, before the voice call. In a specific implementation, sample audio data is set in advance and played in a loop by the audio playback device, and the played audio data is captured by the audio capture device.

Fig. 6 shows the structure of a system for synchronizing and aligning echo cancellation data provided by the fourth embodiment of the present invention. For convenience of explanation, only the parts relevant to this embodiment are shown.

The playback count judging unit 61 plays preset audio data in a loop through the audio playback device and judges whether the number of times the audio playback device has played the audio data exceeds a preset value.

When the number of times the audio playback device has played the audio data does not exceed the preset value, the audio data output unit 62 synchronously captures, through the audio capture device, the audio data played this time by the audio playback device, and outputs the captured audio data.

The function value calculation unit 63 calculates the autocorrelation function values between the audio data frames in the audio data played this time by the audio playback device and the audio data frames, within a preset range, in the captured audio data.

The first delay information acquisition unit 64 obtains and saves the maximum of the autocorrelation function values, and obtains the delay information of the captured audio data frame at which the maximum is reached.

When the number of times the audio playback device has played the audio data exceeds the preset value, the second delay information acquisition unit 65 obtains the minimum delay, the maximum delay, and the average delay from all the saved delay information.

$$R_{xy} = \int_{t_s}^{t_e} x(t)\, y(\tau - t)\, dt \tag{1}$$

$$\rho_{xy} = \frac{\int_{t_s}^{t_e} x(t)\, y(\tau - t)\, dt}{\left[\int_{t_s}^{t_e} x^{2}(t)\, dt \int_{t_s}^{t_e} y^{2}(t)\, dt\right]^{1/2}} \tag{2}$$
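A discrete-time version of the normalized correlation in formula (2), together with the search for the offset at which it peaks, might look like the sketch below. It is a sketch under assumptions rather than the method of the invention: the function names are hypothetical, the signals are plain lists of samples, only non-negative lags up to `max_lag` are searched, and the common lag convention x[n]·y[n + lag] is used instead of the exact argument form written in formula (1). Note that the quantity the text calls an autocorrelation function value is computed here between two different signals, the played one and the captured one.

```python
import math


def normalized_correlation(x, y, lag):
    """Correlation coefficient of x[n] and y[n + lag] over their overlap."""
    n = min(len(x), len(y) - lag)
    if n <= 0:
        return 0.0
    xs = x[:n]
    ys = y[lag:lag + n]
    numerator = sum(a * b for a, b in zip(xs, ys))
    denominator = math.sqrt(sum(a * a for a in xs) * sum(b * b for b in ys))
    # A silent segment has zero energy; treat it as uncorrelated.
    return numerator / denominator if denominator > 0 else 0.0


def estimate_delay(played, captured, max_lag):
    """Return (best_lag, best_value) over lags 0..max_lag, in samples."""
    best_lag, best_value = 0, float("-inf")
    for lag in range(max_lag + 1):
        value = normalized_correlation(played, captured, lag)
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag, best_value
```

Dividing the returned lag by the sampling rate converts it to a delay in seconds. A direct search like this costs on the order of `max_lag` times the frame length per playback, which is usually acceptable for a one-off calibration before the call.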

Finally, the maximum of all the saved autocorrelation function values is obtained, together with the delay information, that is, the offset, of the captured audio data frame (the echo frame) at which the maximum is reached. This gives the interval between the playback of this audio data and its capture, that is, the delay with which the echo forms. After the sample audio data has been played in a loop n times, the minimum delay, the maximum delay, and the average delay are obtained from the delays saved for each playback, which provides the basis for synchronizing and aligning the echo cancellation data. In the synchronization and alignment process, the preset minimum frame number, maximum frame number, and average frame number of the echo reference queue are respectively: minimum frame number = minimum echo delay / duration of an audio data frame; maximum frame number = maximum echo delay / duration of an audio data frame; average frame number = average echo delay / duration of an audio data frame, where the duration of an audio data frame is the frame length used each time an autocorrelation function value is calculated.
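The three ratios above translate directly into a small helper. The sketch below is illustrative only: the function name `queue_thresholds`, the use of milliseconds, and rounding to the nearest whole frame are assumptions, since the text gives the ratios but does not say how a fractional result is handled.

```python
def queue_thresholds(delays_ms, frame_ms):
    """Map measured echo delays (ms) to (min, max, average) frame numbers."""
    if not delays_ms:
        raise ValueError("at least one delay measurement is required")
    if frame_ms <= 0:
        raise ValueError("frame duration must be positive")

    def to_frames(delay_ms):
        # Rounding to the nearest frame is an assumption of this sketch.
        return int(round(delay_ms / frame_ms))

    min_delay = min(delays_ms)
    max_delay = max(delays_ms)
    avg_delay = sum(delays_ms) / len(delays_ms)
    return to_frames(min_delay), to_frames(max_delay), to_frames(avg_delay)
```

For example, with measured delays of 40 ms, 60 ms, and 100 ms and 20 ms frames, the minimum, maximum, and average delays are 40 ms, 100 ms, and about 66.7 ms, giving frame numbers of 2, 5, and 3.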

The above units can be used to accurately estimate the minimum delay, the maximum delay, and the average delay with which the echo forms. They can be software units running within the system for synchronizing and aligning echo cancellation data, or they can be integrated into an application of that system as independent plug-ins.

This embodiment of the present invention accurately estimates in advance the delay between the to-be-cancelled audio data captured by the audio capture device and the audio data played by the audio playback device, and from it calculates the minimum frame number, the maximum frame number, and the average frame number of the echo reference queue. When the number of consecutive times that the number of audio data frames in the echo reference queue is less than or equal to the preset minimum frame number or greater than or equal to the preset maximum frame number exceeds the preset count, the number of frames of the current audio data in the echo reference queue is dynamically adjusted according to the preset average frame number of audio data frames in the echo reference queue. The audio data in the echo reference queue and the to-be-cancelled audio data captured by the audio capture device are then sent to the echo cancellation module. In this way, the to-be-cancelled audio data entering the echo cancellation module is synchronized and aligned with the reference audio data before echo cancellation, which reduces mis-comparison of audio data in the echo cancellation module and improves both the echo cancellation efficiency of the module and the quality of the voice call.

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for synchronizing and aligning echo cancellation data, characterized in that the method comprises the following steps:

2. The method of claim 1, characterized in that the minimum delay, the maximum delay, and the average delay with which the echo forms are obtained by the following steps:

Playing preset audio data in a loop through the audio playback device, and judging whether the number of times the audio playback device has played the audio data exceeds a preset value;

When the number of times the audio playback device has played the audio data does not exceed the preset value, synchronously capturing, through the audio capture device, the audio data played this time by the audio playback device, and outputting the captured audio data;

Calculating the autocorrelation function values between the audio data frames in the audio data played this time by the audio playback device and the audio data frames, within a preset range, in the captured audio data;

Obtaining and saving the maximum of the autocorrelation function values, and obtaining the delay information of the captured audio data frame at which the maximum is reached;

When the number of times the audio playback device has played the audio data exceeds the preset value, obtaining the minimum delay, the maximum delay, and the average delay from all the saved delay information.

3. The method of claim 2, characterized in that,

The minimum frame number is: the obtained minimum echo delay / the duration of an audio data frame;

The maximum frame number is: the obtained maximum echo delay / the duration of an audio data frame;

The average frame number is: the obtained average echo delay / the duration of an audio data frame.

4. A system for synchronizing and aligning echo cancellation data, characterized in that the system comprises:

The echo reference queue adjustment unit comprises:

5. The system of claim 4, characterized in that the system further comprises:

A playback count judging unit, configured to play preset audio data in a loop through the audio playback device and to judge whether the number of times the audio playback device has played the audio data exceeds a preset value;

An audio data output unit, configured to synchronously capture, through the audio capture device, the audio data played this time by the audio playback device when the number of times the audio playback device has played the audio data does not exceed the preset value, and to output the captured audio data;

A function value calculation unit, configured to calculate the autocorrelation function values between the audio data frames in the audio data played this time by the audio playback device and the audio data frames, within a preset range, in the captured audio data;

A first delay information acquisition unit, configured to obtain and save the maximum of the autocorrelation function values, and to obtain the delay information of the captured audio data frame at which the maximum is reached; and

A second delay information acquisition unit, configured to obtain the minimum delay, the maximum delay, and the average delay from all the saved delay information when the number of times the audio playback device has played the audio data exceeds the preset value.

6. An audio communication device, characterized in that the device comprises the system for synchronizing and aligning echo cancellation data according to claim 4 or 5.