CN108965777A

CN108965777A - Echo cancellation method and device

Info

Publication number: CN108965777A
Application number: CN201710751905.6A
Authority: CN
Inventors: 刘宝臣; 韩杰; 杨春晖; 王艳辉
Original assignee: Beijing Visionvera International Information Technology Co Ltd
Current assignee: Hainan Shilian Communication Technology Co.,Ltd.
Priority date: 2017-08-28
Filing date: 2017-08-28
Publication date: 2018-12-07
Anticipated expiration: 2037-08-28
Also published as: CN108965777B

Abstract

The embodiment of the invention provides an echo cancellation method and device. The method is applied to view networking, which includes a view networking server, a first video conference terminal, and a second video conference terminal. The method includes: the first video conference terminal determines a filter factor; calculates the constant time lag between itself and the second video conference terminal; obtains the primary data amount of the reference data buffer area; adjusts the primary data amount to a target data amount according to the filter factor and the constant time lag; receives the audio data sent by the view networking server over the downstream communication link, the audio data being acquired by the second video conference terminal; and executes an echo cancellation operation on the audio data according to the target data amount. The embodiment of the invention can satisfy the echo cancellation algorithm's requirement for timing synchronization of the audio data, thereby eliminating echo and improving speech quality in video conferences.

Description

Echo cancellation method and device

Technical field

The present invention relates to the technical field of view networking, and more particularly to an echo cancellation method and an echo cancellation device.

Background art

Video conferencing is a form of meeting in which people at two or more locations talk face to face by means of communication equipment and a network. Typically, holding a video conference requires professional video conferencing equipment set up as a dedicated video conferencing system. With such a system, participants can hear the sound from the other venues, see the images, movements, and expressions of the participants there, and send electronic presentation content, which gives participants the feeling of being on the scene.

At present, to improve the speech quality of a video conference, echo can be removed from the audio that each venue acquires and plays by means of an echo cancellation algorithm. However, in a video conferencing system the acquisition and playback of audio are not real-time, while echo cancellation algorithms place high demands on the timing synchronization of the input audio streams. With non-real-time audio streams, the acquired and played streams cannot stay coherent in time, which causes poor synchronization and degrades speech quality. In addition, in a real-time voice communication system, because the network is unstable, data packets may be lost on the network or discarded by other processing stages because of excessive delay. As a result, the amount of playback data arriving from the far end may be less than the amount of data produced by near-end recording, which likewise causes serious synchronization problems. Poor timing synchronization of the audio streams makes the echo hard to eliminate and seriously affects the quality of voice communication.

Summary of the invention

In view of the above problems, embodiments of the present invention are proposed to provide an echo cancellation method and a corresponding echo cancellation device that overcome the above problems or at least partly solve them.

To solve the above problems, an embodiment of the invention discloses an echo cancellation method. The method is applied to view networking, which includes a view networking server, a first video conference terminal, and a second video conference terminal. The method includes:

The first video conference terminal determines a filter factor;

The first video conference terminal calculates the constant time lag between itself and the second video conference terminal;

The first video conference terminal obtains the primary data amount of the reference data buffer area;

The first video conference terminal adjusts the primary data amount to a target data amount according to the filter factor and the constant time lag;

The first video conference terminal receives the audio data sent by the view networking server over the downstream communication link, the audio data being acquired by the second video conference terminal;

The first video conference terminal executes an echo cancellation operation on the audio data according to the target data amount.

Optionally, the step in which the first video conference terminal calculates the constant time lag between itself and the second video conference terminal includes:

Emptying the data in the reference data buffer area;

Acquiring and playing target audio data;

Generating a first audio file and a second audio file from the acquired and the played target audio data, respectively;

Calculating the constant time lag between the first audio file and the second audio file.

Optionally, the step in which the first video conference terminal adjusts the primary data amount to the target data amount according to the filter factor and the constant time lag includes:

Determining the work delay corresponding to the filter factor;

Calculating the time delayed difference value between the constant time lag and the work delay;

Adjusting the primary data amount to the target data amount according to the time delayed difference value.

Optionally, the step of adjusting the primary data amount to the target data amount according to the time delayed difference value includes:

When the time delayed difference value is greater than zero, buffering data in the reference data buffer area so that the amount of data after buffering equals the data amount corresponding to the time delayed difference value;

When the time delayed difference value is less than zero, discarding part of the data so that the amount of data remaining in the reference data buffer area equals the data amount corresponding to the time delayed difference value.
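The two cases above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names, the 16 kHz default sample rate, the use of zero-valued (silent) samples as buffered padding, and the reading that the "data amount corresponding to the time delayed difference value" is the number of samples spanned by the magnitude of that difference are all assumptions.

```python
from collections import deque


def delay_to_samples(delay_ms, sample_rate_hz=16000):
    """Convert a delay magnitude in milliseconds to a sample count (assumed mapping)."""
    return int(round(abs(delay_ms) * sample_rate_hz / 1000))


def adjust_reference_buffer(buffer, constant_time_lag_ms, work_delay_ms,
                            sample_rate_hz=16000):
    """Adjust the reference data buffer toward the target data amount.

    buffer is a deque of samples, oldest on the left (hypothetical layout).
    Returns the data amount (sample count) after adjustment.
    """
    difference_ms = constant_time_lag_ms - work_delay_ms
    target = delay_to_samples(difference_ms, sample_rate_hz)
    if difference_ms > 0:
        # Buffer extra data: pad with silence until the target amount is reached.
        while len(buffer) < target:
            buffer.appendleft(0)
    elif difference_ms < 0:
        # Discard the oldest data until only the target amount remains.
        while len(buffer) > target:
            buffer.popleft()
    return len(buffer)
```

For example, with a 1 kHz sample rate, a constant time lag of 10 ms and a work delay of 5 ms, a buffer holding 3 samples is padded to 5 samples; with a constant time lag of 2 ms and a work delay of 5 ms, a buffer holding 10 samples is trimmed to its 3 newest samples.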

Optionally, the step in which the first video conference terminal executes the echo cancellation operation on the audio data according to the target data amount includes:

Acquiring local voice data while playing the audio data;

Transmitting the local voice data to an adaptive filter through the reference data buffer area, and performing the echo cancellation operation on the local voice data by the adaptive filter.

To solve the above problems, an embodiment of the invention also discloses an echo cancellation device. The device is applied to view networking, which includes a view networking server, a first video conference terminal, and a second video conference terminal. The device includes:

A determining module, for determining the filter factor of the first video conference terminal;

A computing module, for calculating the constant time lag between the first video conference terminal and the second video conference terminal;

An obtaining module, for obtaining the primary data amount of the reference data buffer area of the first video conference terminal;

An adjustment module, for adjusting the primary data amount to a target data amount according to the filter factor and the constant time lag;

A receiving module, for receiving the audio data sent by the view networking server over the downstream communication link, the audio data being acquired by the second video conference terminal;

An execution module, for executing an echo cancellation operation on the audio data according to the target data amount.

Optionally, the computing module includes:

A data emptying submodule, for emptying the data in the reference data buffer area;

A target audio data acquisition and playback submodule, for acquiring and playing target audio data;

An audio file generating submodule, for generating a first audio file and a second audio file from the acquired and the played target audio data, respectively;

A constant time lag computing submodule, for calculating the constant time lag between the first audio file and the second audio file.

Optionally, the adjustment module includes:

A work delay determining submodule, for determining the work delay corresponding to the filter factor;

A time delayed difference value computing submodule, for calculating the time delayed difference value between the constant time lag and the work delay;

A target data amount adjusting submodule, for adjusting the primary data amount to the target data amount according to the time delayed difference value.

Optionally, the target data amount adjusting submodule includes:

A buffer unit, for buffering data in the reference data buffer area when the time delayed difference value is greater than zero, so that the amount of data after buffering equals the data amount corresponding to the time delayed difference value;

A discarding unit, for discarding part of the data when the time delayed difference value is less than zero, so that the amount of data remaining in the reference data buffer area equals the data amount corresponding to the time delayed difference value.

Optionally, the execution module includes:

A local voice data acquisition submodule, for acquiring local voice data while the audio data is played;

A local voice data transmission submodule, for transmitting the local voice data to an adaptive filter through the reference data buffer area, the adaptive filter performing the echo cancellation operation on the local voice data.

Compared with the background art, the embodiment of the invention has the following advantages:

In the embodiment of the invention, by determining the filter factor and the constant time lag between itself and the second video conference terminal, the first video conference terminal can adjust the primary data amount in the reference data buffer area to the target data amount. After receiving the audio data that was acquired by the second video conference terminal and sent by the view networking server, it can then execute an echo cancellation operation on that audio data to eliminate the echo. By adjusting the data amount in the reference data buffer area, this embodiment brings the constant time lag of the system close to the work delay corresponding to the filter factor, so that the timing between the reference data and the echo data reaches a dynamic balance. This satisfies the echo cancellation algorithm's requirement for timing synchronization of the audio data, eliminates the echo, and improves speech quality in video conferences.

Brief description of the drawings

Fig. 1 is a flow chart of the steps of embodiment one of an echo cancellation method of the invention;

Fig. 2 is a networking schematic diagram of view networking according to the invention;

Fig. 3 is a hardware structure diagram of a node server of the invention;

Fig. 4 is a hardware structure diagram of an access switch of the invention;

Fig. 5 is a hardware structure diagram of an Ethernet protocol conversion gateway of the invention;

Fig. 6 is a flow chart of the steps of embodiment two of an echo cancellation method of the invention;

Fig. 7 is a structural block diagram of an embodiment of an echo cancellation device of the invention.

Detailed description of the embodiments

To make the above objects, features, and advantages of the invention clearer and easier to understand, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Referring to Fig. 1, a flow chart of the steps of embodiment one of an echo cancellation method of the invention is shown, which may specifically include the following steps:

Step 101, the first video conference terminal determines a filter factor;

It should be noted that this method can be applied to view networking. View networking is an important milestone in network development: it is a network system that enables high-definition video transmission and pushes numerous Internet applications toward high-definition video and high-definition face-to-face communication.

View networking uses real-time high-definition video switching technology to integrate, on a single network platform, dozens of video, voice, picture, text, communication, and data services, such as high-definition video conferencing, video surveillance, intelligent monitoring analysis, emergency command, digital broadcast television, time-shifted television, online teaching, live broadcast, VOD on demand, television mail, personal video recording (PVR), intranet (self-managed) channels, intelligent video broadcast control, and information publishing, and delivers high-definition video playback through a television or computer.

To help those skilled in the art better understand the embodiments of the invention, view networking is introduced below.

Some of the technologies applied in view networking are described below:

Network technology (Network Technology): the network technology innovation of view networking improves on traditional Ethernet to cope with the potentially enormous video traffic on the network. Unlike pure network packet switching (Packet Switching) or network circuit switching (Circuit Switching), view networking technology uses packet switching to meet streaming (Streaming) demands. View networking technology has the flexibility, simplicity, and low cost of packet switching together with the quality and security guarantees of circuit switching, and achieves seamless connection of network-wide switched virtual circuits and data formats.

Switching technology (Switching Technology): view networking adopts the two advantages of Ethernet, asynchrony and packet switching, and eliminates Ethernet's defects while remaining fully compatible. It provides network-wide end-to-end seamless connection, reaches user terminals directly, and directly carries IP data packets. User data requires no format conversion anywhere in the network. View networking is a more advanced form of Ethernet and a real-time switching platform; it can achieve the network-wide, large-scale, real-time transmission of high-definition video that the current Internet cannot, pushing numerous network video applications toward high definition and unification.

Server technology (Server Technology): the server technology of view networking and the unified video platform differs from that of traditional servers. Its streaming media transmission is connection-oriented, its data processing capability is independent of traffic and communication time, and a single network layer can carry both signaling and data. For voice and video services, streaming media processing on view networking and the unified video platform is much simpler than data processing, and efficiency is more than a hundred times that of a traditional server.

Storage technology (Storage Technology): to handle media content of very large capacity and very high traffic, the ultra-high-speed storage technology of the unified video platform uses an advanced real-time operating system. The program information in a server instruction is mapped to specific hard disk space, so media content no longer passes through the server and is delivered directly to the user terminal almost instantly; the typical user waiting time is less than 0.2 seconds. Optimized sector distribution greatly reduces the mechanical seek movement of the hard disk heads. Resource consumption is only 20% of that of an IP Internet system of the same grade, yet the concurrent traffic generated is more than 3 times that of a traditional disk array, and overall efficiency improves by more than 10 times.

Network security technology (Network Security Technology): the structural design of view networking eliminates, at the structural level, the network security problems that trouble the Internet, by means such as independent licensing of every service and complete isolation of equipment and user data. It generally needs no antivirus software or firewall, blocks attacks by hackers and viruses, and provides users with a structurally secure network.

Service innovation technology (Service Innovation Technology): the unified video platform fuses services with transmission. Whether for a single user, a private-network user, or an entire network, a connection is established automatically in a single step. User terminals, set-top boxes, or PCs connect directly to the unified video platform and obtain rich multimedia video services in various forms. The unified video platform replaces traditional complex application programming with a menu-style table configuration mode, so complex applications can be implemented with very little code, enabling unlimited innovation in new services.

The networking of view networking is described below:

View networking is a centrally controlled network structure. The network may be of a tree, star, ring, or similar type, but on this basis a centralized control node is needed in the network to control the whole network.

Fig. 2 is a networking schematic diagram of view networking according to the invention. As shown in Fig. 2, view networking is divided into two parts, the access network and the metropolitan area network.

The equipment of the access network part falls mainly into 3 classes: node servers, access switches, and terminals (including various set-top boxes, encoding boards, memories, and so on). A node server is connected to access switches; an access switch can be connected to multiple terminals and can also connect to Ethernet.

The node server is the node that performs centralized control in the access network and can control the access switches and terminals. A node server can be connected directly to an access switch or directly to a terminal.

Fig. 3 is a hardware structure diagram of a node server of the invention. The node server mainly includes a network interface module 301, a switching engine module 302, a CPU module 303, and a disk array module 304;

Packets arriving from the network interface module 301, the CPU module 303, and the disk array module 304 all enter the switching engine module 302. The switching engine module 302 looks up the address table 305 for each incoming packet to obtain the packet's routing information, and stores the packet in the corresponding queue of the packet buffer 306 according to that routing information; if the queue of the packet buffer 306 is nearly full, the packet is discarded. The switching engine module 302 polls all packet buffer queues and forwards from a queue if the following conditions are met: 1) the port's send buffer is not full; 2) the queue's packet counter is greater than zero. The disk array module 304 mainly implements control of the hard disks, including initialization and read/write operations. The CPU module 303 is mainly responsible for protocol processing with the access switches and terminals (not shown), for configuring the address table 305 (including the downlink protocol packet address table, the uplink protocol packet address table, and the data packet address table), and for configuring the disk array module 304.

Fig. 4 is a hardware structure diagram of an access switch of the invention. The access switch mainly includes network interface modules (a downstream network interface module 401 and an uplink network interface module 402), a switching engine module 403, and a CPU module 404;

Packets arriving from the downstream network interface module 401 (uplink data) enter the packet detection module 405. The packet detection module 405 checks whether the destination address (DA), source address (SA), packet type, and packet length meet the requirements; if so, it assigns a corresponding stream identifier (stream-id) and the packet enters the switching engine module 403; otherwise the packet is discarded. Packets arriving from the uplink network interface module 402 (downlink data) enter the switching engine module 403, as do packets arriving from the CPU module 404. The switching engine module 403 looks up the address table 406 for each incoming packet to obtain its routing information. If a packet entering the switching engine module 403 is going from a downstream network interface to an uplink network interface, it is stored in the corresponding queue of the packet buffer 407 according to its stream identifier (stream-id); if that queue is nearly full, the packet is discarded. If a packet entering the switching engine module 403 is not going from a downstream network interface to an uplink network interface, it is stored in the corresponding queue of the packet buffer 407 according to its routing information; if that queue is nearly full, the packet is discarded.

The switching engine module 403 polls all packet buffer queues; in the embodiment of the invention there are two cases:

If the queue goes from a downstream network interface to an uplink network interface, forwarding takes place when the following conditions are met: 1) the port's send buffer is not full; 2) the queue's packet counter is greater than zero; 3) a token generated by the rate control module has been obtained;

If the queue does not go from a downstream network interface to an uplink network interface, forwarding takes place when the following conditions are met: 1) the port's send buffer is not full; 2) the queue's packet counter is greater than zero.

The rate control module 408 is configured by the CPU module 404. At programmable intervals it generates tokens for all packet buffer queues going from downstream network interfaces to uplink network interfaces, in order to control the bit rate of uplink forwarding.
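The forwarding conditions and the token mechanism described above can be illustrated with a short sketch. It is a simplified software model under stated assumptions, not the hardware design: the class name, field names, function names, and the choice of one token per packet and one token per interval are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PacketQueue:
    packet_count: int      # the queue's packet counter
    uplink_bound: bool     # True if the queue goes from a downstream to an uplink interface
    tokens: int = 0        # tokens granted by the rate control module (assumed per queue)


def grant_tokens(queues, tokens_per_interval=1):
    """Model of the rate control module: called once per programmable interval."""
    for queue in queues:
        if queue.uplink_bound:
            queue.tokens += tokens_per_interval


def try_forward(queue, port_send_buffer_full):
    """Forward one packet from the queue if the polling conditions are met."""
    if port_send_buffer_full or queue.packet_count <= 0:
        return False
    if queue.uplink_bound:
        if queue.tokens <= 0:
            return False       # condition 3: no token obtained
        queue.tokens -= 1      # consume one token per forwarded packet (assumption)
    queue.packet_count -= 1
    return True
```

In this model an uplink-bound queue with pending packets is held back until `grant_tokens` runs, while a queue in the other direction is forwarded as soon as the port's send buffer has room.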

The CPU module 404 is mainly responsible for protocol processing with the node server, for configuring the address table 406, and for configuring the rate control module 408.

The equipment of the access network part further includes an Ethernet protocol conversion gateway. Fig. 5 is a hardware structure diagram of an Ethernet protocol conversion gateway of the invention. The Ethernet protocol conversion gateway mainly includes network interface modules (a downstream network interface module 501 and an uplink network interface module 502), a switching engine module 503, a CPU module 504, a packet detection module 505, a rate control module 508, an address table 506, a packet buffer 507, a MAC adding module 509, and a MAC removing module 510.

Data packets arriving from the downstream network interface module 501 enter the packet detection module 505. The packet detection module 505 checks whether the Ethernet MAC DA, Ethernet MAC SA, Ethernet length or frame type, view networking destination address DA, view networking source address SA, view networking packet type, and packet length of the data packet meet the requirements; if so, it assigns a corresponding stream identifier (stream-id). The MAC removing module 510 then strips the MAC DA, MAC SA, and length or frame type (2 bytes), and the packet enters the corresponding receive buffer; otherwise the packet is discarded;

The downstream network interface module 501 checks the send buffer of the port. If there is a packet, it determines the Ethernet MAC DA of the corresponding terminal from the packet's view networking destination address DA, adds the terminal's Ethernet MAC DA, the MAC SA of the Ethernet protocol conversion gateway, and the Ethernet length or frame type, and sends the packet.

The functions of the other modules in the Ethernet protocol conversion gateway are similar to those of the access switch.

A terminal of the access network part mainly includes a network interface module, a service processing module, and a CPU module. For example, a set-top box mainly includes a network interface module, a video/audio encoding and decoding engine module, and a CPU module; an encoding board mainly includes a network interface module, a video encoding engine module, and a CPU module; a memory mainly includes a network interface module, a CPU module, and a disk array module.

Similarly, the equipment of the metropolitan area network part also falls into 3 classes: metropolitan area servers, node switches, and node servers. A metropolitan area server is connected to node switches, and a node switch can be connected to multiple node servers. A node switch mainly includes a network interface module, a switching engine module, and a CPU module; a metropolitan area server likewise mainly includes a network interface module, a switching engine module, and a CPU module.

Here, the node server is the node server of the access network part; that is, the node server belongs to both the access network part and the metropolitan area network part.

The metropolitan area server is the node that performs centralized control in the metropolitan area network and can control the node switches and node servers. A metropolitan area server can be connected directly to a node switch or directly to a node server.

It can be seen that the whole view networking network is a layered, centrally controlled network structure, and the networks controlled under the node servers and metropolitan area servers can have various structures such as tree, star, or ring.

Figuratively speaking, the access network part can form a unified video platform (the part inside the dashed circle), and multiple unified video platforms can form view networking; the unified video platforms can be interconnected through metropolitan-area and wide-area view networking.

View networking data packets include access network data packets and metropolitan area network data packets.

An access network data packet mainly includes the following parts: destination address (DA), source address (SA), reserved bytes, payload (PDU), and CRC.

As shown in the table below, an access network data packet mainly includes the following parts:

DA | SA | Reserved | Payload | CRC

The destination address (DA) consists of 8 bytes. The first byte indicates the type of the data packet (for example, the various protocol packets, multicast data packets, unicast data packets, and so on), with at most 256 possibilities; the second to sixth bytes are the metropolitan area network address; and the seventh and eighth bytes are the access network address.

The source address (SA) also consists of 8 bytes and is defined in the same way as the destination address (DA).
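The 8-byte address layout just described can be illustrated with a small parsing sketch. The type name `ViewAddress`, the field names, and the function name are hypothetical and chosen only for illustration; the byte positions follow the description above.

```python
from typing import NamedTuple


class ViewAddress(NamedTuple):
    packet_type: int        # byte 1: type of the data packet (up to 256 values)
    metro_address: bytes    # bytes 2-6: metropolitan area network address
    access_address: bytes   # bytes 7-8: access network address


def parse_address(raw: bytes) -> ViewAddress:
    """Split an 8-byte DA or SA field into its three parts."""
    if len(raw) != 8:
        raise ValueError("a DA or SA field must be exactly 8 bytes")
    return ViewAddress(raw[0], raw[1:6], raw[6:8])
```

Because the SA is defined identically to the DA, the same function serves for both fields.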

The reserved field consists of 2 bytes.

The length of the payload part varies with the type of datagram: it is 64 bytes for the various protocol packets and 32+1024=1056 bytes for a unicast or multicast data packet, though it is of course not limited to these 2 cases.

The CRC consists of 4 bytes, and its calculation follows the standard Ethernet CRC algorithm.
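As an illustration of the field layout above, the following sketch assembles and checks an access network data packet. It is a sketch under assumptions: the function names are hypothetical, the CRC is assumed to cover every field that precedes it, and the CRC is assumed to be appended in little-endian byte order; the source states only that the standard Ethernet CRC algorithm is used. Python's `zlib.crc32` computes the same CRC-32 as the Ethernet frame check sequence.

```python
import struct
import zlib


def build_access_packet(da: bytes, sa: bytes, payload: bytes,
                        reserved: bytes = b"\x00\x00") -> bytes:
    """Concatenate DA (8) + SA (8) + Reserved (2) + Payload + CRC (4)."""
    if len(da) != 8 or len(sa) != 8 or len(reserved) != 2:
        raise ValueError("DA and SA must be 8 bytes, Reserved must be 2 bytes")
    body = da + sa + reserved + payload
    crc = zlib.crc32(body) & 0xFFFFFFFF   # coverage of all prior fields is assumed
    return body + struct.pack("<I", crc)  # byte order of the CRC field is assumed


def check_access_packet(packet: bytes) -> bool:
    """Recompute the CRC over everything before the 4-byte trailer."""
    if len(packet) < 8 + 8 + 2 + 4:
        return False
    body, trailer = packet[:-4], packet[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == trailer
```

With a 64-byte protocol payload, the resulting packet is 8 + 8 + 2 + 64 + 4 = 86 bytes long.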

The topology of the metropolitan area network is a graph, and there may be 2 or even more connections between two devices; that is, there can be more than 2 connections between a node switch and a node server, or between a node switch and another node switch. However, the metropolitan area network address of a metropolitan area network device is unique. To describe the connection relationships between metropolitan area network devices precisely, a parameter is introduced in view networking: the label, which uniquely describes a metropolitan area network device.

The definition of a label in view networking is similar to that of a label in MPLS (Multi-Protocol Label Switch). Suppose there are two connections between device A and device B; then a data packet travelling from device A to device B has 2 possible labels, and a data packet travelling from device B to device A also has 2 possible labels. Labels are divided into incoming labels and outgoing labels. Suppose the label of a data packet when it enters device A (the incoming label) is 0x0000; when the data packet leaves device A, its label (the outgoing label) may have become 0x0001. The network access procedure of the metropolitan area network is carried out under centralized control, which means that address allocation and label allocation in the metropolitan area network are both directed by the metropolitan area server, while node switches and node servers simply execute them passively. This differs from label allocation in MPLS, which is the result of negotiation between switches and servers.
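The incoming-label to outgoing-label behaviour can be sketched as a simple lookup table that is filled in centrally. The class name, method names, and the `out_port` field are hypothetical; the source describes only that labels are swapped and that the metropolitan area server directs their allocation.

```python
class LabelTable:
    """Per-device label table, installed by the metropolitan area server (assumed model)."""

    def __init__(self):
        self._entries = {}

    def install(self, in_label: int, out_label: int, out_port: int) -> None:
        # Called on behalf of the central controller; the device does not negotiate.
        self._entries[in_label] = (out_label, out_port)

    def forward(self, in_label: int):
        """Return (out_label, out_port) for a known incoming label, else None."""
        return self._entries.get(in_label)
```

For example, after `install(0x0000, 0x0001, 2)`, a packet entering with label 0x0000 leaves with label 0x0001 on the hypothetical port 2.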

As shown in the table below, a metropolitan area network data packet mainly includes the following parts:

DA | SA | Reserved | Label | Payload | CRC

That is, destination address (DA), source address (SA), reserved bytes (Reserved), label, payload (PDU), and CRC. The format of the label can follow this definition: the label is 32 bits, of which the high 16 bits are reserved and only the low 16 bits are used; it is located between the reserved bytes and the payload of the data packet.
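The label field described above can be encoded and decoded as in the following sketch. The function names are hypothetical, and big-endian byte order with zero-filled reserved bits is an assumption, since the source does not specify either.

```python
import struct


def encode_label(label16: int) -> bytes:
    """Pack a 16-bit label into the 32-bit field; the high 16 bits stay reserved (zero)."""
    if not 0 <= label16 <= 0xFFFF:
        raise ValueError("only the low 16 bits of the label are used")
    return struct.pack(">I", label16)     # big-endian is an assumption


def decode_label(field: bytes) -> int:
    """Read the 32-bit field and keep only the low 16 bits."""
    (value,) = struct.unpack(">I", field)
    return value & 0xFFFF


def metro_header(da: bytes, sa: bytes, reserved: bytes, label16: int) -> bytes:
    """DA (8) + SA (8) + Reserved (2) + Label (4); the payload and CRC would follow."""
    return da + sa + reserved + encode_label(label16)
```

Under these assumptions the label occupies bytes 18 to 21 of the packet, immediately after the 2 reserved bytes.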

In the embodiment of the invention, view networking may include a view networking server, a first video conference terminal, and a second video conference terminal.

The first video conference terminal and the second video conference terminal may be set-top boxes (SetTopBox, STB). A set-top box is a device that connects a television set to an external signal source; it converts a compressed digital signal into television content and displays it on the television set.

In general, a set-top box can be connected to a camera and a microphone for acquiring multimedia data such as video data and audio data, and can also be connected to a television set for playing multimedia data such as video data and audio data.

In application scenarios such as video conferencing, the first and second video conference terminals are external signal sources for each other. That is, the first video conference terminal can acquire multimedia data and send it through the view networking server to the second video conference terminal, which plays the received multimedia data; likewise, the second video conference terminal can acquire multimedia data and send it through the view networking server to the first video conference terminal, which plays the received multimedia data.

This embodiment is described using the example in which the audio data is acquired by the second video conference terminal and sent to the first video conference terminal through the view networking server. It should be noted that, in a video conference scenario, the first and second video conference terminals perform the same operations: while the first video conference terminal receives the audio data sent by the second video conference terminal and performs an echo cancellation operation on it, the second video conference terminal can also use the method of this embodiment to perform an echo cancellation operation on the received audio data acquired by the first video conference terminal.

In the embodiment of the invention, before performing the echo cancellation operation, the first video conference terminal can first determine the filter factor. The filter factor refers to the coefficient of the adaptive filter that actually performs the echo cancellation operation on the received audio data in the first video conference terminal.

In a specific implementation, the filter factor can be fixed at a particular value. Once the filter factor is determined, the range of the work delay of the adaptive filter corresponding to that filter factor can be determined accordingly. The filter factor determines the convergence of the echo cancellation algorithm. In practical applications, the algorithm is required to converge quickly and remain stable; that is, the filter factor must allow fast convergence and stable operation under that coefficient.

It should be noted that those skilled in the art can set the specific value of the filter factor according to actual needs; the embodiment of the invention places no limit on this.

Step 102, the first video conference terminal calculates the fixed delay between itself and the second video conference terminal;

In embodiments of the present invention, the first video conference terminal, the second video conference terminal, the view networked server and other devices may together form a video conferencing system. The fixed delay between the first video conference terminal and the second video conference terminal may refer to the fixed delay of the current video conferencing system.

In a specific implementation, the fixed delay may be obtained as follows: with no data buffering applied in the first video conference terminal, the captured audio data and the played audio data are saved in real time as audio files, which are then analyzed with an audio analysis tool.

Step 103, the first video conference terminal obtains the initial data amount of the reference data buffer;

In embodiments of the present invention, the initial data amount may refer to the amount of data buffered in the reference data buffer before the first video conference terminal performs echo cancellation.

Step 104, the first video conference terminal adjusts the initial data amount to a target data amount according to the filter coefficient and the fixed delay;

In general, after receiving audio data captured by the second video conference terminal, the first video conference terminal sends the received audio data to the sound card and, at the same time, also needs to send it to the echo cancellation algorithm as a reference. The echo cancellation performed by the algorithm is an adaptive filtering process. Echo cancellation mainly works by echo offsetting: an adaptive method estimates the magnitude of the echo, and this estimate is then subtracted from the received signal to cancel the echo. This requires that the reference data arrive before the echo data.

Therefore, the system delay can be changed by adjusting the initial data amount in the reference data buffer, so that the fixed delay of the system approaches the optimal working delay corresponding to the filter coefficient, thereby meeting the above requirement.

For example, if the optimal working delay corresponding to the filter coefficient is 200 ms and the fixed delay of the system is 300 ms, then, to bring the fixed delay of the system close to the algorithm delay of the filter, the amount of data in the reference data buffer can be increased so that an additional 100 ms of data is buffered.

Step 105, the first video conference terminal receives audio data sent by the view networked server over a downlink communication link, the audio data being captured by the second video conference terminal;

In a specific implementation, during a video conference, the far-end video conference terminal, i.e., the second video conference terminal, can capture audio data and send it to the view networked server over an uplink communication link. After receiving the audio data, the view networked server first determines its destination address and then sends it to the first video conference terminal over the downlink communication link. After receiving the audio data, the first video conference terminal, i.e., the local video conference terminal, can perform echo cancellation on it.

Step 106, the first video conference terminal performs echo cancellation on the audio data according to the target data amount.

In general, after the far-end audio data is played through the local sound card, it travels along the echo path and is captured again together with the local voice; the resulting data is the echo that the echo cancellation algorithm, i.e., the adaptive filter, needs to eliminate.

For example, after the first video conference terminal receives the audio data sent by the second video conference terminal and plays it locally, the sound from the loudspeaker propagates through the air or is reflected by walls, reaches the microphone again, and is captured together with the local voice. If it is transmitted back to the second video conference terminal, the far end hears an obvious echo, which interferes with normal conversation. Therefore, to improve call quality during a video conference, this portion of the audio data should be eliminated as far as possible.

In a specific implementation, the local voice data captured while the audio data sent by the far end is being played can be passed through the reference data buffer to the adaptive filter, which performs echo cancellation processing to eliminate the echo.

In embodiments of the present invention, by determining the filter coefficient and the fixed delay between itself and the second video conference terminal, the first video conference terminal can adjust the initial data amount in the reference data buffer to the target data amount. Then, after receiving the audio data captured by the second video conference terminal and sent by the view networked server, it can perform echo cancellation on the audio data to eliminate the echo. By adjusting the amount of data in the reference data buffer, this embodiment brings the fixed delay of the system close to the working delay corresponding to the filter coefficient, so that the timing between the reference data and the echo data reaches a dynamic balance. This satisfies the echo cancellation algorithm's requirement for timing synchronization of the audio data, thereby eliminating the echo and improving call quality in video conferences.

Referring to Fig. 6, a flow chart of the steps of embodiment two of an echo cancellation method of the present invention is shown, which may specifically include the following steps:

Step 601, the first video conference terminal determines the filter coefficient;

It should be noted that this method can be applied to view networking, which may include a view networked server and video conference terminals. There may be at least two video conference terminals, i.e., a first video conference terminal and a second video conference terminal.

One video conference terminal can capture multimedia data such as video data and audio data and transmit it through the view networked server to another video conference terminal, and the receiving video conference terminal plays the multimedia data, thereby realizing a real-time video conference between at least two parties.

In embodiments of the present invention, the description takes the first video conference terminal as the local video conference terminal and the second video conference terminal as the far-end video conference terminal. That is, the first video conference terminal receives multimedia data such as video data and audio data sent by the second video conference terminal and plays it locally, realizing a video conference between the local user and the remote user. Of course, during the video conference, the local video conference terminal can also capture local multimedia data such as video data and audio data in real time and transmit it to the far-end video conference terminal for playback; the operating procedures of the two parties are basically the same.

In general, when the first video conference terminal receives audio data sent by the second video conference terminal and plays it through a loudspeaker, the played sound propagates through the air or is reflected by walls, reaches the microphone again, and is captured together with the local voice. If the captured local voice is transmitted to the second video conference terminal, the remote user hears an obvious echo. Therefore, to improve call quality during a video conference, this echo needs to be eliminated.

In embodiments of the present invention, before the echo is eliminated, the filter coefficient of the adaptive filter may be determined first. In practice, after receiving the far-end audio data, the first video conference terminal transmits the audio data to the adaptive filter, which performs echo cancellation processing on it using the echo cancellation algorithm.

Step 602, calculate the fixed delay with respect to the second video conference terminal;

In embodiments of the present invention, the fixed delay between the first video conference terminal and the second video conference terminal may refer to the fixed delay of the video conferencing system formed jointly by different devices such as the above video conference terminals.

In embodiments of the present invention, with no data buffering applied in the first video conference terminal, the captured audio data and the played audio data can be saved in real time as audio files, and the saved audio files are analyzed with an audio analysis tool to obtain the fixed delay.

In a specific implementation, the reference data buffer may first be emptied, and target audio data is then captured and played. A first audio file and a second audio file are generated from the captured and the played target audio data, respectively, and an audio analysis tool calculates the delay between the first audio file and the second audio file, which gives the fixed delay of the system.
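The text does not say how the audio analysis tool compares the two files. As one possible illustration only, the sketch below estimates the lag by brute-force cross-correlation of the played and captured sample sequences. The function names, the cross-correlation technique, and the 32 kHz sample rate (borrowed from the specific example later in this description) are assumptions, not part of the described method.

```python
import random

def estimate_delay_samples(played, captured, max_lag):
    """Return the lag (in samples) at which `captured` best matches `played`.

    Hypothetical helper: a plain cross-correlation search, used here as a
    stand-in for the unspecified audio analysis tool.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Number of overlapping samples when `captured` is read `lag` samples later.
        n = min(len(played), len(captured) - lag)
        score = sum(played[i] * captured[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def samples_to_ms(samples, sample_rate_hz=32000):
    """Convert a lag in samples to milliseconds (sample rate is an assumption)."""
    return samples * 1000.0 / sample_rate_hz

# Synthetic check: the captured signal is an attenuated copy delayed by 40 samples.
random.seed(0)
played = [random.uniform(-1.0, 1.0) for _ in range(500)]
captured = [0.0] * 40 + [0.5 * s for s in played]
lag = estimate_delay_samples(played, captured, max_lag=100)
print(lag, samples_to_ms(lag))
```

A real measurement would read the two saved audio files instead of synthetic lists and would search a lag window wide enough to cover the expected delay of a few hundred milliseconds.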

Step 603, obtain the initial data amount of the reference data buffer;

Step 604, determine the working delay corresponding to the filter coefficient;

In embodiments of the present invention, the working delay may refer to the optimal working delay corresponding to the filter coefficient of the adaptive filter. In general, once the filter coefficient is determined, the working delay can be obtained.

Step 605, calculate the delay difference between the fixed delay and the working delay;

For example, assuming that the fixed delay of the system is 300 ms and the optimal working delay corresponding to the filter coefficient is 200 ms, the delay difference between the two is 100 ms.

Of course, the fixed delay of the system may also be less than the optimal working delay corresponding to the filter coefficient. For example, if the fixed delay of the system is 200 ms and the optimal working delay corresponding to the filter coefficient is 250 ms, the delay difference between the two is -50 ms.

Step 606, adjust the initial data amount to the target data amount according to the delay difference;

In embodiments of the present invention, adjusting the initial data amount in the reference data buffer to the target data amount changes the system delay, so that the fixed delay of the system approaches the optimal working delay corresponding to the filter coefficient.

Therefore, in a specific implementation, when the delay difference is greater than zero, data can be buffered in the reference data buffer so that the amount of data after buffering equals the data amount corresponding to the delay difference; when the delay difference is less than zero, part of the data can be discarded so that the amount of data remaining in the reference data buffer equals the data amount corresponding to the delay difference.

It should be noted that the target data amount in the adjusted buffer need not be exactly equal to the data amount corresponding to the delay difference; it is sufficient that the two agree within a certain range.
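Steps 604 to 606 can be sketched as a small planning function. This is a minimal illustration under stated assumptions: the function name and return format are hypothetical, the 64 bytes per millisecond figure is taken from the mono 32 kHz, 16-bit example later in this description, and the negative case is read as "discard the data amount corresponding to the magnitude of the delay difference", which is one interpretation of the text.

```python
def plan_adjustment(fixed_delay_ms, working_delay_ms, bytes_per_ms=64):
    """Decide how to adjust the reference data buffer (hypothetical helper).

    Returns (action, byte_count), where action is "buffer", "discard" or "keep".
    """
    # Step 605: delay difference between the fixed delay and the working delay.
    diff_ms = fixed_delay_ms - working_delay_ms
    # Data amount corresponding to the delay difference.
    nbytes = abs(diff_ms) * bytes_per_ms
    if diff_ms > 0:
        return ("buffer", nbytes)   # buffer this much reference data
    if diff_ms < 0:
        return ("discard", nbytes)  # assumed reading: drop this much data
    return ("keep", 0)

print(plan_adjustment(300, 200))  # fixed delay above the working delay
print(plan_adjustment(200, 250))  # fixed delay below the working delay
```

Because the target need only match the delay difference within a certain range, an implementation could round the byte count to a whole number of audio frames.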

Step 607, receive audio data sent by the view networked server over a downlink communication link, the audio data being captured by the second video conference terminal;

In a specific implementation, during a video conference, the far-end video conference terminal, i.e., the second video conference terminal, can capture audio data and send it to the view networked server over an uplink communication link. After receiving the audio data, the view networked server first determines its destination address and then sends it to the first video conference terminal over the downlink communication link. After receiving the audio data, the first video conference terminal, i.e., the local video conference terminal, can perform step 608 and step 609 in sequence to carry out echo cancellation on the audio data.

Step 608, capture local voice data while the audio data is being played;

In embodiments of the present invention, the captured local voice data is the data obtained when the audio data transmitted by the second video conference terminal is played through the loudspeaker, propagates through the air or is reflected by walls, reaches the microphone again, and is captured together with the local voice; this portion of the data is the echo that the adaptive filter needs to eliminate during echo cancellation processing.

Step 609, transmit the local voice data through the reference data buffer to the adaptive filter, and perform echo cancellation on the local voice data by the adaptive filter.

In embodiments of the present invention, the local voice data can be transmitted to the adaptive filter through the reference data buffer. Because the amount of data in the reference data buffer has been adjusted, the timing between the reference data and the echo data can be synchronized. The adaptive filter can therefore perform echo cancellation effectively and eliminate the echo.
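The description does not name a particular adaptive filtering algorithm. Purely as an illustration of estimating the echo from the reference data and subtracting it from the captured signal, the sketch below uses a normalized least mean squares (NLMS) filter, a common choice that is assumed here rather than taken from the text. The function name, tap count, and step size are hypothetical.

```python
import random

def nlms_echo_cancel(reference, microphone, taps=8, mu=0.5, eps=1e-6):
    """Return the residual signal: microphone minus the estimated echo.

    Assumed NLMS sketch; `reference` is the far-end (played) signal and
    `microphone` is the captured signal containing the echo.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference sample first
    residual = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # captured signal with the estimated echo removed
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual

# Synthetic check: the echo is the reference attenuated and delayed by 3 samples.
random.seed(1)
reference = [random.uniform(-1.0, 1.0) for _ in range(2000)]
microphone = [0.0] * 3 + [0.6 * x for x in reference[:-3]]
residual = nlms_echo_cancel(reference, microphone)
echo_energy = sum(v * v for v in microphone[-200:])
residual_energy = sum(v * v for v in residual[-200:])
print(residual_energy < echo_energy)
```

The filter can only model an echo that lags the reference by less than its tap span, which is why the reference data must reach the algorithm before the echo data and why the delay between them has to stay inside the filter's working range.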

To facilitate understanding, the echo cancellation method of the present invention is described below with a specific example.

Take a certain video conferencing system as an example. First, the parameters of the audio system are as follows:

Capture end: audio sample rate 32 kHz, sample precision 16 bit, mono;

Playback end: audio sample rate 32 kHz, sample precision 16 bit, two-channel.

Since the parameters of the reference data and the captured data must be identical when echo cancellation is performed, the reference data is obtained by converting the played data to mono, i.e., audio sample rate 32 kHz, sample precision 16 bit, mono.

The bit rate of audio capture is 512 kbps, and 1 ms of data is 64 B;

The bit rate of audio playback is 1024 kbps, and 1 ms of data is 128 B.

To facilitate comparison with the delay requirements of the echo cancellation algorithm, data amounts are usually converted to time amounts in the following description.

Secondly, identify the caches used for audio capture and playback in the system:

Capture cache: data is used as soon as it is captured, and the cache typically does not exceed the minimum data amount required by the next-stage task;

Playback cache: usually the sound card cache. For the timing synchronization of echo cancellation, the amount of data in the sound card cache can be kept between 3 kB and 6 kB, i.e., between 24 ms and 48 ms of data, to minimize delay jitter.
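The figures in this example follow from the audio parameters by simple arithmetic, which the sketch below reproduces. The helper names are hypothetical, and the conversion assumes 1 kbps = 1000 bit/s and 1 kB = 1024 B, which is the reading under which the stated numbers agree.

```python
def bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bits_per_sample * channels // 1000

def bytes_per_ms(sample_rate_hz, bits_per_sample, channels):
    """Bytes of PCM data produced per millisecond."""
    return sample_rate_hz * (bits_per_sample // 8) * channels // 1000

def bytes_to_ms(nbytes, per_ms):
    """Convert a data amount to the time it represents."""
    return nbytes / per_ms

capture = bytes_per_ms(32000, 16, 1)    # mono capture
playback = bytes_per_ms(32000, 16, 2)   # two-channel playback
print(bitrate_kbps(32000, 16, 1), capture)
print(bitrate_kbps(32000, 16, 2), playback)
# Sound card cache bounds of 3 kB and 6 kB expressed as playback time.
print(bytes_to_ms(3 * 1024, playback), bytes_to_ms(6 * 1024, playback))
```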

One, correctly estimate the fixed delay in the working environment of the current system, and select a value for the filter coefficient of the echo cancellation algorithm, i.e., the delay range in which the algorithm works effectively;

For example, if multiple samples taken under different working environments give an estimated fixed delay of 200 ms to 300 ms for the audio system of the video conference terminal, the coefficient of the filter in the echo cancellation algorithm can be set to a value whose effective working delay is 150 ms to 250 ms.

Two, adjust the amount of data in the reference data buffer so that the delay in the audio system approaches the optimal working delay corresponding to the coefficient of the filter in the echo cancellation algorithm;

While the two-channel playback data is sent to the sound card, a copy is converted to mono data and sent to the reference data buffer. Assuming the fixed delay of the current system is 250 ms, the initial data amount in the reference data buffer is set to 100 ms of data. At this point the capture buffer contains no data, so the difference between the data amounts of the reference data buffer and the capture data buffer is 100 ms of data, and the system delay becomes 150 ms, which satisfies the delay requirement for the algorithm to work effectively.
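The copy-and-convert step and the resulting delay can be sketched as follows. The text only says the two-channel data is converted to mono; averaging the left and right samples is an assumption, as are the function names and the interleaved 16-bit sample layout.

```python
import array

def stereo_to_mono(pcm):
    """Downmix interleaved 16-bit stereo PCM bytes to mono bytes.

    Assumption: the downmix averages the left and right samples.
    """
    samples = array.array("h")
    samples.frombytes(pcm)
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)),
    )
    return mono.tobytes()

def effective_delay_ms(fixed_delay_ms, reference_ms, capture_ms):
    """System delay after offsetting by the buffer difference (hypothetical helper)."""
    return fixed_delay_ms - (reference_ms - capture_ms)

# 1 ms of two-channel playback data (128 B) becomes 1 ms of mono reference data (64 B).
print(len(stereo_to_mono(bytes(128))))
# Fixed delay 250 ms, 100 ms in the reference buffer, empty capture buffer.
print(effective_delay_ms(250, 100, 0))
```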

It should be noted that the initial data amount in the reference data buffer is inversely correlated with the system delay. The reference data should be fed into the echo cancellation algorithm before the echo data; the larger the initial data amount of the reference data buffer, the later the reference data is fed into the algorithm, so the smaller the timing difference between the reference data and the echo data, i.e., the smaller the delay. Factors such as VFP audio scheduling, network packet loss, and multi-channel audio mixing can interrupt the playback data so that the reference data cannot be replenished; the data in the reference data buffer is then consumed, and the delay always tends to grow. The principle for setting the data amount of the reference data buffer is therefore to make the buffer as large as possible so that the echo cancellation algorithm works in a lower delay range, avoiding algorithm failure caused by delay growth due to data jitter. Since the amount of data in the reference data buffer is set dynamically, it can be abstracted as a parameter in the application that sets the initial data amount of the buffer.

Three, establish a data synchronization mechanism so that the reference data and the captured data reach a dynamic balance during real-time communication.

The ultimate purpose of synchronization is to keep the delay between the reference data and the captured data as close as possible to the algorithm delay and stable, so that both can be processed together by the echo cancellation algorithm to eliminate the echo. This requires strict control of every link in the audio system and of external factors, including the fixed delay of system operation, the sound card playback cache, the reference data cache, the captured data cache, network packet loss, and so on.

Once the fixed delay of the system has been confirmed, the algorithm delay has been fixed, and the initial data amount in the reference data buffer has been adjusted so that the system delay matches the algorithm delay, the initial condition of the synchronization mechanism is established.

In practical applications, on the one hand, factors such as network jitter, audio data scheduling, and multi-channel audio mixing can interrupt the playback data, so the reference data drops sharply, which seriously disrupts the synchronization condition initially established. In this case the playback data must be replenished promptly, usually by filling the reference data buffer with silent data. On the other hand, because of factors such as network delay or system congestion, data may already have been replenished automatically and a large amount of data then arrives within a short time, causing redundancy, which equally disrupts the synchronization condition initially established. In this case the redundant data must be discarded. This requires establishing a data balance on top of the initial condition of the synchronization mechanism, by defining for the reference data buffer a lower data amount limit below which data must be replenished and an upper data amount limit above which redundant data must be discarded. The upper and lower limits refer to the difference between the amount of data in the reference data buffer and the amount of data in the capture data buffer. These limits are related to the performance of the adaptive filter in the echo cancellation algorithm, i.e., the algorithm delay corresponding to the filter coefficient and the echo estimation range.

For example, if the algorithm delay corresponding to the filter coefficient is 200 ms and the echo estimation range is ±50 ms, the delay range in which the filter works effectively is 150 ms to 250 ms, and the difference between the upper and lower data amount limits of the reference data buffer can be 100 ms of data.

Therefore, data can be replenished or discarded according to the data amounts of the lower and upper limits, so that the reference data and the captured data reach a dynamic balance during real-time communication and timing synchronization is achieved.
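The balancing rule can be sketched as a check on the difference between the two buffers. This is a minimal illustration under stated assumptions: the function name is hypothetical, the limits of 50 ms and 150 ms (3200 and 9600 bytes at 64 bytes per millisecond) are invented values that merely keep the 100 ms span from the example, padding or trimming exactly back to the violated limit is an assumption, and so is dropping the oldest data first.

```python
def rebalance(reference, capture_len, lower, upper):
    """Keep len(reference) - capture_len within [lower, upper] bytes.

    Returns the number of bytes added (positive) or dropped (negative).
    Hypothetical helper; `reference` is a bytearray modified in place.
    """
    diff = len(reference) - capture_len
    if diff < lower:
        pad = lower - diff
        reference.extend(bytes(pad))  # replenish with silent (zero) data
        return pad
    if diff > upper:
        drop = diff - upper
        del reference[:drop]          # discard redundant data, oldest first
        return -drop
    return 0

LOWER, UPPER = 50 * 64, 150 * 64      # hypothetical limits, 100 ms apart

starved = bytearray(1000)             # playback was interrupted
print(rebalance(starved, 0, LOWER, UPPER), len(starved))

flooded = bytearray(12000)            # a burst of late data arrived
print(rebalance(flooded, 0, LOWER, UPPER), len(flooded))
```

Calling such a check each time reference or captured data is queued would keep the two streams within the filter's effective working range without touching the buffer while it is already balanced.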

It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions, but those skilled in the art should understand that embodiments of the present invention are not limited by the described order of actions, because according to embodiments of the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by embodiments of the present invention.

Referring to Fig. 7, a structural block diagram of an embodiment of an echo cancellation device of the present invention is shown. The device can be applied to view networking, which may include a view networked server, a first video conference terminal, and a second video conference terminal. The device may specifically include the following modules:

A determining module 701, configured to determine the filter coefficient of the first video conference terminal;

A calculation module 702, configured to calculate the fixed delay between the first video conference terminal and the second video conference terminal;

An obtaining module 703, configured to obtain the initial data amount of the reference data buffer of the first video conference terminal;

An adjustment module 704, configured to adjust the initial data amount to a target data amount according to the filter coefficient and the fixed delay;

A receiving module 705, configured to receive audio data sent by the view networked server over a downlink communication link, where the audio data may be captured by the second video conference terminal;

An execution module 706, configured to perform echo cancellation on the audio data according to the target data amount.

In embodiments of the present invention, the calculation module 702 may specifically include the following submodules:

In embodiments of the present invention, the adjustment module 704 may specifically include the following submodules:

In embodiments of the present invention, the target data amount adjusting submodule may specifically include the following units:

In embodiments of the present invention, the execution module 706 may specifically include the following submodules:

Since the device embodiment is basically similar to the method embodiments, it is described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another.

Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a device, or a computer program product. Therefore, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.

Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.

Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or terminal device that includes the element.

An echo cancellation method and an echo cancellation device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those skilled in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. An echo cancellation method, characterized in that the method is applied to view networking, the view networking comprising a view networked server, a first video conference terminal, and a second video conference terminal; the method comprises:

the first video conference terminal determining a filter coefficient;

the first video conference terminal adjusting the initial data amount to a target data amount according to the filter coefficient and the fixed delay;

the first video conference terminal receiving audio data sent by the view networked server over a downlink communication link, the audio data being captured by the second video conference terminal;

the first video conference terminal performing echo cancellation on the audio data according to the target data amount.

2. The method according to claim 1, characterized in that the step in which the first video conference terminal calculates the fixed delay between itself and the second video conference terminal comprises:

emptying the reference data buffer;

capturing and playing target audio data;

3. The method according to claim 1 or 2, characterized in that the step in which the first video conference terminal adjusts the initial data amount to the target data amount according to the filter coefficient and the fixed delay comprises:

determining the working delay corresponding to the filter coefficient;

4. The method according to claim 3, characterized in that the step of adjusting the initial data amount to the target data amount according to the delay difference comprises:

when the delay difference is greater than zero, buffering data in the reference data buffer so that the amount of data after buffering equals the data amount corresponding to the delay difference;

when the delay difference is less than zero, discarding part of the data so that the amount of data remaining in the reference data buffer equals the data amount corresponding to the delay difference.

5. The method according to claim 1, characterized in that the step in which the first video conference terminal performs echo cancellation on the audio data according to the target data amount comprises:

capturing local voice data while the audio data is being played;

transmitting the local voice data through the reference data buffer to an adaptive filter, and performing echo cancellation on the local voice data by the adaptive filter.

6. An echo cancellation device, characterized in that the device is applied to view networking, the view networking comprising a view networked server, a first video conference terminal, and a second video conference terminal; the device comprises:

a calculation module, configured to calculate the fixed delay between the first video conference terminal and the second video conference terminal;

an adjustment module, configured to adjust the initial data amount to a target data amount according to the filter coefficient and the fixed delay;

a receiving module, configured to receive audio data sent by the view networked server over a downlink communication link, the audio data being captured by the second video conference terminal;

7. The device according to claim 6, characterized in that the calculation module comprises:

an audio file generating submodule, configured to generate a first audio file and a second audio file from the captured and the played target audio data, respectively;

a fixed delay calculating submodule, configured to calculate the fixed delay between the first audio file and the second audio file.

8. The device according to claim 6 or 7, characterized in that the adjustment module comprises:

a target data amount adjusting submodule, configured to adjust the initial data amount to the target data amount according to the delay difference.

9. The device according to claim 8, characterized in that the target data amount adjusting submodule comprises:

a buffer unit, configured to, when the delay difference is greater than zero, buffer data in the reference data buffer so that the amount of data after buffering equals the data amount corresponding to the delay difference;

a discarding unit, configured to, when the delay difference is less than zero, discard part of the data so that the amount of data remaining in the reference data buffer equals the data amount corresponding to the delay difference.

10. The device according to claim 6, characterized in that the execution module comprises:

a local voice data transmission module, configured to transmit the local voice data through the reference data buffer to an adaptive filter, the adaptive filter performing echo cancellation on the local voice data.