Disclosure of Invention
To solve at least one of the above technical problems, it is a primary object of the present invention to provide a low-latency distributed audio processing method and apparatus.
To achieve the above object, the present invention adopts the following technical solution: a low-latency distributed audio processing method is provided, which comprises the following steps:
Designating either one of two processors as a master processor and the other processor as a slave processor;
Receiving audio signals transmitted by the FPGA, wherein the audio signals comprise a far-end audio signal and a local audio signal;
On the master processor, decoding the far-end audio signal in the audio signals to obtain decoded audio data, and performing sound effect processing on the decoded audio data to generate first audio data; on the slave processor, performing audio processing on the local audio signal in the audio signals to generate second audio data;
And transmitting the first audio data and the second audio data to the FPGA for audio mixing processing, the FPGA distributing the mixed first audio data and second audio data.
Preferably, after the step of designating either one of the two processors as the master processor and the other as the slave processor, the method further comprises:
Providing synchronized clock signals to the master and slave processors.
Preferably, the step of performing sound effect processing on the decoded audio data to generate first audio data specifically includes:
The decoded audio data is subjected to automatic sound mixing, automatic gain, echo cancellation and noise suppression processing to generate first audio data.
Preferably, the step of performing audio processing on the local audio signal in the audio signals to generate second audio data specifically includes:
Carrying out automatic sound mixing, automatic gain, howling suppression and noise suppression on the local audio signal in the audio signals to generate second audio data.
Preferably, the main processor further encodes the local audio signal, transmits the encoded local audio signal to the FPGA, and distributes the encoded local audio signal to the far end through the FPGA.
To achieve the above object, the present invention adopts another technical solution: a low-latency distributed audio processing system is provided, comprising:
At least one AD/DA chip group, each group comprising two AD/DA chips connected in a daisy chain, the AD/DA chips being used for collecting audio signals and for playing back the first audio data and the second audio data after weighted mixing, wherein the audio signals comprise a far-end audio signal and a local audio signal;
The FPGA device is electrically connected with the AD/DA chip group and is used for receiving and transmitting the audio signals, and for performing weighted audio mixing and distribution of the first audio data and the second audio data;
The main processor is electrically connected with the FPGA device and is used for decoding a far-end audio signal in the audio signal to obtain decoded audio data, performing sound effect processing on the decoded audio data to generate first audio data and transmitting the first audio data to the FPGA device;
And the slave processor is electrically connected with the FPGA device and used for carrying out audio processing on the local audio signal in the audio signals to generate second audio data and transmitting the second audio data to the FPGA device.
Preferably, the low-latency distributed audio processing system further comprises an audio synchronization clock for providing a synchronization clock signal for the master processor and the slave processor.
Preferably, the main processor is configured to decode the far-end audio signal in the audio signals to obtain decoded audio data, perform sound effect processing on the decoded audio data to generate first audio data, and transmit the first audio data to the FPGA device.
Preferably, the slave processor is configured to perform automatic mixing, automatic gain, howling suppression and noise suppression processing on the local audio signal in the audio signals, generate second audio data, and transmit the second audio data to the FPGA device.
Preferably, the main processor is further configured to encode the local audio signal, transmit the encoded local audio signal to the FPGA device, and distribute the encoded local audio signal to the far end through the FPGA device.
In the technical solution of the present invention, two processors process the far-end audio signal and the local audio signal respectively. Specifically, one of the two processors is designated as the master processor and the other as the slave processor. The master processor is responsible for decoding the far-end signal and performing echo cancellation, background noise suppression, automatic gain control and automatic sound mixing on the decoded audio data to generate first audio data. The slave processor is responsible for performing noise suppression, howling suppression, automatic gain control and automatic sound mixing on the local audio signal to generate second audio data. The first audio data and the second audio data are mixed and distributed by the FPGA. By distributing the audio tasks between the master and slave CPUs, the CPU performance is fully utilized, the audio processing delay is reduced, the system stability is improved, and a better user experience is provided.
Detailed Description
the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the descriptions in the present invention relating to "first", "second", etc. are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that such combinations can be realized by a person skilled in the art; when technical solutions are contradictory or cannot be realized, the combination should be deemed not to exist and does not fall within the protection scope of the present invention.
In the prior art, a single video conference terminal uses one processor to perform multi-channel audio algorithm processing. However, as the number of audio access paths increases to dozens, and as different audio algorithms place different demands on processing capability, a single CPU cannot simultaneously handle audio processing and other services. The present invention differs from this prior art as described below.
Referring to fig. 1, in an embodiment of the present invention, the low-latency distributed audio processing method includes the following steps:
Step S10, designating either one of two processors as a master processor and the other processor as a slave processor;
Step S20, receiving audio signals transmitted by the FPGA, wherein the audio signals comprise a far-end audio signal and a local audio signal;
Step S30, on the master processor, decoding the far-end audio signal in the audio signal to obtain decoded audio data, performing sound effect processing on the decoded audio data to generate first audio data, and on the slave processor, performing audio processing on the local audio signal in the audio signal to generate second audio data;
Step S40, transmitting the first audio data and the second audio data to the FPGA for audio mixing processing, and distributing, by the FPGA, the mixed first audio data and second audio data.
In this embodiment, the master processor processes the far-end audio signal and the slave processor processes the local audio signal. Processing the far-end and local audio signals separately makes full use of the performance of each processor, avoids the situation where a single CPU cannot run, or incurs delay, when performing echo cancellation and howling suppression simultaneously, improves the system stability, and provides a better user experience. The FPGA performs weighted sound mixing, and the mixing weights can be flexibly configured according to actual requirements. When transmitting audio signals, the method transmits audio data in a time-division multiplexing mode; each data link supports 16 channels, and 4 data links support input and output of 64 channels of audio data.
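As a rough illustration (not the actual implementation), the master/slave split and the FPGA's weighted mixing described above can be sketched in Python; every processing body below is a placeholder stand-in, and the gains and mixing weights are example values, not parameters from the disclosure:

```python
# Sketch of the embodiment's flow: master path for the far-end signal,
# slave path for the local signal, weighted mixing in the FPGA stage.
# Every processing body is a placeholder for the real algorithm.

def master_path(far_end_signal):
    decoded = list(far_end_signal)          # stand-in for audio decoding
    return [0.8 * s for s in decoded]       # stand-in for AEC/NS/AGC/auto-mix chain

def slave_path(local_signal):
    return [0.9 * s for s in local_signal]  # stand-in for howling/noise/AGC chain

def fpga_weighted_mix(first_audio_data, second_audio_data, w1=0.6, w2=0.4):
    # The mixing weights are flexibly configurable, as in the embodiment.
    return [w1 * a + w2 * b
            for a, b in zip(first_audio_data, second_audio_data)]

far_end, local = [0.5, -0.25], [0.2, 0.4]
mixed = fpga_weighted_mix(master_path(far_end), slave_path(local))
```

In a real system the two paths would run on physically separate processors, with the FPGA performing the mix in hardware; here they are ordinary functions only to show the data flow.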
The technical scheme of the invention respectively processes the far-end audio signal and the local audio signal by adopting two processors, particularly, one processor is appointed as a main processor by the two processors, the other processor is a slave processor, the main processor is responsible for performing audio decoding on the far-end signal, performing echo cancellation, background noise suppression, automatic gain control and automatic sound mixing processing on the decoded audio data to generate first audio data, the slave processor is responsible for carrying out noise suppression, howling suppression, automatic gain control and automatic sound mixing processing on the local audio signal to generate second audio data, the first audio data and the second audio data are mixed and distributed through the FPGA, so that the CPU performance is fully utilized through the distribution of audio tasks in a master CPU and a slave CPU, the audio processing delay is reduced, the system stability is improved, and better use experience is presented for users.
In a specific embodiment, after step S10, in which either one of the two processors is designated as the master processor and the other as the slave processor, the method further includes:
Providing synchronized clock signals to the master and slave processors.
To ensure clock synchronization between different chips, in this embodiment a synchronized clock signal is provided to the master processor and the slave processor, keeping the two processors in step.
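Purely as an illustration of why the shared clock matters (the mechanism below is a software analogy, not the hardware clock distribution of the embodiment), frames produced by the two processors can be aligned by a frame index derived from the common clock; the sample rate, frame size, and pairing scheme are all assumptions:

```python
# Software analogy of clock alignment: both processors stamp each frame
# with an index derived from the shared synchronization clock, so the FPGA
# stage can pair first/second audio frames belonging to the same instant.

SAMPLE_RATE = 48000  # assumed sample rate for the illustration
FRAME_SIZE = 480     # assumed 10 ms frames

def frame_index(sample_counter):
    """Frame number from the shared clock's running sample counter."""
    return sample_counter // FRAME_SIZE

def pair_frames(master_frames, slave_frames):
    """Pair frames by index; unmatched frames would be buffered in practice."""
    slave_by_index = {idx: frame for idx, frame in slave_frames}
    return [(idx, frame, slave_by_index[idx])
            for idx, frame in master_frames if idx in slave_by_index]

master = [(frame_index(0), "M0"), (frame_index(480), "M1")]
slave = [(frame_index(480), "S1"), (frame_index(960), "S2")]
pairs = pair_frames(master, slave)  # only frame 1 exists on both sides
```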
In a specific embodiment, the step of performing sound effect processing on the decoded audio data to generate first audio data specifically includes:
The decoded audio data is subjected to automatic sound mixing, automatic gain, echo cancellation and noise suppression processing to generate first audio data.
In this embodiment, when the main processor receives the far-end audio signal, the far-end audio signal is decoded to obtain decoded audio data, and then the decoded audio data is subjected to automatic sound mixing, automatic gain control, echo cancellation, and noise suppression to obtain first audio data.
Further, the step of performing audio processing on a local audio signal in the audio signal to generate second audio data specifically includes:
Carrying out automatic sound mixing, automatic gain, howling suppression and noise suppression on the local audio signal in the audio signals to generate second audio data.
In this embodiment, when the slave processor receives the local audio signal, the second audio data is obtained by performing automatic audio mixing, automatic gain control, howling suppression, and noise suppression on the local audio signal, so that the working performance of the slave processor can be fully utilized.
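The slave-side chain can be illustrated as an ordered sequence of stages applied one after another; each stage body below is a deliberately crude one-line approximation (pass-through mixing, peak normalization, a limiter, a noise gate), standing in for the real algorithms, and the numeric thresholds are assumptions:

```python
# Sketch: the slave-side chain as an ordered list of stages. Each stage is
# a placeholder; real automatic mixing, AGC, howling suppression, and noise
# suppression would replace these one-line approximations.

def automatic_mixing(frame):
    return frame                                # pass-through placeholder

def automatic_gain(frame, target=0.5):
    peak = max(abs(s) for s in frame) or 1.0
    return [s * target / peak for s in frame]   # normalize toward a target peak

def howling_suppression(frame):
    return [min(max(s, -0.95), 0.95) for s in frame]      # crude limiter stand-in

def noise_suppression(frame, gate=0.01):
    return [0.0 if abs(s) < gate else s for s in frame]   # simple noise gate

STAGES = [automatic_mixing, automatic_gain, howling_suppression, noise_suppression]

def process_local(frame):
    for stage in STAGES:
        frame = stage(frame)
    return frame

second_audio_data = process_local([0.005, 0.2, -0.4])
```

The master-side chain would have the same shape, with echo cancellation in place of howling suppression.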
From a comparison of the two embodiments described above, the difference is that the main processor performs echo cancellation on the far-end audio signal while the slave processor performs howling suppression on the local audio signal. This avoids the delay caused by running howling suppression and echo cancellation simultaneously on one CPU, so that the main processor and the slave processor can each make full use of their performance.
In a specific embodiment, the main processor further encodes the local audio signal, transmits the encoded local audio signal to the FPGA, and distributes it to the far end through the FPGA. Encoding the local audio signal enables it to be transmitted to the far end.
Referring to fig. 2, in an embodiment of the present invention, the low-latency distributed audio processing system includes:
At least one AD/DA chip group 10, each group comprising two AD/DA chips connected in a daisy chain, the AD/DA chips being used for collecting audio signals and for playing back the first audio data and the second audio data after weighted mixing, wherein the audio signals comprise a far-end audio signal and a local audio signal;
The FPGA device 20 is electrically connected with the AD/DA chip group 10 and is used for receiving and transmitting audio signals, and for performing weighted audio mixing and distribution of the first audio data and the second audio data;
The main processor 30 is electrically connected with the FPGA device 20, and is configured to decode a far-end audio signal in the audio signal to obtain decoded audio data, perform sound effect processing on the decoded audio data to generate first audio data, and transmit the first audio data to the FPGA device 20;
And the slave processor 40 is electrically connected with the FPGA device 20 and used for carrying out audio processing on the local audio signal in the audio signal to generate second audio data and transmitting the second audio data to the FPGA device 20.
In this embodiment, the system uses the TI8168 chip from Texas Instruments (TI) as the processor. The TI8168 is a multichannel high-definition SoC system chip that integrates a C674x DSP core with a 1 GHz dominant frequency. Two TI8168 chips are collocated in the same terminal to perform audio processing in parallel; for convenience of description, they are respectively called the master 8168 / master DSP (main processor 30) and the slave 8168 / slave DSP (slave processor 40). The audio algorithms run on the master and slave DSPs. Based on this implementation scheme, which combines the strong audio processing capability of the TI8168 chip with the multi-processor system design, the product can provide customers with a good user experience. When transmitting audio signals, the system transmits audio data in a time-division multiplexing mode; each data link supports 16 channels, and 4 data links support input and output of 64 channels of audio data. To ensure low latency in data interaction, the system uses EDMA fast data exchange to receive and transmit audio signals. In addition, the system is readily extensible for audio processing: because the main processor 30 and the slave processor 40 each provide 3 audio input/output interfaces, and each interface has multiple transmit and receive channels, development of audio expansion boards is very convenient.
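The 64-channel arrangement described above (4 data links, 16 time-division-multiplexed channels per link) implies a simple mapping between a global channel index and a (link, slot) pair; the link-major ordering chosen here is an assumption for illustration, not something fixed by the disclosure:

```python
# Channel-to-slot arithmetic for the TDM layout described in the embodiment:
# 4 data links x 16 TDM slots per link = 64 audio channels in and out.

LINKS = 4            # number of data links, per the description
SLOTS_PER_LINK = 16  # TDM channels carried on each link
TOTAL_CHANNELS = LINKS * SLOTS_PER_LINK  # 64 channels total

def channel_to_slot(channel):
    """Map a global channel index 0..63 to (link, TDM slot), link-major order."""
    assert 0 <= channel < TOTAL_CHANNELS
    return channel // SLOTS_PER_LINK, channel % SLOTS_PER_LINK

def slot_to_channel(link, slot):
    """Inverse mapping: (link, slot) back to the global channel index."""
    assert 0 <= link < LINKS and 0 <= slot < SLOTS_PER_LINK
    return link * SLOTS_PER_LINK + slot
```

Under this convention, channel 17 lands in slot 1 of link 1, and the last slot of the last link carries channel 63.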
In one embodiment, the low latency distributed audio processing system further comprises an audio synchronization clock for providing a synchronization clock signal to the master processor 30 and the slave processor 40.
In this embodiment, the audio synchronization clock further provides a clock synchronization signal to the plurality of AD/DA chips, so that the different chips operate synchronously.
In a specific embodiment, the main processor 30 is configured to decode the far-end audio signal in the audio signals to obtain decoded audio data, perform sound effect processing on the decoded audio data to generate first audio data, and transmit the first audio data to the FPGA device 20.
In this embodiment, when the main processor 30 receives the far-end audio signal, the far-end audio signal is decoded to obtain decoded audio data, and then the decoded audio data is subjected to automatic sound mixing, automatic gain control, echo cancellation, and noise suppression to obtain first audio data, so that the working performance of the main processor 30 can be fully utilized.
Further, the slave processor 40 is configured to perform automatic sound mixing, automatic gain, howling suppression, and noise suppression on the local audio signal in the audio signals, generate second audio data, and transmit the second audio data to the FPGA device 20.
In this embodiment, when the slave processor 40 receives the local audio signal, the second audio data is obtained by performing automatic audio mixing, automatic gain control, howling suppression, and noise suppression on the local audio signal, so that the working performance of the slave processor 40 can be fully utilized.
From a comparison of the two embodiments described above, the difference is that: the main processor 30 is used for performing echo cancellation processing on the far-end audio signal, and the secondary processor 40 is used for performing howling suppression on the local audio signal, so that the delay problem caused by the simultaneous operation of the howling suppression and the echo cancellation on one CPU can be avoided, and the main processor 30 and the secondary processor 40 can fully utilize the respective performances.
In a specific embodiment, the main processor 30 is further configured to encode the local audio signal, transmit the encoded local audio signal to the FPGA device 20, and distribute the encoded local audio signal to the far end through the FPGA device 20. Encoding the local audio signal enables it to be transmitted to the far end.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.