CN111741177B

CN111741177B - Audio mixing method, device, equipment and medium for online conference

Info

Publication number: CN111741177B
Application number: CN202010537309.XA
Authority: CN
Inventors: 权威; 汪海滨; 袁茂林; 王永强; 王国良; 石峥; 张雷; 张亚伟
Original assignee: Zhejiang Qiju Technology Co ltd
Current assignee: Zhejiang Qiju Technology Co ltd
Priority date: 2020-06-12
Filing date: 2020-06-12
Publication date: 2021-07-27
Anticipated expiration: 2040-06-12
Also published as: CN111741177A

Abstract

The embodiments of the invention disclose an audio mixing method, apparatus, device and medium for an online conference. The method comprises: if a mixing data trigger event is detected, acquiring, from the participating clients, the recording clients that are in a recording state; selecting mixing clients from the recording clients according to the number and the priority of the recording clients; and mixing the voice collected from the mixing clients to obtain conference mixing data. The embodiments can dynamically apply different mixing strategies according to the number of recording clients, which solves the problem of high performance consumption when conference voice is mixed by the participating clients; at the same time, mixing efficiency is effectively improved, giving participants a better conference experience.

Description

Audio mixing method, device, equipment and medium for online conference

Technical Field

The present invention relates to data processing technologies, and in particular, to a method, an apparatus, a device, and a medium for mixing audio for an online conference.

Background

With the rapid development of Internet technology, streaming-media applications such as web conferencing and Internet telephony have become widespread in daily life. In a multimedia conference, participants need to hear several speakers. In the conventional approach, each speaker's voice signal is transmitted separately to every participant and is mixed at the terminal before playback, so that every member of the conference can hear every speaker.

The drawback of this scheme is that mixing at each terminal occupies a large amount of network bandwidth and places high demands on terminal performance, resulting in low mixing efficiency and a poorer conference experience for the participants.

Disclosure of Invention

The embodiments of this application provide an audio mixing method, apparatus, device and medium for an online conference, which apply different mixing methods according to the number of participants in the conference and thereby effectively improve mixing efficiency.

In a first aspect, an embodiment of the present invention provides a mixing method for an online conference, including:

if a mixing data trigger event is detected, acquiring, from the participating clients, the recording clients that are in a recording state;

selecting mixing clients from the recording clients according to the number and the priority of the recording clients;

and mixing the voice collected from the mixing clients to obtain conference mixing data.

Optionally, selecting a mixing client from the recording clients according to the number and the priority of the recording clients includes:

if the number of recording clients is greater than a preset number threshold, selecting the mixing clients according to the priority of the recording clients;

otherwise, selecting the recording clients as the mixing clients.

Optionally, the audio mixing processing is performed on the speech collected by the audio mixing client to obtain conference audio mixing data, including:

superimposing the voices collected from the mixing clients to obtain an overflow ratio, and segmenting the voice according to the overflow ratio to obtain a scaling value for each voice segment;

and performing segmented contraction on the superimposed voice according to the scaling values of the voice segments to obtain conference mixing data.

Optionally, the mixing processing of the speech collected by the mixing client includes:

if the mixing state of the voice of a mixing client is not participating in mixing, performing fade-in processing on the voice of that mixing client and changing its mixing state from not participating in mixing to participating in mixing;

and if the mixing state of the voice of a recording client that is not selected as a mixing client is participating in mixing, performing fade-out processing on the voice of that client and changing its mixing state from participating in mixing to not participating in mixing.

In a second aspect, an embodiment of the present invention provides an audio mixing apparatus for an online conference, including:

the acquisition module is configured to acquire, from the participating clients, the recording clients that are in a recording state if a mixing data trigger event is detected;

the selection module is configured to select mixing clients from the recording clients according to the number and the priority of the recording clients;

and the audio mixing module is configured to mix the voice collected from the mixing clients to obtain conference mixing data.

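Optionally, the selection module is specifically configured to:

if the number of recording clients is greater than a preset number threshold, select the mixing clients according to the priority of the recording clients;

otherwise, select the recording clients as the mixing clients.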

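Optionally, the mixing module is specifically configured to:

superimpose the voices collected from the mixing clients to obtain an overflow ratio, and segment the voice according to the overflow ratio to obtain a scaling value for each voice segment; and perform segmented contraction on the superimposed voice according to the scaling values of the voice segments to obtain conference mixing data.

Optionally, the mixing module is further specifically configured to:

if the mixing state of the voice of a mixing client is not participating in mixing, fade in the voice of that mixing client and change its mixing state from not participating in mixing to participating in mixing; and if the mixing state of the voice of a recording client that is not selected as a mixing client is participating in mixing, fade out the voice of that client and change its mixing state from participating in mixing to not participating in mixing.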

In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:

one or more processors;

a storage device for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the audio mixing method for an online conference according to any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the audio mixing method for an online conference according to any embodiment of the present invention.

According to the embodiments of the invention, the recording clients in a recording state are acquired from the participating clients, mixing clients are selected from them according to their number and priority, and the voice collected from the mixing clients is mixed to obtain conference mixing data. The embodiments can dynamically apply different mixing strategies according to the number of recording clients, which solves the problem of high performance consumption when conference voice is mixed by the participating clients; at the same time, mixing efficiency is effectively improved, giving participants a better conference experience.

Drawings

Fig. 1 is a schematic flow chart of a mixing method for an online conference according to a first embodiment of the present invention;

Fig. 2 is a flowchart of an audio mixing method for an online conference according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an audio mixing apparatus for an online conference according to a third embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of an audio mixing method for an online conference according to the first embodiment of the present invention. This embodiment is applicable to efficiently mixing voice data in an online conference. The method may be executed by an audio mixing apparatus for an online conference, which may be implemented in hardware and/or software and configured in an electronic device; the apparatus can implement the audio mixing method for an online conference of any embodiment of this application. As shown in Fig. 1, the method specifically includes the following steps:

S110: if a mixing data trigger event is detected, acquire the recording clients in the recording state from among the participating clients.

In this embodiment, the mixing data trigger event is the event raised when voice data are connected in the online conference; it reflects the addition and removal of voice streams in the conference. During the conference, the connection of voice data is monitored in real time, and the voice data to be mixed are re-determined according to the monitoring result.

A large online conference has many participating clients, and during discussion several of them may well be in the recording state at the same time, that is, several clients simultaneously send voice data to the conference system to express their views, enabling effective exchange of the conference content. A participating client may be any device with a voice call function, such as a smartphone, a personal computer or a tablet computer. A participating client in the recording state indicates that its user is speaking or otherwise inputting voice data.
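As an illustration only, the acquisition in S110 can be sketched in Python. The sketch assumes a hypothetical `ParticipantClient` record with an `is_recording` flag; the text does not specify how the recording state is stored or how the trigger event is delivered.

```python
from dataclasses import dataclass


@dataclass
class ParticipantClient:
    """Hypothetical view of a participating client; the field names are illustrative."""
    client_id: str
    is_recording: bool  # True while the user is speaking or inputting voice data


def get_recording_clients(participants: list[ParticipantClient]) -> list[ParticipantClient]:
    """Return the participating clients that are currently in the recording state.

    Intended to be re-run whenever a mixing data trigger event fires (e.g. a voice
    stream joins or leaves the conference), so the candidate set stays current.
    """
    return [p for p in participants if p.is_recording]


if __name__ == "__main__":
    clients = [ParticipantClient("a", True), ParticipantClient("b", False), ParticipantClient("c", True)]
    print([c.client_id for c in get_recording_clients(clients)])  # ['a', 'c']
```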

S120: select mixing clients from the recording clients according to the number and the priority of the recording clients.

In this embodiment, mixing the voices of all recording clients at once would force the conference system to process many voice streams simultaneously, making mixing inefficient. A selection condition is therefore defined to choose, from the recording clients, the mixing clients whose voices will actually be mixed, which keeps the number of mixing streams from significantly degrading the mixing result.

The priority of a recording client is allocated automatically by the conference system when the client joins the online conference; for example, it can be determined from the client's identity information, which mainly distinguishes the room host from guests. The identity information supports dynamic configuration, so that the priority of certain sessions can be raised.
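A minimal sketch of priority allocation at join time, assuming a hypothetical role-to-priority table (`DEFAULT_ROLE_PRIORITY`) and a hypothetical `overrides` mapping standing in for the dynamic configuration; the actual values and configuration mechanism are not disclosed.

```python
from typing import Optional

# Hypothetical default table: the text says priority follows the client's identity
# (room host or guest); the numeric values here are illustrative only.
DEFAULT_ROLE_PRIORITY = {"host": 2, "guest": 1}


def assign_priority(role: str, overrides: Optional[dict[str, int]] = None) -> int:
    """Priority allocated automatically when a client joins the conference.

    `overrides` stands in for the dynamic configuration mentioned in the text
    (e.g. raising the priority of certain sessions). Unknown roles fall back to
    the lowest configured priority.
    """
    table = {**DEFAULT_ROLE_PRIORITY, **(overrides or {})}
    return table.get(role, min(table.values()))
```

Keeping the defaults and the overrides separate means a reconfiguration only has to supply the entries it changes.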

Specifically, an audio energy value may be used in addition to the number and the priority of the recording clients: the mixing clients are selected from the recording clients according to the number, the priority and/or the audio energy value of the recording clients, which helps ensure that the selected mixing clients are valid. The audio energy value is derived from the pulse code modulation (PCM) values of the audio: because PCM values reflect the volume, the energy value is obtained simply by accumulating the PCM values of the current frame. Compared with the prior art, which selects mixing clients by signal-to-noise ratio, selecting them by audio energy value avoids the time-consuming, computationally complex signal-to-noise ratio calculation.
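The energy value and its use in selection can be sketched as follows. This is a sketch under assumptions: samples are signed 16-bit PCM, "accumulating" is taken as summing absolute values, and the ordering rule (priority first, energy as a tie-breaker) is one hypothetical way of combining the criteria, not one the text prescribes.

```python
def frame_energy(pcm_frame: list[int]) -> int:
    """Audio energy value of one frame: the accumulated magnitude of its PCM samples.

    Assumes signed 16-bit PCM, so absolute values are summed; adding the raw signed
    values would let positive and negative half-waves cancel each other out.
    """
    return sum(abs(s) for s in pcm_frame)


def rank_clients(clients: list[tuple[str, int, list[int]]], max_mixers: int) -> list[str]:
    """Pick up to `max_mixers` clients from (client_id, priority, current_frame) tuples.

    Hypothetical ordering: higher priority first, with the frame energy used only to
    break ties between clients of equal priority.
    """
    ranked = sorted(clients, key=lambda c: (c[1], frame_energy(c[2])), reverse=True)
    return [client_id for client_id, _, _ in ranked[:max_mixers]]
```

The energy is a single pass of additions per frame, which is the cost advantage over a signal-to-noise ratio estimate that the text points out.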

S130: mix the voice collected from the mixing clients to obtain conference mixing data.

In this embodiment, when the collected voices are mixed, each voice is added to its own audio queue, one queue per conversation stream. The audio queues are then mixed according to their timestamps using a segmented contraction method to obtain conference mixing data. The segmented contraction method scales different audio segments separately, which prevents popping noise.
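The per-stream audio queues and the timestamp-aligned superposition that precedes segmented contraction might look like the following sketch. The queue layout (a list of `(timestamp, samples)` frames per stream) and the function name are assumptions for illustration.

```python
def superimpose_by_timestamp(queues: dict[str, list[tuple[int, list[int]]]]) -> dict[int, list[int]]:
    """Add up frames that share a timestamp across the per-stream audio queues.

    `queues` maps a stream id to its audio queue of (timestamp, samples) frames.
    Assumes frames with the same timestamp have the same length; a stream with no
    frame at some timestamp simply contributes nothing there. The sums are not
    clipped, so they may leave the 16-bit range; the segmented contraction step is
    what brings them back.
    """
    mixed: dict[int, list[int]] = {}
    for frames in queues.values():
        for ts, samples in frames:
            acc = mixed.setdefault(ts, [0] * len(samples))
            for i, sample in enumerate(samples):
                acc[i] += sample
    return dict(sorted(mixed.items()))  # ordered by timestamp
```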

Specifically, the conference system converts the conference mixing data into a unified data format, packages it and sends it to each recording client, so that each client receives the conference voice data in real time during the conference. The embodiments of the invention can dynamically apply different mixing strategies according to the number of recording clients, which solves the problem of high performance consumption when conference voice is mixed by the participating clients; at the same time, mixing efficiency is effectively improved, giving participants a better conference experience.

Example two

Fig. 2 is a flowchart of an audio mixing method for an online conference according to the second embodiment of the present invention. This embodiment further expands and optimizes the embodiment above and can be combined with any of the optional alternatives in the technical solution. As shown in Fig. 2, the method includes:

S210: if a mixing data trigger event is detected, acquire the recording clients in the recording state from among the participating clients.

S220: select mixing clients from the recording clients according to the number and the priority of the recording clients.

S230: superimpose the voices collected from the mixing clients to obtain an overflow ratio, and segment the voice according to the overflow ratio to obtain a scaling value for each voice segment.

In this embodiment, the overflow ratio is the ratio of the number of overflows to the number of superimposed members after the voice data are superimposed, and it reflects the stability of the voice data. This embodiment mixes the voices mainly by a segmented contraction method: the voice data to be mixed are read aligned by timestamp and then superimposed, and the superimposed voice data may be stored in a new audio queue. During superposition, the voice is segmented according to its overflow ratio, with different segments corresponding to different overflow ratios; the scaling value of the current segment is then calculated from its overflow ratio and the number of audio streams it contains.

For example, suppose the selected mixing clients are a, b and c, and their audio data are stored in the respective queues S_a, S_b and S_c. The audio data represent the volume level and are stored as a continuous sequence of short values, each occupying two bytes; the data in queue S_a, for instance, are 011232132313344512. The data in queues S_a, S_b and S_c are then added member by member, aligned by timestamp, as follows:

S_a: 01 12 32 13 23 13 34 45 12

S_b: 10 12 32 13 32 45 32 12 32

S_c: 02 32 12 43 56 12 34 12 34

The corresponding data are added: S_a + S_b + S_c, that is, the first member of S_a (0112, converted to an integer) plus the first member of S_b (1012, converted to an integer) plus the first member of S_c (0232, converted to an integer), and so on. If a sum exceeds the maximum value representable in two bytes (65535) during the addition, an overflow is deemed to have occurred. For example, if overflow occurs for 5 of the first 10 members accumulated (each member occupying two bytes, e.g., 0112 in S_a), the overflow ratio is 1/2; this proportion is the overflow ratio of this embodiment. Likewise, if overflow occurs for 1 of the next 20 members accumulated, the overflow ratio is 1/20, and so on.
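The overflow check and overflow ratio can be sketched as below. The sketch assumes signed 16-bit PCM samples, so a sum overflows when it leaves the range -32768 to 32767; the 65535 quoted above is the maximum of an unsigned two-byte value. The helper names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def overflow_flags(streams: list[list[int]]) -> list[bool]:
    """Superimpose time-aligned streams member by member and flag each sum that overflows.

    zip() stops at the shortest stream, so the streams are assumed to be equally long.
    """
    return [not (INT16_MIN <= sum(members) <= INT16_MAX) for members in zip(*streams)]


def overflow_ratio(flags: list[bool]) -> float:
    """Number of overflowing members divided by the number of superimposed members."""
    return sum(flags) / len(flags) if flags else 0.0
```

With 5 overflows among the first 10 members, `overflow_ratio` gives 0.5, matching the 1/2 in the example.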

The overflow ratio in this embodiment is calculated dynamically. To avoid scaling normal speech together with overflowing members, overflowing members are grouped into the same segment as far as possible, and members without overflow are grouped into another segment. For example, if two of the first 10 members overflow (e.g., members 8 and 9), the first 7 members form one segment and members 8 and 9 form another.
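One way to form the segments described above is to split the superimposed members into maximal runs of overflowing and non-overflowing members, as in this hypothetical helper built on `itertools.groupby`. For the example of 10 members with members 8 and 9 overflowing, it yields members 1 to 7, members 8 and 9, and member 10 as separate segments.

```python
from itertools import groupby


def split_by_overflow(flags: list[bool]) -> list[tuple[int, int, bool]]:
    """Split member indices into maximal runs of overflowing / non-overflowing members.

    Returns (start, end_exclusive, overflowed) tuples, so overflowing members are
    kept together and never share a segment with normal speech.
    """
    segments: list[tuple[int, int, bool]] = []
    start = 0
    for overflowed, run in groupby(flags):
        length = sum(1 for _ in run)
        segments.append((start, start + length, overflowed))
        start += length
    return segments
```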

S240: perform segmented contraction on the superimposed voice according to the scaling values of the voice segments to obtain conference mixing data.

In this embodiment, the superimposed voice as a whole is contracted segment by segment according to the scaling value of each voice segment, yielding the conference mixing data.
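Segmented contraction can be sketched as applying one scaling value per segment. The text states only that the scaling value depends on the segment's overflow ratio and the number of audio streams, so `scale_for_segment` below uses a hypothetical formula, 1 / (1 + r*(n - 1)), that leaves overflow-free segments untouched and averages fully overflowing ones; the closing clamp is likewise an added safeguard, not part of the described method.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def scale_for_segment(overflow_ratio: float, n_streams: int) -> float:
    """HYPOTHETICAL scaling rule (the text gives no formula): 1.0 when the segment has
    no overflow, falling to 1 / n_streams when every member overflowed. Needs n_streams >= 1."""
    return 1.0 / (1.0 + overflow_ratio * (n_streams - 1))


def contract_segments(summed: list[int], segments: list[tuple[int, int]], n_streams: int) -> list[int]:
    """Scale each segment of the superimposed samples by its own scaling value.

    `segments` are (start, end_exclusive) index ranges. The final clamp to the 16-bit
    range guards against residual overflow and is an assumption, not from the text.
    """
    out = list(summed)
    for start, end in segments:
        if end <= start:
            continue  # skip empty segments
        seg = summed[start:end]
        ratio = sum(not (INT16_MIN <= s <= INT16_MAX) for s in seg) / len(seg)
        k = scale_for_segment(ratio, n_streams)
        for i in range(start, end):
            out[i] = max(INT16_MIN, min(INT16_MAX, round(summed[i] * k)))
    return out
```

Because normal speech sits in overflow-free segments, it keeps a scaling value of 1.0 and is passed through unchanged.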

This embodiment segments the voice by computing the overflow ratio while the voices are added, and then contracts the superimposed voice segment by segment using the scaling values of the different segments to obtain complete conference mixing data, so that the mixing data can be obtained quickly and effectively.

Optionally, S220 includes:

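if the number of recording clients is greater than a preset number threshold, the mixing clients are selected according to the priority of the recording clients;

otherwise, the recording clients are selected as the mixing clients.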

In this embodiment, to keep the mixing of many voice streams from reducing mixing efficiency, a fixed maximum number of mixed voices is set, which effectively reduces the load on the conference system. The preset number threshold may be, for example, 3. Taking a threshold of 3 as an example: during an online conference, a session contains several participating clients in the recording state (e.g., a, b, c, d and e), and the maximum number of mixing members is set to 3, meaning that at most 3 members can participate in mixing in one frame of audio data. After the voices of recording clients a, b, c, d and e are received, the voices of 3 of them are selected for mixing according to priority: if the priorities are ranked b > d > e > a > c, the selected mixing clients are b, d and e.
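The threshold rule can be sketched as follows, assuming priorities are available as integers where a larger value means a higher priority (a hypothetical encoding). With the ranking b > d > e > a > c and a threshold of 3, it returns b, d and e, matching the example above.

```python
PRESET_THRESHOLD = 3  # maximum number of mixing members per audio frame (the text's example value)


def select_mixing_clients(recording: list[str], priority: dict[str, int],
                          threshold: int = PRESET_THRESHOLD) -> list[str]:
    """Pick the mixing clients from the recording clients.

    If there are more recording clients than the threshold, keep the `threshold`
    highest-priority ones; otherwise every recording client is mixed.
    """
    if len(recording) > threshold:
        return sorted(recording, key=lambda c: priority[c], reverse=True)[:threshold]
    return list(recording)
```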

Optionally, the mixing processing of the speech collected from the mixing client includes:

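if the mixing state of the voice of a mixing client is not participating in mixing, fading in the voice of that mixing client and changing its mixing state from not participating in mixing to participating in mixing;

and if the mixing state of the voice of a recording client that is not selected as a mixing client is participating in mixing, fading out the voice of that client and changing its mixing state from participating in mixing to not participating in mixing.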

In this embodiment, the system internally maintains a variable Enum_MixStatus m_MixStatus for each mixing client. It records whether the client's frames participate in mixing, and this state determines whether the current frame needs special processing (such as fade-in or fade-out).

The mixing states of a mixing client include a default state (KEYMIXNORMAL), previous frame participated in mixing (KEYMIXPREIOUS), previous frame did not participate in mixing (KEYMIXNOPREIOUS) and an error state (KEYMIXERROR). In this embodiment, fade-in and fade-out are both implemented with an audio filtering algorithm, but with different filter factors: fade-in raises the volume gradually from low to high, and fade-out lowers it gradually from high to low. Applying fade-in and fade-out while mixing the conversation gives smooth transitions in the conversation sound and avoids abrupt jumps in loudness.
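The text implements fading with an audio filter whose factors are not disclosed. As a stand-in, the sketch below uses a plain linear gain ramp over one frame, a different but commonly used technique, to show the low-to-high and high-to-low volume changes; the function name is hypothetical.

```python
def fade(frame: list[int], fade_in: bool) -> list[int]:
    """Apply a linear gain ramp across one frame of PCM samples.

    fade_in=True ramps the gain from 0 up to 1 (volume low to high);
    fade_in=False ramps it from 1 down to 0 (volume high to low).
    """
    n = len(frame)
    out = []
    for i, sample in enumerate(frame):
        gain = i / (n - 1) if n > 1 else 1.0
        if not fade_in:
            gain = 1.0 - gain
        out.append(round(sample * gain))
    return out
```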

This embodiment performs fade-in or fade-out according to the mixing state of each participating client to achieve effective mixing of the voices, as shown in the following example.

The voice data are real-time streams, each of which is an audio stream to be mixed. Suppose the audio streams available for mixing this time are A, B, C, D and E, and streams B, C and D are selected according to priority. A and E are not selected, so their mixing states must be checked. There are two cases:

1) if the mixing state of A (or E) is KEYMIXPREIOUS (participating in mixing), its previous frame participated in mixing, so in the current mixing the data stream of A is faded out, and its mixing state is then updated to KEYMIXNOPREIOUS (not participating in mixing);

2) if the mixing state of A (or E) is not KEYMIXPREIOUS, no processing is performed.

For the selected streams, suppose the three streams selected this time are A, B and C; the mixing states of A, B and C are then checked. There are two cases:

1) if the mixing state of A (B or C) is KEYMIXNOPREIOUS (not participating in mixing), A (B or C) is faded in, and its mixing state is then updated to KEYMIXPREIOUS (participating in mixing);

2) if the mixing state of A (B or C) is KEYMIXPREIOUS, it is mixed directly.
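The per-frame state handling in the cases above can be sketched as a small state machine. `MixStatus` mirrors the states named in the text (the Python enum stands in for `Enum_MixStatus`), and the action strings are hypothetical. The text does not say how the default and error states are treated, so this sketch assumes that a selected stream in any state other than KEYMIXPREIOUS, including the default and error states, is faded in.

```python
from enum import Enum


class MixStatus(Enum):
    """Mixing states named in the text; the Python spelling is illustrative."""
    KEYMIXNORMAL = 0     # default state
    KEYMIXPREIOUS = 1    # previous frame participated in mixing
    KEYMIXNOPREIOUS = 2  # previous frame did not participate in mixing
    KEYMIXERROR = 3      # error state


def plan_frame(status: dict[str, MixStatus], selected: set[str]) -> dict[str, str]:
    """Decide what to do with each stream's current frame and update its state in place.

    Returns stream id -> "fade_in", "mix", "fade_out" or "skip".
    """
    actions: dict[str, str] = {}
    for sid, state in list(status.items()):
        if sid in selected:
            if state is MixStatus.KEYMIXPREIOUS:
                actions[sid] = "mix"  # already mixing: mix as usual
            else:
                actions[sid] = "fade_in"  # newly selected: ramp the volume up
                status[sid] = MixStatus.KEYMIXPREIOUS
        elif state is MixStatus.KEYMIXPREIOUS:
            actions[sid] = "fade_out"  # dropped from the mix: ramp the volume down
            status[sid] = MixStatus.KEYMIXNOPREIOUS
        else:
            actions[sid] = "skip"  # was not mixing and still is not
    return actions
```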

Example three

Fig. 3 is a schematic structural diagram of an audio mixing apparatus for an online conference according to the third embodiment of the present invention. The apparatus is applicable to efficiently mixing voice data in an online conference, is configured in an electronic device, and can implement the audio mixing method for an online conference of any embodiment of this application. The apparatus specifically comprises:

an obtaining module 310, configured to obtain a recording client in a recording state among participating clients if a mixing data triggering event is monitored;

a selecting module 320, configured to select a mixing client from the recording clients according to the number and priority of the recording clients;

and the audio mixing module 330 is configured to perform audio mixing processing on the voice collected by the audio mixing client to obtain conference audio mixing data.

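Optionally, the selection module 320 is specifically configured to:

if the number of recording clients is greater than a preset number threshold, select the mixing clients according to the priority of the recording clients; otherwise, select the recording clients as the mixing clients.

Optionally, the mixing module 330 is specifically configured to:

superimpose the voices collected from the mixing clients to obtain an overflow ratio, and segment the voice according to the overflow ratio to obtain a scaling value for each voice segment; and perform segmented contraction on the superimposed voice according to the scaling values of the voice segments to obtain conference mixing data.

Optionally, the mixing module 330 is further specifically configured to:

if the mixing state of the voice of a mixing client is not participating in mixing, fade in the voice of that mixing client and change its mixing state from not participating in mixing to participating in mixing; and if the mixing state of the voice of a recording client that is not selected as a mixing client is participating in mixing, fade out the voice of that client and change its mixing state from participating in mixing to not participating in mixing.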

With the audio mixing apparatus for an online conference, different mixing strategies can be applied dynamically according to the number of recording clients, which solves the problem of high performance consumption when conference voice is mixed by the participating clients; at the same time, mixing efficiency is effectively improved, giving participants a better conference experience.

The audio mixing device for the online conference, provided by the embodiment of the invention, can execute the audio mixing method for the online conference, provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

Example four

Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention, as shown in fig. 4, the electronic device includes a processor 410, a memory 420, an input device 430, and an output device 440; the number of the processors 410 in the electronic device may be one or more, and one processor 410 is taken as an example in fig. 4; the processor 410, the memory 420, the input device 430 and the output device 440 in the electronic apparatus may be connected by a bus or other means, and the bus connection is exemplified in fig. 4.

The memory 420, which is a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the mixing method of an online conference in the embodiments of the present invention. The processor 410 executes various functional applications and data processing of the electronic device by running software programs, instructions and modules stored in the memory 420, that is, implements the mixing method for an online conference provided by the embodiment of the present invention.

The memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 420 may further include memory located remotely from processor 410, which may be connected to an electronic device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus, and may include a keyboard, a mouse, and the like. The output device 440 may include a display device such as a display screen.

Example five

The present embodiment provides a storage medium containing computer-executable instructions, which are used to implement the mixing method for an online conference provided by the embodiment of the present invention when executed by a computer processor.

Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the above-described method operations, and may also perform related operations in the mixing method of the online conference provided by any embodiment of the present invention.

From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.

It should be noted that, in the above embodiment of the audio mixing apparatus, the units and modules are divided only according to functional logic, and the division is not limited to the above as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

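1. A method for mixing audio of an online conference, the method comprising:

if a mixing data trigger event is detected, acquiring, from the participating clients, the recording clients that are in a recording state;

selecting mixing clients from the recording clients according to the number and the priority of the recording clients;

mixing the voice collected from the mixing clients to obtain conference mixing data;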

wherein mixing the voice collected from the mixing clients to obtain the conference mixing data comprises:

superimposing the voices collected from the mixing clients to obtain an overflow ratio, and segmenting the voice according to the overflow ratio to obtain a scaling value for each voice segment, wherein the overflow ratio is the ratio of the number of overflows to the number of superimposed members after the voice data are superimposed;

during superposition, the voice is segmented according to its overflow ratio, different segments corresponding to different overflow ratios;

the scaling value of the current voice segment is calculated according to the overflow ratio and the number of audio streams contained in the current segment;

and performing segmented contraction on the superimposed voice according to the scaling values of the voice segments to obtain the conference mixing data.

2. The method of claim 1, wherein selecting a mixing client from the recording clients according to the number and the priority of the recording clients comprises:

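if the number of recording clients is greater than a preset number threshold, selecting the mixing clients according to the priority of the recording clients;

otherwise, selecting the recording clients as the mixing clients.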

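3. The method according to claim 1, wherein mixing the voice collected from the mixing clients comprises:

if the mixing state of the voice of a mixing client is not participating in mixing, fading in the voice of that mixing client and changing its mixing state from not participating in mixing to participating in mixing;

and if the mixing state of the voice of a recording client that is not selected as a mixing client is participating in mixing, fading out the voice of that client and changing its mixing state from participating in mixing to not participating in mixing.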

4. An audio mixing apparatus for an online conference, the apparatus comprising:

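the acquisition module is configured to acquire, from the participating clients, the recording clients that are in a recording state if a mixing data trigger event is detected;

the selection module is configured to select mixing clients from the recording clients according to the number and the priority of the recording clients;

the audio mixing module is configured to mix the voice collected from the mixing clients to obtain conference mixing data;

the audio mixing module is specifically configured to:

superimpose the voices collected from the mixing clients to obtain an overflow ratio, and segment the voice according to the overflow ratio to obtain a scaling value for each voice segment, the overflow ratio being the ratio of the number of overflows to the number of superimposed members after the voice data are superimposed; and perform segmented contraction on the superimposed voice according to the scaling values of the voice segments to obtain the conference mixing data.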

5. The apparatus according to claim 4, wherein the selection module is specifically configured to:

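if the number of recording clients is greater than a preset number threshold, select the mixing clients according to the priority of the recording clients;

otherwise, select the recording clients as the mixing clients.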

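6. The apparatus according to claim 4, wherein the mixing module is further specifically configured to:

if the mixing state of the voice of a mixing client is not participating in mixing, fade in the voice of that mixing client and change its mixing state from not participating in mixing to participating in mixing;

and if the mixing state of the voice of a recording client that is not selected as a mixing client is participating in mixing, fade out the voice of that client and change its mixing state from participating in mixing to not participating in mixing.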

7. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the audio mixing method for an online conference according to any one of claims 1-3.

8. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing a mixing method for an online conference according to any one of claims 1 to 3.