CN112565668A - Method, system and readable storage medium for sharing sound in network conference - Google Patents
- Publication number
- CN112565668A (application CN202011576572.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04N7/15: Conference systems (H04N, pictorial communication, e.g. television)
- G10L21/0208: Noise filtering (G10L, speech enhancement, e.g. noise reduction or echo cancellation)
- G10L21/028: Voice signal separating using properties of sound source
- H04N5/265: Mixing (studio circuits, e.g. for mixing or special effects)
- G10L2021/02082: Noise filtering where the noise is echo or reverberation of the speech
Abstract
The invention discloses a method, a system and a readable storage medium for sharing sound in a network conference. The method comprises the following steps: acquiring first sound data and second sound data; performing mixing processing on the first sound data and the second sound data to obtain mixed sound data; and sending the mixed sound data to a terminal. The invention sends the microphone data and the shared sound data of the voip host end directly to the mixing server, which performs the mixing operation in a single pass, thereby reducing the damage that repeated mixing passes inflict on the sound data and improving the quality of the shared sound heard at the participant end. In addition, the invention performs echo cancellation according to the voiceprint data of the participants, so that the extracted sound is more accurate; and the filter is obtained through a cloud service, so that noise and echo data are filtered more effectively, improving the user experience.
Description
Technical Field
The present invention relates to the field of audio data processing, and more particularly, to a method, system and readable storage medium for sharing sound in a network conference.
Background
With the development of technology, network conferences are held over the network in more and more scenarios, with users communicating and presenting through conferencing software or platforms. In a conference that fuses voip and pstn, a voip host frequently plays videos and presents PPT slides through desktop sharing, which greatly helps the conduct of the conference. However, pstn participants cannot hear the sound in those videos and PPT presentations, so some of the information the conference is meant to convey is lost, which hinders the conference. Shared sound is typically collected by capturing the sound data from the default audio rendering device (including the sound played from videos and PPT files and the speech of the other participants), and then removing the other participants' speech with an echo canceller so that they do not hear their own voices again; what remains is the sound played from the videos and PPT files.
In the prior art, the microphone and the shared sound are first mixed at the voip host end, and the result must be mixed again by a mixing server before being sent to the pstn participant end. Because the mixing algorithm applies nonlinear processing such as limiting, companding and gain control, the more mixing passes there are, the more easily the sound is distorted; shared sound whose content is mainly music is especially sensitive to such distortion. The prior art therefore easily leads to distortion in the shared sound heard at the pstn participant end.
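The effect of repeated nonlinear mixing can be sketched as follows. This is an illustrative Python sketch, not from the patent: it uses white noise as stand-in audio, hard limiting as a stand-in for the mixing algorithm's limiting/companding step, and shows that mixing in two passes produces a different (more distorted) result than mixing all streams once at the server.

```python
import numpy as np

LIMIT = 1.0  # full-scale amplitude

def mix_pair(a, b):
    """Mix two audio buffers and hard-limit to full scale. The limiter is a
    nonlinear step, so each extra mixing pass can distort the signal further."""
    return np.clip(a + b, -LIMIT, LIMIT)

rng = np.random.default_rng(1)
mic_host = 0.7 * rng.standard_normal(8000)   # voip host microphone (stand-in)
shared = 0.7 * rng.standard_normal(8000)     # shared video/PPT sound (stand-in)
mic_other = 0.7 * rng.standard_normal(8000)  # another participant (stand-in)

# Prior art: mix at the host, then mix again at the server.
two_pass = mix_pair(mix_pair(mic_host, shared), mic_other)
# Proposed approach: forward all streams and mix once at the server.
one_pass = np.clip(mic_host + shared + mic_other, -LIMIT, LIMIT)

# The two results differ wherever the first pass already limited the signal.
extra_distortion = int(np.sum(two_pass != one_pass))
assert extra_distortion > 0
```

The samples where `two_pass` and `one_pass` disagree are exactly those the first limiting pass damaged irrecoverably, which is the loss the single-pass server-side mix avoids.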
Disclosure of Invention
In order to solve at least one technical problem, the present invention provides a method, a system and a readable storage medium for sharing sound in a network conference.
The invention provides a method for sharing sound in a network conference, which comprises the following steps:
acquiring first sound data and second sound data;
performing mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
In this aspect, the method further comprises:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
In this aspect, the first sound data is sound data collected by the terminal device, and the second sound data is sound data generated by a file played on the terminal.
In this aspect, filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically comprises:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
In this aspect, identifying the current sound data in the third sound data according to a preset rule specifically comprises:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
In this aspect, the method further comprises:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
A second aspect of the present invention provides a system for sharing sound in a network conference, comprising a memory and a processor, wherein the memory stores a program of the method for sharing sound in a network conference, and when the program is executed by the processor, the following steps are implemented:
acquiring first sound data and second sound data;
performing mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
In this aspect, the program further implements:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
In this aspect, the first sound data is sound data collected by the terminal device, and the second sound data is sound data generated by a file played on the terminal.
In this aspect, filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically comprises:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
In this aspect, identifying the current sound data in the third sound data according to a preset rule specifically comprises:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
In this aspect, the program further implements:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
A third aspect of the present invention provides a computer-readable storage medium, which stores a program of the method for sharing sound in a network conference; when the program is executed by a processor, the steps of the method for sharing sound in a network conference described in any one of the above are implemented.
According to the method, the system and the readable storage medium for sharing sound in a network conference, the voip host end sends two data streams (microphone data and shared sound data), which are forwarded to the mixing server for mixing and then delivered to the pstn participant end; the pstn participant end finally receives a single data stream and decodes and plays it. The invention sends the microphone data and the shared sound data of the voip host end directly to the mixing server, which performs the mixing operation in a single pass, thereby reducing the damage that repeated mixing passes inflict on the sound data and improving the quality of the shared sound heard at the participant end. In addition, the invention performs echo cancellation according to the voiceprint data of the participants, so that the extracted sound is more accurate; and the filter is obtained through a cloud service, so that noise and echo data are filtered more effectively, improving the user experience.
Drawings
FIG. 1 is a flow chart illustrating a method of sharing sound for a web conference of the present invention;
fig. 2 shows a system block diagram of sharing sound in a network conference according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
In the present invention, voip (Voice over IP) refers to carrying voice calls over an IP network, and pstn (Public Switched Telephone Network) refers to the ordinary circuit-switched telephone network in everyday use.
Fig. 1 is a flow chart illustrating a method for sharing sound in a web conference according to the present invention.
As shown in fig. 1, the present invention discloses a method for sharing sound in a network conference, which comprises:
acquiring first sound data and second sound data;
performing mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
It should be noted that, in the present invention, the first sound data is sound data collected by a terminal device, and the second sound data is sound data generated by a file played on the terminal. The terminal can collect the first sound data with a microphone or other sound collection equipment, and the second sound data can be sound produced by locally playing a PPT (PowerPoint) presentation, a video or a similar file. The mixing server first obtains the first sound data and the second sound data; the two kinds of sound data may be sent in two paths or combined into one path through time-division multiplexing, as long as the mixing server can process both according to a preset rule. After receiving the two kinds of sound data, the mixing server performs mixing processing, that is, it combines them. After mixing, the mixed sound data is sent to the terminal, so that through a playback device such as a loudspeaker the user can hear both the speech of the other users and the sound of the video, PPT or other file played by the preset user.
It should be noted that, in the present invention, the first sound data and the second sound data may be sent to the server in two paths, or in one path using time-division multiplexing, as long as the first sound data and the second sound data can be separated using a preset protocol or algorithm.
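The single-path time-division-multiplexing option can be sketched as alternating fixed-size frames from the two streams; the 10 ms frame size below is an assumption for illustration, not specified by the patent:

```python
import numpy as np

FRAME = 160  # 10 ms at 16 kHz, an assumed conferencing frame size

def interleave(mic: np.ndarray, shared: np.ndarray) -> np.ndarray:
    """Combine two PCM streams into one path, alternating fixed-size frames."""
    frames = []
    for i in range(0, len(mic), FRAME):
        frames.append(mic[i:i + FRAME])
        frames.append(shared[i:i + FRAME])
    return np.concatenate(frames)

def deinterleave(combined: np.ndarray):
    """Recover the two streams on the server using the agreed frame layout."""
    mic, shared = [], []
    for i in range(0, len(combined), 2 * FRAME):
        mic.append(combined[i:i + FRAME])
        shared.append(combined[i + FRAME:i + 2 * FRAME])
    return np.concatenate(mic), np.concatenate(shared)

mic = np.arange(480, dtype=np.int16)       # 3 frames of microphone data
shared = -np.arange(480, dtype=np.int16)   # 3 frames of shared sound
m, s = deinterleave(interleave(mic, shared))
assert np.array_equal(m, mic) and np.array_equal(s, shared)
```

Any agreed layout (or a real multiplexing protocol with headers) would do; the only requirement stated in the text is that the server can separate the two streams again.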
According to the embodiment of the invention, the method further comprises the following steps:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
It should be noted that the first sound data is sound data collected by the terminal device, and such data often includes not only the user's speech but also echoes of other sounds and noise, so the first sound data needs to be filtered to obtain the required sound data.
For example, when the voip host end shares sound data, the far-end sound data and the sound of the video and PPT are mixed, rendered on the default audio rendering device and finally played, so the voip host can hear both the speech of the far-end participants and the sound of the video and PPT played on the local machine. Meanwhile, so that the other, far-end participants can hear the video and PPT sound shared by the voip host end, the finally played-back sound data needs to be captured from the default audio rendering device; but this capture also includes the far-end participants' own sound data, and if it were transmitted to the far end directly, the other participants would hear their own speech again (the so-called echo). The far-end participants' sound must therefore be filtered out by an echo canceller. In the echo canceller, the sound data captured from the default audio rendering device serves as near-end data and the sound data of the other participants serves as the far-end reference; through processing such as linear echo cancellation and nonlinear echo suppression, shared sound data containing only the video and PPT sound is separated out. The voip host end's sound data flow comprises a transmit stream and a receive stream. The transmit stream has two paths, one for the shared sound data and the other for the microphone data; the two data streams are forwarded to the mixing server for mixing and then delivered to the pstn participant end. The microphone sound data is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control.
The receive stream receives the mixed microphone data packets of the other participants, decodes them, and renders them on the default audio rendering device for playback.
The pstn participant end's sound data flow likewise comprises a transmit stream and a receive stream. The transmit stream encodes and sends the microphone sound data, which is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control. The receive stream carries the mixed microphone and shared sound data sent by the mixing server; after decoding, it is rendered on the default audio rendering device for playback.
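The linear echo-cancellation step described above, with near-end capture and a far-end reference, can be sketched with a normalized least-mean-squares (NLMS) adaptive filter. This is a generic textbook illustration, not the patent's actual algorithm, and the tap count, step size and simulated echo path are all assumptions:

```python
import numpy as np

def nlms_echo_cancel(near, far, taps=32, mu=0.5, eps=1e-6):
    """Minimal NLMS adaptive filter: estimate the echo of the far-end
    reference present in the near-end capture and subtract it, leaving
    the locally generated signal as the residual."""
    w = np.zeros(taps)
    out = np.zeros_like(near)
    for n in range(len(near)):
        # Most recent `taps` far-end samples, newest first, zero-padded.
        x = far[max(0, n - taps + 1):n + 1][::-1]
        x = np.pad(x, (0, taps - len(x)))
        e = near[n] - w @ x          # error = capture minus echo estimate
        out[n] = e
        w += mu * e * x / (x @ x + eps)  # normalized coefficient update
    return out

rng = np.random.default_rng(0)
far = rng.standard_normal(4000)                       # far-end participants
echo = 0.6 * np.concatenate([np.zeros(8), far[:-8]])  # assumed echo path
near = echo                                           # capture: echo only
residual = nlms_echo_cancel(near, far)
# Once adapted, the residual echo is far weaker than the captured echo.
assert np.mean(residual[2000:] ** 2) < 0.1 * np.mean(near[2000:] ** 2)
```

A real canceller would add the nonlinear echo-suppression stage the text mentions on top of this linear stage; here the residual after convergence stands in for the separated shared sound.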
According to the embodiment of the present invention, the filtering the first sound data according to the preset rule to obtain the filtered first sound data specifically includes:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
It should be noted that, because the first sound data is collected by the terminal device, it often includes not only the user's speech but also echoes of other sounds and noise, so it needs to be filtered to obtain the required sound data. During filtering, to remove the other noises and echoes while retaining the sound data of the preset personnel, the voiceprint characteristics of all participants are acquired first and the voiceprint characteristics of the preset personnel are determined; a voiceprint is analogous to a person's fingerprint, so a user's sound data can be identified by voiceprint. The sound data matching the preset personnel's voiceprint characteristics is then extracted from the first sound data to obtain the third sound data. The third sound data may still contain an echo of the user's own speech, that is, sound the user spoke earlier; the current sound data in the third sound data, i.e. the near-end sound data, therefore needs to be identified through a preset rule and used as the filtered first sound data to participate in mixing with the other sound data.
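Voiceprint matching of this kind is commonly done by comparing speaker embeddings. The sketch below is a hypothetical illustration (the embedding vectors, participant names and similarity threshold are all invented for the example; a real system would compute embeddings from audio with a speaker-recognition model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_voiceprint(segment_embedding, enrolled, threshold=0.75):
    """Return the enrolled participant whose voiceprint embedding is most
    similar to the segment's embedding, or None if nobody passes the
    threshold (i.e. the segment is noise or an unknown speaker)."""
    best_name, best_score = None, threshold
    for name, emb in enrolled.items():
        score = cosine_similarity(segment_embedding, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Invented 3-dimensional embeddings standing in for real voiceprints.
enrolled = {
    "host": np.array([0.9, 0.1, 0.3]),
    "guest": np.array([0.1, 0.8, 0.2]),
}
segment = np.array([0.85, 0.15, 0.25])
assert match_voiceprint(segment, enrolled) == "host"
```

Segments that match the preset personnel's voiceprint would be kept as the third sound data; unmatched segments (noise, other speakers) would be discarded.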
According to the embodiment of the present invention, the identifying the current sound data in the third sound data according to the preset rule specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
It should be noted that sound filtering is usually performed with a filter, which can remove sound data of preset frequencies and amplitudes; however, the filter parameters are often difficult to determine, and if they are set inaccurately the required sound data will likely not be obtained. The filter module and filter parameters are therefore obtained through a cloud service. First, the environmental parameters of the collected sound are acquired and sent to a server. The server selects the filter module and parameters through cloud computing, that is, it obtains the corresponding filter module and parameters according to the environmental parameters. A filter generated by the cloud server adapts better to the current sound environment and filters the relevant sounds; the filter module and parameters are used as an echo cancellation filter, which filters the echo data out of the third sound data to obtain the current sound data. Because the filter module and parameters are selected through cloud computing, they can be adjusted dynamically to suit the current sound environment, making the filtering more accurate.
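One way to picture the server-side selection is a lookup from reported environment parameters to a filter module and its parameters. Everything below is invented for illustration: the patent does not specify the parameter names, profiles or values, and a real cloud service might compute rather than look up the configuration:

```python
# Hypothetical server-side profiles: filter module plus parameters keyed by
# acoustic environment. Names and numbers are illustrative only.
FILTER_PROFILES = {
    "small_room": {"module": "nlms", "taps": 128, "mu": 0.5},
    "large_hall": {"module": "nlms", "taps": 1024, "mu": 0.2},
    "noisy_open": {"module": "nlms", "taps": 256, "mu": 0.3},
}

def select_filter(env: dict) -> dict:
    """Pick a filter profile from the terminal's reported environment
    parameters (here: ambient noise level and room size, both assumed)."""
    if env.get("noise_db", 0) > 60:
        return FILTER_PROFILES["noisy_open"]
    if env.get("room_m2", 0) > 50:
        return FILTER_PROFILES["large_hall"]
    return FILTER_PROFILES["small_room"]

assert select_filter({"room_m2": 20, "noise_db": 40})["taps"] == 128
assert select_filter({"room_m2": 80, "noise_db": 40})["taps"] == 1024
```

The returned module and parameters would then instantiate the echo cancellation filter applied to the third sound data, and could be re-requested whenever the environment parameters change.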
According to the embodiment of the invention, the method further comprises the following steps:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
It should be noted that in a network conference several users often speak at the same time or within the same period, and for the other participants to hear everyone's speech, each user's sound data must be acquired and then mixed. After obtaining the first sound data of the multiple terminals, the server filters out noise and other sounds to obtain the sound data of the preset users, mixes the sound data of the multiple users with the shared sound data, and then sends the resulting mixed sound data to each terminal, which decodes and plays it upon receipt.
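The server-side step of mixing every terminal's stream with the shared sound in a single pass can be sketched as follows (illustrative only; real streams would be filtered per the earlier steps and encoded before sending):

```python
import numpy as np

def mix_all(first_streams, shared, peak=32767):
    """Mix every terminal's (already filtered) microphone stream with the
    shared sound in one pass, then limit once to the int16 range."""
    total = shared.astype(np.int32)
    for stream in first_streams:
        total = total + stream.astype(np.int32)
    return np.clip(total, -peak - 1, peak).astype(np.int16)

# Two terminals' speech plus the shared sound, as tiny constant buffers.
streams = [np.full(4, 1000, dtype=np.int16), np.full(4, 2000, dtype=np.int16)]
shared = np.full(4, 500, dtype=np.int16)
mixed = mix_all(streams, shared)
assert mixed.tolist() == [3500, 3500, 3500, 3500]
# The server would then encode `mixed` once and send it to every terminal.
```

Because all summing happens before the single limiting step, each terminal receives sound that has passed through the nonlinear stage only once.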
Fig. 2 shows a system block diagram of sharing sound in a network conference according to the present invention.
As shown in fig. 2, the present invention discloses a system 2 for sharing sound in a network conference, which comprises a memory 21 and a processor 22, wherein the memory 21 stores a program of the method for sharing sound in a network conference, and when the program is executed by the processor 22, the following steps are implemented:
acquiring first sound data and second sound data;
performing mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
It should be noted that, in the present invention, the first sound data is sound data collected by a terminal device, and the second sound data is sound data generated by a file played on the terminal. The terminal can collect the first sound data with a microphone or other sound collection equipment, and the second sound data can be sound produced by locally playing a PPT (PowerPoint) presentation, a video or a similar file. The mixing server first obtains the first sound data and the second sound data; the two kinds of sound data may be sent in two paths or combined into one path through time-division multiplexing, as long as the mixing server can process both according to a preset rule. After receiving the two kinds of sound data, the mixing server performs mixing processing, that is, it combines them. After mixing, the mixed sound data is sent to the terminal, so that through a playback device such as a loudspeaker the user can hear both the speech of the other users and the sound of the video, PPT or other file played by the preset user.
It should be noted that, in the present invention, the first sound data and the second sound data may be sent to the server in two paths, or in one path using time-division multiplexing, as long as the first sound data and the second sound data can be separated using a preset protocol or algorithm.
According to the embodiment of the invention, the method further comprises the following steps:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
It should be noted that the first sound data is sound data collected by the terminal device, and such data often includes not only the user's speech but also echoes of other sounds and noise, so the first sound data needs to be filtered to obtain the required sound data.
For example, when the voip host end shares sound data, the far-end sound data and the sound of the video and PPT are mixed, rendered on the default audio rendering device and finally played, so the voip host can hear both the speech of the far-end participants and the sound of the video and PPT played on the local machine. Meanwhile, so that the other, far-end participants can hear the video and PPT sound shared by the voip host end, the finally played-back sound data needs to be captured from the default audio rendering device; but this capture also includes the far-end participants' own sound data, and if it were transmitted to the far end directly, the other participants would hear their own speech again (the so-called echo). The far-end participants' sound must therefore be filtered out by an echo canceller. In the echo canceller, the sound data captured from the default audio rendering device serves as near-end data and the sound data of the other participants serves as the far-end reference; through processing such as linear echo cancellation and nonlinear echo suppression, shared sound data containing only the video and PPT sound is separated out. The voip host end's sound data flow comprises a transmit stream and a receive stream. The transmit stream has two paths, one for the shared sound data and the other for the microphone data; the two data streams are forwarded to the mixing server for mixing and then delivered to the pstn participant end. The microphone sound data is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control.
The receive stream receives the mixed microphone data packets of the other participants, decodes them, and renders them on the default audio rendering device for playback.
The pstn participant end's sound data flow likewise comprises a transmit stream and a receive stream. The transmit stream encodes and sends the microphone sound data, which is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control. The receive stream carries the mixed microphone and shared sound data sent by the mixing server; after decoding, it is rendered on the default audio rendering device for playback.
According to the embodiment of the present invention, the filtering the first sound data according to the preset rule to obtain the filtered first sound data specifically includes:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
It should be noted that, because the first sound data is collected by the terminal device, it often includes not only the user's speech but also echoes of other sounds and noise, so it needs to be filtered to obtain the required sound data. During filtering, to remove the other noises and echoes while retaining the sound data of the preset personnel, the voiceprint characteristics of all participants are acquired first and the voiceprint characteristics of the preset personnel are determined; a voiceprint is analogous to a person's fingerprint, so a user's sound data can be identified by voiceprint. The sound data matching the preset personnel's voiceprint characteristics is then extracted from the first sound data to obtain the third sound data. The third sound data may still contain an echo of the user's own speech, that is, sound the user spoke earlier; the current sound data in the third sound data, i.e. the near-end sound data, therefore needs to be identified through a preset rule and used as the filtered first sound data to participate in mixing with the other sound data.
According to the embodiment of the present invention, identifying the current sound data in the third sound data according to the preset rule specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
It should be noted that sound filtering is usually performed with a filter, which can remove sound data of preset frequencies and amplitudes. Filter parameters, however, are often difficult to determine, and inaccurate settings are likely to leave the unwanted sound data unfiltered; the filter module and its parameters are therefore obtained from a cloud service. First, the environmental parameters of the capture environment are acquired and sent to the server. The server selects the filter module and parameters by cloud computing, that is, it obtains the corresponding filter module and parameters according to the environmental parameters. A filter generated by the cloud server adapts better to the current sound environment and filters the relevant sounds: the filter module and parameters are used as an echo cancellation filter, and this filter removes the echo data from the third sound data to obtain the current sound data. Because the filter module and parameters are selected by cloud computing, they can be adjusted dynamically to suit the current sound environment, making the filtering more accurate.
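A minimal sketch of the cloud-selected echo cancellation described above, under the assumption that the server simply maps environment parameters to a table of FIR filter taps and that the echo is estimated by convolving the far-end signal with those taps. The preset table, the environment keys and the tap values are invented for illustration.

```python
# Hypothetical cloud-side lookup table: environment -> FIR filter taps.
FILTER_PRESETS = {
    "small_room": [0.5, 0.25],
    "large_hall": [0.7, 0.5, 0.3],
}

def fetch_filter(env_params):
    """Server step: map environment parameters to filter taps
    (falls back to a pass-through estimate of zero echo)."""
    return FILTER_PRESETS.get(env_params.get("room"), [0.0])

def cancel_echo(third_sound, far_end, taps):
    """Subtract the estimated echo: the far-end signal convolved
    with the cloud-selected taps, sample by sample."""
    out = []
    for n, s in enumerate(third_sound):
        est = sum(t * far_end[n - k]
                  for k, t in enumerate(taps) if n - k >= 0)
        out.append(s - est)
    return out
```

Real echo cancellers adapt the taps continuously (e.g. NLMS); a static lookup is used here only to mirror the server-selection step in the text.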
According to the embodiment of the invention, the method further comprises the following steps:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
It should be noted that in a network conference several users often speak at the same time or within the same period, and for the other participants to hear every speaker, the sound data of each user must be acquired and then mixed. After obtaining the first sound data from multiple terminals, the server filters out noise and other sounds to obtain the sound data of the preset users, mixes the sound data of the multiple users with the shared sound data, and sends the resulting mixed sound data to each terminal, which decodes and plays it.
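The multi-terminal mixing step can be sketched as a sample-wise sum of the per-terminal first sound data with the shared second sound data, clamped to the valid range. This is a toy model of the mixing server; real mixers apply per-stream weighting, resampling and overflow handling.

```python
def mix_streams(first_streams, shared):
    """Sum the per-participant mic streams with the shared sound
    sample-wise, clamping each mixed sample to [-1.0, 1.0].
    Streams may have different lengths; missing samples count as 0."""
    n = max([len(shared)] + [len(s) for s in first_streams])
    mixed = []
    for i in range(n):
        total = sum(s[i] for s in first_streams if i < len(s))
        total += shared[i] if i < len(shared) else 0.0
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

The mixed buffer would then be encoded once and fanned out to every terminal, matching the single-mix design the patent argues for.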
The third aspect of the present invention further provides a computer-readable storage medium, which stores a program of the method for sharing sound in a network conference; when executed by a processor, the program implements the steps of the method for sharing sound in a network conference according to any one of the above.
According to the method, system and readable storage medium for sharing sound in a network conference, the voip host end sends two data streams (microphone data and shared sound data); they are forwarded to the mixing server, mixed, and then sent to the pstn participant end, which finally receives a single stream and decodes and plays it. Because the microphone data and shared sound data of the voip host end are sent directly to the mixing server, which performs the mixing in one place, the degradation that repeated mixing would inflict on the sound data is reduced, and the quality of the shared sound heard at the participant end is improved. In addition, echo cancellation is performed according to the participants' voiceprint data, so the extracted sound is more accurate; and the filter is obtained through the cloud, so noise and echo data are filtered better, improving the user experience.
The invention thus realizes a scheme for sharing sound in a conference that fuses voip and pstn: while sharing the desktop, the voip host can also share the sound of a video being played or of PPT courseware, so the other pstn participants can hear the video and courseware sound as well. The information expressed in the conference becomes more comprehensive, which can greatly improve the conference's efficiency.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, and some features may be omitted or not implemented. In addition, the couplings, direct couplings or communication connections between the components shown or discussed may pass through certain interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical or of other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment's solution.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may stand alone as one unit, or two or more units may be integrated into one unit; an integrated unit may be realized in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be carried out by hardware under the control of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage media include a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Alternatively, if the integrated unit of the present invention is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and containing several instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage media include a removable storage device, a ROM, a RAM, a magnetic disk, an optical disk, and other media that can store program code.
The above description covers only specific embodiments of the present invention, but the scope of the invention is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope of the invention shall fall within its scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (10)
1. A method for sharing sound in a web conference, comprising:
acquiring first sound data and second sound data;
performing sound mixing processing on the first sound data and the second sound data to obtain sound mixing sound data;
and sending the mixed sound data to a terminal.
2. The method for sharing sound in a web conference according to claim 1, further comprising:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
3. The method as claimed in claim 1, wherein the first sound data is sound data collected by a terminal device, and the second sound data is sound data generated by a terminal playing file.
4. The method for sharing sound in a network conference according to claim 2, wherein the filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically comprises:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
5. The method for sharing sound in a web conference according to claim 4, wherein the identifying the current sound data in the third sound data according to the preset rule specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
6. The method for sharing sound in a web conference according to claim 1, further comprising:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
7. A system for sharing sound in a network conference, comprising a memory and a processor, wherein the memory includes a method program for sharing sound in a network conference, and the method program for sharing sound in a network conference is executed by the processor to implement the following steps:
acquiring first sound data and second sound data;
performing sound mixing processing on the first sound data and the second sound data to obtain sound mixing sound data;
and sending the mixed sound data to a terminal.
8. The system for sharing sound in a web conference according to claim 7, further comprising:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
9. The system for sharing sound in a network conference according to claim 8, wherein the filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically comprises:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program of a method for sharing sound in a network conference, which program, when executed by a processor, implements the steps of a method for sharing sound in a network conference according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011576572.6A CN112565668B (en) | 2020-12-28 | 2020-12-28 | Method for sharing sound in network conference |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011576572.6A CN112565668B (en) | 2020-12-28 | 2020-12-28 | Method for sharing sound in network conference |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112565668A true CN112565668A (en) | 2021-03-26 |
CN112565668B CN112565668B (en) | 2022-03-04 |
Family
ID=75033742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011576572.6A Active CN112565668B (en) | 2020-12-28 | 2020-12-28 | Method for sharing sound in network conference |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112565668B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113542982A (en) * | 2021-06-28 | 2021-10-22 | 瑞芯微电子股份有限公司 | Sound mixing method and storage medium |
CN118283015A (en) * | 2024-05-30 | 2024-07-02 | 江西扬声电子有限公司 | Multi-channel audio transmission method and system based on cabin Ethernet |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH03171929A (en) * | 1989-11-30 | 1991-07-25 | Nec Corp | Echo canceller device |
JPH05227110A (en) * | 1992-02-14 | 1993-09-03 | Nippon Television Network Corp | Side tone suppressing device for sng |
JPH05323987A (en) * | 1992-05-26 | 1993-12-07 | Pioneer Electron Corp | Echo device |
CN102118523A (en) * | 2009-12-30 | 2011-07-06 | 北京大唐高鸿数据网络技术有限公司 | Mixing control method for centralized teleconference |
CN104159177A (en) * | 2014-07-16 | 2014-11-19 | 浙江航天长峰科技发展有限公司 | Audio recording system and method based on screencast |
CN106603877A (en) * | 2015-10-16 | 2017-04-26 | 鸿合科技有限公司 | Collaborative conference voice collection method and apparatus |
CN110956976A (en) * | 2019-12-17 | 2020-04-03 | 苏州科达科技股份有限公司 | Echo cancellation method, device, equipment and readable storage medium |
CN111583932A (en) * | 2020-04-30 | 2020-08-25 | 厦门快商通科技股份有限公司 | Sound separation method, device and equipment based on human voice model |
Also Published As
Publication number | Publication date |
---|---|
CN112565668B (en) | 2022-03-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10325612B2 (en) | Method, device, and system for audio data processing | |
US7206404B2 (en) | Communications system and method utilizing centralized signal processing | |
US9253331B2 (en) | Call handling | |
CN112565668B (en) | Method for sharing sound in network conference | |
CN110072021B (en) | Method, apparatus and computer readable medium in audio teleconference mixing system | |
CN111863011B (en) | Audio processing method and electronic equipment | |
US20210345051A1 (en) | Centrally controlling communication at a venue | |
CN110299144A (en) | Audio mixing method, server and client | |
CN111199751B (en) | Microphone shielding method and device and electronic equipment | |
CN112688965B (en) | Conference audio sharing method and device, electronic equipment and storage medium | |
CN104580764A (en) | Ultrasound pairing signal control in teleconferencing system | |
CN111951813A (en) | Voice coding control method, device and storage medium | |
CN117079661A (en) | Sound source processing method and related device | |
JP2009118316A (en) | Voice communication device | |
CN112543202B (en) | Method, system and readable storage medium for transmitting shared sound in network conference | |
CN114979344A (en) | Echo cancellation method, device, equipment and storage medium | |
CN117118956B (en) | Audio processing method, device, electronic equipment and computer readable storage medium | |
CN112820307B (en) | Voice message processing method, device, equipment and medium | |
US11915710B2 (en) | Conference terminal and embedding method of audio watermarks | |
CN112216297B (en) | Processing method, system, medium and device for small VoIP sound of android mobile phone terminal | |
CN114530159A (en) | Multimedia resource integration scheduling method based on WebRTC technology | |
CN115914761A (en) | Multi-person wheat connecting method and device | |
CN113971956A (en) | Information processing method and device, electronic equipment and readable storage medium | |
CN115700881A (en) | Conference terminal and method for embedding voice watermark | |
CN118590460A (en) | Intercommunication method, intercommunication system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||