
CN112543202A - Method, system and readable storage medium for transmitting shared sound in network conference

Info

Publication number: CN112543202A
Application number: CN202011575111.7A
Authority: CN
Inventors: 顾骋
Original assignee: Chuangxiang Space Information Technology Suzhou Co ltd
Current assignee: Chuangxiang Space Information Technology Suzhou Co ltd
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2021-03-23
Anticipated expiration: 2040-12-28
Also published as: CN112543202B

Abstract

The invention provides a method, a system and a readable storage medium for transmitting shared sound in a network conference. The method comprises: collecting first sound data and second sound data; encoding the first sound data and the second sound data; and transmitting the encoded first sound data according to a first path and the encoded second sound data according to a second path. The host's shared sound data is forwarded directly to the participant end without passing through the audio mixing server, which reduces the degradation of the sound data during transmission and improves the quality of the shared sound heard at the participant end. The technical scheme allows the host, while sharing the desktop, to also share the sound of the videos and PPT courseware being played, so that the other participants can hear that sound. The information conveyed in the conference thereby becomes more complete, and the efficiency of the conference can be greatly improved.

Description

Method, system and readable storage medium for transmitting shared sound in network conference

Technical Field

The present application relates to the field of sound data processing, and more particularly, to a method, a system, and a readable storage medium for transmitting shared sound in a network conference.

Background

In a VoIP conference, participants usually join from different locations, and the host often plays videos and presents PPT slides through desktop sharing, which is of great help to the conduct of the conference. However, the sound in the videos and PPT slides cannot be heard by the other participants, so part of the information the conference is meant to convey is lost, which hinders the conference.

In the prior art, the host mixes the microphone sound with the shared sound, and the mixed sound is processed by a sound mixing server and then forwarded to the participant end. The sound mixing server performs operations such as caching, decoding, mixing and encoding on the data, which inevitably increases the delay and jitter of the data. Because the shared sound is mainly music, it places higher demands on the real-time performance and smoothness of transmission than microphone sound does, so in the prior art the shared sound heard at the participant end is prone to stuttering, noise and similar problems. It is therefore desirable to design a conference transmission method that solves these problems.

Disclosure of Invention

In order to solve at least one technical problem, the invention provides a method, a system and a readable storage medium for transmitting shared sound in a network conference.

The first aspect of the present invention provides a method for transmitting shared sound in a network conference, including:

collecting first sound data and second sound data;

encoding the first sound data and the second sound data;

and transmitting the encoded first sound data according to a first path, and transmitting the encoded second sound data according to a second path.

In the scheme, the first sound data are sound data collected by a preset terminal, and the second sound data are sound data of preset audio played by the preset terminal.

In the scheme, the first path is used for sending the sound data to the server, and the sound data is sent to the terminal equipment after being processed by the server; and the second path is used for directly sending the sound data to the terminal equipment.

The scheme further includes:

the server collects a plurality of first sound data;

identifying the first sound data, and filtering echo data;

combining the plurality of first sound data from which the echo data has been filtered to obtain first mixed sound data;

and sending the first mixed sound data to a plurality of terminal devices.

In this scheme, identifying the first sound data and filtering the echo data specifically includes:

identifying the first sound data to obtain preset personnel voice data, noise data and echo data;

filtering the noise data and the echo data out of the first sound data to obtain the preset personnel voice data;

and performing gain processing on the preset personnel voice data to obtain gain-processed voice data.

The scheme further includes:

acquiring voice and voiceprint information of preset personnel;

extracting voiceprint features of the preset personnel according to the voiceprint information;

recognizing, according to the voiceprint features of the preset personnel, first preset personnel voice data from the first sound data;

performing echo judgment on the recognized first preset personnel voice data;

and filtering the echoes out of the first preset personnel voice data to obtain the preset personnel voice data.

The second aspect of the present invention provides a system for transmitting shared sound in a web conference, including a memory and a processor, where the memory includes a program for transmitting shared sound in a web conference, and the program for transmitting shared sound in a web conference is executed by the processor to implement the following steps:

collecting first sound data and second sound data;

encoding the first sound data and the second sound data;

The scheme further includes:

the server collects a plurality of first sound data;

identifying the first sound data, and filtering echo data;

and sending the first mixed sound data to a plurality of terminal devices.

The scheme further includes:

acquiring voice and voiceprint information of preset personnel;

A third aspect of the present invention provides a computer-readable storage medium storing a program of the method for transmitting shared sound in a network conference, which, when executed by a processor, implements the steps of the method for transmitting shared sound in a network conference as described in any one of the above.

According to the method, the system and the readable storage medium for transmitting shared sound in a network conference, the host side can send two data streams. The microphone data is first forwarded to the sound mixing server for mixing and then delivered to the participant side, while the shared sound data is forwarded directly to the participant side without passing through the sound mixing server; the participant side receives the two data streams, mixes them, and finally plays them. Because the host's shared sound data bypasses the sound mixing server, the degradation of the sound data during transmission is reduced and the quality of the shared sound heard at the participant side is improved. The technical scheme allows the host, while sharing the desktop, to also share the sound of the videos and PPT courseware being played, so that the other participants can hear that sound. The information conveyed in the conference thereby becomes more complete, and the efficiency of the conference can be greatly improved.

Drawings

Fig. 1 shows a flow chart of a method for transmitting shared sound in a network conference;

Fig. 2 shows a block diagram of a system for transmitting shared sound in a network conference according to the present invention.

Detailed Description

In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.

In the present invention, VoIP (Voice over IP) refers to voice communication carried over an IP network.

Fig. 1 shows a flow chart of a method for transmitting shared sound in a network conference.

As shown in fig. 1, the present invention discloses a method for transmitting shared sound in a network conference, which comprises:

S102: collecting first sound data and second sound data;

S104: encoding the first sound data and the second sound data;

S106: transmitting the encoded first sound data according to a first path, and transmitting the encoded second sound data according to a second path.

It should be noted that the first sound data is sound data collected by a preset terminal, for example the voice of the host or of a preset speaker, and the second sound data is the sound of preset audio played by the preset terminal, for example the sound of a PPT or a video. When there are several speakers, the first sound data may be the sound data collected by the terminals of those speakers; when there is only one speaker, the first sound data may be the sound data collected by that speaker's terminal. The terminal may collect sound with a microphone or another audio capture device. The preset terminal can be set by a person skilled in the art according to actual needs.

It is worth mentioning that in a VoIP conference the host wants to share not only the desktop video and PPT pictures but also the sound of the video and PPT being played. Participants in a VoIP conference usually join from different locations, and the host often plays videos and presents PPT slides through desktop sharing, which is of great help to the conduct of the conference.

The terminal first collects the first sound data and the second sound data, and then encodes them according to a predetermined protocol and type to obtain encoded sound data. The encoded first sound data is then sent according to the first path: it is sent to a server, processed by the server, and then sent to terminal devices for playback. The terminal devices may be multiple terminal devices or a preset terminal device, such as the terminal device of the host or of a speaker. The encoded second sound data is sent according to the second path, that is, directly to the terminal devices. For example, the host side may send two data streams (microphone data and shared sound data). The microphone data is forwarded to the sound mixing server for mixing and then delivered to the participant side, while the shared sound data is forwarded directly to the participant side without passing through the sound mixing server. The participant side receives the two data streams (the microphone mixed data delivered by the sound mixing server and the shared sound data), mixes them, and finally plays them. In this way the shared sound data of the preset terminal device reaches the participant side without passing through the sound mixing server, which reduces the degradation of the sound data during transmission and improves the quality of the shared sound heard at the participant side.
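The two-path sending step can be illustrated with a minimal sketch. Everything named here (`Packet`, `MixingServer`, `Participant`, `encode`, `send_host_audio`) is hypothetical and not taken from the patent, and the "codec" is only a placeholder that packs 16-bit PCM; the patent does not specify a codec or a transport.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    kind: str       # "mic" (first sound data) or "shared" (second sound data)
    payload: bytes  # encoded audio frame


@dataclass
class MixingServer:
    """Hypothetical stand-in for the sound mixing server (first path)."""
    received: List[Packet] = field(default_factory=list)

    def accept(self, packet: Packet) -> None:
        self.received.append(packet)


@dataclass
class Participant:
    """Hypothetical stand-in for a participant terminal (second path)."""
    received: List[Packet] = field(default_factory=list)

    def accept(self, packet: Packet) -> None:
        self.received.append(packet)


def encode(samples: List[int]) -> bytes:
    # Placeholder encoder (assumption): 16-bit little-endian PCM.
    # A real terminal would use an audio codec here.
    return b"".join(s.to_bytes(2, "little", signed=True) for s in samples)


def send_host_audio(mic: List[int], shared: List[int],
                    server: MixingServer,
                    participants: List[Participant]) -> None:
    """Encode both sources, then route each one along its own path."""
    mic_packet = Packet("mic", encode(mic))
    shared_packet = Packet("shared", encode(shared))
    # First path: microphone data goes to the mixing server.
    server.accept(mic_packet)
    # Second path: shared sound bypasses the mixing server.
    for participant in participants:
        participant.accept(shared_packet)
```

In this sketch the shared-sound packet never touches `MixingServer`, which mirrors the point of the second path: the shared sound is not decoded, mixed and re-encoded on its way to the participants.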

According to the embodiment of the invention, the method further comprises the following steps:

the server collects a plurality of first sound data;

identifying the first sound data, and filtering echo data;

and sending the first mixed sound data to a plurality of terminal devices.

It should be noted that the collected first sound data often contains echo data, so filtering is required. The server collects a plurality of first sound data, identifies the first sound data, and filters out the echo data. What remains after the echo data is filtered is the speech captured by the preset terminals. To preserve the real-time performance and experience of the conference, when several people speak, each person's speech needs to be played; the sounds are therefore combined to obtain first mixed sound data, which is then sent to a plurality of terminal devices for playback.

According to the embodiment of the present invention, identifying the first sound data and filtering the echo data specifically includes:

It should be noted that at a preset terminal, for example the host's, the far-end sound data is mixed with the sound data of the video and PPT, rendered on the default audio rendering device, and finally played, so that the host hears both the speech of the far-end participants and the sound of the video and PPT being played. For the other, far-end participants to hear the video and PPT sound shared by the host, the finally played sound data must be captured from the default audio rendering device. That captured data also contains the sound of the far-end participants, and if it were transmitted to the far end directly, those participants would hear their own speech (the so-called echo). The sound of the far-end participants must therefore be removed by an echo canceller. In the echo canceller, the sound data captured from the default audio rendering device serves as near-end data and the sound data of the other participants serves as far-end reference data; through processing such as linear echo cancellation and nonlinear echo suppression, shared sound data containing only the video and PPT sound is separated out.
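As a rough illustration of combining several speakers' sound into first mixed sound data, the sketch below sums equal-length frames sample by sample. The patent does not say how the mixing is performed; the 16-bit PCM representation, the saturation at the int16 range, and the name `mix_streams` are all assumptions made for the example.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_streams(frames: Sequence[Sequence[int]]) -> List[int]:
    """Sum equal-length 16-bit PCM frames, saturating at the int16 range."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same length")
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Clamp instead of wrapping around so loud passages clip cleanly.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

Plain summation with saturation is only one possible choice; a production mixer might instead scale or compress the sum to avoid clipping.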
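The linear echo cancellation stage can be sketched with a normalized least-mean-squares (NLMS) adaptive filter. This is a swapped-in textbook technique chosen for illustration: the patent names linear echo cancellation and nonlinear echo suppression but does not specify an algorithm. The function name `nlms_echo_cancel` and the parameters `taps`, `mu` and `eps` are assumptions, and the nonlinear suppression stage is omitted.

```python
from typing import List, Sequence


def nlms_echo_cancel(near: Sequence[float], far: Sequence[float],
                     taps: int = 8, mu: float = 0.5,
                     eps: float = 1e-8) -> List[float]:
    """Subtract an adaptive linear estimate of the far-end echo from `near`.

    near: samples captured from the rendering device (shared sound + echo)
    far:  far-end reference samples (the other participants' sound)
    """
    weights = [0.0] * taps   # adaptive estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    output = []
    for d, x in zip(near, far):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # residual: shared sound plus unmodelled echo
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        output.append(error)
    return output
```

Here `near` plays the role of the data captured from the default audio rendering device and `far` the far-end reference data; the returned residual approximates the shared sound once the filter has adapted. The tap count would need to cover the actual echo path length, which this sketch does not estimate.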

The audio data flow on the host side includes a sending stream and a receiving stream. The sending stream has two paths, one for shared sound data and one for microphone data. The shared sound data can be forwarded directly to the participant side, while the microphone data is sent to the sound mixing server for mixing and then delivered to the participant side. The microphone sound data is captured from the default audio capture device and passed through audio processing modules such as echo cancellation, noise suppression and automatic gain control. The receiving stream receives the microphone mixed data packets of the other participants, decodes them, and renders them on the default audio rendering device for playback.

The audio data flow on the participant side likewise includes a sending stream and a receiving stream. The sending stream encodes and sends the microphone sound data, which is captured from the default audio capture device and passed through audio processing modules such as echo cancellation, noise suppression and automatic gain control. The receiving stream has two paths, one for the shared sound data and one for the microphone mixed data delivered by the sound mixing server; after decoding and mixing, the data is rendered on the default audio rendering device for playback.
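On the participant side the two received paths have to be combined before rendering. The sketch below assumes that each decoded frame carries a sequence number and that a frame missing from one path is treated as silence; neither detail comes from the patent, and `mix_received` and `clip16` are hypothetical names.

```python
from typing import Dict, List


def clip16(value: int) -> int:
    """Clamp a sample to the 16-bit signed range."""
    return max(-32768, min(32767, value))


def mix_received(shared: Dict[int, List[int]],
                 mic_mix: Dict[int, List[int]],
                 frame_len: int) -> List[int]:
    """Mix decoded frames of both paths, aligned by sequence number.

    shared:  frames of shared sound received directly from the host
    mic_mix: frames of microphone mixed sound delivered by the mixing server
    """
    silence = [0] * frame_len
    playback: List[int] = []
    for seq in sorted(set(shared) | set(mic_mix)):
        a = shared.get(seq, silence)
        b = mic_mix.get(seq, silence)
        playback.extend(clip16(x + y) for x, y in zip(a, b))
    return playback
```

Because the two paths have different delays, a real client would also need jitter buffering before this step; that is outside the scope of the sketch.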

acquiring voice and voiceprint information of preset personnel;

During echo filtering, sound information may be extracted according to the voiceprint features of a preset speaker in order to improve extraction accuracy. First, the voice and voiceprint information of the preset person is acquired. Next, the voiceprint features of the preset person are extracted from the voiceprint information, and these features are used to recognize first preset personnel voice data in the first sound data. Echo judgment is then performed on the recognized first preset personnel voice data, that is, it is judged whether the recognized sound is what the preset person is saying at the current moment; if it is earlier speech, it is judged to be echo data and must be filtered. Finally, the echoes in the first preset personnel voice data are filtered out to obtain the preset personnel voice data.

As shown in fig. 2, the present invention discloses a system 2 for transmitting shared sound in a network conference, which comprises a memory 21 and a processor 22, wherein the memory stores a program of the method for transmitting shared sound in a network conference, and the program, when executed by the processor, implements the following steps:
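The voiceprint matching step could look roughly like the following sketch, which compares a feature vector extracted from a sound segment against enrolled voiceprint features using cosine similarity. The patent does not state how voiceprint features are represented or compared; the fixed-length feature vectors, the similarity measure, the `threshold` value and the names `cosine_similarity` and `identify_speaker` are all assumptions.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(segment_features: Sequence[float],
                     enrolled: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> Optional[str]:
    """Return the enrolled person whose voiceprint best matches, or None."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(segment_features, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

A segment that matches no enrolled voiceprint returns `None` and could be treated as noise. The subsequent echo judgment (deciding whether a matched segment is current or earlier speech) is not covered by this sketch.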

collecting first sound data and second sound data;

encoding the first sound data and the second sound data;

the server collects a plurality of first sound data;

identifying the first sound data, and filtering echo data;

and sending the first mixed sound data to a plurality of terminal devices.

acquiring voice and voiceprint information of preset personnel;

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A method for transmitting shared sound in a network conference, comprising:

collecting first sound data and second sound data;

encoding the first sound data and the second sound data;

2. The method as claimed in claim 1, wherein the first sound data is sound data collected by a predetermined terminal, and the second sound data is sound data of a predetermined audio played by the predetermined terminal.

3. The method of claim 1, wherein the first path is sending voice data to a server, and sending the voice data to the terminal device after being processed by the server; and the second path is used for directly sending the sound data to the terminal equipment.

4. The method for transmitting the shared sound in the network conference according to claim 1, further comprising:

the server collects a plurality of first sound data;

identifying the first sound data, and filtering echo data;

and sending the first mixed sound data to a plurality of terminal devices.

5. The method as claimed in claim 4, wherein the step of identifying the first sound data and filtering the echo data comprises:

6. The method for transmitting the shared sound in the network conference according to claim 5, further comprising:

acquiring voice and voiceprint information of preset personnel;

7. A system for transmitting shared sound in a web conference, comprising a memory and a processor, wherein the memory includes a program for transmitting shared sound in the web conference, and the program for transmitting shared sound in the web conference is executed by the processor to implement the following steps:

collecting first sound data and second sound data;

encoding the first sound data and the second sound data;

8. The system for transmitting shared sound in a network conference according to claim 7, further comprising:

the server collects a plurality of first sound data;

identifying the first sound data, and filtering echo data;

and sending the first mixed sound data to a plurality of terminal devices.

9. The system of claim 7, wherein the identifying the first sound data and the filtering the echo data specifically are:

10. A computer-readable storage medium, characterized in that the computer-readable storage medium contains a program of a method for transmitting shared sound in a network conference, which program, when executed by a processor, carries out the steps of the method for transmitting shared sound in a network conference of any one of claims 1 to 6.