CN112543202A - Method, system and readable storage medium for transmitting shared sound in network conference - Google Patents

Method, system and readable storage medium for transmitting shared sound in network conference

Info

Publication number
CN112543202A
Authority
CN
China
Prior art keywords
data
sound
sound data
transmitting
echo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011575111.7A
Other languages
Chinese (zh)
Other versions
CN112543202B (en)
Inventor
顾骋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chuangxiang Space Information Technology Suzhou Co ltd
Original Assignee
Chuangxiang Space Information Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chuangxiang Space Information Technology Suzhou Co ltd
Priority to CN202011575111.7A
Publication of CN112543202A
Application granted
Publication of CN112543202B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Abstract

The invention provides a method, a system and a readable storage medium for transmitting shared sound in a network conference. The method comprises: collecting first sound data and second sound data; encoding the first sound data and the second sound data; and transmitting the encoded first sound data according to a first path and the encoded second sound data according to a second path. The invention forwards the host's shared sound data directly to the participant end without passing it through the audio mixing server, which reduces the degradation of the sound data during transmission and improves the quality of the shared sound heard at the participant end. The technical scheme allows the host, while sharing the desktop, to also share the sound of the video and PPT courseware being played, so that other participants can hear it. The information that can be conveyed in the conference thus becomes more complete, and the efficiency of the conference can be greatly improved.

Description

Method, system and readable storage medium for transmitting shared sound in network conference
Technical Field
The present application relates to the field of sound data processing, and more particularly, to a method, a system, and a readable storage medium for transmitting shared sound in a network conference.
Background
In a VoIP conference, participants generally join from different locations, and the host often plays videos and presents PPT slides through desktop sharing, which greatly helps the conference proceed. However, the sound in the videos and PPT slides cannot be heard by the other participants, so part of the information the conference is meant to convey is lost, which hinders the conference.
In the prior art, the host mixes the microphone sound with the shared sound, and the mixed sound is processed by a sound mixing server and then forwarded to the participant end. The sound mixing server performs operations such as caching, decoding, mixing and encoding on the data, which inevitably increases delay and jitter. The shared sound is mainly music, whose requirements on real-time performance and smooth transmission are higher than those of microphone sound, so in the prior art the shared sound heard at the participant end is prone to stuttering, noise and similar problems. A conference transmission method that solves these problems is therefore desirable.
Disclosure of Invention
In order to solve at least one technical problem, the invention provides a method, a system and a readable storage medium for transmitting shared sound in a network conference.
The first aspect of the present invention provides a method for transmitting shared sound in a network conference, including:
collecting first sound data and second sound data;
encoding the first sound data and the second sound data;
and transmitting the encoded first sound data according to a first path, and transmitting the encoded second sound data according to a second path.
In this scheme, the first sound data is sound data collected by a preset terminal, and the second sound data is sound data of preset audio played by the preset terminal.
In this scheme, the first path sends the sound data to the server, and the sound data is sent to the terminal equipment after being processed by the server; the second path sends the sound data directly to the terminal equipment.
In this scheme, the method further includes:
the server collects a plurality of first sound data;
identifying the first sound data, and filtering out echo data;
combining the plurality of first sound data from which the echo data has been filtered out to obtain first mixed sound data;
and sending the first mixed sound data to a plurality of terminal devices.
In this scheme, identifying the first sound data and filtering out the echo data specifically includes:
identifying the first sound data to obtain voice data of a preset person, noise data and echo data;
filtering the noise data and echo data out of the first sound data to obtain the voice data of the preset person;
and performing gain processing on the voice data of the preset person to obtain gain-adjusted voice data.
In this scheme, the method further includes:
acquiring voice and voiceprint information of the preset person;
extracting voiceprint features of the preset person from the voiceprint information;
recognizing, according to the voiceprint features of the preset person, first voice data of the preset person from the first sound data;
performing echo judgment on the recognized first voice data of the preset person;
and filtering out echoes in the first voice data of the preset person to obtain the voice data of the preset person.
The second aspect of the present invention provides a system for transmitting shared sound in a web conference, including a memory and a processor, where the memory includes a program for transmitting shared sound in a web conference, and the program for transmitting shared sound in a web conference is executed by the processor to implement the following steps:
collecting first sound data and second sound data;
encoding the first sound data and the second sound data;
and transmitting the encoded first sound data according to a first path, and transmitting the encoded second sound data according to a second path.
In this scheme, the first sound data is sound data collected by a preset terminal, and the second sound data is sound data of preset audio played by the preset terminal.
In this scheme, the first path sends the sound data to the server, and the sound data is sent to the terminal equipment after being processed by the server; the second path sends the sound data directly to the terminal equipment.
In this scheme, the steps further include:
the server collects a plurality of first sound data;
identifying the first sound data, and filtering out echo data;
combining the plurality of first sound data from which the echo data has been filtered out to obtain first mixed sound data;
and sending the first mixed sound data to a plurality of terminal devices.
In this scheme, identifying the first sound data and filtering out the echo data specifically includes:
identifying the first sound data to obtain voice data of a preset person, noise data and echo data;
filtering the noise data and echo data out of the first sound data to obtain the voice data of the preset person;
and performing gain processing on the voice data of the preset person to obtain gain-adjusted voice data.
In this scheme, the steps further include:
acquiring voice and voiceprint information of the preset person;
extracting voiceprint features of the preset person from the voiceprint information;
recognizing, according to the voiceprint features of the preset person, first voice data of the preset person from the first sound data;
performing echo judgment on the recognized first voice data of the preset person;
and filtering out echoes in the first voice data of the preset person to obtain the voice data of the preset person.
A third aspect of the present invention provides a computer-readable storage medium containing a program of the method for transmitting shared sound in a network conference, which, when executed by a processor, implements the steps of the method for transmitting shared sound in a network conference as described in any one of the above.
According to the method, the system and the readable storage medium for transmitting shared sound in a network conference, the host side can send two data streams. The microphone data is first forwarded to the sound mixing server for mixing and then delivered to the participant side, while the shared sound data is forwarded directly to the participant side without passing through the sound mixing server. The participant side receives the two data streams, mixes them, and finally plays them. Because the host's shared sound data bypasses the sound mixing server, the degradation of the sound data during transmission is reduced and the quality of the shared sound heard at the participant end is improved. The technical scheme allows the host, while sharing the desktop, to also share the sound of the video and PPT courseware being played, so that other participants can hear it. The information that can be conveyed in the conference thus becomes more complete, and the efficiency of the conference can be greatly improved.
Drawings
FIG. 1 shows a flowchart of a method for transmitting shared sound in a network conference;
FIG. 2 is a block diagram of a system for transmitting shared sound in a network conference according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
In the present invention, VoIP (Voice over IP) refers to voice communication carried over an IP network.
Fig. 1 shows a flow chart of a method for transmitting shared sound in a network conference.
As shown in fig. 1, the present invention discloses a method for transmitting shared sound in a network conference, which comprises:
S102, collecting first sound data and second sound data;
S104, encoding the first sound data and the second sound data;
S106, transmitting the encoded first sound data according to a first path, and transmitting the encoded second sound data according to a second path.
It should be noted that the first sound data is sound data collected by a preset terminal, and may be the voice of the host or of a preset speaker; the second sound data is sound data of preset audio played by the preset terminal, and may be the sound of a PPT or a video. For example, when there are several speakers, the first sound data may be the sound data collected by the terminals where those speakers are located; when there is only one speaker, the first sound data may be the sound data collected by that speaker's terminal. The terminal may collect sound with a microphone or another audio capture device. The preset terminal can be chosen by a person skilled in the art according to actual needs.
It is worth mentioning that in a VoIP conference, the host wishes to share not only the desktop video and PPT pictures but also the sound of the video and PPT being played. Participants in a VoIP conference generally join from different locations, and the host often plays videos and presents PPT slides through desktop sharing, which greatly helps the conference proceed.
The terminal first collects the first sound data and the second sound data, and then encodes them according to a predetermined protocol and type to obtain encoded sound data. The encoded first sound data is then sent along the first path: it is sent to a server, processed by the server, and then sent to terminal equipment for playback. The terminal equipment may be multiple terminal devices or a preset terminal device, such as the terminal device of the host or of a speaker. The encoded second sound data is sent along the second path, that is, directly to the terminal equipment. For example, the host side may send two data streams (microphone data and shared sound data). The microphone data is forwarded to the sound mixing server for mixing and then delivered to the participant side, while the shared sound data is forwarded directly to the participant side without passing through the sound mixing server. The participant side receives the two data streams (the microphone mixed data delivered by the sound mixing server and the shared sound data), mixes them, and finally plays them. In this way the shared sound data of the preset terminal device is forwarded directly to the participant end without passing through the sound mixing server, which reduces the degradation of the sound data during transmission and improves the quality of the shared sound heard at the participant end.
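The two-path sending described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Packet` and `Endpoint` types, the `encode` placeholder codec and the function `send_host_audio` are all hypothetical names introduced only for illustration, and real network transport is replaced by in-memory inboxes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    kind: str       # "mic" (first sound data) or "shared" (second sound data)
    payload: bytes  # encoded audio frame


@dataclass
class Endpoint:
    """Hypothetical stand-in for a network peer; it only records what it receives."""
    name: str
    inbox: List[Packet] = field(default_factory=list)

    def receive(self, packet: Packet) -> None:
        self.inbox.append(packet)


def encode(samples: List[int]) -> bytes:
    """Placeholder codec (assumption): packs 16-bit PCM samples as little-endian bytes."""
    return b"".join(s.to_bytes(2, "little", signed=True) for s in samples)


def send_host_audio(mic: List[int], shared: List[int],
                    mixing_server: Endpoint, participants: List[Endpoint]) -> None:
    # First path: microphone data goes to the mixing server only.
    mixing_server.receive(Packet("mic", encode(mic)))
    # Second path: shared sound bypasses the server and goes to each participant.
    shared_packet = Packet("shared", encode(shared))
    for participant in participants:
        participant.receive(shared_packet)
```

The point of the sketch is only the routing decision: the server never sees a `"shared"` packet, so the shared sound is not cached, decoded, mixed or re-encoded on its way to the participants.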
According to the embodiment of the invention, the method further comprises the following steps:
the server collects a plurality of first sound data;
identifying the first sound data, and filtering echo data;
combining the plurality of first sound data from which the echo data has been filtered out to obtain first mixed sound data;
and sending the first mixed sound data to a plurality of terminal devices.
It should be noted that the collected first sound data often includes echo data, so filtering is required. The server collects a plurality of first sound data, identifies them, and filters out the echo data. What remains after filtering is the speech captured at each preset terminal. To preserve the real-time feel of the conference, when several people speak at once, each person's speech needs to be played; the speech signals are therefore combined to obtain the first mixed sound data, which is then sent to a plurality of terminal devices for playback.
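As a rough illustration of the combining step, the following Python sketch sums several decoded, already echo-filtered frames sample by sample and saturates the result to the 16-bit range. The patent does not specify the mixing arithmetic; additive mixing with clipping, 16-bit PCM frames and the name `mix_frames` are assumptions made for this example.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: List[List[int]]) -> List[int]:
    """Mix several 16-bit PCM frames (one per speaker) into one frame.

    Assumes all frames share the same sample rate; the output is as long
    as the shortest input frame.
    """
    if not frames:
        return []
    length = min(len(frame) for frame in frames)
    mixed = []
    for i in range(length):
        total = sum(frame[i] for frame in frames)
        # Saturate instead of wrapping around on overflow.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

For example, `mix_frames([[100, 200], [50, -300]])` yields `[150, -100]`.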
According to the embodiment of the present invention, the identifying the first sound data and filtering the echo data specifically include:
identifying the first sound data to obtain voice data of a preset person, noise data and echo data;
filtering the noise data and echo data out of the first sound data to obtain the voice data of the preset person;
and performing gain processing on the voice data of the preset person to obtain gain-adjusted voice data.
It should be noted that at a preset terminal, for example the host's, the far-end sound data is mixed with the video and PPT sound data, rendered on the default audio rendering device and played, so that the host can hear both the speech of the far-end participants and the sound of the video and PPT being played. To let the far-end participants hear the video and PPT sound shared by the host, the sound data that is finally played must be captured from the default audio rendering device. That captured data, however, also contains the voices of the far-end participants; if it were transmitted to the far end as is, those participants would hear their own speech (the so-called echo). The voices of the far-end participants must therefore be filtered out by an echo canceller. In the echo canceller, the sound data captured from the default audio rendering device serves as near-end data and the sound data of the other participants serves as far-end reference data; processing such as linear echo cancellation and nonlinear echo suppression then separates out shared sound data that contains only the video and PPT sound.
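The linear echo cancellation stage can be sketched with a normalized least-mean-squares (NLMS) adaptive filter, a commonly used technique for this purpose. The patent does not name the algorithm, so NLMS, the function name `nlms_echo_cancel` and the parameters `taps`, `mu` and `eps` are assumptions for illustration; the nonlinear echo suppression stage is not shown.

```python
from typing import List


def nlms_echo_cancel(near: List[float], far: List[float],
                     taps: int = 8, mu: float = 0.5,
                     eps: float = 1e-8) -> List[float]:
    """Remove the far-end reference from the near-end capture.

    near: samples captured from the rendering device (shared sound + far-end voices)
    far:  far-end reference samples (voices of the other participants)
    Returns the residual, i.e. an estimate of the shared sound alone.
    """
    weights = [0.0] * taps   # adaptive estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for near_sample, far_sample in zip(near, far):
        history = [far_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = near_sample - echo_estimate
        # Normalizing by the reference energy keeps the step size stable.
        norm = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        residual.append(error)
    return residual
```

In this sketch the residual is the useful output: once the filter has adapted, what is left of the near-end capture after subtracting the estimated far-end component approximates the video and PPT sound.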
The sound data streams at the host side comprise a send stream and a receive stream. The send stream has two paths: one carries shared sound data and the other carries microphone data. The shared sound data is forwarded directly to the participant side, while the microphone data is sent to the sound mixing server for mixing and then delivered to the participant side. The microphone sound data is captured from the default audio capture device and passed through audio processing modules such as echo cancellation, noise suppression and automatic gain. The receive stream receives the microphone mixed data packets of the other participants, decodes them, and renders them on the default audio rendering device for playback.
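Of the microphone processing modules mentioned above, the automatic gain step is the simplest to illustrate. The following Python sketch scales a frame toward a target RMS level. The patent gives no details of its gain processing, so the per-frame RMS approach, the function name `automatic_gain` and the parameters `target_rms` and `max_gain` are assumptions; samples are taken to be floats in the range [-1.0, 1.0].

```python
import math
from typing import List


def automatic_gain(frame: List[float], target_rms: float = 0.1,
                   max_gain: float = 10.0) -> List[float]:
    """Scale one frame toward target_rms, capping the gain and clipping to [-1, 1]."""
    if not frame:
        return []
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms == 0.0:
        return list(frame)  # pure silence: nothing to amplify
    gain = min(max_gain, target_rms / rms)
    return [max(-1.0, min(1.0, s * gain)) for s in frame]
```

Capping the gain avoids amplifying low-level background noise to full volume when nobody is speaking.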
The sound data streams at the participant side likewise comprise a send stream and a receive stream. The send stream encodes and sends the microphone sound data, which is captured from the default audio capture device and passed through audio processing modules such as echo cancellation, noise suppression and automatic gain. The receive stream has two paths: one carries the shared sound data and the other carries the microphone mixed data delivered by the sound mixing server. Both are decoded, mixed, and rendered on the default audio rendering device for playback.
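The participant-side receive stream can be sketched as two frame queues that are mixed at playout time. This is a simplified illustration, not the patent's implementation: the class name `ParticipantReceiver`, the stream labels `"mic_mix"` and `"shared"`, and the choice to substitute silence for a missing frame are all assumptions, and decoding and jitter buffering are omitted.

```python
from collections import deque
from typing import Deque, Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


class ParticipantReceiver:
    """Holds decoded frames from the two received paths and mixes them for playout."""

    def __init__(self, frame_len: int) -> None:
        self.frame_len = frame_len
        self.queues: Dict[str, Deque[List[int]]] = {
            "mic_mix": deque(),  # mixed microphone data from the mixing server
            "shared": deque(),   # shared sound sent directly by the host
        }

    def on_frame(self, stream: str, samples: List[int]) -> None:
        self.queues[stream].append(samples)

    def next_playout_frame(self) -> List[int]:
        silence = [0] * self.frame_len
        # Take one frame per stream; a stream with nothing queued contributes silence.
        frames = [q.popleft() if q else silence for q in self.queues.values()]
        return [max(INT16_MIN, min(INT16_MAX, sum(column)))
                for column in zip(*frames)]
```

Keeping the two streams in separate queues means a late microphone mix does not hold back the shared sound, and vice versa.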
According to the embodiment of the invention, the method further comprises the following steps:
acquiring voice and voiceprint information of the preset person;
extracting voiceprint features of the preset person from the voiceprint information;
recognizing, according to the voiceprint features of the preset person, first voice data of the preset person from the first sound data;
performing echo judgment on the recognized first voice data of the preset person;
and filtering out echoes in the first voice data of the preset person to obtain the voice data of the preset person.
When echo filtering is performed, sound information may be extracted according to the voiceprint features of a preset speaker in order to improve extraction accuracy. First, the voice and voiceprint information of the preset person are acquired, and the voiceprint features of the preset person are extracted from the voiceprint information. Then, according to these voiceprint features, the first voice data of the preset person is recognized from the first sound data. Echo judgment is then performed on the recognized voice data, that is, it is judged whether the recognized voice data is what the preset person is saying at the current moment; if it is earlier speech, it is judged to be echo data and needs to be filtered out. Finally, the echoes in the first voice data of the preset person are filtered out to obtain the voice data of the preset person.
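The voiceprint-based recognition step can be illustrated by comparing per-segment feature vectors against an enrolled voiceprint. The patent does not say how voiceprint features are represented or matched; fixed-length feature vectors, cosine similarity, the threshold value and the names `cosine_similarity` and `select_preset_speaker_segments` are assumptions for this sketch, and the feature extraction itself is not shown.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_preset_speaker_segments(segment_features: List[Sequence[float]],
                                   enrolled_voiceprint: Sequence[float],
                                   threshold: float = 0.8) -> List[int]:
    """Return indices of the segments whose features match the enrolled voiceprint."""
    return [i for i, features in enumerate(segment_features)
            if cosine_similarity(features, enrolled_voiceprint) >= threshold]
```

Segments selected this way would then go through the echo judgment described above, which decides whether a matched segment is current speech or a repetition of earlier speech.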
Fig. 2 is a block diagram of a system for transmitting shared sound in a network conference according to the present invention.
As shown in Fig. 2, the present invention discloses a system 2 for transmitting shared sound in a network conference, which comprises a memory 21 and a processor 22, wherein the memory includes a program of the method for transmitting shared sound in a network conference, and the program, when executed by the processor, implements the following steps:
collecting first sound data and second sound data;
encoding the first sound data and the second sound data;
and transmitting the encoded first sound data according to a first path, and transmitting the encoded second sound data according to a second path.
It should be noted that the first sound data is sound data collected by a preset terminal, and may be the voice of the host or of a preset speaker; the second sound data is sound data of preset audio played by the preset terminal, and may be the sound of a PPT or a video. For example, when there are several speakers, the first sound data may be the sound data collected by the terminals where those speakers are located; when there is only one speaker, the first sound data may be the sound data collected by that speaker's terminal. The terminal may collect sound with a microphone or another audio capture device. The preset terminal can be chosen by a person skilled in the art according to actual needs.
It is worth mentioning that in a VoIP conference, the host wishes to share not only the desktop video and PPT pictures but also the sound of the video and PPT being played. Participants in a VoIP conference generally join from different locations, and the host often plays videos and presents PPT slides through desktop sharing, which greatly helps the conference proceed.
The terminal first collects the first sound data and the second sound data, and then encodes them according to a predetermined protocol and type to obtain encoded sound data. The encoded first sound data is then sent along the first path: it is sent to a server, processed by the server, and then sent to terminal equipment for playback. The terminal equipment may be multiple terminal devices or a preset terminal device, such as the terminal device of the host or of a speaker. The encoded second sound data is sent along the second path, that is, directly to the terminal equipment. For example, the host side may send two data streams (microphone data and shared sound data). The microphone data is forwarded to the sound mixing server for mixing and then delivered to the participant side, while the shared sound data is forwarded directly to the participant side without passing through the sound mixing server. The participant side receives the two data streams (the microphone mixed data delivered by the sound mixing server and the shared sound data), mixes them, and finally plays them. In this way the shared sound data of the preset terminal device is forwarded directly to the participant end without passing through the sound mixing server, which reduces the degradation of the sound data during transmission and improves the quality of the shared sound heard at the participant end.
According to the embodiment of the invention, the method further comprises the following steps:
the server collects a plurality of first sound data;
identifying the first sound data, and filtering echo data;
combining the plurality of first sound data from which the echo data has been filtered out to obtain first mixed sound data;
and sending the first mixed sound data to a plurality of terminal devices.
It should be noted that the collected first sound data often includes echo data, so filtering is required. The server collects a plurality of first sound data, identifies them, and filters out the echo data. What remains after filtering is the speech captured at each preset terminal. To preserve the real-time feel of the conference, when several people speak at once, each person's speech needs to be played; the speech signals are therefore combined to obtain the first mixed sound data, which is then sent to a plurality of terminal devices for playback.
According to the embodiment of the present invention, the identifying the first sound data and filtering the echo data specifically include:
identifying the first sound data to obtain voice data of a preset person, noise data and echo data;
filtering the noise data and echo data out of the first sound data to obtain the voice data of the preset person;
and performing gain processing on the voice data of the preset person to obtain gain-adjusted voice data.
It should be noted that at a preset terminal, for example the host's, the far-end sound data is mixed with the video and PPT sound data, rendered on the default audio rendering device and played, so that the host can hear both the speech of the far-end participants and the sound of the video and PPT being played. To let the far-end participants hear the video and PPT sound shared by the host, the sound data that is finally played must be captured from the default audio rendering device. That captured data, however, also contains the voices of the far-end participants; if it were transmitted to the far end as is, those participants would hear their own speech (the so-called echo). The voices of the far-end participants must therefore be filtered out by an echo canceller. In the echo canceller, the sound data captured from the default audio rendering device serves as near-end data and the sound data of the other participants serves as far-end reference data; processing such as linear echo cancellation and nonlinear echo suppression then separates out shared sound data that contains only the video and PPT sound.
The sound data streams at the host side comprise a send stream and a receive stream. The send stream has two paths: one carries shared sound data and the other carries microphone data. The shared sound data is forwarded directly to the participant side, while the microphone data is sent to the sound mixing server for mixing and then delivered to the participant side. The microphone sound data is captured from the default audio capture device and passed through audio processing modules such as echo cancellation, noise suppression and automatic gain. The receive stream receives the microphone mixed data packets of the other participants, decodes them, and renders them on the default audio rendering device for playback.
The sound data streams at the participant side likewise comprise a send stream and a receive stream. The send stream encodes and sends the microphone sound data, which is captured from the default audio capture device and passed through audio processing modules such as echo cancellation, noise suppression and automatic gain. The receive stream has two paths: one carries the shared sound data and the other carries the microphone mixed data delivered by the sound mixing server. Both are decoded, mixed, and rendered on the default audio rendering device for playback.
According to the embodiment of the invention, the method further comprises the following steps:
acquiring voice and voiceprint information of the preset person;
extracting voiceprint features of the preset person from the voiceprint information;
recognizing, according to the voiceprint features of the preset person, first voice data of the preset person from the first sound data;
performing echo judgment on the recognized first voice data of the preset person;
and filtering out echoes in the first voice data of the preset person to obtain the voice data of the preset person.
When echo filtering is performed, sound information may be extracted according to the voiceprint features of a preset speaker in order to improve extraction accuracy. First, the voice and voiceprint information of the preset person are acquired, and the voiceprint features of the preset person are extracted from the voiceprint information. Then, according to these voiceprint features, the first voice data of the preset person is recognized from the first sound data. Echo judgment is then performed on the recognized voice data, that is, it is judged whether the recognized voice data is what the preset person is saying at the current moment; if it is earlier speech, it is judged to be echo data and needs to be filtered out. Finally, the echoes in the first voice data of the preset person are filtered out to obtain the voice data of the preset person.
A third aspect of the present invention provides a computer-readable storage medium containing a program of the method for transmitting shared sound in a network conference, which, when executed by a processor, implements the steps of the method for transmitting shared sound in a network conference as described in any one of the above.
According to the method, the system and the readable storage medium for transmitting shared sound in a network conference, the host side can send two data streams. The microphone data is first forwarded to the sound mixing server for mixing and then delivered to the participant side, while the shared sound data is forwarded directly to the participant side without passing through the sound mixing server. The participant side receives the two data streams, mixes them, and finally plays them. Because the host's shared sound data bypasses the sound mixing server, the degradation of the sound data during transmission is reduced and the quality of the shared sound heard at the participant end is improved. The technical scheme allows the host, while sharing the desktop, to also share the sound of the video and PPT courseware being played, so that other participants can hear it. The information that can be conveyed in the conference thus becomes more complete, and the efficiency of the conference can be greatly improved.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit can be realized in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method for transmitting shared sound in a network conference, comprising:
collecting first sound data and second sound data;
encoding the first sound data and the second sound data;
and transmitting the encoded first sound data according to a first path, and transmitting the encoded second sound data according to a second path.
2. The method as claimed in claim 1, wherein the first sound data is sound data collected by a predetermined terminal, and the second sound data is sound data of a predetermined audio played by the predetermined terminal.
3. The method of claim 1, wherein the first path sends the sound data to a server, which processes the sound data and then sends it to the terminal device; and the second path sends the sound data directly to the terminal device.
4. The method for transmitting the shared sound in the network conference according to claim 1, further comprising:
the server collects a plurality of first sound data;
identifying the first sound data, and filtering echo data;
combining the plurality of first sound data from which the echo data has been filtered out to obtain first mixed sound data;
and sending the first mixed sound data to a plurality of terminal devices.
5. The method as claimed in claim 4, wherein the step of identifying the first sound data and filtering the echo data comprises:
identifying the first sound data to obtain preset personnel sound data, noise data and echo data;
filtering the noise data and echo data out of the first sound data to obtain the preset personnel sound data;
and performing gain processing on the preset personnel voice data to obtain the gained voice data.
6. The method for transmitting the shared sound in the network conference according to claim 5, further comprising:
acquiring voice and voiceprint information of preset personnel;
extracting voiceprint characteristics of preset personnel according to the voiceprint information;
according to the voiceprint characteristics of the preset personnel, recognizing the voice data of the first preset personnel from the first voice data;
carrying out echo judgment on the recognized sound data of the first preset personnel;
and filtering echoes in the first preset personnel voice data to obtain the preset personnel voice data.
7. A system for transmitting shared sound in a web conference, comprising a memory and a processor, wherein the memory includes a program for transmitting shared sound in the web conference, and the program for transmitting shared sound in the web conference is executed by the processor to implement the following steps:
collecting first sound data and second sound data;
encoding the first sound data and the second sound data;
and transmitting the encoded first sound data according to a first path, and transmitting the encoded second sound data according to a second path.
8. The system for transmitting shared sound in a network conference according to claim 7, further comprising:
the server collects a plurality of first sound data;
identifying the first sound data, and filtering echo data;
combining the plurality of first sound data from which the echo data has been filtered out to obtain first mixed sound data;
and sending the first mixed sound data to a plurality of terminal devices.
9. The system of claim 7, wherein the identifying the first sound data and the filtering the echo data specifically are:
identifying the first sound data to obtain preset personnel sound data, noise data and echo data;
filtering the noise data and echo data out of the first sound data to obtain the preset personnel sound data;
and performing gain processing on the preset personnel voice data to obtain the gained voice data.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium contains a program of a method for transmitting shared sound in a network conference, which program, when executed by a processor, carries out the steps of the method for transmitting shared sound in a network conference of any one of claims 1 to 6.
CN202011575111.7A 2020-12-28 2020-12-28 Method, system and readable storage medium for transmitting shared sound in network conference Active CN112543202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011575111.7A CN112543202B (en) 2020-12-28 2020-12-28 Method, system and readable storage medium for transmitting shared sound in network conference

Publications (2)

Publication Number Publication Date
CN112543202A true CN112543202A (en) 2021-03-23
CN112543202B CN112543202B (en) 2022-04-12

Family

ID=75017687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011575111.7A Active CN112543202B (en) 2020-12-28 2020-12-28 Method, system and readable storage medium for transmitting shared sound in network conference

Country Status (1)

Country Link
CN (1) CN112543202B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102638671A (en) * 2011-02-15 2012-08-15 华为终端有限公司 Method and device for processing conference information in video conference
CN204231580U (en) * 2014-11-27 2015-03-25 珠海迈科智能科技股份有限公司 A kind of karaoke OK system and audio frequency integrating apparatus thereof
CN107221340A (en) * 2017-05-31 2017-09-29 福建星网视易信息系统有限公司 Real-time methods of marking, storage device and application based on MCVF multichannel voice frequency
CN209882100U (en) * 2019-05-17 2019-12-31 四川易简天下科技股份有限公司 Audio transmission structure
CN111212032A (en) * 2019-12-13 2020-05-29 视联动力信息技术股份有限公司 Audio processing method and device based on video network, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112543202B (en) 2022-04-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant