CN112565668A - Method, system and readable storage medium for sharing sound in network conference - Google Patents
- Publication number
- CN112565668A (application CN202011576572.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04N7/15: Conference systems (H04N, pictorial communication, e.g. television)
- G10L21/0208: Noise filtering (G10L, speech enhancement, e.g. noise reduction or echo cancellation)
- G10L21/028: Voice signal separating using properties of sound source
- H04N5/265: Mixing (studio circuits, e.g. for mixing or special effects)
- G10L2021/02082: Noise filtering where the noise is echo or reverberation of the speech
Abstract
The invention discloses a method, a system and a readable storage medium for sharing sound in a network conference. The method comprises the following steps: acquiring first sound data and second sound data; performing mixing processing on the first sound data and the second sound data to obtain mixed sound data; and sending the mixed sound data to a terminal. The invention sends the microphone data and the shared sound data of the voip host end directly to the mixing server, which performs the mixing operation in a single pass, thereby reducing the damage that repeated mixing passes inflict on the sound data and improving the quality of the shared sound heard at the participant end. In addition, the invention performs echo cancellation according to the voiceprint data of the participants, so that the extracted sound is more accurate; and the filter is obtained through a cloud service, so that noise and echo data are filtered more effectively, improving the user experience.
Description
Technical Field
The present invention relates to the field of audio data processing, and more particularly, to a method, system and readable storage medium for sharing sound in a network conference.
Background
With the development of technology, network conferences are held over the network in more and more scenarios, with users communicating and presenting through conferencing software or platforms. In a conference that fuses voip and pstn, a voip host frequently plays videos and presents PPT slides through desktop sharing, which greatly helps the conduct of the conference. However, pstn participants cannot hear the sound in those videos and PPT presentations, so some of the information the conference is meant to convey is lost, which hinders the conference. Shared sound is typically collected by capturing the sound data from the default audio rendering device (including the sound played from videos and PPT files and the speech of the other participants), and then removing the other participants' speech with an echo canceller so that they do not hear their own voices again; what remains is the sound played from the videos and PPT files.
In the prior art, the microphone and the shared sound are first mixed at the voip host end, and the result must be mixed again by a mixing server before being sent to the pstn participant end. Because the mixing algorithm applies nonlinear processing such as limiting, companding and gain control, the more mixing passes there are, the more easily the sound is distorted; shared sound whose content is mainly music is especially sensitive to such distortion. The prior art therefore easily leads to distortion in the shared sound heard at the pstn participant end.
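The effect of repeated nonlinear mixing can be sketched as follows. This is an illustrative Python sketch, not from the patent: it uses white noise as stand-in audio, hard limiting as a stand-in for the mixing algorithm's limiting/companding step, and shows that mixing in two passes produces a different (more distorted) result than mixing all streams once at the server.

```python
import numpy as np

LIMIT = 1.0  # full-scale amplitude

def mix_pair(a, b):
    """Mix two audio buffers and hard-limit to full scale. The limiter is a
    nonlinear step, so each extra mixing pass can distort the signal further."""
    return np.clip(a + b, -LIMIT, LIMIT)

rng = np.random.default_rng(1)
mic_host = 0.7 * rng.standard_normal(8000)   # voip host microphone (stand-in)
shared = 0.7 * rng.standard_normal(8000)     # shared video/PPT sound (stand-in)
mic_other = 0.7 * rng.standard_normal(8000)  # another participant (stand-in)

# Prior art: mix at the host, then mix again at the server.
two_pass = mix_pair(mix_pair(mic_host, shared), mic_other)
# Proposed approach: forward all streams and mix once at the server.
one_pass = np.clip(mic_host + shared + mic_other, -LIMIT, LIMIT)

# The two results differ wherever the first pass already limited the signal.
extra_distortion = int(np.sum(two_pass != one_pass))
assert extra_distortion > 0
```

The samples where `two_pass` and `one_pass` disagree are exactly those the first limiting pass damaged irrecoverably, which is the loss the single-pass server-side mix avoids.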
Disclosure of Invention
In order to solve at least one technical problem, the present invention provides a method, a system and a readable storage medium for sharing sound in a network conference.
The invention provides a method for sharing sound in a network conference, which comprises the following steps:
acquiring first sound data and second sound data;
performing mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
In this aspect, the method further comprises:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
In this aspect, the first sound data is sound data collected by the terminal device, and the second sound data is sound data generated by a file played on the terminal.
In this aspect, filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically comprises:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
In this aspect, identifying the current sound data in the third sound data according to a preset rule specifically comprises:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
In this aspect, the method further comprises:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
A second aspect of the present invention provides a system for sharing sound in a network conference, comprising a memory and a processor, wherein the memory stores a program of the method for sharing sound in a network conference, and when the program is executed by the processor, the following steps are implemented:
acquiring first sound data and second sound data;
performing mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
In this aspect, the program further implements:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
In this aspect, the first sound data is sound data collected by the terminal device, and the second sound data is sound data generated by a file played on the terminal.
In this aspect, filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically comprises:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
In this aspect, identifying the current sound data in the third sound data according to a preset rule specifically comprises:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
In this aspect, the program further implements:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
A third aspect of the present invention provides a computer-readable storage medium, which stores a program of the method for sharing sound in a network conference; when the program is executed by a processor, the steps of the method for sharing sound in a network conference described in any one of the above are implemented.
According to the method, the system and the readable storage medium for sharing sound in a network conference, the voip host end sends two data streams (microphone data and shared sound data), which are forwarded to the mixing server for mixing and then delivered to the pstn participant end; the pstn participant end finally receives a single data stream and decodes and plays it. The invention sends the microphone data and the shared sound data of the voip host end directly to the mixing server, which performs the mixing operation in a single pass, thereby reducing the damage that repeated mixing passes inflict on the sound data and improving the quality of the shared sound heard at the participant end. In addition, the invention performs echo cancellation according to the voiceprint data of the participants, so that the extracted sound is more accurate; and the filter is obtained through a cloud service, so that noise and echo data are filtered more effectively, improving the user experience.
Drawings
FIG. 1 is a flow chart illustrating a method of sharing sound for a web conference of the present invention;
fig. 2 shows a system block diagram of sharing sound in a network conference according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
In the present invention, voip (Voice over IP) refers to carrying voice calls over an IP network, and pstn (Public Switched Telephone Network) refers to the ordinary circuit-switched telephone network in everyday use.
Fig. 1 is a flow chart illustrating a method for sharing sound in a web conference according to the present invention.
As shown in fig. 1, the present invention discloses a method for sharing sound in a network conference, which comprises:
acquiring first sound data and second sound data;
performing mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
It should be noted that, in the present invention, the first sound data is sound data collected by a terminal device, and the second sound data is sound data generated by a file played on the terminal. The terminal can collect the first sound data with a microphone or other sound collection equipment, and the second sound data can be sound produced by locally playing a PPT (PowerPoint) presentation, a video or a similar file. The mixing server first obtains the first sound data and the second sound data; the two kinds of sound data may be sent in two paths or combined into one path through time-division multiplexing, as long as the mixing server can process both according to a preset rule. After receiving the two kinds of sound data, the mixing server performs mixing processing, that is, it combines them. After mixing, the mixed sound data is sent to the terminal, so that through a playback device such as a loudspeaker the user can hear both the speech of the other users and the sound of the video, PPT or other file played by the preset user.
It should be noted that, in the present invention, the first sound data and the second sound data may be sent to the server in two paths, or in one path using time-division multiplexing, as long as the first sound data and the second sound data can be separated using a preset protocol or algorithm.
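The single-path time-division-multiplexing option can be sketched as alternating fixed-size frames from the two streams; the 10 ms frame size below is an assumption for illustration, not specified by the patent:

```python
import numpy as np

FRAME = 160  # 10 ms at 16 kHz, an assumed conferencing frame size

def interleave(mic: np.ndarray, shared: np.ndarray) -> np.ndarray:
    """Combine two PCM streams into one path, alternating fixed-size frames."""
    frames = []
    for i in range(0, len(mic), FRAME):
        frames.append(mic[i:i + FRAME])
        frames.append(shared[i:i + FRAME])
    return np.concatenate(frames)

def deinterleave(combined: np.ndarray):
    """Recover the two streams on the server using the agreed frame layout."""
    mic, shared = [], []
    for i in range(0, len(combined), 2 * FRAME):
        mic.append(combined[i:i + FRAME])
        shared.append(combined[i + FRAME:i + 2 * FRAME])
    return np.concatenate(mic), np.concatenate(shared)

mic = np.arange(480, dtype=np.int16)       # 3 frames of microphone data
shared = -np.arange(480, dtype=np.int16)   # 3 frames of shared sound
m, s = deinterleave(interleave(mic, shared))
assert np.array_equal(m, mic) and np.array_equal(s, shared)
```

Any agreed layout (or a real multiplexing protocol with headers) would do; the only requirement stated in the text is that the server can separate the two streams again.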
According to the embodiment of the invention, the method further comprises the following steps:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
It should be noted that the first sound data is sound data collected by the terminal device, and such data often includes not only the user's speech but also echoes of other sounds and noise, so the first sound data needs to be filtered to obtain the required sound data.
For example, when the voip host end shares sound data, the far-end sound data and the sound of the video and PPT are mixed, rendered on the default audio rendering device and finally played, so the voip host can hear both the speech of the far-end participants and the sound of the video and PPT played on the local machine. Meanwhile, so that the other, far-end participants can hear the video and PPT sound shared by the voip host end, the finally played-back sound data needs to be captured from the default audio rendering device; but this capture also includes the far-end participants' own sound data, and if it were transmitted to the far end directly, the other participants would hear their own speech again (the so-called echo). The far-end participants' sound must therefore be filtered out by an echo canceller. In the echo canceller, the sound data captured from the default audio rendering device serves as near-end data and the sound data of the other participants serves as the far-end reference; through processing such as linear echo cancellation and nonlinear echo suppression, shared sound data containing only the video and PPT sound is separated out. The voip host end's sound data flow comprises a transmit stream and a receive stream. The transmit stream has two paths, one for the shared sound data and the other for the microphone data; the two data streams are forwarded to the mixing server for mixing and then delivered to the pstn participant end. The microphone sound data is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control.
The receive stream receives the mixed microphone data packets of the other participants, decodes them, and renders them on the default audio rendering device for playback.
The pstn participant end's sound data flow likewise comprises a transmit stream and a receive stream. The transmit stream encodes and sends the microphone sound data, which is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control. The receive stream carries the mixed microphone and shared sound data sent by the mixing server; after decoding, it is rendered on the default audio rendering device for playback.
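The linear echo-cancellation step described above, with near-end capture and a far-end reference, can be sketched with a normalized least-mean-squares (NLMS) adaptive filter. This is a generic textbook illustration, not the patent's actual algorithm, and the tap count, step size and simulated echo path are all assumptions:

```python
import numpy as np

def nlms_echo_cancel(near, far, taps=32, mu=0.5, eps=1e-6):
    """Minimal NLMS adaptive filter: estimate the echo of the far-end
    reference present in the near-end capture and subtract it, leaving
    the locally generated signal as the residual."""
    w = np.zeros(taps)
    out = np.zeros_like(near)
    for n in range(len(near)):
        # Most recent `taps` far-end samples, newest first, zero-padded.
        x = far[max(0, n - taps + 1):n + 1][::-1]
        x = np.pad(x, (0, taps - len(x)))
        e = near[n] - w @ x          # error = capture minus echo estimate
        out[n] = e
        w += mu * e * x / (x @ x + eps)  # normalized coefficient update
    return out

rng = np.random.default_rng(0)
far = rng.standard_normal(4000)                       # far-end participants
echo = 0.6 * np.concatenate([np.zeros(8), far[:-8]])  # assumed echo path
near = echo                                           # capture: echo only
residual = nlms_echo_cancel(near, far)
# Once adapted, the residual echo is far weaker than the captured echo.
assert np.mean(residual[2000:] ** 2) < 0.1 * np.mean(near[2000:] ** 2)
```

A real canceller would add the nonlinear echo-suppression stage the text mentions on top of this linear stage; here the residual after convergence stands in for the separated shared sound.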
According to the embodiment of the present invention, the filtering the first sound data according to the preset rule to obtain the filtered first sound data specifically includes:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
It should be noted that, because the first sound data is collected by the terminal device, it often includes not only the user's speech but also echoes of other sounds and noise, so it needs to be filtered to obtain the required sound data. During filtering, to remove the other noises and echoes while retaining the sound data of the preset personnel, the voiceprint characteristics of all participants are acquired first and the voiceprint characteristics of the preset personnel are determined; a voiceprint is analogous to a person's fingerprint, so a user's sound data can be identified by voiceprint. The sound data matching the preset personnel's voiceprint characteristics is then extracted from the first sound data to obtain the third sound data. The third sound data may still contain an echo of the user's own speech, that is, sound the user spoke earlier; the current sound data in the third sound data, i.e. the near-end sound data, therefore needs to be identified through a preset rule and used as the filtered first sound data to participate in mixing with the other sound data.
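Voiceprint matching of this kind is commonly done by comparing speaker embeddings. The sketch below is a hypothetical illustration (the embedding vectors, participant names and similarity threshold are all invented for the example; a real system would compute embeddings from audio with a speaker-recognition model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_voiceprint(segment_embedding, enrolled, threshold=0.75):
    """Return the enrolled participant whose voiceprint embedding is most
    similar to the segment's embedding, or None if nobody passes the
    threshold (i.e. the segment is noise or an unknown speaker)."""
    best_name, best_score = None, threshold
    for name, emb in enrolled.items():
        score = cosine_similarity(segment_embedding, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Invented 3-dimensional embeddings standing in for real voiceprints.
enrolled = {
    "host": np.array([0.9, 0.1, 0.3]),
    "guest": np.array([0.1, 0.8, 0.2]),
}
segment = np.array([0.85, 0.15, 0.25])
assert match_voiceprint(segment, enrolled) == "host"
```

Segments that match the preset personnel's voiceprint would be kept as the third sound data; unmatched segments (noise, other speakers) would be discarded.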
According to the embodiment of the present invention, the identifying the current sound data in the third sound data according to the preset rule specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
It should be noted that sound filtering is usually performed with a filter, which can remove sound data of preset frequencies and amplitudes; however, the filter parameters are often difficult to determine, and if they are set inaccurately the required sound data will likely not be obtained. The filter module and filter parameters are therefore obtained through a cloud service. First, the environmental parameters of the collected sound are acquired and sent to a server. The server selects the filter module and parameters through cloud computing, that is, it obtains the corresponding filter module and parameters according to the environmental parameters. A filter generated by the cloud server adapts better to the current sound environment and filters the relevant sounds; the filter module and parameters are used as an echo cancellation filter, which filters the echo data out of the third sound data to obtain the current sound data. Because the filter module and parameters are selected through cloud computing, they can be adjusted dynamically to suit the current sound environment, making the filtering more accurate.
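One way to picture the server-side selection is a lookup from reported environment parameters to a filter module and its parameters. Everything below is invented for illustration: the patent does not specify the parameter names, profiles or values, and a real cloud service might compute rather than look up the configuration:

```python
# Hypothetical server-side profiles: filter module plus parameters keyed by
# acoustic environment. Names and numbers are illustrative only.
FILTER_PROFILES = {
    "small_room": {"module": "nlms", "taps": 128, "mu": 0.5},
    "large_hall": {"module": "nlms", "taps": 1024, "mu": 0.2},
    "noisy_open": {"module": "nlms", "taps": 256, "mu": 0.3},
}

def select_filter(env: dict) -> dict:
    """Pick a filter profile from the terminal's reported environment
    parameters (here: ambient noise level and room size, both assumed)."""
    if env.get("noise_db", 0) > 60:
        return FILTER_PROFILES["noisy_open"]
    if env.get("room_m2", 0) > 50:
        return FILTER_PROFILES["large_hall"]
    return FILTER_PROFILES["small_room"]

assert select_filter({"room_m2": 20, "noise_db": 40})["taps"] == 128
assert select_filter({"room_m2": 80, "noise_db": 40})["taps"] == 1024
```

The returned module and parameters would then instantiate the echo cancellation filter applied to the third sound data, and could be re-requested whenever the environment parameters change.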
According to the embodiment of the invention, the method further comprises the following steps:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
It should be noted that in a network conference several users often speak at the same time or within the same period, and for the other participants to hear everyone's speech, each user's sound data must be acquired and then mixed. After obtaining the first sound data of the multiple terminals, the server filters out noise and other sounds to obtain the sound data of the preset users, mixes the sound data of the multiple users with the shared sound data, and then sends the resulting mixed sound data to each terminal, which decodes and plays it upon receipt.
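The server-side step of mixing every terminal's stream with the shared sound in a single pass can be sketched as follows (illustrative only; real streams would be filtered per the earlier steps and encoded before sending):

```python
import numpy as np

def mix_all(first_streams, shared, peak=32767):
    """Mix every terminal's (already filtered) microphone stream with the
    shared sound in one pass, then limit once to the int16 range."""
    total = shared.astype(np.int32)
    for stream in first_streams:
        total = total + stream.astype(np.int32)
    return np.clip(total, -peak - 1, peak).astype(np.int16)

# Two terminals' speech plus the shared sound, as tiny constant buffers.
streams = [np.full(4, 1000, dtype=np.int16), np.full(4, 2000, dtype=np.int16)]
shared = np.full(4, 500, dtype=np.int16)
mixed = mix_all(streams, shared)
assert mixed.tolist() == [3500, 3500, 3500, 3500]
# The server would then encode `mixed` once and send it to every terminal.
```

Because all summing happens before the single limiting step, each terminal receives sound that has passed through the nonlinear stage only once.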
Fig. 2 shows a system block diagram of sharing sound in a network conference according to the present invention.
As shown in fig. 2, the present invention discloses a system 2 for sharing sound in a network conference, which comprises a memory 21 and a processor 22, wherein the memory 21 stores a program of the method for sharing sound in a network conference, and when the program is executed by the processor 22, the following steps are implemented:
acquiring first sound data and second sound data;
performing mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
It should be noted that, in the present invention, the first sound data is sound data collected by a terminal device, and the second sound data is sound data generated by a file played on the terminal. The terminal can collect the first sound data with a microphone or other sound collection equipment, and the second sound data can be sound produced by locally playing a PPT (PowerPoint) presentation, a video or a similar file. The mixing server first obtains the first sound data and the second sound data; the two kinds of sound data may be sent in two paths or combined into one path through time-division multiplexing, as long as the mixing server can process both according to a preset rule. After receiving the two kinds of sound data, the mixing server performs mixing processing, that is, it combines them. After mixing, the mixed sound data is sent to the terminal, so that through a playback device such as a loudspeaker the user can hear both the speech of the other users and the sound of the video, PPT or other file played by the preset user.
It should be noted that, in the present invention, the first sound data and the second sound data may be sent to the server in two paths, or in one path using time-division multiplexing, as long as the first sound data and the second sound data can be separated using a preset protocol or algorithm.
According to the embodiment of the invention, the method further comprises the following steps:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
It should be noted that the first sound data is sound data collected by the terminal device, and such data often includes not only the user's speech but also echoes of other sounds and noise, so the first sound data needs to be filtered to obtain the required sound data.
For example, when the voip host end shares sound data, the far-end sound data and the sound of the video and PPT are mixed, rendered on the default audio rendering device and finally played, so the voip host can hear both the speech of the far-end participants and the sound of the video and PPT played on the local machine. Meanwhile, so that the other, far-end participants can hear the video and PPT sound shared by the voip host end, the finally played-back sound data needs to be captured from the default audio rendering device; but this capture also includes the far-end participants' own sound data, and if it were transmitted to the far end directly, the other participants would hear their own speech again (the so-called echo). The far-end participants' sound must therefore be filtered out by an echo canceller. In the echo canceller, the sound data captured from the default audio rendering device serves as near-end data and the sound data of the other participants serves as the far-end reference; through processing such as linear echo cancellation and nonlinear echo suppression, shared sound data containing only the video and PPT sound is separated out. The voip host end's sound data flow comprises a transmit stream and a receive stream. The transmit stream has two paths, one for the shared sound data and the other for the microphone data; the two data streams are forwarded to the mixing server for mixing and then delivered to the pstn participant end. The microphone sound data is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control.
The receive stream receives the mixed microphone data packets of the other participants, decodes them, and renders them on the default audio rendering device for playback.
The pstn participant end's sound data flow likewise comprises a transmit stream and a receive stream. The transmit stream encodes and sends the microphone sound data, which is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control. The receive stream carries the mixed microphone and shared sound data sent by the mixing server; after decoding, it is rendered on the default audio rendering device for playback.
According to the embodiment of the present invention, the filtering the first sound data according to the preset rule to obtain the filtered first sound data specifically includes:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
It should be noted that, because the first sound data is collected by the terminal device, it often includes not only the user's speech but also echoes of other sounds and noise, so it needs to be filtered to obtain the required sound data. During filtering, to remove the other noises and echoes while retaining the sound data of the preset personnel, the voiceprint characteristics of all participants are acquired first and the voiceprint characteristics of the preset personnel are determined; a voiceprint is analogous to a person's fingerprint, so a user's sound data can be identified by voiceprint. The sound data matching the preset personnel's voiceprint characteristics is then extracted from the first sound data to obtain the third sound data. The third sound data may still contain an echo of the user's own speech, that is, sound the user spoke earlier; the current sound data in the third sound data, i.e. the near-end sound data, therefore needs to be identified through a preset rule and used as the filtered first sound data to participate in mixing with the other sound data.
According to the embodiment of the present invention, identifying the current sound data in the third sound data according to the preset rule specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
It should be noted that sound filtering is usually performed with a filter, which can remove sound data of preset frequencies and amplitudes. Filter parameters, however, are often difficult to determine, and inaccurate settings are likely to leave the unwanted sound data unfiltered; the filter module and its parameters are therefore obtained from a cloud service. First, the environmental parameters of the capture environment are acquired and sent to the server. The server selects the filter module and parameters by cloud computing, that is, it obtains the corresponding filter module and parameters according to the environmental parameters. A filter generated by the cloud server adapts better to the current sound environment and filters the relevant sounds: the filter module and parameters are used as an echo cancellation filter, and this filter removes the echo data from the third sound data to obtain the current sound data. Because the filter module and parameters are selected by cloud computing, they can be adjusted dynamically to suit the current sound environment, making the filtering more accurate.
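A minimal sketch of the cloud-selected echo cancellation described above, under the assumption that the server simply maps environment parameters to a table of FIR filter taps and that the echo is estimated by convolving the far-end signal with those taps. The preset table, the environment keys and the tap values are invented for illustration.

```python
# Hypothetical cloud-side lookup table: environment -> FIR filter taps.
FILTER_PRESETS = {
    "small_room": [0.5, 0.25],
    "large_hall": [0.7, 0.5, 0.3],
}

def fetch_filter(env_params):
    """Server step: map environment parameters to filter taps
    (falls back to a pass-through estimate of zero echo)."""
    return FILTER_PRESETS.get(env_params.get("room"), [0.0])

def cancel_echo(third_sound, far_end, taps):
    """Subtract the estimated echo: the far-end signal convolved
    with the cloud-selected taps, sample by sample."""
    out = []
    for n, s in enumerate(third_sound):
        est = sum(t * far_end[n - k]
                  for k, t in enumerate(taps) if n - k >= 0)
        out.append(s - est)
    return out
```

Real echo cancellers adapt the taps continuously (e.g. NLMS); a static lookup is used here only to mirror the server-selection step in the text.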
According to the embodiment of the invention, the method further comprises the following steps:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
It should be noted that in a network conference several users often speak at the same time or within the same period, and for the other participants to hear every speaker, the sound data of each user must be acquired and then mixed. After obtaining the first sound data from multiple terminals, the server filters out noise and other sounds to obtain the sound data of the preset users, mixes the sound data of the multiple users with the shared sound data, and sends the resulting mixed sound data to each terminal, which decodes and plays it.
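The multi-terminal mixing step can be sketched as a sample-wise sum of the per-terminal first sound data with the shared second sound data, clamped to the valid range. This is a toy model of the mixing server; real mixers apply per-stream weighting, resampling and overflow handling.

```python
def mix_streams(first_streams, shared):
    """Sum the per-participant mic streams with the shared sound
    sample-wise, clamping each mixed sample to [-1.0, 1.0].
    Streams may have different lengths; missing samples count as 0."""
    n = max([len(shared)] + [len(s) for s in first_streams])
    mixed = []
    for i in range(n):
        total = sum(s[i] for s in first_streams if i < len(s))
        total += shared[i] if i < len(shared) else 0.0
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

The mixed buffer would then be encoded once and fanned out to every terminal, matching the single-mix design the patent argues for.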
The third aspect of the present invention further provides a computer-readable storage medium, which stores a program of the method for sharing sound in a network conference; when executed by a processor, the program implements the steps of the method for sharing sound in a network conference according to any one of the above.
According to the method, system and readable storage medium for sharing sound in a network conference, the voip host end sends two data streams (microphone data and shared sound data); they are forwarded to the mixing server, mixed, and then sent to the pstn participant end, which finally receives a single stream and decodes and plays it. Because the microphone data and shared sound data of the voip host end are sent directly to the mixing server, which performs the mixing in one place, the degradation that repeated mixing would inflict on the sound data is reduced, and the quality of the shared sound heard at the participant end is improved. In addition, echo cancellation is performed according to the participants' voiceprint data, so the extracted sound is more accurate; and the filter is obtained through the cloud, so noise and echo data are filtered better, improving the user experience.
The invention thus realizes a scheme for sharing sound in a conference that fuses voip and pstn: while sharing the desktop, the voip host can also share the sound of a video being played or of PPT courseware, so the other pstn participants can hear the video and courseware sound as well. The information expressed in the conference becomes more comprehensive, which can greatly improve the conference's efficiency.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, and some features may be omitted or not implemented. In addition, the couplings, direct couplings or communication connections between the components shown or discussed may pass through certain interfaces, and the indirect couplings or communication connections between devices or units may be electrical, mechanical or of other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment's solution.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may stand alone as one unit, or two or more units may be integrated into one unit; an integrated unit may be realized in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be carried out by hardware under the control of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage media include a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Alternatively, if the integrated unit of the present invention is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and containing several instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage media include a removable storage device, a ROM, a RAM, a magnetic disk, an optical disk, and other media that can store program code.
The above description covers only specific embodiments of the present invention, but the scope of the invention is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope of the invention shall fall within its scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (10)
1. A method for sharing sound in a web conference, comprising:
acquiring first sound data and second sound data;
performing sound mixing processing on the first sound data and the second sound data to obtain sound mixing sound data;
and sending the mixed sound data to a terminal.
2. The method for sharing sound in a web conference according to claim 1, further comprising:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
3. The method as claimed in claim 1, wherein the first sound data is sound data collected by a terminal device, and the second sound data is sound data generated by a terminal playing file.
4. The method for sharing sound in a network conference according to claim 2, wherein the filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically comprises:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
5. The method for sharing sound in a web conference according to claim 4, wherein the identifying the current sound data in the third sound data according to the preset rule specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
6. The method for sharing sound in a web conference according to claim 1, further comprising:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
7. A system for sharing sound in a network conference, comprising a memory and a processor, wherein the memory includes a method program for sharing sound in a network conference, and the method program for sharing sound in a network conference is executed by the processor to implement the following steps:
acquiring first sound data and second sound data;
performing sound mixing processing on the first sound data and the second sound data to obtain sound mixing sound data;
and sending the mixed sound data to a terminal.
8. The system for sharing sound in a web conference according to claim 7, further comprising:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then performing sound mixing with the second sound data.
9. The system for sharing sound in a network conference according to claim 8, wherein the filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically comprises:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program of a method for sharing sound in a network conference, which program, when executed by a processor, implements the steps of a method for sharing sound in a network conference according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011576572.6A CN112565668B (en) | 2020-12-28 | 2020-12-28 | Method for sharing sound in network conference |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011576572.6A CN112565668B (en) | 2020-12-28 | 2020-12-28 | Method for sharing sound in network conference |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112565668A true CN112565668A (en) | 2021-03-26 |
CN112565668B CN112565668B (en) | 2022-03-04 |
Family
ID=75033742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011576572.6A Active CN112565668B (en) | 2020-12-28 | 2020-12-28 | Method for sharing sound in network conference |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112565668B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113542982A (en) * | 2021-06-28 | 2021-10-22 | 瑞芯微电子股份有限公司 | Sound mixing method and storage medium |
CN118283015A (en) * | 2024-05-30 | 2024-07-02 | 江西扬声电子有限公司 | Multi-channel audio transmission method and system based on cabin Ethernet |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH03171929A (en) * | 1989-11-30 | 1991-07-25 | Nec Corp | Echo canceller device |
JPH05227110A (en) * | 1992-02-14 | 1993-09-03 | Nippon Television Network Corp | Side tone suppressing device for sng |
JPH05323987A (en) * | 1992-05-26 | 1993-12-07 | Pioneer Electron Corp | Echo device |
CN102118523A (en) * | 2009-12-30 | 2011-07-06 | 北京大唐高鸿数据网络技术有限公司 | Mixing control method for centralized teleconference |
CN104159177A (en) * | 2014-07-16 | 2014-11-19 | 浙江航天长峰科技发展有限公司 | Audio recording system and method based on screencast |
CN106603877A (en) * | 2015-10-16 | 2017-04-26 | 鸿合科技有限公司 | Collaborative conference voice collection method and apparatus |
CN110956976A (en) * | 2019-12-17 | 2020-04-03 | 苏州科达科技股份有限公司 | Echo cancellation method, device, equipment and readable storage medium |
CN111583932A (en) * | 2020-04-30 | 2020-08-25 | 厦门快商通科技股份有限公司 | Sound separation method, device and equipment based on human voice model |
Also Published As
Publication number | Publication date |
---|---|
CN112565668B (en) | 2022-03-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10325612B2 (en) | Method, device, and system for audio data processing | |
US7206404B2 (en) | Communications system and method utilizing centralized signal processing | |
US9253331B2 (en) | Call handling | |
CN112565668B (en) | Method for sharing sound in network conference | |
CN110072021B (en) | Method, apparatus and computer readable medium in audio teleconference mixing system | |
CN111863011B (en) | Audio processing method and electronic equipment | |
US20210345051A1 (en) | Centrally controlling communication at a venue | |
CN110299144A (en) | Audio mixing method, server and client | |
CN111199751B (en) | Microphone shielding method and device and electronic equipment | |
CN112688965B (en) | Conference audio sharing method and device, electronic equipment and storage medium | |
CN104580764A (en) | Ultrasound pairing signal control in teleconferencing system | |
CN111951813A (en) | Voice coding control method, device and storage medium | |
CN117079661A (en) | Sound source processing method and related device | |
JP2009118316A (en) | Voice communication device | |
CN112543202B (en) | Method, system and readable storage medium for transmitting shared sound in network conference | |
CN114979344A (en) | Echo cancellation method, device, equipment and storage medium | |
CN117118956B (en) | Audio processing method, device, electronic equipment and computer readable storage medium | |
CN112820307B (en) | Voice message processing method, device, equipment and medium | |
US11915710B2 (en) | Conference terminal and embedding method of audio watermarks | |
CN112216297B (en) | Processing method, system, medium and device for small VoIP sound of android mobile phone terminal | |
CN114530159A (en) | Multimedia resource integration scheduling method based on WebRTC technology | |
CN115914761A (en) | Multi-person wheat connecting method and device | |
CN113971956A (en) | Information processing method and device, electronic equipment and readable storage medium | |
CN115700881A (en) | Conference terminal and method for embedding voice watermark | |
CN118590460A (en) | Intercommunication method, intercommunication system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||