CN112565668A - Method, system and readable storage medium for sharing sound in network conference

Method, system and readable storage medium for sharing sound in network conference

Info

Publication number
CN112565668A
Authority
CN
China
Prior art keywords
sound data
sound
data
sharing
mixing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011576572.6A
Other languages
Chinese (zh)
Other versions
CN112565668B (en)
Inventor
顾骋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chuangxiang Space Information Technology Suzhou Co ltd
Original Assignee
Chuangxiang Space Information Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chuangxiang Space Information Technology Suzhou Co ltd filed Critical Chuangxiang Space Information Technology Suzhou Co ltd
Priority to CN202011576572.6A priority Critical patent/CN112565668B/en
Publication of CN112565668A publication Critical patent/CN112565668A/en
Application granted granted Critical
Publication of CN112565668B publication Critical patent/CN112565668B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 Mixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a method, a system and a readable storage medium for sharing sound in a network conference, wherein the method comprises: acquiring first sound data and second sound data; mixing the first sound data and the second sound data to obtain mixed sound data; and sending the mixed sound data to a terminal. The microphone data and the shared sound data of the voip host end can be sent directly to the mixing server, which performs the mixing in a single operation, thereby reducing the damage that repeated mixing algorithms cause to the sound data and improving the quality of the shared sound heard at the participant end. In addition, echo cancellation is performed according to the voiceprint data of the participants, so that the extracted sound is more accurate; and the filter is obtained through the cloud, so that noise and echo data are filtered more effectively and the user experience is improved.

Description

Method, system and readable storage medium for sharing sound in network conference
Technical Field
The present invention relates to the field of audio data processing, and more particularly, to a method, system and readable storage medium for sharing sound in a network conference.
Background
With the development of science and technology, more and more scenarios use a network to hold a web conference, in which users communicate and present through software or a platform. In a conference fusing voip and pstn, the voip host often plays videos and presents PPT slides through desktop sharing, which greatly helps the conference proceed. However, the pstn participants cannot hear the sound in those videos and PPTs, so part of the information the conference is meant to convey is lost, which is not conducive to the conference. The shared sound is collected by capturing sound data (including the sound played by videos and PPTs and the speech of the other participants) from the default audio rendering device, then removing the other participants' speech with an echo canceller so that they do not hear their own voices again, and finally obtaining only the sound data played by the videos and PPTs.
In the prior art, after the microphone and the shared sound are mixed at the voip host end, the mixed sound has to be mixed again by the mixing server before being sent to the pstn participant end. Because the mixing algorithm applies nonlinear processing such as limiting, companding and gain control, the more times the sound is mixed, the more easily it is distorted; the shared sound in particular is mainly music and is especially sensitive to distortion. The prior art therefore easily causes the shared sound heard at the pstn participant end to be distorted.
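The effect can be illustrated with a small numeric sketch (not part of the patent; it assumes float PCM in the range [-1, 1] and uses hard clipping as a stand-in for the nonlinear mixing stage):

```python
# Minimal sketch: why an extra mixing pass can degrade the audio.
import numpy as np

def mix(*streams):
    """Sum streams and hard-clip to [-1, 1] (one nonlinear mixing pass)."""
    return np.clip(np.sum(streams, axis=0), -1.0, 1.0)

t = np.linspace(0, 1, 16000, endpoint=False)
mic    = 0.8 * np.sin(2 * np.pi * 220 * t)   # host microphone
shared = 0.8 * np.sin(2 * np.pi * 440 * t)   # shared video/PPT sound
remote = 0.8 * np.sin(2 * np.pi * 330 * t)   # another participant

# Prior art: host mixes mic + shared, then the server mixes again with the remote stream.
two_pass = mix(mix(mic, shared), remote)
# Proposed scheme: all streams reach the mixing server and are mixed in a single pass.
one_pass = mix(mic, shared, remote)

# Where the intermediate sum already clipped, the two results differ,
# i.e. the extra mixing pass has altered the signal.
print("samples changed by the extra mixing pass:", int(np.sum(~np.isclose(two_pass, one_pass))))
```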
Disclosure of Invention
In order to solve at least one of the above technical problems, the present invention provides a method, a system and a readable storage medium for sharing sound in a network conference.
A first aspect of the present invention provides a method for sharing sound in a network conference, comprising the following steps:
acquiring first sound data and second sound data;
performing sound mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
In this scheme, the method further includes:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then mixing it with the second sound data.
In this scheme, the first sound data is sound data collected by the terminal device, and the second sound data is sound data generated by a file played on the terminal.
In this scheme, filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically includes:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
In this scheme, the identifying, according to a preset rule, the current sound data in the third sound data specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
In this scheme, the method further includes:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
A second aspect of the present invention provides a system for sharing sound in a network conference, including a memory and a processor, wherein the memory stores a program of the method for sharing sound in a network conference, and when the program is executed by the processor, the following steps are implemented:
acquiring first sound data and second sound data;
performing sound mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
In this scheme, the following steps are further implemented:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then mixing it with the second sound data.
In this scheme, the first sound data is sound data collected by the terminal device, and the second sound data is sound data generated by a file played on the terminal.
In this scheme, filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically includes:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
In this scheme, the identifying, according to a preset rule, the current sound data in the third sound data specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
In this scheme, the following steps are further implemented:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
A third aspect of the present invention further provides a computer-readable storage medium, which includes a program of a method for sharing sound in a network conference; when the program is executed by a processor, the steps of the method for sharing sound in a network conference described in any one of the above are implemented.
According to the method, the system and the readable storage medium for sharing sound in a network conference provided by the invention, the voip host end sends two data streams (microphone data and shared sound data), the two streams are forwarded to the mixing server for mixing and then sent to the pstn participant end, and the pstn participant end finally receives a single stream, which it decodes and plays. The invention sends the microphone data and the shared sound data of the voip host end directly to the mixing server, which performs the mixing in a single operation, thereby reducing the damage that repeated mixing algorithms cause to the sound data and improving the quality of the shared sound heard at the participant end. In addition, the invention performs echo cancellation according to the voiceprint data of the participants, so that the extracted sound is more accurate; and the filter is obtained through the cloud, so that noise and echo data are filtered more effectively and the user experience is improved.
Drawings
FIG. 1 is a flow chart illustrating a method of sharing sound for a web conference of the present invention;
FIG. 2 shows a block diagram of a system for sharing sound in a network conference according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
In the present invention, voip (Voice over IP) refers to voice carried over an IP network, and pstn (Public Switched Telephone Network) refers to the ordinary public telephone network in everyday use.
Fig. 1 is a flow chart illustrating a method for sharing sound in a web conference according to the present invention.
As shown in fig. 1, the present invention discloses a method for sharing sound in a network conference, which comprises:
acquiring first sound data and second sound data;
performing sound mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
It should be noted that, in the present invention, the first sound data is sound data collected by the terminal device, and the second sound data is sound data generated by a file played on the terminal. The terminal can collect the first sound data with a microphone or other sound collection equipment, and the second sound data can be the sound generated by locally playing a PPT (PowerPoint), a video or a similar file. The mixing server first obtains the first sound data and the second sound data; the two kinds of sound data may be sent as two paths or combined into one path through time division multiplexing, as long as the mixing server can process them according to a preset rule. After receiving the two kinds of sound data, the mixing server performs mixing, that is, it combines them into one signal. After mixing, the mixed sound data is sent to the terminal, so that through a playback device such as a loudspeaker the user can hear both the speech of other users and the sound of the video, PPT or other file played by the preset user.
It should also be noted that, in the present invention, the first sound data and the second sound data may be sent to the server as two separate paths, or as a single path using time division multiplexing, as long as the first sound data and the second sound data can be separated again by a preset protocol or algorithm.
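One way to realize the single-path variant (an illustrative sketch; the Frame tag, field names and framing below are assumptions, not the patent's actual protocol) is to tag each frame with its stream before sending and to separate the streams again at the mixing server:

```python
# Minimal sketch of multiplexing the two sound streams onto one path.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Frame:
    stream_id: int   # 1 = first sound data (microphone), 2 = second sound data (shared)
    payload: bytes   # one encoded audio frame

def multiplex(mic_frames: List[bytes], shared_frames: List[bytes]) -> List[Frame]:
    """Host side: combine the two streams into one path by tagging each frame."""
    path: List[Frame] = []
    for mic, shared in zip(mic_frames, shared_frames):
        path.append(Frame(1, mic))
        path.append(Frame(2, shared))
    return path

def demultiplex(path: List[Frame]) -> Tuple[List[bytes], List[bytes]]:
    """Mixing-server side: recover the two streams from the tagged single path."""
    mic = [f.payload for f in path if f.stream_id == 1]
    shared = [f.payload for f in path if f.stream_id == 2]
    return mic, shared
```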
According to the embodiment of the invention, the method further comprises the following steps:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then mixing it with the second sound data.
It should be noted that the first sound data is sound data collected by the terminal device and therefore often contains not only the user's speech but also echoes and noise from other sounds, so the first sound data needs to be filtered to obtain the required sound data.
For example, when the voip host end shares sound data, the far-end sound data and the sound of the video and PPT are mixed, rendered on the default audio rendering device and played, so that the voip host end can hear both the speech of the far-end participants and the sound of the video and PPT played locally. Meanwhile, in order for the far-end participants to hear the video and PPT sound shared by the voip host end, the sound data that is finally played back must be captured from the default audio rendering device; however, this capture also contains the far-end participants' sound data, and if it were transmitted directly to the far end, the other participants would hear their own speech again (the so-called echo), so the far-end participants' sound must be filtered out by an echo canceller. In the echo canceller, the sound data captured from the default audio rendering device is used as near-end data and the sound data of the other participants is used as far-end reference data; through processing such as linear echo cancellation and nonlinear echo suppression, shared sound data containing only the video and PPT sound is separated out. The sound data flow at the voip host end includes a transmit stream and a receive stream. The transmit stream has two paths, one carrying the shared sound data and the other carrying the microphone data; the two streams are forwarded to the mixing server for mixing and then delivered to the pstn participant end. The microphone sound data is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control. The receive stream receives the mixed microphone data packets of the other participants, decodes them and renders them to the default audio rendering device for playback.
The sound data flow at the pstn participant end likewise contains a transmit stream and a receive stream. The transmit stream encodes and sends the microphone sound data, which is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control. The receive stream is the mixed data of the microphones and the shared sound sent by the mixing server; after decoding, it is rendered on the default audio rendering device for playback.
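A minimal sketch of the linear echo-cancellation stage mentioned above is given below; the patent does not specify an algorithm, so a normalized LMS adaptive filter is used here as a stand-in, and delay estimation and the nonlinear echo-suppression stage are omitted:

```python
# Minimal sketch: near_end is the signal captured from the default audio rendering
# device, far_end is the other participants' sound used as the reference; the
# residual approximates the shared video/PPT sound.
import numpy as np

def nlms_echo_cancel(near_end: np.ndarray, far_end: np.ndarray,
                     taps: int = 128, mu: float = 0.5, eps: float = 1e-6) -> np.ndarray:
    w = np.zeros(taps)                     # adaptive filter estimating the echo path
    buf = np.zeros(taps)                   # most recent far-end samples
    out = np.zeros_like(near_end)
    for n in range(len(near_end)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_estimate = w @ buf
        err = near_end[n] - echo_estimate  # residual = near end minus estimated echo
        out[n] = err
        w += mu * err * buf / (buf @ buf + eps)   # NLMS weight update
    return out
```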
According to the embodiment of the present invention, the filtering the first sound data according to the preset rule to obtain the filtered first sound data specifically includes:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
It should be noted that, because the first sound data is sound data collected by the terminal device, it often contains not only the user's speech but also echoes and noise from other sounds, so the first sound data needs to be filtered to obtain the required sound data. During filtering, if the other noises and echoes are to be removed while the sound data of the preset personnel is retained, the voiceprint characteristics of all participants must first be acquired and the voiceprint characteristics of the preset personnel determined; a voiceprint is similar to a fingerprint, so the sound data of a user can be identified by his or her voiceprint. The sound data matching the voiceprint characteristics of the preset personnel is then extracted from the first sound data to obtain the third sound data. The third sound data may still contain an echo of the user's own speech, that is, words the user spoke earlier, so the current sound data in the third sound data, namely the near-end sound data, must be identified according to a preset rule; this current sound data is then used as the filtered first sound data and participates in mixing with the other sound data.
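The sketch below illustrates the idea with a deliberately crude stand-in for a voiceprint: each frame of the first sound data gets a normalized spectral embedding that is compared against the enrolled voiceprint of the preset person, and only matching frames are kept as the third sound data. A real system would use a trained speaker-recognition model; the frame size, threshold and embedding here are assumptions:

```python
# Minimal sketch of voiceprint-based extraction of the preset person's speech.
import numpy as np

FRAME = 512

def frame_embedding(frame: np.ndarray) -> np.ndarray:
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    logspec = np.log1p(spec)
    return logspec / (np.linalg.norm(logspec) + 1e-9)

def enroll(voice_samples: np.ndarray) -> np.ndarray:
    """Build the preset person's voiceprint from enrollment audio."""
    frames = voice_samples[:len(voice_samples) // FRAME * FRAME].reshape(-1, FRAME)
    return np.mean([frame_embedding(f) for f in frames], axis=0)

def extract_by_voiceprint(first_sound: np.ndarray, voiceprint: np.ndarray,
                          threshold: float = 0.6) -> np.ndarray:
    """Keep only the frames whose embedding matches the enrolled voiceprint."""
    third = np.zeros_like(first_sound)
    for i in range(0, len(first_sound) - FRAME, FRAME):
        emb = frame_embedding(first_sound[i:i + FRAME])
        similarity = float(emb @ voiceprint) / (np.linalg.norm(voiceprint) + 1e-9)
        if similarity >= threshold:          # frame attributed to the preset person
            third[i:i + FRAME] = first_sound[i:i + FRAME]
    return third
```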
According to the embodiment of the present invention, the identifying the current sound data in the third sound data according to the preset rule specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
It should be noted that sound filtering is usually performed with a filter, which can remove sound data of preset frequencies and amplitudes; however, the filter parameters are often difficult to determine, and if they are set inaccurately the required sound data may not be obtained, so the filter module and filter parameters are obtained through a cloud service. First, the environment parameters of the collected sound are acquired and sent to the server. The server selects the filter module and parameters through cloud computing, that is, it obtains the corresponding filter module and parameters according to the environment parameters. A filter generated by the cloud server adapts better to the current sound environment and filters out the relevant sounds; the filter module and parameters are used as an echo cancellation filter, and this echo cancellation filter removes the echo data from the third sound data to obtain the current sound data. Because the filter module and parameters are selected through cloud computing, they can be adjusted dynamically to suit the current sound environment, making the filtering more accurate.
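A minimal sketch of this client-server exchange follows; the environment keys, profile fields and parameter values are illustrative assumptions rather than anything defined in the patent:

```python
# Minimal sketch: the terminal reports its environment parameters, the server
# (notionally backed by cloud computing) picks a filter module and parameters,
# and the terminal uses them as its echo-cancellation filter.
from dataclasses import dataclass

@dataclass
class FilterProfile:
    module: str          # which canceller implementation to load
    taps: int            # adaptive filter length
    step_size: float     # adaptation rate
    suppression_db: float

# Hypothetical server-side lookup keyed by coarse environment descriptions.
PROFILES = {
    ("small_room", "headset"): FilterProfile("nlms",    128, 0.5, -20.0),
    ("small_room", "speaker"): FilterProfile("nlms",    512, 0.3, -30.0),
    ("large_room", "speaker"): FilterProfile("subband", 1024, 0.2, -40.0),
}

def select_filter(environment: dict) -> FilterProfile:
    """Server side: map the reported environment parameters to a filter profile."""
    key = (environment.get("room", "small_room"), environment.get("playback", "speaker"))
    return PROFILES.get(key, FilterProfile("nlms", 256, 0.4, -25.0))

# Terminal side: report the environment and build the echo canceller from the reply.
profile = select_filter({"room": "large_room", "playback": "speaker"})
print(f"using {profile.module} canceller with {profile.taps} taps")
```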
According to the embodiment of the invention, the method further comprises the following steps:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
It should be noted that in a network conference several users often speak at the same time or within the same period, and in order for the other participants to hear every speaker, the sound data of each user must be acquired and then mixed. After obtaining the first sound data of the plurality of terminals, the server filters out noise and other sounds to obtain the sound data of the preset users, mixes the sound data of the plurality of users with the shared sound data, and then sends the resulting mixed sound data to each terminal; each terminal decodes and plays the mixed sound data it receives.
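A minimal sketch of this server-side mixing, assuming float PCM frames of equal length and a single limiting stage at the server:

```python
# Minimal sketch: mix the filtered first sound data of every terminal with the
# shared second sound data in one pass, then send the result to each terminal.
import numpy as np
from typing import Dict

def mix_conference(first_sounds: Dict[str, np.ndarray], second_sound: np.ndarray) -> np.ndarray:
    mixed = np.sum(list(first_sounds.values()), axis=0) + second_sound
    return np.clip(mixed, -1.0, 1.0)          # single limiting stage at the server

def distribute(first_sounds: Dict[str, np.ndarray], second_sound: np.ndarray) -> Dict[str, np.ndarray]:
    mixed = mix_conference(first_sounds, second_sound)
    # Every terminal receives the same one-path mix; a mix-minus variant could
    # instead exclude each terminal's own microphone before sending.
    return {terminal: mixed for terminal in first_sounds}
```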
Fig. 2 shows a block diagram of the system for sharing sound in a network conference according to the present invention.
As shown in fig. 2, the present invention discloses a system 2 for sharing sound in a network conference, which includes a memory 21 and a processor 22, wherein the memory 21 stores a program of the method for sharing sound in a network conference, and when the program is executed by the processor 22, the following steps are implemented:
acquiring first sound data and second sound data;
performing sound mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
It should be noted that, in the present invention, the first sound data is sound data collected by the terminal device, and the second sound data is sound data generated by a file played on the terminal. The terminal can collect the first sound data with a microphone or other sound collection equipment, and the second sound data can be the sound generated by locally playing a PPT (PowerPoint), a video or a similar file. The mixing server first obtains the first sound data and the second sound data; the two kinds of sound data may be sent as two paths or combined into one path through time division multiplexing, as long as the mixing server can process them according to a preset rule. After receiving the two kinds of sound data, the mixing server performs mixing, that is, it combines them into one signal. After mixing, the mixed sound data is sent to the terminal, so that through a playback device such as a loudspeaker the user can hear both the speech of other users and the sound of the video, PPT or other file played by the preset user.
It should also be noted that, in the present invention, the first sound data and the second sound data may be sent to the server as two separate paths, or as a single path using time division multiplexing, as long as the first sound data and the second sound data can be separated again by a preset protocol or algorithm.
According to the embodiment of the invention, the method further comprises the following steps:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then mixing it with the second sound data.
It should be noted that the first sound data is sound data collected by the terminal device and therefore often contains not only the user's speech but also echoes and noise from other sounds, so the first sound data needs to be filtered to obtain the required sound data.
For example, when the voip host end shares sound data, the far-end sound data and the sound of the video and PPT are mixed, rendered on the default audio rendering device and played, so that the voip host end can hear both the speech of the far-end participants and the sound of the video and PPT played locally. Meanwhile, in order for the far-end participants to hear the video and PPT sound shared by the voip host end, the sound data that is finally played back must be captured from the default audio rendering device; however, this capture also contains the far-end participants' sound data, and if it were transmitted directly to the far end, the other participants would hear their own speech again (the so-called echo), so the far-end participants' sound must be filtered out by an echo canceller. In the echo canceller, the sound data captured from the default audio rendering device is used as near-end data and the sound data of the other participants is used as far-end reference data; through processing such as linear echo cancellation and nonlinear echo suppression, shared sound data containing only the video and PPT sound is separated out. The sound data flow at the voip host end includes a transmit stream and a receive stream. The transmit stream has two paths, one carrying the shared sound data and the other carrying the microphone data; the two streams are forwarded to the mixing server for mixing and then delivered to the pstn participant end. The microphone sound data is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control. The receive stream receives the mixed microphone data packets of the other participants, decodes them and renders them to the default audio rendering device for playback.
The sound data flow at the pstn participant end likewise contains a transmit stream and a receive stream. The transmit stream encodes and sends the microphone sound data, which is acquired from the default audio capture device and passes through audio processing modules such as echo cancellation, noise suppression and automatic gain control. The receive stream is the mixed data of the microphones and the shared sound sent by the mixing server; after decoding, it is rendered on the default audio rendering device for playback.
According to the embodiment of the present invention, the filtering the first sound data according to the preset rule to obtain the filtered first sound data specifically includes:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
It should be noted that, because the first sound data is sound data collected by the terminal device, it often contains not only the user's speech but also echoes and noise from other sounds, so the first sound data needs to be filtered to obtain the required sound data. During filtering, if the other noises and echoes are to be removed while the sound data of the preset personnel is retained, the voiceprint characteristics of all participants must first be acquired and the voiceprint characteristics of the preset personnel determined; a voiceprint is similar to a fingerprint, so the sound data of a user can be identified by his or her voiceprint. The sound data matching the voiceprint characteristics of the preset personnel is then extracted from the first sound data to obtain the third sound data. The third sound data may still contain an echo of the user's own speech, that is, words the user spoke earlier, so the current sound data in the third sound data, namely the near-end sound data, must be identified according to a preset rule; this current sound data is then used as the filtered first sound data and participates in mixing with the other sound data.
According to the embodiment of the present invention, the identifying the current sound data in the third sound data according to the preset rule specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
It should be noted that sound filtering is usually performed with a filter, which can remove sound data of preset frequencies and amplitudes; however, the filter parameters are often difficult to determine, and if they are set inaccurately the required sound data may not be obtained, so the filter module and filter parameters are obtained through a cloud service. First, the environment parameters of the collected sound are acquired and sent to the server. The server selects the filter module and parameters through cloud computing, that is, it obtains the corresponding filter module and parameters according to the environment parameters. A filter generated by the cloud server adapts better to the current sound environment and filters out the relevant sounds; the filter module and parameters are used as an echo cancellation filter, and this echo cancellation filter removes the echo data from the third sound data to obtain the current sound data. Because the filter module and parameters are selected through cloud computing, they can be adjusted dynamically to suit the current sound environment, making the filtering more accurate.
According to the embodiment of the invention, the method further comprises the following steps:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
It should be noted that in a network conference several users often speak at the same time or within the same period, and in order for the other participants to hear every speaker, the sound data of each user must be acquired and then mixed. After obtaining the first sound data of the plurality of terminals, the server filters out noise and other sounds to obtain the sound data of the preset users, mixes the sound data of the plurality of users with the shared sound data, and then sends the resulting mixed sound data to each terminal; each terminal decodes and plays the mixed sound data it receives.
A third aspect of the present invention further provides a computer-readable storage medium, which includes a program of a method for sharing sound in a network conference; when the program is executed by a processor, the steps of the method for sharing sound in a network conference described in any one of the above are implemented.
According to the method, the system and the readable storage medium for sharing sound in a network conference provided by the invention, the voip host end sends two data streams (microphone data and shared sound data), the two streams are forwarded to the mixing server for mixing and then sent to the pstn participant end, and the pstn participant end finally receives a single stream, which it decodes and plays. The invention sends the microphone data and the shared sound data of the voip host end directly to the mixing server, which performs the mixing in a single operation, thereby reducing the damage that repeated mixing algorithms cause to the sound data and improving the quality of the shared sound heard at the participant end. In addition, the invention performs echo cancellation according to the voiceprint data of the participants, so that the extracted sound is more accurate; and the filter is obtained through the cloud, so that noise and echo data are filtered more effectively and the user experience is improved.
The invention sends the microphone data and the shared sound data of the voip host end directly to the mixing server, which performs the mixing in a single operation, thereby reducing the damage that repeated mixing algorithms cause to the sound data and improving the quality of the shared sound heard at the participant end. The method realizes a scheme for sharing sound in a conference fusing voip and pstn: while a voip host shares the desktop, the sound of the video and PPT courseware being played is shared as well, so the pstn participants can hear the video and PPT courseware played by the voip host. The information expressed in the conference therefore becomes more complete, which can greatly improve the efficiency of the conference.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method for sharing sound in a web conference, comprising:
acquiring first sound data and second sound data;
performing sound mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
2. The method for sharing sound in a web conference according to claim 1, further comprising:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then mixing it with the second sound data.
3. The method as claimed in claim 1, wherein the first sound data is sound data collected by a terminal device, and the second sound data is sound data generated by a file played on the terminal.
4. The method for sharing sound in a network conference according to claim 2, wherein the filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically comprises:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
5. The method for sharing sound in a web conference according to claim 4, wherein the identifying the current sound data in the third sound data according to the preset rule specifically includes:
sending the environment parameters to a server;
the server acquires a corresponding filter module and parameters according to the environment parameters;
using the filter module and the parameters as an echo cancellation filter;
and filtering echo data in the third sound data by using the echo cancellation filter to obtain the current sound data.
6. The method for sharing sound in a web conference according to claim 1, further comprising:
acquiring first sound data of a plurality of terminals;
and mixing the plurality of first sound data and the second sound data, and sending the obtained sound data to each terminal.
7. A system for sharing sound in a network conference, comprising a memory and a processor, wherein the memory stores a program of a method for sharing sound in a network conference, and when the program is executed by the processor, the following steps are implemented:
acquiring first sound data and second sound data;
performing sound mixing processing on the first sound data and the second sound data to obtain mixed sound data;
and sending the mixed sound data to a terminal.
8. The system for sharing sound in a web conference according to claim 7, further comprising:
performing voice recognition on the first sound data;
filtering the first sound data according to a preset rule to obtain filtered first sound data;
and performing gain processing on the filtered first sound data, and then mixing it with the second sound data.
9. The system for sharing sound in a network conference according to claim 8, wherein the filtering the first sound data according to a preset rule to obtain the filtered first sound data specifically comprises:
acquiring voiceprint characteristics of all participants, and determining the voiceprint characteristics of preset personnel;
extracting sound data of the voiceprint characteristics in the first sound data according to the voiceprint characteristics of the preset personnel to obtain third sound data;
and identifying the current sound data in the third sound data according to a preset rule, and taking the current sound data as the filtered first sound data.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a program of a method for sharing sound in a web conference, which program, when executed by a processor, carries out the steps of the method for sharing sound in a web conference according to any one of claims 1 to 6.
CN202011576572.6A 2020-12-28 2020-12-28 Method for sharing sound in network conference Active CN112565668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011576572.6A CN112565668B (en) 2020-12-28 2020-12-28 Method for sharing sound in network conference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011576572.6A CN112565668B (en) 2020-12-28 2020-12-28 Method for sharing sound in network conference

Publications (2)

Publication Number Publication Date
CN112565668A true CN112565668A (en) 2021-03-26
CN112565668B CN112565668B (en) 2022-03-04

Family

ID=75033742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011576572.6A Active CN112565668B (en) 2020-12-28 2020-12-28 Method for sharing sound in network conference

Country Status (1)

Country Link
CN (1) CN112565668B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542982A (en) * 2021-06-28 2021-10-22 瑞芯微电子股份有限公司 Sound mixing method and storage medium
CN118283015A (en) * 2024-05-30 2024-07-02 江西扬声电子有限公司 Multi-channel audio transmission method and system based on cabin Ethernet

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03171929A (en) * 1989-11-30 1991-07-25 Nec Corp Echo canceller device
JPH05227110A (en) * 1992-02-14 1993-09-03 Nippon Television Network Corp Side tone suppressing device for sng
JPH05323987A (en) * 1992-05-26 1993-12-07 Pioneer Electron Corp Echo device
CN102118523A (en) * 2009-12-30 2011-07-06 北京大唐高鸿数据网络技术有限公司 Mixing control method for centralized teleconference
CN104159177A (en) * 2014-07-16 2014-11-19 浙江航天长峰科技发展有限公司 Audio recording system and method based on screencast
CN106603877A (en) * 2015-10-16 2017-04-26 鸿合科技有限公司 Collaborative conference voice collection method and apparatus
CN110956976A (en) * 2019-12-17 2020-04-03 苏州科达科技股份有限公司 Echo cancellation method, device, equipment and readable storage medium
CN111583932A (en) * 2020-04-30 2020-08-25 厦门快商通科技股份有限公司 Sound separation method, device and equipment based on human voice model


Also Published As

Publication number Publication date
CN112565668B (en) 2022-03-04

Similar Documents

Publication Publication Date Title
US10325612B2 (en) Method, device, and system for audio data processing
US7206404B2 (en) Communications system and method utilizing centralized signal processing
US9253331B2 (en) Call handling
CN112565668B (en) Method for sharing sound in network conference
CN110072021B (en) Method, apparatus and computer readable medium in audio teleconference mixing system
CN111863011B (en) Audio processing method and electronic equipment
US20210345051A1 (en) Centrally controlling communication at a venue
CN110299144A (en) Audio mixing method, server and client
CN111199751B (en) Microphone shielding method and device and electronic equipment
CN112688965B (en) Conference audio sharing method and device, electronic equipment and storage medium
CN104580764A (en) Ultrasound pairing signal control in teleconferencing system
CN111951813A (en) Voice coding control method, device and storage medium
CN117079661A (en) Sound source processing method and related device
JP2009118316A (en) Voice communication device
CN112543202B (en) Method, system and readable storage medium for transmitting shared sound in network conference
CN114979344A (en) Echo cancellation method, device, equipment and storage medium
CN117118956B (en) Audio processing method, device, electronic equipment and computer readable storage medium
CN112820307B (en) Voice message processing method, device, equipment and medium
US11915710B2 (en) Conference terminal and embedding method of audio watermarks
CN112216297B (en) Processing method, system, medium and device for small VoIP sound of android mobile phone terminal
CN114530159A (en) Multimedia resource integration scheduling method based on WebRTC technology
CN115914761A (en) Multi-person wheat connecting method and device
CN113971956A (en) Information processing method and device, electronic equipment and readable storage medium
CN115700881A (en) Conference terminal and method for embedding voice watermark
CN118590460A (en) Intercommunication method, intercommunication system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant