WO2023125537A1 - Sound signal processing method and apparatus, and device and storage medium

Info

Publication number
WO2023125537A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
signal
interference source
sound signal
reference signal
Prior art date
Application number
PCT/CN2022/142338
Inventor
张磊
陈健
刘智辉
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2023125537A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating

Definitions

  • the present application relates to the technical field of audio processing, and in particular to an audio signal processing method, device, equipment and storage medium.
  • the sound pickup device can be turned off so that the sound pickup device no longer picks up the sound signal in the area where the participant is located, thereby avoiding interference to the speaker's voice.
  • the sound signal will inevitably be picked up by multiple adjacent sound pickup devices at the same time, which will cause interference to the voice of the speaker and greatly affect the sound quality of the conference.
  • the present application provides a sound signal processing method, device, equipment and storage medium, which can effectively improve sound quality.
  • the technical solution is as follows:
  • a sound signal processing method comprising:
  • the target sound signal is enhanced.
  • the interference source refers to a sound source that is considered to cause interference among multiple sound sources existing in the sound pickup space, for example, a participant having a private conversation in a conference.
  • the location of the interference source is determined in different ways depending on the sound pickup device. For example, when multiple microphones are deployed, the location of the interference source is determined based on the number of the microphone corresponding to the interference source; when a microphone array is deployed, the location of the interference source is determined based on the angle of the interference source relative to the microphone array.
  • the target sound signal refers to: among the multiple sound sources existing in the sound pickup space, the sound signal corresponding to the focused sound source, for example, the sound signal corresponding to the speaker in the conference.
  • enhancing the target sound signal refers to suppressing the reference signal in the sound signal, for example, by reducing the proportion of the part of the sound signal that corresponds to the reference signal, thereby increasing the proportion of the target sound signal in the sound signal.
  • the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance the target sound signal.
  • the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
  • the determining the position of the interference source in the sound pickup space includes:
  • the location selection instruction is received, and the location corresponding to the location selection instruction is determined as the location of the interference source in the sound pickup space.
  • the location selection instruction is triggered based on a selection operation of the location of the interference source in the control device.
  • the control device is used to select the location of the interference source; for example, the control device is integrated on a microphone, or the control device may be a conference touch panel.
  • the position selection instruction is triggered by an image acquisition device when a first body behavior is detected in the collected image; the image acquisition device is used to perform image acquisition for the sound pickup space, and the first body behavior is used to indicate muting of the position where it occurs.
  • the first body behavior indicates mute at its position, for example, the participant puts the index finger vertically close to the lip.
  • the position of the interference source can be obtained directly from the position selection instruction, the amount of data involved in the calculation process is reduced, and the efficiency of sound signal processing is improved.
  • the determining the position of the interference source in the sound pickup space includes:
  • the image acquisition device is used for image acquisition for the sound pickup space
  • the position of the interference source is determined based on the real-time image, which ensures the accuracy of the position of the interference source and further improves the sound quality.
  • determining the position of the first body action in the sound pickup space as the position of the interference source includes:
  • the method also includes:
  • determining a position of the second body action in the sound pickup space as the position of the target, the second body action being used to indicate that the target sound signal is to be enhanced.
  • the second body behavior is used to instruct to enhance the target sound signal, for example, the participant puts the index finger horizontally close to the lip to indicate that the participant needs to speak.
  • the position corresponding to the target sound signal is determined based on the second body behavior, so that the target sound signal can be enhanced in a targeted manner, thereby improving the sound quality.
  • the method also includes:
  • the determining the reference signal from the sound signal based on the location of the interference source includes:
  • a reference signal is re-determined from the sound signal.
  • the interference source can be locked after it is determined, so that its location can be updated based on real-time position changes, and the accuracy of the location is ensured by capturing changes of the location in time. Further, this ensures that, in a changing actual conference scene, the sound signal can always be processed against the interference source to ensure the sound quality.
  • the sound pickup device includes multiple microphones
  • the determining the reference signal from the sound signal based on the location of the interference source includes:
  • the sound signal originating from the microphone corresponding to the location of the interference source is determined as a reference signal.
  • a representative reference signal for the interference source can be determined based on the microphone corresponding to the interference source, so that the sound of the interference source can be better filtered out based on the reference signal, which effectively improves the sound quality.
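As a minimal sketch of the microphone-based choice of reference signal, assume the picked-up signals are held in a dictionary keyed by microphone number. The names `select_reference`, `channels` and `interferer_mic` are hypothetical and do not come from the application.

```python
def select_reference(channels, interferer_mic):
    """Split multi-microphone input into (reference, remaining channels).

    channels: dict mapping a microphone number to its list of samples.
    interferer_mic: number of the microphone at the interference source.
    """
    if interferer_mic not in channels:
        raise KeyError(f"unknown microphone number: {interferer_mic}")
    # The channel of the selected microphone becomes the reference signal.
    reference = channels[interferer_mic]
    remaining = {mic: sig for mic, sig in channels.items() if mic != interferer_mic}
    return reference, remaining


# Illustrative data only: microphone 2 is marked as the interference source.
channels = {1: [0.1, 0.2], 2: [0.9, -0.8], 3: [0.0, 0.1]}
reference, remaining = select_reference(channels, interferer_mic=2)
```

The remaining channels are the candidates from which the interference would later be filtered out.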
  • the multiple microphones have a positioning function.
  • multiple microphones can be placed freely as needed, which greatly reduces the scene restrictions during equipment deployment. While improving the flexibility of equipment deployment in the sound processing system, the real-time positioning of the microphones enables accurate positioning of interference sources, so that the sound of the interference source can be filtered out from the sound signal more accurately, and the sound quality can be effectively guaranteed.
  • the sound pickup device is a microphone array
  • the determining the reference signal from the sound signal based on the location of the interference source includes:
  • a reference signal is determined from the sound signals picked up by the microphone array.
  • the angle information of the position of the interference source refers to the angle of the interference source relative to the microphone array.
  • the beam angle range refers to the angle range covered by the beam formed by the microphone array. Based on the specified beam angle range, it is possible to determine the sound signal within the pickup range at the specified angle to the microphone array.
  • the method provided by the embodiment of the present application can adapt to the spatial arrangement characteristics of the microphone array in the scenario where the microphone array is used to pick up sound, and use the angle information of the interference source to obtain, in a targeted manner, the sound signal within the specified angle range corresponding to the interference source. This ensures the representativeness of the reference signal to the interference source, improves the accuracy of sound signal processing for the interference source, and effectively improves the sound quality.
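The angle-based selection can be sketched as follows, assuming (hypothetically) that the microphone array already outputs one beamformed signal per fixed beam angle range. The beam layout and the name `beam_for_angle` are illustrative assumptions; the application does not specify them.

```python
def beam_for_angle(beam_ranges, source_angle):
    """Return the index of the beam whose angle range covers source_angle.

    beam_ranges: list of (start_deg, end_deg) pairs; start is inclusive,
    end is exclusive. Returns None when no beam covers the angle.
    """
    for index, (start, end) in enumerate(beam_ranges):
        if start <= source_angle < end:
            return index
    return None


# Hypothetical layout: six 30-degree beams covering 0 to 180 degrees.
beam_ranges = [(start, start + 30) for start in range(0, 180, 30)]
# An interference source at 75 degrees relative to the array.
reference_beam = beam_for_angle(beam_ranges, source_angle=75)
```

The signal of the returned beam would then serve as the reference signal for the interference source.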
  • the method further includes:
  • the enhancing the target sound signal based on the reference signal includes:
  • based on the reference signal, determining a first sound signal from the sound signals in the sound pickup space, where the signal energy of the first sound signal is less than the signal energy of the reference signal, and the correlation between the first sound signal and the reference signal is greater than a correlation threshold;
  • the target sound signal in the first sound signal is enhanced.
  • the magnitude of the signal energy can represent the strength of the human voice in the sound signal to a certain extent.
  • the correlation between signals can reflect the degree of mutual influence between signals.
  • with the above technical solution, it is possible to determine, from the multi-channel sound signals, the first sound signal that is greatly affected by the interference source, and then filter out the sound of the interference source in the first sound signal in a targeted manner; improving the accuracy of filtering effectively improves the sound quality.
  • the above-mentioned technical solution can ensure the privacy of the conversation of the participants in the conference scene on the basis of improving the sound quality, effectively improving the user experience.
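The two conditions on the first sound signal (lower energy than the reference signal, correlation above a threshold) can be sketched as below. This is an illustration under assumptions: energy is taken as the sum of squared samples, correlation as the zero-lag normalized correlation, and the threshold value 0.5 and all function names are hypothetical.

```python
import math


def energy(signal):
    """Signal energy as the sum of squared samples."""
    return sum(x * x for x in signal)


def correlation(a, b):
    """Zero-lag normalized correlation of two equal-length signals."""
    norm = math.sqrt(energy(a) * energy(b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm > 0 else 0.0


def find_affected_channels(channels, reference, threshold=0.5):
    """Return the keys of channels that are weaker than the reference
    but correlated with it above the threshold."""
    ref_energy = energy(reference)
    return [
        mic
        for mic, sig in channels.items()
        if energy(sig) < ref_energy and correlation(sig, reference) > threshold
    ]


reference = [1.0, -1.0, 1.0, -1.0]
channels = {
    "A": [0.5, -0.5, 0.5, -0.5],   # attenuated copy of the interferer
    "B": [1.0, 1.0, 1.0, 1.0],     # same energy, uncorrelated
    "C": [0.1, 0.1, -0.1, -0.1],   # weak and uncorrelated
}
affected = find_affected_channels(channels, reference)
```

Only channel "A" satisfies both conditions here, so only that channel would be passed to the filtering step.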
  • the enhancing the target sound signal in the first sound signal based on the reference signal includes:
  • the first sound signal is filtered based on the reference signal, so that the influence of the sound of the interference source on the first sound signal can be reduced in a targeted manner.
  • the filter includes a first filter and a second filter
  • Filtering out the part related to the reference signal in the first sound signal through the filter, so as to enhance the target sound signal in the first sound signal, and outputting the filtering result includes:
  • the parameters of the first filter are determined based on the parameters of the second filter, and the parameters of the second filter are determined based on the difference between the results of multiple filtering operations;
  • the adaptive filter can adjust the parameters of the filter through an adaptive algorithm during the filtering process, so as to obtain a better filtering effect.
  • the method also includes:
  • the speed of parameter convergence of the adaptive filter can be effectively improved, thereby improving the efficiency of filtering.
  • the method also includes:
  • the filtered first sound signal is clipped.
  • the quality of the sound signal can be guaranteed.
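One possible reading of the clipping step is hard amplitude limiting, which keeps every filtered sample inside the representable range. The sketch below assumes samples normalized to [-1.0, 1.0]; the limit value and the function name `clip_signal` are assumptions, not details given in the application.

```python
def clip_signal(samples, limit=1.0):
    """Limit every sample to the range [-limit, limit]."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [max(-limit, min(limit, x)) for x in samples]


# Samples that overshoot the range are pulled back to the limit.
clipped = clip_signal([0.5, 1.5, -2.0])
```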
  • a second aspect provides an audio signal processing device, which includes a plurality of functional modules for executing corresponding steps in the audio signal processing method provided in the first aspect.
  • the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance the target sound signal.
  • the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
  • in a third aspect, a sound signal processing device is provided, which includes a processor and a memory, where the memory is used to store at least one piece of program code, and the at least one piece of program code is loaded by the processor to execute the above-mentioned sound signal processing method.
  • a computer-readable storage medium is provided, and the computer-readable storage medium is used to store at least one piece of program code, and the at least one piece of program code is used to execute the above-mentioned sound signal processing method.
  • a computer program product is provided.
  • the sound signal processing device is made to execute the above sound signal processing method.
  • FIG. 1 is a schematic structural diagram of a sound processing system provided by an embodiment of the present application
  • Fig. 2 is a schematic diagram of deployment of a sound processing system provided by an embodiment of the present application
  • Fig. 3 is a schematic diagram of deployment of a sound processing system provided by an embodiment of the present application.
  • Fig. 4 is a schematic diagram of deployment of a sound processing system provided by an embodiment of the present application.
  • Fig. 5 is a schematic diagram of deployment of a sound processing system provided by an embodiment of the present application.
  • FIG. 6 is a flow chart of a sound signal processing method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an adaptive filter provided by an embodiment of the present application.
  • FIG. 8 is a flow chart of a sound signal processing method provided in an embodiment of the present application.
  • FIG. 9 is a flow chart of a sound signal processing method provided by an embodiment of the present application.
  • FIG. 10 is a flow chart of a sound signal processing method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a distributed microphone positioning process provided by an embodiment of the present application.
  • Fig. 12 is a schematic structural diagram of an audio signal processing device provided in an embodiment of the present application.
  • Fig. 13 is a schematic diagram of a hardware structure of an audio signal processing device provided by an embodiment of the present application.
  • Root mean square (root mean square, RMS): the root mean square of a signal is obtained by summing the squares of all discrete values of the signal, taking the mean of the sum, and finally taking the square root of the mean.
  • the root mean square is the effective value of a signal (such as a current signal and a voltage signal), and is used to characterize the energy of the signal.
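The RMS definition above maps directly onto a few lines of code. The following is a small sketch; the function name `rms` and the sample frame are illustrative.

```python
import math


def rms(samples):
    """Root mean square: square each value, average, take the square root."""
    if not samples:
        raise ValueError("samples must be non-empty")
    mean_square = sum(x * x for x in samples) / len(samples)
    return math.sqrt(mean_square)


# Squares are 9, 16, 9, 16; their mean is 12.5.
frame = [3.0, -4.0, 3.0, -4.0]
frame_rms = rms(frame)
```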
  • Signal-to-noise ratio (signal noise ratio, SNR):
  • SNR refers to the ratio of signal to noise in an electronic device or electronic system, for example, the ratio of signal energy to noise energy.
  • Signal refers to the signal from outside the device that needs to be processed by the device.
  • Noise refers to the irregular extra signal (or information) that does not exist in the original signal and is produced after the signal passes through the device; this extra signal does not change with the original signal.
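As a sketch of the energy-ratio form of SNR, the example below computes the ratio of signal energy to noise energy and expresses it in decibels. The decibel conversion is a common convention added here for illustration, and the function name `snr_db` is hypothetical.

```python
import math


def snr_db(signal, noise):
    """Ratio of signal energy to noise energy, expressed in decibels."""
    signal_energy = sum(x * x for x in signal)
    noise_energy = sum(x * x for x in noise)
    if signal_energy <= 0 or noise_energy <= 0:
        raise ValueError("both signal and noise must have non-zero energy")
    return 10.0 * math.log10(signal_energy / noise_energy)


# Energy ratio 200 / 2 = 100, which corresponds to 20 dB.
ratio_db = snr_db([10.0, 10.0], [1.0, 1.0])
```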
  • Sub-band coding technology is a technology that converts the original signal from the time domain to the frequency domain, then divides it into several sub-bands, and digitally encodes them respectively. It uses a band-pass filter bank to divide the original signal into several subbands, each subband corresponds to a specified frequency bandwidth, that is, each subband corresponds to a specified signal frequency.
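To illustrate only the frequency-domain split into subbands (not the digital encoding, and not a band-pass filter bank), the sketch below uses a plain discrete Fourier transform and groups its bins into equal-width subbands. The DFT-based grouping, the function name `subband_energies` and the frame size are assumptions made for illustration.

```python
import cmath
import math


def subband_energies(samples, num_bands):
    """Group DFT bins 0 .. N/2-1 into num_bands equal-width subbands and
    return the spectral energy in each subband."""
    n = len(samples)
    half = n // 2
    if num_bands <= 0 or half == 0 or half % num_bands != 0:
        raise ValueError("N/2 must be a positive multiple of num_bands")
    # Plain O(N^2) DFT of the non-negative frequency bins.
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(half)
    ]
    width = half // num_bands
    return [
        sum(abs(c) ** 2 for c in spectrum[b * width:(b + 1) * width])
        for b in range(num_bands)
    ]


# A tone in DFT bin 5 of a 16-sample frame falls into subband 2 (bins 4 and 5).
tone = [math.cos(2 * math.pi * 5 * t / 16) for t in range(16)]
energies = subband_energies(tone, num_bands=4)
```

Each entry of `energies` corresponds to one subband, so later processing can treat each frequency range separately.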
  • Background noise suppression (automatic noise suppression, ANS) technology is used to detect background noise of fixed frequency (such as fan sound and air conditioner sound) and filter it out automatically, so as to present the clear voice of the participants; it is widely used in sound signal processing in scenarios such as video conferencing and voice conferencing.
  • Cross correlation (cross correlation, CC): The result of cross correlation operation reflects the measure of similarity between two signals.
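A small sketch of a similarity measure based on cross correlation follows. It uses the zero-lag normalized form, which is one common variant chosen here for illustration; the function name is hypothetical.

```python
import math


def normalized_cross_correlation(a, b):
    """Zero-lag cross correlation of two equal-length signals, scaled to [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# A scaled copy is fully similar; orthogonal signals are not similar at all.
same_shape = normalized_cross_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = normalized_cross_correlation([1.0, 0.0], [0.0, 1.0])
```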
  • Adaptive filter (adaptive filter, ADF):
  • based on the characteristics of the input signal, the adaptive filter can adaptively adjust the parameters of the filter according to the difference from the expected signal to ensure the filtering effect. Therefore, the adaptive filter is widely used in system identification, signal prediction and noise elimination.
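As an illustration of how an adaptive filter adjusts its parameters from the error signal, the sketch below uses the normalized least mean squares (NLMS) update. NLMS is a standard algorithm chosen here as an assumption; the application does not state which adaptive algorithm is used, and the tap count, step size and test signals are hypothetical.

```python
import random


def nlms_cancel(primary, reference, taps=4, mu=0.5, eps=1e-8):
    """Remove the part of `primary` that is predictable from `reference`.

    Returns (residual, weights): the residual is the primary signal with the
    estimated reference component subtracted, sample by sample.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        # Normalize the step by the recent reference power.
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(500)]
# The interferer leaks into the primary channel with gain 0.8 and one sample of delay.
primary = [0.0] + [0.8 * x for x in reference[:-1]]
residual, weights = nlms_cancel(primary, reference)
```

In this toy setup the primary channel contains only the leaked interferer, so the residual decays towards zero as the weights converge to the leakage path.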
  • the sound signal processing method provided in the embodiment of the present application is applied to a sound signal processing device.
  • the sound signal processing device may be a conference terminal or a smart speaker.
  • the sound signal processing device is used for processing the sound signal picked up by the sound pickup device from the sound pickup space.
  • the conference terminal performs noise reduction on sound signals in the conference site.
  • the sound pickup device has various forms, for example, the sound pickup device may be a microphone or a microphone array, and the like.
  • the microphone may be a fixed microphone, for example, a desktop embedded microphone; the microphone may also be a movable microphone.
  • the microphone array refers to an array structure obtained by arranging multiple microphones according to a certain spatial structure. According to the spatial characteristics of the array structure, the microphone array can process sound signals in multiple directions to obtain sound signals in various angle ranges. According to different usage scenarios, different forms of sound pickup devices can be selected to pick up sound signals, and the form of the sound pickup device is not limited in the embodiments of the present application.
  • the pickup space is a pre-configured three-dimensional pickup area.
  • the sound pickup space may be a closed space, ie the size of the sound pickup space is limited.
  • the sound pickup space can be in the shape of a cuboid, and the size of the sound pickup space can be represented by length, width, and height.
  • the sound pickup space may also be an open space, for example, the height of the sound pickup space is not limited.
  • the size and shape of the sound pickup space can be set according to sound pickup requirements or sound pickup scenarios, and the embodiment of the present application does not limit the size and shape of the sound pickup space.
  • Fig. 1 is a schematic structural diagram of a sound processing system provided by an embodiment of the present application.
  • the sound processing system includes a sound pickup device, a sound signal processing device, and a sound pickup control device.
  • the sound pickup control device is used to determine the position of the interference source in the sound pickup space.
  • the sound pickup control device includes a control device for selecting the location of the interference source, for example, the control device is integrated on a microphone, or the control device may be a conference touch panel.
  • the sound pickup control device includes an image collection device, and the image collection device is used for image collection of a sound pickup space, for example, a camera in a venue.
  • the sound signal processing device obtains the sound signal picked up by the sound pickup device, determines the position of the interference source in the sound pickup space through the sound pickup control device, and then determines the reference signal based on the position of the interference source, so as to filter out, through the reference signal, the sound of the interference source in the sound signal and thereby enhance the target sound signal.
  • the sound processing system shown in FIG. 1 is only used as an example for illustration, and is not used as a limitation to the sound processing system applied to the solution of the present application.
  • the deployment method of the sound processing system may also be different.
  • based on Fig. 2 to Fig. 5, the embodiment of the present application schematically illustrates four types of deployment of the sound processing system.
  • the technical solution of the present application will be described below by taking the audio signal processing device as a conference terminal as an example.
  • Fig. 2 is a schematic diagram of deployment of a sound processing system provided by an embodiment of the present application.
  • the sound processing system is applied in a conference scene, and the sound pickup space is the meeting place.
  • the sound processing system includes: a plurality of microphones 210 as sound pickup devices; a conference terminal 220 as a sound signal processing device; and a conference touch panel 230 as a sound pickup control device.
  • the conference terminal 220 is deployed on the wall of the venue, and the multiple microphones 210 are deployed at designated positions on the conference table.
  • the conference terminal 220 can acquire sound signals picked up by the plurality of microphones 210 from the venue.
  • the multiple microphones 210 have physical buttons 211 .
  • in response to a selection operation on the physical button 211 of any microphone, the microphone returns a selection instruction for the microphone to the conference terminal, and the selected microphone is then determined as the microphone corresponding to the interference source.
  • the microphone corresponds to an indicator light, and the indicator light is used to indicate the selection state of the corresponding microphone. For example, if the indicator light is on, the microphone is selected and is determined to be the microphone corresponding to the interference source.
  • the conference touch panel 230 provides the function of selecting a microphone. In response to the selection operation of any microphone on the conference touch panel, the conference touch panel 230 returns a microphone selection instruction to the conference terminal 220, and the selection instruction indicates that the microphone is determined as the microphone corresponding to the interference source.
  • the conference touch panel 230 can control the indication status of the indicator light corresponding to the microphone to indicate the selection status of the microphone. For example, the conference touch panel controls the indicator light corresponding to the microphone to light up, indicating that the microphone is selected, and the microphone is then determined as the microphone corresponding to the interference source.
  • the conference terminal may be deployed on a movable stand in the conference venue.
  • Fig. 3 is a schematic diagram of deployment of another sound processing system provided by an embodiment of the present application.
  • the sound processing system is applied in a meeting scene, and the sound pickup space is the meeting place.
  • the sound processing system includes: a plurality of microphones 310 as sound pickup devices; a conference terminal 320 as a sound signal processing device; a conference touch panel 330 and a camera 340 as sound pickup control devices.
  • the multiple microphones 310 have physical buttons 311 .
  • the configuration of the sound processing system other than the camera 340 is the same as that of the sound processing system corresponding to FIG. 2 , which will not be repeated here.
  • the camera 340 is deployed on the wall of the meeting place, and is used to collect images in the meeting place.
  • the camera is an external camera connected to the conference terminal.
  • the camera is a built-in camera of the conference terminal.
  • the camera has data processing capability, can process the collected images, and send a microphone selection instruction to the conference terminal, instructing the conference terminal to process the sound signal from the microphone accordingly. It should be noted that the above is only an exemplary description, and the embodiment of the present application does not limit the deployment position of the camera. For example, the camera may also be hung on the ceiling in the venue.
  • Fig. 4 is a schematic diagram of deployment of another sound processing system provided by an embodiment of the present application.
  • the sound processing system is applied in a meeting scene, and the sound pickup space is the meeting place.
  • the sound processing system includes: a microphone array 410 as a sound pickup device; a conference terminal 420 as a sound signal processing device; a desktop physical button 430 as a sound pickup control device, a conference touch panel 440 and a camera 450 .
  • the microphone array and the conference terminal are physically integrated as one device, that is, the conference terminal has a built-in microphone array.
  • the microphone array and the conference terminal are physically separated two devices.
  • the device is deployed in the venue, so that the sound pickup range of the microphone array can cover the venue evenly.
  • for example, the conference terminal with the built-in microphone array is deployed in the middle of the wall of the venue. The conference terminal 420 can obtain sound signals from various angle ranges in the venue from the microphone array 410.
  • the desktop physical button 430 is used to select a position in the meeting place.
  • in response to a selection operation on any desktop physical button 430, the desktop physical button 430 returns a selection instruction for the location of the desktop physical button in the sound pickup space to the conference terminal 420.
  • the desktop physical button 430 corresponds to an indicator light, and the indicator light is used to indicate the selection state of the position corresponding to the desktop physical button. For example, if the indicator light is on, the corresponding position is selected and is determined as the location corresponding to the interference source.
  • the meeting touch panel 440 provides a function of selecting a location in the meeting place. In response to a selection operation for any position on the conference touch panel, the conference touch panel 440 returns an instruction for selecting any position in the conference site to the conference terminal 420 .
  • the conference touch panel 440 can control the indication state of the indicator light to indicate the selection status of the location in the venue.
  • for example, the conference touch panel controls the indicator light to light up, indicating that the position corresponding to the indicator light is selected, and the position is then determined as the position corresponding to the interference source.
  • for the camera 450, refer to the description of the camera 340 in the sound processing system corresponding to FIG. 3 above, which will not be repeated here.
  • the microphone array 410, the conference terminal 420 and the camera 450 are integrated together as one device, that is, the conference terminal has a built-in microphone array and camera.
  • Fig. 5 is a schematic diagram of deployment of another sound processing system provided by an embodiment of the present application.
  • the sound processing system is applied in a conference scene, and the sound pickup space is the meeting place.
  • the sound processing system includes: a plurality of distributed microphones 510 with positioning functions as sound pickup devices; a conference terminal 520 as a sound signal processing device; a desktop physical button 530, a conference touch panel 540 and a camera 550 as sound pickup control devices.
  • the distributed microphone 510 is randomly placed on the conference table in front of the conference terminal, and the position of the distributed microphone can be updated in real time in the conference terminal 520 .
  • the conference touch panel acquires the positions of distributed microphones to provide a function of selecting distributed microphones.
  • in response to the selection operation of any distributed microphone on the conference touch panel, the conference touch panel returns a selection instruction for the distributed microphone to the conference terminal, indicating that the distributed microphone is selected, and the distributed microphone is determined as the distributed microphone corresponding to the interference source.
  • the physical button 530 on the desktop is used to select a distributed microphone in the venue.
  • in response to a selection operation on any desktop physical button 530, the desktop physical button 530 returns a selection instruction for the location of the desktop physical button in the sound pickup space to the conference terminal 520.
  • the conference terminal selects the distributed microphone closest to the physical button on the desktop based on the location of the physical button on the desktop and the locations of the multiple distributed microphones.
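The nearest-microphone choice can be sketched as a minimum over Euclidean distances. The coordinate layout, the microphone identifiers and the function name `nearest_microphone` are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_microphone(button_position, mic_positions):
    """Return the id of the microphone closest to the pressed button.

    button_position: (x, y) coordinates of the desktop physical button.
    mic_positions: dict mapping a microphone id to its (x, y) coordinates.
    """
    if not mic_positions:
        raise ValueError("no microphone positions available")
    return min(
        mic_positions,
        key=lambda mic: math.dist(button_position, mic_positions[mic]),
    )


# Illustrative table-top coordinates in metres.
mic_positions = {"mic-1": (0.5, 0.5), "mic-2": (2.0, 0.4), "mic-3": (3.5, 0.6)}
selected = nearest_microphone((2.2, 0.0), mic_positions)
```

Because the distributed microphones report their positions in real time, the lookup table would be refreshed before each selection.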
  • the distributed microphone 510 corresponds to an indicator light, which is used to indicate the selection status of the corresponding distributed microphone. For example, if the indicator light is on, the corresponding distributed microphone is selected and is determined to be the distributed microphone corresponding to the interference source.
  • for the camera 550, refer to the description of the camera 340 in the sound processing system corresponding to FIG. 3 above, which will not be repeated here.
  • Data transmission between devices can be performed through wireless communication or through wired communication, which is not limited in this embodiment of the present application.
  • The sound signal processing device can obtain information such as the size and shape of the sound pickup space and the position information of each device in the sound pickup space, for example, the length, width and height of the sound pickup space; the position information of the microphones (or microphone arrays), the conference terminal and the camera in the sound pickup space; and the numbers of the multiple microphones.
  • the conference terminal in the above audio processing system is used as a local conference terminal, and can send the processed audio signal to the remote conference terminal.
  • a remote conference terminal refers to a conference terminal that participates in the same conference as a local conference terminal and is deployed in a different area.
  • the local conference terminal and the remote conference terminal are connected through a multimedia control platform.
  • the local conference terminal can send the enhanced sound signal to the multimedia control platform, and the multimedia control platform mixes and codes the received sound signal and sends it to the remote conference terminal.
  • the conference terminal can also integrate part or all of the functions of the multimedia control platform, and the local conference terminal can mix and encode the enhanced audio signal and send it directly to the remote conference terminal.
  • the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance the target sound signal.
  • the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
  • The sound signal processing device obtains information such as the size and shape of the sound pickup space and the position information of each device in the sound pickup space based on the actual needs of the conference scene and the deployment of the devices, so as to ensure that the deployment of the sound processing system is adapted to the conference scene. The sound signal processing device can therefore perform sound signal processing based on the actual situation of the conference scene, which improves the flexibility and compatibility of the sound signal processing method and assures sound quality in different conference scenes.
  • Fig. 6 is a flowchart of a sound signal processing method provided by an embodiment of the present application. This method is applied to the sound processing system corresponding to FIG. 2 above, and the sound processing system includes multiple microphones, a conference terminal, and a conference touch panel. The sound signal processing method is executed by the conference terminal. As shown in Figure 6, the method includes:
  • the conference terminal picks up sound signals in the sound pickup space through multiple microphones.
  • the sound processing system includes a plurality of microphones, a conference terminal, and a conference touch panel.
  • the sound processing system runs based on the system control software, and the sound processing system needs to be configured based on the system control software before performing sound signal processing.
  • system control software is installed on the conference terminal, and the conference terminal can obtain configuration information of the sound processing system through the system control software.
  • the configuration information entered on the configuration interface of the system control software is acquired.
  • The configuration information includes: the length, width and height of the sound pickup space; the position information of the multiple microphones and the conference terminal in the sound pickup space, for example, the coordinates of the multiple microphones in the spatial coordinate system corresponding to the sound pickup space; the numbers of the multiple microphones; and the pickup range corresponding to each microphone.
  • the sound processing system can be reconfigured through the system control software. For example, if the scope of the pickup space needs to be adjusted, the length, width and height of the pickup space can be adjusted through the system control software.
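  • One possible in-memory representation of this configuration information is sketched below. The class names, field names and values are hypothetical; the application does not specify a data format, only the kinds of information that are configured.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Microphone:
    number: int
    position: tuple        # (x, y, z) in the pickup-space coordinate system
    pickup_range: float    # assumed here to be a radius in metres

@dataclass(frozen=True)
class PickupSpaceConfig:
    length: float
    width: float
    height: float
    terminal_position: tuple
    microphones: tuple

    def microphone(self, number):
        """Look up a microphone by its number."""
        for mic in self.microphones:
            if mic.number == number:
                return mic
        raise KeyError(f"unknown microphone number: {number}")

config = PickupSpaceConfig(
    length=6.0, width=4.0, height=3.0,
    terminal_position=(3.0, 0.0, 1.0),
    microphones=(
        Microphone(1, (1.0, 2.0, 0.8), 1.5),
        Microphone(2, (3.0, 2.0, 0.8), 1.5),
    ),
)

# Reconfiguration: adjust the length of the pickup space, keep everything else.
resized = replace(config, length=8.0)
```

  • Making the configuration immutable and deriving a new one on reconfiguration is a design choice of this sketch, so that processing already in progress keeps a consistent view of the pickup space.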
  • the conference terminal can determine the positions, numbers and sound pickup ranges of the multiple microphones in the sound pickup space, so as to obtain multiple sound signals in the sound pickup space through the multiple microphones.
  • the above-mentioned system control software is installed on the conference touch panel, and correspondingly, the configuration information of the sound processing system can be acquired through the conference touch panel.
  • Since each sound signal includes the sound within the pickup range of the corresponding microphone, each sound signal may be composed of sound signals from multiple sound sources. For example, in a conference scene, if multiple participants speak at the same time, the sound signal picked up by one microphone may include the voices of multiple participants within its pickup range.
  • The proportion of each sound source's signal in the sound signal picked up by one microphone is determined by the position of that sound source relative to the microphone. For example, the closer a participant is to a microphone, the larger the proportion of the corresponding sound signal in the sound signal picked up by that microphone, that is, the louder the participant's voice is in that sound signal.
  • the conference terminal receives the location selection instruction, and determines the location corresponding to the location selection instruction as the location of the interference source in the sound pickup space.
  • The interference source refers to a sound source that is considered to cause interference among the multiple sound sources in the sound pickup space, for example, participants engaged in a private conversation.
  • Based on the position, the conference terminal can determine the sound source considered as the interference source, so that the sound of the interference source is processed in the subsequent sound signal processing, for example, the sound of the interference source in the sound signal is filtered out.
  • the location selection instruction is triggered based on a selection operation of the location of the interference source on the control device.
  • the microphone corresponding to the selection operation is considered to be the microphone closest to the interference source, therefore, the microphone corresponding to the selection operation is taken as the microphone corresponding to the interference source.
  • The selection operation includes pressing a physical button corresponding to the microphone. The pressing operation triggers a location selection instruction for the location of the microphone, and the conference terminal determines the location of the microphone as the location of the interference source according to the received location selection instruction.
  • the selection operation includes a selection operation of the microphone on the conference touch panel, and the conference touch panel sends a location selection instruction for the location of the microphone to the conference terminal in response to the selection operation.
  • the conference terminal in response to receiving the location selection instruction, acquires the microphone number carried in the location selection instruction, and determines the microphone location corresponding to the microphone number as the location of the interference source.
  • The microphone has a corresponding indicator light. After the microphone is determined to be the microphone corresponding to the location of the interference source, the indication state of that indicator light is switched to indicate that the microphone corresponds to the interference source.
  • the microphone is displayed on the conference touch panel as the microphone corresponding to the interference source.
  • The step of switching the state of the indicator light and the step of displaying on the conference touch panel can be executed synchronously or sequentially, which is not limited in this embodiment of the present application.
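  • The handling of a location selection instruction described above can be sketched as follows. The instruction layout (a dictionary with a `microphone_number` key), the indicator state names and the function name are assumptions made for illustration only.

```python
def handle_location_selection(instruction, microphone_positions, indicator_states):
    """Resolve a location selection instruction to the interference position.

    The instruction is assumed to carry the microphone number; the indicator
    state of that microphone is switched as a side effect.
    """
    number = instruction["microphone_number"]
    if number not in microphone_positions:
        raise KeyError(f"unknown microphone number: {number}")
    indicator_states[number] = "interference"
    return microphone_positions[number]

positions = {1: (1.0, 2.0, 0.8), 2: (3.0, 2.0, 0.8)}
indicators = {1: "idle", 2: "idle"}
interference_position = handle_location_selection(
    {"microphone_number": 2}, positions, indicators
)
```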
  • the conference terminal determines the sound signal originating from the microphone corresponding to the position of the interference source as a reference signal.
  • the conference terminal determines the sound signal picked up by the microphone as the reference signal.
  • Because the reference signal comes from the microphone closest to the interference source, the proportion of the interference source's sound signal in the reference signal is greater than its proportion in the sound signals of the other microphones; that is, the reference signal represents the sound signal of the interference source better than the sound signals picked up by the other microphones.
  • the reference signal can represent the sound signal of the interference source during sound signal processing, and is used to filter out the sound of the interference source.
  • A reference signal representative of the interference source can be determined based on the microphone corresponding to the interference source, so that the sound of the interference source can be better filtered out based on the reference signal, which effectively improves the sound quality.
  • the conference terminal performs denoising on the reference signal.
  • Since the reference signal originates from multiple sound sources in the sound pickup space, it may contain noise; when the denoised reference signal is used to filter out the sound of the interference source, better filtering can be achieved.
  • the process of denoising the reference signal includes the following steps 6041 to 6042:
  • the conference terminal determines the noise threshold based on the reference signal.
  • the reference signal is divided into multiple signal frames of a specified time length (for example, 30 milliseconds), and the reference signal is denoised with the signal frame as the minimum processing unit.
  • the sound corresponding to the signal frame with the smallest signal amplitude spectrum is not human voice, and the non-human voice is considered as noise.
  • the magnitude of the amplitude spectrum can be compared based on the signal energy of the signal frame.
  • the minimum signal energy is determined as the noise threshold in the conference scene from the signal energy of 100 (or other values) signal frames of the reference signal, and the reference signal is denoised based on the noise threshold.
  • the noise threshold is used as a criterion for judging the human voice, and the signal frames whose signal energy is lower than the noise threshold are noise, that is, non-human voice.
  • The principle of calculating the signal energy refers to formula (1), where X is the set of signal amplitudes of the signal frame, N is the number of signal amplitudes in the signal frame X (N is a positive integer), and RMS_X is the signal energy of the signal frame X.
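  • The framing, energy calculation and minimum-energy noise threshold described above can be sketched as follows. Formula (1) is not reproduced in this text; the sketch assumes, from the symbol RMS_X, that the signal energy is the root mean square of the frame amplitudes. The function names and the sample data are hypothetical.

```python
import math

def split_frames(samples, sample_rate, frame_ms=30):
    """Split a signal into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def rms(frame):
    """Assumed form of formula (1): root mean square of the N amplitudes."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def noise_threshold(frames, window=100):
    """Minimum frame energy over the first `window` frames."""
    return min(rms(frame) for frame in frames[:window])

# Three 30 ms frames at a toy sample rate of 1000 Hz.
samples = [0.5] * 30 + [0.1] * 30 + [0.8] * 30
frames = split_frames(samples, sample_rate=1000)
threshold = noise_threshold(frames)
```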
  • the conference terminal uses the recursive average noise estimation algorithm to determine the long-term stationary noise energy in the conference scene according to the reference signal obtained in real time, and continuously updates the noise threshold in the conference scene with the long-term stationary noise energy .
  • the determination process of determining the long-term stationary noise energy based on the recursive average noise estimation algorithm refers to formula (2) to formula (4).
  • The smoothing coefficient is determined from the speech existence probability of the current signal frame through the recursive average noise estimation algorithm. The closer the speech existence probability of the current signal frame is to 1, the closer the smoothing coefficient is to 1, meaning that the noise energy estimate of the current signal frame tends toward the signal energy of the previous signal frame; the closer the speech existence probability is to 0, the closer the smoothing coefficient is to 0, meaning that the noise energy estimate tends toward the signal energy of the current signal frame.
  • The speech existence probability of the kth signal frame of the reference signal at the lth subband is computed with a first smoothing constant whose value lies strictly between 0 and 1, where I(k, l) is 1 when the signal energy of the kth signal frame at the lth subband is greater than the preset noise threshold, and I(k, l) is 0 when that signal energy is less than the preset noise threshold.
  • On this basis, the noise energy spectrum of the kth signal frame of the reference signal at the (l+1)th subband can be determined.
  • Y(k, l) is the expression of the kth signal frame of the reference signal at the lth subband.
  • the long-term stationary noise energy can be updated.
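  • Formulas (2) to (4) are not reproduced in this text. The following is a minimal sketch of one recursive-averaging update in the style of the well-known minima-controlled recursive averaging approach, offered as an assumption about what such an update can look like rather than as the application's formulas. The names `alpha_p` (first smoothing constant) and `alpha_d` (a second, hypothetical smoothing constant) and their default values are illustrative.

```python
def update_noise_estimate(noise, presence, power, threshold,
                          alpha_d=0.95, alpha_p=0.2):
    """One recursive-averaging update of the noise energy for one subband.

    noise     previous noise energy estimate
    presence  previous speech existence probability
    power     signal energy of the current frame in this subband
    threshold preset noise threshold used for the indicator I
    """
    indicator = 1.0 if power > threshold else 0.0
    # Smooth the hard indicator into a speech existence probability.
    presence = alpha_p * presence + (1.0 - alpha_p) * indicator
    # Smoothing coefficient: tends to 1 as the probability tends to 1.
    smoothing = alpha_d + (1.0 - alpha_d) * presence
    # Probability near 1 keeps the old estimate; near 0 follows the frame.
    noise = smoothing * noise + (1.0 - smoothing) * power
    return noise, presence
```

  • With `alpha_d=0.0` the smoothing coefficient equals the speech existence probability, which matches the limiting behaviour described above (coefficient near 0 when speech is absent); a non-zero `alpha_d` makes the estimate change more slowly.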
  • The conference terminal determines the signal-to-noise ratio of the reference signal based on the noise threshold and the reference signal, and sets to 0 the parts of the reference signal whose signal-to-noise ratio is smaller than the target threshold.
  • the conference terminal calculates the ratio of the signal energy of each signal frame to the noise threshold, that is, the signal-to-noise ratio of the signal frame. If the signal-to-noise ratio of the signal frame is less than the target threshold, the signal frame is likely to be noise, and the signal amplitude of the signal frame is set to 0.
  • The principle of calculating the signal-to-noise ratio refers to formula (5), where X is the set of signal amplitudes of the signal frame, SNR_X is the signal-to-noise ratio of the signal frame X, RMS_X is the signal energy of the signal frame X, and RMS_N is the noise energy (or long-term stationary noise energy), that is, the noise threshold.
  • RMS_N may be the noise energy determined based on multiple local signal frames of the reference signal, or the long-term stationary noise energy determined based on the accumulation of the reference signal, which is not limited in this embodiment of the present application.
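  • The frame-level muting described above can be sketched as follows. Formula (5) is not reproduced in this text; the sketch assumes the signal-to-noise ratio is the plain ratio RMS_X / RMS_N, as the description of the ratio suggests, and that frame energy is a root mean square. The function names and values are hypothetical.

```python
import math

def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def mute_low_snr_frames(frames, noise_rms, target_snr):
    """Set to 0 every frame whose SNR (RMS_X / RMS_N) is below target_snr."""
    if noise_rms <= 0.0:
        raise ValueError("noise_rms must be positive")
    gated = []
    for frame in frames:
        snr = rms(frame) / noise_rms
        gated.append(list(frame) if snr >= target_snr else [0.0] * len(frame))
    return gated

frames = [[0.1, -0.1, 0.1, -0.1], [1.0, -1.0, 1.0, -1.0]]
gated = mute_low_snr_frames(frames, noise_rms=0.1, target_snr=2.0)
```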
  • Through step 604, the non-human-voice part of the reference signal can be muted to obtain a reference signal containing a purer human voice, which improves the efficiency of subsequent sound signal processing based on the reference signal and further improves the sound quality.
  • step 604 is an optional step, and in some embodiments, step 605 may be performed directly based on the reference signal determined in step 603 .
  • After performing the above step 604, the conference terminal inputs the denoised reference signal and the other multi-channel sound signals from the sound pickup space into the ANS module for processing, so as to filter out the background noise in the reference signal and in the other multi-channel sound signals, thereby improving the efficiency of subsequent sound signal processing and further improving the sound quality.
  • the conference terminal determines the first sound signal from the sound signals in the sound pickup space based on the denoised reference signal.
  • The reference signal is used to filter out the sound of the interference source. It is therefore first necessary to determine, from the sound signals in the sound pickup space, the sound signal affected by the interference source, and then to filter out the sound of the interference source in a targeted manner based on the reference signal.
  • the first sound signal affected by the sound signal of the interference source is determined based on the magnitude of the signal energy and the correlation with the reference signal.
  • The magnitude of the signal energy can represent the strength of the human voice in a sound signal to a certain extent. If the signal energy of one sound signal is greater than the signal energy of another sound signal, there is indeed a human voice in the first of these signals, and that voice is strong enough to affect the other sound signal.
  • If one channel of sound signal is affected by the interference source, that sound signal is continuously interleaved with the sound signal of the interference source; therefore, the correlation between the affected sound signal and the sound signal of the interference source is higher than that of other sound signals that are not affected.
  • Since the reference signal represents the sound signal of the interference source well, when the signal energy of the first sound signal is less than the signal energy of the reference signal and the correlation between the first sound signal and the reference signal is greater than the correlation threshold, the reference signal affects the first sound signal, that is, the first sound signal is affected by the sound of the interference source.
  • For example, if the interference source is participant A, who is having a private conversation at a certain volume, and participant B sits next to participant A, the sound signal picked up by the microphone in front of participant B is continuously interwoven with the voice of participant A; therefore, the sound signal picked up by the microphone in front of participant B is a sound signal affected by the interference source, that is, the first sound signal.
  • the correlation threshold can be set based on the accuracy requirement of the sound signal processing, which is not limited in this embodiment of the present application.
  • The conference terminal receives the reference signal and the other multi-channel sound signals processed by the ANS module, and determines the first sound signal from the other multi-channel sound signals based on the signal energy of the reference signal, the signal energy of the other multi-channel sound signals, and the correlation between the other multi-channel sound signals and the reference signal.
  • the calculation principle of the signal energy refers to the above formula (1).
  • The embodiment of the present application compares signal energy using the signal frame as the smallest unit. The comparison of signal energy can also be based on the average energy of multiple signal frames within a period of time, so as to improve the accuracy of the energy comparison.
  • the magnitude of the correlation between the signals can be reflected by the cross-correlation value between the signals, and the principle of calculating the cross-correlation value between the signals can be referred to formula (6).
  • In formula (6), f(t) and g(t) are two signals, and the result is the cross-correlation value between signal f(t) and signal g(t).
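  • The selection of the first sound signal by energy and correlation can be sketched as follows. Formula (6) is not reproduced in this text; the sketch assumes a discrete cross-correlation normalized by the signal norms and takes its peak over a small range of lags, which is one common choice and not necessarily the application's. All names, the lag range and the test signals are hypothetical.

```python
import math

def rms(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def peak_correlation(f, g, max_lag=8):
    """Peak of the normalized cross-correlation |sum f[t] g[t+lag]|."""
    norm = math.sqrt(sum(x * x for x in f) * sum(x * x for x in g))
    if norm == 0.0:
        return 0.0
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        total = sum(f[t] * g[t + lag]
                    for t in range(len(f)) if 0 <= t + lag < len(g))
        best = max(best, abs(total) / norm)
    return best

def affected_channels(reference, channels, correlation_threshold):
    """Channels weaker than the reference and strongly correlated with it."""
    reference_energy = rms(reference)
    return [
        name for name, signal in channels.items()
        if rms(signal) < reference_energy
        and peak_correlation(signal, reference) > correlation_threshold
    ]

reference = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
channels = {
    # Attenuated, delayed copy of the reference: affected by the interferer.
    "mic2": [0.0] * 3 + [0.4 * x for x in reference[:-3]],
    # Unrelated tone: weaker than the reference but uncorrelated with it.
    "mic3": [0.3 * math.sin(2 * math.pi * 13 * t / 200) for t in range(200)],
}
affected = affected_channels(reference, channels, correlation_threshold=0.5)
```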
  • The reference signal is set to zero, for example, the signal amplitudes of multiple signal frames in the reference signal are set to 0, so that the influence of the reference signal on that channel of sound signal does not need to be considered in subsequent processing.
  • With the above technical solution, the first sound signal that is greatly affected by the interference source can be determined from the multi-channel sound signals, and the sound of the interference source in the first sound signal can then be filtered out in a targeted manner, which improves the accuracy of filtering and effectively improves the sound quality.
  • On the basis of improving the sound quality, the above technical solution can also protect the privacy of participants' conversations in the conference scene, effectively improving the user experience.
  • step 605 is an optional step, and in some embodiments, step 606 may be directly performed based on the reference signal determined in step 603 . In other embodiments, step 606 is performed based on the denoised reference signal in step 604 .
  • the conference terminal enhances the target sound signal in the first sound signal based on the reference signal.
  • The first sound signal includes sound signals corresponding to multiple sound sources, among which the target sound signal is the sound signal corresponding to the sound source of interest, for example, the speaker in a meeting. The purpose of processing the sound signal is therefore usually to highlight the target sound signal.
  • Since the reference signal represents the sound signal of the interference source well, processing the first sound signal with the reference signal can specifically reduce the influence of the sound of the interference source on the first sound signal, thereby ensuring the prominence of the target sound signal in the first sound signal.
  • Enhancing the target sound signal in the first sound signal refers to suppressing the reference signal in the first sound signal; for example, by reducing the proportion of the part corresponding to the reference signal in the first sound signal, the proportion of the target sound signal in the first sound signal is increased, thereby achieving the purpose of enhancing the target sound signal.
  • The reference signal is used as one input of the filter, and the first sound signal is used as the other input. Through the filter, the part of the first sound signal related to the reference signal is filtered out to enhance the target sound signal in the first sound signal, and a filtering result is output.
  • The filter includes a first filter and a second filter. The reference signal is input into the first filter, and based on the parameters of the first filter, the weight values of the signal components of different frequencies in the reference signal are adjusted to reconstruct the reference signal and obtain an estimated signal of the reference signal. The estimated signal is a result of estimating the sound signal of the interference source in the reference signal.
  • the difference signal between the first sound signal and the estimated signal is used as a filtering result, and the part related to the reference signal in the first sound signal is filtered out by filtering out the estimated signal in the first sound signal.
  • the parameters of the first filter are determined based on the parameters of the second filter, and the parameters of the second filter are determined based on the difference between multiple filtering results.
  • the reference signal when the reference signal is input into the first filter, the reference signal is also input into the second filter, so as to obtain the nth filtering result of the second filter.
  • The second filter adjusts its parameters based on the difference between the nth filtering result of the second filter and the (n-1)th filtering result, so that the estimated signal obtained with the adjusted parameters is closer to the sound signal of the interference source in the first sound signal.
  • When the adjusted parameters of the second filter meet the convergence condition, the adjusted parameters of the second filter are configured to the first filter, thereby improving the effect of filtering the first sound signal.
  • n is an integer greater than 1.
  • the above-mentioned filter is an adaptive filter.
  • the adaptive filter uses an adaptive algorithm to adjust the parameters of the filter to obtain a better filtering effect.
  • Based on the difference between the nth filtering result of the second filter and the (n-1)th filtering result, the parameters of the second filter are adjusted through an adaptive algorithm. The filter parameters include a filter step size; adjusting the filter step size changes the convergence speed of the filter parameters.
  • Adaptive algorithms under different optimization criteria can be selected, for example, the recursive least squares (RLS) algorithm, the least mean square (LMS) algorithm, and the normalized least mean square (NLMS) algorithm, which is not limited in this embodiment of the present application.
  • The embodiment of the present application provides a schematic diagram of an adaptive filter, as shown in FIG. 7. The reference signal is the input signal x(n); the desired signal y(n) includes the first sound signal v(n) and the system echo d(n) of the reference signal. After fast Fourier transform processing, x(n) is input to the first filter and the second filter at the same time, and the first filter outputs the (frequency-domain) estimated signal X'(m). y(n) is processed by fast Fourier transform to obtain Y(m); the adder takes the difference between Y(m) and X'(m) and outputs the difference signal E(m), and E(m) undergoes inverse Fourier transform to obtain the filtering result e(n). An adder also obtains the difference signal between Y(m) and the (frequency-domain) estimated signal output by the second filter, and this difference signal is returned to the second filter for updating the parameters of the filter.
  • H(n) is the system function used to simulate the system echo.
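  • The idea of subtracting an adaptively estimated interference signal from the first sound signal can be sketched with a plain time-domain NLMS filter. This is a deliberately simplified stand-in: it uses a single filter in the time domain, not the frequency-domain two-filter structure of FIG. 7, and the tap count, step size and test signals are hypothetical.

```python
import math
import random

def nlms_cancel(reference, primary, taps=8, step=0.2, eps=1e-8):
    """Subtract from `primary` the part that is predictable from `reference`."""
    weights = [0.0] * taps
    history = [0.0] * taps          # newest reference sample first
    residual = []
    for x, d in zip(reference, primary):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate        # difference signal, i.e. the filtering result
        power = sum(h * h for h in history) + eps
        # Normalized LMS update: step size scaled by the input power.
        weights = [w + step * error * h / power
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual

rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]      # interferer
target = [0.1 * math.sin(2 * math.pi * 0.01 * n) for n in range(2000)]
leak = [0.0, 0.0] + [0.6 * x for x in reference[:-2]]          # delayed leak
primary = [t + i for t, i in zip(target, leak)]                # first sound signal
filtered = nlms_cancel(reference, primary)
```

  • After the weights converge, `filtered` should stay close to `target`, because the leaked interferer is the only part of `primary` that the filter can predict from `reference`. A larger `step` converges faster but leaves more residual error.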
  • The above-mentioned adaptive filtering process can be performed based on a deep learning model: the parameters of the adaptive filter can be trained through the deep learning model, which can effectively improve the convergence speed of the adaptive filter parameters, thereby improving the filtering efficiency.
  • When the attenuation of the first sound signal before and after filtering is greater than the attenuation threshold, the filtered first sound signal may be weakened and distorted. In this case, the filtered first sound signal needs corresponding processing, for example, enhancing the human voice in the signal or cutting the distorted segment from the signal, so as to further ensure the quality of the sound signal.
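  • The attenuation check can be sketched as follows. The application does not define how attenuation is measured; this sketch assumes it is the drop in root-mean-square level, expressed in decibels. The function names and the threshold value are hypothetical.

```python
import math

def rms(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def attenuation_db(before, after, floor=1e-12):
    """Level drop from `before` to `after` in decibels (assumed measure)."""
    return 20.0 * math.log10(max(rms(before), floor) / max(rms(after), floor))

def needs_post_processing(before, after, threshold_db):
    """True when filtering attenuated the signal by more than threshold_db."""
    return attenuation_db(before, after) > threshold_db

before = [1.0] * 10
heavily_filtered = [0.1] * 10
lightly_filtered = [0.5] * 10
```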
  • the conference terminal sends the filtered first sound signal to the multimedia control platform, and the multimedia control platform encodes the received first sound signal and sends it to the remote conference terminal.
  • the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance the target sound signal.
  • the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
  • Fig. 8 is a flowchart of a sound signal processing method provided by an embodiment of the present application. This method is applied to the sound processing system corresponding to FIG. 3 above, and the sound processing system includes multiple microphones, a conference terminal, a conference touch panel, and a camera. The sound signal processing method is executed by the conference terminal. As shown in Figure 8, the method includes:
  • the conference terminal picks up sound signals in the sound pickup space through multiple microphones.
  • the camera is used for image collection for the sound pickup space.
  • The position information of the camera in the sound pickup space and the angle range over which the camera collects images are configured, so as to determine the relationship between the image collected by the camera and positions in the sound pickup space.
  • the left half of the image corresponds to the right half of the sound pickup space.
  • the conference terminal receives the location selection instruction, and determines the location corresponding to the location selection instruction as the location of the interference source in the sound pickup space.
  • Based on the received position selection instruction, the conference terminal can determine the sound source considered as the interference source from the position, so that the sound of the interference source is processed in the subsequent sound signal processing, for example, the sound of the interference source in the sound signal is filtered out.
  • the camera has data processing capability and can detect the collected images, and if the camera detects the first body behavior from the collected images, the camera sends the position selection instruction to the conference terminal.
  • the first body behavior is used to indicate to mute the position, for example, the participant puts the index finger vertically close to the lip.
  • Based on the pre-configured relationship between the image captured by the camera and positions in the sound pickup space, the camera can determine the position of the first body behavior in the sound pickup space from its position in the image, and indicate that position in the position selection instruction.
  • the conference terminal receives the location selection instruction from the camera, acquires the location indicated by the location selection instruction, and determines the microphone corresponding to the location of the interference source based on the location indicated by the location selection instruction.
  • The camera has data processing capabilities and detects the collected images. When the first body behavior is detected in the collected images, the camera determines, based on the position information of the multiple microphones in the sound pickup space, the microphone closest to the position of the first body behavior, and generates a position selection instruction carrying the number of that microphone to indicate that the position of the microphone is the position of the interference source.
  • the conference terminal receives the position selection command from the camera, acquires the microphone number carried in the position selection command, and determines the microphone corresponding to the microphone number in the position selection command as the microphone corresponding to the position of the interference source.
  • the position of the interference source can be obtained directly from the position selection instruction, the amount of data involved in the calculation process is reduced, and the efficiency of sound signal processing is improved.
  • The location selection instruction is triggered based on a selection operation on the location of the interference source in the control device; for the principle, refer to step 602.
  • the conference terminal receives the image collected by the camera and detects the image to determine the location of the interference source.
  • the process of determining the location of the interference source includes the following steps 1 to 2:
  • Step 1 The conference terminal detects the image collected by the camera.
  • Step 2 In response to detecting the first body behavior in the image, the conference terminal determines the location of the first body behavior in the sound pickup space as the location of the interference source.
  • The conference terminal can determine the position of the first body behavior in the sound pickup space according to its position in the image. Then, based on the position information of the plurality of microphones in the sound pickup space, it determines the microphone closest to the position of the first body behavior and determines that microphone as the microphone corresponding to the position of the interference source.
  • the position of the interference source is determined based on the real-time image, which ensures the accuracy of the position of the interference source and further improves the sound quality.
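  • The nearest-microphone selection described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the microphone layout `MIC_POSITIONS`, the helper `nearest_microphone`, and the use of 2-D coordinates in metres are all assumptions, and the mapping from image coordinates to the sound pickup space is taken as already done.

```python
import math

# Hypothetical layout: microphone number -> (x, y) position in metres
# within the sound pickup space. The values are illustrative only.
MIC_POSITIONS = {
    1: (0.5, 0.5),
    2: (2.5, 0.5),
    3: (0.5, 2.5),
    4: (2.5, 2.5),
}


def nearest_microphone(position, mic_positions):
    """Return the number of the microphone closest to `position`."""
    return min(
        mic_positions,
        key=lambda number: math.dist(position, mic_positions[number]),
    )


# Position of the first body behavior, assumed to be already mapped from
# image coordinates into the sound pickup space.
gesture_position = (2.1, 0.9)
interference_mic = nearest_microphone(gesture_position, MIC_POSITIONS)
print(interference_mic)  # prints 2
```

  • The sound signal from the selected microphone would then serve as the reference signal in the subsequent steps.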
  • the first body behavior indicates to mute the location where it is located. Therefore, the location of the interference source can be determined based on the first body behavior, and then the target sound signal can be enhanced by filtering out the sound of the interference source.
  • The target sound signal can be determined based on a second body behavior, so that the target sound signal is enhanced directly. The second body behavior is used to indicate that the target sound signal is to be enhanced; for example, a participant holds an index finger close to the lips to indicate a need to speak.
  • the conference terminal determines the position of the second body action in the sound pickup space as the position of the target.
  • the target refers to a target sound source that needs to be focused on among the multiple sound sources existing in the sound pickup space, and therefore, the target sound signal corresponding to the target sound source needs to be enhanced.
  • the position corresponding to the target sound signal is determined based on the second body behavior, so that the target sound signal can be enhanced in a targeted manner, thereby improving the sound quality.
  • the conference terminal determines the sound signal originating from the microphone corresponding to the location of the interference source as a reference signal.
  • For this step, refer to step 603; details are not repeated here.
  • After the conference terminal determines the location of the interference source based on the images collected by the camera, it can continuously track that location, for example, by tracking and detecting the interference source according to its characteristics. If the tracked position of the interference source changes, the conference terminal re-determines the reference signal from the sound signal based on the changed position. In some embodiments, the conference terminal determines the object at the position of the first body behavior in the image as the object corresponding to the interference source, tracks the position change of the object based on images collected in real time, and determines the changed position of the interference source based on the changed position of the object.
  • the tracking of the interference source can be manually released through the conference terminal or conference touch panel, or it can be set to be automatically released after a certain period of time.
  • The interference source can be locked once it is determined, so that its location follows real-time position changes; capturing changes in the location of the interference source promptly keeps the location accurate. Furthermore, this ensures that, in a changing real conference scene, the sound signal is always processed against the interference source, which safeguards the sound quality.
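  • The tracking behavior described above (lock, follow position changes, release manually or after a period of time) can be sketched as below. The class `InterferenceTracker`, its method names, and the `hold_seconds` parameter are hypothetical names introduced for illustration; the tracked positions are assumed to come from an image-based tracker that is not shown.

```python
import math


class InterferenceTracker:
    """Keeps the reference microphone in step with a moving interference source."""

    def __init__(self, mic_positions, hold_seconds=None):
        self.mic_positions = mic_positions  # microphone number -> (x, y)
        self.hold_seconds = hold_seconds    # None: tracking ends only on release()
        self.locked_at = None
        self.reference_mic = None

    def _nearest(self, position):
        return min(
            self.mic_positions,
            key=lambda n: math.dist(position, self.mic_positions[n]),
        )

    def lock(self, position, now):
        """Start tracking an interference source detected at `position`."""
        self.locked_at = now
        self.reference_mic = self._nearest(position)
        return self.reference_mic

    def update(self, position, now):
        """Feed the latest tracked position; return the current reference microphone."""
        if self.locked_at is None:
            return None  # nothing is being tracked
        if self.hold_seconds is not None and now - self.locked_at >= self.hold_seconds:
            self.release()  # automatic release after the configured period
            return None
        self.reference_mic = self._nearest(position)
        return self.reference_mic

    def release(self):
        """Stop tracking (manual release)."""
        self.locked_at = None
        self.reference_mic = None


mics = {1: (0.5, 0.5), 2: (2.5, 0.5)}
tracker = InterferenceTracker(mics, hold_seconds=60.0)
print(tracker.lock((0.6, 0.4), now=0.0))     # prints 1
print(tracker.update((2.2, 0.6), now=10.0))  # prints 2, the source has moved
print(tracker.update((2.2, 0.6), now=61.0))  # prints None, tracking was released
```

  • Passing the time in explicitly keeps the sketch deterministic; a deployment would more likely read a monotonic clock.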
  • the conference terminal performs denoising on the reference signal.
  • For this step, refer to step 604; details are not repeated here.
  • the conference terminal determines the first sound signal from the sound signals in the sound pickup space based on the denoised reference signal.
  • For this step, refer to step 605; details are not repeated here.
  • the conference terminal enhances the target sound signal in the first sound signal based on the reference signal.
  • For this step, refer to step 606; details are not repeated here.
  • the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance the target sound signal.
  • the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
  • participants do not need to manually select, and can automatically locate the interference source based on the image, and realize intelligent shielding of the interference source in the conference scene, which improves the conference experience while ensuring the sound quality.
  • FIG. 9 is a flow chart of a sound signal processing method provided by an embodiment of the present application. This method is applied to the sound processing system corresponding to FIG. 4 above, and the sound processing system includes a microphone array, a conference terminal, physical keys on a desktop, a conference touch panel, and a camera. The sound signal processing method is executed by the conference terminal. As shown in Figure 9, the method includes:
  • the conference terminal picks up the sound signal in the sound pickup space through the microphone array.
  • For this step, refer to step 801; details are not repeated here.
  • When the sound processing system is configured, the beam angle range corresponding to each sound signal picked up by the microphone array and the position information of the microphone array in the sound pickup space need to be configured, so as to determine the beam angle range corresponding to each sound signal picked up by the microphone array.
  • the beam angle range corresponding to the sound signal A of the microphone array covers the left half of the sound pickup space.
  • the sound signals corresponding to different beam angle ranges are numbered, so that in the subsequent sound signal processing process, the required sound signal can be selected based on the number.
  • the conference terminal obtains the sound signal from the sound pickup space through the microphone array.
  • Because the microphone array includes a plurality of microphones arranged in a certain spatial structure, the microphone array can determine the angle of a sound source relative to the microphone array, and thereby determine the position of the sound source relative to the microphone array.
  • the conference terminal receives the location selection instruction, and determines the location corresponding to the location selection instruction as the location of the interference source in the sound pickup space.
  • the camera has data processing capability and can detect the collected images, and if the camera detects the first body behavior from the collected images, the camera sends the position selection instruction to the conference terminal. Based on the preconfigured relationship between the image captured by the camera and the position in the sound pickup space, the camera can determine the position of the first body behavior in the sound pickup space according to the position of the first body behavior in the image. Based on this, combined with the position information of the microphone array in the sound pickup space, the angle of the first body behavior relative to the microphone array can be determined. Therefore, the angle of the first body behavior relative to the microphone array is indicated in the position selection instruction. Based on this, the conference terminal receives the position selection instruction from the camera, and determines the angle indicated by the position selection instruction as the angle of the position of the interference source relative to the microphone array.
  • the location selection instruction is triggered based on the selection operation of the location of the interference source in the control device.
  • The location selection instruction indicates the angle of the first body behavior relative to the microphone array.
  • the conference terminal receives the image collected by the camera and detects the image to determine the location of the interference source.
  • the process of determining the location of the interference source includes the following steps 1 to 2:
  • Step 1 The conference terminal detects the image collected by the camera.
  • Step 2 In response to detecting the first body behavior in the image, the conference terminal determines the location of the first body behavior in the sound pickup space as the location of the interference source.
  • The conference terminal can determine the position of the first body behavior in the sound pickup space according to its position in the image. Then, based on the position information of the microphone array in the sound pickup space, it determines the angle of the first body behavior relative to the microphone array as the angle of the position of the interference source relative to the microphone array.
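  • Converting a position in the sound pickup space into an angle relative to the microphone array can be sketched as follows. The function `angle_relative_to_array`, the 2-D coordinate convention, and the `array_heading_deg` parameter (the direction of the array's 0-degree axis) are assumptions made for illustration; the embodiment does not specify an angle convention.

```python
import math


def angle_relative_to_array(source_xy, array_xy, array_heading_deg=0.0):
    """Azimuth of `source_xy` as seen from the array, in degrees within [0, 360)."""
    dx = source_xy[0] - array_xy[0]
    dy = source_xy[1] - array_xy[1]
    azimuth = math.degrees(math.atan2(dy, dx))
    # Express the azimuth relative to the array's own 0-degree axis.
    return (azimuth - array_heading_deg) % 360.0


# Hypothetical positions in metres: array at the origin, gesture up and to the left.
print(angle_relative_to_array((-1.0, 0.0), (0.0, 0.0)))  # prints 180.0
```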
  • the conference terminal determines a beam angle range matching the angle information based on the angle information of the interference source location.
  • the angle information of the location of the interference source refers to the angle of the location of the interference source relative to the microphone array. Based on the angle information, the conference terminal can determine the beam angle range of the microphone array corresponding to the location of the interference source.
  • the conference terminal determines a reference signal from the sound signals picked up by the microphone array based on the beam angle range.
  • The conference terminal obtains, from the multiple sound signals picked up by the microphone array, the sound signal components corresponding to the beam angle range and, based on the characteristics of each component, combines these components to obtain the reference signal.
  • The conference terminal numbers in advance the sound signals corresponding to the different beam angle ranges. The conference terminal then obtains the number of the sound signal corresponding to the beam angle range that matches the angle information of the interference source position, and directly determines the sound signal with that number as the reference signal.
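  • Matching the angle of the interference source to a pre-numbered beam angle range can be sketched as a simple lookup. The table `BEAM_RANGES`, its two entries, the half-open interval convention, and the function names are hypothetical; in particular, treating 90 to 270 degrees as the "left half" is only one possible convention.

```python
# Hypothetical configuration: signal number -> (start_deg, end_deg),
# interpreted as the half-open interval [start, end) in degrees.
BEAM_RANGES = {
    "A": (90.0, 270.0),  # e.g. the left half of the sound pickup space
    "B": (270.0, 90.0),  # the other half; this range wraps through 0 degrees
}


def beam_contains(angle_deg, start_deg, end_deg):
    """Check whether an angle falls inside a range that may wrap past 360."""
    angle = angle_deg % 360.0
    if start_deg <= end_deg:
        return start_deg <= angle < end_deg
    return angle >= start_deg or angle < end_deg


def select_reference_number(angle_deg, beam_ranges):
    """Return the number of the sound signal whose beam covers `angle_deg`."""
    for number, (start_deg, end_deg) in beam_ranges.items():
        if beam_contains(angle_deg, start_deg, end_deg):
            return number
    return None  # no configured beam covers this angle


print(select_reference_number(180.0, BEAM_RANGES))  # prints A
print(select_reference_number(10.0, BEAM_RANGES))   # prints B
```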
  • the conference terminal performs denoising on the reference signal.
  • For this step, refer to step 604; details are not repeated here.
  • the conference terminal determines the first sound signal from the sound signals in the sound pickup space based on the denoised reference signal.
  • For this step, refer to step 605; details are not repeated here.
  • the conference terminal enhances the target sound signal in the first sound signal based on the reference signal.
  • For this step, refer to step 606; details are not repeated here.
  • the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance the target sound signal.
  • the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
  • The method provided by the embodiment of the present application adapts to the spatial arrangement of the microphone array in scenarios where a microphone array is used to pick up sound, and uses the angle information of the interference source to obtain the sound signal within a specified angular range targeted at the interference source. This ensures that the reference signal is representative of the interference source, improves the accuracy of sound signal processing for the interference source, and effectively improves the sound quality.
  • Fig. 10 is a flowchart of a sound signal processing method provided by an embodiment of the present application. This method is applied to the sound processing system corresponding to FIG. 5 above, and the sound processing system includes a plurality of distributed microphones with a positioning function, a conference terminal, a conference touch panel, and a camera. The sound signal processing method is executed by the conference terminal. As shown in Figure 10, the method includes:
  • the conference terminal picks up the sound signal in the sound pickup space through the distributed microphone with positioning function.
  • the multiple distributed microphones with positioning functions perform signal interaction with the conference terminal, and the conference terminal determines the position information of each distributed microphone in the sound pickup space according to the signals received from the multiple distributed microphones .
  • the conference terminal can update the position information of the distributed microphone in real time based on the received signal.
  • the distributed microphone can perform signal interaction with the conference terminal through bluetooth, ultrasonic wave or wireless local area network.
  • multiple distributed microphones maintain time synchronization through continuous signal interaction.
  • the embodiment of the present application provides a schematic diagram of a distributed microphone positioning process, as shown in FIG.
  • Distance $r_i$ ($i = 1, 2, 3, 4$).
  • For the distance calculation process, refer to the following formula (7) to formula (14).
  • $d_{i,12}$ is the difference between the distances from the distributed microphone 1105 to the signal interaction device 1101 and to the signal interaction device 1102;
  • ... the distance difference between ...; $d_{i,34}$ is the difference between the distances from the distributed microphone 1105 to the signal interaction device 1103 and to the signal interaction device 1104;
  • $d_{i,41}$ is the difference between the distances from the distributed microphone 1105 to the signal interaction device 1104 and to the signal interaction device 1101;
  • $c$ is the speed of light.
  • $(x_1, y_1, z_1)$ are the coordinates of the signal interaction device 1101;
  • $(x_2, y_2, z_2)$ are the coordinates of the signal interaction device 1102;
  • $(x_3, y_3, z_3)$ are the coordinates of the signal interaction device 1103;
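  • Formulas (7) to (14) are not reproduced in this text. For orientation only, the quantities listed above are commonly related as in a textbook time-difference-of-arrival formulation, sketched below. This is an assumed general form and not necessarily the formulas of the embodiment: $(x, y, z)$ for the unknown position of the distributed microphone, $k$ for the index of a signal interaction device, and $t_k$ for the signal arrival time measured with device $k$ are symbols introduced here.

```latex
\begin{aligned}
r_k &= \sqrt{(x - x_k)^2 + (y - y_k)^2 + (z - z_k)^2}, \qquad k = 1, 2, 3, 4, \\
d_{i,12} &= r_1 - r_2 = c\,(t_1 - t_2), \\
d_{i,34} &= r_3 - r_4 = c\,(t_3 - t_4), \\
d_{i,41} &= r_4 - r_1 = c\,(t_4 - t_1).
\end{aligned}
```

  • Under this form, each measured distance difference constrains the microphone to a hyperboloid whose foci are the two devices involved, and the position estimate is the point that best satisfies all such constraints.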
  • the conference terminal receives the location selection instruction, and determines the location corresponding to the location selection instruction as the location of the interference source in the sound pickup space.
  • For this step, refer to step 802; details are not repeated here.
  • the conference terminal determines the sound signal from the distributed microphone corresponding to the location of the interference source as a reference signal.
  • For this step, refer to step 803; details are not repeated here.
  • the conference terminal performs denoising on the reference signal.
  • For this step, refer to step 804; details are not repeated here.
  • the conference terminal determines the first sound signal from the sound signals in the sound pickup space based on the denoised reference signal.
  • For this step, refer to step 805; details are not repeated here.
  • the conference terminal enhances the target sound signal in the first sound signal based on the reference signal.
  • For this step, refer to step 806; details are not repeated here.
  • the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance the target sound signal.
  • the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
  • Multiple microphones can be placed freely as required, which greatly reduces scene restrictions during equipment deployment. While the flexibility of equipment deployment in the sound processing system is improved, real-time positioning of the microphones enables accurate positioning of the interference source, so that the sound of the interference source is filtered out of the sound signal more accurately and the sound quality is effectively ensured.
  • Fig. 12 is a schematic structural diagram of an audio signal processing device provided by an embodiment of the present application. As shown in Figure 12, the sound signal processing device includes:
  • the sound pickup module 1201 is used to pick up the sound signal in the sound pickup space through the sound pickup device;
  • a position determining module 1202 configured to determine the position of the interference source in the sound pickup space
  • a signal determination module 1203, configured to determine a reference signal from the sound signal based on the location of the interference source, and the reference signal is used to filter out the sound of the interference source;
  • An enhancement module 1204 configured to enhance the target sound signal based on the reference signal.
  • the location determining module 1202 includes:
  • the first determining unit is configured to receive a location selection instruction, and determine the location corresponding to the location selection instruction as the location of the interference source in the sound pickup space.
  • the location selection instruction is triggered based on a selection operation of the location of the interference source in the control device.
  • the position selection instruction is triggered by an image acquisition device when the first body behavior is detected in the collected image, and the image acquisition device is used to perform image acquisition for the sound pickup space , the first body behavior is used to indicate to mute the location.
  • the location determining module 1202 includes:
  • An image detection unit configured to detect the target image collected by the image acquisition device, and the image acquisition device is used for image acquisition for the sound pickup space;
  • a second determination unit, configured to determine, in response to detecting a first body behavior in the target image, the position of the first body behavior in the sound pickup space as the position of the interference source, where the first body behavior is used to indicate muting of the position where it is located.
  • the device further includes:
  • a third determining unit, configured to determine, in response to detecting a second body behavior in the target image, the position of the second body behavior in the sound pickup space as the position of the target, where the second body behavior is used to indicate that the target sound signal is to be enhanced.
  • the device further includes:
  • a tracking unit configured to track the position of the interference source
  • the signal determination module is used for:
  • a reference signal is re-determined from the sound signal.
  • the sound pickup device includes multiple microphones
  • the signal determination module is configured to:
  • the sound signal originating from the microphone corresponding to the location of the interference source is determined as a reference signal.
  • the multiple microphones have a positioning function.
  • the sound pickup device is a microphone array
  • the signal determining module 1203 is configured to:
  • a reference signal is determined from the sound signals picked up by the microphone array.
  • the enhancement module 1204 includes:
  • a signal determining unit configured to determine a first sound signal from sound signals in the sound pickup space based on the reference signal, the signal energy of the first sound signal is less than the signal energy of the reference signal, and, the correlation between the first sound signal and the reference signal is greater than a correlation threshold;
  • An enhancement unit configured to enhance a target sound signal in the first sound signal based on the reference signal.
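  • The selection criterion of the signal determining unit above (signal energy below that of the reference signal, and correlation with the reference signal above a threshold) can be sketched as follows. The helper names, the use of zero-lag normalized correlation as the correlation measure, and the sample values are assumptions; the embodiment does not state how the correlation is computed, and a zero-lag measure ignores propagation delay between microphones.

```python
import math


def energy(signal):
    """Sum of squared samples."""
    return sum(sample * sample for sample in signal)


def correlation(a, b):
    """Absolute zero-lag normalized correlation, in the range [0, 1]."""
    denom = math.sqrt(energy(a) * energy(b))
    if denom == 0.0:
        return 0.0
    return abs(sum(x * y for x, y in zip(a, b))) / denom


def select_first_signals(signals, reference, threshold):
    """Names of signals that are weaker than, yet correlated with, the reference."""
    ref_energy = energy(reference)
    return [
        name
        for name, sig in signals.items()
        if energy(sig) < ref_energy and correlation(sig, reference) > threshold
    ]


reference = [1.0, -1.0, 1.0, -1.0]
signals = {
    "mic2": [0.5, -0.5, 0.5, -0.5],  # weaker copy of the interference
    "mic3": [0.5, 0.5, -0.5, -0.5],  # weaker but uncorrelated
    "mic4": [2.0, -2.0, 2.0, -2.0],  # correlated but stronger than the reference
}
print(select_first_signals(signals, reference, threshold=0.6))  # prints ['mic2']
```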
  • the enhancing unit is used for:
  • When the sound signal processing device provided in the above embodiment performs sound signal processing, the division into the above functional modules is used only as an example for illustration. In practical applications, the above functions can be assigned to different functional modules as required; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the sound signal processing device provided in the above embodiment and the sound signal processing method embodiment belong to the same idea, and the specific implementation process thereof is detailed in the method embodiment, and will not be repeated here.
  • the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance the target sound signal.
  • the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
  • FIG. 13 is a schematic diagram of a hardware structure of an audio signal processing device provided by an embodiment of the present application.
  • the audio signal processing device 1300 includes a memory 1301 , a processor 1302 , a communication interface 1303 and a bus 1304 .
  • the memory 1301 , the processor 1302 , and the communication interface 1303 are connected to each other through a bus 1304 .
  • The memory 1301 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 1301 can store at least one piece of program code, and when the program code stored in the memory 1301 is executed by the processor 1302, the sound signal processing device can implement the above sound signal processing method.
  • the memory 1301 may also store various types of data, including but not limited to images and audio signals, which are not limited in this embodiment of the present application.
  • The processor 1302 may be a network processor (NP), a central processing unit (CPU), an application-specific integrated circuit (ASIC), or an integrated circuit for controlling program execution of the solutions of this application.
  • the processor 1302 may be a single-core (single-CPU) processor, or a multi-core (multi-CPU) processor. The number of the processor 1302 may be one or more.
  • the communication interface 1303 uses a transceiver module such as a transceiver to implement communication between the sound signal processing device 1300 and other devices or communication networks. For example, sound signals can be acquired through the communication interface 1303 .
  • the memory 1301 and the processor 1302 may be provided separately, or may be integrated together.
  • the bus 1304 may include a path for transferring information between various components of the sound signal processing device 1300 (eg, memory 1301 , processor 1302 , communication interface 1303 ).
  • Terms such as "first" and "second" are used to distinguish between identical or similar items whose functions and effects are basically the same. It should be understood that there is no logical or temporal dependency among "first", "second", and "nth", and that these terms do not limit quantity or order of execution. It should also be understood that although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, a first microphone could be termed a second microphone, and, similarly, a second microphone could be termed a first microphone, without departing from the scope of the various described examples. Both the first microphone and the second microphone may be microphones, and in some cases may be separate and distinct microphones.
  • In the present invention, the term "at least one" means one or more, and the term "multiple" means two or more; for example, a plurality of microphones means two or more microphones.
  • All or part of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof.
  • When implemented using software, they may be implemented in whole or in part in the form of a program product.
  • The program product includes one or more program instructions. When the program instructions are loaded and executed on the sound signal processing device, the processes or functions according to the embodiments of the present invention are generated in whole or in part.

Abstract

A sound signal processing method and apparatus, and a device and a storage medium. In the method, on the basis of the position of an interference source in a sound pickup space, a reference signal is determined from a sound signal in the sound pickup space, and on the basis of the reference signal, the sound of the interference source is then filtered out from the sound signal, such that a target sound signal is enhanced. In the method, sound signal processing is performed according to the position of an interference source, and the sound of the interference source is shielded in a targeted manner, such that a target sound signal is enhanced, and the sound quality is improved.

Description

Sound signal processing method, apparatus, device, and storage medium
Technical Field
The present application relates to the technical field of audio processing, and in particular to a sound signal processing method, apparatus, device, and storage medium.
Background
In a multi-person conference, some interfering sounds inevitably occur in the venue while a speaker is talking, for example, private conversations between participants and sudden mobile phone ringtones. When participants need to have a private conversation, a sound pickup device can be turned off so that it no longer picks up sound signals in the area where the participants are located, thereby avoiding interference with the speaker's voice.
However, a sound signal is inevitably picked up by multiple adjacent sound pickup devices at the same time, so the speaker's voice is still interfered with, which greatly degrades the sound quality of the conference.
Summary
The present application provides a sound signal processing method, apparatus, device, and storage medium that can effectively improve sound quality. The technical solutions are as follows:
In a first aspect, a sound signal processing method is provided, the method including:
picking up a sound signal in a sound pickup space through a sound pickup device;
determining a position of an interference source in the sound pickup space;
determining a reference signal from the sound signal based on the position of the interference source, the reference signal being used to filter out the sound of the interference source;
enhancing a target sound signal based on the reference signal.
The interference source is a sound source that, among the multiple sound sources present in the sound pickup space, is considered to cause interference, for example, participants having a private conversation during a conference. By obtaining the position of the interference source in the sound pickup space, corresponding sound signal processing can be performed for that interference source. The position of the interference source is determined in different ways depending on the system deployment. For example, when multiple microphones are deployed, the position of the interference source is determined based on the number of the microphone corresponding to the interference source; when a microphone array is deployed, the position of the interference source is determined based on the angle of the interference source relative to the microphone array.
The target sound signal is the sound signal corresponding to the sound source of primary interest among the multiple sound sources present in the sound pickup space, for example, the sound signal corresponding to the speaker in a conference.
Enhancing the target sound signal means suppressing the reference signal in the sound signal so that the target sound signal is enhanced. For example, the proportion of the part of the sound signal that corresponds to the reference signal is reduced, which increases the proportion of the target sound signal in the sound signal and thereby enhances the target sound signal.
In the technical solution provided by the embodiments of the present application, a reference signal is determined from the sound signal in the sound pickup space based on the position of the interference source in the sound pickup space, and the sound of the interference source is then filtered out of the sound signal based on the reference signal, so as to enhance the target sound signal. With this technical solution, sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
In a possible implementation, determining the position of the interference source in the sound pickup space includes:
receiving a position selection instruction, and determining the position corresponding to the position selection instruction as the position of the interference source in the sound pickup space.
With this technical solution, the position selection instruction offers multiple ways to determine the position of the interference source, and participants can set it themselves as needed, which makes the sound signal processing method more practical.
In a possible implementation, the position selection instruction is triggered by a selection operation, performed on a control device, on the position where the interference source is located.
The control device is used to select the position of the interference source. For example, the control device is integrated into a microphone, or the control device may be a conference touch panel.
With this technical solution, the various control devices actually deployed in a conference scenario provide multiple ways to determine the position of the interference source, which keeps positioning accurate while further improving the practicality of the sound signal processing method.
In a possible implementation, the position selection instruction is triggered by an image acquisition device when it detects a first body action in a captured image. The image acquisition device is used to capture images of the sound pickup space, and the first body action indicates that the position is to be muted.
The first body action indicates that the position where it occurs is to be muted, for example, a participant holding an index finger upright against the lips.
With this technical solution, participants do not need to make a manual selection. The interference source is located automatically from the image, so interference sources are masked intelligently in the conference scenario, which improves the conference experience while preserving sound quality.
Furthermore, the position of the interference source can be obtained directly from the position selection instruction, which reduces the amount of data involved in the computation and improves the efficiency of sound signal processing.
In a possible implementation, determining the position of the interference source in the sound pickup space includes:
performing detection on a target image captured by an image acquisition device, where the image acquisition device is used to capture images of the sound pickup space; and
in response to detecting a first body action in the target image, determining the position of the first body action in the sound pickup space as the position of the interference source, where the first body action indicates that the position is to be muted.
With this technical solution, the position of the interference source is determined from real-time images, which keeps the position accurate and further improves sound quality.
In a possible implementation, determining, in response to detecting the first body action in the target image, the position of the first body action in the sound pickup space as the position of the interference source includes:
in response to detecting the first body action in the target image, obtaining the position of the first body action in the target image; and
determining, based on the position of the first body action in the target image and the spatial position of the image acquisition device in the sound pickup space, the spatial position of the first body action in the sound pickup space as the position of the interference source.
In a possible implementation, the method further includes:
in response to detecting a second body action in the target image, determining the position of the second body action in the sound pickup space as the position of the target, where the second body action indicates that the target sound signal is to be enhanced.
The second body action indicates that the target sound signal is to be enhanced, for example, a participant holding an index finger horizontally against the lips to indicate a need to speak.
With this technical solution, the position corresponding to the target sound signal is determined from the second body action, so the target sound signal can be enhanced selectively, which improves sound quality.
In a possible implementation, the method further includes:
tracking the position of the interference source;
and determining the reference signal from the sound signal based on the position of the interference source includes:
when tracking shows that the position of the interference source has changed, re-determining the reference signal from the sound signal.
With this technical solution, the interference source is locked onto once it has been determined, and its position is then determined from real-time position changes. Capturing position changes promptly keeps the position of the interference source accurate, so that sound signal processing can always be directed at the interference source in a changing real-world conference scenario, preserving sound quality.
In a possible implementation, the sound pickup device includes multiple microphones, and determining the reference signal from the sound signal based on the position of the interference source includes:
determining the sound signal from the microphone corresponding to the position of the interference source as the reference signal.
With this technical solution, when sound is picked up by multiple microphones, a reference signal that is representative of the interference source can be determined from the microphone corresponding to that source, so the sound of the interference source can be filtered out more effectively based on the reference signal, improving sound quality.
In a possible implementation, the multiple microphones have a positioning function.
With this technical solution, the microphones can be placed freely as needed, which greatly relaxes scenario constraints during device deployment. Besides making device deployment in the sound processing system more flexible, locating the microphones in real time allows the interference source to be located accurately, so its sound can be filtered out of the sound signal more precisely, preserving sound quality.
In a possible implementation, the sound pickup device is a microphone array, and determining the reference signal from the sound signal based on the position of the interference source includes:
determining, based on angle information of the position of the interference source, a beam angle range that matches the angle information; and
determining the reference signal, based on the beam angle range, from the sound signals picked up by the microphone array.
The angle information of the position of the interference source is the angle of the interference source relative to the microphone array.
The beam angle range is the angular range covered by a beam formed by the microphone array. Given a specified beam angle range, the sound signal within the pickup range at the specified angle to the microphone array can be determined.
When a microphone array is used for sound pickup, the method provided by the embodiments of this application adapts to the spatial arrangement of the array and uses the angle information of the interference source to obtain the sound signal within a specified angular range targeted at that source. This keeps the reference signal representative of the interference source, makes sound signal processing directed at the interference source more accurate, and improves sound quality.
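The matching of an interference-source angle to a beam angle range can be illustrated with a minimal sketch. The beam layout, the half-open range convention, and the function name `pick_beam` are assumptions made for illustration; the application does not specify how beams are indexed or how the array forms them.

```python
def pick_beam(beams, source_angle_deg):
    """Return the index of the beam whose angle range contains the source angle.

    `beams` is a list of (start_deg, end_deg) ranges, each covering
    [start, end). A range with start > end wraps through 0 degrees.
    Returns None when no beam covers the angle.
    """
    angle = source_angle_deg % 360.0
    for index, (start, end) in enumerate(beams):
        if start <= end:
            hit = start <= angle < end
        else:  # range wraps around 0 degrees, e.g. (330, 30)
            hit = angle >= start or angle < end
        if hit:
            return index
    return None


# Hypothetical layout: three 60-degree beams in front of the array.
beams = [(330.0, 30.0), (30.0, 90.0), (90.0, 150.0)]
index = pick_beam(beams, 45.0)  # interference source at 45 degrees
```

The signal of the selected beam would then serve as the reference signal; a source angle outside every beam yields `None`, in which case no reference is chosen.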
In a possible implementation, after the reference signal is determined from the sound signal based on the position of the interference source, the method further includes:
determining a noise threshold based on the reference signal;
determining a signal-to-noise ratio of the reference signal based on the noise threshold and the reference signal; and
setting to 0 any reference signal whose signal-to-noise ratio is less than a target threshold.
In the embodiments of this application, this technical solution mutes the non-voice portions of the reference signal, yielding a reference signal that contains cleaner human voice, which makes subsequent sound signal processing based on the reference signal more efficient and in turn improves sound quality.
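The three steps above can be sketched frame by frame. This is only an illustration under stated assumptions: the application does not say how the noise threshold is derived, so the sketch takes the quietest frame's RMS level as the noise threshold, and the names `gate_reference`, `frame_len`, and `target_snr_db` are hypothetical.

```python
import math


def frame_rms(frame):
    """RMS level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def gate_reference(reference, frame_len=4, target_snr_db=6.0, eps=1e-12):
    """Zero out frames of the reference signal whose SNR is below the target."""
    if not reference:
        return []
    frames = [reference[i:i + frame_len]
              for i in range(0, len(reference), frame_len)]
    levels = [frame_rms(f) for f in frames]
    # Assumption: the quietest frame approximates the noise level.
    noise_threshold = max(min(levels), eps)
    gated = []
    for frame, level in zip(frames, levels):
        snr_db = 20.0 * math.log10(max(level, eps) / noise_threshold)
        if snr_db >= target_snr_db:
            gated.extend(frame)                # keep voiced frame
        else:
            gated.extend([0.0] * len(frame))   # mute low-SNR frame
    return gated


quiet = [0.01, -0.01, 0.01, -0.01]
voiced = [1.0, -1.0, 1.0, -1.0]
gated = gate_reference(quiet + voiced)
```

In this example the quiet frame sits at 0 dB relative to the estimated noise level and is muted, while the voiced frame sits at 40 dB and is kept.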
In a possible implementation, enhancing the target sound signal based on the reference signal includes:
determining, based on the reference signal, a first sound signal from the sound signals in the sound pickup space, where the signal energy of the first sound signal is less than the signal energy of the reference signal, and the correlation between the first sound signal and the reference signal is greater than a correlation threshold; and
enhancing, based on the reference signal, the target sound signal in the first sound signal.
The magnitude of the signal energy indicates, to some extent, the strength of the human voice in a sound signal.
The correlation between signals reflects the degree to which the signals affect each other.
With this technical solution, the first sound signal, which is strongly affected by the interference source, can be identified among multiple channels of sound signals, and the sound of the interference source can then be filtered out of that first sound signal specifically. More accurate filtering improves sound quality. In a real conference scenario, participants who need to hold a private conversation may regard themselves as the interference source. In that case, this technical solution protects the privacy of their conversation in addition to improving sound quality, which improves the user experience.
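The two selection criteria (lower energy than the reference, and correlation above a threshold) can be sketched as follows. The zero-lag normalized correlation, the threshold value, and the channel names are assumptions chosen for illustration; the application does not fix a particular correlation measure.

```python
import math


def energy(signal):
    """Signal energy as the sum of squared samples."""
    return sum(x * x for x in signal)


def correlation(a, b):
    """Zero-lag normalized cross-correlation, in the range [-1, 1]."""
    denom = math.sqrt(energy(a) * energy(b))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


def select_first_signals(channels, reference, corr_threshold=0.5):
    """Names of channels that are weaker than, but correlated with, the reference."""
    ref_energy = energy(reference)
    return [name for name, signal in channels.items()
            if energy(signal) < ref_energy
            and abs(correlation(signal, reference)) > corr_threshold]


reference = [1.0, -1.0, 1.0, -1.0]       # picked up near the interference source
channels = {
    "mic2": [0.5, -0.5, 0.5, -0.5],      # weaker copy of the reference
    "mic3": [0.5, 0.5, -0.5, -0.5],      # weaker but uncorrelated
    "mic4": [2.0, -2.0, 2.0, -2.0],      # correlated but stronger
}
selected = select_first_signals(channels, reference)
```

Only `mic2` satisfies both conditions here, so it is the channel from which the interference would be filtered.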
In a possible implementation, enhancing the target sound signal in the first sound signal based on the reference signal includes:
using the reference signal as one input of a filter and the first sound signal as the other input of the filter, filtering out, by the filter, the part of the first sound signal that is correlated with the reference signal so as to enhance the target sound signal in the first sound signal, and outputting a filtering result.
With this technical solution, filtering the first sound signal based on the reference signal specifically reduces the influence of the interference source's sound on the first sound signal.
In a possible implementation, the filter includes a first filter and a second filter,
and filtering out, by the filter, the part of the first sound signal that is correlated with the reference signal so as to enhance the target sound signal in the first sound signal, and outputting the filtering result includes:
obtaining an estimated signal of the reference signal by using the first filter, where the parameters of the first filter are determined based on the parameters of the second filter, and the parameters of the second filter are determined based on differences between multiple filtering results; and
filtering the estimated signal out of the first sound signal, and outputting the filtering result.
In this technical solution, the adaptive filter adjusts its parameters through an adaptive algorithm during filtering to obtain a better filtering effect.
In a possible implementation, the method further includes:
adjusting the parameters of the second filter based on the difference between the n-th filtering result and the (n-1)-th filtering result of the second filter, where n is an integer greater than 1; and
when the adjusted parameters of the second filter satisfy a convergence condition, configuring the adjusted parameters of the second filter to the first filter.
With this technical solution, the parameters of the adaptive filter converge faster, which improves filtering efficiency.
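The two-filter arrangement can be illustrated with a small sketch. Several points are assumptions rather than the method of the application: the second filter is adapted here with a normalized least-mean-squares (NLMS) update, which the application does not name, and the convergence condition is approximated by the size of the latest parameter update rather than by a comparison of successive filtering results. The function name and parameters are hypothetical.

```python
def dual_filter_cancel(reference, primary, taps=1, mu=1.0, tol=1e-3, eps=1e-8):
    """Remove the reference-correlated part of `primary`.

    first filter  (`fg`): produces the output, copied from the second filter.
    second filter (`bg`): adapts continuously (NLMS, an assumed algorithm).
    Returns (filtered_signal, first_filter_parameters).
    """
    fg = [0.0] * taps
    bg = [0.0] * taps
    out = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples, newest first.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        # First filter estimates the interference and subtracts it.
        estimate = sum(w * v for w, v in zip(fg, x))
        out.append(primary[n] - estimate)
        x_energy = sum(v * v for v in x)
        if x_energy < 1e-12:
            continue  # reference is silent: nothing to adapt on
        # Second filter adapts on its own residual.
        residual = primary[n] - sum(w * v for w, v in zip(bg, x))
        step = [mu * residual * v / (x_energy + eps) for v in x]
        bg = [w + s for w, s in zip(bg, step)]
        # Assumed convergence test: the latest update is small.
        if max(abs(s) for s in step) < tol:
            fg = list(bg)
    return out, fg


reference = [1.0, -1.0, 2.0, -2.0, 1.0, -1.0]
primary = [0.5 * r for r in reference]   # interference leaks in at half level
filtered, params = dual_filter_cancel(reference, primary)
```

In this example the second filter learns the leakage gain of 0.5 after one update, the parameters are copied to the first filter on the next sample, and the remaining output samples are close to zero. Keeping the output path on the first filter means that transient mis-adjustments of the second filter do not reach the output.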
In a possible implementation, the method further includes:
when the attenuation of the first sound signal between before and after filtering is greater than an attenuation threshold, clipping the filtered first sound signal.
With this technical solution, the quality of the sound signal is preserved when the filtered first sound signal has been weakened to the point of distortion.
According to a second aspect, a sound signal processing apparatus is provided. The apparatus includes multiple functional modules configured to perform the corresponding steps of the sound signal processing method provided in the first aspect.
In the technical solution provided by the embodiments of this application, a reference signal is determined from the sound signal in the sound pickup space based on the position of the interference source, and the sound of the interference source is then filtered out of the sound signal based on the reference signal, thereby enhancing the target sound signal. Because the processing is driven by the position of the interference source, the sound of that source can be masked selectively, which enhances the target sound signal and improves sound quality.
According to a third aspect, a sound signal processing device is provided. The device includes a processor and a memory. The memory stores at least one segment of program code, which is loaded by the processor to perform the foregoing sound signal processing method.
According to a fourth aspect, a computer-readable storage medium is provided. The storage medium stores at least one segment of program code, which is used to perform the foregoing sound signal processing method.
According to a fifth aspect, a computer program product is provided. When the computer program product runs on a sound signal processing device, the device is caused to perform the foregoing sound signal processing method.
Brief Description of the Drawings
FIG. 1 is a schematic architecture diagram of a sound processing system provided by an embodiment of this application;
FIG. 2 is a schematic deployment diagram of a sound processing system provided by an embodiment of this application;
FIG. 3 is a schematic deployment diagram of a sound processing system provided by an embodiment of this application;
FIG. 4 is a schematic deployment diagram of a sound processing system provided by an embodiment of this application;
FIG. 5 is a schematic deployment diagram of a sound processing system provided by an embodiment of this application;
FIG. 6 is a flowchart of a sound signal processing method provided by an embodiment of this application;
FIG. 7 is a schematic diagram of an adaptive filter provided by an embodiment of this application;
FIG. 8 is a flowchart of a sound signal processing method provided by an embodiment of this application;
FIG. 9 is a flowchart of a sound signal processing method provided by an embodiment of this application;
FIG. 10 is a flowchart of a sound signal processing method provided by an embodiment of this application;
FIG. 11 is a schematic diagram of a distributed microphone positioning process provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of a sound signal processing apparatus provided by an embodiment of this application;
FIG. 13 is a schematic diagram of the hardware structure of a sound signal processing device provided by an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
Before the technical solutions provided by the embodiments of this application are introduced, the key terms used in this application are explained.
Root mean square (RMS): the squares of all discrete values of a signal are summed, the sum is averaged, and the square root of the average is taken. The result is the root mean square of the signal. In physics, the root mean square is the effective value of a signal (for example, a current signal or a voltage signal) and is used to characterize the energy of the signal.
Signal-to-noise ratio (SNR): the ratio of signal to noise in an electronic device or electronic system, for example, the ratio of signal energy to noise energy. The signal is the signal that comes from outside the device and needs to be processed by it. Noise is the irregular additional signal (or information) that arises after passing through the device, is not present in the original signal, and does not vary with the original signal.
Subband: subband coding converts the original signal from the time domain to the frequency domain, divides it into several subbands, and digitally encodes each subband separately. A bank of band-pass filters splits the original signal into several subbands. Each subband corresponds to a specified bandwidth, that is, to specified signal frequencies.
Background noise suppression (automatic noise suppression, ANS): a technique that detects fixed-frequency background noise (for example, fan noise and air-conditioner noise) and filters it out automatically so that participants' voices come through clearly. It is widely used in sound signal processing for scenarios such as video conferences and voice conferences.
Cross-correlation (CC): the result of a cross-correlation operation is a measure of the similarity between two signals.
Adaptive filter (ADF): an adaptive filter adjusts its parameters adaptively, based on the characteristics of the input signal and the difference from a desired signal, to maintain the filtering effect. Adaptive filters are therefore widely used in system identification, signal prediction, and noise cancellation.
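The RMS, SNR, and cross-correlation definitions above can be written out directly. The sketch below is illustrative only: the function names are hypothetical, the SNR is expressed in decibels as an energy ratio, and the cross-correlation is evaluated at zero lag and normalized to the range [-1, 1].

```python
import math


def rms(samples):
    """Square every sample, average the squares, take the square root."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(signal, noise):
    """Ratio of signal energy to noise energy, in decibels."""
    noise_level = rms(noise)
    if noise_level == 0.0:
        raise ValueError("noise must not be all zeros")
    return 20.0 * math.log10(rms(signal) / noise_level)


def normalized_cross_correlation(a, b):
    """Zero-lag cross-correlation scaled to [-1, 1] as a similarity measure."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


level = rms([1.0, -1.0, 1.0, -1.0])
ratio = snr_db([2.0, -2.0], [1.0, -1.0])
similarity = normalized_cross_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A signal that is a scaled copy of another has a normalized cross-correlation of 1, and doubling the amplitude relative to the noise corresponds to roughly 6 dB of SNR.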
The technical solutions provided by the embodiments of this application are introduced next.
The sound signal processing method provided by the embodiments of this application is applied to a sound signal processing device. For example, in a conference scenario such as a video conference or a voice conference, the sound signal processing device may be a conference terminal, a smart speaker, or the like. The sound signal processing device processes the sound signal that a sound pickup device picks up from a sound pickup space. For example, in a conference scenario, the conference terminal performs noise reduction on the sound signals in the conference room.
The sound pickup device picks up sound signals and can take several forms, for example, a microphone or a microphone array. The microphone may be a fixed microphone, such as a microphone embedded in a tabletop, or a movable microphone. A microphone array is an array structure obtained by arranging multiple microphones in a particular spatial layout. Using the spatial characteristics of that layout, a microphone array can process sound signals from multiple directions to obtain the sound signals within individual angular ranges. Different forms of sound pickup device can be chosen for different usage scenarios, and the embodiments of this application do not limit the form of the sound pickup device.
The sound pickup space is a preconfigured three-dimensional pickup region. The sound pickup space may be a closed space, that is, a space of limited size. For example, the sound pickup space may be a cuboid, in which case its size can be expressed as a length, a width, and a height. Alternatively, the sound pickup space may be an open space, for example, one whose height is not limited. The size and shape of the sound pickup space can be set according to the pickup requirements or the pickup scenario, and the embodiments of this application do not limit them.
FIG. 1 is a schematic architecture diagram of a sound processing system provided by an embodiment of this application. As shown in FIG. 1, the sound processing system includes a sound pickup device, a sound signal processing device, and a sound pickup control device. The sound pickup control device is used to determine the position of the interference source in the sound pickup space. In some embodiments, the sound pickup control device includes a control device used to select the position of the interference source. For example, the control device is integrated into a microphone, or the control device may be a conference touch panel. In some embodiments, the sound pickup control device includes an image acquisition device used to capture images of the sound pickup space, for example, a camera in the conference room. The sound signal processing device obtains the sound signal picked up by the sound pickup device, determines the position of the interference source in the sound pickup space by means of the sound pickup control device, and then determines a reference signal based on that position, so that the sound of the interference source is filtered out of the sound signal by using the reference signal and the target sound signal is enhanced. It can be understood that the sound processing system shown in FIG. 1 is only an example and does not limit the sound processing systems to which the solution of this application applies.
In the embodiments of this application, because sound processing systems can be composed differently, they can also be deployed differently. Based on FIG. 2 to FIG. 5, the embodiments of this application illustrate four deployments of a sound processing system under the architecture shown in FIG. 1. The technical solution of this application is described below using an example in which the sound signal processing device is a conference terminal.
FIG. 2 is a schematic deployment diagram of a sound processing system provided by an embodiment of this application. The sound processing system is applied in a conference scenario, and the sound pickup space is the conference room. As shown in FIG. 2, the sound processing system includes multiple microphones 210 serving as the sound pickup device, a conference terminal 220 serving as the sound signal processing device, and a conference touch panel 230 serving as the sound pickup control device. The conference terminal 220 is deployed on a wall of the conference room, and the microphones 210 are deployed at designated positions on the conference table. The conference terminal 220 can obtain the sound signals that the microphones 210 pick up from the conference room. The microphones 210 have physical buttons 211. In response to a selection operation on the physical button 211 of any microphone, that microphone returns to the conference terminal a selection instruction for the microphone, and the selected microphone is determined as the microphone corresponding to the interference source. In some embodiments, each microphone has an indicator light that shows the selection state of the microphone. For example, a lit indicator shows that the microphone has been selected, and that microphone is determined as the microphone corresponding to the interference source. Optionally, the conference touch panel 230 provides a microphone selection function. In response to a selection operation on any microphone on the conference touch panel, the conference touch panel 230 returns to the conference terminal 220 a selection instruction for the microphone, which indicates that the microphone is to be determined as the microphone corresponding to the interference source. In some embodiments, the conference touch panel 230 can control the state of the indicator light of a microphone to show the microphone's selection state. For example, the conference touch panel turns on the indicator light of a microphone to show that the microphone has been selected, and that microphone is determined as the microphone corresponding to the interference source. It should be noted that the above is merely an example, and the embodiments of this application do not limit where the conference terminal is deployed. For example, the conference terminal may be deployed on a movable stand in the conference room.
FIG. 3 is a schematic deployment diagram of another sound processing system provided by an embodiment of the present application. The sound processing system is applied in a conference scenario, and the sound pickup space is the conference venue. As shown in FIG. 3, the sound processing system includes: a plurality of microphones 310 serving as sound pickup devices; a conference terminal 320 serving as the sound signal processing device; and a conference touch panel 330 and a camera 340 serving as sound pickup control devices. The plurality of microphones 310 have physical buttons 311. In the sound processing system corresponding to FIG. 3, the components other than the camera 340 are the same as those of the sound processing system corresponding to FIG. 2 and are not described again here. The camera 340 is deployed on a wall of the venue and is used to capture images of the venue. In some embodiments, the camera is an external camera connected to the conference terminal. In other embodiments, the camera is a built-in camera of the conference terminal. Optionally, the camera has data processing capability: it can process the captured images and send a selection instruction for a microphone to the conference terminal, instructing the conference terminal to process the sound signal from that microphone accordingly. It should be noted that the above is merely an exemplary description, and the embodiments of the present application do not limit the deployment position of the camera; for example, the camera may also be suspended from the ceiling of the venue.
FIG. 4 is a schematic deployment diagram of yet another sound processing system provided by an embodiment of the present application. The sound processing system is applied in a conference scenario, and the sound pickup space is the conference venue. As shown in FIG. 4, the sound processing system includes: a microphone array 410 serving as the sound pickup device; a conference terminal 420 serving as the sound signal processing device; and desktop physical buttons 430, a conference touch panel 440, and a camera 450 serving as sound pickup control devices. In some embodiments, the microphone array and the conference terminal are physically integrated into one device, that is, the conference terminal has a built-in microphone array. In other embodiments, the microphone array and the conference terminal are two physically separate devices. Optionally, the position at which the devices are deployed can be chosen according to the actual conditions of the venue, so that the pickup range of the microphone array covers the venue evenly; for example, a conference terminal with a built-in microphone array is deployed at the middle of a wall of the venue. The conference terminal 420 can obtain, from the microphone array 410, sound signals from the various angular ranges of the venue. The desktop physical buttons 430 are used to select positions in the venue. In some embodiments, in response to a selection operation on any desktop physical button 430, that button returns to the conference terminal 420 a selection instruction for the position of the button in the sound pickup space. In some embodiments, each desktop physical button 430 has a corresponding indicator light that indicates the selection state of the position corresponding to the button; for example, a lit indicator light indicates that the corresponding position is selected, in which case that position is determined as the position corresponding to the interference source. Optionally, the conference touch panel 440 provides a function for selecting a position in the venue. In response to a selection operation on any position on the conference touch panel, the conference touch panel 440 returns a selection instruction for that position in the venue to the conference terminal 420. In some embodiments, the conference touch panel 440 can control the state of the indicator light to indicate the selection state of a position in the venue; for example, when the conference touch panel turns on an indicator light, the position corresponding to that indicator light is selected and is determined as the position corresponding to the interference source. For the camera 450, refer to the description of the camera 340 in the sound processing system corresponding to FIG. 3; details are not repeated here. In some embodiments, the microphone array 410, the conference terminal 420, and the camera 450 are integrated into one device, that is, the conference terminal has a built-in microphone array and a built-in camera.
FIG. 5 is a schematic deployment diagram of still another sound processing system provided by an embodiment of the present application. The sound processing system is applied in a conference scenario, and the sound pickup space is the conference venue. As shown in FIG. 5, the sound processing system includes: a plurality of distributed microphones 510 with a positioning function serving as sound pickup devices; a conference terminal 520 serving as the sound signal processing device; and desktop physical buttons 530, a conference touch panel 540, and a camera 550 serving as sound pickup control devices. The distributed microphones 510 are placed at arbitrary positions on the conference table in front of the conference terminal, and their positions can be updated in real time in the conference terminal 520. Optionally, the conference touch panel obtains the positions of the distributed microphones in order to provide a function for selecting a distributed microphone. In response to a selection operation on any distributed microphone on the conference touch panel, the conference touch panel returns a selection instruction for that distributed microphone to the conference terminal, indicating that the distributed microphone is selected; the distributed microphone is then determined as the distributed microphone corresponding to the interference source. Optionally, the desktop physical buttons 530 are used to select distributed microphones in the venue. In some embodiments, in response to a selection operation on any desktop physical button 530, that button returns to the conference terminal 520 a selection instruction for the position of the button in the sound pickup space. Based on the position of the desktop physical button and the positions of the plurality of distributed microphones, the conference terminal selects the distributed microphone closest to the desktop physical button. In some embodiments, each distributed microphone 510 has a corresponding indicator light that indicates the selection state of that distributed microphone; for example, a lit indicator light indicates that the corresponding distributed microphone is selected, in which case it is determined as the distributed microphone corresponding to the interference source. For the camera 550, refer to the description of the camera 340 in the sound processing system corresponding to FIG. 3; details are not repeated here.
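The nearest-microphone selection described for the FIG. 5 deployment can be sketched as follows. This is a minimal illustration only: the function name `nearest_microphone`, the microphone identifiers, and the coordinate values are hypothetical, and Euclidean distance is an assumed distance measure that the source does not specify.

```python
import math


def nearest_microphone(button_pos, mic_positions):
    """Return the id of the microphone closest to the pressed desktop button.

    button_pos: (x, y, z) of the desktop physical button (hypothetical layout).
    mic_positions: dict mapping microphone id -> (x, y, z), e.g. as reported
    by the positioning function of the distributed microphones.
    """
    if not mic_positions:
        raise ValueError("no microphone positions available")
    # Assumption: "closest" means smallest Euclidean distance.
    return min(
        mic_positions,
        key=lambda mic_id: math.dist(button_pos, mic_positions[mic_id]),
    )


# Illustrative positions in metres (made up for this sketch).
mics = {
    "mic_1": (0.5, 1.0, 0.75),
    "mic_2": (2.0, 1.0, 0.75),
    "mic_3": (3.5, 1.0, 0.75),
}
selected = nearest_microphone((2.2, 0.8, 0.75), mics)
```

Because the microphone positions may change during a meeting, such a lookup would presumably be run against the most recently reported positions each time a button is pressed.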
It should be noted that, in the sound processing systems of FIG. 1 to FIG. 5, the devices may transmit data to one another by wireless communication or by wired communication, which is not limited in the embodiments of the present application.
In some embodiments, in any of the above sound processing systems, the sound signal processing device can obtain information such as the size and shape of the sound pickup space and the position information of each device in the sound pickup space, for example: the length, width, and height of the sound pickup space; the positions of the microphones (or microphone array), the conference terminal, and the camera in the sound pickup space; and the numbers of the plurality of microphones.
In some embodiments, the conference terminal in the above sound processing system serves as a local conference terminal and can send the processed sound signal to a remote conference terminal. A remote conference terminal is a conference terminal that participates in the same conference as the local conference terminal and is deployed in a different area. Optionally, the local and remote conference terminals are connected through a multimedia control platform. The local conference terminal may send the enhanced sound signal to the multimedia control platform, which mixes and encodes the received sound signals and then sends them to the remote conference terminal. Alternatively, the conference terminal may integrate some or all of the functions of the multimedia control platform, in which case the local conference terminal may mix and encode the enhanced sound signal and send it directly to the remote conference terminal.
In the technical solution provided by the embodiments of the present application, a reference signal is determined from the sound signals in the sound pickup space based on the position of the interference source in the sound pickup space, and the sound of the interference source is then filtered out of the sound signals based on the reference signal, so as to enhance the target sound signal. Because sound signal processing is performed according to the position of the interference source, the sound of the interference source can be suppressed in a targeted manner, which enhances the target sound signal and improves sound quality.
In the embodiments of the present application, the sound signal processing device obtains, based on the actual requirements of the conference scenario and the device deployment, information such as the size and shape of the sound pickup space and the position of each device of the sound processing system in the sound pickup space. This ensures that the deployment of the sound processing system is adapted to the conference scenario, so that the sound signal processing device can process sound signals according to the actual conditions of the scenario. This improves the flexibility and compatibility of the sound signal processing method and safeguards sound quality in different conference scenarios.
The sound processing system provided by the embodiments of the present application has been introduced above, with reference to FIG. 1 to FIG. 5, from the perspectives of system architecture and system deployment. Based on this sound processing system, the flow of the sound signal processing method provided by the embodiments of the present application is described below by way of example.
FIG. 6 is a flowchart of a sound signal processing method provided by an embodiment of the present application. The method is applied to the sound processing system corresponding to FIG. 2, which includes a plurality of microphones, a conference terminal, and a conference touch panel. The sound signal processing method is performed by the conference terminal. As shown in FIG. 6, the method includes:
601. The conference terminal picks up sound signals in the sound pickup space through the plurality of microphones.
In the embodiments of the present application, the sound processing system includes a plurality of microphones, a conference terminal, and a conference touch panel. In some embodiments, the sound processing system runs on system control software, and the sound processing system needs to be configured through the system control software before sound signal processing is performed. For example, the system control software is installed on the conference terminal, and the conference terminal can obtain the configuration information of the sound processing system through the system control software, for example, the configuration information entered in the configuration interface of the system control software. In some embodiments, the configuration information includes: the length, width, and height of the sound pickup space; the position information of the plurality of microphones and of the conference terminal in the sound pickup space, for example, the coordinates of each microphone in the spatial coordinate system corresponding to the sound pickup space; and the numbers of the plurality of microphones and the pickup range corresponding to each microphone. The sound processing system can also be reconfigured through the system control software; for example, if the extent of the sound pickup space needs to be adjusted, its length, width, and height can be adjusted through the system control software. Based on the configuration information, the conference terminal can determine the positions, numbers, and pickup ranges of the plurality of microphones in the sound pickup space, and thereby obtain multiple channels of sound signals in the sound pickup space through the plurality of microphones.
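The configuration information listed above could be held in a structure along the following lines. This is a sketch under stated assumptions: the class names `MicrophoneConfig` and `SystemConfig`, the field names, the use of metres, and the modelling of a pickup range as a single radius are all illustrative choices that do not come from the source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (x, y, z) in metres within the coordinate system of the sound pickup space.
Position = Tuple[float, float, float]


@dataclass
class MicrophoneConfig:
    number: int               # microphone number
    position: Position        # coordinates in the sound pickup space
    pickup_radius_m: float    # assumption: pickup range modelled as a radius


@dataclass
class SystemConfig:
    length_m: float
    width_m: float
    height_m: float
    terminal_position: Position
    microphones: List[MicrophoneConfig] = field(default_factory=list)

    def contains(self, pos: Position) -> bool:
        """Check that a position lies inside the sound pickup space."""
        x, y, z = pos
        return (0 <= x <= self.length_m
                and 0 <= y <= self.width_m
                and 0 <= z <= self.height_m)

    def validate(self) -> None:
        """Reject duplicate microphone numbers and out-of-space positions."""
        numbers = [m.number for m in self.microphones]
        if len(numbers) != len(set(numbers)):
            raise ValueError("microphone numbers must be unique")
        for m in self.microphones:
            if not self.contains(m.position):
                raise ValueError(
                    f"microphone {m.number} lies outside the sound pickup space")


config = SystemConfig(
    length_m=6.0, width_m=4.0, height_m=3.0,
    terminal_position=(0.0, 2.0, 1.5),
    microphones=[
        MicrophoneConfig(1, (1.5, 2.0, 0.75), 1.5),
        MicrophoneConfig(2, (3.0, 2.0, 0.75), 1.5),
    ],
)
config.validate()
```

Reconfiguring the pickup space, as the text mentions, would then amount to changing `length_m`, `width_m`, or `height_m` and validating again.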
In some embodiments, the system control software is installed on the conference touch panel; accordingly, the configuration information of the sound processing system can be obtained through the conference touch panel.
The conference terminal acquires multiple channels of sound signals from the sound pickup space through the plurality of microphones. In some embodiments, because each channel includes the sound within the pickup range of the corresponding microphone, each channel may be composed of the sound signals of multiple sound sources. For example, in a conference scenario in which several participants speak at the same time, the channel picked up by one microphone may include the voices of several participants within its pickup range. The proportion of each sound source's signal in the channel picked up by a microphone is determined by the position of that sound source relative to the microphone. For example, the closer a participant is to a given microphone, the larger the proportion of that participant's sound signal in the channel picked up by that microphone, that is, the louder that participant's voice is in that channel.
602. The conference terminal receives a position selection instruction and determines the position corresponding to the position selection instruction as the position of the interference source in the sound pickup space.
In the embodiments of the present application, multiple sound sources exist in the sound pickup space, and the interference source is a sound source, among those sound sources, that is considered to cause interference, for example, participants holding a private conversation during the conference. Based on the received position selection instruction, the conference terminal can determine, by position, the sound source considered to be the interference source, so that the sound of the interference source can be processed in the subsequent sound signal processing, for example, by filtering the sound of the interference source out of the sound signals.
The position selection instruction is triggered by a selection operation, performed on a control device, on the position of the interference source. In some embodiments, the microphone corresponding to the selection operation is considered to be the microphone closest to the interference source and is therefore taken as the microphone corresponding to the interference source. In some embodiments, the selection operation includes pressing the physical button corresponding to a microphone; the press triggers a position selection instruction for the position of that microphone, and the conference terminal, according to the received position selection instruction, determines the position of that microphone as the position of the interference source. In other embodiments, the selection operation includes selecting a microphone on the conference touch panel; in response, the conference touch panel sends a position selection instruction for the position of that microphone to the conference terminal. In some embodiments, in response to receiving the position selection instruction, the conference terminal obtains the microphone number carried in the instruction and determines the position of the microphone with that number as the position of the interference source.
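The last embodiment above, in which the instruction carries a microphone number, can be sketched as a simple lookup. The instruction format (a dict with a `mic_number` field) and the function name are hypothetical; the source does not define how the instruction is encoded.

```python
def resolve_interference_position(instruction, mic_positions):
    """Map a position selection instruction to the interference source position.

    instruction: dict with a hypothetical "mic_number" field.
    mic_positions: dict mapping microphone number -> (x, y, z), taken from
    the configuration of the sound processing system.
    Returns the microphone number and the position treated as the
    interference source position.
    """
    mic_number = instruction.get("mic_number")
    if mic_number not in mic_positions:
        raise KeyError(f"unknown microphone number: {mic_number!r}")
    return mic_number, mic_positions[mic_number]


# Illustrative values only.
mic_positions = {1: (1.5, 2.0, 0.75), 2: (3.0, 2.0, 0.75)}
number, position = resolve_interference_position({"mic_number": 2}, mic_positions)
```

The same lookup would serve both triggers described above, since a button press and a touch-panel selection both end in an instruction that identifies one microphone.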
In some embodiments, the microphone has a corresponding indicator light; after the microphone is determined as the microphone corresponding to the position of the interference source, the state of its indicator light is switched to indicate that the microphone corresponds to the interference source.
In other embodiments, after the microphone corresponding to the position of the interference source is determined, the conference touch panel displays that microphone as the microphone corresponding to the interference source.
It should be noted that the step of switching the indicator light state and the step of displaying on the conference touch panel may be performed simultaneously or one after the other, which is not limited in the embodiments of the present application.
With the above technical solution, multiple ways of determining the position of the interference source are provided, based on the various control devices actually deployed in the conference scenario; this ensures positioning accuracy and further improves the practicality of the sound signal processing method.
603. The conference terminal determines the sound signal from the microphone corresponding to the position of the interference source as the reference signal.
In the embodiments of the present application, after determining the microphone corresponding to the position of the interference source, the conference terminal determines the sound signal picked up by that microphone as the reference signal. Because the reference signal comes from the microphone closest to the interference source, the proportion of the interference source's sound signal in the reference signal is greater than its proportion in the sound signals of the other microphones; that is, the reference signal represents the sound signal of the interference source better than the sound signals picked up by the other microphones do. The reference signal can therefore stand in for the sound signal of the interference source during sound signal processing and be used to filter out the sound of the interference source.
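Step 603 can be illustrated by separating the channel of the selected microphone from the remaining channels. The representation of channels as a dict of sample lists and the name `split_reference` are assumptions made for this sketch.

```python
def split_reference(channels, interference_mic):
    """Split the multi-channel input into the reference signal and the rest.

    channels: dict mapping microphone number -> list of samples.
    interference_mic: number of the microphone corresponding to the
    interference source position.
    """
    if interference_mic not in channels:
        raise KeyError(f"no channel for microphone {interference_mic!r}")
    reference = channels[interference_mic]
    others = {n: s for n, s in channels.items() if n != interference_mic}
    return reference, others


# Illustrative two-sample channels for three microphones.
channels = {1: [0.1, 0.2], 2: [0.5, -0.4], 3: [0.0, 0.1]}
reference, others = split_reference(channels, 2)
```

The remaining channels are kept separately because they are the signals from which the interference is later to be filtered out, with the reference channel acting as the description of that interference.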
With the above technical solution, in a scenario in which sound is picked up by multiple microphones, a reference signal that is representative of the interference source can be determined based on the microphone corresponding to the interference source, so that the sound of the interference source can be filtered out more effectively based on the reference signal, which effectively improves sound quality.
604. The conference terminal denoises the reference signal.
In the embodiments of the present application, the reference signal originates from multiple sound sources in the sound pickup space. When noise is present in the reference signal, using the denoised reference signal to filter out the sound of the interference source achieves a better filtering effect.
In the embodiments of the present application, the process of denoising the reference signal includes the following steps 6041 and 6042:
6041. The conference terminal determines a noise threshold based on the reference signal.
In some embodiments, the reference signal is divided into multiple signal frames of a specified duration (for example, 30 milliseconds), and the reference signal is denoised with the signal frame as the smallest processing unit.
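The framing step can be sketched as below. The sample rate, the use of non-overlapping frames, and the decision to drop a trailing partial frame are assumptions of this sketch; the source only specifies the frame duration by example.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=30):
    """Split a signal into consecutive, non-overlapping frames.

    samples: sequence of amplitude values.
    sample_rate_hz: sampling rate (assumed known from the capture device).
    frame_ms: frame duration in milliseconds (30 ms in the example above).
    A trailing partial frame is dropped (assumption of this sketch).
    """
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


# At 16 kHz a 30 ms frame holds 480 samples, so 1000 samples give 2 full frames.
frames = split_into_frames([0.0] * 1000, sample_rate_hz=16000)
```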
In some embodiments, based on the global minimum amplitude spectrum principle, the sound corresponding to the signal frame with the smallest amplitude spectrum is considered to be non-human sound, and non-human sound is considered to be noise. In some embodiments, because signal energy is positively correlated with the amplitude spectrum of a signal, the magnitudes of amplitude spectra can be compared based on the signal energy of the signal frames. On this basis, the minimum signal energy among a local window of 100 (or another number of) signal frames of the reference signal is determined as the noise threshold of the conference scenario, and the reference signal is denoised based on the noise threshold. The noise threshold serves as the criterion for identifying human voice: a signal frame whose signal energy is lower than the noise threshold is noise, that is, non-human sound. For the principle of calculating signal energy, see formula (1).
Formula (1) (image PCTCN2022142338-appb-000001)
In formula (1), X is the set of signal amplitudes of a signal frame; N is the number of signal amplitudes in signal frame X, where N is a positive integer; and $\mathrm{RMS}_X$ is the signal energy of signal frame X.
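Formula (1) is available here only as an image reference. Given the symbol $\mathrm{RMS}_X$ and the definitions of X and N, the sketch below assumes it is the usual root mean square, $\sqrt{\tfrac{1}{N}\sum_{i=1}^{N} x_i^2}$; this reading is an assumption, not a transcription of the formula. The noise threshold is then taken as the minimum frame energy over a local window of frames, as the text describes.

```python
import math


def frame_rms(frame):
    """Signal energy of one frame, assumed to be the root mean square."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def noise_threshold(frames, window=100):
    """Minimum frame energy over the most recent `window` frames.

    window=100 follows the example in the text; using the most recent
    frames as the "local" window is an assumption of this sketch.
    """
    recent = frames[-window:]
    if not recent:
        raise ValueError("at least one frame is required")
    return min(frame_rms(f) for f in recent)


example_frames = [[1.0, -1.0], [0.1, -0.1], [2.0, 2.0]]
threshold = noise_threshold(example_frames)
```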
In some embodiments, the conference terminal uses a recursive-averaging noise estimation algorithm on the reference signal acquired in real time to determine the long-term stationary noise energy of the conference scenario, and continuously updates the noise threshold of the conference scenario with this long-term stationary noise energy.
In some embodiments, for the process of determining the long-term stationary noise energy based on the recursive-averaging noise estimation algorithm, see formulas (2) to (4). The recursive-averaging noise estimation algorithm determines a smoothing coefficient based on the speech presence probability of the current signal frame. The closer the speech presence probability of the current signal frame is to 1, the closer the smoothing coefficient is to 1, which means the signal energy of the previous signal frame tends to be used as the noise energy estimate of the current signal frame; the closer the speech presence probability of the current signal frame is to 0, the closer the smoothing coefficient is to 0, which means the signal energy of the current signal frame tends to be used as the noise energy estimate.
$$\rho'(k,l) = \alpha_p\,\rho'(k,l-1) + (1-\alpha_p)\,I(k,l) \qquad (2)$$
Based on formula (2), the speech presence probability $\rho'(k,l)$ of the k-th signal frame of the reference signal at subband l can be determined, where $\alpha_p$ ($0<\alpha_p<1$) is the first smoothing constant. $I(k,l)$ is 1 when the signal energy of the k-th signal frame at subband l is greater than the preset noise threshold, and 0 when that signal energy is less than the preset noise threshold.
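One step of the recursion in formula (2) can be written out as follows. The default value of `alpha_p` is an arbitrary illustrative choice, and the case where the energy exactly equals the threshold, which the text leaves open, is treated here as "no speech"; both are assumptions.

```python
def update_speech_presence(prev_prob, energy, noise_threshold, alpha_p=0.2):
    """One step of formula (2): smooth the speech presence probability.

    prev_prob: previous value of the probability, rho'(k, l-1).
    energy: signal energy being compared against the preset noise threshold.
    alpha_p: first smoothing constant, 0 < alpha_p < 1 (default is illustrative).
    """
    if not 0.0 < alpha_p < 1.0:
        raise ValueError("alpha_p must lie strictly between 0 and 1")
    # Indicator I(k, l); equality is treated as 0 (assumption).
    indicator = 1.0 if energy > noise_threshold else 0.0
    return alpha_p * prev_prob + (1.0 - alpha_p) * indicator


# Energy above the threshold pulls the probability towards 1 ...
p1 = update_speech_presence(0.0, energy=2.0, noise_threshold=1.0, alpha_p=0.5)
# ... and energy below the threshold lets it decay towards 0.
p2 = update_speech_presence(p1, energy=0.5, noise_threshold=1.0, alpha_p=0.5)
```

A smaller `alpha_p` makes the probability follow the indicator more quickly, while a larger one gives more weight to the history.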
Formula (3) (image PCTCN2022142338-appb-000002)
Based on formula (3), the (time-varying) smoothing coefficient of the k-th signal frame of the reference signal at subband l (symbol shown in image PCTCN2022142338-appb-000003) can be calculated, where $\alpha_d$ ($0<\alpha_d<1$) is the second smoothing constant.
After the speech presence probability of the signal frame and the smoothing coefficient corresponding to the signal frame have been determined, the noise energy spectrum of the k-th signal frame of the reference signal at subband l+1 (symbol shown in image PCTCN2022142338-appb-000004) can be determined based on formula (4), where $Y(k,l)$ is the signal expression of the k-th signal frame of the reference signal at subband l. Based on the quantity shown in image PCTCN2022142338-appb-000005, the long-term stationary noise energy can be updated.
Formula (4) (image PCTCN2022142338-appb-000006)
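Formulas (3) and (4) appear here only as image references, so the sketch below does not reproduce them. It assumes the forms commonly used in recursive-averaging noise estimators: a smoothing coefficient $\alpha_d + (1-\alpha_d)\,\rho'$ and a noise update that blends the previous noise estimate with the current power $|Y|^2$. These forms, the function names, and the default `alpha_d` are all assumptions. Note that under this assumed form the coefficient approaches $\alpha_d$, not 0, as the speech presence probability approaches 0, so it may differ from the patent's own formula.

```python
def smoothing_coefficient(speech_prob, alpha_d=0.95):
    """Assumed form of the time-varying smoothing coefficient.

    speech_prob: speech presence probability from formula (2), in [0, 1].
    alpha_d: second smoothing constant, 0 < alpha_d < 1 (default is illustrative).
    """
    if not 0.0 < alpha_d < 1.0:
        raise ValueError("alpha_d must lie strictly between 0 and 1")
    return alpha_d + (1.0 - alpha_d) * speech_prob


def update_noise_energy(prev_noise, frame_power, speech_prob, alpha_d=0.95):
    """Assumed form of the noise energy update.

    prev_noise: previous noise energy estimate.
    frame_power: |Y|^2 of the current signal.
    When speech is likely (coefficient near 1) the previous estimate is kept;
    otherwise the current power is blended in.
    """
    a = smoothing_coefficient(speech_prob, alpha_d)
    return a * prev_noise + (1.0 - a) * frame_power


# Speech certainly present: the estimate is left unchanged.
kept = update_noise_energy(4.0, 100.0, speech_prob=1.0, alpha_d=0.5)
# Speech certainly absent: the estimate moves towards the current power.
moved = update_noise_energy(4.0, 2.0, speech_prob=0.0, alpha_d=0.5)
```

Freezing the estimate while speech is likely is what keeps loud speech frames from inflating the long-term stationary noise energy.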
6042. Based on the noise threshold and the reference signal, the conference terminal determines the signal-to-noise ratio of the reference signal and sets to 0 the portions of the reference signal whose signal-to-noise ratio is less than a target threshold.
Based on the noise threshold, the conference terminal calculates, for each signal frame, the ratio of the signal energy of the frame to the noise threshold, that is, the signal-to-noise ratio of the signal frame. When the signal-to-noise ratio of a signal frame is less than the target threshold, the signal frame is very likely noise, and the signal amplitudes of that frame are set to 0. For the principle of calculating the signal-to-noise ratio, see formula (5).
Figure PCTCN2022142338-appb-000007
Figure PCTCN2022142338-appb-000007
公式(5)中,X是信号帧对应的信号幅值集合;SNR X是信号帧X的信噪比;RMS X是信号帧X的信号能量;RMS N是噪声能量(或长时平稳噪声能量),也即是,噪声门限。其中,该RMS N可以是基于参考信号的局部多个信号帧确定的噪声能量,也可以是基于参考信号累计确定的长时平稳噪声能量,本申请实施例对此不做限定。 In the formula (5), X is the signal amplitude set corresponding to the signal frame; SNR X is the signal-to-noise ratio of the signal frame X; RMS X is the signal energy of the signal frame X; RMS N is the noise energy (or long-term stationary noise energy ), that is, the noise threshold. The RMS N may be the noise energy determined based on multiple local signal frames of the reference signal, or the long-term stationary noise energy determined based on the accumulation of the reference signal, which is not limited in this embodiment of the present application.
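The gating in step 6042 can be sketched as follows. The image for formula (5) is not reproduced here, so the sketch assumes the signal-to-noise ratio is the plain ratio of the frame RMS to the noise RMS, as the prose describes; a decibel form would be an equally plausible reading. The names `rms` and `gate_frames` and the sample values are hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one signal frame (a list of samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate_frames(frames, noise_rms, snr_threshold):
    """Zero every frame whose RMS-to-noise ratio is below snr_threshold."""
    gated = []
    for frame in frames:
        snr = rms(frame) / noise_rms  # assumed: plain ratio, noise_rms > 0
        if snr >= snr_threshold:
            gated.append(list(frame))
        else:
            gated.append([0.0] * len(frame))
    return gated


frames = [[0.01, -0.02, 0.01, 0.0],   # near the noise floor
          [0.5, -0.4, 0.6, -0.5]]     # clearly above the noise floor
gated = gate_frames(frames, noise_rms=0.02, snr_threshold=3.0)
```

Here the first frame is muted and the second passes through unchanged.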
In the embodiments of this application, step 604 mutes the non-vocal portions of the reference signal, yielding a reference signal that contains a purer human voice. This improves the efficiency of subsequent sound signal processing based on the reference signal and thereby improves sound quality.
It should be noted that step 604 is optional. In some embodiments, step 605 may be performed directly based on the reference signal determined in step 603.
In some embodiments, after step 604 is performed, the conference terminal inputs the denoised reference signal and the other sound signals from the sound pickup space into the ANS module for processing, to filter out the background noise in the reference signal and in the other sound signals. This improves the efficiency of subsequent sound signal processing and further improves sound quality.
605. The conference terminal determines a first sound signal from the sound signals in the sound pickup space based on the denoised reference signal.
In the embodiments of this application, the reference signal is used to filter out the sound of the interference source. Therefore, the sound signal affected by the interference source must first be determined from the sound signals in the sound pickup space, and the sound of the interference source is then filtered out in a targeted manner based on the reference signal.
In some embodiments, the first sound signal affected by the sound signal of the interference source is determined based on signal energy and on the correlation with the reference signal. Because signal energy is positively correlated with the amplitude spectrum of the signal, the signal energy indicates, to some extent, the strength of the human voice in a sound signal. If the signal energy of one sound signal is greater than that of another, a human voice is indeed present in the former and is strong enough to affect the latter. Further, if a sound signal is affected by the interference source, the sound of the interference source is continuously interleaved in that signal, so the correlation between the affected signal and the sound signal of the interference source is higher than for the unaffected signals. Because the reference signal represents the sound signal of the interference source well, when the signal energy of the first sound signal is less than that of the reference signal and the correlation between the first sound signal and the reference signal is greater than a correlation threshold, the reference signal has affected the first sound signal; that is, the first sound signal is affected by the sound of the interference source. For example, if the interference source is participant A, who is holding a private conversation at a certain volume, and participant B is seated next to participant A, the sound signal picked up by the microphone in front of participant B is continuously interleaved with the sound of participant A's private conversation. The sound signal picked up by the microphone in front of participant B is therefore the sound signal affected by the interference source, that is, the first sound signal. The correlation threshold may be set according to the accuracy required of the sound signal processing, which is not limited in the embodiments of this application.
In some embodiments, the conference terminal receives the reference signal and the other sound signals after processing by the ANS module, and determines the first sound signal from the other sound signals based on the signal energy of the reference signal, the signal energy of each of the other sound signals, and the cross-correlation value between each of the other sound signals and the reference signal. For the principle of calculating the signal energy, see formula (1) above.
It should be noted that the embodiments of this application compare signal energy with the signal frame as the smallest unit. In some embodiments, signal energy may instead be compared based on the average energy of multiple signal frames over a period of time, to improve the accuracy of the comparison.
In some embodiments, the degree of correlation between signals is expressed by the cross-correlation value between them. For the principle of calculating the cross-correlation value, see formula (6).
Figure PCTCN2022142338-appb-000008
In formula (6), f(t) and g(t) are two signals, and
Figure PCTCN2022142338-appb-000009
is the cross-correlation value between signal f(t) and signal g(t).
In some embodiments, for any sound signal, if its signal energy is greater than that of the reference signal and its correlation with the reference signal is greater than the correlation threshold, the reference signal has not affected that sound signal. In this case, the reference signal is set to zero, for example by setting the signal amplitudes of multiple signal frames in the reference signal to 0, so that the influence of the reference signal on that sound signal need not be considered in subsequent processing.
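The selection rule of step 605 can be sketched as below: a channel counts as affected when it is weaker than the reference signal yet correlated with it. Several details are assumptions rather than the application's method: energy is taken as the mean squared amplitude, the cross-correlation is normalized and searched over a small range of lags, and the names `energy`, `peak_normalized_xcorr`, and `affected_channels` are hypothetical.

```python
import math


def energy(signal):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in signal) / len(signal)


def peak_normalized_xcorr(f, g, max_lag):
    """Largest absolute normalized cross-correlation over lags in [-max_lag, max_lag]."""
    norm = math.sqrt(sum(x * x for x in f) * sum(y * y for y in g))
    if norm == 0.0:
        return 0.0
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t, x in enumerate(f):
            if 0 <= t + lag < len(g):
                acc += x * g[t + lag]
        best = max(best, abs(acc) / norm)
    return best


def affected_channels(reference, channels, corr_threshold, max_lag=8):
    """Indices of channels that are weaker than the reference yet correlated with it."""
    ref_energy = energy(reference)
    return [i for i, ch in enumerate(channels)
            if energy(ch) < ref_energy
            and peak_normalized_xcorr(reference, ch, max_lag) > corr_threshold]


reference = [math.sin(2.0 * math.pi * 5.0 * t / 200.0) for t in range(200)]
leaked = [0.0] * 3 + [0.3 * s for s in reference[:-3]]      # quieter, delayed copy
unrelated = [0.3 * math.sin(2.0 * math.pi * 13.0 * t / 200.0) for t in range(200)]
louder = [2.0 * s for s in reference]                        # correlated but stronger
selected = affected_channels(reference, [leaked, unrelated, louder], corr_threshold=0.5)
```

Only the quieter, delayed copy is selected. The louder channel is correlated with the reference but has more energy, which corresponds to the case in which the reference signal is treated as not affecting that channel.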
With the above technical solution, the first sound signal that is strongly affected by the interference source can be determined from the multiple sound signals, and the sound of the interference source in the first sound signal can then be filtered out in a targeted manner. Improving the accuracy of the filtering effectively improves sound quality. In an actual conference, a participant who needs to hold a private conversation may regard himself or herself as the interference source; in that case, the above technical solution both improves sound quality and keeps the participant's conversation private, which effectively improves the user experience.
It should be noted that step 605 is optional. In some embodiments, step 606 may be performed directly based on the reference signal determined in step 603. In other embodiments, step 606 is performed based on the reference signal denoised in step 604.
606. The conference terminal enhances the target sound signal in the first sound signal based on the reference signal.
In some embodiments, the first sound signal includes sound signals corresponding to multiple sound sources. The target sound signal is the sound signal of the sound source of primary interest, for example, the speaker in a conference, so the purpose of processing the sound signal is usually to make the target sound signal stand out. Because the reference signal represents the sound signal of the interference source well, processing the first sound signal with the reference signal reduces, in a targeted manner, the influence of the interference source's sound on the first sound signal, which keeps the target sound signal prominent in the first sound signal.
In the embodiments of this application, enhancing the target sound signal in the first sound signal means suppressing the reference signal in the first sound signal. For example, reducing the proportion of the first sound signal that corresponds to the reference signal increases the proportion of the target sound signal in the first sound signal, thereby enhancing the target sound signal.
In some embodiments, the reference signal is one input of a filter and the first sound signal is the other input. The filter removes the part of the first sound signal that is correlated with the reference signal, which enhances the target sound signal in the first sound signal, and outputs the filtering result.
In some embodiments, the filter includes a first filter and a second filter. The reference signal is input into the first filter, which, based on its parameters, adjusts the weights of the signal components of different frequencies in the reference signal to reconstruct the reference signal and obtain an estimated signal. The estimated signal is the result of estimating the sound signal of the interference source in the reference signal. The difference signal between the first sound signal and the estimated signal is taken as the filtering result: removing the estimated signal from the first sound signal removes the part of the first sound signal that is correlated with the reference signal. In some embodiments, the parameters of the first filter are determined based on the parameters of the second filter, and the parameters of the second filter are determined based on the difference between successive filtering results. When the reference signal is input into the first filter, it is also input into the second filter, yielding the nth filtering result of the second filter. The second filter adjusts its parameters based on the difference between its nth and (n-1)th filtering results, so that the estimated signal obtained with the adjusted parameters is closer to the sound signal of the interference source in the first sound signal. When the adjusted parameters of the second filter satisfy a convergence condition, they are configured to the first filter, which improves the filtering of the first sound signal. Here, n is an integer greater than 1.
In some embodiments, the filter is an adaptive filter. During filtering, an adaptive filter adjusts its parameters through an adaptive algorithm to obtain a better filtering effect. For example, the second filter adjusts its parameters through an adaptive algorithm based on the difference between its nth and (n-1)th filtering results. The filter parameters include a filter step size; adjusting the step size changes the convergence speed of the filter parameters. Further, adaptive algorithms under different optimization criteria can be selected according to different requirements, for example, the recursive least squares (RLS) algorithm, the least mean square (LMS) algorithm, and the normalized least mean square (NLMS) algorithm, which is not limited in the embodiments of this application.
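To make the adaptive filtering concrete, here is a minimal time-domain NLMS sketch. It is a deliberate simplification and an assumption: it uses a single filter rather than the first/second filter pair, it works on samples rather than in the frequency domain, and the tap count, step size, and synthetic signals are illustrative values only. The function name `nlms_cancel` is hypothetical.

```python
import math
import random


def nlms_cancel(reference, primary, taps=4, step=0.5, eps=1e-8):
    """Remove from `primary` the part that is linearly predictable from `reference`.

    Returns e(n) = primary(n) - w . x(n), where x(n) holds the most recent
    `taps` reference samples and the weights w are adapted by NLMS.
    """
    w = [0.0] * taps
    history = [0.0] * taps                      # x(n), newest sample first
    output = []
    for x_n, d_n in zip(reference, primary):
        history = [x_n] + history[:-1]
        estimate = sum(wi * xi for wi, xi in zip(w, history))
        e_n = d_n - estimate
        power = sum(xi * xi for xi in history) + eps
        # Normalized update: the step is scaled by the recent reference power.
        w = [wi + step * e_n * xi / power for wi, xi in zip(w, history)]
        output.append(e_n)
    return output


rng = random.Random(0)
n = 2000
reference = [rng.uniform(-1.0, 1.0) for _ in range(n)]                  # interference source
target = [0.1 * math.sin(2.0 * math.pi * 0.01 * t) for t in range(n)]   # wanted speech stand-in
leak = [0.0] + [0.6 * x for x in reference[:-1]]                        # delayed, attenuated copy
primary = [s + l for s, l in zip(target, leak)]                         # first sound signal
cleaned = nlms_cancel(reference, primary)
```

After the weights settle, `cleaned` stays close to `target`, because the only part of `primary` that can be predicted from `reference` is the leaked interference. A larger `step` converges faster at the cost of a noisier steady state, which is the trade-off the text attributes to the filter step size.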
The embodiments of this application provide a schematic diagram of an adaptive filter, as shown in FIG. 7. The reference signal is the input signal x(n). The desired signal y(n) includes the first sound signal v(n) and the system echo d(n) of the reference signal. After fast Fourier transform processing, x(n) is input into both the first filter and the second filter. The first filter outputs the (frequency-domain) estimated signal X′(m). y(n) is transformed by a fast Fourier transform into Y(m), and an adder subtracts X′(m) from Y(m) to output the difference signal E(m), which is converted by an inverse Fourier transform into the filtering result e(n). The (frequency-domain) estimated signal output by the second filter is combined with Y(m) through an adder, and the resulting difference signal is returned to the second filter to update the filter parameters. H(n) is the system function used to model the system echo.
In some embodiments, the adaptive filtering process can be performed based on a deep learning model. Training the parameters of the adaptive filter with a deep learning model can effectively speed up the convergence of those parameters and thereby improve filtering efficiency.
In some embodiments, when the attenuation of the first sound signal between before and after filtering is greater than an attenuation threshold, the filtered first sound signal may have been weakened to the point of distortion. In this case, the filtered first sound signal needs further processing, for example, enhancing the human voice in the signal or cutting out the distorted segments, to further ensure the quality of the sound signal.
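The attenuation check can be sketched as follows. The text does not say how attenuation is measured, so this sketch assumes it is the energy ratio before and after filtering, expressed in decibels; the names `attenuation_db` and `needs_post_processing` and the 12 dB threshold are hypothetical.

```python
import math


def attenuation_db(before, after, floor=1e-12):
    """Energy lost by filtering, in dB (assumed definition); floor avoids log of zero."""
    e_before = sum(s * s for s in before) + floor
    e_after = sum(s * s for s in after) + floor
    return 10.0 * math.log10(e_before / e_after)


def needs_post_processing(before, after, threshold_db):
    """True when filtering attenuated the signal by more than threshold_db."""
    return attenuation_db(before, after) > threshold_db


before = [1.0, -1.0] * 4
after = [0.1, -0.1] * 4          # amplitude reduced tenfold, i.e. about 20 dB
flag = needs_post_processing(before, after, threshold_db=12.0)
```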
In some embodiments, the conference terminal sends the filtered first sound signal to the multimedia control platform, which encodes the received first sound signal and sends it to the remote conference terminal.
In the technical solution provided by the embodiments of this application, a reference signal is determined from the sound signals in the sound pickup space based on the position of the interference source in the sound pickup space, and the sound of the interference source is then filtered out of the sound signals based on the reference signal, to enhance the target sound signal. Because the sound signal processing follows the position of the interference source, the sound of the interference source can be masked in a targeted manner, which enhances the target sound signal and improves sound quality.
FIG. 8 is a flowchart of a sound signal processing method provided by an embodiment of this application. The method is applied to the sound processing system corresponding to FIG. 3 above, which includes multiple microphones, a conference terminal, a conference touch panel, and a camera. The method is executed by the conference terminal. As shown in FIG. 8, the method includes:
801. The conference terminal picks up sound signals in the sound pickup space through the multiple microphones.
For this step, refer to step 601; details are not repeated here. The camera captures images of the sound pickup space. When the sound processing system is configured, the position of the camera in the sound pickup space and the angular range over which it captures images must be configured, to establish the relationship between the images captured by the camera and positions in the sound pickup space. For example, when the image and the actual sound pickup space are mirror-symmetric, the left half of the image corresponds to the right half of the sound pickup space.
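The mirror-symmetric example can be sketched as a one-dimensional mapping from a horizontal pixel coordinate to a horizontal position in the room. The linear mapping, the function name `image_x_to_room_x`, and the dimensions are assumptions for illustration; a real configuration would depend on the camera position and viewing angle described above.

```python
def image_x_to_room_x(pixel_x: float, image_width: float,
                      room_width: float, mirrored: bool = True) -> float:
    """Map a horizontal pixel coordinate to a horizontal room coordinate (assumed linear)."""
    fraction = pixel_x / image_width
    if mirrored:
        fraction = 1.0 - fraction   # left of the image is right of the room
    return fraction * room_width


# A gesture in the left quarter of a 640-pixel image, in a room 4 m wide.
room_x = image_x_to_room_x(160, image_width=640, room_width=4.0)
```

With mirroring enabled, the result lies in the right half of the room, matching the example in the text.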
802. The conference terminal receives a position selection instruction and determines the position corresponding to the instruction as the position of the interference source in the sound pickup space.
In the embodiments of this application, multiple sound sources exist in the sound pickup space. Based on the received position selection instruction, the conference terminal can determine by position which sound source is regarded as the interference source, so that subsequent sound signal processing targets the sound of the interference source, for example, by filtering it out of the sound signals.
In some embodiments, the camera has data processing capability and can run detection on the captured images. When the camera detects a first body behavior in a captured image, it sends the position selection instruction to the conference terminal. The first body behavior indicates that the position where it occurs is to be muted; for example, a participant holds an index finger upright close to the lips. Based on the preconfigured relationship between the images captured by the camera and positions in the sound pickup space, the camera can determine the position of the first body behavior in the sound pickup space from its position in the image, and indicates that position in the position selection instruction. The conference terminal receives the position selection instruction from the camera, obtains the position it indicates, and determines the microphone corresponding to the position of the interference source based on that position.
In other embodiments, the camera has data processing capability and runs detection on the captured images. When it detects the first body behavior in a captured image, it determines, based on the position information of the multiple microphones in the sound pickup space, the microphone closest to the position of the first body behavior, and generates a position selection instruction based on the number of that microphone to indicate that the position of that microphone is the position of the interference source. The conference terminal receives the position selection instruction from the camera, obtains the microphone number it carries, and determines the microphone with that number as the microphone corresponding to the position of the interference source.
With the above technical solution, the position of the interference source can be obtained directly from the position selection instruction, which reduces the amount of data involved in the computation and improves the efficiency of sound signal processing.
In other embodiments, the position selection instruction is triggered by an operation on the control device that selects the position of the interference source. For the principle, refer to step 602.
The above process is described using the example in which the conference terminal receives the position selection instruction sent by the camera. In some embodiments, the conference terminal instead receives the images captured by the camera and runs detection on them to determine the position of the interference source. In this example, determining the position of the interference source includes the following steps 1 and 2:
Step 1. The conference terminal runs detection on the image captured by the camera.
Step 2. In response to detecting the first body behavior in the image, the conference terminal determines the position of the first body behavior in the sound pickup space as the position of the interference source.
In some embodiments, based on the relationship between the images captured by the camera and positions in the sound pickup space, the conference terminal determines the position of the first body behavior in the sound pickup space from its position in the image. Then, based on the position information of the multiple microphones in the sound pickup space, it determines the microphone closest to the position of the first body behavior and takes that microphone as the microphone corresponding to the position of the interference source.
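Choosing the microphone closest to the detected gesture can be sketched as a nearest-neighbor lookup. The two-dimensional coordinates, the microphone numbering, and the function name `nearest_microphone` are assumptions for illustration.

```python
import math


def nearest_microphone(position, microphones):
    """Return the number of the microphone closest to `position`.

    `microphones` maps a microphone number to its (x, y) position in metres.
    """
    return min(microphones,
               key=lambda number: math.dist(position, microphones[number]))


microphones = {1: (0.5, 1.0), 2: (2.0, 1.0), 3: (3.5, 1.0)}
gesture_position = (3.0, 1.2)     # where the first body behavior was located
reference_mic = nearest_microphone(gesture_position, microphones)
```

The sound signal from `reference_mic` would then serve as the reference signal in the following step.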
With the above technical solution, the position of the interference source is determined from real-time images, which ensures the accuracy of that position and further improves sound quality.
In some embodiments, the first body behavior indicates that the position where it occurs is to be muted. The position of the interference source can therefore be determined based on the first body behavior, and the target sound signal is enhanced by filtering out the sound of the interference source. In other embodiments, the target sound signal can be determined based on a second body behavior, so that the target sound signal is enhanced directly. The second body behavior indicates that the target sound signal is to be enhanced; for example, a participant holds an index finger horizontally close to the lips to indicate that he or she needs to speak. In this example, in response to detecting the second body behavior in the image, the conference terminal determines the position of the second body behavior in the sound pickup space as the position of the target. The target is the sound source of primary interest among the multiple sound sources in the sound pickup space, so the target sound signal corresponding to that sound source needs to be enhanced.
With the above technical solution, the position corresponding to the target sound signal is determined based on the second body behavior, so that the target sound signal can be enhanced in a targeted manner, which improves sound quality.
803. The conference terminal determines the sound signal from the microphone corresponding to the position of the interference source as the reference signal.
For this step, refer to step 603; details are not repeated here.
In some embodiments, after determining the position of the interference source based on the images captured by the camera, the conference terminal can continue to track that position, for example, by tracking and detecting the interference source according to its features. When tracking shows that the position of the interference source has changed, the conference terminal re-determines the reference signal from the sound signals based on the new position. In some embodiments, the conference terminal takes the object at the position of the first body behavior in the image as the object corresponding to the interference source, tracks changes in the object's position based on images captured in real time, and determines the new position of the interference source from the object's new position. The tracking of the interference source can be cancelled manually through the conference terminal or the conference touch panel, or can be set to be cancelled automatically after a certain period of time.
With the above technical solution, the interference source is locked once it has been determined, and its position is then determined from real-time position changes. Capturing changes in the position of the interference source promptly keeps that position accurate and ensures that, in a changing conference scene, sound signal processing is always directed at the interference source, which maintains sound quality.
804. The conference terminal denoises the reference signal.
For this step, refer to step 604; details are not repeated here.
805. The conference terminal determines a first sound signal from the sound signals in the sound pickup space based on the denoised reference signal.
For this step, refer to step 605; details are not repeated here.
806. The conference terminal enhances the target sound signal in the first sound signal based on the reference signal.
For this step, refer to step 606; details are not repeated here.
In the technical solution provided by the embodiments of this application, a reference signal is determined from the sound signals in the sound pickup space based on the position of the interference source in the sound pickup space, and the sound of the interference source is then filtered out of the sound signals based on the reference signal, to enhance the target sound signal. Because the sound signal processing follows the position of the interference source, the sound of the interference source can be masked in a targeted manner, which enhances the target sound signal and improves sound quality.
Further, with the above technical solution, participants do not need to make a manual selection: the interference source is located automatically from images, and is masked intelligently in the conference scene, which improves the conference experience while maintaining sound quality.
图9是本申请实施例提供的一种声音信号处理方法的流程图。该方法应用于上述图4对应的声音处理系统中,该声音处理系统包括麦克风阵列、会议终端、桌面物理按键、会议触控平板以及摄像头。该声音信号处理方法由该会议终端执行。如图9所示,该方法包括:FIG. 9 is a flow chart of a sound signal processing method provided by an embodiment of the present application. This method is applied to the sound processing system corresponding to FIG. 4 above, and the sound processing system includes a microphone array, a conference terminal, physical keys on a desktop, a conference touch panel, and a camera. The sound signal processing method is executed by the conference terminal. As shown in Figure 9, the method includes:
901、会议终端通过麦克风阵列,拾取拾音空间内的声音信号。901. The conference terminal picks up the sound signal in the sound pickup space through the microphone array.
本步骤参考步骤801,在此不作赘述。其中,在对声音处理系统进行配置时,需要配置该麦克风阵列拾取的声音信号对应的波束角度范围以及该麦克风阵列在该拾音空间中的位置信息,以确定该麦克风阵列拾取的声音信号对应的波束角度范围与该拾音空间中的位置之间的关系。例如,麦克风阵列的声音信号A对应的波束角度范围覆盖拾音空间的左半边空间。在一些实施例中,对声音处理系统进行配置时,对不同波束角度范围的对应的声音信号进行编号,以便于在后续的声音信号处理过程中,能够基于编号,选择所需的声音信号。For this step, refer to step 801; details are not repeated here. When the sound processing system is configured, the beam angle range corresponding to each sound signal picked up by the microphone array and the position information of the microphone array in the sound pickup space need to be configured, so as to determine the relationship between the beam angle range corresponding to a sound signal picked up by the microphone array and positions in the sound pickup space. For example, the beam angle range corresponding to sound signal A of the microphone array covers the left half of the sound pickup space. In some embodiments, when the sound processing system is configured, the sound signals corresponding to different beam angle ranges are numbered, so that the required sound signal can be selected by its number during subsequent sound signal processing.
其中,该会议终端通过该麦克风阵列,获取来源于该拾音空间的声音信号。在一些实施例中,由于麦克风阵列包括按照某种空间结构排列的多个麦克风,因此,麦克风阵列根据阵列结构的空间特性,通过声音信号到达麦克风阵列中不同阵列单元的差异,来确定声源相对于麦克风阵列的角度,进而确定声源相对于麦克风阵列的位置。The conference terminal obtains the sound signals originating from the sound pickup space through the microphone array. In some embodiments, because the microphone array includes multiple microphones arranged in a certain spatial structure, the microphone array uses the spatial characteristics of the array structure and the differences with which a sound signal arrives at different array elements to determine the angle of the sound source relative to the microphone array, and from that the position of the sound source relative to the microphone array.
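As an illustration of how an arrival difference between array elements can yield an angle, the following minimal sketch assumes a far-field source, a pair of elements with known spacing, and a measured arrival-time difference. The function name, the speed-of-sound constant and the two-element simplification are assumptions made for illustration; the application does not specify the estimation method.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def arrival_angle_deg(delay_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field angle of arrival, in degrees from broadside, for two elements.

    delay_s is the arrival-time difference between the two elements.
    """
    ratio = c * delay_s / mic_spacing_m
    # Clamp so that measurement noise cannot push the value outside asin's domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A zero delay corresponds to a source directly in front of the pair; larger delays move the estimate toward the axis of the pair.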
902、会议终端接收位置选择指令,将该位置选择指令所对应的位置,确定为该拾音空间内的干扰源位置。902. The conference terminal receives the location selection instruction, and determines the location corresponding to the location selection instruction as the location of the interference source in the sound pickup space.
本步骤参考步骤802。For this step, refer to step 802.
在一些实施例中,该摄像头具有数据处理能力,能够对采集到的图像进行检测,该摄像头在从所采集的图像中检测到第一肢体行为的情况下,向会议终端发送该位置选择指令。基于预先配置的摄像头采集的图像与拾音空间中的位置之间的关系,该摄像头能够根据该第一肢体行为在该图像中的位置,确定该第一肢体行为在拾音空间中的位置。基于此,结合麦克风阵列在拾音空间中的位置信息,即可确定出该第一肢体行为相对于该麦克风阵列的角度。从而在位置选择指令中指示该第一肢体行为相对于麦克风阵列的角度。基于此,会议终端从该摄像头接收该位置选择指令,将该位置选择指令指示的角度,确定为干扰源位置相对于麦克风阵列的角度。In some embodiments, the camera has data processing capability and can detect the collected images, and if the camera detects the first body behavior from the collected images, the camera sends the position selection instruction to the conference terminal. Based on the preconfigured relationship between the image captured by the camera and the position in the sound pickup space, the camera can determine the position of the first body behavior in the sound pickup space according to the position of the first body behavior in the image. Based on this, combined with the position information of the microphone array in the sound pickup space, the angle of the first body behavior relative to the microphone array can be determined. Therefore, the angle of the first body behavior relative to the microphone array is indicated in the position selection instruction. Based on this, the conference terminal receives the position selection instruction from the camera, and determines the angle indicated by the position selection instruction as the angle of the position of the interference source relative to the microphone array.
在另一些实施例中,该位置选择指令基于在控制设备中对该干扰源所在位置的选择操作触发,原理参考步骤602,在这种示例下,该位置选择指令指示该第一肢体行为相对于麦克风阵列的角度。In some other embodiments, the location selection instruction is triggered by a selection operation on the location of the interference source in the control device; for the principle, refer to step 602. In this example, the location selection instruction indicates the angle of the first body behavior relative to the microphone array.
上述过程是以会议终端接收摄像机发送的位置选择指令为例进行说明,在一些实施例中,会议终端接收该摄像头采集到的图像,并对该图像进行检测,以确定干扰源位置,在这种示例下,确定干扰源位置的过程包括下述步骤1至步骤2:The above process is described using the example in which the conference terminal receives the location selection instruction sent by the camera. In some embodiments, the conference terminal receives the image captured by the camera and performs detection on the image to determine the location of the interference source. In this example, the process of determining the location of the interference source includes the following step 1 and step 2:
步骤1、会议终端对摄像头所采集的图像进行检测。 Step 1. The conference terminal detects the image collected by the camera.
步骤2、会议终端响应于在该图像中检测到第一肢体行为,将该第一肢体行为在该拾音空间中的位置确定为该干扰源位置。Step 2: In response to detecting the first body behavior in the image, the conference terminal determines the location of the first body behavior in the sound pickup space as the location of the interference source.
在一些实施例中,会议终端基于摄像头采集的图像与拾音空间中的位置之间的关系,能够根据该第一肢体行为在图像中的位置,确定该第一肢体行为在拾音空间中的位置,进而基于麦克风阵列在拾音空间中的位置信息,将该第一肢体行为相对于麦克风阵列的角度,确定为干扰源位置相对于麦克风阵列的角度。In some embodiments, based on the relationship between the image captured by the camera and positions in the sound pickup space, the conference terminal can determine the position of the first body behavior in the sound pickup space from its position in the image. Then, based on the position information of the microphone array in the sound pickup space, the conference terminal determines the angle of the first body behavior relative to the microphone array as the angle of the interference source location relative to the microphone array.
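The last part of this conversion, from a position in the sound pickup space to an angle relative to the microphone array, can be sketched as below. The sketch assumes a two-dimensional floor-plan coordinate system and a known reference direction of the array; `azimuth_deg` and `array_heading_deg` are hypothetical names, and the mapping from image pixels to room coordinates is taken as already done.

```python
import math


def azimuth_deg(source_xy, array_xy, array_heading_deg=0.0):
    """Angle of a source relative to the array's reference direction, in [0, 360)."""
    dx = source_xy[0] - array_xy[0]
    dy = source_xy[1] - array_xy[1]
    absolute = math.degrees(math.atan2(dy, dx))  # angle in room coordinates
    return (absolute - array_heading_deg) % 360.0
```

The modulo keeps the result in a single range so that it can be compared directly with configured beam angle ranges.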
903、会议终端基于该干扰源位置的角度信息,确定与该角度信息匹配的波束角度范围。903. The conference terminal determines a beam angle range matching the angle information based on the angle information of the interference source location.
在一些实施例中,该干扰源位置的角度信息是指干扰源位置相对于麦克风阵列的角度。会议终端基于该角度信息,能够确定与干扰源位置对应的麦克风阵列的波束角度范围。In some embodiments, the angle information of the location of the interference source refers to the angle of the location of the interference source relative to the microphone array. Based on the angle information, the conference terminal can determine the beam angle range of the microphone array corresponding to the location of the interference source.
904、会议终端基于该波束角度范围,从该麦克风阵列拾取的声音信号中,确定参考信号。904. The conference terminal determines a reference signal from the sound signals picked up by the microphone array based on the beam angle range.
在一些实施例中,会议终端从该麦克风阵列拾取的多路声音信号中,获取与该波束角度范围对应的多路声音信号分量,基于每路声音信号分量的特征,对该多路声音信号分量进行组合,得到参考信号。In some embodiments, the conference terminal obtains, from the multiple sound signals picked up by the microphone array, the multiple sound signal components corresponding to the beam angle range, and combines these components based on the characteristics of each component to obtain the reference signal.
在另一些实施例中,会议终端预先对不同波束角度范围的对应的声音信号进行编号,基于此,该会议终端基于该与该干扰源位置的角度信息匹配的波束角度范围,获取对应的声音信号的编号,从而直接将编号对应的声音信号确定为参考信号。In some other embodiments, the conference terminal numbers the sound signals corresponding to different beam angle ranges in advance. On this basis, the conference terminal obtains the number of the sound signal corresponding to the beam angle range that matches the angle information of the interference source location, and directly determines the sound signal with that number as the reference signal.
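Steps 903 and 904 in the numbered variant amount to a table lookup. The sketch below assumes four beams of 90 degrees each; the table `BEAM_RANGES`, its values and both function names are hypothetical and only illustrate the matching of an angle to a preconfigured beam number.

```python
# Hypothetical configuration: beam number -> (start_deg, end_deg), end exclusive.
BEAM_RANGES = {
    0: (0.0, 90.0),
    1: (90.0, 180.0),
    2: (180.0, 270.0),
    3: (270.0, 360.0),
}


def matching_beam(angle_deg, beam_ranges=BEAM_RANGES):
    """Return the number of the beam whose angle range contains angle_deg."""
    angle = angle_deg % 360.0
    for number, (start, end) in beam_ranges.items():
        if start <= angle < end:
            return number
    raise ValueError(f"no beam covers {angle_deg} degrees")


def reference_signal(angle_deg, beam_signals, beam_ranges=BEAM_RANGES):
    """Pick the reference signal by beam number; beam_signals maps number -> samples."""
    return beam_signals[matching_beam(angle_deg, beam_ranges)]
```

Keeping the ranges half-open avoids an angle on a boundary matching two beams.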
905、会议终端对该参考信号进行去噪。905. The conference terminal performs denoising on the reference signal.
本步骤参考步骤604,在此不做赘述。For this step, refer to step 604, which will not be repeated here.
906、会议终端基于去噪后的该参考信号,从该拾音空间内的声音信号中,确定第一声音信号。906. The conference terminal determines the first sound signal from the sound signals in the sound pickup space based on the denoised reference signal.
本步骤参考步骤605,在此不做赘述。For this step, refer to step 605, which will not be repeated here.
907、会议终端基于该参考信号,对该第一声音信号中的目标声音信号进行增强。907. The conference terminal enhances the target sound signal in the first sound signal based on the reference signal.
本步骤参考步骤606,在此不作赘述。For this step, refer to step 606, which will not be repeated here.
在本申请实施例提供的技术方案中,基于拾音空间中干扰源位置,从拾音空间内的声音信号中确定参考信号,进而基于参考信号滤除声音信号中干扰源的声音,以增强目标声音信号。通过上述技术方案,根据干扰源位置进行声音信号处理,能够针对性地对干扰源的声音进行屏蔽,以增强目标声音信号,从而提升声音质量。In the technical solution provided by the embodiment of the present application, based on the position of the interference source in the sound pickup space, the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance the target sound signal. Through the above technical solution, the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
进一步地,本申请实施例提供的方法在采用麦克风阵列进行拾音的场景下,能够适配于麦克风阵列的空间排列特性,利用干扰源的角度信息,获取对干扰源而言具有针对性的指定角度范围内的声音信号,保证了参考信号对干扰源的代表性,提升了针对干扰源进行声音信号处理的准确性,有效提升声音质量。Furthermore, in scenarios where a microphone array is used for sound pickup, the method provided by the embodiment of the present application adapts to the spatial arrangement of the microphone array and uses the angle information of the interference source to obtain the sound signal within a specified angle range targeted at the interference source. This ensures that the reference signal is representative of the interference source, improves the accuracy of sound signal processing for the interference source, and effectively improves sound quality.
图10是本申请实施例提供的一种声音信号处理方法的流程图。该方法应用于上述图5对应的声音处理系统中,该声音处理系统包括多个具有定位功能的分布式麦克风、会议终端、会议触控平板以及摄像头。该声音信号处理方法由该会议终端执行。如图10所示,该方法包括:Fig. 10 is a flowchart of a sound signal processing method provided by an embodiment of the present application. This method is applied to the sound processing system corresponding to FIG. 5 above, and the sound processing system includes a plurality of distributed microphones with a positioning function, a conference terminal, a conference touch panel, and a camera. The sound signal processing method is executed by the conference terminal. As shown in Figure 10, the method includes:
1001、会议终端通过具有定位功能的分布式麦克风,拾取拾音空间内的声音信号。1001. The conference terminal picks up the sound signal in the sound pickup space through the distributed microphone with positioning function.
本步骤参考步骤801,在此不作赘述。在会议开始前,该多个具有定位功能的分布式麦克风与会议终端进行信号交互,该会议终端根据从多个分布式麦克风接收到的信号,确定各个分布式麦克风在拾音空间中的位置信息。其中,在该分布式麦克风的位置发生变化的情况 下,会议终端能够基于接收到的信号,实时更新该分布式麦克风的位置信息。可选地,该分布式麦克风可以通过蓝牙、超声波或无线局域网等方式与会议终端进行信号交互。可选地,多个分布式麦克风之间通过持续进行信号交互,来保持时间同步。For this step, refer to step 801, and details are not described here. Before the start of the meeting, the multiple distributed microphones with positioning functions perform signal interaction with the conference terminal, and the conference terminal determines the position information of each distributed microphone in the sound pickup space according to the signals received from the multiple distributed microphones . Wherein, when the position of the distributed microphone changes, the conference terminal can update the position information of the distributed microphone in real time based on the received signal. Optionally, the distributed microphone can perform signal interaction with the conference terminal through bluetooth, ultrasonic wave or wireless local area network. Optionally, multiple distributed microphones maintain time synchronization through continuous signal interaction.
本申请实施例提供了一种分布式麦克风定位过程的示意图,如图11所示,其中,该会议终端上安装有四个信号交互装置1101、1102、1103和1104,用于与分布式麦克风1105进行信号交互,各个信号交互装置之间的相对位置已预先确定,参见图11中各个信号交互装置的坐标。会议终端获取四个信号交互装置接收到分布式麦克风1105所发出信号的时刻分别为 $t_i$ ($i=1,2,3,4$),用于计算分布式麦克风1105到第 $i$ 个信号交互装置的距离 $r_i$ ($i=1,2,3,4$)。距离计算过程参见下述公式(7)至公式(14)。An embodiment of the present application provides a schematic diagram of a distributed microphone positioning process, as shown in FIG. 11. Four signal interaction devices 1101, 1102, 1103 and 1104 are installed on the conference terminal for signal interaction with the distributed microphone 1105. The relative positions of the signal interaction devices are predetermined; see the coordinates of each signal interaction device in FIG. 11. The conference terminal obtains the moments $t_i$ ($i=1,2,3,4$) at which the four signal interaction devices receive the signal sent by the distributed microphone 1105, and uses them to calculate the distance $r_i$ ($i=1,2,3,4$) from the distributed microphone 1105 to the $i$-th signal interaction device. For the distance calculation, refer to formulas (7) to (14) below.
$$d_{i,12} = r_1 - r_2 = (t_1 - t_2) \times c \qquad (7)$$
$$d_{i,23} = r_2 - r_3 = (t_2 - t_3) \times c \qquad (8)$$
$$d_{i,34} = r_3 - r_4 = (t_3 - t_4) \times c \qquad (9)$$
$$d_{i,41} = r_4 - r_1 = (t_4 - t_1) \times c \qquad (10)$$
其中,该 $d_{i,12}$ 是分布式麦克风1105相对于信号交互装置1101与信号交互装置1102之间的距离差;该 $d_{i,23}$ 是分布式麦克风1105相对于信号交互装置1102与信号交互装置1103之间的距离差;该 $d_{i,34}$ 是分布式麦克风1105相对于信号交互装置1103与信号交互装置1104之间的距离差;该 $d_{i,41}$ 是分布式麦克风1105相对于信号交互装置1104与信号交互装置1101之间的距离差;$c$ 为光速。Here, $d_{i,12}$ is the difference between the distances from the distributed microphone 1105 to the signal interaction device 1101 and to the signal interaction device 1102; $d_{i,23}$ is the difference between its distances to the signal interaction device 1102 and to the signal interaction device 1103; $d_{i,34}$ is the difference between its distances to the signal interaction device 1103 and to the signal interaction device 1104; $d_{i,41}$ is the difference between its distances to the signal interaction device 1104 and to the signal interaction device 1101; and $c$ is the speed of light.
基于上述 $d_{i,12}$、$d_{i,23}$、$d_{i,34}$ 以及 $d_{i,41}$,能够建立双曲线方程组,用于确定分布式麦克风1105到第 $i$ 个信号交互装置的距离 $r_i$ ($i=1,2,3,4$),实现对该分布式麦克风1105的定位。Based on the above $d_{i,12}$, $d_{i,23}$, $d_{i,34}$ and $d_{i,41}$, a system of hyperbolic equations can be established to determine the distance $r_i$ ($i=1,2,3,4$) from the distributed microphone 1105 to the $i$-th signal interaction device, thereby locating the distributed microphone 1105.
[Figures PCTCN2022142338-appb-000010 to PCTCN2022142338-appb-000013: formulas (11) to (14), the system of hyperbolic equations]
其中,$(x_1, y_1, z_1)$ 是信号交互装置1101的坐标;$(x_2, y_2, z_2)$ 是信号交互装置1102的坐标;$(x_3, y_3, z_3)$ 是信号交互装置1103的坐标;$(x_4, y_4, z_4)$ 是信号交互装置1104的坐标;其中,$i=1,2,3,4$。Here, $(x_1, y_1, z_1)$ are the coordinates of the signal interaction device 1101; $(x_2, y_2, z_2)$ are the coordinates of the signal interaction device 1102; $(x_3, y_3, z_3)$ are the coordinates of the signal interaction device 1103; $(x_4, y_4, z_4)$ are the coordinates of the signal interaction device 1104; and $i=1,2,3,4$.
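The range differences of formulas (7) to (10) can be computed directly from the arrival times, and a candidate microphone position can be checked against them. The sketch below is illustrative only: the hyperbolic equations themselves are published as images, so it assumes the usual form in which each $r_k$ is the Euclidean distance from the microphone to device $k$. The function names and the idea of checking residuals are assumptions, not the application's stated procedure.

```python
import math

C = 299_792_458.0  # m/s; the text takes c as the speed of light

# Device index pairs used by formulas (7) to (10): (1,2), (2,3), (3,4), (4,1).
PAIRS = [(0, 1), (1, 2), (2, 3), (3, 0)]


def range_differences(t, c=C):
    """Pairwise range differences from the four arrival times t[0..3]."""
    return [(t[a] - t[b]) * c for a, b in PAIRS]


def hyperbolic_residuals(pos, anchors, diffs):
    """How far a candidate position is from satisfying each range-difference equation.

    Assumes r_k is the Euclidean distance from pos to anchors[k].
    """
    r = [math.dist(pos, anchor) for anchor in anchors]
    return [r[a] - r[b] - d for (a, b), d in zip(PAIRS, diffs)]
```

Because the four differences telescope ($d_{i,12} + d_{i,23} + d_{i,34} + d_{i,41} = 0$), only three of them are independent, and the unknown emission time of the microphone cancels out of every difference. A solver would search for the position whose residuals are all close to zero.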
1002、会议终端接收位置选择指令,将该位置选择指令所对应的位置,确定为该拾音空间内的干扰源位置。1002. The conference terminal receives the location selection instruction, and determines the location corresponding to the location selection instruction as the location of the interference source in the sound pickup space.
本步骤参考步骤802,在此不作赘述。For this step, refer to step 802, which will not be repeated here.
1003、会议终端将来源于该干扰源位置对应的分布式麦克风的声音信号,确定为参考信号。1003. The conference terminal determines the sound signal from the distributed microphone corresponding to the location of the interference source as a reference signal.
本步骤参考步骤803,在此不做赘述。For this step, refer to step 803, and details are not repeated here.
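One simple reading of "the distributed microphone corresponding to the interference source location" is the microphone closest to that location. The sketch below uses that reading as an assumption; `nearest_microphone` and the dictionary layout are hypothetical, and the application may intend a different correspondence.

```python
import math


def nearest_microphone(source_pos, mic_positions):
    """Return the id of the microphone closest to the interference source.

    mic_positions maps a microphone id to its (x, y, z) coordinates.
    """
    return min(
        mic_positions,
        key=lambda mic_id: math.dist(source_pos, mic_positions[mic_id]),
    )
```

Since the positions of the distributed microphones are updated in real time, the lookup can be repeated whenever a microphone or the interference source moves.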
1004、会议终端对该参考信号进行去噪。1004. The conference terminal performs denoising on the reference signal.
本步骤参考步骤804,在此不做赘述。For this step, refer to step 804, which will not be repeated here.
1005、会议终端基于去噪后的该参考信号,从该拾音空间内的声音信号中,确定第一声音信号。1005. The conference terminal determines the first sound signal from the sound signals in the sound pickup space based on the denoised reference signal.
本步骤参考步骤805,在此不做赘述。For this step, refer to step 805, which will not be repeated here.
1006、会议终端基于该参考信号,对该第一声音信号中的目标声音信号进行增强。1006. The conference terminal enhances the target sound signal in the first sound signal based on the reference signal.
本步骤参考步骤806,在此不作赘述。For this step, refer to step 806, which will not be repeated here.
在本申请实施例提供的技术方案中,基于拾音空间中干扰源位置,从拾音空间内的声音信号中确定参考信号,进而基于参考信号滤除声音信号中干扰源的声音,以增强目标声音信号。通过上述技术方案,根据干扰源位置进行声音信号处理,能够针对性地对干扰源的声音进行屏蔽,以增强目标声音信号,从而提升声音质量。In the technical solution provided by the embodiment of the present application, based on the position of the interference source in the sound pickup space, the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance the target sound signal. Through the above technical solution, the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
进一步地,通过上述技术方案,能够根据需求随机摆放多个麦克风,大大减小了设备部署时的场景限制,在提升声音处理系统中设备部署灵活性的同时,通过对麦克风进行实时定位,实现对干扰源的准确定位,从而更加精准地从声音信号中滤除干扰源的声音,有效保证声音质量。Furthermore, through the above technical solution, multiple microphones can be randomly placed according to the requirements, which greatly reduces the scene restrictions during equipment deployment. While improving the flexibility of equipment deployment in the sound processing system, real-time positioning of the microphones is achieved. Accurate positioning of the interference source, thereby more accurately filtering out the sound of the interference source from the sound signal, effectively ensuring the sound quality.
图12是本申请实施例提供的一种声音信号处理装置的结构示意图。如图12所示,该声音信号处理装置包括:Fig. 12 is a schematic structural diagram of an audio signal processing device provided by an embodiment of the present application. As shown in Figure 12, the sound signal processing device includes:
拾音模块1201,用于通过拾音设备,拾取拾音空间内的声音信号;The sound pickup module 1201 is used to pick up the sound signal in the sound pickup space through the sound pickup device;
位置确定模块1202,用于确定所述拾音空间内的干扰源位置;A position determining module 1202, configured to determine the position of the interference source in the sound pickup space;
信号确定模块1203,用于基于所述干扰源位置,从所述声音信号中确定参考信号,所述参考信号用于滤除所述干扰源的声音;A signal determination module 1203, configured to determine a reference signal from the sound signal based on the location of the interference source, and the reference signal is used to filter out the sound of the interference source;
增强模块1204,用于基于所述参考信号,对目标声音信号进行增强。An enhancement module 1204, configured to enhance the target sound signal based on the reference signal.
在一种可能实施方式中,所述位置确定模块1202包括:In a possible implementation manner, the location determining module 1202 includes:
第一确定单元,用于接收位置选择指令,将所述位置选择指令所对应的位置,确定为所述拾音空间内的干扰源位置。The first determining unit is configured to receive a location selection instruction, and determine the location corresponding to the location selection instruction as the location of the interference source in the sound pickup space.
在一种可能实施方式中,所述位置选择指令基于在控制设备中对所述干扰源所在位置的选择操作触发。In a possible implementation manner, the location selection instruction is triggered based on a selection operation of the location of the interference source in the control device.
在一种可能实施方式中,所述位置选择指令由图像采集设备在所采集的图像中检测到第一肢体行为的情况下触发,所述图像采集设备用于针对所述拾音空间进行图像采集,所述第一肢体行为用于指示对所述位置静音。In a possible implementation manner, the position selection instruction is triggered by an image acquisition device when the first body behavior is detected in the collected image, and the image acquisition device is used to perform image acquisition for the sound pickup space , the first body behavior is used to indicate to mute the location.
在一种可能实施方式中,所述位置确定模块1202包括:In a possible implementation manner, the location determining module 1202 includes:
图像检测单元,用于对图像采集设备所采集的目标图像进行检测,所述图像采集设备用于针对所述拾音空间进行图像采集;An image detection unit, configured to detect the target image collected by the image acquisition device, and the image acquisition device is used for image acquisition for the sound pickup space;
第二确定单元,用于响应于在所述目标图像中检测到第一肢体行为,将所述第一肢体行为在所述拾音空间中的位置确定为所述干扰源位置,所述第一肢体行为用于指示对所述位置静音。A second determination unit, configured to determine a position of the first body action in the sound pickup space as the position of the interference source in response to detecting a first body action in the target image, the first Physical behavior is used to indicate muting of said location.
在一种可能实施方式中,所述装置还包括:In a possible implementation manner, the device further includes:
第三确定单元,用于响应于在所述目标图像中检测到第二肢体行为,将所述第二肢体行为在所述拾音空间中的位置确定为所述目标的位置,所述第二肢体行为用于指示对所述目标 声音信号进行增强。A third determining unit, configured to determine a position of the second body action in the sound pickup space as the position of the target in response to detecting a second body action in the target image, the second Physical behavior is used to indicate the enhancement of the target sound signal.
在一种可能实施方式中,所述装置还包括:In a possible implementation manner, the device further includes:
跟踪单元,用于对所述干扰源位置进行跟踪;a tracking unit, configured to track the position of the interference source;
所述信号确定模块用于:The signal determination module is used for:
基于跟踪到的所述干扰源位置发生变化,从所述声音信号中重新确定参考信号。Based on the tracked change in the position of the interference source, a reference signal is re-determined from the sound signal.
在一种可能实施方式中,所述拾音设备包括多个麦克风,所述信号确定模块用于:In a possible implementation manner, the sound pickup device includes multiple microphones, and the signal determination module is configured to:
将来源于所述干扰源位置对应的麦克风的声音信号,确定为参考信号。The sound signal originating from the microphone corresponding to the location of the interference source is determined as a reference signal.
在一种可能实施方式中,所述多个麦克风具有定位功能。In a possible implementation manner, the multiple microphones have a positioning function.
在一种可能实施方式中,所述拾音设备为麦克风阵列,所述信号确定模块1203用于:In a possible implementation manner, the sound pickup device is a microphone array, and the signal determining module 1203 is configured to:
基于所述干扰源位置的角度信息,确定与所述角度信息匹配的波束角度范围;Based on the angle information of the position of the interference source, determine a beam angle range matching the angle information;
基于所述波束角度范围,从所述麦克风阵列拾取的声音信号中,确定参考信号。Based on the beam angle range, a reference signal is determined from the sound signals picked up by the microphone array.
在一种可能实施方式中,所述增强模块1204包括:In a possible implementation manner, the enhancement module 1204 includes:
信号确定单元,用于基于所述参考信号,从所述拾音空间内的声音信号中,确定第一声音信号,所述第一声音信号的信号能量小于所述参考信号的信号能量,且,所述第一声音信号与所述参考信号之间的相关性大于相关性阈值;A signal determining unit, configured to determine a first sound signal from sound signals in the sound pickup space based on the reference signal, the signal energy of the first sound signal is less than the signal energy of the reference signal, and, the correlation between the first sound signal and the reference signal is greater than a correlation threshold;
增强单元,用于基于所述参考信号,对所述第一声音信号中的目标声音信号进行增强。An enhancement unit, configured to enhance a target sound signal in the first sound signal based on the reference signal.
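The two conditions on the first sound signal (lower energy than the reference signal, and correlation with it above a threshold) can be sketched as follows. The zero-lag normalized correlation, the threshold value of 0.5 and all names are assumptions for illustration; the application does not fix the correlation measure or the threshold.

```python
import math


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def correlation(x, y):
    """Magnitude of the zero-lag normalized correlation, in [0, 1]."""
    denom = math.sqrt(energy(x) * energy(y))
    if denom == 0.0:
        return 0.0
    return abs(sum(a * b for a, b in zip(x, y))) / denom


def select_first_signals(reference, candidates, threshold=0.5):
    """Return ids of candidates that have less energy than the reference
    and correlate with it above the threshold."""
    ref_energy = energy(reference)
    return [
        name
        for name, signal in candidates.items()
        if energy(signal) < ref_energy and correlation(signal, reference) > threshold
    ]
```

A signal that passes both checks is one into which the interference source has leaked at a lower level, which is the case the filter is meant to clean up.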
在一种可能实施方式中,所述增强单元用于:In a possible implementation manner, the enhancing unit is used for:
以所述参考信号为滤波器的一路输入,以所述第一声音信号为所述滤波器的另一路输入,通过所述滤波器,滤除所述第一声音信号中与所述参考信号相关的部分,以增强所述第一声音信号中的所述目标声音信号,输出滤波结果。Using the reference signal as one input of the filter, using the first sound signal as the other input of the filter, and filtering out the first sound signal related to the reference signal through the filter to enhance the target sound signal in the first sound signal, and output a filtering result.
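A filter with the reference signal on one input and the first sound signal on the other is commonly realized as an adaptive noise canceller. The sketch below uses a plain LMS update as one possible realization; the choice of LMS, the tap count, the step size and the function name are assumptions, since the application does not name the filter type.

```python
import math
import random


def lms_cancel(reference, primary, taps=4, mu=0.02):
    """Remove from `primary` the part that is linearly predictable from `reference`.

    Returns the filter output (primary minus the estimated interference).
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    output = []
    for ref_sample, prim_sample in zip(reference, primary):
        history = [ref_sample] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = prim_sample - estimate  # what remains after removing the estimate
        # LMS update: move the weights along the error-reference product.
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        output.append(error)
    return output
```

The part of the first sound signal that is correlated with the reference is progressively removed, and what remains is dominated by the target sound signal. The step size trades convergence speed against residual fluctuation.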
需要说明的是:上述实施例提供的声音信号处理装置在进行声音信号处理时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的声音信号处理装置与声音信号处理方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。It should be noted that, when the sound signal processing apparatus provided in the above embodiment performs sound signal processing, the division into the above functional modules is used only as an example. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the sound signal processing apparatus provided in the above embodiment and the sound signal processing method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
在本申请实施例提供的技术方案中,基于拾音空间中的干扰源位置,从拾音空间内的声音信号中确定参考信号,进而基于参考信号滤除声音信号中干扰源的声音,以增强目标声音信号。通过上述技术方案,根据干扰源位置进行声音信号处理,能够针对性地对干扰源的声音进行屏蔽,以增强目标声音信号,从而提升声音质量。In the technical solution provided by the embodiment of the present application, based on the position of the interference source in the sound pickup space, the reference signal is determined from the sound signal in the sound pickup space, and then the sound of the interference source in the sound signal is filtered out based on the reference signal, so as to enhance Target sound signal. Through the above technical solution, the sound signal processing is performed according to the position of the interference source, and the sound of the interference source can be shielded in a targeted manner to enhance the target sound signal, thereby improving the sound quality.
本申请实施例提供了一种声音信号处理设备,能够作为上述声音处理系统中的声音信号处理设备。示意性地,参考图13,图13是本申请实施例提供的一种声音信号处理设备的硬件结构示意图。如图13所示,该声音信号处理设备1300包括存储器1301、处理器1302、通信接口1303以及总线1304。其中,存储器1301、处理器1302、通信接口1303通过总线1304实现彼此之间的通信连接。An embodiment of the present application provides a sound signal processing device, which can be used as the sound signal processing device in the above sound processing system. Schematically, refer to FIG. 13 , which is a schematic diagram of a hardware structure of an audio signal processing device provided by an embodiment of the present application. As shown in FIG. 13 , the audio signal processing device 1300 includes a memory 1301 , a processor 1302 , a communication interface 1303 and a bus 1304 . Wherein, the memory 1301 , the processor 1302 , and the communication interface 1303 are connected to each other through a bus 1304 .
存储器1301可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其它类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其它类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其它光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其它磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其它介质,但不限于此。存储器1301可以存储至少一段程序代码,当存储器1301中存储的程序代码被处理器1302执行时,使得声音信号处理设备能够实现上述声音信号处理方法。存储器1301还可以存储各类数据,包括但不限于图像和声音信号等,本申请实施例对此不作限定。The memory 1301 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1301 can store at least one piece of program code; when the program code stored in the memory 1301 is executed by the processor 1302, the sound signal processing device is enabled to implement the above sound signal processing method. The memory 1301 may also store various types of data, including but not limited to images and sound signals, which is not limited in the embodiments of the present application.
处理器1302可以是网络处理器(network processor,NP)、中央处理器(central processing unit,CPU)、特定应用集成电路(application-specific integrated circuit,ASIC)或用于控制本申请方案程序执行的集成电路。该处理器1302可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。该处理器1302的数量可以是一个,也可以是多个。通信接口1303使用例如收发器一类的收发模块,来实现声音信号处理设备1300与其他设备或通信网络之间的通信。例如,可以通过通信接口1303获取声音信号。The processor 1302 may be a network processor (network processor, NP), a central processing unit (central processing unit, CPU), a specific application integrated circuit (application-specific integrated circuit, ASIC) or an integrated circuit for controlling the program execution of the application scheme. circuit. The processor 1302 may be a single-core (single-CPU) processor, or a multi-core (multi-CPU) processor. The number of the processor 1302 may be one or more. The communication interface 1303 uses a transceiver module such as a transceiver to implement communication between the sound signal processing device 1300 and other devices or communication networks. For example, sound signals can be acquired through the communication interface 1303 .
其中,存储器1301和处理器1302可以分离设置,也可以集成在一起。Wherein, the memory 1301 and the processor 1302 may be provided separately, or may be integrated together.
总线1304可包括在声音信号处理设备1300各个部件(例如,存储器1301、处理器1302、通信接口1303)之间传送信息的通路。The bus 1304 may include a path for transferring information between various components of the sound signal processing device 1300 (eg, memory 1301 , processor 1302 , communication interface 1303 ).
本发明中术语“第一”“第二”等字样用于对作用和功能基本相同的相同项或相似项进行区分,应理解,“第一”、“第二”、“第n”之间不具有逻辑或时序上的依赖关系,也不对数量和执行顺序进行限定。还应理解,尽管以下描述使用术语第一、第二等来描述各种元素,但这些元素不应受术语的限制。这些术语只是用于将一元素与另一元素区别分开。例如,在不脱离各种所述示例的范围的情况下,第一麦克风可以被称为第二麦克风,并且类似地,第二麦克风可以被称为第一麦克风。第一麦克风和第二麦克风都可以是麦克风,并且在某些情况下,可以是单独且不同的麦克风。In the present invention, terms such as "first" and "second" are used to distinguish between identical or similar items whose roles and functions are basically the same. It should be understood that "first", "second" and "nth" have no logical or temporal dependency on one another and do not limit quantity or order of execution. It should also be understood that although the following description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another. For example, a first microphone could be termed a second microphone, and, similarly, a second microphone could be termed a first microphone, without departing from the scope of the various described examples. Both the first microphone and the second microphone may be microphones, and in some cases may be separate and distinct microphones.
本发明中术语“至少一个”的含义是指一个或多个,本发明中术语“多个”的含义是指两个或两个以上,例如,多个麦克风是指两个或两个以上的麦克风。The meaning of the term "at least one" in the present invention refers to one or more, the meaning of the term "multiple" in the present invention refers to two or more, for example, a plurality of microphones refers to two or more microphone.
以上描述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以权利要求的保护范围为准。The above description is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field can easily think of various equivalent modifications within the technical scope disclosed in the present invention Or replacement, these modifications or replacements should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be based on the protection scope of the claims.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以程序产品的形式实现。该程序产品包括一个或多个程序指令。在声音信号处理设备上加载和执行该程序指令时,全部或部分地产生按照本发明实施例中的流程或功能。In the above embodiments, all or part of them may be implemented by software, hardware, firmware or any combination thereof. When implemented using software, it may be implemented in whole or in part in the form of a program product. The program product includes one or more program instructions. When the program instructions are loaded and executed on the sound signal processing device, the processes or functions according to the embodiments of the present invention will be generated in whole or in part.
A person of ordinary skill in the art can understand that all or some of the steps of the above embodiments may be performed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In conclusion, the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (27)

  1. A sound signal processing method, characterized in that the method comprises:
    picking up, by a sound pickup device, sound signals in a sound pickup space;
    determining a position of an interference source in the sound pickup space;
    determining a reference signal from the sound signals based on the position of the interference source, the reference signal being used to filter out sound of the interference source; and
    enhancing a target sound signal based on the reference signal.
  2. The method according to claim 1, characterized in that the determining a position of an interference source in the sound pickup space comprises:
    receiving a position selection instruction, and determining the position corresponding to the position selection instruction as the position of the interference source in the sound pickup space.
  3. The method according to claim 2, characterized in that the position selection instruction is triggered based on a selection operation, performed in a control device, on the position where the interference source is located.
  4. The method according to claim 2, characterized in that the position selection instruction is triggered by an image acquisition device upon detecting a first body action in a captured image, the image acquisition device being configured to capture images of the sound pickup space, and the first body action being used to indicate muting of the position.
  5. The method according to claim 1, characterized in that the determining a position of an interference source in the sound pickup space comprises:
    performing detection on a target image captured by an image acquisition device, the image acquisition device being configured to capture images of the sound pickup space; and
    in response to detecting a first body action in the target image, determining the position of the first body action in the sound pickup space as the position of the interference source, the first body action being used to indicate muting of the position.
  6. The method according to claim 5, characterized in that the method further comprises:
    in response to detecting a second body action in the target image, determining the position of the second body action in the sound pickup space as the position of the target, the second body action being used to indicate enhancement of the target sound signal.
  7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    tracking the position of the interference source;
    wherein the determining a reference signal from the sound signals based on the position of the interference source comprises:
    re-determining the reference signal from the sound signals based on a tracked change in the position of the interference source.
  8. The method according to any one of claims 1 to 7, characterized in that the sound pickup device comprises a plurality of microphones, and the determining a reference signal from the sound signals based on the position of the interference source comprises:
    determining, as the reference signal, the sound signal originating from the microphone corresponding to the position of the interference source.
  9. The method according to claim 8, characterized in that the plurality of microphones have a positioning function.
  10. The method according to any one of claims 1 to 7, characterized in that the sound pickup device is a microphone array, and the determining a reference signal from the sound signals based on the position of the interference source comprises:
    determining, based on angle information of the position of the interference source, a beam angle range matching the angle information; and
    determining the reference signal, based on the beam angle range, from the sound signals picked up by the microphone array.
  11. The method according to any one of claims 1 to 10, characterized in that the enhancing a target sound signal based on the reference signal comprises:
    determining, based on the reference signal, a first sound signal from the sound signals in the sound pickup space, wherein the signal energy of the first sound signal is less than the signal energy of the reference signal, and the correlation between the first sound signal and the reference signal is greater than a correlation threshold; and
    enhancing, based on the reference signal, the target sound signal in the first sound signal.
  12. The method according to claim 11, characterized in that the enhancing, based on the reference signal, the target sound signal in the first sound signal comprises:
    using the reference signal as one input of a filter and the first sound signal as the other input of the filter, filtering out, by the filter, the part of the first sound signal that is correlated with the reference signal so as to enhance the target sound signal in the first sound signal, and outputting a filtering result.
  13. A sound signal processing apparatus, characterized in that the apparatus comprises:
    a sound pickup module, configured to pick up, by a sound pickup device, sound signals in a sound pickup space;
    a position determination module, configured to determine a position of an interference source in the sound pickup space;
    a signal determination module, configured to determine a reference signal from the sound signals based on the position of the interference source, the reference signal being used to filter out sound of the interference source; and
    an enhancement module, configured to enhance a target sound signal based on the reference signal.
  14. The apparatus according to claim 13, characterized in that the position determination module comprises:
    a first determination unit, configured to receive a position selection instruction and determine the position corresponding to the position selection instruction as the position of the interference source in the sound pickup space.
  15. The apparatus according to claim 14, characterized in that the position selection instruction is triggered based on a selection operation, performed in a control device, on the position where the interference source is located.
  16. The apparatus according to claim 14, characterized in that the position selection instruction is triggered by an image acquisition device upon detecting a first body action in a captured image, the image acquisition device being configured to capture images of the sound pickup space, and the first body action being used to indicate muting of the position.
  17. The apparatus according to claim 13, characterized in that the position determination module comprises:
    an image detection unit, configured to perform detection on a target image captured by an image acquisition device, the image acquisition device being configured to capture images of the sound pickup space; and
    a second determination unit, configured to, in response to detecting a first body action in the target image, determine the position of the first body action in the sound pickup space as the position of the interference source, the first body action being used to indicate muting of the position.
  18. The apparatus according to claim 17, characterized in that the apparatus further comprises:
    a third determination unit, configured to, in response to detecting a second body action in the target image, determine the position of the second body action in the sound pickup space as the position of the target, the second body action being used to indicate enhancement of the target sound signal.
  19. The apparatus according to any one of claims 13 to 18, characterized in that the apparatus further comprises:
    a tracking unit, configured to track the position of the interference source;
    wherein the signal determination module is configured to:
    re-determine the reference signal from the sound signals based on a tracked change in the position of the interference source.
  20. The apparatus according to any one of claims 13 to 19, characterized in that the sound pickup device comprises a plurality of microphones, and the signal determination module is configured to:
    determine, as the reference signal, the sound signal originating from the microphone corresponding to the position of the interference source.
  21. The apparatus according to claim 20, characterized in that the plurality of microphones have a positioning function.
  22. The apparatus according to any one of claims 13 to 19, characterized in that the sound pickup device is a microphone array, and the signal determination module is configured to:
    determine, based on angle information of the position of the interference source, a beam angle range matching the angle information; and
    determine the reference signal, based on the beam angle range, from the sound signals picked up by the microphone array.
  23. The apparatus according to any one of claims 13 to 22, characterized in that the enhancement module comprises:
    a signal determination unit, configured to determine, based on the reference signal, a first sound signal from the sound signals in the sound pickup space, wherein the signal energy of the first sound signal is less than the signal energy of the reference signal, and the correlation between the first sound signal and the reference signal is greater than a correlation threshold; and
    an enhancement unit, configured to enhance, based on the reference signal, the target sound signal in the first sound signal.
  24. The apparatus according to claim 23, characterized in that the enhancement unit is configured to:
    use the reference signal as one input of a filter and the first sound signal as the other input of the filter, filter out, by the filter, the part of the first sound signal that is correlated with the reference signal so as to enhance the target sound signal in the first sound signal, and output a filtering result.
  25. A sound signal processing device, characterized in that the sound signal processing device comprises a processor and a memory, the memory being configured to store at least one piece of program code, and the at least one piece of program code being loaded by the processor to perform the sound signal processing method according to any one of claims 1 to 12.
  26. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store at least one piece of program code, the at least one piece of program code being used to perform the sound signal processing method according to any one of claims 1 to 12.
  27. A computer program product, characterized in that, when the computer program product is run on a computer, the computer is caused to perform the sound signal processing method according to any one of claims 1 to 12.
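
The signal selection of claim 11 and the two-input filtering of claim 12 can be illustrated with a minimal sketch, offered under stated assumptions rather than as the claimed implementation. The claims do not name a filter type; the sketch assumes a normalized least-mean-squares (NLMS) adaptive filter, a common choice for removing the part of one signal that is correlated with a reference. The function names (`select_first_signals`, `nlms_cancel`), the zero-lag normalized correlation measure, and the parameter values (`corr_threshold`, `taps`, `mu`) are all hypothetical and chosen only for illustration.

```python
import math
import random


def energy(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def correlation(x, y):
    """Absolute normalized zero-lag correlation of two equal-length signals."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return abs(num / den) if den > 0 else 0.0


def select_first_signals(channels, reference, corr_threshold=0.3):
    """Indices of channels that satisfy both conditions of claim 11:
    lower energy than the reference, and correlation above the threshold."""
    ref_energy = energy(reference)
    return [
        i for i, ch in enumerate(channels)
        if energy(ch) < ref_energy and correlation(ch, reference) > corr_threshold
    ]


def nlms_cancel(primary, reference, taps=8, mu=0.1, eps=1e-8):
    """Remove from `primary` the component predictable from `reference`.

    Assumed NLMS filter: `reference` is one input, `primary` (the first
    sound signal) is the other, and the error signal is the filtering result.
    """
    w = [0.0] * taps      # adaptive filter weights
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimated interference leak
        e = d - y                                   # output sample after removal
        norm = eps + sum(b * b for b in buf)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4000
    # Hypothetical data: the reference microphone hears mostly the interference.
    interference = [rng.uniform(-1, 1) for _ in range(n)]
    target = [0.3 * math.sin(2 * math.pi * 0.01 * t) for t in range(n)]
    # The other microphone hears the target plus a weaker, filtered copy of the interference.
    primary = [
        target[t] + 0.5 * interference[t] + (0.2 * interference[t - 1] if t else 0.0)
        for t in range(n)
    ]
    picked = select_first_signals([primary], interference)
    cleaned = nlms_cancel(primary, interference)
    print("selected channels:", picked)
    print("interference power before:", energy([p - s for p, s in zip(primary, target)]))
    print("interference power after: ", energy([c - s for c, s in zip(cleaned, target)]))
```

In this sketch the energy and correlation tests pick out channels that contain a weaker copy of the interference, and the filter output is what remains of such a channel after the reference-correlated part is subtracted. A real system would likely use frame-wise or frequency-domain processing and a delay-tolerant correlation measure; those choices are outside what the claims specify.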
PCT/CN2022/142338 2021-12-31 2022-12-27 Sound signal processing method and apparatus, and device and storage medium WO2023125537A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111667547.3A CN116417006A (en) 2021-12-31 2021-12-31 Sound signal processing method, device, equipment and storage medium
CN202111667547.3 2021-12-31

Publications (1)

Publication Number Publication Date
WO2023125537A1 (en) 2023-07-06

Family

ID=86997948

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142338 WO2023125537A1 (en) 2021-12-31 2022-12-27 Sound signal processing method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN116417006A (en)
WO (1) WO2023125537A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945672A (en) * 2012-09-29 2013-02-27 深圳市国华识别科技开发有限公司 Voice control system for multimedia equipment, and voice control method
CN103558911A (en) * 2013-10-24 2014-02-05 广东欧珀移动通信有限公司 Mute achievement method and system of mobile terminal
US20180040332A1 (en) * 2016-08-03 2018-02-08 Akihito Aiba Voice processing device, audio and video output apparatus, communication system, and sound processing method
CN108200515A (en) * 2017-12-29 2018-06-22 苏州科达科技股份有限公司 Multi-beam meeting pickup system and method
CN108694957A (en) * 2018-04-08 2018-10-23 湖北工业大学 The echo cancelltion design method formed based on circular microphone array beams
CN110493690A (en) * 2019-08-29 2019-11-22 北京搜狗科技发展有限公司 A kind of sound collection method and device

Also Published As

Publication number Publication date
CN116417006A (en) 2023-07-11

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22914790; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)