WO2023143041A1 - Signal processing method, apparatus, device and storage medium - Google Patents

Signal processing method, apparatus, device and storage medium

Info

Publication number
WO2023143041A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
sound pickup
signal
target
pickup area
Prior art date
Application number
PCT/CN2023/071517
Other languages
English (en)
French (fr)
Inventor
Zhang Lei (张磊)
Liu Zhihui (刘智辉)
Liang Hao'en (梁浩恩)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023143041A1 publication Critical patent/WO2023143041A1/zh

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 27/00 Public address systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 27/00 Public address systems
    • H04R 27/04 Electric megaphones

Definitions

  • the present application relates to the field of computer technology, and in particular to a signal processing method, device, equipment and storage medium.
  • Sound amplification refers to amplifying a picked-up sound and playing it out.
  • When sound reinforcement requirements change, for example when sudden noise or a private conversation occurs at the meeting site, directly amplifying the picked-up sound will degrade the sound quality of the meeting.
  • the present application provides a signal processing method, device, equipment and storage medium, which can effectively improve sound quality.
  • the technical solution is as follows:
  • a signal processing method comprising:
  • the posture change of the object refers to the object changing from one state to another state, for example, changing from a sitting posture to a standing posture.
  • the detection of a posture change of at least one object in the sound pickup area based on the image of the sound pickup area includes:
  • the posture change of at least one object in the sound pickup area is determined.
  • In this way, the coordinates of each object at different moments are continuously recorded, so that whether each object's posture has changed can be identified. This provides a data basis for amplification processing based on posture changes, making it possible to accurately determine the object of sound reinforcement control and the corresponding control method, thereby effectively improving the sound quality.
  • the determining the coordinate sets corresponding to the different moments in the sound pickup area based on the images of the sound pickup area at different moments respectively includes:
  • Object recognition is performed on the collected image of the sound pickup area at intervals of a first time period, and coordinates of a target feature of the recognized at least one object in the image are obtained, so as to obtain coordinate sets corresponding to the different moments.
  • the target feature may be a face feature of the object, for example, a central point of a face or facial features such as eyes.
  • In this way, the coordinates of an object in the sound pickup area can be determined based only on the object's target feature, and coordinate changes are continuously detected based on that feature, ensuring coordinate accuracy while reducing the amount of calculation and improving the efficiency of posture-change detection.
  • the determining the posture change of at least one object in the sound pickup area based on the coordinate sets corresponding to the different moments includes:
  • determining a target variance representing the degree of posture change of the at least one object in the sound pickup area at the different moments;
  • determining the posture change of the at least one object based on the target variance.
  • Variance can represent the difference between individual items and their mean. Therefore, the target variance determined from the coordinate sets corresponding to different moments in the pickup area reflects the difference between each object's coordinates and the average coordinates, and can thus identify in a timely and accurate manner whether any object's posture has changed.
  • In this way, the posture change in the sound pickup area is pre-judged based on the target variance, and subsequent steps are performed only when the target variance is greater than a variance threshold, which saves computing resources and improves the efficiency of amplification control.
  • the coordinates include an abscissa and an ordinate
  • the determining the posture change of the at least one object based on the coordinates of the at least one object at the different moments includes:
  • the target time being the time when the ordinate of the object changes
  • determining that the posture of the object changes includes:
  • if the ordinate of the object becomes smaller, and the variation amplitude of the ordinate within a second duration after the target moment is smaller than a target amplitude, it is determined that the object changes from a standing posture to a sitting posture;
  • if the ordinate of the object becomes larger, and the variation amplitude of the ordinate within the second duration after the target moment is smaller than the target amplitude, it is determined that the object changes from a sitting posture to a standing posture.
  • Because the posture change corresponding to a sound reinforcement requirement is usually a vertical one, for example from standing to sitting or from sitting to standing, determining the first object with a large posture change based on the ordinate matches the actual situation of a conference scene and effectively improves the accuracy of posture-based sound reinforcement control.
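The ordinate rules above can be sketched in Python; the function name, the sampling window, and the threshold parameter are illustrative, not part of the application.

```python
def classify_posture_change(y_prev, y_now, window_ys, target_amplitude):
    """Classify a posture change from ordinate (y) samples (sketch).

    y_prev / y_now: the object's ordinate before and at the target moment;
    window_ys: ordinates sampled during the second duration after the
    target moment; target_amplitude: maximum fluctuation allowed for the
    change to count as settled. All names here are illustrative.
    """
    if not window_ys:
        return None
    settled = max(window_ys) - min(window_ys) < target_amplitude
    if not settled:
        return None  # still moving, e.g. briefly bending down
    if y_now < y_prev:
        return "stand_to_sit"  # ordinate decreased: standing -> sitting
    if y_now > y_prev:
        return "sit_to_stand"  # ordinate increased: sitting -> standing
    return None
```

The settling check filters out transient movements (such as leaning over) that would otherwise be misread as a posture change.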
  • performing corresponding amplification processing on the sound signal originating from the first object includes:
  • a corresponding sound amplification process is performed on the sound signal originating from the first object in combination with the sound signal in the sound pickup area.
  • In this way, amplification of the sound signal from the first object is controlled in combination with the sound signal in the sound pickup area, so that the amplification requirements of various special cases are judged accurately, effectively improving the accuracy of posture-based amplification control and thereby the sound quality.
  • Sound reinforcement processing includes:
  • performing sound amplification processing on the sound signal originating from the first object; or
  • performing no amplification processing on the sound signal originating from the first object.
  • In this way, amplification of the sound signal from the first object is controlled in combination with the volume of the sound signal in the sound pickup area, taking the amplification requirements of different scenarios into account and improving the accuracy of amplification control, thus effectively improving the sound quality.
  • Sound reinforcement processing includes:
  • performing human voice detection on the sound signal in the sound pickup area, and when a human voice is detected, performing sound amplification processing on the sound signal originating from the first object;
  • the first posture change means that the first object changes from a sitting posture to a standing posture.
  • In this way, human voice detection on the sound signal in the sound pickup area enables a more intelligent judgment of the sound reinforcement demand in the scene, improving the accuracy of amplification control across different scenes and thereby effectively improving the sound quality.
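A minimal sketch of this control logic, assuming illustrative posture labels and return values; the real decision would combine these inputs with volume and sound-source position as described elsewhere in the application.

```python
def amplification_decision(posture_change, voice_detected):
    """Hedged sketch: after the first posture change (sitting -> standing),
    amplify only if a human voice is detected in the sound pickup area;
    after the second posture change (standing -> sitting), stop amplifying.
    Labels and return values are illustrative, not from the application."""
    if posture_change == "sit_to_stand":
        return "amplify" if voice_detected else "no_amplify"
    if posture_change == "stand_to_sit":
        return "no_amplify"
    return "unchanged"  # no relevant posture change detected
```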
  • performing corresponding amplification processing on the sound signal originating from the first object includes:
  • the sound source position refers to angle information of a sound source corresponding to the sound signal in the sound pickup area
  • performing amplifying processing on the sound signal originating from the first object includes:
  • performing sound amplification processing on the sound signal originating from the first object.
  • The angle information may be the angle of the sound source relative to the microphone array in the sound pickup area; combined with the position of the microphone array in the sound pickup area, the location of the sound source in the sound pickup area can be determined.
  • performing corresponding amplification processing on the sound signal originating from the first object includes:
  • the second posture change indicates that the first object changes from a standing posture to a sitting posture.
  • a signal processing method comprising:
  • determining a target sound pickup device of a target sound source from a plurality of sound pickup devices in the sound pickup area, where the distance between the target sound pickup device and the target sound source satisfies a target condition
  • the sound signal of the target sound source can be amplified and controlled in a timely and accurate manner, effectively improving the sound quality.
  • the sound pickup area is configured with the plurality of sound pickup devices and a remote control device
  • the target sound pickup device for determining a target sound source from the plurality of sound pickup devices in the sound pickup area includes:
  • the target sound pickup device is determined based on the distance between the remote control device and the plurality of sound pickup devices.
  • In this way, the target sound pickup device of the target sound source can be determined in real time according to the position of the remote control device, so that amplification of the sound signal of the target sound source can be controlled in a timely and accurate manner, effectively improving the sound quality.
  • the determining the distance between the remote control device and the multiple sound pickup devices based on the signal interaction between the remote control device and the multiple sound pickup devices includes:
  • time information includes the interaction time recorded by the remote control device and the interaction time recorded by the plurality of sound pickup devices
  • the distance between the remote control device and the plurality of sound pickup devices is determined.
  • For example, the time information of the signal interaction between the remote control device and a first sound pickup device includes: the moment T_a1 when the remote control device sends a signal to the first sound pickup device; the moment T_b1 when the first sound pickup device receives the signal sent by the remote control device; the moment T_b2 when the first sound pickup device, having received that signal, sends a signal to the remote control device; and the moment T_a2 when the remote control device receives the signal sent by the first sound pickup device.
  • time information between multiple sound pickup devices and remote control devices can be acquired synchronously, greatly improving the efficiency of time information acquisition.
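From the four recorded moments, the two-way time-of-flight (TW-TOF) distance can be sketched as follows. The propagation speed assumed here is that of ultrasonic signals in air, one of the interaction media the application mentions; for UWB radio signals the speed of light would be used instead.

```python
SPEED_OF_SOUND = 343.0  # m/s, assuming ultrasonic ranging in air

def tw_tof_distance(t_a1, t_a2, t_b1, t_b2, c=SPEED_OF_SOUND):
    """Two-way time-of-flight distance between the remote control device
    and one pickup device (sketch).

    t_a1 / t_a2: send / receive moments recorded by the remote control;
    t_b1 / t_b2: receive / send moments recorded by the pickup device.
    Subtracting (t_b2 - t_b1) removes the pickup device's processing
    delay; halving the remainder gives the one-way flight time."""
    one_way = ((t_a2 - t_a1) - (t_b2 - t_b1)) / 2.0
    return c * one_way
```

Because each side only subtracts its own timestamps, the round-trip term cancels clock offset to first order, which is why the method still depends on the time synchronization step described below for the recorded interaction times to be comparable.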
  • the acquiring time information of signal interaction between the remote control device and the plurality of sound pickup devices includes:
  • the interaction time recorded by the remote control device is received from the remote control device, and the interaction time recorded by the plurality of sound pickup devices is received from the plurality of sound pickup devices.
  • the determining the distance between the remote control device and the multiple sound pickup devices based on the signal interaction between the remote control device and the multiple sound pickup devices includes:
  • In this way, the conference terminal can directly determine the target sound pickup device based on the obtained distances; this reduces the number of signal interactions between the conference terminal and the multiple sound pickup devices, makes full use of the computing power of the remote control device, and reduces the computing load of the conference terminal.
  • the signal interaction is performed through any one of Bluetooth, ultrasonic, ultra-wideband and wireless local area network.
  • before determining the target sound pickup device of the target sound source from the multiple sound pickup devices in the sound pickup area, the method further includes:
  • the remote control device performs time synchronization with the plurality of sound pickup devices.
  • the time synchronization between the remote control device and multiple sound pickup devices can ensure that the interaction time recorded by each device is in the same time system, and ensure the accuracy of the determined interaction time, thereby ensuring the accuracy of the determined distance.
  • the determining the target sound pickup device of the target sound source from the plurality of sound pickup devices in the sound pickup area includes:
  • a sound pickup device whose distance from the target sound source satisfies the target condition is determined as the target sound pickup device.
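A minimal sketch of this selection step, assuming the target condition is simply the minimum distance (the application leaves the exact condition open):

```python
def select_target_pickup_device(device_distances):
    """Pick the target pickup device for the target sound source (sketch).

    device_distances: dict mapping a device id to its estimated distance
    in metres from the target sound source (e.g. from TW-TOF ranging).
    The target condition assumed here is 'minimum distance'."""
    if not device_distances:
        raise ValueError("no pickup devices available")
    return min(device_distances, key=device_distances.get)
```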
  • the multiple sound pickup devices are multiple microphone arrays
  • the positioning information includes angle information between the plurality of microphone arrays and the target sound source.
  • the method also includes:
  • before determining the target sound pickup device of the target sound source from the multiple sound pickup devices in the sound pickup area, the method further includes:
  • Noise reduction processing is performed on the sound signals of the plurality of sound pickup devices.
  • In a third aspect, a signal processing apparatus is provided, which includes a plurality of functional modules configured to execute the corresponding steps of the signal processing method provided in the first aspect.
  • In a fourth aspect, a signal processing apparatus is provided, which includes a plurality of functional modules configured to execute the corresponding steps of the signal processing method provided in the second aspect.
  • a signal processing device includes a processor and a memory, where the memory is used to store at least one piece of program code, and the at least one piece of program code is loaded by the processor to execute the above signal processing method.
  • a computer-readable storage medium is provided, where the computer-readable storage medium is used to store at least one piece of program code, and the at least one piece of program code is used to execute the above-mentioned signal processing method.
  • A computer program product is provided; when it is executed, the signal processing device is made to execute the above signal processing method.
  • FIG. 1 is a schematic structural diagram of a signal processing system provided by an embodiment of the present application
  • Fig. 2 is a schematic diagram of deployment of a signal processing system provided by an embodiment of the present application
  • FIG. 3 is a flow chart of a signal processing method provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of object coordinates provided by an embodiment of the present application.
  • Fig. 5 is a schematic diagram of a sound source position provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a signal processing system provided by an embodiment of the present application.
  • Fig. 7 is a schematic diagram of deployment of a signal processing system provided by an embodiment of the present application.
  • FIG. 8 is a flow chart of a signal processing method provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a signal interaction process provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a TW-TOF ranging method provided in an embodiment of the present application.
  • Fig. 11 is a schematic diagram of deployment of a signal processing system provided by an embodiment of the present application.
  • FIG. 12 is a flow chart of a signal processing method provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a positioning information acquisition process provided by an embodiment of the present application.
  • Fig. 14 is a schematic diagram of a distance determination principle provided by an embodiment of the present application.
  • Fig. 15 is a schematic diagram, provided by an embodiment of the present application, of a target sound source that is not within the effective sound pickup range;
  • Fig. 16 is a schematic structural diagram of a signal processing device provided by an embodiment of the present application.
  • Fig. 17 is a schematic structural diagram of a signal processing device provided by an embodiment of the present application.
  • Fig. 18 is a schematic structural diagram of a signal processing device provided by an embodiment of the present application.
  • TW-TOF: two-way time of flight.
  • Ultra-wideband (UWB) technology transmits extremely low-power signals over a wide frequency spectrum. It can achieve data transmission rates from hundreds of Mbit/s up to 2 Gbit/s, and offers low power consumption, good anti-interference performance, high security, large spatial capacity, accurate positioning and many other advantages.
  • An embodiment of the present application provides a signal processing method applied to a signal processing system that includes an image acquisition device. The signal processing device in the system can detect posture changes of objects in the sound pickup area based on images of the area collected by the image acquisition device, so that when the posture of a first object in the sound pickup area changes, the sound signal originating from the first object is correspondingly amplified.
  • the embodiment of the present application provides another signal processing method, which is applied to a signal processing system including multiple sound pickup devices.
  • The signal processing device in the signal processing system can determine, from the multiple sound pickup devices in the sound pickup area, the target sound pickup device whose distance from the target sound source satisfies the target condition, so as to perform sound amplification processing on the sound signal originating from the target sound pickup device.
  • the sound signal of the target sound source can be amplified and controlled in a timely and accurate manner, effectively improving the sound quality.
  • the sound pickup device is used for picking up sound signals.
  • the sound pickup device has various forms, for example, the sound pickup device may be a microphone or a microphone array, and the like.
  • the microphone may be a fixed microphone, for example, a desktop embedded microphone; the microphone may also be a movable microphone.
  • A microphone array is an array structure obtained by arranging a plurality of microphone units according to a certain spatial structure. Owing to the spatial characteristics of the array structure, a microphone array can pick up and process sound signals from multiple directions. Different forms of sound pickup device can be selected for different usage scenarios; the form of the sound pickup device is not limited in the embodiments of the present application.
  • FIG. 1 is a schematic structural diagram of a signal processing system provided by an embodiment of the present application.
  • the signal processing system includes: an image acquisition device 110 , a sound pickup device 120 , a signal processing device 130 and a sound amplification device 140 .
  • The image acquisition device 110 is used to collect images of the sound pickup area; the sound pickup device 120 is used to pick up sound signals in the sound pickup area; the signal processing device 130 is used to detect posture changes of objects in the sound pickup area based on the images, determine a sound amplification control mode for the sound signal based on the detected posture changes, and generate a corresponding sound amplification control instruction that it sends to the sound amplification device 140.
  • The sound amplification control modes include turning amplification on and turning amplification off. In response to receiving the control instruction, the sound amplification device 140 either amplifies the sound signal or does not, according to the mode indicated by the instruction.
  • the embodiment of the present application provides a schematic diagram of the deployment of a signal processing system.
  • the signal processing system is applied in a conference scene, and the sound pickup area is the venue.
  • the object is at least one participant in the conference site.
  • the signal processing system includes: a camera 210 as an image acquisition device; a microphone array 220 as a sound pickup device; a conference terminal 230 as a signal processing device; and a speaker 240 as a sound amplification device.
  • the camera 210 is deployed in the venue and is used to collect images of the venue.
  • the camera 210 includes a plurality of cameras, which are respectively deployed at different positions in the venue.
  • the microphone array 220 is used to pick up sound signals in the venue.
  • the sound pickup range of the microphone array 220 can evenly cover the venue.
  • The conference terminal 230 detects the posture change of each participant based on the images of the conference site collected by the camera 210, and generates a corresponding sound amplification control instruction for the sound signal picked up by the microphone array 220; the instruction indicates the corresponding amplification processing to be performed on the sound signal.
  • In response to receiving the sound amplification control instruction, the loudspeaker 240 amplifies the sound signal and outputs the amplified sound when the instruction indicates amplification; when the instruction indicates no amplification, it outputs no sound.
  • FIG. 2 shows the camera 210 and the microphone array 220 as devices independent of the conference terminal 230 by way of example; alternatively, the camera 210 and the microphone array 220 may be built into the conference terminal 230 and deployed in the venue as a single device.
  • Fig. 3 is a flow chart of a signal processing method provided by an embodiment of the present application. The method is applied to the signal processing system corresponding to FIG. 2 , the signal processing system includes a camera 210 , a microphone array 220 , a conference terminal 230 and a speaker 240 , and the signal processing method is executed by the conference terminal 230 . As shown in Figure 3, the method includes:
  • the conference terminal determines coordinate sets corresponding to different moments in the sound pickup area based on images of the sound pickup area at different times, where the coordinate sets include coordinates of at least one object in the sound pickup area.
  • the image of the sound pickup area collected by the camera includes the position of the at least one object in the sound pickup area.
  • the position of the camera is fixed, and the image collected by the camera is an image of the sound pickup area within the collection range of the camera.
  • The conference terminal determines a reference coordinate system of the sound pickup area based on the images received from the camera; on this basis, the coordinates of the at least one object in the sound pickup area can be represented by coordinates in the reference coordinate system.
  • The embodiment of the present application provides a schematic diagram of object coordinates. As shown in Figure 4, the initial image of the pickup area includes four objects; the reference coordinate system of the pickup area takes the lower left corner of the image as the origin (0, 0), its x-axis range is the horizontal width of the image, and its y-axis range is the vertical length of the image.
  • the coordinates of the center point of the image area occupied by the face of the object are the coordinates of the object in the reference coordinate system of the sound pickup area.
  • the coordinates of object 1 are (x1, y1), the coordinates of object 2 are (x2, y2), the coordinates of object 3 are (x3, y3), and the coordinates of object 4 are (x4, y4).
  • the conference terminal uses coordinate sets to record the coordinates of at least one object in the reference coordinate system at different times according to the images collected by the camera at different times.
  • the coordinate set at the initial moment of the sound pickup area includes the coordinates of the four objects corresponding to the initial image in FIG. 4 above.
  • the conference terminal can determine the coordinate sets corresponding to different moments based on the images at different moments.
  • the conference terminal performs object recognition on the image of the sound pickup area collected by the camera, and obtains the coordinates of the target feature of at least one recognized object in the image, so as to obtain the corresponding set of coordinates.
  • the target feature is a facial feature of at least one object, for example, a central point of a face or facial features such as eyes.
  • the conference terminal recognizes the image, and the face can be determined based on the recognized facial features, so that the coordinates of the facial features in the image are used to represent the coordinates of the object in the sound pickup area.
  • the coordinates of the face features in the image may be the coordinates of the center point of the image area occupied by the face features, for example, the coordinates of the center point of the face of the subject, see FIG. 4 above.
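The mapping from a detected face to object coordinates might be sketched as follows; the bounding-box format and the top-left image origin are assumptions, since the application only specifies that the face centre point in a lower-left-origin reference system is used.

```python
def face_center_coordinates(face_boxes, image_height):
    """Map detected face bounding boxes to object coordinates (sketch).

    face_boxes: list of (x_min, y_min, x_max, y_max) tuples in pixel
    units with the usual top-left image origin; the reference coordinate
    system of Fig. 4 has its origin at the lower-left corner, so the
    y value is flipped. Returns one (x, y) centre point per object."""
    coords = []
    for x_min, y_min, x_max, y_max in face_boxes:
        cx = (x_min + x_max) / 2.0
        cy = image_height - (y_min + y_max) / 2.0  # flip to lower-left origin
        coords.append((cx, cy))
    return coords
```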
  • Changes in the coordinates of an object can represent changes in its posture. For example, from one moment to the next, if an object changes from a sitting posture to a standing posture, its face moves upward and its ordinate increases; if it changes from a standing posture to a sitting posture, its face moves downward and its ordinate decreases. In a conference scene, a participant changing from sitting to standing usually indicates that the participant needs to speak and that amplification is required; a participant changing from standing to sitting usually indicates that amplification is no longer required. On this basis, whether the at least one object has a sound reinforcement requirement can be determined from its coordinates in the sound pickup area at different moments.
  • This provides a basis on which the object of sound reinforcement control and the corresponding control method can be accurately determined, thereby effectively improving the sound quality.
  • the conference terminal can determine the identity of each object while determining the coordinates of the at least one object in the sound pickup area.
  • the conference terminal is associated with a face database, and the face database stores face data of multiple known objects.
  • The face data includes face feature data of each known object, for example eye feature data. On this basis, the conference terminal matches the face recognized from the image of the sound pickup area against the face data in the face database; when it matches the face data of a known object, the recognized face is determined to be that known object's face, and the coordinates of the recognized face are taken as the coordinates of the known object in the sound pickup area.
  • Each known object in the face database has an object identifier; by binding the coordinates of the recognized face's facial features in the image to the matched known object's identifier, the identity of the object is determined at the same time as its coordinates in the pickup area.
  • If the recognized face matches no known object, a new object identifier may be created for the corresponding object and the identity information of that object written to the database, thereby adding a new object to the face database.
  • In this way, the coordinates of an object in the sound pickup area can be determined based only on the object's target feature, and coordinate changes are continuously detected based on that feature, ensuring coordinate accuracy while reducing the amount of calculation and improving the efficiency of posture-change detection. Further, identifying objects against the face database can prevent unauthorized objects from participating in the meeting, providing security for the meeting.
  • the conference terminal determines a target variance based on the coordinates in the coordinate set corresponding to different moments in the sound pickup area, where the target variance represents the attitude change degree of at least one object in the sound pickup area at different moments.
  • Variance can represent the difference between individual items and their mean. Therefore, the target variance determined from the coordinate sets corresponding to different moments in the pickup area reflects the difference between each object's coordinates and the average coordinates, and can thus identify in a timely and accurate manner whether any object's posture has changed.
  • The target variance can represent the degree of posture change of the at least one object in the sound pickup area at different moments: the more obvious the posture change, the larger the target variance. For example, at time T1 the N objects in the pickup area are all sitting, and at time T2 after T1, object A among the N objects changes from a sitting posture to a standing posture; then at time T2, the difference between the coordinates of object A and the average coordinates of the N objects is greater than at time T1.
  • the calculation of the target variance refers to formula (1):
  D(x) = E{[X - E(x)]²}  (1)
  • where D(x) is the target variance at the current moment; X is the coordinate of the at least one object at the current moment; E(x) is the average value of the coordinates in the coordinate set; and the coordinate set includes the coordinates corresponding to the different moments.
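As an illustrative sketch only (not the patent's exact formula (1), whose full form is not reproduced in the text), the target variance can be computed as the mean squared deviation of the objects' current coordinates from the average of the coordinate set; the function name and data layout below are assumptions:

```python
import statistics

def target_variance(current_coords, coord_history):
    """Illustrative target variance: mean squared deviation of each object's
    current (x, y) coordinate from the average E(x) of the coordinate set
    accumulated over different moments.

    current_coords: list of (x, y) tuples for the objects at the current moment.
    coord_history: list of (x, y) tuples accumulated over different moments.
    """
    # E(x): per-axis mean over all coordinates in the coordinate set.
    mean_x = statistics.fmean(c[0] for c in coord_history)
    mean_y = statistics.fmean(c[1] for c in coord_history)
    # D(x): mean squared distance of the current coordinates from E(x).
    return statistics.fmean(
        (x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in current_coords
    )
```

An object that stands up moves its coordinate away from the mean, so the returned value grows, matching the behavior described for the target variance above.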
  • the attitude change in the sound pickup area is pre-judged based on the target variance, and the subsequent steps are performed only when the target variance is greater than the variance threshold, which saves computing resources and improves the efficiency of sound amplification control.
  • the conference terminal determines the target moment of the first object among the at least one object based on the ordinates in the coordinate sets corresponding to the at least one object at different moments, where the target moment is the moment at which the ordinate of the first object changes.
  • the target variance is greater than the variance threshold, indicating that the attitude change of the at least one object at different moments is sufficiently obvious, that is, there is a posture change in the at least one object, for example, changing from a standing posture to a sitting posture.
  • if the target variance is smaller than the variance threshold, it means that the posture of the at least one object has not changed, or only posture changes of small magnitude have occurred, for example, slight head shaking.
  • the size of the variance threshold determines the sensitivity of the conference terminal to detect posture changes.
  • the high-resolution camera can capture very subtle pose changes, that is, the high-resolution camera is very sensitive to pose changes. Therefore, in the case of a high-resolution camera, in order to avoid a large number of subtle attitude changes from affecting the detection of attitude changes corresponding to the sound reinforcement requirements, the variance threshold can be increased accordingly to ensure the accuracy of attitude detection.
  • the conference terminal further determines the first object whose posture has changed from the at least one object according to the ordinate of the object.
  • the conference terminal can obtain, based on the object identifier of each object, the ordinates corresponding to the same object at different moments from the coordinate sets corresponding to those moments, determine the object whose ordinate changes as the first object, and obtain the target moment at which the ordinate of the first object changes.
  • the posture change corresponding to the sound reinforcement requirement usually corresponds to a longitudinal posture change, for example, from a standing posture to a sitting posture, or from a sitting posture to a standing posture
  • the first object that undergoes a large posture change is determined based on the ordinate, which fits the actual situation in the conference scene and effectively improves the accuracy of the sound amplification processing based on posture changes.
  • the object whose posture changes can be determined according to data of different dimensions, which is not limited in this embodiment of the present application.
  • the conference terminal determines that the posture of the first object has changed.
  • if the ordinate of the first object changes at the target moment, the change can be classified as follows: if the ordinate of the object becomes smaller, and the range of change of the ordinate within the second duration after the target moment is smaller than the target range, it can be determined that the object changes from a standing posture to a sitting posture; if the ordinate of the object becomes larger, and the range of change of the ordinate within the second duration after the target moment is smaller than the target range, it can be determined that the object changes from a sitting posture to a standing posture.
  • a posture change of the first object driven by a sound amplification requirement should be from one stable state to another stable state, for example, from a sustained sitting posture to a sustained standing posture. If the ordinate of the first object changes significantly within the second duration after the target moment, that is, the range of change is greater than the target range, it means that the state of the first object after the change is not stable. For example, the first object was initially sitting, stood up at the target moment to pick up an item, and then quickly sat back down. In this case, the conference terminal determines that the posture change of the first object is not due to a sound amplification requirement, and thus does not perform the corresponding sound amplification control.
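The ordinate-based classification above can be sketched as follows; the sampling layout, thresholds, and function name are illustrative assumptions rather than the patent's exact method. Per the text, a larger ordinate corresponds to standing:

```python
def detect_posture_change(y_track, target_idx, window, target_range, min_jump):
    """Classify the posture change of one object from its ordinate trajectory.

    y_track: ordinate of the object sampled at successive moments.
    target_idx: index of the candidate target moment (where the ordinate changes).
    window: number of samples covering the 'second duration' after the target moment.
    target_range: maximum allowed variation for the new state to count as stable.
    min_jump: minimum ordinate change to count as a posture change at all.
    Returns 'sit_to_stand', 'stand_to_sit', or None (no change / unstable).
    """
    jump = y_track[target_idx] - y_track[target_idx - 1]
    if abs(jump) < min_jump:
        return None  # no significant ordinate change at the target moment
    after = y_track[target_idx: target_idx + window]
    # The new state must be stable within the second duration; otherwise the
    # object e.g. stood up briefly to pick something up and sat straight back down.
    if max(after) - min(after) >= target_range:
        return None
    return 'sit_to_stand' if jump > 0 else 'stand_to_sit'
```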
  • in response to a posture change of the first object among the at least one object, the conference terminal performs corresponding sound amplification processing on the sound signal from the first object in combination with the sound signal in the sound pickup area.
  • the conference terminal can determine the sound reinforcement requirement of the first object based on the posture change of the first object, and then determine the corresponding sound reinforcement control method according to the sound reinforcement requirement. For example, if sound reinforcement is required, the sound reinforcement control method is to turn on the sound amplification; if the sound amplification is not needed, the sound amplification control method is to turn off the sound amplification. Based on the sound amplification control method determined for the first object, the conference terminal generates a sound amplification control instruction for the sound signal from the first object, and sends the sound amplification control instruction to the speaker in the signal processing system.
  • the sound amplification control instruction includes a sound amplification on instruction and a sound amplification off instruction.
  • the sound amplification on command instructs the speaker to amplify the sound signal and output the amplified sound; the sound amplification off command instructs the speaker not to output sound.
  • the embodiment of the present application does not limit the manner in which the conference terminal performs corresponding sound amplification control.
  • the posture change of the first object is a first posture change
  • the first posture change indicates that the first object changes from a sitting posture to a standing posture.
  • the change of the first object from a sitting posture to a standing posture does not necessarily mean that the first object has a need for sound amplification.
  • for example, the posture of the first object changes from a sitting posture to a standing posture, and the first object then slowly walks out of the sound pickup area without making any sound.
  • such special cases can be further excluded by combining the sound signal in the sound pickup area after the posture of the first object changes.
  • Method 1 Combining the volume of the sound signal in the sound pickup area.
  • in response to the first posture change of the first object, the conference terminal performs sound amplification processing on the sound signal from the first object when the volume of the sound signal in the sound pickup area is greater than or equal to the volume threshold. It can be understood that a volume greater than or equal to the volume threshold means there is a high probability of sound that needs to be amplified in the sound pickup area.
  • in this case, it is considered that the first object made a speech after the first posture change occurred, that is, the first object has a sound amplification requirement, and the sound signal from the first object is therefore amplified.
  • correspondingly, in response to the first posture change of the first object, when the volume of the sound signal in the sound pickup area is lower than the volume threshold, the conference terminal does not perform sound amplification processing on the sound signal from the first object.
  • the volume of the sound signal in the sound pickup area is lower than the volume threshold, which means that there is a high probability that there is no sound that needs to be amplified in the sound pickup area.
  • the sound signal from the first object is amplified in combination with the volume of the sound signal in the sound pickup area, which takes into account the sound amplification requirements in different scenarios and improves the accuracy of the sound amplification processing, thereby effectively improving the sound quality.
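Method 1 can be sketched with a simple RMS volume check; the RMS measure, names, and threshold handling below are assumptions for illustration:

```python
import math

def rms_volume(samples):
    """Root-mean-square level of a block of sound samples, a common volume proxy."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def amplify_after_stand_up(samples, volume_threshold):
    """Method 1: after the first posture change (sitting -> standing), amplify
    only when the pickup-area volume reaches the volume threshold."""
    return rms_volume(samples) >= volume_threshold
```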
  • Method 2 Human voice detection is performed on the sound signal in the sound pickup area.
  • in response to the first posture change of the first object, the conference terminal performs human voice detection on the sound signal in the sound pickup area, and if a human voice is detected, performs sound amplification processing on the sound signal from the first object. Understandably, a human voice being detected in the sound pickup area indicates a high probability that someone is speaking in the sound pickup area.
  • in this case, it is considered that the first object made a speech after the first posture change occurred, that is, the first object has a sound amplification requirement, and the sound signal from the first object is therefore amplified.
  • correspondingly, in response to the first posture change of the first object, the conference terminal performs human voice detection on the sound signal in the sound pickup area, and if no human voice is detected, does not perform sound amplification processing on the sound signal from the first object. No human voice being detected in the sound pickup area means a high probability that no one is speaking there. In this case, even if the first posture change occurs to the first object, it is still considered that the first object has no sound amplification requirement, and no sound amplification processing is performed on the sound signal from the first object.
  • human voice detection is performed on the sound signal in the sound pickup area, so as to make a more intelligent judgment on the sound amplification requirements in the scene and improve the accuracy of the sound amplification processing for different scenes, thereby effectively improving the sound quality.
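Method 2 relies on a human voice detector; a production system would use a trained voice-activity-detection model, but as a toy stand-in, a frame-energy check can illustrate the gating logic (the names, thresholds, and frame layout are all assumptions):

```python
def detect_human_voice(frames, energy_threshold, min_voiced_frames):
    """Toy voice-activity check: count frames whose average energy exceeds a
    threshold, and declare a human voice present if enough frames do.
    Real systems would use a dedicated VAD model instead of raw energy."""
    voiced = sum(
        1 for frame in frames
        if sum(s * s for s in frame) / len(frame) > energy_threshold
    )
    return voiced >= min_voiced_frames
```

If `detect_human_voice` returns `True` after the first posture change, the sound signal from the first object is amplified; otherwise it is not.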
  • Mode 3 Combining the sound source position of the sound signal in the sound pickup area.
  • the posture change of the first object is the first posture change
  • for example, the volume of the sound signal in the sound pickup area is greater than the volume threshold, and there is a human voice in the sound pickup area, but the first object has no sound amplification requirement.
  • the posture of the first object changes from a sitting posture to a standing posture, and then the first object walks out of the sound pickup area without making a sound. During this period, other objects in the sound pickup area are speaking. That is, the first object that undergoes the first posture change is not the sound source corresponding to the sound signal in the sound pickup area.
  • such special cases can be further eliminated by combining the sound source position of the sound signal in the sound pickup area after the posture of the first object changes.
  • the process of performing corresponding amplification processing on the sound signal from the first object includes the following steps 1 to 3:
  • Step 1 The conference terminal obtains the position of the first object in the sound pickup area in response to the first posture change of the first object.
  • the conference terminal can determine the position of the first object in the sound pickup area by acquiring the coordinates of the first object in the sound pickup area and combining the deployment position of the camera in the sound pickup area.
  • Step 2 The conference terminal determines the sound source position of the sound signal based on the sound signal in the sound pickup area.
  • the conference terminal obtains the position of the sound source of the sound signal relative to the microphone array through the microphone array, and then determines the position of the sound source of the sound signal in the sound pickup area based on the position of the microphone array in the sound pickup area.
  • the microphone array acquires an angle of the sound source relative to the microphone array to determine the position of the sound source relative to the microphone array.
  • optionally, the distance of the sound source relative to the microphone array can also be determined, so that, combined with the position of the microphone array in the sound pickup area, the sound source position of the sound signal in the sound pickup area can be determined.
  • Step 3 When the first object is located at the sound source of the sound signal, the conference terminal performs sound amplification processing on the sound signal originating from the first object.
  • the fact that the first object is located at the sound source position of the sound signal indicates that the first object has undergone the first posture change and is the source of the sound in the sound pickup area. Therefore, it can be considered that the first object has a sound amplification requirement, and the sound signal from the first object is subjected to sound amplification processing.
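Steps 1 to 3 amount to comparing two positions in the sound pickup area; a minimal sketch follows, in which the tolerance value and all names are illustrative assumptions:

```python
import math

def is_at_sound_source(object_pos, source_pos, tolerance):
    """Step 3: treat the first object as the sound source when its position in
    the pickup area lies within a tolerance of the localized source position."""
    return math.dist(object_pos, source_pos) <= tolerance

def decide_amplification(first_object_pos, source_pos, tolerance=0.5):
    # Amplify only if the object that stood up is also where the sound comes from.
    return is_at_sound_source(first_object_pos, source_pos, tolerance)
```

This excludes the special case above: an object that stands up and leaves silently while someone else speaks will not match the localized source position, so no amplification is triggered for it.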
  • the embodiment of the present application provides a schematic diagram of a sound source position.
  • the conference terminal determines the sound source position of the sound source 502 corresponding to the sound signal based on the sound signal in the sound pickup area picked up by the microphone array 501, obtains the position of the first object 503, and compares it with the sound source position, wherein the position of the first object 503 is determined based on the image collected by the camera 504.
  • optionally, the above method 1, method 2 and method 3 can be used in combination to make a more accurate judgment on the sound amplification requirements in different scenarios, thereby improving the accuracy of the sound amplification processing in a targeted manner and improving the sound quality, which is not limited in this embodiment of the present application.
  • the posture change of the first object is a second posture change
  • the second posture change indicates a change for which the first object has no sound amplification requirement, for example, changing from a standing posture to a sitting posture. Therefore, in response to the second posture change of the first object among the at least one object, the conference terminal does not perform sound amplification processing on the sound signal from the first object, without needing to combine the sound signal.
  • the sound signal from the first object is amplified in combination with the sound signal in the sound pickup area, so as to accurately determine the sound amplification requirements in various special situations, effectively improving the accuracy of the sound amplification processing based on posture changes and thereby improving the sound quality.
  • in this way, the sound amplification requirement in the scene can be judged in a timely and accurate manner, and the sound signal can then be controlled accordingly based on that requirement, effectively improving the sound quality.
  • FIG. 6 is a schematic structural diagram of another signal processing system provided by an embodiment of the present application.
  • the signal processing system includes: a plurality of sound pickup devices 610 supporting positioning functions, a signal processing device 620 and a sound amplification device 630 .
  • the sound pickup device 610 supporting the localization function is used to pick up the sound signal in the sound pickup area, and obtain information about the target sound source in the sound pickup area;
  • the embodiment of the present application provides a schematic deployment diagram of a signal processing system.
  • the signal processing system is applied in a conference scene, and the sound pickup area is the conference site.
  • the signal processing system includes: a plurality of microphones 710 supporting the positioning function, serving as the plurality of sound pickup devices supporting the positioning function; a remote control device 720 for performing signal interaction with the microphones 710; a conference terminal 730 serving as the signal processing device; and a loudspeaker 740 serving as the sound amplification device.
  • the plurality of microphones 710 with a positioning function are used to pick up sound signals in the venue and perform signal interaction with the remote control device 720 .
  • the position of the remote control device 720 represents the position of the target sound source.
  • the conference terminal 730 can determine, from the multiple microphones 710 in the venue, the target microphone of the target sound source and the sound amplification control method for the target sound source, and then generate a sound amplification control instruction for the sound signal from the target microphone; optionally, the target microphone is the microphone closest to the target sound source.
  • the speaker 740, according to the sound amplification control mode indicated by the sound amplification control instruction, amplifies the sound signal from the target microphone and outputs the amplified sound, or does not output the sound corresponding to the sound signal from the target microphone.
  • FIG. 8 is a flowchart of a signal processing method provided by an embodiment of the present application. The method is applied in the signal processing system corresponding to FIG. 7 , and the signal processing method is executed by the conference terminal 730 . As shown in Figure 8, the method includes:
  • the conference terminal acquires time information of signal interaction between the remote control device and multiple microphones, where the time information includes the interaction time recorded by the remote control device and the interaction time recorded by the multiple microphones.
  • the time information of the signal interaction between the remote control device and the first microphone includes: the moment Ta1 when the remote control device sends a signal to the first microphone; the moment Tb1 when the first microphone receives the signal sent by the remote control device; the moment Tb2 when the first microphone sends a signal to the remote control device after receiving the signal sent by the remote control device; and the moment Ta2 when the remote control device receives the signal sent by the first microphone.
  • the conference terminal receives the interaction times Ta1 and Ta2 recorded by the remote control device from the remote control device, and receives the interaction times Tb1 and Tb2 recorded by the first microphone from the first microphone.
  • optionally, after receiving the signal sent by the remote control device, the first microphone sends a signal to the remote control device and carries Tb1 and Tb2 in that signal. Based on this, the conference terminal can receive from the remote control device the interaction times Ta1 and Ta2 recorded by the remote control device together with the interaction times Tb1 and Tb2 recorded by the first microphone. Acquiring the time information of the signal interaction in this way can reduce the number of signal interactions between the conference terminal and the microphone, simplify the process of obtaining the time information by the conference terminal, and improve the efficiency of obtaining the time information.
  • the embodiment of the present application provides a schematic diagram of a signal interaction process.
  • each microphone and the remote control device respectively record the corresponding interaction time and send it to the conference terminal 906 .
  • optionally, the conference terminal can also obtain corresponding sound amplification control information from the remote control device, for example, turning on sound amplification, turning off sound amplification, or adjusting the volume. Based on the sound amplification control information, the conference terminal can determine the sound amplification control mode for the target sound source.
  • the sound amplification control mode includes: turning on the sound amplification, turning off the sound amplification, increasing the volume, and decreasing the volume.
  • time synchronization is performed between the remote control device and the multiple microphones, so as to ensure that their respective recorded interaction times are in the same time system, and ensure the accuracy of the determined interaction time, thereby ensuring The accuracy of the determined distance.
  • the time information between multiple microphones and the remote control device can be acquired synchronously, greatly improving the efficiency of time information acquisition.
  • the conference terminal determines the distance between the remote control device and multiple microphones based on the time information.
  • based on the time information, the conference terminal can determine the distance between the first microphone and the remote control device; the determination process refers to formula (2) and formula (3):
  t1 = [(Ta2 - Ta1) - (Tb2 - Tb1)] / 2  (2)
  d1 = c · t1  (3)
  • where Ta1, Tb1, Tb2 and Ta2 refer to step 801; t1 is the time delay of the signal from the first microphone to the remote control device; c is the speed of light; and d1 is the distance between the first microphone and the remote control device.
  • for example, module A sends a data packet A to module B and records the packet sending time Ta1; module B records the packet receiving time Tb1, replies with a data packet B, and records the packet sending time Tb2; module A receives the data packet B and records the packet receiving time Ta2.
  • based on these four times and formulas (2) and (3), the distance d1 between module A and module B can be calculated.
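Assuming formulas (2) and (3) implement standard two-way ranging over the four timestamps defined in step 801 (the responder's turnaround time is subtracted from the round-trip time, halved, and multiplied by the speed of light), the distance computation can be sketched as follows; the function name is an assumption:

```python
C = 299_792_458.0  # speed of light in m/s

def ranging_distance(t_a1, t_b1, t_b2, t_a2):
    """Two-way ranging between the remote control device (A) and a microphone (B).

    t_a1: A sends a signal.      t_b1: B receives it.
    t_b2: B sends a reply.       t_a2: A receives the reply.
    The one-way delay t1 is half of the round-trip time minus B's processing
    time (formula (2)); the distance is d1 = c * t1 (formula (3)).
    """
    t1 = ((t_a2 - t_a1) - (t_b2 - t_b1)) / 2
    return C * t1
```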
  • the time information corresponding to multiple microphones is converted into a reference distance, which ensures the accuracy of the determined distance.
  • the remote control device determines the distance between the multiple microphones and the remote control device based on the interaction time recorded during the signal interaction between the remote control device and the multiple microphones, and the conference terminal directly obtains the distance from the remote control device The distance between the plurality of microphones and the remote control device is received.
  • the conference terminal can directly determine the target sound pickup device based on the obtained distances; while reducing the number of signal interactions between the conference terminal and the multiple sound pickup devices, this fully utilizes the computing power of the remote control device and reduces the computing load of the conference terminal.
  • the conference terminal determines a target microphone based on the distance between the remote control device and the multiple microphones, and the distance between the target microphone and the target sound source satisfies the target condition.
  • the target condition refers to: among the multiple microphones, having the shortest distance to the target sound source. It can be understood that the position of the target sound source is the position of the remote control device; therefore, the microphone closest to the remote control device is the microphone closest to the target sound source, that is, it is the target microphone of the target sound source.
  • the target condition can be set according to the actual needs of the scene to determine the required target microphone, and then accurately perform sound amplification control on the sound signal from the target microphone, effectively improving the sound quality.
  • the target condition can be set according to actual needs.
  • for example, the target condition can be: among the multiple microphones, having the farthest distance to the target sound source. This embodiment of the present application does not limit it.
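Under the default (shortest-distance) target condition, selecting the target microphone reduces to a minimum over the computed distances; a minimal sketch follows, in which the names are assumptions:

```python
def pick_target_microphone(distances):
    """distances: mapping of microphone id -> distance to the remote control
    device (i.e. to the target sound source). The default target condition is
    the minimum distance; other conditions (e.g. the maximum) can be swapped in."""
    return min(distances, key=distances.get)
```

For example, with distances of 2.0 m, 0.5 m, and 1.0 m, the microphone at 0.5 m becomes the target microphone, and subsequent amplification control applies to its sound signal.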
  • the conference terminal performs sound amplification processing on the sound signal from the target microphone.
  • after the conference terminal determines the target microphone of the target sound source, it acquires the sound signal from the target microphone, generates a corresponding sound amplification control instruction based on the sound signal from the target microphone, and sends the sound amplification control instruction to the speaker in the signal processing system.
  • optionally, the conference terminal determines the sound amplification control instruction for the sound signal from the target microphone based on the sound amplification control method for the target sound source, and sends the sound amplification control instruction to the speaker; the instruction instructs the speaker to control the amplification of the sound signal from the target microphone according to the corresponding sound amplification control mode.
  • the loudspeaker amplifies the sound signal from the target microphone according to the sound amplification control mode indicated by the sound amplification control instruction and outputs the amplified sound, Or, the sound corresponding to the sound signal from the target microphone is not output.
  • the conference terminal processes the sound signal from the target microphone to ensure that the sound signal from the target microphone is better amplified and output, thereby improving the sound quality.
  • for example, noise reduction processing is performed on the sound signal from the target microphone, which is not limited in this embodiment of the present application.
  • the target microphone of the target sound source can be determined in real time according to the position of the remote control device, and then the sound signal of the target sound source can be amplified and controlled in a timely and accurate manner, effectively Improved sound quality.
  • this embodiment of the present application provides a schematic diagram of deployment of another signal processing system.
  • the signal processing system is applied in a conference scene, and the sound pickup area is the conference site.
  • the signal processing system includes: a plurality of microphone arrays 1110 as a plurality of sound pickup devices supporting positioning functions; a conference terminal 1120 as a signal processing device; and a speaker 1130 as a sound amplification device.
  • the microphone array 1110 is used to pick up the sound signal in the venue, and determine the location information of the target sound source.
  • the conference terminal 1120 determines, from among the plurality of microphone arrays 1110 in the venue, the target microphone array of the target sound source 1140 and the sound amplification control method for the target sound source, and then generates a sound amplification control instruction for the sound signal from the target microphone array.
  • the speaker 1130, according to the sound amplification control mode indicated by the sound amplification control instruction, amplifies the sound signal from the target microphone array and outputs the amplified sound, or does not output the sound corresponding to the sound signal from the target microphone array.
  • Fig. 12 is a flowchart of a signal processing method provided by an embodiment of the present application. The method is applied in the signal processing system corresponding to FIG. 11 , and the signal processing method is executed by the conference terminal 1120 . As shown in Figure 12, the method includes:
  • the conference terminal acquires positioning information of a target sound source by multiple microphone arrays, where the positioning information includes angle information between the multiple microphone arrays and the target sound source.
  • parameter configuration is performed in the conference terminal based on the deployment of the device.
  • the parameters that need to be configured include but are not limited to: size information of the sound pickup area, for example, the width and length of the sound pickup area; and the positional relationship between the multiple microphone arrays and the conference terminal, for example, the distance between the conference terminal and any microphone array, and the distances between the multiple microphone arrays.
  • optionally, the first microphone array among the plurality of microphone arrays is built into the conference terminal.
  • optionally, the shortest distance between the conference terminal and the sound source is configured in the conference terminal, and this shortest distance is used to pre-determine the sound pickup range corresponding to the first microphone array, so as to prevent the situation in which the sound signal from the sound source cannot be picked up because the sound source is outside the sound pickup range of the first microphone array.
  • the conference terminal is configured with position information of the 0-degree angle of the second microphone array among the plurality of microphone arrays, and the 0-degree angle is used to delimit the non-pickup range of the second microphone array.
  • the conference terminal sends the pre-configured parameters to each microphone array to implement parameter configuration of the microphone arrays, for example, sending the 0-degree-angle position information of the second microphone array to the second microphone array; the second microphone array delimits its own non-pickup range based on the received 0-degree-angle position information.
  • the embodiment of the present application provides a schematic diagram of the positioning information acquisition process.
  • a second microphone array 1303 is deployed.
  • if the shortest distance D1 between the conference terminal 1301 and the sound source is predetermined, then based on the width D of the sound pickup range of the first microphone array 1302 and the second microphone array 1303, the distance between the second microphone array 1303 and the first microphone array 1302 can be determined; the shaded area between the microphone arrays is the effective sound pickup area.
  • the 0-degree angle of the second microphone array is located on the straight line 1, and the 180-degree angle range in the counterclockwise direction of the 0-degree angle is the non-pickup range of the second microphone array 1303.
  • if D1 is 0, then L is the length of the effective sound pickup area.
  • the positioning information of the target sound source 1304 by the first microphone array and the second microphone array includes: the angle θ1 of the target sound source relative to the first microphone array, and the angle θ2 of the target sound source relative to the second microphone array.
  • the sound pickup range angle θ3 of the first microphone array 1302 is determined based on D1 and D.
  • the above-mentioned process is described by taking the first microphone array and the second microphone array as examples.
  • in the case of including more microphone arrays, the process of obtaining positioning information is the same as the above-mentioned process, which will not be repeated here.
  • the plurality of microphone arrays respectively determine angle information between each and the target sound source based on the picked-up sound signals, and send the respective angle information to the conference terminal.
  • the microphone array sends the angle information between itself and the target sound source to other microphone arrays, so that each microphone array receives complete angle information about the target sound source.
  • the sudden noise in the sound pickup area may affect the localization information of the target sound source, for example, the sudden noise picked up by a certain microphone array is mistaken for the target sound source. Therefore, after the sound signals of the plurality of microphone arrays are acquired, by performing noise reduction processing on the sound signals of the plurality of microphone arrays, it is possible to prevent sudden noise in the sound pickup area from affecting the accuracy of the positioning information.
  • the conference terminal determines the distance between the target sound source and multiple microphone arrays based on the positioning information.
  • the positioning information includes angle information between the plurality of microphone arrays and the target sound source
  • the conference terminal can determine the distance between the target sound source and the multiple microphone arrays based on the angle of the target sound source relative to each microphone array and pre-configured parameters.
  • the following description will be made by taking a plurality of microphone arrays including a first microphone array and a second microphone array as an example.
  • the embodiment of the present application provides a schematic diagram of the distance determination principle.
  • see formula (4) to formula (8) for the above calculation process.
  • the above-mentioned process is described by taking the first microphone array and the second microphone array as examples. In the case of including more microphone arrays, the process of determining the distance is the same as the above-mentioned process, which will not be repeated here.
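Formulas (4) to (8) themselves are not reproduced in this excerpt, but the underlying plane geometry can be sketched. The snippet below is a minimal illustration, not the patent's exact formulas: it assumes the two microphone arrays lie on a common baseline of known length, and that θ1 and θ2 are measured from that baseline, then triangulates the perpendicular distance Ds and the distance from each array to the source.

```python
import math

def locate_source(theta1_deg, theta2_deg, baseline):
    """Triangulate a sound source from the angles theta1/theta2 (degrees,
    measured from the baseline joining the two microphone arrays) and the
    baseline length. Returns (ds, d1, d2): the perpendicular distance of
    the source from the baseline and its distance to each array."""
    t1, t2 = math.radians(theta1_deg), math.radians(theta2_deg)
    # The foot of the source on the baseline splits it: x1 + x2 = baseline,
    # with x_i = ds / tan(t_i), so ds = baseline / (cot t1 + cot t2).
    ds = baseline / (1.0 / math.tan(t1) + 1.0 / math.tan(t2))
    return ds, ds / math.sin(t1), ds / math.sin(t2)
```

For the symmetric case θ1 = θ2 = 45° with a 2 m baseline, the source sits 1 m from the baseline and √2 m from each array.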
  • the conference terminal determines a target microphone array based on the distances between the target sound source and the multiple microphone arrays; the distance between the target microphone array and the target sound source satisfies a target condition.
  • for this step, refer to step 803, which will not be repeated here.
  • the conference terminal performs sound amplification processing on the sound signal from the target microphone array.
  • for this step, refer to step 804, which will not be repeated here.
  • when the target sound source is not within the effective sound pickup range of the sound pickup area, no amplification processing is performed on the sound signal originating from the target microphone array; when the target sound source is within the effective sound pickup range, amplification processing is performed on the sound signal originating from the target microphone array.
  • if the distance Ds calculated by formula (4) is greater than D/2 (where D is the width of the effective sound pickup area), the target sound source is considered not to be within the effective sound pickup range of the sound pickup area.
  • the embodiment of the present application provides a schematic diagram of a target sound source outside the effective pickup range. As shown in FIG. 15 , the angle of the target sound source 1501 relative to the first microphone array 1502 is θ 1 , the angle relative to the second microphone array 1503 is θ 2 , and the distance Ds of the target sound source 1501 from the line connecting the centers of the two microphone arrays is greater than half of the width D of the effective pickup area.
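The in-range test just described reduces to a one-line comparison; here D (the effective sound pickup area width) is assumed to come from the room configuration:

```python
def in_effective_pickup_range(ds, width_d):
    """Per the embodiment above: the target sound source is treated as
    inside the effective pickup range only if its distance Ds from the
    center line of the two arrays does not exceed half the width D."""
    return ds <= width_d / 2.0
```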
  • the target sound pickup device of the target sound source can be determined based on the positioning information of the target sound source, and the sound signal of the target sound source can be amplified in a timely and accurate manner, improving the conference experience while effectively improving the sound quality.
  • FIG. 16 is a schematic structural diagram of a signal processing device provided by an embodiment of the present application. As shown in Figure 16, the signal processing device includes:
  • a detection module 1601 configured to detect a posture change of at least one object in the sound pickup area based on the image of the sound pickup area;
  • the signal processing module 1602 is configured to, in response to a posture change of the first object among the at least one object, perform corresponding amplification processing on the sound signal originating from the first object.
  • the detection module 1601 includes:
  • a coordinate determining unit configured to respectively determine coordinate sets corresponding to different moments in the sound pickup area based on images of the sound pickup area at different moments, the coordinate sets including the coordinates of the at least one object in the sound pickup area;
  • the attitude change determination unit is configured to determine the attitude change of at least one object in the sound pickup area based on the coordinate sets corresponding to the different moments.
  • the attitude change determining unit is used for:
  • the target variance represents the degree of attitude change of at least one object in the sound pickup area at different moments
  • the attitude change of the at least one object is determined based on the coordinates of the at least one object at the different moments.
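As a rough sketch of the two units above (the variance threshold value is an assumption to be tuned per camera): pool the coordinates of two moments, gate on the pooled variance, then classify each object by the sign of its vertical-coordinate change.

```python
VAR_THRESHOLD = 25.0  # assumption: tuned to the camera resolution

def detect_posture_changes(prev, curr, threshold=VAR_THRESHOLD):
    """prev/curr: {object_id: (x, y)} coordinate sets for two moments.
    Returns {object_id: 'stand' | 'sit'} for objects whose vertical
    coordinate changed, gated by a pooled-variance pre-check."""
    ys = [y for _, y in prev.values()] + [y for _, y in curr.values()]
    mean = sum(ys) / len(ys)
    if sum((y - mean) ** 2 for y in ys) / len(ys) <= threshold:
        return {}  # posture changes too small to act on
    changes = {}
    for oid in prev.keys() & curr.keys():
        dy = curr[oid][1] - prev[oid][1]
        if dy > 0:
            changes[oid] = "stand"  # y grew: sitting -> standing
        elif dy < 0:
            changes[oid] = "sit"    # y shrank: standing -> sitting
    return changes
```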
  • the signal processing module 1602 includes:
  • the first processing unit is configured to, in response to a posture change of the first object among the at least one object, perform corresponding sound amplification processing on the sound signal from the first object in combination with the sound signal in the sound pickup area.
  • the first processing unit is used for:
  • the sound signal originating from the first object is not amplified.
  • the first processing unit is used for:
  • human voice detection is performed on the sound signal in the sound pickup area, and when a human voice is detected, amplification processing is performed on the sound signal originating from the first object;
  • human voice detection is performed on the sound signal in the sound pickup area, and when no human voice is detected, the sound signal originating from the first object is not amplified.
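The branching logic of the first processing unit can be sketched as a single predicate; the volume threshold and the external voice-detection result are assumptions standing in for the device's actual detectors:

```python
VOLUME_THRESHOLD = -40.0  # assumption: a dBFS gate tuned to the venue

def should_amplify(posture_change, volume_db, voice_detected):
    """Amplify the first object's signal only when it stood up AND the
    picked-up signal is both loud enough and detected as human voice."""
    if posture_change != "stand":  # sat down or no change: keep muted
        return False
    return volume_db >= VOLUME_THRESHOLD and voice_detected
```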
  • the signal processing module 1602 includes:
  • a position acquiring unit configured to acquire the position of the first object in the sound pickup area in response to a first posture change of the first object among the at least one object
  • a sound source localization unit configured to determine the sound source position of the sound signal based on the sound signal in the sound pickup area
  • the second processing unit is configured to amplify the sound signal from the first object when the first object is located at the sound source position.
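Since the embodiments describe the sound source position as angle information relative to a microphone array, the "first object is located at the sound source position" check can be sketched as an angle match within a tolerance (the tolerance value is an assumption):

```python
ANGLE_TOLERANCE_DEG = 10.0  # assumption: how closely the angles must agree

def object_at_source(object_angle_deg, source_angle_deg,
                     tol=ANGLE_TOLERANCE_DEG):
    """True when the first object's angle in the pickup area matches the
    localized sound source's angle, wrapping around 360 degrees."""
    diff = abs(object_angle_deg - source_angle_deg) % 360.0
    return min(diff, 360.0 - diff) <= tol
```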
  • the signal processing module 1602 is used to:
  • when the signal processing device provided in the above embodiment performs signal processing, the division into the above functional modules is used only as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the signal processing device and the signal processing method embodiments provided in the above embodiments belong to the same idea, and the specific implementation process thereof is detailed in the method embodiments, and will not be repeated here.
  • Fig. 17 is a schematic structural diagram of a signal processing device provided by an embodiment of the present application. As shown in Figure 17, the signal processing device includes:
  • a determining module 1701 configured to determine a target sound pickup device of a target sound source from among multiple sound pickup devices in the sound pickup area, and the distance between the target sound pickup device and the target sound source satisfies the target condition;
  • the processing module 1702 is configured to perform sound amplification processing on the sound signal from the target sound pickup device.
  • the sound pickup area is configured with the plurality of sound pickup devices and the remote control device, and the determining module 1701 includes:
  • a distance determining unit configured to determine the distance between the remote control device and the plurality of sound pickup devices based on the signal interaction between the remote control device and the plurality of sound pickup devices;
  • a device determining unit configured to determine the target sound-picking device based on the distance between the remote control device and the plurality of sound-picking devices.
  • the distance determining unit is used for:
  • time information includes the interaction time recorded by the remote control device and the interaction time recorded by the plurality of sound pickup devices
  • the distance between the remote control device and the plurality of sound pickup devices is determined.
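The distance determination from the recorded interaction times follows the TW-TOF relations given elsewhere in this application (formulas (2) and (3)): the one-way flight time is half of the measured round trip minus the device's reply delay. A sketch, assuming UWB or similar radio signalling (hence the speed of light):

```python
C = 299_792_458.0  # propagation speed: speed of light, radio signalling

def twtof_distance(ta1, tb1, tb2, ta2):
    """Ta1/Ta2: times the remote control sent and received; Tb1/Tb2: times
    the pickup device received and replied. Per formulas (2) and (3):
    t1 = ((Ta2 - Ta1) - (Tb2 - Tb1)) / 2 and d1 = t1 * c."""
    return ((ta2 - ta1) - (tb2 - tb1)) / 2.0 * C

def nearest_pickup_device(distances):
    """Target condition used in the embodiments: pick the device whose
    distance to the remote control (target sound source) is smallest."""
    return min(distances, key=distances.get)
```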
  • the determination module 1701 is used to:
  • a sound pickup device whose distance from the target sound source satisfies the target condition is determined as the target sound pickup device.
  • the multiple sound pickup devices are multiple microphone arrays
  • the positioning information includes angle information between the plurality of microphone arrays and the target sound source.
  • the sound signal of the target sound source can be amplified and controlled in a timely and accurate manner, effectively improving the sound quality.
  • when the signal processing device 1700 provided in the above embodiment performs signal processing, the division into the above functional modules is used only as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the signal processing device and the signal processing method embodiments provided in the above embodiments belong to the same idea, and the specific implementation process thereof is detailed in the method embodiments, and will not be repeated here.
  • FIG. 18 is a schematic diagram of a hardware structure of a signal processing device provided by an embodiment of the present application.
  • the signal processing device 1800 includes a memory 1801 , a processor 1802 , a communication interface 1803 and a bus 1804 .
  • the memory 1801 , the processor 1802 , and the communication interface 1803 are connected to each other through a bus 1804 .
  • the memory 1801 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 1801 may store at least one piece of program code, and when the program code stored in the memory 1801 is executed by the processor 1802, the signal processing device can implement the above signal processing method.
  • the memory 1801 may also store various types of data, including but not limited to images and audio signals, which is not limited in this embodiment of the present application.
  • the processor 1802 may be a network processor (NP), a central processing unit (CPU), an application-specific integrated circuit (ASIC), or an integrated circuit for controlling the execution of programs of the solution of the present application.
  • the processor 1802 may be a single-core (single-CPU) processor, or a multi-core (multi-CPU) processor. The number of the processor 1802 may be one or more.
  • the communication interface 1803 uses a transceiver module, such as a transceiver, to implement communication between the signal processing device 1800 and other devices or communication networks. For example, data can be acquired through the communication interface 1803 .
  • the memory 1801 and the processor 1802 may be provided separately, or may be integrated together.
  • the bus 1804 may include a path for transferring information between various components of the signal processing device 1800 (eg, memory 1801 , processor 1802 , communication interface 1803 ).
  • the terms "first" and "second" are used to distinguish identical or similar items having substantially the same function. It should be understood that "first", "second", and "nth" have no logical or temporal dependency, and do not limit the quantity or the order of execution. It should also be understood that although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms; the terms are only used to distinguish one element from another. For example, a first microphone could be termed a second microphone, and, similarly, a second microphone could be termed a first microphone, without departing from the scope of the various described examples. Both the first microphone and the second microphone may be microphones, and in some cases may be separate and distinct microphones.
  • in the present application, the term "at least one" means one or more, and the term "multiple" means two or more; for example, a plurality of microphones means two or more microphones.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • software When implemented using software, it may be implemented in whole or in part in the form of a program product.
  • the program product includes one or more program instructions. When the program instructions are loaded and executed on the signal processing device, the processes or functions according to the embodiments of the present invention will be generated in whole or in part.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The present application discloses a signal processing method, apparatus, device, and storage medium, belonging to the field of computer technology. In the signal processing method provided by the embodiments of the present application, a posture change of a first object in a sound pickup area is detected based on images of the sound pickup area, and corresponding sound amplification processing is then performed based on that posture change. Through the above technical solution, the amplification requirement in a scene can be determined in a timely and accurate manner according to the detected posture change of an object in the sound pickup area, and corresponding amplification control is then applied to the sound signal according to that requirement, effectively improving the sound quality.

Description

Signal processing method, apparatus, device, and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular to a signal processing method, apparatus, device, and storage medium.
Background
In a multi-person conference scenario, a sound amplification service needs to be provided for the speaker. Sound amplification refers to amplifying the picked-up sound and playing it out. In some scenarios where the amplification requirement changes, for example, when sudden noise or a private conversation occurs at the conference site, directly amplifying the picked-up sound degrades the sound quality of the conference.
Therefore, a signal processing method that performs amplification according to the amplification requirement of the scene is urgently needed, so as to improve the sound quality in conferences.
发明内容
本申请提供了一种信号处理方法、装置、设备及存储介质,能够有效提升声音质量。该技术方案如下:
第一方面,提供了一种信号处理方法,该方法包括:
基于拾音区域的图像,检测所述拾音区域中至少一个对象的姿态变化;
响应于所述至少一个对象中第一对象发生姿态变化,对来源于所述第一对象的声音信号进行相应的扩音处理。
其中,对象的姿态变化是指对象从一个状态变化到另一个状态,例如,从坐姿变化到站姿。
通过上述技术方案,能够根据检测出的拾音区域中对象的姿态变化,及时且精准地判断场景中的扩音需求,进而按照扩音需求对声音信号进行相应的扩音控制,有效提升了声音质量。
在一种可能实施方式中,所述基于拾音区域的图像,检测所述拾音区域中至少一个对象的姿态变化包括:
基于所述拾音区域在不同时刻的图像,分别确定所述拾音区域内的所述不同时刻对应的坐标集合,所述坐标集合包括所述至少一个对象在所述拾音区域中的坐标;
基于所述不同时刻对应的坐标集合,确定所述拾音区域中至少一个对象的姿态变化。
通过上述技术方案,基于拾音区域中的图像,持续记录各个对象在不同时刻的坐标,进而识别出各个对象在不同时刻是否发生了姿态变化,为基于对象的姿态变化进行扩音处理提供了数据基础,能够准确地确定扩音控制的对象以及相应的扩音控制方式,进而有效地提高声音质量。
在一种可能实施方式中,所述基于所述拾音区域在不同时刻的图像,分别确定所述拾音区域内的所述不同时刻对应的坐标集合包括:
每隔第一时长,对采集到的所述拾音区域的图像进行对象识别,获取识别到的至少一个对象的目标特征在所述图像中的坐标,以得到所述不同时刻对应的坐标集合。
其中,该目标特征可以是对象的人脸特征,例如,脸部中心点或者眼睛等五官。
通过上述技术方案,无需关注对象整体,仅基于对象的目标特征即可确定对象在拾音区域中的坐标,并持续基于目标特征来检测对象的坐标变化,在保证坐标准确性的同时,减少 了运算量,提高了检测姿态变化的效率。
在一种可能实施方式中,所述基于所述不同时刻对应的坐标集合,确定所述拾音区域中至少一个对象的姿态变化包括:
基于所述不同时刻对应的坐标集合中的坐标,确定目标方差,所述目标方差表示所述拾音区域中至少一个对象在不同时刻的姿态变化程度;
在所述目标方差大于方差阈值的情况下,基于所述至少一个对象在所述不同时刻的坐标,确定所述至少一个对象的姿态变化。
其中,目标方差能够表示不同对象与所有对象之间的差异,因此,基于拾音区域不同时刻对应的坐标集合确定的目标方差,能够体现每一个对象的坐标相对于坐标平均值的差异,也就能够及时且准确的识别出是否存在发生姿态变化的对象。
通过上述技术方案,基于目标方差对拾音区域中发生的姿态变化进行预判断,在目标方差大于方差阈值的情况下才进行后续步骤,节省了计算资源,提高了扩音控制的效率。
在一种可能实施方式中,所述坐标包括横坐标和纵坐标,所述在所述目标方差大于方差阈值的情况下,基于所述至少一个对象在所述不同时刻的坐标,确定所述至少一个对象的姿态变化包括:
基于所述对象在所述不同时刻的纵坐标,确定所述对象的目标时刻,所述目标时刻为所述对象的纵坐标发生变化的时刻;
若所述目标时刻对应的纵坐标在第二时长内的变化幅度小于目标幅度,确定所述对象的姿态发生变化。
通过上述技术方案,结合了发生姿态变化的目标时刻之后的一段时间内的纵坐标进行判断,避免了复杂姿态变化带来的干扰,提高了识别对象姿态变化的准确性,保证了扩音控制的针对性,进而有效提高了声音质量。
在一种可能实施方式中,所述若所述目标时刻对应的纵坐标在第二时长内的变化幅度小于目标幅度,确定所述对象的姿态发生变化包括:
若所述对象的纵坐标变小,且所述目标时刻对应的纵坐标在所述第二时长内的变化幅度小于目标幅度,确定所述对象从站姿改变为坐姿;
若所述对象的纵坐标变大,且所述目标时刻对应的纵坐标在所述第二时长内的变化幅度小于目标幅度,确定所述对象从坐姿改变为站姿。
其中,由于扩音需求对应的姿态变化通常对应于纵向的姿态变化,例如,站姿到坐姿,或者,坐姿到站姿,因此,基于纵坐标来确定发生大幅度姿态变化的第一对象,能够贴合会议场景中的实际情况,有效提高了基于姿态变化进行扩音控制的准确性。
在一种可能实施方式中,所述响应于所述至少一个对象中第一对象发生姿态变化,对来源于所述第一对象的声音信号进行相应的扩音处理包括:
响应于所述至少一个对象中第一对象发生姿态变化,结合所述拾音区域中的声音信号,对来源于所述第一对象的声音信号进行相应的扩音处理。
在上述技术方案中,考虑到对象的姿态变化具有不可预测性,结合拾音区域中的声音信号对来源于第一对象的声音信号进行扩音控制,精准判断各种特殊情况下的扩音需求,有效提高基于姿态变化进行扩音控制的准确性,进而提高声音质量。
在一种可能实施方式中,所述响应于所述至少一个对象中第一对象发生姿态变化,结合所述拾音区域中的声音信号,对来源于所述第一对象的声音信号进行相应的扩音处理包括:
响应于所述至少一个对象中第一对象发生了第一姿态变化,在所述拾音区域中的声音信 号的音量大于或等于音量阈值的情况下,对来源于所述第一对象的声音信号进行扩音处理;
响应于所述至少一个对象中第一对象发生了所述第一姿态变化,在所述拾音区域中的声音信号的音量小于音量阈值的情况下,对来源于所述第一对象的声音信号不进行扩音处理。
其中,通过上述技术方案,在发生姿态变化后,结合拾音区域中的声音信号的音量,对来源于第一对象的声音信号进行扩音控制,考虑到了不同场景下的扩音需求,提高了扩音控制的准确性,进而有效提高了声音质量。
在一种可能实施方式中,所述响应于所述至少一个对象中第一对象发生姿态变化,结合所述拾音区域中的声音信号,对来源于所述第一对象的声音信号进行相应的扩音处理包括:
响应于所述至少一个对象中第一对象发生了第一姿态变化,对所述拾音区域中的声音信号进行人声检测,在检测到人声的情况下,对来源于所述第一对象的声音信号进行扩音处理;
响应于所述至少一个对象中第一对象发生了第一姿态变化,对所述拾音区域中的声音信号进行人声检测,在未检测到人声的情况下,对来源于所述第一对象的声音信号不进行扩音处理。
其中,该第一姿态变化表示该第一对象从坐姿变化为站姿。
通过上述技术方案,在发生姿态变化后,对拾音区域中的声音信号的进行人声检测,实现对场景中的扩音需求进行更加智能的判断,提高了针对不同场景进行扩音控制的准确性,进而有效提高了声音质量。
在一种可能实施方式中,所述响应于所述至少一个对象中第一对象发生姿态变化,对来源于所述第一对象的声音信号进行相应的扩音处理包括:
响应于所述至少一个对象中第一对象发生了第一姿态变化,获取所述第一对象在所述拾音区域中的位置;
基于所述拾音区域中的声音信号,确定所述声音信号的声源位置;
所述第一对象位于所述声源位置时,对来源于所述第一对象的声音信号进行扩音处理。
在一些实施例中,所述声源位置是指所述声音信号对应的声源在所述拾音区域中的角度信息;
所述第一对象位于所述声源位置时,对来源于所述第一对象的声音信号进行扩音处理包括:
所述第一对象在所述拾音区域中的角度信息与所述声音信号对应的声源在所述拾音区域中的角度信息匹配的情况下,对来源于所述第一对象的声音信号进行扩音处理。
其中,所述角度信息可以是所述声音信号对应的声源相对于拾音区域中的麦克风阵列的角度,结合基于麦克风阵列在拾音区域中的位置,即可确定拾音区域中声音信号的声源位置。
通过上述技术方案,在发生姿态变化后,通过对比拾音区域中的声源位置与发生姿态变化的第一对象的位置,判断场景中的扩音需求,进一步提高了针对不同场景进行扩音控制的准确性,进而有效提高了声音质量。
在一种可能实施方式中,所述响应于所述至少一个对象中第一对象发生姿态变化,对来源于所述第一对象的声音信号进行相应的扩音处理包括:
响应于所述至少一个对象中第一对象发生了第二姿态变化,对来源于所述第一对象的声音信号不进行扩音处理。
该第二姿态变化表示第一对象从站姿变化为坐姿。
第二方面,提供了一种信号处理方法,该方法包括:
从拾音区域的多个拾音设备中确定目标声源的目标拾音设备,所述目标拾音设备与所述目标声源之间的距离满足目标条件;
对来源于所述目标拾音设备的声音信号进行扩音处理。
上述技术方案中,通过确定目标声源的目标拾音设备,能够及时且精准地对目标声源的声音信号进行扩音控制,有效提高了声音质量。
在一种可能实施方式中,所述拾音区域配置有所述多个拾音设备和遥控设备,所述从拾音区域的多个拾音设备中确定目标声源的目标拾音设备包括:
基于所述遥控设备和所述多个拾音设备之间的信号交互,确定所述遥控设备和所述多个拾音设备之间的距离;
基于所述遥控设备和所述多个拾音设备之间的距离,确定所述目标拾音设备。
上述技术方案中,基于遥控设备与拾音设备之间的信号交互,能够根据遥控设备的位置,实时确定目标声源的目标拾音设备,进而及时且精准地对目标声源的声音信号进行扩音控制,有效提高了声音质量。
在一种可能实施方式中,所述基于所述遥控设备和所述多个拾音设备之间的信号交互,确定所述遥控设备和所述多个拾音设备之间的距离包括:
获取所述遥控设备与所述多个拾音设备之间进行信号交互的时间信息,所述时间信息包括所述遥控设备记录的交互时间以及所述多个拾音设备记录的交互时间;
基于所述时间信息,确定所述遥控设备和所述多个拾音设备之间的距离。
其中,以多个拾音设备中的第一拾音设备为例,该遥控设备与第一拾音设备进行信号交互的时间信息包括:遥控设备向第一拾音设备发送信号的时刻T a1;第一拾音设备接收到遥控设备发送的信号的时刻T b1;第一拾音设备在接收到遥控设备发送的信号之后,向遥控设备发送信号的时刻T b2;遥控设备接收到第一拾音设备发送的信号的时刻T a2
通过上述技术方案,基于遥控设备与多个拾音设备之间一对多的信号交互过程,能够同步获取多个拾音设备与遥控设备之间的时间信息,大大提高了获取时间信息的效率。
在一种可能实施方式中,所述获取所述遥控设备与所述多个拾音设备之间进行信号交互的时间信息包括:
从所述遥控设备接收所述遥控设备记录的交互时间以及所述多个拾音设备记录的交互时间;或,
从所述遥控设备接收所述遥控设备记录的交互时间,从所述多个拾音设备接收所述多个拾音设备记录的交互时间。
上述技术方案中,提供了多种方式从不同的设备获取时间信息,使得本申请实施例提供的信号处理方法能够灵活地适配不同的应用场景。其中,仅从遥控设备获取时间信息的方式,能够减少会议终端与拾音设备进行信号交互的次数,简化会议终端获取时间信息的过程,提高获取时间信息的效率。
在一种可能实施方式中,所述基于所述遥控设备和所述多个拾音设备之间的信号交互,确定所述遥控设备和所述多个拾音设备之间的距离包括:
从所述遥控设备接收所述遥控设备和所述多个拾音设备之间的距离,所述距离由所述遥控设备基于所述遥控设备与所述多个拾音设备之间进行信号交互的过程中记录的交互时间确定。
通过上述技术方案,会议终端直接基于获取到的距离即可确定目标拾音设备,在减少会议终端与多个拾音设备进行信号交互的次数的同时,充分利用到了遥控设备的运算能力,减 轻了会议终端的运算负荷。
在一种可能实施方式中,所述信号交互通过蓝牙、超声波、超宽带和无线局域网中任一种方式进行。
在一种可能实施方式中,所述从拾音区域的多个拾音设备中确定目标声源的目标拾音设备之前,所述方法还包括:
所述遥控设备与所述多个拾音设备进行时间同步。
遥控设备与多个拾音设备之间进行时间同步,能够确保各个设备记录交互时间处于同一时间体系中,保证确定出的交互时间的准确性,进而保证确定出的距离的准确性。
在一种可能实施方式中,所述从拾音区域的多个拾音设备中确定目标声源的目标拾音设备包括:
获取所述多个拾音设备对所述目标声源的定位信息;
基于所述定位信息,确定所述目标声源与所述多个拾音设备之间的距离;
将与所述目标声源之间的距离满足所述目标条件的拾音设备,确定为所述目标拾音设备。
在一种可能实施方式中,所述多个拾音设备为多个麦克风阵列,
所述定位信息包括所述多个麦克风阵列与所述目标声源之间的角度信息。
在一种可能实施方式中,所述方法还包括:
在所述目标声源不在所述拾音区域的有效拾音范围内的情况下,不对来源于所述目标拾音设备的声音信号进行扩音处理;
在所述目标声源在所述拾音区域的有效拾音范围内的情况下,对来源于所述目标拾音设备的声音信号进行扩音处理。
在一种可能实施方式中,所述从拾音区域的多个拾音设备中确定目标声源的目标拾音设备之前,所述方法还包括:
对所述多个拾音设备的声音信号进行降噪处理。
通过对声音信号进行降噪处理,可以避免拾音区域中的突发噪音对目标声源的定位信息的准确性造成影响,进而提高声音质量。
第三方面,提供了一种信号处理装置,该装置包括多个功能模块,用于执行如第一方面所提供的信号处理方法中的对应步骤。
第四方面,提供了一种信号处理装置,该装置包括多个功能模块,用于执行如第二方面所提供的信号处理方法中的对应步骤。
第五方面,提供了一种信号处理设备,该信号处理设备包括处理器和存储器,该存储器用于存储至少一段程序代码,该至少一段程序代码由该处理器加载并执行上述的信号处理方法。
第六方面,提供了一种计算机可读存储介质,该计算机可读存储介质用于存储至少一段程序代码,该至少一段程序代码用于执行上述的信号处理方法。
第七方面,提供了一种计算机程序产品,当该计算机程序产品在信号处理设备上运行时,使得该信号处理设备执行上述的信号处理方法。
附图说明
图1是本申请实施例提供的一种信号处理系统的架构示意图;
图2是本申请实施例提供的一种信号处理系统的部署示意图;
图3是本申请实施例提供的一种信号处理方法的流程图;
图4是本申请实施例提供的一种对象坐标的示意图;
图5是本申请实施例提供的一种声源位置的示意图;
图6是本申请实施例提供的一种信号处理系统的架构示意图;
图7是本申请实施例提供的一种信号处理系统的部署示意图;
图8是本申请实施例提供的一种信号处理方法的流程图;
图9是本申请实施例提供的一种信号交互过程的示意图;
图10是本申请实施例提供的一种TW-TOF测距方法的示意图;
图11是本申请实施例提供的一种信号处理系统的部署示意图;
图12是本申请实施例提供的一种信号处理方法的流程图;
图13是本申请实施例提供的一种定位信息获取过程的示意图;
图14是本申请实施例提供的一种距离确定原理的示意图;
图15是本申请实施例提供的一种目标声源不在有效拾音范围内的示意图;
图16是本申请实施例提供的一种信号处理装置的结构示意图;
图17是本申请实施例提供的一种信号处理装置的结构示意图;
图18是本申请实施例提供的一种信号处理设备的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
在介绍本申请实施例提供的技术方案之前,下面先对本申请涉及的专业术语进行说明。
双向飞行时间法(two way-time of flight,TW-TOF):利用信号在两个异步收发机(transceiver)之间的飞行时间来测量两个异步收发机对应的节点间的距离。
超宽频(ultra-wideband,UWB)技术是在较宽的频谱上传送极低功率的信号的技术,能实现数百Mbit/s至2Gbit/s的数据传输速率,具有穿透力强、功耗低、抗干扰效果好、安全性高、空间容量大、能精确定位等诸多优点。
接下来对本申请实施例的技术方案进行介绍:
本申请实施例提供了一种信号处理方法,应用于包括图像采集设备的信号处理系统中,该信号处理系统中的信号处理设备能够基于图像采集设备采集的拾音区域的图像,检测拾音区域中对象的姿态变化,从而在拾音区域中的第一对象发生姿态变化的情况下,对来源于第一对象的声音信号进行相应的扩音处理。通过上述技术方案,能够根据检测出的拾音区域中对象的姿态变化,及时且精准地判断场景中的扩音需求,进而按照扩音需求对声音信号进行相应的扩音控制,有效提升了声音质量。
本申请实施例提供了另一种信号处理方法,应用于包括多个拾音设备的信号处理系统中,该信号处理系统中的信号处理设备能够从拾音区域的多个拾音设备中,确定与目标声源之间的距离满足目标条件的目标拾音设备,从而对来源于该目标拾音设备的声音信号进行扩音处理。上述技术方案中,通过确定目标声源的目标拾音设备,能够及时且精准地对目标声源的 声音信号进行扩音控制,有效提高了声音质量。
其中,拾音设备用于拾取声音信号。拾音设备具有多种形态,例如,拾音设备可以是麦克风或麦克风阵列等。该麦克风可以是固定麦克风,例如,桌面嵌入式的麦克风;该麦克风还可以是可移动的麦克风。其中,麦克风阵列是指将多个麦克风(单元)按照某种空间结构进行排列得到的阵列结构,麦克风阵列根据阵列结构的空间特性,能够对多个方向的声音信号进行处理,得到各个角度范围内的声音信号。根据不同的使用场景,能够选择不同形态的拾音设备来拾取声音信号,本申请实施例中对拾音设备的形态不做限定。
图1是本申请实施例提供的一种信号处理系统的架构示意图。如图1所示,该信号处理系统包括:图像采集设备110、拾音设备120、信号处理设备130以及扩音设备140。其中,该图像采集设备110用于采集拾音区域的图像;该拾音设备120用于拾取拾音区域中的声音信号;该信号处理设备130用于基于拾音区域的图像,检测拾音区域中对象的姿态变化,并基于检测出的姿态变化,确定对拾音区域中的声音信号的扩音控制方式,并基于扩音控制方式生成相应的扩音控制指令并向扩音设备140发送,其中,该扩音控制方式包括:打开扩音和关闭扩音;该扩音设备140响应于接收到扩音控制指令,按照扩音控制指令指示的扩音控制方式,对声音信号进行扩音,或者,不对声音信号进行扩音。
基于图1对应的信号处理系统的架构,本申请实施例提供了一种信号处理系统的部署示意图,该信号处理系统应用于会议场景中,拾音区域即为会场,拾音区域中的至少一个对象也即是会场中的至少一个与会人。如图2所示,该信号处理系统包括:作为图像采集设备的摄像头210;作为拾音设备的麦克风阵列220;作为信号处理设备的会议终端230;作为扩音设备的扬声器240。其中,该摄像头210部署在会场中,用于采集会场图像。可选地,该摄像头210包括多个摄像头,分别部署在会场中的不同位置,通过该多个摄像头,能够获得更完整的会场图像。其中,该麦克风阵列220用于拾取会场中的声音信号。可选地,通过将该麦克风阵列220部署在会场墙壁的中间位置,使得麦克风阵列220的拾音范围能够均匀地覆盖会场。其中,该会议终端230基于摄像头210采集的会场图像,检测各个与会人的姿态变化,并针对麦克风阵列220拾取的声音信号,生成相应的扩音控制指令,该扩音控制指令指示对声音信号进行相应的扩音处理。该扬声器240响应于接收到扩音控制指令,在扩音控制指令指示对声音信号进行扩音的情况下,对声音信号进行放大并输出放大后的声音;在扩音控制指令指示不对声音信号进行扩音的情况下,不输出声音。图2中是以摄像头210和麦克风阵列220为分别独立于会议终端230以外的设备为例进行说明,可选地,摄像头210、麦克风阵列220可以内置在会议终端230中,作为一个设备部署在会场中。
图3是本申请实施例提供的一种信号处理方法的流程图。该方法应用于图2对应的信号处理系统中,该信号处理系统包括摄像头210、麦克风阵列220、会议终端230以及扬声器240,该信号处理方法由会议终端230执行。如图3所示,该方法包括:
301、会议终端基于拾音区域在不同时刻的图像,分别确定该拾音区域内的不同时刻对应的坐标集合,该坐标集合包括至少一个对象在拾音区域中的坐标。
其中,该拾音区域中存在至少一个对象,则摄像头采集到的拾音区域的图像中,包括至少一个对象在拾音区域中的位置。
在一些实施例中,摄像头的位置固定,摄像头采集到的图像是摄像头采集范围内拾音区 域的图像。会议终端基于从摄像头接收到拾音区域的图像,确定拾音区域的参考坐标系,基于此,拾音区域中至少一个对象在拾音区域中的坐标,即可用参考坐标系中的坐标来表示。为了便于理解上述过程,本申请实施例提供了一种对象坐标的示意图,如图4所示,拾音区域的初始图像中,包括四个对象,拾音区域的参考坐标系以拾音区域的图像的左下角为原点(0,0),参考坐标系的x轴范围即为图像的横向宽度,参考坐标系的y轴范围即为图像的纵向长度。其中,对象的脸部所占的图像区域的中心点坐标是对象在拾音区域的参考坐标系中的坐标。如图4所示,对象1的坐标为(x 1,y 1),对象2的坐标为(x 2,y 2),对象3的坐标为(x 3,y 3),对象4的坐标为(x 4,y 4)。
在一些实施例中,会议终端根据摄像头在不同时刻采集的图像,用坐标集合记录不同时刻至少一个对象在参考坐标系中的坐标。在这种示例下,拾音区域初始时刻的坐标集合即包括上述图4中的初始图像对应的四个对象的坐标。随着摄像头不断采集下一时刻的图像,会议终端基于不同时刻的图像即可确定不同时刻对应的坐标集合。
在一些实施例中,每隔第一时长,会议终端对摄像头采集到的拾音区域的图像进行对象识别,获取识别到的至少一个对象的目标特征在该图像中的坐标,以得到不同时刻对应的坐标集合。可选地,该目标特征是至少一个对象的人脸特征,例如,脸部中心点或者眼睛等五官。会议终端对图像进行识别,基于识别出的人脸特征即可确定人脸,从而用人脸特征在图像中的坐标,表示对象在拾音区域中的坐标。可选地,人脸特征在图像中的坐标可以是人脸特征所占的图像区域的中心点坐标,例如,对象的脸部中心点的坐标,参见上述图4。
在一些实施例中,由于至少一个对象在拾音区域中的坐标是基于人脸识别确定的,对象的坐标变化能够用于表示对象的姿态变化。例如,从一个时刻到下一个时刻,若对象从坐姿改变为站姿,相应地该对象的人脸会向上移动,则对象的纵坐标会增大;若对象从站姿改变为坐姿,相应地该对象的人脸会向下移动,则对象的纵坐标会减小。可以理解地,在会议场景中,与会人从坐姿改变为站姿,表示该与会人需要发言,则需要扩音;与会人从站姿变坐姿,表示该与会人停止发言,则不需要继续扩音。基于此,能够基于不同时刻该至少一个对象在该拾音区域中的坐标,判断至少一个对象是否存在扩音需求。
通过上述技术方案,基于拾音区域中的图像,持续记录各个对象在不同时刻的坐标,进而识别出各个对象在不同时刻是否发生了姿态变化,为基于对象的姿态变化进行扩音处理提供了数据基础,能够准确的确定扩音控制的对象以及相应的扩音控制方式,进而有效地提高声音质量。
在一些实施例中,会议终端能够在确定该至少一个对象在拾音区域中的坐标的同时,确定每个对象的身份。在这种示例下,会议终端关联有人脸数据库,该人脸数据库中存储有多个已知对象的人脸数据。可选地,该人脸数据包括每个已知对象的人脸特征数据,例如,对象的眼睛特征数据。基于此,会议终端基于从拾音区域的图像中识别出的人脸,与该人脸数据库中的人脸数据进行匹配,在该识别到的人脸与任一已知对象的人脸数据匹配的情况下,则确定识别到的人脸为该已知对象的人脸,进而将所识别到的人脸的坐标,确定为该已知对象在拾音区域中的坐标。可选地,人脸数据库中的每个已知对象具有对象标识,通过将识别到的人脸的人脸特征在图像中的坐标与和匹配的已知对象的对象标识进行绑定,能够在确定对象在拾音区域中的坐标的同时,确定对象身份。在一些实施例中,在识别到的人脸与人脸数据库中任一已知对象的人脸数据都不匹配的情况下,可以为识别到的人脸对应的对象创建新的对象标识,并写入该识别到的人脸对应的对象的身份信息,实现在人脸数据库中新添加新对象的目的。
通过上述技术方案,无需关注对象整体,仅基于对象的目标特征即可确定对象在拾音区域中的坐标,并持续基于目标特征来检测对象的坐标变化,在保证坐标准确性的同时,减少了运算量,提高了检测姿态变化的效率。进一步地,基于人脸数据库来识别对象的身份,能够防止未经许可的对象参与到会议中,为会议进行提供了安全保障。
302、会议终端基于拾音区域不同时刻对应的坐标集合中的坐标,确定目标方差,该目标方差表示拾音区域中至少一个对象在不同时刻的姿态变化程度。
其中,目标方差能够表示不同对象与所有对象之间的差异,因此,基于拾音区域不同时刻对应的坐标集合确定的目标方差,能够体现每一个对象的坐标相对于坐标平均值的差异,也就能够及时且准确的识别出是否存在发生姿态变化的对象。
在一些实施例中,由于该坐标集合中的坐标能够表示对象的姿态,因此,该目标方差能够表示拾音区域中至少一个对象在不同时刻的姿态变化程度,该至少一个对象的姿态变化越明显,则目标方差的值越大,例如,在T 1时刻,拾音区域中的N个对象均为坐姿,在T 1之后的T 2时刻,N个对象中的对象A从坐姿改变为站姿,则在T 2时刻,对象A的坐标相对于N个对象的坐标平均值的差异要大于T 1时刻。其中,目标方差的计算参见公式(1)。
D(x)=E{Σ[X-E(X)]²}     (1)
其中,D(x)是当前时刻的目标方差,X是当前时刻对应的至少一个对象的当前坐标,E(x)是坐标集合中各个坐标的坐标平均值,该坐标集合包括不同时刻对应的坐标集合。
通过上述技术方案,基于目标方差对拾音区域中发生的姿态变化进行预判断,在目标方差大于方差阈值的情况下才进行后续步骤,节省了计算资源,提高了扩音控制的效率。
303、在目标方差大于方差阈值的情况下,会议终端基于至少一个对象在不同时刻对应的坐标集合中的纵坐标,确定至少一个对象中第一对象的目标时刻,该目标时刻为第一对象的纵坐标发生变化的时刻。
在一些实施例中,该目标方差大于方差阈值,说明该至少一个对象在不同时刻的姿态变化程度足够明显,也即是,该至少一个对象中存在姿态变化,例如,从站姿改变为坐姿。相应地,若该目标方差小于方差阈值,说明该至少一个对象未发生姿态变化,或者,发生了一些幅度较小的姿态变化,例如,轻微晃动头部。
可以理解地,方差阈值的大小决定了会议终端检测姿态变化的灵敏程度,方差阈值越小,会议终端对拾音区域中对象的姿态变化越敏感。在一些实施例中,具有高分辨率摄像头能够捕捉到十分细微的姿态变化,也即是,具有高分辨率摄像头对姿态变化十分敏感。因此,在摄像头具有高分辨率的情况下,为了避免大量细微的姿态变化影响到对扩音需求对应的姿态变化的检测,可以相应增大方差阈值,保证姿态检测的准确性。
在一些实施例中,会议终端从该至少一个对象中,根据对象的纵坐标,进一步确定出发生了姿态变化的第一对象。可选地,会议终端能够基于对象的对象标识,从不同时刻对应的坐标集合中,获取同一对象对应的不同时刻的纵坐标,进而将纵坐标发生变化的对象确定为第一对象,并获取第一对象纵坐标发生变化的目标时刻。可以理解地,由于扩音需求对应的姿态变化通常对应于纵向的姿态变化,例如,站姿到坐姿,或者,坐姿到站姿,因此,基于纵坐标来确定发生大幅度姿态变化的第一对象,能够贴合会议场景中的实际情况,有效提高基于姿态变化进行扩音处理的准确性。当然,出于不同场景的考虑,能够根据不同维度的数据来确定发生姿态变化的对象,本申请实施例对此不做限定。
304、若目标时刻对应的纵坐标在第二时长内的变化幅度小于目标幅度,会议终端确定第 一对象的姿态发生变化。
在一些实施例中,在确定了第一对象的纵坐标在目标时刻发生变化之后,需要结合第一对象的纵坐标变化的趋势来确定第一对象发生了何种姿态变化。其中,若该对象的纵坐标变小,且该目标时刻对应的纵坐标在该第二时长内的变化幅度小于目标幅度,则能够确定该对象从站姿改变为坐姿;若该对象的纵坐标变大,且该目标时刻对应的纵坐标在该第二时长内的变化幅度小于目标幅度,则能够确定该对象从坐姿改变为站姿。
可以理解地,第一对象出于扩音需求发生的姿态变化应该是从一个稳定状态变化到另一个稳定状态,例如,从持续坐姿变化到持续站姿。若第一对象的纵坐标在目标时刻之后的第二时长内变化明显,也即是,变化幅度大于目标幅度,说明第一对象变化后的状态并不稳定,例如,第一对象一开始为坐姿,在目标时刻站起取物后又迅速坐下。在这种情况下,会议终端判断该第一对象发生的姿态变化并非是由于扩音需求而发生的,从而不会进行相应的扩音控制。
通过上述技术方案,结合了发生姿态变化的目标时刻之后的一段时间内的纵坐标进行判断,避免了复杂姿态变化带来的干扰,提高了识别对象姿态变化的准确性,保证了扩音控制的针对性,进而有效提高了声音质量。
305、响应于至少一个对象中第一对象发生姿态变化,会议终端结合拾音区域中的声音信号,对来源于第一对象的声音信号进行相应的扩音处理。
其中,会议终端基于第一对象发生的姿态变化能够确定第一对象的扩音需求,进而根据扩音需求确定相应的扩音控制方式,例如,在需要扩音的情况下,则扩音控制方式为打开扩音;在不需要扩音的情况下,则扩音控制方式为关闭扩音。会议终端基于针对第一对象确定的扩音控制方式,生成针对来源于第一对象的声音信号的扩音控制指令,并向信号处理系统中的扬声器发送该扩音控制指令。在一些实施例中,该扩音控制指令包括扩音打开指令以及扩音关闭指令。该扩音打开指令指示扬声器对声音信号进行放大并输出放大后的声音;该扩音关闭指令指示扬声器不输出声音。本申请实施例对会议终端进行相应扩音控制的方式不做限定。
在一些实施例中,第一对象发生的姿态变化为第一姿态变化,该第一姿态变化表示该第一对象从坐姿变化为站姿。可以理解地,在一些特殊情况下,该第一对象从坐姿变化为站姿并不一定表示该第一对象存在扩音需求,例如,该第一对象的姿态从坐姿变化为站姿,并慢步走出拾音区域,期间并未发出声音。此时,结合第一对象的姿态发生变化后拾音区域中的声音信号,能够进一步排除此类特殊情况。
方式一、结合拾音区域中的声音信号的音量。
在一些实施例中,会议终端响应于第一对象发生了第一姿态变化,在拾音区域中的声音信号的音量大于或等于音量阈值的情况下,对来源于第一对象的声音信号进行扩音处理。可以理解地,拾音区域中的声音信号的音量大于或等于音量阈值,表示拾音区域中大概率存在需要扩音的声音。此时,结合第一对象发生了第一姿态变化,可以认为是该第一对象发生第一姿态变化后进行了发言,也即是,该第一对象存在扩音需求,则对来源于第一对象的声音信号进行扩音处理。
在另一些实施例中,会议终端响应于第一对象发生了第一姿态变化,在拾音区域中的声音信号的音量小于音量阈值的情况下,对来源于该第一对象的声音信号不进行扩音处理。相应地,拾音区域中的声音信号的音量小于音量阈值,表示拾音区域中大概率不存在需要扩音的声音。此时,即使第一对象发生了第一姿态变化,依旧认为该第一对象不存在扩音需求, 则对来源于第一对象的声音信号不进行扩音处理。
通过上述技术方案,在发生姿态变化后,结合拾音区域中的声音信号的音量,对来源于第一对象的声音信号进行扩音处理,考虑到了不同场景下的扩音需求,提高了扩音处理的准确性,进而有效提高了声音质量。
方式二、对拾音区域中的声音信号进行人声检测。
在一些实施例中,会议终端响应于第一对象发生了第一姿态变化,对拾音区域中的声音信号进行人声检测,对来源于该第一对象的声音信号进行扩音处理。可以理解地,在拾音区域中检测到人声,表示拾音区域中大概率有人在发言。此时,结合第一对象发生了第一姿态变化,可以认为是该第一对象发生第一姿态变化后进行了发言,也即是,该第一对象存在扩音需求,则对来源于第一对象的声音信号进行扩音处理。
在另一些实施例中,会议终端响应于第一对象发生了第一姿态变化,对拾音区域中的声音信号进行人声检测,在未检测到人声的情况下,对来源于第一对象的声音信号不进行扩音处理。相应地,在拾音区域中未检测到人声,表示拾音区域中大概率无人在发言。此时,即使第一对象发生了第一姿态变化,依旧认为该第一对象不存在扩音需求,则对来源于第一对象的声音信号不进行扩音处理。
通过上述技术方案,在发生姿态变化后,对拾音区域中的声音信号的进行人声检测,实现对场景中的扩音需求进行更加智能的判断,提高了针对不同场景进行扩音处理的准确性,进而有效提高了声音质量。
方式三、结合拾音区域中的声音信号的声源位置。
在另一些实施例中,第一对象发生的姿态变化为第一姿态变化,拾音区域中的声音信号的音量大于音量阈值,且,该拾音区域中存在人声,但该第一对象并不存在扩音需求,例如,该第一对象的姿态从坐姿变化为站姿,随后该第一对象走出拾音区域且未发出声音,在此期间拾音区域中的其他对象正在进行发言,也即是,发生第一姿态变化的第一对象,并不是拾音区域中的声音信号对应的声源。此时,结合第一对象的姿态发生变化后拾音区域中的声音信号的声源位置,能够进一步排除此类特殊情况。结合拾音区域中的声音信号的声源位置,对来源于第一对象的声音信号进行相应的扩音处理的过程包括下述步骤1至步骤3:
步骤1、会议终端响应于第一对象发生了第一姿态变化,获取第一对象在拾音区域中的位置。
在一些实施例中,会议终端通过获取第一对象在拾音区域中的坐标,并结合摄像头在拾音区域中的部署位置,能够确定第一对象在拾音区域中的位置。
步骤2、会议终端基于拾音区域中的声音信号,确定该声音信号的声源位置。
在一些实施例中,会议终端通过麦克风阵列,获取声音信号的声源相对于麦克风阵列的位置,进而基于麦克风阵列在拾音区域中的位置,确定拾音区域中声音信号的声源位置。可选地,该麦克风阵列获取声源相对于麦克风阵列的角度,以确定声源相对于麦克风阵列的位置。在一些实施例中,基于声源相对于麦克风阵列的角度,进一步结合声音信号到达麦克风阵列的不同麦克风(单元)的时延,能够确定声源相对于麦克风阵列的距离,从而根据麦克风阵列在拾音区域中的位置,即可确定声音信号在拾音区域中的声源位置。
步骤3、在第一对象位于声音信号的声源位置时,会议终端对来源于该第一对象的声音信号进行扩音处理。
在一些实施例中,第一对象位于声音信号的声源位置,表示该第一对象发生了第一姿态变化,且,第一对象即为拾音区域中发出声音的声源,因此,可以认为该第一对象存在扩音 需求,对来源于该第一对象的声音信号进行扩音处理。
本申请实施例提供了一种声源位置的示意图,如图5所示,会议终端基于麦克风阵列501拾取的拾音区域中的声音信号,确定声音信号对应的声源502的声源位置,并获取第一对象503的位置,与声源位置进行对比,其中,第一对象503的位置基于摄像头504采集的图像确定。
需要说明的是,出于不同的需求,可以对上述方式一、方式二和方式三进行组合后使用,以对不同场景中的扩音需求进行更加精准的判断,从而针对性提高扩音处理的准确性,以提升声音质量,本申请实施例对此不做限定。
在一些实施例中,该第一对象发生的姿态变化为第二姿态变化,该第二姿态变化表示第一对象变化不存在扩音需求,例如,从站姿变化为坐姿。因此,响应于该至少一个对象中第一对象发生了第二姿态变化,无需结合声音信号,会议终端对来源于该第一对象的声音信号不进行扩音处理。
通过上述技术方案,在发生姿态变化后,通过对比拾音区域中的声源位置与发生姿态变化的第一对象的位置,判断场景中的扩音需求,进一步提高了针对不同场景进行扩音控制的准确性,进而有效提高了声音质量。
在上述技术方案中,考虑到对象的姿态变化具有不可预测性,结合拾音区域中的声音信号对来源于第一对象的声音信号进行扩音处理,精准判断各种特殊情况下的扩音需求,有效提高基于姿态变化进行扩音处理的准确性,进而提高声音质量。
通过上述技术方案,根据检测出的拾音区域中对象的姿态变化,能够及时且精准地判断场景中的扩音需求,进而按照扩音需求对声音信号进行相应的扩音控制,有效提升了声音质量。
图6是本申请实施例提供的另一种信号处理系统的架构示意图。如图6所示,该信号处理系统包括:多个支持定位功能的拾音设备610、信号处理设备620以及扩音设备630。其中,该支持定位功能的拾音设备610用于拾取拾音区域中的声音信号,并获取关于拾音区域中目标声源的信息;该信号处理设备620从拾音设备610获取拾音区域中的声音信号以及关于目标声源的信息,从多个拾音设备610中,确定出目标声源的目标拾音设备以及对目标声源的扩音控制方式,基于此,生成针对来源于目标拾音设备的声音信号的扩音控制指令,并向扩音设备630发送,其中,该扩音控制方式包括:打开扩音和关闭扩音;该扩音设备630响应于接收到扩音控制指令,按照扩音控制指令指示的扩音控制方式,对来源于目标拾音设备的声音信号进行扩音,或者,不对来源于目标拾音设备的声音信号进行扩音。
基于图6对应的信号处理系统,本申请实施例提供了一种信号处理系统的部署示意图,该信号处理系统应用于会议场景中,拾音区域即为会场。如图7所示,该信号处理系统包括:作为多个支持定位功能的拾音设备的多个支持定位功能的麦克风710;用于和麦克风710进行信号交互的遥控设备720;作为信号处理设备的会议终端730;作为扩音设备的扬声器740。其中,该多个带有定位功能的麦克风710用于拾取会场中的声音信号,并与遥控设备720进行信号交互。其中,该遥控设备720的位置代表目标声源的位置。该会议终端730基于麦克风710与遥控设备720之间的信号交互,能够从会场中的多个麦克风710中,确定目标声源的目标麦克风以及对目标声源的扩音控制方式,进而生成针对来源于目标麦克风的声音信号的扩音控制指令;可选地,目标麦克风是距离目标声源最近的麦克风。该扬声器740响应于 接收到该扩音控制指令,按照扩音控制指令指示的扩音控制方式,对来源于目标麦克风的声音信号进行放大并输出放大后的声音,或者,不输出来源于目标麦克风的声音信号。
图8是本申请实施例提供的一种信号处理方法的流程图。该方法应用于图7对应的信号处理系统中,该信号处理方法由会议终端730执行。如图8所示,该方法包括:
801、会议终端获取遥控设备与多个麦克风之间进行信号交互的时间信息,该时间信息包括该遥控设备记录的交互时间以及多个麦克风记录的交互时间。
其中,以多个麦克风中的第一麦克风为例,该遥控设备与第一麦克风进行信号交互的时间信息包括:遥控设备向第一麦克风发送信号的时刻T a1;第一麦克风接收到遥控设备发送的信号的时刻T b1;第一麦克风在接收到遥控设备发送的信号之后,向遥控设备发送信号的时刻T b2;遥控设备接收到第一麦克风发送的信号的时刻T a2
在一些实施例中,以多个麦克风中的第一麦克风为例,会议终端从遥控设备接收遥控设备记录的交互时间T a1和T a2,从第一麦克风接收该第一麦克风记录的交互时间T b1和T b2
在另一些实施例中,以多个麦克风中的第一麦克风为例,第一麦克风在接收到遥控设备发送的信号之后,向遥控设备发送信号,并在向遥控设备发送的信号中携带T b1和T b2,基于此,会议终端能够从该遥控设备接收遥控设备记录的交互时间T a1和T a2以及第一麦克风记录的交互时间T b1和T b2。通过这种方式来获取信号交互的时间信息,能够减少会议终端与麦克风进行信号交互的次数,简化会议终端获取时间信息的过程,提高获取时间信息的效率。
需要说明的是,多个麦克风中的其他麦克风与上述第一麦克风同理,在此不作赘述。
上述技术方案中,提供了多种方式从不同的设备获取时间信息,使得本申请实施例提供的信号处理方法能够灵活地适配不同的应用场景。
本申请实施例提供了一种信号交互过程的示意图,如图9所示,拾音区域中部署有麦克风901、麦克风902、麦克风903以及麦克风904,遥控设备905向各个麦克风发送信号,信号经过各个麦克风对应的发送时延t i(i=1,2,3,4)后被各个麦克风接收;各个麦克风在接收到遥控设备发送的信号之后,分别向遥控设备905发送回复信号,回复信号经过各个麦克风对应的回复时延t ireply(i=1,2,3,4)后分别被遥控设备905接收。在上述信号交互的过程中,各个麦克风以及遥控设备各自记录对应的交互时间,并向会议终端906发送。
需要说明的是,上述信号交互可以通过蓝牙、超声波、超宽带和无线局域网中任一种方式进行,本申请实施例对此不做限定。
在一些实施例中,会议终端在获取遥控设备与多个麦克风之间进行信号交互的时间信息的同时,能够从遥控设备获取相应的扩音控制信息,例如,打开扩音、关闭扩音或增大音量等。会议终端基于扩音控制信息,即可确定对目标声源的扩音控制方式,该扩音控制方式包括:打开扩音、关闭扩音、增大音量以及减小音量等。
在一些实施例中,在信号交互过程开始之前,遥控设备与多个麦克风之间进行时间同步,以保证其各自记录交互时间处于同一时间体系中,保证确定出的交互时间的准确性,进而保证确定出的距离的准确性。
通过上述技术方案,基于遥控设备与多个麦克风之间一对多的信号交互过程,能够同步获取多个麦克风与遥控设备之间的时间信息,大大提高了获取时间信息的效率。
802、会议终端基于该时间信息,确定该遥控设备和多个麦克风之间的距离。
在一些实施例中，以多个麦克风中的第一麦克风为例，会议终端基于第一麦克风对应的T_a1、T_b1、T_b2以及T_a2，采用TW-TOF(two-way time of flight，双向飞行时间)测距方法，能够确定第一麦克风与遥控设备之间的距离，确定过程参见公式(2)至公式(3)。
t_1=((T_a2-T_a1)-(T_b2-T_b1))/2     (2)
d_1=t_1*c     (3)
其中，T_a1、T_b1、T_b2以及T_a2的定义参见步骤801；t_1是信号从第一麦克风到遥控设备经过的时延；c是光速；d_1是第一麦克风与遥控设备之间的距离。
为了便于理解上述过程，本申请实施例提供了一种TW-TOF测距方法的示意图，如图10所示，模块A向模块B发送一个数据包A，并记录下发包时刻T_a1；模块B接收到数据包A，记录下收包时刻T_b1；模块B等待T_reply时长后，向模块A发送数据包B，并记录下发包时刻T_b2(T_b2=T_reply+T_b1)；模块A接收到数据包B，并记录下收包时刻T_a2。则根据公式(2)和公式(3)，能够计算出模块A和模块B之间的距离d_1。
通过上述技术方案,基于TW-TOF测距方法将多个麦克风对应的时间信息转化为可供参考的距离,保证了确定出的距离的精确度。
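为便于理解公式(2)和公式(3)所示的TW-TOF测距计算，下面给出一段示意性的Python代码。其中函数名与示例时刻数值均为本文为说明而假设的，并非本申请的实际实现：

```python
def tw_tof_distance(t_a1, t_b1, t_b2, t_a2, c=299_792_458.0):
    """根据TW-TOF测距方法，由四个交互时刻计算两个模块之间的距离。

    t1 = ((T_a2 - T_a1) - (T_b2 - T_b1)) / 2   # 对应公式(2)
    d1 = t1 * c                                # 对应公式(3)
    """
    t1 = ((t_a2 - t_a1) - (t_b2 - t_b1)) / 2.0
    return t1 * c

# 示例：假设第一麦克风与遥控设备相距约10米（单程时延约33.4纳秒），
# 且第一麦克风等待T_reply=100微秒后回复（数值均为假设）
T_a1 = 0.0
T_b1 = 33.4e-9
T_b2 = T_b1 + 100e-6   # T_b2 = T_reply + T_b1
T_a2 = T_b2 + 33.4e-9

d1 = tw_tof_distance(T_a1, T_b1, T_b2, T_a2)  # 约10.01米
```

需要注意的是，公式(2)只依赖两端各自记录的时间差(T_a2-T_a1与T_b2-T_b1)，对两端时钟的固定偏移不敏感，但时钟漂移仍会引入测距误差，这也是信号交互开始之前进行时间同步的原因之一。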
在另一些实施例中,遥控设备基于该遥控设备与该多个麦克风之间进行信号交互的过程中记录的交互时间,确定多个麦克风和遥控设备之间的距离,会议终端直接从该遥控设备接收该多个麦克风和遥控设备之间的距离。通过上述技术方案,会议终端直接基于获取到的距离即可确定目标拾音设备,在减少会议终端与多个拾音设备进行信号交互的次数的同时,充分利用到了遥控设备的运算能力,减轻了会议终端的运算负荷。
803、会议终端基于遥控设备和多个麦克风之间的距离,确定目标麦克风,该目标麦克风与目标声源之间的距离满足目标条件。
在本申请实施例中,该目标条件是指:在多个麦克风之中,与目标声源之间的距离最近。可以理解地,该目标声源的位置即为遥控设备的位置,因此,距离遥控设备最近的麦克风即为距离目标声源最近的麦克风,也即是,距离遥控设备最近的麦克风即为目标声源的目标麦克风。通过上述技术方案,能够根据场景实际需求来设置目标条件,以确定所需的目标麦克风,进而精确地针对来源于目标麦克风的声音信号进行扩音控制,有效提高声音质量。
在另一些实施例中,可以根据实际需求,设置目标条件,例如,出于不需要拾取目标声源的声音的目的,该目标条件可以是:在多个麦克风之中,与目标声源之间的距离最远。本申请实施例对此不作限定。
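步骤803中按目标条件选取目标麦克风的逻辑，可以用如下示意性的Python代码表示。其中函数名与距离数值均为假设，仅用于说明"距离最近"与"距离最远"这两种目标条件：

```python
def select_target_mic(distances, condition="nearest"):
    """从多个麦克风中按目标条件选出目标麦克风。

    distances：{麦克风标识: 该麦克风与遥控设备（即目标声源）之间的距离}
    condition："nearest"表示选距离最近的麦克风（默认目标条件），
               "farthest"表示选距离最远的麦克风（例如不希望拾取目标声源时）
    """
    if condition == "nearest":
        return min(distances, key=distances.get)
    return max(distances, key=distances.get)

# 示例：四个麦克风与遥控设备之间的距离（单位：米，数值为假设）
dists = {"mic1": 2.3, "mic2": 0.8, "mic3": 4.1, "mic4": 1.6}
nearest_mic = select_target_mic(dists)               # "mic2"
farthest_mic = select_target_mic(dists, "farthest")  # "mic3"
```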
804、会议终端对来源于目标麦克风的声音信号进行扩音处理。
其中,会议终端在确定了目标声源的目标麦克风之后,获取来源于目标麦克风的声音信号,基于来源于目标麦克风的声音信号,生成对应的扩音控制指令,并向信号处理系统中的扬声器发送扩音控制指令。
在一些实施例中,会议终端基于针对目标声源的扩音控制方式,确定针对来源于目标麦克风的声音信号的扩音控制指令,并向扬声器发送该扩音控制指令,该扩音控制指令指示扬声器按照相应的扩音控制方式对来源于目标麦克风的声音信号进行扩音控制。扬声器响应于接收到针对来源于目标麦克风的声音信号的扩音控制指令,按照该扩音控制指令指示的扩音控制方式,对来源于该目标麦克风的声音信号进行放大并输出放大后的声音,或者,不输出来源于目标麦克风的声音信号对应的声音。
在一些实施例中，会议终端对来源于目标麦克风的声音信号进行处理，以保证来源于目标麦克风的声音信号被更好地放大并输出，进而提高声音质量，例如，对获取到的来源于目标麦克风的声音信号进行降噪处理，本申请实施例对此不做限定。
上述技术方案中,基于麦克风与遥控设备之间的信号交互,能够根据遥控设备的位置,实时确定目标声源的目标麦克风,进而及时且精准地对目标声源的声音信号进行扩音控制,有效提高了声音质量。
基于图6对应的信号处理系统,本申请实施例提供了另一种信号处理系统的部署示意图,该信号处理系统应用于会议场景中,拾音区域即为会场。如图11所示,该信号处理系统包括:作为多个支持定位功能的拾音设备的多个麦克风阵列1110;作为信号处理设备的会议终端1120;作为扩音设备的扬声器1130。其中,该麦克风阵列1110用于拾取会场中的声音信号,并确定对目标声源的定位信息。该会议终端1120基于对目标声源的定位信息,从会场中的多个麦克风阵列1110中,确定目标声源1140的目标麦克风阵列以及对目标声源的扩音控制方式,进而生成针对来源于目标麦克风阵列的声音信号的扩音控制指令;可选地,目标麦克风阵列是距离目标声源最近的麦克风阵列。该扬声器1130响应于接收到该扩音控制指令,按照扩音控制指令指示的扩音控制方式,对来源于目标麦克风阵列的声音信号进行放大并输出放大后的声音,或者,不输出来源于目标麦克风阵列的声音信号。
图12是本申请实施例提供的一种信号处理方法的流程图。该方法应用于图11对应的信号处理系统中,该信号处理方法由会议终端1120执行。如图12所示,该方法包括:
1201、会议终端获取多个麦克风阵列对目标声源的定位信息,该定位信息包括该多个麦克风阵列与该目标声源之间的角度信息。
在一些实施例中,在多个麦克风阵列对目标声源进行定位之前,在会议终端中基于设备部署情况进行参数配置,需要配置的参数包括但不限于:拾音区域的尺寸信息,例如,拾音区域的宽度和长度;多个麦克风阵列与会议终端之间的位置关系,例如,会议终端与任一麦克风阵列之间的距离,多个麦克风阵列之间的距离。
在另一些实施例中，多个麦克风阵列中的第一麦克风阵列内置在会议终端中，在这种情况下，会议终端中配置会议终端和声源之间的最短距离，该最短距离用于预先划定第一麦克风阵列对应的拾音范围，以避免声源位于第一麦克风阵列的拾音范围以外时，无法拾取声源的声音信号。可选地，会议终端中配置有多个麦克风阵列中的第二麦克风阵列的0度角的位置信息，该0度角用于划定该第二麦克风阵列的非拾音范围。
在一些实施例中,会议终端将预先配置的参数发送给各个麦克风阵列,实现对麦克风阵列的参数配置,例如,将第二麦克风阵列的0度角的位置信息发送给第二麦克风阵列,该第二麦克风阵列基于接收到的0度角的位置信息,划分自身的非拾音范围。
为了便于理解上述过程，本申请实施例提供了一种定位信息获取过程的示意图，如图13所示，会议终端1301内置有第一麦克风阵列1302，在拾音区域中与会议终端1301距离L的对称位置，部署有第二麦克风阵列1303。其中，会议终端1301与声源之间的最短距离D_1预先确定，则基于第一麦克风阵列1302和第二麦克风阵列1303的拾音范围的宽度D，可以确定该第二麦克风阵列1303与第一麦克风阵列1302之间的阴影区域为有效拾音区域。其中，第二麦克风阵列的0度角位于直线l处，则0度角逆时针方向的180度角范围为第二麦克风阵列1303的非拾音范围。在一些实施例中，D_1为0，则L即为有效拾音区域的长度。基于此，第一麦克风阵列以及第二麦克风阵列对目标声源1304的定位信息包括：目标声源相对于第一麦克风阵列的角度θ_1，目标声源相对于第二麦克风阵列的角度θ_2。其中，第一麦克风阵列1302的拾音范围角度θ_3基于D_1和D确定。
需要说明的是,上述过程以第一麦克风阵列和第二麦克风阵列为例进行说明,在包括更多麦克风阵列的情况下,获取定位信息的过程与上述过程同理,在此不作赘述。
在一些实施例中,多个麦克风阵列基于拾取到的声音信号,分别确定各自与目标声源之间的角度信息,并将各自的角度信息发送给会议终端。可选地,麦克风阵列将自身与目标声源之间的角度信息,发送给其他麦克风阵列,使得每个麦克风阵列都接收到对目标声源的完整角度信息。
在一些实施例中,拾音区域中的突发噪音可能会影响目标声源的定位信息,例如,某一路麦克风阵列拾取到的突发噪音被误认为是目标声源。因此,在获取该多个麦克风阵列的声音信号之后,通过对该多个麦克风阵列的声音信号进行降噪处理,可以避免拾音区域中的突发噪音对定位信息的准确性造成影响。
1202、会议终端基于定位信息,确定目标声源与多个麦克风阵列之间的距离。
在一些实施例中,定位信息包括该多个麦克风阵列与该目标声源之间的角度信息,会议终端基于目标声源相对于各个麦克风阵列的角度以及预先配置的参数,能够确定出目标声源与多个麦克风阵列之间的距离。下面以多个麦克风阵列包括第一麦克风阵列和第二麦克风阵列为例进行说明。
为了便于理解，本申请实施例提供了一种距离确定原理的示意图，如图14所示，基于目标声源1401相对于第一麦克风阵列1402的角度θ_1、目标声源1401相对于第二麦克风阵列1403的角度θ_2以及两个麦克风阵列之间的距离L，能够计算出目标声源偏离两个麦克风阵列的中心连接线的距离Ds；基于Ds、θ_1、θ_2以及L(L=L_1+L_2)，则能够确定目标声源与第一麦克风阵列之间的距离d_1以及目标声源与第二麦克风阵列之间的距离d_2。上述计算过程参见公式(4)至公式(8)。
Ds=L*tanθ_1*tanθ_2/(tanθ_1+tanθ_2)     (4)
L_1=Ds/tanθ_1     (5)
L_2=Ds/tanθ_2     (6)
d_1=Ds/sinθ_1     (7)
d_2=Ds/sinθ_2     (8)
需要说明的是,上述过程以第一麦克风阵列和第二麦克风阵列为例进行说明,在包括更多麦克风阵列的情况下,确定距离的过程与上述过程同理,在此不作赘述。
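为便于理解上述距离确定过程，下面给出一段示意性的Python代码。其中假设θ_1、θ_2为声源方向与两麦克风阵列连接线之间的夹角，各公式的具体形式系根据图14的几何关系推断，函数名与示例数值均为本文的假设：

```python
import math

def locate_source(theta1, theta2, L):
    """根据两个麦克风阵列测得的角度theta1、theta2（弧度）与阵列间距L，
    计算声源偏离连接线的距离Ds及声源到两个阵列的距离d1、d2。"""
    tan1, tan2 = math.tan(theta1), math.tan(theta2)
    ds = L * tan1 * tan2 / (tan1 + tan2)  # 公式(4)
    l1 = ds / tan1                        # 公式(5)：垂足到第一阵列的距离
    l2 = ds / tan2                        # 公式(6)：垂足到第二阵列的距离
    d1 = ds / math.sin(theta1)            # 公式(7)
    d2 = ds / math.sin(theta2)            # 公式(8)
    return ds, l1, l2, d1, d2

# 示例：两阵列相距4米，声源位于连接线中点正前方2米处，此时θ1=θ2=45°
ds, l1, l2, d1, d2 = locate_source(math.radians(45), math.radians(45), 4.0)
# ds=2.0，l1=l2=2.0（l1+l2=L），d1=d2≈2.83
```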
1203、会议终端基于该目标声源和该多个麦克风阵列之间的距离，确定目标麦克风阵列，该目标麦克风阵列与该目标声源之间的距离满足目标条件。
本步骤参考步骤803,在此不作赘述。
1204、会议终端对来源于目标麦克风阵列的声音信号进行扩音处理。
本步骤参考步骤804,在此不作赘述。
可选地，在目标声源不在拾音区域的有效拾音范围内的情况下，不对来源于目标麦克风阵列的声音信号进行相应的扩音控制；在目标声源在拾音区域的有效拾音范围内的情况下，对来源于目标麦克风阵列的声音信号进行相应的扩音控制。在一些实施例中，在公式(4)中计算出的Ds大于D/2(D为有效拾音区域的宽度)的情况下，则认为该目标声源不在拾音区域的有效拾音范围内。本申请实施例提供了一种目标声源不在有效拾音范围内的示意图，如图15所示，目标声源1501相对于第一麦克风阵列1502的角度为θ_1、目标声源1501相对于第二麦克风阵列1503的角度为θ_2，目标声源1501偏离两个麦克风阵列的中心连接线的距离Ds大于有效拾音区域的宽度的一半D/2。
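上述基于Ds与有效拾音区域宽度D的判断，可以用如下示意性的Python代码表示（函数名与示例数值均为假设）：

```python
def in_effective_area(ds, d_width):
    """ds：声源偏离两阵列中心连接线的距离（公式(4)的计算结果）；
    d_width：有效拾音区域的宽度D。
    当Ds大于D/2时，目标声源不在有效拾音范围内，不进行相应的扩音控制。"""
    return ds <= d_width / 2.0

# 示例：有效拾音区域宽度为3米
print(in_effective_area(1.0, 3.0))  # True：在有效拾音范围内，进行扩音控制
print(in_effective_area(2.0, 3.0))  # False：超出范围，不进行扩音控制
```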
通过上述技术方案,无需手动操作,即可基于对目标声源的定位信息来确定目标声源的目标拾音设备,及时且精准地对目标声源的声音信号进行扩音处理,在提升会议体验的同时,有效提高了声音质量。
图16是本申请实施例提供的一种信号处理装置的结构示意图。如图16所示,该信号处理装置包括:
检测模块1601,用于基于拾音区域的图像,检测该拾音区域中至少一个对象的姿态变化;
信号处理模块1602,用于响应于该至少一个对象中第一对象发生姿态变化,对来源于该第一对象的声音信号进行相应的扩音处理。
在一种可能实施方式中,该检测模块1601包括:
坐标确定单元,用于基于该拾音区域在不同时刻的图像,分别确定该拾音区域内的该不同时刻对应的坐标集合,该坐标集合包括该至少一个对象在该拾音区域中的坐标;
姿态变化确定单元,用于基于该不同时刻对应的坐标集合,确定该拾音区域中至少一个对象的姿态变化。
在一种可能实施方式中,该姿态变化确定单元用于:
基于该不同时刻对应的坐标集合中的坐标,确定目标方差,该目标方差表示该拾音区域中至少一个对象在不同时刻的姿态变化程度;
在该目标方差大于方差阈值的情况下,基于该至少一个对象在该不同时刻的坐标,确定该至少一个对象的姿态变化。
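上述基于目标方差检测姿态变化的过程，可以用如下示意性的Python代码表示。其中"目标方差"的具体定义（此处取各对象位移量的方差）与方差阈值均为本文为说明而作的假设：

```python
def posture_changed(coords_before, coords_after, var_threshold):
    """coords_before / coords_after：同一组对象在两个时刻的坐标列表[(x, y), ...]。
    以各对象坐标位移量的方差作为"目标方差"（该定义为示意性假设），
    当目标方差大于方差阈值时，认为拾音区域中有对象发生了姿态变化。"""
    shifts = [((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
              for (x1, y1), (x2, y2) in zip(coords_before, coords_after)]
    mean = sum(shifts) / len(shifts)
    variance = sum((s - mean) ** 2 for s in shifts) / len(shifts)
    return variance > var_threshold

# 示例：三个对象中只有第一个对象明显移动（例如起立）
before = [(1.0, 1.0), (3.0, 1.0), (5.0, 1.0)]
after = [(1.0, 2.5), (3.0, 1.0), (5.0, 1.0)]
changed = posture_changed(before, after, var_threshold=0.1)  # True
```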
在一种可能实施方式中,该信号处理模块1602包括:
第一处理单元,用于响应于该至少一个对象中第一对象发生姿态变化,结合该拾音区域中的声音信号,对来源于该第一对象的声音信号进行相应的扩音处理。
在一种可能实施方式中,该第一处理单元用于:
响应于该至少一个对象中第一对象发生了第一姿态变化,在该拾音区域中的声音信号的音量大于或等于音量阈值的情况下,对来源于该第一对象的声音信号进行扩音处理;
响应于该至少一个对象中第一对象发生了该第一姿态变化,在该拾音区域中的声音信号的音量小于音量阈值的情况下,对来源于该第一对象的声音信号不进行扩音处理。
在一种可能实施方式中,该第一处理单元用于:
响应于该至少一个对象中第一对象发生了第一姿态变化,对该拾音区域中的声音信号进行人声检测,在检测到人声的情况下,对来源于该第一对象的声音信号进行扩音处理;
响应于该至少一个对象中第一对象发生了第一姿态变化，对该拾音区域中的声音信号进行人声检测，在未检测到人声的情况下，对来源于该第一对象的声音信号不进行扩音处理。
在一种可能实施方式中,该信号处理模块1602包括:
位置获取单元,用于响应于该至少一个对象中第一对象发生了第一姿态变化,获取该第一对象在该拾音区域中的位置;
声源定位单元,用于基于该拾音区域中的声音信号,确定该声音信号的声源位置;
第二处理单元,用于在该第一对象位于该声源位置时,对来源于该第一对象的声音信号进行扩音处理。
在一种可能实施方式中,该信号处理模块1602用于:
响应于该至少一个对象中第一对象发生了第二姿态变化,对来源于该第一对象的声音信号不进行扩音处理。
通过上述技术方案,能够根据检测出的拾音区域中对象的姿态变化,及时且精准地判断场景中的扩音需求,进而按照扩音需求对声音信号进行相应的扩音控制,有效提升了声音质量。
需要说明的是:上述实施例提供的信号处理装置在进行信号处理时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的信号处理装置与信号处理方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图17是本申请实施例提供的一种信号处理装置的结构示意图。如图17所示,该信号处理装置包括:
确定模块1701,用于从拾音区域的多个拾音设备中确定目标声源的目标拾音设备,该目标拾音设备与该目标声源之间的距离满足目标条件;
处理模块1702,用于对来源于该目标拾音设备的声音信号进行扩音处理。
在一种可能实施方式中,该拾音区域配置有该多个拾音设备和遥控设备,该确定模块1701包括:
距离确定单元,用于基于该遥控设备和该多个拾音设备之间的信号交互,确定该遥控设备和该多个拾音设备之间的距离;
设备确定单元,用于基于该遥控设备和该多个拾音设备之间的距离,确定该目标拾音设备。
在一种可能实施方式中,该距离确定单元用于:
获取该遥控设备与该多个拾音设备之间进行信号交互的时间信息,该时间信息包括该遥控设备记录的交互时间以及该多个拾音设备记录的交互时间;
基于该时间信息,确定该遥控设备和该多个拾音设备之间的距离。
在一种可能实施方式中,该确定模块1701用于:
获取该多个拾音设备对该目标声源的定位信息;
基于该定位信息,确定该目标声源与该多个拾音设备之间的距离;
将与该目标声源之间的距离满足该目标条件的拾音设备,确定为该目标拾音设备。
在一种可能实施方式中,该多个拾音设备为多个麦克风阵列,
该定位信息包括该多个麦克风阵列与该目标声源之间的角度信息。
上述技术方案中，通过确定目标声源的目标拾音设备，能够及时且精准地对目标声源的声音信号进行扩音控制，有效提高了声音质量。
需要说明的是:上述实施例提供的信号处理装置1700在进行信号处理时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的信号处理装置与信号处理方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
本申请实施例提供了一种信号处理设备。示意性地,参考图18,图18是本申请实施例提供的一种信号处理设备的硬件结构示意图。如图18所示,该信号处理设备1800包括存储器1801、处理器1802、通信接口1803以及总线1804。其中,存储器1801、处理器1802、通信接口1803通过总线1804实现彼此之间的通信连接。
存储器1801可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其它类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其它类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其它光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其它磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其它介质,但不限于此。存储器1801可以存储至少一段程序代码,当存储器1801中存储的程序代码被处理器1802执行时,使得信号处理设备能够实现上述信号处理方法。存储器1801还可以存储各类数据,包括但不限于图像和声音信号等,本申请实施例对此不作限定。
处理器1802可以是网络处理器(network processor,NP)、中央处理器(central processing unit,CPU)、特定应用集成电路(application-specific integrated circuit,ASIC)或用于控制本申请方案程序执行的集成电路。该处理器1802可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。该处理器1802的数量可以是一个,也可以是多个。通信接口1803使用例如收发器一类的收发模块,来实现信号处理设备1800与其他设备或通信网络之间的通信。例如,可以通过通信接口1803获取数据。
其中,存储器1801和处理器1802可以分离设置,也可以集成在一起。
总线1804可包括在信号处理设备1800各个部件(例如,存储器1801、处理器1802、通信接口1803)之间传送信息的通路。
本发明中术语“第一”“第二”等字样用于对作用和功能基本相同的相同项或相似项进行区分,应理解,“第一”、“第二”、“第n”之间不具有逻辑或时序上的依赖关系,也不对数量和执行顺序进行限定。还应理解,尽管以下描述使用术语第一、第二等来描述各种元素,但这些元素不应受术语的限制。这些术语只是用于将一元素与另一元素区别分开。例如,在不脱离各种所述示例的范围的情况下,第一麦克风可以被称为第二麦克风,并且类似地,第二麦克风可以被称为第一麦克风。第一麦克风和第二麦克风都可以是麦克风,并且在某些情况下,可以是单独且不同的麦克风。
本发明中术语“至少一个”的含义是指一个或多个,本发明中术语“多个”的含义是指两个或两个以上,例如,多个麦克风是指两个或两个以上的麦克风。
以上描述，仅为本发明的具体实施方式，但本发明的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本发明揭露的技术范围内，可轻易想到各种等效的修改或替换，这些修改或替换都应涵盖在本发明的保护范围之内。因此，本发明的保护范围应以权利要求的保护范围为准。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以程序产品的形式实现。该程序产品包括一个或多个程序指令。在信号处理设备上加载和执行该程序指令时,全部或部分地产生按照本发明实施例中的流程或功能。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,该程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上实施例仅用以说明本发明的技术方案，而非对其限制；尽管参照前述实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。

Claims (29)

  1. 一种信号处理方法,其特征在于,所述方法包括:
    基于拾音区域的图像,检测所述拾音区域中至少一个对象的姿态变化;
    响应于所述至少一个对象中第一对象发生姿态变化,对来源于所述第一对象的声音信号进行相应的扩音处理。
  2. 根据权利要求1所述的方法,其特征在于,所述基于拾音区域的图像,检测所述拾音区域中至少一个对象的姿态变化包括:
    基于所述拾音区域在不同时刻的图像,分别确定所述拾音区域内的所述不同时刻对应的坐标集合,所述坐标集合包括所述至少一个对象在所述拾音区域中的坐标;
    基于所述不同时刻对应的坐标集合,确定所述拾音区域中至少一个对象的姿态变化。
  3. 根据权利要求2所述的方法,其特征在于,所述基于所述不同时刻对应的坐标集合,确定所述拾音区域中至少一个对象的姿态变化包括:
    基于所述不同时刻对应的坐标集合中的坐标,确定目标方差,所述目标方差表示所述拾音区域中至少一个对象在不同时刻的姿态变化程度;
    在所述目标方差大于方差阈值的情况下,基于所述至少一个对象在所述不同时刻的坐标,确定所述至少一个对象的姿态变化。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述响应于所述至少一个对象中第一对象发生姿态变化,对来源于所述第一对象的声音信号进行相应的扩音处理包括:
    响应于所述至少一个对象中第一对象发生姿态变化,结合所述拾音区域中的声音信号,对来源于所述第一对象的声音信号进行相应的扩音处理。
  5. 根据权利要求4所述的方法,其特征在于,所述响应于所述至少一个对象中第一对象发生姿态变化,结合所述拾音区域中的声音信号,对来源于所述第一对象的声音信号进行相应的扩音处理包括:
    响应于所述至少一个对象中第一对象发生了第一姿态变化,在所述拾音区域中的声音信号的音量大于或等于音量阈值的情况下,对来源于所述第一对象的声音信号进行扩音处理;
    响应于所述至少一个对象中第一对象发生了所述第一姿态变化,在所述拾音区域中的声音信号的音量小于音量阈值的情况下,对来源于所述第一对象的声音信号不进行扩音处理。
  6. 根据权利要求4所述的方法,其特征在于,所述响应于所述至少一个对象中第一对象发生姿态变化,结合所述拾音区域中的声音信号,对来源于所述第一对象的声音信号进行相应的扩音处理包括:
    响应于所述至少一个对象中第一对象发生了第一姿态变化,对所述拾音区域中的声音信号进行人声检测,在检测到人声的情况下,对来源于所述第一对象的声音信号进行扩音处理;
    响应于所述至少一个对象中第一对象发生了第一姿态变化,对所述拾音区域中的声音信号进行人声检测,在未检测到人声的情况下,对来源于所述第一对象的声音信号不进行扩音处理。
  7. 根据权利要求4所述的方法,其特征在于,所述响应于所述至少一个对象中第一对象发生姿态变化,对来源于所述第一对象的声音信号进行相应的扩音处理包括:
    响应于所述至少一个对象中第一对象发生了第一姿态变化,获取所述第一对象在所述拾音区域中的位置;
    基于所述拾音区域中的声音信号,确定所述声音信号的声源位置;
    在所述第一对象位于所述声源位置时,对来源于所述第一对象的声音信号进行扩音处理。
  8. 根据权利要求1所述的方法,其特征在于,所述响应于所述至少一个对象中第一对象发生姿态变化,对来源于所述第一对象的声音信号进行相应的扩音处理包括:
    响应于所述至少一个对象中第一对象发生了第二姿态变化,对来源于所述第一对象的声音信号不进行扩音处理。
  9. 一种信号处理方法,其特征在于,所述方法包括:
    从拾音区域的多个拾音设备中确定目标声源的目标拾音设备,所述目标拾音设备与所述目标声源之间的距离满足目标条件;
    对来源于所述目标拾音设备的声音信号进行扩音处理。
  10. 根据权利要求9所述的方法,其特征在于,所述拾音区域配置有所述多个拾音设备和遥控设备,所述从拾音区域的多个拾音设备中确定目标声源的目标拾音设备包括:
    基于所述遥控设备和所述多个拾音设备之间的信号交互,确定所述遥控设备和所述多个拾音设备之间的距离;
    基于所述遥控设备和所述多个拾音设备之间的距离,确定所述目标拾音设备。
  11. 根据权利要求10所述的方法,其特征在于,所述基于所述遥控设备和所述多个拾音设备之间的信号交互,确定所述遥控设备和所述多个拾音设备之间的距离包括:
    获取所述遥控设备与所述多个拾音设备之间进行信号交互的时间信息,所述时间信息包括所述遥控设备记录的交互时间以及所述多个拾音设备记录的交互时间;
    基于所述时间信息,确定所述遥控设备和所述多个拾音设备之间的距离。
  12. 根据权利要求9所述的方法,其特征在于,所述从拾音区域的多个拾音设备中确定目标声源的目标拾音设备包括:
    获取所述多个拾音设备对所述目标声源的定位信息;
    基于所述定位信息,确定所述目标声源与所述多个拾音设备之间的距离;
    将与所述目标声源之间的距离满足所述目标条件的拾音设备,确定为所述目标拾音设备。
  13. 根据权利要求12所述的方法,其特征在于,所述多个拾音设备为多个麦克风阵列,所述定位信息包括所述多个麦克风阵列与所述目标声源之间的角度信息。
  14. 一种信号处理装置,其特征在于,所述装置包括:
    检测模块,用于基于拾音区域的图像,检测所述拾音区域中至少一个对象的姿态变化;
    信号处理模块，用于响应于所述至少一个对象中第一对象发生姿态变化，对来源于所述第一对象的声音信号进行相应的扩音处理。
  15. 根据权利要求14所述的装置,其特征在于,所述检测模块包括:
    坐标确定单元,用于基于所述拾音区域在不同时刻的图像,分别确定所述拾音区域内的所述不同时刻对应的坐标集合,所述坐标集合包括所述至少一个对象在所述拾音区域中的坐标;
    姿态变化确定单元,用于基于所述不同时刻对应的坐标集合,确定所述拾音区域中至少一个对象的姿态变化。
  16. 根据权利要求15所述的装置,其特征在于,所述姿态变化确定单元用于:
    基于所述不同时刻对应的坐标集合中的坐标,确定目标方差,所述目标方差表示所述拾音区域中至少一个对象在不同时刻的姿态变化程度;
    在所述目标方差大于方差阈值的情况下,基于所述至少一个对象在所述不同时刻的坐标,确定所述至少一个对象的姿态变化。
  17. 根据权利要求14至16任一项所述的装置,其特征在于,所述信号处理模块包括:
    第一处理单元,用于响应于所述至少一个对象中第一对象发生姿态变化,结合所述拾音区域中的声音信号,对来源于所述第一对象的声音信号进行相应的扩音处理。
  18. 根据权利要求17所述的装置,其特征在于,所述第一处理单元用于:
    响应于所述至少一个对象中第一对象发生了第一姿态变化,在所述拾音区域中的声音信号的音量大于或等于音量阈值的情况下,对来源于所述第一对象的声音信号进行扩音处理;
    响应于所述至少一个对象中第一对象发生了所述第一姿态变化,在所述拾音区域中的声音信号的音量小于音量阈值的情况下,对来源于所述第一对象的声音信号不进行扩音处理。
  19. 根据权利要求17所述的装置,其特征在于,所述第一处理单元用于:
    响应于所述至少一个对象中第一对象发生了第一姿态变化,对所述拾音区域中的声音信号进行人声检测,在检测到人声的情况下,对来源于所述第一对象的声音信号进行扩音处理;
    响应于所述至少一个对象中第一对象发生了第一姿态变化,对所述拾音区域中的声音信号进行人声检测,在未检测到人声的情况下,对来源于所述第一对象的声音信号不进行扩音处理。
  20. 根据权利要求17所述的装置,其特征在于,所述信号处理模块包括:
    位置获取单元,用于响应于所述至少一个对象中第一对象发生了第一姿态变化,获取所述第一对象在所述拾音区域中的位置;
    声源定位单元,用于基于所述拾音区域中的声音信号,确定所述声音信号的声源位置;
    第二处理单元,用于在所述第一对象位于所述声源位置时,对来源于所述第一对象的声音信号进行扩音处理。
  21. 根据权利要求14所述的装置,其特征在于,所述信号处理模块用于:
    响应于所述至少一个对象中第一对象发生了第二姿态变化，对来源于所述第一对象的声音信号不进行扩音处理。
  22. 一种信号处理装置,其特征在于,所述装置包括:
    确定模块,用于从拾音区域的多个拾音设备中确定目标声源的目标拾音设备,所述目标拾音设备与所述目标声源之间的距离满足目标条件;
    处理模块,用于对来源于所述目标拾音设备的声音信号进行扩音处理。
  23. 根据权利要求22所述的装置,其特征在于,所述拾音区域配置有所述多个拾音设备和遥控设备,所述确定模块包括:
    距离确定单元,用于基于所述遥控设备和所述多个拾音设备之间的信号交互,确定所述遥控设备和所述多个拾音设备之间的距离;
    设备确定单元,用于基于所述遥控设备和所述多个拾音设备之间的距离,确定所述目标拾音设备。
  24. 根据权利要求23所述的装置,其特征在于,所述距离确定单元用于:
    获取所述遥控设备与所述多个拾音设备之间进行信号交互的时间信息,所述时间信息包括所述遥控设备记录的交互时间以及所述多个拾音设备记录的交互时间;
    基于所述时间信息,确定所述遥控设备和所述多个拾音设备之间的距离。
  25. 根据权利要求22所述的装置,其特征在于,所述确定模块用于:
    获取所述多个拾音设备对所述目标声源的定位信息;
    基于所述定位信息,确定所述目标声源与所述多个拾音设备之间的距离;
    将与所述目标声源之间的距离满足所述目标条件的拾音设备,确定为所述目标拾音设备。
  26. 根据权利要求25所述的装置,其特征在于,所述多个拾音设备为多个麦克风阵列,所述定位信息包括所述多个麦克风阵列与所述目标声源之间的角度信息。
  27. 一种信号处理设备,其特征在于,所述信号处理设备包括处理器和存储器,所述存储器用于存储至少一段程序代码,所述至少一段程序代码由所述处理器加载并执行如权利要求1至权利要求13中任一项所述的信号处理方法。
  28. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质用于存储至少一段程序代码,所述至少一段程序代码用于执行如权利要求1至权利要求13中任一项所述的信号处理方法。
  29. 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1至权利要求13中任一项所述的信号处理方法。
PCT/CN2023/071517 2022-01-25 2023-01-10 信号处理方法、装置、设备及存储介质 WO2023143041A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210089027.7 2022-01-25
CN202210089027.7A CN116546409A (zh) 2022-01-25 2022-01-25 信号处理方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023143041A1

Family

ID=87452959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/071517 WO2023143041A1 (zh) 2022-01-25 2023-01-10 信号处理方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN116546409A (zh)
WO (1) WO2023143041A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106357871A (zh) * 2016-09-29 2017-01-25 维沃移动通信有限公司 一种扩音方法及移动终端
JP2017201747A (ja) * 2016-05-02 2017-11-09 国立大学法人 筑波大学 信号処理装置、信号処理方法及び信号処理プログラム
CN110035372A (zh) * 2019-04-24 2019-07-19 广州视源电子科技股份有限公司 扩声系统的输出控制方法、装置、扩声系统及计算机设备
CN110166920A (zh) * 2019-04-15 2019-08-23 广州视源电子科技股份有限公司 桌面会议扩音方法、系统、装置、设备以及存储介质
CN110992971A (zh) * 2019-12-24 2020-04-10 达闼科技成都有限公司 一种语音增强方向的确定方法、电子设备及存储介质
CN112148922A (zh) * 2019-06-28 2020-12-29 鸿富锦精密工业(武汉)有限公司 会议记录方法、装置、数据处理设备及可读存储介质


Also Published As

Publication number Publication date
CN116546409A (zh) 2023-08-04

