WO2021168620A1 - Sound source tracking control method and control device, and sound source tracking system - Google Patents

Sound source tracking control method and control device, and sound source tracking system

Info

Publication number
WO2021168620A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
audio segment
sound source
segment
collection circuit
Prior art date
Application number
PCT/CN2020/076462
Other languages
English (en)
French (fr)
Inventor
王建亭
邵喜斌
布占场
孟智明
雷利平
石阳
孙元慧
Original Assignee
京东方科技集团股份有限公司
北京京东方显示技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 and 北京京东方显示技术有限公司
Priority to CN202080000167.1A priority Critical patent/CN113631942B/zh
Priority to PCT/CN2020/076462 priority patent/WO2021168620A1/zh
Publication of WO2021168620A1 publication Critical patent/WO2021168620A1/zh

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/22 Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements

Definitions

  • the present disclosure relates to the field of information processing, and in particular to a sound source tracking control method and control device, and a sound source tracking system.
  • the first solution is to track the sound source at a fixed position. Personnel turn on the microphone when speaking, and turn off the microphone when not speaking. By monitoring the on-off state of the microphone and controlling the camera to aim at the speaker, the sound source tracking is achieved.
  • the second solution is to combine voice recognition and face recognition. Audio features are identified from the detected voice, the face image information of the speaker is queried from a database based on the audio features, the queried face image information is then used to identify the speaker in the current scene, and the camera is controlled to aim at the speaker, so as to achieve sound source tracking.
  • a sound source tracking control method including: extracting a first audio segment from first audio information collected by a first audio collection circuit, and synchronously extracting a second audio segment from second audio information collected by a second audio collection circuit; determining a first time offset between the first audio segment and the second audio segment according to the deviation between preset peak values in the first audio segment and the second audio segment; determining, according to the first time offset, a first distance difference between a first distance from the sound source to the first audio collection circuit and a second distance from the sound source to the second audio collection circuit; determining a first offset angle of the sound source according to the first distance difference; and adjusting the video capture direction of a video capture circuit according to the first offset angle, so that the video capture circuit is aligned with the sound source.
  • determining the first offset angle of the sound source according to the first distance difference includes: determining a first distance parameter using the first distance difference and the distance between the first audio collection circuit and the second audio collection circuit; and determining the first offset angle of the sound source according to the ratio of the first distance parameter to the first distance difference.
  • determining the first time offset between the first audio segment and the second audio segment according to the deviation between the preset peak values in the first audio segment and the second audio segment includes: selecting corresponding effective positive peaks in the first audio segment and the second audio segment according to a first difference between the maximum positive peak sample number in the first audio segment and the maximum positive peak sample number in the second audio segment, wherein the first audio segment and the second audio segment each include multiple sample values; selecting corresponding effective negative peaks in the first audio segment and the second audio segment according to a second difference between the minimum negative peak sample number in the first audio segment and the minimum negative peak sample number in the second audio segment; determining a first sampling clock deviation between the first audio segment and the second audio segment according to the sample number deviations of the corresponding effective positive peaks and the sample number deviations of the corresponding effective negative peaks in the first audio segment and the second audio segment; and determining the first time offset according to the first sampling clock deviation.
  • the deviation between the sample number of an effective positive peak in the first audio segment and the sample number of the corresponding effective positive peak in the second audio segment differs from the first difference by an amount within a first preset range; the deviation between the sample number of an effective negative peak in the first audio segment and the sample number of the corresponding effective negative peak in the second audio segment differs from the second difference by an amount within a second preset range.
  • the above method further includes: determining whether the first sum of the effective positive peak value and the effective negative peak value in the first audio segment or the second audio segment is less than a first preset threshold; If the first sum value is less than the first preset threshold, the video acquisition circuit is controlled to perform panoramic shooting.
  • the above method further includes: if the first sum value is not less than the first preset threshold, determining whether the number of effective positive peaks in the first audio segment or the second audio segment is the same as the number of effective negative peaks; if the number of effective positive peaks in the first audio segment or the second audio segment is the same as the number of effective negative peaks, further calculating a second sum of the total number of positive peaks and the total number of negative peaks in the first audio segment or the second audio segment; and in response to the ratio of the first sum to the second sum being greater than a second preset threshold, controlling the video capture circuit to perform panoramic shooting.
  • the above method further includes: calculating a third difference between the maximum positive peak sample number and the minimum negative peak sample number in the first audio segment; calculating a fourth difference between the maximum positive peak sample number and the minimum negative peak sample number in the second audio segment; and in response to the third difference and the fourth difference having the same sign, and the difference between the third difference and the fourth difference being within a third preset range, selecting corresponding effective positive peaks in the first audio segment and the second audio segment.
  • the above method further includes: calculating a fifth difference between the total number of positive peaks in the first audio segment and the total number of positive peaks in the second audio segment, and a third sum of the total number of positive peaks in the first audio segment and the total number of positive peaks in the second audio segment; calculating a sixth difference between the total number of negative peaks in the first audio segment and the total number of negative peaks in the second audio segment, and a fourth sum of the total number of negative peaks in the first audio segment and the total number of negative peaks in the second audio segment; and in response to the ratio of the fifth difference to the third sum being within a fourth preset range, and the ratio of the sixth difference to the fourth sum being within a fifth preset range, selecting corresponding effective positive peaks in the first audio segment and the second audio segment.
  • the above method further includes: synchronously extracting a third audio segment from third audio information collected by a third audio collection circuit, and a fourth audio segment from fourth audio information collected by a fourth audio collection circuit; determining a second time offset between the third audio segment and the fourth audio segment according to the deviation between preset peak values in the third audio segment and the fourth audio segment; determining, according to the second time offset, a second distance difference between a third distance from the sound source to the third audio collection circuit and a fourth distance from the sound source to the fourth audio collection circuit; determining a second offset angle of the sound source according to the second distance difference; and adjusting the video capture direction of the video capture circuit according to the first offset angle and the second offset angle, so that the video capture circuit is aligned with the sound source.
  • a sound source tracking control device including: an extraction module configured to extract a first audio segment from first audio information collected by a first audio collection circuit, and synchronously extract a second audio segment from second audio information collected by a second audio collection circuit; a time offset determination module configured to determine a first time offset between the first audio segment and the second audio segment according to the deviation between preset peak values in the first audio segment and the second audio segment; a distance difference determination module configured to determine, according to the first time offset, a first distance difference between a first distance from the sound source to the first audio collection circuit and a second distance from the sound source to the second audio collection circuit; an offset angle determination module configured to determine a first offset angle of the sound source according to the first distance difference; and a direction adjustment module configured to adjust the video capture direction of a video capture circuit according to the first offset angle, so that the video capture circuit is aligned with the sound source.
  • a sound source tracking control device including: a memory configured to store instructions; and a processor coupled to the memory and configured to execute, based on the instructions stored in the memory, the method described in any of the above embodiments.
  • a sound source tracking system including the sound source tracking control device described in any of the above embodiments, and: a video capture circuit configured to adjust the video capture direction under the control of the sound source tracking control device; and a first audio collection circuit and a second audio collection circuit, wherein the first audio collection circuit and the second audio collection circuit are symmetrically arranged on both sides of the video capture circuit.
  • the ratio of the distance from the sound source to the video collection circuit to the distance from the first audio collection circuit to the second audio collection circuit is greater than a preset distance threshold.
  • the tracking system further includes: an analog-to-digital converter configured to perform analog-to-digital conversion on the audio signal collected by the first audio collection circuit to generate the first audio information, and to perform analog-to-digital conversion on the audio signal collected by the second audio collection circuit to generate the second audio information.
  • the video capture circuit includes: a direction control platform and a camera arranged on the direction control platform, the direction control platform being configured to adjust its direction under the control of the sound source tracking control device.
  • a computer-readable storage medium wherein the computer-readable storage medium stores computer instructions, and when the instructions are executed by a processor, a method related to any of the above-mentioned embodiments is implemented.
  • Fig. 1 is a schematic flowchart of a sound source tracking control method according to an embodiment of the present disclosure
  • Fig. 2 is a schematic flowchart of a method for calculating a time offset according to an embodiment of the present disclosure
  • Fig. 3 is a schematic diagram of a hyperbolic model according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of a sound source tracking control method according to another embodiment of the present disclosure.
  • FIG. 5 is a schematic flowchart of a method for calculating a time offset according to another embodiment of the present disclosure
  • Fig. 6 is a schematic structural diagram of a sound source tracking control device according to an embodiment of the present disclosure
  • Fig. 7 is a schematic structural diagram of a sound source tracking control device according to an embodiment of the present disclosure.
  • Fig. 8 is a schematic structural diagram of a sound source tracking system according to an embodiment of the present disclosure.
  • Fig. 9 is a schematic structural diagram of a sound source tracking system according to another embodiment of the present disclosure.
  • Fig. 10 is a schematic structural diagram of a sound source tracking system according to another embodiment of the present disclosure.
  • in the second related technology, the need for both voice recognition and face recognition makes the computational cost high.
  • the recognition rate of voice recognition and face recognition also affects the accuracy of sound source tracking.
  • the present disclosure proposes a solution that can easily and quickly implement sound source tracking.
  • Fig. 1 is a schematic flowchart of a sound source tracking control method according to an embodiment of the present disclosure. In some embodiments, the following steps of the sound source tracking control method are executed by the sound source tracking control device.
  • in step 101, the first audio segment is extracted from the first audio information collected by the first audio collection circuit, and the second audio segment is synchronously extracted from the second audio information collected by the second audio collection circuit.
  • the first audio collection circuit and the second audio collection circuit are pickups.
  • the duration of the first audio segment and the second audio segment is 50-100ms.
  • the first audio collection circuit and the second audio collection circuit are symmetrically arranged on both sides of the video collection circuit.
  • the distance from the video acquisition circuit to the first audio acquisition circuit is the same as the distance from the video acquisition circuit to the second audio acquisition circuit.
  • the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on the first straight line.
  • the video capture circuit includes a direction control platform and a camera arranged on the direction control platform.
  • the direction control platform is a PTZ (pan-tilt-zoom) platform.
  • the control parameters are sent to the direction control platform to adjust the direction of the direction control platform, thereby adjusting the video capture direction of the camera.
  • the communication protocol used is the UART (Universal Asynchronous Receiver/Transmitter) protocol.
  • the first straight line is a horizontal direction.
  • the first audio collecting circuit and the second audio collecting circuit are respectively arranged on the left and right sides of the video collecting circuit. Analog-to-digital conversion is performed on the audio signal collected by the first audio collection circuit to generate first audio information, and the analog-to-digital conversion is performed on the audio signal collected by the second audio collection circuit to generate second audio information.
  • in step 102, the first time offset between the first audio segment and the second audio segment is determined according to the deviation between the preset peak values in the first audio segment and the second audio segment.
  • Fig. 2 is a schematic flowchart of a method for calculating a time offset according to an embodiment of the present disclosure. In some embodiments, the following steps of the time offset calculation method are executed by the sound source tracking control device.
  • in step 201, the largest positive peak sample number and the smallest negative peak sample number in the first audio segment, and the largest positive peak sample number and the smallest negative peak sample number in the second audio segment, are identified.
  • first audio segment and the second audio segment respectively include multiple sample values.
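The identification in step 201 can be sketched as follows. This is a minimal illustration only: the text does not define how peaks are detected, so the local-extremum rule below and the function names `find_peaks` and `extreme_peak_numbers` are assumptions.

```python
def find_peaks(samples):
    """Return (positive_peaks, negative_peaks) as lists of sample numbers.

    Assumption: a positive peak is a positive sample that is greater than
    its predecessor and not smaller than its successor; a negative peak is
    the mirror image. The source does not specify the detection rule.
    """
    pos, neg = [], []
    for n in range(1, len(samples) - 1):
        prev, cur, nxt = samples[n - 1], samples[n], samples[n + 1]
        if cur > 0 and cur > prev and cur >= nxt:
            pos.append(n)
        elif cur < 0 and cur < prev and cur <= nxt:
            neg.append(n)
    return pos, neg


def extreme_peak_numbers(samples):
    """Sample numbers of the largest positive and smallest negative peak."""
    pos, neg = find_peaks(samples)
    if not pos or not neg:
        return None  # no usable peaks in this segment
    n_max = max(pos, key=lambda n: samples[n])
    n_min = min(neg, key=lambda n: samples[n])
    return n_max, n_min
```

Running `extreme_peak_numbers` on each of the two segments would give the pairs referred to in the text as the largest positive and smallest negative peak sample numbers.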
  • after the first audio segment and the second audio segment are identified, it can also be checked whether the first audio segment and the second audio segment correspond.
  • in the first audio segment, the largest positive peak sample number is $L_{max}$ and the smallest negative peak sample number is $L_{min}$; in the second audio segment, the largest positive peak sample number is $R_{max}$ and the smallest negative peak sample number is $R_{min}$.
  • ⁇ 1 is the preset threshold.
  • the total number of positive peaks in the first audio segment is $L_{Ptotal}$, the total number of negative peaks in the first audio segment is $L_{ntotal}$, the total number of positive peaks in the second audio segment is $R_{Ptotal}$, and the total number of negative peaks in the second audio segment is $R_{ntotal}$.
  • ⁇ 1 and ⁇ 2 are preset thresholds. ⁇ 1 and ⁇ 2 may be the same or different.
  • if the positions of the largest positive peak and the smallest negative peak in the first audio segment and the second audio segment correspond, and the total number of positive peaks and the total number of negative peaks in the first audio segment and the second audio segment are within a reasonable range, the calculation accuracy of the time offset can be ensured. If the positions of the largest positive peak and the smallest negative peak in the first audio segment and the second audio segment do not correspond, or the total number of positive peaks and the total number of negative peaks in the first audio segment and the second audio segment are not within a reasonable range, the first audio segment and the second audio segment have been subject to external interference. In this case, it is necessary to re-extract the first audio segment from the first audio information collected by the first audio collection circuit, and synchronously re-extract the second audio segment from the second audio information collected by the second audio collection circuit.
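The correspondence check can be sketched as below. The formulas themselves are not reproduced in this text, so the conditions are taken from the summary of the method (same sign of the max-to-min sample number differences, a bounded gap between them, and bounded relative differences of the peak totals). Reading "within a preset range" as "absolute value not exceeding a tolerance", the tolerance values, and the name `segments_correspond` are all assumptions.

```python
def segments_correspond(l_max, l_min, r_max, r_min,
                        l_ptotal, l_ntotal, r_ptotal, r_ntotal,
                        order_tol=5, pos_ratio_tol=0.2, neg_ratio_tol=0.2):
    """Check whether two audio segments plausibly show the same waveform.

    The three tolerances are placeholder values, not taken from the source.
    """
    d_left = l_max - l_min    # max-to-min sample number difference, segment 1
    d_right = r_max - r_min   # same difference, segment 2
    same_order = d_left * d_right > 0            # same sign
    close = abs(d_left - d_right) <= order_tol   # similar spacing

    pos_sum = l_ptotal + r_ptotal
    neg_sum = l_ntotal + r_ntotal
    if pos_sum == 0 or neg_sum == 0:
        return False  # no peaks to compare
    pos_ok = abs(l_ptotal - r_ptotal) / pos_sum <= pos_ratio_tol
    neg_ok = abs(l_ntotal - r_ntotal) / neg_sum <= neg_ratio_tol
    return same_order and close and pos_ok and neg_ok
```

When this returns `False`, the segments would be discarded and re-extracted, as the paragraph above describes.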
  • in step 202, the effective positive peaks and the effective negative peaks in the first audio segment and the second audio segment are obtained.
  • according to the difference between the largest positive peak sample number in the first audio segment and the largest positive peak sample number in the second audio segment, corresponding effective positive peaks are selected in the first audio segment and the second audio segment. According to the difference between the smallest negative peak sample number in the first audio segment and the smallest negative peak sample number in the second audio segment, corresponding effective negative peaks are selected in the first audio segment and the second audio segment.
  • $L_i$ is the sample number of the i-th effective positive peak $DL_i$ in the first audio segment, and $R_j$ is the sample number of the j-th effective positive peak $DR_j$ in the second audio segment. The maximum positive peak sample number in the first audio segment is $L_{max}$, and the maximum positive peak sample number in the second audio segment is $R_{max}$.
  • $L_i$ is the sample number of the i-th effective negative peak $DL_i$ in the first audio segment, and $R_j$ is the sample number of the j-th effective negative peak $DR_j$ in the second audio segment. If the minimum negative peak sample number in the first audio segment is $L_{min}$, and the minimum negative peak sample number in the second audio segment is $R_{min}$, the following formula (6) holds when $DL_i$ corresponds to $DR_j$.
  • ⁇ 1 and ⁇ 2 are preset thresholds. ⁇ 1 and ⁇ 2 may be the same or different.
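The peak selection can be sketched as follows. Since the formulas are not reproduced in this text, the condition is taken from the summary of the method: a pair corresponds when its sample number deviation differs from the reference deviation (of the largest positive peaks, or of the smallest negative peaks) by no more than a preset amount. The greedy closest-match pairing, the tolerance value, and the name `match_effective_peaks` are assumptions.

```python
def match_effective_peaks(left_peaks, right_peaks, ref_left, ref_right, tol=3):
    """Pair peak sample numbers from two segments.

    ref_left / ref_right are the sample numbers of the reference peak
    (largest positive or smallest negative) in each segment. A pair
    (li, rj) is accepted when |(li - rj) - (ref_left - ref_right)| <= tol.
    tol is a placeholder, not a value from the source.
    """
    ref_dev = ref_left - ref_right
    pairs, used = [], set()
    for li in left_peaks:
        best = None
        for rj in right_peaks:
            if rj in used:
                continue
            err = abs((li - rj) - ref_dev)
            if err <= tol and (best is None or err < best[0]):
                best = (err, rj)
        if best is not None:
            used.add(best[1])          # each right-hand peak is used once
            pairs.append((li, best[1]))
    return pairs
```

The same function would be called once with the positive peaks and the largest-positive-peak reference, and once with the negative peaks and the smallest-negative-peak reference.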
  • the first sampling clock deviation between the first audio segment and the second audio segment is determined according to the sample number deviations of the corresponding effective positive peaks in the first audio segment and the second audio segment, and the sample number deviations of the corresponding effective negative peaks in the first audio segment and the second audio segment.
  • sampling sequence deviation represents the number of sampling clocks between corresponding positive peaks or corresponding negative peaks. Therefore, the deviation of the first sampling clock between the first audio segment and the second audio segment can be determined by using the deviation of the sampling sequence number.
  • after the sample number deviations of the corresponding effective positive peaks and of the corresponding effective negative peaks in the first audio segment and the second audio segment are obtained, the first sampling clock deviation between the first audio segment and the second audio segment can be determined by calculating their arithmetic mean, geometric mean, or standard deviation.
  • the sampling sequence number deviation of the i-th effective peak value in the first audio segment and the corresponding j-th effective peak value in the second audio segment is ⁇ i.
  • the standard deviation M1 of the deviation of the sampling sequence number is calculated by using the following formula (7) as the deviation of the first sampling clock between the first audio segment and the second audio segment.
  • the first time offset is determined according to the first sampling clock deviation and the sampling conversion frequency.
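A sketch of this conversion, assuming the arithmetic mean (one of the options the text lists) is used to combine the deviations and that the time offset is the clock deviation divided by the sampling frequency. The formulas are not reproduced here, so both steps and the function names are illustrative assumptions.

```python
from statistics import mean


def sampling_clock_deviation(positive_pairs, negative_pairs):
    """Average signed sample number deviation over all matched peak pairs.

    Each pair is (sample number in segment 1, sample number in segment 2).
    """
    deviations = [li - rj for li, rj in positive_pairs + negative_pairs]
    if not deviations:
        raise ValueError("no effective peak pairs")
    return mean(deviations)


def time_offset_seconds(clock_deviation, sample_rate_hz):
    """Convert a deviation in sampling clocks to seconds."""
    return clock_deviation / sample_rate_hz
```

For example, a mean deviation of -3.5 samples at an assumed 48 kHz sampling rate corresponds to roughly -73 microseconds; the sign indicates which pickup the sound reached first.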
  • it can be further determined whether the effective positive peak value $M_{Valid}$ and the effective negative peak value $N_{Valid}$ in the first audio segment or the second audio segment satisfy the following formula (9).
  • D1 is the preset threshold. If the above formula (9) is established, it indicates that there are too few effective peaks in the first audio segment and the second audio segment. This is usually caused by the silence of the current scene. In this case, control the video capture circuit to perform panoramic shooting.
  • the video capture direction of the video capture circuit is perpendicular to the plane where the first audio capture circuit and the second audio capture circuit are located. Therefore, the video capture circuit can fully cover the current scene.
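The silence check can be sketched as below. It assumes, based on the remark that too few effective peaks indicate a silent scene, that the two quantities are counts of effective peaks and that formula (9) compares their sum with D1. The threshold value and the name `select_capture_mode` are hypothetical.

```python
def select_capture_mode(m_valid, n_valid, d1=4):
    """Return 'panorama' when too few effective peaks were found.

    m_valid / n_valid: counts of effective positive / negative peaks
    (assumed interpretation). d1 is a placeholder threshold.
    """
    if m_valid + n_valid < d1:
        return "panorama"  # scene is likely silent: shoot the whole scene
    return "track"         # enough peaks: continue with sound source tracking
```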
  • $L_{Ptotal}$ is the total number of positive peaks in the first audio segment, $L_{ntotal}$ is the total number of negative peaks in the first audio segment, $R_{Ptotal}$ is the total number of positive peaks in the second audio segment, and $R_{ntotal}$ is the total number of negative peaks in the second audio segment.
  • in step 103, the first distance difference between the first distance from the sound source to the first audio collection circuit and the second distance from the sound source to the second audio collection circuit is determined according to the first time offset.
  • the first distance difference a1 is calculated using formula (11).
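Formula (11) is not reproduced in this text. A natural reading is that the distance difference is the propagation speed of sound multiplied by the time offset; the sketch below assumes that, with a speed of about 343 m/s (air at roughly 20 °C). The constant and the function name are assumptions.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def distance_difference_m(time_offset_s, speed=SPEED_OF_SOUND_M_S):
    """Path-length difference (metres) implied by an arrival-time offset."""
    return speed * time_offset_s
```

Under this assumption a time offset of 0.5 ms corresponds to a path difference of about 17 cm.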
  • in step 104, the first offset angle of the sound source is determined according to the first distance difference.
  • Fig. 3 is a schematic diagram of a hyperbolic model according to an embodiment of the present disclosure.
  • F1 is the first audio collection circuit
  • F2 is the second audio collection circuit
  • P is the speaker
  • O is the video collection circuit.
  • the distance between F1 and F2 (for example, 10-30 cm) is much smaller than the distance between the video capture circuit and the speaker (for example, 2-5 meters), so the asymptote equation of the hyperbola can be used to solve the problem.
  • the ratio of the distance D from the sound source to the video collection circuit to the distance d from the first audio collection circuit to the second audio collection circuit is greater than a preset distance threshold. If the value of D/d is greater than the preset distance threshold, the distance between the video capture circuit and the speaker is sufficiently large relative to the distance between F1 and F2, and the hyperbolic model is applicable.
  • the preset distance threshold is 5.
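The applicability condition can be written as a one-line check. The default threshold of 5 comes from the text; the function name is hypothetical.

```python
def hyperbolic_model_applies(source_distance_m, pickup_spacing_m, threshold=5.0):
    """True when D/d exceeds the preset distance threshold."""
    if pickup_spacing_m <= 0:
        raise ValueError("pickup spacing must be positive")
    return source_distance_m / pickup_spacing_m > threshold
```

With pickups 0.2 m apart, a speaker 2 m away gives D/d = 10 and satisfies the condition, while a speaker 0.5 m away gives D/d = 2.5 and does not.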
  • for the first distance difference $a_1$, the corresponding hyperbolic equation is shown in the following formula (12).
  • c is the distance between F1 and F2, and the distance parameter b satisfies the following formula (13).
  • the first deviation angle of the sound source is obtained according to the slope of the asymptote.
  • the first offset angle ⁇ 1 is calculated using the following formula (15).
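The angle computation can be sketched with the standard hyperbola relations, since formulas (12) to (15) are not reproduced in this text. The sketch assumes foci at the two pickups, a semi-axis equal to half the distance difference, a half focal distance equal to half the pickup spacing, and a distance parameter b with b² = c² − a². It returns the angle between the asymptote and the camera's forward axis (perpendicular to the pickup baseline); whether the source measures the angle from that axis or from the baseline, and the sign convention, are assumptions.

```python
import math


def offset_angle_deg(distance_diff_m, pickup_spacing_m):
    """Far-field bearing of the source from the forward axis, in degrees.

    The sign follows distance_diff_m (assumed convention).
    """
    a = abs(distance_diff_m) / 2.0   # hyperbola semi-axis
    c = pickup_spacing_m / 2.0       # half the distance between the foci
    if a > c:
        raise ValueError("distance difference exceeds pickup spacing")
    b = math.sqrt(c * c - a * a)     # distance parameter
    # The asymptote has slope b/a relative to the pickup baseline.
    asymptote_deg = math.degrees(math.atan2(b, a))
    off_axis_deg = 90.0 - asymptote_deg
    return math.copysign(off_axis_deg, distance_diff_m)
```

Because only the ratio b/a enters the result, using the full distance difference and full pickup spacing in place of the halves gives the same angle. A zero distance difference yields 0 degrees (source straight ahead), and a distance difference equal to half the pickup spacing yields 30 degrees.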
  • in step 105, the video capture direction of the video capture circuit is adjusted according to the first offset angle, so that the video capture circuit is aligned with the sound source.
  • the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on a first straight line, and the first straight line is a horizontal direction.
  • the sound source tracking control device uses the first offset angle to control the deflection angle of the video capture circuit in the left and right directions. Therefore, the sound source tracking can be realized on the horizontal plane.
  • the offset angle of the sound source is determined using the difference between the distances from the sound source to the first audio collection circuit and to the second audio collection circuit.
  • the direction of the video acquisition circuit is adjusted according to the determined offset angle, so as to be able to aim at the sound source for shooting, so as to easily and quickly realize the sound source tracking.
  • Fig. 4 is a schematic flowchart of a sound source tracking control method according to another embodiment of the present disclosure. In some embodiments, the following steps of the sound source tracking control method are executed by the sound source tracking control device.
  • in step 401, the first audio segment is extracted from the first audio information collected by the first audio collection circuit and, synchronously, the second audio segment is extracted from the second audio information collected by the second audio collection circuit, the third audio segment is extracted from the third audio information collected by the third audio collection circuit, and the fourth audio segment is extracted from the fourth audio information collected by the fourth audio collection circuit.
  • the first audio collection circuit to the fourth audio collection circuit are pickups.
  • the duration of each of the first to fourth audio segments is 50-100 ms.
  • the first audio collection circuit and the second audio collection circuit are symmetrically arranged on both sides of the video collection circuit.
  • the distance from the video acquisition circuit to the first audio acquisition circuit is the same as the distance from the video acquisition circuit to the second audio acquisition circuit.
  • the third audio collecting circuit and the fourth audio collecting circuit are symmetrically arranged on the other two sides of the video collecting circuit.
  • the distance from the video acquisition circuit to the third audio acquisition circuit is the same as the distance from the video acquisition circuit to the fourth audio acquisition circuit.
  • the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on the first straight line.
  • the third audio collection circuit, the fourth audio collection circuit and the video collection circuit are located on the second straight line.
  • the first straight line is perpendicular to the second straight line.
  • the video capture circuit includes a direction control platform and a camera arranged on the direction control platform.
  • the direction control platform is a PTZ (pan-tilt-zoom) platform.
  • the control parameters are sent to the direction control platform to adjust the direction of the direction control platform, thereby adjusting the video capture direction of the camera.
  • the communication protocol used is the UART protocol.
  • the first straight line is a horizontal direction.
  • the first audio collecting circuit and the second audio collecting circuit are respectively arranged on the left and right sides of the video collecting circuit.
  • the second straight line is the vertical direction.
  • the third audio collecting circuit and the fourth audio collecting circuit are respectively arranged on the upper and lower sides of the video collecting circuit. Analog-to-digital conversion is performed on the audio signals collected by the first, second, third, and fourth audio collection circuits to generate the first, second, third, and fourth audio information, respectively.
  • in step 402, the first time offset between the first audio segment and the second audio segment is determined according to the deviation between the preset peak values in the first audio segment and the second audio segment, and the second time offset between the third audio segment and the fourth audio segment is determined according to the deviation between the preset peak values in the third audio segment and the fourth audio segment.
  • the time offset calculation method described in any of the embodiments in FIG. 2 is used to calculate the first time offset, and the time offset calculation method described in any of the embodiments in FIG. 5 below is used to calculate the second time offset.
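With two perpendicular pickup pairs, the two time offsets can be turned into a horizontal and a vertical steering angle. The sketch below is an assumption-laden illustration: it applies the far-field relation sin(angle) = distance difference / pickup spacing to each axis independently (equivalent to taking the complement of the hyperbola asymptote angle), assumes a sound speed of 343 m/s, and uses hypothetical names throughout.

```python
import math


def axis_angle_deg(distance_diff_m, spacing_m):
    """Far-field angle from the forward axis for one pickup pair."""
    ratio = distance_diff_m / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))


def pan_tilt_deg(t_lr_s, t_ud_s, spacing_lr_m, spacing_ud_m, speed=343.0):
    """Horizontal and vertical offset angles from the two time offsets.

    t_lr_s: offset between the left/right pair (first and second pickups).
    t_ud_s: offset between the upper/lower pair (third and fourth pickups).
    """
    pan = axis_angle_deg(speed * t_lr_s, spacing_lr_m)
    tilt = axis_angle_deg(speed * t_ud_s, spacing_ud_m)
    return pan, tilt
```

The two angles would then be sent to the direction control platform as separate left/right and up/down adjustments.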
  • Fig. 5 is a schematic flowchart of a method for calculating a time offset according to another embodiment of the present disclosure. In some embodiments, the following steps of the time offset calculation method are executed by the sound source tracking control device.
  • in step 501, the largest positive peak sample number and the smallest negative peak sample number in the third audio segment, and the largest positive peak sample number and the smallest negative peak sample number in the fourth audio segment, are identified.
  • the third audio segment and the fourth audio segment respectively include multiple sample values.
  • the third audio segment and the fourth audio segment after identifying the third audio segment and the fourth audio segment, it can also be detected whether the third audio segment and the fourth audio segment correspond.
  • the maximum positive peak sample number is U max
  • the minimum negative positive peak sample number is U min
  • the largest positive peak sample number is D max
  • the smallest negative positive peak sample number is D min .
  • ⁇ 2 is the preset threshold.
  • As another example, the total number of positive peaks in the third audio segment is $U_{Ptotal}$, the total number of negative peaks in the third audio segment is $U_{ntotal}$, the total number of positive peaks in the fourth audio segment is $D_{Ptotal}$, and the total number of negative peaks in the fourth audio segment is $D_{ntotal}$. If formulas (18) and (19) hold, the total numbers of positive peaks and negative peaks in the third audio segment and the fourth audio segment are within a reasonable range.
  • $\rho_3$ and $\rho_4$ are preset thresholds. $\rho_3$ and $\rho_4$ may be the same or different.
  • If the positions of the maximum positive peak and the minimum negative peak in the third audio segment and the fourth audio segment correspond, and the total numbers of positive peaks and negative peaks in the third audio segment and the fourth audio segment are within a reasonable range, the calculation accuracy of the time offset can be ensured. If the positions of the maximum positive peaks and the minimum negative peaks in the third audio segment and the fourth audio segment do not correspond, or the total numbers of positive peaks and negative peaks are not within a reasonable range, the third audio segment and the fourth audio segment have been disturbed by external interference.
  • In this case, the first to fourth audio segments need to be synchronously re-extracted from the first to fourth audio information collected by the first to fourth audio collection circuits, respectively.
  • In step 502, the effective positive peaks and the effective negative peaks in the third audio segment and the fourth audio segment are obtained.
  • In some embodiments, the corresponding effective positive peaks are selected in the third audio segment and the fourth audio segment according to the difference between the maximum positive peak sample number in the third audio segment and the maximum positive peak sample number in the fourth audio segment. The corresponding effective negative peaks are selected in the third audio segment and the fourth audio segment according to the difference between the minimum negative peak sample number in the third audio segment and the minimum negative peak sample number in the fourth audio segment.
  • For example, $U_i$ is the sample number of the i-th effective positive peak $DU_i$ in the third audio segment, and $D_j$ is the sample number of the j-th effective positive peak $DD_j$ in the fourth audio segment. If the maximum positive peak sample number in the third audio segment is $U_{\max}$ and the maximum positive peak sample number in the fourth audio segment is $D_{\max}$, formula (20), $|(U_i-D_j)-(U_{\max}-D_{\max})| \le \sigma_3$, holds when $DU_i$ corresponds to $DD_j$.
  • As another example, $U_i$ is the sample number of the i-th effective negative peak $DU_i$ in the third audio segment, and $D_j$ is the sample number of the j-th effective negative peak $DD_j$ in the fourth audio segment. If the minimum negative peak sample number in the third audio segment is $U_{\min}$ and the minimum negative peak sample number in the fourth audio segment is $D_{\min}$, formula (21), $|(U_i-D_j)-(U_{\min}-D_{\min})| \le \sigma_4$, holds when $DU_i$ corresponds to $DD_j$.
  • $\sigma_3$ and $\sigma_4$ are preset thresholds. $\sigma_3$ and $\sigma_4$ may be the same or different.
  • In step 503, the second sampling clock deviation between the third audio segment and the fourth audio segment is determined according to the sample number deviations of the corresponding effective positive peaks in the third audio segment and the fourth audio segment, and the sample number deviations of the corresponding effective negative peaks in the third audio segment and the fourth audio segment.
  • In some embodiments, for the sample number deviations of the corresponding effective positive peaks and of the corresponding effective negative peaks in the third audio segment and the fourth audio segment, the second sampling clock deviation between the third audio segment and the fourth audio segment can be determined by calculating the arithmetic mean, the geometric mean, or the standard deviation.
  • For example, the sample number deviation between the i-th effective peak in the third audio segment and the corresponding j-th effective peak in the fourth audio segment is $\Delta_i$.
  • The standard deviation M2 of the sample number deviations is calculated using the following formula (22) and is used as the second sampling clock deviation between the third audio segment and the fourth audio segment.
  • In step 504, the second time offset is determined according to the second sampling clock deviation and the sampling conversion frequency.
  • In some embodiments, after the effective positive peaks and the effective negative peaks in the third audio segment and the fourth audio segment are obtained, it can be further determined whether the number of effective positive peaks $M_{Valid}$ and the number of effective negative peaks $N_{Valid}$ in the third audio segment or the fourth audio segment satisfy the following formula (24).
  • D3 is a preset threshold. If formula (24) holds, there are too few effective peaks in the third audio segment and the fourth audio segment. This is usually because the current scene is silent. In this case, the video collection circuit is controlled to perform panoramic shooting.
  • D4 is a preset threshold, $U_{Ptotal}$ is the total number of positive peaks in the third audio segment, $U_{ntotal}$ is the total number of negative peaks in the third audio segment, $D_{Ptotal}$ is the total number of positive peaks in the fourth audio segment, and $D_{ntotal}$ is the total number of negative peaks in the fourth audio segment.
  • In step 403, the first distance difference between the first distance from the sound source to the first audio collection circuit and the second distance from the sound source to the second audio collection circuit is determined according to the first time offset, and the second distance difference between the third distance from the sound source to the third audio collection circuit and the fourth distance from the sound source to the fourth audio collection circuit is determined according to the second time offset.
  • the first distance difference a1 is calculated using the above formula (11).
  • formula (26) is used to calculate the second distance difference a2.
  • the propagation speed v of sound in the air is 340 m/s.
  • In step 404, the first offset angle of the sound source is determined according to the first distance difference, and the second offset angle of the sound source is determined according to the second distance difference.
  • The first offset angle $\theta_1$ is calculated using the above formula (15).
  • The corresponding hyperbolic equation is shown in the following formula (27).
  • c is the distance between the third audio collection circuit and the fourth audio collection circuit, and the distance parameter b satisfies the following formula (28).
  • The second offset angle of the sound source is obtained from the slope of the asymptote.
  • The second offset angle $\theta_2$ is calculated using the following formula (30).
  • In step 405, the video collection direction of the video collection circuit is adjusted according to the first offset angle and the second offset angle, so that the video collection circuit is aimed at the sound source.
  • The first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on a first straight line, which is horizontal.
  • The third audio collection circuit, the fourth audio collection circuit, and the video collection circuit are located on a second straight line, which is vertical.
  • the first offset angle can be used to control the deflection angle of the video capture circuit in the left and right directions
  • the second offset angle can be used to control the deflection angle of the video capture circuit in the up and down directions. Therefore, sound source tracking can be achieved in three-dimensional space.
  • Fig. 6 is a schematic structural diagram of a sound source tracking control device according to an embodiment of the present disclosure.
  • the sound source tracking control device includes an extraction module 61, a time offset determination module 62, a distance difference determination module 63, an offset angle determination module 64 and a direction adjustment module 65.
  • the extraction module 61 extracts the first audio segment from the first audio information collected by the first audio collection circuit, and synchronously extracts the second audio segment from the second audio information collected by the second audio collection circuit.
  • the time offset determination module 62 determines the first time offset of the first audio segment and the second audio segment according to the deviation between the preset peak values in the first audio segment and the second audio segment.
  • the time offset determination module 62 calculates the first time offset between the first audio segment and the second audio segment using the process shown in FIG. 2 described above.
  • the distance difference determining module 63 determines the first distance difference between the first distance between the sound source and the first audio collection circuit and the second distance between the sound source and the second audio collection circuit according to the first time offset.
  • the offset angle determination module 64 determines the first offset angle of the sound source according to the first distance difference.
  • the offset angle determination module 64 uses the above formula (15) to calculate the first offset angle of the sound source.
  • the direction adjustment module 65 adjusts the video capture direction of the video capture circuit according to the first offset angle, so that the video capture circuit is aligned with the sound source.
  • the first audio collection circuit and the second audio collection circuit are symmetrically arranged on both sides of the video collection circuit.
  • the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on the first straight line. If the first straight line is the horizontal direction, the sound source tracking control device uses the first offset angle to control the deflection angle of the video capture circuit in the left and right directions. Therefore, sound source tracking can be achieved on the horizontal plane.
  • the extraction module 61 extracts the first audio segment from the first audio information collected by the first audio collection circuit, and synchronously extracts the second audio segment from the second audio information collected by the second audio collection circuit, The third audio segment is synchronously extracted from the third audio information collected by the third audio collecting circuit, and the fourth audio segment is synchronously extracted from the fourth audio information collected by the fourth audio collecting circuit.
  • The time offset determination module 62 determines the first time offset between the first audio segment and the second audio segment according to the deviation between the preset peak values in the first audio segment and the second audio segment. The time offset determination module 62 also determines the second time offset between the third audio segment and the fourth audio segment according to the deviation between the preset peak values in the third audio segment and the fourth audio segment.
  • the time offset determination module 62 calculates the first time offset between the first audio segment and the second audio segment using the process shown in FIG. 2 described above.
  • the time offset determination module 62 calculates the second time offset of the third audio segment and the fourth audio segment by using the process shown in FIG. 5 above.
  • the distance difference determining module 63 determines the first distance difference between the first distance between the sound source and the first audio collection circuit and the second distance between the sound source and the second audio collection circuit according to the first time offset. The distance difference determining module 63 also determines the second distance difference between the third distance of the sound source from the third audio collecting circuit and the fourth distance of the sound source from the fourth audio collecting circuit according to the second time offset.
  • the offset angle determination module 64 determines the first offset angle of the sound source according to the first distance difference. The offset angle determination module 64 also determines the second offset angle of the sound source according to the second distance difference.
  • the offset angle determination module 64 uses the above formula (15) to calculate the first offset angle of the sound source.
  • the offset angle determination module 64 uses the above formula (30) to calculate the second offset angle of the sound source.
  • the direction adjustment module 65 adjusts the video capture direction of the video capture circuit according to the first offset angle and the second offset angle, so that the video capture circuit is aligned with the sound source.
  • the first audio collection circuit and the second audio collection circuit are symmetrically arranged on both sides of the video collection circuit.
  • the third audio collecting circuit and the fourth audio collecting circuit are symmetrically arranged on the other two sides of the video collecting circuit.
  • the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on the first straight line.
  • the third audio collection circuit, the fourth audio collection circuit and the video collection circuit are located on the second straight line.
  • the first straight line is perpendicular to the second straight line.
  • The sound source tracking control device uses the first offset angle to control the deflection angle of the video collection circuit in the left-right direction, and the second offset angle to control the deflection angle of the video collection circuit in the up-down direction. Sound source tracking can therefore be realized in three-dimensional space.
  • Fig. 7 is a schematic structural diagram of a sound source tracking control device according to an embodiment of the present disclosure. As shown in FIG. 7, the sound source tracking control device includes a memory 701 and a processor 702.
  • the memory 701 is used to store instructions, and the processor 702 is coupled to the memory 701, and the processor 702 is configured to execute the method involved in any one of the embodiments in FIG. 1, FIG. 2, FIG. 4, and FIG. 5 based on the execution of instructions stored in the memory.
  • the sound source tracking control device further includes a communication interface 703, which is used to exchange information with other devices.
  • The sound source tracking control device also includes a bus 704. The processor 702, the communication interface 703, and the memory 701 communicate with each other through the bus 704.
  • The memory 701 may include a high-speed RAM, and may also include a non-volatile memory, for example, at least one disk memory.
  • the memory 701 may also be a memory array.
  • the memory 701 may also be divided into blocks, and the blocks may be combined into a virtual volume according to certain rules.
  • The processor 702 may be a central processing unit (CPU) or an application specific integrated circuit (ASIC), or may be configured as one or more integrated circuits for implementing the embodiments of the present disclosure.
  • The present disclosure also relates to a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method involved in any one of the embodiments shown in FIG. 1, FIG. 2, FIG. 4, and FIG. 5.
  • Fig. 8 is a schematic structural diagram of a sound source tracking system according to an embodiment of the present disclosure.
  • the sound source tracking system includes a first audio acquisition circuit 811, a second audio acquisition circuit 812, a sound source tracking control device 82 and a video acquisition circuit 83.
  • the sound source tracking control device 82 is a sound source tracking control device related to any one of the embodiments in FIG. 6 or FIG. 7.
  • The first audio collection circuit 811 and the second audio collection circuit 812 are symmetrically arranged on both sides of the video collection circuit 83.
  • the distance from the video acquisition circuit to the first audio acquisition circuit is the same as the distance from the video acquisition circuit to the second audio acquisition circuit.
  • the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on the first straight line.
  • the first audio collection circuit 811 and the second audio collection circuit 812 are microphones.
  • the first straight line is a horizontal direction.
  • the sound source tracking control device 82 uses the calculated first offset angle to control the left and right deflection angle of the video capture circuit 83, so as to realize the sound source tracking on the horizontal plane.
  • Fig. 9 is a schematic structural diagram of a sound source tracking system according to another embodiment of the present disclosure.
  • the video capture circuit 83 includes a direction control platform 831 and a camera 832 provided on the direction control platform 831.
  • the direction control platform 831 is a pan-tilt.
  • the sound source tracking control device 82 sends control parameters to the direction control platform 831 by using the communication protocol supported by the direction control platform 831 to adjust the direction of the direction control platform 831 to adjust the video capture direction of the camera 832.
  • For example, the communication protocol used is the UART protocol.
  • the sound source tracking system further includes an analog-to-digital converter 84.
  • the analog-to-digital converter 84 performs analog-to-digital conversion on the audio signal collected by the first audio collection circuit 811 to generate first audio information.
  • the analog-to-digital converter 84 performs analog-to-digital conversion on the audio signal collected by the second audio collection circuit 812 to generate second audio information.
  • The analog-to-digital converter 84 is provided with multiple independent conversion modules. A first conversion module in the analog-to-digital converter 84 can therefore perform analog-to-digital conversion on the audio signal collected by the first audio collection circuit 811 to generate the first audio information, and a second conversion module in the analog-to-digital converter 84 can perform analog-to-digital conversion on the audio signal collected by the second audio collection circuit 812 to generate the second audio information.
  • The analog-to-digital converter 84 is a pipelined analog-to-digital converter, a successive approximation register (SAR) analog-to-digital converter, or a sigma-delta analog-to-digital converter.
  • Fig. 10 is a schematic structural diagram of a sound source tracking system according to another embodiment of the present disclosure. The difference between FIG. 10 and FIG. 9 is that in the embodiment shown in FIG. 10, the sound source tracking system further includes a third audio collection circuit 813 and a fourth audio collection circuit 814.
  • the third audio collection circuit 813 and the fourth audio collection circuit 814 are symmetrically arranged on the other two sides of the video collection circuit 83.
  • the distance from the video capture circuit 83 to the third audio capture circuit 813 is the same as the distance from the video capture circuit 83 to the fourth audio capture circuit 814.
  • the first audio collection circuit 811, the second audio collection circuit 812, and the video collection circuit 83 are located on the first straight line.
  • the third audio collection circuit 813, the fourth audio collection circuit 814, and the video collection circuit 83 are located on the second straight line.
  • the first straight line is perpendicular to the second straight line.
  • the first straight line is a horizontal direction
  • the second straight line is a vertical direction.
  • the sound source tracking control device 82 uses the first offset angle to control the deflection angle of the video capture circuit 83 in the left-right direction.
  • the sound source tracking control device 82 uses the second offset angle to control the deflection angle of the video capture circuit 83 in the vertical direction. This enables sound source tracking in a three-dimensional space.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Studio Devices (AREA)

Abstract

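A sound source tracking control method and control device (82), and a sound source tracking system. The sound source tracking control device (82) extracts a first audio segment from first audio information collected by a first audio collection circuit (F1, 811), and synchronously extracts a second audio segment from second audio information collected by a second audio collection circuit (F2, 812) (101); determines a first time offset between the first audio segment and the second audio segment according to the deviation between preset peaks in the first audio segment and the second audio segment (102); determines, according to the first time offset, a first distance difference between a first distance from the sound source to the first audio collection circuit (F1, 811) and a second distance from the sound source to the second audio collection circuit (F2, 812) (103); determines a first offset angle of the sound source according to the first distance difference (104); and adjusts the video collection direction of a video collection circuit (83) according to the first offset angle, so that the video collection circuit (83) is aimed at the sound source (105).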

Description

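Sound Source Tracking Control Method and Control Device, and Sound Source Tracking System
Technical Field
The present disclosure relates to the field of information processing, and in particular to a sound source tracking control method and control device, and a sound source tracking system.
Background
In the related art of sound source tracking, the first solution tracks sound sources at fixed positions. A person turns the microphone on when speaking and off when not speaking. Sound source tracking is achieved by monitoring the on/off state of the microphone and controlling the camera to aim at the speaker. The second solution combines speech recognition with face recognition. Speech is analyzed to identify audio features, the face image information of the speaker is looked up in a database according to the audio features, the speaker is then identified in the current scene using that face image information, and the camera is controlled to aim at the speaker, thereby achieving sound source tracking.
Summary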
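According to a first aspect of the embodiments of the present disclosure, a sound source tracking control method is provided, including: extracting a first audio segment from first audio information collected by a first audio collection circuit, and synchronously extracting a second audio segment from second audio information collected by a second audio collection circuit; determining a first time offset between the first audio segment and the second audio segment according to the deviation between preset peaks in the first audio segment and the second audio segment; determining, according to the first time offset, a first distance difference between a first distance from a sound source to the first audio collection circuit and a second distance from the sound source to the second audio collection circuit; determining a first offset angle of the sound source according to the first distance difference; and adjusting the video collection direction of a video collection circuit according to the first offset angle, so that the video collection circuit is aimed at the sound source.
In some embodiments, determining the first offset angle of the sound source according to the first distance difference includes: determining a first distance parameter using the first distance difference and the distance between the first audio collection circuit and the second audio collection circuit; and determining the first offset angle of the sound source according to the ratio of the first distance parameter to the first distance difference.
In some embodiments, determining the first time offset between the first audio segment and the second audio segment according to the deviation between the preset peaks in the first audio segment and the second audio segment includes: selecting corresponding effective positive peaks in the first audio segment and the second audio segment according to a first difference between the maximum positive peak sample number in the first audio segment and the maximum positive peak sample number in the second audio segment, where the first audio segment and the second audio segment each include a plurality of sample values; selecting corresponding effective negative peaks in the first audio segment and the second audio segment according to a second difference between the minimum negative peak sample number in the first audio segment and the minimum negative peak sample number in the second audio segment; determining a first sampling clock deviation between the first audio segment and the second audio segment according to the sample number deviations of the corresponding effective positive peaks and the sample number deviations of the corresponding effective negative peaks in the first audio segment and the second audio segment; and determining the first time offset according to the first sampling clock deviation and the sampling conversion frequency.
In some embodiments, the difference between an effective positive peak sample number in the first audio segment and the corresponding effective positive peak sample number in the second audio segment differs from the first difference by an amount within a first preset range; and the difference between an effective negative peak sample number in the first audio segment and the corresponding effective negative peak sample number in the second audio segment differs from the second difference by an amount within a second preset range.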
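In some embodiments, the method further includes: determining whether a first sum of the number of effective positive peaks and the number of effective negative peaks in the first audio segment or the second audio segment is less than a first preset threshold; and if the first sum is less than the first preset threshold, controlling the video collection circuit to perform panoramic shooting.
In some embodiments, the method further includes: if the first sum is not less than the first preset threshold, determining whether the number of effective positive peaks and the number of effective negative peaks in the first audio segment or the second audio segment are the same; if they are the same, further calculating a second sum of the total number of positive peaks and the total number of negative peaks in the first audio segment or the second audio segment; and in response to the ratio of the first sum to the second sum being greater than a second preset threshold, controlling the video collection circuit to perform panoramic shooting.
In some embodiments, the method further includes: calculating a third difference between the maximum positive peak sample number and the minimum negative peak sample number in the first audio segment; calculating a fourth difference between the maximum positive peak sample number and the minimum negative peak sample number in the second audio segment; and in response to the third difference and the fourth difference having the same sign and the difference between the third difference and the fourth difference being within a third preset range, selecting the corresponding effective positive peaks in the first audio segment and the second audio segment.
In some embodiments, the method further includes: calculating a fifth difference between the total number of positive peaks in the first audio segment and the total number of positive peaks in the second audio segment, and a third sum of the total number of positive peaks in the first audio segment and the total number of positive peaks in the second audio segment; calculating a sixth difference between the total number of negative peaks in the first audio segment and the total number of negative peaks in the second audio segment, and a fourth sum of the total number of negative peaks in the first audio segment and the total number of negative peaks in the second audio segment; and in response to the ratio of the fifth difference to the third sum being within a fourth predetermined range and the ratio of the sixth difference to the fourth sum being within a fifth predetermined range, selecting the corresponding effective positive peaks in the first audio segment and the second audio segment.
In some embodiments, the method further includes: synchronously extracting a third audio segment from third audio information collected by a third audio collection circuit, and a fourth audio segment from fourth audio information collected by a fourth audio collection circuit; determining a second time offset between the third audio segment and the fourth audio segment according to the deviation between preset peaks in the third audio segment and the fourth audio segment; determining, according to the second time offset, a second distance difference between a third distance from the sound source to the third audio collection circuit and a fourth distance from the sound source to the fourth audio collection circuit; determining a second offset angle of the sound source according to the second distance difference; and adjusting the video collection direction of the video collection circuit according to the first offset angle and the second offset angle, so that the video collection circuit is aimed at the sound source.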
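According to a second aspect of the embodiments of the present disclosure, a sound source tracking control device is provided, including: an extraction module configured to extract a first audio segment from first audio information collected by a first audio collection circuit, and to synchronously extract a second audio segment from second audio information collected by a second audio collection circuit; a time offset determination module configured to determine a first time offset between the first audio segment and the second audio segment according to the deviation between preset peaks in the first audio segment and the second audio segment; a distance difference determination module configured to determine, according to the first time offset, a first distance difference between a first distance from a sound source to the first audio collection circuit and a second distance from the sound source to the second audio collection circuit; an offset angle determination module configured to determine a first offset angle of the sound source according to the first distance difference; and a direction adjustment module configured to adjust the video collection direction of a video collection circuit according to the first offset angle, so that the video collection circuit is aimed at the sound source.
According to a third aspect of the embodiments of the present disclosure, a sound source tracking control device is provided, including: a memory configured to store instructions; and a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the method according to any of the above embodiments.
According to a fourth aspect of the embodiments of the present disclosure, a sound source tracking system is provided, including the sound source tracking control device according to any of the above embodiments, and: a video collection circuit configured to adjust the video collection direction under the control of the sound source tracking control device; and a first audio collection circuit and a second audio collection circuit, where the first audio collection circuit and the second audio collection circuit are symmetrically arranged on both sides of the video collection circuit.
In some embodiments, the ratio of the distance from the sound source to the video collection circuit to the distance from the first audio collection circuit to the second audio collection circuit is greater than a preset distance threshold.
In some embodiments, the tracking system further includes an analog-to-digital converter configured to perform analog-to-digital conversion on the audio signal collected by the first audio collection circuit to generate the first audio information, and on the audio signal collected by the second audio collection circuit to generate the second audio information; the video collection circuit includes a direction control platform and a camera arranged on the direction control platform, the direction control platform being configured to adjust its direction under the control of the sound source tracking control device.
According to a fifth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, where the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method according to any of the above embodiments.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the drawings.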
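Brief Description of the Drawings
The drawings, which constitute a part of the specification, describe embodiments of the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the drawings, in which:
Fig. 1 is a schematic flowchart of a sound source tracking control method according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a time offset calculation method according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a hyperbolic model according to an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of a sound source tracking control method according to another embodiment of the present disclosure;
Fig. 5 is a schematic flowchart of a time offset calculation method according to another embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of a sound source tracking control device according to an embodiment of the present disclosure;
Fig. 7 is a schematic structural diagram of a sound source tracking control device according to an embodiment of the present disclosure;
Fig. 8 is a schematic structural diagram of a sound source tracking system according to an embodiment of the present disclosure;
Fig. 9 is a schematic structural diagram of a sound source tracking system according to another embodiment of the present disclosure;
Fig. 10 is a schematic structural diagram of a sound source tracking system according to yet another embodiment of the present disclosure.
It should be understood that the dimensions of the parts shown in the drawings are not drawn to actual scale. In addition, the same or similar reference numerals denote the same or similar components.
Detailed Description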
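Various exemplary embodiments of the present disclosure will now be described in detail with reference to the drawings. The description of the exemplary embodiments is merely illustrative and in no way limits the present disclosure or its application or use. The present disclosure can be implemented in many different forms and is not limited to the embodiments described here. These embodiments are provided so that the present disclosure is thorough and complete and fully conveys its scope to those skilled in the art. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the composition of materials, and the numerical values set forth in these embodiments should be interpreted as merely exemplary and not as limiting.
Words such as "first" and "second" used in the present disclosure do not denote any order, quantity, or importance, but are only used to distinguish different parts. Words such as "include" or "comprise" mean that the element preceding the word covers the elements listed after the word, and do not exclude the possibility of also covering other elements.
All terms (including technical and scientific terms) used in the present disclosure have the same meaning as understood by a person of ordinary skill in the art to which the present disclosure belongs, unless specifically defined otherwise. It should also be understood that terms defined in, for example, general-purpose dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless explicitly so defined here.
Techniques, methods, and equipment known to a person of ordinary skill in the relevant field may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
The inventors found through research that, in the first related technique above, a person must turn the microphone on when speaking and off when not speaking, so operation is cumbersome. In the second related technique above, speech recognition and face recognition are required, so the computational cost is high, and the recognition rates of speech recognition and face recognition also affect the accuracy of sound source tracking.
Accordingly, the present disclosure proposes a solution that achieves sound source tracking conveniently and quickly.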
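Fig. 1 is a schematic flowchart of a sound source tracking control method according to an embodiment of the present disclosure. In some embodiments, the following steps of the sound source tracking control method are executed by a sound source tracking control device.
In step 101, a first audio segment is extracted from the first audio information collected by the first audio collection circuit, and a second audio segment is synchronously extracted from the second audio information collected by the second audio collection circuit.
In some embodiments, the first audio collection circuit and the second audio collection circuit are microphones. The duration of the first audio segment and the second audio segment is 50-100 ms.
In some embodiments, the first audio collection circuit and the second audio collection circuit are symmetrically arranged on both sides of the video collection circuit. The distance from the video collection circuit to the first audio collection circuit is the same as the distance from the video collection circuit to the second audio collection circuit. For example, the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on a first straight line.
In some embodiments, the video collection circuit includes a direction control platform and a camera arranged on the direction control platform. For example, the direction control platform is a pan-tilt head. Control parameters are sent to the direction control platform using a communication protocol supported by the direction control platform, so as to adjust the direction of the platform and thereby the video collection direction of the camera. For example, the communication protocol used is the UART (Universal Asynchronous Receiver/Transmitter) protocol.
In some embodiments, the first straight line is horizontal. The first audio collection circuit and the second audio collection circuit are arranged on the left and right sides of the video collection circuit, respectively. Analog-to-digital conversion is performed on the audio signal collected by the first audio collection circuit to generate the first audio information, and on the audio signal collected by the second audio collection circuit to generate the second audio information.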
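In step 102, the first time offset between the first audio segment and the second audio segment is determined according to the deviation between the preset peaks in the first audio segment and the second audio segment.
Fig. 2 is a schematic flowchart of a time offset calculation method according to an embodiment of the present disclosure. In some embodiments, the following steps of the time offset calculation method are executed by the sound source tracking control device.
In step 201, the maximum positive peak sample number and the minimum negative peak sample number in the first audio segment, and the maximum positive peak sample number and the minimum negative peak sample number in the second audio segment, are identified.
It should be noted that the first audio segment and the second audio segment each include multiple sample values.
For example, in the first audio segment, for three consecutive audio data Data(i), Data(i+1), and Data(i+2), if Data(i+1) > Data(i), Data(i+1) > Data(i+2), and |Data(i+1)| > Th, then Data(i+1) is a positive peak. Th is a preset threshold.
As another example, in the first audio segment, for three consecutive audio data Data(i), Data(i+1), and Data(i+2), if Data(i+1) < Data(i), Data(i+1) < Data(i+2), and |Data(i+1)| > Th, then Data(i+1) is a negative peak.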
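The peak definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `find_peaks` and `extreme_peak_indices` are hypothetical, samples are held in a plain list, and the sample number is taken to be the zero-based list index.

```python
from typing import List, Sequence, Tuple


def find_peaks(data: Sequence[float], th: float) -> Tuple[List[int], List[int]]:
    """Return (positive_peak_indices, negative_peak_indices) of one audio segment.

    A sample Data(i+1) is a positive peak when it exceeds both neighbours and
    |Data(i+1)| > th; it is a negative peak when it is below both neighbours
    and |Data(i+1)| > th, as in the text above.
    """
    pos: List[int] = []
    neg: List[int] = []
    for i in range(len(data) - 2):
        prev, mid, nxt = data[i], data[i + 1], data[i + 2]
        if abs(mid) <= th:
            continue  # below the preset threshold Th: not counted as a peak
        if mid > prev and mid > nxt:
            pos.append(i + 1)
        elif mid < prev and mid < nxt:
            neg.append(i + 1)
    return pos, neg


def extreme_peak_indices(data: Sequence[float], pos: List[int], neg: List[int]) -> Tuple[int, int]:
    """Sample numbers of the maximum positive peak and the minimum negative peak."""
    idx_max = max(pos, key=lambda i: data[i])
    idx_min = min(neg, key=lambda i: data[i])
    return idx_max, idx_min
```

Running `find_peaks` on each segment yields the peak lists from which step 201 picks the maximum positive peak and the minimum negative peak.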
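In some embodiments, after the first audio segment and the second audio segment have been identified, it can also be detected whether the first audio segment and the second audio segment correspond.
For example, in the first audio segment, the maximum positive peak sample number is $L_{\max}$ and the minimum negative peak sample number is $L_{\min}$. In the second audio segment, the maximum positive peak sample number is $R_{\max}$ and the minimum negative peak sample number is $R_{\min}$. If the following formulas (1) and (2) hold, that is:
$(L_{\max}-L_{\min})(R_{\max}-R_{\min}) > 0 \qquad (1)$
$|(L_{\max}-L_{\min})-(R_{\max}-R_{\min})| \le \varepsilon_1 \qquad (2)$
then the positions of the maximum positive peak and the minimum negative peak in the first audio segment and the second audio segment correspond. $\varepsilon_1$ is a preset threshold.
As another example, the total number of positive peaks in the first audio segment is $L_{Ptotal}$, the total number of negative peaks in the first audio segment is $L_{ntotal}$, the total number of positive peaks in the second audio segment is $R_{Ptotal}$, and the total number of negative peaks in the second audio segment is $R_{ntotal}$. If the following formulas (3) and (4) hold, that is: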
Figure PCTCN2020076462-appb-000001
Figure PCTCN2020076462-appb-000002
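then the total numbers of positive peaks and negative peaks in the first audio segment and the second audio segment are within a reasonable range. $\rho_1$ and $\rho_2$ are preset thresholds. $\rho_1$ and $\rho_2$ may be the same or different.
If the positions of the maximum positive peak and the minimum negative peak in the first audio segment and the second audio segment correspond, and the total numbers of positive peaks and negative peaks in the first audio segment and the second audio segment are within a reasonable range, the calculation accuracy of the time offset can be ensured. If the positions of the maximum positive peak and the minimum negative peak in the first audio segment and the second audio segment do not correspond, or the total numbers of positive peaks and negative peaks are not within a reasonable range, the first audio segment and the second audio segment have been disturbed by external interference. In this case, the first audio segment needs to be re-extracted from the first audio information collected by the first audio collection circuit, and the second audio segment needs to be synchronously re-extracted from the second audio information collected by the second audio collection circuit.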
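The correspondence checks can be sketched as follows. The first function follows formulas (1) and (2) directly. Formulas (3) and (4) are only available as images here, so the second function assumes the form suggested by the summary (the ratio of the difference of the totals to their sum lies within a preset range); that form, the use of an absolute value, and the function names are assumptions.

```python
def extremes_correspond(l_max: int, l_min: int, r_max: int, r_min: int, eps1: int) -> bool:
    """Formulas (1) and (2): same ordering of extremes and similar spacing."""
    dl = l_max - l_min
    dr = r_max - r_min
    return dl * dr > 0 and abs(dl - dr) <= eps1


def totals_reasonable(l_total: int, r_total: int, rho: float) -> bool:
    """Assumed form of formulas (3)/(4): |L - R| / (L + R) <= rho."""
    total = l_total + r_total
    if total == 0:
        return False  # no peaks at all: nothing to compare
    return abs(l_total - r_total) / total <= rho
```

A segment pair that fails either check would be discarded and re-extracted, as described above.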
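In step 202, the effective positive peaks and the effective negative peaks in the first audio segment and the second audio segment are obtained.
In some embodiments, the corresponding effective positive peaks are selected in the first audio segment and the second audio segment according to the difference between the maximum positive peak sample number in the first audio segment and the maximum positive peak sample number in the second audio segment. The corresponding effective negative peaks are selected in the first audio segment and the second audio segment according to the difference between the minimum negative peak sample number in the first audio segment and the minimum negative peak sample number in the second audio segment.
For example, $L_i$ is the sample number of the i-th effective positive peak $DL_i$ in the first audio segment, and $R_j$ is the sample number of the j-th effective positive peak $DR_j$ in the second audio segment. If the maximum positive peak sample number in the first audio segment is $L_{\max}$ and the maximum positive peak sample number in the second audio segment is $R_{\max}$, the following formula (5) holds when $DL_i$ corresponds to $DR_j$.
$|(L_i-R_j)-(L_{\max}-R_{\max})| \le \sigma_1 \qquad (5)$
As another example, $L_i$ is the sample number of the i-th effective negative peak $DL_i$ in the first audio segment, and $R_j$ is the sample number of the j-th effective negative peak $DR_j$ in the second audio segment. If the minimum negative peak sample number in the first audio segment is $L_{\min}$ and the minimum negative peak sample number in the second audio segment is $R_{\min}$, the following formula (6) holds when $DL_i$ corresponds to $DR_j$.
$|(L_i-R_j)-(L_{\min}-R_{\min})| \le \sigma_2 \qquad (6)$
In formulas (5) and (6), $\sigma_1$ and $\sigma_2$ are preset thresholds. $\sigma_1$ and $\sigma_2$ may be the same or different.
When formula (5) is used to identify corresponding effective positive peaks in the first audio segment and the second audio segment, if no corresponding positive peak can be found in the second audio segment for a positive peak A in the first audio segment, positive peak A is a spurious peak caused by external interference.
Identifying effective positive peaks and effective negative peaks with formulas (5) and (6) is therefore also a filtering process: it effectively eliminates spurious peaks caused by external interference and so improves the accuracy of the first time offset.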
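The matching and filtering step can be illustrated with the sketch below. It applies the condition of formula (5) or (6) to lists of peak sample numbers; the greedy choice of the closest unused candidate is an assumption made for the sketch, since the text does not say how ties between several candidates are resolved, and `match_valid_peaks` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def match_valid_peaks(left_idx: Sequence[int], right_idx: Sequence[int],
                      ref_offset: int, sigma: int) -> List[Tuple[int, int]]:
    """Pair peaks whose sample numbers satisfy |(L_i - R_j) - ref_offset| <= sigma.

    ref_offset is L_max - R_max for positive peaks (formula (5)) or
    L_min - R_min for negative peaks (formula (6)). Peaks of the first
    segment without a partner are treated as spurious and dropped.
    """
    pairs: List[Tuple[int, int]] = []
    used = set()
    for li in left_idx:
        best = None  # (position in right_idx, error) of the best candidate so far
        for j, rj in enumerate(right_idx):
            if j in used:
                continue
            err = abs((li - rj) - ref_offset)
            if err <= sigma and (best is None or err < best[1]):
                best = (j, err)
        if best is not None:
            used.add(best[0])
            pairs.append((li, right_idx[best[0]]))
    return pairs
```

With `left_idx = [10, 40, 55, 90]`, `right_idx = [13, 43, 93]`, `ref_offset = -3`, and `sigma = 1`, the peak at sample 55 finds no partner and is filtered out.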
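In step 203, the first sampling clock deviation between the first audio segment and the second audio segment is determined according to the sample number deviations of the corresponding effective positive peaks in the first audio segment and the second audio segment, and the sample number deviations of the corresponding effective negative peaks in the first audio segment and the second audio segment.
It should be noted that a sample number deviation represents the number of sampling clocks between the corresponding positive peaks or the corresponding negative peaks. The first sampling clock deviation between the first audio segment and the second audio segment can therefore be determined from the sample number deviations.
In some embodiments, for the sample number deviations of the corresponding effective positive peaks and of the corresponding effective negative peaks in the first audio segment and the second audio segment, the first sampling clock deviation between the first audio segment and the second audio segment can be determined by calculating the arithmetic mean, the geometric mean, or the standard deviation.
For example, in the first audio segment or the second audio segment, there are $M_{Valid}$ effective positive peaks and $N_{Valid}$ effective negative peaks. The sample number deviation between the i-th effective peak in the first audio segment and the corresponding j-th effective peak in the second audio segment is $\Delta_i$. The standard deviation M1 of the sample number deviations is calculated using the following formula (7) and is used as the first sampling clock deviation between the first audio segment and the second audio segment.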
Figure PCTCN2020076462-appb-000003
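In step 204, the first time offset is determined according to the first sampling clock deviation and the sampling conversion frequency.
Let the sampling conversion frequency be $f_{COV}$; the first time offset t1 is then calculated using the following formula (8).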
Figure PCTCN2020076462-appb-000004
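Steps 203 and 204 can be sketched as below. Formulas (7) and (8) are only available as images, so the sketch makes two assumptions: it uses the arithmetic mean, which the text names as one permitted choice, rather than the patent's formula (7), and it takes the time offset to be the clock deviation divided by the sampling conversion frequency. The signed convention (first-segment sample number minus second-segment sample number) and the function names are also assumptions.

```python
from statistics import mean
from typing import List, Tuple

Pair = Tuple[int, int]  # (sample number in first segment, sample number in second segment)


def sampling_clock_deviation(pos_pairs: List[Pair], neg_pairs: List[Pair]) -> float:
    """Arithmetic mean of the sample number deviations of all matched peaks."""
    deviations = [l - r for l, r in pos_pairs + neg_pairs]
    if not deviations:
        raise ValueError("no matched effective peaks")
    return mean(deviations)


def time_offset(clock_deviation: float, f_cov: float) -> float:
    """Assumed form of formula (8): number of sampling clocks / sampling frequency."""
    return clock_deviation / f_cov
```

For instance, a deviation of -3 samples at an assumed sampling conversion frequency of 48 kHz corresponds to a time offset of -62.5 microseconds.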
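In some embodiments, after the effective positive peaks and the effective negative peaks in the first audio segment and the second audio segment are obtained, it can be further determined whether the number of effective positive peaks $M_{Valid}$ and the number of effective negative peaks $N_{Valid}$ in the first audio segment or the second audio segment satisfy the following formula (9).
$M_{Valid} + N_{Valid} < D1 \qquad (9)$
D1 is a preset threshold. If formula (9) holds, there are too few effective peaks in the first audio segment and the second audio segment. This is usually because the current scene is silent. In this case, the video collection circuit is controlled to perform panoramic shooting.
It should be noted that, in panoramic shooting mode, the video collection direction of the video collection circuit is perpendicular to the plane in which the first audio collection circuit and the second audio collection circuit lie. The video collection circuit can thus cover the whole current scene.
In some embodiments, if formula (9) does not hold, it is further determined whether the number of effective positive peaks and the number of effective negative peaks in the first audio segment or the second audio segment are the same. If they are the same and the following formula (10) holds, that is: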
Figure PCTCN2020076462-appb-000005
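then there are too many effective peaks in the first audio segment and the second audio segment. Here D2 is a preset threshold, $L_{Ptotal}$ is the total number of positive peaks in the first audio segment, $L_{ntotal}$ is the total number of negative peaks in the first audio segment, $R_{Ptotal}$ is the total number of positive peaks in the second audio segment, and $R_{ntotal}$ is the total number of negative peaks in the second audio segment. If formula (10) holds, this is usually caused by several people speaking at the same time. In this case, the video collection circuit is controlled to perform panoramic shooting.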
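The two fallback conditions can be combined into one decision, sketched below. The first branch is formula (9). Formula (10) is only available as an image, so the second branch follows the wording of the summary instead (the ratio of the effective-peak sum to the sum of the positive and negative peak totals of one segment exceeds a second threshold); which totals enter the ratio, and the function name, are assumptions.

```python
def should_shoot_panorama(m_valid: int, n_valid: int,
                          p_total: int, n_total: int,
                          d1: int, d2: float) -> bool:
    """Decide whether to fall back to panoramic shooting.

    m_valid, n_valid: numbers of effective positive / negative peaks.
    p_total, n_total: total numbers of positive / negative peaks in one segment.
    """
    first_sum = m_valid + n_valid
    if first_sum < d1:
        return True  # formula (9): too few effective peaks, scene is likely silent
    if m_valid == n_valid:
        second_sum = p_total + n_total
        # assumed form of formula (10): ratio of effective peaks to all peaks
        if second_sum > 0 and first_sum / second_sum > d2:
            return True  # too many effective peaks, likely several speakers
    return False
```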
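Returning to Fig. 1. In step 103, the first distance difference between the first distance from the sound source to the first audio collection circuit and the second distance from the sound source to the second audio collection circuit is determined according to the first time offset.
Since the propagation speed v of sound in air is 340 m/s, the first distance difference a1 is calculated using formula (11).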
Figure PCTCN2020076462-appb-000006
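In step 104, the first offset angle of the sound source is determined according to the first distance difference.
Fig. 3 is a schematic diagram of a hyperbolic model according to an embodiment of the present disclosure.
As shown in Fig. 3, F1 is the first audio collection circuit, F2 is the second audio collection circuit, P is the speaker, and the video collection circuit is at the coordinate origin O. The distance between F1 and F2 (for example, 10-30 cm) is smaller than the distance between the video collection circuit and the speaker (for example, 2-5 m), so the asymptote equation of the hyperbola can be used to solve for the direction.
In some embodiments, the ratio of the distance D from the sound source to the video collection circuit to the distance d from the first audio collection circuit to the second audio collection circuit is greater than a preset distance threshold. If D/d is greater than the preset distance threshold, the distance from the video collection circuit to the speaker is sufficiently large relative to the distance between F1 and F2, and the hyperbolic model is applicable. For example, the preset distance threshold is 5.
As shown in Fig. 3, the first distance difference a1 = |PF1| - |PF2|. If a1 is positive, P lies on the right branch of the hyperbola. If a1 is negative, P lies on the left branch of the hyperbola. The corresponding hyperbolic equation is shown in the following formula (12).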
Figure PCTCN2020076462-appb-000007
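It should be noted that c is the distance between F1 and F2, and the distance parameter b satisfies the following formula (13).
$a_1^2 + b^2 = c^2 \qquad (13)$
The corresponding asymptote equation is shown in the following formula (14).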
Figure PCTCN2020076462-appb-000008
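Thus, the first offset angle of the sound source is obtained from the slope of the asymptote. For example, the first offset angle $\theta_1$ is calculated using the following formula (15).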
Figure PCTCN2020076462-appb-000009
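Steps 103 and 104 can be sketched numerically as follows. Formulas (11) and (15) are only available as images, so this is an assumed reading: the distance difference is taken as a1 = v * t1, the distance parameter b comes from formula (13), and the angle is taken as the inclination of the asymptote with slope b/a1, measured from the line through the two audio collection circuits. The use of `atan2`, the resulting 0-180 degree convention, and the function names are assumptions.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, the value given in the text


def distance_difference(t_offset: float, v: float = SPEED_OF_SOUND) -> float:
    """Assumed form of formula (11): a1 = v * t1."""
    return v * t_offset


def offset_angle(a: float, c: float) -> float:
    """Asymptote inclination in degrees for distance difference a and spacing c.

    b is obtained from formula (13), a^2 + b^2 = c^2. atan2(b, a) yields
    90 degrees when a = 0 (source straight ahead), less than 90 degrees for
    a > 0 (right branch) and more than 90 degrees for a < 0 (left branch).
    """
    if abs(a) > c:
        raise ValueError("distance difference cannot exceed the circuit spacing")
    b = math.sqrt(c * c - a * a)
    return math.degrees(math.atan2(b, a))
```

With an assumed spacing of 0.2 m, a distance difference of 0.1 m gives an inclination of about 60 degrees, and a distance difference of 0 gives 90 degrees.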
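In step 105, the video collection direction of the video collection circuit is adjusted according to the first offset angle, so that the video collection circuit is aimed at the sound source.
In some embodiments, the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on the first straight line, which is horizontal. The sound source tracking control device uses the first offset angle to control the deflection angle of the video collection circuit in the left-right direction. Sound source tracking can therefore be achieved in the horizontal plane.
In the sound source tracking control method provided by the above embodiments of the present disclosure, the offset angle of the sound source is determined from the difference between the distances from the sound source to the first audio collection circuit and to the second audio collection circuit. The direction of the video collection circuit is adjusted according to the determined offset angle so that it can be aimed at the sound source for shooting, which achieves sound source tracking conveniently and quickly.
Fig. 4 is a schematic flowchart of a sound source tracking control method according to another embodiment of the present disclosure. In some embodiments, the following steps of the sound source tracking control method are executed by the sound source tracking control device.
In step 401, a first audio segment is extracted from the first audio information collected by the first audio collection circuit, a second audio segment is synchronously extracted from the second audio information collected by the second audio collection circuit, a third audio segment is synchronously extracted from the third audio information collected by the third audio collection circuit, and a fourth audio segment is synchronously extracted from the fourth audio information collected by the fourth audio collection circuit.
In some embodiments, the first to fourth audio collection circuits are microphones. The duration of the first to fourth audio segments is 50-100 ms.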
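Synchronous extraction in step 401 can be pictured as slicing all channels over the same sample range. The sketch below assumes the four streams of audio information are already digitized at a common sampling rate and aligned sample by sample; the function name, the dictionary layout, and the sampling rate used in any call are assumptions.

```python
from typing import Dict, List, Sequence


def extract_segments(channels: Dict[str, Sequence[float]], start: int,
                     duration_ms: float, sample_rate: int) -> Dict[str, List[float]]:
    """Cut the same sample range out of every channel.

    Using one start index and one length for all channels keeps the segments
    synchronous, so peak sample numbers remain comparable across segments.
    """
    n = int(round(duration_ms * sample_rate / 1000))
    end = start + n
    shortest = min(len(x) for x in channels.values())
    if start < 0 or end > shortest:
        raise ValueError("segment exceeds the available samples")
    return {name: list(x[start:end]) for name, x in channels.items()}
```

At an assumed sampling rate of 8 kHz, a 50 ms segment holds 400 samples per channel.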
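In some embodiments, the first audio collection circuit and the second audio collection circuit are symmetrically arranged on both sides of the video collection circuit. The distance from the video collection circuit to the first audio collection circuit is the same as the distance from the video collection circuit to the second audio collection circuit. The third audio collection circuit and the fourth audio collection circuit are symmetrically arranged on the other two sides of the video collection circuit. The distance from the video collection circuit to the third audio collection circuit is the same as the distance from the video collection circuit to the fourth audio collection circuit. For example, the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on a first straight line. The third audio collection circuit, the fourth audio collection circuit, and the video collection circuit are located on a second straight line. The first straight line is perpendicular to the second straight line.
In some embodiments, the video collection circuit includes a direction control platform and a camera arranged on the direction control platform. For example, the direction control platform is a pan-tilt head. Control parameters are sent to the direction control platform using a communication protocol supported by the direction control platform, so as to adjust the direction of the platform and thereby the video collection direction of the camera. For example, the communication protocol used is the UART protocol.
In some embodiments, the first straight line is horizontal. The first audio collection circuit and the second audio collection circuit are arranged on the left and right sides of the video collection circuit, respectively. The second straight line is vertical. The third audio collection circuit and the fourth audio collection circuit are arranged on the upper and lower sides of the video collection circuit, respectively. Analog-to-digital conversion is performed on the audio signal collected by the first audio collection circuit to generate the first audio information, on the audio signal collected by the second audio collection circuit to generate the second audio information, on the audio signal collected by the third audio collection circuit to generate the third audio information, and on the audio signal collected by the fourth audio collection circuit to generate the fourth audio information.
In step 402, the first time offset between the first audio segment and the second audio segment is determined according to the deviation between the preset peaks in the first audio segment and the second audio segment, and the second time offset between the third audio segment and the fourth audio segment is determined according to the deviation between the preset peaks in the third audio segment and the fourth audio segment.
In some embodiments, the time offset calculation method described in any of the embodiments of Fig. 2 above is used to calculate the first time offset, and the time offset calculation method described in any of the embodiments of Fig. 5 below is used to calculate the second time offset.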
图5是根据本公开另一个实施例的时间偏移量计算方法的流程示意图。在一些实施例中,下列的时间偏移量计算方法步骤由声源跟踪控制装置执行。
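In step 501, the sample index of the maximum positive peak and the sample index of the minimum negative peak in the third audio segment are identified, as are the sample index of the maximum positive peak and the sample index of the minimum negative peak in the fourth audio segment.
It should be noted that the third audio segment and the fourth audio segment each include a plurality of sample values.
In some embodiments, after the third audio segment and the fourth audio segment have been processed in this way, it may further be checked whether the third audio segment and the fourth audio segment correspond to each other.
For example, in the third audio segment, the sample index of the maximum positive peak is U_max and the sample index of the minimum negative peak is U_min. In the fourth audio segment, the sample index of the maximum positive peak is D_max and the sample index of the minimum negative peak is D_min. If formulas (16) and (17) hold, that is:
$(U_{max} - U_{min})(D_{max} - D_{min}) > 0$             (16)
$|(U_{max} - U_{min}) - (D_{max} - D_{min})| \le \varepsilon_2$         (17)
then the positions of the maximum positive peak and the minimum negative peak in the third audio segment correspond to those in the fourth audio segment. ε2 is a preset threshold.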
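Formulas (16) and (17) translate directly into a small predicate. The sketch below uses hypothetical names for the function and its arguments; the arguments are the four sample indices and the threshold ε2 defined above.

```python
def extreme_peaks_correspond(u_max: int, u_min: int,
                             d_max: int, d_min: int,
                             eps: int) -> bool:
    """Check formulas (16) and (17) for two audio segments."""
    span_u = u_max - u_min   # signed gap between the extreme peaks in the third segment
    span_d = d_max - d_min   # signed gap between the extreme peaks in the fourth segment
    same_order = span_u * span_d > 0             # formula (16): same sign
    similar_gap = abs(span_u - span_d) <= eps    # formula (17): gaps differ by at most eps
    return same_order and similar_gap


if __name__ == "__main__":
    print(extreme_peaks_correspond(120, 80, 125, 84, eps=3))
```

Formula (16) rejects segment pairs in which the maximum and minimum occur in opposite order, and formula (17) rejects pairs whose extreme peaks are spaced too differently.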
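As another example, the total number of positive peaks in the third audio segment is U_Ptotal, the total number of negative peaks in the third audio segment is U_ntotal, the total number of positive peaks in the fourth audio segment is D_Ptotal, and the total number of negative peaks in the fourth audio segment is D_ntotal. If formulas (18) and (19) hold, that is:
[Formula (18): equation image PCTCN2020076462-appb-000010]
[Formula (19): equation image PCTCN2020076462-appb-000011]
then the total numbers of positive peaks and negative peaks in the third audio segment and the fourth audio segment are within a reasonable range. ρ3 and ρ4 are preset thresholds, which may be the same or different.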
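Formulas (18) and (19) are available only as images, so the following sketch is based on the wording of claim 8, which compares the ratio of the difference of the peak totals to their sum against a predetermined range. Taking the absolute value of the difference and using an upper bound as the range are assumptions, as are all names.

```python
def peak_counts_plausible(u_pos: int, u_neg: int,
                          d_pos: int, d_neg: int,
                          rho_pos: float, rho_neg: float) -> bool:
    """Check that the peak totals of the two segments are close in relative terms."""

    def relative_difference(x: int, y: int) -> float:
        total = x + y
        # Two segments without any peaks are treated as identical.
        return abs(x - y) / total if total else 0.0

    return (relative_difference(u_pos, d_pos) <= rho_pos
            and relative_difference(u_neg, d_neg) <= rho_neg)


if __name__ == "__main__":
    print(peak_counts_plausible(10, 9, 11, 10, rho_pos=0.1, rho_neg=0.1))
```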
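If the positions of the maximum positive peak and the minimum negative peak in the third audio segment and the fourth audio segment correspond, and the total numbers of positive peaks and negative peaks in the third audio segment and the fourth audio segment are within a reasonable range, the accuracy of the time offset calculation can be ensured. If the positions of the maximum positive peak and the minimum negative peak do not correspond, or the total numbers of positive peaks and negative peaks are not within a reasonable range, the third audio segment and the fourth audio segment have been affected by external interference. In this case, the first audio segment needs to be extracted again from the first audio information collected by the first audio collection circuit and, synchronously, the second audio segment extracted again from the second audio information collected by the second audio collection circuit, the third audio segment from the third audio information collected by the third audio collection circuit, and the fourth audio segment from the fourth audio information collected by the fourth audio collection circuit.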
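In step 502, the valid positive peaks and valid negative peaks in the third audio segment and the fourth audio segment are obtained.
In some embodiments, corresponding valid positive peaks are selected in the third audio segment and the fourth audio segment according to the difference between the sample index of the maximum positive peak in the third audio segment and the sample index of the maximum positive peak in the fourth audio segment. Corresponding valid negative peaks are selected in the third audio segment and the fourth audio segment according to the difference between the sample index of the minimum negative peak in the third audio segment and the sample index of the minimum negative peak in the fourth audio segment.
For example, U_i is the sample index of the i-th valid positive peak DU_i in the third audio segment, and D_j is the sample index of the j-th valid positive peak DD_j in the fourth audio segment. The sample index of the maximum positive peak in the third audio segment is U_max, and the sample index of the maximum positive peak in the fourth audio segment is D_max. When DU_i corresponds to DD_j, formula (20) holds.
$|(U_i - D_j) - (U_{max} - D_{max})| \le \sigma_3$           (20)
As another example, U_i is the sample index of the i-th valid negative peak DU_i in the third audio segment, and D_j is the sample index of the j-th valid negative peak DD_j in the fourth audio segment. If the sample index of the minimum negative peak in the third audio segment is U_min and the sample index of the minimum negative peak in the fourth audio segment is D_min, then when DU_i corresponds to DD_j, formula (21) holds.
$|(U_i - D_j) - (U_{min} - D_{min})| \le \sigma_4$            (21)
In formulas (20) and (21), σ3 and σ4 are preset thresholds, which may be the same or different.
When formula (20) is used to identify corresponding valid positive peaks in the third audio segment and the fourth audio segment, if no corresponding positive peak can be found in the fourth audio segment for a positive peak B in the third audio segment, then positive peak B is a pseudo peak caused by external interference.
Identifying valid positive peaks and valid negative peaks with formulas (20) and (21) is therefore also a filtering process: it removes pseudo peaks caused by external interference and thereby improves the accuracy of the second time offset.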
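One way to apply formula (20) or (21) as a filter is sketched below. The greedy pairing strategy and all names are assumptions made for illustration; the text only states the condition that a matched pair must satisfy.

```python
from typing import List, Tuple


def match_valid_peaks(u_indices: List[int], d_indices: List[int],
                      u_ref: int, d_ref: int, sigma: int) -> List[Tuple[int, int]]:
    """Pair peaks whose index difference stays within sigma of the reference difference.

    u_ref and d_ref are the sample indices of the reference peak in each segment
    (the maximum positive peak for formula (20), the minimum negative peak for (21)).
    Peaks without a partner are dropped as pseudo peaks.
    """
    ref_offset = u_ref - d_ref
    unused = list(d_indices)
    pairs = []
    for u in u_indices:
        candidates = [d for d in unused if abs((u - d) - ref_offset) <= sigma]
        if not candidates:
            continue  # no counterpart in the other segment: treat as a pseudo peak
        best = min(candidates, key=lambda d: abs((u - d) - ref_offset))
        pairs.append((u, best))
        unused.remove(best)  # each peak is used in at most one pair
    return pairs


if __name__ == "__main__":
    print(match_valid_peaks([10, 50, 77, 90], [13, 53, 93], u_ref=50, d_ref=53, sigma=1))
```

In this example the peak at index 77 has no counterpart within the tolerance, so it is discarded in the same way as the pseudo peak B described above.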
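In step 503, a second sampling clock deviation between the third audio segment and the fourth audio segment is determined according to the sample index deviations of corresponding valid positive peaks in the third audio segment and the fourth audio segment and the sample index deviations of corresponding valid negative peaks in the third audio segment and the fourth audio segment.
In some embodiments, the second sampling clock deviation between the third audio segment and the fourth audio segment may be determined by computing the arithmetic mean, geometric mean, or standard deviation of the sample index deviations of the corresponding valid positive peaks and of the corresponding valid negative peaks.
For example, the third audio segment or the fourth audio segment contains M_Valid valid positive peaks and N_Valid valid negative peaks. The sample index deviation between the i-th valid peak in the third audio segment and the corresponding j-th valid peak in the fourth audio segment is Δi. The standard deviation M2 of the sample index deviations is calculated using formula (22) and taken as the second sampling clock deviation between the third audio segment and the fourth audio segment.
[Formula (22): equation image PCTCN2020076462-appb-000012]
In step 504, the second time offset is determined according to the second sampling clock deviation and the sampling conversion frequency.
Let the sampling conversion frequency be f_COV; the second time offset t2 is then calculated using formula (23).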
$t_2 = \frac{M_2}{f_{COV}}$             (23)
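Steps 503 and 504 might be sketched as follows. Formula (22) is available only as an image, so this sketch uses the arithmetic mean of the sample index deviations, which is one of the options the text names, and assumes that formula (23) divides the sampling clock deviation by the sampling conversion frequency. The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def time_offset_seconds(index_deviations: Sequence[float],
                        sample_rate_hz: float) -> float:
    """Convert per-peak sample index deviations into a time offset in seconds."""
    if not index_deviations:
        raise ValueError("at least one matched peak pair is required")
    if sample_rate_hz <= 0:
        raise ValueError("sampling frequency must be positive")
    clock_deviation = mean(index_deviations)  # assumed aggregate, in samples
    return clock_deviation / sample_rate_hz   # samples / (samples per second) = seconds


if __name__ == "__main__":
    # Four matched peak pairs, sampled at 48 kHz.
    print(time_offset_seconds([3, 3, 4, 2], 48_000))
```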
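In some embodiments, after the valid positive peaks and valid negative peaks in the third audio segment and the fourth audio segment have been obtained, it may further be determined whether the number of valid positive peaks M_Valid and the number of valid negative peaks N_Valid in the third audio segment or the fourth audio segment satisfy formula (24).
$M_{Valid} + N_{Valid} < D_3$         (24)
D3 is a preset threshold. If formula (24) holds, there are too few valid peaks in the third audio segment and the fourth audio segment. This is usually because the current scene is silent. In this case, the video collection circuit is controlled to perform panoramic shooting.
In some embodiments, when formula (24) does not hold, it is further determined whether the number of valid positive peaks equals the number of valid negative peaks in the third audio segment or the fourth audio segment. When the two numbers are equal, if formula (25) holds, that is:
[Formula (25): equation image PCTCN2020076462-appb-000014]
then there are too many valid peaks in the third audio segment and the fourth audio segment. Here D4 is a preset threshold, U_Ptotal is the total number of positive peaks in the third audio segment, U_ntotal is the total number of negative peaks in the third audio segment, D_Ptotal is the total number of positive peaks in the fourth audio segment, and D_ntotal is the total number of negative peaks in the fourth audio segment. Formula (25) usually holds when several people speak at the same time. In this case, the video collection circuit is controlled to perform panoramic shooting.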
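The two fallback conditions can be combined into a single decision function. Formula (25) is available only as an image, so the second test below follows the wording of claim 6 (ratio of the valid-peak sum to the total-peak sum of one segment compared with a threshold); that reading, and every name in the sketch, is an assumption.

```python
def should_use_panorama(valid_pos: int, valid_neg: int,
                        total_pos: int, total_neg: int,
                        min_valid: int, max_ratio: float) -> bool:
    """Decide whether the camera should fall back to panoramic shooting."""
    valid_sum = valid_pos + valid_neg
    if valid_sum < min_valid:
        return True  # formula (24): too few valid peaks, scene is probably silent
    if valid_pos == valid_neg:
        total_sum = total_pos + total_neg
        if total_sum > 0 and valid_sum / total_sum > max_ratio:
            return True  # assumed form of formula (25): several people speaking
    return False


if __name__ == "__main__":
    print(should_use_panorama(1, 1, 10, 10, min_valid=4, max_ratio=0.8))
```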
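Returning to FIG. 4, in step 403, a first distance difference between a first distance from the sound source to the first audio collection circuit and a second distance from the sound source to the second audio collection circuit is determined according to the first time offset, and a second distance difference between a third distance from the sound source to the third audio collection circuit and a fourth distance from the sound source to the fourth audio collection circuit is determined according to the second time offset.
In some embodiments, the first distance difference a1 is calculated using formula (11) above.
In some embodiments, the second distance difference a2 is calculated using formula (26). The propagation speed v of sound in air is 340 m/s.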
$a_2 = v \cdot t_2$             (26)
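In step 404, the first offset angle of the sound source is determined according to the first distance difference, and a second offset angle of the sound source is determined according to the second distance difference.
In some embodiments, the first offset angle θ1 is calculated using formula (15) above.
In some embodiments, according to the hyperbolic model shown in FIG. 3, the corresponding hyperbola equation is given by formula (27).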
$\frac{x^2}{a_2^2} - \frac{y^2}{b^2} = 1$             (27)
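It should be noted that c is the distance between the third audio collection circuit and the fourth audio collection circuit, and the distance parameter b satisfies formula (28).
$a_2^2 + b^2 = c^2$             (28)
The corresponding asymptote equation is given by formula (29).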
$y = \pm \frac{b}{a_2} x$             (29)
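The second offset angle of the sound source is thus obtained from the slope of the asymptote. For example, the second offset angle θ2 is calculated using formula (30).
[Formula (30): equation image PCTCN2020076462-appb-000018]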
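In step 405, the video collection direction of the video collection circuit is adjusted according to the first offset angle and the second offset angle so that the video collection circuit is aimed at the sound source.
In some embodiments, the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on a first straight line, and the first straight line is horizontal. The third audio collection circuit, the fourth audio collection circuit, and the video collection circuit are located on a second straight line, and the second straight line is vertical. The first offset angle is used to control the deflection angle of the video collection circuit in the left-right direction, and the second offset angle is used to control the deflection angle of the video collection circuit in the up-down direction. Sound source tracking is thereby achieved in three-dimensional space.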
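Steps 403 to 405 can be combined into a small sketch that turns the two time offsets into a pan angle and a tilt angle. Formulas (15) and (30) are available only as images, so the angle expression atan(a / b), measured from the camera's forward axis, is an assumption, as are all function and parameter names.

```python
import math
from typing import Tuple

SPEED_OF_SOUND_M_PER_S = 340.0  # propagation speed of sound in air used in the text


def _angle_deg(time_offset_s: float, c_m: float) -> float:
    """Angle in degrees for one pickup pair, from its time offset and parameter c."""
    a = SPEED_OF_SOUND_M_PER_S * time_offset_s   # distance difference
    if c_m <= 0 or abs(a) >= c_m:
        raise ValueError("require c > 0 and |distance difference| < c")
    b = math.sqrt(c_m * c_m - a * a)             # a^2 + b^2 = c^2
    return math.degrees(math.atan2(a, b))        # assumed angle convention


def pan_tilt_deg(t1_s: float, t2_s: float,
                 c_horizontal_m: float, c_vertical_m: float) -> Tuple[float, float]:
    """Return (pan, tilt): left-right angle from t1 and up-down angle from t2."""
    return _angle_deg(t1_s, c_horizontal_m), _angle_deg(t2_s, c_vertical_m)


if __name__ == "__main__":
    pan, tilt = pan_tilt_deg(0.1 / 340.0, 0.0, 0.2, 0.2)
    print(round(pan, 2), round(tilt, 2))
```

The two angles are computed independently, which mirrors the text: the horizontal pair drives the left-right deflection and the vertical pair drives the up-down deflection.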
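FIG. 6 is a schematic structural diagram of a sound source tracking control device according to an embodiment of the present disclosure. As shown in FIG. 6, the sound source tracking control device includes an extraction module 61, a time offset determination module 62, a distance difference determination module 63, an offset angle determination module 64, and a direction adjustment module 65.
The extraction module 61 extracts a first audio segment from first audio information collected by the first audio collection circuit and, synchronously, extracts a second audio segment from second audio information collected by the second audio collection circuit.
The time offset determination module 62 determines a first time offset between the first audio segment and the second audio segment according to the deviation between preset peaks in the first audio segment and the second audio segment.
In some embodiments, the time offset determination module 62 calculates the first time offset between the first audio segment and the second audio segment using the process shown in FIG. 2 above.
The distance difference determination module 63 determines, according to the first time offset, a first distance difference between a first distance from the sound source to the first audio collection circuit and a second distance from the sound source to the second audio collection circuit.
The offset angle determination module 64 determines a first offset angle of the sound source according to the first distance difference.
In some embodiments, the offset angle determination module 64 calculates the first offset angle of the sound source using formula (15) above.
The direction adjustment module 65 adjusts the video collection direction of the video collection circuit according to the first offset angle so that the video collection circuit is aimed at the sound source.
In some embodiments, the first audio collection circuit and the second audio collection circuit are arranged symmetrically on two sides of the video collection circuit. For example, the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on a first straight line. If the first straight line is horizontal, the sound source tracking control device uses the first offset angle to control the deflection angle of the video collection circuit in the left-right direction, so that sound source tracking is achieved in the horizontal plane.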
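In some embodiments, the extraction module 61 extracts a first audio segment from first audio information collected by the first audio collection circuit and, synchronously, extracts a second audio segment from second audio information collected by the second audio collection circuit, a third audio segment from third audio information collected by the third audio collection circuit, and a fourth audio segment from fourth audio information collected by the fourth audio collection circuit.
The time offset determination module 62 determines a first time offset between the first audio segment and the second audio segment according to the deviation between preset peaks in the first audio segment and the second audio segment. The time offset determination module 62 also determines a second time offset between the third audio segment and the fourth audio segment according to the deviation between preset peaks in the third audio segment and the fourth audio segment.
In some embodiments, the time offset determination module 62 calculates the first time offset between the first audio segment and the second audio segment using the process shown in FIG. 2 above, and calculates the second time offset between the third audio segment and the fourth audio segment using the process shown in FIG. 5 above.
The distance difference determination module 63 determines, according to the first time offset, a first distance difference between a first distance from the sound source to the first audio collection circuit and a second distance from the sound source to the second audio collection circuit. The distance difference determination module 63 also determines, according to the second time offset, a second distance difference between a third distance from the sound source to the third audio collection circuit and a fourth distance from the sound source to the fourth audio collection circuit.
The offset angle determination module 64 determines a first offset angle of the sound source according to the first distance difference, and also determines a second offset angle of the sound source according to the second distance difference.
In some embodiments, the offset angle determination module 64 calculates the first offset angle of the sound source using formula (15) above and the second offset angle of the sound source using formula (30) above.
The direction adjustment module 65 adjusts the video collection direction of the video collection circuit according to the first offset angle and the second offset angle so that the video collection circuit is aimed at the sound source.
In some embodiments, the first audio collection circuit and the second audio collection circuit are arranged symmetrically on two sides of the video collection circuit. The third audio collection circuit and the fourth audio collection circuit are arranged symmetrically on the other two sides of the video collection circuit. For example, the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on a first straight line. The third audio collection circuit, the fourth audio collection circuit, and the video collection circuit are located on a second straight line. The first straight line is perpendicular to the second straight line. If the first straight line is horizontal and the second straight line is vertical, the sound source tracking control device uses the first offset angle to control the deflection angle of the video collection circuit in the left-right direction and the second offset angle to control the deflection angle of the video collection circuit in the up-down direction. Sound source tracking is thereby achieved in three-dimensional space.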
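FIG. 7 is a schematic structural diagram of a sound source tracking control device according to an embodiment of the present disclosure. As shown in FIG. 7, the sound source tracking control device includes a memory 701 and a processor 702.
The memory 701 is used to store instructions. The processor 702 is coupled to the memory 701 and is configured to execute, based on the instructions stored in the memory, the method of any of the embodiments of FIG. 1, FIG. 2, FIG. 4, and FIG. 5.
As shown in FIG. 7, the sound source tracking control device further includes a communication interface 703 for exchanging information with other devices. The sound source tracking control device also includes a bus 704, through which the processor 702, the communication interface 703, and the memory 701 communicate with one another.
The memory 701 may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory. The memory 701 may also be a memory array. The memory 701 may further be divided into blocks, and the blocks may be combined into virtual volumes according to certain rules.
In addition, the processor 702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
The present disclosure also relates to a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method of any of the embodiments of FIG. 1, FIG. 2, FIG. 4, and FIG. 5.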
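FIG. 8 is a schematic structural diagram of a sound source tracking system according to an embodiment of the present disclosure. As shown in FIG. 8, the sound source tracking system includes a first audio collection circuit 811, a second audio collection circuit 812, a sound source tracking control device 82, and a video collection circuit 83. The sound source tracking control device 82 is the sound source tracking control device of any of the embodiments of FIG. 6 or FIG. 7.
The first audio collection circuit 811 and the second audio collection circuit 812 are arranged symmetrically on two sides of the video collection circuit 83. The distance from the video collection circuit to the first audio collection circuit is equal to the distance from the video collection circuit to the second audio collection circuit. For example, the first audio collection circuit, the second audio collection circuit, and the video collection circuit are located on a first straight line.
In some embodiments, the first audio collection circuit 811 and the second audio collection circuit 812 are sound pickups.
In some embodiments, the first straight line is horizontal. The sound source tracking control device 82 uses the calculated first offset angle to control the left-right deflection angle of the video collection circuit 83, so that sound source tracking is achieved in the horizontal plane.
FIG. 9 is a schematic structural diagram of a sound source tracking system according to another embodiment of the present disclosure. FIG. 9 differs from FIG. 8 in that, in the embodiment shown in FIG. 9, the video collection circuit 83 includes a direction control platform 831 and a camera 832 mounted on the direction control platform 831. For example, the direction control platform 831 is a pan-tilt head.
The sound source tracking control device 82 sends control parameters to the direction control platform 831 using a communication protocol that the direction control platform 831 supports, so that the orientation of the direction control platform 831 is adjusted and the video collection direction of the camera 832 is adjusted accordingly. For example, the communication protocol used is the UART protocol.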
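In some embodiments, as shown in FIG. 8, the sound source tracking system further includes an analog-to-digital converter 84.
The analog-to-digital converter 84 performs analog-to-digital conversion on the audio signal collected by the first audio collection circuit 811 to generate the first audio information, and on the audio signal collected by the second audio collection circuit 812 to generate the second audio information.
It should be noted that the analog-to-digital converter 84 contains a plurality of mutually independent conversion modules. A first conversion module of the analog-to-digital converter 84 can therefore convert the audio signal collected by the first audio collection circuit 811 to generate the first audio information, while a second conversion module of the analog-to-digital converter 84 converts the audio signal collected by the second audio collection circuit 812 to generate the second audio information.
In some embodiments, the analog-to-digital converter 84 is a pipelined analog-to-digital converter, a successive approximation register (SAR) analog-to-digital converter, or a Sigma-Delta (Σ-Δ) analog-to-digital converter.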
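FIG. 10 is a schematic structural diagram of a sound source tracking system according to another embodiment of the present disclosure. FIG. 10 differs from FIG. 9 in that, in the embodiment shown in FIG. 10, the sound source tracking system further includes a third audio collection circuit 813 and a fourth audio collection circuit 814.
The third audio collection circuit 813 and the fourth audio collection circuit 814 are arranged symmetrically on the other two sides of the video collection circuit 83. The distance from the video collection circuit 83 to the third audio collection circuit 813 is equal to the distance from the video collection circuit 83 to the fourth audio collection circuit 814. For example, the first audio collection circuit 811, the second audio collection circuit 812, and the video collection circuit 83 are located on a first straight line. The third audio collection circuit 813, the fourth audio collection circuit 814, and the video collection circuit 83 are located on a second straight line. The first straight line is perpendicular to the second straight line.
In some embodiments, the first straight line is horizontal and the second straight line is vertical. The sound source tracking control device 82 uses the first offset angle to control the deflection angle of the video collection circuit 83 in the left-right direction and the second offset angle to control the deflection angle of the video collection circuit 83 in the up-down direction. Sound source tracking is thereby achieved in three-dimensional space.
The embodiments of the present disclosure have now been described in detail. Some details well known in the art have been omitted to avoid obscuring the concept of the present disclosure. From the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
Although some specific embodiments of the present disclosure have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. Those skilled in the art should understand that the above embodiments may be modified, or some technical features may be replaced by equivalents, without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.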

Claims (15)

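  1. A sound source tracking control method, comprising:
    extracting a first audio segment from first audio information collected by a first audio collection circuit and, synchronously, extracting a second audio segment from second audio information collected by a second audio collection circuit;
    determining a first time offset between the first audio segment and the second audio segment according to a deviation between preset peaks in the first audio segment and the second audio segment;
    determining, according to the first time offset, a first distance difference between a first distance from a sound source to the first audio collection circuit and a second distance from the sound source to the second audio collection circuit;
    determining a first offset angle of the sound source according to the first distance difference; and
    adjusting a video collection direction of a video collection circuit according to the first offset angle so that the video collection circuit is aimed at the sound source.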
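  2. The control method according to claim 1, wherein determining the first offset angle of the sound source according to the first distance difference comprises:
    determining a first distance parameter using the first distance difference and the distance between the first audio collection circuit and the second audio collection circuit; and
    determining the first offset angle of the sound source according to the ratio of the first distance parameter to the first distance difference.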
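  3. The control method according to claim 1, wherein determining the first time offset between the first audio segment and the second audio segment according to the deviation between preset peaks in the first audio segment and the second audio segment comprises:
    selecting corresponding valid positive peaks in the first audio segment and the second audio segment according to a first difference between the sample index of the maximum positive peak in the first audio segment and the sample index of the maximum positive peak in the second audio segment, wherein the first audio segment and the second audio segment each include a plurality of sample values;
    selecting corresponding valid negative peaks in the first audio segment and the second audio segment according to a second difference between the sample index of the minimum negative peak in the first audio segment and the sample index of the minimum negative peak in the second audio segment;
    determining a first sampling clock deviation between the first audio segment and the second audio segment according to the sample index deviations of the corresponding valid positive peaks in the first audio segment and the second audio segment and the sample index deviations of the corresponding valid negative peaks in the first audio segment and the second audio segment; and
    determining the first time offset according to the first sampling clock deviation and a sampling conversion frequency.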
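  4. The control method according to claim 3, wherein:
    the difference between the first difference and the difference between the sample index of a valid positive peak in the first audio segment and the sample index of the corresponding valid positive peak in the second audio segment is within a first preset range; and
    the difference between the second difference and the difference between the sample index of a valid negative peak in the first audio segment and the sample index of the corresponding valid negative peak in the second audio segment is within a second preset range.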
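  5. The control method according to claim 3, further comprising:
    determining whether a first sum of the number of valid positive peaks and the number of valid negative peaks in the first audio segment or the second audio segment is less than a first preset threshold; and
    if the first sum is less than the first preset threshold, controlling the video collection circuit to perform panoramic shooting.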
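  6. The control method according to claim 5, further comprising:
    if the first sum is not less than the first preset threshold, determining whether the number of valid positive peaks and the number of valid negative peaks in the first audio segment or the second audio segment are equal;
    when the number of valid positive peaks and the number of valid negative peaks in the first audio segment or the second audio segment are equal, further computing a second sum of the total number of positive peaks and the total number of negative peaks in the first audio segment or the second audio segment; and
    in response to the ratio of the first sum to the second sum being greater than a second preset threshold, controlling the video collection circuit to perform panoramic shooting.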
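  7. The control method according to claim 3, further comprising:
    computing a third difference between the sample index of the maximum positive peak and the sample index of the minimum negative peak in the first audio segment;
    computing a fourth difference between the sample index of the maximum positive peak and the sample index of the minimum negative peak in the second audio segment; and
    in response to the third difference and the fourth difference having the same sign and the difference between the third difference and the fourth difference being within a third preset range, selecting corresponding valid positive peaks in the first audio segment and the second audio segment.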
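  8. The control method according to claim 3, further comprising:
    computing a fifth difference between the total number of positive peaks in the first audio segment and the total number of positive peaks in the second audio segment, and a third sum of the total number of positive peaks in the first audio segment and the total number of positive peaks in the second audio segment;
    computing a sixth difference between the total number of negative peaks in the first audio segment and the total number of negative peaks in the second audio segment, and a fourth sum of the total number of negative peaks in the first audio segment and the total number of negative peaks in the second audio segment; and
    in response to the ratio of the fifth difference to the third sum being within a fourth predetermined range and the ratio of the sixth difference to the fourth sum being within a fifth predetermined range, selecting corresponding valid positive peaks in the first audio segment and the second audio segment.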
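  9. The control method according to any one of claims 1-8, further comprising:
    synchronously extracting a third audio segment from third audio information collected by a third audio collection circuit and a fourth audio segment from fourth audio information collected by a fourth audio collection circuit;
    determining a second time offset between the third audio segment and the fourth audio segment according to a deviation between preset peaks in the third audio segment and the fourth audio segment;
    determining, according to the second time offset, a second distance difference between a third distance from the sound source to the third audio collection circuit and a fourth distance from the sound source to the fourth audio collection circuit;
    determining a second offset angle of the sound source according to the second distance difference; and
    adjusting the video collection direction of the video collection circuit according to the first offset angle and the second offset angle so that the video collection circuit is aimed at the sound source.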
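  10. A sound source tracking control device, comprising:
    an extraction module configured to extract a first audio segment from first audio information collected by a first audio collection circuit and, synchronously, to extract a second audio segment from second audio information collected by a second audio collection circuit;
    a time offset determination module configured to determine a first time offset between the first audio segment and the second audio segment according to a deviation between preset peaks in the first audio segment and the second audio segment;
    a distance difference determination module configured to determine, according to the first time offset, a first distance difference between a first distance from a sound source to the first audio collection circuit and a second distance from the sound source to the second audio collection circuit;
    an offset angle determination module configured to determine a first offset angle of the sound source according to the first distance difference; and
    a direction adjustment module configured to adjust a video collection direction of a video collection circuit according to the first offset angle so that the video collection circuit is aimed at the sound source.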
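  11. A sound source tracking control device, comprising:
    a memory configured to store instructions; and
    a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the method according to any one of claims 1-9.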
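  12. A sound source tracking system, comprising the sound source tracking control device according to claim 10 or 11, and
    a video collection circuit configured to adjust a video collection direction under the control of the sound source tracking control device; and
    a first audio collection circuit and a second audio collection circuit, wherein the first audio collection circuit and the second audio collection circuit are arranged symmetrically on two sides of the video collection circuit.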
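  13. The tracking system according to claim 12, wherein:
    the ratio of the distance from the sound source to the video collection circuit to the distance from the first audio collection circuit to the second audio collection circuit is greater than a preset distance threshold.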
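  14. The tracking system according to claim 13, further comprising:
    an analog-to-digital converter configured to perform analog-to-digital conversion on the audio signal collected by the first audio collection circuit to generate the first audio information and on the audio signal collected by the second audio collection circuit to generate the second audio information;
    wherein the video collection circuit comprises a direction control platform and a camera mounted on the direction control platform, the direction control platform being configured to adjust its orientation under the control of the sound source tracking control device.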
  15. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method according to any one of claims 1-9.
PCT/CN2020/076462 2020-02-24 2020-02-24 Sound source tracking control method and control device, and sound source tracking system WO2021168620A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
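CN202080000167.1A CN113631942B (zh) 2020-02-24 2020-02-24 Sound source tracking control method and control device, and sound source tracking system
PCT/CN2020/076462 WO2021168620A1 (zh) 2020-02-24 2020-02-24 Sound source tracking control method and control device, and sound source tracking system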

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/076462 WO2021168620A1 (zh) 2020-02-24 2020-02-24 Sound source tracking control method and control device, and sound source tracking system

Publications (1)

Publication Number Publication Date
WO2021168620A1 true WO2021168620A1 (zh) 2021-09-02

Family

ID=77491742

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/076462 WO2021168620A1 (zh) 2020-02-24 2020-02-24 Sound source tracking control method and control device, and sound source tracking system

Country Status (2)

Country Link
CN (1) CN113631942B (zh)
WO (1) WO2021168620A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926378A (zh) * 2022-04-01 2022-08-19 浙江西图盟数字科技有限公司 Sound source tracking method, system, device, and computer storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6826284B1 (en) * 2000-02-04 2004-11-30 Agere Systems Inc. Method and apparatus for passive acoustic source localization for video camera steering applications
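CN103235287A (zh) * 2013-04-17 2013-08-07 华北电力大学(保定) Sound source localization camera tracking device
CN103797821A (zh) * 2011-06-24 2014-05-14 若威尔士有限公司 Time difference of arrival determination using direct sound
CN103841357A (zh) * 2012-11-21 2014-06-04 中兴通讯股份有限公司 Microphone array sound source localization method, device, and system based on video tracking
CN204517964U (zh) * 2015-02-13 2015-07-29 上海赢谊电子设备有限公司 Sound localization device for use in smart homes
CN106842131A (zh) * 2017-03-17 2017-06-13 浙江宇视科技有限公司 Microphone array sound source localization method and device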

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI230023B (en) * 2003-11-20 2005-03-21 Acer Inc Sound-receiving method of microphone array associating positioning technology and system thereof
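CN102695043A (zh) * 2012-06-06 2012-09-26 郑州大学 Dynamic video surveillance system based on sound source localization
CN108231085A (zh) * 2016-12-14 2018-06-29 杭州海康威视数字技术股份有限公司 Sound source localization method and device
JP6375475B1 (ja) * 2017-06-07 2018-08-15 井上 時子 Sound source direction tracking system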

Also Published As

Publication number Publication date
CN113631942B (zh) 2024-04-16
CN113631942A (zh) 2021-11-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20921161

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20921161

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20921161

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04/04/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20921161

Country of ref document: EP

Kind code of ref document: A1