WO2023010599A1 - Target trajectory calibration method based on video and audio, and computer device - Google Patents


Info

Publication number
WO2023010599A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound sources
sound
distance
trajectory
microphone
Prior art date
Application number
PCT/CN2021/111895
Other languages
French (fr)
Chinese (zh)
Inventor
郑勇
张缤
戴志涛
Original Assignee
深圳市沃特沃德信息有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市沃特沃德信息有限公司
Publication of WO2023010599A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/66 Remote control of cameras or camera parts, e.g. by remote control devices
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory

Definitions

  • The present application relates to the technical field of audio and video processing, and in particular to a video- and audio-based target trajectory calibration method and computer device.
  • For position calibration of a sound source contained in audio and video, existing methods require the sound source to appear within the visual range of the recording; that is, the camera device must capture the sound source. The user can then manually locate the sound source's position relative to the camera device in the captured footage, and the movement trajectory of the sound source is obtained through the camera device's shooting field of view. If the sound source lies outside the camera device's shooting field of view, then even if its sound is received, the user cannot determine the relative position between the sound source and the camera device, let alone determine the sound source's trajectory.
  • The main purpose of this application is to provide a video- and audio-based target trajectory calibration method and computer device, aiming to overcome the drawback that existing methods cannot determine a sound source's movement trajectory when the source is not within the camera device's shooting field of view.
  • The present application provides a target trajectory calibration method based on video and audio, where the video is collected by a camera device and the audio is collected by a microphone array composed of a plurality of sub-microphones deployed on the camera device. The target trajectory calibration method includes:
  • calculating the first distance between each of the sound sources and a reference sub-microphone;
  • obtaining the first relative positional relationship between each of the sound sources and the reference sub-microphone, where the reference sub-microphone is any one of the two first sub-microphones;
  • converting to obtain the second distance between each of the sound sources and the camera device;
  • constructing a second movement trajectory corresponding to each of the sound sources.
  • The present application also provides a computer device, including a memory and a processor, where a computer program is stored in the memory; when the processor executes the computer program, a video- and audio-based target trajectory calibration method is implemented, wherein:
  • the video is collected by a camera device
  • the audio is collected by a microphone array
  • the microphone array is composed of a plurality of sub-microphones
  • the microphone array is deployed on the camera device;
  • the video- and audio-based target trajectory calibration method includes:
  • calculating the first distance between each of the sound sources and a reference sub-microphone;
  • obtaining the first relative positional relationship between each of the sound sources and the reference sub-microphone, where the reference sub-microphone is any one of the two first sub-microphones;
  • converting to obtain the second distance between each of the sound sources and the camera device;
  • constructing a second movement trajectory corresponding to each of the sound sources.
  • The present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, a video- and audio-based target trajectory calibration method is implemented, where the video is collected by a camera device, the audio is collected by a microphone array composed of a plurality of sub-microphones, and the microphone array is deployed on the camera device. The target trajectory calibration method includes the following steps:
  • calculating the first distance between each of the sound sources and a reference sub-microphone;
  • obtaining the first relative positional relationship between each of the sound sources and the reference sub-microphone, where the reference sub-microphone is any one of the two first sub-microphones;
  • converting to obtain the second distance between each of the sound sources and the camera device;
  • constructing a second movement trajectory corresponding to each of the sound sources.
  • A video- and audio-based target trajectory calibration method and computer device, wherein the video is collected by a camera device, the audio is collected by a microphone array composed of multiple sub-microphones, and the microphone array is deployed on the camera device.
  • the processing system collects video data through a camera device, and collects a plurality of audio data through a microphone array. Then perform VAD algorithm recognition on the sounds contained in each audio data respectively to obtain several sound sources contained in each audio data.
  • the processing system calculates the first distance between each sound source and the reference sub-microphone based on the difference in receiving time of the sound corresponding to the same sound source by the two first sub-microphones and the deployment position between the two first sub-microphones.
  • the reference sub-microphone is any one of the two first sub-microphones.
  • the processing system converts and obtains a second relative positional relationship between each sound source and the imaging device according to the deployment position of the reference sub-microphone on the imaging device and the first relative positional relationship corresponding to each sound source.
  • the processing system constructs the second movement trajectory corresponding to each sound source according to the acquisition time of the video data and each audio data, the first movement trajectory of the camera device, and each second relative positional relationship.
  • Since the camera device is equipped with the microphone array, the first relative positional relationship between each sound source and the reference sub-microphone can be calculated from the time differences with which each sub-microphone receives the sound of each source.
  • Even when a sound source is outside the shooting field of view, the deployment relationship between the reference sub-microphone and the camera device can be used to determine the positional relationship between the sound source and the camera device. The first trajectory of the camera device is then used as a position parameter to calibrate the second trajectory of each sound source.
  • Fig. 1 is a schematic diagram of the steps of the target trajectory marking method based on video and audio in an embodiment of the present application;
  • Fig. 2 is a schematic diagram of the distribution of the reference sub-microphone, the field of view center of the imaging device, and the sound source in an embodiment of the present application;
  • Fig. 3 is the overall structural block diagram of the target trajectory marking device based on video and audio in one embodiment of the present application;
  • Fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • An embodiment of the present application provides a video- and audio-based target trajectory calibration method, where the video is collected by a camera device and the audio is collected by a microphone array composed of a plurality of sub-microphones deployed on the camera device. The target trajectory calibration method includes:
  • S1 collecting video data by the camera device, and collecting a plurality of audio data by the microphone array;
  • a microphone array is deployed on the camera device, and the microphone array is composed of multiple sub-microphones.
  • the processing system collects video data through the camera device, and collects multiple audio data through the microphone array.
  • The processing system can be the local system of the camera device, which directly analyzes and processes the collected video data and audio data; alternatively, the collected data can be uploaded over a wireless connection (such as Wi-Fi or a 4G/5G network) to a cloud server, where the processing system analyzes and processes it.
  • The processing system performs VAD (Voice Activity Detection, also called voice endpoint detection) algorithm recognition on the sounds contained in each audio data stream, performs speech recognition on each sound, and obtains all sound sources contained in each audio data stream.
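  • The VAD step above can be sketched with a minimal energy-based detector. This is only an illustration of the idea: the patent does not specify an implementation, real detectors (such as WebRTC's) also use spectral features, and the function name and thresholds below are assumptions.

```python
import numpy as np

def simple_vad(signal, fs, frame_ms=20, energy_ratio=0.1):
    """Flag frames whose mean energy exceeds a fraction of the peak
    frame energy. A crude stand-in for the VAD algorithm; the 20 ms
    frame length and 0.1 energy ratio are illustrative choices."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)          # per-frame mean energy
    return energy > energy_ratio * energy.max()  # True = voice activity
```

Frames flagged True would then be grouped into segments and passed to the source-identification stage.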
  • The processing system uses a TDOA (Time Difference of Arrival) positioning algorithm to calculate the first relative positional relationship between each sound source and the reference sub-microphone, based on the difference in the time at which the two first sub-microphones receive the sound of the same source and the deployment positions of the two first sub-microphones.
  • The first relative positional relationship includes a first distance and a first angle.
  • The first distance characterizes the straight-line distance between the sound source and the reference sub-microphone.
  • The first angle characterizes the angle of the line between the sound source and the reference sub-microphone relative to the horizontal plane.
  • The reference sub-microphone can be any one of the first sub-microphones; selecting a different first sub-microphone as the reference changes the corresponding first distance but not the first angle.
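  • The TDOA idea behind the first relative positional relationship can be sketched as follows. The cross-correlation delay estimate and the far-field bearing formula are standard techniques, but the function and parameter names are illustrative assumptions; the patent does not prescribe this implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def estimate_tdoa_and_bearing(sig_a, sig_b, fs, mic_spacing):
    """Estimate the arrival-time difference of one source between two
    first sub-microphones by cross-correlation, then derive the source
    bearing relative to the microphone pair (far-field assumption)."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)  # positive: sig_a delayed
    tdoa = lag / fs                                # seconds
    path_diff = tdoa * SPEED_OF_SOUND              # extra metres travelled
    cos_bearing = np.clip(path_diff / mic_spacing, -1.0, 1.0)
    return tdoa, float(np.degrees(np.arccos(cos_bearing)))
```

With a second microphone pair (or a known source level), the bearing can be intersected into the first distance and first angle described above.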
  • The processing system calculates the complementary angle of the first angle, and obtains the straight-line distance between the reference sub-microphone and the center of the field of view of the camera device from the reference sub-microphone's deployment position on the camera device. It then computes the second distance between the camera device and the sound source from the complementary angle, the straight-line distance, and the first distance.
  • The processing system calculates the vertical distance between the reference sub-microphone and the sound source from the first angle and the first distance using the law of cosines. Then, from the vertical distance and the second distance, it again applies the law of cosines to obtain the second angle between the camera device and the sound source. Following this logic, the processing system calculates the second distance and second angle between each sound source and the camera device's field-of-view center, generating the second relative positional relationship corresponding to each sound source.
  • The processing system obtains the first motion track of the camera device from the GPS positioning module deployed on the camera device and, taking the first motion track as the position reference, constructs by calibration the second motion trajectory corresponding to each sound source according to the second relative positional relationship between each sound source and the camera device.
  • Since a microphone array is deployed on the camera device, the first relative positional relationship between each sound source and the reference sub-microphone can be calculated from the time differences with which each sub-microphone receives the sound of each source. Then, by means of the deployment relationship between the reference sub-microphone and the camera device, the second positional relationship of each sound source relative to the camera device is obtained through position conversion. Therefore, even if a sound source does not appear in the camera device's shooting field of view, as long as the microphone array can receive its sound, the positional relationship between the reference sub-microphone and the camera device can be used to determine the positional relationship between the sound source and the camera device. The acquisition times of the video data and each audio data stream are then taken as the time reference, and the first movement trajectory of the camera device is used as the position parameter, to calibrate the second movement trajectory of each sound source.
  • The first relative positional relationship includes a first distance and a first angle, and the step of converting to obtain the second relative positional relationship between each of the sound sources and the camera device according to the deployment position of the reference sub-microphone on the camera device and the first relative positional relationship corresponding to each of the sound sources includes:
  • S401 Calculate the complementary angle of the first angle;
  • S402 Call the straight-line distance between the reference sub-microphone and the camera device, and substitute the complementary angle of the first angle, the straight-line distance, and the first distance into the calculation formula a = √(b² + c² − 2bc·cos α) to obtain the second distance, where b is the first distance, c is the straight-line distance, α is the complementary angle of the first angle, and a is the second distance, representing the distance between the camera device and the sound source;
  • S403 Calculate the vertical distance between the reference sub-microphone and the sound source through the law of cosines according to the first angle and the first distance;
  • S404 Calculate the second angle between the camera device and the sound source from the second distance and the vertical distance using the law of cosines, where the vertical distance between the camera device and the sound source has the same value as the vertical distance between the reference sub-microphone and the sound source;
  • S405 According to the above rules, calculate the second distance and the second angle between each of the sound sources and the camera device, and generate each of the second relative positional relationships.
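  • The conversion steps above can be sketched as follows, using the law of cosines a = √(b² + c² − 2bc·cos α) together with the two right triangles described in the embodiment. The flat-geometry simplifications and all variable names are illustrative assumptions, not the patent's.

```python
import math

def second_relative_position(first_distance, first_angle_deg, ab_distance):
    """Convert the (first distance, first angle) measured at the reference
    sub-microphone B into the (second distance, second angle) relative to
    the field-of-view centre A of the camera device."""
    alpha = math.radians(90.0 - first_angle_deg)   # complementary angle, i.e. angle ABC
    # S402: law of cosines gives side AC, the second distance
    d2 = math.sqrt(first_distance**2 + ab_distance**2
                   - 2 * first_distance * ab_distance * math.cos(alpha))
    # S403: vertical distance BD in right triangle BCD
    bd = first_distance * math.cos(math.radians(first_angle_deg))
    # S404: second angle CAE in right triangle ACE, with AE equal to BD
    theta2 = math.degrees(math.acos(max(-1.0, min(1.0, bd / d2))))
    return d2, theta2
```

For example, a source 5 m from the reference sub-microphone at a 60 degree first angle, with the sub-microphone 3 m from the field-of-view centre, yields a second distance of about 2.83 m.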
  • the center of the field of view of the imaging device is point A
  • the reference sub-microphone is point B
  • the sound source is point C
  • a vertical line is drawn through the reference sub-microphone and the sound source respectively, intersecting at point D
  • the triangle BCD is a right triangle
  • ⁇ BDC is a right angle
  • ⁇ CBD is the first angle between the sound source and the reference sub-microphone
  • side BC is the first distance between the sound source and the reference sub-microphone.
  • ∠ABC is the complementary angle of ∠CBD (i.e., the complement of the first angle);
  • side AB is the straight-line distance between the reference sub-microphone and the field-of-view center of the camera device;
  • side AC is the second distance between the sound source and the field-of-view center of the camera device. Since the values of ∠ABC, side AB, and side BC are known, they are substituted into the calculation formula a = √(b² + c² − 2bc·cos α), from which the value of side AC can be calculated. Here b is the first distance (side BC), c is the straight-line distance (side AB), α is the complementary angle of the first angle (∠ABC), and a is the second distance (side AC), representing the distance between the camera device and the sound source in question.
  • the value of the side BD (that is, the vertical distance between the reference sub-microphone and the sound source) can be calculated through the formula of the law of cosines.
  • the triangle ACE is a right triangle
  • the length of the side AE is the same as that of the side BD
  • ∠CAE is the second included angle between the sound source and the camera device;
  • ∠CEA is a right angle.
  • the value of ⁇ CAE can be calculated by the formula of the law of cosines, and thus the first distance between the center of field of view of the camera equipment and the sound source can be obtained.
  • The processing system calculates the second distance and second angle between each sound source and the field-of-view center of the camera device according to the above rules, and generates the second relative positional relationship corresponding to each sound source from the second distance and the second angle.
  • the step of performing VAD algorithm recognition on the sounds contained in each of the audio data to obtain several sound sources includes:
  • S201 Perform VAD algorithm recognition on the sounds contained in each of the audio data streams, respectively, to obtain a number of human voice sound sources and other types of sound sources;
  • S202 Mark and number each of the human voice sound sources, and perform decibel value detection on each of the other types of sound sources: hide the first other-type sound sources whose decibel value is below the decibel threshold, and mark and number the second other-type sound sources whose decibel value is above the decibel threshold.
  • The processing system uses the VAD algorithm to perform speech recognition on the sounds contained in each audio data stream, thereby obtaining several sound sources and classifying each source as either a human voice source or another type of sound source (such as an animal sound source or an automobile sound source).
  • The processing system marks and numbers each human voice sound source to distinguish the sources from one another.
  • The processing system then screens the other types of sound sources to eliminate those of little practical value. Specifically, it recalls a preset decibel threshold and detects the decibel value of the sound emitted by each other-type sound source.
  • The processing system compares the decibel value of each other-type sound source against the decibel threshold, and hides or eliminates those whose decibel value is below the threshold (the first other-type sound sources).
  • The sounds corresponding to the first other-type sound sources are not processed further (for example, they receive no marker number and no corresponding motion trajectory is constructed).
  • The processing system marks and numbers the other-type sound sources whose decibel value is above the decibel threshold (the second other-type sound sources) to distinguish them from one another.
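  • The screening step can be sketched as follows. The RMS-to-decibel conversion and the 94 dB calibration offset are assumptions for illustration (a real device needs microphone calibration to report dB SPL), and the names and numbering scheme are likewise illustrative.

```python
import numpy as np

def screen_other_sources(other_sources, db_threshold=40.0):
    """Hide other-type sources below the decibel threshold; mark and
    number those above it. `other_sources` maps a source id to its
    waveform; the offset turning digital RMS into a decibel value
    is an assumed constant."""
    marked, hidden = {}, []
    number = 1
    for src_id, waveform in other_sources.items():
        rms = float(np.sqrt(np.mean(np.square(waveform))))
        db = 20.0 * np.log10(max(rms, 1e-12)) + 94.0  # assumed calibration
        if db < db_threshold:
            hidden.append(src_id)               # first other-type: hide
        else:
            marked[src_id] = f"other-{number}"  # second other-type: number
            number += 1
    return marked, hidden
```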
  • the step of marking and numbering the second other type of sound source whose decibel value is above the decibel threshold includes:
  • S2021 Input the sound corresponding to each of the second other types of sound sources into a pre-trained sound type recognition model for identification, and obtain the sound types corresponding to each of the second other types of sound sources;
  • a pre-trained sound type recognition model is built in the processing system.
  • The sound type recognition model is trained by deep learning on various types of sounds (such as cat meowing, dog barking, and the sound of a car driving); the deep-learning training method is the same as in the prior art and is not detailed here. After learning and training, the model can identify the corresponding type of each sound.
  • The processing system inputs the sounds corresponding to the second other-type sound sources into the pre-trained sound type recognition model for processing, and outputs the sound type of each (for example, the sound type of second other-type sound source A is cat meowing, and the sound type of second other-type sound source B is the sound of a driving car).
  • The processing system marks each second other-type sound source with its corresponding sound type as marking information, making it easy for the user to identify the source.
  • The step of constructing the second motion trajectory corresponding to each of the sound sources according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the camera device, and each of the second relative positional relationships includes:
  • S501 Perform time synchronization based on the acquisition time of the video data and each of the audio data respectively, and locate the appearance time of each of the sound sources in the video data;
  • S502 Collect the first motion track of the camera device by GPS positioning and, using the first motion track as the position reference, construct the second motion track of each of the sound sources relative to the first motion track according to the appearance time and the second relative positional relationship corresponding to each of the sound sources.
  • The processing system takes the acquisition time of the video data as the benchmark and synchronizes the acquisition time of each audio data stream with it, thereby locating the appearance time of each sound source in the video data (the appearance time includes the start time, duration, and end time).
  • The camera device is equipped with a GPS positioning module, which the processing system uses to record the position of the camera device at each acquisition time while shooting the video data; the first motion trajectory is then formed from these positions.
  • Based on each second relative positional relationship, the second movement trajectory of each sound source relative to the first movement trajectory is constructed, thereby calibrating the movement trajectory of sound sources that are not within the camera device's shooting field of view.
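  • The calibration of steps S501 and S502 can be sketched in a flat local coordinate frame. A real implementation would project GPS fixes into local coordinates and account for the camera's heading; both are simplified away here, and the names and data shapes are illustrative.

```python
import math

def build_second_trajectory(first_trajectory, relative_positions):
    """Offset the camera's first trajectory by each source's second
    distance/angle at the matching (time-synchronized) timestamps.
    first_trajectory: {t: (x, y)} camera positions from GPS
    relative_positions: {t: (second_distance, second_angle_deg)}"""
    second_trajectory = {}
    for t, (dist, angle_deg) in relative_positions.items():
        if t not in first_trajectory:
            continue                      # no synchronized camera position
        cx, cy = first_trajectory[t]
        a = math.radians(angle_deg)
        second_trajectory[t] = (cx + dist * math.cos(a),
                                cy + dist * math.sin(a))
    return second_trajectory
```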
  • After the step of constructing the second motion trajectory corresponding to each of the sound sources according to the first motion trajectory of the camera device, the method further includes:
  • S7 Generate a trajectory distribution diagram according to the first movement trajectory, the corresponding information, and each of the second movement trajectories, and output the trajectory distribution diagram to a display interface.
  • After the processing system generates the second movement trajectories corresponding to each sound source from the first movement trajectory, in order to distinguish the second movement trajectories of the different sound sources, it draws each sound source's second movement trajectory with a line of a different color and records the correspondence between each color and each sound source to form the corresponding information.
  • For example, the track line of the second motion track of sound source A is red, and the track line of the second motion track of sound source B is yellow.
  • The processing system generates a trajectory distribution diagram from the first movement trajectory, the second movement trajectories, and the corresponding information (the corresponding information is recorded on the diagram as label information so that users can identify the sound source of each color), and outputs the diagram to the display interface, enabling the user to intuitively follow the changes of each second motion track.
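  • The color-to-source correspondence information can be sketched as a simple mapping. The palette and the cycling behaviour when sources outnumber colors are illustrative choices, not specified by the patent.

```python
def build_corresponding_information(source_ids):
    """Assign each sound source's second trajectory a distinct line
    colour and return the colour correspondence that is recorded as
    label information on the trajectory distribution diagram."""
    palette = ["red", "yellow", "blue", "green", "purple", "orange"]
    return {src: palette[i % len(palette)]
            for i, src in enumerate(source_ids)}
```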
  • the step of generating a trajectory distribution diagram according to the first movement trajectory, the corresponding information and each of the second movement trajectories, and outputting the trajectory distribution diagram to a display interface includes:
  • S701 Call a three-dimensional map, and mark the first movement track on the three-dimensional map;
  • The processing system retrieves the three-dimensional map of the camera device's shooting area (the map can be pre-stored in the processing system's database or downloaded from the network), and marks the first movement track on the three-dimensional map. Then, with the first motion trajectory as the position parameter, the second motion trajectory of each sound source is marked on the three-dimensional map according to its appearance time, and the color correspondence information and the appearance and end times of each second motion trajectory are added to the map, forming the trajectory distribution diagram as a whole. Finally, the processing system outputs the trajectory distribution diagram to the display interface, allowing the user to understand the changes of each second motion trajectory more clearly in three dimensions.
  • an embodiment of the present application also provides a target trajectory marking device based on video and audio, the video is collected by a camera device, the audio is collected by a microphone array, and the microphone array is composed of a plurality of sub-microphones , the microphone array is deployed on the camera equipment, and the target trajectory marking device includes:
  • a collection module 1 configured to collect video data through the camera device, and collect a plurality of audio data through the microphone array;
  • the recognition module 2 is used to perform VAD algorithm recognition on the sounds contained in each of the audio data respectively to obtain several sound sources;
  • a calculation module 3, configured to calculate the first relative positional relationship between each of the sound sources and a reference sub-microphone based on the difference in the receiving time of the sound corresponding to the same sound source by the two first sub-microphones and the deployment positions of the two first sub-microphones, where the reference sub-microphone is any one of the two first sub-microphones;
  • a conversion module 4, configured to convert to obtain the second relative positional relationship between each of the sound sources and the camera device according to the deployment position of the reference sub-microphone on the camera device and the first relative positional relationship corresponding to each of the sound sources;
  • a construction module 5, configured to construct the second motion trajectory corresponding to each of the sound sources according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the camera device, and each of the second relative positional relationships.
  • the first relative positional relationship includes a first distance and a first angle
  • the conversion module 4 includes:
  • a first calculation unit configured to calculate a complementary angle of the first angle
  • a second calculation unit, configured to call the straight-line distance between the reference sub-microphone and the camera device, and substitute the complementary angle of the first angle, the straight-line distance, and the first distance into the calculation formula a = √(b² + c² − 2bc·cos α) to obtain the second distance, where b is the first distance, c is the straight-line distance, α is the complementary angle of the first angle, and a is the second distance, representing the distance between the camera device and the sound source;
  • a third calculation unit configured to calculate the vertical distance between the reference sub-microphone and the sound source through the cosine law formula according to the first angle and the first distance;
  • a fourth calculation unit configured to calculate a second angle between the imaging device and the sound source according to the second distance and the vertical distance through the law-of-cosines formula, wherein the vertical distance between the imaging device and the sound source is the same as the vertical distance between the reference sub-microphone and the sound source;
  • the generation unit is configured to calculate, according to the above rules, the second distance and the second angle corresponding to each of the sound sources and the imaging device, and to generate each of the second relative positional relationships.
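The conversion these units perform can be sketched numerically. The sketch below is ours, not the patent's code; all function names and numbers are illustrative assumptions, and where the patent attributes the vertical-distance step to the law of cosines, the equivalent right-triangle sine relation is used here for brevity.

```python
import math

def second_distance(b: float, c: float, beta_deg: float) -> float:
    """Law of cosines: a^2 = b^2 + c^2 - 2*b*c*cos(beta).

    b        -- first distance (reference sub-microphone to sound source)
    c        -- straight-line distance (reference sub-microphone to camera)
    beta_deg -- complement of the first angle, in degrees
    """
    beta = math.radians(beta_deg)
    return math.sqrt(b * b + c * c - 2.0 * b * c * math.cos(beta))

def second_angle(a: float, h: float) -> float:
    """Second angle (degrees) between the imaging device and the sound
    source, from the second distance a and the vertical distance h."""
    return math.degrees(math.asin(h / a))

# Illustrative numbers (not from the patent): first distance 5 m,
# mic-to-camera distance 0.2 m, first angle 30 deg (complement 60 deg).
a = second_distance(5.0, 0.2, 60.0)
h = 5.0 * math.sin(math.radians(30.0))  # vertical distance from the first angle
theta = second_angle(a, h)
```

Because the mic-to-camera distance c is small relative to the source distance, the second distance stays close to the first distance, which matches the intent of the conversion.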
  • the identification module 2 includes:
  • the recognition unit is used to perform VAD algorithm recognition on the sounds contained in each of the audio data respectively, to obtain a number of human voice sound sources and other types of sound sources;
  • the screening unit is used to mark and number each of the human voice sound sources, detect the decibel value of each of the other types of sound sources, hide each first other-type sound source whose decibel value is below the decibel threshold, and mark and number each second other-type sound source whose decibel value is above the decibel threshold.
  • the screening unit includes:
  • An identification subunit configured to input the sounds corresponding to each of the second other types of sound sources into a pre-trained sound type recognition model for identification, and obtain the sound types corresponding to each of the second other types of sound sources;
  • the marking subunit is configured to use the sound type as marking information to mark and number each of the second other-type sound sources.
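The screening logic can be sketched as follows. This is a minimal illustration with invented names and an arbitrary decibel reference and threshold; the patent itself only specifies that other-type sources below the stored decibel threshold are hidden and the rest are marked and numbered.

```python
import math

DECIBEL_THRESHOLD = 50.0   # illustrative; the patent stores the threshold in a database
DB_REFERENCE = 1e-5        # arbitrary amplitude reference for the dB scale

def rms_db(samples):
    """Approximate sound level in dB from a block of PCM samples."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12) / DB_REFERENCE)

def screen_other_sources(sources):
    """sources: list of (source_id, samples) for the other-type sound sources.
    Hides sources below the threshold; marks and numbers the rest."""
    kept = [sid for sid, samples in sources if rms_db(samples) >= DECIBEL_THRESHOLD]
    return {sid: f"other-{i + 1}" for i, sid in enumerate(kept)}
```

A quiet source is simply omitted from the returned marking table, which is one straightforward reading of "hidden".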
  • the construction module 5 includes:
  • a positioning unit configured to perform time synchronization based on the acquisition time of the video data and each of the audio data, respectively, and locate the appearance time of each of the sound sources in the video data;
  • a construction unit configured to collect the first motion trajectory of the imaging device through GPS positioning and, using the first motion trajectory as a position reference, construct the second motion trajectory of each of the sound sources relative to the first motion trajectory according to the appearance time and the second relative positional relationship corresponding to each of the sound sources.
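The construction step can be sketched as follows, under two simplifying assumptions of ours: the GPS track has already been projected into a local planar frame, and each second relative positional relationship has been expressed as a planar offset from the camera. All names are illustrative.

```python
import bisect

def camera_position(track, t):
    """Linearly interpolate the camera's first motion trajectory.
    track: time-sorted list of (timestamp, x, y) in a local planar frame."""
    times = [p[0] for p in track]
    i = bisect.bisect_left(times, t)
    if i <= 0:
        return track[0][1], track[0][2]
    if i >= len(track):
        return track[-1][1], track[-1][2]
    (t0, x0, y0), (t1, x1, y1) = track[i - 1], track[i]
    w = (t - t0) / (t1 - t0)
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)

def source_trajectory(track, observations):
    """observations: list of (timestamp, dx, dy), the source's second relative
    position as an offset from the camera at each appearance time.
    Returns the source's second motion trajectory relative to the first."""
    result = []
    for t, dx, dy in observations:
        x, y = camera_position(track, t)
        result.append((t, x + dx, y + dy))
    return result
```

The time-synchronization step of the positioning unit corresponds here to looking up the camera position at each source's appearance time before applying the offset.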
  • the target trajectory calibration device also includes:
  • a recording module 6 configured to construct each of the second motion trajectories with lines of different colors, and record the corresponding relationship between each color and each of the sound sources to form corresponding information;
  • the generating module 7 is configured to generate a trajectory distribution diagram according to the first movement trajectory, the corresponding information and each of the second movement trajectories, and output the trajectory distribution diagram to a display interface.
  • the generating module 7 includes:
  • a marking unit configured to call a three-dimensional map, and mark the first movement track on the three-dimensional map
  • a forming unit configured to mark each of the second motion trajectories on the three-dimensional map using the first motion trajectory as a position reference, and to add, on the three-dimensional map, the corresponding information and the appearance time and end time of each of the second motion trajectories, to form the trajectory distribution diagram;
  • an output unit configured to output the trajectory distribution map to a display interface.
  • each module, unit, and subunit in the target trajectory marking device is used to perform corresponding steps in the above-mentioned video and audio-based target trajectory marking method, and its specific implementation process will not be described in detail here.
  • This embodiment provides a target trajectory marking device based on video and audio, wherein the video is collected by a camera device, the audio is collected by a microphone array, the microphone array is composed of multiple sub-microphones, and the microphone array is deployed on the camera device.
  • the processing system collects video data through a camera device, and collects a plurality of audio data through a microphone array. Then perform VAD algorithm recognition on the sounds contained in each audio data respectively to obtain several sound sources contained in each audio data.
  • the processing system calculates the first relative positional relationship between each sound source and the reference sub-microphone based on the difference between the times at which the two first sub-microphones receive the sound corresponding to the same sound source, and on the deployment positions of the two first sub-microphones.
  • the reference sub-microphone is any one of the two first sub-microphones.
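The patent does not spell out its TDOA variant. Under the common far-field assumption, the delay between the two first sub-microphones fixes the arrival angle of the sound, which can be sketched as follows (our illustration, not the patent's algorithm):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def arrival_angle(delta_t: float, mic_spacing: float) -> float:
    """Far-field TDOA bearing: angle (degrees) between the axis through the
    two first sub-microphones and the direction of the incoming sound,
    from the inter-microphone delay delta_t (seconds)."""
    x = (SPEED_OF_SOUND * delta_t) / mic_spacing
    x = max(-1.0, min(1.0, x))  # clamp against timing noise
    return math.degrees(math.acos(x))

# A source broadside to the pair arrives at both mics simultaneously:
broadside = arrival_angle(0.0, 0.1)        # about 90 degrees
on_axis = arrival_angle(0.1 / 343.0, 0.1)  # about 0 degrees
```

Recovering the first distance as well as the angle requires more than one microphone pair or a near-field model, which is why the patent uses the full array rather than a single pair.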
  • the processing system converts and obtains a second relative positional relationship between each sound source and the imaging device according to the deployment position of the reference sub-microphone on the imaging device and the first relative positional relationship corresponding to each sound source.
  • the processing system constructs the second movement trajectory corresponding to each sound source according to the acquisition time of the video data and each audio data, the first movement trajectory of the camera device, and each second relative positional relationship.
  • the camera device is equipped with a microphone array, and the first relative positional relationship between each sound source and the reference sub-microphone can be calculated from the time differences with which the sound from each sound source arrives at the individual sub-microphones.
  • the deployment position relationship between the reference sub-microphone and the camera device can then be used to determine the positional relationship between the sound source and the camera device. The first motion trajectory of the camera device is used as a position parameter to calibrate the second motion trajectory of each sound source.
  • an embodiment of the present application also provides a computer device, which may be a server, and its internal structure may be as shown in FIG. 4 .
  • the computer device includes a processor, a memory, a network interface and a database connected by a system bus, where the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer programs and databases.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as decibel thresholds.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection.
  • when the processor executes the computer program, the functions of the video- and audio-based target trajectory calibration method in any of the above-mentioned embodiments are realized, where the video is collected by an imaging device, the audio is collected by a microphone array composed of multiple sub-microphones, and the microphone array is deployed on the imaging device.
  • the above-mentioned processor carries out the steps of the above-mentioned video- and audio-based target trajectory calibration method:
  • S1: collecting video data through the camera device, and collecting a plurality of audio data through the microphone array;
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the storage medium may be a non-volatile storage medium or a volatile storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, a video- and audio-based target trajectory calibration method is implemented, where the video is collected by a camera device, the audio is collected by a microphone array composed of a plurality of sub-microphones, and the microphone array is deployed on the imaging device; the method is specifically:
  • S1: collecting video data through the camera device, and collecting a plurality of audio data through the microphone array;
  • Nonvolatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Studio Devices (AREA)

Abstract

Provided by the present application are a target trajectory calibration method based on video and audio, and a computer device. A microphone array is deployed on a camera device, and the positional relationship between a sound source and the camera device is determined by means of the deployment position relationship between a reference sub-microphone and the camera device. Then, a second motion trajectory of each sound source is calibrated by using video data and the collection time of each item of audio data as a time reference and by using a first motion trajectory of the camera device as a position parameter.

Description

Target trajectory calibration method and computer device based on video and audio

Technical Field
The present application relates to the technical field of audio and video processing, and in particular to a video- and audio-based target trajectory calibration method and computer device.
Background Art
In existing position calibration of the sound sources contained in audio and video, the sound source must appear within the field of view of the audio/video, i.e. the camera device must capture the sound source, before the user can manually locate the sound source's position relative to the camera device in the captured audio and video and then derive the sound source's motion trajectory from the camera device's shooting field of view. If the sound source lies outside the shooting field of view of the camera device, then even if the sound emitted by the source is received, the user cannot determine the relative position between the sound source and the camera device, let alone further determine the motion trajectory of the sound source.
Technical Problem
The main purpose of the present application is to provide a video- and audio-based target trajectory calibration method and computer device, aiming to solve the drawback that the motion trajectory of a sound source cannot be determined when the sound source is not within the shooting field of view of the camera device.
Technical Solution
To achieve the above object, in a first aspect, the present application provides a target trajectory calibration method based on video and audio, wherein the video is collected by a camera device, the audio is collected by a microphone array, the microphone array is composed of a plurality of sub-microphones, and the microphone array is deployed on the camera device; the target trajectory calibration method includes:
collecting video data through the camera device, and collecting a plurality of audio data through the microphone array;
performing VAD algorithm recognition on the sound contained in each piece of audio data to obtain several sound sources;
calculating a first relative positional relationship between each of the sound sources and a reference sub-microphone based on the difference between the times at which two first sub-microphones receive the sound corresponding to the same sound source, and on the deployment positions of the two first sub-microphones, the reference sub-microphone being either one of the two first sub-microphones;
converting, according to the deployment position of the reference sub-microphone on the camera device and the first relative positional relationship corresponding to each of the sound sources, to obtain a second relative positional relationship between each of the sound sources and the camera device;
constructing a second motion trajectory corresponding to each of the sound sources according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the camera device, and each of the second relative positional relationships.
In a second aspect, the present application also provides a computer device, including a memory and a processor, the memory storing a computer program, wherein, when the processor executes the computer program, a video- and audio-based target trajectory calibration method is implemented, the video being collected by a camera device, the audio being collected by a microphone array, the microphone array being composed of a plurality of sub-microphones, and the microphone array being deployed on the camera device;
wherein the video- and audio-based target trajectory calibration method includes:
collecting video data through the camera device, and collecting a plurality of audio data through the microphone array;

performing VAD algorithm recognition on the sound contained in each piece of audio data to obtain several sound sources;

calculating a first relative positional relationship between each of the sound sources and a reference sub-microphone based on the difference between the times at which two first sub-microphones receive the sound corresponding to the same sound source, and on the deployment positions of the two first sub-microphones, the reference sub-microphone being either one of the two first sub-microphones;

converting, according to the deployment position of the reference sub-microphone on the camera device and the first relative positional relationship corresponding to each of the sound sources, to obtain a second relative positional relationship between each of the sound sources and the camera device;

constructing a second motion trajectory corresponding to each of the sound sources according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the camera device, and each of the second relative positional relationships.
In a third aspect, the present application also provides a computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, a video- and audio-based target trajectory calibration method is implemented, the video being collected by a camera device, the audio being collected by a microphone array, the microphone array being composed of a plurality of sub-microphones, and the microphone array being deployed on the camera device; the video- and audio-based target trajectory calibration method includes the following steps:
collecting video data through the camera device, and collecting a plurality of audio data through the microphone array;

performing VAD algorithm recognition on the sound contained in each piece of audio data to obtain several sound sources;

calculating a first relative positional relationship between each of the sound sources and a reference sub-microphone based on the difference between the times at which two first sub-microphones receive the sound corresponding to the same sound source, and on the deployment positions of the two first sub-microphones, the reference sub-microphone being either one of the two first sub-microphones;

converting, according to the deployment position of the reference sub-microphone on the camera device and the first relative positional relationship corresponding to each of the sound sources, to obtain a second relative positional relationship between each of the sound sources and the camera device;

constructing a second motion trajectory corresponding to each of the sound sources according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the camera device, and each of the second relative positional relationships.
Beneficial Effects
The present application provides a video- and audio-based target trajectory calibration method and computer device, wherein the video is collected by a camera device, the audio is collected by a microphone array, the microphone array is composed of multiple sub-microphones, and the microphone array is deployed on the camera device. In application, the processing system collects video data through the camera device and collects a plurality of audio data through the microphone array. It then performs VAD algorithm recognition on the sound contained in each piece of audio data to obtain the sound sources contained in each piece of audio data. The processing system calculates the first relative positional relationship between each sound source and the reference sub-microphone based on the difference between the times at which the two first sub-microphones receive the sound corresponding to the same sound source, and on the deployment positions of the two first sub-microphones, the reference sub-microphone being either one of the two first sub-microphones.

The processing system then converts, according to the deployment position of the reference sub-microphone on the camera device and the first relative positional relationship corresponding to each sound source, to obtain a second relative positional relationship between each sound source and the camera device. Finally, the processing system constructs the second motion trajectory corresponding to each sound source according to the acquisition times of the video data and each audio data, the first motion trajectory of the camera device, and each second relative positional relationship. In the present application, a microphone array is deployed on the camera device, and the first relative positional relationship between each sound source and the reference sub-microphone can be calculated from the time differences with which the sound from each sound source arrives at the individual sub-microphones. By means of the deployment position relationship between the reference sub-microphone and the camera device, the second positional relationship of each sound source relative to the camera device is then obtained through position conversion. Therefore, even if a sound source does not appear within the shooting field of view of the camera device, as long as its sound can be received by the microphone array, the positional relationship of the sound source relative to the camera device can be determined from the deployment position relationship between the reference sub-microphone and the camera device. The first motion trajectory of the camera device is then used as a position parameter to calibrate the second motion trajectory of each sound source.
Description of Drawings
Fig. 1 is a schematic diagram of the steps of the video- and audio-based target trajectory calibration method in an embodiment of the present application;

Fig. 2 is a schematic diagram of the distribution of the reference sub-microphone, the center of the field of view of the camera device, and a sound source in an embodiment of the present application;

Fig. 3 is an overall structural block diagram of the video- and audio-based target trajectory calibration device in an embodiment of the present application;

Fig. 4 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
The realization of the objectives, functional features and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Best Mode for Carrying Out the Invention
In order to make the purpose, technical solution and advantages of the present application clearer, the present application is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Referring to Fig. 1, an embodiment of the present application provides a video- and audio-based target trajectory calibration method, wherein the video is collected by a camera device, the audio is collected by a microphone array, the microphone array is composed of a plurality of sub-microphones, and the microphone array is deployed on the camera device; the target trajectory calibration method includes:
S1: collecting video data through the camera device, and collecting a plurality of audio data through the microphone array;

S2: performing VAD algorithm recognition on the sound contained in each piece of audio data to obtain several sound sources;

S3: calculating a first relative positional relationship between each of the sound sources and a reference sub-microphone based on the difference between the times at which two first sub-microphones receive the sound corresponding to the same sound source, and on the deployment positions of the two first sub-microphones, the reference sub-microphone being either one of the two first sub-microphones;

S4: converting, according to the deployment position of the reference sub-microphone on the camera device and the first relative positional relationship corresponding to each of the sound sources, to obtain a second relative positional relationship between each of the sound sources and the camera device;

S5: constructing a second motion trajectory corresponding to each of the sound sources according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the camera device, and each of the second relative positional relationships.
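Step S2 only names "VAD algorithm recognition". One common, minimal energy-based variant can be sketched as follows; this is our illustration, not the patent's specific algorithm, and the frame length and threshold are arbitrary assumptions.

```python
def energy_vad(samples, frame_len=160, threshold=0.01):
    """Mark frames whose mean energy exceeds the threshold as voice-active.
    Returns a list of (start_sample, end_sample) active segments."""
    segments, start = [], None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        energy = sum(s * s for s in samples[i:i + frame_len]) / frame_len
        if energy >= threshold:
            if start is None:
                start = i  # segment opens on the first active frame
        elif start is not None:
            segments.append((start, i))  # segment closes on an inactive frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

In the patent's pipeline, each detected active segment would then be attributed to a sound source; separating overlapping sources within a single channel requires more than this sketch provides.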
In this embodiment, a microphone array composed of multiple sub-microphones is deployed on the camera device. In application, the processing system collects video data through the camera device and a plurality of audio data through the microphone array. The processing system may be the local system of the camera device, directly parsing and processing the collected video and audio data; it may also be a cloud server, to which the video data collected by the camera device and the audio data collected by the microphone array are uploaded via wireless signals (for example WiFi or 4G/5G network signals) for parsing and processing. The processing system performs VAD (Voice Activity Detection) algorithm recognition on the sound contained in each piece of audio data, performing speech recognition on each sound to obtain all sound sources contained in each piece of audio data.

Based on the difference between the times at which the two first sub-microphones receive the sound corresponding to the same sound source, and on the deployment positions of the two first sub-microphones, the processing system calculates the first relative positional relationship between each sound source and the reference sub-microphone through a TDOA (Time Difference of Arrival) positioning algorithm. The first relative positional relationship includes a first distance and a first angle: the first distance characterizes the straight-line distance between the sound source and the reference sub-microphone, and the first angle characterizes the angle between the sound source and the reference sub-microphone relative to the horizontal plane. (Since the reference sub-microphone may be either of the first sub-microphones, choosing a different first sub-microphone as the reference yields a correspondingly different first distance while the first angle is the same; the calculation logic for the first distance is identical and is not detailed here.) The processing system calculates the complement of the first angle and, according to the deployment position of the reference sub-microphone on the camera device, obtains the straight-line distance between the reference sub-microphone and the center of the field of view of the camera device. It then performs the corresponding calculation from the complement of the first angle, the straight-line distance and the first distance to obtain the second distance between the camera device and the sound source. According to the first angle and the first distance, the processing system calculates the vertical distance between the reference sub-microphone and the sound source through the law-of-cosines formula.

Finally, according to this vertical distance and the second distance, the second angle between the camera device and the sound source is again calculated through the law-of-cosines formula. Following the above calculation logic, the processing system calculates the second distance and the second angle corresponding to each sound source and the center of the field of view of the device, thereby generating the second relative positional relationship corresponding to each sound source. The processing system obtains the first motion trajectory of the camera device from a GPS positioning module deployed on the camera device and, according to the second relative positional relationship between each sound source and the camera device, uses the first motion trajectory as a position reference to calibrate and construct the second motion trajectory corresponding to each sound source.
In this embodiment, a microphone array is deployed on the camera device, and the first relative positional relationship between each sound source and the reference sub-microphone can be calculated from the time differences with which the sound from each sound source arrives at the individual sub-microphones. By means of the deployment position relationship between the reference sub-microphone and the camera device, the second positional relationship of each sound source relative to the camera device is then obtained through position conversion. Therefore, even if a sound source does not appear within the shooting field of view of the camera device, as long as its sound can be received by the microphone array, the positional relationship of the sound source relative to the camera device can be determined from the deployment position relationship between the reference sub-microphone and the camera device. Then, with the acquisition times of the video data and each audio data as a time reference and the first motion trajectory of the camera device as a position parameter, the second motion trajectory of each sound source is calibrated.
Further, the first relative positional relationship includes a first distance and a first angle, and the step of converting, according to the deployment position of the reference sub-microphone on the camera device and the first relative positional relationship corresponding to each sound source, to obtain the second relative positional relationship between each sound source and the camera device includes:
S401: Calculate the complementary angle of the first angle;
S402: Retrieve the straight-line distance between the reference sub-microphone and the camera device, and substitute the complementary angle of the first angle, the straight-line distance, and the first distance into the calculation formula to obtain the second distance, where the calculation formula is:
a = √(b² + c² - 2bc·cos β)
where b is the first distance, c is the straight-line distance, β is the complementary angle of the first angle, and a is the second distance, representing the distance between the camera device and the sound source;
S403: According to the first angle and the first distance, calculate the vertical distance between the reference sub-microphone and the sound source by the law of cosines;
S404: According to the second distance and the vertical distance, calculate the second angle between the camera device and the sound source by the law of cosines, where the vertical distance between the camera device and the sound source has the same value as the vertical distance between the reference sub-microphone and the sound source;
S405: Calculate the second distance and the second angle corresponding to each sound source and the camera device according to the above rules, and generate each second relative positional relationship.
In this embodiment, as shown in Figure 2, assume that the center of the camera device's field of view is point A, the reference sub-microphone is point B, and the sound source is point C. Perpendicular lines are drawn through the reference sub-microphone and the sound source, intersecting at point D; triangle BCD is then a right triangle, ∠BDC is a right angle, ∠CBD is the first angle between the sound source and the reference sub-microphone, and side BC is the first distance between the sound source and the reference sub-microphone. In triangle ABC, ∠ABC is the complementary angle of ∠CBD (i.e., the first angle); side AB is the straight-line distance between the reference sub-microphone and the center of the camera device's field of view; and side AC is the second distance between the sound source and the center of the camera device's field of view. Since the values of ∠ABC, side AB, and side BC are known, substituting them into the calculation formula
a = √(b² + c² - 2bc·cos β)
yields the value of side AC, where b is the first distance (side BC), c is the straight-line distance (side AB), β is the complementary angle of the first angle (∠ABC), and a is the second distance (side AC), representing the distance between the camera device and the sound source. In right triangle BCD, since side BC and ∠CBD are known, the value of side BD (the vertical distance between the reference sub-microphone and the sound source) can be calculated by the law of cosines. A line segment is then drawn through the center of the camera device's field of view (point A) perpendicular to side CD, with foot E; triangle ACE is a right triangle, the length of side AE equals that of side BD, ∠CAE is the second angle between the sound source and the camera device, and ∠CEA is a right angle. In right triangle ACE, since the hypotenuse AC and the side AE adjacent to ∠CAE are known, the value of ∠CAE can be calculated by the law of cosines, yielding the second angle between the center of the camera device's field of view and the sound source. The processing system calculates the second distance and second angle between each sound source and the center of the camera device's field of view according to the above rules, and generates the second relative positional relationship corresponding to each sound source from the second distance and second angle.
Further, the step of performing VAD algorithm recognition on the sounds contained in each of the audio data to obtain several sound sources includes:
S201: Perform VAD algorithm recognition on the sounds contained in each of the audio data to obtain several human-voice sound sources and other types of sound sources;
S202: Mark and number each human-voice sound source, perform decibel detection on each of the other types of sound sources, hide the first other-type sound sources whose decibel value is below the decibel threshold, and mark and number the second other-type sound sources whose decibel value is above the decibel threshold.
In this embodiment, the processing system uses the VAD algorithm to recognize the sounds contained in each audio data, obtaining several sound sources, and distinguishes them into human-voice sound sources and other types of sound sources (such as animal sound sources or vehicle sound sources). The processing system marks and numbers each human-voice sound source to tell them apart. Meanwhile, to reduce the amount of subsequent sound source data processing and the complexity of trajectory construction, the processing system screens the other types of sound sources to eliminate those of little practical value. Specifically, the processing system retrieves a preset decibel threshold and detects the decibel value of the sound emitted by each other-type sound source. It compares each decibel value against the threshold, hides or eliminates the other-type sound sources whose decibel value is below the threshold (the first other-type sound sources) so that their sounds receive no further processing (such as subsequent marking and numbering or trajectory construction), and marks and numbers the other-type sound sources whose decibel value is above the threshold (the second other-type sound sources) to distinguish them from one another.
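The screening in step S202 can be sketched as follows. The record layout (`kind` and `db` fields) and the default threshold are assumptions for illustration, not values fixed by the patent:

```python
def screen_sources(sources, db_threshold=40.0):
    """Number every human-voice source; for other-type sources, hide those
    below the decibel threshold and number only those at or above it."""
    kept = []
    next_no = 1
    for src in sources:
        if src["kind"] != "voice" and src["db"] < db_threshold:
            continue  # first other-type source: hidden, no further processing
        kept.append({**src, "number": next_no})
        next_no += 1
    return kept
```

The hidden sources are simply skipped, so they never receive a number and never enter the later trajectory-construction stage.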
Further, the step of marking and numbering the second other-type sound sources whose decibel value is above the decibel threshold includes:
S2021: Input the sound corresponding to each second other-type sound source into a pre-trained sound type recognition model for recognition, obtaining the sound type corresponding to each second other-type sound source;
S2022: Using the sound type as marking information, mark and number each second other-type sound source.
In this embodiment, a pre-trained sound type recognition model is built into the processing system. The model is trained by deep learning using various types of sound (such as cat meows, dog barks, or the sound of moving cars) as training samples (the deep learning training method is the same as in the prior art and is not detailed here), so that it can recognize the type corresponding to each sound. In application, the processing system inputs the sound corresponding to each second other-type sound source into the pre-trained sound type recognition model for processing, and outputs the sound type corresponding to each (for example, the sound of second other-type sound source A corresponds to a cat meow, while that of second other-type sound source B corresponds to a moving car). When marking and numbering each second other-type sound source, the processing system attaches the corresponding sound type as marking information, making it convenient for the user to understand the specific details.
Further, the step of constructing, according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the camera device, and each second relative positional relationship, the second motion trajectory corresponding to each sound source includes:
S501: Perform time synchronization based on the acquisition times of the video data and each of the audio data, and locate the appearance time of each sound source in the video data;
S502: Collect the first motion trajectory of the camera device by GPS positioning and, taking the first motion trajectory as a position reference, construct the second motion trajectory of each sound source relative to the first motion trajectory according to the appearance time and the second relative positional relationship corresponding to each sound source.
In this embodiment, since the video data is shot from beginning to end in actual application, while the sound corresponding to a given sound source may appear during shooting and disappear after a period of time, the processing system takes the acquisition time of the video data as the benchmark and synchronizes the acquisition time of each audio data with it, thereby locating the appearance time of each sound source in the video data (the appearance time includes the moment of appearance, the duration, and the moment of ending). A GPS positioning module is installed on the camera device; through it the processing system obtains the position of the camera device at each acquisition moment during shooting, and forms the first motion trajectory from these positions. On this basis, taking the first motion trajectory of the camera device as the position reference (specifically, the position corresponding to each acquisition moment), the second motion trajectory of each sound source relative to the first motion trajectory is constructed according to the appearance time of each sound source in the video data and its second relative positional relationship with the camera device, thereby calibrating the motion trajectories of sound sources outside the camera device's shooting field of view.
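Step S502 can be sketched as follows. This is a simplified planar sketch: the GPS-to-meters conversion uses a rough equirectangular scale, and the data layouts (timestamp-keyed dictionaries, distance/bearing pairs) are assumptions for illustration; a real system would use a proper map projection:

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # rough equirectangular scale (assumption)

def build_second_trajectory(cam_track, rel_positions):
    """cam_track: {t: (lat, lon)} GPS fixes forming the first trajectory.
    rel_positions: {t: (distance_m, bearing_deg)} second relative position
    of one sound source at the moments it is audible.
    Returns a list of (t, lat, lon) points: the second trajectory anchored
    to the first trajectory as a position reference."""
    points = []
    for t in sorted(rel_positions):
        if t not in cam_track:
            continue  # source audible, but no camera fix at this moment
        lat, lon = cam_track[t]
        d, bearing = rel_positions[t]
        dlat = d * math.cos(math.radians(bearing)) / METERS_PER_DEG_LAT
        dlon = d * math.sin(math.radians(bearing)) / (
            METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
        points.append((t, lat + dlat, lon + dlon))
    return points
```

Because the source trajectory is anchored to the camera's GPS fixes at each synchronized moment, the source's track only exists for the span of its appearance time, matching the appear/persist/disappear behavior described above.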
Further, after the step of constructing, according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the camera device, and each second relative positional relationship, the second motion trajectory corresponding to each sound source, the method includes:
S6: Construct each second motion trajectory with lines of different colors, and record the correspondence between each color and each sound source to form correspondence information;
S7: Generate a trajectory distribution diagram according to the first motion trajectory, the correspondence information, and each second motion trajectory, and output the trajectory distribution diagram to a display interface.
In this embodiment, after generating the second motion trajectory corresponding to each sound source from the first motion trajectory, the processing system draws each second motion trajectory with a line of a different color in order to distinguish them, and records the correspondence between each color and each sound source to form correspondence information; for example, the trajectory line of sound source A's second motion trajectory is red, and that of sound source B is yellow. The processing system generates a trajectory distribution diagram from the first motion trajectory, the second motion trajectories, and the correspondence information (the correspondence information is recorded on the diagram as annotation, making it convenient for the user to look up the sound source of each color), and outputs the diagram to the display interface so that the user can intuitively observe the changes in each second motion trajectory.
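The color correspondence recording in step S6 can be sketched minimally as follows; the palette and the cycling behavior for more sources than colors are assumptions:

```python
PALETTE = ["red", "yellow", "green", "blue", "magenta", "cyan"]

def record_color_correspondence(source_ids):
    """Assign each sound source a distinct line color (cycling through the
    palette if sources outnumber colors) and return the color-to-source
    correspondence information to annotate on the trajectory diagram."""
    return {sid: PALETTE[i % len(PALETTE)] for i, sid in enumerate(source_ids)}
```

The returned mapping is exactly the correspondence information that the trajectory distribution diagram carries as annotation.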
Further, the step of generating a trajectory distribution diagram according to the first motion trajectory, the correspondence information, and each second motion trajectory, and outputting the trajectory distribution diagram to a display interface includes:
S701: Retrieve a three-dimensional map, and mark the first motion trajectory on the three-dimensional map;
S702: Taking the first motion trajectory as a position reference, mark each second motion trajectory on the three-dimensional map, and annotate the three-dimensional map with the correspondence information and the appearance and ending moments of each second motion trajectory, forming the trajectory distribution diagram;
S703: Output the trajectory distribution diagram to a display interface.
In this embodiment, the processing system retrieves a three-dimensional map of the camera device's shooting area (the map may be pre-stored in the processing system's database or downloaded from the network by the processing system) and marks the first motion trajectory on it. Then, taking the first motion trajectory as the position reference, it marks the second motion trajectory corresponding to each sound source on the three-dimensional map according to its appearance time, and annotates the map with the color correspondence information and the appearance and ending moments of each second motion trajectory, forming the trajectory distribution diagram as a whole. Finally, the processing system outputs the trajectory distribution diagram to the display interface, allowing the user to understand the changes in each second motion trajectory more clearly in three dimensions.
Referring to Figure 3, an embodiment of the present application further provides a target trajectory calibration apparatus based on video and audio, where the video is collected by a camera device, the audio is collected by a microphone array, the microphone array is composed of multiple sub-microphones, and the microphone array is deployed on the camera device. The target trajectory calibration apparatus includes:
acquisition module 1, configured to collect video data through the camera device and collect multiple audio data through the microphone array;
recognition module 2, configured to perform VAD algorithm recognition on the sounds contained in each of the audio data to obtain several sound sources;
calculation module 3, configured to calculate the first relative positional relationship between each sound source and a reference sub-microphone based on the difference between the times at which two first sub-microphones receive the sound corresponding to the same sound source and the deployment positions of the two first sub-microphones, the reference sub-microphone being either of the two first sub-microphones;
conversion module 4, configured to convert, according to the deployment position of the reference sub-microphone on the camera device and the first relative positional relationship corresponding to each sound source, to obtain the second relative positional relationship between each sound source and the camera device;
construction module 5, configured to construct the second motion trajectory corresponding to each sound source according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the camera device, and each second relative positional relationship.
Further, the first relative positional relationship includes a first distance and a first angle, and the conversion module 4 includes:
a first calculation unit, configured to calculate the complementary angle of the first angle;
a second calculation unit, configured to retrieve the straight-line distance between the reference sub-microphone and the camera device, and substitute the complementary angle of the first angle, the straight-line distance, and the first distance into the calculation formula to obtain the second distance, where the calculation formula is:
a = √(b² + c² - 2bc·cos β)
where b is the first distance, c is the straight-line distance, β is the complementary angle of the first angle, and a is the second distance, representing the distance between the camera device and the sound source;
a third calculation unit, configured to calculate, according to the first angle and the first distance, the vertical distance between the reference sub-microphone and the sound source by the law of cosines;
a fourth calculation unit, configured to calculate, according to the second distance and the vertical distance, the second angle between the camera device and the sound source by the law of cosines, where the vertical distance between the camera device and the sound source has the same value as the vertical distance between the reference sub-microphone and the sound source;
a generation unit, configured to calculate the second distance and the second angle corresponding to each sound source and the camera device according to the above rules, and generate each second relative positional relationship.
Further, the recognition module 2 includes:
a recognition unit, configured to perform VAD algorithm recognition on the sounds contained in each of the audio data to obtain several human-voice sound sources and other types of sound sources;
a screening unit, configured to mark and number each human-voice sound source, perform decibel detection on each of the other types of sound sources, hide the first other-type sound sources whose decibel value is below the decibel threshold, and mark and number the second other-type sound sources whose decibel value is above the decibel threshold.
Further, the screening unit includes:
a recognition subunit, configured to input the sound corresponding to each second other-type sound source into a pre-trained sound type recognition model for recognition, obtaining the sound type corresponding to each second other-type sound source;
a marking subunit, configured to use the sound type as marking information to mark and number each second other-type sound source.
Further, the construction module 5 includes:
a positioning unit, configured to perform time synchronization based on the acquisition times of the video data and each of the audio data, and locate the appearance time of each sound source in the video data;
a construction unit, configured to collect the first motion trajectory of the camera device by GPS positioning and, taking the first motion trajectory as a position reference, construct the second motion trajectory of each sound source relative to the first motion trajectory according to the appearance time and the second relative positional relationship corresponding to each sound source.
Further, the target trajectory calibration apparatus further includes:
recording module 6, configured to construct each second motion trajectory with lines of different colors, and record the correspondence between each color and each sound source to form correspondence information;
generation module 7, configured to generate a trajectory distribution diagram according to the first motion trajectory, the correspondence information, and each second motion trajectory, and output the trajectory distribution diagram to a display interface.
Further, the generation module 7 includes:
a marking unit, configured to retrieve a three-dimensional map and mark the first motion trajectory on the three-dimensional map;
a forming unit, configured to, taking the first motion trajectory as a position reference, mark each second motion trajectory on the three-dimensional map, and annotate the three-dimensional map with the correspondence information and the appearance and ending moments of each second motion trajectory, forming the trajectory distribution diagram;
an output unit, configured to output the trajectory distribution diagram to a display interface.
In this embodiment, the modules, units, and subunits in the target trajectory calibration apparatus are used to correspondingly perform the steps of the above video- and audio-based target trajectory calibration method; their specific implementation is not detailed here.
This embodiment provides a target trajectory calibration apparatus based on video and audio, where the video is collected by a camera device, the audio is collected by a microphone array, the microphone array is composed of multiple sub-microphones, and the microphone array is deployed on the camera device. In application, the processing system collects video data through the camera device and multiple audio data through the microphone array. It then performs VAD algorithm recognition on the sounds contained in each audio data to obtain the several sound sources each audio data contains. Based on the difference between the times at which two first sub-microphones receive the sound corresponding to the same sound source, and on the deployment positions of the two first sub-microphones, the processing system calculates the first relative positional relationship between each sound source and a reference sub-microphone, the reference sub-microphone being either of the two first sub-microphones. According to the deployment position of the reference sub-microphone on the camera device and the first relative positional relationship corresponding to each sound source, the processing system converts to obtain the second relative positional relationship between each sound source and the camera device. Finally, the processing system constructs the second motion trajectory corresponding to each sound source according to the acquisition times of the video data and each audio data, the first motion trajectory of the camera device, and each second relative positional relationship. In this application, a microphone array is deployed on the camera device, and the first relative positional relationship between each sound source and the reference sub-microphone can be calculated from the time differences with which each sound source's sound reaches the individual sub-microphones. Then, using the deployment relationship between the reference sub-microphone and the camera device, the second positional relationship of each sound source relative to the camera device is obtained through position conversion. Therefore, even if a sound source does not appear within the camera device's shooting field of view, as long as its sound can be received by the microphone array, its position relative to the camera device can be determined through the deployment relationship between the reference sub-microphone and the camera device. The first motion trajectory of the camera device is then taken as the position reference to calibrate the second motion trajectory of each sound source.
Referring to Figure 4, an embodiment of the present application further provides a computer device, which may be a server, whose internal structure may be as shown in Figure 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data such as the decibel threshold. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements the functions of the video- and audio-based target trajectory calibration method of any of the above embodiments, where the video is collected by a camera device, the audio is collected by a microphone array, the microphone array is composed of multiple sub-microphones, and the microphone array is deployed on the camera device.
上述处理器执行上述基于视频和音频的目标轨迹标定方法的步骤:Above-mentioned processor carries out the step of above-mentioned method based on target track marking of video and audio frequency:
S1:通过所述摄像设备采集视频数据,并通过所述麦克风阵列采集多个音频数据;S1: collecting video data by the camera device, and collecting a plurality of audio data by the microphone array;
S2:分别对各所述音频数据所包含的声音做VAD算法识别,得到若干个声源;S2: do VAD algorithm recognition to the sound contained in each described audio data respectively, obtain several sound sources;
S3:基于两个第一子麦克风对相同的所述声源对应的声音的接收时间之差,以及两个所述第一子麦克风之间的部署位置,计算得到各所述声源与基准子麦克风之间的第一相对位置关系,所述基准子麦克风为两个所述第一子麦克风中的任意一个;S3: Based on the difference between the receiving time of the sound corresponding to the same sound source of the two first sub-microphones, and the deployment position between the two first sub-microphones, calculate the relationship between each of the sound sources and the reference sub-microphone A first relative positional relationship between microphones, the reference sub-microphone being any one of the two first sub-microphones;
S4:根据所述基准子麦克风在所述摄像设备上的部署位置,以及各所述声源分别对应的所述第一相对位置关系,转换得到各所述声源与所述摄像设备之间的第二相对位置关系;S4: According to the deployment position of the reference sub-microphone on the imaging device, and the first relative positional relationship corresponding to each of the sound sources, convert and obtain the distance between each of the sound sources and the imaging device The second relative positional relationship;
S5:根据视频数据与各音频数据的采集时间,摄像设备的第一运动轨迹,以及各第二相对位置关系,构建各声源分别对应的第二运动轨迹。S5: According to the acquisition time of the video data and each audio data, the first motion trajectory of the camera device, and the second relative positional relationship, construct the second motion trajectory corresponding to each sound source.
An embodiment of the present application further provides a computer-readable storage medium, which may be a non-volatile storage medium or a volatile storage medium, on which a computer program is stored. When executed by a processor, the computer program implements the video- and audio-based target trajectory calibration method of any of the above embodiments, wherein the video is captured by an imaging device, the audio is captured by a microphone array, the microphone array is composed of a plurality of sub-microphones, and the microphone array is deployed on the imaging device. The method is specifically as follows:
S1: collecting video data via the imaging device, and collecting a plurality of audio data via the microphone array;
S2: performing VAD (voice activity detection) recognition on the sounds contained in each of the audio data, obtaining several sound sources;
S3: based on the difference between the times at which two first sub-microphones receive the sound corresponding to the same sound source, and the deployment positions of the two first sub-microphones, calculating a first relative positional relationship between each of the sound sources and a reference sub-microphone, the reference sub-microphone being either one of the two first sub-microphones;
S4: according to the deployment position of the reference sub-microphone on the imaging device and the first relative positional relationship corresponding to each of the sound sources, converting to obtain a second relative positional relationship between each of the sound sources and the imaging device;
S5: according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the imaging device, and each of the second relative positional relationships, constructing a second motion trajectory corresponding to each of the sound sources.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes that element.
The above are merely preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A video- and audio-based target trajectory calibration method, characterized in that the video is captured by an imaging device, the audio is captured by a microphone array, the microphone array is composed of a plurality of sub-microphones, and the microphone array is deployed on the imaging device; the target trajectory calibration method comprises:
    collecting video data via the imaging device, and collecting a plurality of audio data via the microphone array;
    performing VAD (voice activity detection) recognition on the sounds contained in each of the audio data, obtaining several sound sources;
    based on the difference between the times at which two first sub-microphones receive the sound corresponding to the same sound source, and the deployment positions of the two first sub-microphones, calculating a first relative positional relationship between each of the sound sources and a reference sub-microphone, the reference sub-microphone being either one of the two first sub-microphones;
    according to the deployment position of the reference sub-microphone on the imaging device and the first relative positional relationship corresponding to each of the sound sources, converting to obtain a second relative positional relationship between each of the sound sources and the imaging device;
    according to the acquisition times of the video data and each of the audio data, a first motion trajectory of the imaging device, and each of the second relative positional relationships, constructing a second motion trajectory corresponding to each of the sound sources.
  2. The video- and audio-based target trajectory calibration method according to claim 1, characterized in that the first relative positional relationship includes a first distance and a first angle, and the step of converting, according to the deployment position of the reference sub-microphone on the imaging device and the first relative positional relationship corresponding to each of the sound sources, to obtain the second relative positional relationship between each of the sound sources and the imaging device comprises:
    calculating the complement of the first angle;
    retrieving the straight-line distance between the reference sub-microphone and the imaging device, and substituting the complement of the first angle, the straight-line distance, and the first distance into a calculation formula to obtain a second distance, wherein the calculation formula is:
    a = √(b² + c² - 2·b·c·cos β)
    where b is the first distance, c is the straight-line distance, β is the complement of the first angle, and a is the second distance, representing the distance between the imaging device and the sound source;
    according to the first angle and the first distance, calculating the vertical distance between the reference sub-microphone and the sound source via the law-of-cosines formula;
    according to the second distance and the vertical distance, calculating a second angle between the imaging device and the sound source via the law-of-cosines formula, wherein the vertical distance between the imaging device and the sound source has the same value as the vertical distance between the reference sub-microphone and the sound source;
    calculating, according to the above rules, the second distance and the second angle respectively corresponding to each of the sound sources and the imaging device, and generating each of the second relative positional relationships.
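The conversion steps above can be sketched as follows. The geometric convention (the complement β of the first angle taken as the angle between side b, microphone-to-source, and side c, the microphone-camera baseline, with the second angle recovered from the shared perpendicular distance) is an assumption made for illustration, and the perpendicular distance is computed here with basic trigonometry rather than a literal law-of-cosines step.

```python
import math

def to_camera_frame(first_distance, first_angle_deg, mic_to_camera):
    """Convert a source position known relative to the reference sub-microphone
    (first distance b, first angle) into a second distance and second angle
    relative to the imaging device. beta, the complement of the first angle,
    is taken as the angle between side b (microphone-to-source) and side c
    (microphone-to-camera baseline), which is an assumed convention.
    """
    b = first_distance
    c = mic_to_camera
    beta = math.radians(90.0 - first_angle_deg)  # complement of the first angle
    # Second distance from the law of cosines: a^2 = b^2 + c^2 - 2bc cos(beta).
    a = math.sqrt(b * b + c * c - 2.0 * b * c * math.cos(beta))
    # The perpendicular ("vertical") distance from the source to the baseline
    # is the same seen from the microphone and from the camera.
    h = b * math.sin(beta)
    # Second angle at the camera, recovered from the shared perpendicular.
    second_angle_deg = math.degrees(math.asin(max(-1.0, min(1.0, h / a))))
    return a, second_angle_deg
```

With a zero baseline (c = 0) the two frames coincide: `to_camera_frame(2.0, 30.0, 0.0)` returns the original 2.0 m distance together with the 60-degree complement, which is a quick sanity check on the convention.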
  3. The video- and audio-based target trajectory calibration method according to claim 1, characterized in that the step of performing VAD recognition on the sounds contained in each of the audio data to obtain several sound sources comprises:
    performing VAD recognition on the sounds contained in each of the audio data, obtaining several human-voice sound sources and other-type sound sources;
    marking and numbering each of the human-voice sound sources, detecting the decibel value of each of the other-type sound sources, hiding the first other-type sound sources whose decibel value is below a decibel threshold, and marking and numbering the second other-type sound sources whose decibel value is above the decibel threshold.
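A minimal sketch of this labeling and filtering step follows. The source representation (a dict with a kind flag and raw samples), the RMS-to-dB conversion offset, and the 40 dB default threshold are all illustrative assumptions.

```python
import math

def label_sources(sources, db_threshold=40.0):
    """Mark human-voice sources, hide quiet other-type sources, and number the
    loud ones. Each source is a dict with 'kind' ('voice' or 'other') and
    'samples' (floats in [-1, 1]); this representation, the dB offset, and
    the threshold are illustrative assumptions.
    """
    labeled, hidden, counter = [], [], 0
    for src in sources:
        if src["kind"] == "voice":
            counter += 1
            labeled.append({**src, "tag": f"voice-{counter}"})
            continue
        rms = math.sqrt(sum(x * x for x in src["samples"]) / len(src["samples"]))
        db = 20.0 * math.log10(max(rms, 1e-12)) + 94.0  # assumed full-scale offset
        if db < db_threshold:
            hidden.append(src)  # below the decibel threshold: hide
        else:
            counter += 1
            labeled.append({**src, "tag": f"other-{counter}"})
    return labeled, hidden
```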
  4. The video- and audio-based target trajectory calibration method according to claim 3, characterized in that the step of marking and numbering the second other-type sound sources whose decibel value is above the decibel threshold comprises:
    inputting the sound corresponding to each of the second other-type sound sources into a pre-trained sound type recognition model for recognition, obtaining the sound type corresponding to each of the second other-type sound sources;
    using the sound type as marking information, marking and numbering each of the second other-type sound sources.
  5. The video- and audio-based target trajectory calibration method according to claim 1, characterized in that the step of constructing, according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the imaging device, and each of the second relative positional relationships, the second motion trajectory corresponding to each of the sound sources comprises:
    performing time synchronization based on the acquisition moments of the video data and of each of the audio data, and locating the appearance time of each of the sound sources in the video data;
    collecting the first motion trajectory of the imaging device via GPS positioning, and, using the first motion trajectory as a position reference, constructing the second motion trajectory of each of the sound sources relative to the first motion trajectory according to the appearance time corresponding to each of the sound sources and each of the second relative positional relationships.
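Constructing a second motion trajectory against the GPS-derived first trajectory can be sketched as follows, assuming a shared clock, flat two-dimensional geometry, and nearest-sample (non-interpolated) alignment of camera positions; none of these simplifications are mandated by the claim.

```python
import bisect
import math

def build_second_trajectory(camera_track, observations):
    """Place one sound source's observations relative to the camera's
    GPS-derived first trajectory. camera_track is a time-sorted list of
    (t, x, y) samples; observations is a list of (t, distance, angle_deg)
    entries for the source relative to the camera. A shared clock, flat 2-D
    geometry, and nearest-sample alignment are illustrative assumptions.
    """
    times = [t for t, _, _ in camera_track]
    trajectory = []
    for t, dist, angle_deg in observations:
        # Pick the first camera sample at or after the observation time
        # (no interpolation in this sketch).
        i = min(bisect.bisect_left(times, t), len(times) - 1)
        _, cx, cy = camera_track[i]
        rad = math.radians(angle_deg)
        trajectory.append((t, cx + dist * math.cos(rad), cy + dist * math.sin(rad)))
    return trajectory
```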
  6. The video- and audio-based target trajectory calibration method according to claim 1, characterized in that, after the step of constructing, according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the imaging device, and each of the second relative positional relationships, the second motion trajectory corresponding to each of the sound sources, the method comprises:
    constructing each of the second motion trajectories with lines of different colors, and recording the correspondence between each color and each of the sound sources to form correspondence information;
    generating a trajectory distribution map according to the first motion trajectory, the correspondence information, and each of the second motion trajectories, and outputting the trajectory distribution map to a display interface.
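The color-to-source correspondence information of this step can be sketched as a simple mapping; the palette and the cycling policy are assumptions made for illustration.

```python
def assign_colors(source_ids, palette=("red", "green", "blue", "orange", "purple")):
    """Record the color-to-source correspondence used to draw each second
    motion trajectory; the palette and the cycling policy are assumptions."""
    return {sid: palette[i % len(palette)] for i, sid in enumerate(source_ids)}
```

For example, `assign_colors(["voice-1", "other-2"])` maps `voice-1` to `red` and `other-2` to `green`; with more sources than palette entries the colors repeat cyclically.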
  7. The video- and audio-based target trajectory calibration method according to claim 6, characterized in that the step of generating the trajectory distribution map according to the first motion trajectory, the correspondence information, and each of the second motion trajectories, and outputting the trajectory distribution map to the display interface comprises:
    retrieving a three-dimensional map, and marking the first motion trajectory on the three-dimensional map;
    using the first motion trajectory as a position reference, marking each of the second motion trajectories on the three-dimensional map, and annotating the three-dimensional map with the correspondence information and the appearance and end moments of each of the second motion trajectories, forming the trajectory distribution map;
    outputting the trajectory distribution map to the display interface.
  8. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements a video- and audio-based target trajectory calibration method, wherein the video is captured by an imaging device, the audio is captured by a microphone array, the microphone array is composed of a plurality of sub-microphones, and the microphone array is deployed on the imaging device;
    wherein the video- and audio-based target trajectory calibration method comprises:
    collecting video data via the imaging device, and collecting a plurality of audio data via the microphone array;
    performing VAD (voice activity detection) recognition on the sounds contained in each of the audio data, obtaining several sound sources;
    based on the difference between the times at which two first sub-microphones receive the sound corresponding to the same sound source, and the deployment positions of the two first sub-microphones, calculating a first relative positional relationship between each of the sound sources and a reference sub-microphone, the reference sub-microphone being either one of the two first sub-microphones;
    according to the deployment position of the reference sub-microphone on the imaging device and the first relative positional relationship corresponding to each of the sound sources, converting to obtain a second relative positional relationship between each of the sound sources and the imaging device;
    according to the acquisition times of the video data and each of the audio data, a first motion trajectory of the imaging device, and each of the second relative positional relationships, constructing a second motion trajectory corresponding to each of the sound sources.
  9. The computer device according to claim 8, characterized in that the first relative positional relationship includes a first distance and a first angle, and the step of converting, according to the deployment position of the reference sub-microphone on the imaging device and the first relative positional relationship corresponding to each of the sound sources, to obtain the second relative positional relationship between each of the sound sources and the imaging device comprises:
    calculating the complement of the first angle;
    retrieving the straight-line distance between the reference sub-microphone and the imaging device, and substituting the complement of the first angle, the straight-line distance, and the first distance into a calculation formula to obtain a second distance, wherein the calculation formula is:
    a = √(b² + c² - 2·b·c·cos β)
    where b is the first distance, c is the straight-line distance, β is the complement of the first angle, and a is the second distance, representing the distance between the imaging device and the sound source;
    according to the first angle and the first distance, calculating the vertical distance between the reference sub-microphone and the sound source via the law-of-cosines formula;
    according to the second distance and the vertical distance, calculating a second angle between the imaging device and the sound source via the law-of-cosines formula, wherein the vertical distance between the imaging device and the sound source has the same value as the vertical distance between the reference sub-microphone and the sound source;
    calculating, according to the above rules, the second distance and the second angle respectively corresponding to each of the sound sources and the imaging device, and generating each of the second relative positional relationships.
  10. The computer device according to claim 8, characterized in that the step of performing VAD recognition on the sounds contained in each of the audio data to obtain several sound sources comprises:
    performing VAD recognition on the sounds contained in each of the audio data, obtaining several human-voice sound sources and other-type sound sources;
    marking and numbering each of the human-voice sound sources, detecting the decibel value of each of the other-type sound sources, hiding the first other-type sound sources whose decibel value is below a decibel threshold, and marking and numbering the second other-type sound sources whose decibel value is above the decibel threshold.
  11. The computer device according to claim 10, characterized in that the step of marking and numbering the second other-type sound sources whose decibel value is above the decibel threshold comprises:
    inputting the sound corresponding to each of the second other-type sound sources into a pre-trained sound type recognition model for recognition, obtaining the sound type corresponding to each of the second other-type sound sources;
    using the sound type as marking information, marking and numbering each of the second other-type sound sources.
  12. The computer device according to claim 8, characterized in that the step of constructing, according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the imaging device, and each of the second relative positional relationships, the second motion trajectory corresponding to each of the sound sources comprises:
    performing time synchronization based on the acquisition moments of the video data and of each of the audio data, and locating the appearance time of each of the sound sources in the video data;
    collecting the first motion trajectory of the imaging device via GPS positioning, and, using the first motion trajectory as a position reference, constructing the second motion trajectory of each of the sound sources relative to the first motion trajectory according to the appearance time corresponding to each of the sound sources and each of the second relative positional relationships.
  13. The computer device according to claim 8, characterized in that, after the step of constructing, according to the acquisition times of the video data and each of the audio data, the first motion trajectory of the imaging device, and each of the second relative positional relationships, the second motion trajectory corresponding to each of the sound sources, the method comprises:
    constructing each of the second motion trajectories with lines of different colors, and recording the correspondence between each color and each of the sound sources to form correspondence information;
    generating a trajectory distribution map according to the first motion trajectory, the correspondence information, and each of the second motion trajectories, and outputting the trajectory distribution map to a display interface.
  14. The computer device according to claim 13, characterized in that the step of generating the trajectory distribution map according to the first motion trajectory, the correspondence information, and each of the second motion trajectories, and outputting the trajectory distribution map to the display interface comprises:
    retrieving a three-dimensional map, and marking the first motion trajectory on the three-dimensional map;
    using the first motion trajectory as a position reference, marking each of the second motion trajectories on the three-dimensional map, and annotating the three-dimensional map with the correspondence information and the appearance and end moments of each of the second motion trajectories, forming the trajectory distribution map;
    outputting the trajectory distribution map to the display interface.
  15. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, it implements a video- and audio-based target trajectory calibration method, wherein the video is captured by an imaging device, the audio is captured by a microphone array, the microphone array is composed of a plurality of sub-microphones, and the microphone array is deployed on the imaging device; the video- and audio-based target trajectory calibration method comprises the following steps:
    collecting video data via the imaging device, and collecting a plurality of audio data via the microphone array;
    performing VAD (voice activity detection) recognition on the sounds contained in each of the audio data, obtaining several sound sources;
    based on the difference between the times at which two first sub-microphones receive the sound corresponding to the same sound source, and the deployment positions of the two first sub-microphones, calculating a first relative positional relationship between each of the sound sources and a reference sub-microphone, the reference sub-microphone being either one of the two first sub-microphones;
    according to the deployment position of the reference sub-microphone on the imaging device and the first relative positional relationship corresponding to each of the sound sources, converting to obtain a second relative positional relationship between each of the sound sources and the imaging device;
    according to the acquisition times of the video data and each of the audio data, a first motion trajectory of the imaging device, and each of the second relative positional relationships, constructing a second motion trajectory corresponding to each of the sound sources.
  16. The computer-readable storage medium according to claim 15, characterized in that the first relative positional relationship includes a first distance and a first angle, and the step of converting, according to the deployment position of the reference sub-microphone on the imaging device and the first relative positional relationship corresponding to each of the sound sources, to obtain the second relative positional relationship between each of the sound sources and the imaging device comprises:
    calculating the complement of the first angle;
    retrieving the straight-line distance between the reference sub-microphone and the imaging device, and substituting the complement of the first angle, the straight-line distance, and the first distance into a calculation formula to obtain a second distance, wherein the calculation formula is:
    a = √(b² + c² - 2·b·c·cos β)
    where b is the first distance, c is the straight-line distance, β is the complement of the first angle, and a is the second distance, representing the distance between the imaging device and the sound source;
    according to the first angle and the first distance, calculating the vertical distance between the reference sub-microphone and the sound source via the law-of-cosines formula;
    according to the second distance and the vertical distance, calculating a second angle between the imaging device and the sound source via the law-of-cosines formula, wherein the vertical distance between the imaging device and the sound source has the same value as the vertical distance between the reference sub-microphone and the sound source;
    calculating, according to the above rules, the second distance and the second angle respectively corresponding to each of the sound sources and the imaging device, and generating each of the second relative positional relationships.
  17. The computer-readable storage medium according to claim 15, characterized in that the step of performing VAD recognition on the sounds contained in each of the audio data to obtain several sound sources comprises:
    performing VAD recognition on the sounds contained in each of the audio data, obtaining several human-voice sound sources and other-type sound sources;
    marking and numbering each of the human-voice sound sources, detecting the decibel value of each of the other-type sound sources, hiding the first other-type sound sources whose decibel value is below a decibel threshold, and marking and numbering the second other-type sound sources whose decibel value is above the decibel threshold.
  18. The computer-readable storage medium according to claim 17, wherein the step of marking and numbering the second other-type sound sources whose decibel values are above the decibel threshold comprises:
    inputting the sounds respectively corresponding to the second other-type sound sources into a pre-trained sound-type recognition model for recognition, to obtain the sound types respectively corresponding to the second other-type sound sources;
    using the sound types as marking information to mark and number each of the second other-type sound sources.
  19. The computer-readable storage medium according to claim 15, wherein the step of constructing, according to the collection times of the video data and each of the audio data, the first motion trajectory of the imaging device, and each of the second relative positional relationships, the second motion trajectories respectively corresponding to the sound sources comprises:
    performing time synchronization based on the collection moments of the video data and each of the audio data respectively, and locating the appearance time of each of the sound sources in the video data;
    collecting the first motion trajectory of the imaging device by a GPS positioning method, and, using the first motion trajectory as a position reference, constructing the second motion trajectory of each sound source relative to the first motion trajectory according to the appearance time corresponding to each sound source and each of the second relative positional relationships.
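The construction in this claim can be sketched as polar offsets applied to the camera's GPS track at each synchronized appearance time. The flat local coordinate frame, the bearing convention, and the data layout are illustrative assumptions, not the patent's implementation.

```python
import math

def source_track(camera_track, appearances):
    """Build one sound source's trajectory relative to the camera's track.

    camera_track: dict mapping timestamp -> (x, y) camera position,
        i.e. the first motion trajectory sampled by GPS.
    appearances: list of (timestamp, distance, bearing_deg) for one
        source -- its second relative position at each appearance time.
    Returns the source's second motion trajectory as (timestamp, x, y).
    """
    track = []
    for t, dist, bearing in appearances:
        cx, cy = camera_track[t]            # camera position at that moment
        rad = math.radians(bearing)
        track.append((t, cx + dist * math.cos(rad),
                         cy + dist * math.sin(rad)))
    return track
```

Running this once per numbered sound source, with its own appearance times, yields the full set of second motion trajectories anchored to the first.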
  20. The computer-readable storage medium according to claim 15, wherein, after the step of constructing, according to the collection times of the video data and each of the audio data, the first motion trajectory of the imaging device, and each of the second relative positional relationships, the second motion trajectories respectively corresponding to the sound sources, the method comprises:
    drawing each of the second motion trajectories with lines of different colors, and recording the correspondence between each color and each of the sound sources to form correspondence information;
    generating a trajectory distribution map according to the first motion trajectory, the correspondence information and each of the second motion trajectories, and outputting the trajectory distribution map to a display interface.
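A minimal sketch of the color-assignment and map-assembly step above. The fixed palette, the extra 'camera' layer, and the returned data layout are illustrative assumptions; a real display interface would consume these layers however it sees fit.

```python
def build_distribution_map(camera_track, source_tracks,
                           palette=('red', 'green', 'blue', 'orange')):
    """Bundle the first motion trajectory and colored second trajectories.

    source_tracks: dict mapping source id -> list of (t, x, y) points.
    Returns (correspondence, layers): correspondence records which color
    denotes which sound source, as the claim requires; layers is an
    ordered list of (label, color, points) ready for rendering.
    """
    correspondence = {sid: palette[i % len(palette)]
                      for i, sid in enumerate(sorted(source_tracks))}
    layers = [('camera', 'black', camera_track)]
    layers += [(sid, correspondence[sid], trk)
               for sid, trk in sorted(source_tracks.items())]
    return correspondence, layers
```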
PCT/CN2021/111895 2021-08-04 2021-08-10 Target trajectory calibration method based on video and audio, and computer device WO2023010599A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110891951.2A CN113794830A (en) 2021-08-04 2021-08-04 Target track calibration method and device based on video and audio and computer equipment
CN202110891951.2 2021-08-04

Publications (1)

Publication Number Publication Date
WO2023010599A1 2023-02-09

Family

ID=79181397

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/111895 WO2023010599A1 (en) 2021-08-04 2021-08-10 Target trajectory calibration method based on video and audio, and computer device

Country Status (2)

Country Link
CN (1) CN113794830A (en)
WO (1) WO2023010599A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101295015A (en) * 2007-04-23 2008-10-29 财团法人工业技术研究院 Sound source locating system and method
CN101567969A (en) * 2009-05-21 2009-10-28 上海交通大学 Intelligent video director method based on microphone array sound guidance
JP2010232888A (en) * 2009-03-26 2010-10-14 Ikegami Tsushinki Co Ltd Monitor device
CN109118610A (en) * 2018-08-17 2019-01-01 北京云鸟科技有限公司 A kind of track inspection method and device
CN111145736A (en) * 2019-12-09 2020-05-12 华为技术有限公司 Speech recognition method and related equipment
CN112261361A (en) * 2020-09-25 2021-01-22 江苏聆世科技有限公司 Microphone array and dome camera linked abnormal sound source monitoring method and system
CN112492207A (en) * 2020-11-30 2021-03-12 深圳卡多希科技有限公司 Method and device for controlling rotation of camera based on sound source positioning
CN112995566A (en) * 2019-12-17 2021-06-18 佛山市云米电器科技有限公司 Sound source positioning method based on display equipment, display equipment and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
NO318096B1 (en) * 2003-05-08 2005-01-31 Tandberg Telecom As Audio source location and method
CN101009775A (en) * 2006-01-23 2007-08-01 株式会社理光 Imaging apparatus, position information recording method and computer programme product
CN100556151C (en) * 2006-12-30 2009-10-28 华为技术有限公司 A kind of video terminal and a kind of audio code stream processing method
JP2016109971A (en) * 2014-12-09 2016-06-20 キヤノン株式会社 Signal processing system and control method of signal processing system
CN107677992B (en) * 2017-09-30 2021-06-22 深圳市沃特沃德股份有限公司 Movement detection method and device and monitoring equipment

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN116518989A (en) * 2023-07-05 2023-08-01 新唐信通(浙江)科技有限公司 Method for vehicle navigation based on sound and thermal imaging
CN116518989B (en) * 2023-07-05 2023-09-12 新唐信通(浙江)科技有限公司 Method for vehicle navigation based on sound and thermal imaging

Also Published As

Publication number Publication date
CN113794830A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
Pérez‐Granados et al. Estimating bird density using passive acoustic monitoring: a review of methods and suggestions for further research
Liu et al. End-to-end trajectory transportation mode classification using Bi-LSTM recurrent neural network
US8676728B1 (en) Sound localization with artificial neural network
US9171548B2 (en) Methods and systems for speaker identity verification
CN108089152B (en) Equipment control method, device and system
CN105979442B (en) Noise suppressing method, device and movable equipment
US20180018970A1 (en) Neural network for recognition of signals in multiple sensory domains
KR102230667B1 (en) Method and apparatus for speaker diarisation based on audio-visual data
WO2020000697A1 (en) Behavior recognition method and apparatus, computer device, and storage medium
CN107949866A (en) Image processing apparatus, image processing system and image processing method
CN108229441A (en) A kind of classroom instruction automatic feedback system and feedback method based on image and speech analysis
EP0947161A2 (en) Measurement and validation of interaction and communication
WO2022179453A1 (en) Sound recording method and related device
WO2016119107A1 (en) Noise map drawing method and apparatus
JP2017207877A (en) Behavioral analysis device and program
WO2023010599A1 (en) Target trajectory calibration method based on video and audio, and computer device
KR101884446B1 (en) Speaker identification and speaker tracking method for Multilateral conference environment
CN113759938B (en) Unmanned vehicle path planning quality evaluation method and system
US20200333429A1 (en) Sonic pole position triangulation in a lighting system
Stattner et al. Acoustic scheme to count bird songs with wireless sensor networks
Kojima et al. HARK-Bird-Box: A portable real-time bird song scene analysis system
CN111273232B (en) Indoor abnormal condition judging method and system
Huetz et al. Bioacoustics approaches to locate and identify animals in terrestrial environments
Sturley et al. PANDI: a hybrid open source edge-based system for environmental and real-time passive acoustic monitoring-prototype design and development
CN109117765A (en) Video investigation device and method

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE