WO2018210192A1 - Unmanned aerial vehicle monitoring method and audio/video linkage apparatus - Google Patents

Publication number
WO2018210192A1
WO2018210192A1 PCT/CN2018/086565 CN2018086565W WO2018210192A1 WO 2018210192 A1 WO2018210192 A1 WO 2018210192A1 CN 2018086565 W CN2018086565 W CN 2018086565W WO 2018210192 A1 WO2018210192 A1 WO 2018210192A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio collection
target object
sound component
sound
preset
Prior art date
Application number
PCT/CN2018/086565
Other languages
French (fr)
Chinese (zh)
Inventor
何赛娟
陈扬坤
陈展
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司
Publication of WO2018210192A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/183 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a single remote source
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • the present application relates to the field of security monitoring technologies, and in particular, to a drone monitoring method and an audio and video linkage device.
  • the drone is a short name for a class of powered, unmanned, reusable aircraft. It is small in size and low in cost, and can be equipped with telemetry equipment such as sensors and cameras. Since drones can take pictures of the ground from the air, drones are often used to monitor the ground.
  • the purpose of the embodiment of the present application is to provide a UAV monitoring method and an audio and video linkage device to implement monitoring of the UAV, thereby ensuring social and personal security.
  • the specific technical solutions are as follows:
  • an embodiment of the present application provides a UAV monitoring method, where the method includes:
  • the signal feature includes: a spectrum feature
  • Performing signal processing on the sound signal to obtain a sound component in which at least one of the sound signals meets a preset condition including:
  • a sound component in which the spectral features of all sound components satisfy the preset spectral features is obtained.
  • the acquiring the energy collected by each audio collection device in the audio collection array for each of the at least one sound component comprises:
  • the sound component collected by each audio collection device in the audio collection array is subjected to frame processing in a preset period of the current time, and the sound component corresponding to each audio collection device is correspondingly obtained.
  • the frequency domain signal of the sound component collected by each audio collection device in the audio collection array is fused to obtain a frequency characteristic, and the method includes:
  • the frequency domain signals of the frame signals corresponding to the sound components collected by the audio collection devices whose energy is greater than the preset threshold are fused to obtain the spectral features of the sound components.
  • before the extracting of the frame signal corresponding to the sound component collected by the audio collection devices whose energy is greater than the preset threshold, the method further includes:
  • the method further includes:
  • the determining, by the sound component that the signal feature meets the preset condition, the location of the target object includes:
  • Determining a location of the target object according to a position of each of the audio collection devices in the audio collection array relative to the camera and a position of the target object relative to each of the audio collection devices.
  • the method further includes:
  • Determining a location of the target object according to a position of each of the audio collection devices in the audio collection array relative to the camera and a position of the target object relative to each of the audio collection devices including:
  • the location of the target object is determined according to the position of each audio collection device relative to the camera and the position of the target object relative to each pair of audio collection devices.
  • determining the location of the target object according to a location of each audio collection device relative to the camera and a location of the target object relative to each pair of audio collection device including:
  • the method further includes:
  • after determining, according to the image captured by the camera, whether the target object is a drone, the method further includes:
  • An alarm enhancement command is generated when the target object is a drone.
  • the method further includes:
  • the camera is controlled to perform tracking shooting on the drone.
  • an audio and video linkage device where the device includes:
  • An audio collection array for collecting sound signals
  • a processor configured to perform signal processing on the sound signal collected by the audio collection array, to obtain at least one sound component of the sound signal whose signal feature meets a preset condition; determine the position of the target object corresponding to the sound component whose signal feature meets the preset condition; control the camera to aim at the position of the target object; and determine whether the target object is a drone according to an image captured by the camera;
  • the housing assembly includes: a rotating member and a base, wherein the rotating member is configured to provide rotational kinetic energy to the camera, and the base is configured to fix a bottom of the camera, the audio collection array, and the processor.
  • the signal feature includes: a spectrum feature
  • the processor is specifically configured to:
  • a sound component in which the spectral features of all sound components satisfy the preset spectral features is obtained.
  • the processor is specifically configured to:
  • the sound component collected by each audio collection device in the audio collection array is subjected to frame processing in a preset period of the current time, and the sound component corresponding to each audio collection device is correspondingly obtained.
  • the frequency domain signals of the frame signals corresponding to the sound components collected by the audio collection devices whose energy is greater than the preset threshold are fused to obtain the spectral features of the sound components.
  • the processor is specifically configured to:
  • the processor is specifically configured to:
  • the sound component is determined to be noise.
  • the processor is specifically configured to:
  • Determining a location of the target object according to a position of each of the audio collection devices in the audio collection array relative to the camera and a position of the target object relative to each of the audio collection devices.
  • the processor is specifically configured to:
  • the location of the target object is determined according to the position of each audio collection device relative to the camera and the position of the target object relative to each pair of audio collection devices.
  • the processor is specifically configured to:
  • the device further includes:
  • the alarm component is used to perform an alarm operation when an alarm command is received.
  • the processor is specifically configured to:
  • the processor is specifically configured to:
  • the camera is controlled to perform tracking shooting on the drone.
  • an embodiment of the present application provides a storage medium for storing executable code, where the executable code is used to execute at a runtime: the method steps provided by the first aspect of the embodiment of the present application.
  • an embodiment of the present application provides an application program for performing, at runtime, the method steps provided by the first aspect of the embodiment of the present application.
  • the UAV monitoring method and the audio and video linkage device provided by the embodiment of the present application process the sound signal collected by the audio collection array to determine the location of the target object corresponding to the sound component whose signal feature meets the preset condition. According to the position of the target object, the camera is controlled to aim at the target object, and whether the target object is a drone is judged according to the captured image, thereby realizing the monitoring of the drone and ensuring the accuracy of the monitoring.
  • FIG. 1 is a schematic flow chart of a method for monitoring a drone according to an embodiment of the present application
  • FIG. 2 is a schematic flow chart of a method for monitoring a drone according to another embodiment of the present application.
  • FIG. 3 is a schematic flow chart of a method for monitoring a drone according to still another embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an audio-video linkage device according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an audio-video linkage device according to another embodiment of the present application.
  • the embodiment of the present application provides a drone monitoring method and an audio and video linkage device.
  • the following describes an unmanned aerial vehicle monitoring method provided by the embodiment of the present application.
  • the execution body of the UAV monitoring method provided by the embodiment of the present application may be a controller equipped with a core processing chip, wherein the core processing chip may be any of core processing chips such as a DSP (Digital Signal Processor), an ARM (Advanced Reduced Instruction Set Computer Machines) processor, or an FPGA (Field-Programmable Gate Array).
  • a manner of implementing a UAV monitoring method provided by an embodiment of the present application may be at least one of software, hardware circuits, and logic circuits disposed in an execution body.
  • a method for monitoring a drone provided by an embodiment of the present application may include the following steps:
  • the audio collection array may be an array of multiple audio collection devices having audio collection functions, such as an array of multiple microphones or multiple sound sensors.
  • the array can be installed on the periphery of the area to be monitored; of course, the array can also be installed at an important location within the area to be monitored, so that only that important location is monitored.
  • the number of audio collection devices in the audio collection array can be selected according to the size of the area to be monitored.
  • to obtain the same detection result, the larger the area to be monitored, the more audio collection devices are required and the wider their distribution.
  • an electret microphone can be installed every 10 cm to 20 cm, and the distance between each two electret microphones may not be exactly the same, which is not limited herein.
  • S102: Perform signal processing on the sound signal to obtain at least one sound component of the sound signal whose signal feature satisfies a preset condition.
  • the sound signals collected by the audio collection array may contain sounds from multiple objects.
  • the sound signal collected by the audio collection array may be a time domain signal or a frequency domain signal, and the signal features may be any one or more of features such as audio, pitch, volume, and frequency spectrum; sound components whose signal features differ can be separated from one another.
  • a preset condition is defined in the embodiment of the present application, and the preset condition is a condition that the signal feature needs to reach.
  • the sound signals generated by the propellers of a drone have distinct features, and any property of the propeller sound that can characterize the drone can be set as the preset condition. For example, the noise contains a plurality of spectral lines with harmonic relations, and these spectral lines are arranged at equal intervals on the frequency axis. Based on this feature, any attribute information carried by signals whose spectral lines are arranged at equal intervals on the frequency axis can be set as a preset condition. By analyzing the signal features of all sound components and obtaining the sound components whose signal features meet the preset condition, noise-like interfering signals can be filtered out, which improves both the accuracy of locating the drone by sound and the efficiency of identifying the drone.
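  • as an illustration only, the equal-spacing property of the spectral lines described above could be checked as in the following Python sketch. The function name `has_harmonic_lines`, the tolerance `rel_tol`, and the minimum number of lines `min_lines` are assumptions made for this example and are not specified in the application.

```python
def has_harmonic_lines(peak_freqs_hz, rel_tol=0.05, min_lines=3):
    """Return True if the given spectral peaks are nearly equally spaced.

    peak_freqs_hz: frequencies (Hz) of detected spectral lines.
    rel_tol:       allowed deviation of each gap from the mean gap (assumed value).
    min_lines:     minimum number of lines required (assumed value).
    """
    freqs = sorted(peak_freqs_hz)
    if len(freqs) < min_lines:
        return False
    # Gaps between neighbouring spectral lines on the frequency axis.
    gaps = [b - a for a, b in zip(freqs, freqs[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap <= 0:
        return False
    # Every gap must stay close to the mean gap for the lines to count as equally spaced.
    return all(abs(g - mean_gap) <= rel_tol * mean_gap for g in gaps)


# Hypothetical peak lists: the first is roughly harmonic, the second is not.
print(has_harmonic_lines([120.0, 240.5, 359.0, 480.0]))
print(has_harmonic_lines([100.0, 170.0, 400.0]))
```

  • a sound component that passes such a check would be kept as a candidate; one that fails would be treated as noise.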
  • the step of performing signal processing on the sound signal to obtain a sound component that satisfies a predetermined condition in at least one of the sound signals may include:
  • the sound signal collected by the audio collection array is extracted to obtain at least one sound component in the sound signal.
  • the sound signal collected by the audio collection array may contain sounds emitted by a plurality of objects, and at least one sound component may be obtained by extracting the sound signal, for example, by dividing the sound signal according to the spectrum of the sound. Other ways of extracting the sound signal to obtain at least one sound component are also within the protection scope of the embodiment, and are not described herein again.
  • for each of the at least one sound component, the energy collected by each audio collection device in the audio collection array is acquired; for a sound component whose energy is greater than the preset threshold, the frequency domain signals of the sound component collected by each audio collection device in the audio collection array are fused to obtain spectral features.
  • the greater the number of audio collection devices that collect the sound component with high energy, the closer the object is to the area to be monitored, or the object has already entered it. Therefore, in this embodiment, the energy of the sound component collected by each audio collection device is extracted, and the audio collection devices whose energy is greater than a preset threshold are determined. The preset threshold is related to factors such as the area to be monitored and the volume of the drone, and can be initialized according to the environment when the audio collection array is installed for the first time.
  • the number of audio collection devices whose energy is greater than the preset threshold may also be subject to a limit: for example, the frequency domain conversion is performed only when the number of audio collection devices whose energy is greater than the preset threshold reaches half of the total number of audio collection devices in the audio collection array; otherwise, the sound component is considered to be noise and is not processed.
  • the spectral feature is a characteristic of the sound emitted by the propellers of the drone when rotating, namely that the noise contains a plurality of spectral lines with harmonic relations arranged at equal intervals on the frequency axis. Therefore, frequency domain transformation is performed on the sound component to obtain a frequency domain signal, and the spectral features of the sound component are then obtained by fusion.
  • the step of acquiring the energy collected by each audio collection device in the audio collection array for each of the at least one sound component may include:
  • the sound component collected by each audio collection device in the audio collection array is subjected to framing processing within a preset period of the current time, to obtain the frame signals corresponding to the sound component collected by each audio collection device;
  • the sound component collected by each audio collection device in the audio collection array is audio spanning a period of time, so it may be divided into frames within a preset period of the current time. After framing, the energy of each frame signal is calculated, and for each audio collection device the energies calculated for all of its frame signals are summed.
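  • the framing and energy summation described above can be sketched in Python as follows. The frame length, the hop size, the use of the sum of squared samples as the energy measure, and all function names are assumptions for illustration; the application does not fix any of them.

```python
def split_into_frames(samples, frame_len, hop):
    """Split a list of samples into frames of frame_len, advancing by hop samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_energy(frame):
    """Energy of one frame, taken here as the sum of squared samples (assumed measure)."""
    return sum(x * x for x in frame)


def device_energy(samples, frame_len=4, hop=4):
    """Sum of the energies of all frames collected by one audio collection device."""
    return sum(frame_energy(f) for f in split_into_frames(samples, frame_len, hop))


def devices_above_threshold(signals, threshold, frame_len=4, hop=4):
    """Map each device whose summed frame energy exceeds the threshold to that energy.

    signals: dict of device id -> list of samples for the preset period.
    """
    energies = {dev: device_energy(s, frame_len, hop) for dev, s in signals.items()}
    return {dev: e for dev, e in energies.items() if e > threshold}


# Hypothetical two-microphone example: only "mic0" collects a loud component.
signals = {"mic0": [1, 1, 1, 1, 1, 1, 1, 1], "mic1": [0.1] * 8}
print(devices_above_threshold(signals, threshold=1.0))
```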
  • the step of merging the frequency domain signals of the sound component collected by each of the audio collection devices in the audio collection array for the sound component whose energy is greater than the preset threshold to obtain the spectral features may include:
  • the frequency domain signals of the frame signals corresponding to the sound components collected by the audio collection devices whose energy is greater than the preset threshold are fused to obtain the spectral features of the sound components.
  • the frame signal obtained by the frame processing may be subjected to frequency domain transformation according to a preset time-frequency transform method, wherein the preset time-frequency transform method may be the Fourier transform. Of course, other methods for transforming the time domain signal into the frequency domain signal are also within the scope of protection of the embodiments of the present application, and are not described herein again.
  • the multi-order harmonic signal processing result of the obtained frequency domain signal can be fused to obtain a spectrum feature.
  • the fusion of the harmonic signals may be a fusion of the cyclostationary features of the multi-order harmonics with the multiplication relationship.
  • other ways of fusing the harmonic signals are also within the protection scope of the embodiment of the present application, and are not described herein again.
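  • a minimal Python sketch of the transform-then-fuse idea is given below. It uses a direct discrete Fourier transform and, as a deliberately simple stand-in, fuses the spectra by averaging their magnitudes; the cyclostationary fusion of multi-order harmonics mentioned above is not detailed in the application and is not reproduced here. The function names are assumptions.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the discrete Fourier transform of one frame (bins 0..N/2)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def fuse_spectra(frames):
    """Fuse equal-length frames by averaging their magnitude spectra.

    Averaging is an assumed placeholder for the fusion step, not the
    method described in the application.
    """
    spectra = [magnitude_spectrum(f) for f in frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


# Hypothetical frames from two devices: a tone that falls exactly on bin 2 of 8.
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
fused = fuse_spectra([tone, tone])
print(max(range(len(fused)), key=fused.__getitem__))  # index of the strongest bin
```

  • for real frame lengths a fast Fourier transform would normally replace the direct sum used here, which is kept only so that the example has no dependencies.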
  • the method may further include the following steps:
  • the number of audio collection devices whose energy is greater than the preset threshold may be limited. For example, a first preset number is set, and the step of extracting the frame signals is performed only when the number of audio collection devices whose energy is greater than the preset threshold is greater than the first preset number; otherwise, the sound component is considered to be noise and is not processed.
  • the first preset number may be set to half of the total number of audio collection devices, or to one quarter of the total number. The smaller the first preset number, the larger the monitoring range, but the accuracy may be reduced. Normally, the first preset number is set to half of the total number of audio collection devices.
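  • the gating rule above amounts to a single comparison, sketched here in Python. The function name `should_process` and the `fraction` parameter are assumptions; the values one half and one quarter are the ones named in the text.

```python
def should_process(num_loud_devices, total_devices, fraction=0.5):
    """Decide whether a sound component is processed further or treated as noise.

    num_loud_devices: devices whose energy exceeds the preset threshold.
    fraction:         first preset number as a share of all devices
                      (0.5 normally, 0.25 for a wider but less accurate range).
    """
    first_preset_number = total_devices * fraction
    return num_loud_devices > first_preset_number


print(should_process(5, 8))        # more than half of 8 devices
print(should_process(3, 8, 0.25))  # more than a quarter of 8 devices
```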
  • the sound components of all the sound components whose spectral features satisfy the preset spectral features are obtained.
  • the target object corresponding to the sound component satisfying the preset condition can be determined by analyzing the signal feature, and the position of the target object compared to the audio collecting device can be determined by the attribute information such as volume and audio of the sound component of the target object.
  • the step of determining that the signal feature meets the location of the target object corresponding to the sound component of the preset condition may include:
  • the location of the target object is determined according to the position of each of the audio collection devices in the audio collection array relative to the camera and the position of the target object relative to each of the audio collection devices.
  • the position of each audio collection device in the audio collection array relative to the camera can be determined from the position information provided by the audio collection devices when the array is arranged; that is, once the arrangement of the audio collection array is determined, the position of each audio collection device in the array relative to the camera has been determined.
  • the position mentioned here can be either an angle or a distance.
  • the method further includes the following steps:
  • the first step is to select a second preset number of audio collection devices from the audio collection devices whose energy is greater than the preset threshold;
  • the selected audio collection devices are combined in pairs to form audio collection device pairs;
  • the position of the target object relative to each pair of audio collection devices is determined according to a preset time delay estimation method.
  • in this embodiment, the second preset number of audio collection devices may be selected from the audio collection devices whose energy is greater than the preset threshold in descending order of energy value; for example, the second preset number can be set to five. The selected audio collection devices are then combined in pairs to form audio collection device pairs.
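  • the selection and pairing steps can be sketched in Python as follows. The function name `select_device_pairs` and the dictionary layout are assumptions; the default of five devices follows the example in the text.

```python
from itertools import combinations


def select_device_pairs(energies, threshold, second_preset_number=5):
    """Pick the loudest devices above the threshold and combine them in pairs.

    energies: dict of device id -> energy of the sound component at that device.
    Returns a list of (device_a, device_b) tuples.
    """
    loud = [(dev, e) for dev, e in energies.items() if e > threshold]
    # Descending order of energy value, as described in the text.
    loud.sort(key=lambda item: item[1], reverse=True)
    chosen = [dev for dev, _ in loud[:second_preset_number]]
    return list(combinations(chosen, 2))


# Hypothetical energies for four microphones.
energies = {"m0": 9.0, "m1": 7.0, "m2": 1.0, "m3": 8.0}
print(select_device_pairs(energies, threshold=5.0))
```

  • with five selected devices this yields ten pairs, each of which then receives its own time delay estimate.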
  • the preset time delay estimation method can calculate a corresponding azimuth of each pair of audio collection devices, and can convert the azimuth angle into an angle at which the camera rotates.
  • the preset time delay estimation method may be a classical time delay estimation method, which estimates the time difference with which the sound signal reaches different audio collection devices; the position of the sound source can then be determined through a geometric relationship.
  • classical time delay estimation methods include methods based on the cross-correlation function, methods using speech features, and methods based on the channel transfer function, all of which are applicable to the embodiments of the present application.
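  • as one possible illustration of the cross-correlation approach, the Python sketch below estimates the delay between two devices and converts it to an angle under a far-field assumption. The function names, the sign convention, the far-field geometry, and the speed of sound of 343 m/s are assumptions made for this example and are not taken from the application.

```python
import math


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Lag (in samples) of sig_b relative to sig_a that maximises the cross-correlation.

    A positive result means the sound reached device A before device B.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def azimuth_deg(delay_samples, sample_rate_hz, mic_spacing_m, speed_of_sound=343.0):
    """Far-field angle (degrees) of the source from the broadside of a device pair."""
    ratio = delay_samples / sample_rate_hz * speed_of_sound / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against values just outside asin's domain
    return math.degrees(math.asin(ratio))


# Hypothetical pulse that arrives two samples later at device B.
sig_a = [0, 0, 1, 0, 0, 0, 0, 0]
sig_b = [0, 0, 0, 0, 1, 0, 0, 0]
lag = estimate_delay_samples(sig_a, sig_b, max_lag=3)
print(lag, azimuth_deg(lag, sample_rate_hz=16000, mic_spacing_m=0.15))
```

  • the resulting angle for each pair can then be converted into the angle through which the camera rotates, using the known position of the pair relative to the camera.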
  • the step of determining the location of the target object according to the position of each of the audio collection devices in the audio collection array relative to the camera and the position of the target object relative to each of the audio collection devices may include:
  • the location of the target object is determined according to the position of each audio collection device relative to the camera and the position of the target object relative to each pair of audio collection devices.
  • the step of determining the location of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each pair of the audio collection device may include:
  • the first step is to determine a position parameter of the target object relative to each pair of audio collection devices according to the position of the target object relative to each pair of audio collection devices, wherein the position parameter includes an angle value or a displacement value;
  • the third step is to average the position parameters whose difference values are within the preset difference, and to determine the position of the target object relative to any of the audio collection devices;
  • the location of the target object is determined according to the position of the audio collection device relative to the camera and the position of the target object relative to the audio collection device.
  • for example, the angle of the target object relative to each pair of audio collection devices is calculated, the audio collection device pairs whose angle difference values are within 10 degrees are identified, and the position of the target object is determined from the average of their angles, which can further improve the accuracy of determining the target position.
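  • one way to realise this averaging is sketched below in Python. How the difference values are formed is not spelled out in the text, so this example assumes that each angle is compared against the lower median of all angles; that choice, the function name `fuse_angles`, and the lack of wrap-around handling at 360 degrees are assumptions of the sketch.

```python
from statistics import median_low


def fuse_angles(angles_deg, preset_difference_deg=10.0):
    """Average the per-pair angles that agree with each other.

    Angles further than preset_difference_deg from the lower median are
    discarded as outliers (assumed rule); the rest are averaged.
    """
    if not angles_deg:
        raise ValueError("at least one angle is required")
    centre = median_low(angles_deg)  # always one of the inputs, so at least one angle is kept
    kept = [a for a in angles_deg if abs(a - centre) <= preset_difference_deg]
    return sum(kept) / len(kept)


# Hypothetical per-pair estimates: three agree, one is an outlier.
print(fuse_angles([30.0, 32.0, 34.0, 80.0]))
```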
  • S104: Control the camera to aim at the position of the target object.
  • to aim the camera at the target object, the camera can be driven to rotate by driving the rotating device.
  • the alignment operation of the camera is not specifically limited; for example, the camera may be aimed by rotating it or by translating it.
  • S105 Determine, according to the image captured by the camera, whether the target object is a drone.
  • the camera can be always in the shooting state, so that it only needs to be driven to aim at the target object; or, after the camera is aimed at the target object, a shooting instruction is received to start shooting the target object.
  • the image recognition technology may be an automatic recognition technology in the field of image processing, such as a neural network or pixel comparison, or may be manual comparison; there is no specific limit here.
  • the sound signal collected by the audio collection array is processed to determine the position of the target object corresponding to the sound component whose signal characteristic meets the preset condition, and the camera is controlled to be aligned with the target object according to the position of the target object.
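  • the overall flow summarized above can be expressed as a short Python sketch in which every stage is passed in as a function. All parameter names are hypothetical, and the step comments follow the S10x numbering used in this description, with S101 and S103 inferred from the surrounding steps.

```python
def monitor_once(collect, extract_components, meets_condition,
                 locate, aim, capture, is_drone):
    """Run one pass of the audio-then-video confirmation flow.

    Each argument is a caller-supplied function standing in for one stage;
    none of these interfaces is defined by the application.
    Returns a list of (position, is_drone) results, one per candidate.
    """
    signal = collect()                                    # S101: collect the sound signal
    candidates = [c for c in extract_components(signal)   # S102: keep components that
                  if meets_condition(c)]                  #       meet the preset condition
    results = []
    for component in candidates:
        position = locate(component)                      # S103: locate the target object
        aim(position)                                     # S104: aim the camera
        results.append((position, is_drone(capture())))   # S105: confirm from the image
    return results


# Hypothetical stand-ins so that the flow can be exercised end to end.
aimed = []
print(monitor_once(
    collect=lambda: [1, 2, 3],
    extract_components=lambda s: s,
    meets_condition=lambda c: c >= 2,
    locate=lambda c: c * 10,
    aim=aimed.append,
    capture=lambda: "image",
    is_drone=lambda image: image == "image",
))
```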
  • the embodiment of the present application provides another UAV monitoring method. After S104, the following steps may be further included:
  • since a drone entering the area to be monitored may bring safety hazards to the users in the area, when it is judged from the audio that the target object approaches or enters the area to be monitored, an alarm command is generated to remind the user that there is a suspected target and that vigilance is needed.
  • in S105, if the target object is determined to be a drone, S202 is performed; otherwise, S203 is performed.
  • an alarm enhancement command is generated to enhance the effect of the alarm, for example, increasing the sound of the alarm to remind the user that there is a drone in the area.
  • the alarm can be eliminated.
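  • the alarm handling described above can be modelled as a small state transition, sketched here in Python. The enum `AlarmState` and the two function names are assumptions; the mapping to S202 and S203 follows the branch described in the text.

```python
from enum import Enum


class AlarmState(Enum):
    OFF = "off"
    ALARM = "alarm"
    ENHANCED = "enhanced"


def alarm_after_audio_detection():
    """Audio indicates a suspected target near or inside the area: raise the alarm."""
    return AlarmState.ALARM


def alarm_after_image_check(is_drone):
    """After the image check: enhance the alarm (S202) or cancel it (S203)."""
    return AlarmState.ENHANCED if is_drone else AlarmState.OFF


state = alarm_after_audio_detection()
state = alarm_after_image_check(is_drone=True)
print(state.value)
```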
  • the sound signal collected by the audio collection array is processed to determine the position of the target object corresponding to the sound component whose signal feature meets the preset condition, and the camera is controlled to aim at the target object according to the position of the target object for shooting. Whether the target object is a drone is determined according to the captured image, thereby achieving monitoring of the drone, and the double confirmation by audio and video prevents false positives and ensures the accuracy of monitoring.
  • in addition, an alarm is raised when the target object is present, and the alarm is enhanced when a drone is confirmed, so that the user is promptly reminded, which further improves safety.
  • the embodiment of the present application provides another UAV monitoring method. After S105, the following steps may be further included:
  • after the target object is determined to be a drone, the camera can be controlled to perform tracking shooting on the drone. Specifically, according to the motion track of the drone in the captured video, the camera can be driven to move along the motion track to achieve tracking shooting.
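  • as a rough illustration of tracking shooting, the Python sketch below converts the position of the drone in the image into pan and tilt corrections that would re-centre it. The linear pixel-to-angle mapping, the field-of-view values, and the function name `pan_tilt_correction` are assumptions; the application does not describe how the motion track is turned into camera motion.

```python
def pan_tilt_correction(target_xy, image_size, fov_deg):
    """Approximate pan/tilt angles (degrees) that would centre the target.

    target_xy:  (x, y) pixel position of the drone in the current frame.
    image_size: (width, height) of the frame in pixels.
    fov_deg:    (horizontal, vertical) field of view of the camera (assumed known).
    Uses a linear approximation, which is reasonable only for narrow fields of view.
    """
    (x, y), (width, height), (fov_h, fov_v) = target_xy, image_size, fov_deg
    pan = (x - width / 2) / width * fov_h      # positive: rotate right
    tilt = (height / 2 - y) / height * fov_v   # positive: rotate up
    return pan, tilt


# Hypothetical frame: the drone is right of centre in a 1920x1080 image.
print(pan_tilt_correction((1440, 540), (1920, 1080), (60.0, 34.0)))
```

  • applying such a correction on every frame in which the drone is detected would keep the camera following its motion track.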
  • the sound signal collected by the audio collection array is processed to determine the position of the target object corresponding to the sound component whose signal feature meets the preset condition, and the camera is controlled to aim at the target object according to the position of the target object for shooting. Whether the target object is a drone is determined according to the captured image, thereby achieving monitoring of the drone, and the double confirmation by audio and video prevents false positives and ensures the accuracy of monitoring.
  • in addition, by controlling the camera to track and shoot the drone, the flight status of the drone is monitored in real time, which ensures the real-time performance and security of monitoring.
  • the embodiment of the present application provides an audio and video linkage device.
  • the audio and video linkage device may include:
  • An audio collection array 410 configured to collect a sound signal
  • the processor 430 is configured to perform signal processing on the sound signal collected by the audio collection array 410, to obtain at least one sound component of the sound signal whose signal feature meets a preset condition; determine the position of the target object corresponding to the sound component whose signal feature meets the preset condition; control the camera 420 to aim at the position of the target object; and determine, according to the image captured by the camera 420, whether the target object is a drone;
  • the housing assembly 440 includes a rotating member and a base, wherein the rotating member is configured to provide rotational kinetic energy to the camera, and the base is configured to fix a bottom of the camera, the audio collection array, and the processor.
  • the sound signal collected by the audio collection array is processed to determine the position of the target object corresponding to the sound component whose signal characteristic meets the preset condition, and the camera is controlled to be aligned with the target object according to the position of the target object.
  • the signal feature may include: a spectrum feature
  • the processor 430 is specifically configured to:
  • a sound component in which the spectral features of all sound components satisfy the preset spectral features is obtained.
  • the processor 430 is specifically configured to:
  • the sound component collected by each audio collection device in the audio collection array 410 is subjected to frame processing in a preset period of the current time to obtain the sound component collected by each audio collection device.
  • the frequency domain signals of the frame signals corresponding to the sound components collected by the audio collection devices whose energy is greater than the preset threshold are fused to obtain the spectral features of the sound components.
  • the processor 430 is specifically configured to:
  • the sound component is noise.
  • the processor 430 is specifically configured to:
  • the location of the target object is determined according to the position of each of the audio collection devices in the audio collection array 410 relative to the camera 420 and the position of the target object relative to each of the audio collection devices.
  • the processor 430 is specifically configured to:
  • the location of the target object is determined according to the position of each audio collection device relative to the camera 420 and the position of the target object relative to each pair of audio collection devices.
  • the processor 430 is specifically configured to:
  • the location of the target object is determined according to the position of the audio capture device relative to the camera 420 and the position of the target object relative to the audio capture device.
  • the audio-video linkage device of the embodiment of the present application is a device applying the above-described UAV monitoring method, and all the embodiments of the UAV monitoring method are applicable to the device, and all of the same or similar beneficial effects can be achieved.
  • the audio and video linkage device may further include:
  • the alarm component 510 is configured to perform an alarm operation when receiving an alarm instruction.
  • the alarm component 510 can be an alarm device such as a buzzer, a light emitting diode, or a display, and the alarm component 510 can enhance an alarm: for example, the buzzer sounds louder, the light emitting diode shines brighter, the content shown on the display changes, and so on.
  • the alarm operation may be a beeping sound, lighting a light emitting diode, displaying a suspected target, and the like.
  • the sound signal collected by the audio collection array is processed to determine the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition; the camera is aimed at that location to shoot; and whether the target object is a drone is determined from the captured image. This achieves monitoring of drones, and the double confirmation by audio and video prevents false positives and ensures monitoring accuracy.
  • an alarm is raised when a target object is present and is enhanced when the target object is a drone, so the user is promptly reminded and safety is further improved.
  • the processor 430 is specifically configured to:
  • the camera 420 is controlled to perform tracking shooting on the drone.
  • the embodiment of the present application provides a storage medium for storing executable code, where the executable code, when run, performs the UAV monitoring method provided by the embodiment of the present application; specifically, the UAV monitoring method includes:
  • the signal feature includes: a spectrum feature
  • Performing signal processing on the sound signal to obtain, from the sound signal, at least one sound component whose signal feature satisfies a preset condition includes:
  • among all sound components, the sound components whose spectral features satisfy the preset spectral feature are obtained.
  • the obtaining, for each of the at least one sound component, the energy collected by each audio collection device in the audio collection array includes:
  • the sound component collected by each audio collection device in the audio collection array is framed within a preset period at the current time, to obtain the frame signals corresponding to the sound component collected by each audio collection device.
  • the fusing, for a sound component whose energy is greater than a preset threshold, the frequency domain signals of the sound component collected by each audio collection device in the audio collection array to obtain a spectral feature includes:
  • the frequency domain signals of the frame signals corresponding to the sound component collected by all audio collection devices whose energy is greater than the preset threshold are fused to obtain the spectral feature of the sound component.
  • before the extracting the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold, the method further includes:
  • the method further includes:
  • the determining the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition includes:
  • Determining a location of the target object according to a position of each of the audio collection devices in the audio collection array relative to the camera and a position of the target object relative to each of the audio collection devices.
  • the method further includes:
  • the determining the location of the target object according to the position of each audio collection device in the audio collection array relative to the camera and the position of the target object relative to each audio collection device includes:
  • the location of the target object is determined according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair.
  • the determining the location of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair includes:
  • the method further includes:
  • after the determining, according to the image captured by the camera, whether the target object is a drone, the method further includes:
  • An alarm enhancement command is generated when the target object is a drone.
  • the method further includes:
  • the camera is controlled to perform tracking shooting on the drone.
  • the storage medium stores an application program that, when run, performs the UAV monitoring method provided by the embodiment of the present application, so that the sound signal collected by the audio collection array can be processed to determine the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition.
  • the embodiment of the present application provides an application program that, when run, performs the steps of the UAV monitoring method provided by the embodiment of the present application.
  • the application program performs the UAV monitoring method provided by the embodiment of the present application when run, so that the sound signal collected by the audio collection array can be processed to determine the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition; the camera is aimed at that location to shoot; and whether the target object is a drone is determined from the captured image, thereby achieving monitoring of drones and ensuring the accuracy of the monitoring.

Abstract

Embodiments of the present application provide an unmanned aerial vehicle monitoring method and an audio/video linkage apparatus. The unmanned aerial vehicle monitoring method comprises: acquiring a sound signal by means of an audio acquisition array; performing signal processing on the sound signal and obtaining, in the sound signal, at least one sound component with a signal characteristic satisfying a preset condition; determining the location of a target object corresponding to the sound component with the signal characteristic satisfying the preset condition; controlling a camera to point towards the location of the target object; and determining whether the target object is an unmanned aerial vehicle according to an image photographed by the camera. According to the present solution, monitoring of unmanned aerial vehicles can be achieved, and thus social and personal safety can be ensured.

Description

UAV monitoring method and audio and video linkage device
This application claims priority to Chinese Patent Application No. 201710349350.2, filed with the Chinese Patent Office on May 17, 2017 and entitled "A UAV Monitoring Method and Audio and Video Linkage Device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of security monitoring, and in particular to a UAV monitoring method and an audio and video linkage device.
Background
"Drone" is short for a class of powered, unmanned, reusable aircraft. Drones are small and inexpensive, and can be fitted with telemetry equipment such as sensors and cameras. Because a drone can photograph the ground from the air, drones are often used to monitor the ground.
As the cost of developing and manufacturing drones falls, the drone industry is growing rapidly and the barrier to entry keeps dropping, so the use of drones by individuals, groups, and organizations is increasingly common. Because drones are small and fly at low altitude, most radar monitoring systems are unable to detect them, which has led to widespread unauthorized ("black") drone flights that affect social and personal safety.
Summary of the Invention
The purpose of the embodiments of the present application is to provide a UAV monitoring method and an audio and video linkage device that monitor drones, thereby ensuring social and personal safety. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present application provides a UAV monitoring method, the method including:
collecting a sound signal through an audio collection array;
performing signal processing on the sound signal to obtain, from the sound signal, at least one sound component whose signal feature satisfies a preset condition;
determining the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition;
controlling a camera to aim at the location of the target object;
determining, according to an image captured by the camera, whether the target object is a drone.
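The five steps of the first aspect can be sketched as a single pass through a pipeline. This is only an illustrative outline: the function name `monitor_once`, the `Position` type, and the five callables passed in are hypothetical placeholders for the stages described above, and the source does not prescribe any particular software structure.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

Position = Tuple[float, float]  # hypothetical (azimuth, elevation) in degrees


def monitor_once(
    collect: Callable[[], Sequence[Sequence[float]]],
    extract_candidates: Callable[[Sequence[Sequence[float]]], Iterable[object]],
    locate: Callable[[object], Position],
    aim_camera: Callable[[Position], object],
    is_drone: Callable[[object], bool],
) -> Optional[Position]:
    """Run one pass of the five steps; return the drone position, if any."""
    signals = collect()                            # step 1: collect the sound signal
    for component in extract_candidates(signals):  # step 2: components meeting the preset condition
        position = locate(component)               # step 3: locate the target object
        image = aim_camera(position)               # step 4: aim the camera and capture an image
        if is_drone(image):                        # step 5: confirm from the image
            return position
    return None
```

Passing the stages in as callables keeps the audio and video checks independent, which mirrors the double confirmation the method relies on.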
Optionally, the signal feature includes a spectral feature;
the performing signal processing on the sound signal to obtain, from the sound signal, at least one sound component whose signal feature satisfies a preset condition includes:
extracting from the sound signal collected by the audio collection array to obtain at least one sound component of the sound signal;
for each of the at least one sound component, obtaining the energy collected by each audio collection device in the audio collection array; for a sound component whose energy is greater than a preset threshold, fusing the frequency domain signals of that sound component collected by each audio collection device in the audio collection array to obtain a spectral feature;
obtaining, among all sound components, the sound components whose spectral features satisfy a preset spectral feature.
Optionally, the obtaining, for each of the at least one sound component, the energy collected by each audio collection device in the audio collection array includes:
for any sound component, framing the sound component collected by each audio collection device in the audio collection array within a preset period at the current time, to obtain the frame signals corresponding to the sound component collected by each audio collection device;
calculating the sum of the energy of the frame signals corresponding to the sound component collected by any audio collection device, and taking that energy sum as the energy of the sound component collected by that audio collection device;
the fusing, for a sound component whose energy is greater than a preset threshold, the frequency domain signals of that sound component collected by each audio collection device in the audio collection array to obtain a spectral feature includes:
extracting the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold;
performing, according to a preset time-frequency transform method, a frequency domain transform on the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold, to obtain the frequency domain signals of those frame signals;
fusing the frequency domain signals of the frame signals corresponding to the sound component collected by all audio collection devices whose energy is greater than the preset threshold, to obtain the spectral feature of the sound component.
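The framing, energy, transform, and fusion steps above can be sketched in a few functions. The source leaves the time-frequency transform and the fusion rule open ("preset"), so this sketch makes two assumptions: a naive discrete Fourier transform stands in for the transform, and fusion is taken to be the average of the magnitude spectra. All function names and the non-overlapping frame layout are hypothetical.

```python
import cmath
from typing import List, Sequence


def split_frames(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Split samples into non-overlapping frames, dropping a short tail."""
    count = len(samples) // frame_len
    return [list(samples[i * frame_len:(i + 1) * frame_len]) for i in range(count)]


def device_energy(frames: Sequence[Sequence[float]]) -> float:
    """Energy of one device: sum of squared samples over all of its frames."""
    return sum(x * x for frame in frames for x in frame)


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2 (assumed stand-in for the preset transform)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def fused_spectrum(devices: Sequence[Sequence[float]], frame_len: int,
                   threshold: float) -> List[float]:
    """Average the magnitude spectra of all frames from devices above the threshold."""
    spectra = []
    for samples in devices:
        frames = split_frames(samples, frame_len)
        if device_energy(frames) > threshold:  # keep only devices that hear the component
            spectra.extend(magnitude_spectrum(f) for f in frames)
    if not spectra:
        return []
    # Assumed fusion rule: bin-by-bin mean over every retained frame.
    return [sum(col) / len(spectra) for col in zip(*spectra)]
```

The fused spectrum could then be compared against a preset spectral feature; how that comparison is made is likewise not fixed by the source.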
Optionally, before the extracting the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold, the method further includes:
determining whether the number of audio collection devices whose energy is greater than the preset threshold is greater than a first preset number;
if so, performing the extracting of the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold.
Optionally, after the determining whether the number of audio collection devices whose energy is greater than the preset threshold is greater than the first preset number, the method further includes:
if not, determining that the sound component is noise.
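The device-count check above amounts to a simple gate. A minimal sketch, assuming the per-device energies have already been computed; the function and parameter names are hypothetical:

```python
from typing import Sequence


def is_noise(device_energies: Sequence[float], energy_threshold: float,
             first_preset_number: int) -> bool:
    """Return True when too few devices hear the component above the threshold."""
    active = sum(1 for e in device_energies if e > energy_threshold)
    # The component is kept only if strictly more than the preset number of devices heard it.
    return active <= first_preset_number
```

For example, with a threshold of 1.0 and a first preset number of 2, a component heard above the threshold by only two devices is treated as noise, while one heard by three devices proceeds to spectral analysis.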
Optionally, the determining the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition includes:
for any sound component, when the signal feature of the sound component satisfies the preset condition, determining that the object corresponding to the sound component is the target object;
determining the location of the target object according to the position of each audio collection device in the audio collection array relative to the camera and the position of the target object relative to each audio collection device.
Optionally, after the determining, when the signal feature of the sound component satisfies the preset condition, that the object corresponding to the sound component is the target object, the method further includes:
selecting a second preset number of audio collection devices from the audio collection devices whose energy is greater than the preset threshold;
combining the selected audio collection devices two by two to construct multiple audio collection device pairs;
determining, according to a preset time delay estimation method, the position of the target object relative to each audio collection device pair;
the determining the location of the target object according to the position of each audio collection device in the audio collection array relative to the camera and the position of the target object relative to each audio collection device includes:
determining the location of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair.
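The pairing and time delay estimation steps can be illustrated as follows. The source does not name the delay estimation method, so this sketch assumes plain peak cross-correlation, a far-field source, and a linear array whose device positions lie along one axis; the speed-of-sound constant, the function names, and the sign convention are all assumptions.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def delay_samples(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) by which b trails a, found by peak cross-correlation."""
    def corr(lag: int) -> float:
        return sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=corr)


def pair_angle(lag: int, sample_rate: float, spacing_m: float) -> float:
    """Far-field arrival angle in degrees from broadside for one device pair.

    spacing_m is signed (second position minus first); a positive angle points
    toward decreasing coordinates under this assumed convention.
    """
    ratio = SPEED_OF_SOUND * lag / (sample_rate * spacing_m)
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))


def angles_for_pairs(signals: Dict[str, Sequence[float]],
                     positions_m: Dict[str, float],
                     sample_rate: float, max_lag: int) -> Dict[Tuple[str, str], float]:
    """Estimate one angle per device pair; devices must sit at distinct positions."""
    result = {}
    for p, q in combinations(sorted(signals), 2):  # two-by-two combination
        spacing = positions_m[q] - positions_m[p]
        lag = delay_samples(signals[p], signals[q], max_lag)
        result[(p, q)] = pair_angle(lag, sample_rate, spacing)
    return result
```

Each pair yields its own angle estimate; when the estimates agree, the pairs are observing the same source, which is what the later consistency check exploits.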
Optionally, the determining the location of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair includes:
determining, according to the position of the target object relative to each audio collection device pair, a position parameter of the target object relative to each audio collection device pair, where the position parameter includes an angle value or a displacement value;
calculating the difference value between every two position parameters, and selecting the position parameters corresponding to difference values smaller than a preset difference;
averaging the position parameters corresponding to the difference values smaller than the preset difference, to determine the position of the target object relative to any one audio collection device;
determining the location of the target object according to the position of that audio collection device relative to the camera and the position of the target object relative to that audio collection device.
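The selection and averaging of position parameters can be sketched as below. This is one reading of the step, stated as an assumption: a parameter is kept if it differs from at least one other parameter by less than the preset difference, and the kept parameters are averaged. The function name is hypothetical, and angle wrap-around (for example 359 degrees versus 1 degree) is not handled.

```python
from itertools import combinations
from typing import Optional, Sequence


def consistent_mean(params: Sequence[float], preset_difference: float) -> Optional[float]:
    """Average the parameters that agree with at least one other parameter."""
    keep = set()
    for i, j in combinations(range(len(params)), 2):
        # Difference value between every two position parameters.
        if abs(params[i] - params[j]) < preset_difference:
            keep.update((i, j))
    if not keep:
        return None  # no two estimates agree, so no position is reported
    return sum(params[i] for i in keep) / len(keep)
```

With angle estimates of 30, 31, 29, and 75 degrees and a preset difference of 3 degrees, the outlier at 75 is discarded and the remaining three are averaged.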
Optionally, after the determining the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition, the method further includes:
generating an alarm instruction;
after the determining, according to the image captured by the camera, whether the target object is a drone, the method further includes:
cancelling the alarm instruction when the target object is not a drone;
generating an alarm enhancement instruction when the target object is a drone.
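The alarm logic above behaves like a small state mapping driven by the audio and video results. A minimal sketch; the `Alarm` enum and the use of `None` for "image check not finished yet" are illustrative assumptions, not part of the source:

```python
from enum import Enum
from typing import Optional


class Alarm(Enum):
    NONE = "none"
    ALARM = "alarm"
    ENHANCED = "enhanced"


def alarm_state(target_located: bool, is_drone: Optional[bool]) -> Alarm:
    """Map the audio and video results to an alarm state.

    is_drone is None while the image check has not finished yet.
    """
    if not target_located:
        return Alarm.NONE
    if is_drone is None:
        return Alarm.ALARM  # audio hit only: alarm instruction generated
    # Image check done: enhance the alarm for a drone, cancel it otherwise.
    return Alarm.ENHANCED if is_drone else Alarm.NONE
```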
Optionally, after the determining, according to the image captured by the camera, whether the target object is a drone, the method further includes:
controlling the camera to track and shoot the drone when the target object is a drone.
In a second aspect, an embodiment of the present application provides an audio and video linkage device, the device including:
an audio collection array, configured to collect a sound signal;
a camera, configured to shoot a target object;
a processor, configured to: perform signal processing on the sound signal collected by the audio collection array to obtain, from the sound signal, at least one sound component whose signal feature satisfies a preset condition; determine the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition; control the camera to aim at the location of the target object; and determine, according to an image captured by the camera, whether the target object is a drone;
a housing assembly, including a rotating member and a base, where the rotating member is configured to provide rotational kinetic energy to the camera, and the base is configured to fix the bottom of the camera, the audio collection array, and the processor.
Optionally, the signal feature includes a spectral feature;
the processor is specifically configured to:
extract from the sound signal collected by the audio collection array to obtain at least one sound component of the sound signal;
for each of the at least one sound component, obtain the energy collected by each audio collection device in the audio collection array; for a sound component whose energy is greater than a preset threshold, fuse the frequency domain signals of that sound component collected by each audio collection device in the audio collection array to obtain a spectral feature;
obtain, among all sound components, the sound components whose spectral features satisfy a preset spectral feature.
Optionally, the processor is further specifically configured to:
for any sound component, frame the sound component collected by each audio collection device in the audio collection array within a preset period at the current time, to obtain the frame signals corresponding to the sound component collected by each audio collection device;
calculate the sum of the energy of the frame signals corresponding to the sound component collected by any audio collection device, and take that energy sum as the energy of the sound component collected by that audio collection device;
extract the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold;
perform, according to a preset time-frequency transform method, a frequency domain transform on the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold, to obtain the frequency domain signals of those frame signals;
fuse the frequency domain signals of the frame signals corresponding to the sound component collected by all audio collection devices whose energy is greater than the preset threshold, to obtain the spectral feature of the sound component.
Optionally, the processor is further specifically configured to:
determine whether the number of audio collection devices whose energy is greater than the preset threshold is greater than a first preset number;
if so, perform the extracting of the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold.
Optionally, the processor is further specifically configured to:
if the number of audio collection devices whose energy is greater than the preset threshold is less than or equal to the first preset number, determine that the sound component is noise.
Optionally, the processor is further specifically configured to:
for any sound component, when the signal feature of the sound component satisfies the preset condition, determine that the object corresponding to the sound component is the target object;
determine the location of the target object according to the position of each audio collection device in the audio collection array relative to the camera and the position of the target object relative to each audio collection device.
Optionally, the processor is further specifically configured to:
select a second preset number of audio collection devices from the audio collection devices whose energy is greater than the preset threshold;
combine the selected audio collection devices two by two to construct multiple audio collection device pairs;
determine, according to a preset time delay estimation method, the position of the target object relative to each audio collection device pair;
determine the location of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair.
Optionally, the processor is further specifically configured to:
determine, according to the position of the target object relative to each audio collection device pair, a position parameter of the target object relative to each audio collection device pair, where the position parameter includes an angle value or a displacement value;
calculate the difference value between every two position parameters, and select the position parameters corresponding to difference values smaller than a preset difference;
average the position parameters corresponding to the difference values smaller than the preset difference, to determine the position of the target object relative to any one audio collection device;
determine the location of the target object according to the position of that audio collection device relative to the camera and the position of the target object relative to that audio collection device.
Optionally, the device further includes:
an alarm component, configured to perform an alarm operation upon receiving an alarm instruction.
Optionally, the processor is further specifically configured to:
after determining the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition, generate an alarm instruction and send the alarm instruction to the alarm component, to drive the alarm component to perform an alarm operation;
cancel the alarm instruction when it is determined, according to the image captured by the camera, that the target object is not a drone;
when it is determined, according to the image captured by the camera, that the target object is a drone, generate an alarm enhancement instruction and send the alarm enhancement instruction to the alarm component, to drive the alarm component to perform an alarm enhancement operation.
Optionally, the processor is further specifically configured to:
control the camera to track and shoot the drone when the target object is a drone.
In a third aspect, an embodiment of the present application provides a storage medium for storing executable code, where the executable code, when run, performs the method steps provided in the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides an application program which, when run, performs the method steps provided in the first aspect of the embodiments of the present application.
With the UAV monitoring method and the audio and video linkage device provided by the embodiments of the present application, the sound signal collected by the audio collection array is processed to determine the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition; the camera is aimed at that location to shoot; and whether the target object is a drone is determined from the captured image. This achieves monitoring of drones and ensures the accuracy of the monitoring.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application and of the prior art more clearly, the drawings needed for the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a UAV monitoring method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a UAV monitoring method according to another embodiment of the present application;
FIG. 3 is a schematic flowchart of a UAV monitoring method according to still another embodiment of the present application;
FIG. 4 is a schematic structural diagram of an audio and video linkage device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an audio and video linkage device according to another embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
To monitor drones and thereby ensure social and personal safety, the embodiments of the present application provide a UAV monitoring method and an audio and video linkage device.
The UAV monitoring method provided by the embodiments of the present application is introduced first.
The drone monitoring method provided by the embodiment of the present application may be executed by a controller equipped with a core processing chip, where the core processing chip may be any of a DSP (Digital Signal Processor), an ARM (Advanced RISC Machines) microprocessor, an FPGA (Field-Programmable Gate Array), or a similar core processing chip. The method may be implemented by at least one of software, a hardware circuit, and a logic circuit provided in the executing entity.
As shown in FIG. 1, a drone monitoring method provided by an embodiment of the present application may include the following steps:
S101: collect a sound signal through an audio collection array.
The audio collection array may be an array composed of multiple audio collection devices, for example multiple microphones or multiple sound sensors. So that a drone can be detected while it is still far from the edge of the area to be monitored, the array may be installed on the periphery of that area; alternatively, the array may be installed at an important location inside the area, in which case only that location is monitored. The number of audio collection devices in the array can usually be chosen according to the size of the area to be monitored: in general, the larger the area, the more devices are needed to achieve the same detection result, and the more widely they are distributed. Taking electret microphones of ordinary performance as an example, one microphone may be installed every 10 cm to 20 cm; the spacing between adjacent microphones need not be exactly uniform, and no limitation is imposed here.
S102: perform signal processing on the sound signal to obtain at least one sound component of the sound signal whose signal feature satisfies a preset condition.
There may be multiple sound sources in the area to be monitored, that is, multiple objects may be emitting sound nearby. The sound signal collected by the audio collection array may therefore contain sounds from multiple objects as well as other noise (for example, wind noise), so the sound components produced by different objects must be separated before recognition. The sound signal collected by the array may be a time-domain signal or a frequency-domain signal, and the signal feature may be any one or more of features such as audio frequency, pitch, volume, and spectrum; different sound components can be separated by their signal features. To locate a drone more accurately, the embodiment defines a preset condition, which is the condition the signal feature must meet. Because most drones are driven by propellers and the sound produced by propellers has distinctive features, any attribute that can represent the features of a drone propeller's sound can be set as the preset condition. For example, the propeller noise contains multiple harmonically related spectral lines that are equally spaced on the frequency axis; based on this, any attribute information carried by a signal whose spectral lines are equally spaced on the frequency axis can be set as the preset condition. By analyzing the signal features of all sound components and keeping those whose signal features satisfy the preset condition, noise-like sound signals can be filtered out, which improves both the accuracy of locating a drone by sound and the efficiency of identifying it.
Optionally, the step of performing signal processing on the sound signal to obtain at least one sound component whose signal feature satisfies the preset condition may include:
Step 1: extract from the sound signal collected by the audio collection array to obtain at least one sound component of the sound signal.
Because the sound signal collected by the array may contain sounds from multiple objects, at least one sound component can be obtained by extraction, for example by dividing the sound signal according to its spectrum. Other ways of extracting at least one sound component from the sound signal also fall within the scope of protection of this embodiment and are not described here.
Step 2: for each of the at least one sound component, obtain the energy collected by each audio collection device in the array; for a sound component whose energy is greater than a preset threshold, fuse the frequency-domain signals of that sound component collected by the audio collection devices in the array to obtain a spectral feature.
For a detected object, the closer an audio collection device is to the object, the greater the energy of the object's sound component that the device collects. Consequently, the more audio collection devices that collect the object's sound component with energy greater than the preset threshold, the closer the object is to the area to be monitored, or the object is already inside it. In this embodiment, therefore, the energy of the sound component collected by each audio collection device is extracted, and the devices whose energy is greater than the preset threshold are identified. The preset threshold depends on factors such as the extent of the area to be monitored and the loudness of drones; it can be initialized according to the environment when the array is first installed and set in advance based on ambient noise. To further constrain the monitoring range and improve accuracy, a limit may also be placed on the number of devices whose energy is greater than the preset threshold: for example, the frequency-domain transform is performed only when that number reaches half of the total number of devices in the array; otherwise the sound component is regarded as noise and is not processed. The spectral feature reflects a characteristic of the sound emitted by a rotating drone propeller, namely that the noise contains multiple harmonically related spectral lines that are equally spaced on the frequency axis. The sound component is therefore transformed to the frequency domain, and its spectral feature is obtained by fusing the resulting frequency-domain signals.
Optionally, the step of obtaining, for each of the at least one sound component, the energy collected by each audio collection device in the array may include:
for any sound component, framing, within a preset period at the current moment, the sound component collected by each audio collection device in the array, to obtain the frame signals corresponding to the sound component collected by each device;
calculating the sum of the energies of the frame signals corresponding to the sound component collected by any audio collection device, and taking that sum as the energy of the sound component collected by that device.
Because the sound component collected by each audio collection device is audio spanning a period of time, the sound component may, to keep signal processing real-time, be framed within a preset period at the current moment. After framing, the energy of each frame signal is calculated, and for each audio collection device the energies of all its frame signals are summed.
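The framing and energy summation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the non-overlapping framing, and the use of the sum of squared samples as frame energy are all assumptions, since the text does not fix a frame length, an overlap, or an energy definition.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Split samples into consecutive non-overlapping frames.

    A trailing partial frame is dropped (an assumption for simplicity).
    """
    n_frames = len(samples) // frame_len
    return [list(samples[i * frame_len:(i + 1) * frame_len]) for i in range(n_frames)]


def frame_energy(frame: Sequence[float]) -> float:
    """Energy of one frame, taken here as the sum of squared samples."""
    return sum(x * x for x in frame)


def device_energy(samples: Sequence[float], frame_len: int) -> float:
    """Energy of a sound component at one device: the sum over all its frames."""
    return sum(frame_energy(f) for f in split_into_frames(samples, frame_len))
```

Calling `device_energy` once per audio collection device on the samples from the current preset period gives the per-device energies that are then compared with the preset threshold.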
Optionally, the step of fusing, for a sound component whose energy is greater than the preset threshold, the frequency-domain signals of that sound component collected by the audio collection devices in the array to obtain a spectral feature may include:
extracting the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold;
performing, according to a preset time-frequency transform method, a frequency-domain transform on those frame signals, to obtain the frequency-domain signals of the frame signals corresponding to the sound component collected by each such device;
fusing the frequency-domain signals of the frame signals corresponding to the sound component collected by all audio collection devices whose energy is greater than the preset threshold, to obtain the spectral feature of the sound component.
After the sound component collected by each audio collection device has been framed, the resulting frame signals can be transformed to the frequency domain by the preset time-frequency transform method. The method may be the Fourier transform; other methods of transforming a time-domain signal into a frequency-domain signal also fall within the scope of protection of the embodiments of the present application and are not described here. After the transform, the processing results of the multi-order harmonic signals of the frequency-domain signals can be fused to obtain the spectral feature. Fusing the harmonic signals may consist of fusing the cyclostationary features of multi-order harmonics whose frequencies are integer multiples of one another; other ways of fusing the harmonic signals also fall within the scope of protection of the embodiments of the present application and are not described here.
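As a rough sketch of the transform-then-fuse sequence, the code below computes a magnitude spectrum for each frame with a direct discrete Fourier transform and fuses the spectra from several devices by averaging them bin by bin. The averaging is only a simple stand-in chosen for illustration: the text describes fusing cyclostationary features of the harmonics, which this sketch does not implement. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitude of the DFT of one frame for bins 0 .. N/2 (direct O(N^2) form)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def fuse_spectra(spectra: Sequence[Sequence[float]]) -> List[float]:
    """Fuse equal-length spectra from several devices by bin-wise averaging.

    Averaging is an illustrative assumption, not the fusion rule of the source.
    """
    n_bins = len(spectra[0])
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(n_bins)]
```

In practice a fast Fourier transform would replace the direct sum; the direct form is used here only to keep the example free of external dependencies.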
Optionally, before the step of extracting the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold, the method may further include:
determining whether the number of audio collection devices whose energy is greater than the preset threshold is greater than a first preset number;
if so, performing the step of extracting the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold;
if not, determining that the sound component is noise.
To further constrain the monitoring range and improve accuracy, a limit may be placed on the number of audio collection devices whose energy is greater than the preset threshold: a first preset number is set, and frame signals are extracted only when the number of such devices exceeds it; otherwise the sound component is regarded as noise and is not processed. The first preset number may be set to half of the total number of audio collection devices or to a quarter of it. The smaller the first preset number, the larger the monitoring range, but accuracy may decrease. Usually the first preset number is set to half of the total number of audio collection devices.
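The gating rule above reduces to a count and a comparison. The following sketch assumes the per-device energies are already available as a list; the function names are hypothetical.

```python
from typing import List, Sequence


def devices_above_threshold(energies: Sequence[float], threshold: float) -> List[int]:
    """Indices of the audio collection devices whose energy exceeds the preset threshold."""
    return [i for i, e in enumerate(energies) if e > threshold]


def passes_device_count_gate(energies: Sequence[float], threshold: float,
                             first_preset_number: int) -> bool:
    """True if frame extraction should proceed; False if the component is treated as noise."""
    return len(devices_above_threshold(energies, threshold)) > first_preset_number
```

With four devices and the usual setting of half the total (`first_preset_number = 2`), a sound component passes only if at least three devices exceed the threshold, because the comparison is strictly greater than.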
Step 3: obtain, from all the sound components, the sound components whose spectral features satisfy a preset spectral feature.
Because the sound signal of a drone contains, within the noise, multiple harmonically related spectral lines that are equally spaced on the frequency axis, multi-frequency cyclostationary detection can be set up to analyze whether the spectrum is that of drone propeller sound. Multi-frequency cyclostationary detection exploits the property that the harmonics of the signal add coherently while noise adds incoherently, which improves the extraction of spectral features and reduces the influence of random noise. Other ways of analyzing the spectral feature also fall within the scope of protection of the embodiments of the present application and are not described here.
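To make the "equally spaced spectral lines" criterion concrete, the sketch below picks local peaks from a magnitude spectrum and checks whether the gaps between neighboring peaks are nearly equal. This is a deliberately simplified stand-in and not multi-frequency cyclostationary detection; the peak rule, the minimum number of lines, and the tolerance are all assumptions.

```python
from typing import List, Sequence


def find_peaks(spectrum: Sequence[float], min_height: float) -> List[int]:
    """Bins that are local maxima and at least min_height high."""
    return [k for k in range(1, len(spectrum) - 1)
            if spectrum[k] >= min_height
            and spectrum[k] > spectrum[k - 1]
            and spectrum[k] >= spectrum[k + 1]]


def has_equally_spaced_lines(spectrum: Sequence[float], min_height: float,
                             min_lines: int = 3, tolerance: int = 1) -> bool:
    """True if at least min_lines peaks exist and their spacing varies by at most tolerance bins."""
    peaks = find_peaks(spectrum, min_height)
    if len(peaks) < min_lines:
        return False
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return max(gaps) - min(gaps) <= tolerance
```

A spectrum with lines at bins 4, 8, and 12 satisfies the check, while lines at bins 3, 8, and 17 do not.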
S103: determine the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition.
Analyzing the signal features identifies the target object corresponding to the sound component that satisfies the preset condition, and the position of the target object relative to the audio collection devices can be determined from attribute information of its sound component such as volume and audio frequency.
Optionally, the step of determining the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition may include:
for any sound component, when its signal feature satisfies the preset condition, determining that the object corresponding to the sound component is a target object;
then determining the location of the target object according to the position of each audio collection device in the array relative to the camera and the position of the target object relative to each audio collection device.
The position of each audio collection device relative to the camera can be determined, when the array is laid out, from the position information provided by the audio collection devices; in other words, once the layout of the array is fixed, the position of each device relative to the camera is also fixed. A position here may be an angle or a distance.
Optionally, after the step of determining that the object corresponding to the sound component is a target object when the signal feature of the sound component satisfies the preset condition, the method may further include:
Step 1: select a second preset number of audio collection devices from those whose energy is greater than the preset threshold;
Step 2: combine the selected audio collection devices in pairs to form multiple audio collection device pairs;
Step 3: determine, according to a preset time-delay estimation method, the position of the target object relative to each audio collection device pair.
After the target object is determined, to make computing its location more efficient, several audio collection devices may be selected, in descending order of energy, from those whose energy is greater than the preset threshold. In this embodiment the second preset number is the number of devices so selected, and may be set to five, for example. The selected devices are combined in pairs to form audio collection device pairs; the preset time-delay estimation method yields the azimuth corresponding to each pair, and this azimuth can be converted into a rotation angle for the camera. The preset time-delay estimation method may be a classical one: it estimates the difference in the arrival time of the sound signal at different audio collection devices, and the position of the sound source then follows from geometry. Classical time-delay estimation methods include methods based on the cross-correlation function, methods that use speech features, and methods based on the channel transfer function, all of which are applicable to the embodiments of the present application.
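The three steps above can be sketched with the cross-correlation variant of time-delay estimation. The code selects the highest-energy devices, forms all pairs, estimates the inter-device delay by brute-force cross-correlation, and converts the delay into an azimuth under a far-field assumption. The speed of sound, the far-field formula, and all names are assumptions made for this illustration; the text does not prescribe them.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def select_pairs(energies: Sequence[float], second_preset_number: int) -> List[Tuple[int, int]]:
    """Pick the highest-energy devices and return every pair of their indices."""
    top = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    return list(combinations(sorted(top[:second_preset_number]), 2))


def estimate_delay(sig_a: Sequence[float], sig_b: Sequence[float], max_lag: int) -> int:
    """Lag in samples (positive if sig_b lags sig_a) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(sig_a[t] * sig_b[t + lag]
                    for t in range(len(sig_a)) if 0 <= t + lag < len(sig_b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def azimuth_from_delay(lag_samples: int, sample_rate: float, spacing_m: float) -> float:
    """Far-field angle in degrees between the source direction and the pair's broadside."""
    ratio = SPEED_OF_SOUND * lag_samples / (sample_rate * spacing_m)
    ratio = max(-1.0, min(1.0, ratio))  # clamp against estimation noise
    return math.degrees(math.asin(ratio))
```

With a second preset number of five, `select_pairs` yields ten pairs, and each pair contributes one azimuth estimate that can later be cross-checked against the others.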
Optionally, the step of determining the location of the target object according to the position of each audio collection device in the array relative to the camera and the position of the target object relative to each audio collection device may include:
determining the location of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair.
Because of noise, the location determined above often has some deviation; to compute the target location precisely, the direction estimated by each audio collection device pair must be processed further. Optionally, the step of determining the location of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair may include:
Step 1: determine, from the position of the target object relative to each audio collection device pair, a position parameter of the target object relative to each pair, where the position parameter includes an angle value or a displacement value;
Step 2: calculate the difference value between every two position parameters, and select the position parameters corresponding to difference values smaller than a preset difference;
Step 3: average the position parameters corresponding to difference values smaller than the preset difference, to determine the position of the target object relative to any one audio collection device;
Step 4: determine the location of the target object according to the position of that audio collection device relative to the camera and the position of the target object relative to that device.
Once the position of the target object relative to each audio collection device pair is determined, the angle values or displacement values obtained from the different pairs differ from one another to some degree. The pairs whose differences are within the preset difference are selected, for example those whose angle differences are within 10 degrees, and their values are averaged to determine the precise position of the target object, which further improves the accuracy of locating it.
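Steps 2 and 3 above amount to a consistency filter followed by a mean. The sketch below keeps every position parameter that agrees with at least one other parameter to within the preset difference and averages the kept values. It is one possible reading of the text; the function name is hypothetical, and angle wraparound (for example 359 degrees versus 1 degree) is not handled.

```python
from itertools import combinations
from typing import Optional, Sequence


def consistent_average(params: Sequence[float], preset_difference: float) -> Optional[float]:
    """Average the position parameters that agree pairwise within preset_difference.

    Returns None if no two parameters agree.
    """
    keep = set()
    for i, j in combinations(range(len(params)), 2):
        if abs(params[i] - params[j]) < preset_difference:
            keep.update((i, j))
    if not keep:
        return None
    return sum(params[i] for i in keep) / len(keep)
```

For azimuth estimates of 30, 32, 34, and 80 degrees with a preset difference of 10 degrees, the outlier at 80 degrees is discarded and the result is 32 degrees.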
S104: control the camera to aim at the location of the target object.
After the location of the target object is determined, the camera is controlled to aim at the target object so that audio and video can jointly confirm whether the object is a drone. The camera may be turned by driving a rotating device; the aiming operation is not specifically limited here, and the camera may aim by rotating, sliding, or translating.
S105: determine, according to the image captured by the camera, whether the target object is a drone.
The camera may be capturing continuously, in which case it only needs to be driven to aim at the target object; alternatively, it may start capturing the target object upon receiving a capture instruction after it has aimed at the target. Whether the target object is a drone is determined by recognizing the captured image. The image recognition technique may be an automatic one from the image processing field, such as a neural network or pixel comparison, or manual comparison; no specific limitation is imposed here.
In this embodiment, the sound signal collected by the audio collection array is processed to determine the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition; the camera is controlled to aim at and capture the target object at that location; and whether the target object is a drone is determined from the captured image. Drone monitoring is thereby achieved, and the double confirmation by audio and video prevents false alarms and ensures monitoring accuracy.
Based on the embodiment shown in FIG. 1, and as shown in FIG. 2, an embodiment of the present application provides another drone monitoring method, which may further include the following steps after S104:
S201: generate an alarm instruction.
A drone entering the area to be monitored may pose a safety hazard to users in the area. Therefore, when it is determined from the audio that a target object is approaching or has entered the area, an alarm instruction is generated to alert users that a suspected target is present and that vigilance is needed.
In the determination of S105, if the target object is a drone, S202 is performed; otherwise, S203 is performed.
S202: generate an alarm enhancement instruction.
If the video further confirms that a drone is approaching or has entered the area to be monitored, an alarm enhancement instruction is generated to strengthen the alarm, for example by increasing its volume, to alert users that a drone is present in the area.
S203: cancel the alarm instruction.
If the video confirms that there is no drone in the area to be monitored, the alarm can be cancelled.
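The alarm flow of S201 to S203 can be summarized as a small state transition. The sketch below is illustrative only; the enum and function names are hypothetical and not part of the described apparatus.

```python
from enum import Enum


class Alarm(Enum):
    OFF = 0
    RAISED = 1      # S201: audio has located a suspected target
    ENHANCED = 2    # S202: video has confirmed a drone


def on_audio_target_located() -> Alarm:
    """S201: raise the alarm once the audio stage has located a target object."""
    return Alarm.RAISED


def on_video_result(current: Alarm, is_drone: bool) -> Alarm:
    """S202 / S203: enhance the alarm if video confirms a drone, otherwise cancel it."""
    if current is Alarm.OFF:
        return Alarm.OFF
    return Alarm.ENHANCED if is_drone else Alarm.OFF
```

The alarm is raised on audio evidence alone and is only strengthened or cancelled after the image-based determination of S105.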
In this embodiment, the sound signal collected by the audio collection array is processed to determine the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition; the camera is controlled to aim at and capture the target object at that location; and whether the target object is a drone is determined from the captured image. Drone monitoring is thereby achieved, and the double confirmation by audio and video prevents false alarms and ensures monitoring accuracy. In addition, an alarm is raised when a target object is present and strengthened when a drone is present, so users are alerted promptly and safety is further improved.
Based on the embodiment shown in FIG. 1, and as shown in FIG. 3, an embodiment of the present application provides another drone monitoring method, which may further include the following step after S105:
S301: when the target object is a drone, control the camera to track and capture the drone.
When a drone is detected approaching or entering the area to be monitored, it remains in flight. To track it better and monitor its flight behavior in real time, the camera can be controlled to track and capture it. Specifically, the camera can be driven to follow the drone's motion trajectory as observed in the captured video.
In this embodiment, the sound signal collected by the audio collection array is processed to determine the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition; the camera is controlled to aim at and capture the target object at that location; and whether the target object is a drone is determined from the captured image. Drone monitoring is thereby achieved, and the double confirmation by audio and video prevents false alarms and ensures monitoring accuracy. In addition, by controlling the camera to track and capture the drone, the drone's flight status is monitored in real time, ensuring that monitoring is real-time and secure.
Corresponding to the above embodiments, an embodiment of the present application provides an audio/video linkage apparatus. As shown in FIG. 4, the apparatus may include:
an audio collection array 410, configured to collect a sound signal;
a camera 420, configured to photograph a target object;
a processor 430, configured to: perform signal processing on the sound signal collected by the audio collection array 410 to obtain at least one sound component of the sound signal whose signal feature satisfies a preset condition; determine the position of the target object corresponding to the sound component whose signal feature satisfies the preset condition; control the camera 420 to aim at the position of the target object; and determine, from the image captured by the camera 420, whether the target object is a drone;
a housing assembly 440, including a rotating member and a base, where the rotating member provides rotational kinetic energy to the camera, and the base holds the bottom of the camera, the audio collection array, and the processor in place.
With this embodiment, the sound signal collected by the audio collection array is processed to determine the position of the target object corresponding to a sound component whose signal feature satisfies the preset condition; the camera is then steered toward that position to photograph the target object, and the captured image is used to determine whether the target object is a drone. This achieves drone monitoring, and the dual audio and video confirmation prevents false alarms and ensures monitoring accuracy.
Optionally, the signal feature may include a spectral feature;
the processor 430 may be specifically configured to:
extract, from the sound signal collected by the audio collection array 410, at least one sound component of the sound signal;
for each of the at least one sound component, obtain the energy collected by each audio collection device in the audio collection array; for a sound component whose energy is greater than a preset threshold, fuse the frequency-domain signals of that sound component collected by each audio collection device in the audio collection array to obtain a spectral feature;
obtain, from all the sound components, the sound components whose spectral feature matches a preset spectral feature.
Optionally, the processor 430 may further be specifically configured to:
for any sound component, within a preset period at the current time, divide into frames the sound component collected by each audio collection device in the audio collection array 410, obtaining the frame signals of that sound component for each audio collection device;
for any audio collection device, compute the total energy of the frame signals of the sound component collected by that device, and take this total as the energy of the sound component collected by that device;
extract the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold;
according to a preset time-frequency transform method, transform into the frequency domain the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold, obtaining the frequency-domain signal of those frame signals for each such device;
fuse the frequency-domain signals of the frame signals of the sound component collected by all audio collection devices whose energy is greater than the preset threshold, obtaining the spectral feature of the sound component.
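The framing, energy gating, frequency-domain transform, and fusion steps above can be illustrated with a minimal Python sketch. The text does not name the time-frequency transform or the fusion rule, so the sketch assumes a plain discrete Fourier transform and a simple average of magnitude spectra; the function names, the non-overlapping framing, and the parameter names are likewise hypothetical.

```python
import cmath

def split_frames(samples, frame_len):
    """Split samples into consecutive non-overlapping frames (incomplete tail dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def total_energy(frames):
    """Energy of one device: sum of squared samples over all of its frames."""
    return sum(x * x for frame in frames for x in frame)

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (assumed transform, not from the source)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def fused_spectrum(channels, frame_len, energy_threshold):
    """channels: one list of samples per audio collection device.

    Returns the magnitude spectrum averaged over frames and over the devices
    whose energy exceeds the threshold, or None if no device qualifies.
    Averaging is an assumed fusion rule.
    """
    spectra = []
    for samples in channels:
        frames = split_frames(samples, frame_len)
        if total_energy(frames) > energy_threshold:
            per_frame = [magnitude_spectrum(f) for f in frames]
            # Average this device's spectrum across its frames.
            spectra.append([sum(col) / len(per_frame) for col in zip(*per_frame)])
    if not spectra:
        return None
    # Fuse across the devices that passed the energy gate.
    return [sum(col) / len(spectra) for col in zip(*spectra)]
```

The resulting list could then be compared against a stored reference spectrum to decide whether the component matches the preset spectral feature; how that comparison is made is not specified in the text.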
Optionally, the processor 430 may further be specifically configured to:
determine whether the number of audio collection devices whose energy is greater than the preset threshold is greater than a first preset number;
if so, perform the step of extracting the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold.
Optionally, the processor 430 may further be specifically configured to:
if the number of audio collection devices whose energy is greater than the preset threshold is less than or equal to the first preset number, determine that the sound component is noise.
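The device-count check can be written as a small gate, sketched here in Python. The function name, the return labels, and the parameter names are hypothetical; only the comparison rule (strictly more than the first preset number, otherwise noise) comes from the text.

```python
def classify_component(device_energies, energy_threshold, first_preset_number):
    """Decide whether a sound component is worth further processing.

    device_energies: energy of this sound component at each audio collection device.
    Returns (label, active), where active lists the indices of the devices
    whose energy exceeds the threshold.
    """
    active = [i for i, e in enumerate(device_energies) if e > energy_threshold]
    if len(active) > first_preset_number:
        return "candidate", active   # proceed to frame extraction
    return "noise", active           # too few devices heard it

```

A component picked up strongly by only one or two devices is treated as local noise rather than a distant source heard by most of the array.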
Optionally, the processor 430 may further be specifically configured to:
for any sound component, when the signal feature of the sound component satisfies the preset condition, determine that the object corresponding to the sound component is the target object;
determine the position of the target object according to the position of each audio collection device in the audio collection array 410 relative to the camera 420 and the position of the target object relative to each audio collection device.
Optionally, the processor 430 may further be specifically configured to:
select a second preset number of audio collection devices from the audio collection devices whose energy is greater than the preset threshold;
combine the selected audio collection devices two by two to form multiple audio collection device pairs;
according to a preset time-delay estimation method, determine the position of the target object relative to each audio collection device pair;
determine the position of the target object according to the position of each audio collection device relative to the camera 420 and the position of the target object relative to each audio collection device pair.
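As an illustration of pairing devices and estimating a per-pair direction, the following Python sketch uses a brute-force cross-correlation to find the delay between two devices and a far-field approximation to turn it into an angle. The text only says "a preset time-delay estimation method", so both choices are assumptions, as are the one-dimensional device layout, the speed of sound constant, and every name in the code.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C

def estimate_delay(a, b, max_lag):
    """Lag in samples that maximizes sum(a[t] * b[t + lag]).

    A positive result means signal b arrives later than signal a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[t] * b[t + lag]
                    for t in range(len(a)) if 0 <= t + lag < len(b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def arrival_angle(lag, sample_rate, signed_spacing_m):
    """Far-field angle in degrees from broadside, positive toward increasing x."""
    ratio = -SPEED_OF_SOUND * lag / (sample_rate * signed_spacing_m)
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding outside asin's domain
    return math.degrees(math.asin(ratio))

def pair_angles(signals, x_positions, sample_rate, max_lag):
    """signals: {device_id: samples}; x_positions: {device_id: x in meters}.

    Returns {(id_a, id_b): angle} for every two-device combination.
    """
    angles = {}
    for i, j in combinations(sorted(signals), 2):
        lag = estimate_delay(signals[i], signals[j], max_lag)
        angles[(i, j)] = arrival_angle(lag, sample_rate, x_positions[j] - x_positions[i])
    return angles
```

In a real array the devices need not lie on one line, and the per-pair result would still have to be combined with each device's position relative to the camera, as the last step above describes.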
Optionally, the processor 430 may further be specifically configured to:
determine, from the position of the target object relative to each audio collection device pair, a position parameter of the target object relative to each audio collection device pair, where the position parameter includes an angle value or a displacement value;
compute the difference between every two position parameters, and select the position parameters corresponding to the differences that are smaller than a preset difference;
average the position parameters corresponding to the differences smaller than the preset difference, to determine the position of the target object relative to any one audio collection device;
determine the position of the target object according to the position of that audio collection device relative to the camera 420 and the position of the target object relative to that audio collection device.
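The difference check and averaging can be sketched in a few lines of Python. This is one reading of the steps above, under the assumption that a position parameter is kept when it agrees with at least one other parameter to within the preset difference; the function and parameter names are hypothetical.

```python
from itertools import combinations

def consensus_parameter(params, preset_difference):
    """Average the position parameters that agree with at least one other.

    params: one angle (or displacement) value per audio collection device pair.
    Returns None when no two parameters differ by less than preset_difference.
    """
    keep = set()
    for i, j in combinations(range(len(params)), 2):
        if abs(params[i] - params[j]) < preset_difference:
            keep.update((i, j))
    if not keep:
        return None
    return sum(params[i] for i in keep) / len(keep)
```

For example, with per-pair angles of 30, 31, 29 and 75 degrees and a preset difference of 3 degrees, the outlier at 75 degrees is discarded and the remaining three values are averaged.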
The audio/video linkage apparatus of this embodiment is an apparatus that applies the drone monitoring method described above; all embodiments of that method therefore apply to the apparatus and achieve the same or similar beneficial effects.
Furthermore, in addition to the audio collection array 410, the camera 420, the processor 430, and the housing assembly 440, as shown in FIG. 5, the audio/video linkage apparatus provided by an embodiment of the present application may further include:
an alarm component 510, configured to perform an alarm operation upon receiving an alarm instruction.
The alarm component 510 may be any device used for alarms, such as a buzzer, a light-emitting diode, or a display, and it supports alarm enhancement: for example, the buzzer grows louder, the light-emitting diode grows brighter, or the content shown on the display changes. The alarm operation may be sounding the buzzer, lighting the light-emitting diode, displaying that a suspected target is present, and so on.
With this embodiment, the sound signal collected by the audio collection array is processed to determine the position of the target object corresponding to a sound component whose signal feature satisfies the preset condition; the camera is then steered toward that position to photograph the target object, and the captured image is used to determine whether the target object is a drone. This achieves drone monitoring, and the dual audio and video confirmation prevents false alarms and ensures monitoring accuracy. In addition, an alarm is raised when a target object is present and escalated when a drone is confirmed, so that the user is alerted promptly and safety is further improved.
Optionally, the processor 430 may further be specifically configured to:
after determining the position of the target object corresponding to the sound component whose signal feature satisfies the preset condition, generate an alarm instruction and send it to the alarm component 510 to drive the alarm component 510 to perform an alarm operation;
when it is determined, from the image captured by the camera 420, that the target object is not a drone, cancel the alarm instruction;
when it is determined, from the image captured by the camera 420, that the target object is a drone, generate an alarm enhancement instruction and send it to the alarm component 510 to drive the alarm component 510 to perform an alarm enhancement operation.
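The three alarm cases form a small decision sequence, sketched below in Python. The enum, its member names, and the use of `None` for "image not yet classified" are hypothetical illustration choices; the text only defines when an alarm instruction is generated, cancelled, or enhanced.

```python
from enum import Enum

class AlarmCommand(Enum):
    ALARM = "alarm"        # audio stage located a target object
    CANCEL = "cancel"      # image shows the target is not a drone
    ENHANCE = "enhance"    # image confirms the target is a drone

def alarm_commands(target_located, is_drone):
    """Return the commands sent to the alarm component, in order.

    is_drone is None while the camera image has not been classified yet.
    """
    commands = []
    if not target_located:
        return commands
    commands.append(AlarmCommand.ALARM)
    if is_drone is True:
        commands.append(AlarmCommand.ENHANCE)
    elif is_drone is False:
        commands.append(AlarmCommand.CANCEL)
    return commands
```

The first alarm is driven by audio alone, and the video result then either escalates or withdraws it, which mirrors the dual confirmation described for the apparatus.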
Optionally, the processor 430 may further be specifically configured to:
when the target object is a drone, control the camera 420 to track and film the drone.
In addition, corresponding to the drone monitoring method provided by the above embodiments, an embodiment of the present application provides a storage medium for storing executable code, the executable code being used to perform, when run, the drone monitoring method provided by the embodiments of the present application. Specifically, the drone monitoring method includes:
collecting a sound signal through an audio collection array;
performing signal processing on the sound signal to obtain at least one sound component of the sound signal whose signal feature satisfies a preset condition;
determining the position of the target object corresponding to the sound component whose signal feature satisfies the preset condition;
controlling a camera to aim at the position of the target object;
determining, from the image captured by the camera, whether the target object is a drone.
Optionally, the signal feature includes a spectral feature;
performing signal processing on the sound signal to obtain at least one sound component of the sound signal whose signal feature satisfies a preset condition includes:
extracting, from the sound signal collected by the audio collection array, at least one sound component of the sound signal;
for each of the at least one sound component, obtaining the energy collected by each audio collection device in the audio collection array; for a sound component whose energy is greater than a preset threshold, fusing the frequency-domain signals of that sound component collected by each audio collection device in the audio collection array to obtain a spectral feature;
obtaining, from all the sound components, the sound components whose spectral feature matches a preset spectral feature.
Optionally, obtaining, for each of the at least one sound component, the energy collected by each audio collection device in the audio collection array includes:
for any sound component, within a preset period at the current time, dividing into frames the sound component collected by each audio collection device in the audio collection array, obtaining the frame signals of that sound component for each audio collection device;
for any audio collection device, computing the total energy of the frame signals of the sound component collected by that device, and taking this total as the energy of the sound component collected by that device;
and fusing, for a sound component whose energy is greater than the preset threshold, the frequency-domain signals of that sound component collected by each audio collection device in the audio collection array to obtain the spectral feature includes:
extracting the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold;
according to a preset time-frequency transform method, transforming into the frequency domain the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold, obtaining the frequency-domain signal of those frame signals for each such device;
fusing the frequency-domain signals of the frame signals of the sound component collected by all audio collection devices whose energy is greater than the preset threshold, obtaining the spectral feature of the sound component.
Optionally, before extracting the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold, the method further includes:
determining whether the number of audio collection devices whose energy is greater than the preset threshold is greater than a first preset number;
if so, performing the step of extracting the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold.
Optionally, after determining whether the number of audio collection devices whose energy is greater than the preset threshold is greater than the first preset number, the method further includes:
if not, determining that the sound component is noise.
Optionally, determining the position of the target object corresponding to the sound component whose signal feature satisfies the preset condition includes:
for any sound component, when the signal feature of the sound component satisfies the preset condition, determining that the object corresponding to the sound component is the target object;
determining the position of the target object according to the position of each audio collection device in the audio collection array relative to the camera and the position of the target object relative to each audio collection device.
Optionally, after determining, when the signal feature of the sound component satisfies the preset condition, that the object corresponding to the sound component is the target object, the method further includes:
selecting a second preset number of audio collection devices from the audio collection devices whose energy is greater than the preset threshold;
combining the selected audio collection devices two by two to form multiple audio collection device pairs;
according to a preset time-delay estimation method, determining the position of the target object relative to each audio collection device pair;
and determining the position of the target object according to the position of each audio collection device in the audio collection array relative to the camera and the position of the target object relative to each audio collection device includes:
determining the position of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair.
Optionally, determining the position of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair includes:
determining, from the position of the target object relative to each audio collection device pair, a position parameter of the target object relative to each audio collection device pair, where the position parameter includes an angle value or a displacement value;
computing the difference between every two position parameters, and selecting the position parameters corresponding to the differences that are smaller than a preset difference;
averaging the position parameters corresponding to the differences smaller than the preset difference, to determine the position of the target object relative to any one audio collection device;
determining the position of the target object according to the position of that audio collection device relative to the camera and the position of the target object relative to that audio collection device.
Optionally, after determining the position of the target object corresponding to the sound component whose signal feature satisfies the preset condition, the method further includes:
generating an alarm instruction;
after determining, from the image captured by the camera, whether the target object is a drone, the method further includes:
when the target object is not a drone, cancelling the alarm instruction;
when the target object is a drone, generating an alarm enhancement instruction.
Optionally, after determining, from the image captured by the camera, whether the target object is a drone, the method further includes:
when the target object is a drone, controlling the camera to track and film the drone.
In this embodiment, the storage medium stores an application program that, when run, performs the drone monitoring method provided by the embodiments of the present application. It can therefore process the sound signal collected by the audio collection array to determine the position of the target object corresponding to a sound component whose signal feature satisfies the preset condition, steer the camera toward that position to photograph the target object, and determine from the captured image whether the target object is a drone, thereby achieving drone monitoring and ensuring monitoring accuracy.
In addition, corresponding to the drone monitoring method provided by the above embodiments, an embodiment of the present application provides an application program configured to perform, when run, the steps of the drone monitoring method provided by the embodiments of the present application.
In this embodiment, the application program performs, when run, the drone monitoring method provided by the embodiments of the present application. It can therefore process the sound signal collected by the audio collection array to determine the position of the target object corresponding to a sound component whose signal feature satisfies the preset condition, steer the camera toward that position to photograph the target object, and determine from the captured image whether the target object is a drone, thereby achieving drone monitoring and ensuring monitoring accuracy.
Because the storage medium and application program embodiments involve essentially the same method content as the foregoing method embodiments, they are described only briefly; for related details, refer to the description of the method embodiments.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner: identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the audio/video linkage apparatus, storage medium, and application program embodiments are essentially similar to the method embodiments, they are described only briefly; for related details, refer to the description of the method embodiments.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (23)

  1. A drone monitoring method, characterized in that the method includes:
    collecting a sound signal through an audio collection array;
    performing signal processing on the sound signal to obtain at least one sound component of the sound signal whose signal feature satisfies a preset condition;
    determining the position of the target object corresponding to the sound component whose signal feature satisfies the preset condition;
    controlling a camera to aim at the position of the target object;
    determining, from the image captured by the camera, whether the target object is a drone.
  2. The method according to claim 1, characterized in that the signal feature includes a spectral feature;
    performing signal processing on the sound signal to obtain at least one sound component of the sound signal whose signal feature satisfies a preset condition includes:
    extracting, from the sound signal collected by the audio collection array, at least one sound component of the sound signal;
    for each of the at least one sound component, obtaining the energy collected by each audio collection device in the audio collection array; for a sound component whose energy is greater than a preset threshold, fusing the frequency-domain signals of that sound component collected by each audio collection device in the audio collection array to obtain a spectral feature;
    obtaining, from all the sound components, the sound components whose spectral feature matches a preset spectral feature.
  3. The method according to claim 2, characterized in that obtaining, for each of the at least one sound component, the energy collected by each audio collection device in the audio collection array includes:
    for any sound component, within a preset period at the current time, dividing into frames the sound component collected by each audio collection device in the audio collection array, obtaining the frame signals of that sound component for each audio collection device;
    for any audio collection device, computing the total energy of the frame signals of the sound component collected by that device, and taking this total as the energy of the sound component collected by that device;
    and fusing, for a sound component whose energy is greater than the preset threshold, the frequency-domain signals of that sound component collected by each audio collection device in the audio collection array to obtain the spectral feature includes:
    extracting the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold;
    according to a preset time-frequency transform method, transforming into the frequency domain the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold, obtaining the frequency-domain signal of those frame signals for each such device;
    fusing the frequency-domain signals of the frame signals of the sound component collected by all audio collection devices whose energy is greater than the preset threshold, obtaining the spectral feature of the sound component.
  4. The method according to claim 3, characterized in that, before extracting the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold, the method further includes:
    determining whether the number of audio collection devices whose energy is greater than the preset threshold is greater than a first preset number;
    if so, performing the step of extracting the frame signals of the sound component collected by each audio collection device whose energy is greater than the preset threshold.
  5. The method according to claim 4, characterized in that, after determining whether the number of audio collection devices whose energy is greater than the preset threshold is greater than the first preset number, the method further includes:
    if not, determining that the sound component is noise.
  6. The method according to claim 1, characterized in that determining the position of the target object corresponding to the sound component whose signal feature satisfies the preset condition includes:
    for any sound component, when the signal feature of the sound component satisfies the preset condition, determining that the object corresponding to the sound component is the target object;
    determining the position of the target object according to the position of each audio collection device in the audio collection array relative to the camera and the position of the target object relative to each audio collection device.
  7. The method according to claim 6, wherein after determining, when the signal feature of the sound component satisfies the preset condition, that the object corresponding to the sound component is the target object, the method further comprises:
    selecting a second preset number of audio collection devices from the audio collection devices whose energy is greater than the preset threshold;
    combining the selected second preset number of audio collection devices in pairs to form multiple audio collection device pairs;
    determining, according to a preset time delay estimation method, the position of the target object relative to each audio collection device pair;
    wherein determining the location of the target object according to the position of each audio collection device in the audio collection array relative to the camera and the position of the target object relative to each audio collection device comprises:
    determining the location of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair.
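
The pairing and time delay estimation steps of claim 7 can be sketched as follows. The claim does not name the time delay estimation method, so plain cross-correlation is used here as an assumption; the far-field angle formula, the linear array layout, the speed of sound constant, and every function and parameter name are likewise illustrative only.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # metres per second, assumed value for air


def estimate_delay(sig_a, sig_b, max_lag):
    """Lag in samples at which sig_b best matches sig_a.

    A positive result means the sound reached device b later than device a.
    Uses brute-force cross-correlation (an assumed estimation method).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            sig_a[n] * sig_b[n + lag]
            for n in range(len(sig_a))
            if 0 <= n + lag < len(sig_b)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def pair_angles(signals, positions, sample_rate, max_lag):
    """Angle of arrival in degrees for every pair of selected devices.

    signals: {device_id: samples}
    positions: {device_id: x coordinate in metres along a line}
    """
    angles = {}
    # Combine the selected devices two by two, as in claim 7.
    for a, b in combinations(sorted(signals), 2):
        lag = estimate_delay(signals[a], signals[b], max_lag)
        spacing = abs(positions[b] - positions[a])
        # Far-field model: sin(angle) = c * delay / spacing.
        ratio = SPEED_OF_SOUND * (lag / sample_rate) / spacing
        ratio = max(-1.0, min(1.0, ratio))  # guard against rounding overshoot
        angles[(a, b)] = math.degrees(math.asin(ratio))
    return angles
```

Each pair yields one position estimate (here an angle), which is the per-pair input that claim 8 then filters and averages.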
  8. The method according to claim 7, wherein determining the location of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair comprises:
    determining, according to the position of the target object relative to each audio collection device pair, a position parameter of the target object relative to each audio collection device pair, wherein the position parameter comprises an angle value or a displacement value;
    calculating a difference value between every two position parameters, and selecting the position parameters corresponding to difference values smaller than a preset difference;
    averaging the position parameters corresponding to the difference values smaller than the preset difference, to determine the position of the target object relative to any one audio collection device;
    determining the location of the target object according to the position of that audio collection device relative to the camera and the position of the target object relative to that audio collection device.
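
The consistency filter and averaging in claim 8 can be sketched as below. The function name `fuse_position_parameters` is hypothetical, and returning `None` when no two estimates agree is an assumption, since the claim does not say what happens in that case.

```python
from itertools import combinations


def fuse_position_parameters(params, preset_difference):
    """Average the per-pair position parameters that agree with each other.

    params: one position parameter (angle or displacement) per device pair.
    A parameter is kept if it differs from at least one other parameter
    by less than preset_difference.
    """
    consistent = set()
    # Compare every two position parameters.
    for i, j in combinations(range(len(params)), 2):
        if abs(params[i] - params[j]) < preset_difference:
            consistent.update((i, j))
    if not consistent:
        return None  # assumed behaviour when no estimates agree
    return sum(params[k] for k in consistent) / len(consistent)
```

For example, with angle estimates of 30, 31, 29 and 75 degrees and a preset difference of 3 degrees, the 75 degree outlier is dropped and the remaining three are averaged.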
  9. The method according to any one of claims 1 to 8, wherein after determining the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition, the method further comprises:
    generating an alarm instruction;
    and after determining, according to the image captured by the camera, whether the target object is a drone, the method further comprises:
    cancelling the alarm instruction when the target object is not a drone;
    generating an alarm enhancement instruction when the target object is a drone.
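
The two-stage alarm logic of claim 9 amounts to a small state transition: sound localization raises an alarm, and the image check either cancels or escalates it. The sketch below is one possible rendering; the `AlarmState` enum and both function names are hypothetical.

```python
from enum import Enum


class AlarmState(Enum):
    NONE = "none"          # no alarm, or alarm cancelled
    ALARM = "alarm"        # raised after acoustic localization
    ENHANCED = "enhanced"  # confirmed by the camera image


def alarm_after_localization():
    """The acoustic stage located a target object: raise the alarm."""
    return AlarmState.ALARM


def alarm_after_image_check(state, is_drone):
    """Update the alarm once the camera image has been classified."""
    if state is not AlarmState.ALARM:
        return state  # nothing pending to confirm or cancel
    return AlarmState.ENHANCED if is_drone else AlarmState.NONE
```

Raising the alarm before the visual check trades some false alarms for an earlier warning; the image stage then removes the false positives.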
  10. The method according to claim 9, wherein after determining, according to the image captured by the camera, whether the target object is a drone, the method further comprises:
    when the target object is a drone, controlling the camera to track and film the drone.
  11. An audio/video linkage apparatus, wherein the apparatus comprises:
    an audio collection array, configured to collect sound signals;
    a camera, configured to film a target object;
    a processor, configured to: perform signal processing on the sound signal collected by the audio collection array to obtain at least one sound component, in the sound signal, whose signal feature satisfies a preset condition; determine the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition; control the camera to aim at the location of the target object; and determine, according to an image captured by the camera, whether the target object is a drone;
    a housing assembly, comprising a rotating member and a base, wherein the rotating member is configured to provide rotational kinetic energy to the camera, and the base is configured to hold the bottom of the camera, the audio collection array, and the processor.
  12. The apparatus according to claim 11, wherein the signal feature comprises a spectral feature;
    the processor is specifically configured to:
    extract from the sound signal collected by the audio collection array to obtain at least one sound component in the sound signal;
    for each sound component of the at least one sound component, obtain the energy collected by each audio collection device in the audio collection array; for a sound component whose energy is greater than a preset threshold, fuse the frequency-domain signals of the sound component collected by each audio collection device in the audio collection array to obtain a spectral feature;
    obtain, among all the sound components, the sound components whose spectral features satisfy a preset spectral feature.
  13. The apparatus according to claim 12, wherein the processor is further specifically configured to:
    for any sound component, within a preset period of the current moment, perform framing on the sound component collected by each audio collection device in the audio collection array, to obtain the frame signals corresponding to the sound component collected by each audio collection device;
    calculate the sum of the energy of the frame signals corresponding to the sound component collected by any audio collection device, and take the energy sum as the energy of the sound component collected by that audio collection device;
    extract the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold;
    according to a preset time-frequency transform method, perform a frequency-domain transform on the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold, to obtain the frequency-domain signals of those frame signals;
    fuse the frequency-domain signals of the frame signals corresponding to the sound component collected by all audio collection devices whose energy is greater than the preset threshold, to obtain the spectral feature of the sound component.
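
The processing chain of claim 13 (framing, per-device energy, thresholding, time-frequency transform, fusion) can be sketched as follows. The claim specifies neither the transform nor the fusion rule, so a direct discrete Fourier transform and a plain average of magnitude spectra are used here as assumptions; non-overlapping frames and all function names are also illustrative.

```python
import cmath


def split_frames(samples, frame_len):
    """Cut one device's samples into non-overlapping frames (assumed layout)."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def total_energy(frames):
    """Sum of squared samples over all frames of one device."""
    return sum(x * x for frame in frames for x in frame)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of one frame (assumed transform)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_feature(device_samples, frame_len, energy_threshold):
    """Fuse the spectra of all devices whose energy exceeds the threshold."""
    spectra = []
    for samples in device_samples:
        frames = split_frames(samples, frame_len)
        # Only devices above the preset energy threshold contribute.
        if total_energy(frames) > energy_threshold:
            spectra.extend(magnitude_spectrum(f) for f in frames)
    if not spectra:
        return None
    # Fusion by averaging each frequency bin (an assumed fusion rule).
    bins = len(spectra[0])
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(bins)]
```

The resulting vector would then be compared against the preset spectral feature; a production system would more likely use an FFT library than this direct transform.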
  14. The apparatus according to claim 13, wherein the processor is further specifically configured to:
    determine whether the number of audio collection devices whose energy is greater than the preset threshold is greater than a first preset number;
    if so, perform the step of extracting the frame signals corresponding to the sound component collected by each audio collection device whose energy is greater than the preset threshold.
  15. The apparatus according to claim 14, wherein the processor is further specifically configured to:
    if the number of audio collection devices whose energy is greater than the preset threshold is less than or equal to the first preset number, determine that the sound component is noise.
  16. The apparatus according to claim 11, wherein the processor is further specifically configured to:
    for any sound component, when the signal feature of the sound component satisfies the preset condition, determine that the object corresponding to the sound component is the target object;
    determine the location of the target object according to the position of each audio collection device in the audio collection array relative to the camera and the position of the target object relative to each audio collection device.
  17. The apparatus according to claim 16, wherein the processor is further specifically configured to:
    select a second preset number of audio collection devices from the audio collection devices whose energy is greater than the preset threshold;
    combine the selected second preset number of audio collection devices in pairs to form multiple audio collection device pairs;
    determine, according to a preset time delay estimation method, the position of the target object relative to each audio collection device pair;
    determine the location of the target object according to the position of each audio collection device relative to the camera and the position of the target object relative to each audio collection device pair.
  18. The apparatus according to claim 17, wherein the processor is further specifically configured to:
    determine, according to the position of the target object relative to each audio collection device pair, a position parameter of the target object relative to each audio collection device pair, wherein the position parameter comprises an angle value or a displacement value;
    calculate a difference value between every two position parameters, and select the position parameters corresponding to difference values smaller than a preset difference;
    average the position parameters corresponding to the difference values smaller than the preset difference, to determine the position of the target object relative to any one audio collection device;
    determine the location of the target object according to the position of that audio collection device relative to the camera and the position of the target object relative to that audio collection device.
  19. The apparatus according to claim 11, wherein the apparatus further comprises:
    an alarm component, configured to perform an alarm operation upon receiving an alarm instruction.
  20. The apparatus according to claim 19, wherein the processor is further specifically configured to:
    after determining the location of the target object corresponding to the sound component whose signal feature satisfies the preset condition, generate an alarm instruction and send the alarm instruction to the alarm component, to drive the alarm component to perform an alarm operation;
    when it is determined, according to the image captured by the camera, that the target object is not a drone, cancel the alarm instruction;
    when it is determined, according to the image captured by the camera, that the target object is a drone, generate an alarm enhancement instruction and send the alarm enhancement instruction to the alarm component, to drive the alarm component to perform an alarm enhancement operation.
  21. The apparatus according to claim 20, wherein the processor is further specifically configured to:
    when the target object is a drone, control the camera to track and film the drone.
  22. A storage medium, configured to store executable code, wherein the executable code, when run, performs the method steps of any one of claims 1 to 10.
  23. An application program, configured to perform, when run, the method steps of any one of claims 1 to 10.
PCT/CN2018/086565 2017-05-17 2018-05-11 Unmanned aerial vehicle monitoring method and audio/video linkage apparatus WO2018210192A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710349350.2 2017-05-17
CN201710349350.2A CN108965789B (en) 2017-05-17 2017-05-17 Unmanned aerial vehicle monitoring method and audio-video linkage device

Publications (1)

Publication Number Publication Date
WO2018210192A1 true WO2018210192A1 (en) 2018-11-22

Family

ID=64273421

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/086565 WO2018210192A1 (en) 2017-05-17 2018-05-11 Unmanned aerial vehicle monitoring method and audio/video linkage apparatus

Country Status (2)

Country Link
CN (1) CN108965789B (en)
WO (1) WO2018210192A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109714572A (en) * 2018-12-28 2019-05-03 深圳市微纳感知计算技术有限公司 A kind of intelligent safety and defence system of sound view linkage
CN111866454A (en) * 2020-07-02 2020-10-30 广州博冠智能科技有限公司 Sound and image linkage detection early warning method and device
CN112380933B (en) * 2020-11-02 2023-11-07 中国兵器工业计算机应用技术研究所 Unmanned aerial vehicle target recognition method and device and unmanned aerial vehicle
CN112698665A (en) * 2020-12-28 2021-04-23 同济大学 Unmanned aerial vehicle detection positioning method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105203999A (en) * 2015-10-20 2015-12-30 陈昊 Rotorcraft early-warning device and method
CN205139360U (en) * 2015-10-20 2016-04-06 陈昊 Rotor craft early warning device
CN106057195A (en) * 2016-05-25 2016-10-26 东华大学 Unmanned aerial vehicle detection system based on embedded audio recognition

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN202077131U (en) * 2010-07-16 2011-12-14 弭强 Video monitor system based on sound positioning
US20130192451A1 (en) * 2011-06-20 2013-08-01 Steven Gregory Scott Anti-sniper targeting and detection system
KR20130039844A (en) * 2011-10-13 2013-04-23 주식회사 나임코리아 Intelligent camera appartus and intelligent surveillance metod using the same
CN104581021A (en) * 2013-10-23 2015-04-29 西安群丰电子信息科技有限公司 Video monitoring system based on sound positioning
CN105321516B (en) * 2014-06-30 2019-06-04 美的集团股份有限公司 Sound control method and system
CN106412488A (en) * 2015-07-29 2017-02-15 中兴通讯股份有限公司 Monitoring system and method
CN105357442A (en) * 2015-11-27 2016-02-24 小米科技有限责任公司 Shooting angle adjustment method and device for camera
CN105550636B (en) * 2015-12-04 2019-03-01 中国电子科技集团公司第三研究所 A kind of method and device of target type discrimination
CN106341665A (en) * 2016-09-30 2017-01-18 浙江宇视科技有限公司 Tracking monitoring method and device
CN106627646B (en) * 2016-11-25 2019-06-21 杭州捍鹰科技有限公司 Train protection apparatus and method

Also Published As

Publication number Publication date
CN108965789B (en) 2021-03-12
CN108965789A (en) 2018-12-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18802098

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18802098

Country of ref document: EP

Kind code of ref document: A1