CN117424982B - Intelligent control ball and control method thereof


Info

Publication number
CN117424982B
Authority
CN
China
Prior art keywords
target
area
feature vector
image
garbage
Legal status
Active
Application number
CN202311341630.0A
Other languages
Chinese (zh)
Other versions
CN117424982A
Inventor
刘轴
Current Assignee
Guangzhou Jianruan Technology Co ltd
Original Assignee
Guangzhou Jianruan Technology Co ltd
Application filed by Guangzhou Jianruan Technology Co ltd
Priority to CN202311341630.0A
Publication of CN117424982A
Application granted
Publication of CN117424982B
Current legal status: Active


Classifications

    • H04N 7/181: Closed-circuit television [CCTV] systems (in which the video signal is not broadcast) for receiving images from a plurality of remote sources
    • G06V 20/44: Scenes; scene-specific elements in video content; event detection
    • G06V 20/47: Scenes; scene-specific elements in video content; detecting features for summarising video content
    • H04N 23/23: Cameras or camera modules comprising electronic image sensors; control thereof for generating image signals from thermal infrared radiation only
    • H04N 7/188: Closed-circuit television [CCTV] systems; capturing isolated or intermittent images triggered by the occurrence of a predetermined event, e.g. an object reaching a predetermined position

Abstract

The application relates to the technical field of control balls and provides an intelligent control ball and a control method thereof. The intelligent control ball comprises a thermal imaging device, a photographing device, a KA200 brain chip and a surveillance monitoring device; the photographing device and the surveillance monitoring device are respectively connected with the KA200 brain chip, and the KA200 brain chip is used for controlling the photographing device and the surveillance monitoring device. The surveillance monitoring device comprises a high-altitude worker behavior recognition unit, a terminal browsing behavior recognition unit, a vehicle abnormal parking recognition unit, a garbage mixed-throwing behavior recognition unit, a first environment authenticity recognition unit, a second environment authenticity recognition unit and a target tracking unit. When the intelligent control ball provided by the application identifies an abnormal condition through the built-in abnormality recognition module, the photographing device can be adjusted by horizontal rotation or pitch rotation, and the focal length of the zoom lens of the photographing device is then adjusted, so that the abnormal condition can be accurately captured.

Description

Intelligent control ball and control method thereof
Technical Field
The application relates to the technical field of control balls, and in particular to an intelligent control ball and a control method thereof.
Background
Video monitoring is indispensable in many scenarios of daily life; in relatively important places in particular, continuous and effective monitoring is often needed. Video monitoring equipment in the prior art, however, has a single function: a single piece of video monitoring equipment can only realize one function and cannot recognize scenarios outside that function, so video monitoring equipment in the prior art cannot accurately capture abnormal situations.
Disclosure of Invention
The embodiments of the application provide an intelligent control ball and a control method thereof, aiming at accurately capturing abnormal conditions.
In a first aspect, an embodiment of the present application provides an intelligent control ball, comprising a thermal imaging device, a photographing device, a KA200 brain chip and a surveillance monitoring device; the photographing device and the surveillance monitoring device are respectively connected with the KA200 brain chip, and the KA200 brain chip is used for controlling the photographing device and the surveillance monitoring device;
The surveillance monitoring device comprises a high-altitude worker behavior recognition unit, a terminal browsing behavior recognition unit, a vehicle abnormal parking recognition unit, a garbage mixed-throwing behavior recognition unit, a first environment authenticity recognition unit, a second environment authenticity recognition unit and a target tracking unit;
The photographing device is used for: collecting video stream data;
The thermal imaging device is used for: collecting point cloud data;
The high-altitude worker behavior recognition unit is used for: acquiring the current-moment position information, current-moment lens attitude information and lens parameter information of the photographing device, and a current-moment image of a designated high-altitude area in the video stream data; marking the area where the monitoring target is located in the current-moment image based on the current-moment position information, the current-moment lens attitude information, the lens parameter information, the geographic position information of the area where the monitoring target is located in the designated high-altitude area, and digital elevation model data; performing high-altitude worker behavior recognition on the area where the monitoring target is located; and, when a high-altitude worker is determined to be present in the area where the monitoring target is located, acquiring the human body frame, face frame and safety helmet frame of the high-altitude worker and displaying them on a monitoring interface;
The terminal browsing behavior recognition unit is used for: performing terminal browsing behavior recognition and control on each frame of video image in the video stream data to obtain the head position area and hand position area of each frame of video image, and identifying, according to the head position area and the hand position area, whether a user is browsing a terminal device;
The vehicle abnormal parking recognition unit is used for: performing vehicle abnormal parking recognition and control on each frame of video image in the video stream data to obtain the vehicle parking detection frame and parking area detection frame of each frame of video image, and identifying, according to the vehicle parking detection frame and the parking area detection frame, whether abnormal vehicle parking behavior exists;
The garbage mixed-throwing behavior recognition unit is used for: extracting at least two frames of images from the video stream data in real time and identifying the target subject in each frame of the images, the target subject comprising at least one of the garbage to be thrown, the garbage-throwing subject, and the garbage containers for storing the garbage to be thrown; analyzing the garbage-throwing subject and the garbage to be thrown in the at least two frames of images to screen out target images related to a garbage-throwing event of the garbage-throwing subject; and identifying a target mixed-throwing garbage event according to the decrease in the amount of garbage to be thrown carried by the garbage-throwing subject, or according to the combined amount of garbage in each garbage container, as recorded in the target images;
The first environment authenticity recognition unit is used for: determining a first discrimination feature vector and a second discrimination feature vector corresponding to the video stream data, the first discrimination feature vector representing time-domain feature information between every two frames of face images in the video stream data, and the second discrimination feature vector representing frequency-domain feature information of each frame of face image in the video stream data; determining a target feature vector corresponding to the video stream data based on the first discrimination feature vector and the second discrimination feature vector, the target feature vector representing feature information fusing the time-domain feature information and the frequency-domain feature information; and determining a detection result of the video stream data based on the target feature vector, the detection result indicating whether the environment in the video stream data is a forged environment;
The second environment authenticity recognition unit is used for: pre-processing the video stream data to obtain a plurality of video clips, wherein the video stream data includes audio and each video clip includes the corresponding audio; for each video clip, extracting a video feature vector of the video clip and an audio feature vector of the audio in the video clip; determining a total video feature vector and a total audio feature vector corresponding to the video stream data based on each video feature vector and each audio feature vector; and determining a target detection result of the video stream data based on each video feature vector, each audio feature vector, the total video feature vector and the total audio feature vector, the target detection result indicating whether the environment in the video stream data is a forged environment;
The target tracking unit is used for: acquiring a target image from the video stream data and the first point cloud data, acquired by the thermal imaging device, corresponding to the target image; determining second point cloud data of the target to be tracked based on the target image and the first point cloud data; determining pose information of the target to be tracked at the next moment based on the second point cloud data; and tracking the target to be tracked based on the pose information of the target to be tracked at the next moment.
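As an illustration only (the patent does not disclose the segmentation or motion model used by the target tracking unit), the tracking flow described above can be sketched in Python roughly as follows: crop the thermal point cloud that falls inside the target's image bounding box to obtain the second point cloud data, then predict the next-moment position under a constant-velocity assumption. All names below (crop_points_in_box, predict_next_pose, the [u, v, x, y, z] point layout) are hypothetical.

    import numpy as np

    def crop_points_in_box(points_uv_xyz, box):
        """Select the thermal points whose image projection lies inside the target box.

        points_uv_xyz: (N, 5) array of [u, v, x, y, z] -- assumed layout pairing each
                       3D point with its pixel position in the target image.
        box: (u_min, v_min, u_max, v_max) bounding box of the target to be tracked.
        """
        u, v = points_uv_xyz[:, 0], points_uv_xyz[:, 1]
        u_min, v_min, u_max, v_max = box
        mask = (u >= u_min) & (u <= u_max) & (v >= v_min) & (v <= v_max)
        return points_uv_xyz[mask, 2:5]           # "second point cloud data" (x, y, z)

    def predict_next_pose(prev_centroid, second_cloud, dt=1.0):
        """Constant-velocity guess of the target position at the next moment."""
        curr_centroid = second_cloud.mean(axis=0)
        velocity = (curr_centroid - prev_centroid) / dt
        return curr_centroid + velocity * dt

The predicted next-moment position can then drive the pan, tilt and zoom adjustment of the photographing device so that the target to be tracked remains in view.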
In a second aspect, an embodiment of the present application provides a control method for an intelligent control ball, applied to the intelligent control ball of the first aspect, the method comprising:
collecting video stream data and point cloud data;
performing high-altitude worker behavior recognition based on the video stream data, which specifically comprises: acquiring the current-moment position information, current-moment lens attitude information and lens parameter information of the photographing device, and a current-moment image of a designated high-altitude area in the video stream data; marking the area where the monitoring target is located in the current-moment image based on the current-moment position information, the current-moment lens attitude information, the lens parameter information, the geographic position information of the area where the monitoring target is located in the designated high-altitude area, and digital elevation model data; performing high-altitude worker behavior recognition on the area where the monitoring target is located; and, when a high-altitude worker is determined to be present in the area where the monitoring target is located, acquiring the human body frame, face frame and safety helmet frame of the high-altitude worker and displaying them on a monitoring interface; or, alternatively,
performing terminal browsing behavior recognition and control based on the video stream data, which specifically comprises: performing terminal browsing behavior recognition and control on each frame of video image in the video stream data to obtain the head position area and hand position area of each frame of video image, and identifying, according to the head position area and the hand position area, whether a user is browsing a terminal device; or, alternatively,
performing vehicle abnormal parking recognition based on the video stream data, which specifically comprises: performing vehicle abnormal parking recognition and control on each frame of video image in the video stream data to obtain the vehicle parking detection frame and parking area detection frame of each frame of video image, and identifying, according to the vehicle parking detection frame and the parking area detection frame, whether abnormal vehicle parking behavior exists; or, alternatively,
performing garbage mixed-throwing behavior recognition based on the video stream data, which specifically comprises: extracting at least two frames of images from the video stream data in real time and identifying the target subject in each frame of the images, the target subject comprising at least one of the garbage to be thrown, the garbage-throwing subject, and the garbage containers for storing the garbage to be thrown; analyzing the garbage-throwing subject and the garbage to be thrown in the at least two frames of images to screen out target images related to a garbage-throwing event of the garbage-throwing subject; and identifying a target mixed-throwing garbage event according to the decrease in the amount of garbage to be thrown carried by the garbage-throwing subject, or according to the combined amount of garbage in each garbage container, as recorded in the target images (a count-based sketch of this step is given after the list of method steps below); or, alternatively,
performing environment authenticity recognition based on the video stream data, which specifically comprises: determining a first discrimination feature vector and a second discrimination feature vector corresponding to the video stream data, the first discrimination feature vector representing time-domain feature information between every two frames of face images in the video stream data, and the second discrimination feature vector representing frequency-domain feature information of each frame of face image in the video stream data; determining a target feature vector corresponding to the video stream data based on the first discrimination feature vector and the second discrimination feature vector, the target feature vector representing feature information fusing the time-domain feature information and the frequency-domain feature information; and determining a detection result of the video stream data based on the target feature vector, the detection result indicating whether the environment in the video stream data is a forged environment; or, alternatively,
performing environment authenticity recognition based on the video stream data, which specifically comprises: pre-processing the video stream data to obtain a plurality of video clips, wherein the video stream data includes audio and each video clip includes the corresponding audio; for each video clip, extracting a video feature vector of the video clip and an audio feature vector of the audio in the video clip; determining a total video feature vector and a total audio feature vector corresponding to the video stream data based on each video feature vector and each audio feature vector; and determining a target detection result of the video stream data based on each video feature vector, each audio feature vector, the total video feature vector and the total audio feature vector, the target detection result indicating whether the environment in the video stream data is a forged environment; or, alternatively,
performing target tracking based on the video stream data and the point cloud data, which specifically comprises: acquiring a target image from the video stream data and the first point cloud data, acquired by the thermal imaging device, corresponding to the target image; determining second point cloud data of the target to be tracked based on the target image and the first point cloud data; determining pose information of the target to be tracked at the next moment based on the second point cloud data; and tracking the target to be tracked based on the pose information of the target to be tracked at the next moment.
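As flagged in the garbage mixed-throwing step above, the event trigger is count-based. The following is a hedged sketch of that trigger; the FrameRecord structure, the per-container counts and the comparison rule are assumptions, since the text only states that the event is identified from the decrease in carried garbage or from the combined garbage count recorded in the target images.

    from dataclasses import dataclass, field

    @dataclass
    class FrameRecord:
        """Quantities read off one screened target image (assumed representation)."""
        carried_items: int                     # garbage still carried by the throwing subject
        container_counts: dict = field(default_factory=dict)   # container id -> items inside

    def is_mixed_throwing_event(prev: FrameRecord, curr: FrameRecord) -> bool:
        """Flag a candidate mixed-throwing garbage event between two target images."""
        dropped = prev.carried_items - curr.carried_items       # carried garbage decreased
        container_grew = any(curr.container_counts.get(cid, 0) > n
                             for cid, n in prev.container_counts.items())
        return dropped > 0 or container_grew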
In a third aspect, an embodiment of the present application provides a computer device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the control method for an intelligent control ball according to the second aspect.
In a fourth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium comprising a computer program which, when executed by a processor, implements the control method for an intelligent control ball according to the second aspect.
The intelligent control ball and the control method thereof provided by the embodiments of the application comprise a thermal imaging device, a photographing device, a KA200 brain chip and a surveillance monitoring device; the photographing device and the surveillance monitoring device are respectively connected with the KA200 brain chip, and the KA200 brain chip is used for controlling the photographing device and the surveillance monitoring device; the surveillance monitoring device comprises a high-altitude worker behavior recognition unit, a terminal browsing behavior recognition unit, a vehicle abnormal parking recognition unit, a garbage mixed-throwing behavior recognition unit, a first environment authenticity recognition unit, a second environment authenticity recognition unit and a target tracking unit. During surveillance, when the intelligent control ball identifies an abnormal condition through the built-in abnormality recognition module, the photographing device can be adjusted by horizontal rotation or pitch rotation and the focal length of its zoom lens can then be adjusted, thereby achieving accurate capture of the abnormal condition.
Drawings
In order to more clearly illustrate the technical solutions of the application or of the prior art, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the application, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of an intelligent control ball according to an embodiment of the present application;
FIG. 2 is a second schematic structural diagram of an intelligent control ball according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the technical solutions of the present application are described below clearly and completely with reference to the accompanying drawings of the embodiments. It is apparent that the described embodiments are some, but not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of an intelligent control ball according to an embodiment of the present application. The embodiment of the application provides an intelligent control ball comprising a thermal imaging device, a photographing device, a KA200 brain chip and a surveillance monitoring device; the photographing device and the surveillance monitoring device are respectively connected with the KA200 brain chip, and the KA200 brain chip is used for controlling the photographing device and the surveillance monitoring device; the surveillance monitoring device comprises a high-altitude worker behavior recognition unit, a terminal browsing behavior recognition unit, a vehicle abnormal parking recognition unit, a garbage mixed-throwing behavior recognition unit, a first environment authenticity recognition unit, a second environment authenticity recognition unit and a target tracking unit.
Optionally, the photographing device collects video stream data.
Optionally, the high-altitude worker behavior recognition unit obtains the current-moment position information, current-moment lens attitude information and lens parameter information of the photographing device, and a current-moment image of the designated high-altitude area in the video stream data.
It will be appreciated that, for a large area, only part of the area may need to be monitored, for example a planar area, a high-floor area or a high-altitude work area; the area to be monitored in the embodiments of the present application is therefore a designated high-altitude area, i.e., a high-altitude work area.
It can be understood that, because the shooting range of a single camera is limited, several photographing devices are used in the embodiments of the application; each photographing device acquires images of part of the designated high-altitude area, and the images acquired by all the photographing devices at a given moment are spliced to obtain a complete image of the designated high-altitude area. Optionally, in the embodiments of the application, the intelligent control ball can be mounted at the bottom of an unmanned aerial vehicle, which carries the photographing device of the intelligent control ball over the designated high-altitude area to obtain images of that area during flight.
It should be noted that the photographing device in the embodiments of the present application may be any camera; the current-moment image of the designated high-altitude area refers to an image obtained by the photographing device photographing the designated high-altitude area at the current moment.
It should be noted that, in the embodiments of the present application, the intelligent control ball has a built-in global navigation satellite system (Global Navigation Satellite System, GNSS) receiver and inertial navigation system (Inertial Navigation System, INS). The lens of the photographing device can rotate in the horizontal direction and the vertical direction. The intelligent control ball is also provided with communication equipment.
As an optional embodiment, before acquiring the current-moment image of the designated high-altitude area, the current-moment position information of the photographing device and the current-moment lens attitude information of the photographing device, the method further includes: controlling the lens of the photographing device to rotate in the horizontal direction and/or the vertical direction according to a preset rule. Optionally, in the embodiments of the present application, the rotation angle of the lens of the photographing device in the horizontal direction and/or the vertical direction may be changed once every preset period, and each time the rotation angle of the lens changes, the photographing device is controlled to acquire an image of the designated high-altitude area. Correspondingly, in the embodiments of the application, the interval between the previous moment and the current moment is the preset period.
Optionally, in the embodiments of the present application, the lens of the photographing device may also be controlled to rotate at a uniform speed in the horizontal direction and/or the vertical direction while the photographing device acquires video images of the designated high-altitude area. Accordingly, the interval between the current moment and the next moment can be determined from the interval between any two frames in the video image data of the designated high-altitude area acquired by the photographing device; that is, the previous-moment image of the designated high-altitude area is the previous frame of the video captured by the photographing device, and the current-moment image of the designated high-altitude area is the current frame.
After the photographing device captures the current-moment image of the designated high-altitude area, it can send the image to the area monitoring device via data communication; the video image data of the designated high-altitude area captured by the photographing device can then be read based on a machine vision algorithm, and the current frame extracted as the current-moment image of the designated high-altitude area. OpenCV (Open Source Computer Vision Library) is a cross-platform computer vision and machine learning library written in C++. It runs on a variety of systems, is highly portable, provides programming interfaces for Python, MATLAB, Java, C# and other languages, and implements functions such as image processing, object detection, motion tracking and object recognition. OpenCV has unique advantages in the field of computer vision because it is dedicated to real-time processing of real-world images, and code optimization greatly improves its execution speed. Using the OpenCV vision library to process multi-frame-per-second data such as video images exploits this speed advantage to the greatest extent.
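For illustration, a minimal OpenCV snippet that reads the video stream pushed by the photographing device and keeps the latest frame as the current-moment image might look like the following; the stream URL is a placeholder and not part of the patent.

    import cv2

    def read_current_frame(stream_url="rtsp://camera.example/stream"):  # placeholder URL
        """Grab the most recent frame from the photographing device's video stream."""
        cap = cv2.VideoCapture(stream_url)
        if not cap.isOpened():
            raise RuntimeError("unable to open the video stream")
        ok, frame = cap.read()                 # current frame = current-moment image
        cap.release()
        if not ok:
            raise RuntimeError("failed to read a frame from the stream")
        return frame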
In the embodiments of the application, based on an OpenCV image processing algorithm, the video image data of the designated high-altitude area captured by the photographing device can be read and the current frame extracted as the current-moment image of the designated high-altitude area. In the embodiments of the application, the current-moment position information of the photographing device and the current-moment attitude information of its lens can be described by the current-moment position and orientation system (position and orientation system, POS) data of the photographing device. The POS data of the photographing device may include latitude, longitude, elevation, heading angle (Phi), pitch angle (Omega) and roll angle (Kappa). Based on the operating data acquired at the current moment by the GNSS receiver and the INS built into the intelligent control ball, the POS data of the photographing device at the current moment can be obtained. The photographing device can send the operating data acquired at the current moment by the built-in GNSS receiver and INS to the area monitoring device via data communication. It should be noted that the current-moment position information of the photographing device may include its latitude, longitude and elevation at the current moment, which can be obtained from the operating data acquired at the current moment by the GNSS receiver built into the intelligent control ball.
It should be noted that the current-moment lens attitude information of the photographing device may include the Euler angles (h, p, r) (heading, pitch, roll) of the lens at the current moment. Because the lens of the photographing device can only rotate in the horizontal and vertical directions, in the embodiments of the application a spatial rectangular coordinate system is established with the rotation point of the lens as the origin, the horizontal direction as the X axis and the vertical direction as the Z axis; the rotation of the lens can then be regarded as the Euler angles (h, p, r): a rotation of angle h about the X axis, then angle p about the Y axis, and finally angle r about the Z axis, where the angle p is 0°.
Based on the operating data acquired at the current moment by the INS built into the intelligent control ball, the Euler angles (h, p, r) of the lens of the photographing device at the current moment can be calculated.
Optionally, the high-altitude worker behavior recognition unit marks the area where the monitoring target is located in the current-moment image based on the current-moment position information, the current-moment lens attitude information, the lens parameter information, the geographic position information of the area where the monitoring target is located in the designated high-altitude area, and the digital elevation model data, specifically as follows:
The geographic position information of the area where the monitoring target is located in the designated high-altitude area may include the geographic position information of each ground point on the boundary of that area, and may also include the geographic position information of each ground point within that area. The geographic position information in the embodiments of the application may include longitude and latitude. Optionally, in the embodiments of the present application, the area where the monitoring target is located in the designated high-altitude area may be marked in a remote sensing image of the designated high-altitude area by means of an image patch, so as to obtain a patch remote sensing image of the designated high-altitude area. The image patch of the area where the monitoring target is located can be generated by manual sketching, land-parcel segmentation and similar means. After the patch remote sensing image of the designated high-altitude area is obtained, the geographic position information of the area where the monitoring target is located in the designated high-altitude area can be obtained from the patch remote sensing image.
It should be noted that a primary key (primary key) is set in the geographic position information of the area where the monitoring target is located in the designated high-altitude area, and is used for searching that geographic position information. In the embodiments of the application, this geographic position information can be obtained through data query, user input and other means. It can be understood that, because the geographic position information of the area where the monitoring target is located contains only planar information while the current-moment image of the designated high-altitude area is acquired from the air by the photographing device, the characteristics of the designated high-altitude area in the vertical direction need to be expressed by digital elevation model data of the area where the monitoring target is located. The digital elevation model (Digital Elevation Model, DEM) realizes a digital simulation of the ground terrain (i.e., a digital expression of the surface morphology) through a finite set of terrain elevation data. In the embodiments of the application, the digital elevation model data of the area where the monitoring target is located in the designated high-altitude area can be downloaded from open-source geographic databases such as geospatial data clouds, input by a user, or received from other electronic equipment.
In order to improve the accuracy of monitoring the area where the monitoring target is located in the designated high-altitude area, digital elevation model data of that area with a higher spatial resolution should be selected. The lens parameter information of the photographing device in the embodiments of the application may include the focal length, the resolution and the CMOS (Complementary Metal Oxide Semiconductor) imaging size; the factory information of the photographing device may include its lens parameter information. In the embodiments of the application, the lens parameter information of the photographing device can be obtained through information query, user input and other means.
After the geographic position information and digital elevation model data of the area where the monitoring target is located in the designated high-altitude area, the current-moment position information of the photographing device, the current-moment attitude information of its lens and its lens parameter information are obtained, the area where the monitoring target is located can be marked in the current-moment image of the designated high-altitude area through numerical calculation, deep learning, model design and other means. As an alternative embodiment, marking the area where the monitoring target is located in the current-moment image based on these data includes: determining, based on the current-moment position information of the photographing device, or based on the current-moment position information together with the current-moment lens attitude information, the target geographic position information and target digital elevation model data corresponding to the current-moment image of the designated high-altitude area from the geographic position information and digital elevation model data of the area where the monitoring target is located. Specifically, after the above data are obtained, the target geographic position information and target digital elevation model data corresponding to the current-moment image of the designated high-altitude area can be determined through numerical calculation and data screening.
As an optional embodiment, determining, based on the current-moment position information of the photographing device, the target geographic position information and target digital elevation model data corresponding to the current-moment image of the designated high-altitude area from the geographic position information and digital elevation model data of the area where the monitoring target is located includes: determining the vertical projection point of the photographing device in the designated high-altitude area based on the current-moment position information of the photographing device; determining the circular area centered on the vertical projection point with a preset distance as radius as the associated area of the photographing device; and screening the geographic position information and digital elevation model data of the associated area of the photographing device from the geographic position information and digital elevation model data of the area where the monitoring target is located, as the target geographic position information and target digital elevation model data.
Specifically, after the geographic position information and digital elevation model data of the area where the monitoring target is located, the current-moment position information of the photographing device, the current-moment attitude information of its lens and its lens parameter information are obtained, the projection point of the photographing device in the designated high-altitude area can be determined based on the current-moment position information of the photographing device, and a circular area centered on that projection point with a preset distance as radius can then be determined as the associated area corresponding to the photographing device. It should be noted that the preset distance may be determined according to prior knowledge and/or actual conditions; its specific value is not limited in the embodiments of the application. Alternatively, the preset distance may range from 400 meters to 600 meters, for example 400, 500 or 600 meters; preferably, the preset distance may be 500 meters.
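One straightforward way to realize the screening described above is to take the camera's own longitude and latitude as the vertical projection point and keep only the ground points within the preset radius. The haversine distance below is an assumption; the patent does not prescribe a distance formula.

    import math

    EARTH_RADIUS_M = 6371000.0   # mean Earth radius in meters

    def haversine_m(lon1, lat1, lon2, lat2):
        """Great-circle distance in meters between two (longitude, latitude) points."""
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = p2 - p1, math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

    def screen_associated_area(ground_points, cam_lon, cam_lat, radius_m=500.0):
        """Keep the (lon, lat, dem) ground points that fall inside the associated area."""
        return [(lon, lat, dem) for lon, lat, dem in ground_points
                if haversine_m(lon, lat, cam_lon, cam_lat) <= radius_m]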
After the associated area corresponding to the photographing device is determined, the portion of the area where the monitoring target is located that falls within the associated area can be determined based on the geographic position information of the area where the monitoring target is located in the designated high-altitude area; the geographic position information of that portion can then be screened out and used as the target geographic position information corresponding to the current-moment image of the designated high-altitude area. It may be understood that the target geographic position information includes the geographic position information of each ground point within the associated area corresponding to the photographing device.
Similarly, the digital elevation model data of the portion of the area where the monitoring target is located that falls within the associated area can be screened from the digital elevation model data of the area where the monitoring target is located, and used as the target digital elevation model data corresponding to the current-moment image of the designated high-altitude area.
It can be understood that the target digital elevation model data includes the digital elevation model data of each ground point within the associated area corresponding to the photographing device. The pixel coordinate values corresponding to the target geographic position information are then acquired based on the target digital elevation model data, the target geographic position information, the current-moment attitude information of the lens of the photographing device and the lens parameter information of the photographing device.
Specifically, after the target geographic position information and target digital elevation model data corresponding to the current-moment image of the designated high-altitude area, the current-moment attitude information of the lens of the photographing device and the lens parameter information of the photographing device are obtained, the area where the monitoring target is located can be marked in the current-moment image of the designated high-altitude area by numerical calculation.
As an optional embodiment, acquiring the pixel coordinate values corresponding to the target geographic position information based on the target digital elevation model data, the target geographic position information, the current-moment attitude information of the lens of the photographing device and the lens parameter information of the photographing device includes: acquiring the geodetic coordinate values corresponding to the target geographic position information based on the target digital elevation model data and the target geographic position information.
After the target geographic position information and target digital elevation model data corresponding to the current-moment image of the designated high-altitude area are obtained, the coordinate values of each ground point in the area where the monitoring target is located can be determined as the geodetic coordinate values corresponding to the target geographic position information. The geodetic coordinate values describe the longitude L, latitude B and elevation H of any point on the earth. The geodetic coordinate values corresponding to the target geographic position information are then converted by Gauss forward calculation to obtain the object-space coordinate values corresponding to the target geographic position information.
Coordinate conversion between the geodetic coordinate system and the object-space coordinate system can be realized through Gauss forward calculation. Therefore, in the embodiments of the application, Gauss forward calculation is used to convert the geodetic coordinate values corresponding to the target geographic position information into the object-space coordinate system, obtaining the object-space coordinate values corresponding to the target geographic position information. The object-space coordinate system describes the position of a ground point in object space. Specifically, the Gauss forward and inverse calculations essentially describe the mapping between the geodetic coordinate system and the Gauss-Krüger projection coordinate system: longitude and latitude are projected by the Gauss-Krüger projection to generate the Gauss-Krüger projection coordinate system, in which the Y axis points due east along the equator and the X axis points due north along the central meridian. It should be noted that, in the embodiments of the present application, the Gauss-Krüger projection coordinate system may be taken as the object-space coordinate system.
For any ground point i of the area where the monitoring target is located in the designated high-altitude area, the Gauss projection coordinate values of ground point i can be obtained by Gauss forward calculation from its geodetic coordinate values (Li, Bi, Hi). In the Gauss forward calculation all angles are expressed in radians, and the basic ellipsoid parameters include the ellipsoid semi-major axis a, the flattening f, the semi-minor axis b, the first eccentricity e and the second eccentricity e′, where
b = a(1 - f)
L″ denotes the difference between the longitude of ground point i and the longitude L0 of the central meridian; when a 6-degree zone is used, L0 is obtained by dividing the longitude of ground point i by 3, rounding the result, and multiplying by 3, which gives the local central meridian.
N denotes the radius of curvature of the meridian corresponding to ground point i. The auxiliary quantities t and η² are given by
t = tan Bi
η² = e′² cos² Bi
and ρ″ denotes the number of arc-seconds in one radian. x denotes the meridian arc length corresponding to ground point i, which is computed from the basic constants a0, a2, a4, a6, a8 and m0, m2, m4, m6, m8 determined by the ellipsoid parameters.
The conversion of the Gauss projection coordinate values to object-space coordinates is as follows. The Gauss-Krüger plane coordinate system is a left-handed, two-dimensional plane coordinate system: its origin lies on the equator, the due-east direction is the positive Y axis and the due-north direction is the positive X axis. The object-space coordinate system is a right-handed, three-dimensional coordinate system: its origin lies on the equator, the due-east direction is the positive X axis, the due-north direction is the positive Y axis, and the upward plumb direction is the positive Z axis. Accordingly, the Gauss projection coordinate values (x, y) of ground point i obtained by the Gauss forward calculation are converted to the object-space coordinates (XA, YA, ZA) by exchanging the two plane axes and taking the elevation as the third component, i.e., XA = y, YA = x, ZA = Hi.
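Because the series-expansion formulas are not reproduced here, the sketch below delegates the Gauss forward calculation to a library transverse Mercator projection and then applies the axis swap described above. It is only an approximation of the patent's computation under stated assumptions: pyproj is used in place of the explicit series, the central meridian follows the rounding rule given in the text, no false easting is applied, and the GRS80 ellipsoid is assumed.

    from pyproj import CRS, Transformer

    def geodetic_to_object(lon_deg, lat_deg, elev_m, ellps="GRS80"):
        """Geodetic (L, B, H) -> object-space (XA, YA, ZA) via a Gauss-Krüger projection."""
        central_meridian = round(lon_deg / 3.0) * 3.0        # rounding rule from the text
        gauss = CRS.from_proj4(
            f"+proj=tmerc +lat_0=0 +lon_0={central_meridian} "
            f"+k=1 +x_0=0 +y_0=0 +ellps={ellps} +units=m +no_defs"
        )
        to_gauss = Transformer.from_crs(CRS.from_epsg(4326), gauss, always_xy=True)
        easting, northing = to_gauss.transform(lon_deg, lat_deg)   # Gauss Y (east), X (north)
        # Object space: X points east, Y points north, Z is the elevation.
        return easting, northing, elev_m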
The image-space coordinate values corresponding to the target geographic position information are then acquired through the collinearity equation, based on the current-moment position information of the photographing device, the current-moment attitude information of its lens, its lens parameter information and the object-space coordinate values corresponding to the target geographic position information.
It should be noted that coordinate conversion between the object-space coordinate system and the image-space coordinate system can be achieved based on the collinearity equation. Therefore, in the embodiments of the application, based on the current-moment attitude information of the lens of the photographing device and the lens parameter information of the photographing device, the object-space coordinate values corresponding to the target geographic position information are converted into the image-space coordinate system through the collinearity equation, obtaining the image-space coordinate values corresponding to the target geographic position information.
The collinearity equation is the mathematical basis of the central projection model and an important theoretical basis of photogrammetric processing. Specifically, in order to achieve coordinate conversion between the object-space and image-space coordinate systems, a collinearity equation needs to be constructed from the interior and exterior orientation elements of the current-moment image of the designated high-altitude area, thereby establishing the coordinate conversion relationship between pixels in that image and ground points.
Based on the current-moment position information of the photographing device and the current-moment attitude information of its lens, the exterior orientation element values (XS, YS, ZS) of the current-moment image of the designated high-altitude area can be calculated. Based on the lens parameter information of the photographing device, the interior orientation element values (x0, y0, F) of the current-moment image can be calculated, where x0 and y0 denote the deviation of the principal point of the current-moment image from the image center point, and F denotes the principal distance of the current-moment image (the perpendicular distance from the lens center to the image plane). It should be noted that, when the lens of the camera is focused so that the image plane falls exactly at the focal point, the principal distance can be taken as equal to the focal length of the camera.
After the exterior orientation element values (XS, YS, ZS) and interior orientation element values (x0, y0, F) of the current-moment image of the designated high-altitude area are obtained, a collinearity equation can be constructed from these orientation elements and the target digital elevation model data. Based on the collinearity equation, when ground point i lies within the shooting range of the photographing device at the current moment, the image-space coordinate values corresponding to ground point i can be obtained from its object-space coordinate values (XA, YA, ZA). The collinearity equation relates the image-space coordinates of a point, the interior orientation elements (x0, y0, F), the exterior orientation elements (XS, YS, ZS) together with the rotation matrix of the lens attitude, and the object-space coordinates of the corresponding ground point; inverting the collinearity equation expresses the object-space coordinates in terms of the image-space coordinates and a scale factor.
In three-dimensional space a coordinate rotation is generally represented by a 3 x 3 orthogonal matrix. The single-axis rotation matrices corresponding to the Euler angles (h, p, r) of the lens of the photographing device at the current moment are Rh, the rotation matrix corresponding to the angle h; Rp, the rotation matrix corresponding to the angle p; and Rr, the rotation matrix corresponding to the angle r. When composing the rotation matrix from the Euler angles (h, p, r) of the lens at the current moment, right-multiplying by a single-axis rotation matrix corresponds to rotation about the already-rotated (body) axes, while left-multiplying corresponds to rotation about the fixed axes; the overall rotation matrix R corresponding to the Euler angles (h, p, r) of the lens at the current moment is obtained by composing Rh, Rp and Rr in this way. λ denotes the scale factor between the image-space coordinate system and the object-space coordinate system.
It should be noted that, by traversing the object-space coordinate values corresponding to each ground point in the target geographic position information with the collinearity equation above, the image-space coordinate values corresponding to the target geographic position information can be obtained.
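Since the explicit rotation matrices and the numbered collinearity formulas are not reproduced above, the following NumPy sketch uses one common photogrammetric convention: it builds single-axis rotations for (h, p, r) about the X, Y and Z axes in that order (with p = 0 as stated), composes them into R, and applies the standard collinearity projection to map an object-space point to image-space coordinates. The composition order and sign conventions are assumptions and would have to be matched to the actual device.

    import numpy as np

    def rot_x(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

    def rot_y(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

    def rot_z(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

    def collinearity_project(obj_pt, cam_pos, h, p, r, F, x0=0.0, y0=0.0):
        """Project object-space (XA, YA, ZA) to image-space (x, y).

        cam_pos is (XS, YS, ZS); h, p, r are the lens Euler angles in radians;
        F is the principal distance; x0, y0 are the principal-point offsets.
        """
        R = rot_x(h) @ rot_y(p) @ rot_z(r)                     # assumed composition order
        d = R.T @ (np.asarray(obj_pt, float) - np.asarray(cam_pos, float))
        x = x0 - F * d[0] / d[2]                               # standard collinearity form
        y = y0 - F * d[1] / d[2]
        return x, y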
The pixel coordinate values corresponding to the target geographic position information are then acquired by converting the image-space coordinate values corresponding to the target geographic position information, based on the lens parameter information of the photographing device.
It should be noted that, in the image plane, the origin of the image-space coordinates is the center point of the image, with the x axis pointing horizontally to the right and the y axis pointing vertically upward; in the pixel coordinate system of the image, however, the origin is at the top-left corner of the image, the x axis runs to the right along the upper boundary of the image, the y axis runs downward along the left boundary of the image, and pixel coordinate values are integers only.
Based on the image-space coordinate values corresponding to ground point i and the lens parameter information of the photographing device, the pixel coordinate values corresponding to ground point i in the current-moment image of the designated high-altitude area can be obtained, where pixelsize denotes the physical size of a single pixel of the photographing device.
In the case of a CMOS camera, the deviation between the principal point and the center point of the image is calibrated before the camera leaves the factory; the deviation can therefore be regarded as approximately 0, i.e., x0 and y0 are approximately 0.
Based on the lens parameter information of the photographing device, the physical pixel size pixelsize of the photographing device can be calculated as pixelsize = w / W = h / H, where W denotes the width of the target image, H denotes the height of the target image, and w and h denote the CMOS imaging dimensions of the lens of the photographing device; the width W and height H of the target image and the CMOS imaging size of the lens can be determined from the lens parameter information.
It should be noted that when video or images are captured at a 16:9 aspect ratio the CMOS sensor is not fully used (only its middle portion is active), whereas the pixel count of the image reaches its maximum when capturing at 4:3; special attention is therefore required when calculating the real pixel size pixelsize.
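Assuming pixelsize = w / W (CMOS width divided by image width in pixels, consistent with the definitions above) and a principal-point offset of approximately zero, the conversion from image-space coordinates to pixel row/column indices can be sketched as:

    def image_to_pixel(x_img, y_img, width_px, height_px, cmos_w):
        """Image-space coordinates (origin at image center, y up) -> pixel (col, row).

        x_img, y_img and cmos_w must share the same physical unit (e.g. millimeters).
        """
        pixelsize = cmos_w / width_px              # assumed relation: w / W
        col = width_px / 2.0 + x_img / pixelsize   # pixel x axis runs right along the top edge
        row = height_px / 2.0 - y_img / pixelsize  # pixel y axis runs down along the left edge
        return int(round(col)), int(round(row))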
The area where the monitoring target is located is then marked in the current-moment image of the designated high-altitude area based on the pixel coordinate values corresponding to the target geographic position information.
Specifically, after the pixel coordinate values corresponding to the target geographic position information are obtained, the area where the monitoring target is located may be marked directly in the current-moment image of the designated high-altitude area based on the pixel coordinate values; alternatively, an image patch may be generated in a blank image based on the pixel coordinate values and superimposed on the current-moment image, so as to obtain the current-moment image of the designated high-altitude area with the area where the monitoring target is located marked. Marking the area where the monitoring target is located in the current-moment image based on the pixel coordinate values corresponding to the target geographic position information includes: generating an image patch in a blank image based on the pixel coordinate values corresponding to the target geographic position information, obtaining the patch image corresponding to the photographing device at the current moment;
Cutting out, from the image spot image corresponding to the photographing device at the current moment, a target image spot corresponding to the current-time image of the designated high-altitude area;
And superposing the target image spot with the current-time image of the designated high-altitude area frame by frame to obtain the current-time image of the designated high-altitude area marked with the area where the monitoring target is located.
Specifically, in the embodiment of the application, the frame-by-frame superposition of the image spot image at the current moment and the current-time image of the designated high-altitude area may be implemented based on a machine vision algorithm, for example an OpenCV image processing algorithm.
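A minimal OpenCV-based sketch of the frame-by-frame superposition described above is given below. It assumes the image spot is drawn as a filled polygon from the pixel coordinate values; the blending weight and color are illustrative choices, not values specified by the application.

import cv2
import numpy as np

def overlay_spot(frame, spot_pixels, color=(0, 0, 255), alpha=0.4):
    # frame: current-time image of the designated high-altitude area (BGR).
    # spot_pixels: list of (u, v) pixel coordinates outlining the monitored area.
    spot_img = np.zeros_like(frame)                           # blank image
    pts = np.array(spot_pixels, dtype=np.int32).reshape(-1, 1, 2)
    cv2.fillPoly(spot_img, [pts], color)                      # generate the image spot
    marked = cv2.addWeighted(frame, 1.0, spot_img, alpha, 0)  # superpose spot on the frame
    return marked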
Optionally, the aerial working personnel behavior recognition unit recognizes the aerial working personnel behavior in the area where the monitoring target is located, acquires a human body frame, a human face frame and a safety helmet frame of the aerial working personnel when the aerial working personnel exist in the area where the monitoring target is determined to be located, displays the human body frame, the human face frame and the safety helmet frame on the monitoring interface, and performs real-time tracking and monitoring on the aerial working personnel.
Optionally, the terminal browsing behavior recognition unit performs terminal browsing behavior recognition and control on each frame of video image in the video stream data to obtain a head position area and a hand position area of each frame of video image, and recognizes whether the user has browsing terminal equipment behaviors according to the head position area and the hand position area, specifically including:
The terminal browsing behavior recognition unit acquires the nose part position coordinates in the head part position region and the first center position coordinates of the head part position region, and calculates a position coordinate difference value, that is, position coordinate difference value= (abscissa of the nose part position coordinates-abscissa of the first center position coordinates, ordinate of the nose part position coordinates-ordinate of the first center position coordinates), based on the nose part position coordinates and the first center position coordinates. Optionally, the terminal browsing behavior recognition unit enlarges or reduces the head position area according to the position coordinate difference value to obtain an adjusted head position area, and determines the first position coordinate according to each vertex position coordinate of the adjusted head position area.
The terminal browsing behavior recognition unit acquires the light source position coordinate with the maximum light source brightness in the hand position region and the second center position coordinate of the hand position region, and calculates a position coordinate difference value, that is, a position coordinate difference value= (abscissa of the light source position coordinate-abscissa of the second center position coordinate, ordinate of the light source position coordinate-ordinate of the second center position coordinate) based on the light source position coordinate and the second center position coordinate. Optionally, the terminal browsing behavior recognition unit enlarges or reduces the hand position area according to the position coordinate difference value to obtain an adjusted hand position area, and determines the second position coordinate according to each vertex position coordinate of the adjusted hand position area.
The terminal browsing behavior recognition unit calculates a first area of the head position area according to the vertex position coordinates of the adjusted head position area, and calculates a second area of the hand position area according to the vertex position coordinates of the adjusted hand position area.
The terminal browsing behavior recognition unit calculates a difference value between each vertex position coordinate of the adjusted head position area and the corresponding vertex position coordinate of the adjusted hand position area, that is, difference value = vertex position coordinate of the adjusted head position area - corresponding vertex position coordinate of the adjusted hand position area; in one embodiment, the abscissas of the left vertices correspond to one difference and the ordinates of the left vertices correspond to another difference. Optionally, the terminal browsing behavior recognition unit determines the maximum position coordinate as the target position coordinate according to the sign of the difference value, and calculates the first target area, that is, the overlapping area of the two regions, based on the target position coordinates. In one embodiment, when the abscissa of the left vertex of the adjusted head position area minus the abscissa of the left vertex of the adjusted hand position area is greater than 0, the abscissa of the left vertex of the adjusted head position area is taken as the target position coordinate; when the difference is less than 0, the abscissa of the left vertex of the adjusted hand position area is taken.
The terminal browsing behavior recognition unit calculates a second target area according to the first area, the second area and the first target area, namely, the second target area=the first area+the second area-the first target area.
The terminal browsing behavior recognition unit calculates the area ratio of the head position area to the hand position area according to the first target area and the second target area, namely, the area ratio=the first target area/the second target area.
If the area ratio is greater than a first preset ratio, the terminal browsing behavior recognition unit determines that the user has the behavior of browsing terminal equipment, wherein the first preset ratio is set according to the actual situation.
If the area ratio is less than or equal to the first preset ratio, the terminal browsing behavior recognition unit determines that the user does not have the behavior of browsing terminal equipment.
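To make the above steps concrete, the following sketch computes the overlapping area (first target area), the union area (second target area) and their ratio for an adjusted head box and an adjusted hand box, each given as (left, top, right, bottom) in pixel coordinates with the ordinate increasing downward; the threshold value is an assumed placeholder, not a value fixed by the application.

def browsing_terminal(head_box, hand_box, first_preset_ratio=0.1):
    hx1, hy1, hx2, hy2 = head_box
    gx1, gy1, gx2, gy2 = hand_box
    first_area = (hx2 - hx1) * (hy2 - hy1)           # area of the adjusted head position area
    second_area = (gx2 - gx1) * (gy2 - gy1)          # area of the adjusted hand position area
    # first target area: take the larger of the top-left coordinates and the
    # smaller of the bottom-right coordinates of the two boxes
    ix1, iy1 = max(hx1, gx1), max(hy1, gy1)
    ix2, iy2 = min(hx2, gx2), min(hy2, gy2)
    first_target = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    second_target = first_area + second_area - first_target   # union area
    area_ratio = first_target / second_target if second_target else 0.0
    return area_ratio > first_preset_ratio           # True: browsing-terminal behavior exists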
Optionally, the abnormal vehicle parking identification unit performs abnormal vehicle parking identification and distribution control on each frame of video image in the video stream data to obtain a vehicle parking detection frame and a parking area detection frame of each frame of video image, and identifies whether abnormal vehicle parking behavior exists according to the vehicle parking detection frame and the parking area detection frame, which specifically includes:
The vehicle abnormal parking identification unit calculates a first detection frame area of the vehicle parking detection frame according to the upper left corner coordinates and the lower right corner coordinates of the vehicle parking detection frame; in an embodiment, the upper left corner coordinates may be represented by (left, top) and the lower right corner coordinates by (right, bottom). rec is an array that may include the coordinates of the two points at the upper left corner and the lower right corner of a target frame; rec1 = [left1, top1, right1, bottom1] may be used to represent the vehicle parking detection frame, and its area may be calculated accordingly.
Therefore, the vehicle abnormal parking recognition unit calculates the area of the vehicle parking detection frame as rect1_area = (rec1[2] - rec1[0]) * (rec1[3] - rec1[1]), wherein rec1[2] represents the lower right corner abscissa of the vehicle parking detection frame, rec1[0] represents its upper left corner abscissa, rec1[3] represents its lower right corner ordinate, and rec1[1] represents its upper left corner ordinate.
The vehicle abnormal parking identification unit calculates the second detection frame area of the parking area detection frame according to the upper left corner coordinates and the lower right corner coordinates of the parking area detection frame; rec2 = [left2, top2, right2, bottom2] may be used to represent the parking area detection frame, and its area is calculated.
Therefore, the vehicle abnormal parking recognition unit calculates the area of the parking area detection frame as rect2_area = (rec2[2] - rec2[0]) * (rec2[3] - rec2[1]), wherein rec2[2] represents the lower right corner abscissa of the parking area detection frame, rec2[0] represents its upper left corner abscissa, rec2[3] represents its lower right corner ordinate, and rec2[1] represents its upper left corner ordinate.
The vehicle abnormal parking identification unit calculates the intersection detection frame area of the vehicle parking detection frame and the parking area detection frame based on the maximum upper left corner abscissa, the maximum upper left corner ordinate, the minimum lower right corner abscissa and the minimum lower right corner ordinate of the two frames; in one embodiment, left_max = max(rec1[0], rec2[0]), indicating that the larger of the upper left corner abscissas of the two target frames is taken; top_max = max(rec1[1], rec2[1]), indicating that the larger of the upper left corner ordinates is taken; right_min = min(rec1[2], rec2[2]), indicating that the smaller of the lower right corner abscissas is taken; bottom_min = min(rec1[3], rec2[3]), indicating that the smaller of the lower right corner ordinates is taken.
Next, the intersection area of the vehicle parking detection frame and the parking area detection frame may be calculated as area_cross = (right_min - left_max) * (bottom_min - top_max); if either factor is not positive, the two frames do not intersect and area_cross is taken as 0.
The vehicle abnormal parking recognition unit calculates the union detection frame area of the vehicle parking detection frame and the parking area detection frame based on the first detection frame area, the second detection frame area and the intersection detection frame area; that is, the areas of the two detection frames are added and the intersection area is subtracted, giving the union area area_union = rect1_area + rect2_area - area_cross.
The vehicle abnormal parking recognition unit divides the intersection detection frame area by the union detection frame area to obtain the intersection-over-union ratio of the vehicle parking detection frame and the parking area detection frame, namely iou = area_cross / area_union. If this ratio is greater than a preset threshold, it is determined that abnormal vehicle parking behavior exists; if it is less than or equal to the preset threshold, it is determined that no abnormal vehicle parking behavior exists.
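The computation spelled out above can be collected into a single function. This is a sketch that keeps the rec1/rec2 array convention and assumes pixel coordinates with the ordinate increasing downward; the preset threshold is an assumed example value.

def abnormal_parking(rec1, rec2, threshold=0.5):
    # rec1: [left1, top1, right1, bottom1] of the vehicle parking detection frame
    # rec2: [left2, top2, right2, bottom2] of the parking area detection frame
    rect1_area = (rec1[2] - rec1[0]) * (rec1[3] - rec1[1])
    rect2_area = (rec2[2] - rec2[0]) * (rec2[3] - rec2[1])
    left_max = max(rec1[0], rec2[0])
    top_max = max(rec1[1], rec2[1])
    right_min = min(rec1[2], rec2[2])
    bottom_min = min(rec1[3], rec2[3])
    area_cross = max(0, right_min - left_max) * max(0, bottom_min - top_max)
    area_union = rect1_area + rect2_area - area_cross
    iou = area_cross / area_union if area_union else 0.0
    return iou > threshold   # True: abnormal vehicle parking behavior exists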
Optionally, the garbage mixing and throwing behavior identifying unit extracts at least two frames of images from the video stream data in real time, identifies a target main body in each frame of images, where the target main body includes at least one of garbage to be thrown, garbage throwing objects and garbage containers for storing garbage to be thrown, and specifically includes:
In this embodiment, a trained machine learning algorithm may be used to identify the garbage, the person (i.e., the garbage throwing object) and the garbage container storing garbage to be thrown in the image; for example, a YOLO object detection algorithm may be used. In one example, a plurality of image samples containing garbage, people and garbage containers, covering various scenes and angles, are collected, and the YOLO detection model is trained with the collected image samples so that it can identify the different classes of objects in an image and give their positions and bounding boxes.
In practical application, the trained YOLO detection model is used to identify each type of object and its two-dimensional coordinates in the image; the detected objects are then analyzed according to a preset target main body identification rule, and the garbage to be thrown, the garbage throwing object and the garbage container storing the garbage to be thrown are identified.
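A sketch of how the detector output might be mapped to the target main bodies is shown below; run_detector is a stand-in for the trained detection model (its interface and the class names are assumptions, not part of the application), and the grouping follows the preset target main body identification rule described above.

def identify_target_bodies(frame, run_detector):
    # run_detector(frame) is assumed to return a list of (class_name, (x1, y1, x2, y2)) tuples.
    detections = run_detector(frame)
    garbage = [box for cls, box in detections if cls == "garbage"]
    persons = [box for cls, box in detections if cls == "person"]           # candidate garbage throwing objects
    containers = [box for cls, box in detections if cls == "garbage_container"]
    return {"garbage": garbage, "persons": persons, "containers": containers}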
The garbage throwing behavior recognition unit analyzes garbage throwing objects and garbage to be thrown in at least two frames of images to screen out target images related to garbage throwing events of the garbage throwing objects, wherein the target images are specifically as follows:
In one example, by comparing the same garbage throwing object or the position change of the same garbage to be thrown in at least two adjacent frames of images, whether the garbage to be thrown is thrown away can be judged, so as to screen out the target image recorded with the garbage throwing process.
In one example, at least two adjacent frames of images can be compared, an image with obvious position change of the garbage to be thrown in the garbage throwing process can be found, and a target image related to the garbage throwing event can be deduced according to the characteristics of the motion track of the garbage to be thrown and the like.
In another example, considering that a garbage throwing event may involve multiple consecutive frame images, a time window may be set to filter out the related target images; parameters such as the length of the time window and the degree of overlap between windows are set to ensure that the garbage throwing process is covered and other irrelevant images are excluded.
The garbage mixing and throwing behavior recognition unit recognizes a garbage mixing and throwing target garbage event according to the quantity of garbage to be thrown, carried by the garbage throwing object and recorded in the target images, that is reduced at one time, or according to the merged quantity of garbage in each garbage container, specifically:
in this embodiment, the garbage to be thrown carried by the garbage throwing object in each frame of the target image is counted, or the garbage merging quantity of garbage thrown into the same garbage container is counted.
In one example, the target images associated with each garbage throwing object are traversed; if it is recognized that the quantity of garbage to be thrown carried by a garbage throwing object is reduced by no less than 2 at one time during the garbage throwing process, the garbage throwing event is judged to be a garbage mixing target garbage event, and the garbage throwing object, the garbage to be thrown and the garbage container in the target image are selected with position frames so as to facilitate subsequent further manual analysis.
In another example, the target images associated with the garbage throwing objects are traversed, the target images of garbage throwing into the same garbage container are combined, and if the combined number of garbage throwing into the garbage container (namely, the total number of garbage throwing into the garbage container in the garbage throwing event) is greater than or equal to 2 through identification in the combined target images, the garbage throwing event is judged to be a garbage mixing target garbage event, and the garbage throwing object, the garbage to be thrown and the garbage container frame in the target images are selected by using a position frame so as to facilitate subsequent further manual analysis.
The garbage mixing and throwing behavior recognition unit determines the garbage to be thrown in an image as garbage that does not exist in the cached images before the image, or whose overlapping degree with the historical garbage appearing in the cached images is smaller than a preset overlapping degree. If the image includes at least two objects to be identified, the object to be identified whose first distance to the garbage to be thrown is both the smallest and smaller than a first preset distance is determined to be the garbage throwing object. If the image includes at least two garbage containers, the garbage container with the smallest second distance to the garbage throwing object is determined to be the garbage container storing the garbage to be thrown. Specifically: when there is too much garbage in a garbage container, or garbage is not completely thrown into the container, such garbage also appears in the image; therefore, in this embodiment, all garbage in the image is identified first, and the garbage to be thrown is then identified from it.
In one example, the garbage in the current image is compared with the garbage in the previous cached images, for example with the garbage appearing in the previous 3 cached images, to determine whether the garbage has already appeared. If it has, the IOU overlapping degree between the garbage and the historical garbage appearing in the cached images is calculated; if the IOU overlapping degree is greater than 90%, the garbage is at the same position as a certain piece of historical garbage in the cached images, and in this case the garbage is determined to be lying on the ground or already in a dustbin.
After recognizing the garbage to be thrown, if at least two objects to be recognized exist in the image, the garbage mixing and throwing behavior recognition unit calculates a first distance between each object to be recognized and the garbage to be thrown, and selects the object to be recognized, which has the smallest first distance with the garbage to be thrown and is smaller than a first preset distance, as the garbage throwing object.
In one example, the position of an object to be identified is determined by the two-dimensional coordinates (X1, Y1) and (X2, Y2) of two opposite corners of its area; the first distances between the two-dimensional coordinate center points of all objects to be identified and the garbage to be thrown in the image can then be traversed to improve the accuracy of the determined first distances, where the coordinates (PointX, PointY) of the two-dimensional center point of an object to be identified are obtained by PointX = X1 + (X2 - X1)/2 and PointY = Y1 + (Y2 - Y1)/2.
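The center-point and distance computation described above can be sketched as follows; the boxes are (X1, Y1, X2, Y2) corner pairs and the first preset distance is an assumed placeholder.

import math

def center(box):
    x1, y1, x2, y2 = box
    return x1 + (x2 - x1) / 2.0, y1 + (y2 - y1) / 2.0   # (PointX, PointY)

def find_throwing_object(garbage_box, person_boxes, first_preset_distance=300.0):
    gx, gy = center(garbage_box)
    best, best_dist = None, float("inf")
    for box in person_boxes:                # traverse all objects to be identified
        px, py = center(box)
        d = math.hypot(px - gx, py - gy)    # first distance to the garbage to be thrown
        if d < best_dist:
            best, best_dist = box, d
    # accept the nearest object only if it is also within the first preset distance
    return best if best_dist < first_preset_distance else None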
If the garbage throwing object does not exist in the cache image before the image, determining a first quantity of garbage to be thrown carried by the garbage throwing object in the image; if the first number is greater than the preset number, determining that the image is a target image associated with a garbage throwing event of the garbage throwing object, specifically:
If the garbage throwing object does not exist in the cached images before the image, it indicates that the garbage throwing object appears for the first time at the time point corresponding to the image; when the garbage throwing object appears for the first time, the first quantity of garbage to be thrown carried by it at its first appearance is identified.
In one example, the preset number may be set to 1. When the number of garbage items carried at the first appearance is 1, it indicates that the garbage throwing object cannot exhibit garbage mixing behavior, and the garbage throwing object is not analyzed further. When the number of garbage items carried at the first appearance is 2, it indicates that the garbage throwing object may exhibit garbage mixing behavior, and the image is identified as a target image associated with the garbage throwing event of the garbage throwing object, so that garbage mixing and throwing identification can be performed subsequently.
In this embodiment, if a garbage throwing object exists in the cached images before the image, it is first determined whether a target image associated with the garbage throwing event of this garbage throwing object has been stored before. If so, the quantity of garbage to be thrown remaining after the garbage throwing object finished its last throwing is obtained from the first target image associated with the garbage throwing event of this garbage throwing object, and the difference between that remaining quantity and the quantity of garbage to be thrown present in the current image is calculated. If the difference is greater than zero, it indicates that the current throwing process of the garbage throwing object is recorded in the image, and the image is identified as a target image associated with the garbage throwing event of the garbage throwing object, so that garbage mixing and throwing identification can be performed subsequently.
In one example, the two-dimensional coordinates of the garbage throwing object are (x1, y1) and (x2, y2); when these coordinates meet the condition x1 < 20, or y1 < 20, or x2 > 2668, or y2 > 2500, the garbage throwing object is considered to be located in an edge area of the image, and otherwise in a non-edge area.
In one example, if a garbage throwing object is identified in a non-edge area of the image, a fast-reID model may be used to calculate the feature vector of the garbage throwing object, and the similarity between this feature vector and the feature vectors of the historical objects in the last 3 historical images in memory is then calculated. If the similarity is greater than 55, it indicates that the garbage throwing object already appeared in the last 3 historical images, that is, the ID identification of the garbage throwing object was calibrated before; the ID identification of the historical object may then be used as the first ID identification of the garbage throwing object, and the first ID identification, the similarity and the feature vector of the garbage throwing object are stored in association in a dictionary.
If no historical object with a similarity greater than 55 is found, the garbage throwing object has not appeared in the last 3 historical images, that is, its ID identification has not been calibrated before; a new second ID identification is generated for the garbage throwing object, and the second ID identification, the similarity and the feature vector are stored in association in a dictionary.
It should be noted that when more than one garbage throwing object in an image has a similarity greater than the preset threshold, multiple different garbage throwing objects may be marked with the same ID. Therefore, in this embodiment, in order to improve the accuracy of subsequent garbage mixing and throwing recognition, the garbage throwing objects associated with the same first ID identification in the image are first sorted according to their similarity; the first ID identification is assigned to the garbage throwing object with the greatest similarity, and new third ID identifications are generated for the other garbage throwing objects associated with the same first ID identification, so as to distinguish the multiple different garbage throwing objects.
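A simplified sketch of the ID assignment logic is given below. The feature vectors are assumed to come from the fast-reID model mentioned above; cosine similarity scaled to a 0-100 range is an assumed convention, and the threshold 55 follows the text.

import itertools
import numpy as np

_id_counter = itertools.count(1)
id_dictionary = {}   # ID identification -> {"similarity": float, "feature": np.ndarray}

def similarity(a, b):
    # cosine similarity scaled to a 0-100 range (an assumed convention)
    return 100.0 * float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def assign_id(feature, history, threshold=55.0):
    # history: list of (history_id, history_feature) from the last 3 historical images
    scored = [(hid, similarity(feature, hf)) for hid, hf in history]
    scored = [s for s in scored if s[1] > threshold]
    if scored:
        hid, sim = max(scored, key=lambda s: s[1])   # reuse the previously calibrated first ID
    else:
        hid, sim = f"ID{next(_id_counter)}", 0.0     # generate a new second ID identification
    id_dictionary[hid] = {"similarity": sim, "feature": feature}
    return hid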
It should be noted that, in the embodiment of the present application, the first environment authenticity identification unit and the second environment authenticity identification unit are both integrated; when the environment authenticity is identified, the first environment authenticity identification unit is called preferentially, and when the identification by the first environment authenticity identification unit has a certain time delay, for example 5 seconds or 10 seconds, the second environment authenticity identification unit is called to identify the environment authenticity.
Optionally, the first environment authenticity identification unit determines a first discrimination feature vector and a second discrimination feature vector corresponding to the video stream data; the first distinguishing feature vector represents time domain feature information between every two frames of face images in the video stream data; the second distinguishing feature vector represents frequency domain feature information between every two frames of face images in the video stream data, and specifically comprises the following steps: according to the video stream data, a first distinguishing feature vector and a second distinguishing feature vector corresponding to the video stream data can be determined. The first distinguishing feature vector represents time domain feature information between every two frames of face images in the video stream data, namely, every two frames of face images in the video stream data should have high consistency; the second discriminant feature vector represents frequency domain feature information between each frame of face image in the video stream data.
The first environment authenticity identification unit determines a target feature vector corresponding to the video stream data based on the first discrimination feature vector and the second discrimination feature vector; the target feature vector represents feature information fusing time domain feature information and frequency domain feature information; determining a detection result of the video stream data based on the target feature vector; the detection result is used for indicating whether the environment in the video stream data is a fake environment, specifically: according to the first discrimination feature vector and the second discrimination feature vector, a target feature vector corresponding to video stream data can be determined; the target feature vector represents feature information that merges the time domain feature information and the frequency domain feature information. The detection result is used for indicating whether the environment in the video stream data is a fake environment, and the detection result is a real environment or a fake environment. Based on the target feature vector, the detection result of the video stream data can be determined.
The first environment authenticity identification unit determines a face area and a non-face area of the face image based on each frame of face image in the video stream data; the human face area comprises an area below the human eyes and comprising left and right cheeks; the non-face area comprises left and right side areas above the human eyes except for the forehead, and specifically comprises: according to each frame of face image in the video stream data, detecting a face area in each frame of face image by adopting a target detection algorithm; for example, the target detection algorithm is a face detection and localization algorithm or a key point-based face detection algorithm. The face area is represented by a detection frame, and each frame of face image is cut by the detection frame to obtain a cut face image. Dividing the face image into areas to obtain a face area and a non-face area, wherein the face area comprises an area below a human eye and comprising left and right cheeks; the non-face area includes left and right side areas other than the forehead above the human eye in consideration of the influence of hair accessories, ornaments, and the like.
The first environment authenticity identification unit extracts a first time domain feature corresponding to each face region and a second time domain feature corresponding to each non-face region based on each face region and each non-face region, specifically: the first time domain feature represents time domain feature information between face regions in each frame of face image in the video stream data, i.e., a face-region rPPG feature, and the second time domain feature represents time domain feature information between non-face regions in each frame of face image in the video stream data, i.e., a non-face-region rPPG feature. According to each face region and each non-face region, the first time domain feature corresponding to each face region and the second time domain feature corresponding to each non-face region can be extracted respectively.
The first environment authenticity identification unit determines a first discrimination feature vector corresponding to the video stream data based on the first time domain feature and the second time domain feature, specifically: according to the first time domain feature and the second time domain feature, a consistency analysis algorithm is adopted to carry out relevance and consistency analysis, and the first discrimination feature vector corresponding to the video stream data is determined. For example, the consistency analysis algorithm is canonical correlation analysis (Canonical Correlation Analysis, CCA), independent component analysis (Independent Component Analysis, ICA), Spearman's rank correlation coefficient, or mutual information (Mutual Information).
The first environment authenticity identification unit determines a second discrimination feature vector corresponding to the video stream data based on the first time domain features, specifically: according to the extracted first time domain features, a frequency domain analysis algorithm is adopted to carry out spectrum analysis on the first time domain features, so that further frequency domain features can be obtained; for example, the result of the spectrum analysis can be expressed as the relation between frequency and amplitude, or between frequency and energy. The result of the spectrum analysis includes the heart rate spectrum, frequency band energy and the like, from which the second discrimination feature vector corresponding to the video stream data is obtained. For example, the frequency domain analysis algorithm is the fast Fourier transform (Fast Fourier Transform, FFT), the wavelet transform (Wavelet Transform), or another algorithm.
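As an illustration of this frequency-domain step, the sketch below applies an FFT to an rPPG time-domain signal and derives a heart-rate spectrum and band energy. The sampling rate (video frame rate) and the heart-rate band limits are assumed example values, and the signal is assumed to span at least a few seconds.

import numpy as np

def spectral_features(rppg_signal, fps=25.0, band=(0.7, 4.0)):
    # rppg_signal: 1-D array of the first time domain feature sampled at the video frame rate.
    spectrum = np.abs(np.fft.rfft(rppg_signal - np.mean(rppg_signal)))
    freqs = np.fft.rfftfreq(len(rppg_signal), d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])          # plausible heart-rate band (Hz)
    band_energy = float(np.sum(spectrum[in_band] ** 2))        # frequency band energy
    peak_freq = float(freqs[in_band][np.argmax(spectrum[in_band])])  # dominant heart-rate frequency
    return np.concatenate([spectrum[in_band], [band_energy, peak_freq]])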
The first environment authenticity identification unit normalizes the first discrimination feature vector and the second discrimination feature vector to obtain a normalized first discrimination feature vector and a normalized second discrimination feature vector, specifically: dividing the first discrimination feature vector by the sum of the first discrimination feature vector and the second discrimination feature vector to obtain a normalized first discrimination feature vector; dividing the second discriminant feature vector by the sum of the first discriminant feature vector and the second discriminant feature vector to obtain a normalized second discriminant feature vector.
The first environment authenticity identification unit is used for splicing the normalized first discrimination feature vector and the normalized second discrimination feature vector to obtain a target feature vector corresponding to video stream data, wherein the target feature vector specifically comprises: and splicing the normalized first discrimination feature vector with the normalized second discrimination feature vector to obtain a target feature vector corresponding to the video stream data.
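The normalization and splicing described above may be sketched as follows, assuming both discrimination feature vectors are numpy arrays of the same length; element-wise division by the element-wise sum is one possible reading of the description, not the only one.

import numpy as np

def fuse_discrimination_features(first_vec, second_vec, eps=1e-9):
    total = first_vec + second_vec + eps       # sum of the two discrimination feature vectors
    first_norm = first_vec / total             # normalized first discrimination feature vector
    second_norm = second_vec / total           # normalized second discrimination feature vector
    return np.concatenate([first_norm, second_norm])   # target feature vector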
The first environment authenticity identification unit inputs the target feature vector into the classification and discrimination network model to obtain a detection result output by the classification and discrimination network model; the classifying and judging network model is trained based on sample feature vectors and label data corresponding to the sample detection video, and is used for classifying and judging whether the environment in the video stream data is a fake environment or not, and specifically comprises the following steps: the classification discrimination network model may be a visual geometry group (Visual Geometry Group, VGG) model, or may be other discrimination networks. The classification and discrimination network model is trained based on sample feature vectors corresponding to the sample detection video and tag data, and is used for classifying and discriminating whether the environment in the video stream data is a fake environment or not, wherein the tag data is real or fake, for example, tag data 1 represents real, and tag data 0 represents fake.
The target feature vector is input into the classification discrimination network model, and the detection result output by the classification discrimination network model can be directly obtained.
In the training stage of the classification discrimination network model, an initial classification discrimination network model is selected as the base model; only the last fully connected layer is replaced and reset, with the label data as the training target. In each training period, the sample feature vector is input into the initial classification discrimination network model to obtain the discrimination result output by the model. A cross entropy loss function is selected to calculate the loss value from the discrimination result and the label data, the model is fine-tuned (finetune) according to the loss value, back propagation is performed to update the parameters of the fully connected layer of the initial classification discrimination network model while the parameters of the other layers are kept unchanged, and techniques such as dropout are adopted to prevent overfitting, until the classification discrimination network model meets the preset condition, at which point training stops and the trained classification discrimination network model is obtained. The preset condition is that the accumulated loss value tends to be stable or the number of training iterations reaches the preset maximum. By training the classification discrimination network model in this way, its performance can be improved, a large amount of computation time and resources can be saved, and the detection efficiency for deeply forged content can be improved. In the testing stage of the classification discrimination network model, the target feature vector corresponding to the video stream data is input into the trained classification discrimination network model to obtain the detection result output by the model.
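A minimal PyTorch sketch of this fine-tuning scheme (backbone layers frozen, only the last fully connected layer retrained with cross entropy and dropout) is shown below. The small network is a simplified stand-in for the VGG-style discrimination network, and the feature dimension, hidden size and optimizer settings are assumptions rather than values from the application.

import torch
import torch.nn as nn

feature_dim, num_classes = 512, 2
model = nn.Sequential(
    nn.Linear(feature_dim, 256), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, num_classes),          # the last fully connected layer to be fine-tuned
)

for p in model[:-1].parameters():
    p.requires_grad = False               # keep the parameters of the other layers unchanged

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model[-1].parameters(), lr=1e-3)

def train_epoch(loader):
    model.train()
    total = 0.0
    for sample_vecs, labels in loader:    # sample feature vectors and label data (1 = real, 0 = forged)
        optimizer.zero_grad()
        loss = criterion(model(sample_vecs), labels)
        loss.backward()                   # back propagation updates only the last FC layer
        optimizer.step()
        total += loss.item()
    return total                          # stop when this stabilizes or the epoch limit is reached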
The second environment authenticity identification unit preprocesses the video stream data to obtain a plurality of video clips; the video stream data includes audio, and each video clip includes audio, specifically: video stream data is acquired, wherein the video stream data includes audio, the video stream data is a video including a human face, and the audio in the video stream data is the sound of the person corresponding to the human face. The video stream data is cut into video clips of a preset length to obtain a plurality of video clips, each of which includes audio. For example, each video clip has a duration of 7 to 10 seconds and the number of video clips is 6.
In practice, due to different acquisition environments, the resolution and audio information of the input video sequence may differ. In the preprocessing stage, the image sequence in the video stream data is decoded, each frame is stored as an image, the resolution of each frame is scaled to the same size, and the audio in the video stream data is decoded into a waveform sound file (wav); for example, the resolution of each frame is 1280×720, the frames are encoded as bitmaps, 10 to 15 frames are retained per second, and the audio is encoded with 8-bit samples. Each frame of image and the audio sequence in the video stream data are smoothed with a filter so as to reduce the interference of noise on subsequent processing, wherein the filter parameters for the video frames and for the audio are different; for example, the filter is a mean filter or another type of filter.
The second environment authenticity identifying unit extracts, for each video segment, a video feature vector of the video segment and an audio feature vector of the audio in the video segment, specifically: for each video clip, the video feature vector of the clip and the audio feature vector of the audio in the clip can be extracted respectively. The video feature vector is a time domain feature vector extracted by remote photoplethysmography (rPPG), and includes a feature vector composed of peak amplitude, waveform width, rising time, falling time and the like. The audio feature vector is a feature vector formed by normalizing, splicing and fusing spectral energy features and time domain features; the spectral energy feature may be at least one of Mel-frequency cepstral coefficients (MFCC), constant Q cepstral coefficients (CQCC), spectral envelope features and intonation features, and the time domain feature may be at least one of peak amplitude, zero-crossing rate, short-time energy and short-time average amplitude.
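An illustrative extraction of the audio feature vector (MFCC spectral features fused with a few simple time-domain features) might look like the sketch below; the sample rate, the number of MFCC coefficients and the normalization scheme are assumptions, not values specified by the application.

import numpy as np
import librosa

def audio_feature_vector(wav_path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)   # spectral energy features
    zcr = librosa.feature.zero_crossing_rate(y).mean()                    # zero-crossing rate
    energy = float(np.mean(y ** 2))                                       # short-time energy proxy
    peak = float(np.max(np.abs(y)))                                       # peak amplitude
    time_feats = np.array([zcr, energy, peak])
    # normalize each part before splicing and fusing them into one vector
    mfcc = mfcc / (np.linalg.norm(mfcc) + 1e-9)
    time_feats = time_feats / (np.linalg.norm(time_feats) + 1e-9)
    return np.concatenate([mfcc, time_feats])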
The second environment authenticity identification unit determines a total video feature vector and a total audio feature vector corresponding to the video stream data based on each video feature vector and each audio feature vector; determining a target detection result of the video stream data based on each video feature vector, each audio feature vector, the total video feature vector and the total audio feature vector; the target detection result indicates whether the environment in the video stream data is a fake environment, specifically: from the video feature vector and the audio feature vector for each video clip, a total video feature vector and a total audio feature vector corresponding to the video stream data may be determined. According to the video feature vector and the audio feature vector of each video clip, the total video feature vector and the total audio feature vector of the video stream data, a target detection result of the video stream data can be determined; wherein the target detection result indicates whether the environment in the video stream data is a falsified environment.
The second environment authenticity identification unit determines a fusion feature vector corresponding to the video stream data based on the total video feature vector and the total audio feature vector, specifically: and determining a fusion feature vector corresponding to the video stream data according to the total video feature vector and the total audio feature vector corresponding to the video stream data.
The second environment authenticity identifying unit respectively determines a first detection result corresponding to each video feature vector, a second detection result corresponding to each audio feature vector and a third detection result corresponding to the fusion feature vector based on each video feature vector, each audio feature vector and the fusion feature vector, and specifically comprises the following steps: according to each video feature vector, each audio feature vector and the fusion feature vector, a first detection result corresponding to each video feature vector, a second detection result corresponding to each audio feature vector and a third detection result corresponding to the fusion feature vector can be respectively determined; the first detection result indicates whether the environment in the video stream data is a fake environment or not, the second detection result indicates whether the environment in the video stream data is a fake environment or not, and the third detection result indicates whether the environment in the video stream data is a fake environment or not.
The second environment authenticity identifying unit determines a target detection result of the video stream data based on the first detection result, the second detection result and the third detection result, specifically: according to the first detection result, the second detection result and the third detection result, a target detection result of the detected video can be determined.
The second environment authenticity identifying unit normalizes the video feature vector and the audio feature vector corresponding to each video segment to obtain a normalized video feature vector and a normalized audio feature vector, respectively, specifically: dividing the video feature vector corresponding to each video segment by the sum of the video feature vectors corresponding to all the video segments to obtain the normalized video feature vector; and dividing the audio feature vector corresponding to each video segment by the sum of the audio feature vectors corresponding to all the video segments to obtain the normalized audio feature vector.
The second environment authenticity identifying unit respectively splices the normalized video feature vectors and the normalized audio feature vectors to obtain a total video feature vector and a total audio feature vector corresponding to the video stream data, wherein the method specifically comprises the following steps: splicing the normalized video feature vectors to obtain a total video feature vector corresponding to the video stream data; and splicing the normalized audio feature vectors to obtain a total audio feature vector corresponding to the video stream data.
The target tracking unit acquires a target image of video stream data and first point cloud data corresponding to the target image acquired by the thermal imaging device, specifically: the imaging device acquires a target image of a target to be tracked in real time, the thermal imaging device irradiates the environment with a laser beam or structured light and records reflection data of the irradiation in the environment, and the reflection data are used for generating point cloud data. And aligning the target image with the reflection data through calibration parameters of the shooting device and the thermal imaging device, so as to acquire first point cloud data corresponding to the target image.
The target tracking unit determines second point cloud data of a target to be tracked based on the target image and the first point cloud data, specifically: the first point cloud data represents the point cloud data corresponding to the target image which is acquired by the thermal imaging device and comprises the target to be tracked, and the second point cloud data represents the point cloud data corresponding to the target image which is only included in the target to be tracked, namely the target to be tracked is segmented from the target image, and the point cloud data of the target to be tracked is obtained. From the target image and the first point cloud data, second point cloud data of the target to be tracked may be determined.
The target tracking unit determines pose information of the target to be tracked at the next moment based on the second point cloud data, specifically: the pose information comprises position information, direction information and angle information, and according to second point cloud data of the target to be tracked at the current moment, the pose information of the target to be tracked at the next moment can be predicted, and the pose information of the target to be tracked at the next moment is determined.
The target tracking unit tracks the target to be tracked based on the pose information of the target to be tracked at the next moment, specifically: according to the pose information of the target to be tracked at the next moment, the photographing device is aimed at the target to be tracked, the shooting pose strategy is adjusted gradually, and detection and identification of the target to be tracked are attempted continuously until an accurate detection result of the target to be tracked is obtained. The detection result of the target to be tracked can be judged through the overlapping degree (Intersection over Union, IOU), which quantifies the overlap between the detection frame and the ground-truth labeling frame.
The target tracking unit inputs the target image into the target detection network model to obtain the position information, output by the target detection network model, of the target to be tracked at the current moment; the target detection network model is trained based on sample target images and label data and is used for detecting the target to be tracked in the target image, specifically: the target image is input into the target detection network model, which performs matching using the features of the target to be tracked and its features at the initial moment, so as to obtain the area of the current-moment target image that is most similar to the target image at the initial moment. The detection frame of the target to be tracked is updated according to the matching result, so that the detection frame accurately reflects the position of the target to be tracked; the position information of the target to be tracked is then updated, positioning and classification of the target to be tracked are realized, and the position information of the target to be tracked at the current moment output by the target detection network model is obtained.
The target tracking unit aligns the first point cloud data with the target image to be tracked corresponding to the position information, and determines the position and the range of the target to be tracked corresponding to the first point cloud data, specifically: the pixel point of each position in the target image to be tracked corresponding to the position information is aligned with the first point cloud data of the corresponding position, so that the position and the range of the target to be tracked corresponding to the aligned first point cloud data can be determined.
In practice, a feature point corresponding to each pixel in the target image to be tracked is detected and matched with the corresponding point in the first point cloud data; by matching corresponding points in the target image to be tracked and in the first point cloud data, a correspondence between pixel points and point cloud data is established, realizing the alignment of the first point cloud data with the target image to be tracked. After the alignment, it is generally necessary to perform a coordinate system conversion on the first point cloud data, that is, to convert the first point cloud data into three-dimensional coordinates, so that the target image to be tracked and the first point cloud data are represented in the same coordinate system and the first point cloud data corresponds to the pixels in the target image to be tracked.
The target tracking unit determines second point cloud data of a target to be tracked based on the position and the range, specifically: and dividing the target to be tracked from the first point cloud data according to the position and the range of the target to be tracked corresponding to the first point cloud data, and obtaining second point cloud data of the target to be tracked.
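A sketch of the alignment and segmentation steps is given below: 3-D points from the thermal imaging device are projected into the image with calibration matrices, and the points that fall inside the detection frame of the target to be tracked are kept as the second point cloud data. The calibration matrices, the box format and the function name are illustrative assumptions.

import numpy as np

def segment_target_points(points_3d, K, R, t, target_box):
    # points_3d: (N, 3) first point cloud data in the thermal imaging device frame.
    # K: (3, 3) camera intrinsic matrix; R, t: extrinsics from the thermal device
    # to the photographing device; target_box: (x1, y1, x2, y2) detection frame.
    cam_pts = points_3d @ R.T + t                 # transform into the camera coordinate system
    in_front = cam_pts[:, 2] > 0
    proj = (K @ cam_pts[in_front].T).T
    uv = proj[:, :2] / proj[:, 2:3]               # pixel coordinates of each projected point
    x1, y1, x2, y2 = target_box
    inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return points_3d[in_front][inside]            # second point cloud data of the target to be tracked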
The intelligent distribution control ball provided by the embodiment of the application includes a thermal imaging device, a photographing device, a KA200 brain chip and a distribution control monitoring device. The photographing device and the distribution control monitoring device are respectively connected with the KA200 brain chip, and the KA200 brain chip is used for controlling the photographing device and the distribution control monitoring device. The distribution control monitoring device includes an aerial working personnel behavior recognition unit, a terminal browsing behavior recognition unit, a vehicle abnormal parking recognition unit, a garbage mixing and throwing behavior recognition unit, a first environment authenticity recognition unit, a second environment authenticity recognition unit and a target tracking unit. In the process of distribution control, when the built-in abnormality recognition modules identify an abnormal situation, the intelligent distribution control ball can adjust the photographing device through horizontal rotation or pitch rotation and adjust its zoom and focus, so as to accurately capture the abnormal situation.
Optionally, a distribution control method of the intelligent distribution control ball is applied to the intelligent distribution control ball and includes:
video stream data and point cloud data are collected.
The method is characterized by carrying out behavior recognition on the aerial working personnel based on video stream data, and specifically comprising the following steps: acquiring current time position information, current time lens attitude information and lens parameter information of a shooting device and a current time image of a designated high-altitude area in video stream data; marking the area where the monitoring target is located in the current time image based on the current time position information, the current time lens attitude information, the lens parameter information, the geographical position information of the area where the monitoring target is located in the designated high-altitude area and the digital elevation model data; and carrying out behavior recognition on the area where the monitoring target is located, acquiring a human body frame, a human face frame and a safety helmet frame of the high-altitude operation personnel when the high-altitude operation personnel exist in the area where the monitoring target is located, and displaying the human body frame, the human face frame and the safety helmet frame on a monitoring interface.
Terminal browsing behavior identification and control are carried out based on video stream data, specifically including: terminal browsing behavior identification and control are conducted on each frame of video image in the video stream data, a head position area and a hand position area of each frame of video image are obtained, and whether the user has the behavior of browsing terminal equipment is identified according to the head position area and the hand position area; or alternatively,
The abnormal parking identification of the vehicle based on video stream data comprises the following specific steps: and carrying out vehicle abnormal parking identification and control on each frame of video image in the video stream data to obtain a vehicle parking detection frame and a parking area detection frame of each frame of video image, and identifying whether abnormal vehicle parking behaviors exist or not according to the vehicle parking detection frame and the parking area detection frame.
The garbage mixing and throwing behavior identification is carried out based on video stream data, and specifically comprises the following steps: extracting at least two frames of images from video stream data in real time, and identifying a target main body in each frame of images, wherein the target main body comprises at least one of garbage to be thrown, garbage throwing objects and garbage containers for storing the garbage to be thrown; analyzing the garbage throwing objects and garbage to be thrown in at least two frames of images to screen out target images related to garbage throwing events of the garbage throwing objects; and identifying the target garbage event of garbage mixing according to the reduced number of garbage to be thrown carried by the garbage throwing object or the combined number of garbage in each garbage container recorded in the target image.
The method is used for carrying out environment authenticity identification based on video stream data, and specifically comprises the following steps: determining a first discrimination feature vector and a second discrimination feature vector corresponding to video stream data; the first distinguishing feature vector represents time domain feature information between every two frames of face images in the video stream data; the second distinguishing feature vector represents frequency domain feature information between every two frames of face images in the video stream data; determining a target feature vector corresponding to the video stream data based on the first discrimination feature vector and the second discrimination feature vector; the target feature vector represents feature information fusing time domain feature information and frequency domain feature information; determining a detection result of the video stream data based on the target feature vector; the detection result is used for indicating whether the environment in the video stream data is a falsified environment.
The method is used for carrying out environment authenticity identification based on video stream data, and specifically comprises the following steps: preprocessing video stream data to obtain a plurality of video clips; the video stream data includes audio, and each video clip includes audio; for each video clip, respectively extracting a video feature vector of the video clip and an audio feature vector of audio in the video clip; determining a total video feature vector and a total audio feature vector corresponding to the video stream data based on each video feature vector and each audio feature vector; determining a target detection result of the video stream data based on each video feature vector, each audio feature vector, the total video feature vector and the total audio feature vector; the target detection result indicates whether the environment in the video stream data is a falsified environment.
Target tracking is carried out based on video stream data and point cloud data, and specifically comprises the following steps: acquiring a target image of video stream data and first point cloud data corresponding to the target image acquired by a thermal imaging device; determining second point cloud data of a target to be tracked based on the target image and the first point cloud data; determining pose information of the target to be tracked at the next moment based on the second point cloud data; and tracking the target to be tracked based on pose information of the target to be tracked at the next moment.
The distribution control method of the intelligent distribution control ball corresponds to the foregoing embodiments and will not be described in detail here.
Optionally, referring to fig. 2, fig. 2 is a second schematic structural diagram of an intelligent control ball according to an embodiment of the present application.
The intelligent distribution control ball comprises a base 1, an outer cover 2, a built-in device 3, a photographing device 4 and a thermal imaging device 5. The built-in device 3 is integrated with the KA200 brain chip and the distribution control monitoring device, and the distribution control monitoring device is integrated with the aerial working personnel behavior recognition unit, the terminal browsing behavior recognition unit, the vehicle abnormal parking recognition unit, the garbage mixing and throwing behavior recognition unit, the first environment authenticity recognition unit, the second environment authenticity recognition unit and the target tracking unit.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. An intelligent distribution control ball, characterized by comprising a thermal imaging device, a photographing device, a KA200 brain chip and a distribution control monitoring device; the photographing device and the distribution control monitoring device are respectively connected with the KA200 brain chip, and the KA200 brain chip is used for controlling the photographing device and the distribution control monitoring device;
The distribution control monitoring device comprises an aerial working personnel behavior recognition unit, a terminal browsing behavior recognition unit, a vehicle abnormal parking recognition unit, a garbage mixing and throwing behavior recognition unit, a first environment authenticity recognition unit, a second environment authenticity recognition unit and a target tracking unit;
The shooting device is used for: collecting video stream data;
the thermal imaging device is used for: collecting point cloud data;
The aerial work personnel behavior recognition unit is used for: acquiring current time position information, current time lens attitude information and lens parameter information of the shooting device and a current time image of a designated high-altitude area in the video stream data; marking the area where the monitoring target is located in the current time image based on the current time position information, the current time lens attitude information, the lens parameter information, the geographic position information of the area where the monitoring target is located in the designated high-altitude area and digital elevation model data; performing aerial work personnel behavior recognition on the area where the monitoring target is located, acquiring a human body frame, a human face frame and a safety helmet frame of the aerial work personnel when aerial work personnel exist in the area where the monitoring target is located, and displaying the human body frame, the human face frame and the safety helmet frame on a monitoring interface;
The terminal browsing behavior recognition unit is used for: performing terminal browsing behavior identification and control on each frame of video image in the video stream data to obtain a head position area and a hand position area of each frame of video image, and identifying whether a user has the behavior of browsing terminal equipment according to the head position area and the hand position area;
The vehicle abnormal parking identification unit is used for: carrying out abnormal vehicle parking identification and control on each frame of video image in the video stream data to obtain a vehicle parking detection frame and a parking area detection frame of each frame of video image, and identifying whether an abnormal vehicle parking behavior exists according to the vehicle parking detection frame and the parking area detection frame;
The garbage mixed throwing behavior recognition unit is used for: extracting at least two frames of images from the video stream data in real time, and identifying a target main body in each frame of image, wherein the target main body comprises at least one of garbage to be thrown, a garbage throwing object and a garbage container for storing the garbage to be thrown; analyzing the garbage throwing object and the garbage to be thrown in the at least two frames of images to screen out target images associated with a garbage throwing event of the garbage throwing object; and identifying a target garbage mixed throwing event according to the reduction in the number of pieces of garbage to be thrown carried by the garbage throwing object or the combined number of pieces of garbage in each garbage container recorded in the target images;
The first environment authenticity identification unit is used for: determining a first discrimination feature vector and a second discrimination feature vector corresponding to the video stream data; the first discrimination feature vector represents time domain feature information between every two frames of face images in the video stream data; the second discrimination feature vector represents frequency domain feature information between each frame of face image in the video stream data; determining a target feature vector corresponding to the video stream data based on the first discrimination feature vector and the second discrimination feature vector; the target feature vector represents feature information fusing the time domain feature information and the frequency domain feature information; determining a detection result of the video stream data based on the target feature vector; the detection result is used for indicating whether the environment in the video stream data is a fake environment;
The second environment authenticity identification unit is used for: preprocessing the video stream data to obtain a plurality of video clips; the video stream data includes audio, and each of the video clips includes the audio; for each video clip, respectively extracting a video feature vector of the video clip and an audio feature vector of the audio in the video clip; determining a total video feature vector and a total audio feature vector corresponding to the video stream data based on each video feature vector and each audio feature vector; determining a target detection result of the video stream data based on each of the video feature vectors, each of the audio feature vectors, the total video feature vector, and the total audio feature vector; the target detection result indicates whether the environment in the video stream data is a fake environment;
The target tracking unit is used for: acquiring a target image from the video stream data and first point cloud data, corresponding to the target image, acquired by the thermal imaging device; determining second point cloud data of a target to be tracked based on the target image and the first point cloud data; determining pose information of the target to be tracked at the next moment based on the second point cloud data; and tracking the target to be tracked based on the pose information of the target to be tracked at the next moment;
wherein, based on each of the above recognition units, the intelligent control ball is enabled, when an abnormal condition is recognized by a built-in recognition unit, to adjust the shooting device through horizontal rotation or pitch rotation and then to adjust the zoom lens focal length of the shooting device, so that the abnormal condition is accurately captured and tracked and monitored in real time.
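As an editorial illustration of the pan-tilt-zoom adjustment described in the preceding paragraph, the following minimal Python sketch converts a detected target's pixel offset into pan/tilt corrections and a zoom factor. It assumes a roughly linear pixel-to-angle mapping within the current field of view; all function and parameter names are illustrative and do not appear in the patent.

```python
import math

def pan_tilt_zoom_adjust(target_px, image_size, hfov_deg, vfov_deg,
                         target_fill=0.4, current_fill=0.1):
    """Turn a target's pixel offset into pan/tilt angle corrections and a zoom factor (illustrative).

    target_px    : (u, v) pixel position of the detected abnormal condition
    image_size   : (width, height) of the frame
    hfov_deg     : current horizontal field of view of the zoom lens
    vfov_deg     : current vertical field of view
    target_fill  : fraction of the frame the target should occupy after zooming
    current_fill : fraction of the frame the target currently occupies
    """
    w, h = image_size
    # Angular offset from the image centre, assuming an approximately linear pixel-to-angle mapping
    pan = (target_px[0] - w / 2) / w * hfov_deg
    tilt = (h / 2 - target_px[1]) / h * vfov_deg
    # Enlarge until the target fills the desired fraction of the frame
    zoom = math.sqrt(target_fill / max(current_fill, 1e-6))
    return pan, tilt, zoom

if __name__ == "__main__":
    pan, tilt, zoom = pan_tilt_zoom_adjust((1500, 300), (1920, 1080), hfov_deg=60.0, vfov_deg=34.0)
    print(f"pan {pan:+.1f} deg, tilt {tilt:+.1f} deg, zoom x{zoom:.2f}")
```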
2. The intelligent control ball of claim 1, wherein the aerial work personnel behavior recognition unit is further configured to:
Based on the current time position information, or based on the current time position information and the current time lens attitude information, together with the geographic position information and digital elevation model data of the area where the monitoring target is located in the designated high-altitude area, determining target geographic position information and target digital elevation model data corresponding to the current time image of the designated high-altitude area;
Acquiring pixel coordinate values corresponding to the target geographic position information based on the target digital elevation model data, the target geographic position information, the current moment lens attitude information and the lens parameter information;
Marking the area where the monitoring target is located in the current moment image of the designated high-altitude area based on the pixel coordinate value corresponding to the target geographic position information;
The aerial working personnel behavior recognition unit is also used for:
Acquiring a geodetic coordinate value corresponding to the target geographic position information based on the target digital elevation model data and the target geographic position information;
Converting the coordinate system of the geodetic coordinate value corresponding to the target geographic position information through Gauss projection forward calculation to obtain an object space coordinate value corresponding to the target geographic position information;
Acquiring an image space coordinate value corresponding to the target geographic position information through the collinearity equation based on the current moment position information, the current moment lens attitude information, the lens parameter information and the object space coordinate value corresponding to the target geographic position information;
And converting the coordinate system of the image space coordinate value corresponding to the target geographic position information based on the lens parameter information, to acquire the pixel coordinate value corresponding to the target geographic position information.
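The projection chain in this claim (geodetic coordinates, Gauss projection forward calculation, collinearity equation, pixel coordinates) follows standard photogrammetric practice. The sketch below illustrates only the collinearity step under an assumed pinhole model with known exterior orientation (camera position and lens attitude) and interior orientation (focal length, principal point, pixel size); the symbol names, example values and the omega-phi-kappa rotation convention are assumptions, not taken from the patent.

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """Rotation matrix from omega/phi/kappa angles (radians), a common photogrammetric convention."""
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(omega), -np.sin(omega)],
                   [0, np.sin(omega),  np.cos(omega)]])
    Ry = np.array([[ np.cos(phi), 0, np.sin(phi)],
                   [0, 1, 0],
                   [-np.sin(phi), 0, np.cos(phi)]])
    Rz = np.array([[np.cos(kappa), -np.sin(kappa), 0],
                   [np.sin(kappa),  np.cos(kappa), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx

def object_to_pixel(ground_xyz, cam_xyz, angles, f_mm, pp_mm, pixel_mm, img_size):
    """Project an object-space point to pixel coordinates via the collinearity equations."""
    R = rotation_matrix(*angles)
    # Coordinates of the ground point in the camera frame
    d = R.T @ (np.asarray(ground_xyz, float) - np.asarray(cam_xyz, float))
    # Collinearity equations: focal-plane coordinates in millimetres
    x = -f_mm * d[0] / d[2]
    y = -f_mm * d[1] / d[2]
    # Focal-plane (mm) -> pixel coordinates, principal point offset pp_mm, square pixels of size pixel_mm
    col = (x + pp_mm[0]) / pixel_mm + img_size[0] / 2
    row = img_size[1] / 2 - (y + pp_mm[1]) / pixel_mm
    return col, row

if __name__ == "__main__":
    col, row = object_to_pixel(
        ground_xyz=(485.0, 282.0, 40.0),   # object-space point (e.g. after Gauss forward projection)
        cam_xyz=(480.0, 280.0, 120.0),     # camera position at the current moment
        angles=(0.02, -0.01, 0.0),         # lens attitude (omega, phi, kappa, radians)
        f_mm=25.0, pp_mm=(0.0, 0.0), pixel_mm=0.005, img_size=(1920, 1080))
    print(f"pixel: ({col:.1f}, {row:.1f})")
```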
3. The intelligent control ball of claim 2, wherein the aerial work personnel behavior recognition unit is further configured to:
acquiring a vertical projection point of the shooting device in the designated high-altitude area based on the current time position information;
determining a circular area taking the vertical projection point as a circle center and a preset distance as a radius as an associated area of the shooting device;
Screening geographic position information and digital elevation model data of an associated area of the shooting device from geographic position information and digital elevation model data of an area where a monitoring target is located in the designated high-altitude area, and taking the geographic position information and the digital elevation model data as the target geographic position information and the target digital elevation model data;
The aerial working personnel behavior recognition unit is also used for:
Generating an image spot in a blank image based on the pixel coordinate value corresponding to the target geographic position information, to obtain an image spot map corresponding to the shooting device at the current moment;
Cutting out a target image spot corresponding to the current moment image of the designated high-altitude area from the image spot map corresponding to the shooting device at the current moment;
And overlaying the target image spot, frame by frame, on the current moment image of the designated high-altitude area to obtain the current moment image of the designated high-altitude area marked with the area where the monitoring target is located.
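A minimal sketch of the associated-area screening described in this claim: project the shooting device straight down, then keep only the DEM grid points whose planimetric distance to that vertical projection point is within the preset radius. The flat-plane distance and the (x, y, elevation) data layout are simplifying assumptions for illustration.

```python
import numpy as np

def screen_associated_area(cam_xy, dem_points, radius):
    """Keep only the DEM points within `radius` of the camera's vertical projection point.

    cam_xy     : (x, y) planimetric position of the shooting device
    dem_points : (N, 3) array of (x, y, elevation) for the monitored area
    radius     : preset distance defining the circular associated area
    """
    dem_points = np.asarray(dem_points, dtype=float)
    dists = np.hypot(dem_points[:, 0] - cam_xy[0], dem_points[:, 1] - cam_xy[1])
    return dem_points[dists <= radius]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dem = np.column_stack([rng.uniform(0, 1000, 500),
                           rng.uniform(0, 1000, 500),
                           rng.uniform(20, 80, 500)])
    target = screen_associated_area(cam_xy=(500.0, 500.0), dem_points=dem, radius=150.0)
    print(f"{len(target)} DEM points fall inside the associated area")
```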
4. The intelligent control ball according to claim 1, wherein the terminal browsing behavior recognition unit is further configured to:
Acquiring a nose position coordinate in the head position region and a first center position coordinate of the head position region, and determining a first position coordinate based on the nose position coordinate and the first center position coordinate;
Acquiring a light source position coordinate with the maximum light source brightness in the hand position area and a second center position coordinate of the hand position area, and determining a second position coordinate according to the light source position coordinate and the second center position coordinate;
Calculating a first area of the head position area based on the first position coordinates, and calculating a second area of the hand position area based on the second position coordinates;
Determining a target position coordinate based on a difference value between the first position coordinate and the second position coordinate, and calculating a first target area based on the target position coordinate;
Calculating a second target area based on the first area, the second area, and the first target area;
Calculating a region area ratio of the head position region to the hand position region based on the first target area and the second target area;
If the region area ratio is larger than a first preset ratio, determining that the user has the behavior of browsing terminal equipment; or,
And if the region area ratio is smaller than or equal to the first preset ratio, determining that the user does not have the behavior of browsing terminal equipment.
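The claim derives its first and second target areas through several intermediate coordinates; the sketch below keeps only the final decision step, thresholding an area ratio computed from the head and hand position regions. The box format, the ratio definition and the preset ratio are illustrative simplifications of the claimed procedure, not the patent's exact computation.

```python
def region_area(box):
    """Area of an axis-aligned region given as (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def has_browsing_behavior(head_box, hand_box, preset_ratio=0.5):
    """Decide browsing behaviour from a head/hand region area ratio (illustrative thresholding only)."""
    head_area = region_area(head_box)
    hand_area = region_area(hand_box)
    if hand_area == 0:
        return False
    ratio = head_area / hand_area  # stand-in for the claim's first/second target area ratio
    return ratio > preset_ratio

if __name__ == "__main__":
    head = (400, 100, 520, 240)   # head position area in pixels
    hand = (430, 300, 560, 420)   # hand position area in pixels
    print("browsing terminal device" if has_browsing_behavior(head, hand) else "no browsing behaviour")
```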
5. The intelligent control ball of claim 1, wherein the vehicle abnormal parking identification unit is further configured to:
calculating the first detection frame area of the vehicle parking detection frame according to the coordinates of the upper left corner and the coordinates of the lower right corner of the vehicle parking detection frame;
calculating the area of a second detection frame of the parking area detection frame according to the coordinates of the upper left corner and the coordinates of the lower right corner of the parking area detection frame;
Calculating the intersection detection frame area of the vehicle parking detection frame and the parking area detection frame based on the maximum value of the left upper corner abscissa, the minimum value of the left upper corner ordinate, the minimum value of the right lower corner abscissa and the maximum value of the right lower corner ordinate in the vehicle parking detection frame and the parking area detection frame;
calculating the union detection frame area of the vehicle parking detection frame and the parking area detection frame according to the first detection frame area, the second detection frame area and the intersection detection frame area;
Dividing the intersection detection frame area by the union detection frame area to obtain an intersection detection frame area ratio of the vehicle parking detection frame to the parking area detection frame;
If the area ratio of the intersection detection frames is larger than a preset threshold value, determining that an abnormal parking behavior of the vehicle exists; or,
And if the area ratio of the intersection detection frames is smaller than or equal to the preset threshold value, determining that no abnormal parking behavior of the vehicle exists.
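The computation in this claim amounts to an intersection-over-union (IoU) test between the vehicle parking detection frame and the parking area detection frame. The sketch below uses the common image convention in which the ordinate increases downwards, so both intersection ordinates are taken as a max (top-left) and a min (bottom-right); the threshold value is illustrative.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2), y increasing downwards."""
    # Intersection corners: maxima of the top-left coordinates, minima of the bottom-right coordinates
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter          # first area + second area - intersection area
    return inter / union if union > 0 else 0.0

def is_abnormal_parking(vehicle_box, area_box, threshold=0.5):
    """Flag abnormal parking when the IoU of the two detection frames exceeds a preset threshold."""
    return iou(vehicle_box, area_box) > threshold

if __name__ == "__main__":
    vehicle = (100, 200, 260, 320)            # vehicle parking detection frame
    restricted_area = (120, 210, 300, 340)    # parking area detection frame
    print(is_abnormal_parking(vehicle, restricted_area, threshold=0.5))
```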
6. The intelligent control ball of claim 1, wherein the garbage mixed throwing behavior recognition unit is further configured to:
determining garbage to be thrown in the image, wherein the garbage to be thrown does not exist in a cache image before the image or the overlapping degree between the garbage to be thrown and historical garbage appearing in the cache image is smaller than a preset overlapping degree;
if the image comprises at least two objects to be identified, determining, as the garbage throwing object, the object to be identified whose first distance to the garbage to be thrown is the smallest and is smaller than a first preset distance;
if the image comprises at least two garbage containers, determining the garbage container with the smallest second distance to the garbage throwing object as the garbage container for storing the garbage to be thrown;
if the garbage throwing object does not exist in the cache image before the image, determining a first quantity of garbage to be thrown carried by the garbage throwing object in the image;
and if the first number is larger than the preset number, determining that the image is a target image associated with the garbage throwing event of the garbage throwing object.
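A minimal sketch of the count-based screening described in this claim, assuming per-frame detections have already been associated across frames; the data layout, the mixed-throwing rule and the preset number are illustrative assumptions rather than the patent's exact logic.

```python
from dataclasses import dataclass, field

@dataclass
class FrameRecord:
    """Per-frame counts after detection and cross-frame association (illustrative layout)."""
    carried_garbage: int                                   # pieces of garbage carried by the throwing person
    container_counts: dict = field(default_factory=dict)   # garbage count per container id

def screen_target_images(frames, preset_number=0):
    """Indices of frames that qualify as target images: the throwing person carries more than
    `preset_number` pieces of garbage (first-appearance handling is omitted here)."""
    return [i for i, f in enumerate(frames) if f.carried_garbage > preset_number]

def is_mixed_throwing(frames):
    """Illustrative rule: flag mixed throwing when the carried count drops by more than one piece
    while a single container's count rises by exactly that amount (all pieces merged into one bin)."""
    first, last = frames[0], frames[-1]
    dropped = first.carried_garbage - last.carried_garbage
    increases = {cid: last.container_counts.get(cid, 0) - cnt
                 for cid, cnt in first.container_counts.items()
                 if last.container_counts.get(cid, 0) > cnt}
    return dropped > 1 and len(increases) == 1 and sum(increases.values()) == dropped

if __name__ == "__main__":
    frames = [FrameRecord(2, {"recyclable": 3, "other": 5}),
              FrameRecord(0, {"recyclable": 3, "other": 7})]
    print(screen_target_images(frames))   # -> [0]
    print(is_mixed_throwing(frames))      # -> True
```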
7. The intelligent control ball of claim 1, wherein the first environment authenticity identification unit is further configured to:
Determining a face area and a non-face area of the face image based on each frame of face image in the video stream data; the face area comprises the area below the human eyes, including the left and right cheeks; the non-face area comprises the left and right side areas above the human eyes, excluding the forehead;
Based on the face regions and the non-face regions, respectively extracting a first time domain feature corresponding to the face regions and a second time domain feature corresponding to the non-face regions;
Determining a first distinguishing feature vector corresponding to the video stream data based on the first time domain feature and the second time domain feature;
Determining a second distinguishing feature vector corresponding to the video stream data based on the first time domain feature;
The first environment authenticity identification unit is further used for:
Normalizing the first discrimination feature vector and the second discrimination feature vector to obtain a normalized first discrimination feature vector and a normalized second discrimination feature vector;
Splicing the normalized first discrimination feature vector with the normalized second discrimination feature vector to obtain a target feature vector corresponding to the video stream data;
Inputting the target feature vector into a classification discrimination network model to obtain a detection result output by the classification discrimination network model; the classification and discrimination network model is trained based on sample feature vectors and label data corresponding to the sample detection video and is used for classifying and discriminating whether the environment in the video stream data is a fake environment or not.
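A minimal sketch of the normalization, splicing and classification steps above. The L2 normalization and the linear stand-in classifier are assumptions for illustration; the patent's classification discrimination network would be a trained model rather than the random weights used here.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """L2-normalize a feature vector (one common normalization choice)."""
    v = np.asarray(v, dtype=float)
    return v / (np.linalg.norm(v) + eps)

def fuse_features(temporal_vec, frequency_vec):
    """Normalize the two discrimination feature vectors and splice them into the target feature vector."""
    return np.concatenate([l2_normalize(temporal_vec), l2_normalize(frequency_vec)])

def classify(target_vec, weights, bias):
    """Stand-in linear classifier: returns True if the environment is judged to be fake."""
    score = 1.0 / (1.0 + np.exp(-(target_vec @ weights + bias)))
    return score > 0.5

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t_vec, f_vec = rng.normal(size=128), rng.normal(size=128)   # time-domain / frequency-domain vectors
    target = fuse_features(t_vec, f_vec)
    w, b = rng.normal(size=target.size), 0.0                    # would come from training in practice
    print("fake environment" if classify(target, w, b) else "real environment")
```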
8. The intelligent control ball of claim 1, wherein the second environment authenticity identification unit is further configured to:
Determining a fusion feature vector corresponding to the video stream data based on the total video feature vector and the total audio feature vector;
Based on each video feature vector, each audio feature vector and the fusion feature vector, a first detection result corresponding to each video feature vector, a second detection result corresponding to each audio feature vector and a third detection result corresponding to the fusion feature vector are respectively determined;
determining a target detection result of the video stream data based on the first detection result, the second detection result and the third detection result;
The second environment authenticity identification unit is further used for:
normalizing the video feature vector and the audio feature vector corresponding to each video segment to obtain a normalized video feature vector and a normalized audio feature vector respectively;
and respectively splicing the normalized video feature vectors and the normalized audio feature vectors to obtain a total video feature vector and a total audio feature vector corresponding to the video stream data.
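A minimal sketch of how the clip-level vectors could be normalized and spliced into the total video and total audio feature vectors, and then fused. L2 normalization and fusion by concatenation are assumptions for illustration; the claim does not fix these particular choices.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """L2-normalize a feature vector."""
    v = np.asarray(v, dtype=float)
    return v / (np.linalg.norm(v) + eps)

def total_vectors(clip_video_vecs, clip_audio_vecs):
    """Normalize each clip-level vector and splice them into the total video / total audio vectors."""
    total_video = np.concatenate([l2_normalize(v) for v in clip_video_vecs])
    total_audio = np.concatenate([l2_normalize(a) for a in clip_audio_vecs])
    return total_video, total_audio

def fuse(total_video, total_audio):
    """One simple fusion choice: splice the two total vectors into a single fusion feature vector."""
    return np.concatenate([total_video, total_audio])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    video_vecs = [rng.normal(size=64) for _ in range(4)]   # one vector per video clip
    audio_vecs = [rng.normal(size=32) for _ in range(4)]   # one vector per clip's audio
    tv, ta = total_vectors(video_vecs, audio_vecs)
    print(tv.shape, ta.shape, fuse(tv, ta).shape)
```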
9. The intelligent control ball of claim 1, wherein the target tracking unit is further configured to:
Inputting the target image into a target detection network model to obtain position information, output by the target detection network model, of the target to be tracked at the current moment; the target detection network model is trained based on sample target images and label data and is used for detecting the target to be tracked in the target image;
aligning the first point cloud data with the target to be tracked in the target image corresponding to the position information, and determining the position and range of the target to be tracked in the first point cloud data;
and determining second point cloud data of the target to be tracked based on the position and the range.
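A minimal sketch of selecting the second point cloud data: project the first point cloud into the target image with a known projection matrix and keep the points whose projections fall inside the detected position/range. The projection matrix, coordinate frames and box format are illustrative assumptions, not the patent's calibration procedure.

```python
import numpy as np

def points_in_box(points_xyz, proj, box):
    """Select the points whose image projection falls inside the detection box (x1, y1, x2, y2).

    points_xyz : (N, 3) point cloud in the sensor/world frame
    proj       : (3, 4) projection matrix mapping homogeneous 3-D points to image coordinates
    """
    pts = np.asarray(points_xyz, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    uvw = homo @ proj.T
    in_front = uvw[:, 2] > 1e-6                     # keep only points in front of the camera
    uv = uvw[:, :2] / np.where(in_front[:, None], uvw[:, 2:3], 1.0)
    x1, y1, x2, y2 = box
    mask = in_front & (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    return pts[mask]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    cloud = rng.uniform([-5, -5, 1], [5, 5, 20], size=(2000, 3))
    K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])
    proj = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # identity extrinsics for the sketch
    second_cloud = points_in_box(cloud, proj, box=(600, 300, 700, 420))
    print(f"{len(second_cloud)} points selected for the target to be tracked")
```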
10. A method for controlling an intelligent control ball, which is applied to the intelligent control ball according to any one of claims 1 to 9, and comprises the following steps:
collecting video stream data and point cloud data;
And carrying out aerial work personnel behavior recognition based on the video stream data, which specifically comprises the following steps: acquiring current time position information, current time lens attitude information and lens parameter information of the shooting device and a current time image of a designated high-altitude area in the video stream data; marking the area where the monitoring target is located in the current time image based on the current time position information, the current time lens attitude information, the lens parameter information, the geographic position information of the area where the monitoring target is located in the designated high-altitude area and digital elevation model data; performing aerial work personnel behavior recognition on the area where the monitoring target is located, acquiring a human body frame, a human face frame and a safety helmet frame of the aerial work personnel when aerial work personnel exist in the area where the monitoring target is located, and displaying the human body frame, the human face frame and the safety helmet frame on a monitoring interface; or,
Carrying out terminal browsing behavior identification and control based on the video stream data, which specifically comprises the following steps: performing terminal browsing behavior identification and control on each frame of video image in the video stream data to obtain a head position area and a hand position area of each frame of video image, and identifying whether a user has the behavior of browsing terminal equipment according to the head position area and the hand position area; or,
Carrying out abnormal vehicle parking identification based on the video stream data, which specifically comprises the following steps: carrying out abnormal vehicle parking identification and control on each frame of video image in the video stream data to obtain a vehicle parking detection frame and a parking area detection frame of each frame of video image, and identifying whether an abnormal vehicle parking behavior exists according to the vehicle parking detection frame and the parking area detection frame; or,
Carrying out garbage mixed throwing behavior recognition based on the video stream data, which specifically comprises the following steps: extracting at least two frames of images from the video stream data in real time, and identifying a target main body in each frame of image, wherein the target main body comprises at least one of garbage to be thrown, a garbage throwing object and a garbage container for storing the garbage to be thrown; analyzing the garbage throwing object and the garbage to be thrown in the at least two frames of images to screen out target images associated with a garbage throwing event of the garbage throwing object; identifying a target garbage mixed throwing event according to the reduction in the number of pieces of garbage to be thrown carried by the garbage throwing object or the combined number of pieces of garbage in each garbage container recorded in the target images; or,
Carrying out environment authenticity identification based on the video stream data, which specifically comprises the following steps: determining a first discrimination feature vector and a second discrimination feature vector corresponding to the video stream data; the first discrimination feature vector represents time domain feature information between every two frames of face images in the video stream data; the second discrimination feature vector represents frequency domain feature information between each frame of face image in the video stream data; determining a target feature vector corresponding to the video stream data based on the first discrimination feature vector and the second discrimination feature vector; the target feature vector represents feature information fusing the time domain feature information and the frequency domain feature information; determining a detection result of the video stream data based on the target feature vector; the detection result is used for indicating whether the environment in the video stream data is a fake environment; or,
Carrying out environment authenticity identification based on the video stream data, which specifically comprises the following steps: preprocessing the video stream data to obtain a plurality of video clips; the video stream data includes audio, and each of the video clips includes the audio; for each video clip, respectively extracting a video feature vector of the video clip and an audio feature vector of the audio in the video clip; determining a total video feature vector and a total audio feature vector corresponding to the video stream data based on each video feature vector and each audio feature vector; determining a target detection result of the video stream data based on each of the video feature vectors, each of the audio feature vectors, the total video feature vector, and the total audio feature vector; the target detection result indicates whether the environment in the video stream data is a fake environment; or,
Carrying out target tracking based on the video stream data and the point cloud data, which specifically comprises the following steps: acquiring a target image from the video stream data and first point cloud data, corresponding to the target image, acquired by the thermal imaging device; determining second point cloud data of the target to be tracked based on the target image and the first point cloud data; determining pose information of the target to be tracked at the next moment based on the second point cloud data; tracking the target to be tracked based on the pose information of the target to be tracked at the next moment;
wherein, based on each of the above recognition units, the intelligent control ball is enabled, when an abnormal condition is recognized by a built-in recognition unit, to adjust the shooting device through horizontal rotation or pitch rotation and then to adjust the zoom lens focal length of the shooting device, so that the abnormal condition is accurately captured and tracked and monitored in real time.
CN202311341630.0A 2023-10-16 2023-10-16 Intelligent distribution control ball and distribution control method thereof Active CN117424982B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311341630.0A CN117424982B (en) 2023-10-16 2023-10-16 Intelligent distribution control ball and distribution control method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311341630.0A CN117424982B (en) 2023-10-16 2023-10-16 Intelligent distribution control ball and distribution control method thereof

Publications (2)

Publication Number Publication Date
CN117424982A CN117424982A (en) 2024-01-19
CN117424982B true CN117424982B (en) 2024-04-16

Family

ID=89522126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311341630.0A Active CN117424982B (en) 2023-10-16 2023-10-16 Intelligent distribution control ball and distribution control method thereof

Country Status (1)

Country Link
CN (1) CN117424982B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5838566A (en) * 1996-12-10 1998-11-17 Advanced Micro Devices System and method for managing empty carriers in an automated material handling system
WO2017096761A1 (en) * 2015-12-10 2017-06-15 杭州海康威视数字技术股份有限公司 Method, device and system for looking for target object on basis of surveillance cameras
WO2020042489A1 (en) * 2018-08-30 2020-03-05 平安科技(深圳)有限公司 Authentication method and apparatus for illegal parking case, and computer device
CN115018804A (en) * 2022-06-20 2022-09-06 北京中车赛德铁道电气科技有限公司 Device for identifying pantograph defects by brain-like technology
WO2023092798A1 (en) * 2021-11-25 2023-06-01 成都时识科技有限公司 Noise filtering for dynamic vision sensor

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11340345B2 (en) * 2015-07-17 2022-05-24 Origin Wireless, Inc. Method, apparatus, and system for wireless object tracking
US10121080B2 (en) * 2015-01-15 2018-11-06 vClick3d, Inc. Systems and methods for controlling the recording, storing and transmitting of video surveillance content

Also Published As

Publication number Publication date
CN117424982A (en) 2024-01-19

Similar Documents

Publication Publication Date Title
US9646212B2 (en) Methods, devices and systems for detecting objects in a video
US20200401617A1 (en) Visual positioning system
CN110672111B (en) Vehicle driving path planning method, device, system, medium and equipment
CN107851318A (en) System and method for Object tracking
CN104378582A (en) Intelligent video analysis system and method based on PTZ video camera cruising
CN103514432A (en) Method, device and computer program product for extracting facial features
CN102959946A (en) Augmenting image data based on related 3d point cloud data
CN113568435B (en) Unmanned aerial vehicle autonomous flight situation perception trend based analysis method and system
AU2021255130B2 (en) Artificial intelligence and computer vision powered driving-performance assessment
CN112364843A (en) Plug-in aerial image target positioning detection method, system and equipment
CN112818925A (en) Urban building and crown identification method
CN111666860A (en) Vehicle track tracking method integrating license plate information and vehicle characteristics
CN114140745A (en) Method, system, device and medium for detecting personnel attributes of construction site
CN112800918A (en) Identity recognition method and device for illegal moving target
CN117424982B (en) Intelligent distribution control ball and distribution control method thereof
JP6916975B2 (en) Sign positioning system and program
CN115767424A (en) Video positioning method based on RSS and CSI fusion
CN112541403B (en) Indoor personnel falling detection method by utilizing infrared camera
CN112818780A (en) Defense area setting method and device for aircraft monitoring and identifying system
Ying et al. Fully Convolutional Networks for Street Furniture Identification in Panorama Images.
KR20210001438A (en) Method and device for indexing faces included in video
CN117152719B (en) Weeding obstacle detection method, weeding obstacle detection equipment, weeding obstacle detection storage medium and weeding obstacle detection device
CN112667832B (en) Vision-based mutual positioning method in unknown indoor environment
JPH1032751A (en) Image pickup device and image processor
CN115311323A (en) Trajectory detection method and device based on binocular vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant