CN115205781A - Transformer-based cross-scale target detection method and system - Google Patents


Info

Publication number
CN115205781A
Authority
CN
China
Prior art keywords
target
data
video
detection
target detection
Prior art date
Legal status
Pending
Application number
CN202210719122.0A
Other languages
Chinese (zh)
Inventor
李靓
朱志强
刘志海
杨振祠
孙瑞
吴嘉宇
白涛
葛小武
Current Assignee
Chengdu Civil Aviation Air Traffic Control Science & Technology Co ltd
Second Research Institute of CAAC
Original Assignee
Chengdu Civil Aviation Air Traffic Control Science & Technology Co ltd
Second Research Institute of CAAC
Priority date
Filing date
Publication date
Application filed by Chengdu Civil Aviation Air Traffic Control Science & Technology Co ltd and Second Research Institute of CAAC
Priority to CN202210719122.0A
Publication of CN115205781A

Classifications

    • G06V 20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06N 3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06V 20/46 — Extracting features or characteristics from video content, e.g. video fingerprints, representative shots or key frames
    • G06T 2207/10016 — Video; image sequence
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30232 — Surveillance
    • G06V 2201/07 — Target detection

Abstract

The invention discloses a Transformer-based cross-scale target detection method and system. A training data set is generated, and the video image data in the training data set is used as training samples; an objective function is designed, and a pre-established target detection model is trained; the image to be detected is then detected with the trained target detection model, which is based on a Transformer cross-scale target detection algorithm, and a target detection result is output to determine the target position and category information. The proposal solves the problem of cross-scale target detection in real runway intrusion prevention, achieves the overall goal of improving target detection precision, further reduces the false detection rate and missed detection rate of the model, and improves the availability and stability of the video-based runway intrusion prevention system.

Description

Transformer-based cross-scale target detection method and system
Technical Field
The invention relates to the field of computer technology, and in particular to a Transformer-based cross-scale target detection method and system.
Background
Runway intrusion is a main cause of runway safety accidents, and effective monitoring of runway intrusion events is an important means of safeguarding civil aviation safety and improving flight operation efficiency. The most widely used surface target surveillance devices, such as surface movement radar, automatic dependent surveillance-broadcast (ADS-B) and multilateration (MLAT), are currently applied to runway safety monitoring, but suffer from insufficient stability, strong dependence, low update rates, surveillance blind zones and similar problems. Vision-based target detection technology can effectively address these problems as an important supplement to existing surveillance devices.
Existing image-based target detection methods can be roughly divided into two categories: two-stage methods represented by R-CNN [1], and single-stage methods represented by YOLO [2]. Two-stage methods first extract a series of candidate regions that are likely to contain targets, then use a convolutional neural network to extract features in place of traditional hand-crafted features; single-stage methods use an end-to-end architecture that skips the region selection step, so a single pass of convolutional neural network computation regresses the image target categories and positions.
A vision-based runway intrusion prevention system first uses surveillance cameras to capture real-time video streams of the runway and connecting taxiway areas, then automatically detects targets such as aircraft, vehicles and personnel in the monitored area with an image-based target detection method. However, because of the large physical size differences between aircraft, vehicles and personnel, the scale difference between targets in the same image is very large; in addition, because coordinating construction in the airport flight area is difficult, camera placement is very limited, so target scales also differ markedly between images shot by different surveillance cameras.
Existing methods generally use a convolutional neural network (CNN) for target detection and enhance the training data with techniques such as Mosaic augmentation and image flipping to improve model performance. However, the CNN receptive field is usually small and the model relies on locality as an inductive bias, making it difficult to capture global image features. Existing methods therefore struggle with the cross-scale target detection problem in real runway intrusion prevention.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to design a Transformer-based cross-scale target detection method and system that solve the problem of cross-scale target detection in real runway intrusion prevention, improve overall target detection precision, further reduce the false detection rate and missed detection rate of the model, and improve the availability and stability of the video-based runway intrusion prevention system.
The application provides a Transformer-based cross-scale target detection method, which comprises the following steps:
generating a training data set;
taking the video image data in the training data set as training samples, designing an objective function, and training a pre-established target detection model;
and detecting the image to be detected with the trained target detection model, which is based on a Transformer cross-scale target detection algorithm, and outputting a target detection result to determine the target position and category information.
Preferably, generating the training data set includes: constructing an application-scene surveillance video set from pre-collected video image data, and establishing a public target detection data set and a web crawler data set by sorting existing public data and downloading crawler data respectively; and annotating the video image data in each data set to form the training data set.
Further, constructing the application-scene surveillance video set from pre-collected video image data, and establishing the public target detection data set and the web crawler data set by sorting existing public data and downloading crawler data, comprises:
receiving real-time video data of key crossings shot by on-site cameras of the runway intrusion prevention scene; pulling the video stream with the RTSP protocol, parsing the pulled video stream with OpenCV's VideoCapture class, and writing the parsed video frames into an MP4 video file with VideoWriter to obtain the application-scene surveillance video set;
screening common target types in the runway area from existing public data to form a public target detection data set oriented to runway intrusion applications;
wherein the common targets include: aircraft, vehicles and personnel;
searching keywords on image retrieval websites according to actual needs, downloading relevant images with Python crawler technology, and filtering the images collected by the web crawler a second time through manual screening to obtain the web crawler data set.
Preferably, annotating the video image data in each data set includes: marking the positions of targets in the application-scene surveillance video set and the web crawler data set with the DarkLabel software, each expressed as a four-dimensional vector consisting of the bounding-box center-point x coordinate, center-point y coordinate, box width and box height.
Preferably, the pre-established cross-scale target detection model comprises a backbone network, a neck network and a head network which are connected in sequence;
the backbone network is used for extracting high-level image features;
the neck network is distributed in each layer of the backbone network and is used for fusing and reprocessing the features extracted from the backbone network from different scales and different processing stages and respectively outputting the processed features for each scale;
the head networks correspond one-to-one to the features of different scales output by the neck network, and are used for determining the target detection result from the features extracted by the neck network.
Preferably, designing the objective function specifically includes: designing the objective function with minimization of the detection box loss, the confidence loss and the target class loss as the goal; wherein:
the detection box loss is obtained by calculating the distance between the detection box and the ground truth;
the confidence loss is obtained by calculating the reliability of the detected target;
the target class loss is obtained with a cross-entropy loss function that evaluates the distance between the estimated class and the ground truth.
Preferably, after the target detection result is determined, the method further includes: screening the target detection results output by the head network to achieve non-maximum suppression of the detection boxes;
screening the target detection results output by the head network specifically includes: traversing all detection boxes and, if any two detection boxes overlap, calculating the intersection-over-union (IoU) between them; when the IoU is larger than a threshold T, the two detection boxes are considered to belong to the same target, and the box with the higher confidence is kept; non-maximum suppression over all detection boxes is completed by looping, and the final detection result is output.
A Transformer-based cross-scale target detection system comprises key-crossing surveillance cameras, a vision processing server, runway intrusion prevention display equipment, a key-crossing video stream data processing module, a Transformer inference system and a runway intrusion prevention fusion display system;
the key-crossing surveillance camera is used for acquiring, encoding and pushing real-time video data of the key crossings;
the vision processing server is used for running the core vision processing algorithm to locate and identify targets in the key-crossing video stream data; wherein the core vision processing algorithm is the Transformer inference method;
and the runway intrusion prevention display equipment is used for displaying the real-time video data of the key crossings while displaying the target position and category information of runway intrusion events.
Preferably, the key-crossing surveillance camera comprises a protective cover and a camera heating device arranged on the outer layer of the camera;
the vision processing server includes: the system comprises a networking module and a high-performance computing module;
the networking module is used for data transmission between the front-end camera and the control terminal;
the high-performance computing module is used for carrying out reasoning computation on the input video data;
the runway intrusion prevention display apparatus includes: the large screen display module and the interactive display module;
the large-screen display module is used for displaying real-time videos and superposed signals of all monitored road junctions in the airport;
the interactive display module is used for displaying a control interactive interface for a controller;
the video stream data processing module for the key crossing comprises: a stream pulling unit and a decoding unit;
the stream pulling unit is used for pulling the video stream data acquired by the camera using the real-time streaming protocol (RTSP);
the decoding unit is used for decoding the pulled video stream into video frames with FFmpeg's built-in decoding algorithms;
the Transformer inference system comprises: a video frame preprocessing unit, a Transformer detection inference unit and a result prediction unit;
the video frame preprocessing unit is used for preprocessing the single-frame video data extracted by the decoding unit, normalizing the original video frame to the specified resolution with bilinear interpolation;
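The normalization step above can be sketched as plain bilinear interpolation. This is a minimal illustration only: a deployed system would typically call an optimized routine (e.g. OpenCV's resize with INTER_LINEAR), and the corner-aligned sampling convention used here is one common choice, not necessarily the patent's.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Map each output pixel back to a fractional source coordinate.
            y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, in_h - 1), min(x0 + 1, in_w - 1)
            dy, dx = y - y0, x - x0
            # Weighted average of the four neighbouring source pixels.
            out[i][j] = (img[y0][x0] * (1 - dy) * (1 - dx)
                         + img[y0][x1] * (1 - dy) * dx
                         + img[y1][x0] * dy * (1 - dx)
                         + img[y1][x1] * dy * dx)
    return out
```

Upscaling a 2x2 gradient to 3x3 with this routine keeps the corner values and interpolates the center to the mean of the four inputs.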
the Transformer detection inference unit is used for feeding the preprocessing result into the Transformer cross-scale target detection model and outputting a target detection result comprising: detection boxes, confidence and target category information;
the result prediction unit is used for screening the target detection results; if two results overlap, the IoU between them is calculated, and when the IoU is greater than a preset threshold T the two detection boxes are considered to belong to the same target, so the box with the lower confidence is deleted; non-maximum suppression of overlapping boxes is finally achieved by continuous looping, and the final detection result is output;
the runway intrusion prevention fusion display system comprises: a runway intrusion prevention early warning module and a multi-source data fusion display module;
the runway intrusion prevention early warning module is used for estimating the position of the target within the monitored crossing area by combining the target position and category information output by the cross-scale target detection method;
wherein the category information includes aircraft, vehicles and personnel;
and the target position information comprises taxiway, runway, sky and apron;
the multi-source data fusion display module is used for comprehensively displaying the activity of targets in the runway area by combining the video positioning data with the multi-source surveillance data of the airport surface;
it includes: a video positioning data display unit and a surface multi-source surveillance data display unit;
the video positioning data display unit is used for displaying the activity of crossing targets and whether they enter or leave the runway;
the surface multi-source surveillance data display unit is used for comprehensively displaying the overall situation of targets in the runway area, accessing various surveillance source data according to the installation and usage of the airport's actual surface surveillance sources; when the video signal indicates that multiple targets are present on the current runway, the target longitude and latitude information provided by the surveillance data assists the video signal in raising the intrusion alarm, and alarm information is output through multiple channels such as screen flashing and alarm sounds;
wherein the surveillance source data comprises surface movement radar, ADS-B and MLAT.
The invention has the following beneficial effects:
The invention provides a Transformer-based cross-scale target detection method and system. First, a training data set is generated, and the video image data in the training data set is used as training samples; second, an objective function is designed and a pre-established target detection model is trained; finally, the image to be detected is detected with the trained target detection model, which is based on a Transformer cross-scale target detection algorithm, and a target detection result is output to determine the target position and category information. The problem of cross-scale target detection in real runway intrusion prevention is thus effectively solved, the overall goal of improving target detection precision is achieved, the false detection rate and missed detection rate of the model are further reduced, and the availability and stability of the video-based runway intrusion prevention system are improved.
The Transformer-based cross-scale target detection method and system provided by the invention can detect runway-area targets automatically and raise early warnings/alarms for intruding targets, greatly improving the safety protection capability of the airport runway and further raising the level of intelligent air traffic control construction.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings needed for the detailed description or the prior art are briefly introduced below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a flowchart of the Transformer-based cross-scale target detection method provided by the invention;
FIG. 2 is a functional structure block diagram of the Transformer-based cross-scale target detection method according to embodiment 1 of the present invention;
FIG. 3 is a block diagram of the data collection function provided in embodiment 1 of the present invention;
FIG. 4 is a functional structure block diagram of the Transformer-based cross-scale target detection algorithm provided in embodiment 1 of the present invention;
FIG. 5 is a functional structure block diagram of the Transformer-based cross-scale target detection system provided by the present invention;
FIG. 6 is a functional structure block diagram of the key-crossing surveillance camera according to embodiment 2 of the present invention;
FIG. 7 is a functional block diagram of the vision processing server according to embodiment 2 of the present invention;
FIG. 8 is a functional structure block diagram of the runway intrusion prevention display equipment according to embodiment 2 of the present invention;
FIG. 9 is a functional structure block diagram of the key-crossing video stream data processing module according to embodiment 2 of the present invention;
FIG. 10 is a functional structure block diagram of the Transformer-based cross-scale target detection method according to embodiment 2 of the present invention;
FIG. 11 is a functional structure block diagram of the runway intrusion prevention fusion display system according to embodiment 2 of the present invention.
Reference numerals: data preparation 100; Transformer-based cross-scale target detection algorithm 200; application-scene surveillance video set 101; public target detection data set 102; web crawler data set 103; data annotation 104; data augmentation 105; backbone network 201; neck network 202; head network 203; non-maximum suppression 204; loss function design 205; key-crossing surveillance camera S100; vision processing server S200; runway intrusion prevention display equipment S300; key-crossing video stream data processing module S400; Transformer inference system S500; runway intrusion prevention fusion display system S600; camera protection design S101; camera installation design S102; networking function S201; high-performance computing function S202; large-screen display S301; interactive display S302; stream pulling unit S401; decoding unit S402; video frame preprocessing S501; Transformer detection inference S502; result prediction S503; runway intrusion prevention early warning/alarm module S601; multi-source data fusion display module S602.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.
It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.
In a specific embodiment of the present invention, a Transformer-based cross-scale target detection method is provided, as shown in FIG. 1. The method includes:
S1, generating a training data set;
S2, taking the video image data in the training data set as training samples, designing an objective function, and training a pre-established target detection model;
and S3, detecting the image to be detected with the trained target detection model, which is based on a Transformer cross-scale target detection algorithm, and outputting a target detection result to determine the target position and category information.
In step S1, generating the training data set includes: constructing an application-scene surveillance video set from pre-collected video image data, and establishing a public target detection data set and a web crawler data set by sorting existing public data and downloading crawler data respectively; and annotating the video image data in each data set to form the training data set.
Constructing the application-scene surveillance video set from pre-collected video image data, and establishing the public target detection data set and the web crawler data set by sorting existing public data and downloading crawler data, comprises the following steps:
receiving real-time video data of key crossings shot by on-site cameras of the runway intrusion prevention scene; pulling the video stream with the RTSP protocol, parsing the pulled video stream with OpenCV's VideoCapture class, and writing the parsed video frames into an MP4 video file with VideoWriter to obtain the application-scene surveillance video set;
screening common target types in the runway area from existing public data to form a public target detection data set oriented to runway intrusion applications; wherein the common targets include: aircraft, vehicles and personnel;
searching keywords on image retrieval websites according to actual requirements, downloading relevant images with Python crawler technology, and filtering the images collected by the web crawler a second time through manual screening to obtain the web crawler data set.
Annotating the video image data in each data set includes: marking the positions of targets in the application-scene surveillance video set and the web crawler data set with the DarkLabel software, each expressed as a four-dimensional vector consisting of the bounding-box center-point x coordinate, center-point y coordinate, box width and box height.
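The four-dimensional label vector described above can be derived from a corner-format annotation as sketched below. This is hypothetical: `to_center_format` is an illustrative helper name, and normalizing by image size is an assumption borrowed from YOLO-style tooling, since the text fixes the vector layout but not the units.

```python
def to_center_format(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a corner-format box to the (cx, cy, w, h) label vector.

    Coordinates are normalized to [0, 1] by the image size (an assumption;
    unnormalized pixel units would work the same way without the division).
    """
    cx = (x_min + x_max) / 2.0 / img_w  # box center x
    cy = (y_min + y_max) / 2.0 / img_h  # box center y
    w = (x_max - x_min) / img_w         # box width
    h = (y_max - y_min) / img_h         # box height
    return (cx, cy, w, h)
```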
In step S2, the pre-established cross-scale target detection model comprises a backbone network, a neck network and a head network connected in sequence;
the backbone network is used for extracting high-level image features;
the neck network is distributed in each layer of the backbone network and is used for fusing and reprocessing the features extracted from the backbone network from different scales and different processing stages and respectively outputting the processed features for each scale;
the head networks correspond one-to-one to the features of different scales output by the neck network, and are used for determining the target detection result from the features extracted by the neck network. The target detection result comprises an estimated detection box, a confidence and a target category.
In step S2, designing the objective function specifically includes: designing the objective function with minimization of the detection box loss, the confidence loss and the target class loss as the goal; wherein the detection box loss is obtained by calculating the distance between the detection box and the ground truth; the confidence loss is obtained by calculating the reliability of the detected target; and the target class loss is obtained with a cross-entropy loss function that evaluates the distance between the estimated class and the ground truth.
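A minimal sketch of the three loss terms follows, assuming the common 1 - IoU form for the box distance and binary cross-entropy for confidence; the text names the loss targets but not the exact formulas, so all function names and formula choices here are illustrative.

```python
import math

def iou(b1, b2):
    """Intersection over union of two corner-format boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter + 1e-9)

def box_loss(pred_box, true_box):
    # Detection box loss: a distance between the box and the ground truth,
    # here the common 1 - IoU form (an assumption, not fixed by the text).
    return 1.0 - iou(pred_box, true_box)

def confidence_loss(pred_conf, is_object):
    # Binary cross-entropy on the objectness (reliability) score.
    t = 1.0 if is_object else 0.0
    p = min(max(pred_conf, 1e-9), 1.0 - 1e-9)  # clamp for numerical safety
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def class_loss(pred_probs, true_idx):
    # Cross-entropy between the estimated class distribution and the truth.
    return -math.log(max(pred_probs[true_idx], 1e-9))
```

The total objective would then be a weighted sum of the three terms over all predictions.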
After the target detection result is determined in step S3, the method further includes: screening the target detection results output by the head network to achieve non-maximum suppression of the detection boxes;
screening the target detection results output by the head network specifically includes: traversing all detection boxes and, if any two detection boxes overlap, calculating the intersection-over-union (IoU) between them; when the IoU is larger than a threshold T, the two detection boxes are considered to belong to the same target, and the box with the higher confidence is kept; non-maximum suppression over all detection boxes is completed by looping, and the final detection result is output.
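The screening loop described above amounts to greedy non-maximum suppression; a minimal sketch follows (the corner box format and helper names are assumptions):

```python
def nms(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression over (box, confidence) detections.

    Whenever two boxes overlap with IoU above the threshold T they are
    treated as the same target and only the higher-confidence box survives.
    Boxes are (x1, y1, x2, y2); this format is an assumption for the sketch.
    """
    def iou(b1, b2):
        ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
        ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
        a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
        return inter / (a1 + a2 - inter + 1e-9)

    kept = []
    # Visit boxes from highest to lowest confidence; a box survives only if
    # it does not heavily overlap any already-kept box.
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, conf))
    return kept
```

For example, two boxes of the same target with confidences 0.9 and 0.8 and IoU above 0.5 collapse to the single 0.9 box, while a distant box is kept unchanged.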
In step S3, after the final detection result is output, the target position and category information can be determined based on it.
Example 1:
embodiment 1 of the present invention provides a transformer-based cross-scale target detection method, as shown in fig. 2. The method specifically comprises:
data preparation 100, transformer-based cross-scale object detection algorithm 200.
The data preparation 100 is used to collect, label, and augment image data used to train a target detection model, and to obtain a training data set with high diversity and multi-scale variation.
The transformer-based cross-scale target detection algorithm 200 is used for designing a transformer-based network structure that effectively solves the cross-scale target detection problem, and for outputting information such as target detection frames and categories.
A functional block diagram of the data preparation 100 is shown in fig. 3. It specifically comprises: the application scene monitoring video set 101, the public target detection data set 102, the web crawler data set 103 and the data annotation 104.
The application scene monitoring video set 101 collects videos shot by the cameras of the runway intrusion prevention scene. Firstly, the video stream is pulled using the RTSP protocol; then the pulled video stream is parsed using the VideoCapture interface of OpenCV, and the parsed video frames are written into an mp4 video file using a VideoWriter.
The public target detection data set 102 collects public target detection data sets including COCO and VOC, and screens common target types in the runway area (aircraft, vehicles, personnel, etc.) from these data sets to form a public target detection data set oriented to runway intrusion prevention applications.
The web crawler data set 103 uses Python crawler technology to retrieve relevant images including airplanes, aircraft, automobiles, pickup trucks, vans, trucks, cars, pedestrians and the like from image retrieval websites such as Baidu Images and Sogou Images. The images collected by the web crawler are then filtered again by manual screening.
The data labeling 104 marks the positions of the targets in the application scene monitoring video set 101 and the web crawler data set 103 using the DarkLabel software, expressed as a four-dimensional vector consisting of the x coordinate of the label frame center, the y coordinate of the label frame center, the label frame width and the label frame height. The public target detection data set 102 already contains annotation information and can be used directly.
The data augmentation 105 uses data augmentation methods to increase data diversity, including: (1) flipping/rotating the image left and right, to enrich the viewing-angle characteristics of targets in the images; (2) zooming images, to increase the diversity of target scales in the data set; (3) Mosaic, randomly combining and transforming samples in the data set to generate new mosaic samples, used to increase the diversity of small-target samples.
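The three augmentations can be illustrated with a self-contained NumPy sketch (nearest-neighbour resizing stands in for whatever interpolation a production pipeline would use, and the corresponding box-label transforms are omitted):

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize; used both for the zoom augmentation
    and for building mosaic tiles."""
    h, w = img.shape[:2]
    ys = np.arange(out_h) * h // out_h
    xs = np.arange(out_w) * w // out_w
    return img[np.ix_(ys, xs)]

def flip_horizontal(img):
    """(1) Left-right flip: enriches viewing-angle characteristics."""
    return img[:, ::-1].copy()

def zoom(img, factor):
    """(2) Zoom: increases the diversity of target scales."""
    h, w = img.shape[:2]
    return resize_nearest(img, max(1, int(h * factor)), max(1, int(w * factor)))

def mosaic(samples, tile):
    """(3) Mosaic: combine four samples into one jigsaw image, which
    multiplies the number of small-target instances per training image."""
    assert len(samples) == 4, "mosaic needs exactly four samples"
    t = [resize_nearest(s, tile, tile) for s in samples]
    top = np.concatenate(t[:2], axis=1)
    bottom = np.concatenate(t[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)
```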
A functional structure block diagram of the transformer-based cross-scale target detection algorithm 200 is shown in fig. 4. It specifically comprises: a Backbone network Backbone 201, a Neck network Neck 202 and a Head network Head 203.
The Backbone network Backbone 201 is used for extracting high-level image features. Conventional backbone networks such as VGG, ResNet, DenseNet, MobileNet, EfficientNet and CSPDarknet exhibit strong feature extraction capability in tasks such as image classification and target detection, and one of these networks (including but not limited to the above) is usually used in the implementation. The present invention uses ResNet50 as the backbone network, but the use of other backbone networks is still protected by the present invention.
the Neck network Neck 202 is used for fusing and reprocessing the features extracted from the main network from different scales and different processing stages. Common neck polymerization networks are FPN, PANET, NAS-FPN, biFPN, ASFF, SAM, etc. The invention uses the PANet as the neck network, respectively fuses the features of different scales output by the main network ResNet50 into the modules of the corresponding scales of the PANet, and respectively outputs the processed features for each scale. Note that the use of other neck aggregation networks is still protected by the invention.
The Head network Head 203 determines the target location and category according to the features extracted by the neck network. Because the scale difference of the targets in the runway intrusion prevention scene is very large, the output features of different scales each correspond to a head network, used for extracting targets of different scales. When a bottom-level, high-resolution feature map is used, the network is more sensitive to small targets; when a high-level, low-resolution feature map is used, the network is more sensitive to large targets. The head network of the invention uses a transformer architecture, and each transformer encoding module is composed of multi-head attention and a multi-layer perceptron (MLP).
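A stripped-down illustration of one such encoding module, with identity Q/K/V projections instead of learned weights (a real head would learn the projections, add layer normalization, and operate on feature-map tokens; this sketch only shows the attention-plus-MLP structure):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads):
    """Self-attention over n tokens of dimension d, split into heads.
    Identity projections are used for illustration only."""
    n, d = x.shape
    assert d % num_heads == 0
    dh = d // num_heads
    heads = []
    for h in range(num_heads):
        q = k = v = x[:, h * dh:(h + 1) * dh]
        attn = softmax(q @ k.T / np.sqrt(dh))  # (n, n) attention weights
        heads.append(attn @ v)
    return np.concatenate(heads, axis=1)

def mlp(x, w1, w2):
    """Two-layer perceptron with ReLU activation."""
    return np.maximum(x @ w1, 0) @ w2

def encoder_block(x, w1, w2, num_heads=2):
    """One transformer encoding module: multi-head attention + MLP,
    each wrapped in a residual connection (layer norm omitted)."""
    x = x + multi_head_attention(x, num_heads)
    return x + mlp(x, w1, w2)
```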
The non-maximum suppression 204 is used for screening the target detection results output by the head network. All candidate frames are traversed; if any two candidate frames overlap, the intersection over union (IoU) between them is calculated. The candidate frame with the higher confidence is assumed to be more accurate; therefore, when the IoU is greater than the threshold T, the two candidate frames are considered to belong to the same target, and the detection frame with the higher confidence is retained. Non-maximum suppression of all candidate frames is finally realized by iterating, and the final detection result is output.
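The traversal described above amounts to greedy non-maximum suppression, which can be sketched as:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, t=0.5):
    """Greedy NMS: repeatedly keep the highest-confidence box and drop
    every remaining box whose IoU with it exceeds the threshold t.
    Returns the indices of the retained boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= t]
    return keep
```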
The loss function design 205, i.e. the objective function in step S2 of the specific implementation, is used for parameter updating in model training: (1) the detection frame loss, used for evaluating the distance between the estimated detection frame and the ground truth; (2) the confidence loss, used for evaluating the reliability of the detected target; (3) the target category loss, using a cross-entropy loss function, used for evaluating the distance between the estimated category and the ground truth.
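A hedged sketch of the three loss terms (a plain L1 distance stands in for the box loss, and the weights `w` are hypothetical; the patent does not fix the exact box-loss formula):

```python
import numpy as np

def box_loss(pred, gt):
    """Distance between predicted and ground-truth boxes; a simple L1
    distance stands in for the IoU-style losses used in practice."""
    return np.abs(np.asarray(pred, float) - np.asarray(gt, float)).mean()

def confidence_loss(p, is_object):
    """Binary cross-entropy on the objectness score p in (0, 1)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(is_object * np.log(p) + (1 - is_object) * np.log(1 - p))

def class_loss(logits, label):
    """Cross-entropy between the estimated class distribution and truth."""
    z = np.asarray(logits, dtype=float)
    z -= z.max()                               # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def total_loss(pred_box, gt_box, p_obj, logits, label, w=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms used for parameter updating."""
    return (w[0] * box_loss(pred_box, gt_box)
            + w[1] * confidence_loss(p_obj, 1.0)
            + w[2] * class_loss(logits, label))
```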
In addition, the backbone network can use ResNet50 as in the invention, and other types of backbone networks can be used instead, such as VGG, ResNet, DenseNet, MobileNet, EfficientNet, CSPDarknet, Swin Transformer, etc.;
the neck network can use PANet as in the invention, and other neck networks can be used instead, such as FPN, NAS-FPN, BiFPN, ASFF, SAM, etc.
2. Based on the same technical concept, the embodiment of the invention also provides a transformer-based cross-scale target detection system, which comprises a key crossing monitoring camera, a vision processing server, a runway intrusion prevention display device, a key crossing video stream data processing module, a transformer inference system and a runway intrusion prevention fusion display system, wherein the key crossing monitoring camera is connected with the vision processing server through a network;
the key crossing monitoring camera is used for acquiring, encoding and pushing real-time video data of the key crossing;
the visual processing server is used for operating a core visual processing algorithm and realizing positioning and identification of targets in the video stream data of the key crossing; wherein the core vision processing algorithm is a transformer inference method;
and the runway intrusion prevention display equipment is used for displaying the real-time video data of the key road junction and simultaneously displaying the target position and the category information of the runway intrusion event.
The key crossing monitoring camera comprises a protective cover and a camera heating device arranged on the outer layer of the camera;
the vision processing server comprises: a networking module and a high-performance computing module;
the networking module is used for data transmission between the front-end camera and the control terminal;
the high-performance computing module is used for performing inference computation on the input video data;
the runway intrusion prevention display device comprises: a large-screen display module and an interactive display module;
the large-screen display module is used for displaying the real-time videos and superimposed signals of all monitored crossings in the airport;
the interactive display module is used for displaying the control interactive interface to the controller;
the key crossing video stream data processing module comprises: a stream pulling unit and a decoding unit;
the stream pulling unit is used for pulling the video stream data collected by the camera using RTSP (real time streaming protocol);
the decoding unit is used for decoding the pulled video stream into video frames using an FFMPEG built-in decoding algorithm;
the transformer inference system comprises: a video frame preprocessing unit, a transformer detection inference unit and a result prediction unit;
the video frame preprocessing unit is used for preprocessing the single-frame video data extracted by the decoding unit and normalizing the original video frame to a specified resolution using bilinear interpolation;
the transformer detection inference unit is used for inputting the preprocessing result into the transformer cross-scale target detection model and outputting a target detection result comprising: detection frames, confidences and target category information;
the result prediction unit is used for screening the target detection results; if two screening results overlap, the intersection over union between them is calculated, and when it is greater than a preset threshold T, the two detection frames are considered to belong to the same target and the detection frame with the lower confidence is deleted; non-maximum suppression of overlapping frames is finally realized by iterating, and the final detection result is output;
the runway intrusion prevention fusion display system comprises: a runway intrusion prevention early warning module and a multi-source data fusion display module;
the runway intrusion prevention early warning module is used for estimating the position of the target in the monitored crossing area by combining the target position and category information output by the cross-scale target detection method;
wherein the category information includes aircraft, vehicles and personnel;
the target position information comprises a taxiway, a runway, the sky and an apron;
the multi-source data fusion display module is used for comprehensively displaying the activity of targets in the runway area by combining the video positioning data and the multi-source surveillance data of the airport surface;
the multi-source data fusion display module comprises: a video positioning data display unit and a surface multi-source surveillance data display unit;
the video positioning data display unit is used for displaying the activity of crossing targets and whether they enter or leave the runway;
the surface multi-source surveillance data display unit is used for comprehensively displaying the overall situation of targets in the runway area and accessing various surveillance source data according to the installation and use conditions of the actual surface surveillance sources of the airport; when the video signal indicates that the current runway contains multiple targets, the target longitude and latitude information provided by the surveillance data assists the video signal in realizing intrusion alarm, and alarm information is output through multiple channels such as screen flashing and alarm sounds;
wherein the surveillance source data comprises surface movement radar, ADS-B and MLAT; meteorological data, light status data and other airport surface data may also be included.
Example 2: embodiment 2 of the present invention provides a trans-former-based cross-scale target detection system, as shown in fig. 5. The method specifically comprises the following steps:
the key crossing monitoring camera S100, the vision processing server S200, the runway intrusion prevention display device S300, the key crossing video stream data processing module S400, the transformer inference system S500 and the runway intrusion prevention fusion display system S600.
Wherein S100, S200 and S300 describe the runway intrusion prevention hardware devices, and S400, S500 and S600 describe the runway intrusion prevention software system.
The key crossing monitoring camera S100 is used for acquiring, encoding and pushing real-time video data of the key crossing;
the vision processing server S200 is used for operating a core vision processing algorithm to realize tasks such as positioning and identifying a target in a video;
the runway intrusion prevention display device S300 is used for displaying a real-time video to enhance the situational awareness of a controller, and simultaneously, information such as a target position and a category is used for assisting the controller in comprehensive judgment;
the key crossing video stream data processing module S400 processes the video stream data collected by the key crossing monitoring camera S100, which serves as the input data of the runway intrusion prevention system;
the transformer inference system S500, i.e. the core algorithm running on the vision processing server S200, is configured to realize cross-scale, high-precision and efficient positioning and identification of targets in the video stream processed by the key crossing video stream data processing module S400;
the runway intrusion prevention fusion display system S600, i.e., a human-computer interaction system displayed on the runway intrusion prevention display device S300, is used to implement functions such as early warning and alarm for runway intrusion events.
In the transformer-based cross-scale target detection system provided in embodiment 2 of the present invention, a functional structure block diagram of the key crossing monitoring camera S100 is shown in fig. 6. It specifically comprises: a camera protection design S101 and a camera installation design S102.
The camera protection design S101 mainly considers the adaptability of the outdoor monitoring camera to the environment; protective devices such as a protective cover are added on the outer layer of the camera. In addition, a remotely controlled wiper is designed for dust and rainy weather, and an external camera heating device is designed for snowy and freezing weather.
In the camera installation design S102, for point locations such as the tower and the roof of the airport terminal building, generally only a small platform needs to be added to secure the camera; for point locations in the flight area, an area near the glide slope platform is selected, but because of the low position, a support of a certain height generally needs to be added. The point locations are selected to avoid large-scale power-supply and network construction work as much as possible.
The invention provides a transformer-based cross-scale target detection system, wherein a functional structure block diagram of the vision processing server S200 is shown in FIG. 7. It specifically comprises: a networking function S201 and a high-performance computing function S202.
The networking function S201 is used for data transmission between the front-end camera and the control terminal. When the control terminal is located in the airport tower area, data transmission can be realized using the airport 5G network, a 4G LTE network or a dedicated airport internal network; when the control terminal is located in a remote control center, a provider private network also needs to be added to realize remote transmission of the video stream data.
For the high-performance computing function S202, the vision processing server needs to have high-performance computing capability for performing efficient inference computation on the input video data; generally, according to the number of monitored crossings, multiple high-performance graphics processing units (GPUs) should be considered. When one server cannot support the computation, a multi-server or server-cluster mode may be considered.
A functional structure block diagram of the runway intrusion prevention display device S300 is shown in fig. 8. It specifically comprises: a large-screen display S301 and an interactive display S302.
The large-screen display S301 uses a curved monitoring large screen to display the real-time videos and superimposed signals of all monitored crossings in the airport; a television-wall display mode may also be considered.
And the interactive display S302 is realized by using a computer display screen and is used for displaying a control interactive interface for a controller.
The invention provides a transformer-based cross-scale target detection system, wherein a functional structure block diagram of the key crossing video stream data processing module S400 is shown in FIG. 9. It specifically comprises: a stream pulling unit S401 and a decoding unit S402.
The stream pulling unit S401 uses RTSP to pull the video stream data collected by the camera; the pull connection generally takes the form rtsp://username:password@cameraIP, where username and password are the user name and password set for accessing the corresponding monitoring camera, and cameraIP is the IP address of the camera.
The decoding unit S402 decodes the extracted video stream into video frames, and typically decodes the video stream by using an FFMPEG built-in decoding algorithm.
A functional structure block diagram of the transformer inference system S500 is shown in fig. 10. It specifically comprises: video frame preprocessing S501, transformer detection inference S502 and result prediction S503.
The video frame preprocessing S501 is configured to preprocess the single-frame video data extracted by the decoding unit S402 and normalize the original video frame to a specified resolution, such as 480 × 640, using bilinear interpolation.
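Bilinear interpolation to a fixed resolution can be sketched in NumPy as follows (480 × 640 is the example resolution given above; a production system would typically call a library resize routine):

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize a grayscale or colour frame to (out_h, out_w) with
    bilinear interpolation: each output pixel is a weighted average
    of its four nearest input pixels."""
    h, w = img.shape[:2]
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]          # fractional row offsets
    wx = (xs - x0)[None, :]          # fractional column offsets
    if img.ndim == 3:                # broadcast weights over channels
        wy = wy[..., None]
        wx = wx[..., None]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```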
The transformer detection inference S502 is configured to input the preprocessing result into the transformer inference network, whose structure is detailed in the "transformer-based cross-scale target detection method", and to output candidate target detection frames, confidences and target category information.
The result prediction S503 is used to further screen the target detection results. If two screening results overlap, the intersection over union between them is calculated; when it is greater than a preset threshold T, the two detection frames are considered to belong to the same target and the detection frame with the lower confidence is deleted. Non-maximum suppression of overlapping frames is finally realized by iterating, and the final detection result is output.
The invention provides a transformer-based cross-scale target detection system, wherein a functional structure block diagram of the runway intrusion prevention fusion display system S600 is shown in FIG. 11. It specifically comprises:
a runway intrusion prevention early warning/alarm module S601 and a multi-source data fusion display module S602.
The runway intrusion prevention early warning/alarm module S601 estimates the position of the target in the monitored crossing area by combining the target position and category information output by the cross-scale target detection method S500. The category information mainly includes aircraft, vehicles, people and others (such as animals), and the target position information includes taxiways, runways, the sky, the apron and the like. The center of the detection frame is usually used to represent the position of the target, but due to viewing-angle differences the center point of the detection frame may be biased. To solve this problem, the invention uses the coordinates of the center point of the bottom edge of the detection frame to determine the target position, because for a ground target this point can generally approximate the contact point between the target and the ground, thereby reducing the viewing-angle deviation caused by height and realizing high-precision target position estimation.
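The bottom-edge midpoint rule, plus a toy region lookup (the region rectangles and their names are hypothetical; a real system would use calibrated camera-to-ground mappings rather than axis-aligned image regions):

```python
def ground_point(box):
    """Approximate a ground target's contact point with the ground as
    the midpoint of the detection box's bottom edge, which reduces the
    viewing-angle bias of using the box center."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, y2)   # image coordinates, y grows downward

def region_of(point, regions):
    """Classify a ground point into a named area (runway, taxiway,
    apron, ...) given axis-aligned region rectangles in image space."""
    x, y = point
    for name, (rx1, ry1, rx2, ry2) in regions.items():
        if rx1 <= x <= rx2 and ry1 <= y <= ry2:
            return name
    return "unknown"
```

For example, a vehicle box whose bottom edge falls inside the runway rectangle would be flagged as a potential incursion.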
The multi-source data fusion display module S602 uses the video positioning data and the multi-source surveillance data of the airport surface together to comprehensively display the activity of targets in the runway area. The video positioning data are mainly used for displaying the movement of crossing targets, whether they enter or leave the runway, and so on; the surface multi-source surveillance data are used for comprehensively displaying the overall situation of targets in the runway area, and various surveillance source data including surface movement radar, ADS-B, MLAT, etc. can be accessed according to the installation and use conditions of the actual surface surveillance sources of the airport. When the video signal indicates that the current runway contains multiple targets, the target longitude and latitude information provided by the surveillance data can assist the video signal in realizing functions such as intrusion alarm, and alarm information can be output through various channels such as screen flashing and alarm sounds.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit it; while the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the scope of the claims and description of the present invention.

Claims (9)

1. A transformer-based cross-scale target detection method, characterized by comprising the following steps:
generating a training data set;
taking the video image data in the training data set as a training sample, designing a target function, and training a pre-established target detection model;
and detecting the image to be detected by using the trained target detection model based on the transformer cross-scale target detection algorithm, and outputting a target detection result to determine the target position and category information.
2. The method of claim 1, wherein the generating a training data set comprises: constructing an application scene monitoring video set based on pre-collected video image data, and respectively establishing a public target detection data set and a web crawler data set by means of sorting existing public data and downloading crawler data; and carrying out data annotation on the video image data in each data set to form a training data set.
3. The method of claim 2, wherein the constructing of the application scene monitoring video set based on the pre-collected video image data, and the respectively establishing of the public target detection data set and the web crawler data set by sorting the existing public data and downloading the crawler data comprises:
receiving real-time video data of key crossings shot by cameras in the runway intrusion prevention scene; pulling the video stream using the RTSP protocol, parsing the pulled video stream information using the VideoCapture interface of OpenCV, and writing the parsed video frames into an mp4 video file using a VideoWriter, to obtain the application scene monitoring video set;
screening common target types in the runway area from the existing public data to form a public target detection data set facing runway intrusion application;
wherein the common goals include: aircraft, vehicles, personnel;
searching keywords from an image retrieval website according to actual requirements, downloading relevant images by using a Python crawler technology, and carrying out secondary filtering on the relevant images collected by the web crawler in a manual screening mode to obtain a web crawler data set.
4. The method of claim 1, wherein the data labeling of the video image data in each data set comprises: marking the positions of targets in the application scene monitoring video set and the web crawler data set using the DarkLabel software, expressed as a four-dimensional vector consisting of the x coordinate of the label frame center, the y coordinate of the label frame center, the label frame width and the label frame height.
5. The method of claim 1, wherein the pre-established cross-scale target detection model comprises a backbone network, a neck network and a head network connected in sequence;
the backbone network is used for extracting high-level image features;
the neck network is distributed in each layer of the backbone network and is used for fusing and reprocessing the features extracted from the backbone network from different scales and different processing stages and respectively outputting the processed features to each scale;
the head network corresponds to the features of different scales output by the neck network one by one, and is used for determining a target detection result according to the features extracted by the neck network.
6. The method according to claim 1, characterized in that said designing an objective function specifically comprises: designing the objective function with the goals of minimizing the detection frame loss, the confidence loss and the target category loss, respectively; wherein,
the detection box loss is obtained by calculating the distance between the detection box and the true value;
the confidence loss is obtained by calculating the reliability of the detection target;
the target class loss is obtained based on a cross entropy loss function evaluating a distance between an estimated class and a true value.
7. The method of claim 1, wherein determining the target detection result further comprises: screening target detection results output by the head network to realize non-maximum value suppression of the detection frame;
screening the target detection results output by the head network specifically comprises: traversing all the detection frames, and if any two detection frames overlap, calculating the intersection over union (IoU) between the two detection frames; when the IoU is greater than a threshold T, the two detection frames are considered to belong to the same target, and the detection frame with the higher confidence is retained; non-maximum suppression of all the detection frames is realized by iterating, and the final detection result is output.
8. A transformer-based cross-scale target detection system, characterized by comprising a key crossing monitoring camera, a vision processing server, a runway intrusion prevention display device, a key crossing video stream data processing module, a transformer inference system and a runway intrusion prevention fusion display system;
the key crossing monitoring camera is used for acquiring, encoding and pushing real-time video data of the key crossing;
the visual processing server is used for operating a core visual processing algorithm and realizing positioning and identification of targets in the video stream data of the key crossing; wherein, the core vision processing algorithm is a transformer reasoning method;
and the runway intrusion prevention display equipment is used for displaying the real-time video data of the key road junction and simultaneously displaying the target position and the category information of the runway intrusion event.
9. The system of claim 8, wherein the key crossing monitoring camera comprises a protective cover and a camera heating device disposed on an outer layer of the camera;
the vision processing server comprises: a networking module and a high-performance computing module;
the networking module is used for data transmission between the front-end camera and the control terminal;
the high-performance computing module is used for performing inference computation on the input video data;
the runway intrusion prevention display device comprises: a large-screen display module and an interactive display module;
the large-screen display module is used for displaying the real-time videos and superimposed signals of all monitored crossings in the airport;
the interactive display module is used for displaying the control interactive interface to the controller;
the key crossing video stream data processing module comprises: a stream pulling unit and a decoding unit;
the stream pulling unit is used for pulling the video stream data collected by the camera using RTSP (real time streaming protocol);
the decoding unit is used for decoding the pulled video stream into video frames using an FFMPEG built-in decoding algorithm;
the transformer inference system comprises: a video frame preprocessing unit, a transformer detection inference unit and a result prediction unit;
the video frame preprocessing unit is used for preprocessing the single-frame video data extracted by the decoding unit and normalizing the original video frame to a specified resolution using bilinear interpolation;
the transformer detection inference unit is used for inputting the preprocessing result into the transformer cross-scale target detection model and outputting a target detection result comprising: detection frames, confidences and target category information;
the result prediction unit is used for screening the target detection results; if two screening results overlap, the intersection over union between them is calculated, and when it is greater than a preset threshold T, the two detection frames are considered to belong to the same target and the detection frame with the lower confidence is deleted; non-maximum suppression of overlapping frames is finally realized by iterating, and the final detection result is output;
the anti-runway-intrusion fusion display system comprises: a runway intrusion early warning module and a multi-source data fusion display module;
the runway intrusion early warning module is used for estimating the position of each target inside the monitored crossing area by combining the target position and category information output by the cross-scale target detection method;
wherein the category information comprises aircraft, vehicles and personnel;
and the target position information comprises the taxiway, the runway, the sky and the apron;
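One way to realize the position estimation above is to test each detection box's bottom-center point (an approximation of the target's ground contact) against hand-drawn region polygons in the camera image. The polygons, region names, and the bottom-center heuristic here are illustrative assumptions, not details stated in the patent:

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: is pixel coordinate pt inside the polygon (list of (x, y))?"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this polygon edge crosses the horizontal ray at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def classify_position(box, regions):
    """Map a detection box (x1, y1, x2, y2) to a named region (runway, taxiway,
    apron, ...) using its bottom-center point."""
    foot = ((box[0] + box[2]) / 2.0, box[3])
    for name, poly in regions.items():
        if point_in_polygon(foot, poly):
            return name
    return "unknown"
```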
the multi-source data fusion display module is used for comprehensively displaying the activity of targets in the runway area by combining the video positioning data with the multi-source surveillance data of the airport surface;
it comprises: a video positioning data display unit and a surface multi-source surveillance data display unit;
the video positioning data display unit is used for displaying the activity of targets at the crossing and whether they enter or leave the runway;
the surface multi-source surveillance data display unit is used for comprehensively displaying the overall situation of targets in the runway area, accessing the various surveillance source data according to the surveillance sources actually installed and in use at the airport; when the video signal indicates that multiple targets are present on the current runway, the target longitude and latitude information provided by the surveillance data assists the video signal in raising the intrusion alarm, and the alarm information is output through multiple channels such as screen flashing and alarm sounds;
wherein the surveillance source data comprises surface movement radar, ADS-B and MLAT.
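The fusion-alert decision above can be sketched as a simple rule: when video alone sees more targets on the runway than allowed, attach the surveillance fixes as corroborating evidence and fan the warning out over several channels. The data shapes, threshold, and channel names are placeholder assumptions:

```python
def intrusion_alert(video_targets, surveillance_targets, max_allowed=1):
    """Decide whether to raise a runway-intrusion warning.

    video_targets: list of (category, position) tuples from the video pipeline,
    e.g. ("aircraft", "runway").
    surveillance_targets: list of (source, lat, lon) tuples from surface
    movement radar / ADS-B / MLAT for targets located on the runway.
    Returns None when at most max_allowed targets occupy the runway,
    otherwise a warning record for the display system."""
    on_runway = [t for t in video_targets if t[1] == "runway"]
    if len(on_runway) <= max_allowed:
        return None
    return {
        "level": "warning",
        "channels": ["screen_flash", "alarm_sound"],  # multiple output channels
        "video_targets": on_runway,
        "surveillance_fixes": surveillance_targets,   # lat/lon corroboration
    }
```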
CN202210719122.0A 2022-06-23 2022-06-23 Transformer-based trans-scale target detection method and system Pending CN115205781A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210719122.0A CN115205781A (en) 2022-06-23 2022-06-23 Transformer-based trans-scale target detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210719122.0A CN115205781A (en) 2022-06-23 2022-06-23 Transformer-based trans-scale target detection method and system

Publications (1)

Publication Number Publication Date
CN115205781A true CN115205781A (en) 2022-10-18

Family

ID=83579078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210719122.0A Pending CN115205781A (en) 2022-06-23 2022-06-23 Transformer-based trans-scale target detection method and system

Country Status (1)

Country Link
CN (1) CN115205781A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765482A (en) * 2024-02-22 2024-03-26 交通运输部天津水运工程科学研究所 Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning
CN117765482B (en) * 2024-02-22 2024-05-14 交通运输部天津水运工程科学研究所 Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning

Similar Documents

Publication Publication Date Title
US11182598B2 (en) Smart area monitoring with artificial intelligence
US9442176B2 (en) Bus lane infraction detection method and system
CN102724482B (en) Based on the intelligent vision sensing network moving target relay tracking system of GPS and GIS
KR101375583B1 (en) Object Density Estimation in Video
CN111833598B (en) Automatic traffic incident monitoring method and system for unmanned aerial vehicle on highway
CN116824859B (en) Intelligent traffic big data analysis system based on Internet of things
CN104159067A (en) Intelligent monitoring system and method based on combination of 3DGIS with real scene video
CN113593250A (en) Illegal parking detection system based on visual identification
CN115205781A (en) Transformer-based trans-scale target detection method and system
US11727580B2 (en) Method and system for gathering information of an object moving in an area of interest
JP2019215889A (en) Computer-implemented method, imaging system, and image processing system
Li et al. Intelligent transportation video tracking technology based on computer and image processing technology
Gao et al. A new curb lane monitoring and illegal parking impact estimation approach based on queueing theory and computer vision for cameras with low resolution and low frame rate
US11288519B2 (en) Object counting and classification for image processing
KR102516890B1 (en) Identification system and method of illegal parking and stopping vehicle numbers using drone images and artificial intelligence technology
Pan et al. Identifying Vehicles Dynamically on Freeway CCTV Images through the YOLO Deep Learning Model.
Glasl et al. Video based traffic congestion prediction on an embedded system
Ishak et al. Traffic counting using existing video detection cameras
KR102644659B1 (en) Road managing system and method
Gregor et al. Design and implementation of a counting and differentiation system for vehicles through video processing
Ke Real-time video analytics empowered by machine learning and edge computing for smart transportation applications
Wang et al. Real-time intersection vehicle turning movement counts from live UAV video stream using multiple object tracking
Bonderup et al. Preventing drowning accidents using thermal cameras
Kaluza et al. Traffic Collision Detection Using DenseNet
KR20240064507A (en) Apparatus for detecting objects of image based on deep learning, and method of the same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination