CN115331074A - Cross-scale panoramic sensing system and cross-scale target detection method of panoramic image - Google Patents

Cross-scale panoramic sensing system and cross-scale target detection method of panoramic image

Info

Publication number
CN115331074A
CN115331074A
Authority
CN
China
Prior art keywords
global
perception
target detection
panoramic
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210907265.4A
Other languages
Chinese (zh)
Inventor
Gao Kun
Shao Hang
Lin Haozhe
Liu Wei
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze Delta Region Institute of Tsinghua University Zhejiang
Original Assignee
Zhejiang Future Technology Institute (jiaxing)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Future Technology Institute (jiaxing) filed Critical Zhejiang Future Technology Institute (jiaxing)
Priority to CN202210907265.4A priority Critical patent/CN115331074A/en
Publication of CN115331074A publication Critical patent/CN115331074A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/16Image acquisition using multiple overlapping images; Image stitching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Studio Devices (AREA)

Abstract

The invention discloses a cross-scale panoramic sensing system and a cross-scale target detection method for panoramic images. The cross-scale panoramic sensing system comprises multiple groups of fixing devices arranged as a regular polygon; a camera array fixing bracket is fixed to each group of fixing devices, and a group of camera arrays is fixed to each bracket. The multiple groups of camera arrays respectively cover the peripheral areas around the circumference of the regular polygon. Each group of camera arrays comprises two local scene perception devices and one global scene perception device, and the global scene perception areas of two adjacent groups of camera arrays have a perception overlap area. The sensing system has a simple structure and is convenient to deploy; it solves the problem of cross-scale high-resolution panoramic imaging and satisfies the data acquisition and training conditions for high-precision target detection in large-scene panoramas. Together with the cross-scale target detection method designed by the invention, it finally achieves gigapixel-level high-precision panoramic target detection.

Description

Cross-scale panoramic sensing system and cross-scale target detection method of panoramic image
Technical Field
The application relates to the technical field of deep learning, in particular to a cross-scale panoramic sensing system and a cross-scale target detection method of a panoramic image.
Background
According to a survey of the prior art, existing large-scene panoramic target detection has three main implementations at the visual sensing layer: first, 360-degree field-of-view imaging with a dual-fisheye lens; second, panoramic coverage through distributed deployment of multiple conventional cameras; third, panoramic imaging with a multi-view camera.
The dual-fisheye lens and the multi-view camera are structurally simple and convenient to deploy, but they are expensive and suffer from poor edge imaging quality and low resolution, which leads to poor target detection accuracy. Distributed deployment of multiple conventional cameras requires installing cameras at scattered locations; in a large scene, the many deployment points make the cost too high, leave blind spots, and result in a rather discrete deployment structure.
Therefore, the inventors recognized that large-scene panoramic target detection urgently needs a panoramic hardware scheme and algorithm that are structurally simple, convenient to deploy, and capable of high-precision target detection.
Disclosure of Invention
Based on the above, in order to solve the technical problem, a cross-scale panoramic sensing system and a cross-scale target detection method of a panoramic image are provided.
In a first aspect, a cross-scale panoramic sensing system comprises a plurality of groups of fixing devices which are arranged into a regular polygon, wherein each group of fixing devices is fixed with a camera array fixing support, and each camera array fixing support is fixed with a group of camera arrays; the multiple groups of camera arrays respectively and correspondingly cover multiple peripheral areas in the circumferential direction of the regular polygon;
each group of camera arrays comprises three pieces of perception imaging equipment, the three pieces of perception imaging equipment comprise two pieces of local scene perception equipment and one piece of global scene perception equipment, and the global scene perception equipment is fixed between the two pieces of local scene perception equipment;
the global scene perception areas of the global scene perception devices of two adjacent groups of camera arrays have a perception overlap area; for each group of camera arrays, the field angle of the global scene perception device covers the field angles of the two local scene perception devices on either side of it, and the vertical field angle of the global perception device is greater than or equal to 2 times that of the local scene perception devices.
Optionally, the regular polygon is a regular octagon, and the fixing devices are eight groups.
Further optionally, the optical axes of the fields of view of all the perception imaging devices are coplanar with the regular octagon, and the reverse extension lines of the optical axes of the fields of view of all the global scene perception devices pass through the center of the regular octagon.
Further optionally, the horizontal field angle of each global scene perception device is greater than or equal to 60°, and the perceived image resolution of each global scene perception device and each local scene perception device is greater than or equal to 14 million pixels.
In a second aspect, a cross-scale target detection method for gigapixel-level panoramic images includes:
step one, setting up a target detection training data acquisition device, wherein the training data acquisition device comprises a group of camera arrays from the cross-scale panoramic sensing system provided by the first aspect and a camera array fixing support for fixing the camera arrays; calibrating the perception device positions of the built target detection training data acquisition device, and training with the built device to obtain a cross-scale target detection model;
step two, acquiring images with the cross-scale panoramic sensing system provided by the first aspect to obtain 8 global perception images and 16 local perception images, and calculating the transformation matrices of adjacent global perception images; stitching the 8 global perception images using these transformation matrices to obtain a 360-degree panoramic stitched image I_G;
step three, performing target detection on the 16 local perception images acquired in step two using the cross-scale target detection model trained in step one, to obtain the coordinate position of each target in each local perception image; obtaining the coordinate position of each target in the corresponding global perception image from its coordinate position in the local perception image, and transforming the target positions in the 8 global perception images using the adjacent-image transformation matrices to obtain the specific positions of all targets in the 360-degree panoramic stitched image I_G.
Optionally, the step one of calibrating the device position of the set-up target detection training data acquisition device, and training by using the set-up target detection training data acquisition device to obtain the cross-scale target detection model includes:
position calibration is carried out on the global scene sensing equipment of the two local scene sensing equipment in the target detection training data acquisition device by using a characteristic point matching method, and a mapping matrix M of the two local scene sensing equipment in the camera array relative to the global scene sensing equipment is obtained 1 And M 2
acquiring image data at specific sites using the built target detection training data acquisition device to obtain a local perception image dataset and a global perception image dataset, and annotating bounding boxes for the targets captured in both datasets; dividing the annotated local perception image dataset into a training set, a test set and a validation set according to a preset ratio; the annotated global perception image dataset does not participate in training and is instead used as ground truth to constrain the training of the cross-scale target detection model;
training on the training set using an existing target detection network and its loss function L_det, wherein the loss function during training is defined as:

L = L_det(local) + λ·L_det(global)

where L_det(local) denotes the loss function of the existing target detection network on the local perception image dataset, L_det(global) denotes the loss function constrained by the annotated global perception dataset serving as ground truth, and λ denotes the weight coefficient of L_det(global); L_det(global) is computed as:

L_det(global) = L_det(Pred·M_i)

where Pred is the target position predicted from a local perception image, and Pred·M_i denotes Pred transformed into the global perception image through the mapping matrix M_i;
iterating model training according to a preset training strategy until the loss function L converges, to obtain the cross-scale target detection model.
Further optionally, the second step specifically includes:
constructing the cross-scale panoramic sensing system provided by the first aspect, synchronously acquiring image data from the 8 groups of camera arrays using a software-based timestamp-synchronized triggering method, and recording the 8 global images and 16 local images perceived by the 8 groups of camera arrays at time t as (G_1, G_2, ..., G_8) and (L_1, L_2, ..., L_16) respectively;
for the global perception image sequence (G_1, G_2, ..., G_8) at time t, performing feature detection and matching on adjacent images in the sequence with a feature extraction and matching algorithm, to obtain the transformation matrices (TR_21, TR_32, ..., TR_87) of adjacent global perception images;
stitching the 8 global perception images using the adjacent-image transformation matrices (TR_21, TR_32, ..., TR_87) to obtain a 360-degree panoramic stitched image I_G.
Further optionally, in step three, the coordinate position of a target in the corresponding global perception image is obtained from its coordinate position in each local perception image by mapping that position back into the corresponding global perception image through the mapping matrices M_1 and M_2; the target positions in the 8 global perception images are transformed using the adjacent-image transformation matrices (TR_21, TR_32, ..., TR_87).
Further optionally, the feature point matching method is the SIFT algorithm; the specific sites include pedestrian streets and intersections.
Further optionally, the preset ratio is 8:1:1; the target detection network is a YOLOv5 neural network.
The invention has at least the following beneficial effects:
the embodiment of the invention provides an unstructured hundred million-pixel-level trans-scale panoramic sensing system and a target detection method for large-scene panoramic target detection; the unstructured hundred million pixel level sensing system is simple in structure and convenient to deploy, the problem of cross-scale high-resolution panoramic imaging is solved, data acquisition and training conditions of large-scene panoramic high-precision target detection are met, and finally the cross-scale target detection method designed by the embodiment of the invention can realize hundred million pixel level panoramic target detection; the target detection method provided by the embodiment of the invention performs large-scale detection on the target through the local scene perception image, and then maps the target to the global scene perception image through the perception equipment calibration relation, thereby finally realizing cross-scale high-precision target detection; compared with the existing panoramic target detection solution, the panoramic target detection method has higher pixel and precision.
Drawings
Fig. 1 is a schematic structural diagram of an unstructured cross-scale panoramic sensing system according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a cross-scale target detection training data acquisition device according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a cross-scale target detection method for gigapixel-level panoramic images according to an embodiment of the present invention;
fig. 4 is a block diagram of a cross-scale target detection apparatus for gigapixel-level panoramic images according to an embodiment of the present invention.
Description of the reference numerals:
1. a fixing device;
2. an array of cameras; 21. a first local scene aware device; 22. a second local scene awareness device; 23. a global scene aware device;
3. camera array fixing bracket.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Example one
In this embodiment, an unstructured cross-scale panoramic sensing system is provided, which realizes panoramic perception imaging through a polygonal array design. It includes multiple sets of fixing devices arranged as a regular polygon; a camera array fixing bracket is fixed to each set of fixing devices, and a set of camera arrays is fixed to each bracket; the multiple sets of camera arrays respectively cover the peripheral areas around the circumference of the regular polygon;
each group of camera arrays comprises three pieces of perception imaging equipment, the three pieces of perception imaging equipment comprise two pieces of local scene perception equipment and one piece of global scene perception equipment, and the global scene perception equipment is fixed between the two pieces of local scene perception equipment;
the global scene perception areas of the global scene perception devices of two adjacent groups of camera arrays have a perception overlap area; for each group of camera arrays, the field angle of the global scene perception device covers the field angles of the two local scene perception devices on either side of it, and the vertical field angle of the global perception device is greater than or equal to 2 times that of the local scene perception devices.
Wherein the regular polygon may be, but is not limited to, an octagon, a decagon, etc.
The unstructured gigapixel-level sensing system provided by the embodiment of the invention has a simple hardware structure and is convenient to deploy; it solves the problem of cross-scale high-resolution panoramic imaging, satisfies the data acquisition and training conditions of large-scene panoramic high-precision target detection, and provides a foundation for large-scene panoramic high-precision target detection.
Example two
In the present embodiment, an unstructured cross-scale panoramic sensing system is provided, as shown in fig. 1 (a), comprising eight groups of fixing devices 1 arranged as a regular octagon; that is, 8 groups of camera arrays 2 are arranged around the fixing devices 1 in a polygonal array, and the peripheral areas around the circumference of the regular octagon are covered by the 8 groups of camera arrays 2 respectively. In fig. 1 (a), 2 denotes one group of camera arrays, and each group of camera arrays 2 comprises three perception imaging devices. Specifically, as shown in fig. 1 (a) and (b), one camera array fixing bracket 3 is fixed to each group of fixing devices 1, and one group of camera arrays 2 is fixed to each camera array fixing bracket 3; the three perception imaging devices of each group are two local scene perception devices (a first local scene perception device 21 and a second local scene perception device 22) and one global scene perception device 23, the global scene perception device 23 being fixed between the first local scene perception device 21 and the second local scene perception device 22.
For each group of camera arrays 2, the field angle of the global scene perception device 23 needs to cover the field angles of the two local scene perception devices; the system includes 8 global perception devices and 16 local perception devices in total, and the perceived image resolution of each global scene perception device 23 and each local scene perception device is not lower than 14 million pixels.
The system is characterized as follows: the global scene perception areas of the global scene perception devices 23 of two adjacent groups of camera arrays 2 need a certain perception overlap area. Taking the 8 groups of camera arrays 2 in fig. 1 as an example, the horizontal field angle of a single global perception device theoretically needs to exceed 45 degrees; to avoid blind spots and to ease subsequent algorithm development, a certain overlap must be guaranteed, so in practice the horizontal field angle of each global perception device should be no less than 60 degrees. Within a group of camera arrays 2, the vertical field angle of the global perception device should be no less than 2 times that of the local perception devices, and the local perception devices can be adjusted to any angle, enabling unstructured deployment.
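To make these numbers concrete, the overlap implied by the stated bounds can be checked with a few lines of Python (an illustrative sketch; only the 45-degree theoretical minimum and the 60-degree requirement come from the text above):

```python
# Quick check of the octagonal ring geometry described above.
n_arrays = 8                      # one global perception device per side
min_hfov = 360 / n_arrays         # theoretical minimum horizontal FOV: 45 deg
actual_hfov = 60                  # requirement stated above: >= 60 deg

# Overlap shared with each neighbouring global camera.
overlap = actual_hfov - min_hfov  # 15 deg of perception overlap per adjacent pair
print(f"minimum FOV {min_hfov:.0f} deg, overlap per adjacent pair {overlap:.0f} deg")
```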
In addition, as shown in fig. 2, a large-scene cross-scale target detection training data acquisition device oriented to the training requirements of the panoramic perception target detection model is provided. Similar to the unstructured cross-scale panoramic sensing system above, it includes one camera array fixing bracket 3 and one group of camera arrays 2, specifically two local scene perception devices (a first local scene perception device 21 and a second local scene perception device 22) and one global scene perception device 23; the configuration requirements for these devices are the same as for the corresponding devices in the unstructured cross-scale panoramic sensing system of fig. 1. The optical axes of the fields of view of the first and second local scene perception devices 21 and 22 can be adjusted flexibly, as long as the field angle of the global scene perception device 23 is ensured to cover the field angles of the first and second local scene perception devices 21 and 22.
In fig. 2, A is an assumed perceived scene target, B is the schematic imaging result of the global scene target by the global scene perception device 23, C and D are the schematic imaging results of the local scene targets by the first local scene perception device 21 and the second local scene perception device 22 respectively, and E is the region in B corresponding to C and D. As can be seen from fig. 2, the target scales captured by the global and local scene perception devices differ markedly; because targets are small under global scene perception, target detection on the global perception image is inaccurate. Targets can instead be detected at large scale on the local scene perception images and then mapped into the global scene perception image through the calibrated relationship between perception devices, realizing cross-scale high-precision target detection.
The unstructured gigapixel-level sensing system provided by the embodiment of the invention has a simple hardware structure and is convenient to deploy; it solves the problem of cross-scale high-resolution panoramic imaging, satisfies the data acquisition and training conditions of large-scene panoramic high-precision target detection, and provides a foundation for large-scene panoramic high-precision target detection.
EXAMPLE III
In the present embodiment, as shown in fig. 3, a cross-scale target detection method for gigapixel-level panoramic images is provided, comprising the following steps:
step S301, with reference to the hardware descriptions of fig. 1 and fig. 2 in the second embodiment, building a target detection training data acquisition device shown in fig. 2, that is, the training data acquisition device includes a set of camera arrays in the cross-scale panoramic sensing system of the cross-scale panoramic sensing system provided in the second embodiment and a camera array fixing bracket for fixing the camera arrays; and calibrating the position of the sensing equipment for the built target detection training data acquisition device, and training by using the built target detection training data acquisition device to obtain a cross-scale target detection model.
Step S301, i.e., training the large-scene cross-scale target detection model, where calibrating the device positions of the built target detection training data acquisition device and training with it to obtain the cross-scale target detection model specifically comprises:
(1) Position calibration is performed on the two local scene perception devices and the global scene perception device in the target detection training data acquisition device using a feature point matching method, obtaining the mapping matrices M_1 and M_2 of the two local scene perception devices (21 and 22 in fig. 2) in the camera array relative to the global scene perception device (23 in fig. 2); the feature point matching method may be, but is not limited to, the SIFT algorithm;
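A minimal sketch of this calibration step using OpenCV's SIFT implementation and a homography model follows (the function name, matching thresholds and RANSAC parameters are illustrative assumptions, not specified by the patent):

```python
import cv2
import numpy as np

def calibrate_mapping(local_img, global_img, min_matches=10):
    """Estimate the mapping matrix M of a local camera relative to the
    global camera by SIFT feature matching, modeled as a homography."""
    sift = cv2.SIFT_create()
    kp_l, des_l = sift.detectAndCompute(local_img, None)
    kp_g, des_g = sift.detectAndCompute(global_img, None)

    # Lowe's ratio-test matching between the two views.
    matcher = cv2.BFMatcher()
    knn = matcher.knnMatch(des_l, des_g, k=2)
    good = [m for m, n in knn if m.distance < 0.75 * n.distance]
    if len(good) < min_matches:
        raise RuntimeError("not enough matches for calibration")

    src = np.float32([kp_l[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_g[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    M, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return M  # maps local-image coordinates into the global image

# M1 = calibrate_mapping(img_local_1, img_global)
# M2 = calibrate_mapping(img_local_2, img_global)
```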
(2) Diverse large-scene data are collected with the built target detection training data acquisition hardware, i.e., image data are acquired at specific sites such as pedestrian streets and intersections, yielding a local perception image dataset and a global perception image dataset, and bounding boxes are annotated for the targets captured in both datasets; the annotated local perception image dataset is divided into a training set, a test set and a validation set according to a preset ratio (8:1:1); the annotated global perception image dataset does not participate in training and is instead used as ground truth to constrain the training of the cross-scale target detection model;
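For illustration, the 8:1:1 partition of the annotated local perception images might be realized as follows (a sketch assuming samples are represented as a simple list, e.g. of file paths):

```python
import random

def split_dataset(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle annotated samples and split into train/test/val by ratio."""
    rng = random.Random(seed)
    samples = samples[:]
    rng.shuffle(samples)
    total = sum(ratios)
    n_train = len(samples) * ratios[0] // total
    n_test = len(samples) * ratios[1] // total
    train = samples[:n_train]
    test = samples[n_train:n_train + n_test]
    val = samples[n_train + n_test:]  # remainder goes to validation
    return train, test, val

# train, test, val = split_dataset(list_of_annotated_local_images)
```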
(3) An existing target detection network, e.g. a YOLOv5 neural network, and its loss function L_det are used to train on the training set, where the loss function during training is defined as:

L = L_det(local) + λ·L_det(global)

where L_det(local) denotes the loss function of the existing target detection network (e.g. the YOLOv5 neural network) on the local perception image dataset; L_det(global) denotes the loss function constrained by the annotated global perception dataset serving as ground truth, i.e., a loss constrained by the global perception annotation data; and λ denotes the weight coefficient of L_det(global). L_det(global) is computed as:

L_det(global) = L_det(Pred·M_i)

where Pred is the target position predicted from a local perception image, which is transformed into the global perception image through the mapping matrix M_i and then compared against the global perception annotations to compute the loss; Pred·M_i denotes Pred transformed through the mapping matrix M_i into the global perception image; the mapping matrix M_i is the mapping matrix M_1 or M_2 above.
(4) The model is trained iteratively according to a preset training strategy (learning rate, batch size, number of epochs, optimizer, etc.) until the loss function L converges, yielding the large-scene cross-scale target detection model.
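Schematically, the combined loss of step (3) could be organized as follows (a hedged sketch: det_loss stands in for the detection network's own loss, e.g. YOLOv5's; the default weight lam=0.5 is an assumption, and mapping only two box corners through the homography is a simplification):

```python
import numpy as np
import cv2

def map_boxes(boxes, M):
    """Project (x1, y1, x2, y2) boxes through homography M.
    Only the two corners are mapped, which is a simplification:
    a homography does not in general keep boxes axis-aligned."""
    pts = np.float32([[b[0], b[1]] for b in boxes] +
                     [[b[2], b[3]] for b in boxes]).reshape(-1, 1, 2)
    mapped = cv2.perspectiveTransform(pts, M).reshape(2, -1, 2)
    return np.hstack([mapped[0], mapped[1]])  # (N, 4) mapped boxes

def total_loss(det_loss, pred_local_boxes, gt_local, gt_global, M_i, lam=0.5):
    """L = L_det(local) + lambda * L_det(global).
    lam (the weight coefficient) is an assumed value; the patent does
    not fix it. M_i is the calibration matrix M_1 or M_2."""
    l_local = det_loss(pred_local_boxes, gt_local)
    pred_global_boxes = map_boxes(pred_local_boxes, M_i)  # Pred * M_i
    l_global = det_loss(pred_global_boxes, gt_global)
    return l_local + lam * l_global
```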
Step S302, images are acquired with the cross-scale panoramic sensing system provided in the second embodiment, obtaining 8 global perception images and 16 local perception images, and the transformation matrices of adjacent global perception images are calculated; the 8 global perception images are stitched using these transformation matrices to obtain a 360-degree panoramic stitched image I_G.
Step S302 performs the gigapixel-level panoramic image fusion calculation and specifically includes:
(1) The cross-scale panoramic sensing system of fig. 1 in the second embodiment is built, and image data from the 8 groups of camera arrays are acquired synchronously using a software-based timestamp-synchronized triggering method; at a given time t, the 8 global images and 16 local images perceived by the 8 groups of camera arrays are recorded as (G_1, G_2, ..., G_8) and (L_1, L_2, ..., L_16) respectively;
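The software timestamp-synchronized trigger might be sketched as follows (the cam.grab(t) interface is hypothetical; the patent does not name a camera API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def capture_synchronized(cameras, trigger_time):
    """Block each camera thread until a shared software timestamp,
    then capture one frame per camera (8 global + 16 local = 24)."""
    def grab(cam):
        while time.time() < trigger_time:  # spin until the shared trigger
            time.sleep(1e-4)
        return cam.grab(trigger_time)      # hypothetical camera interface

    with ThreadPoolExecutor(max_workers=len(cameras)) as pool:
        return list(pool.map(grab, cameras))

# frames = capture_synchronized(global_cams + local_cams, time.time() + 0.1)
```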
(2) For the global perception image sequence (G_1, G_2, ..., G_8) at time t, feature detection and matching are performed on adjacent images in the sequence with a feature extraction and matching algorithm, obtaining the transformation matrices (TR_21, TR_32, ..., TR_87) of adjacent global perception images; adjacent images can be stitched with these matrices;
(3) The 8 global perception images are stitched using the adjacent-image transformation matrices (TR_21, TR_32, ..., TR_87) to obtain a 360-degree panoramic stitched image I_G.
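The pairwise transforms and the stitching could be computed along these lines (a sketch reusing the calibrate_mapping routine above; blending and projection refinements are omitted, so this is a crude composite rather than a production stitcher):

```python
import numpy as np
import cv2

def adjacent_transforms(global_images):
    """Estimate TR_21, TR_32, ..., TR_87: the homography taking each
    G_i into the frame of its predecessor G_{i-1}."""
    return [calibrate_mapping(global_images[i], global_images[i - 1])
            for i in range(1, len(global_images))]

def stitch_panorama(global_images, transforms, canvas_hw):
    """Chain the adjacent transforms to warp every G_i into G_1's
    frame and overlay the results into one panorama I_G."""
    h, w = canvas_hw
    panorama = np.zeros((h, w, 3), np.uint8)
    accum = np.eye(3)
    for i, img in enumerate(global_images):
        if i > 0:
            accum = accum @ transforms[i - 1]    # TR_21 @ TR_32 @ ... chained
        warped = cv2.warpPerspective(img, accum, (w, h))
        panorama = np.maximum(panorama, warped)  # crude overlay, no blending
    return panorama
```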
Step S303, target detection is performed on the 16 local perception images acquired in step S302 using the cross-scale target detection model trained in step S301, obtaining the coordinate position of each target in each local perception image; the coordinate position of each target in the corresponding global perception image is obtained from its coordinate position in the local perception image, and the target positions in the 8 global perception images are transformed using the adjacent-image transformation matrices to obtain the specific positions of all targets in the 360-degree panoramic stitched image I_G, realizing cross-scale target detection over the gigapixel-level panorama.
In step S303, the coordinate position of a target in the corresponding global perception image is obtained from its coordinate position in each local perception image by mapping that position back into the corresponding global perception image through the mapping matrices M_1 and M_2; the target positions in the 8 global perception images are transformed using the adjacent-image transformation matrices (TR_21, TR_32, ..., TR_87).
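Putting the calibration and stitching matrices together, a detection travels from a local image into the panorama roughly as follows (a sketch reusing map_boxes from the loss example; chain_to_G1 denotes the accumulated product of the relevant TR matrices and is an illustrative name):

```python
def local_boxes_to_panorama(boxes_local, M_i, chain_to_G1):
    """Map detected boxes from one local perception image into the
    stitched panorama I_G: local -> global via the calibration matrix
    M_i, then global -> G_1 frame via the chained TR matrices."""
    boxes_global = map_boxes(boxes_local, M_i)
    return map_boxes(boxes_global, chain_to_G1)

# e.g. for camera array k: chain_to_G1 = TR_21 @ TR_32 @ ... @ TR_k(k-1)
```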
It should be understood that, although the steps in the flowchart of fig. 3 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly ordered, and they may be performed in other orders. Moreover, at least some of the steps in fig. 3 may include multiple sub-steps or stages that are not necessarily performed at the same moment and not necessarily in sequence; they may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
Conventional target detection usually performs target localization and recognition on a local scene perceived by a monocular camera. The requirement of large-scene panoramic high-precision target detection, by contrast, calls for overall perception of the large scene together with high-precision detection of multi-scale targets, which places higher demands on both the panoramic imaging hardware and the target detection algorithm.
Therefore, the embodiment of the invention provides unstructured gigapixel-level cross-scale panoramic sensing system hardware and a target detection method for large-scene panoramic target detection. The hardware is structurally simple and convenient to deploy; it solves the problem of cross-scale high-resolution panoramic imaging and satisfies the data acquisition and training conditions of large-scene panoramic high-precision target detection, so that the cross-scale target detection method designed by the embodiment of the invention can finally realize gigapixel-level panoramic target detection. The target detection method performs large-scale detection of targets on the local scene perception images, maps the targets into the global scene perception images through the calibrated relationship between perception devices, and finally realizes cross-scale high-precision target detection.
In one embodiment, as shown in fig. 4, a cross-scale target detection apparatus for gigapixel-level panoramic images is provided, comprising the following modules:
a cross-scale target detection model training module 401, configured to build a target detection training data acquisition device, where the training data acquisition device comprises one group of camera arrays from the cross-scale panoramic sensing system provided in the second embodiment and a camera array fixing bracket for fixing the camera arrays; to calibrate the perception device positions of the built device; and to train with the built device to obtain a cross-scale target detection model;
a panorama stitched image generating module 402, configured to acquire images with the cross-scale panoramic sensing system provided in the second embodiment, obtaining 8 global perception images and 16 local perception images; to calculate the transformation matrices of adjacent global perception images; and to stitch the 8 global perception images using these transformation matrices into a 360-degree panoramic stitched image I_G;
a target position detection module 403, configured to perform target detection on the 16 local perception images acquired by the panorama stitched image generating module using the cross-scale target detection model obtained by the training module, obtaining the coordinate position of each target in each local perception image; to obtain the coordinate position of each target in the corresponding global perception image from its coordinate position in the local perception image; and to transform the target positions in the 8 global perception images using the adjacent-image transformation matrices, obtaining the specific positions of all targets in the 360-degree panoramic stitched image I_G.
For the specific definition of the cross-scale target detection apparatus for gigapixel-level panoramic images, reference may be made to the above definition of the cross-scale target detection method for gigapixel-level panoramic images, which is not repeated here. Each module in the apparatus may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements all or part of the flow of the method of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed, implements all or part of the flow of the method of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A cross-scale panoramic sensing system is characterized by comprising a plurality of groups of fixing devices which are arranged into a regular polygon, wherein each group of fixing devices is fixed with a camera array fixing support, and each camera array fixing support is fixed with a group of camera arrays; the multiple groups of camera arrays respectively and correspondingly cover multiple peripheral areas in the circumferential direction of the regular polygon;
each group of camera arrays comprises three pieces of perception imaging equipment, the three pieces of perception imaging equipment comprise two pieces of local scene perception equipment and one piece of global scene perception equipment, and the global scene perception equipment is fixed between the two pieces of local scene perception equipment;
the global scene perception areas of the global scene perception devices of two adjacent groups of camera arrays have a perception overlap area; for each group of camera arrays, the field angle of the global scene perception device covers the field angles of the two local scene perception devices on either side of it, and the vertical field angle of the global perception device is greater than or equal to 2 times that of the local scene perception devices.
2. The system of claim 1, wherein the regular polygon is a regular octagon, and the fixing devices are in eight groups.
3. The trans-scale panoramic sensing system according to claim 2, wherein the optical axes of the fields of view of all the sensing imaging devices are coplanar with the regular octagon, and the reverse extensions of the optical axes of the fields of view of all the global scene sensing devices pass through the center of the regular octagon.
4. The system of claim 3, wherein the horizontal field angle of each global scene perception device is greater than or equal to 60°, and the perceived image resolution of each global scene perception device and each local scene perception device is greater than or equal to 14 million pixels.
5. A cross-scale target detection method for gigapixel-level panoramic images, characterized by comprising the following steps:
step one, setting up a target detection training data acquisition device, wherein the training data acquisition device comprises a group of camera arrays from the cross-scale panoramic sensing system of claim 4 and a camera array fixing support for fixing the camera arrays; calibrating the perception device positions of the built target detection training data acquisition device, and training with the built device to obtain a cross-scale target detection model;
step two, acquiring images with the cross-scale panoramic sensing system of claim 4 to obtain 8 global perception images and 16 local perception images, and calculating the transformation matrices of adjacent global perception images; stitching the 8 global perception images using these transformation matrices to obtain a 360-degree panoramic stitched image I_G;
step three, performing target detection on the 16 local perception images acquired in step two using the cross-scale target detection model trained in step one, to obtain the coordinate position of each target in each local perception image; obtaining the coordinate position of each target in the corresponding global perception image from its coordinate position in the local perception image, and transforming the target positions in the 8 global perception images using the adjacent-image transformation matrices to obtain the specific positions of all targets in the 360-degree panoramic stitched image I_G.
6. The cross-scale target detection method for gigapixel-level panoramic images according to claim 5, wherein calibrating the device positions of the built target detection training data acquisition device in step one and training with the built device to obtain the cross-scale target detection model comprises:
performing position calibration on the two local scene perception devices and the global scene perception device in the target detection training data acquisition device using a feature point matching method, to obtain the mapping matrices M_1 and M_2 of the two local scene perception devices in the camera array relative to the global scene perception device;
Acquiring image data at a specific place by using a built target detection training data acquisition device to obtain a local perception image data set and a global perception image data set, and marking a positioning frame on a target which is acquired in the local perception image data set and the global perception image data set; dividing a local perception image data set with labels into a training set, a testing set and a verification set according to a preset proportion; the global perception image data set with the labels does not participate in training, and is used as a GrounTruth to restrict the training of a cross-scale target detection model;
training on the training set using an existing target detection network and its loss function L_det, wherein the loss function during training is defined as:

L = L_det(local) + λ·L_det(global)

where L_det(local) denotes the loss function of the existing target detection network on the local perception image dataset, L_det(global) denotes the loss function constrained by the annotated global perception dataset serving as ground truth, and λ denotes the weight coefficient of L_det(global); L_det(global) is computed as:

L_det(global) = L_det(Pred·M_i)

where Pred is the target position predicted from a local perception image, and Pred·M_i denotes Pred transformed into the global perception image through the mapping matrix M_i;
iterating model training according to a preset training strategy until the loss function L converges, to obtain the cross-scale target detection model.
7. The cross-scale target detection method for gigapixel-level panoramic images according to claim 6, wherein step two specifically comprises:
constructing the cross-scale panoramic sensing system of any one of claims 1 to 3, synchronously acquiring image data from the 8 groups of camera arrays using a software-based timestamp-synchronized triggering method, and recording the 8 global images and 16 local images perceived by the 8 groups of camera arrays at time t as (G_1, G_2, ..., G_8) and (L_1, L_2, ..., L_16) respectively;
for the global perception image sequence (G_1, G_2, ..., G_8) at time t, performing feature detection and matching on adjacent images in the sequence with a feature extraction and matching algorithm, to obtain the transformation matrices (TR_21, TR_32, ..., TR_87) of adjacent global perception images;
stitching the 8 global perception images using the adjacent-image transformation matrices (TR_21, TR_32, ..., TR_87) to obtain a 360-degree panoramic stitched image I_G.
8. The cross-scale target detection method for gigapixel-level panoramic images according to claim 7, wherein in step three the coordinate position of a target in the corresponding global perception image is obtained from its coordinate position in each local perception image by mapping that position back into the corresponding global perception image through the mapping matrices M_1 and M_2; the target positions in the 8 global perception images are transformed using the adjacent-image transformation matrices (TR_21, TR_32, ..., TR_87).
9. The cross-scale target detection method for gigapixel-level panoramic images according to claim 6, wherein the feature point matching method is the SIFT algorithm; the specific sites include pedestrian streets and intersections.
10. The cross-scale target detection method for gigapixel-level panoramic images according to claim 6, wherein the preset ratio is 8:1:1; the target detection network is a YOLOv5 neural network.
CN202210907265.4A 2022-07-29 2022-07-29 Cross-scale panoramic sensing system and cross-scale target detection method of panoramic image Pending CN115331074A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210907265.4A CN115331074A (en) 2022-07-29 2022-07-29 Cross-scale panoramic sensing system and cross-scale target detection method of panoramic image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210907265.4A CN115331074A (en) 2022-07-29 2022-07-29 Cross-scale panoramic sensing system and cross-scale target detection method of panoramic image

Publications (1)

Publication Number Publication Date
CN115331074A true CN115331074A (en) 2022-11-11

Family

ID=83918730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210907265.4A Pending CN115331074A (en) 2022-07-29 2022-07-29 Cross-scale panoramic sensing system and cross-scale target detection method of panoramic image

Country Status (1)

Country Link
CN (1) CN115331074A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726511A (en) * 2024-02-18 2024-03-19 科睿特软件集团股份有限公司 Panoramic imaging device and method for tourism landscape display
CN117726511B (en) * 2024-02-18 2024-05-03 科睿特软件集团股份有限公司 Panoramic imaging device and method for tourism landscape display

Similar Documents

Publication Publication Date Title
CN110310248B (en) A kind of real-time joining method of unmanned aerial vehicle remote sensing images and system
JP5054971B2 (en) Digital 3D / 360 degree camera system
US7831089B2 (en) Modeling and texturing digital surface models in a mapping application
CN102984453B (en) Single camera is utilized to generate the method and system of hemisphere full-view video image in real time
CN103971375B (en) A kind of panorama based on image mosaic stares camera space scaling method
US10757327B2 (en) Panoramic sea view monitoring method and device, server and system
CN113869231B (en) Method and equipment for acquiring real-time image information of target object
CN110782498A (en) Rapid universal calibration method for visual sensing network
US11703820B2 (en) Monitoring management and control system based on panoramic big data
JP5780561B2 (en) Visibility video information generator
CN112348775A (en) Vehicle-mounted all-round-looking-based pavement pool detection system and method
CN115331074A (en) Cross-scale panoramic sensing system and cross-scale target detection method of panoramic image
CN113436130B (en) Intelligent sensing system and device for unstructured light field
CN115439528A (en) Method and equipment for acquiring image position information of target object
Zhou et al. Automatic orthorectification and mosaicking of oblique images from a zoom lens aerial camera
CN109544455B (en) Seamless fusion method for ultralong high-definition live-action long rolls
CN110796690B (en) Image matching method and image matching device
KR102076635B1 (en) Apparatus and method for generating panorama image for scattered fixed cameras
CN110940318A (en) Aerial remote sensing real-time imaging method, electronic equipment and storage medium
Sankaranarayanan et al. A fast linear registration framework for multi-camera GIS coordination
EP4075789A1 (en) Imaging device, imaging method, and program
CN115330838A (en) Cross-scale target tracking method and device
CN112822442A (en) Heat map generation method and device and electronic equipment
CN114463164A (en) Stereo video fusion method for vehicle fleet
CN112017138B (en) Image splicing method based on scene three-dimensional structure

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20240204

Address after: 314001 9F, No.705, Asia Pacific Road, Nanhu District, Jiaxing City, Zhejiang Province

Applicant after: ZHEJIANG YANGTZE DELTA REGION INSTITUTE OF TSINGHUA University

Country or region after: China

Address before: No.152 Huixin Road, Nanhu District, Jiaxing City, Zhejiang Province 314000

Applicant before: ZHEJIANG FUTURE TECHNOLOGY INSTITUTE (JIAXING)

Country or region before: China