CN114677658A - Billion-pixel dynamic large-scene image acquisition and multi-target detection method and device - Google Patents

Publication number
CN114677658A
Authority
CN
China
Prior art keywords
road
image
large scene
targets
candidate frame
Prior art date
Legal status
Granted
Application number
CN202210234371.0A
Other languages
Chinese (zh)
Other versions
CN114677658B (en)
Inventor
方璐
李晓飞
戴琼海
董众
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202210234371.0A (granted as CN114677658B)
Publication of CN114677658A
Application granted
Publication of CN114677658B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides a billion-pixel dynamic large-scene image acquisition and multi-target detection method and device. The method comprises: acquiring images of a plurality of road sections set along a road to be detected, together with depth information for each section; generating a large-scene depth image of the whole road from the section images and depth information; segmenting the large-scene depth image with a segmentation algorithm to extract at least one candidate frame, and inputting the image inside each candidate frame into a convolutional neural network for feature extraction to obtain a per-frame image feature map; inputting the feature maps into a fully connected neural network for preliminary target classification, and obtaining the final classification result from the preliminary classification result, the physical size of the target in each candidate frame, and a class label set annotating the physical size range of each target class; and generating the target detection result from the final classification result and the road's up-road/down-road map information.

Description

Billion-pixel dynamic large-scene image acquisition and multi-target detection method and device
Technical Field
The invention belongs to the field of roadside perception and vehicle-road cooperation, and to the field of acquisition and detection of billion-pixel dynamic large-scene images.
Background
Automatic driving is one of the development trends of the future automobile industry, and its safety bears directly on the life and property of vehicle owners. Because the line of sight of the vehicle-mounted end is limited, an autonomous vehicle faces the safety problem that its perception can be occluded and fail, and road information is difficult to perceive quickly and accurately. Roadside perception technology compensates for this limited on-board line of sight by providing the ability to perceive the road condition ahead in advance. Roadside perception mainly realizes detection, tracking and positioning of road targets by means of sensors such as cameras and lidar, together with models such as computer-vision target detection and tracking and multi-sensor fusion theory.
Compared with vehicle-mounted perception, roadside perception has the following advantages. (1) Perception range and target tracking: roadside sensors are installed on road poles, so the bird's-eye scene range is larger and more types and numbers of targets can be detected and tracked over a long period; the road environment information provided by roadside perception is therefore far richer than that of vehicle-mounted perception. (2) Computing resources: roadside perception can provide higher computational power than vehicle-mounted perception, so more network models, including more complex ones, can be selected. (3) Perception algorithms: the background of the perception region differs between the two. The background seen by a vehicle-mounted sensor is dynamic, while the region seen by a roadside sensor is fixed, so roadside perception can add static-background algorithms (such as background modeling and region segmentation) and build a more accurate ground-filtering model to improve perception precision.
The problems that roadside perception currently needs to resolve are as follows. (1) Although roadside sensors are mounted higher than vehicle-mounted ones, their perception direction is at present mostly the same as that of vehicle-mounted sensors, so many target occlusions remain. (2) Targets captured by roadside sensors have richer appearance and scale variation, and the sample data of vehicle-mounted perception cannot meet the diversified data requirements of roadside perception, so roadside data needs to be collected and annotated anew. (3) Roadside sensors are deployed at different points, so multi-sensor data from different directions must be fused; the data to be fused is more complex and places higher demands on the algorithm. (4) The roadside perception scene is large, with more targets of more types and richer appearance and scale variation, which poses greater challenges for the design and training of a target detection neural network.
Disclosure of Invention
The present invention is directed to solving, at least in part, one of the technical problems in the related art.
Therefore, a first object of the invention is to provide a billion-pixel dynamic large-scene image acquisition and multi-target detection method for acquiring richer road information quickly and in real time.
A second object of the invention is to provide a billion-pixel dynamic large-scene image acquisition and multi-target detection device.
A third object of the invention is to propose a computer device.
A fourth object of the invention is to propose a computer-readable storage medium.
In order to achieve the above objects, an embodiment of the first aspect of the present invention provides a billion-pixel dynamic large-scene image acquisition and multi-target detection method, including: acquiring images of a plurality of road sections set along the road to be detected, together with depth information for each section; generating a large-scene depth image of the whole road from the section images and depth information; segmenting the large-scene depth image with a segmentation algorithm to extract at least one candidate frame, and inputting the image inside each candidate frame into a convolutional neural network for feature extraction to obtain a per-frame image feature map; inputting the feature maps into a fully connected neural network for preliminary target classification, and obtaining the final classification result from the preliminary classification result, the physical size of the target in each candidate frame, and a class label set annotating the physical size range of each target class; and generating the target detection result from the final classification result and the up-road/down-road map information.
In the billion-pixel dynamic large-scene image acquisition and multi-target detection method of the embodiment of the invention, the construction scheme of the roadside billion-pixel dynamic large-scene perception sensor and the multi-section road information acquisition and splicing method effectively solve the problem of occlusion among targets on the road and expand the roadside perception range; the ground-filtering algorithm based on offline road map information and the large-scene depth map improves the precision and speed of candidate-frame segmentation for target detection; the proposed multi-target detection method for the roadside billion-pixel dynamic large scene realizes fast and accurate detection of multiple targets in the roadside large scene; and the invention can provide the unmanned vehicle with rich road information beyond its line of sight.
In addition, the method for acquiring images of billion-pixel dynamic large scenes and detecting multiple targets according to the embodiment of the invention can also have the following additional technical characteristics:
Further, in one embodiment of the present invention, generating a large-scene depth image of the entire road from the images and depth information of the plurality of road sections includes:
performing pixel-level registration on the image and the depth information acquired from the same road section;
performing pixel-level splicing on the images acquired from all road sections according to a preset splicing rule to generate a large-scene image of the road;
performing pixel-level splicing on the depth information acquired from each road section according to the same splicing rule to generate a large-scene depth map of the road;
and generating the large-scene depth image of the road from the large-scene image and the large-scene depth map.
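As an illustrative sketch of the registration-and-splicing steps above (all shapes, section counts, and pixel values here are hypothetical, and this is not the patented implementation), per-section RGB images and their pixel-registered depth maps can be concatenated along the road direction and fused into a single four-channel large-scene depth image:

```python
import numpy as np

def stitch_large_scene(rgb_sections, depth_sections):
    """Splice per-section RGB images and registered depth maps into one
    large-scene RGB-D image (H x W_total x 4). Assumes every section is
    already pixel-registered (RGB and depth share the same pixel grid)
    and all sections have the same height."""
    rgb = np.concatenate(rgb_sections, axis=1)      # splice along the road direction
    depth = np.concatenate(depth_sections, axis=1)  # same splicing rule as the RGB images
    # Fuse into a 4-channel image: 3 color channels + 1 depth channel.
    return np.dstack([rgb, depth])

# Toy example: three road sections, each 4 px high and 6 px wide.
rgbs = [np.full((4, 6, 3), i, dtype=np.float32) for i in range(3)]
depths = [np.full((4, 6), 10.0 * i, dtype=np.float32) for i in range(3)]
scene = stitch_large_scene(rgbs, depths)
```

The key property is that the depth channel and the color channels stay in one-to-one pixel correspondence after splicing, which the later candidate-frame steps rely on.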
Further, in an embodiment of the present invention, before segmenting the large-scene depth image with the segmentation algorithm to extract at least one candidate frame, the method further includes:
removing the ground of the large-scene image within the large-scene depth image according to a ground-filtering algorithm based on offline road map information and the large-scene depth map, to obtain the large-scene image to be segmented.
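A minimal sketch of what such a ground filter might look like (the offline road map is assumed, purely for illustration, to supply the depth the sensor would measure if only the road surface were visible at each pixel; the patent does not specify this representation):

```python
import numpy as np

def remove_ground(depth_map, ground_depth_map, tol=0.3):
    """Mask out ground pixels. `ground_depth_map` holds, per pixel, the
    depth the sensor would measure if only the road surface were visible
    (derived offline from the road map). A pixel whose measured depth is
    within `tol` meters of that value is treated as ground. The tolerance
    value is a hypothetical choice."""
    foreground = np.abs(depth_map - ground_depth_map) > tol
    return foreground  # True where a target occludes the road surface

# Toy 2x3 scene: the road surface sits 5 m away; closer pixels are targets.
depth = np.array([[5.0, 5.0, 3.2],
                  [5.0, 2.9, 3.1]])
ground = np.full((2, 3), 5.0)
mask = remove_ground(depth, ground)
```

Because the roadside sensor's field of view is fixed, this comparison against a precomputed ground model is cheap, which is where the claimed speed advantage over generic segmentation comes from.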
Further, in an embodiment of the present invention, segmenting the large-scene depth image with the segmentation algorithm to extract at least one candidate frame includes:
segmenting the large-scene image to be segmented with the segmentation algorithm and extracting the candidate frames.
Further, in an embodiment of the present invention, segmenting the large-scene image to be segmented with the segmentation algorithm to extract candidate frames includes:
in the direction parallel to the road, segmenting out regions whose two boundaries in the image are ground-removed boundaries; if the length of a region is smaller than or equal to the length of an automobile, the region is used directly as a candidate frame, and if its length is larger, candidate frames are divided according to the automobile length;
in the direction perpendicular to the road, segmenting out regions whose two boundaries are ground-removed boundaries; if the height of a region is smaller than or equal to the height of an automobile, the region is used directly as a candidate frame, and if its height is larger, candidate frames are divided according to the automobile height.
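The division rule above can be sketched roughly as follows (the car-sized threshold and the uniform division are illustrative; the patent's intersection-specific pedestrian margins are not reproduced). The same routine serves both directions, with the car length along the road and the car height perpendicular to it:

```python
def split_region(extent, car_extent):
    """Split a ground-removed region of size `extent` (pixels) into
    candidate-frame extents: kept whole if at most one car-extent,
    otherwise divided into car-sized pieces (the last piece may be
    shorter). Illustrative sketch only."""
    if extent <= car_extent:
        return [extent]
    pieces = []
    remaining = extent
    while remaining > car_extent:
        pieces.append(car_extent)
        remaining -= car_extent
    if remaining > 0:
        pieces.append(remaining)
    return pieces
```

For example, a 250-pixel region with a 100-pixel car extent would yield three candidate frames of 100, 100, and 50 pixels.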
Further, in an embodiment of the present invention, before obtaining the final classification result from the preliminary classification result, the physical sizes of the targets in the candidate frames, and the class label set annotating the physical size range of each target, the method further includes:
obtaining the average depth value at the center of each candidate frame from the large-scene depth map, and obtaining the physical size of the target in the candidate frame by combining it with the area of the candidate frame in the pixel coordinate system.
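Under a simple pinhole-camera assumption (the focal length in pixels is a hypothetical calibration value, not given in the patent), the pixel extent of a candidate frame and the mean depth at its center convert to an approximate physical size as follows:

```python
def physical_size(pixel_w, pixel_h, mean_depth, focal_px):
    """Convert a candidate frame's pixel extent to approximate physical
    size using the mean depth at the frame center and the pinhole model:
    metric_size = pixel_size * depth / focal_length_in_pixels.
    `focal_px` is a hypothetical calibration constant."""
    return (pixel_w * mean_depth / focal_px,
            pixel_h * mean_depth / focal_px)

# A 200 x 100 px frame, 10 m away, with a 1000 px focal length,
# corresponds to roughly a 2 m x 1 m target.
w, h = physical_size(200, 100, 10.0, 1000.0)
```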
Further, in an embodiment of the present invention, generating the target detection result from the final classification result and the up-road/down-road map information includes:
obtaining the position of the target on the road and its up-road/down-road detection result from the coordinates of the candidate frame in the large-scene image and the offline up-road/down-road map information;
and outputting the detected category and position of the target together with the up-road/down-road result.
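A toy sketch of this last step (the single lane-boundary coordinate and the '+'/'-' encoding of up road versus down road are stand-ins for the richer offline map information the patent assumes):

```python
def lane_result(box_x_center, lane_boundary_x, category):
    """Combine a candidate frame's coordinate in the large-scene image
    with offline map information to produce the detection output. Here
    the 'map' is just one hypothetical x boundary separating the up road
    ('+', toward the intersection) from the down road ('-')."""
    lane = '+' if box_x_center < lane_boundary_x else '-'
    return {'category': category, 'position': box_x_center, 'lane': lane}

result = lane_result(120, 400, 'car')
```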
To achieve the above objects, an embodiment of the second aspect of the present invention provides a billion-pixel dynamic large-scene image acquisition and multi-target detection device, comprising: an acquisition module for acquiring images of a plurality of road sections set along the road to be detected together with their depth information; a generating module for generating a large-scene depth image of the whole road from the section images and depth information; a segmentation module for segmenting the large-scene depth image with a segmentation algorithm to extract at least one candidate frame and inputting the image inside each candidate frame into a convolutional neural network for feature extraction to obtain a per-frame image feature map; a classification module for inputting the feature maps into a fully connected neural network for preliminary target classification and obtaining the final classification result from the preliminary classification result, the physical size of the target in each candidate frame, and the class label set annotating the physical size range of each target class; and an output module for generating the target detection result from the final classification result and the up-road/down-road map information.
In order to achieve the above objects, an embodiment of the third aspect of the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the billion-pixel dynamic large-scene image acquisition and multi-target detection method described above.
To achieve the above objects, an embodiment of the fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the billion-pixel dynamic large-scene image acquisition and multi-target detection method described above.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a gigapixel dynamic large scene image acquisition and multi-target detection method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an apparatus for billion-pixel dynamic large-scene image acquisition and multi-target detection according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a roadside billion pixel dynamic large scene image acquisition and multi-target detection system provided by an embodiment of the invention.
Fig. 4 is a schematic diagram of the road detection results fed back by the master control center to each unmanned vehicle according to an embodiment of the present invention.
Fig. 5 is a schematic diagram illustrating a principle of a roadside billion pixel dynamic large scene multi-target detection method according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a gigapixel dynamic large scene image acquisition and multi-target detection method and apparatus of an embodiment of the present invention with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of the billion-pixel dynamic large-scene image acquisition and multi-target detection method according to an embodiment of the present invention.
As shown in fig. 1, the gigapixel dynamic large scene image acquisition and multi-target detection method comprises the following steps:
S1: acquiring images of a plurality of road sections set along the road to be detected, together with depth information for each section;
S2: generating a large-scene depth image of the whole road from the section images and depth information;
S3: segmenting the large-scene depth image with a segmentation algorithm to extract at least one candidate frame, and inputting the image inside each candidate frame into a convolutional neural network for feature extraction to obtain a per-frame image feature map;
S4: inputting the feature maps into a fully connected neural network for preliminary target classification, and obtaining the final classification result from the preliminary classification result, the physical size of the target in each candidate frame, and the class label set annotating the physical size range of each target class;
S5: generating the target detection result from the final classification result and the up-road/down-road map information.
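The steps S1-S5 can be summarized as a pipeline skeleton in which every stage is a pluggable callable (all stage implementations below are trivial placeholders, not the patented algorithms):

```python
def detect_targets(acquire, stitch, segment, extract, classify, refine, localize):
    """Skeleton of steps S1-S5; each stage is passed in as a callable."""
    images, depths = acquire()                              # S1: section images + depth
    scene = stitch(images, depths)                          # S2: large-scene depth image
    boxes = segment(scene)                                  # S3: candidate frames
    feats = [extract(b) for b in boxes]                     # S3: CNN feature maps
    prelim = [classify(f) for f in feats]                   # S4: preliminary classes
    final = [refine(c, b) for c, b in zip(prelim, boxes)]   # S4: size-aware refinement
    return [localize(c, b) for c, b in zip(final, boxes)]   # S5: road-position result

# Placeholder stages, just to show the data flow end to end.
results = detect_targets(
    acquire=lambda: ([1], [2]),
    stitch=lambda i, d: (i, d),
    segment=lambda s: ['box'],
    extract=lambda b: 'feat',
    classify=lambda f: 'car?',
    refine=lambda c, b: 'car',
    localize=lambda c, b: ('car', b))
```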
As shown in fig. 3, in order to solve occlusion among targets such as vehicles and pedestrians on the road, and to obtain richer road information quickly and in real time, the invention proposes erecting cameras and sensors such as lidar at a certain height (i.e. a bird's-eye configuration) at multiple sections along one side of the whole road for road image capture and laser ranging, with the shooting and ranging directions perpendicular to the road direction. The images captured at each road section and the measured distances are then sent to a master control center (shown by the orange line). The master control center performs pixel-level registration between the RGB image captured at each single road section and its measured depth map (the depth map is generated from the distances measured by the lidar and resembles a grayscale image in which each pixel value is the actual distance measured to a target), then splices the images and the depth maps of all road sections according to the same rule and generates the depth image (the depth image is formed from the ordinary RGB three-channel color image and the depth map, with the RGB image and depth map registered in one-to-one pixel correspondence). The depth image is then input into the billion-pixel dynamic large-scene multi-target detection algorithm for target detection; depth-image-based target detection avoids the shortcomings of purely visual target detection, such as missing texture, insufficient illumination, overexposure, high computational complexity of the software, and fast motion. Finally, the detected road information (including the category and position of road objects, up-road/down-road information, traffic light state, traffic signs and so on) is fed back to the unmanned vehicles (i.e. the unmanned vehicles in the figure).
Further, in one embodiment of the present invention, generating a large-scene depth image of the entire road from the images and depth information of the plurality of road sections includes:
performing pixel-level registration on the image and the depth information acquired from the same road section;
performing pixel-level splicing on the images acquired from all road sections according to a preset splicing rule to generate a large-scene image of the road;
performing pixel-level splicing on the depth information acquired from each road section according to the same splicing rule to generate a large-scene depth map of the road;
and generating the large-scene depth image of the road from the large-scene image and the large-scene depth map.
Further, in an embodiment of the present invention, before segmenting the large-scene depth image with the segmentation algorithm to extract at least one candidate frame, the method further includes:
removing the ground of the large-scene image within the large-scene depth image according to a ground-filtering algorithm based on offline road map information and the large-scene depth map, to obtain the large-scene image to be segmented.
Further, in an embodiment of the present invention, segmenting the large-scene depth image with the segmentation algorithm to extract at least one candidate frame includes:
segmenting the large-scene image to be segmented with the segmentation algorithm and extracting the candidate frames.
Further, in an embodiment of the present invention, segmenting the large-scene image to be segmented with the segmentation algorithm to extract candidate frames includes:
in the direction parallel to the road, segmenting out regions whose two boundaries in the image are ground-removed boundaries; if the length of a region is smaller than or equal to the length of an automobile, the region is used directly as a candidate frame, and if its length is larger, candidate frames are divided according to the automobile length;
in the direction perpendicular to the road, segmenting out regions whose two boundaries are ground-removed boundaries; if the height of a region is smaller than or equal to the height of an automobile, the region is used directly as a candidate frame, and if its height is larger, candidate frames are divided according to the automobile height.
Further, in an embodiment of the present invention, before obtaining the final classification result from the preliminary classification result, the physical sizes of the targets in the candidate frames, and the class label set annotating the physical size range of each target, the method further includes:
obtaining the average depth value at the center of each candidate frame from the large-scene depth map, and obtaining the physical size of the target in the candidate frame by combining it with the area of the candidate frame in the pixel coordinate system.
Further, in an embodiment of the present invention, generating the target detection result from the final classification result and the up-road/down-road map information includes:
obtaining the position of the target on the road and its up-road/down-road detection result from the coordinates of the candidate frame in the large-scene image and the offline up-road/down-road map information;
and outputting the detected category and position of the target together with the up-road/down-road result.
Fig. 4 is a schematic diagram of the road detection results fed back by the master control center to each unmanned vehicle. As shown, different kinds of targets are represented by dots of different colors, the position of each dot being the center of the road position of the corresponding target. For motor vehicles, plus "+" and minus "-" indicate whether the target is on the up road or the down road: a target driving toward the intersection is marked "+" and is on the up road, while a target driving away from the intersection is marked "-" and is on the down road. For pedestrians and bicycles, "+" and "-" indicate whether the target is crossing the road or is outside the road, respectively. Stationary targets, such as large trees, are not labeled.
Fig. 5 is a schematic diagram of the roadside billion-pixel dynamic large-scene multi-target detection method provided by the invention. As shown in the figure, the detection is divided into two stages. The first stage generates the roadside large-scene depth image: first, pixel-level registration (i.e. one-to-one pixel correspondence) is performed between the image captured at a road section and the distances measured there at the same time; then the images captured at all road sections at the same time are spliced at pixel level to generate the large-scene image of the whole road; next, the distances measured at each road section at the same time are spliced at pixel level according to the same splicing rule to generate the large-scene depth map of the whole road; and finally the large-scene depth image is generated from the large-scene image and the large-scene depth map.
The second stage is target detection on the large-scene depth image. First, the ground in the large-scene image within the generated large-scene depth image is removed according to a ground-filtering algorithm based on offline road map information and the large-scene depth map. Then a segmentation algorithm segments the ground-removed large-scene image and extracts candidate frames, which are generated as follows. In the length direction of the candidate frame (i.e. along the road), regions whose two boundaries in the ground-removed image are ground-removed boundaries are segmented out first; if a region's length is smaller than or equal to the length of an automobile it is used directly as a candidate frame, and if it is larger, candidate frames are divided according to the automobile length (with the division standard extended by one pedestrian at intersections). In the height direction of the candidate frame (i.e. perpendicular to the road), regions whose two boundaries are ground-removed boundaries are segmented out; if a region's height is smaller than or equal to the height of an automobile (the automobile height here is the height as captured under the roadside sensor erection scheme of the invention, not the height seen from a side view), it is used directly as a candidate frame, and if it is larger, candidate frames are divided according to the automobile height (with the division standard extended by the pedestrian height at intersections).
The images inside the generated candidate frames are then input into a convolutional neural network for feature extraction, yielding a feature map for each candidate frame; these feature maps are input into a fully connected neural network for preliminary target classification. Next, the average depth value at the center of each candidate frame is obtained from the large-scene depth map and combined with the area of the frame in the pixel coordinate system to obtain the physical size of the target in the frame. The final classification result is then obtained from the preliminary classification result, the physical size of the target in the candidate frame, and the class label set annotating the physical size range of each target class, and bounding-box regression is performed. The position of the target on the road and its up-road/down-road detection result are then obtained from the coordinates of the candidate frame in the large-scene image and the offline up-road/down-road map information. Finally, the detected category and position of the target and the up-road/down-road result are output.
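The size-based refinement of the preliminary classification might be sketched as follows (the class label set and its physical-size ranges are invented for illustration; the patent only states that each class is annotated with a physical size range):

```python
# Hypothetical class label set mapping each class to a physical length
# range in meters; the values are illustrative, not from the patent.
SIZE_RANGES = {
    'pedestrian': (0.3, 1.2),
    'bicycle': (1.2, 2.2),
    'car': (3.0, 6.0),
    'truck': (6.0, 15.0),
}

def refine_class(prelim_scores, target_length):
    """Keep only the classes whose labeled physical-size range contains
    the measured target length, then take the best remaining preliminary
    score. Falls back to the raw scores if the size rules out everything."""
    feasible = {c: s for c, s in prelim_scores.items()
                if SIZE_RANGES[c][0] <= target_length <= SIZE_RANGES[c][1]}
    if not feasible:
        return max(prelim_scores, key=prelim_scores.get)
    return max(feasible, key=feasible.get)

# An 8 m target: the physical size overrules the slightly higher 'car' score.
label = refine_class({'car': 0.45, 'truck': 0.40, 'pedestrian': 0.15}, 8.0)
```

This is the point where the depth channel pays off: a class that is visually plausible but physically impossible at the measured size is eliminated.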
In the billion-pixel dynamic large-scene image acquisition and multi-target detection method of the embodiment of the invention, the construction scheme of the roadside billion-pixel dynamic large-scene perception sensor and the multi-section road information acquisition and splicing method effectively solve the problem of occlusion among targets on the road and expand the roadside perception range; the ground-filtering algorithm based on offline road map information and the large-scene depth map improves the precision and speed of candidate-frame segmentation for target detection; the proposed multi-target detection method for the roadside billion-pixel dynamic large scene realizes fast and accurate detection of multiple targets in the roadside large scene; and the invention can provide the unmanned vehicle with rich road information beyond its line of sight.
In order to realize the above embodiments, the invention further provides a gigapixel dynamic large scene image acquisition and multi-target detection apparatus.
Fig. 2 is a schematic structural diagram of a gigapixel dynamic large scene image acquisition and multi-target detection device according to an embodiment of the present invention.
As shown in fig. 2, the gigapixel dynamic large scene image acquisition and multi-target detection apparatus includes: an acquisition module 10, a generation module 20, a segmentation module 30, a classification module 40, and an output module 50. The acquisition module is configured to acquire images of a plurality of road sections set in a road to be detected and depth information of the plurality of road sections; the generation module is configured to generate a large scene depth image of the entire road section according to the images of the plurality of road sections and the depth information of the plurality of road sections; the segmentation module is configured to segment the large scene depth image by a segmentation algorithm to extract at least one candidate frame, and to input the image in the at least one candidate frame into a convolutional neural network for feature extraction to obtain an image feature map for each candidate frame; the classification module is configured to input the image feature maps in the candidate frames into a fully-connected neural network for preliminary target classification, and to obtain a final target classification result according to the preliminary classification result, the physical sizes of the targets in the candidate frames, and the class label set labeling the physical size range of each target; and the output module is configured to generate a target detection result according to the final target classification result and the on-road/off-road map information of the road.
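As a rough illustration of how the five modules could hand data to one another, the sketch below wires hypothetical stand-in callables into one pipeline; none of the names or signatures come from the patent, and each real module would of course contain the corresponding algorithm rather than a toy lambda:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DetectionPipeline:
    # The five slots mirror the apparatus's acquisition, generation,
    # segmentation, classification, and output modules (assumed interfaces).
    acquire: Callable[[], tuple]          # images + depth per road section
    generate: Callable[[tuple], object]   # -> large scene depth image
    segment: Callable[[object], list]     # -> candidate frames / feature maps
    classify: Callable[[list], list]      # -> final class per candidate
    output: Callable[[list], list]        # -> detections with road position

    def run(self):
        sections = self.acquire()
        scene = self.generate(sections)
        candidates = self.segment(scene)
        classified = self.classify(candidates)
        return self.output(classified)

# Toy stand-ins that only trace the data flow end to end.
pipe = DetectionPipeline(
    acquire=lambda: ("imgs", "depths"),
    generate=lambda s: {"scene": s},
    segment=lambda scene: [{"box": (0, 0, 10, 10)}],
    classify=lambda cands: [dict(c, cls="car") for c in cands],
    output=lambda dets: [dict(d, on_road=True) for d in dets],
)
print(pipe.run())
```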
To achieve the above object, a third aspect of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the gigapixel dynamic large scene image acquisition and multi-target detection method described above when executing the computer program.
To achieve the above object, a fourth aspect of the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the gigapixel dynamic large scene image acquisition and multi-target detection method as described above.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A billion-pixel dynamic large scene image acquisition and multi-target detection method is characterized by comprising the following steps:
acquiring images of a plurality of road sections set in a road to be detected and depth information of the plurality of road sections;
generating a large scene depth image of the whole road section according to the images of the road sections and the depth information of the road sections;
adopting a segmentation algorithm to segment the large scene depth image so as to extract at least one candidate frame, and inputting the image in the at least one candidate frame into a convolutional neural network for feature extraction to obtain an image feature map in each candidate frame;
inputting the image feature maps in the candidate frames into a fully-connected neural network for preliminary classification of targets, and obtaining final classification results of the targets according to the preliminary classification results of the targets, the physical sizes of the targets in the candidate frames, and a class label set labeling the physical size range of each target;
and generating a target detection result according to the final target classification result and the on-road/off-road map information of the road.
2. The method of claim 1, wherein generating the large scene depth image of the entire road section from the images of the plurality of road sections and the depth information of the plurality of road sections comprises:
performing pixel-level registration on the image and the depth information acquired from the same road section;
performing pixel level splicing on the images acquired from all road sections according to a preset splicing rule to generate a large scene image of the road;
performing pixel-level splicing on the depth information acquired from each road section according to the preset splicing rule of the large scene image to generate a large scene depth map of the road;
and generating the large scene depth image of the road according to the large scene image and the large scene depth map.
3. The method of claim 1, further comprising, before said employing a segmentation algorithm to segment the large scene depth image to extract at least one candidate box:
removing the ground from the large scene image in the large scene depth image according to a ground filtering algorithm based on the offline road map information and the large scene depth map, to obtain a large scene image to be segmented.
4. The method of claim 3, wherein the segmenting the large scene depth image using a segmentation algorithm to extract at least one candidate box comprises:
and segmenting the large scene image to be segmented by adopting a segmentation algorithm, and extracting a candidate frame.
5. The method according to claim 4, wherein segmenting the large scene image to be segmented by adopting the segmentation algorithm to extract the candidate frame comprises:
in the direction parallel to the road, for a region whose two boundaries in the large scene image to be segmented are boundaries of the ground-removed area: if the length of the region is smaller than or equal to the length of an automobile, directly taking the region as a candidate frame; and if the length of the region is larger than the length of an automobile, dividing the region into candidate frames according to the length of the automobile;
in the direction perpendicular to the road, for a region whose two boundaries in the large scene image to be segmented are boundaries of the ground-removed area: if the height of the region is smaller than or equal to the height of an automobile, directly taking the region as a candidate frame; and if the height of the region is greater than the height of an automobile, dividing the region into candidate frames according to the height of the automobile.
6. The method according to any one of claims 1 to 5, wherein before obtaining the final classification result according to the preliminary classification result of the targets, the physical sizes of the targets in the candidate frame, and the class label set labeling the physical size range of each target, the method further comprises:
obtaining the average depth value of the center position of each candidate frame according to the large scene depth map, and obtaining the physical size of the target in each candidate frame according to the average depth value in combination with the area of the candidate frame in the pixel coordinate system.
7. The method of claim 6, wherein generating the target detection result according to the final target classification result and the on-road/off-road map information comprises:
obtaining the position of a target on the road and the on-road/off-road detection result according to the coordinates of the candidate frame in the large scene image and the offline on-road/off-road map information;
and outputting the detected category and position of the target and the on-road/off-road result.
8. A gigapixel dynamic large scene image acquisition and multi-target detection apparatus, the apparatus comprising:
the acquisition module is used for acquiring images of a plurality of road sections set in a road to be detected and depth information of the road sections;
the generation module is used for generating a large scene depth image of the entire road section according to the images of the plurality of road sections and the depth information of the plurality of road sections;
the segmentation module is used for segmenting the large scene depth image by adopting a segmentation algorithm to extract at least one candidate frame, and inputting the image in the at least one candidate frame into a convolutional neural network for feature extraction to obtain an image feature map in each candidate frame;
the classification module is used for inputting the image feature maps in the candidate frames into a fully-connected neural network for preliminary classification of targets, and obtaining a final classification result of the targets according to the preliminary classification result of the targets, the physical sizes of the targets in the candidate frames, and a class label set labeling the physical size range of each target;
and the output module is used for generating a target detection result according to the final target classification result and the on-road/off-road map information of the road.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202210234371.0A 2022-03-10 Billion-pixel dynamic large scene image acquisition and multi-target detection method and device Active CN114677658B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210234371.0A CN114677658B (en) 2022-03-10 Billion-pixel dynamic large scene image acquisition and multi-target detection method and device


Publications (2)

Publication Number Publication Date
CN114677658A true CN114677658A (en) 2022-06-28
CN114677658B CN114677658B (en) 2024-07-26



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709568A (en) * 2016-12-16 2017-05-24 北京工业大学 RGB-D image object detection and semantic segmentation method based on deep convolution network
CN111539983A (en) * 2020-04-15 2020-08-14 上海交通大学 Moving object segmentation method and system based on depth image
CN112017138A (en) * 2020-09-02 2020-12-01 衢州光明电力投资集团有限公司赋腾科技分公司 Image splicing method based on scene three-dimensional structure
WO2021249351A1 (en) * 2020-06-10 2021-12-16 苏宁易购集团股份有限公司 Target detection method, apparatus and computer device based on rgbd image
WO2021254205A1 (en) * 2020-06-17 2021-12-23 苏宁易购集团股份有限公司 Target detection method and apparatus


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661556A (en) * 2022-10-20 2023-01-31 南京领行科技股份有限公司 Image processing method and device, electronic equipment and storage medium
CN115661556B (en) * 2022-10-20 2024-04-12 南京领行科技股份有限公司 Image processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108960183B (en) Curve target identification system and method based on multi-sensor fusion
CN108647638B (en) Vehicle position detection method and device
CN110531376B (en) Obstacle detection and tracking method for port unmanned vehicle
EP2549457B1 (en) Vehicle-mounting vehicle-surroundings recognition apparatus and vehicle-mounting vehicle-surroundings recognition system
Hirabayashi et al. Traffic light recognition using high-definition map features
JP4624594B2 (en) Object recognition method and object recognition apparatus
CN106170828B (en) External recognition device
Nedevschi et al. A sensor for urban driving assistance systems based on dense stereovision
CN109583415A (en) A kind of traffic lights detection and recognition methods merged based on laser radar with video camera
US11620478B2 (en) Semantic occupancy grid management in ADAS/autonomous driving
US11676403B2 (en) Combining visible light camera and thermal camera information
CN111323027A (en) Method and device for manufacturing high-precision map based on fusion of laser radar and panoramic camera
CN110780287A (en) Distance measurement method and distance measurement system based on monocular camera
CN114120283A (en) Method for distinguishing unknown obstacles in road scene three-dimensional semantic segmentation
CN114639085A (en) Traffic signal lamp identification method and device, computer equipment and storage medium
Joy et al. Real time road lane detection using computer vision techniques in python
CN114155720B (en) Vehicle detection and track prediction method for roadside laser radar
CN112507887B (en) Intersection sign extracting and associating method and device
CN113988197A (en) Multi-camera and multi-laser radar based combined calibration and target fusion detection method
DE102017221839A1 (en) Method for determining the position of a vehicle, control unit and vehicle
CN110727269A (en) Vehicle control method and related product
CN113611008B (en) Vehicle driving scene acquisition method, device, equipment and medium
KR102368262B1 (en) Method for estimating traffic light arrangement information using multiple observation information
CN114677658B (en) Billion-pixel dynamic large scene image acquisition and multi-target detection method and device
CN114677658A (en) Billion-pixel dynamic large-scene image acquisition and multi-target detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant