CN112446231A - Pedestrian crossing detection method and device, computer equipment and storage medium


Info

Publication number
CN112446231A
CN112446231A (application CN201910797712.3A)
Authority
CN
China
Prior art keywords: scale, pedestrian crossing, candidate, frame, crosswalk
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number
CN201910797712.3A
Other languages
Chinese (zh)
Inventor
秦超 (Qin Chao)
章恒 (Zhang Heng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fengtu Technology Shenzhen Co Ltd
Original Assignee
Fengtu Technology Shenzhen Co Ltd
Application filed by Fengtu Technology Shenzhen Co Ltd
Priority to CN201910797712.3A
Publication of CN112446231A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/10: Terrestrial scenes
    • G06V 20/182: Network patterns, e.g. roads or rivers
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]


Abstract

The embodiment of the invention discloses a pedestrian crossing detection method and device, computer equipment, and a storage medium, relating to the technical field of data processing. The method acquires target satellite image data; segments the target satellite image data to obtain a target image; performs multi-scale processing on the target image to extract feature maps of multiple scales; and performs pedestrian crossing detection on the target image according to these scale feature maps, marking each detected pedestrian crossing. By extracting the multi-scale feature maps of the target image and using the target image's multi-scale features to detect pedestrian crossings in the satellite image data, the embodiment of the invention can improve the accuracy of pedestrian crossing detection.

Description

Pedestrian crossing detection method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to a pedestrian crossing detection method and device, computer equipment and a storage medium.
Background
Pedestrian crossing data in satellite images is important road data that can provide key information for navigation, map drawing, and similar applications, so accurately extracting pedestrian crossing data from satellite images is increasingly important. In the past, the positions of pedestrian crossings in satellite images were extracted manually; this consumes substantial manpower, is inefficient, updates infrequently, and its extraction quality depends on the skill of the operator. As technology has advanced, pedestrian crossing data in satellite maps can now be extracted algorithmically. However, because pedestrian crossings in satellite images vary widely in size and aspect ratio, the accuracy of existing algorithms in identifying them is very low and cannot meet the requirements of practical applications.
Disclosure of Invention
The embodiment of the invention provides a pedestrian crossing detection method, a pedestrian crossing detection device, computer equipment and a storage medium, which can improve the accuracy of pedestrian crossing identification.
The embodiment of the invention provides a pedestrian crossing detection method, which comprises the following steps:
acquiring target satellite image data;
segmenting the target satellite image data to obtain a target image;
performing multi-scale processing on the target image to extract a plurality of scale feature maps of the target image;
and performing pedestrian crossing detection on the target image according to the scale feature maps, and marking the detected pedestrian crossing.
The embodiment of the invention also provides a pedestrian crossing detection device, which comprises:
the acquisition unit is used for acquiring target satellite image data;
the preprocessing unit is used for segmenting the target satellite image data to obtain a target image;
the characteristic extraction unit is used for carrying out multi-scale processing on the target image so as to extract a plurality of scale characteristic maps of the target image;
and the detection unit is used for detecting the pedestrian crosswalk of the target image according to the multiple scale feature maps and marking the detected pedestrian crosswalk.
An embodiment of the present invention further provides a computer device, where the computer device includes: one or more processors; a memory; and one or more applications, wherein the processor is coupled to the memory, the one or more applications being stored in the memory and configured to be executed by the processor to perform the pedestrian crossing detection method described above.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is loaded by a processor to execute the pedestrian crossing detection method.
In the embodiment of the invention, the acquired satellite image data is segmented to obtain a target image; the target image undergoes multi-scale processing to extract feature maps of multiple scales; and pedestrian crossing detection is performed on the target image according to these feature maps, with each detected crossing marked. Each scale's feature map expresses different features of the target image: a feature map with a larger pixel size extracts the features of smaller crosswalks in the target image, while a feature map with a smaller pixel size extracts the features of larger crosswalks. Detecting pedestrian crossings in satellite image data with the multi-scale feature maps of the target image therefore improves the accuracy of pedestrian crossing detection.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1a is a schematic flow chart of a pedestrian crossing detection method according to an embodiment of the present invention;
FIG. 1b is a schematic diagram of a plurality of scale feature maps of a target image provided by an embodiment of the invention;
FIG. 1c is a schematic view of a sub-flow of a pedestrian crossing detection method according to an embodiment of the present invention;
FIG. 1d is a schematic illustration of arranged region frames with different aspect ratios provided by an embodiment of the present invention;
FIG. 1e is a schematic diagram of an area frame rotated by different angles according to an embodiment of the present invention;
FIG. 1f is a schematic diagram illustrating a representation of a candidate rotation frame according to an embodiment of the present invention;
FIG. 1g is a schematic view of a sub-process of a pedestrian crossing detection method according to an embodiment of the present invention;
FIG. 1h is a schematic illustration of three different rotation intersection-over-union (rIoU) cases and their results provided by an embodiment of the present invention;
FIG. 1i is a schematic view of a sub-process of a pedestrian crossing detection method according to an embodiment of the present invention;
FIG. 2a is another schematic flow chart of a pedestrian crossing detection method according to an embodiment of the present invention;
FIG. 2b is a diagram of a convolution network of a single point multi-frame detector model provided by an embodiment of the present invention;
FIG. 2c is a schematic view of another sub-flow of the pedestrian crossing detection method according to the embodiment of the present invention;
FIG. 2d is a schematic view of another sub-flow chart of the pedestrian crossing detection method according to the embodiment of the present invention;
FIG. 2e is a schematic view of another sub-flow of the pedestrian crossing detection method according to the embodiment of the present invention;
FIG. 2f is a schematic diagram of a pedestrian crossing detection result provided by the embodiment of the invention;
FIG. 3 is a schematic block diagram of a crosswalk detection apparatus provided by an embodiment of the present invention;
FIG. 4 is another schematic block diagram of a crosswalk detection apparatus provided by an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present disclosure, the word "exemplary" is used to mean "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the invention. In the following description, details are set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and processes are not shown in detail to avoid obscuring the description of the invention with unnecessary detail. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
The embodiment of the invention provides a pedestrian crossing detection method and device, computer equipment, and a storage medium. The pedestrian crossing detection method runs on terminal equipment, which may be a server or a terminal such as a mobile phone, tablet, or desktop computer. The details are described below.
Fig. 1a is a schematic flow chart of a pedestrian crossing detection method according to an embodiment of the present invention. As shown in fig. 1a, the method includes the following specific steps:
101, acquiring target satellite image data.
The target satellite image data may be acquired from another device, for example by sending an acquisition instruction to the other device, which returns the satellite image data on receipt; it may be obtained locally, where satellite image data is stored; or it may be downloaded to the local machine through third-party software and then acquired, for example via a satellite image service such as a Web Map Service (WMS). It may also be obtained in other ways.
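For illustration, a minimal WMS download sketch follows. The endpoint URL, layer name, and bounding box are placeholders rather than values from this disclosure; only the OGC GetMap request parameters themselves are standard WMS.

```python
import requests

# Hypothetical WMS endpoint and layer; only the OGC GetMap
# parameters below are standard WMS fields.
WMS_URL = "https://example.com/wms"

params = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",
    "LAYERS": "satellite",
    "SRS": "EPSG:4326",
    "BBOX": "114.05,22.53,114.06,22.54",  # lon_min,lat_min,lon_max,lat_max
    "WIDTH": 256,
    "HEIGHT": 256,
    "FORMAT": "image/png",
}

response = requests.get(WMS_URL, params=params, timeout=30)
with open("tile.png", "wb") as f:
    f.write(response.content)
```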
And 102, performing segmentation processing on the target satellite image data to obtain a target image.
The target satellite image data is segmented according to GPS coordinates; for example, a large piece of satellite image data is sliced into multiple images by GPS coordinate, and each sliced image is taken as a target image. The target image contains a crosswalk. Many specific segmentation methods exist, such as slicing by GPS coordinate with segmentation software; the specific segmentation method is not limited here.
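A minimal slicing sketch, assuming the GPS bounding box of the source image is known and a simple linear pixel-to-GPS mapping suffices; the function name and tile size are illustrative, not prescribed by this disclosure.

```python
from PIL import Image

def slice_by_gps(image_path, bbox, tile_px=256):
    """Slice a georeferenced satellite image into tile_px x tile_px tiles.

    bbox = (lon_min, lat_min, lon_max, lat_max) of the whole image;
    each tile is returned together with its own GPS bounding box.
    """
    img = Image.open(image_path)
    w, h = img.size
    lon_min, lat_min, lon_max, lat_max = bbox
    tiles = []
    for top in range(0, h, tile_px):
        for left in range(0, w, tile_px):
            box = (left, top, min(left + tile_px, w), min(top + tile_px, h))
            # Linearly map pixel extents back to GPS coordinates
            # (image row 0 is the northern edge).
            t_lon_min = lon_min + (lon_max - lon_min) * box[0] / w
            t_lon_max = lon_min + (lon_max - lon_min) * box[2] / w
            t_lat_max = lat_max - (lat_max - lat_min) * box[1] / h
            t_lat_min = lat_max - (lat_max - lat_min) * box[3] / h
            tiles.append((img.crop(box),
                          (t_lon_min, t_lat_min, t_lon_max, t_lat_max)))
    return tiles
```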
103, performing multi-scale processing on the target image to extract a plurality of scale feature maps of the target image.
If there are multiple target images, multi-scale processing is performed on each target image to extract its feature maps of multiple scales. Multi-scale processing can be understood as convolving the target image with different convolution kernels to obtain several different convolved feature maps, which serve as the multiple scale feature maps; these may also be called multi-scale feature maps or multi-scale convolution feature maps. For example, the target image may be convolved with 3 × 3, 8 × 8, and 15 × 15 kernels respectively to obtain feature maps of three different scales.
Each scale's feature map is an expression of different features of the target image. For example, if the target image is 300 × 300 pixels, 6 feature maps of different scales can be obtained, such as 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3, and 1 × 1. A feature map with a larger pixel size corresponds to a smaller field of view on the target image and is used to extract smaller features, such as smaller crosswalks; that is, it expresses the target image's smaller features. Conversely, a feature map with a smaller pixel size corresponds to a larger field of view on the target image and is used to extract larger features, such as larger pedestrian crossings; it expresses the target image's larger features.
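The following toy PyTorch sketch illustrates the idea of one network producing feature maps of several scales; the layer widths and strides are chosen only so the spatial sizes match the 38 × 38 through 1 × 1 pyramid described above, and are not the actual backbone of this disclosure.

```python
import torch
import torch.nn as nn

class MultiScaleFeatures(nn.Module):
    """Toy pyramid: each stage shrinks the spatial size, mimicking the
    38x38 / 19x19 / 10x10 / 5x5 / 3x3 / 1x1 maps described above."""
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Conv2d(3, 32, 3, stride=8, padding=1),   # 300 -> 38
            nn.Conv2d(32, 64, 3, stride=2, padding=1),  # 38 -> 19
            nn.Conv2d(64, 64, 3, stride=2, padding=1),  # 19 -> 10
            nn.Conv2d(64, 64, 3, stride=2, padding=1),  # 10 -> 5
            nn.Conv2d(64, 64, 3, stride=2, padding=1),  # 5 -> 3
            nn.Conv2d(64, 64, 3, stride=1, padding=0),  # 3 -> 1
        ])

    def forward(self, x):
        maps = []
        for stage in self.stages:
            x = stage(x)
            maps.append(x)  # collect every scale's feature map
        return maps

feats = MultiScaleFeatures()(torch.randn(1, 3, 300, 300))
print([tuple(f.shape[-2:]) for f in feats])
# [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
```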
And 104, performing pedestrian crossing detection on the target image according to the multiple scale feature maps, and marking the detected pedestrian crossing.
Since the feature maps of different scales are expressions of different features of the target image, the extracted feature maps of different scales are respectively subjected to pedestrian crossing detection, and the detected pedestrian crossing is marked.
As shown in fig. 1b, the target image has 4 feature maps with different scales, which are respectively a feature map 1, a feature map 2, a feature map 3 and a feature map 4. It should be noted that the feature map in fig. 1b is for illustration purposes only, and does not constitute a limitation on the number of feature maps of different scales. And performing pedestrian crossing detection on each feature map, and marking the detected pedestrian crossing. Because the different scale characteristic diagrams are expressions of different characteristics of the target image, the pedestrian crossing detection is carried out on each characteristic diagram, and different pedestrian crossings in the target image can be detected. Compared with the method for detecting the pedestrian crossing only by one characteristic diagram, the method greatly improves the accuracy of pedestrian crossing detection.
In one embodiment, as shown in FIG. 1c, step 104 includes steps 1041 to 1043:
1041, determining a plurality of rotation frame candidates in each scale feature map, wherein the rotation frame candidates are obtained by rotating the preset region frames with different aspect ratios by different angles.
The area frame refers to a frame used for detecting a target in an image, that is, a frame used for detecting a pedestrian crossing, and the area frame may include a rectangle, a square, and the like.
The preset aspect ratios are determined according to the aspect ratios of crosswalks in satellite image data. For example, the aspect ratio of a crosswalk in satellite image data may be relatively large, even exceeding 3:1; accordingly, region frames with aspect ratios of 0.707:0.707, 1:1, 1:2.5, 2.5:1, 1:4, and 4:1, including ratios exceeding 3:1, may be set, as shown in fig. 1d, and these region frames with different set aspect ratios serve as the preset region frames of different aspect ratios. Other aspect ratio values may also be used.
In each feature map of a different scale, multiple region frames with different aspect ratios are set, for example the six region frames shown in fig. 1d. Region frames of different aspect ratios are set because each target image may contain several different targets, that is, several crosswalks of different sizes. The region frames of the different aspect ratios are then rotated by different angles to obtain rotated region frames: if one region frame is rotated to seven different angles, it yields seven rotated region frames, so six region frames of different aspect ratios rotated to seven different angles yield 6 × 7 = 42 distinct rotated region frames. The rotation is performed because, owing to the satellite shooting angle, the crosswalk orientation, and similar factors, crosswalks in satellite image data are often inclined; if crosswalks were detected only with axis-aligned region frames, non-crosswalk feature pixels inside the frame would reduce detection accuracy.
Fig. 1e is a schematic diagram of a region frame rotated to different angles. Note that fig. 1e only illustrates the region frames corresponding to the different rotation angles (drawn without overlap for ease of understanding). Specifically, taking the center point of the region frame as the origin, the region frame is rotated in steps of 15 degrees to obtain multiple rotated region frames, which in fact overlap one another. The angular rotation range of the region frame is [-π/12, 5π/12), so each region frame yields seven rotated region frames in total, at angles -π/12, 0, π/12, π/6, π/4, π/3, and 5π/12. These seven rotation angles are used because they cover substantially all crosswalk orientations in the target image.
Consider the feature map of one scale, with pixel size M × N. If a region frames of different aspect ratios correspond to it, and each region frame is rotated to b different angles, then M × N × a × b candidate rotation frames are generated over the target image. For example, if the feature map of a certain scale is 38 × 38 pixels, with six region frames of different aspect ratios each rotated to seven different angles, then 38 × 38 × 6 × 7 candidate rotation frames are generated. Intuitively, the target image is divided into 38 × 38 small blocks, and candidate rotation frames of the various aspect ratios are generated around the center point of each block. Performing this processing for each scale's feature map yields the candidate rotation frames of all the different scale feature maps.
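A short sketch, under the aspect ratios and angles given above, that enumerates the candidate rotation frames for one M × N feature map; the (cx, cy, w, h, angle) form and the base_size scale factor are illustrative.

```python
import math

ASPECT_RATIOS = [(0.707, 0.707), (1, 1), (1, 2.5), (2.5, 1), (1, 4), (4, 1)]
ANGLES = [-math.pi/12 + k * math.pi/12 for k in range(7)]  # -pi/12 .. 5*pi/12

def candidate_rotated_boxes(feat_w, feat_h, base_size=1.0):
    """Enumerate (cx, cy, w, h, angle) candidates for one M x N feature map;
    base_size is an illustrative scale factor, not a value from the patent."""
    boxes = []
    for j in range(feat_h):
        for i in range(feat_w):
            cx, cy = (i + 0.5) / feat_w, (j + 0.5) / feat_h  # block center
            for rw, rh in ASPECT_RATIOS:
                for theta in ANGLES:
                    boxes.append((cx, cy, rw * base_size, rh * base_size, theta))
    return boxes

print(len(candidate_rotated_boxes(38, 38)))  # 38 * 38 * 6 * 7 = 60648
```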
1042, calculating a category probability predicted value and a position offset predicted value of the plurality of candidate rotation frames.
The category probability predicted values and position offset predicted values of the candidate rotation frames are calculated using a pre-trained predictor, such as a pre-trained softmax predictor.
Specifically, the predictor comprises a set of m × n × p convolution kernels, such as 5 × 5 × p or 3 × 3 × p kernels, which perform sliding-window processing on the images corresponding to the candidate rotation frames to generate predicted values. For a feature map of scale M × N × p, suppose the number of detected target classes is C, the candidate rotation frames have a aspect ratios, and each aspect ratio is rotated to b different angles. Then a × b × C convolution kernels of size 5 × 5 × p predict the class probabilities (class confidences) of the candidate rotation frames over the C classes, and a × b × k kernels of size 5 × 5 × p predict the position offset of each candidate rotation frame.
Here k is the number of parameters per candidate rotation frame. For example, representing a candidate rotation frame as (x, y, w, h, theta) gives k = 5, where x and y are the center coordinates, w and h the width and height, and theta the rotation angle relative to the x-axis. Representing it as (ax, ay, bx, by, h) also gives k = 5, where (ax, ay) and (bx, by) are two consecutive corner points taken clockwise along the frame and h is the frame's height, i.e. the extent perpendicular to the line connecting (ax, ay) and (bx, by); see fig. 1f.
Thus, the feature map produces M × N × a × b × (C + k) output results, i.e. (C + k) dimensions of prediction information per candidate rotation frame. For pedestrian crossing detection there are two categories, crosswalk and background, so C = 2. With the (ax, ay, bx, by, h) representation, k = 5. Each candidate rotation frame therefore carries 7-dimensional prediction information: the predicted class probability of the frame being background, the predicted class probability of the frame being a crosswalk, and the predicted position offset of the frame.
The same processing is performed on the feature map of each different scale, so as to calculate the category probability predicted values and the position offset predicted values of all the candidate rotation frames.
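As a hedged sketch of the prediction step, one convolution per scale can emit a × b × (C + k) channels, giving each feature-map cell the (C + k)-dimensional prediction for each of its a × b candidate rotation frames; the 3 × 3 kernel and the channel count p here are illustrative.

```python
import torch
import torch.nn as nn

a, b = 6, 7   # aspect ratios and rotation angles per cell
C, k = 2, 5   # classes (crosswalk, background) and box parameters
p = 64        # channels of the incoming scale feature map (assumed)

# One convolution predicts, for every feature-map cell, (C + k) values
# for each of the a*b candidate rotated frames anchored at that cell.
head = nn.Conv2d(p, a * b * (C + k), kernel_size=3, padding=1)

feat = torch.randn(1, p, 38, 38)
pred = head(feat)                                         # (1, a*b*(C+k), 38, 38)
pred = pred.permute(0, 2, 3, 1).reshape(1, 38 * 38 * a * b, C + k)
print(pred.shape)                                         # torch.Size([1, 60648, 7])
```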
It should be noted that the above calculation of the category probability predicted values and the position offset predicted values of the multiple candidate rotating frames is only one of the calculation methods, and other calculation methods may be used to calculate the category probability predicted values and the position offset predicted values of the multiple candidate rotating frames.
1043, determining a target frame from the plurality of candidate rotating frames according to the category probability predicted value and the position offset predicted value, and taking the pedestrian crossing in the target frame as the detected pedestrian crossing.
Firstly, determining a candidate rotating frame comprising a pedestrian crossing from a plurality of candidate rotating frames according to a category probability predicted value, then performing redundancy processing on the candidate rotating frame comprising the pedestrian crossing, and finally determining a target frame from the candidate rotating frame after the redundancy processing according to a position offset predicted value.
In one embodiment, as shown in FIG. 1g, step 1043 includes steps 1043a to 1043d:
1043a, obtaining a preset probability threshold.
The preset probability threshold may be preset or may be set in real time. The preset probability threshold is a value greater than or equal to 0.5 and less than 1. For example, a preset probability threshold is set to 0.6.
1043b, deleting the candidate rotation frame whose category probability predicted value is smaller than the preset probability threshold from the plurality of candidate rotation frames.
The category probability predicted values of the candidate rotation frames are obtained, and each is compared with the preset probability threshold; every candidate rotation frame whose predicted value is smaller than the threshold is deleted. A predicted value below the threshold indicates that the candidate rotation frame contains background rather than a crosswalk, and deleting such background frames improves detection speed.
1043c, performing redundant processing on the rotation frame candidates after the deletion processing to eliminate redundant rotation frame candidates.
Specifically, redundancy processing may be performed by non-maximum suppression (NMS) to obtain the locally optimal candidate rotation frames. Redundancy processing can also be performed in other ways to obtain the locally optimal candidate rotation frames.
The step of performing redundancy processing by non-maximum suppression includes: arranging the candidate rotation frames remaining after deletion according to their category probability predicted values, for example in descending order; traversing the arranged candidate rotation frames in turn and calculating the rotation IoU between the currently traversed frame and each remaining frame (the intersection-over-union may be written IoU, Intersection over Union, and the rotation intersection-over-union rIoU) to obtain the IoU calculation results; deleting every candidate rotation frame whose IoU calculation result is larger than a first preset IoU threshold, until all the arranged candidate rotation frames have been processed; and taking the remaining candidate rotation frames as the candidate target frames.
Redundancy processing of the candidate rotation frames by non-maximum suppression deletes a large number of redundant candidates and yields the locally optimal ones, improving the efficiency, speed, and accuracy of crosswalk detection.
For example, suppose there are four candidate rotation frames A, B, C, D with category probability predicted values 0.97, 0.94, 0.87, and 0.9 respectively. First sort in descending order of predicted value: 0.97 (A), 0.94 (B), 0.9 (D), 0.87 (C). Take the highest-probability frame A and compute its rotation IoU against each of the remaining frames B, C, D. If the IoU result of A and B exceeds the first preset IoU threshold while the results of A and C, and of A and D, do not, delete candidate B and keep A as a locally optimal candidate rotation frame. Frames D and C remain in the sorted list; take D, the one with the larger predicted value, and compute its rotation IoU with C (the IoU result being a specific value). If it exceeds the first preset IoU threshold, delete C and keep D as a locally optimal candidate rotation frame. Two candidate target frames, A and D, are thus obtained, with predicted values 0.97 and 0.9 respectively; that is, two different candidate crosswalks have been detected.
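A minimal sketch of this non-maximum suppression over rotated frames follows; it relies on a riou() helper, one possible implementation of which is sketched after the rotation IoU discussion below, and the threshold value is illustrative.

```python
def rotated_nms(boxes, scores, iou_threshold=0.5):
    """Keep locally optimal rotated boxes, as in the A/B/C/D example above.

    boxes: list of rotated boxes; scores: category probability predictions;
    riou(b1, b2) computes the rotation intersection-over-union (see below).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest remaining score
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order
                 if riou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```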
The rotation IoU (rotation intersection-over-union) is the ratio of the intersection (area intersection) of two candidate rotation frames to their union (area union).
Fig. 1h shows schematics of three different rotation IoU cases and their results. Note that fig. 1h illustrates only three exemplary cases; other configurations are possible. The rotation IoU calculation is briefly described using the third schematic in fig. 1h as an example. There, the middle parts of the two candidate rotation frames overlap to form an irregular octagon; the four corners of each frame do not overlap, so 8 distinct corners are formed, and each edge of each candidate rotation frame has two intersection points with the other frame.
Specifically, as shown in fig. 1h, the intersection points of the 2 candidate rotation frames are connected clockwise to form a polygon IJKLMNOP. The area of polygon IJKLMNOP is then calculated and taken as the area intersection InterArea of the two candidate rotation frames; next the area union Union of the two frames is calculated; and finally the rotation IoU of the two candidate rotation frames is calculated from InterArea and Union.

The area of polygon IJKLMNOP is calculated as follows: cut the polygon into 6 different triangles, IPO, ION, INM, IML, ILK, and IKJ; calculate the areas of these 6 triangles; and add them up to obtain the area of polygon IJKLMNOP.

The area union of the two candidate rotation frames is obtained by adding their areas R1 and R2 and subtracting their area intersection; specifically, Union = R1 + R2 - InterArea.
The rotation IoU in the other cases is calculated by similar steps and can be obtained in a similar manner; it is not described again here.
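One way to implement the rotation IoU is with the shapely library, which performs the same polygon intersection that the clockwise-connection and triangulation procedure above computes by hand; this is a sketch of that approach, not necessarily the implementation used in this disclosure, with boxes given as (cx, cy, w, h, theta).

```python
import math
from shapely.geometry import Polygon

def corners(box):
    """Corner points of a rotated box given as (cx, cy, w, h, theta)."""
    cx, cy, w, h, theta = box
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    pts = []
    for dx, dy in [(-w/2, -h/2), (w/2, -h/2), (w/2, h/2), (-w/2, h/2)]:
        # rotate the corner offset by theta, then translate to the center
        pts.append((cx + dx * cos_t - dy * sin_t,
                    cy + dx * sin_t + dy * cos_t))
    return pts

def riou(box1, box2):
    """Rotation IoU: InterArea / (R1 + R2 - InterArea)."""
    p1, p2 = Polygon(corners(box1)), Polygon(corners(box2))
    inter = p1.intersection(p2).area            # InterArea
    return inter / (p1.area + p2.area - inter) if inter > 0 else 0.0
```

The triangulation described above and shapely's polygon intersection yield the same InterArea; the library simply packages that geometry.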
1043d, according to the predicted value of the position offset, determining the target frame from the candidate rotating frames after the redundancy processing.
Specifically, the position and shape of each candidate rotation frame remaining after redundancy processing are adjusted according to its corresponding predicted position offset to obtain the target frames; each candidate rotation frame is adjusted with its own position offset predicted value. Continuing the example above, the four candidate rotation frames A, B, C, D with category probability predicted values 0.97, 0.94, 0.87, and 0.9 were reduced by non-maximum suppression to two candidate target frames A and D. Candidate target frame A is adjusted with its predicted position offset to obtain target frame A', and candidate target frame D likewise to obtain target frame D'. The images inside target frames A' and D' are taken as the detected crosswalks, and A' and D' are marked.
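A sketch of the adjustment step, assuming the offsets follow the (ax, ay, bx, by, h) parameterization set out in equations (5) to (7) later in this document (x offsets scaled by the frame width, y offsets by the height, and a log-ratio for the height); the inverse mapping below is an illustration based on that assumption.

```python
import math

def apply_offsets(box, offsets):
    """Decode a predicted offset vector back into frame coordinates.

    box = (ax, ay, bx, by, h) of a candidate rotation frame; offsets are
    (d_ax, d_ay, d_bx, d_by, d_h) in the assumed parameterization of
    equations (5)-(7): x/y offsets scaled by width/height, log-ratio height.
    """
    ax, ay, bx, by, h = box
    d_ax, d_ay, d_bx, d_by, d_h = offsets
    w = math.hypot(bx - ax, by - ay)        # frame width, equation (1)
    return (ax + d_ax * w, ay + d_ay * h,
            bx + d_bx * w, by + d_by * h,
            h * math.exp(d_h))              # invert the log-ratio
```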
According to the method, multiple scale feature maps of the target image are extracted; because these maps express different features of the target image, they can extract the features of both larger and smaller crosswalks in it. Performing crosswalk detection on each of the multiple scale feature maps then improves the accuracy of detecting crosswalks in the satellite image data.
In one embodiment, after step 104, the crosswalk detection method further includes step 105.
And 105, carrying out duplication elimination processing on the marked pedestrian crossing.
It can be understood that, owing to the threshold value, sample imbalance, and the large number of satellite image pixels that a crosswalk occupies, the marked crosswalks may contain noise and redundancy and therefore need de-duplication. De-duplicating the marked crosswalks removes noise and redundancy, such as wrongly detected crosswalks, and further improves the accuracy of crosswalk detection. Specifically, step 105 includes: performing road network matching and de-duplication processing on the marked crosswalks. In other embodiments, other de-duplication processing may also be performed.
In one embodiment, as shown in FIG. 1i, step 105 includes the following steps 1051 to 1055.
1051, the position coordinates of the marked crosswalk are acquired.
The marked crosswalk has a corresponding target frame, and the coordinates (pixel coordinates) of the target frame are taken as the position coordinates (pixel coordinates) of the marked crosswalk.
1052, determining the range domain of the GPS coordinate corresponding to the marked pedestrian crossing according to the position coordinate.
Because each target image was sliced by GPS coordinates, each target image corresponds to a known range of GPS coordinates. The position coordinates of the marked crosswalk are pixel coordinates within the target image; for example, the upper-left corner of the target image is set to (0, 0) and the lower-right corner to (w, h). The position coordinates of the marked crosswalk can therefore be converted into GPS coordinates according to the GPS coordinates corresponding to the target image, which determines the range of GPS coordinates corresponding to the marked crosswalk (its target frame).
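A minimal conversion sketch, assuming the tile's GPS bounding box is known from the slicing step and a linear mapping within the tile; names are illustrative.

```python
def pixel_to_gps(px, py, tile_bbox, tile_w=256, tile_h=256):
    """Map pixel coordinates inside a tile to GPS coordinates.

    tile_bbox = (lon_min, lat_min, lon_max, lat_max) of the tile; pixel
    (0, 0) is the top-left corner, as in the description above.
    """
    lon_min, lat_min, lon_max, lat_max = tile_bbox
    lon = lon_min + (lon_max - lon_min) * px / tile_w
    lat = lat_max - (lat_max - lat_min) * py / tile_h  # row 0 = northern edge
    return lon, lat
```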
1053, a road network in a circular area centered on the range of the GPS coordinates is determined.
The road network information may be obtained from public map data, for example via the public OpenStreetMap. Step 1053 includes: determining a circular area centered on the center of the GPS coordinate range, such as a circle of radius r, and acquiring the road network within that circular area. In other embodiments, the road network within a rectangular area centered on the GPS coordinate range may be determined instead; the rectangular area may coincide with the GPS coordinate range or be larger than it.
1054, according to the range domain of the GPS coordinates and the determined road network, determining the crosswalk with wrong mark.
Specifically, step 1054 includes: judging whether the range of GPS coordinates intersects the determined road network, for example by comparing the coordinates of the determined road network with those in the GPS coordinate range. If the GPS coordinate range intersects the determined road network, the marked crosswalk is determined to be correctly marked; if it does not, the marked crosswalk is determined to be wrongly marked. In other words, if the GPS coordinate range does not lie on the determined road network, i.e. the marked crosswalk is not located on any road, the crosswalk is determined to be a wrongly marked crosswalk.
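A hedged sketch of the intersection test, assuming the road network has already been fetched (for example from OpenStreetMap) and parsed into shapely LineString geometries; the helper name is illustrative.

```python
from shapely.geometry import LineString, Polygon

def is_on_road_network(gps_box_corners, road_lines):
    """Return True when the marked frame's GPS range intersects the roads.

    gps_box_corners: GPS corner points of the marked crosswalk's target frame;
    road_lines: iterable of LineString road geometries (e.g. parsed from
    OpenStreetMap ways inside the circular query region).
    """
    box_poly = Polygon(gps_box_corners)
    return any(box_poly.intersects(road) for road in road_lines)

# Crosswalks whose frame intersects no road are treated as wrong marks:
# detections = [d for d in detections
#               if is_on_road_network(d.gps_corners, roads)]
```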
1055, deleting the crosswalk marked with the error.
And if the marked pedestrian crossing is marked wrongly, deleting the pedestrian crossing marked wrongly so as to filter the pedestrian crossing marked wrongly.
The embodiment further performs road network matching and de-duplication processing on the detected and marked crosswalk to remove noise and redundancy, such as deleting the crosswalk with the wrong mark, and further improves the accuracy of crosswalk detection.
In one embodiment, the crosswalks in the target image may be detected by a crosswalk detection model. The crosswalk detection model may be a multi-scale detection model based on the Single Shot MultiBox Detector (SSD), or another multi-scale detection model. Specifically, the crosswalk detection method includes: acquiring target satellite image data; segmenting the target satellite image data to obtain a target image; performing multi-scale processing on the target image through the crosswalk detection model to extract feature maps of multiple scales; and performing crosswalk detection on the target image with the crosswalk detection model according to these scale feature maps, marking each detected crosswalk. Detecting the target image through a crosswalk detection model improves both the speed and the accuracy of crosswalk detection.
In an embodiment, after the cross-walk detection is performed on the target image by using a cross-walk detection model according to the multiple scale feature maps and the detected cross-walk is marked, the cross-walk detection method further includes: and carrying out duplication elimination treatment on the marked pedestrian crossing. So as to remove noise and redundancy and further improve the accuracy of pedestrian crossing detection.
In an embodiment, before the performing the multi-scale processing on the target image through the pedestrian crossing detection model to extract the multiple scale feature maps of the target image, the pedestrian crossing detection method further includes: training the initial multi-scale detection model to obtain a pedestrian crossing detection model; or modifying the initial multi-scale detection model and training the modified multi-scale detection model to obtain the pedestrian crossing detection model. The embodiment further relates to training of the initial multi-scale detection model or the modified multi-scale detection model. The initial multi-scale detection model may be a single-point multi-frame detector (SSD) model, for example, the single-point multi-frame detector model is trained to obtain a pedestrian crossing detection model, and the pedestrian crossing detection model is a multi-scale detection model based on the single-point multi-frame detector; if the single-point multi-frame detector model is modified and the modified single-point multi-frame detector model is trained to obtain the pedestrian crossing detection model, the pedestrian crossing detection model is also a multi-scale detection model based on the single-point multi-frame detector.
Fig. 2a is another schematic flow chart of a pedestrian crossing detection method according to an embodiment of the present invention. As shown in fig. 2a, the detailed process of the crosswalk detection method includes the following steps 201 to 206.
201, modifying the initial multi-scale detection model to obtain a modified multi-scale detection model.
The multi-scale detection model can extract feature maps of multiple scales from an image and detect pedestrian crossings according to them. In this embodiment, the initial multi-scale detection model is a Single Shot MultiBox Detector (SSD) model. The single-point multi-frame detector model uses VGGNet (VGG16) as its base network with some modifications, such as converting VGGNet's fully connected layers FC6 and FC7 into convolutional feature layers Conv6 and Conv7, respectively. On top of VGGNet, four additional network layers are added.
Specifically, fig. 2b shows the convolution network diagram of the single-point multi-frame detector model. The convolutional feature layers Conv4-3 and Conv7 of VGGNet, together with the four additional network layers, form 6 convolutional feature maps of different scales and are trained jointly. The four additional layers are Conv8-2, Conv9-2, Conv10-2, and pool11. The pixel sizes of the 6 feature maps are, in order, 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3, and 1 × 1. A convolution feature map with a larger pixel size corresponds to a smaller field of view on the original image and is used to detect smaller crosswalks, while one with a smaller pixel size corresponds to a larger field of view and detects larger crosswalks. These convolution feature maps of different scales may also be called multiple scale feature maps, multi-scale feature maps, or multi-scale convolution feature maps. Because feature maps of different scales express different features of the image, training the modified multi-scale detection model (the modified single-point multi-frame detector model) on features extracted from all the scale feature maps lets the trained crosswalk detection model capture the image's different features, improving its ability to detect the target (crosswalks) and its detection accuracy.
In one embodiment, as shown in FIG. 2c, step 201 includes the following steps 2011 to 2015.
2011, a plurality of different aspect ratios of the region box in each scale feature map in the initial multi-scale detection model are modified.
In the single-point multi-frame detector model, the aspect ratios of the region frames in each scale feature map are 0.707:0.707, 1:1, 1:2, 2:1, 1:3, and 3:1. Because the aspect ratio of a crosswalk in satellite image data can be large, possibly exceeding the maximum ratios 3:1 and 1:3 set in the single-point multi-frame detector model, the aspect ratios of the region frames in each scale feature map are modified to 0.707:0.707, 1:1, 1:2.5, 2.5:1, 1:4, and 4:1. The modified aspect ratios are shown in fig. 1d; details already described in step 1041 are not repeated here.
2012, the region frames with the modified aspect ratios are rotated by different angles to obtain a plurality of rotation frame candidates.
Each region frame with a modified aspect ratio is rotated about its center point to different angles to obtain rotated region frames. The angular rotation range of the region frame is [-π/12, 5π/12), in steps of 15 degrees, so each region frame yields seven rotated region frames at distinct angles. For a feature map of one scale with pixel size M × N, assigned a region frames of different aspect ratios each rotated to b angles, M × N × a × b candidate rotation frames are generated over the target image. Processing the feature map of every scale in this way yields the candidate rotation frames of all the different scale feature maps.
Specifically, the step of rotating each region frame with the modified aspect ratio by different angles to obtain a plurality of candidate rotation frames is described with reference to the content shown in fig. 1e and the content described in step 1041, which is not repeated herein.
2013, modifying the representation mode of the candidate rotation frame.
In the embodiment of the present invention, each candidate rotation frame is represented by (ax, ay, bx, by, h), where (ax, ay) and (bx, by) are two consecutive corner points taken clockwise along the candidate rotation frame and h is its height, i.e. the extent perpendicular to the line connecting (ax, ay) and (bx, by), as shown in fig. 1f.
It should be noted that this embodiment does not represent a candidate rotation frame by its four clockwise corner points (x1, y1, x2, y2, x3, y3, x4, y4): that representation describes an arbitrary quadrilateral (including inclined rectangles) without directly expressing the rotation angle, and its larger number of parameters hurts crosswalk detection performance. Nor is the candidate rotation frame represented as a horizontal box plus rotation angle, (x, y, w, h, theta), where x, y, w, h are the center coordinates and the width and height of the rectangle and theta is its rotation angle about the x-axis: that representation is ambiguous, since for example -90 and 90 describe the same rotation yet are very different parameter values to a computer program. The representation adopted in this embodiment improves both the performance and the accuracy of crosswalk detection.
2014, according to the representation mode of the candidate rotating frame, setting a position regression loss function based on the candidate rotating frame.
According to the representation (ax, ay, bx, by, h) of the candidate rotation frame, let the coordinates of the i-th candidate rotation frame be

$$d_i = \left(d^{ax}_i,\, d^{ay}_i,\, d^{bx}_i,\, d^{by}_i,\, d^{h}_i\right)$$

and denote the area of the i-th candidate rotation frame by $s_i$, its width by $w_i$, and its height by $h_i$. Then the following equations (1), (2), and (3) hold:

$$w_i = \sqrt{\left(d^{bx}_i - d^{ax}_i\right)^2 + \left(d^{by}_i - d^{ay}_i\right)^2} \tag{1}$$

$$h_i = d^{h}_i \tag{2}$$

$$s_i = w_i \cdot h_i \tag{3}$$

The position regression loss function based on the candidate rotation frames is set as shown in formula (4):

$$L_{loc}(x, l, g) = \sum_{i \in pos}^{N} \sum_{m \in \{ax,\, ay,\, bx,\, by,\, h\}} x^{p}_{ij}\, \mathrm{smooth}_{L1}\!\left(l^{m}_{i} - \hat{g}^{m}_{j}\right) \tag{4}$$

In this position regression loss function, pos denotes the set of candidate rotation frames belonging to positive samples, where candidate rotation frames are matched against the real frames and a matched candidate is determined to be a positive sample; a real frame is a marked crosswalk region frame in the training set. x indicates whether a matched candidate rotation frame is a crosswalk, taking the value 1 if it is a crosswalk and 0 if it is not. N is the number of all currently input candidate rotation frames, i.e. the total number of candidate rotation frames. C is the number of categories, which in this embodiment is 2 (crosswalk and background). $x^{p}_{ij}$ indicates whether the i-th candidate rotation frame matches the j-th real frame with respect to the category: a value of 1 indicates a match and that the candidate is a crosswalk, while 0 indicates the candidate is background. $l^{m}_{i}$ denotes the predicted offset value of the i-th candidate rotation frame, and $\hat{g}^{m}_{j}$ denotes the offset value of the j-th real frame g relative to the candidate rotation frame, whose components are given by equations (5), (6), and (7):

$$\hat{g}^{ax}_{j} = \frac{g^{ax}_{j} - d^{ax}_{i}}{w_i}, \qquad \hat{g}^{bx}_{j} = \frac{g^{bx}_{j} - d^{bx}_{i}}{w_i} \tag{5}$$

$$\hat{g}^{ay}_{j} = \frac{g^{ay}_{j} - d^{ay}_{i}}{h_i}, \qquad \hat{g}^{by}_{j} = \frac{g^{by}_{j} - d^{by}_{i}}{h_i} \tag{6}$$

$$\hat{g}^{h}_{j} = \log\!\left(\frac{g^{h}_{j}}{h_i}\right) \tag{7}$$

The smooth$_{L1}$(x) function is expressed as shown in equation (8):

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{8}$$
2015, performing weighted summation on the position regression loss function and the class confidence loss function based on the candidate rotation frames to obtain a loss function of the multi-scale detection model.
The set category confidence loss function is shown in formula (9):

$$L_{conf}(x, c) = -\sum_{i \in pos}^{N} x^{p}_{ij} \log\!\left(\hat{c}^{p}_{i}\right) - \sum_{i \in neg} \log\!\left(\hat{c}^{0}_{i}\right) \tag{9}$$

where neg denotes the set of candidate rotation frames belonging to negative samples, and the value of $\hat{c}^{p}_{i}$ is given by equation (10):

$$\hat{c}^{p}_{i} = \frac{\exp\!\left(c^{p}_{i}\right)}{\sum_{p} \exp\!\left(c^{p}_{i}\right)} \tag{10}$$

Here c denotes the class confidence; $x^{p}_{ij}$ indicates whether the i-th candidate rotation frame matches the j-th real frame of category p, a value of 1 meaning the candidate is a crosswalk and 0 meaning it is background; $\hat{c}^{p}_{i}$ is the probability value that the i-th candidate rotation frame is a crosswalk, and $\hat{c}^{0}_{i}$ the probability value that it is background.

The position regression loss function and the class confidence loss function of the candidate rotation frames are weighted and summed to obtain the loss function of the single-point multi-frame detector model, as shown in formula (11):

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \tag{11}$$
it can be understood that, since the single-point multi-frame detector model is an end-to-end training process, the total loss function is determined by a category confidence loss function and a location regression loss function, wherein each parameter in the formula (11) is described with reference to the above corresponding formula, and is not described herein again.
This embodiment further details how the single-point multi-frame detector model is modified to obtain a modified multi-scale detection model (modified single-point multi-frame detector model). The accuracy of the single-point multi-frame detector model for detecting the pedestrian crossing is improved.
202, training the modified multi-scale detection model to obtain a pedestrian crossing detection model.
The modified multi-scale detection model is the modified single-point multi-frame detector model, and the modified single-point multi-frame detector model is trained by using a data set of a real frame marked with a pedestrian crossing, so that the pedestrian crossing detection model is obtained.
In one embodiment, as shown in FIG. 2d, step 202 includes the following steps:
2021, an image dataset is acquired, the image dataset comprising marked crosswalks.
The image data set may be obtained from another device, for example by sending a data set acquisition instruction to the other device, which returns the image data set on receipt; or obtained locally, where the image data set is saved; or produced by downloading satellite image data to the local machine through third-party software and processing it, for example via a satellite image service such as a Web Map Service (WMS). After the satellite image data is downloaded, the processing is as follows: segment the satellite image data, for example by GPS coordinate, to obtain sliced images of pixel size 256 × 256; select from the sliced images a data set that includes crosswalks; and mark the crosswalks in that data set to obtain the image data set. Selecting the data set and marking its crosswalks can be done in various ways: finding images containing crosswalks manually and determining the crosswalks by manual data annotation; finding and marking crosswalk images via target detection or image matching; or finding and marking crosswalk images via target detection or image matching and then manually verifying which marks are correct. Because the image data set is used to train the multi-scale detection model (single-point multi-frame detector model), a relatively large amount of satellite image data is downloaded so that a sufficient image data set remains after processing. The image data set may also be acquired in other ways.
2022, inputting the image dataset into the modified multi-scale detection model for multi-scale processing to extract a plurality of scale feature maps in the image dataset.
The image data set is input into the modified single-point multi-frame detector model for multi-scale processing to obtain Conv4-3, Conv7 and four additional network layers, i.e. feature maps of 6 different scales in total. The pixel sizes of the feature maps of the 6 scales are 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1 in sequence. For details, please refer to the description in step 201 and the contents of fig. 2b, which are not repeated here.
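As a quick worked example (assuming the 6 aspect ratios and 7 rotation angles used in this embodiment, i.e. 42 candidate rotating frames per feature-map location), the number of candidate rotating frames contributed by each scale can be tallied as follows:

```python
# Feature-map sizes of Conv4-3, Conv7 and the four additional layers.
scales = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
a, b = 6, 7  # aspect ratios x rotation angles -> 42 frames per location

for m, n in scales:
    print(f"{m}x{n}: {m * n * a * b} candidate rotating frames")
total = sum(m * n * a * b for m, n in scales)
print("total:", total)  # 1940 locations x 42 = 81480 candidate frames
```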
2023, obtaining a plurality of candidate rotating frames in each scale feature map, where the candidate rotating frames are obtained by rotating the region frames with different aspect ratios of the multi-scale detection model by different angles.
In this step, a plurality of candidate rotation frames in each scale feature map in the modified single-point multi-frame detector model are obtained. For how to obtain the multiple candidate rotation frames, how to perform different angle rotations on the region frames with different aspect ratios, and the like, please refer to the contents described in step 2011 and step 2012 and the contents in fig. 1e, which are not described again here.
2024, calculating a category confidence and a position confidence of each candidate rotation frame according to the real frame of the crosswalk marked in the image data set and the plurality of candidate rotation frames.
Specifically, the real frame of the crosswalk marked in the image data set is matched with a plurality of candidate rotating frames, and the category confidence and the position confidence of each candidate rotating frame are calculated according to the matching result.
In one embodiment, as shown in fig. 2e, step 2024 comprises:
2024a, matching the real frame of the marked crosswalk in the image dataset with a plurality of candidate rotation frames to determine a positive sample and a negative sample in the candidate rotation frames.
For example, sampling at each point on the feature map of each scale yields 6 × 7 = 42 candidate rotating frames, and the 42 candidate rotating frames are matched with the real frame at that point to select the positive and negative samples.
Specifically, the rotated intersection-over-union (rIOU) between the real frame of the marked crosswalk in the image data set and each of the plurality of candidate rotating frames is calculated; the angle difference between the real frame and each candidate rotating frame is calculated; and the positive and negative samples among the candidate rotating frames are determined based on the calculated rIOU values and angle differences.
The step of calculating the rIOU between the real frame of the marked crosswalk and the candidate rotating frames is consistent with the rIOU calculation described in step 1043c; please refer to the contents described in step 1043 and in fig. 1h, which are not repeated here.
The step of calculating the angle difference between the real frame of the marked crosswalk and the candidate rotating frames includes calculating the angle of the real frame and the angle of each candidate rotating frame, and then taking the difference between the two. Specifically, the angle of the real frame or of a candidate rotating frame may be calculated by formula (12):
$$\theta = \arctan\left(\frac{b_y - a_y}{b_x - a_x}\right) \tag{12}$$

where $(a_x, a_y)$ and $(b_x, b_y)$ are the two points taken clockwise along the frame.
The step of determining the positive and negative samples among the candidate rotating frames based on the calculated rIOU values and angle differences is as follows: if the calculated rIOU value is larger than a second preset rIOU threshold and the angle difference is smaller than 15 degrees, the candidate rotating frame is determined to be a positive sample; if the calculated rIOU value is smaller than the difference between 1 and the second preset rIOU threshold (i.e. 1 − the second preset rIOU threshold), the candidate rotating frame is determined to be a negative sample; and if the calculated rIOU value is larger than the second preset rIOU threshold but the angle difference is larger than 15 degrees, the candidate rotating frame is likewise treated as a negative sample. The second preset rIOU threshold may be set to 0.7. Candidate rotating frames whose rIOU value lies in [1 − second preset rIOU threshold, second preset rIOU threshold] are discarded; with a threshold of 0.7, candidate rotating frames with an rIOU value in [0.3, 0.7] are discarded.
If the angle of the candidate rotating frame is recorded as $v_\theta$ and the angle of the real frame as $g_\theta$, the mathematical description of a positive sample is given by formula (13) and that of a negative sample by formula (14).
$$rIOU(g, v) > T \quad \text{and} \quad |g_\theta - v_\theta| < 15^{\circ} \tag{13}$$
$$rIOU(g, v) < 1 - T \quad \text{or} \quad \left( rIOU(g, v) > T \ \text{and}\ |g_\theta - v_\theta| \geq 15^{\circ} \right) \tag{14}$$
where rIOU(g, v) denotes the rotated intersection-over-union of the candidate rotating frame v and the real frame g, and T denotes the second preset rIOU threshold (0.7 in this embodiment).
Positive and negative samples are thus selected from all candidate rotating frames by comparing rIOU values and angle differences. A positive sample represents a candidate rotating frame that may contain a crosswalk and is used for the subsequent regression operation on the candidate rotating frame; all coordinates of the negative-sample candidate rotating frames are assigned 0.
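A minimal sketch of this matching rule, following formulas (12)-(14); the rIOU value is assumed to have been computed beforehand as in step 1043c, and the function and variable names are illustrative:

```python
import math

T = 0.7           # second preset rIOU threshold
ANGLE_TOL = 15.0  # degrees

def frame_angle(ax, ay, bx, by):
    # Formula (12): angle of a frame from its two clockwise corner points.
    return math.degrees(math.atan2(by - ay, bx - ax))

def label_candidate(riou, v_theta, g_theta):
    # Formulas (13)/(14): returns 1 (positive sample), 0 (negative sample)
    # or None when the frame is discarded (rIOU within [1 - T, T]).
    dtheta = abs(g_theta - v_theta)
    if riou > T and dtheta < ANGLE_TOL:
        return 1
    if riou < 1 - T or (riou > T and dtheta >= ANGLE_TOL):
        return 0
    return None
```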
The real frames marked as crosswalks in an image of the image data set are represented as

$$G = \{ g_j \mid j = 1, 2, \ldots \}$$

where $g_j$ represents the real frame of the j-th crosswalk and $c$ represents the category, with crosswalk and background represented by 1 and 0 respectively. The candidate rotating frames are represented as

$$V = \{ v_i \mid i = 1, 2, \ldots \}$$

where $v_i$ represents the i-th candidate rotating frame.
2024b, performing position offset prediction on the candidate rotating frames of the positive samples, performing category confidence prediction on the candidate rotating frames of both the positive and the negative samples, and outputting the position offset prediction results and category probability prediction results.
The predictor in the modified single-point multi-frame detector model consists of a set of m × n × p convolution kernels. In the training phase of the modified single-point multi-frame detector model, the candidate rotating frames of different scales and aspect ratios predict the position offset of each candidate rotating frame and the class probability (class confidence) of each class. For a feature map of one scale, assume the number of detection target classes is C (for example C = 2, i.e. two classes), the number of aspect ratios of the candidate rotating frames is a (for example a = 6), and each aspect ratio is rotated by b different angles (for example b = 7). Then a × b × C (i.e. 6 × 7 × 2) convolution kernels of size m × n × p are used to predict the class probabilities (class confidences) of each candidate rotating frame for the C classes, while a × b × k (i.e. 6 × 7 × 5) convolution kernels of size m × n × p are used to predict the position offset of each candidate rotating frame. Here k denotes the number of parameters of each candidate rotating frame; if the coordinates of two points taken clockwise along the candidate rotating frame and the height of the frame are represented by (ax, ay, bx, by, h), then k = 5. The class confidence is obtained through a softmax classifier.
It can be understood that a feature map with a pixel size of M × N generates M × N × a × b candidate rotating frames and finally M × N × a × b × (C + 5) output results, where a × b = 6 × 7 = 42, C represents the number of classes of the detection target, and 5 corresponds to the parameters (ax, ay, bx, by, h) of each candidate rotating frame; that is, each layer generates M × N × 42 × (2 + 5) output results. For each candidate rotating frame, (C + 5) = 7-dimensional prediction information is output: the class probability prediction result of the candidate rotating frame for the background (class probability prediction value for the background), the class probability prediction result for the pedestrian crossing (class probability prediction value for the pedestrian crossing), and the position offset prediction result of the candidate rotating frame (position offset prediction values).
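A minimal PyTorch sketch of such a predictor for one scale, under the example values C = 2, a = 6, b = 7 and k = 5; the class name and layer layout are assumptions for illustration, not the implementation disclosed here:

```python
import torch.nn as nn

C, k = 2, 5  # classes (crosswalk / background) and frame parameters
a, b = 6, 7  # aspect ratios x rotation angles = 42 frames per location

class RotatedFramePredictor(nn.Module):
    # For an M x N feature map this emits M * N * a * b * (C + k) values:
    # C class scores and k = 5 offsets (ax, ay, bx, by, h) per candidate.
    def __init__(self, in_channels):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, a * b * C, kernel_size=3, padding=1)
        self.loc = nn.Conv2d(in_channels, a * b * k, kernel_size=3, padding=1)

    def forward(self, feat):
        # Class confidences are obtained from the class scores via softmax.
        return self.cls(feat), self.loc(feat)
```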
2024c, inputting the candidate rotating frames, the real frame, the position offset prediction results and the class probability prediction results into the loss function of the multi-scale detection model to calculate the category confidence and position confidence of each candidate rotating frame.
The candidate rotating frames, the real frame, the position offset prediction results and the class probability prediction results are input into formula (11), i.e. the loss function, to calculate the category confidence and position confidence of each candidate rotating frame.
2025, according to the category confidence and the position confidence of each candidate rotation frame, modifying parameters in the loss function of the multi-scale detection model to make the loss function converge, and using the multi-scale detection model after the loss function converges as a pedestrian crossing detection model.
According to the category confidence and position confidence of each candidate rotating frame, the modified multi-scale detection model is trained and adjusted over multiple iterations, i.e. a series of parameters in the loss function of the modified single-point multi-frame detector model is adjusted. These parameters include the prior frame parameters (the parameters of the positive-sample candidate rotating frames; in the embodiment of the invention a prior frame has 5 values), the learning rate, the training step size, and so on. The modified single-point multi-frame detector model whose converged loss function reaches the minimum loss value is finally obtained and taken as the pedestrian crossing detection model. In some embodiments, the image data set is cleaned once after each training round and crosswalks with wrong marks are deleted, which improves the accuracy of the multi-scale detection model.
In one embodiment, the acquired image data set is divided into a training set and a test set: one part of the image data set is used as the training set and the other part as the test set, the training set typically being the larger. For example, 80% of the images in the image data set are used as the training set and the remaining 20% as the test set; with 4000 images, for instance, 3000 may be used as the training set and 1000 as the test set.
The modified multi-scale detection model, i.e. the modified single-point multi-frame detector model, is trained with the training set to obtain the pedestrian crossing detection model, and the trained pedestrian crossing detection model is then tested with the test set to determine whether its detection accuracy and recall rate meet the requirements, for example whether the accuracy is greater than a preset accuracy and the recall rate is greater than a preset recall rate. If both hold, the requirements are met; otherwise they are not met, and the modified multi-scale detection model is retrained to obtain the pedestrian crossing detection model. Testing the trained pedestrian crossing detection model with the test set means inputting the test set into the pedestrian crossing detection model for detection and marking, and comparing the detection and marking results with the marks in the test set so as to calculate the accuracy and recall rate of the pedestrian crossing detection model.
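For illustration, the test-set check might be sketched as follows; the 0.9 thresholds stand in for the preset accuracy and preset recall rate, whose values are not fixed by this document:

```python
def precision_recall(tp, fp, fn):
    # tp: detections matched to a marked crosswalk in the test set;
    # fp: unmatched detections; fn: marked crosswalks that were missed.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def meets_requirements(precision, recall, min_p=0.9, min_r=0.9):
    # Retrain the modified multi-scale detection model if this is False.
    return precision > min_p and recall > min_r
```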
Steps 201-202 above describe the process of obtaining the pedestrian crossing detection model.
203, acquiring target satellite image data.
204, performing segmentation processing on the target satellite image data to obtain a target image.
205, performing multi-scale processing on the target image through a pedestrian crossing detection model to extract a plurality of scale feature maps of the target image.
Multi-scale processing is performed on the target image by the trained single-point multi-frame detector model to obtain Conv4-3, Conv7 and four additional network layers, i.e. feature maps of 6 different scales in total. The multi-scale processing of the target image is the same as the multi-scale processing performed during training of the single-point multi-frame detector model; please refer to the related content of the multi-scale processing in the training process.
206, performing pedestrian crossing detection on the target image with the pedestrian crossing detection model according to the plurality of scale feature maps, and marking the detected pedestrian crossings.
Specifically, a plurality of candidate rotating frames in the plurality of scale feature maps is determined, and the class probability prediction values and position offset prediction values of these candidate rotating frames are calculated, i.e. the predictor outputs (C + 5) dimensions of information for each candidate rotating frame. A target frame is then determined from the candidate rotating frames according to the class probability prediction values and position offset prediction values, and the crosswalk in the target frame is taken as the detected crosswalk. Determining the target frame specifically includes: acquiring a preset probability threshold; deleting, from the plurality of candidate rotating frames, those whose class probability prediction value is smaller than the preset probability threshold; performing non-maximum suppression on the remaining candidate rotating frames to eliminate redundant ones; and determining the target frame from the candidate rotating frames after redundancy elimination according to the position offset prediction values. For the specific steps of the non-maximum suppression, please refer to step 1043c.
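A minimal sketch of this post-processing; polygon intersection via the shapely library stands in for the rotated IoU of step 1043c, and the two thresholds are illustrative preset values:

```python
from shapely.geometry import Polygon

def rotated_iou(corners_a, corners_b):
    # corners_*: the four (x, y) vertices of a rotated frame.
    a, b = Polygon(corners_a), Polygon(corners_b)
    union = a.union(b).area
    return a.intersection(b).area / union if union else 0.0

def detect(candidates, prob_threshold=0.5, nms_threshold=0.3):
    # candidates: (corners, crosswalk_probability) pairs from the predictor.
    kept = sorted((c for c in candidates if c[1] >= prob_threshold),
                  key=lambda c: c[1], reverse=True)
    targets = []
    while kept:
        best, *kept = kept
        targets.append(best)
        # Non-maximum suppression: drop frames overlapping the kept frame.
        kept = [c for c in kept
                if rotated_iou(best[0], c[0]) < nms_threshold]
    return targets
```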
The content of each step in step 206 may refer to the corresponding step in step 104 and is not repeated here. Pedestrian crossing detection performed with the preset single-point multi-frame detector model produces the result shown in fig. 2f, where the target frames of the detected crosswalks are marked.
It should be noted that the feature extraction and the predictor used when detecting crosswalks in the target image with the crosswalk detection model rely on trained parameters. In addition, the detection process does not involve matching candidate rotating frames against real frames, so there is no loss function or feedback adjustment; the non-maximum suppression is used to eliminate redundant candidate rotating frames and finally determine the target frame.
It should also be noted that steps 205 and 206 both refer to the crosswalk detection model: the multi-scale processing of the target image in step 205 to extract the plurality of scale feature maps, and the crosswalk detection and marking in step 206, are both performed by the crosswalk detection model. The target image (with unmarked crosswalks) only needs to be input into the crosswalk detection model once.
Steps 203-206 above describe the process of detecting pedestrian crossings in the target satellite image data with the pedestrian crossing detection model.
In this embodiment, the single-point multi-frame detector model, i.e. the SSD model, is modified; the modified single-point multi-frame detector model is trained with the image data set to obtain the pedestrian crossing detection model, and the pedestrian crossing detection model is used to detect crosswalks in the target image. For example, modifying the different aspect ratios of the region frames in the single-point multi-frame detector model can improve the accuracy of the detection result by at least 5%; using candidate rotating frames to detect crosswalks in satellite image data improves detection accuracy; and modifying the representation of the candidate rotating frames, and accordingly the corresponding loss function, makes the training result more accurate. In addition, because the single-point multi-frame detector model extracts a plurality of scale feature maps that express different features of the target image, features of larger as well as smaller crosswalks in the target image can be extracted with these feature maps; using each feature map to detect crosswalks in the satellite image data can therefore improve the accuracy of pedestrian crossing detection.
In one embodiment, after step 206, the crosswalk detection method further comprises:
207, performing deduplication processing on the marked pedestrian crossings.
Specifically, road network matching deduplication may be performed on the marked pedestrian crossings. For the steps of the road network matching deduplication and the beneficial effects it achieves, please refer to the embodiment of fig. 1i; they are not repeated here.
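As an illustrative sketch only (the 30 m radius, the flat-earth distance approximation and the data layout are assumptions; the road network matching itself is the procedure of fig. 1i), marks with no road nearby might be filtered out like this:

```python
import math

def deduplicate(marks, road_points, radius_m=30.0):
    # marks: (lon, lat) GPS coordinates of marked crosswalks; road_points:
    # sampled (lon, lat) points of the road network. A mark with no road
    # point within radius_m is treated as wrongly marked and deleted.
    def dist_m(p, q):
        # Equirectangular approximation, adequate over tens of metres.
        dx = (p[0] - q[0]) * 111320.0 * math.cos(math.radians(p[1]))
        dy = (p[1] - q[1]) * 110540.0
        return math.hypot(dx, dy)
    return [m for m in marks
            if any(dist_m(m, r) <= radius_m for r in road_points)]
```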
To better implement the pedestrian crossing detection method of the embodiments of the invention, an embodiment of the invention further provides a pedestrian crossing detection apparatus based on the method. The apparatus is applied to a terminal device, which may be a server or a terminal such as a mobile phone, a tablet or a desktop computer.
Fig. 3 is a schematic block diagram of a pedestrian crossing detection apparatus provided in an embodiment of the present invention. The pedestrian crossing detection device comprises an acquisition unit 301, a preprocessing unit 302, a feature extraction unit 303 and a detection unit 304.
The acquiring unit 301 is configured to acquire target satellite image data.
The preprocessing unit 302 is configured to perform segmentation processing on the target satellite image data to obtain a target image.
A feature extraction unit 303, configured to perform multi-scale processing on the target image to extract multiple scale feature maps of the target image.
A detecting unit 304, configured to perform pedestrian crossing detection on the target image according to the multiple scale feature maps, and mark the detected pedestrian crossing.
In one embodiment, the crosswalk detection apparatus further includes a deduplication unit 305:
and a deduplication unit 305, configured to perform deduplication processing on the marked crosswalk.
In one embodiment, the detection unit 304 includes a rotating frame determination unit, a prediction value calculation unit and a target frame determination unit. Wherein:
and the rotating frame determining unit is used for determining a plurality of candidate rotating frames in each scale feature map, wherein the plurality of candidate rotating frames are obtained by rotating the preset region frames with different aspect ratios by different angles.
And the predicted value calculating unit is used for calculating the category probability predicted value and the position offset predicted value of the plurality of candidate rotating frames.
And the target frame determining unit is used for determining a target frame from a plurality of candidate rotating frames according to the category probability predicted value and the position offset predicted value, and taking the pedestrian crossing in the target frame as the detected pedestrian crossing.
In one embodiment, the target frame determination unit includes: the device comprises a threshold value acquisition unit, a frame deletion unit, a redundancy processing unit and a target detection frame determination unit. The threshold value obtaining unit is used for obtaining a preset probability threshold value. And the frame deleting unit is used for deleting the candidate rotating frames of which the category probability predicted values are smaller than a preset probability threshold value from the plurality of candidate rotating frames. And the redundancy processing unit is used for performing redundancy processing on the candidate rotating frame subjected to the deletion processing so as to eliminate the redundant candidate rotating frame. And the target detection frame determining unit is used for determining a target frame from the candidate rotating frames after the redundancy processing according to the position offset predicted value.
In one embodiment, the deduplication unit 305 includes a coordinate acquisition unit, a GPS coordinate determination unit, a road network determination unit, an error determination unit and an error deletion unit. Wherein:
and the coordinate acquisition unit is used for acquiring the position coordinates of the marked pedestrian crossing. And the GPS coordinate determination unit is used for determining the range domain of the GPS coordinate corresponding to the marked pedestrian crossing according to the position coordinate. And a road network determining unit for determining a road network in a circular area centered on the range of the GPS coordinates. And the error determining unit is used for determining the crosswalk marked with errors according to the range domain of the GPS coordinates and the determined road network. And the error deleting unit is used for deleting the crosswalk marked with the error.
Fig. 4 is another schematic block diagram of a pedestrian crossing detection apparatus provided in an embodiment of the present invention. The pedestrian crossing detection device comprises a modification unit 401, a training unit 402, an acquisition unit 403, a preprocessing unit 404, a feature extraction unit 405 and a detection unit 406.
And a modifying unit 401, configured to modify the initial multi-scale detection model to obtain a modified multi-scale detection model.
And a training unit 402, configured to train the modified multi-scale detection model to obtain a pedestrian crossing detection model.
An acquiring unit 403, configured to acquire target satellite image data.
And the preprocessing unit 404 is configured to perform segmentation processing on the target satellite image data to obtain a target image.
A feature extraction unit 405, configured to perform multi-scale processing on the target image through a pedestrian crossing detection model to extract multiple scale feature maps of the target image.
And a detecting unit 406, configured to perform pedestrian crossing detection on the target image by using a pedestrian crossing detection model according to the multiple scale feature maps, and mark the detected pedestrian crossing.
In one embodiment, the crosswalk detection apparatus further includes: a deduplication unit 407.
And a deduplication unit 407, configured to perform deduplication processing on the marked crosswalk.
In an embodiment, the modifying unit 401 includes an aspect ratio modification unit, a rotation unit, a representation modification unit, a position function setting unit and a total loss function determination unit. Wherein:
and the aspect ratio modification unit is used for modifying a plurality of different aspect ratios of the region frame in each scale feature map in the initial multi-scale detection model. And the rotating unit is used for rotating the area frames with the modified aspect ratios by different angles to obtain a plurality of candidate rotating frames. And the representation modification unit is used for modifying the representation mode of the candidate rotating frame. And the position function setting unit is used for setting a position regression loss function based on the candidate rotating frame according to the representation mode of the candidate rotating frame. And the total loss function determining unit is used for weighting and summing the position regression loss function and the category confidence coefficient loss function based on the candidate rotating frame to obtain the modified loss function of the multi-scale detection model.
In one embodiment, training unit 402 includes: the device comprises a data set acquisition unit, a feature map extraction unit, a rotating frame acquisition unit, a confidence coefficient calculation unit and a regression unit.
A dataset acquisition unit for acquiring an image dataset comprising a marked crosswalk.
And the characteristic map extraction unit is used for inputting the image data set into the modified multi-scale detection model to perform multi-scale processing so as to extract a plurality of scale characteristic maps in the image data set.
And a rotation frame acquiring unit, configured to acquire a plurality of rotation frame candidates in each scale feature map, where the plurality of rotation frame candidates are obtained by performing different angle rotations on region frames of different aspect ratios of the multi-scale detection model.
And the confidence coefficient calculation unit is used for calculating the category confidence coefficient and the position confidence coefficient of each candidate rotating frame according to the real frame of the pedestrian crossing marked in the image data set and the plurality of candidate rotating frames.
And the regression unit is used for modifying parameters in a loss function of the multi-scale detection model according to the category confidence coefficient and the position confidence coefficient of each candidate rotating frame so as to enable the loss function to be converged, and taking the multi-scale detection model after the loss function is converged as a pedestrian crossing detection model.
In one embodiment, the confidence calculation unit includes a positive and negative sample determination unit, a prediction result output unit and a frame confidence calculation unit. Wherein:
The positive and negative sample determination unit is configured to match the real frame of the marked crosswalk in the image data set with the plurality of candidate rotating frames to determine the positive and negative samples among the candidate rotating frames. The prediction result output unit is configured to perform position offset prediction on the positive-sample candidate rotating frames, perform category confidence prediction on the positive- and negative-sample candidate rotating frames, and output the position offset prediction results and category probability prediction results. The frame confidence calculation unit is configured to input the candidate rotating frames, the real frame, the position offset prediction results and the category probability prediction results into the loss function of the multi-scale detection model to calculate the category confidence and position confidence of each candidate rotating frame.
It should be noted that, as will be clear to those skilled in the art, specific implementation procedures and achieved beneficial effects of the above-mentioned apparatus and units may refer to corresponding descriptions in the foregoing method embodiments, and for convenience and brevity of description, no further description is provided herein.
The embodiment of the present invention further provides a computer device, which integrates any one of the pedestrian crossing detection methods provided by the embodiments of the present invention, and the computer device includes:
one or more processors; a memory; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the processor for performing the steps of the pedestrian crossing detection method in any of the embodiments of the pedestrian crossing detection method described above.
The embodiment of the invention also provides computer equipment which integrates any pedestrian crossing detection device provided by the embodiment of the invention. As shown in fig. 5, it shows a schematic structural diagram of a computer device according to an embodiment of the present invention, specifically:
the computer device may include components such as a processor 501 of one or more processing cores, memory 502 of one or more computer-readable storage media, a power supply 503, and an input unit 504. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 5 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 501 is a control center of the computer device, connects various parts of the entire computer device by using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 502 and calling data stored in the memory 502, thereby monitoring the computer device as a whole. Optionally, processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501.
The memory 502 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by operating the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
The computer device further comprises a power supply 503 for supplying power to the various components, and preferably, the power supply 503 may be logically connected to the processor 501 through a power management system, so that functions of managing charging, discharging, power consumption, and the like are realized through the power management system. The power supply 503 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 504, and the input unit 504 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a graphics processing unit (GPU), which converts and drives the display information required by the computer system, provides line scan signals to the display and controls its correct display, taking over the task of outputting display graphics and thereby increasing the speed of data processing. If the graphics processing unit is used for training the detection model, the speed of model training can be improved by more than 10 times compared with the processor.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail here. Specifically, in this embodiment, the processor 501 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 502 according to the following instructions, and the processor 501 runs the application programs stored in the memory 502, thereby implementing various functions, such as the various functions in the crosswalk detection method provided by the embodiment of the present invention.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like. A computer program is stored thereon, which is loaded by a processor to perform the steps of any one of the pedestrian crossing detection methods provided by the embodiments of the present invention.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed descriptions of other embodiments, and are not described herein again. In a specific implementation, each unit or structure may be implemented as an independent entity, or may be combined arbitrarily to be implemented as one or several entities, and the specific implementation of each unit or structure may refer to the foregoing method embodiment, which is not described herein again.
For the specific implementation of the above operations, reference may be made to the foregoing embodiments, and details are not repeated here.
The pedestrian crossing detection method, the pedestrian crossing detection device, the computer equipment and the storage medium provided by the embodiment of the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A crosswalk detection method is characterized by comprising the following steps:
acquiring target satellite image data;
segmenting the target satellite image data to obtain a target image;
performing multi-scale processing on the target image to extract a plurality of scale feature maps of the target image;
and performing pedestrian crossing detection on the target image according to the scale feature maps, and marking the detected pedestrian crossing.
2. The crosswalk detection method according to claim 1, wherein the crosswalk detection of the target image based on the plurality of scale feature maps includes:
determining a plurality of candidate rotating frames in each scale feature map, wherein the plurality of candidate rotating frames are obtained by rotating preset region frames with different length-width ratios by different angles;
calculating category probability predicted values and position offset predicted values of the plurality of candidate rotating frames;
and determining a target frame from the plurality of candidate rotating frames according to the category probability predicted value and the position offset predicted value, and taking the pedestrian crossing in the target frame as the detected pedestrian crossing.
3. The crosswalk detection method according to claim 1, further comprising, after the crosswalk detection is performed on the target image on the basis of the plurality of scale feature maps and the detected crosswalk is marked:
and carrying out duplication elimination treatment on the marked pedestrian crossing.
4. The crosswalk detection method according to claim 3, wherein the performing deduplication processing on the marked crosswalk includes:
acquiring the position coordinates of the marked pedestrian crossing;
determining the range domain of the GPS coordinate corresponding to the marked pedestrian crossing according to the position coordinate;
determining a road network in a circular area with the range domain of the GPS coordinates as a center;
determining pedestrian crosswalks with wrong marks according to the range domain of the GPS coordinates and the determined road network;
and deleting the crosswalk marked with the error.
5. The crosswalk detection method according to claim 1, further comprising, before the performing multi-scale processing on the target image to extract a plurality of scale feature maps of the target image:
modifying the initial multi-scale detection model to obtain a modified multi-scale detection model;
training the modified multi-scale detection model to obtain a pedestrian crossing detection model;
the multi-scale processing of the target image to extract multiple scale feature maps of the target image includes: performing multi-scale processing on the target image through a pedestrian crossing detection model to extract a plurality of scale feature maps of the target image;
the step of performing pedestrian crossing detection on the target image according to the multiple scale feature maps and marking the detected pedestrian crossing comprises the following steps: and according to the multiple scale feature maps, using the crosswalk detection model to perform crosswalk detection on the target image and marking the detected crosswalk.
6. The pedestrian crossing detection method of claim 5, wherein said modifying the initial multi-scale detection model to obtain a modified multi-scale detection model comprises:
modifying a plurality of different aspect ratios of the region box in each scale feature map in the initial multi-scale detection model;
rotating each region frame with the modified length-width ratio by different angles to obtain a plurality of candidate rotating frames;
modifying the representation mode of the candidate rotating frame;
setting a position regression loss function based on the candidate rotating frame according to the representation mode of the candidate rotating frame;
and weighting and summing the position regression loss function and the category confidence coefficient loss function based on the candidate rotating frame to obtain a modified loss function of the multi-scale detection model.
7. The crosswalk detection method of claim 6, wherein training the modified multi-scale detection model to obtain a crosswalk detection model comprises:
acquiring an image data set, wherein the image data set comprises marked pedestrian crossings;
inputting the image data set into a modified multi-scale detection model for multi-scale processing so as to extract a plurality of scale feature maps in the image data set;
obtaining a plurality of candidate rotating frames in each scale feature map, wherein the plurality of candidate rotating frames are obtained by rotating the region frames with different length-width ratios of the multi-scale detection model by different angles;
calculating the category confidence and the position confidence of each candidate rotating frame according to the real frame of the pedestrian crossing marked in the image data set and the plurality of candidate rotating frames;
and modifying parameters in a loss function of the multi-scale detection model according to the category confidence coefficient and the position confidence coefficient of each candidate rotating frame so as to enable the loss function to be converged, and taking the multi-scale detection model after the loss function is converged as a pedestrian crossing detection model.
8. A crosswalk detection apparatus, characterized by comprising:
the acquisition unit is used for acquiring target satellite image data;
the preprocessing unit is used for segmenting the target satellite image data to obtain a target image;
the characteristic extraction unit is used for carrying out multi-scale processing on the target image so as to extract a plurality of scale characteristic maps of the target image;
and the detection unit is used for detecting the pedestrian crosswalk of the target image according to the multiple scale feature maps and marking the detected pedestrian crosswalk.
9. A computer device, characterized in that the computer device comprises:
one or more processors; a memory; and one or more applications, wherein the processor is coupled to the memory, the one or more applications being stored in the memory and configured to be executed by the processor to implement the crosswalk detection method of any of claims 1-7.
10. A computer storage medium having a computer program stored thereon, the computer program being loaded by a processor to perform the steps in the crosswalk detection method according to any one of claims 1 to 7.
CN201910797712.3A 2019-08-27 2019-08-27 Pedestrian crossing detection method and device, computer equipment and storage medium Pending CN112446231A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910797712.3A CN112446231A (en) 2019-08-27 2019-08-27 Pedestrian crossing detection method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910797712.3A CN112446231A (en) 2019-08-27 2019-08-27 Pedestrian crossing detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112446231A true CN112446231A (en) 2021-03-05

Family

ID=74740988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910797712.3A Pending CN112446231A (en) 2019-08-27 2019-08-27 Pedestrian crossing detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112446231A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106169244A (en) * 2015-05-22 2016-11-30 星克跃尔株式会社 The guidance information utilizing crossing recognition result provides device and method
CN108875577A (en) * 2018-05-11 2018-11-23 深圳市易成自动驾驶技术有限公司 Object detection method, device and computer readable storage medium
CN109800735A (en) * 2019-01-31 2019-05-24 中国人民解放军国防科技大学 Accurate detection and segmentation method for ship target
CN109977945A (en) * 2019-02-26 2019-07-05 博众精工科技股份有限公司 Localization method and system based on deep learning
CN109871823A (en) * 2019-03-11 2019-06-11 中国电子科技集团公司第五十四研究所 A kind of satellite image Ship Detection of combination rotating frame and contextual information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Lu Yifan; Zhang Songhai: "Object detection in optical remote sensing images based on convolutional neural networks", China Sciencepaper, no. 14 *
Li Lei; Liang Jie et al.: "A refined ship target detection method based on improved SSD", Navigation Positioning and Timing, vol. 6, no. 5, pages 43-50 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023085190A1 (en) * 2021-11-09 2023-05-19 ソニーセミコンダクタソリューションズ株式会社 Teaching data generation method, teaching data generation program, information processing device, information processing method and information processing program
CN117475389A (en) * 2023-12-27 2024-01-30 山东海润数聚科技有限公司 Pedestrian crossing signal lamp control method, system, equipment and storage medium
CN117475389B (en) * 2023-12-27 2024-03-15 山东海润数聚科技有限公司 Pedestrian crossing signal lamp control method, system, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination