CN113033348A - Top-view image correction method for pedestrian re-identification, storage medium, and electronic device - Google Patents

Top-view image correction method for pedestrian re-identification, storage medium, and electronic device

Info

Publication number
CN113033348A
Authority
CN
China
Prior art keywords
view image, pedestrian, top view, image block, image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110262877.8A
Other languages
Chinese (zh)
Inventor
徐子豪
郑翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Vion Intelligent Technology Co ltd
Original Assignee
Beijing Vion Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Vion Intelligent Technology Co ltd filed Critical Beijing Vion Intelligent Technology Co ltd
Priority to CN202110262877.8A
Publication of CN113033348A
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention provides a top-view image correction method for pedestrian re-identification, a storage medium and an electronic device. The correction method comprises the following steps: constructing a subset of top-view image blocks containing a target pedestrian P, wherein each top-view image block is a crop from one frame of top-view image, at a different time point in the video, that contains the target pedestrian P; and selectively transforming at least one top-view image block of the subset into a front-view image block through a projective transformation based on a homography matrix, so that the target pedestrian P in the top-view image block is corrected from a top-view perspective to a side-view perspective. The invention solves the problem that existing pedestrian re-identification methods are limited in use because, in the prior art of image processing and pedestrian recognition, a top-view image cannot be effectively used as a model image sample input for pedestrian re-identification, that is, the top-view image cannot be processed into a suitable, preferred image.

Description

Top-view image correction method for pedestrian re-identification, storage medium, and electronic device
Technical Field
The invention relates to the technical field of image processing and pedestrian recognition, and in particular to a top-view image correction method for pedestrian re-identification, a storage medium and an electronic device.
Background
At present, monitoring devices for personnel statistics are installed in many public places. For example, a camera for passenger-flow statistics is usually installed at the entrance of a shopping mall or store, which facilitates macroscopic regulation of the number of people entering and leaving, enables accurate matching of customer demands through big data, and improves the intelligence and convenience of the mall's or store's services.
The existing passenger-flow statistics camera is usually mounted vertically at the top of a building, either embedded or protruding, with its image-capturing direction perpendicular to the ground, so the image it captures is a top-view image of a fixed area. The effective body area of a pedestrian captured in such a top-view image is small, so the amount of body-feature information that can be visually obtained for the pedestrian is limited.
With the application and popularization of artificial intelligence, pedestrian re-identification technology has been applied to video surveillance: the moving images and trajectories of the same pedestrian under multiple cameras can be accurately found, which facilitates the monitoring and tracking of pedestrians in public places and greatly improves their safety. However, the existing passenger-flow statistics camera cannot be used by pedestrian re-identification technology, and additional video-surveillance cameras often have to be installed; the resulting clutter of cameras detracts from the overall appearance of the building, easily gives people a sense of oppression, and adds extra cost, which is not conducive to overall economy.
Therefore, in the existing field of image processing and pedestrian recognition technology, the top-view image cannot be effectively used as a model image sample input for pedestrian re-identification technology, that is, the top-view image cannot be processed into a suitable, preferred image, which limits the use of existing pedestrian re-identification methods.
Disclosure of Invention
The main object of the invention is to provide a top-view image correction method for pedestrian re-identification, a storage medium and an electronic device, so as to solve the problem that, in the existing field of image processing and pedestrian recognition, the top-view image cannot be effectively used as a model image sample input for pedestrian re-identification technology, that is, the top-view image cannot be processed into a suitable, preferred image, which limits the use of existing pedestrian re-identification methods.
In order to achieve the above object, according to one aspect of the present invention, a top-view image correction method for pedestrian re-identification is provided, comprising: step SD1, acquiring a video from a top-view angle with an image capturing device, and constructing a subset of top-view image blocks comprising a plurality of top-view image blocks containing the target pedestrian P, wherein each top-view image block is a crop from one frame of top-view image, at a different time point in the video, that contains the target pedestrian P; and step SD2, selectively transforming at least one top-view image block of the subset into a front-view image block through a projective transformation based on a homography matrix, so as to correct the target pedestrian P in the top-view image block from the top-view angle to a side-view angle.
Further, the projective transformation formulas for forming the front-view image block from the top-view image block include:

formula (1):

\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix} \begin{bmatrix} x_a \\ y_a \\ 1 \end{bmatrix}

formula (2):

x_b = \frac{x'}{z'} = \frac{t_{11} x_a + t_{12} y_a + t_{13}}{t_{31} x_a + t_{32} y_a + t_{33}}

formula (3):

y_b = \frac{y'}{z'} = \frac{t_{21} x_a + t_{22} y_a + t_{23}}{t_{31} x_a + t_{32} y_a + t_{33}}

wherein, in formula (1), the matrix \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix} is denoted as the homography matrix T', (x_a, y_a) are the origin coordinates in the top-view image block, and (x_b, y_b) are the end-point coordinates in the front-view image block to which the origin coordinates (x_a, y_a) are projectively transformed.
Furthermore, the top-view image comprises a plurality of projection sub-regions distributed circumferentially around the midpoint of the top-view image, and the projection sub-regions correspond one-to-one with a plurality of projection transformation formulas having different homography matrices T'; in step SD2, after the position of the origin coordinates (x_a, y_a) of the top-view image block is matched to one of the plurality of projection sub-regions, the projection transformation formula corresponding to that position is selected for the projection transformation, so as to obtain the end-point coordinates (x_b, y_b) in the front-view image block.
Furthermore, each top-view image block is matched with exactly one projection sub-region, with the position of the foot center point of the target pedestrian P in the top-view image block used as the matching base point.
Furthermore, the circumferential angles of the projection subareas are equal, and the number of the projection subareas of the overlook image is greater than or equal to 3 and less than or equal to 9.
Furthermore, the plurality of projection subareas correspond to the plurality of front-view virtual cameras one by one, and the plurality of front-view virtual cameras correspond to the plurality of projection transformation formulas one by one.
Further, the origin coordinates (x_a, y_a) of all pixel points in the top-view image block are traversed to obtain the end-point coordinates (x_b, y_b), in one-to-one correspondence with the origin coordinates (x_a, y_a), that form the front-view image block.
Further, in step SD2, an optimal overlook pedestrian snapshot image block is screened out from a plurality of top view image blocks including the target pedestrian P, and a front view image block formed by performing projection transformation on the optimal overlook pedestrian snapshot image block is used as a standard pedestrian front view image block.
According to another aspect of the present invention, there is provided a storage medium which is a computer readable storage medium having stored thereon computer program instructions, wherein the program instructions, when executed by a processor, are adapted to implement the steps of the above-mentioned top-view image correction method.
According to another aspect of the present invention, there is provided an electronic apparatus including: the system comprises a processor, a memory, a communication element and a communication bus, wherein the processor, the memory and the communication element are communicated with each other through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the steps of the overlook image correction method.
By applying the technical solution of the invention, a top-view image block selected from the subset of top-view image blocks is subjected to a projective transformation based on the homography matrix to obtain a front-view image block; the target pedestrian P is thereby effectively converted from the top-view angle of the top-view image block to a side-view angle in the front-view image block, so that more effective information about the target pedestrian P can be obtained from the top-view image block, which facilitates subsequent feature extraction and feature comparison for the target pedestrian P and greatly improves the accuracy of pedestrian re-identification. Therefore, through this effective processing, top-view images can be used as model image sample input for pedestrian re-identification technology, improving the practicality of that technology.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 illustrates a flow diagram of a pedestrian re-identification method based on top-view images in accordance with an alternative embodiment of the present invention;
FIG. 2 illustrates a flow diagram of a pedestrian trajectory generation method based on top-view images in accordance with an alternative embodiment of the present invention;
fig. 3 is a flowchart showing the step SB2 of the pedestrian trajectory generation method based on the overhead image in fig. 2, in which the target pedestrian P among all the pedestrians is tracked and matched;
FIG. 4 illustrates a flow diagram of a top view image selection method for pedestrian re-identification in accordance with an alternative embodiment of the present invention;
FIG. 5 illustrates a flow diagram of a top-view image correction method of pedestrian re-identification in accordance with an alternative embodiment of the present invention;
fig. 6 shows one frame of top-view image, from a top-view image master set containing a target pedestrian P, in a top-view video captured by a vertically mounted image capturing device according to an alternative embodiment of the present invention;
FIG. 7 illustrates a top-view image block (or an optimal top-view pedestrian snapshot image block) containing the target pedestrian P, cropped from the top-view image of FIG. 6;
FIG. 8 illustrates a front-view image block (or a basic pedestrian front-view image block) obtained by projectively transforming the top-view image block of FIG. 7 in an alternative embodiment;
fig. 9 shows a front-view image block (or a basic pedestrian front-view image block) obtained by projectively transforming the top-view image block of fig. 7 in another alternative embodiment.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances in order to facilitate the description of the embodiments of the invention herein. Furthermore, the terms "comprises," "comprising," "includes," "including," "has," "having," and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In order to solve the problem that, in the prior art of image processing and pedestrian recognition, a top-view image cannot be effectively used as a model image sample input for pedestrian re-identification technology, that is, the top-view image cannot be processed into a suitable, preferred image, which limits the use of existing pedestrian re-identification methods, the present invention provides a top-view image correction method for pedestrian re-identification, a storage medium and an electronic device. The storage medium is a computer-readable storage medium on which computer program instructions are stored; when executed by a processor, the program instructions implement the steps of the pedestrian re-identification method based on top-view images, the pedestrian trajectory generation method based on top-view images, the top-view image selection method for pedestrian re-identification, or the top-view image correction method for pedestrian re-identification. The electronic device comprises a processor, a memory, a communication element and a communication bus, the processor, the memory and the communication element communicating with one another through the communication bus; the memory stores at least one executable instruction that causes the processor to execute the steps of the pedestrian re-identification method based on top-view images, the pedestrian trajectory generation method based on top-view images, the top-view image selection method for pedestrian re-identification, or the top-view image correction method for pedestrian re-identification.
By applying the technical solution of the invention, the top-view image can be analyzed and exploited in depth, useful pedestrian information can be obtained from it, and the same pedestrian can be accurately matched across multiple frames of top-view images, so that pedestrian re-identification technology can be applied and accurate cross-camera monitoring and tracking of pedestrians can be achieved.
Specifically, fig. 1 is a flowchart of a pedestrian re-identification method based on top-view images according to an embodiment of the present invention. The pedestrian re-identification method comprises the following steps: step SA1, acquiring a video from a top-view angle with an image capturing device and extracting a top-view image master set of pedestrians in the video; step SA2, performing pedestrian detection on the top-view image master set with a CenterNet deep neural network model, tracking and matching the detection results to obtain the travel trajectory of a target pedestrian P among the pedestrians, and, according to the travel trajectory of the target pedestrian P, extracting a subset of top-view image blocks consisting of all top-view image blocks containing the target pedestrian P; step SA3, screening an optimal top-view pedestrian snapshot image block out of the subset of top-view image blocks and projectively transforming it so as to correct it into a basic pedestrian front-view image block; step SA4, performing key-point prediction on the basic pedestrian front-view image block with a human pose estimation neural network model, then performing image alignment to obtain a standard pedestrian front-view image block; step SA5, extracting features of the target pedestrian P in the standard pedestrian front-view image block with a pedestrian re-identification neural network model; step SA6, traversing a plurality of image capturing devices in turn, performing the operations of steps SA1 to SA5, and performing feature-similarity judgment on the features of the target pedestrian P extracted from each image capturing device. The target pedestrian P can then be accurately captured in the videos of the plurality of image capturing devices, and accurate cross-camera monitoring and tracking of the target pedestrian P is achieved.
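As an illustration of steps SA1 to SA6, the following Python sketch outlines the pipeline at a high level. Every helper name (detect_and_track, select_best_patch, and so on) is a hypothetical placeholder for the corresponding model or routine described in this text, not an existing API; the callables are injected so the outline stays runnable once concrete implementations are supplied.

```python
def reidentify_across_cameras(cameras, detect_and_track, extract_patches,
                              select_best_patch, homography_correct,
                              align_with_keypoints, extract_features,
                              similarity):
    """High-level sketch of steps SA1-SA6; all callables are placeholders."""
    per_camera_features = []
    for camera in cameras:
        frames = camera.read_top_view_video()           # SA1: top-view video
        tracks = detect_and_track(frames)               # SA2: CenterNet detection + tracking
        patches = extract_patches(frames, tracks)       # SA2: top-view image block subset
        best = select_best_patch(patches)               # SA3: optimal snapshot screening
        front = homography_correct(best)                # SA3: projective correction
        aligned = align_with_keypoints(front)           # SA4: pose keypoints + alignment
        per_camera_features.append(extract_features(aligned))   # SA5: re-id features
    # SA6: pairwise feature-similarity judgment across cameras
    return [[similarity(a, b) for b in per_camera_features] for a in per_camera_features]
```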
It should be noted that the technical solution of the invention processes and analyzes top-view images acquired by a particular kind of image capturing device, namely one mounted vertically on a building. The image capturing device of the invention may be an electronic device such as a camera or video camera, for example a camera embedded in the ceiling of a building or a camera exposed below the ceiling.
Vertical mounting means that the shooting center line of the image capturing device is perpendicular to the ground, and the top-view image it shoots is taken outward around that center line. Such a vertically oriented device is easier to install discreetly, which improves the appearance of the building, and it can still record video and perform bidirectional people counting. In addition, the video shot by the vertically mounted device is at a top-view angle, so all top-view images in the resulting top-view image master set are also at a top-view angle; the pedestrian re-identification method can then process these typical top-view images and, by combining multiple image capturing devices, finally achieve pedestrian re-identification. Of course, top-view images captured by an image capturing device that is not mounted vertically can still be analyzed and processed with the technical solution of the invention: when the device is mounted obliquely, that is, when its shooting center line forms an angle with the ground, the top-view images it captures can also be processed by the pedestrian re-identification method of the invention.
The preferred embodiment of the invention is explained only with reference to top-view images taken by a passenger-flow camera mounted vertically at the top of a building. Fig. 6 shows a typical top-view image taken by such a camera; the portion framed by the black box in the top-view image is the target pedestrian P.
As shown in fig. 1, with respect to step SA2 of the pedestrian re-identification method based on top-view images, the invention further provides a pedestrian trajectory generation method based on top-view images. As shown in fig. 2, the pedestrian trajectory generation method comprises: step SB1, performing pedestrian detection on the top-view image master set with the CenterNet deep neural network model, traversing all pedestrians in each top-view image of the master set, and obtaining a detection result for each pedestrian, the detection result comprising, in the top-view image, the position of the pedestrian's head center point and the position and size of the rectangular frame circumscribing the pedestrian's head; step SB2, tracking and matching the detection results of all pedestrians, classifying the detection results belonging to the target pedestrian P, and obtaining the travel trajectory corresponding to the target pedestrian P. Because the head is the most completely preserved pedestrian feature in a top-view image, is the least likely to be occluded, and is little affected by other body features, the technical solution of the invention selects the position of the pedestrian's head center point and the position and size of the head-circumscribing rectangular frame as the basis for tracking and matching; this effectively distinguishes multiple pedestrians across the frames of the top-view image master set, captures the target pedestrian P more accurately, and ensures that the travel trajectory of the target pedestrian P is real and valid. That is to say, with this pedestrian trajectory generation method, the frames of the top-view image master set can be detected and different pedestrians can be stably matched and classified according to the detection results, so that the travel trajectory of each pedestrian within the shooting area of the image capturing device can be obtained, which in turn facilitates the subsequent accurate locking and acquisition of the travel trajectory of the target pedestrian P within that shooting area. With this scheme, top-view images shot by a device such as a passenger-flow statistics camera can be fully utilized as image samples for acquiring pedestrian trajectories in pedestrian re-identification for video surveillance, so that reliable pedestrian trajectories can be generated.
A technical solution of the pedestrian trajectory generation method based on top-view images is explained in further detail below. Optionally, the top-view image master set contains n frames of top-view images in total, where n is a positive integer greater than or equal to 2. As shown in fig. 3, the tracking and matching of the detection results of the target pedestrian P among all pedestrians in step SB2 comprises:
Step SB21, sort the n frames of top-view images chronologically and detect them in order until the m-th frame of top-view image, in which the target pedestrian P appears for the first time, is obtained, where m + 1 ≤ n;
Step SB22, predict the head center point position of the target pedestrian P in the (m+1)-th frame of top-view image by the optical flow method;
Step SB23, using the circumscribing rectangular frame at the predicted head center point position of the target pedestrian P in the (m+1)-th frame of top-view image, perform IOU matching calculations in turn with the head-circumscribing rectangular frames of all pedestrians detected in the (m+1)-th frame of top-view image, obtain at least one IOU matching result, and take the maximum of the IOU matching results as the matching value of the (m+1)-th frame of top-view image; the circumscribing rectangular frame at the predicted head center point position of the target pedestrian P in the (m+1)-th frame is carried over from the head-circumscribing rectangular frame of the target pedestrian P in the m-th frame of top-view image.
Step SB24, compare the matching value of the (m+1)-th frame of top-view image with the standard matching determination value;
when the matching value of the (m+1)-th frame of top-view image is greater than or equal to the standard matching determination value, determine that the head-circumscribing rectangular frame of the target pedestrian P is matched in the (m+1)-th frame, apply the head center point position of the target pedestrian P matched in the (m+1)-th frame and its head-circumscribing rectangular frame, and update the travel trajectory of the target pedestrian P in the (m+1)-th frame of top-view image; or
when the matching value of the (m+1)-th frame of top-view image is smaller than the standard matching determination value, determine that the head-circumscribing rectangular frame of the target pedestrian P is not matched in the (m+1)-th frame, apply the head center point position of the target pedestrian P predicted in step SB22, and update the travel trajectory of the target pedestrian P in the (m+1)-th frame of top-view image;
Step SB25, repeat steps SB22 to SB24 until all top-view images after the m-th frame have been traversed, and acquire the travel trajectory of the target pedestrian P up to the n-th frame of top-view image.
According to the above-described operation steps, the travel locus of each pedestrian can be obtained quickly, stably, and conveniently, and of course, the travel locus corresponding to the target pedestrian P can be locked accurately.
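As a minimal illustration of the IOU matching in steps SB23 and SB24, the following Python sketch compares the predicted head box of the target pedestrian P against all detected head boxes in frame m+1 and keeps the best match only if it clears the standard matching determination value. The box representation and the 0.8 threshold follow the description above; the function and variable names are assumptions.

```python
import numpy as np

def iou(box_a, box_b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_predicted_head_box(predicted_box, detected_boxes, threshold=0.8):
    """Return (index of best-matching detection, matching value), or (None, value)
    when the best IOU falls below the standard matching determination value."""
    if not detected_boxes:
        return None, 0.0
    scores = [iou(predicted_box, box) for box in detected_boxes]
    best = int(np.argmax(scores))
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]
```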
It should be noted that the target pedestrian P is a specific pedestrian selected for the convenience of clearly explaining the technical solution of the present invention, and may be a general pedestrian in nature, that is, any pedestrian in the top view angle video acquired by the image capturing device.
Alternatively, the standard matching determination value is 0.75 or more and 0.85 or less. Preferably, the standard matching determination value is 0.8. When the standard matching determination value is this preferred value, the accuracy of the matching target pedestrian P in the overhead image achieves a good effect.
In step SB25, when the head-circumscribing rectangular frame of the target pedestrian P fails to match in consecutive frames a preset number of times, the target pedestrian P is determined to have disappeared. Optionally, the preset number of times is selected in the range of 36 to 48. In this way, when the target pedestrian P is determined to have left the shooting area of the image capturing device, the travel trajectory of the target pedestrian P ends at the last frame of top-view image in which the target pedestrian P was matched.
In step SB1 of the present invention, the output of the CenterNet deep neural network model includes: a low-resolution heat map describing the head center point of the target pedestrian in the top-view image and a low-resolution heat map describing the foot center point; and 6 regression parameters, which are respectively: the lateral offset dx of the head center point in the top-view image, the longitudinal offset dy of the head center point, the width w of the head-circumscribing rectangular frame, the height h of the head-circumscribing rectangular frame, the lateral offset fdx of the foot center point in the top-view image, and the longitudinal offset fdy of the foot center point; the low-resolution heat maps are formed by reducing the top-view image according to a preset scale.
In the present invention, the loss function of the CenterNet deep neural network model is defined as a linear combination of the Focal Loss function and the L1 loss function, where the L1 loss term covers the regression of the head center point position and the foot center point position of the target pedestrian P. The head center point referred to in the present invention is the geometric center point of the region of the top-view image occupied by the pedestrian's head; the foot center point is the midpoint of the line connecting the two geometric center points of the regions of the top-view image occupied by the pedestrian's feet.
Specifically, the L1 loss function includes the lateral offset dx of the head center point, the longitudinal offset dy of the head center point, the width w of the head-circumscribing rectangular frame, the height h of the head-circumscribing rectangular frame, the lateral offset fdx of the foot center point, and the longitudinal offset fdy of the foot center point of the target pedestrian P.
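As a sketch of such a linear combination of a focal loss on the heat maps and an L1 loss on the six regression parameters, the following Python function is written with NumPy under assumed hyper-parameters (alpha, beta and the regression weight are not specified in the text):

```python
import numpy as np

def centernet_style_loss(pred_heatmaps, gt_heatmaps, pred_regs, gt_regs, reg_mask,
                         alpha=2.0, beta=4.0, reg_weight=1.0):
    """Focal loss on head/foot heat maps plus L1 loss on the six regression
    channels (dx, dy, w, h, fdx, fdy); a sketch, not the patented training code."""
    p = np.clip(pred_heatmaps, 1e-6, 1.0 - 1e-6)
    pos = gt_heatmaps == 1.0
    neg = ~pos
    focal = -(np.sum(((1.0 - p[pos]) ** alpha) * np.log(p[pos])) +
              np.sum(((1.0 - gt_heatmaps[neg]) ** beta) * (p[neg] ** alpha) *
                     np.log(1.0 - p[neg])))
    focal /= max(pos.sum(), 1)
    # L1 term only on locations where a ground-truth object exists (reg_mask == 1)
    l1 = np.sum(np.abs(pred_regs - gt_regs) * reg_mask) / max(reg_mask.sum(), 1)
    return focal + reg_weight * l1
```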
It should be noted that the CenterNet deep neural network model of the present invention has a relatively complex model structure and strong expressive capability, and can mine abundant valuable information from massive data. The CenterNet deep neural network model can distinguish the target pedestrian P from the other pedestrians across multiple frames of top-view images, and therefore more effective features can be extracted.
In the detection result of the target pedestrian P of the present invention, the step of acquiring the head center point position of the pedestrian in the top-view image includes: (1) enlarging the low-resolution heat map of the head center point back to the original size of the top-view image according to the preset scale, to obtain the initial position of the head center point at the original image resolution; (2) correcting the initial position of the head center point by the lateral offset dx and the longitudinal offset dy of the head center point, to obtain the head center point position of the pedestrian in the top-view image.
Preferably, the preset scale between the low-resolution heat map and the top-view image is 4, that is, the low-resolution heat map is formed by reducing the resolution of the top-view image by a factor of 4, and it is restored to the original size of the top-view image after being enlarged by a factor of 4.
In the detection result of the target pedestrian P of the present invention, the step of acquiring the position and size of the circumscribed rectangle frame of the head of the pedestrian in the overhead view image includes: the head center point position of the pedestrian is used as a center origin, the width w of the head external rectangular frame and the height h of the head external rectangular frame are respectively used as the width and the height of the head external rectangular frame, and the position and the size of the head external rectangular frame of the pedestrian in the overlooking image are obtained. In this way, the head of the target pedestrian P can be accurately placed completely within the head-circumscribing rectangular frame.
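The following Python sketch illustrates this decoding of the head detection: take the peak of the low-resolution head heat map, map it back to the original resolution with the assumed preset scale of 4, refine it with dx/dy, and build the head-circumscribing rectangular frame from w and h. The layout of the regression tensor is an assumption.

```python
import numpy as np

def decode_head_detection(head_heatmap, regs, stride=4):
    """Decode one head detection from a low-resolution heat map and a regression
    tensor assumed to hold channels (dx, dy, w, h, fdx, fdy) at heat-map resolution."""
    cy, cx = np.unravel_index(np.argmax(head_heatmap), head_heatmap.shape)
    head_score = float(head_heatmap[cy, cx])          # peak value = head center point score
    dx, dy, w, h, fdx, fdy = regs[:, cy, cx]
    # initial position at original resolution, then offset correction
    head_x = cx * stride + dx
    head_y = cy * stride + dy
    # head-circumscribing rectangular frame centered on the head center point
    box = (head_x - w / 2.0, head_y - h / 2.0, head_x + w / 2.0, head_y + h / 2.0)
    return head_score, (head_x, head_y), box, (fdx, fdy)
```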
As shown in fig. 1, for step SA3 in the method for pedestrian re-identification based on top view image of the present invention, the present invention further provides a top view image selecting method for pedestrian re-identification, as shown in fig. 4, the method for selecting top view image for pedestrian re-identification includes:
step SC1, acquiring a video of an overlooking visual angle through an image capturing device, and extracting an overlooking image master set of pedestrians in the video;
step SC2, pedestrian detection is carried out on the overlook image mother set through the CenterNet depth neural network model, the detection result is tracked and matched to obtain the advancing track of the target pedestrian P in the pedestrian, and the overlook image subsets of all overlook image blocks containing the target pedestrian P are extracted according to the advancing track of the target pedestrian P;
step SC3, for each overlook image block, calculating the head center point score and the foot center point score of the target pedestrian P according to the detection result of the target pedestrian P contained in the overlook image block;
and step SC4, taking the detection result and/or the head center point score and foot center point score of the target pedestrian P as influencing factors, selecting, according to the screening conditions and the selection condition, one of the plurality of top-view image blocks of the top-view image subset as the optimal top-view pedestrian snapshot image block of the target pedestrian P, and taking the selected top-view image block as the basis for feature extraction of the target pedestrian P.
In the invention, the head center point and the foot center point of the pedestrian in different top-view image blocks are scored to obtain the head center point score and the foot center point score; the two scores are used to screen out redundant top-view image blocks in the top-view image subset and serve as influencing factors for selecting one top-view image block as the optimal top-view pedestrian snapshot image block. In this way, top-view image blocks in the subset that do not meet the requirements can be eliminated efficiently and accurately, the optimal top-view pedestrian snapshot image block is obtained, the reliability of subsequent feature extraction for the corresponding pedestrian is ensured, and the accuracy of pedestrian re-identification is greatly improved. Therefore, through this effective processing, top-view images can be used as model image sample input for pedestrian re-identification technology, improving the practicality of that technology.
In the method for selecting the overlook image for pedestrian re-identification, the detection result comprises the following steps: in the overhead image, the head center point position and the foot center point position of the target pedestrian P, and the position and the size of the head-circumscribing rectangular frame of the target pedestrian P. In the present invention, the circumscribed rectangular frame for framing the head of the target pedestrian P is not limited to a rectangle, but may be any quadrangle, and is preferably a rectangle in the present application.
In order to effectively screen out, among the plurality of top-view image blocks of the top-view image subset, those that do not meet the conditions for selecting the optimal top-view pedestrian snapshot image block, the screening conditions include:
(1) to avoid errors when cropping the top-view image block later due to occlusion or false detections, a top-view image block is eliminated when the head center point of the target pedestrian P in it is closer than the foot center point to the center point of the top-view image in which it lies; and/or
(2) to prevent the pedestrian's head from being incomplete after the top-view image block is cropped, a top-view image block is eliminated when the distance between the head-circumscribing rectangular frame of the target pedestrian P and the edge of the top-view image in which it lies is smaller than a preset edge distance; and/or
(3) to prevent the pedestrian's body from being insufficiently stretched out in the top-view image block, a top-view image block is eliminated when the distance between the head center point of the target pedestrian P and the center point of the top-view image in which it lies is smaller than a preset approach distance; and/or
(4) to effectively eliminate top-view image blocks in which the pedestrian's feet are occluded, a top-view image block is eliminated when the foot center point score of the target pedestrian P in it is smaller than a first preset score; and/or
(5) to effectively eliminate top-view image blocks in which the pedestrian's head is occluded, a top-view image block is eliminated when the head center point score of the target pedestrian P in it is smaller than a second preset score;
similarly, in order to further accurately acquire a unique optimal top-view pedestrian snapshot image block among the plurality of top-view image blocks of the top-view image subset, the selection condition is: arrange the top-view image blocks remaining after the screening conditions in descending order of the distance between the head center point position and the foot center point position of the target pedestrian P on the top-view image block, and preferentially select the first top-view image block in the order as the optimal top-view pedestrian snapshot image block. In this way, the selected optimal top-view pedestrian snapshot image block shows the target pedestrian P with the most stretched-out human figure.
Optionally, the preset edge distance is greater than or equal to 20 pixels and less than or equal to 30 pixels; the preset approach distance is more than or equal to 80 pixels and less than or equal to 120 pixels; the first preset score is greater than or equal to 0.45 and less than or equal to 0.55; the second predetermined score is 0.75 or more and 0.85 or less.
Preferably, the preset edge distance is 20 pixels, and the preset approach distance is 100 pixels; the first predetermined score is 0.5 and the second predetermined score is 0.8. The setting of the parameters is beneficial to improving the screening efficiency of the redundant overlook image blocks.
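The following Python sketch implements these screening conditions and the selection condition for illustration; each candidate is assumed to be a dict with keys head, foot, box, image_center, image_size, head_score and foot_score, and the field names as well as the fixed thresholds (20 px, 100 px, 0.5, 0.8, taken from the preferred values above) are assumptions.

```python
import math

def select_best_snapshot(candidates, edge_margin=20, near_dist=100,
                         foot_score_min=0.5, head_score_min=0.8):
    """Apply screening conditions (1)-(5), then pick the remaining top-view image
    block with the largest head-to-foot distance (the most stretched-out pedestrian)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    kept = []
    for c in candidates:
        cx, cy = c["image_center"]
        w, h = c["image_size"]
        x1, y1, x2, y2 = c["box"]                         # head-circumscribing frame
        head_nearer_center = dist(c["head"], (cx, cy)) < dist(c["foot"], (cx, cy))
        box_near_edge = min(x1, y1, w - x2, h - y2) < edge_margin
        head_too_close = dist(c["head"], (cx, cy)) < near_dist
        low_foot_score = c["foot_score"] < foot_score_min
        low_head_score = c["head_score"] < head_score_min
        if not (head_nearer_center or box_near_edge or head_too_close
                or low_foot_score or low_head_score):
            kept.append(c)
    kept.sort(key=lambda c: dist(c["head"], c["foot"]), reverse=True)
    return kept[0] if kept else None
```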
It should be further added that the travel trajectory of the target pedestrian P includes at least one preferred trajectory segment, a segment extending from the outer edge of the top-view image toward the center point of the top-view image, and that the optimal top-view pedestrian snapshot image block is selected from the top-view image blocks corresponding to such a preferred trajectory segment. That is, while the target pedestrian P passes through the shooting area of the image capturing device, the optimal top-view pedestrian snapshot image block should as far as possible be selected from top-view image blocks in which the target pedestrian P is approaching the center point of the top-view image, and should as far as possible not be selected from top-view image blocks in which the target pedestrian P is moving away from the center point; an optimal top-view pedestrian snapshot image block selected under this condition can capture the front of the target pedestrian P, and front information is superior to back information.
As shown in fig. 6, the portion framed by the black box is the target pedestrian P, and the top-view image block of fig. 7 containing the target pedestrian P is obtained from it by rotation and cropping. In this top-view image block, the head center point position of the target pedestrian P is marked O1, the foot center point position is marked O2, and the head-circumscribing rectangular frame is marked K, the width of the frame being w and its height h.
Of course, in this embodiment, the top view image block including the target pedestrian P in fig. 7 may also be used as an optimal top view pedestrian snapshot image block in the top view image subset.
As shown in fig. 4, step SC3 specifically includes: acquiring the low-resolution heat map describing the head center point of the target pedestrian P, and taking the value of the peak point in that heat map as the head center point score of the target pedestrian P; and acquiring the low-resolution heat map describing the foot center point of the target pedestrian P, and taking the value, in that heat map, of the point corresponding to the foot center point position in the top-view image in which the target pedestrian P lies as the foot center point score of the target pedestrian P. This provides a concrete, efficient and accurate basis for acquiring the head center point score and the foot center point score.
As in the pedestrian trajectory generation method based on top-view images, the low-resolution heat maps are formed by reducing the top-view image according to a preset scale. In the detection result of the target pedestrian P according to the present invention, the step of acquiring the head center point position of the target pedestrian P in the top-view image includes: (1) enlarging the low-resolution heat map of the head center point back to the original size of the top-view image according to the preset scale, to obtain the initial position of the head center point at the original image resolution; (2) correcting the initial position of the head center point by the lateral offset dx and the longitudinal offset dy of the head center point, to obtain the head center point position of the target pedestrian P in the top-view image.
Preferably, the preset scale between the low-resolution heat map and the top-view image is 4, that is, the low-resolution heat map is formed by reducing the resolution of the top-view image by a factor of 4, and it is restored to the original size of the top-view image after being enlarged by a factor of 4.
Similarly, in the detection result of the target pedestrian P of the present invention, the step of acquiring the position and size of the circumscribed rectangle frame of the head of the target pedestrian P in the overhead view image includes: the head center point position of the target pedestrian P is used as a center origin, the width w of the head-external rectangular frame and the height h of the head-external rectangular frame are respectively used as the width and the height of the head-external rectangular frame, and the position and the size of the head-external rectangular frame of the target pedestrian P in the overlooking image are obtained.
In the detection result of the target pedestrian P according to the present invention, the step of acquiring the position of the foot center point of the target pedestrian P in the overhead view image includes: the initial head center point position is corrected by the lateral foot center point offset fdx and the longitudinal foot center point offset fdy, and the foot center point position of the target pedestrian P in the overhead view image is obtained. Thus, when the initial position of the head center point of the target pedestrian P is known, the foot center point position of the target pedestrian P can be accurately obtained by combining the lateral offset fdx of the foot center point and the longitudinal offset fdy of the foot center point, and then the foot center point position of the target pedestrian P is inversely calculated into the low-resolution heat map of the foot center point to obtain the value of the corresponding point, which becomes the foot center point score of the target pedestrian P.
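As a small illustration of this foot center point computation, the Python sketch below derives the foot center point from the initial head center point plus the offsets fdx/fdy and reads its score back from the low-resolution foot heat map; the preset scale of 4 and the array layout are assumptions.

```python
import numpy as np

def foot_center_and_score(foot_heatmap, head_cx, head_cy, fdx, fdy, stride=4):
    """head_cx/head_cy are the peak indices in the low-resolution head heat map;
    returns the foot center point at original resolution and its heat-map score."""
    # initial head center point at original resolution, corrected by fdx/fdy
    foot_x = head_cx * stride + fdx
    foot_y = head_cy * stride + fdy
    # map the foot center point back into heat-map resolution and clamp to bounds
    hx = int(np.clip(round(foot_x / stride), 0, foot_heatmap.shape[1] - 1))
    hy = int(np.clip(round(foot_y / stride), 0, foot_heatmap.shape[0] - 1))
    return (foot_x, foot_y), float(foot_heatmap[hy, hx])
```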
In order to ensure the score validity of the head center point of the target pedestrian P, that is, to ensure the presence of the target pedestrian in the overhead image, the value of the peak point in the low-resolution heatmap of the head center point is larger than the preset pedestrian determination value.
Alternatively, the preset pedestrian determination value is 0.75 or more and 0.85 or less. The pedestrian determination value is preferably 0.8.
As shown in fig. 1, with respect to step SA3 of the pedestrian re-identification method based on top-view images, the present invention further provides a top-view image correction method for pedestrian re-identification which corrects a top-view image block into a front-view image block. Of course, if an optimal top-view pedestrian snapshot image block has already been selected from the top-view image subset by the top-view image selection method described above, the correction method can further correct that optimal snapshot image block into a basic pedestrian front-view image block. Figs. 8 and 9 show the front-view image blocks (or basic pedestrian front-view image blocks) formed by correcting the image block of fig. 7 with the top-view image correction method of the present invention.
Specifically, the overhead view image correction method for pedestrian re-recognition includes:
step SD1, acquiring a video of an overhead view angle by an image capturing device, and constructing an overhead view image block subset including a plurality of overhead view image blocks of the target pedestrian P, where each overhead view image block is a screenshot on one frame of overhead view image at different time points of the video including the target pedestrian P (for example, as shown in fig. 7);
at step SD2, at least one of the top view image blocks in the subset of top view image blocks is selectively transformed into a front view image block (such as shown in fig. 8 and 9) by a homographic matrix-based projective transformation, so as to correct the target pedestrian P in the top view image block from the top view perspective to the side view perspective.
Selecting a plurality of top view image blocks in the top view image block subset, and then performing projection transformation based on a homography matrix to obtain a front view image block; the target pedestrian P in the front view image block is effectively converted into a side view angle from a top view angle in the top view image block, so that more effective information of the target pedestrian P can be acquired through the top view image block, the subsequent feature extraction and feature comparison on the target pedestrian P are facilitated, and the accuracy of pedestrian re-identification is greatly improved; therefore, through effective processing of the overlook images, the overlook images can be effectively used as model image sample input in the pedestrian re-identification technology, and the practicability of the pedestrian re-identification technology is improved.
It should be noted that the effect of correcting a top-view image block into a front-view image block with the top-view image correction method of the present invention depends on the image quality of the top-view image block and on the pose and occlusion of the pedestrian in it. The most ideal result is a front-view image block close to the standard pose of the target pedestrian P shown in fig. 9. To achieve such a correction, the shooting angle of view of the target pedestrian P in the top-view image block of fig. 7 is rotated and stretched in the horizontal direction; the rotation angle during this rotational stretching is preferably between 0° and 45°, because within this range the spurious image features of the target pedestrian P that are lost or introduced by the rotation can be neglected. That is, the image-feature error of the corrected front-view image block relative to the original top-view image block caused by the rotational stretching is kept as small as possible and does not affect the accuracy of the subsequent feature extraction results for the target pedestrian P. It should further be added that the spurious image features appearing during such rotational stretching can be selectively added or removed based on the image features of the target pedestrian P in the original top-view image.
In the embodiment of the present invention, as shown in fig. 8, when the shooting angle of the original top-view image block exceeds the 45° limit, the result achievable after correction does not reach the standard-pose front-view image block of the target pedestrian P shown in fig. 9; the corrected shooting direction still forms a certain angle with the horizontal direction. In this case the corrected image block is still called a front-view image block in the present invention, although the pose of the target pedestrian P is then not the pose that exhibits the largest area of the human body.
In an alternative embodiment of the present invention, not shown in the drawings, the top-view image correction method of the present invention may also rotate and stretch the shooting angle of the target pedestrian P in the top-view image block of fig. 7 about the vertical axis; the rotation angle about the vertical axis is preferably 0° to 15°, likewise so that the spurious image features of the target pedestrian P lost or introduced by the rotation can be neglected. That is, the image-feature error of the corrected front-view image block relative to the original top-view image block caused by the rotational stretching is kept as small as possible and does not affect the accuracy of the subsequent feature extraction results for the target pedestrian P.
In the present invention, the projective transformation formulas for forming the front-view image block from the top-view image block include:

formula (1):

\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix} \begin{bmatrix} x_a \\ y_a \\ 1 \end{bmatrix}

formula (2):

x_b = \frac{x'}{z'} = \frac{t_{11} x_a + t_{12} y_a + t_{13}}{t_{31} x_a + t_{32} y_a + t_{33}}

formula (3):

y_b = \frac{y'}{z'} = \frac{t_{21} x_a + t_{22} y_a + t_{23}}{t_{31} x_a + t_{32} y_a + t_{33}}

wherein, in formula (1), the matrix \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix} is denoted as the homography matrix T', (x_a, y_a) are the origin coordinates in the top-view image block, and (x_b, y_b) are the end-point coordinates in the front-view image block to which the origin coordinates (x_a, y_a) are projectively transformed. Thus, by traversing the origin coordinates (x_a, y_a) of all pixel points in the top-view image block with these projective transformation formulas, the end-point coordinates (x_b, y_b) in one-to-one correspondence with the origin coordinates (x_a, y_a) are obtained, and the plurality of end-point coordinates (x_b, y_b) form the front-view image block.
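For illustration, the following Python/OpenCV sketch applies such a homography: project_point applies formulas (2) and (3) to a single origin coordinate, while correct_top_view_patch warps a whole top-view image block into a front-view image block. The output size is an assumption, and the 3x3 matrix is whichever calibrated T' applies to the block.

```python
import cv2
import numpy as np

def project_point(T_prime, xa, ya):
    """Formulas (2) and (3): map origin coordinates (x_a, y_a) through the
    homography matrix T' to end-point coordinates (x_b, y_b)."""
    x, y, z = np.asarray(T_prime, dtype=np.float64) @ np.array([xa, ya, 1.0])
    return x / z, y / z

def correct_top_view_patch(top_view_patch, T_prime, out_size=(128, 256)):
    """Warp every pixel of the top-view image block into the front-view image block;
    cv2.warpPerspective applies T' as the forward mapping from input to output."""
    T = np.asarray(T_prime, dtype=np.float64).reshape(3, 3)
    return cv2.warpPerspective(top_view_patch, T, out_size)
```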
In the present invention, the projection area of the top-view image is divided; that is, the top-view image comprises a plurality of projection sub-regions distributed circumferentially around the midpoint of the top-view image, and the projection sub-regions correspond one-to-one with a plurality of projection transformation formulas having different homography matrices T'. The projection sub-regions also correspond one-to-one with a plurality of front-view virtual cameras, and the front-view virtual cameras correspond one-to-one with the projection transformation formulas; that is, each homography matrix T' represents the projection matrix that transforms the image capturing device to its corresponding front-view virtual camera within the angular range of a certain sector.
In step SD2, after the position of the origin coordinates (x_a, y_a) of the top-view image block is matched to one of the plurality of projection sub-regions, the projection transformation formula corresponding to that position is selected for the projection transformation, so that the end-point coordinates (x_b, y_b) in the front-view image block can be obtained.
It should be noted that the homography matrices T' are obtained by performing a Zhang's checkerboard calibration of the image capturing device. Each homography matrix T' is composed of the intrinsic and extrinsic parameters of the image capturing device; in this application environment the intrinsic parameters of the image capturing device do not change, while the extrinsic parameters (namely the rotation and translation between the image capturing device and the front-view virtual camera) differ for each front-view virtual camera. Therefore, in practical application, the image is projected to the corresponding front-view virtual camera according to the foot center point of the target pedestrian P, and the corresponding homography matrix T' is used for the calculation.
That is, each overhead view image block is matched with only one projection subarea, and the position of the foot center point of the target pedestrian P in the overhead view image block is used as a matching base point.
Optionally, the circumferential angles of the projection sub-regions are equal, and the number of projection sub-regions into which the top-view image is divided is greater than or equal to 3 and less than or equal to 9, preferably 6. This balances cost and computational effort while ensuring that the divided projection sub-regions are not so sparse that the correction is affected. Of course, if cost and the amount of calculation are disregarded, the more projection sub-regions the top-view image is divided into, the better.
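As a sketch, under the assumption of 6 equal-angle sub-regions, the sub-region (and hence the homography T' and front-view virtual camera) for a given image block can be looked up from the foot center point; the function and variable names are illustrative.

```python
import math

def select_subregion(foot_xy, image_center, num_regions: int = 6) -> int:
    """Return the index of the equal-angle projection sub-region, distributed
    around the top-view image midpoint, that contains the foot center point."""
    dx = foot_xy[0] - image_center[0]
    dy = foot_xy[1] - image_center[1]
    angle = math.atan2(dy, dx) % (2.0 * math.pi)
    return int(angle // (2.0 * math.pi / num_regions))

# The returned index selects the corresponding homography T' and front-view virtual camera.
region = select_subregion(foot_xy=(412.0, 655.0), image_center=(640.0, 360.0))
```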
So that the steps of the pedestrian re-identification method remain coherent and the previously selected optimal top-view pedestrian snapshot image block can be corrected smoothly, in step SD2 the optimal top-view pedestrian snapshot image block is screened out from the plurality of top-view image blocks containing the target pedestrian P, and the front-view image block formed by projectively transforming the optimal top-view pedestrian snapshot image block is used as the basic pedestrian front-view image block.
In the invention, in step SA4 of the pedestrian re-identification method based on top-view images, predicting key points of the basic pedestrian front-view image block by the human body posture estimation neural network model includes: step SA41, after the basic pedestrian front-view image block is input into the human body posture estimation neural network model, a plurality of human body key point heat maps are calculated; step SA42, the coordinates of the maximum-value position of each human body key point heat map are obtained and mapped back to the original resolution of the basic pedestrian front-view image block, giving the coordinates (X_a, Y_a), on the basic pedestrian front-view image block, of the plurality of human body key points A of the target pedestrian P corresponding one-to-one to the human body key point heat maps.
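A minimal sketch of steps SA41-SA42, assuming the pose network outputs a (17, Hh, Wh) array of heat maps; the helper name and shapes are assumptions rather than the patent's implementation.

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps: np.ndarray, block_h: int, block_w: int):
    """Take the maximum-value position of each key point heat map (step SA41 output)
    and map it back to the original resolution of the image block (step SA42)."""
    coords, scores = [], []
    num, hm_h, hm_w = heatmaps.shape
    for hm in heatmaps:
        flat_idx = int(np.argmax(hm))
        yh, xh = divmod(flat_idx, hm_w)
        coords.append((xh * block_w / hm_w, yh * block_h / hm_h))  # (X_a, Y_a)
        scores.append(float(hm[yh, xh]))  # later used as the confidence score
    return coords, scores
```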
Further, in step SA4, the image alignment includes: step SA43, on a given pedestrian matching image, a plurality of human body matching key points B corresponding one-to-one to the plurality of human body key points A are calibrated, and their respective coordinates (X_b, Y_b) are obtained; step SA44, for the plurality of human body key points A, 3 first effective key points A1 are obtained according to a first preset selection principle, together with their respective coordinates (X_a1, Y_a1); for the plurality of human body matching key points B, 3 second effective key points B1 are obtained according to a second preset selection principle, together with their respective coordinates (X_b1, Y_b1); step SA45, an affine transformation expression from the basic pedestrian front-view image block to the standard pedestrian front-view image block is constructed:

$$\begin{bmatrix} X_{b1} \\ Y_{b1} \end{bmatrix} = \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \end{bmatrix} \begin{bmatrix} X_{a1} \\ Y_{a1} \\ 1 \end{bmatrix}$$

wherein

$$\begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \end{bmatrix}$$

is the affine transformation matrix T; step SA46, the affine transformation matrix T is solved from the coordinates (X_a1, Y_a1) of the 3 first effective key points A1 and the coordinates (X_b1, Y_b1) of the 3 second effective key points B1; step SA47, the affine transformation formula is applied to the basic pedestrian front-view image block, traversing all of its pixel points, to obtain the standard pedestrian front-view image block. Because the human body is a three-dimensional body and can present multiple postures at the same time, the human body key point positions must be used to further perform the image alignment operation; with the image alignment of the invention, the influence on the geometric information of the human face is small, and the accuracy of face recognition is guaranteed.
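A hedged sketch of steps SA45-SA47 with OpenCV follows; the coordinate values and the zero-filled stand-in block are illustrative assumptions, not values from the patent.

```python
import cv2
import numpy as np

# Illustrative coordinates standing in for step SA44's outputs.
A1 = np.float32([[64.0, 40.0], [58.0, 120.0], [90.0, 250.0]])   # 3 first effective key points A1
B1 = np.float32([[64.0, 48.0], [64.0, 128.0], [96.0, 256.0]])   # 3 second effective key points B1

T = cv2.getAffineTransform(A1, B1)    # solve the 2x3 affine transformation matrix T (steps SA45-SA46)

basic_block = np.zeros((384, 128, 3), np.uint8)               # stand-in for the basic front-view image block
standard_block = cv2.warpAffine(basic_block, T, (128, 384))   # step SA47: warp all pixel points
```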
It should be noted that the maximum value of each human body key point heat map is the confidence score of the corresponding human body key point A, and that the plurality of human body key points A number 17 in total, including: 1 nose point location, 2 eye point locations, 2 ear point locations, 2 shoulder point locations, 2 elbow point locations, 2 hand point locations, 2 hip point locations, 2 knee point locations, and 2 foot point locations of the target pedestrian P;
the first preset selection principle comprises the following steps:
(1) the geometric midpoint obtained by averaging the coordinates of the 1 nose point location, the 2 eye point locations and the 2 ear point locations is taken as the No. 1 first effective key point A1;
(2) the 2 shoulder point locations, 2 hip point locations, 2 knee point locations and 2 foot point locations are sorted in descending order of confidence score, and the top-ranked point location is taken as the No. 2 first effective key point A1;
(3) from the remaining 11 human body key points A, excluding the 1 nose point location, the 2 eye point locations, the 2 ear point locations and the No. 2 first effective key point A1, 1 human body key point A that is not collinear with the line connecting the No. 1 first effective key point A1 and the No. 2 first effective key point A1 is selected at random as the No. 3 first effective key point A1 (a sketch of this selection principle follows the list).
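The sketch below assumes the common 17-point index order (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles), which is an assumption, and replaces the random choice in rule (3) with the first non-collinear candidate for reproducibility.

```python
import numpy as np

HEAD_IDS = [0, 1, 2, 3, 4]                       # nose, eyes, ears (assumed index order)
BODY_IDS = [5, 6, 11, 12, 13, 14, 15, 16]        # shoulders, hips, knees, feet (assumed)

def first_effective_keypoints(coords: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """coords: (17, 2) human body key points A; scores: (17,) heat map maxima."""
    a1_no1 = coords[HEAD_IDS].mean(axis=0)                    # rule (1): head geometric midpoint
    best = max(BODY_IDS, key=lambda i: scores[i])
    a1_no2 = coords[best]                                     # rule (2): highest-confidence body point
    candidates = [i for i in range(17) if i not in HEAD_IDS and i != best]
    for i in candidates:                                      # rule (3): a non-collinear key point
        v1 = a1_no2 - a1_no1
        v2 = coords[i] - a1_no1
        if abs(v1[0] * v2[1] - v1[1] * v2[0]) > 1e-6:         # 2-D cross product test
            return np.float32([a1_no1, a1_no2, coords[i]])
    raise ValueError("no non-collinear human body key point A found")
```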
To make it convenient to obtain the second effective key points B1, optionally, the second preset selection principle for obtaining the 3 second effective key points B1 from the plurality of human body matching key points B corresponds to the first preset selection principle.
In step SA5 of the pedestrian re-identification method based on the top view image, the standard pedestrian front view image block is adjusted to the input image block with the preset resolution and then input into the pedestrian re-identification neural network model.
The preset resolution is preferably 128 × 384.
Further, in step SA6, the feature similarity determination is performed as: respectively acquiring two output characteristic vectors of any two input image blocks of an input pedestrian re-recognition neural network model; calculating cosine distances of the two output characteristic vectors, and when the cosine distances are larger than a standard distance judgment value, judging that the pedestrians in the two input image blocks are both target pedestrians P; and when the cosine distance is smaller than the standard distance judgment value, judging that the pedestrians in the two input image blocks are different human bodies.
Optionally, the standard distance determination value is greater than or equal to 0.78 and less than or equal to 0.82; preferably, the standard distance determination value is 0.78.
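A hedged sketch of the step SA6 decision; note that what the text calls the cosine distance is computed here as the cosine of the angle between the two output feature vectors, and the threshold default follows the preferred value above.

```python
import numpy as np

def same_pedestrian(feat1: np.ndarray, feat2: np.ndarray, threshold: float = 0.78) -> bool:
    """Step SA6 decision: True when both input image blocks are judged to show
    the target pedestrian P, False when they are judged to be different persons."""
    cos = float(np.dot(feat1, feat2) / (np.linalg.norm(feat1) * np.linalg.norm(feat2)))
    return cos > threshold
```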
The low-resolution heat map referred to in the present invention is a heat map, commonly used in the art, whose resolution is reduced relative to the original image.
The technical scheme of the invention has the following beneficial effects:
1. The edge images within the viewing-angle range (shooting area) of the downward-looking camera (the preferred image capturing device in the invention) and the projection matrices of the downward-looking camera are used to transform the radially distributed top-view image blocks in all directions into front-view image blocks with a side-view-like effect for extracting ReID information, which is equivalent to adding a ReID-information output function to the existing downward-looking camera.
2. Each pedestrian in the field of view of the downward-looking camera is tracked, the optimal top-view pedestrian snapshot image block is selected, and its ReID information is extracted. This ensures that one optimal snapshot is output for every pedestrian who passes through the field of view of the downward-looking camera.
The invention can use the existing vertical downward-looking cameras densely installed in public places such as shopping malls to detect and track pedestrians within the field of view, select the best pedestrian snapshot from the edge-view-angle images, restore that best snapshot to the side-view direction through distortion correction and projection calculation, and extract the pedestrian's ReID information, thereby realizing cross-camera pedestrian tracking among downward-looking cameras. In addition, because the ReID information extracted by this scheme has been restored to the side-view direction, it can be matched with ReID information acquired by forward-facing devices such as the cameras used in classical monitoring scenes, so conventional ReID matching methods can be used directly. The scheme extends the existing vertical downward-looking camera with the function of providing pedestrian ReID information and makes use of the densely installed vertical downward-looking cameras. Furthermore, by analyzing and counting the trajectories of many pedestrians across many points, more useful information such as strolling trajectories, regional dwell time, and shop association can be provided to the user.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of correcting a top view image for re-recognition of a pedestrian, comprising:
step SD1, acquiring a video of an overhead view angle by an image capturing device, and constructing an overhead view image block subset including a plurality of overhead view image blocks of a target pedestrian P, wherein each of the overhead view image blocks is a screenshot on one frame of overhead view image at different time points including the target pedestrian P in the video;
step SD2, selectively forming at least one of the top view image blocks in the subset of top view image blocks into a front view image block by a projection transform based on a homography matrix to correct the target pedestrian P in the top view image block from a top view perspective to a side view perspective.
2. The method of correcting a top view image according to claim 1, wherein the projective transformation formula for forming the front view image block from the top view image block comprises:
formula (1):

$$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix} \begin{bmatrix} x_a \\ y_a \\ 1 \end{bmatrix}$$

formula (2):

$$x_b = \frac{x_c}{z_c}$$

formula (3):

$$y_b = \frac{y_c}{z_c}$$

wherein, in formula (1), the matrix

$$\begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix}$$

is denoted as the homography matrix T'; (x_a, y_a) are the coordinates of an origin point in said top view image block, and (x_b, y_b) are the coordinates of the end point in said front view image block to which the origin point (x_a, y_a) is projectively transformed.
3. The method for correcting a top view image according to claim 2,
the overhead view image comprises a plurality of projection subareas which are circumferentially distributed by taking the midpoint of the overhead view image as the center, and the projection subareas correspond to a plurality of projection transformation formulas with different homography matrixes T' one by one;
in step SD2, the origin coordinates (x_a, y_a) in said top view image block are matched against the positions of said plurality of projection subareas, and the projective transformation formula corresponding to the matched projection subarea is selected for projective transformation, thereby obtaining the end point coordinates (x_b, y_b) in said front view image block.
4. A top view image correction method according to claim 3, wherein each of said top view image blocks is matched to only one of said projection subareas, and the position of the foot center point of said target pedestrian P in said top view image block is used as the matching base point.
5. A top-view image correction method according to claim 3, wherein the projection subareas have equal circumferential angles, and the number of projection subareas into which the top-view image is divided is 3 or more and 9 or less.
6. The method according to claim 3, wherein the plurality of projection subareas correspond to a plurality of front-view virtual cameras one-to-one, and the plurality of front-view virtual cameras correspond to a plurality of projection transformation formulas one-to-one.
7. The method of any of claims 2 to 6, wherein the origin coordinates (x_a, y_a) of all pixel points in the top view image block are traversed to obtain the end point coordinates (x_b, y_b), corresponding one-to-one to the origin coordinates (x_a, y_a), that form the front view image block.
8. The method according to claim 1, wherein in step SD2, an optimal top view pedestrian snapshot image block is selected from a plurality of top view image blocks containing the target pedestrian P, and the front view image block formed by performing projection transformation on the optimal top view pedestrian snapshot image block is used as a standard pedestrian front view image block.
9. A storage medium, characterized in that the storage medium is a computer-readable storage medium, on which computer program instructions are stored, wherein the program instructions, when executed by a processor, are adapted to implement the steps of the top view image correction method according to any one of claims 1-8.
10. An electronic device, comprising: the system comprises a processor, a memory, a communication element and a communication bus, wherein the processor, the memory and the communication element are communicated with each other through the communication bus; the memory is configured to store at least one executable instruction that causes the processor to perform the steps of the top view image correction method of any of claims 1-8.
CN202110262877.8A 2021-03-11 2021-03-11 Overlook image correction method for pedestrian re-recognition, storage medium, and electronic device Withdrawn CN113033348A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110262877.8A CN113033348A (en) 2021-03-11 2021-03-11 Overlook image correction method for pedestrian re-recognition, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110262877.8A CN113033348A (en) 2021-03-11 2021-03-11 Overlook image correction method for pedestrian re-recognition, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
CN113033348A true CN113033348A (en) 2021-06-25

Family

ID=76469622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110262877.8A Withdrawn CN113033348A (en) 2021-03-11 2021-03-11 Overlook image correction method for pedestrian re-recognition, storage medium, and electronic device

Country Status (1)

Country Link
CN (1) CN113033348A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102834845A (en) * 2010-03-26 2012-12-19 阿尔卡特朗讯 Method and arrangement for multi-camera calibration
CN103997638A (en) * 2014-05-30 2014-08-20 天津大学 Matrix type camera array multi-view image correction method
CN111630862A (en) * 2017-12-15 2020-09-04 奥兰治 Method and apparatus for encoding and decoding multi-view video sequence representing omni-directional video
CN110245199A (en) * 2019-04-28 2019-09-17 浙江省自然资源监测中心 A kind of fusion method of high inclination-angle video and 2D map
CN110738690A (en) * 2019-09-16 2020-01-31 南京理工大学 unmanned aerial vehicle video middle vehicle speed correction method based on multi-target tracking framework
CN111524183A (en) * 2020-04-07 2020-08-11 上海交通大学 Target row and column positioning method based on perspective projection transformation
CN112070886A (en) * 2020-09-04 2020-12-11 中车大同电力机车有限公司 Image monitoring method and related equipment for mining dump truck

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420726A (en) * 2021-08-20 2021-09-21 北京文安智能技术股份有限公司 Region de-duplication passenger flow statistical method based on overlook image
CN113420726B (en) * 2021-08-20 2021-11-19 北京文安智能技术股份有限公司 Region de-duplication passenger flow statistical method based on overlook image
CN115953567A (en) * 2023-03-14 2023-04-11 广州市玄武无线科技股份有限公司 Detection method and device for number of stacked boxes, terminal equipment and storage medium

Similar Documents

Publication Publication Date Title
US11176382B2 (en) System and method for person re-identification using overhead view images
Brown View independent vehicle/person classification
JP5722381B2 (en) Video analysis
Bak et al. Improving person re-identification by viewpoint cues
US20040141633A1 (en) Intruding object detection device using background difference method
CN108875507B (en) Pedestrian tracking method, apparatus, system, and computer-readable storage medium
KR20160062880A (en) road traffic information management system for g using camera and radar
KR101552600B1 (en) Method for tracking multiple walking object simultaneously using sectioned image
CN111723801B (en) Method and system for detecting and correcting target in fisheye camera picture
US10346709B2 (en) Object detecting method and object detecting apparatus
CN113033348A (en) Overlook image correction method for pedestrian re-recognition, storage medium, and electronic device
KR102522928B1 (en) Method and apparatus for estimating moving velocity of vehicle
CN113033353A (en) Pedestrian trajectory generation method based on overlook image, storage medium and electronic device
US20220366570A1 (en) Object tracking device and object tracking method
WO2016031573A1 (en) Image-processing device, image-processing method, program, and recording medium
CN112733719A (en) Cross-border pedestrian track detection method integrating human face and human body features
CN113033350B (en) Pedestrian re-identification method based on overlook image, storage medium and electronic equipment
JP6977337B2 (en) Site recognition method, device, program, and imaging control system
JP6799325B2 (en) Image correction device, image correction method, attention point recognition device, attention point recognition method and abnormality detection system
CN113033349B (en) Overhead image selection method for pedestrian re-recognition, storage medium and electronic equipment
CN115019241B (en) Pedestrian identification and tracking method and device, readable storage medium and equipment
Xian et al. Mitfas: Mutual information based temporal feature alignment and sampling for aerial video action recognition
CN111882656A (en) Graph processing method, equipment and storage medium based on artificial intelligence
KR20190099566A (en) Robust Object Recognition and Object Region Extraction Method for Camera Viewpoint Change
Kim et al. Face recognition method based on fixed and PTZ camera control for moving humans

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210625