CN112258571B - Indoor pedestrian positioning method based on monocular vision - Google Patents

Indoor pedestrian positioning method based on monocular vision

Info

Publication number
CN112258571B
CN112258571B
Authority
CN
China
Prior art keywords
human
frame
camera
pedestrian
humanoid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011023002.4A
Other languages
Chinese (zh)
Other versions
CN112258571A (en)
Inventor
林宇
赵宇迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shuchuan Data Technology Co ltd
Original Assignee
Shanghai Shuchuan Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shuchuan Data Technology Co ltd filed Critical Shanghai Shuchuan Data Technology Co ltd
Priority to CN202011023002.4A priority Critical patent/CN112258571B/en
Publication of CN112258571A publication Critical patent/CN112258571A/en
Application granted granted Critical
Publication of CN112258571B publication Critical patent/CN112258571B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30244 Camera pose
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/20 Instruments for performing navigational calculations
    • G01C 21/206 Instruments for performing navigational calculations specially adapted for indoor navigation

Abstract

The invention discloses an indoor pedestrian positioning method based on monocular vision, relying on a pedestrian positioning structure that consists of a high-definition monitoring camera, a humanoid detector and a coordinate calculator. No manual calibration of the camera pose is required: calibration is completed automatically online, and because gravity causes the camera pose to keep drifting slowly after installation, the method updates its measurement of the pose automatically without manual intervention, saving a great deal of on-site calibration labor and time. The person being positioned is not required to carry a positioning tag or any other electronic device, so positioning is completed without the person being aware of it; only the coordinates of the humanoid detection frames are used, so no private data are involved. Based on an ordinary monocular monitoring camera, the invention achieves an indoor positioning accuracy of 50 cm and offers clear advantages in both implementation cost and positioning accuracy.

Description

Indoor pedestrian positioning method based on monocular vision
Technical Field
The invention relates to the technical field of indoor positioning, in particular to an indoor pedestrian positioning method based on monocular vision.
Background
With the continuing digitalization and intelligent-marketing trend of offline commercial stores, effectively locating the instantaneous position of a customer (pedestrian) in an indoor commercial scene has become a key problem for providing personalized, intelligent services and interactions. Existing indoor positioning methods mainly include the following.
WIFI positioning based on a mobile device (mainly a mobile phone): the distance between the pedestrian holding the device and each WIFI access point is estimated from the signal strength between the device and several access points; since the position of each access point has been measured accurately in advance, the pedestrian's coordinates can be determined by triangulation, fingerprinting and similar methods.
Ultra-wideband (UWB) positioning: WIFI positioning accuracy is strongly affected by indoor environmental interference; UWB reduces the influence of such interference on ranging accuracy by transmitting extremely narrow pulses, but it requires the person being positioned to hold a UWB tag device.
Binocular vision positioning: a conventional monocular camera cannot obtain the depth of a pedestrian from the camera, which makes direct indoor positioning difficult. Binocular vision matches visual features between the images of two cameras whose optical-center distance is known, and computes the position of the target relative to the camera system from the parallax between the images and the pre-calibrated camera parameters.
Among the prior art, WIFI positioning is widely deployed with a typical accuracy of 5-10 meters, but it is sensitive to environmental interference, its accuracy fluctuates strongly, and its error range is large, sometimes even exceeding the height of an ordinary floor; it is therefore difficult for WIFI positioning to determine the positional relationships and interaction behavior between indoor pedestrians (customers) and commercial facilities.
Although UWB positioning can achieve accuracy within 1 meter (on the order of decimeters under ideal unobstructed conditions), it requires the positioned person to hold a UWB tag, so it is mainly used for personnel management in commercial and industrial scenes and is hard to apply widely to positioning pedestrians (mainly customers) in offline store scenes.
Binocular vision positioning is less affected by environmental interference and its accuracy is stable, but it depends on the parallax between the two cameras, which shrinks as the target moves away from the camera system. Once the person to be positioned is more than about 5 meters away, the optical-center distance of the binocular camera must be enlarged correspondingly to preserve useful parallax, making the camera bulky or hard to calibrate and therefore unsuitable for installation in commercial scenes.
Against these problems in the prior art, the invention provides a monocular vision method that positions pedestrians within 10 meters in real time in an indoor commercial store scene, with positioning accuracy within 1 meter, meeting the intelligent marketing and interaction needs of offline stores and overcoming the defects of the prior art.
Disclosure of Invention
The invention aims to provide an indoor pedestrian positioning method based on monocular vision that offers high positioning accuracy, solving the problems raised in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions: an indoor pedestrian positioning method based on monocular vision comprises the following steps:
S101: the high-definition monitoring camera collects video in the offline store scene, and the decoded image frames are sent to the humanoid detector; the image-frame interval is 40 ms, i.e., 25 frames of images are sent to the humanoid detector per second. Without loss of generality, if the original frame rate is 30 fps, 30 frames per second may be sent instead; when computing resources are insufficient, the image frames are skipped, keeping no fewer than 1 frame per second;
S102: after receiving a real-time image frame, the humanoid detection module detects the humanoid frame coordinates in the frame; once a set of humanoid frame coordinates covering a period of history has been gathered, the set is sent to the coordinate calculation module for estimating the camera pose;
S103: the received set of historical humanoid frames can be written as O = {O1, O2 ... On}, i.e., the set consists of n humanoid frames, each frame Oi = (ti, pi), where ti is the timestamp of the frame and pi its image coordinates; the camera pose estimation module automatically estimates the downtilt angle θ of the camera in the vertical direction from the frame coordinate data of the set O, without any manual calibration assistance. Taking the camera optical center as the common origin of the camera image coordinate system and the physical world coordinate system, let the world coordinates of the pedestrian head vertex P1 be (X1, Y1, Z1), the world coordinates of the contact point P2 between the pedestrian foot and the floor be (X1, Y2, Z1), and the upper and lower edge ordinates of the humanoid frame Oi be y1 and y2. The optical imaging formulas of the head vertex P1 and the sole point P2 are, jointly,

y1 = f(Y1 cos θ - Z1 sin θ)/(Y1 sin θ + Z1 cos θ), y2 = f(Y2 cos θ - Z1 sin θ)/(Y2 sin θ + Z1 cos θ) (2)

where f is the camera focal length measured in advance; subtracting the two formulas gives

y2 - y1 = f Z1 (Y2 - Y1) / [(Y1 sin θ + Z1 cos θ)(Y2 sin θ + Z1 cos θ)] (3)

where Y2 - Y1 is exactly the pedestrian's height;
S201: humanoid frames in which both the head and the feet are visible are selected from the historical frame set, and frames whose head or feet are occluded or invisible are deleted; this step is completed with the human skeleton keypoint detection algorithm OpenPose;
S202: the filtered set of humanoid frames with head and feet visible is traversed, and each humanoid frame Oi, with upper-edge ordinate y1 and lower-edge ordinate y2, is taken in turn as one observation for estimating the camera downtilt θ:

take 10 degrees as the initial estimate θ(0) of the camera downtilt θ; omitting the second-order small term of formula (3) leaves

Z1 = f(Y2 - Y1)/[(y2 - y1) cos²θ] - (Y1 + Y2) tan θ (4)

and since θ(0) = 10 degrees, the initial estimate Z1(0) of Z1 can be computed from formula (4), where f is the camera focal length measured in advance, the world coordinates of the pedestrian head vertex P1 are (X1, Y1, Z1), and the world coordinates of the contact point P2 between the pedestrian foot and the floor are (X1, Y2, Z1);

Z1(0) is substituted into formula (5) below to compute the first iterate Z1(1), where the hyper-parameter α is a decimal between 0 and 1 and is taken as 0.5:

Z1(k+1) = (1 - α) Z1(k) + α { f(Y2 - Y1)/[(y2 - y1) cos²θ(k)] - (Y1 + Y2) tan θ(k) - Y1 Y2 tan²θ(k)/Z1(k) } (5)

the first iterate Z1(1) is substituted into formula (6) to back-compute the first iterate θ(1) of the downtilt θ:

θ = arctan[(f Y2 - y2 Z1)/(f Z1 + y2 Y2)] (6)

substituting the first iterate θ(1) for θ(0) and Z1(1) for Z1(0) in formula (5) yields the second estimate Z1(2) of Z1, and substituting that into formula (6) in turn yields the second iterate θ(2) of the downtilt θ.

Since Z1 in formula (5) is obtained from the two-point (head-sole) difference while formula (6) is a single-point linear equation, Z1 and θ are iterated alternately, and as the iterates approach the true values the iterates of successive rounds tend to converge. The usual downtilt of an offline store camera lies between 15 and 40 degrees, and the number of iterations to convergence is about 3 to 6; 5 iterations are taken, i.e., θ(5) is taken as the posterior estimate θ̂i of the downtilt θ for the observation of humanoid frame Oi;
S203: after the humanoid frame set has been traversed, each humanoid frame Oi yields a corresponding posterior estimate θ̂i of the downtilt θ; an angle histogram is established from 10 to 45 degrees with one bin per 0.5 degrees and every bin count is initialized to 0; the estimate θ̂i of each humanoid frame in the set falls into its corresponding histogram bin, and the bin with the largest count is taken as the final estimate θ̂ of the downtilt;
S104: the humanoid detection model sends the humanoid frame coordinates on the real-time image frame to the coordinate calculator, where the image coordinates of the pedestrian's head vertex are (x1, y1) and the image coordinates of the sole point are (x1, y2);
S105: the final estimate θ̂ of the downtilt θ is used. Since the camera focal length f and the camera mounting height Y2 have been measured in advance, and the pedestrian height is taken as its statistical mean of 165 cm, Y1 = Y2 - 165 is known; the pedestrian head-vertex coordinates (x1, y1) and θ̂ are then substituted into

Z1 = Y1 (f cos θ̂ - y1 sin θ̂)/(y1 cos θ̂ + f sin θ̂), X1 = x1 (Y1 sin θ̂ + Z1 cos θ̂)/f (7)

to compute the physical world coordinates (X1, Z1) of the pedestrian's standing position relative to the camera, thereby determining the mutual positional relationship between the pedestrian and the camera.
The indoor pedestrian positioning method based on monocular vision further comprises a pedestrian positioning structure, wherein the pedestrian positioning structure consists of a high-definition monitoring camera, a humanoid detector and a coordinate calculator.
Preferably, the high-definition monitoring camera is responsible for collecting real-time video in an off-line store scene, and the real-time video is decoded into a real-time image frame sequence and then transmitted to the human-shaped detector.
Preferably, the human-shaped detector comprises a human-shaped detection module which is responsible for extracting human-shaped frames in the image frames and maintaining a historical human-shaped frame set which comprises coordinate information of human-shaped frames which appear in a past period of time.
Preferably, the coordinate calculator comprises a camera pose estimation module and a coordinate positioning calculation module, wherein the former is responsible for computing the camera pose from the historical humanoid frames, and the latter is responsible for converting the image coordinates of real-time humanoid frames into physical world coordinates, i.e., completing the positioning of the indoor pedestrian.
Compared with the prior art, the invention has the following beneficial effects:
1. No manual calibration of the camera pose is needed; calibration is completed automatically online. Because of gravity, the camera pose keeps changing slowly after installation, and the method updates its measurement of the pose automatically without manual intervention, saving a great deal of on-site calibration labor and time.
2. The invention does not require the person being positioned to carry a positioning tag or other electronic device and completes the positioning without the person being aware of it; only the coordinates of the humanoid detection frames are used, so no private data are involved.
3. Based on an ordinary monocular monitoring camera, the invention achieves an indoor positioning accuracy of 50 cm and has clear advantages in implementation cost and positioning accuracy.
Drawings
FIG. 1 is a schematic diagram of a system architecture of the present invention;
FIG. 2 is a flow chart of an embodiment of the present invention;
FIG. 3 is a schematic diagram of a camera pose estimation principle according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the implementation of the automatic pose estimation algorithm according to the present invention.
In the figure: 1. a pedestrian positioning structure; 2. a high definition monitoring camera; 3. a humanoid detector; 4. and a coordinate calculator.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1-4, the present invention provides a technical solution: as shown in fig. 1, the implementation method involves a high definition monitoring camera 2, a human shape detector 3, and a coordinate calculator 4.
The high-definition monitoring camera 2 is responsible for collecting real-time video in an off-line store scene, and the real-time video is decoded into a real-time image frame sequence and then transmitted to the human-shaped detector 3.
The human detector 3 comprises a human detection module responsible for extracting human frames in the image frames and maintaining a set of historical human frames containing coordinate information of human frames occurring over a period of time.
The coordinate calculator 4 comprises a camera pose estimation module and a coordinate positioning calculation module, wherein the former is responsible for computing the camera pose from the historical humanoid frames, and the latter is responsible for converting the image coordinates of real-time humanoid frames into physical world coordinates, i.e., completing the positioning of the indoor pedestrian.
As shown in fig. 2, the indoor pedestrian positioning method according to the embodiment of the invention includes the following steps:
S101: the high-definition monitoring camera 2 collects video in the offline store scene, and the decoded image frames are sent to the humanoid detector 3; in this embodiment the image-frame interval is 40 ms, i.e., 25 frames of images are sent to the humanoid detector 3 per second. When computing resources are insufficient, the image frames may be skipped, generally keeping no fewer than 1 frame per second.
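As a minimal sketch of this capture-and-forward loop (assuming OpenCV for decoding; the stream URL and the detector.submit() interface are hypothetical placeholders):

```python
# Step S101 sketch: decode the camera stream and forward frames to the humanoid
# detector, skipping frames when computing resources are scarce.
import cv2

def capture_loop(detector, url="rtsp://camera.example/stream", send_every_n=1):
    cap = cv2.VideoCapture(url)
    frame_idx = 0
    while True:
        ok, frame = cap.read()               # one decoded image frame
        if not ok:
            break
        if frame_idx % send_every_n == 0:    # send_every_n > 1 skips frames
            detector.submit(frame_idx, frame)
        frame_idx += 1
    cap.release()
```

With a 25 fps stream, send_every_n = 25 would still satisfy the 1-frame-per-second lower bound.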
S102: after receiving a real-time image frame, the humanoid detection module detects the humanoid frame coordinates in the frame and gathers the humanoid frame coordinates over a period of history (in this embodiment, the coordinates collected between 10 and 12 a.m. of the previous day), then sends them to the coordinate calculation module for estimating the camera pose.
S103: the received set of historical humanoid frames can be written as O = {O1, O2 ... On}, i.e., the set consists of n humanoid frames, each frame Oi = (ti, pi), where ti is the timestamp of the frame and pi its image coordinates; the camera pose estimation module automatically estimates the downtilt angle θ of the camera in the vertical direction from the frame coordinate data of the set O, without any manual calibration assistance.
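A minimal sketch of one humanoid frame record Oi = (ti, pi) and the historical set O; the field names are illustrative, not taken from the patent:

```python
# Each record carries the timestamp t_i and the image coordinates p_i of the
# humanoid frame, reduced to the quantities the estimation actually uses.
from dataclasses import dataclass

@dataclass
class HumanoidFrame:
    t: float      # timestamp t_i
    x1: float     # shared abscissa of head vertex and sole point
    y1: float     # ordinate of the upper edge (head vertex)
    y2: float     # ordinate of the lower edge (sole point)

history: list[HumanoidFrame] = []   # the set O gathered over a time window
```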
As a key part of the present invention, the following describes the algorithm process of the attitude estimation module in detail with reference to fig. 3:
As shown in fig. 3, with the camera optical center taken as the common origin of the camera image coordinate system and the physical world coordinate system, when a pedestrian (customer) in the offline store appears in the field of view of the high-definition monitoring camera 2, the world coordinates of the pedestrian head vertex P1 are (X1, Y1, Z1) and the world coordinates of the contact point P2 between the pedestrian foot and the floor are (X1, Y2, Z1). The basic law of optical imaging gives the following formula:

x1 = f X1/(Y2 sin θ + Z1 cos θ), y2 = f(Y2 cos θ - Z1 sin θ)/(Y2 sin θ + Z1 cos θ) (1)

where (x1, y2) are the image coordinates of the foot-floor contact point P2, f is the camera focal length measured in advance, and Y2, the camera mounting height, has also been measured in advance. The unknowns in the formula are Z1, X1 and the downtilt angle θ; since there are only two equations but three unknowns, conventional vision-based positioning methods must reduce the unknowns by one through some other measurement channel; for example, binocular vision computes Z1 from binocular parallax.
The method adopted in the embodiment is to automatically estimate the downward inclination angle theta of the camera through the data of the historical human-shaped frame set.
As a key part of the present invention, the following describes the flow of automatic estimation in detail:
First, the optical imaging formulas of the head vertex P1 and the sole point P2 are written jointly,

y1 = f(Y1 cos θ - Z1 sin θ)/(Y1 sin θ + Z1 cos θ), y2 = f(Y2 cos θ - Z1 sin θ)/(Y2 sin θ + Z1 cos θ) (2)

and subtracted, giving

y2 - y1 = f Z1 (Y2 - Y1) / [(Y1 sin θ + Z1 cos θ)(Y2 sin θ + Z1 cos θ)] (3)

where Y2 - Y1 is exactly the pedestrian's height. In this embodiment the statistical mean height of 165 cm is assumed, i.e., Y1 = Y2 - 165, so the only remaining unknowns in the formula are Z1 and θ.
As shown in fig. 4, the automatic estimation flow of the camera downtilt angle θ is as follows:
S201: humanoid frames in which both the head and the feet are visible are selected from the historical set, and frames whose head or feet are occluded or invisible are deleted; in this embodiment this step is completed with the human skeleton keypoint detection algorithm OpenPose, which is not an innovation of the invention and is not discussed further.
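A sketch of this filtering step, assuming a hypothetical keypoint-detector wrapper (the keypoint names and the confidence threshold below are placeholders, not OpenPose's actual API):

```python
# Step S201 sketch: keep only humanoid frames whose head and feet are visible.
# keypoints: dict mapping a keypoint name to (x, y, confidence) for one person.
CONF_MIN = 0.3   # assumed visibility threshold

def head_and_feet_visible(keypoints):
    head = keypoints.get("head_top")
    if head is None or head[2] < CONF_MIN:
        return False
    feet = (keypoints.get("left_ankle"), keypoints.get("right_ankle"))
    return all(k is not None and k[2] >= CONF_MIN for k in feet)

def filter_frames(history, keypoints_per_frame):
    return [rec for rec, kp in zip(history, keypoints_per_frame)
            if head_and_feet_visible(kp)]
```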
S202: the filtered set of humanoid frames with head and feet visible is traversed, and each humanoid frame Oi, with upper-edge ordinate y1 and lower-edge ordinate y2, is taken in turn as one observation for estimating the camera downtilt θ:

take 10 degrees as the initial estimate θ(0) of the camera downtilt θ; omitting the second-order small term of formula (3) leaves

Z1 = f(Y2 - Y1)/[(y2 - y1) cos²θ] - (Y1 + Y2) tan θ (4)

and since θ(0) = 10 degrees, the initial estimate Z1(0) of Z1 can be computed from formula (4).

Z1(0) is substituted into formula (5) below to compute the first iterate Z1(1), where the hyper-parameter α is a decimal between 0 and 1, in this embodiment 0.5:

Z1(k+1) = (1 - α) Z1(k) + α { f(Y2 - Y1)/[(y2 - y1) cos²θ(k)] - (Y1 + Y2) tan θ(k) - Y1 Y2 tan²θ(k)/Z1(k) } (5)

the first iterate Z1(1) is substituted into formula (6) to back-compute the first iterate θ(1) of the downtilt θ:

θ = arctan[(f Y2 - y2 Z1)/(f Z1 + y2 Y2)] (6)

substituting the first iterate θ(1) for θ(0) and Z1(1) for Z1(0) in formula (5) yields the second estimate Z1(2) of Z1, and substituting that into formula (6) in turn yields the second iterate θ(2) of the downtilt θ.

Since Z1 in formula (5) is obtained from the two-point (head-sole) difference while formula (6) is a single-point linear equation, Z1 and θ are iterated alternately, and as the iterates approach the true values the iterates of successive rounds tend to converge. The usual downtilt of an offline store camera lies between 15 and 40 degrees, and convergence takes about 3 to 6 iterations; in this embodiment 5 iterations are used, i.e., θ(5) is taken as the posterior estimate θ̂i of the downtilt θ for the observation of humanoid frame Oi.
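A minimal sketch of the S202 iteration for a single humanoid frame, using formulas (4)-(6) as reconstructed above (all lengths in one unit, e.g., centimeters; f and the ordinates in pixels):

```python
# Alternate between the two-point estimate of Z1 (formulas (4)/(5)) and the
# single-point solve for theta (formula (6)).
import math

def estimate_theta_for_frame(y1, y2, f, Y2, height=165.0,
                             theta0_deg=10.0, alpha=0.5, n_iter=5):
    Y1 = Y2 - height                     # head vertex sits `height` above the sole
    theta = math.radians(theta0_deg)     # initial estimate theta(0)
    dy = y2 - y1                         # pixel height of the humanoid frame
    # formula (4): Z1 with the second-order small term of (3) dropped
    Z1 = f * (Y2 - Y1) / (dy * math.cos(theta) ** 2) - (Y1 + Y2) * math.tan(theta)
    for _ in range(n_iter):
        # formula (5): damped two-point update of Z1 (0 < alpha < 1)
        z_full = (f * (Y2 - Y1) / (dy * math.cos(theta) ** 2)
                  - (Y1 + Y2) * math.tan(theta)
                  - Y1 * Y2 * math.tan(theta) ** 2 / Z1)
        Z1 = (1 - alpha) * Z1 + alpha * z_full
        # formula (6): single-point linear equation in tan(theta), sole point
        theta = math.atan2(f * Y2 - y2 * Z1, f * Z1 + y2 * Y2)
    return theta                         # posterior estimate theta_hat_i
```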
S203: after the humanoid frame set has been traversed, each humanoid frame Oi yields a corresponding posterior estimate θ̂i of the downtilt θ. An angle histogram is established from 10 to 45 degrees with one bin per 0.5 degrees and every bin count is initialized to 0; the estimate θ̂i of each humanoid frame in the set falls into its corresponding histogram bin, and the bin with the largest count is taken as the final estimate θ̂ of the downtilt.
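A sketch of the histogram vote (0.5-degree bins over 10 to 45 degrees):

```python
# Step S203 sketch: vote the per-frame posterior estimates into an angle
# histogram and return the center of the fullest bin as the final estimate.
import math

def vote_downtilt(theta_estimates, lo_deg=10.0, hi_deg=45.0, step_deg=0.5):
    n_bins = int((hi_deg - lo_deg) / step_deg)
    counts = [0] * n_bins
    for theta in theta_estimates:        # radians, one per humanoid frame
        deg = math.degrees(theta)
        if lo_deg <= deg < hi_deg:
            counts[int((deg - lo_deg) / step_deg)] += 1
    best = max(range(n_bins), key=counts.__getitem__)
    return math.radians(lo_deg + (best + 0.5) * step_deg)
```

The vote makes the final estimate robust to individual pedestrians whose true height deviates strongly from the 165 cm mean, since their biased per-frame estimates rarely concentrate in a single bin.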
After the estimation of the downward inclination angle theta is completed, the image coordinates of the real-time human-shaped frame can be converted into the coordinates of a physical world coordinate system by utilizing an optical imaging rule, and the following steps are as follows:
S104: the humanoid detection model sends the humanoid frame coordinates on the real-time image frame to the coordinate calculator 4, where the image coordinates of the pedestrian's head vertex are (x1, y1) and the image coordinates of the sole point are (x1, y2).
S105: the final estimate θ̂ of the downtilt θ is used. Since the camera focal length f and the camera mounting height Y2 have been measured in advance, and the pedestrian height is taken as its statistical mean of 165 cm, Y1 = Y2 - 165 is known; the pedestrian head-vertex coordinates (x1, y1) and θ̂ are then substituted into

Z1 = Y1 (f cos θ̂ - y1 sin θ̂)/(y1 cos θ̂ + f sin θ̂), X1 = x1 (Y1 sin θ̂ + Z1 cos θ̂)/f (7)

to compute the physical world coordinates (X1, Z1) of the pedestrian's standing position relative to the camera, thereby determining the mutual positional relationship between the pedestrian and the camera.
The key point of step S105 is that, once the camera downtilt has been determined, the invention computes the physical world coordinates from the pedestrian's head-vertex ordinate y1, whereas conventional methods generally compute them from the sole-point ordinate y2. Conventional methods do so because the height of any specific pedestrian is unknown; in practice, however, a pedestrian's feet are occluded with high probability, so using the lower edge y2 of the humanoid frame as the sole point makes the computed physical world coordinates fluctuate strongly and degrades the accuracy of the positioning estimate.
In this embodiment, assuming the statistical mean height of 165 cm avoids the unknown-height problem, which makes physical-coordinate positioning from the pedestrian's head vertex possible; since the head is occluded far less often than the feet, the stability of the positioning accuracy is greatly improved. Practical measurement shows that the horizontal distance between camera and pedestrian in an offline store is mostly within 10 m; at such distances, approximating a specific pedestrian's height by 165 cm produces a positioning error above 50 cm only when the true height is below 140 cm or above 190 cm, which still satisfies the designed positioning accuracy of within 1 m.
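Step S105 then reduces to a closed-form conversion; a minimal sketch under the same reconstructed model, formula (7):

```python
# Convert the head-vertex image coordinates (x1, y1) into world coordinates
# (X1, Z1) relative to the camera, given the voted downtilt theta_hat.
import math

def locate_pedestrian(x1, y1, f, Y2, theta_hat, height=165.0):
    Y1 = Y2 - height                                 # assumed mean height
    s, c = math.sin(theta_hat), math.cos(theta_hat)
    Z1 = Y1 * (f * c - y1 * s) / (y1 * c + f * s)    # horizontal depth
    X1 = x1 * (Y1 * s + Z1 * c) / f                  # lateral offset
    return X1, Z1
```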
Further, the coordinates (Cx, Cy) of the camera on the top-view plan of the offline store, the projected unit vector ez of the camera Z axis on the plan, and the projected unit vector ex of the camera X axis on the plan are measured in advance; the real-time coordinates of the pedestrian on the store plan can then be computed as

(Px, Py) = (Cx, Cy) + Z1 · ez + X1 · ex

thereby achieving real-time indoor pedestrian positioning under monocular vision; when the distance between pedestrian and camera is within 10 meters and the camera downtilt is within 15-45 degrees, the positioning error is < 50 cm.
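A short sketch of this plan-view mapping, where cam_xy, ez and ex are the pre-measured camera position and projected unit axis vectors:

```python
# Map camera-relative coordinates (X1, Z1) onto the store's top-view plan.
def to_plan_view(X1, Z1, cam_xy, ez, ex):
    cx, cy = cam_xy
    return (cx + Z1 * ez[0] + X1 * ex[0],
            cy + Z1 * ez[1] + X1 * ex[1])
```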
To sum up: the indoor pedestrian positioning method based on monocular vision designs a dedicated cross-iteration algorithm that, without the aid of any depth information, estimates the camera downtilt automatically from automatically collected historical humanoid frame data, thereby achieving indoor pedestrian positioning with monocular vision; computing from the pedestrian's head vertex under a statistical-mean height assumption further improves the accuracy and stability of the positioning.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. An indoor pedestrian positioning method based on monocular vision, characterized in that the indoor pedestrian positioning method comprises the following steps:
S101: the high-definition monitoring camera collects video in the offline store scene, and the decoded image frames are sent to the humanoid detector; the image-frame interval is 40 ms, i.e., 25 frames of images are sent to the humanoid detector per second, and when computing resources are insufficient the image frames are skipped, keeping no fewer than 1 frame per second;
S102: after receiving a real-time image frame, the humanoid detection module detects the humanoid frame coordinates in the frame; once a set of humanoid frame coordinates covering a period of history has been gathered, the set is sent to the coordinate calculation module for estimating the camera pose;
S103: the received set of historical humanoid frames can be written as O = {O1, O2 ... On}, i.e., the set consists of n humanoid frames, each frame Oi = (ti, pi), where ti is the timestamp of the frame and pi its image coordinates; the camera pose estimation module automatically estimates the downtilt angle θ of the camera in the vertical direction from the frame coordinate data of the set O, without any manual calibration assistance;
the optical imaging formulas of the head vertex P1 and the sole point P2 are, jointly,

y1 = f(Y1 cos θ - Z1 sin θ)/(Y1 sin θ + Z1 cos θ), y2 = f(Y2 cos θ - Z1 sin θ)/(Y2 sin θ + Z1 cos θ) (2)

where f is the camera focal length measured in advance, the world coordinates of the pedestrian head vertex P1 are (X1, Y1, Z1), the world coordinates of the contact point P2 between the pedestrian foot and the floor are (X1, Y2, Z1), and the upper and lower edge ordinates of the humanoid frame Oi are y1 and y2; subtracting the two formulas gives

y2 - y1 = f Z1 (Y2 - Y1) / [(Y1 sin θ + Z1 cos θ)(Y2 sin θ + Z1 cos θ)] (3)

where Y2 - Y1 is exactly the pedestrian's height;
S201: humanoid frames in which both the head and the feet are visible are selected from the historical frame set, and frames whose head or feet are occluded or invisible are deleted; this step is completed with the human skeleton keypoint detection algorithm OpenPose;
S202: the filtered set of humanoid frames with head and feet visible is traversed, and each humanoid frame Oi, with upper-edge ordinate y1 and lower-edge ordinate y2, is taken in turn as one observation for estimating the camera downtilt θ:

take 10 degrees as the initial estimate θ(0) of the camera downtilt θ; omitting the second-order small term of formula (3) leaves

Z1 = f(Y2 - Y1)/[(y2 - y1) cos²θ] - (Y1 + Y2) tan θ (4)

and since θ(0) = 10 degrees, the initial estimate Z1(0) of Z1 can be computed from formula (4), where f is the camera focal length measured in advance, the world coordinates of the pedestrian head vertex P1 are (X1, Y1, Z1), and the world coordinates of the contact point P2 between the pedestrian foot and the floor are (X1, Y2, Z1);

Z1(0) is substituted into formula (5) below to compute the first iterate Z1(1), where the hyper-parameter α is a decimal between 0 and 1 and is taken as 0.5:

Z1(k+1) = (1 - α) Z1(k) + α { f(Y2 - Y1)/[(y2 - y1) cos²θ(k)] - (Y1 + Y2) tan θ(k) - Y1 Y2 tan²θ(k)/Z1(k) } (5)

the first iterate Z1(1) is substituted into formula (6) to back-compute the first iterate θ(1) of the downtilt θ:

θ = arctan[(f Y2 - y2 Z1)/(f Z1 + y2 Y2)] (6)

substituting the first iterate θ(1) for θ(0) and Z1(1) for Z1(0) in formula (5) yields the second estimate Z1(2) of Z1, and substituting that into formula (6) in turn yields the second iterate θ(2) of the downtilt θ;

since Z1 in formula (5) is obtained from the two-point (head-sole) difference while formula (6) is a single-point linear equation, Z1 and θ are iterated alternately, and as the iterates approach the true values the iterates of successive rounds tend to converge; the usual downtilt of an offline store camera lies between 15 and 40 degrees, the number of iterations to convergence is about 3 to 6, and 5 iterations are taken, i.e., θ(5) is taken as the posterior estimate θ̂i of the downtilt θ for the observation of humanoid frame Oi;
S203: after the humanoid frame set has been traversed, each humanoid frame Oi yields a corresponding posterior estimate θ̂i of the downtilt θ; an angle histogram is established from 10 to 45 degrees with one bin per 0.5 degrees and every bin count is initialized to 0; the estimate θ̂i of each humanoid frame in the set falls into its corresponding histogram bin, and the bin with the largest count is taken as the final estimate θ̂ of the downtilt;
S104: the humanoid detection model sends the humanoid frame coordinates on the real-time image frame to the coordinate calculator, where the image coordinates of the pedestrian's head vertex are (x1, y1) and the image coordinates of the sole point are (x1, y2);
S105: the final estimate θ̂ of the downtilt θ is used. Since the camera focal length f and the camera mounting height Y2 have been measured in advance, and the pedestrian height is taken as its statistical mean of 165 cm, Y1 = Y2 - 165 is known; the pedestrian head-vertex coordinates (x1, y1) and θ̂ are then substituted into

Z1 = Y1 (f cos θ̂ - y1 sin θ̂)/(y1 cos θ̂ + f sin θ̂), X1 = x1 (Y1 sin θ̂ + Z1 cos θ̂)/f (7)

to compute the physical world coordinates (X1, Z1) of the pedestrian's standing position relative to the camera, thereby determining the mutual positional relationship between the pedestrian and the camera.
2. An indoor pedestrian positioning method based on monocular vision as claimed in claim 1, further comprising a pedestrian positioning structure (1), characterized in that: the pedestrian positioning structure (1) consists of a high-definition monitoring camera (2), a humanoid detector (3) and a coordinate calculator (4).
3. The monocular vision-based indoor pedestrian positioning method of claim 2, wherein: the high-definition monitoring camera (2) is responsible for collecting real-time videos in off-line store scenes, and the real-time videos are decoded into real-time image frame sequences and then transmitted to the human-shaped detector (3).
4. The monocular vision-based indoor pedestrian positioning method of claim 2, wherein: the human-shaped detector (3) comprises a human-shaped detection module which is responsible for extracting human-shaped frames in image frames and maintaining a historical human-shaped frame set which comprises coordinate information of human-shaped frames appearing in a period of time.
5. The monocular vision-based indoor pedestrian positioning method of claim 2, wherein: the coordinate calculator (4) comprises a camera pose estimation module and a coordinate positioning calculation module.
CN202011023002.4A 2020-09-25 2020-09-25 Indoor pedestrian positioning method based on monocular vision Active CN112258571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011023002.4A CN112258571B (en) 2020-09-25 2020-09-25 Indoor pedestrian positioning method based on monocular vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011023002.4A CN112258571B (en) 2020-09-25 2020-09-25 Indoor pedestrian positioning method based on monocular vision

Publications (2)

Publication Number Publication Date
CN112258571A CN112258571A (en) 2021-01-22
CN112258571B true CN112258571B (en) 2023-05-30

Family

ID=74234137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011023002.4A Active CN112258571B (en) 2020-09-25 2020-09-25 Indoor pedestrian positioning method based on monocular vision

Country Status (1)

Country Link
CN (1) CN112258571B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115223442B (en) * 2021-07-22 2024-04-09 上海数川数据科技有限公司 Automatic generation method of indoor pedestrian map
CN114758457B (en) * 2022-04-19 2024-02-02 南京奥拓电子科技有限公司 Intelligent monitoring method and device for illegal operation among banknote adding

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949361A (en) * 2018-12-16 2019-06-28 内蒙古工业大学 A kind of rotor wing unmanned aerial vehicle Attitude estimation method based on monocular vision positioning
CN110619662A (en) * 2019-05-23 2019-12-27 深圳大学 Monocular vision-based multi-pedestrian target space continuous positioning method and system
CN110793526A (en) * 2019-11-18 2020-02-14 山东建筑大学 Pedestrian navigation method and system based on fusion of wearable monocular vision and inertial sensor

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3108264A2 (en) * 2014-02-20 2016-12-28 Mobileye Vision Technologies Ltd. Advanced driver assistance system based on radar-cued visual imaging
WO2017076928A1 (en) * 2015-11-02 2017-05-11 Starship Technologies Oü Method, device and assembly for map generation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949361A (en) * 2018-12-16 2019-06-28 内蒙古工业大学 A kind of rotor wing unmanned aerial vehicle Attitude estimation method based on monocular vision positioning
CN110619662A (en) * 2019-05-23 2019-12-27 深圳大学 Monocular vision-based multi-pedestrian target space continuous positioning method and system
CN110793526A (en) * 2019-11-18 2020-02-14 山东建筑大学 Pedestrian navigation method and system based on fusion of wearable monocular vision and inertial sensor

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"A Vehicle Localization System Using Visual Road Features from Monocular Camera";Ching Yu Lin.et al;《IEEE》;20191205;全文 *
"Monocular vision pose measurement algorithm based on points feature";Wang Zhongyu.et al;《INFRARED AND LASER ENGINEERING》;20190531;全文 *
"Solving Monocular vision Odometry Scale Factor with Adaptive Step Length Estimates for Pedestrians Using Handheld Devices";Nicolas Antigny.et al;《sensors》;20191231;第19卷(第4期);全文 *
"一种相机标定辅助的单目视觉室内定位方法";王勇等;《测绘通报》;20180225(第02期);全文 *
"单目视觉的室内多行人目标连续定位方法";孙龙培等;《测绘科学》;20191231;第44卷(第12期);全文 *
"基于双目视觉的目标定位研究";鞠冠秋等;《科技创新与应用》;20150428(第12期);全文 *

Also Published As

Publication number Publication date
CN112258571A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN111462200B (en) Cross-video pedestrian positioning and tracking method, system and equipment
CN112902953B (en) Autonomous pose measurement method based on SLAM technology
US9189859B2 (en) 3D image generation
US7929017B2 (en) Method and apparatus for stereo, multi-camera tracking and RF and video track fusion
Liu et al. Surveillance camera autocalibration based on pedestrian height distributions
CN110807809B (en) Light-weight monocular vision positioning method based on point-line characteristics and depth filter
Li et al. Multi-scale 3D scene flow from binocular stereo sequences
CN112258571B (en) Indoor pedestrian positioning method based on monocular vision
CN103735269B (en) A kind of height measurement method followed the tracks of based on video multi-target
Taketomi et al. Real-time and accurate extrinsic camera parameter estimation using feature landmark database for augmented reality
CN104574393A (en) Three-dimensional pavement crack image generation system and method
CN112233177A (en) Unmanned aerial vehicle pose estimation method and system
CN110349257B (en) Phase pseudo mapping-based binocular measurement missing point cloud interpolation method
CN112541938A (en) Pedestrian speed measuring method, system, medium and computing device
CN111915723A (en) Indoor three-dimensional panorama construction method and system
CN114494629A (en) Three-dimensional map construction method, device, equipment and storage medium
JP5027758B2 (en) Image monitoring device
Tirumalai et al. Dynamic stereo with self-calibration
CN115371673A (en) Binocular camera target positioning method based on Bundle Adjustment in unknown environment
CN109740458B (en) Method and system for measuring physical characteristics based on video processing
CN112288792A (en) Vision-based instant measurement method for guest queuing length and waiting time
Kochi et al. 3D modeling of architecture by edge-matching and integrating the point clouds of laser scanner and those of digital camera
CN114569114A (en) Height measuring method and device
Du et al. The study for particle image velocimetry system based on binocular vision
CN112414444A (en) Data calibration method, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant