CN112258571B - Indoor pedestrian positioning method based on monocular vision - Google Patents
- Publication number
- CN112258571B (application CN202011023002.4A)
- Authority
- CN
- China
- Prior art keywords
- human
- frame
- camera
- pedestrian
- humanoid
- Legal status: Active (assumed; Google has not performed a legal analysis)
Classifications
- G06T7/70: image analysis; determining position or orientation of objects or cameras
- G01C21/206: instruments for performing navigational calculations specially adapted for indoor navigation
- G06T2207/10016: image acquisition modality; video; image sequence
- G06T2207/30196: subject of image; human being; person
- G06T2207/30244: subject of image; camera pose
Abstract
The invention discloses an indoor pedestrian positioning method based on monocular vision, comprising a pedestrian positioning structure composed of a high-definition monitoring camera, a humanoid detector, and a coordinate calculator. Calibration of the camera pose is completed automatically online: although the pose of a camera continues to drift gradually over time after installation under the action of gravity, the method updates its measurement of the camera pose automatically without manual intervention, saving a great deal of on-site calibration labor and time. The invention does not require the person being positioned to carry a positioning tag or other electronic device; positioning is completed without the person's awareness, uses only the coordinates of the humanoid detection frames, and involves no private data. Based on an ordinary monocular monitoring camera, the invention achieves indoor positioning accuracy of about 50 cm, giving it clear advantages in both implementation cost and positioning accuracy.
Description
Technical Field
The invention relates to the technical field of indoor positioning, in particular to an indoor pedestrian positioning method based on monocular vision.
Background
With the continuing digital and intelligent-marketing transformation of offline retail stores, effectively locating the instantaneous position of a customer (pedestrian) in an indoor commercial scene has become a key prerequisite for providing personalized, intelligent service and interaction. Existing indoor positioning methods mainly include the following:
WiFi positioning based on a mobile device (mainly a mobile phone): the distance between the pedestrian holding the device and each WiFi access point is estimated from the signal strength between the device and multiple access points; because the position of each access point has been accurately measured in advance, the pedestrian's position coordinates can be determined by triangulation, fingerprinting, or similar methods.
Ultra-wideband (UWB) positioning: WiFi positioning accuracy is strongly affected by indoor environmental occlusion and multipath; UWB reduces the influence of the environment on ranging accuracy by transmitting extremely narrow pulses, but the person to be positioned must carry a UWB tag device.
Binocular vision positioning: a conventional monocular camera cannot obtain the depth of a pedestrian from the camera, which makes direct indoor positioning difficult; binocular vision performs visual feature matching between images from two cameras with a known optical-center baseline, and calculates the target's position relative to the camera system from the disparity between the images and pre-calibrated camera parameters.
Among the prior art, WiFi positioning is widely deployed with typical accuracy of 5-10 meters, but it is sensitive to environmental occlusion, its accuracy fluctuates widely, and its error range is large (sometimes exceeding the height of an ordinary floor), so WiFi positioning can hardly determine the positional relationship and interaction behavior between indoor pedestrians (customers) and commercial facilities.
Although UWB positioning can achieve accuracy within 1 meter (decimeter-level under ideal, unobstructed conditions), it requires the positioned person to carry a UWB tag, so it is mainly used for personnel management in commercial and industrial settings and is hard to apply broadly to positioning pedestrians (mainly customers) in offline store scenes.
Binocular vision positioning is less affected by environmental occlusion and its accuracy is stable, but it depends on the disparity between the two cameras, and disparity shrinks as the target moves away from the camera system. If the person to be positioned is more than about 5 meters away, the optical-center baseline must be increased accordingly to preserve usable disparity, which makes the camera bulky or difficult to calibrate and thus unsuitable for installation in commercial venues.
Based on these problems in the prior art, the invention provides a monocular-vision method that positions pedestrians within 10 meters in real time in an indoor commercial store scene, with accuracy within 1 meter, meeting the intelligent marketing/interaction needs of offline stores and overcoming the defects of the prior art.
Disclosure of Invention
The invention aims to provide an indoor pedestrian positioning method based on monocular vision, which has the advantage of high positioning accuracy and solves the problems in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions: an indoor pedestrian positioning method based on monocular vision comprises the following steps:
S101: a high-definition monitoring camera captures video of the offline store scene, and decoded image frames are sent to the humanoid detector at 40 ms intervals, i.e., 25 frames per second; without loss of generality, if the original frame rate is 30 fps, 30 frames per second may be sent; when computing resources are insufficient, frame skipping is applied to the image frames, keeping no fewer than 1 frame per second;
S102: after receiving a real-time image frame, the humanoid detection module detects the humanoid-frame coordinates in it, and after collecting the humanoid-frame coordinate set over a period of history, sends the set to the coordinate calculation module for estimating the pose of the camera;
S103: the received historical humanoid-frame coordinate set can be written as O = {O_1, O_2, …, O_n}, i.e., the set consists of n humanoid frames, each O_i = (t_i, p_i), where t_i is the timestamp of the humanoid frame and p_i its image coordinates; the camera pose estimation module automatically estimates the camera's vertical downtilt angle θ from the humanoid-frame coordinate data of set O, without any manual calibration assistance; the optical imaging formulas of the head vertex P1 and the sole point P2 are combined as follows,
where f is the camera focal length measured in advance, the world coordinate of the pedestrian head vertex P1 is (X_1, Y_1, Z_1), and the world coordinate of the contact point P2 between the pedestrian's foot and the floor is (X_1, Y_2, Z_1); the upper ordinate of humanoid frame O_i is y_1 and the lower ordinate is y_2; subtracting the two gives
where Y_2 − Y_1 is the height of the pedestrian;
S201: from the historical humanoid-frame set, select the humanoid frames in which both head and feet are visible, deleting frames in which the head or feet are occluded or invisible; this step is completed with the human skeleton-point detection algorithm OpenPose;
S202: traverse the filtered head-and-feet-visible humanoid-frame set, taking out one humanoid frame O_i at a time; its upper ordinate is y_1 and its lower ordinate is y_2; use this as one observation to estimate the camera downtilt θ:
take 10 degrees as the initial estimate θ^(0) of the camera downtilt θ; omitting the quadratic term of Z_1 in formula (3) gives
since θ^(0) = 10 degrees, the initial estimate Z_1^(0) of Z_1 can be calculated from formula (4), where f is the camera focal length measured in advance, the world coordinate of the pedestrian head vertex P1 is (X_1, Y_1, Z_1), and that of the contact point P2 between the pedestrian's foot and the floor is (X_1, Y_2, Z_1);
substitute Z_1^(0) into formula (5) below to calculate the first iteration value Z_1^(1), where the hyper-parameter α is a fraction between 0 and 1, taken as 0.5;
substitute the first iteration value Z_1^(1) into formula (6) to back-calculate the first iteration value θ^(1) of the downtilt θ;
substitute the first estimate θ^(1) for θ^(0) in formula (5) and Z_1^(1) for Z_1^(0) to obtain the second estimate Z_1^(2) of Z_1; substituting Z_1^(2) into formula (6) yields the second iteration value θ^(2) of the downtilt θ;
Z_1 is obtained from the two-point difference formula (5), while formula (6) is a single-point first-degree equation; thus, as Z_1 and θ are updated alternately, the iterates of successive rounds converge as they approach the true values; the typical downtilt of an offline store camera lies between 15 and 40 degrees, and convergence takes about 3 to 6 iterations; the iteration count is set to 5, i.e., θ^(5) is taken as the posterior estimate θ̂_i of the downtilt θ from the observation of humanoid frame O_i;
S203: after the humanoid-frame set is traversed, each humanoid frame O_i yields a corresponding posterior estimate θ̂_i of the downtilt θ; build an angle histogram from 10 to 45 degrees with one bin per 0.5 degree, initialize every bin count to 0, drop the posterior estimate θ̂_i of each humanoid frame in the set into its histogram bin, and take the bin with the largest count as the final estimate θ̂ of the downtilt;
S104: the humanoid detection model sends the humanoid-frame coordinate information on the real-time image frame to the coordinate calculator, where the image coordinates of the pedestrian head vertex are (x_1, y_1) and those of the sole point are (x_1, y_2);
S105: using the final estimate θ̂ of the downtilt θ: since the camera focal length f and the camera mounting height Y_2 are measured in advance, and the pedestrian height is taken as the statistical mean of 165 cm, Y_1 = Y_2 − 165 is known; then substitute the pedestrian head-vertex coordinates (x_1, y_1) and θ̂ into the following formula,
to calculate the physical-world coordinates (X_1, Z_1) of the pedestrian's standing position relative to the camera, thereby determining the mutual positional relationship between the pedestrian and the camera.
The indoor pedestrian positioning method based on monocular vision further comprises a pedestrian positioning structure, wherein the pedestrian positioning structure consists of a high-definition monitoring camera, a humanoid detector and a coordinate calculator.
Preferably, the high-definition monitoring camera is responsible for collecting real-time video in an off-line store scene, and the real-time video is decoded into a real-time image frame sequence and then transmitted to the human-shaped detector.
Preferably, the human-shaped detector comprises a human-shaped detection module which is responsible for extracting human-shaped frames in the image frames and maintaining a historical human-shaped frame set which comprises coordinate information of human-shaped frames which appear in a past period of time.
Preferably, the coordinate calculator comprises a camera pose estimation module and a coordinate positioning calculation module; the former is responsible for calculating the camera pose from the historical humanoid frames, and the latter is responsible for converting the camera image coordinates of real-time humanoid frames into physical-world coordinates, namely, completing the positioning of indoor pedestrians.
Compared with the prior art, the invention has the following beneficial effects:
1. Camera pose calibration is completed automatically online. Although the pose of a camera continues to drift gradually over time after installation under the action of gravity, the method updates its measurement of the camera pose automatically, without manual intervention, saving a great deal of on-site calibration labor and time.
2. The invention does not require the person being positioned to carry a positioning tag or any other electronic device; positioning is completed without the person's awareness, uses only the coordinates of the humanoid detection frames, and involves no private data.
3. Based on an ordinary monocular monitoring camera, the invention achieves indoor positioning accuracy of about 50 cm, giving it clear advantages in both implementation cost and positioning accuracy.
Drawings
FIG. 1 is a schematic diagram of a system architecture of the present invention;
FIG. 2 is a flow chart of an embodiment of the present invention;
FIG. 3 is a schematic diagram of a camera pose estimation principle according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the implementation of the automatic pose estimation algorithm according to the present invention.
In the figure: 1. a pedestrian positioning structure; 2. a high definition monitoring camera; 3. a humanoid detector; 4. and a coordinate calculator.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1-4, the present invention provides a technical solution: as shown in fig. 1, the implementation method involves a high definition monitoring camera 2, a human shape detector 3, and a coordinate calculator 4.
The high-definition monitoring camera 2 is responsible for collecting real-time video in an off-line store scene, and the real-time video is decoded into a real-time image frame sequence and then transmitted to the human-shaped detector 3.
The human detector 3 comprises a human detection module responsible for extracting human frames in the image frames and maintaining a set of historical human frames containing coordinate information of human frames occurring over a period of time.
The coordinate calculator 4 comprises a camera pose estimation module and a coordinate positioning calculation module; the former is responsible for calculating the camera pose from the historical humanoid frames, and the latter is responsible for converting the camera image coordinates of real-time humanoid frames into physical-world coordinates, namely, completing the positioning of indoor pedestrians.
As shown in fig. 2, the indoor pedestrian positioning method according to the embodiment of the invention includes the following steps:
S101: the high-definition monitoring camera 2 captures video of the offline store scene, and decoded image frames are sent to the humanoid detector 3; in this embodiment the image frame interval is 40 ms, that is, 25 frames per second are sent to the humanoid detector 3; when computing resources are insufficient, frame skipping can be applied to the image frames, generally keeping no fewer than 1 frame per second.
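The frame pacing in S101 reduces to choosing a sampling stride over the decoded stream; a minimal sketch (the function names and stride arithmetic are illustrative, not from the patent):

```python
def frame_stride(source_fps: float, target_fps: float) -> int:
    """Number of decoded frames per one frame forwarded to the detector.

    target_fps is clamped to [1, source_fps]: the text keeps at least
    1 frame per second, and never more than the source provides.
    """
    target = max(1.0, min(target_fps, source_fps))
    return max(1, round(source_fps / target))


def frames_for_detector(frame_indices, source_fps, target_fps):
    """Return the subset of decoded frame indices forwarded to the detector."""
    stride = frame_stride(source_fps, target_fps)
    return [i for i in frame_indices if i % stride == 0]
```

At the embodiment's 25 fps with no resource pressure the stride is 1 (every 40 ms frame is forwarded); under load, raising the stride trades detector throughput for latency while respecting the 1 fps floor.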
S102: after receiving a real-time image frame, the humanoid detection module detects the humanoid-frame coordinates in it, gathers the humanoid-frame coordinates over a period of history (this embodiment uses the humanoid-frame coordinates collected between 10 and 12 a.m. of the previous day), and sends them to the coordinate calculation module for estimating the camera pose.
S103: the received historical humanoid-frame coordinate set can be written as O = {O_1, O_2, …, O_n}, i.e., the set consists of n humanoid frames, each O_i = (t_i, p_i), where t_i is the timestamp of the humanoid frame and p_i its image coordinates; the camera pose estimation module automatically estimates the camera's vertical downtilt angle θ from the humanoid-frame coordinate data of set O, without any manual calibration assistance.
As a key part of the present invention, the following describes the algorithm process of the attitude estimation module in detail with reference to fig. 3:
As shown in fig. 3, taking the camera optical center as the common origin of the camera image coordinate system and the physical world coordinate system, when a pedestrian (customer) in the offline store appears in the field of view of the high-definition monitoring camera 2, the world coordinate of the pedestrian head vertex P1 is (X_1, Y_1, Z_1) and the world coordinate of the contact point P2 between the pedestrian's foot and the floor is (X_1, Y_2, Z_1); according to the basic law of optical imaging, the following formulas can be given:
where (x_1, y_2) are the image coordinates of the contact point P2 between the pedestrian's foot and the floor, f is the camera focal length that has been measured in advance, and Y_2, the camera mounting height, has also been measured in advance; the unknowns in the formulas are Z_1, X_1, and the downtilt θ. Because there are two equations but three unknowns, vision-based positioning methods conventionally reduce one unknown through some other measurement approach; binocular vision, for example, computes Z_1 from binocular disparity.
The approach adopted in this embodiment is instead to estimate the camera downtilt θ automatically from the data of the historical humanoid-frame set.
As a key part of the present invention, the following describes the flow of automatic estimation in detail:
first, the optical imaging formulas of the head apex P1 and the sole point P2 are combined as follows,
subtracting to obtain
where Y_2 − Y_1 is the height of the pedestrian; in this embodiment the statistical mean height is assumed to be 165 cm, that is, Y_1 = Y_2 − 165, so the only unknowns remaining in the formula are Z_1 and θ.
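The imaging formulas referenced here appear only as images in the source text. Under a standard pinhole model, with the image ordinate y measured from the principal point (positive downward) and heights Y measured downward from the optical center, a plausible reconstruction consistent with the surrounding definitions is:

```latex
% Head vertex P1=(X_1,Y_1,Z_1), sole point P2=(X_1,Y_2,Z_1),
% downtilt \theta, focal length f (a reconstruction, not the patent's images):
y_1 = f\,\frac{Y_1\cos\theta - Z_1\sin\theta}{Z_1\cos\theta + Y_1\sin\theta},
\qquad
y_2 = f\,\frac{Y_2\cos\theta - Z_1\sin\theta}{Z_1\cos\theta + Y_2\sin\theta},
% subtracting, the cross terms cancel:
y_2 - y_1 = \frac{f\,Z_1\,(Y_2 - Y_1)}
                 {(Z_1\cos\theta + Y_1\sin\theta)\,(Z_1\cos\theta + Y_2\sin\theta)}
% whose denominator, expanded in Z_1, contains the quadratic term
% Z_1^2\cos^2\theta that S202 drops when forming the initial estimate.
```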
As shown in fig. 4, the automatic estimation flow of the camera downtilt angle θ is as follows:
S201: select from the historical humanoid-frame set the humanoid frames in which both head and feet are visible, deleting those in which the head or feet are occluded; in this embodiment the step is completed with the human skeleton-point detection algorithm OpenPose, which is not an innovation of the invention and is not discussed further.
S202: traverse the filtered head-and-feet-visible humanoid-frame set, taking out one humanoid frame O_i at a time; its upper ordinate is y_1 and its lower ordinate is y_2; use this as one observation to estimate the camera downtilt θ:
take 10 degrees as the initial estimate θ^(0) of the camera downtilt θ; omitting the quadratic term of Z_1 in formula (3) gives
since θ^(0) = 10 degrees, the initial estimate Z_1^(0) of Z_1 can be calculated from formula (4);
substitute Z_1^(0) into formula (5) below to calculate the first iteration value Z_1^(1), where the hyper-parameter α is a fraction between 0 and 1, 0.5 in this example;
substitute the first iteration value Z_1^(1) into formula (6) to back-calculate the first iteration value θ^(1) of the downtilt θ;
substitute the first estimate θ^(1) for θ^(0) in formula (5) and Z_1^(1) for Z_1^(0) to obtain the second estimate Z_1^(2) of Z_1; substituting Z_1^(2) into formula (6) yields the second iteration value θ^(2) of the downtilt θ.
Z_1 is obtained from the two-point difference formula (5), while formula (6) is a single-point first-degree equation; thus, as Z_1 and θ are updated alternately, the iterates of successive rounds converge as they approach the true values. The typical downtilt of an offline store camera lies between 15 and 40 degrees, and convergence takes about 3 to 6 iterations; in this embodiment the iteration count is 5, i.e., θ^(5) is taken as the posterior estimate θ̂_i of the downtilt θ from the observation of humanoid frame O_i.
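Formulas (4)-(6) appear only as images in the source, so the following sketch of the alternating iteration is a reconstruction under a standard pinhole model (image ordinates measured from the principal point, positive downward); θ^(0) = 10° and α = 0.5 follow the text, while the per-step solve expressions and function names are assumptions. The two-point step here solves the exact quadratic in Z_1 rather than a dropped-term approximation:

```python
import math


def estimate_downtilt(y1, y2, f, cam_height, person_height=1.65,
                      theta0_deg=10.0, alpha=0.5, iters=12):
    """Alternately refine depth Z1 and downtilt theta from one humanoid frame.

    y1, y2: head-vertex and sole ordinates on the image, measured from the
    principal point (positive downward). The closed forms below come from a
    pinhole model with downtilt theta; they are a reconstruction, since the
    patent's formulas (4)-(6) are not reproduced in the text.
    """
    Y2 = cam_height                # sole point, height below the optical center
    Y1 = Y2 - person_height        # head vertex, height below the optical center
    theta = math.radians(theta0_deg)

    def solve_z(theta):
        # Two-point difference y2 - y1 = f*Z1*(Y2-Y1)/((Z1*c+Y1*s)*(Z1*c+Y2*s)),
        # rearranged into a quadratic a*Z1^2 + b*Z1 + c0 = 0; keep the larger root.
        s, c = math.sin(theta), math.cos(theta)
        d = y2 - y1
        a = d * c * c
        b = d * (Y1 + Y2) * s * c - f * (Y2 - Y1)
        c0 = d * Y1 * Y2 * s * s
        disc = math.sqrt(b * b - 4 * a * c0)
        return (-b + disc) / (2 * a)

    z = solve_z(theta)                                  # initial depth, ~ formula (4)
    for _ in range(iters):
        z = alpha * solve_z(theta) + (1 - alpha) * z    # damped update, ~ formula (5)
        # Single-point sole equation y2 = f*(Y2*c - Z1*s)/(Z1*c + Y2*s),
        # solved for theta, ~ formula (6).
        theta = math.atan((f * Y2 - y2 * z) / (y2 * Y2 + f * z))
    return math.degrees(theta), z
```

On synthetic observations the alternation converges to the generating downtilt and depth within a handful of rounds, matching the 3-6 iteration figure quoted in the text.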
S203: after the humanoid-frame set is traversed, each humanoid frame O_i yields a corresponding posterior estimate θ̂_i of the downtilt θ; build an angle histogram from 10 to 45 degrees with one bin per 0.5 degree, initialize every bin count to 0, drop the posterior estimate θ̂_i of each humanoid frame in the set into its histogram bin, and take the bin with the largest count as the final estimate θ̂ of the downtilt.
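The histogram vote of S203 can be sketched directly (bin layout per the text; the function name and the choice of the bin center as the returned value are illustrative):

```python
def vote_downtilt(posterior_estimates, lo=10.0, hi=45.0, step=0.5):
    """Return the center of the most-voted 0.5-degree bin in [lo, hi)."""
    n_bins = int(round((hi - lo) / step))
    counts = [0] * n_bins
    for theta in posterior_estimates:
        if lo <= theta < hi:                    # estimates outside the range are discarded
            counts[int((theta - lo) / step)] += 1
    best = max(range(n_bins), key=lambda i: counts[i])
    return lo + (best + 0.5) * step             # bin center as the final estimate
```

Voting over many per-frame posteriors makes the final downtilt robust to individual pedestrians whose height deviates from the assumed 165 cm mean.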
After the estimation of the downtilt θ is complete, the image coordinates of a real-time humanoid frame can be converted into coordinates in the physical world coordinate system using the optical imaging law, as follows:
s104, the humanoid detection model sends humanoid frame coordinate information on the real-time image frame to the coordinate calculator 4, wherein the image coordinates of the row of humanoid head vertexes are (x) 1 ,y 1 ) The sole point image coordinates are (x 1 ,y 2 )。
S105: using the final estimate θ̂ of the downtilt θ: since the camera focal length f and the camera mounting height Y_2 are measured in advance, and the pedestrian height is taken as the statistical mean of 165 cm, Y_1 = Y_2 − 165 is known; substituting the pedestrian head-vertex coordinates (x_1, y_1) and θ̂ into the following formula yields the physical-world coordinates (X_1, Z_1) of the pedestrian's standing position relative to the camera, thereby determining the mutual positional relationship between the pedestrian and the camera.
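The S105 conversion formula is likewise an image in the source; inverting the head-vertex projection of a pinhole camera with downtilt θ gives one plausible closed form (a reconstruction, with image coordinates (x_1, y_1) measured from the principal point, positive downward):

```python
import math


def locate_from_head(x1, y1, f, theta_deg, cam_height, person_height=1.65):
    """Recover (X1, Z1), the pedestrian's standing position relative to the
    camera, from the head-vertex image coordinates (x1, y1).

    Reconstruction of the patent's (image-only) formula: from
    y1 = f*(Y1*cos t - Z1*sin t)/(Z1*cos t + Y1*sin t), solve for Z1,
    then X1 from x1 = f*X1/(Z1*cos t + Y1*sin t).
    """
    t = math.radians(theta_deg)
    s, c = math.sin(t), math.cos(t)
    Y1 = cam_height - person_height            # head height below the optical center
    Z1 = Y1 * (f * c - y1 * s) / (y1 * c + f * s)
    X1 = x1 * (Z1 * c + Y1 * s) / f
    return X1, Z1
```

The inversion is exact for the model: projecting a synthetic standing position and feeding the resulting head-vertex pixel back through `locate_from_head` recovers the original (X_1, Z_1).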
The key point of step S105 is that, once the camera downtilt is determined, the invention computes physical-world coordinates from the pedestrian's head-vertex coordinate y_1, whereas conventional methods generally use the sole point y_2. Conventional methods do so because the height of any specific pedestrian is unknown; in practice, however, a pedestrian's feet are occluded with high probability, so using the lower edge y_2 of the humanoid frame as the sole point makes the computed physical-world coordinates fluctuate widely and degrades the accuracy of the positioning estimate.
This embodiment avoids the unknown-height problem by assuming the statistical mean height of 165 cm, which makes physical-coordinate positioning from the pedestrian's head vertex possible; since the head is occluded far less often than the feet, the stability of positioning accuracy improves greatly. Practical measurements show that the horizontal distance between camera and pedestrian in an offline store is mostly within 10 m; at that distance, approximating a specific pedestrian's height by 165 cm produces a positioning error exceeding 50 cm only when the pedestrian is shorter than 140 cm or taller than 190 cm, which still satisfies the designed positioning accuracy of within 1 m.
Further, by measuring in advance the coordinates (Cx, Cy) of the camera on the top-view floor plan of the offline store, the projected unit vector e_z of the camera Z-axis on the plan, and the projected unit vector e_x of the camera X-axis on the plan, the real-time coordinates of a pedestrian on the store floor plan can be calculated as (Cx, Cy) + Z_1·e_z + X_1·e_x, thereby realizing real-time indoor pedestrian positioning under monocular vision; when the pedestrian is within 10 meters of the camera and the camera downtilt is within 15-45 degrees, the positioning error is < 50 cm.
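The floor-plan mapping amounts to a 2-D change of basis from camera-relative coordinates (X_1, Z_1) to plan coordinates; a minimal sketch (the vector names e_z, e_x and tuple layout are assumptions):

```python
def to_floor_plan(camera_xy, e_z, e_x, X1, Z1):
    """Map camera-relative (X1, Z1) to store floor-plan coordinates.

    camera_xy: camera position (Cx, Cy) on the top-view plan, measured in advance.
    e_z, e_x:  unit vectors of the camera Z- and X-axis projections on the plan.
    """
    cx, cy = camera_xy
    return (cx + Z1 * e_z[0] + X1 * e_x[0],
            cy + Z1 * e_z[1] + X1 * e_x[1])
```

With several cameras whose plan positions and axis projections are surveyed once, every camera's detections land in the same store-wide coordinate frame.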
To sum up: the indoor pedestrian positioning method based on monocular vision designs a dedicated cross-iteration algorithm that, without the aid of depth information, automatically estimates the camera downtilt from automatically collected historical humanoid-frame data, thereby achieving indoor pedestrian positioning based on monocular vision; computing from the pedestrian's head vertex under a statistical-mean height assumption further improves the accuracy and stability of the positioning.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (5)
1. An indoor pedestrian positioning method based on monocular vision, characterized in that the indoor pedestrian positioning method comprises the following steps:
S101: a high-definition monitoring camera captures video of the offline store scene, and decoded image frames are sent to the humanoid detector at 40 ms intervals, i.e., 25 frames per second; when computing resources are insufficient, frame skipping is applied to the image frames, keeping no fewer than 1 frame per second;
S102: after receiving a real-time image frame, the humanoid detection module detects the humanoid-frame coordinates in it, and after collecting the humanoid-frame coordinate set over a period of history, sends the set to the coordinate calculation module for estimating the pose of the camera;
S103: the received historical humanoid-frame coordinate set can be written as O = {O_1, O_2, …, O_n}, i.e., the set consists of n humanoid frames, each O_i = (t_i, p_i), where t_i is the timestamp of the humanoid frame and p_i its image coordinates; the camera pose estimation module automatically estimates the camera's vertical downtilt angle θ from the humanoid-frame coordinate data of set O, without any manual calibration assistance;
the optical imaging formulas of the head vertex P1 and the sole point P2 are combined as follows,
where f is the focal length of the camera, measured in advance, the world coordinates of the pedestrian head vertex P1 are (X1, Y1, Z1), and the world coordinates of the contact point P2 between the pedestrian's foot and the floor are (X1, Y2, Z1); the upper ordinate of the humanoid frame Oi is y1 and the lower ordinate is y2; subtracting the two gives
where Y2 - Y1 is exactly the height of the pedestrian;
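Formulas (1)-(3) appear only as images in the patent source. As an illustration, the sketch below implements one common pinhole tilt model consistent with the surrounding description (world Y measured downward from the camera, image ordinate y measured downward from the principal point); the model and function names are assumptions, not the patent's exact formulas:

```python
from math import atan, tan

def project_y(f, theta, Y, Z):
    """Image ordinate (pixels, downward from the principal point) of a point
    at vertical drop Y below the camera and horizontal distance Z, for a
    camera with focal length f (pixels) pitched down by theta (radians)."""
    return f * tan(atan(Y / Z) - theta)

def box_ordinates(f, theta, height, cam_height, Z):
    """Upper/lower ordinates y1, y2 of a pedestrian's humanoid frame:
    head vertex at Y1 = cam_height - height, sole point at Y2 = cam_height."""
    Y1, Y2 = cam_height - height, cam_height
    return project_y(f, theta, Y1, Z), project_y(f, theta, Y2, Z)
```

Subtracting the two ordinates then relates the pixel height of the frame to the physical height Y2 - Y1, which is what formulas (1)-(3) exploit.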
s201: selecting a human-shaped frame with visible heads and feet from a historical human-shaped frame set, wherein the heads or the feet are shielded or invisible to be deleted, and the step is finished by utilizing a human skeleton point detection algorithm OpenPose;
s202: traversing the filtered human-shaped frame set visible to the head and foot of the pedestrian, and sequentially taking out one human-shaped frame O i If the upper edge is y 1 The lower ordinate is y 2 Taking this as an observation to estimate the camera downtilt θ:
initial estimated value theta of camera downward inclination angle theta with 10 degrees (0) Z in the formula (3) is omitted 1 The second term of (1) is
Due to taking theta (0) =10 degrees, Z can be calculated from equation (4) 1 Is the initial estimate Z1 of (1) (0) F is the focal length of the camera measured in advance, and the world coordinate of the pedestrian head vertex P1 is (X 1 ,Y 1 ,Z 1 ) The world coordinate of the contact point P2 between the pedestrian foot and the floor is (X) 1 ,Y 2 ,Z 1 );
Will Z1 (0) Substituting the following formula (5) to calculate a first iteration value Z1 (1) Wherein the hyper-parameter alpha takes the decimal fraction between 0 and 1 and takes 0.5;
at a first iteration value Z1 (1) Substituting formula (6) to reversely calculate the first iteration value theta of the declination angle theta (1)
Will estimate the value theta for the first time (1) Substitution of θ in equation (5) (0) ToSubstitution of +.>Obtaining Z 1 Second estimate of +.>Replace it by +.>The first iteration value theta of the downward inclination angle theta can be obtained (2) ;
The Z is obtained by the two-point difference of the formula (5) 1 Equation (6) is a single point one-time equation, thus when Z 1 Alternating with theta, when the iteration value approaches to a true value, the iteration values of the front and rear wheels tend to converge, the common downtilt angle of the offline store camera is between 15 and 40 degrees, the convergence iteration number is about 3 to 6, and the iteration number is 5, namely theta (5) As a humanoid frame O i Posterior estimate for observed downtilt angle θ
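Formulas (4)-(6) are images in the patent source, so the alternating scheme of S202 is sketched below under an assumed pinhole tilt model (rays through the head vertex and the sole point, world Y measured downward from the camera). The damping factor `alpha` plays the role of the hyper-parameter in formula (5); the exact update equations are assumptions, not the patent's:

```python
from math import atan, tan, radians, degrees

def estimate_downtilt(y1, y2, f, cam_height, person_height=1.65,
                      theta0_deg=10.0, alpha=0.5, iters=5):
    """Alternately refine the depth Z1 and the downtilt theta from one
    humanoid frame (upper ordinate y1, lower ordinate y2, in pixels below
    the principal point). Returns the posterior estimate of theta (degrees)."""
    Y1 = cam_height - person_height        # vertical drop to the head vertex
    Y2 = cam_height                        # vertical drop to the sole point
    a1, a2 = atan(y1 / f), atan(y2 / f)    # ray angles below the optical axis
    theta = radians(theta0_deg)            # initial estimate theta^(0) = 10 deg
    Z = Y2 / tan(theta + a2)               # initial depth Z1^(0), sole-point ray
    for _ in range(iters):
        Z_new = Y2 / tan(theta + a2)       # re-estimate depth given current theta
        Z = alpha * Z_new + (1 - alpha) * Z  # damped update (hyper-parameter alpha)
        theta = atan(Y1 / Z) - a1          # back-calculate theta from the head ray
    return degrees(theta)
```

On synthetic observations generated from the same model, the alternation contracts toward the true angle round by round, mirroring the convergence behaviour the claim describes.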
S203: after the humanoid frame set has been traversed, each humanoid frame Oi yields a corresponding posterior estimate of the downtilt angle θ; an angle histogram is established from 10 degrees to 45 degrees with one bin per 0.5 degrees, and every bin count is initialized to 0; the posterior estimate of each humanoid frame in the set falls into its corresponding histogram bin, and the bin with the largest count is taken as the final estimate θ̂ of the downtilt angle;
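The histogram vote of S203 can be sketched directly. Returning the centre of the winning 0.5-degree bin is one reasonable reading of "the bin with the largest count"; the patent does not specify which point of the bin is reported:

```python
def vote_downtilt(posterior_estimates, lo=10.0, hi=45.0, bin_width=0.5):
    """Build an angle histogram from lo to hi degrees with one bin per
    bin_width degrees, drop each per-frame posterior estimate into its bin,
    and return the centre of the most-voted bin as the final estimate."""
    n_bins = int((hi - lo) / bin_width)      # 70 bins for 10..45 deg at 0.5 deg
    counts = [0] * n_bins
    for est in posterior_estimates:
        if lo <= est < hi:                   # estimates outside the range are ignored
            counts[int((est - lo) / bin_width)] += 1
    best = max(range(n_bins), key=lambda i: counts[i])
    return lo + (best + 0.5) * bin_width     # bin centre, in degrees
```

Voting over many frames makes the final θ̂ robust to individual noisy observations, e.g. from mis-detected humanoid frames.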
S104: the humanoid detection model sends the humanoid frame coordinate information on the real-time image frame to the coordinate calculator, where the image coordinates of the pedestrian's head vertex are (x1, y1) and the image coordinates of the sole point are (x1, y2);
S105: using the final estimate θ̂ of the downtilt angle, and since the camera focal length f and the camera mounting height Y2 are measured in advance and the pedestrian height is taken as the statistical average of 165 cm, Y1 = Y2 - 165 is known; the pedestrian head-vertex coordinates (x1, y1) and θ̂ are then substituted into the following formula,
calculating the physical-world coordinates (X1, Z1) of the pedestrian's standing position relative to the camera, thereby determining the mutual positional relationship between the pedestrian and the camera.
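Steps S104-S105 reduce to a closed-form back-projection once θ̂ is known. The patent's formula is an image in the source, so this sketch assumes a simple pinhole tilt model (θ in radians, image ordinates downward from the principal point, world Y measured downward from the camera); it is an illustration rather than the patent's exact expression:

```python
from math import atan, tan, sin, cos

def locate_pedestrian(x1, y1, f, theta, cam_height, person_height=1.65):
    """Recover the pedestrian's standing position (X1, Z1) relative to the
    camera from the head-vertex image coordinates (x1, y1), the focal length
    f (pixels), the downtilt theta (radians) and the mounting height."""
    Y1 = cam_height - person_height            # Y1 = Y2 - 1.65 (average height)
    Z1 = Y1 / tan(theta + atan(y1 / f))        # horizontal distance, head ray
    depth = Z1 * cos(theta) + Y1 * sin(theta)  # depth along the optical axis
    X1 = x1 * depth / f                        # lateral offset from x1
    return X1, Z1
```

Projecting a known position through the same model and running it back through `locate_pedestrian` recovers (X1, Z1), which is how the sketch can be sanity-checked.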
2. An indoor pedestrian positioning method based on monocular vision as claimed in claim 1, further comprising a pedestrian positioning structure (1), characterized in that: the pedestrian positioning structure (1) consists of a high-definition monitoring camera (2), a humanoid detector (3) and a coordinate calculator (4).
3. The monocular vision-based indoor pedestrian positioning method of claim 2, wherein: the high-definition monitoring camera (2) is responsible for collecting real-time videos in off-line store scenes, and the real-time videos are decoded into real-time image frame sequences and then transmitted to the human-shaped detector (3).
4. The monocular vision-based indoor pedestrian positioning method of claim 2, wherein: the human-shaped detector (3) comprises a human-shaped detection module which is responsible for extracting human-shaped frames in image frames and maintaining a historical human-shaped frame set which comprises coordinate information of human-shaped frames appearing in a period of time.
5. The monocular vision-based indoor pedestrian positioning method of claim 2, wherein: the coordinate calculator (4) comprises a camera pose estimation module and a coordinate positioning calculation module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011023002.4A CN112258571B (en) | 2020-09-25 | 2020-09-25 | Indoor pedestrian positioning method based on monocular vision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112258571A CN112258571A (en) | 2021-01-22 |
CN112258571B true CN112258571B (en) | 2023-05-30 |
Family
ID=74234137
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011023002.4A Active CN112258571B (en) | 2020-09-25 | 2020-09-25 | Indoor pedestrian positioning method based on monocular vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112258571B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115223442B (en) * | 2021-07-22 | 2024-04-09 | 上海数川数据科技有限公司 | Automatic generation method of indoor pedestrian map |
CN114758457B (en) * | 2022-04-19 | 2024-02-02 | 南京奥拓电子科技有限公司 | Intelligent monitoring method and device for illegal operation among banknote adding |
CN114937060A (en) * | 2022-04-26 | 2022-08-23 | 南京北斗创新应用科技研究院有限公司 | Monocular pedestrian indoor positioning prediction method guided by map meaning |
CN117523009B (en) * | 2024-01-04 | 2024-04-16 | 北京友友天宇系统技术有限公司 | Binocular camera calibration method, system, device and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109949361A (en) * | 2018-12-16 | 2019-06-28 | 内蒙古工业大学 | A kind of rotor wing unmanned aerial vehicle Attitude estimation method based on monocular vision positioning |
CN110619662A (en) * | 2019-05-23 | 2019-12-27 | 深圳大学 | Monocular vision-based multi-pedestrian target space continuous positioning method and system |
CN110793526A (en) * | 2019-11-18 | 2020-02-14 | 山东建筑大学 | Pedestrian navigation method and system based on fusion of wearable monocular vision and inertial sensor |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9664789B2 (en) * | 2014-02-20 | 2017-05-30 | Mobileye Vision Technologies Ltd. | Navigation based on radar-cued visual imaging |
EP3371671B1 (en) * | 2015-11-02 | 2020-10-21 | Starship Technologies OÜ | Method, device and assembly for map generation |
2020-09-25: Application CN202011023002.4A filed; patent CN112258571B granted, status Active.
Non-Patent Citations (6)
Title |
---|
"A Vehicle Localization System Using Visual Road Features from Monocular Camera";Ching Yu Lin.et al;《IEEE》;20191205;全文 * |
"Monocular vision pose measurement algorithm based on points feature";Wang Zhongyu.et al;《INFRARED AND LASER ENGINEERING》;20190531;全文 * |
"Solving Monocular vision Odometry Scale Factor with Adaptive Step Length Estimates for Pedestrians Using Handheld Devices";Nicolas Antigny.et al;《sensors》;20191231;第19卷(第4期);全文 * |
"一种相机标定辅助的单目视觉室内定位方法";王勇等;《测绘通报》;20180225(第02期);全文 * |
"单目视觉的室内多行人目标连续定位方法";孙龙培等;《测绘科学》;20191231;第44卷(第12期);全文 * |
"基于双目视觉的目标定位研究";鞠冠秋等;《科技创新与应用》;20150428(第12期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN112258571A (en) | 2021-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112258571B (en) | Indoor pedestrian positioning method based on monocular vision | |
CN111462200B (en) | Cross-video pedestrian positioning and tracking method, system and equipment | |
CN107255476B (en) | Indoor positioning method and device based on inertial data and visual features | |
CN107924461B (en) | Method, circuit, equipment, system and the correlation computer executable code for being registrated and tracking for multifactor characteristics of image | |
US9189859B2 (en) | 3D image generation | |
Liu et al. | Surveillance camera autocalibration based on pedestrian height distributions | |
US7929017B2 (en) | Method and apparatus for stereo, multi-camera tracking and RF and video track fusion | |
CN110807809B (en) | Light-weight monocular vision positioning method based on point-line characteristics and depth filter | |
CN112902953A (en) | Autonomous pose measurement method based on SLAM technology | |
CN103735269B (en) | A kind of height measurement method followed the tracks of based on video multi-target | |
Li et al. | Multi-scale 3D scene flow from binocular stereo sequences | |
Taketomi et al. | Real-time and accurate extrinsic camera parameter estimation using feature landmark database for augmented reality | |
CN111915723A (en) | Indoor three-dimensional panorama construction method and system | |
CN112541938A (en) | Pedestrian speed measuring method, system, medium and computing device | |
CN110349257B (en) | Phase pseudo mapping-based binocular measurement missing point cloud interpolation method | |
JP5027758B2 (en) | Image monitoring device | |
CN114494629A (en) | Three-dimensional map construction method, device, equipment and storage medium | |
CN115222884A (en) | Space object analysis and modeling optimization method based on artificial intelligence | |
CN114569114A (en) | Height measuring method and device | |
CN113920254A (en) | Monocular RGB (Red Green blue) -based indoor three-dimensional reconstruction method and system thereof | |
CN109740458B (en) | Method and system for measuring physical characteristics based on video processing | |
CN112288792A (en) | Vision-based instant measurement method for guest queuing length and waiting time | |
CN112414444A (en) | Data calibration method, computer equipment and storage medium | |
CN115773759A (en) | Indoor positioning method, device and equipment of autonomous mobile robot and storage medium | |
CN114913224A (en) | Composition method for mobile robot based on visual SLAM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||