CN114674323A

CN114674323A - Intelligent indoor navigation method based on image target detection and tracking

Info

Publication number: CN114674323A
Application number: CN202210382299.6A
Authority: CN
Inventors: 袁泉; 魏星; 付霄元; 李肇强; 李奕硕; 郝天琦; 罗贵阳; 李静林; 刘志晗
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2022-04-12
Filing date: 2022-04-12
Publication date: 2022-06-28

Abstract

The invention discloses an intelligent indoor navigation method based on image target detection and tracking, and belongs to the field of camera positioning and tracking. Firstly, aiming at a plurality of users who queue into a room, when each user passes through each camera, an upper computer automatically matches an id and a tracking detection frame under each camera for each user; and positioning the actual physical space coordinates of each user in each camera; then, respectively calculating the distance of each queued user, selecting the user id with the minimum distance to the camera of the user, and matching to generate a user two-dimensional code; until each user generates a respective two-dimensional code; respectively transmitting the start point coordinates and the end point coordinates of each user to an upper computer, planning out a shortest path and returning to each user side; finally, when each user starts to move along the shortest path from the starting point and enters a cross-camera area, the cross-camera pedestrian tracking technology is utilized to transmit and track the user data and update the navigation result in real time; the invention effectively enlarges the coverage area and is suitable for monitoring places.

Description

Intelligent indoor navigation method based on image target detection and tracking

Technical Field

The invention belongs to the field of camera positioning and tracking, pedestrian identification and path planning, and particularly relates to an intelligent indoor navigation method based on image target detection and tracking.

Background

The GPS technology is an outdoor positioning and navigation system serving the public earlier, is relatively mature in development and becomes a necessary tool for the outdoor positioning and navigation service of people; the satellite positioning signal is weak in strength and is easily influenced by environmental factors such as shielding, so that the technical advantages cannot be brought into play in indoor positioning.

With the development of positioning and navigation technologies in the modern times, a scientific research team utilizes a mobile communication network platform to realize combined navigation services based on ubiquitous wireless networks and multi-modal big data fusion, and the like; although the existing indoor positioning technologies such as WLAN and WSN realize efficient positioning in a regional room, a large number of nodes need to be arranged, the signal coverage cost is high, and the wide-area positioning application is not facilitated.

Therefore, the indoor navigation problem is simplified and is a development target by slightly improving, combining and utilizing the prior art, various structures exist in an indoor scene, personnel are dense, targets (articles) are various, and if the real-time tracking of a user cannot be completed, and the process cannot be controlled, the fault tolerance rate is reduced; if the indication in the navigation process is simple, the navigation precision is greatly reduced. The traditional indoor navigation method has limited capability and cannot realize multiple aspects; as in document 1: the application number is 201621245486.6, an indoor vehicle positioning navigation system based on video image processing is provided, moving vehicle information acquired by an image acquisition device is sent to a server through communication equipment, and a background server sends the position of a user vehicle to a vehicle user terminal after image processing. As in document 2: the application number is 201310205587.5, an indoor navigation system and an indoor navigation method thereof are provided, indoor positioning is achieved based on two-dimensional codes, the problem of inaccurate indoor positioning in the prior art is solved, and the accuracy of indoor positioning is improved; and navigating according to the guidance of the color sequence so as to improve the rationality and the popularity of the indoor navigation system. As in document 3: the application number is 200610063715.7, and a map recording method and a device in tracking navigation are provided, wherein an interest point closest to the current position of a controlled end is stored in an electronic map database, the interest point is marked and highlighted, so that the main control end can clearly and conveniently know the current position of the controlled end.

However, the disadvantage of the document (1) is that it is not described how to match a specific vehicle user terminal with a corresponding vehicle in a captured image when a plurality of vehicles are captured simultaneously by an image capturing device. Therefore, when the image acquisition device acquires images of a plurality of vehicles and a plurality of vehicle user terminals send requests to the server at the same time, the phenomenon that mistransmission and mistransmission occur in navigation results returned due to mismatching of information of users such as position information and selected terminal coordinates.

The document (2) has the disadvantages that the applicable scene is limited, and the two-dimensional code identification information of the two-dimensional code label is bound with the target position information, so that the system can only position the user at the moment of scanning the two-dimensional code, and cannot complete real-time tracking; and the user can only reach the terminal point through ground route guidance in the navigation process, and the navigation precision is low. When the user is located in a place with complex routes and dense personnel, such as a library, a shopping mall and the like, the effective path finding is difficult to complete through a navigation system. Meanwhile, when the forward path of the user has problems, the system is difficult to prompt and adjust in time, and the fault tolerance rate is low.

The disadvantage of the document (3) is that if the indoor tracking navigation uses GPS positioning, the indoor tracking navigation is susceptible to position, weather or other factors in most scenes, and the effect is not ideal; the civil precision of the GPS can reach 10 meters, but the GPS cannot exert the advantages in complex and various indoor places, and the indoor most scenes have large pedestrian flow and small space area, so that the accurate tracking of users is also a difficult problem.

The current wireless positioning precision is improved to some extent, but the user positioning is inaccurate due to factors such as shielding of indoor obstacles, and the like, so that a navigation result generates a large error.

Disclosure of Invention

Aiming at the problems, the invention provides an intelligent indoor navigation method based on image target detection and tracking, which realizes user identity matching by using a two-dimensional code, determines the physical space coordinate of a user by using a monocular camera positioning technology, enlarges the navigation coverage range by using the combination of multiple cameras, and completes navigation from the user coordinate to the destination coordinate under the condition of continuously tracking the user. Compared with the existing scheme, the equipment of the invention has simple requirements, and only the camera and the server can meet the requirements, so the application cost is low, the popularization is strong, and the application scenes are wide.

The intelligent indoor navigation method for detecting and tracking the image target specifically comprises the following steps:

step one, aiming at a plurality of users who queue into a room, when each user passes through each monocular camera in the room, an upper computer automatically distributes an id corresponding to the camera and a tracking detection frame for each user;

the id is a random number, and the user a can respectively distribute an id and a tracking detection frame bound with the id to the user through a plurality of indoor cameras. Similarly, each user is reassigned id and corresponding trace detection box through each camera.

Determining the actual physical space coordinates of each user in the shooting range of each camera by using a positioning technology and combining the tracking detection frame of each user;

the method comprises the following specific steps:

step 201, measuring and modeling an actual indoor scene, constructing a 2D digital map, and representing the map by a two-dimensional matrix;

in the map, "0" represents the area of one square meter in the actual scene, "1" represents an obstacle, and "2" represents a target position retrieved by a user according to the name in the system;

step 202, aiming at the current camera b and a user a, calculating the distance D between the user a and the camera b;

the calculation formula is as follows:

d is the distance from the user a to the camera b, wherein P is the pixel width of the user a in the image shot by the camera b, and W is the height of the user a.

Step 203, shooting A4 paper by using the current camera b, and calculating the camera focal length F of the camera b by using the measurement data;

p ' is the pixel height of A4 paper, D ' is the distance from the lens of camera b to A4 paper, and W ' is the height of A4 paper.

Step 204, establishing a rectangular coordinate system of the actual scene according to the 2D digital map, and converting the coordinates of the pixel points of the four vertexes of the user a tracking detection frame in the picture into the coordinates (x) of the user a in the actual scene₁,y₁)：

x₁＝X_max-D

X_maxIs the maximum value of x, bbox [0 ], in the actual scene established by the 2D digital map]The horizontal coordinate of the upper left corner of the detection frame in a coordinate system formed by pixel points in the camera picture is determined; bbox [2 ]]The horizontal coordinate of the lower right corner of the detection frame in a coordinate system formed by pixel points in a camera picture is determined;

and step 205, similarly, calculating the physical space coordinates of each user under each monocular camera respectively.

Step three, the distance between each user queued by the indoor initial camera and the user queued by the indoor initial camera is calculated respectively, the user with the minimum distance to the user is selected, and the user two-dimensional code corresponding to the id is generated by matching the id of the user; until generating respective user two-dimensional codes for each user;

respectively transmitting the start point coordinates and the end point coordinates of each user to an upper computer, planning a shortest path according to a set map, and returning to each user side;

the shortest path is planned as follows: and taking the terminal point coordinates of each user as target grids, and starting to move along the father node of each grid by adopting an Astar algorithm until the father node returns to the starting grid of the user, so that the optimal path of each user can be obtained.

And step five, when each user starts to move along the shortest path from the starting point and enters a cross-camera area, tracking each user and updating the navigation result in real time.

In the process of tracking the user, if the user enters an area covered by another camera from an area covered by one camera, the user data is transmitted by using a cross-camera pedestrian tracking technology:

aiming at the problem that a user a walks into the shooting range of a camera B from the shooting range of the camera A, partial overlapped pictures exist in pictures shot by the two cameras;

the id of the user a randomly distributed in the camera A is k, the id of the user a randomly distributed in the camera B is j, and the user a is bound with the id in the camera A through the two-dimensional code.

When a user a walks into an overlapped area shot by the two cameras, the cameras A and B simultaneously calculate the coordinates of the user a in an actual scene, if the coordinates are consistent, the id k in the camera A and the id j in the camera B are the same user, and data bound with the user are transmitted from the camera A to the camera B. Otherwise, if the coordinates calculated by the camera A and the camera B are not consistent, the id k and the id j appearing in the overlapped area are considered to be two different people, and the user data cannot be transmitted.

Similarly, when the user a continues to walk along the shortest path from the camera B to the next camera C, the user coordinates are calculated according to the overlapping area of the two cameras according to the process, and the like until the user walks to the target grid, and the complete transmission of the data is completed.

The invention has the advantages that:

1) an intelligent indoor navigation method based on image target detection and tracking is characterized in that a monocular camera ranging technology is used for detecting a digital id of a user nearest to a camera, and the id is used for generating a two-dimensional code uniquely corresponding to the user. After the user scans the two-dimensional code, the matching of the user identity and the digital id in the system can be realized, and when the system provides service for a plurality of users at the same time, the identity matching can be realized to ensure that the data transmission between different user sides and the server is not disordered.

2) An intelligent indoor navigation method based on image target detection and tracking is characterized in that tracking navigation is completed by means of multi-camera cooperative work based on a cross-camera navigation method, when a user enters a shooting area with two cameras overlapped, the two cameras respectively calculate physical coordinates of the user, if the physical coordinates are consistent, the two cameras consider the user to be the same user, and data of the user and id in a new camera are bound. When the user enters the next area, continuous tracking navigation can be completed according to the id in the new camera of the user; effectively enlarge the coverage area and adapt to the monitoring place.

Drawings

FIG. 1 is a flow chart of an intelligent indoor navigation method based on image target detection and tracking according to the present invention;

FIG. 2 is a schematic diagram of the system of the present invention assigning each detected user an id and a corresponding bounding box;

FIG. 3 is a flowchart of the operation of the yolov + deepsort algorithm of the present invention;

FIG. 4 is a flowchart of the Astar algorithm of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and examples.

Although the existing indoor positioning technologies such as WLAN and WSN realize regional indoor efficient positioning, a large number of nodes need to be arranged, the signal coverage cost is high, and the coverage area is limited. The invention provides a new indoor navigation realization idea, and has the greatest characteristic that the equipment requirement is simple, the requirements can be met only by a camera and a simple server, and the indoor navigation realization idea has the advantages of low cost and quick calculation. Even the use of the original monitoring camera in the scene can be considered, so that the popularization is strong and the application scene is wide; the matching of the actual user individual identity and the digital id in the system is realized through two-dimensional code scanning, and the tracking navigation is completed by the cooperative work of multiple cameras by combining a cross-camera navigation method, so that the technical difficulty is reduced and the user experience is improved; the following three problems with indoor navigation can be solved:

1) utilization of existing indoor facilities

In the prior art, most of the devices use additional facilities or tools to realize image acquisition or positioning, and nowadays, due to popularization of indoor monitoring, most application scenes are provided with related devices, so that cost can be reduced by better utilizing the devices, and the devices are easy to operate and popularize. In view of the above, the present invention uses the existing camera in the scene, and when a request for front-end indoor navigation is received, the current geographic position of the target and the indoor navigation map to which the current geographic position belongs are acquired through the camera, and the navigation route is calculated according to the target position, so as to realize efficient utilization of the existing facilities.

2) Enhancement of user personalized service experience

In the existing indoor navigation technology, when a plurality of options of the same destination exist, an indoor server acquires each geographic position matched with each destination identifier, and generates an indoor navigation route comprising the current position of a user terminal and each geographic position; and feeding the results back to the indoor map. The indoor navigation requirements are met but are complex and not intuitive enough, based on the method, the path planning process is simplified, the processing effort on a plurality of same destinations is concise, and the optimal solution is selected from a plurality of navigation routes and fed back to the user side; and the user behavior modeling is tried to be integrated into the preset method, and whether recommended commodities are passed or commodities frequently browsed and purchased by the user can be selected when the path is planned, so that the user experience is improved.

3) Balance between cost, efficiency and accuracy

The cost is greatly reduced because extra equipment does not need to be erected for system operation; in the target detection and tracking process, the camera is effectively utilized, and compared with the Bluetooth or other positioning modes, the position determination in the indoor map can reduce the error of indoor navigation and improve the precision; when an indoor navigation request is received, the navigation route is completed by a preset generation rule at the rear end, and finally, the result is only needed to be transmitted back to the front end to be provided for a user, so that the navigation efficiency can be effectively improved by mutual coordination among multiple threads. The method and the device integrate various factors so as to realize balance of the various factors.

Aiming at the problem of inaccurate navigation positioning of indoor places, the invention realizes the navigation effect by a tracking navigation technology based on the existing camera and two-dimensional digital map in the places; the whole technical scheme is divided into five processes: firstly, a user enters a shooting range of a camera, a system detects and tracks a target of the user, determines a unique identifier and a boundary box of the user, and automatically distributes corresponding id of each user under the camera; secondly, the initial camera selects the user with the minimum distance by calculating the distance between each user and the camera, generates a two-dimensional code of the user by using the user id, and performs virtual-real matching on the user identity and the allocated id; thirdly, determining the physical space coordinate of the user by utilizing a monocular camera positioning technology; fourthly, transmitting the start point coordinate and the end point coordinate of the user to a server end, planning a shortest path according to a map set in the system, and returning the shortest path to the user end; and fifthly, tracking the user and updating the navigation result in real time. It is worth noting that in the process of tracking the user, if the user enters the area covered by another camera from the area covered by one camera, the user data is transmitted by using a cross-camera pedestrian tracking technology; the coverage of the camera network to the indoor space of a large area is utilized to realize the expansion of the indoor navigation coverage area.

As shown in fig. 1, the specific steps are as follows:

step one, aiming at a plurality of users to be detected who are queued to enter a room, when each user passes through each monocular camera in the room, the upper computer tracks and automatically allocates an id corresponding to the camera and a tracking detection frame to each user;

in this example, using the existing yolov3+ deepsort algorithm, as shown in fig. 3, the user is subjected to target detection and tracking, a unique identifier and a bounding box of the user are determined, and the system assigns an id to each detected user, where the id assigned to the user is a random number, and binds the id of the user with the bounding box of the corresponding user in the system, as shown in fig. 2 in specific example;

when a user a passes through the shooting range of the indoor camera a1, the upper computer tracks and automatically allocates a corresponding id1 and a corresponding tracking detection frame to the user a; similarly, when the user a passes through the shooting range of the camera a2, the upper computer tracks and automatically allocates a corresponding id2 and a corresponding tracking detection frame to the user a; similarly, each user is reassigned id and corresponding trace detection box through each camera.

Step two, the indoor initial camera respectively calculates the distance between the users queued and the user queued, selects the user with the minimum distance to the user, and generates a user two-dimensional code corresponding to the id by using the id of the user in a matching manner; until generating respective user two-dimensional codes for each user respectively and displaying the user two-dimensional codes on a display screen;

if the initial camera X is adopted, queuing users are user a, user b and user c respectively, the user a with the minimum distance is selected according to the distance between each user and the camera, and the upper computer generates a user a two-dimensional code by using the id distributed to the user a by the camera X;

similarly, after the user a leaves, the next user b repeats the above process until the user two-dimensional code of each user is generated.

The user's two-dimensional code is used to: firstly, when a user enters an indoor navigation application, a user interface of the indoor navigation application is displayed on a mobile phone, and secondly, a user individual (or the mobile phone of the user) is matched with the id distributed in the system, so that the matching of a physical entity and a digital id is realized;

the method comprises the following specific steps:

301, measuring and modeling an actual indoor scene, constructing a 2D digital map of the actual scene in advance, and representing the map by using a two-dimensional matrix;

as shown in the following figures:

0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 1

0 0 0 0 0 0 0 2 0 1

0 0 0 0 0 0 1 1 1 1

0 0 0 0 0 0 0 0 0 1

0 0 0 0 0 0 0 0 0 0

wherein, each "0" number in the two-dimensional matrix represents an area of one square meter in the actual scene (which can be modified according to the actual situation), "1" represents an obstacle, and "2" represents the position of an object which can be retrieved by a user according to the name in the system

Step 302, aiming at the current camera b and the user a, calculating the distance D between the user a and the camera b;

the calculation formula is as follows:

Step 303, shooting A4 paper by using the current camera b, and calculating the camera focal length F of the camera b by using the measurement data;

when measuring, a piece of A4 paper is taken, the camera is moved to fill the paper with the picture, the corresponding data is measured, P ' is the pixel height of A4 paper, D ' is the distance from the lens of the camera b to A4 paper, and W ' is the height of A4 paper.

And 304, establishing a rectangular coordinate system of the actual scene according to the 2D digital map, and converting the coordinates of pixel points of four vertexes of the tracking detection frame of the user a in the picture. Obtaining the coordinate (x) of the user a in the actual scene₁,y₁)：

x₁＝X_max-D

and 305, respectively calculating the physical space coordinates of each user under each monocular camera in the same way.

the starting point coordinate of the user can be obtained from the actual physical space coordinate, and the end point coordinate of the user is obtained by the input of the user terminal; in the scene modeling, the terminal coordinate S input by the user terminal is bound with the actual physical coordinate thereof, and the actual physical coordinate is the destination of the user.

As shown in fig. 4, the specific steps are as follows:

a) creating an opening list, and initializing and adding a starting point coordinate of a user;

find f (i) lowest grid in the open list, switch to the close list, which is called the current grid.

The estimation of the current node value is completed through a formula:

f(i)＝g(i)+h(i)

f (i) is the value estimate of the current node, where g (i) is the cost of moving from the starting point to the node, and h (i) is the estimated cost of moving the current node to the end point.

b) Judging 8 adjacent grids of the current grid:

if it is not available or is already in the closed list, ignore it;

add it if it is not in the open list; record its f (i), g (i), and h (i) values and take the current cell as the parent node for this cell.

Comparing the value of the original path g (i) if the value is already in the open list; if g (i) is smaller, the parent node of the cell is changed to the current cell, f (i), g (i) are recalculated, and the open list is reordered.

c) The loop is ended when one of the following conditions is satisfied:

adding the target lattice into the closed list and finding the target path;

target lattice is not found and the open list is empty (the target path does not exist at this time).

In the process of tracking the user, if the user enters the area covered by another camera from the area covered by one camera, the user data is transmitted by using a cross-camera pedestrian tracking technology:

When a user a walks into an overlapped area shot by the two cameras, the cameras A and B simultaneously calculate the coordinates of the user a in a 2D map of an actual scene, if the coordinates are consistent, id k in the camera A and id j in the camera B are the same user, and data bound with the user is transmitted from the camera A to the camera B. Otherwise, if the coordinates calculated by the camera A and the camera B are not consistent, the id k and the id j appearing in the overlapped area are considered to be two different people, and the user data cannot be transmitted.

The tracking navigation technology of the invention is composed of camera positioning, pedestrian tracking and path planning; wherein the pedestrian tracking includes: YOLOv3 extraction, deep learning of deep DeepSORT, etc.; the indoor place includes: shopping malls, underground garages or libraries and other places with poor signals and dense targets. And acquiring a specific two-dimensional code from an indoor place, scanning to complete matching, calling the tracking navigation technology to continuously track, and completing navigation by utilizing the indication of a shortest path through path planning.

YOLO is an end-to-end target detection model, and is a fast and accurate real-time object detection algorithm. As shown in fig. 3, the detection can be roughly divided into two steps: 1. determining the position of a detection object; 2. and classifying the detection object, and determining the classification of the detection object. The general flow of the work is as follows:

the YOLOv3 backbone network is darknet 53; the main network is mainly divided into three layers, namely an input layer, a convolutional layer and a residual error network. Each convolution part of the darknet53 uses a special darknetv Conv2D structure, L2 regularization is carried out during each convolution, and Batchnormalization standardization and LeakyReLU activation functions are carried out after the convolution is completed; compared with YOLOv2, YOLOv3 uses a residual network, and a residual structure can be directly mapped from a front feature layer to a rear feature layer (jump connection) without convolution, so that the training and feature extraction are facilitated, and the optimization is easy.

DeepsORT introduces the idea of deep learning, mainly uses a convolutional neural network to extract features, uses the relationship between two combined indexes, namely the Mahalanobis distance and the cosine distance between a state predicted by recursive Kalman filtering and a newly arrived state and a threshold value to carry out correlation measurement in the tracking process to serve the assignment problem, and uses cascade matching to solve the sub-problem in global distribution. The general flow of the DeepsORT work is as follows:

firstly, connecting a camera, inputting a video frame and preprocessing: detecting and extracting deep learning characteristics, solving corresponding confidence coefficient, and primarily screening candidate frames; the candidate boxes were then further screened with Non-Maximum inhibition (NMS). Next, using kalman filter prediction and matching degree measurement, performing cascade matching and iou (interaction over union) regression, and outputting the result. Finally, the above process is repeated for subsequent video frames.

The DeepsORT uses an 8-dimensional space variable to represent the state of a track at a certain moment, and the dimensions respectively represent the center coordinates of a candidate frame, the longitudinal and transverse ratios of the candidate frame, the height and the speed information; predicting an updated track state by using a Kalman filter, and expressing the motion matching degree by using the Mahalanobis distance between the state predicted by the Kalman filter and a newly arrived target state; meanwhile, in order to reduce the occurrence of identifier switching, deep learning is introduced into deep learning and extracted apparent features for matching degree analysis to supplement the Mahalanobis distance, and the minimum cosine distance between the detection and tracking feature vectors is used as the measurement standard of the apparent matching degree.

The shortest path planning method is based on an A Star algorithm obtained after optimization and transformation of a Dijkstra algorithm, and has certain high efficiency in the traversal process of the route and the graph by repeatedly obtaining a node closest to a starting point and expanding the node outwards.

Claims

1. An intelligent indoor navigation method based on image target detection and tracking is characterized by specifically comprising the following steps:

firstly, aiming at a plurality of users who queue into a room, when each user passes through each monocular camera in the room, the upper computer automatically allocates an id corresponding to the camera and a tracking detection frame to each user;

then, determining the actual physical space coordinates of each user in the shooting range of each camera by using a positioning technology and combining the tracking detection frame of each user;

secondly, respectively calculating the distance of each user queued by using an indoor initial camera, selecting the user with the minimum distance to the user, and generating a user two-dimensional code corresponding to the id by using the id of the user in a matching manner until each user generates a respective user two-dimensional code;

finally, respectively transmitting the start point coordinates and the end point coordinates of each user to an upper computer, planning out a shortest path according to a set map, and returning to each user side; when each user starts to move along the shortest path from the starting point and enters a cross-camera area, each user is tracked and the navigation result is updated in real time.

2. The intelligent indoor navigation method based on image target detection and tracking as claimed in claim 1, wherein id is a random number, and a user a passes through a plurality of indoor cameras and respectively allocates an id and a tracking detection frame bound with the id to the user; similarly, each user is reassigned id and corresponding trace detection box through each camera.

3. The intelligent indoor navigation method based on image target detection and tracking as claimed in claim 1, wherein the actual physical space coordinates of each user are calculated by the following specific steps:

the calculation formula is as follows:

d is the distance from the user a to the camera b, wherein P is the pixel width of the user a in the image shot by the camera b, and W is the height of the user a;

p ' is the pixel height of A4 paper, D ' is the distance from the lens of the camera b to A4 paper, and W ' is the height of A4 paper;

X_maxIs the maximum value of x, bbox [0 ], in the actual scene established by the 2D digital map]The horizontal coordinate of the upper left corner of the detection frame in a coordinate system formed by pixel points in a camera picture; bbox [2 ]]The horizontal coordinate of the lower right corner of the detection frame in a coordinate system formed by pixel points in a camera picture is determined;

4. The intelligent indoor navigation method based on image target detection and tracking as claimed in claim 1, wherein the planning of the shortest path is: the starting point coordinate of the user is obtained by calculating the actual physical space coordinate, and the end point coordinate is the actual physical coordinate corresponding to the coordinate input by the user side; and taking the terminal point coordinates of each user as target grids, and starting to move along the father node of each grid by adopting an Astar algorithm until the father node returns to the starting grid of the user, so that the optimal path of each user can be obtained.

5. The intelligent indoor navigation method based on image target detection and tracking as claimed in claim 1, wherein in the process of tracking the user, if the user enters the area covered by another camera from the area covered by one camera, the user data is transmitted by using a cross-camera pedestrian tracking technology:

the id of the user a randomly distributed in the camera A is k, the id of the user a randomly distributed in the camera B is j, and the user a is bound with the id in the camera A through the two-dimensional code;

when a user a walks into an overlapped area shot by the two cameras, the cameras A and B simultaneously calculate the actual scene coordinates of the user a, if the coordinates are consistent, the id k in the camera A and the id j in the camera B are the same user, and data bound with the user is transmitted to the camera B from the camera A; otherwise, if the coordinates calculated by the camera A and the camera B are not consistent, the id k and the id j appearing in the overlapped area are considered to be two different people, and the user data cannot be transmitted;