CN110991297A

CN110991297A - Target positioning method and system based on scene monitoring

Info

Publication number: CN110991297A
Application number: CN201911175561.4A
Authority: CN
Inventors: 李子申; 李瑞东; 吴海涛; 潘军道; 刘振耀; 刘伟
Original assignee: Academy of Opto Electronics of CAS
Current assignee: Academy of Opto Electronics of CAS
Priority date: 2019-11-26
Filing date: 2019-11-26
Publication date: 2020-04-10

Abstract

The embodiments of the invention provide a target positioning method and system based on scene monitoring. The method comprises: acquiring a two-dimensional image of a scene area through a camera; performing target detection on the two-dimensional image based on a trained Mask RCNN model to obtain two-dimensional pixel coordinate information of each target in the image, wherein the trained Mask RCNN model is obtained by training on sample two-dimensional images of the scene area; and processing the two-dimensional pixel coordinate information of each target and the intrinsic parameters of the camera according to an EPnP algorithm to obtain positioning information of the targets in the scene area. By performing target detection with the Mask RCNN algorithm and target positioning with the EPnP algorithm, the embodiments position targets in the scene area in real time and improve positioning accuracy.

Description

Target positioning method and system based on scene monitoring

Technical Field

The invention relates to the technical field of image processing, in particular to a target positioning method and a target positioning system based on scene monitoring.

Background

Position information plays an increasingly important role in daily life: everything from transportation and travel to express logistics relies on it. With the rise of intelligent mobile terminals, most terminal applications are closely tied to position information, which now pervades nearly every aspect of daily life. As position information is used more widely, expectations of it keep rising. The shift from coarse to precise positioning, and from outdoor to indoor positioning, poses new challenges for existing positioning approaches.

One existing target positioning method for UAV-borne electro-optical platforms requires a known reference image of the target area. The UAV captures an aerial image of the target area, and an image-matching positioning algorithm matches this image against the reference image to obtain the coordinates of the target in the aerial image. However, the positioning accuracy of this scheme depends on the accuracy of the reference image, and its real-time performance is poor. Another approach, target positioning based on total least squares, does not depend on prior information such as control points or a reference map, and it is not restricted by the attitudes of the UAV and the electro-optical platform. Its positioning accuracy, however, is poor.

Therefore, a target positioning method and system based on scene monitoring are needed to solve the above problems.

Disclosure of Invention

Aiming at the problems in the prior art, the embodiment of the invention provides a target positioning method and a target positioning system based on scene monitoring.

In a first aspect, an embodiment of the present invention provides a target positioning method based on scene monitoring, including:

acquiring a two-dimensional image of a scene area through a camera;

performing target detection on the two-dimensional image based on a trained Mask RCNN model to obtain two-dimensional pixel coordinate information of each target in the two-dimensional image, wherein the trained Mask RCNN model is obtained by training on sample two-dimensional images of the scene area;

and processing the two-dimensional pixel coordinate information of each target and the intrinsic parameters of the camera according to an EPnP algorithm to obtain the positioning information of the targets in the scene area.

Further, the trained Mask RCNN model is obtained through the following steps:

constructing a first training sample set according to the sample two-dimensional image of the scene area;

and training the model of the Mask RCNN algorithm through the first training sample set to obtain the trained Mask RCNN model.

Further, the trained Mask RCNN model is obtained by the following steps:

constructing a second training sample set through the MS COCO data set;

inputting the data in the second training sample set into a Mask RCNN algorithm for training to obtain a pre-trained Mask RCNN model;

and adjusting parameters of the pre-trained Mask RCNN model through a sample two-dimensional image of the scene area to obtain the trained Mask RCNN model.

Further, before the two-dimensional pixel coordinate information of each target and the intrinsic parameters of the camera are processed according to the EPnP algorithm to obtain the positioning information of the plurality of targets in the scene area, the method further includes:

acquiring bounding box information of each target in the two-dimensional image according to the two-dimensional pixel coordinate information of each target;

and taking the bounding box information of each target as the reference value for a reference point to obtain the pixel coordinates of each reference point, and obtaining the intrinsic parameters of the camera from the reference point pixel coordinates.

Further, obtaining the intrinsic parameters of the camera from the reference point pixel coordinates includes:

calibrating the reference point pixel coordinate information with Zhang Zhengyou's calibration method to obtain the intrinsic parameters of the camera.

Further, after the acquiring the two-dimensional image of the scene area by the camera, the method further includes:

and preprocessing the two-dimensional image, wherein the preprocessing includes processing of contrast, brightness, white noise or hue.

In a second aspect, an embodiment of the present invention provides a target positioning system based on scene monitoring, including:

the acquisition module, configured to acquire a two-dimensional image of a scene area through a camera;

the target detection module, configured to perform target detection on the two-dimensional image based on a trained Mask RCNN model to obtain two-dimensional pixel coordinate information of each target in the two-dimensional image, wherein the trained Mask RCNN model is obtained by training on sample two-dimensional images of the scene area;

and the positioning module, configured to process the two-dimensional pixel coordinate information of each target and the intrinsic parameters of the camera according to an EPnP algorithm to obtain the positioning information of the targets in the scene area.

Further, the system further comprises an image preprocessing module, configured to preprocess the two-dimensional image, wherein the preprocessing includes processing of contrast, brightness, white noise or hue.

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method provided in the first aspect when executing the program.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.

In the target positioning method and system based on scene monitoring provided by the embodiments of the invention, target detection is performed with the Mask RCNN algorithm and target positioning is achieved with the EPnP algorithm. Targets in the scene area are therefore positioned in real time with improved accuracy.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of a target positioning method based on scene monitoring according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a target positioning system based on scene monitoring according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flowchart of a target positioning method based on scene monitoring according to an embodiment of the present invention, and as shown in fig. 1, the embodiment of the present invention provides a target positioning method based on scene monitoring, including:

Step 101, acquiring a two-dimensional image of a scene area through a camera.

In the embodiment of the invention, a conventional camera serves as the data acquisition terminal and captures two-dimensional images of the scene area in real time. This makes the method more stable and more widely applicable, and it is easy to integrate with existing monitoring equipment, which enables rapid deployment.

Step 102, performing target detection on the two-dimensional image based on a trained Mask RCNN (Mask Region-CNN) model to obtain two-dimensional pixel coordinate information of each target in the two-dimensional image, wherein the trained Mask RCNN model is obtained by training on sample two-dimensional images of the scene area.

In the embodiment of the invention, the trained Mask RCNN model detects the targets in the two-dimensional image and extracts the bounding box information of all of them. This yields the two-dimensional pixel coordinates of each target, which form the data basis for subsequent positioning. It should be noted that a target in the two-dimensional image may be a person, a vehicle, an animal or a plant. The monitored targets are set according to the actual requirements of scene monitoring, which is not specifically limited in the embodiments of the present invention.

Step 103, processing the two-dimensional pixel coordinate information of each target and the intrinsic parameters of the camera according to an EPnP algorithm to obtain the positioning information of the targets in the scene area.

In the embodiment of the invention, the EPnP (Efficient Perspective-n-Point) algorithm is used to obtain the three-dimensional coordinates of each target in the two-dimensional image from the target's two-dimensional pixel coordinates and the camera's intrinsic parameters, so that multiple targets in the image are positioned. The intrinsic parameters of the camera can be obtained by calibration from reference points set manually in the two-dimensional image.
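The text leaves the choice of monitored target classes to the application. The following is a minimal sketch of keeping only the monitored classes before positioning. It assumes the detector's output has already been converted into simple records. The `Detection` class, its field names and the 0.5 score threshold are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one detection; field names are illustrative only."""
    label: str                                # class name, e.g. "person"
    score: float                              # detector confidence in [0, 1]
    box: Tuple[float, float, float, float]    # (x_min, y_min, x_max, y_max) in pixels


def select_targets(detections: Iterable[Detection],
                   monitored_labels: Set[str],
                   min_score: float = 0.5) -> List[Detection]:
    """Keep detections whose class is monitored and whose score reaches min_score."""
    return [d for d in detections
            if d.label in monitored_labels and d.score >= min_score]
```

For the exhibition-hall example described later, `monitored_labels` would simply be `{"person"}`.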
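The text does not spell out EPnP's internals. In the published EPnP formulation, each 3D reference point is written as a weighted sum of four virtual control points whose weights sum to 1. The solver then estimates the control points' camera-frame coordinates. The sketch below computes only those weights for one point; it is an illustrative fragment, not a full EPnP solver. In practice one would typically call OpenCV's `cv2.solvePnP` with `flags=cv2.SOLVEPNP_EPNP`. For coplanar reference points, such as points on a floor of constant elevation, EPnP uses three control points instead, and this sketch does not cover that case. Function names are hypothetical.

```python
from typing import List, Sequence

Vec3 = Sequence[float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix given as rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def barycentric_weights(point: Vec3, controls: Sequence[Vec3]) -> List[float]:
    """Return alphas a0..a3 with point = sum(a_j * c_j) and sum(a_j) = 1 (4 control points)."""
    c0 = controls[0]
    # Edge vectors c_j - c0 (j = 1, 2, 3) form the columns of a 3x3 linear system.
    cols = [[controls[j][k] - c0[k] for k in range(3)] for j in range(1, 4)]
    m = [[cols[j][i] for j in range(3)] for i in range(3)]  # m[row][column]
    rhs = [point[k] - c0[k] for k in range(3)]
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are coplanar")
    weights = []
    for j in range(3):  # Cramer's rule: replace column j with the right-hand side
        mj = [row[:] for row in m]
        for i in range(3):
            mj[i][j] = rhs[i]
        weights.append(_det3(mj) / d)
    return [1.0 - sum(weights)] + weights
```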

According to the target positioning method based on scene monitoring provided by the embodiment of the invention, target detection is carried out through a Mask RCNN algorithm, and target positioning is realized according to an EPnP algorithm, so that the target in a scene area is positioned in real time, and the target positioning accuracy is improved.

On the basis of the above embodiment, the trained Mask RCNN model is obtained through the following steps:

constructing a first training sample set from sample two-dimensional images of the scene area;

and training a Mask RCNN model with the first training sample set to obtain the trained Mask RCNN model.

On the basis of the above embodiment, preferably, the trained Mask RCNN model is further obtained through the following steps:

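constructing a second training sample set from the MS COCO data set;

inputting the data in the second training sample set into the Mask RCNN algorithm for training to obtain a pre-trained Mask RCNN model;

and adjusting the parameters of the pre-trained Mask RCNN model using sample two-dimensional images of the scene area to obtain the trained Mask RCNN model.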

In the embodiment of the invention, a first training sample set is constructed from sample two-dimensional images of the scene area, and the Mask RCNN model is trained on it to obtain the trained Mask RCNN model. To improve training efficiency and model accuracy, the trained Mask RCNN model is preferably obtained through transfer learning, which improves target detection accuracy. Transfer learning extracts useful knowledge from one or more source tasks and applies it to a new target task; in essence, it transfers and reuses knowledge.

Specifically, a pre-trained Mask RCNN model is first obtained by training on the MS COCO data set. In the embodiment of the invention, 80,000 images serve as the training set, 35,000 images as the validation set and 5,000 images as the test set of the pre-trained model. The model performs classification, detection and segmentation for 80 object categories, and its performance satisfies most application scenarios. Image data from the actual application scene (i.e., sample two-dimensional images of the scene area) are then collected and made into a data set. Transfer learning is performed on this data set: the parameters of the pre-trained model are fine-tuned to obtain the trained model parameters. This lets the model converge faster, shortens training time and preserves accuracy. At the same time, using data from the actual scene makes the trained parameters better suited to that scene.
In the embodiment of the invention, the trained model parameters can be preloaded into system memory so that they can be used directly during real-time positioning. This avoids repeatedly reading the model parameters and markedly speeds up target detection.
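Preloading can be as simple as memoizing the loader, so that the weights are read from storage once and then served from memory. The sketch below shows this approach. It assumes a hypothetical `load_model_parameters` function and a hypothetical weights file name, and no real file is read.

```python
import functools
from types import MappingProxyType
from typing import Mapping

READS = {"count": 0}  # how many times the (simulated) weights file was read


@functools.lru_cache(maxsize=None)
def load_model_parameters(weights_path: str) -> Mapping[str, str]:
    """Hypothetical loader: reads trained weights once; later calls hit the in-memory cache."""
    READS["count"] += 1
    # Placeholder for real deserialization of the fine-tuned Mask RCNN weights.
    return MappingProxyType({"weights_path": weights_path})


def detect_frame(frame_id: int, weights_path: str = "trained_mask_rcnn.pth") -> str:
    """Hypothetical per-frame entry point that reuses the cached parameters."""
    params = load_model_parameters(weights_path)  # served from memory after the first call
    return f"frame {frame_id} processed with {params['weights_path']}"
```

Returning a read-only mapping keeps callers from accidentally mutating the shared cached parameters.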

On the basis of the above embodiment, before the two-dimensional pixel coordinate information of each target and the intrinsic parameters of the camera are processed according to the EPnP algorithm to obtain the positioning information of the plurality of targets in the scene area, the method further includes:

acquiring bounding box information of each target in the two-dimensional image according to the two-dimensional pixel coordinate information of each target;

and taking the bounding box information of each target as the reference value for a reference point to obtain the pixel coordinates of each reference point, and obtaining the intrinsic parameters of the camera from the reference point pixel coordinates.

In the embodiment of the invention, before positioning with the EPnP algorithm, the pixel coordinates and the real-world coordinates of the reference points must be acquired. The intrinsic parameters of the camera are then obtained through calibration, which provides the data basis for target positioning. Specifically, the reference point pixel coordinate information is calibrated with Zhang Zhengyou's calibration method to obtain the intrinsic parameters of the camera.

Further, reference points are usually selected manually from the two-dimensional image. To improve their reliability, the embodiment of the present invention instead uses the bounding box from the detection result of the target to be positioned as the reference value for computing the reference point pixel. This yields the pixel coordinates of each reference point. It should be noted that either the midpoint of the lower edge of the bounding box or its center point may be selected as the reference point, which is not specifically limited in the embodiments of the present invention.
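The camera's intrinsic parameters are usually collected in a 3×3 matrix K. K holds the focal lengths in pixels (fx, fy), the principal point (cx, cy) and an optional skew term; lens distortion is modelled separately. The sketch below shows how K maps a point in camera coordinates to a pixel under the standard pinhole model. The numeric values in the test are made up, with the principal point placed at the center of a 1920 × 1080 image. Zhang Zhengyou's method itself, which estimates K from several views of a planar pattern, is available in OpenCV as `cv2.calibrateCamera` and is not reproduced here.

```python
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]


def intrinsic_matrix(fx: float, fy: float, cx: float, cy: float, skew: float = 0.0) -> Matrix3:
    """Build the pinhole intrinsic matrix K."""
    return [[fx, skew, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def project(K: Matrix3, point_cam: Sequence[float]) -> Tuple[float, float]:
    """Project a camera-frame point (X, Y, Z), Z > 0, to pixel (u, v), ignoring distortion."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    # [u, v, 1]^T is proportional to K @ [X, Y, Z]^T; divide by Z to get pixel coordinates.
    u = (K[0][0] * x + K[0][1] * y) / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v
```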
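Both reference-point choices mentioned above are simple functions of the bounding box. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) in pixel coordinates, with y increasing downward as is usual for images. The function name and the `mode` values are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max); y grows downward


def reference_pixel(box: Box, mode: str = "bottom_mid") -> Tuple[float, float]:
    """Return the reference point of a bounding box as pixel coordinates (u, v)."""
    x_min, y_min, x_max, y_max = box
    if x_max < x_min or y_max < y_min:
        raise ValueError("malformed box")
    u = (x_min + x_max) / 2.0
    if mode == "bottom_mid":
        return u, float(y_max)              # midpoint of the lower edge (e.g. a person's feet)
    if mode == "center":
        return u, (y_min + y_max) / 2.0     # center of the box
    raise ValueError(f"unknown mode: {mode}")
```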

Compared with manual selection of reference points, obtaining the reference point pixel coordinates in this way reduces the error in acquiring the reference point pixel values. Moreover, the pixel values are obtained in real time from the currently acquired two-dimensional image, so calibration is performed in real time and the resulting intrinsic parameters are better suited to target positioning.

On the basis of the above embodiment, after the two-dimensional image of the scene area is acquired by the camera, the method further includes:

preprocessing the two-dimensional image, wherein the preprocessing includes processing of contrast, brightness, white noise or hue.

In the embodiment of the present invention, the original two-dimensional image undergoes various kinds of processing, including adjustment of contrast, brightness, white noise or hue. The main purposes of preprocessing are to eliminate redundant information in the original image, filter out interference and noise, and recover useful real information. This increases the detectability of relevant information and improves data quality for subsequent processing as much as possible.

In an embodiment of the present invention, scene monitoring of an exhibition hall is used as an example. All human targets in the hall are positioned in real time to test the effectiveness and accuracy of the target positioning method provided by the embodiments of the present invention.
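The patent lists the kinds of preprocessing but not the operations used. The sketch below assumes a grayscale image stored as nested lists and uses two common choices picked for illustration: a linear contrast/brightness transform and a 3×3 mean filter to attenuate white noise. The gain and offset defaults are arbitrary, and hue adjustment is omitted. A production system would more likely use OpenCV or NumPy for these steps.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel rows, values 0..255


def adjust_contrast_brightness(img: Image, alpha: float = 1.2, beta: float = 10.0) -> Image:
    """Point operation out = clip(alpha * in + beta, 0, 255); alpha: contrast, beta: brightness."""
    return [[max(0, min(255, round(alpha * p + beta))) for p in row] for row in img]


def mean_denoise(img: Image) -> Image:
    """3x3 box (mean) filter with edge replication, a simple way to attenuate white noise."""
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clamp neighbor indices to the image border (edge replication).
            total = sum(img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            row.append(round(total / 9))
        out.append(row)
    return out
```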

Further, the data preparation stage mainly tunes the model parameters required by the Mask RCNN algorithm and the camera intrinsic parameters required by the EPnP algorithm. Specifically, in the embodiment of the present invention, the camera model is DH-IPC-HFW 4631-12. Image data are read through the Real Time Streaming Protocol (RTSP) from the video captured by the camera, with the video resolution set to 1920 × 1080 and the frame rate set to 20 fps. Because adjacent frames in this scene differ very little, extracting 1 frame out of every 20 is sufficient: after each acquired frame, the next 19 frames are skipped before the next frame is acquired. In the embodiment of the invention, images of the exhibition hall captured from 9 am to 4 pm are first collected, and 100 images are randomly extracted from each hour, giving 700 two-dimensional images in total. These are processed into a training data set, and adjusted model parameters are obtained by transfer learning on the pre-trained Mask RCNN model. Then 8 groups of reference points are selected for obtaining the camera intrinsic parameters. The actual positions of the reference points are measured manually, and their pixel coordinates are obtained from the bounding boxes of the corresponding detected targets. Zhang Zhengyou's calibration method is then used for calibration, yielding high-precision camera intrinsic parameters.
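The frame decimation and the hourly sampling used to build the training set can be sketched as follows. This is a minimal illustration. It assumes frames arrive as any Python iterable; in a real deployment they would come from the RTSP stream, for example through OpenCV's `cv2.VideoCapture`. It also assumes captured images are already grouped by hour. The function names and the fixed random seed are hypothetical.

```python
import random
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sample_frames(frames: Iterable[T], step: int = 20) -> Iterator[T]:
    """Yield one frame, then skip the next step - 1 frames (20 fps -> 1 frame per second)."""
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame


def sample_training_images(images_by_hour: Dict[int, List[T]],
                           per_hour: int = 100, seed: int = 0) -> List[T]:
    """Randomly draw per_hour distinct images from each hourly bucket."""
    rng = random.Random(seed)  # fixed seed only to make the draw reproducible
    picked: List[T] = []
    for hour in sorted(images_by_hour):
        picked.extend(rng.sample(images_by_hour[hour], per_hour))
    return picked
```

With hourly buckets for 9 am through 3 pm (the seven hours ending at 4 pm), the second function returns 7 × 100 = 700 images, matching the count in the text.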

Further, a plurality of targets in the acquired two-dimensional image are synchronously positioned in real time, and the method specifically comprises the following steps:

Step S1, acquiring in real time the two-dimensional images collected by the camera, in the same way as in the data preparation stage, i.e., after each acquired frame, skipping the next 19 frames before acquiring the next one;

Step S2, performing target detection on the two-dimensional image based on the trained Mask RCNN model and extracting the bounding box information of all targets in the image to obtain the two-dimensional pixel coordinate information of all targets;

and Step S3, solving for the positions of all targets with the EPnP algorithm from the camera intrinsic parameters and the bounding box information of all targets, thereby positioning multiple targets in the exhibition hall. It should be noted that, since the scene of this embodiment is an exhibition hall, the elevation is considered constant during target detection. The identified targets are persons, and the midpoint of the lower edge of each detected target's bounding box is used as the position for solving the target location, i.e., as the reference point.
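The text states that EPnP is solved from the intrinsic parameters and the bounding boxes, with constant elevation and the lower-edge midpoint as the reference point. It does not show how a single pixel becomes a scene coordinate. One common way to do this, offered here as an assumption rather than the patent's stated procedure, has two parts. First, use EPnP only to estimate the camera pose (R, t) from reference points with known positions, for example `cv2.solvePnP(..., flags=cv2.SOLVEPNP_EPNP)` followed by `cv2.Rodrigues` to convert the rotation vector into a matrix. Second, intersect each target's viewing ray with the floor plane of constant elevation. The sketch implements only this intersection step, in plain Python, and ignores lens distortion. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]


def _matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def _inverse(m: Mat3) -> Mat3:
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[adj[r][s] / det for s in range(3)] for r in range(3)]


def pixel_to_ground(u: float, v: float, K: Mat3, R: Mat3, t: Sequence[float],
                    ground_z: float = 0.0) -> Tuple[float, float]:
    """Intersect the viewing ray of pixel (u, v) with the world plane Z = ground_z.

    Assumes X_cam = R @ X_world + t. For points on the plane,
    s * [u, v, 1]^T = K [r1 | r2 | ground_z * r3 + t] [X, Y, 1]^T,
    so inverting this 3x3 homography recovers (X, Y).
    """
    M = [[R[k][0], R[k][1], ground_z * R[k][2] + t[k]] for k in range(3)]
    H_inv = _inverse(_matmul(K, M))
    x, y, w = (H_inv[r][0] * u + H_inv[r][1] * v + H_inv[r][2] for r in range(3))
    return x / w, y / w
```

Because K and the pose do not change for a fixed camera, the inverse homography could be computed once and reused for every frame.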

Specifically, targets in the exhibition hall are positioned using this embodiment, and detection results from different time periods are obtained to verify the accuracy of the target positioning method provided by the embodiments of the invention. From the two-dimensional images captured during each hour between 9 am and 4 pm, 5 images are randomly extracted. For each hour, the total number of human targets in the 5 images and the number detected by the Mask RCNN model are counted. The detection accuracy is calculated as (number of detected targets / number of actual targets) × 100%, and the average target detection time over the 5 images is also recorded. The statistics are shown in Table 1:

TABLE 1

As can be seen from Table 1, the accuracy of the target positioning method provided by the embodiments of the present invention is not lower than 80%. Undetected targets are mainly those that appear incomplete in the image, because they are located at the image edge or overlap one another. Detection may also fail when a target is too similar to the background. Such cases are unavoidable in practical applications. If incomplete targets in the image are disregarded, or the image is preprocessed accordingly, the accuracy can be further improved. Even without preprocessing, the detection performance meets the usage requirements. In addition, the average time of the whole target detection process does not exceed 650 milliseconds, so its timeliness also meets the usage requirements.

Further, to quantify the accuracy of the positioning results, 8 groups of target points are selected in the embodiment of the present invention. The actual position of each group is measured, the positioning result is solved with the positioning algorithm, the error is calculated, and the positioning computation time is recorded. The results are shown in Table 2:
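The detection accuracy defined above can be computed per hour by pooling the counts of the 5 sampled images. The sketch below shows this computation. The function names are hypothetical, and the counts used in the test are made up rather than taken from Table 1.

```python
from typing import Iterable, Tuple


def detection_accuracy(detected: int, actual: int) -> float:
    """Detection accuracy in percent: detected targets / actual targets * 100."""
    if actual <= 0:
        raise ValueError("the actual number of targets must be positive")
    return 100.0 * detected / actual


def hourly_accuracy(per_image_counts: Iterable[Tuple[int, int]]) -> float:
    """Pool (actual, detected) counts over the images sampled in one hour."""
    counts = list(per_image_counts)
    actual = sum(a for a, _ in counts)
    detected = sum(d for _, d in counts)
    return detection_accuracy(detected, actual)
```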

TABLE 2

In Table 2, $X_1$ and $Y_1$ denote the X and Y coordinates of the actual position of each group of target points. Since the elevation is constant, the Z coordinate is not counted in the embodiment of the invention. $X_2$ and $Y_2$ denote the X and Y coordinates of the positioning result solved for the same group of target points by the algorithm, and $\Delta X = X_2 - X_1$, $\Delta Y = Y_2 - Y_1$.

As Table 2 shows, among the 8 groups of target points, the largest error, 58.76 cm, occurs for the 1st group, and the smallest, about 1.92 cm, occurs for the 8th group. That is, with the target positioning method based on scene monitoring provided by the embodiments of the present invention, all positioning errors are below 1 meter, and the positioning computation takes no more than 50 milliseconds. The whole process, including target detection and other processing, takes less than 1 second, achieving near-real-time positioning.

In the embodiment of the invention, using a camera as the data acquisition terminal makes deployment and integration easy and simplifies image input. Multiple targets can be positioned simultaneously, with positioning errors below 1 meter and positioning times below 1 second, achieving real-time positioning.
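The extracted text defines $\Delta X$ and $\Delta Y$, but the expression for the per-point error that followed is cut off. The sketch below assumes the error is the horizontal Euclidean distance $\sqrt{\Delta X^2 + \Delta Y^2}$. This is an assumption that is not stated in the surviving text, although it is consistent with the single error value per point discussed below. The function name is hypothetical, and the coordinates in the test are invented.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


def positioning_errors(actual: Sequence[Point2], estimated: Sequence[Point2]) -> List[float]:
    """Per-point error sqrt(dX^2 + dY^2) with dX = X2 - X1, dY = Y2 - Y1 (units of the input)."""
    if len(actual) != len(estimated):
        raise ValueError("actual and estimated lists must have the same length")
    return [math.hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(actual, estimated)]
```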

Fig. 2 is a schematic structural diagram of a target positioning system based on scene monitoring according to an embodiment of the present invention. As shown in Fig. 2, the system includes an acquisition module 201, a target detection module 202 and a positioning module 203. The acquisition module 201 is configured to acquire a two-dimensional image of a scene area through a camera. The target detection module 202 is configured to perform target detection on the two-dimensional image based on a trained Mask RCNN model to obtain two-dimensional pixel coordinate information of each target in the two-dimensional image, wherein the trained Mask RCNN model is obtained by training on sample two-dimensional images of the scene area. The positioning module 203 is configured to process the two-dimensional pixel coordinate information of each target and the intrinsic parameters of the camera according to an EPnP algorithm to obtain positioning information of a plurality of targets in the scene area.
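The three modules map naturally onto a small pipeline object. The sketch below assumes each module is supplied as a plain callable. The class and field names are hypothetical. In the test, stand-in functions replace the real camera, the Mask RCNN detector and the EPnP-based solver.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels
Position = Tuple[float, float]            # (X, Y) in scene coordinates


@dataclass
class TargetPositioningSystem:
    """Hypothetical wiring of modules 201 (acquisition), 202 (detection) and 203 (positioning)."""
    acquire: Callable[[], Any]             # module 201: returns one image from the camera
    detect: Callable[[Any], List[Box]]     # module 202: image -> bounding boxes of all targets
    locate: Callable[[Box], Position]      # module 203: bounding box -> scene position

    def run_once(self) -> List[Position]:
        """Acquire one image, detect every target, and position each detected target."""
        image = self.acquire()
        return [self.locate(box) for box in self.detect(image)]
```

Keeping the modules as injected callables would also let the optional image preprocessing module be added by wrapping `acquire`.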

In the embodiment of the present invention, the acquisition module 201 can acquire two-dimensional images of the scene area in real time with a conventional camera. This offers stability, a wide range of application and easy integration with existing monitoring equipment, which enables rapid deployment. The target detection module 202 then detects the targets in the two-dimensional image with the trained Mask RCNN model and extracts the bounding box information of all targets. This yields the two-dimensional pixel coordinates of each target as the data basis for subsequent positioning. It should be noted that a target may be a person, a vehicle, an animal or a plant; the monitored targets are set according to the actual requirements of scene monitoring, which is not specifically limited in the embodiments of the present invention. Finally, the positioning module 203 obtains the three-dimensional coordinates of the targets with the EPnP algorithm from each target's two-dimensional pixel coordinates and the camera's intrinsic parameters, thereby positioning multiple targets in the two-dimensional image. In the embodiment of the present invention, the input to real-time positioning is the two-dimensional RGB image captured in real time by the acquisition module 201. Because both the detection algorithm and the positioning algorithm are fast, the timeliness of the whole positioning process is ensured.
The two-dimensional image is acquired by the acquisition module 201, and the targets being positioned do not need to capture any additional images, so positioning is passive. Moreover, this approach places all positioned targets in a unified coordinate system. This avoids conversions between different coordinate systems and ensures multi-target real-time synchronous positioning.

According to the target positioning system based on scene monitoring provided by the embodiment of the invention, target detection is carried out through a Mask RCNN algorithm, and target positioning is realized according to an EPnP algorithm, so that targets in a scene area are positioned in real time, and the target positioning accuracy is improved.

On the basis of the above embodiment, the system further includes an image preprocessing module, configured to preprocess the two-dimensional image, wherein the preprocessing includes processing of contrast, brightness, white noise or hue.

The system provided by the embodiment of the present invention is used to execute the above method embodiments. For the specific process and details, reference is made to the above embodiments, which are not repeated here.

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. Referring to Fig. 3, the electronic device may include a processor 301, a communication interface 302, a memory 303 and a communication bus 304, wherein the processor 301, the communication interface 302 and the memory 303 communicate with one another through the communication bus 304. The processor 301 may call logic instructions in the memory 303 to perform the following method: acquiring a two-dimensional image of a scene area through a camera; performing target detection on the two-dimensional image based on a trained Mask RCNN model to obtain two-dimensional pixel coordinate information of each target in the two-dimensional image, wherein the trained Mask RCNN model is obtained by training on sample two-dimensional images of the scene area; and processing the two-dimensional pixel coordinate information of each target and the intrinsic parameters of the camera according to an EPnP algorithm to obtain the positioning information of the targets in the scene area.

In addition, the logic instructions in the memory 303 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the target positioning method based on scene monitoring provided in the foregoing embodiments, for example: acquiring a two-dimensional image of a scene area through a camera; performing target detection on the two-dimensional image based on a trained Mask RCNN model to obtain two-dimensional pixel coordinate information of each target in the two-dimensional image, wherein the trained Mask RCNN model is obtained by training on sample two-dimensional images of the scene area; and processing the two-dimensional pixel coordinate information of each target and the intrinsic parameters of the camera according to an EPnP algorithm to obtain the positioning information of the targets in the scene area.

The apparatus embodiments described above are merely illustrative. Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement it without inventive effort.

From the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. On this understanding, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in some parts of the embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A target positioning method based on scene monitoring is characterized by comprising the following steps:

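acquiring a two-dimensional image of a scene area through a camera;

performing target detection on the two-dimensional image based on a trained Mask RCNN model to obtain two-dimensional pixel coordinate information of each target in the two-dimensional image, wherein the trained Mask RCNN model is obtained by training on sample two-dimensional images of the scene area; and

processing the two-dimensional pixel coordinate information of each target and the intrinsic parameter data of the camera according to an EPnP algorithm to obtain positioning information of a plurality of targets in the scene area.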

2. The target positioning method based on scene monitoring as claimed in claim 1, wherein the trained Mask RCNN model is obtained by the following steps:

3. The target positioning method based on scene monitoring as claimed in claim 1, wherein the trained Mask RCNN model is further obtained by the following steps:

constructing a second training sample set from the MS COCO dataset;

4. The target positioning method based on scene monitoring according to claim 1, wherein before the two-dimensional pixel coordinate information of each target and the reference data of the camera are processed according to the EPnP algorithm to obtain the positioning information of the plurality of targets in the scene area, the method further comprises:

5. The target positioning method based on scene monitoring as claimed in claim 4, wherein obtaining the camera reference data according to the pixel coordinates of the reference points comprises:

6. The target positioning method based on scene monitoring as claimed in claim 1, wherein, after the two-dimensional image of the scene area is acquired through the camera, the method further comprises:

7. A target positioning system based on scene monitoring, comprising:

8. The target positioning system based on scene monitoring as claimed in claim 7, further comprising an image preprocessing module configured to preprocess the two-dimensional image, wherein the preprocessing comprises processing of contrast, brightness, white noise or color tone.

9. An electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, performs the steps of the target positioning method based on scene monitoring as claimed in any one of claims 1 to 6.

10. A non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the target positioning method based on scene monitoring as claimed in any one of claims 1 to 6.
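Claim 8 names contrast, brightness, white noise and color tone as preprocessing operations but gives no formulas. The sketch below shows one common way to implement each on 8-bit RGB pixels using only the Python standard library. The linear contrast/brightness mapping about mid-gray 128, additive Gaussian noise as the "white noise", hue rotation in HSV space, and all function and parameter names are illustrative assumptions, not the patent's procedure.

```python
# Hedged sketch: the patent names the operations (contrast, brightness,
# white noise, color tone) but not their formulas. The formulas and the
# parameter names below are illustrative assumptions.
import colorsys
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (r, g, b)


def _clamp(x: float) -> int:
    """Round to the nearest integer and clip to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def adjust_contrast_brightness(img: List[Pixel], alpha: float = 1.0,
                               beta: float = 0.0) -> List[Pixel]:
    """Linear mapping out = alpha * (in - 128) + 128 + beta per channel:
    alpha scales contrast about mid-gray, beta shifts brightness."""
    return [tuple(_clamp(alpha * (c - 128) + 128 + beta) for c in px)
            for px in img]


def add_white_noise(img: List[Pixel], sigma: float = 5.0,
                    seed: int = 0) -> List[Pixel]:
    """Add independent zero-mean Gaussian noise (std dev sigma) to every channel."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [tuple(_clamp(c + rng.gauss(0.0, sigma)) for c in px)
            for px in img]


def shift_hue(img: List[Pixel], delta: float) -> List[Pixel]:
    """Rotate the hue by delta (fraction of a full turn) in HSV space,
    keeping saturation and value unchanged."""
    out: List[Pixel] = []
    for r, g, b in img:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        r2, g2, b2 = colorsys.hsv_to_rgb((h + delta) % 1.0, s, v)
        out.append((_clamp(r2 * 255), _clamp(g2 * 255), _clamp(b2 * 255)))
    return out


if __name__ == "__main__":
    image = [(100, 128, 200), (255, 0, 0)]
    image = adjust_contrast_brightness(image, alpha=1.2, beta=10.0)
    image = add_white_noise(image, sigma=3.0, seed=42)
    image = shift_hue(image, delta=0.05)
    print(image)
```

Each operation is a separate function so it can be applied alone or chained, which matches the claim's "or" wording; a production pipeline would normally operate on whole arrays rather than per-pixel Python loops.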