WO2022227462A1

WO2022227462A1 - Positioning method and apparatus, electronic device, and storage medium

Info

Publication number: WO2022227462A1
Application number: PCT/CN2021/127625
Authority: WO
Inventors: 关英妲 (Guan Yingda); 刘文韬 (Liu Wentao); 钱晨 (Qian Chen)
Original assignee: 北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority date: 2021-04-28
Filing date: 2021-10-29
Publication date: 2022-11-03
Also published as: TW202242803A; CN113129378A

Abstract

The present disclosure provides a positioning method and apparatus, an electronic device, and a storage medium. The positioning method comprises: acquiring video images collected at the same moment by multiple collection devices disposed within a target site, wherein different collection devices have different collection angles of view in the target site, and the video images comprise a target object; on the basis of the video images collected at the same moment by the multiple collection devices, determining initial position coordinates of the target object in the target site, respectively; and fusing the initial position coordinates of the same target object, and obtaining target position coordinates of the target object in the target site.

Description

Positioning method and apparatus, electronic device, and storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application claims priority to Chinese Patent Application No. 202110467657.9, filed on April 28, 2021, and entitled "Positioning method, apparatus, electronic device and storage medium", the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the field of computer vision technology, and in particular, to a positioning method, an apparatus, an electronic device, and a storage medium.

Background

Artificial intelligence technology is playing an increasingly important role in smart education, entertainment, and daily life. Computer vision, one of its key technologies, is widely used. For example, computer-vision-based positioning can locate target objects in a target site in a variety of scenarios and determine their positions within that site.

In computer-vision-based positioning, the position of a target object in an image of the target site captured by a camera can be determined, and from it the position of the target object in the target site itself, thereby enabling the target object to be tracked within the site.

SUMMARY OF THE INVENTION

The embodiments of the present disclosure provide at least one positioning solution.

In a first aspect, an embodiment of the present disclosure provides a positioning method, including:

Acquiring multiple video frames captured at the same moment by multiple collection devices installed in a target site, wherein different collection devices have different viewing angles of the target site, the multiple video frames contain target objects, and a target object is an object to be positioned in the target site;

Determining, based on the multiple video frames, initial position coordinates of the target objects in the target site; and

Fusing the initial position coordinates belonging to the same target object to obtain target position coordinates of that target object in the target site.

In the embodiments of the present disclosure, the initial position coordinates of a target object in different video frames can be determined from video frames captured at the same moment by multiple collection devices with different viewing angles installed in the target site, and the initial position coordinates of the same target object across different video frames can then be fused to determine its target position coordinates in the target site. In this way, on the one hand, target objects can be positioned comprehensively in a target site that is large and/or has a complex layout; on the other hand, highly accurate target position coordinates can be obtained for each target object.

In a possible implementation, determining the initial position coordinates of the target objects in the target site based on the multiple video frames includes:

Acquiring pixel coordinates of the target objects in the multiple video frames; and

For each of the multiple collection devices, determining, based on the pixel coordinates of at least one target object in the video frame captured by that collection device and parameter information of that collection device, the initial position coordinates of the at least one target object in a world coordinate system corresponding to the target site.

In the embodiments of the present disclosure, the pixel coordinates of a target object in a video frame are determined first, and the initial position coordinates of the target object in the target site are then obtained from the parameter information of the collection device, in preparation for subsequently determining the target position coordinates of the target object in the target site.

In a possible implementation, acquiring the pixel coordinates of the target objects in the multiple video frames includes:

Inputting the multiple video frames into a pre-trained neural network to obtain, for each video frame, a detection box of each target object in that video frame, wherein the neural network includes multiple target detection sub-networks for detecting target objects of different sizes; and

Extracting, in each video frame, the pixel coordinates of a target position point on the detection box of each target object, and using them as the pixel coordinates of that target object in the video frame.

In the embodiments of the present disclosure, because the neural network includes multiple target detection sub-networks for target objects of different sizes, target objects of different sizes in the same video frame can be accurately detected.

In a possible implementation, determining, based on the pixel coordinates of at least one target object in the video frame captured by a collection device and the parameter information of that collection device, the initial position coordinates of the at least one target object in the world coordinate system corresponding to the target site includes:

Correcting, based on a predetermined intrinsic parameter matrix and predetermined distortion parameters of the collection device, the pixel coordinates of the at least one target object in the video frame captured by the collection device, to obtain corrected pixel coordinates of the at least one target object; and

Determining the initial position coordinates of the at least one target object based on a predetermined homography matrix of the collection device and the corrected pixel coordinates of the at least one target object in the video frame captured by that collection device.

In the embodiments of the present disclosure, after the pixel coordinates of a target object in a video frame are obtained, they are first corrected using the intrinsic parameter matrix and distortion parameters of the collection device that captured the frame. This yields more accurate corrected pixel coordinates and, in turn, more accurate initial position coordinates of the target object in the target site.

In a possible implementation, fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target site includes:

Determining, based on the initial position coordinates of the target objects in the target site, multiple initial position coordinates associated with the same target object; and

Fusing the multiple initial position coordinates associated with the target object one after another to obtain the target position coordinates of the target object in the target site.

In the embodiments of the present disclosure, because the initial position coordinates of the same target object determined from video frames captured by different collection devices may contain errors, fusing the initial position coordinates of that object from multiple collection devices yields more accurate target position coordinates for it.

In a possible implementation, fusing the multiple initial position coordinates associated with the target object one after another to obtain the target position coordinates of the target object in the target site includes:

Selecting any one of the multiple initial position coordinates associated with the target object as a first intermediate fusion position coordinate; and

Fusing the first intermediate fusion position coordinate with any other not-yet-fused initial position coordinate among the multiple initial position coordinates to generate a second intermediate fusion position coordinate, taking the second intermediate fusion position coordinate as the updated first intermediate fusion position coordinate, and repeating the step of generating the second intermediate fusion position coordinate until no initial position coordinate remains to be fused. The final intermediate fusion position coordinate then serves as the target position coordinates.

In a possible implementation, fusing the first intermediate fusion position coordinate with any other not-yet-fused initial position coordinate to generate the second intermediate fusion position coordinate includes:

Determining the midpoint between the first intermediate fusion position coordinate and that other initial position coordinate, and using the midpoint as the second intermediate fusion position coordinate.

In the embodiments of the present disclosure, the multiple initial position coordinates associated with the same target object can be fused by successively taking midpoints, so as to obtain more accurate target position coordinates.
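The successive-midpoint fusion described above can be sketched as follows. This is a minimal illustration, assuming each initial position coordinate is a 2D ground-plane point (X, Y) in the site's world coordinate system; the function name is hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # assumed: ground-plane (X, Y) in world coordinates


def fuse_sequentially(initial_coords: Sequence[Point]) -> Point:
    """Fuse one object's initial position coordinates by repeatedly taking midpoints."""
    if not initial_coords:
        raise ValueError("need at least one initial position coordinate")
    # Any coordinate may seed the first intermediate fusion position; take the first.
    fused_x, fused_y = initial_coords[0]
    # Each remaining coordinate is fused by taking the midpoint, and the midpoint
    # becomes the updated first intermediate fusion position for the next step.
    for x, y in initial_coords[1:]:
        fused_x, fused_y = (fused_x + x) / 2.0, (fused_y + y) / 2.0
    return fused_x, fused_y
```

Note that the result depends on the fusion order: with three coordinates the last one fused carries weight 1/2 and the first two carry 1/4 each, so fuse_sequentially([(0, 0), (4, 0), (8, 0)]) gives (5.0, 0.0) rather than the arithmetic mean (4.0, 0.0).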

In a possible implementation, determining, based on the initial position coordinates of the target objects in the target site, the multiple initial position coordinates associated with the same target object includes:

For any two of the multiple video frames, taking the target objects in the first of the two video frames as first target objects and the target objects in the second video frame as second target objects; determining the initial position coordinates of each first target object in the first video frame and of each second target object in the second video frame; and, for the initial position coordinates of each first target object, determining the distance between them and the initial position coordinates of each second target object; and

Determining that the second target object at the minimum distance from a first target object is the same target object as that first target object, provided that the minimum distance is less than a preset fusion distance threshold, and taking the initial position coordinates of the first target object and of that second target object as multiple initial position coordinates associated with the same target object.

In the embodiments of the present disclosure, the initial position coordinates associated with the same target object can be determined quickly from the initial position coordinates of the target objects in any two video frames and the preset fusion distance threshold, which provides the basis for subsequently determining the target position coordinates of each target object.
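A minimal sketch of the nearest-neighbour association between two video frames, assuming initial position coordinates are 2D ground-plane points and that objects are identified by their index in each frame's list; the function name and the index-pair output are illustrative choices, not taken from the disclosure.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed: ground-plane (X, Y) in world coordinates


def match_across_views(
    first: Sequence[Point], second: Sequence[Point], fusion_threshold: float
) -> List[Tuple[int, int]]:
    """Pair each first target object with its nearest second target object,
    keeping the pair only if that minimum distance is below the fusion threshold."""
    pairs: List[Tuple[int, int]] = []
    if not second:
        return pairs
    for i, p in enumerate(first):
        # Distance from this first target object to every second target object.
        distances = [math.dist(p, q) for q in second]
        j = min(range(len(second)), key=distances.__getitem__)
        # Treat them as the same object only when the minimum distance is small enough.
        if distances[j] < fusion_threshold:
            pairs.append((i, j))
    return pairs
```

This greedy rule can, in principle, pair one second target object with two first target objects; the text above does not say how such conflicts are resolved, so a one-to-one assignment step would be an additional design choice.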

In a possible implementation, after the target position coordinates of the target objects in the target site are obtained, the positioning method further includes:

Determining, based on the target position coordinates of each target object in the target site and a preset target area, whether any target object has entered the target area; and

Issuing an early warning when it is determined that a target object has entered the target area.

In the embodiments of the present disclosure, once highly accurate target position coordinates of each target object in the target site are obtained, whether any target object has entered a preset target area, such as a preset danger zone, can be determined, so that a timely early warning can be issued and the safety of the target site improved.
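One way to decide whether a target object has entered the target area is a point-in-polygon test on its fused ground-plane position. The sketch below assumes the target area is a simple polygon given as (X, Y) vertices in the site's world coordinate system and that objects carry hypothetical integer IDs; it uses the standard even-odd (ray casting) rule, which the disclosure does not prescribe.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # assumed: ground-plane (X, Y) in world coordinates


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: cast a ray toward +X and count how many polygon edges it crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the ray's height can be crossed (so y2 != y1 here).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def objects_in_target_area(target_coords: Dict[int, Point], area: Sequence[Point]) -> List[int]:
    """Return the (hypothetical) IDs of objects whose target position lies inside the area."""
    return [obj_id for obj_id, pos in target_coords.items() if point_in_polygon(pos, area)]


if __name__ == "__main__":
    danger_zone = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
    for obj_id in objects_in_target_area({1: (2.0, 2.0), 2: (5.0, 2.0)}, danger_zone):
        print(f"warning: object {obj_id} entered the target area")
```

Points lying exactly on the boundary may be classified either way by this rule; a deployment would need to decide how to treat them.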

In a second aspect, an embodiment of the present disclosure provides a positioning apparatus, including:

An acquisition module, configured to acquire multiple video frames captured at the same moment by multiple collection devices installed in a target site, wherein different collection devices have different viewing angles of the target site, the multiple video frames contain target objects, and a target object is an object to be positioned in the target site;

A determining module, configured to determine, based on the multiple video frames, the initial position coordinates of the target objects in the target site; and

A fusion module, configured to fuse the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target site.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the positioning method according to the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the positioning method according to the first aspect.

In a fifth aspect, an embodiment of the present disclosure provides a computer program product including a computer program stored on a storage medium, which, when executed by a processor, performs the steps of the positioning method according to the first aspect.

To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present disclosure more clearly, the accompanying drawings required in the embodiments are briefly introduced below; they are incorporated into and constitute a part of the specification. The drawings illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope; those of ordinary skill in the art may derive other related drawings from them without creative effort.

FIG. 1 shows a flowchart of a positioning method provided by an embodiment of the present disclosure;

FIG. 2 shows a flowchart of a method for determining the initial position coordinates of a target object provided by an embodiment of the present disclosure;

FIG. 3 shows a schematic diagram of a target object detected in a video picture provided by an embodiment of the present disclosure;

FIG. 4 shows a flowchart of a method for determining target position coordinates of a target object provided by an embodiment of the present disclosure;

FIG. 5 shows a flowchart of a method for early warning provided by an embodiment of the present disclosure;

FIG. 6 shows a schematic structural diagram of a positioning device provided by an embodiment of the present disclosure;

FIG. 7 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are merely some rather than all of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the accompanying drawings is not intended to limit the claimed scope of the disclosure but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.

It should be noted that like numerals and letters refer to like items in the following figures, so once an item is defined in one figure, it does not require further definition and explanation in subsequent figures.

The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that both A and B exist, or that B exists alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of them; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.

In many application scenarios, a target object in a site needs to be located. For example, in a factory it may be necessary to detect whether employees are working at their designated locations or whether they have entered a dangerous area; in a shopping mall, the distribution of customer flow can be determined by locating customers. When locating target objects in a site, their positions can be determined from images captured by multiple cameras. However, in large sites with complex layouts, positioning based on multiple cameras may fail to capture all target objects, so positioning is incomplete; there may also be occluded areas in which target objects cannot be located.

Based on the above research, the present disclosure provides a positioning method that determines the initial position coordinates of a target object in different video frames from video frames captured at the same moment by multiple collection devices with different viewing angles installed in the target site, and then fuses the initial position coordinates of the same target object across different video frames to determine its target position coordinates in the target site. In this way, on the one hand, target objects can be positioned comprehensively in a target site that is large and/or has a complex layout; on the other hand, highly accurate target position coordinates can be obtained for each target object.

To facilitate understanding of the embodiments, a positioning method disclosed in the embodiments of the present disclosure is first described in detail. The positioning method is executed by a computer device with computing capability, such as a server or another processing device. In some possible implementations, the positioning method may be implemented by a processor invoking computer-readable instructions stored in a memory.

Referring to FIG. 1, which is a flowchart of a positioning method provided by an embodiment of the present disclosure, the positioning method includes the following S101-S103:

S101: Acquire video frames captured at the same moment by multiple collection devices installed in the target site, wherein different collection devices have different viewing angles of the target site and the video frames contain target objects.

Exemplarily, for different application scenarios, the target site may be the site corresponding to the application scenario. For example, when employees in a factory need to be located, the target site may be the factory; when customers in a shopping mall need to be located, the target site may be the shopping mall; and when athletes in a gymnasium need to be located, the target site may be the gymnasium.

Exemplarily, the target objects are objects to be located in the target location, such as the aforementioned employees, customers and athletes.

Exemplarily, a collection device may be a monocular camera or a binocular camera, and multiple collection devices may be installed in the target site. For different target sites, the installation positions of the collection devices can be determined according to the actual layout of the site. For example, the collection devices may have different viewing angles of the target site so that together they cover its entire area without blind spots. In addition, too many collection devices produce too many video frames at each moment, which slows down video processing. Therefore, both the installation angles and the number of collection devices need to be considered when installing them. For example, the devices may be arranged so that each target object entering the target site is captured by two collection devices at the same time, so that the multiple collection devices together fully capture the entire area of the target site.

S102: Determine the initial position coordinates of the target object in the target place based on the video images collected by multiple collection devices at the same time.

Exemplarily, after the video frames captured by the multiple collection devices at the same moment are acquired, target detection can be performed on them to determine the initial position coordinates of the target objects in the different video frames in the world coordinate system corresponding to the target site. Specifically, the initial position coordinates of a target object in a video frame can be determined based on the detected pixel coordinates of the target object in that frame and the parameter information of the collection device that captured it.

Exemplarily, the world coordinate system corresponding to the target site may be predetermined. For example, the center point of the ground of the target site is taken as the origin; the direction through the origin perpendicular to the ground is taken as the Z-axis; a direction on the ground through the origin is taken as the X-axis; and the direction on the ground through the origin perpendicular to the X-axis is taken as the Y-axis.

S103: Fuse the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target site.

Exemplarily, because the parameter information of different collection devices contains errors, the initial position coordinates of the same target object determined from video frames captured by different collection devices will differ somewhat. These initial position coordinates can therefore be fused to obtain the target position coordinates of the target object in the world coordinate system corresponding to the target site.

For the above S102, when the initial position coordinates of the target object in the target place are respectively determined based on the video images collected by multiple collection devices at the same time, as shown in FIG. 2, the following S201-S202 may be included:

S201: Acquire pixel coordinates of a target object in a video image separately collected by multiple collection devices at the same time.

Exemplarily, the target object in a video frame can be identified by a pre-trained target detection neural network, and the pixel coordinates, in the image coordinate system of the video frame, of a preset position point on the target object can be read and used as the pixel coordinates of the target object.

Specifically, acquiring the pixel coordinates of the target objects in the video frames captured separately by the multiple collection devices at the same moment may include the following S2011 to S2012:

S2011: Input the multiple video frames into a pre-trained neural network to obtain a detection box of each target object in each video frame, wherein the neural network includes multiple target detection sub-networks for detecting target objects of different sizes;

S2012: Extract, in each video frame, the pixel coordinates of a target position point on the detection box of each target object to obtain the pixel coordinates of the target object in the video frame.

Exemplarily, the neural network can detect each target object contained in a video frame and mark a detection box for each. FIG. 3 is a schematic diagram of the detection boxes of target objects in a video frame. The video frame contains detection boxes for two target objects: detection box A1B1C1D1 of target object 1 and detection box A2B2C2D2 of target object 2. A position point on the detection box of each target object can be extracted as the target position point, for example, the midpoint of the bottom edge of the detection box. As shown in FIG. 3, the pixel coordinates of target object 1 are represented by those of the midpoint K1 of bottom edge D1C1 of detection box A1B1C1D1, and the pixel coordinates of target object 2 are represented by those of the midpoint K2 of bottom edge D2C2 of detection box A2B2C2D2.
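Extracting the bottom-edge midpoint from a detection box is a one-line computation. The sketch assumes boxes are given as (x_min, y_min, x_max, y_max) in pixels with the image y-axis pointing downward, so the bottom edge lies at y_max; this box layout is an assumption, not something the disclosure specifies.

```python
from typing import Tuple

# Assumed layout: (x_min, y_min, x_max, y_max) in pixels; image y grows downward.
Box = Tuple[float, float, float, float]


def bottom_center(box: Box) -> Tuple[float, float]:
    """Return the midpoint of the box's bottom edge (e.g. K1 on edge D1C1)."""
    x_min, _y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, y_max)
```

For a standing person, this point roughly corresponds to where the feet touch the floor, which is presumably why it pairs well with a ground-plane homography in the later steps.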

Exemplarily, because the positions of target objects in the target site change and the collection devices installed in the site have different viewing angles, the sizes of the target objects in video frames captured by different collection devices at the same moment may vary. To accurately mark detection boxes for target objects of different sizes, the neural network used in the embodiments of the present disclosure may include multiple target detection sub-networks for detecting target objects of different sizes; for example, it may be a feature pyramid network, in which each target detection sub-network detects and identifies target objects of its corresponding size in the video frame. With such a neural network, target objects of different sizes in the same video frame can be accurately detected.
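As one illustration of how a feature pyramid network can route objects of different sizes to different levels, the original FPN paper (Lin et al.) assigns a region of width w and height h to level k = floor(k0 + log2(sqrt(w*h) / 224)) with k0 = 4, clamped to the available levels. The sketch below computes that level, assuming levels P2 to P5; the disclosure does not state that this particular rule is used.

```python
import math


def fpn_level(width: float, height: float, k0: int = 4, canonical: float = 224.0,
              k_min: int = 2, k_max: int = 5) -> int:
    """Pyramid level for a box of the given size, using the FPN scale-assignment rule."""
    if width <= 0 or height <= 0:
        raise ValueError("box width and height must be positive")
    # Larger boxes go to coarser levels (higher k), smaller boxes to finer levels.
    k = math.floor(k0 + math.log2(math.sqrt(width * height) / canonical))
    # Clamp to the levels actually present in the pyramid (assumed P2..P5).
    return max(k_min, min(k_max, k))
```

A 224 x 224 box lands on level 4, halving each side moves it one level finer, and very small or very large boxes are clamped to the finest or coarsest level.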

In the embodiment of the present disclosure, the neural network includes a plurality of target detection sub-networks for detecting target objects of different sizes. In this way, when the target object in the video picture is detected by the neural network, the target objects of different sizes in the same video picture can be accurately detected.

S202 , based on the pixel coordinates of the target object in the video picture collected by each collection device and the parameter information of the collection device, determine the initial position coordinates of the target object collected by the collection device in the world coordinate system corresponding to the target location.

Exemplarily, the parameter information of each acquisition device may include a homography matrix of the acquisition device, wherein the homography matrix may represent the image coordinate system corresponding to the video picture acquired by the acquisition device and the target location where the acquisition device is located. The transformation relationship between world coordinate systems. In this way, after obtaining the pixel coordinates of the target object in the image coordinate system corresponding to the video screen, the initial position coordinates of the target object in the world coordinate system corresponding to the target location can be determined according to the parameter information of the acquisition device.
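Applying a homography to a single pixel is a projective transform followed by division by the homogeneous coordinate. The sketch assumes the stored 3x3 matrix maps corrected image pixels (u, v) directly to ground-plane world coordinates (X, Y) on the Z = 0 plane; if a device's matrix is stored in the world-to-image direction, its inverse would be needed instead. The function name is hypothetical.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]  # assumed: 3x3 image-to-ground homography


def pixel_to_ground(H: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Map a (corrected) pixel (u, v) to ground-plane world coordinates (X, Y)."""
    # Homogeneous product H @ [u, v, 1]^T.
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity under this homography")
    # Dehomogenize to obtain the initial position coordinates on the ground plane.
    return x / w, y / w
```

In practice the matrix for each device would typically be estimated beforehand from a few pixel/ground point correspondences per camera; that estimation step is outside this sketch.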

Exemplarily, the world coordinate system corresponding to the target site may take a fixed position in the target site as the coordinate origin to establish a unique world coordinate system. For example, you can take the center point of the ground of the target site as the origin of the coordinate system, set a direction on the ground as the positive direction of the X-axis of the world coordinate system, and set the direction perpendicular to the X-axis on the ground as the positive direction of the Y-axis of the world coordinate system , take the vertical and ground-up direction as the positive direction of the Z-axis of the world coordinate system.

In one embodiment, for the above S202, based on the pixel coordinates of the target object in the video picture collected by each collection device and the parameter information of the collection device, determine the world corresponding to the target object collected by the collection device in the target place The initial position coordinates in the coordinate system include the following S2021~S2022:

S2021: based on the predetermined internal parameter matrix and distortion parameters of each collection device, correct the pixel coordinates of the target object in the video picture collected by that collection device to obtain the corrected pixel coordinates of the target object in the video picture.

Exemplarily, the internal parameter matrix of the collection device is

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where $(f_x, f_y)$ represents the focal length of the collection device in pixel units, and $(c_x, c_y)$ represents the pixel coordinates, in the image coordinate system, of the principal point (approximately the center point) of the video picture collected by the collection device. The distortion parameters of the collection device include radial distortion coefficients and tangential distortion coefficients. After the internal parameter matrix and distortion parameters of each collection device are obtained in advance, the pixel coordinates of the target object in the video picture collected by that collection device can be undistorted with them. For example, the corrected pixel coordinates of the target object can be obtained with the undistortion functions of the OpenCV library.
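The undistortion step can be sketched as follows. This is an illustrative sketch, not the implementation of the present disclosure: it assumes the common radial-tangential (Brown-Conrady) distortion model with coefficients (k1, k2, p1, p2, k3), the ordering OpenCV uses, and inverts that model by fixed-point iteration in normalized coordinates. The function names and the iteration count are hypothetical.

```python
def distort_normalized(x, y, k1, k2, p1, p2, k3=0.0):
    """Forward radial + tangential distortion of normalized coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


def undistort_pixel(u, v, fx, fy, cx, cy, k1, k2, p1, p2, k3=0.0, iters=20):
    """Return corrected pixel coordinates for a distorted pixel (u, v).

    The distortion model has no closed-form inverse, so the undistorted
    normalized point is found by fixed-point iteration, then mapped back
    to pixels with the same intrinsic matrix K (fx, fy, cx, cy).
    """
    # Pixel -> distorted normalized coordinates via K^-1.
    x_d = (u - cx) / fx
    y_d = (v - cy) / fy
    x, y = x_d, y_d  # initial guess: assume no distortion
    for _ in range(iters):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (x_d - dx) / radial
        y = (y_d - dy) / radial
    # Undistorted normalized -> pixel coordinates via K.
    return fx * x + cx, fy * y + cy
```

In OpenCV, `cv2.undistortPoints` with its `P` argument set to the camera matrix performs a comparable correction and returns pixel coordinates; the pure-Python version above only makes the arithmetic explicit.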

Exemplarily, the internal parameter matrix and distortion parameters of each collection device may be predetermined with Zhang Zhengyou's checkerboard calibration method. For example, multiple checkerboard images are taken from different angles and the feature points in the images are detected. The internal parameter matrix and distortion parameters of the collection device are solved from the pixel coordinates of these feature points in the checkerboard images and are then optimized iteratively. During the optimization, the same pixel coordinates can be corrected with the internal parameter matrix and distortion parameters obtained in two adjacent iterations, and the difference between the two corrected pixel coordinates determines whether to end the optimization; for example, once the difference no longer decreases, the optimization ends and the internal parameter matrix and distortion parameters of the collection device are obtained.

S2022: based on the predetermined homography matrix of the collection device and the corrected pixel coordinates of the target object in the video picture collected by that collection device, determine the initial position coordinates of the target object contained in the video picture.

Exemplarily, the homography matrix represents the transformation relationship between the image coordinate system of the video picture collected by the collection device and the world coordinate system corresponding to the target site where the collection device is located, and it can be determined when the collection device is calibrated in advance. For example, a sample video picture containing multiple markers is collected by the collection device, and the world coordinates, in the world coordinate system corresponding to the target site, of the intersection points of the markers with the ground (the plane spanned by the X and Y axes of the world coordinate system) are determined. The corrected pixel coordinates of these intersection points in the sample video picture are then determined in the manner described above, and the homography matrix of the collection device is determined from the corrected pixel coordinates and the world coordinates corresponding to the multiple markers.
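One common way to compute such a homography from four or more marker correspondences is the direct linear transform (DLT) solved in the least-squares sense. The sketch below is an assumption about how this step could be done, not the calibration procedure of the present disclosure: it uses pure Python, Hartley-style coordinate normalization, and fixes h33 = 1 (which breaks only in the degenerate case where the true h33 is 0), and it maps corrected pixel coordinates to ground-plane (X, Y) coordinates. All function names are hypothetical.

```python
def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _normalizer(pts):
    """Similarity transform moving pts to zero mean and mean distance sqrt(2)."""
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    mean_d = sum(((p[0] - mx) ** 2 + (p[1] - my) ** 2) ** 0.5 for p in pts) / len(pts)
    s = 2 ** 0.5 / mean_d
    t = [[s, 0.0, -s * mx], [0.0, s, -s * my], [0.0, 0.0, 1.0]]
    t_inv = [[1 / s, 0.0, mx], [0.0, 1 / s, my], [0.0, 0.0, 1.0]]
    return t, t_inv


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply_homography(h, p):
    """Projective transform of a 2D point followed by division by w."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def estimate_homography(pixel_pts, world_pts):
    """Least-squares H mapping corrected pixel coordinates to ground (X, Y).

    Needs at least 4 correspondences with no three points collinear.
    """
    t_img, _ = _normalizer(pixel_pts)
    t_wld, t_wld_inv = _normalizer(world_pts)
    ata = [[0.0] * 8 for _ in range(8)]
    atb = [0.0] * 8
    for p, q in zip(pixel_pts, world_pts):
        u, v = apply_homography(t_img, p)
        X, Y = apply_homography(t_wld, q)
        # Two linear equations per correspondence in the unknowns h11..h32.
        for row, rhs in (([u, v, 1, 0, 0, 0, -u * X, -v * X], X),
                         ([0, 0, 0, u, v, 1, -u * Y, -v * Y], Y)):
            for i in range(8):
                atb[i] += row[i] * rhs
                for j in range(8):
                    ata[i][j] += row[i] * row[j]
    h = _solve(ata, atb) + [1.0]
    h_norm = [h[0:3], h[3:6], h[6:9]]
    # Undo the normalization: H = T_world^-1 * H_norm * T_image.
    big_h = _matmul(_matmul(t_wld_inv, h_norm), t_img)
    return [[c / big_h[2][2] for c in row] for row in big_h]
```

Normalizing both point sets before building the normal equations keeps the 8×8 system well conditioned even though raw pixel coordinates are in the hundreds.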

Exemplarily, the initial position coordinates, in the world coordinate system corresponding to the target site, of the target object contained in a video picture can be obtained from the corrected pixel coordinates of the target object in the video picture and the homography matrix of the collection device that collected the video picture.
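Applying the homography amounts to a projective transform followed by division by the homogeneous coordinate. A minimal sketch, assuming H is a 3×3 nested list that maps corrected pixel coordinates onto the ground plane, so the resulting point has Z = 0 in the world coordinate system described above; the function name is hypothetical:

```python
def pixel_to_world(H, u, v):
    """Map a corrected pixel (u, v) to ground-plane world coordinates (X, Y)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        # The pixel lies on the horizon line of the ground plane.
        raise ValueError("point maps to infinity under H")
    return x / w, y / w
```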

In one embodiment, S103 above (fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target site) may include the following S301 to S302, as shown in FIG. 4:

S301: from the initial position coordinates of the target objects determined from the multiple video pictures, determine the multiple initial position coordinates associated with the same target object.

Exemplarily, as described above, each target object in the target site is captured by at least two collection devices at the same time. The parameter information of each collection device contains some error, and this error differs between collection devices, so the initial position coordinates of the same target object determined from different video pictures may differ. Therefore, before the initial position coordinates of the same target object are fused, the multiple initial position coordinates associated with that target object need to be determined.

S302: sequentially fuse the multiple initial position coordinates associated with the same target object to obtain the target position coordinates of that target object in the target site.

Exemplarily, assuming that there are N initial position coordinates associated with the same target object, the first two may be fused first to obtain a fused position coordinate, which is then fused with the third initial position coordinate, and so on until the last initial position coordinate has been fused. The final fused position coordinate is used as the target position coordinates of that target object.

In one embodiment, S301 above (determining the multiple initial position coordinates associated with the same target object from the initial position coordinates of the target objects determined from the multiple video pictures) includes the following S3011 to S3012:

S3011: for any two of the multiple video pictures, take the target objects in the first of the two video pictures as first target objects and the target objects in the second of the two video pictures as second target objects; for the initial position coordinates of each first target object, determine the distance between those coordinates and the initial position coordinates of each second target object in the second video picture.

S3012: determine that the second target object having the minimum distance from a first target object is the same target object as that first target object, provided the minimum distance is less than a preset fusion distance threshold; take the initial position coordinates of the first target object and of this second target object as initial position coordinates associated with the same target object.

Exemplarily, suppose that A collection devices are set up in the target site and that the video pictures captured by the A collection devices at the same moment each contain at least one target object. At this moment, A groups of initial position coordinates can be obtained, which constitute the initial coordinate set S = {S1, S2, S3, ..., SA}, where S1, S2, S3, ..., SA denote in turn the initial position coordinates of the target objects in the video pictures captured by the first, second, third, ..., A-th of the A collection devices. The following takes the video pictures captured at the same moment by the first and second collection devices as the two video pictures to illustrate how the multiple initial position coordinates associated with the same target object are determined:

Exemplarily, S1 contains the initial position coordinates of a first target objects (also referred to as first initial position coordinates), and S2 contains the initial position coordinates of b second target objects (also referred to as second initial position coordinates). The Euclidean distance between each first initial position coordinate and each second initial position coordinate can be determined to obtain the a×b distance matrix:

$$D = \begin{bmatrix} d_{11} & \cdots & d_{1j} & \cdots & d_{1b} \\ \vdots & & \vdots & & \vdots \\ d_{i1} & \cdots & d_{ij} & \cdots & d_{ib} \\ \vdots & & \vdots & & \vdots \\ d_{a1} & \cdots & d_{aj} & \cdots & d_{ab} \end{bmatrix}$$

where $d_{ij}$ represents the distance between the i-th first initial position coordinate in S1 and the j-th second initial position coordinate in S2. For example, $d_{11}$ is the distance between the first first initial position coordinate in S1 and the first second initial position coordinate in S2; $d_{1b}$ is the distance between the first first initial position coordinate in S1 and the b-th second initial position coordinate in S2; $d_{a1}$ is the distance between the a-th first initial position coordinate in S1 and the first second initial position coordinate in S2; and $d_{ab}$ is the distance between the a-th first initial position coordinate in S1 and the b-th second initial position coordinate in S2.
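The distance matrix can be computed directly. A minimal sketch, assuming each initial position coordinate is an (X, Y) tuple on the ground plane; the function name is hypothetical:

```python
import math


def distance_matrix(s1, s2):
    """Return D with D[i][j] = Euclidean distance between s1[i] and s2[j] (a x b)."""
    return [[math.dist(p, q) for q in s2] for p in s1]
```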

Exemplarily, in operation, the multiple initial position coordinates associated with the same target object in S1 and S2 can be determined through the following S30121 to S30124:

S30121: find the current minimum distance among the elements of the current distance matrix.

Exemplarily, when the minimum distance is searched for the first time, the elements of the current distance matrix are the Euclidean distances between each first initial position coordinate in S1 and each second initial position coordinate in S2.

S30122: Determine whether the current minimum distance is less than a preset fusion distance threshold.

Exemplarily, the preset fusion distance threshold may be set empirically. For example, the same target object is photographed in advance by different collection devices, multiple position coordinates of that target object in the target site are determined from the video pictures collected by the different collection devices, and the preset fusion distance threshold is determined from the distances between these position coordinates.

S30123: when the current minimum distance is less than the preset fusion distance threshold, determine that the two initial position coordinates associated with the current minimum distance are initial position coordinates associated with the same target object.

Exemplarily, if $d_{a1}$ is determined to be the current minimum distance and is less than the preset fusion distance threshold, the a-th first initial position coordinate in S1 and the first second initial position coordinate in S2 can be taken as initial position coordinates associated with the same target object.

S30124: in the current distance matrix, set the current minimum distance, and every other distance computed from either of the two initial position coordinates associated with it, to the preset fusion distance threshold, and then return to S30121. When the current minimum distance in the current distance matrix is greater than or equal to the preset fusion distance threshold, all initial position coordinates associated with the same target objects in S1 and S2 have been obtained.

Exemplarily, assume that the current distance matrix computed from the initial position coordinates in S1 and S2 is the 3×3 matrix

$$D = \begin{bmatrix} d_{11} & d_{12} & d_{13} \\ d_{21} & d_{22} & d_{23} \\ d_{31} & d_{32} & d_{33} \end{bmatrix}$$

and that the preset fusion distance threshold is $d_{th}$. If $d_{11}$ is the minimum distance in the current matrix and is less than $d_{th}$, the first first initial position coordinate in S1 and the first second initial position coordinate in S2 are initial position coordinates associated with the same target object. In the current distance matrix, the other distances computed from either of these two initial position coordinates are $d_{12}$, $d_{13}$, $d_{21}$ and $d_{31}$. Therefore, according to S30124, $d_{11}$, $d_{12}$, $d_{13}$, $d_{21}$ and $d_{31}$ are set to $d_{th}$, which gives

$$D' = \begin{bmatrix} d_{th} & d_{th} & d_{th} \\ d_{th} & d_{22} & d_{23} \\ d_{th} & d_{32} & d_{33} \end{bmatrix}$$

Then the process returns to S30121.

Exemplarily, once the current minimum distance and all other distances involving either of its two associated initial position coordinates have been set to the preset fusion distance threshold, those elements can be skipped when the current minimum distance is searched for again, which improves search efficiency.
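Steps S30121 to S30124 describe a greedy minimum-distance matching. The sketch below is one possible reading of those steps, assuming (X, Y) tuples for the coordinates and returning index pairs (i, j) into S1 and S2; for brevity it rescans every element on each pass instead of skipping the masked ones. The function name is hypothetical.

```python
import math


def greedy_match(s1, s2, d_th):
    """Pair coordinates of S1 and S2 that are associated with the same target object."""
    d = [[math.dist(p, q) for q in s2] for p in s1]
    pairs = []
    while d and d[0]:
        # S30121: locate the current minimum distance in the matrix.
        i, j = min(((r, c) for r in range(len(d)) for c in range(len(d[0]))),
                   key=lambda rc: d[rc[0]][rc[1]])
        # S30122: stop once the minimum is no longer below the threshold.
        if d[i][j] >= d_th:
            break
        # S30123: the two coordinates belong to the same target object.
        pairs.append((i, j))
        # S30124: overwrite row i and column j with the threshold value so
        # neither coordinate can be matched again.
        for c in range(len(d[0])):
            d[i][c] = d_th
        for r in range(len(d)):
            d[r][j] = d_th
    return pairs
```

Each pass masks at least the chosen element, so the loop terminates after at most min(a, b) matches.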

Exemplarily, in one embodiment, after the multiple initial position coordinates associated with the same target object in S1 and S2 are obtained, the same determination can be made for every other pair of video pictures. Once all the video pictures collected by the A collection devices at the same moment have been processed, the different initial position coordinates of each target object in these video pictures are obtained. The initial position coordinates associated with the same target object are then fused to obtain the target position coordinates, in the target site, of each target object in the A video pictures captured by the A collection devices at that moment.

Exemplarily, in another embodiment, after the multiple initial position coordinates associated with the same target object in S1 and S2 are obtained, they can be fused to obtain updated initial position coordinates of that target object. These updated initial position coordinates, together with the initial position coordinates in S1 and S2 that did not participate in the fusion, form S2'. A new current distance matrix is then formed from the initial position coordinates in S2' and S3, and S30121 to S30124 are repeated to obtain the multiple initial position coordinates associated with the same target object in S2' and S3, from which S3' is obtained in the same way. A new current distance matrix is then formed from the initial position coordinates in S3' and S4 and S30121 to S30124 are repeated, and so on until fusion with the initial position coordinates of the last element of the initial coordinate set is complete, which yields the target position coordinates, in the target site, of each target object in the A video pictures captured by the A collection devices at the same moment.

In particular, when fusion with the initial position coordinates of the last element of the initial coordinate set is complete, any initial position coordinate that has never participated in a fusion can be filtered out as an erroneous initial position coordinate, because each target object in the target site is captured by at least two collection devices at the same time.
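The cascaded variant (S1 + S2 to S2', S2' + S3 to S3', and so on) together with the final filtering can be sketched as below. This is an illustrative sketch under stated assumptions: coordinates are (X, Y) tuples, matching uses the greedy S30121 to S30124 procedure, and a matched pair is fused by taking its midpoint, as in the midpoint fusion described for S3022. Function names are hypothetical.

```python
import math


def _greedy_pairs(s1, s2, d_th):
    """Greedy minimum-distance matching between two coordinate lists (S30121-S30124)."""
    d = [[math.dist(p, q) for q in s2] for p in s1]
    pairs = []
    while d and d[0]:
        i, j = min(((r, c) for r in range(len(d)) for c in range(len(d[0]))),
                   key=lambda rc: d[rc[0]][rc[1]])
        if d[i][j] >= d_th:
            break
        pairs.append((i, j))
        for c in range(len(d[0])):
            d[i][c] = d_th
        for r in range(len(d)):
            d[r][j] = d_th
    return pairs


def cascade_fuse(coordinate_sets, d_th):
    """Fuse S1, S2, ..., SA in turn and drop coordinates that were never fused.

    Each working entry is (coordinate, fused_flag). Matched pairs are replaced
    by their midpoint and flagged as fused; unmatched entries are carried forward.
    """
    current = [(tuple(p), False) for p in coordinate_sets[0]]
    for nxt in coordinate_sets[1:]:
        incoming = [(tuple(q), False) for q in nxt]
        pairs = _greedy_pairs([e[0] for e in current], [e[0] for e in incoming], d_th)
        fused = []
        for i, j in pairs:
            (x1, y1), (x2, y2) = current[i][0], incoming[j][0]
            fused.append((((x1 + x2) / 2, (y1 + y2) / 2), True))
        used_i = {i for i, _ in pairs}
        used_j = {j for _, j in pairs}
        carried = [e for k, e in enumerate(current) if k not in used_i]
        carried += [e for k, e in enumerate(incoming) if k not in used_j]
        current = fused + carried  # this list plays the role of S2', S3', ...
    # Filtering: a coordinate seen by only one collection device is treated as an error.
    return [p for p, was_fused in current if was_fused]
```

For example, with S1 = [(0, 0), (10, 10)], S2 = [(0.2, 0), (50, 50)], S3 = [(0.4, 0), (10.2, 10)] and a threshold of 1.0, the point (50, 50) is never fused and is discarded.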

Specifically, S302 above (sequentially fusing the multiple initial position coordinates associated with the same target object to obtain the target position coordinates of that target object in the target site) may include the following S3021 to S3022:

S3021: select any one of the multiple initial position coordinates associated with the same target object as the first intermediate fusion position coordinate.

S3022: fuse the first intermediate fusion position coordinate with any other initial position coordinate to be fused among the multiple initial position coordinates to generate a second intermediate fusion position coordinate; take the second intermediate fusion position coordinate as the updated first intermediate fusion position coordinate, and return to the step of generating the second intermediate fusion position coordinate until no initial position coordinate remains to be fused.

An initial position coordinate to be fused is one that has not yet participated in the fusion.

Exemplarily, fusing the first intermediate fusion position coordinate with any other initial position coordinate to be fused among the multiple initial position coordinates to generate the second intermediate fusion position coordinate includes: determining the midpoint of the first intermediate fusion position coordinate and that other initial position coordinate, and taking the midpoint coordinate as the generated second intermediate fusion position coordinate.

Exemplarily, in combination with the above embodiment, if N initial position coordinates are determined to be associated with target object A, any one of them may be taken as the first intermediate fusion position coordinate, and the midpoint between it and any other initial position coordinate to be fused of target object A is determined. The midpoint coordinate is then taken as the updated first intermediate fusion position coordinate and fused with another initial position coordinate to be fused, until none of the N initial position coordinates remains to be fused, at which point the target position coordinates of target object A are obtained.

In the embodiments of the present disclosure, the multiple initial position coordinates associated with the same target object may be fused by taking midpoints in sequence, thereby obtaining target position coordinates with higher accuracy.
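The sequential midpoint fusion of S3021 to S3022 can be sketched in a few lines, assuming (X, Y) tuples; the function name is hypothetical.

```python
def fuse_by_midpoints(coords):
    """Fuse one target object's initial position coordinates by successive midpoints."""
    if not coords:
        raise ValueError("need at least one initial position coordinate")
    fused = coords[0]  # S3021: first intermediate fusion position coordinate
    for x, y in coords[1:]:  # S3022: fuse with each remaining coordinate in turn
        fused = ((fused[0] + x) / 2, (fused[1] + y) / 2)
    return fused
```

Note that successive midpoints weight the coordinates unequally (the last one fused receives weight 1/2, the one before it 1/4, and so on), so the result depends on the fusion order: fusing (0, 0), (4, 0), (8, 0) in that order gives (5.0, 0.0), whereas their arithmetic mean is (4.0, 0.0).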

The positioning method proposed in the embodiments of the present disclosure can accurately determine the target position coordinates of each target object in the target site and can be applied in various scenarios. Taking a factory as an example, after the target position coordinates of the target objects in the target site are obtained, the positioning method provided by the embodiments of the present disclosure further includes the following S401 to S402, as shown in FIG. 5:

S401: based on the target position coordinates of each target object in the target site and a preset target area, determine whether any target object has entered the target area;

S402: if it is determined that a target object has entered the target area, issue an early warning prompt.

Exemplarily, when the target site is a factory, the coordinate range of a dangerous target area in the factory may be set in advance in the world coordinate system corresponding to the target site. Whether any target object has entered the target area is then determined from the target position coordinates of each target object in the target site and the coordinate range of the target area, and when a target object is determined to have entered the target area, an early warning prompt is issued.
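A minimal sketch of the area check, assuming the preset coordinate range is an axis-aligned rectangle (x_min, y_min, x_max, y_max) on the ground plane and that target objects are keyed by an identifier. The function names are hypothetical, and the `print` call merely stands in for a real sound-and-light or voice alarm.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), world coordinates


def objects_in_area(positions: Dict[str, Tuple[float, float]], area: Rect) -> List[str]:
    """Return the ids whose target position coordinates fall inside the area (edges included)."""
    x_min, y_min, x_max, y_max = area
    return [oid for oid, (x, y) in positions.items()
            if x_min <= x <= x_max and y_min <= y <= y_max]


def check_and_warn(positions: Dict[str, Tuple[float, float]], area: Rect) -> List[str]:
    """S401-S402: detect intrusions and issue a (placeholder) early warning prompt."""
    intruders = objects_in_area(positions, area)
    if intruders:
        print(f"WARNING: {len(intruders)} target object(s) in the target area: {intruders}")
    return intruders
```

A polygonal danger zone would need a point-in-polygon test instead of the rectangle comparison.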

Exemplarily, the early warning prompt may include, but is not limited to, a sound and light alarm, a voice alarm, and the like. Such prompts help protect the employees in the target site and improve the safety of the target site.

Taking a factory as the target site, the positioning method provided by the present disclosure is described as a whole below in conjunction with a specific embodiment:

1) Install collection devices in the factory, for example multiple cameras. To position targets in the scene accurately and ensure the universality and robustness of the algorithm, different collection devices have different collection angles of view in the factory, and every employee entering the factory is captured by at least two collection devices at the same time.

2) Use Zhang Zhengyou's calibration method to determine the internal parameter matrix and distortion coefficients of each camera.

3) Set up multiple markers in the factory and determine the position coordinates, in the world coordinate system corresponding to the factory, of the intersection points of the markers with the ground. Determine the corrected pixel coordinates of these intersection points in a sample video picture according to the camera's internal parameter matrix and distortion coefficients. Then determine the homography matrix of each camera from the position coordinates of the intersection points in the world coordinate system and their corrected pixel coordinates in the sample video picture.

4) Use a neural network with a feature pyramid to perform target detection on the video pictures collected by the cameras in the factory and obtain the pixel coordinates of the employees contained in each video picture.

5) According to the internal parameter matrix and distortion coefficients of the camera that collected a video picture, correct the pixel coordinates of the employees contained in the video picture to obtain their corrected pixel coordinates.

6) According to the homography matrix of the camera that collected a video picture and the corrected pixel coordinates of the employees contained in it, determine the initial position coordinates of those employees in the factory.

7) Fuse the initial position coordinates of the same employee in the video pictures collected at the same moment to obtain the target position coordinates of the employees in the factory at that moment.

8) According to the target position coordinates of the employees in the factory at that moment and the preset dangerous area in the factory, determine whether any employee has entered the dangerous area, and issue an early warning prompt if so.

Those skilled in the art can understand that, in the above method of the specific implementation, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.

Based on the same technical concept, the embodiments of the present disclosure also provide a positioning apparatus corresponding to the positioning method. Since the apparatus solves the problem on a principle similar to that of the above positioning method, its implementation can refer to the implementation of the method, and repeated details are omitted.

Referring to FIG. 6, which is a schematic diagram of a positioning apparatus 500 according to an embodiment of the present disclosure, the positioning apparatus includes:

The acquisition module 501 is configured to acquire video pictures collected at the same moment by multiple collection devices set in the target site, where different collection devices have different collection angles of view in the target site and the video pictures contain the target object;

The determining module 502 is configured to determine, based on the video pictures collected by the multiple collection devices at the same moment, the initial position coordinates of the target object in the target site;

The fusion module 503 is configured to fuse the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target site.

In one embodiment, when determining the initial position coordinates of the target object in the target site based on the video pictures collected by the multiple collection devices at the same moment, the determining module 502 is configured to:

obtain the pixel coordinates of the target object in the video pictures collected by the multiple collection devices at the same moment;

determine, based on the pixel coordinates of the target object in the video picture collected by each collection device and the parameter information of that collection device, the initial position coordinates of the target object in the world coordinate system corresponding to the target site.

In a possible implementation, when obtaining the pixel coordinates of the target object in the video pictures collected by the multiple collection devices at the same moment, the determining module 502 is configured to:

input the multiple video pictures into a pre-trained neural network to obtain a detection frame of the target object in each video picture, where the neural network includes multiple target detection sub-networks for detecting target objects of different sizes;

extract the pixel coordinates of a target position point on the detection frame of the target object in each video picture as the pixel coordinates of the target object in that video picture.
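This section does not say which point on the detection frame is used. A common choice, assumed here purely for illustration, is the bottom-center of the box, where a standing person's feet touch the ground plane that the homography maps onto. A minimal sketch with a hypothetical function name and an assumed (x1, y1, x2, y2) box format:

```python
def foot_point(box):
    """Bottom-center of an axis-aligned detection box (x1, y1, x2, y2) in pixels.

    Image y grows downward, so the larger y value is the bottom edge.
    """
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, max(y1, y2))
```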

In a possible implementation, when determining, based on the pixel coordinates of the target object in the video picture collected by each collection device and the parameter information of that collection device, the initial position coordinates of the target object in the world coordinate system corresponding to the target site, the determining module 502 is configured to:

correct, based on the predetermined internal parameter matrix and distortion parameters of each collection device, the pixel coordinates of the target object in the video picture collected by that collection device to obtain the corrected pixel coordinates of the target object in the video picture;

determine, based on the predetermined homography matrix of the collection device and the corrected pixel coordinates of the target object in the video picture collected by that collection device, the initial position coordinates of the target object contained in the video picture.

In a possible implementation, when fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target site, the fusion module 503 is configured to:

determine, from the initial position coordinates of the target objects determined from the multiple video pictures, the multiple initial position coordinates associated with the same target object;

sequentially fuse the multiple initial position coordinates associated with the same target object to obtain the target position coordinates of that target object in the target site.

In a possible implementation, when sequentially fusing the multiple initial position coordinates associated with the same target object to obtain the target position coordinates of that target object in the target site, the fusion module 503 is configured to:

select any one of the multiple initial position coordinates associated with the same target object as the first intermediate fusion position coordinate;

fuse the first intermediate fusion position coordinate with any other initial position coordinate to be fused among the multiple initial position coordinates to generate a second intermediate fusion position coordinate, take the second intermediate fusion position coordinate as the updated first intermediate fusion position coordinate, and return to the step of generating the second intermediate fusion position coordinate until no initial position coordinate remains to be fused.

In a possible implementation, when fusing the first intermediate fusion position coordinate with any other initial position coordinate to be fused among the multiple initial position coordinates to generate the second intermediate fusion position coordinate, the fusion module 503 is configured to:

determine the midpoint of the first intermediate fusion position coordinate and that other initial position coordinate to be fused, and take the midpoint coordinate as the generated second intermediate fusion position coordinate.

In a possible implementation, when determining, from the initial position coordinates of the target objects determined from the multiple video pictures, the multiple initial position coordinates associated with the same target object, the fusion module 503 is configured to:

for any two of the multiple video pictures, take the target objects in the first of the two video pictures as first target objects and the target objects in the second as second target objects, and for the initial position coordinates of each first target object, determine the distance between those coordinates and the initial position coordinates of each second target object in the second video picture;

determine that the second target object having the minimum distance from a first target object is the same target object as that first target object, provided the minimum distance is less than the preset fusion distance threshold, and take the initial position coordinates of the first target object and of this second target object as initial position coordinates associated with the same target object.

In a possible implementation, after the fusion module 503 obtains the target position coordinates of the target object in the target site, the determining module 502 is further configured to:

determine, based on the target position coordinates of each target object in the target site and a preset target area, whether any target object has entered the target area;

issue an early warning prompt when it is determined that a target object has entered the target area.

For the description of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the foregoing method embodiments, which will not be described in detail here.

Corresponding to the positioning method in FIG. 1, an embodiment of the present disclosure further provides an electronic device 600. FIG. 7 is a schematic structural diagram of the electronic device 600 provided by the embodiment of the present disclosure, which includes:

a processor 610, a memory 620, and a bus 630. The memory 620 is configured to store execution instructions and includes an internal memory 621 and an external memory 622. The internal memory 621 temporarily stores operation data of the processor 610 and data exchanged with the external memory 622, such as a hard disk; the processor 610 exchanges data with the external memory 622 through the internal memory 621. When the electronic device 600 runs, the processor 610 and the memory 620 communicate through the bus 630, so that the processor 610 executes the following instructions: acquiring video pictures collected at the same moment by multiple collection devices set in the target site, where different collection devices have different collection angles of view in the target site and the video pictures contain target objects; determining, based on the video pictures collected by the multiple collection devices at the same moment, the initial position coordinates of the target objects in the target site; and fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target site.

Embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program which, when run by a processor, executes the steps of the positioning method described in the foregoing method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.

An embodiment of the present disclosure further provides a computer program product. The computer program product carries program code and is stored in a storage medium, and the instructions included in the program code can be used to execute the steps of the positioning method described in the foregoing method embodiments. For details, refer to the foregoing method embodiments; they are not repeated here.

The above computer program product may be implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system and apparatus described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on such an understanding, the technical solutions of the present disclosure in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent replacements of some of their technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A positioning method, comprising:

acquiring a plurality of video pictures collected at the same moment by a plurality of collection devices disposed in a target place, wherein different collection devices have different collection angles of view in the target place, the plurality of video pictures include target objects, and the target objects are objects to be positioned in the target place;

determining, based on the plurality of video pictures, initial position coordinates of the target objects in the target place; and

fusing the initial position coordinates belonging to the same target object among the target objects to obtain target position coordinates of that target object in the target place.
2. The positioning method according to claim 1, wherein determining, based on the plurality of video pictures, the initial position coordinates of the target objects in the target place comprises:

acquiring pixel coordinates of the target objects in the plurality of video pictures; and

for each of the plurality of collection devices, determining, based on the pixel coordinates of at least one target object in the video picture collected by the collection device and parameter information of the collection device, initial position coordinates of the at least one target object collected by the collection device in a world coordinate system corresponding to the target place.
3. The positioning method according to claim 2, wherein acquiring the pixel coordinates of the target objects in the plurality of video pictures comprises:

inputting the plurality of video pictures into a pre-trained neural network; and

for each of the plurality of video pictures:

obtaining a detection frame of each target object in the video picture; and

extracting, from the video picture, pixel coordinates of a target position point on the detection frame of the target object, to obtain the pixel coordinates of the target object in the video picture.
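Claim 3 does not say which point on the detection frame serves as the target position point. As an illustrative assumption only, the sketch below takes an axis-aligned frame `(x_min, y_min, x_max, y_max)` in pixels and uses its bottom-centre, which for a standing person approximates the ground contact point and therefore suits a ground-plane mapping. The names `target_point_of_box` and `pixel_coords_for_picture` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def target_point_of_box(box: Box) -> Tuple[float, float]:
    """Bottom-centre of a detection frame (assumed target position point).

    Image y grows downward, so y_max is the bottom edge of the frame.
    """
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, y_max)


def pixel_coords_for_picture(boxes: List[Box]) -> List[Tuple[float, float]]:
    """Pixel coordinates of every detected target object in one video picture."""
    return [target_point_of_box(b) for b in boxes]
```

For the frame `(100, 50, 140, 250)` this yields `(120.0, 250)`. Another point, such as the frame centre, could be substituted without changing the rest of the pipeline.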
4. The positioning method according to claim 2 or 3, wherein determining, based on the pixel coordinates of the at least one target object in the video picture collected by the collection device and the parameter information of the collection device, the initial position coordinates of the at least one target object collected by the collection device in the world coordinate system corresponding to the target place comprises:

correcting, based on a predetermined intrinsic parameter matrix and distortion parameters of the collection device, the pixel coordinates of the at least one target object in the video picture collected by the collection device, to obtain corrected pixel coordinates of the at least one target object in the video picture; and

determining, based on a predetermined homography matrix of the collection device and the corrected pixel coordinates of the at least one target object in the video picture collected by the collection device, the initial position coordinates of the at least one target object in the world coordinate system.
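The two steps of claim 4 can be sketched as follows. The claim does not fix a distortion model; this sketch assumes a pinhole intrinsic matrix `K` and a purely radial distortion model with coefficients `k1`, `k2`, inverted by fixed-point iteration, and a 3×3 homography `H` that maps corrected pixel coordinates onto the ground plane of the world coordinate system. All function names are hypothetical. In practice, OpenCV's `cv2.undistortPoints` (with `P` set to the intrinsic matrix) and `cv2.perspectiveTransform` provide comparable functionality.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def undistort_pixel(u: float, v: float, K: Matrix3, k1: float, k2: float,
                    iterations: int = 20) -> Tuple[float, float]:
    """Remove radial distortion from a pixel; returns corrected pixel coordinates."""
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    # Distorted normalised image coordinates
    xd, yd = (u - cx) / fx, (v - cy) / fy
    x, y = xd, yd
    # Model: x_d = x * (1 + k1*r^2 + k2*r^4); solve for x by fixed-point iteration
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    # Re-project the undistorted normalised point with the same intrinsics
    return (fx * x + cx, fy * y + cy)


def apply_homography(H: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Map a corrected pixel (u, v) to ground-plane world coordinates (X, Y)."""
    X = H[0][0] * u + H[0][1] * v + H[0][2]
    Y = H[1][0] * u + H[1][1] * v + H[1][2]
    W = H[2][0] * u + H[2][1] * v + H[2][2]
    return (X / W, Y / W)


def initial_positions(pixels: List[Tuple[float, float]], K: Matrix3,
                      k1: float, k2: float, H: Matrix3) -> List[Tuple[float, float]]:
    """Initial world position of each target object seen by one collection device."""
    return [apply_homography(H, *undistort_pixel(u, v, K, k1, k2)) for u, v in pixels]
```

With zero distortion coefficients the correction step leaves pixels unchanged, and a homography of `diag(0.01, 0.01, 1)` would map pixel `(100, 200)` to world `(1.0, 2.0)`. A homography is only valid for points on a single plane, which is why a ground-contact target position point pairs naturally with it.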
5. The positioning method according to any one of claims 1 to 4, wherein fusing the initial position coordinates of the same target object among the target objects to obtain the target position coordinates of that target object in the target place comprises:

determining, based on the initial position coordinates of the target objects in the target place, a plurality of initial position coordinates associated with the same target object among the target objects; and

sequentially fusing the plurality of initial position coordinates associated with the target object to obtain the target position coordinates of the target object in the target place.
6. The positioning method according to claim 5, wherein sequentially fusing the plurality of initial position coordinates associated with the target object to obtain the target position coordinates of the target object in the target place comprises:

selecting any initial position coordinate from the plurality of initial position coordinates associated with the target object, and taking the selected initial position coordinate as a first intermediate fusion position coordinate; and

fusing the first intermediate fusion position coordinate with any other initial position coordinate to be fused among the plurality of initial position coordinates to generate a second intermediate fusion position coordinate, taking the second intermediate fusion position coordinate as an updated first intermediate fusion position coordinate, and returning to the step of generating the second intermediate fusion position coordinate, until no initial position coordinate to be fused remains among the plurality of initial position coordinates.
7. The positioning method according to claim 6, wherein fusing the first intermediate fusion position coordinate with any other initial position coordinate to be fused among the plurality of initial position coordinates to generate the second intermediate fusion position coordinate comprises:

determining a midpoint coordinate of the first intermediate fusion position coordinate and the other initial position coordinate to be fused, and taking the midpoint coordinate as the second intermediate fusion position coordinate.
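The sequential fusion of claims 6 and 7 amounts to a fold over the associated coordinates in which each step replaces the intermediate result with a midpoint. A minimal sketch, assuming 2-D world coordinates and taking the first list element as the arbitrarily selected starting coordinate; the name `fuse_sequentially` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fuse_sequentially(coords: List[Point]) -> Point:
    """Fuse the initial position coordinates of one target object by repeated midpoints."""
    if not coords:
        raise ValueError("need at least one initial position coordinate")
    # The selected coordinate seeds the first intermediate fusion position coordinate.
    fused = coords[0]
    # Each remaining coordinate is fused in turn; the midpoint becomes the updated result.
    for x, y in coords[1:]:
        fused = ((fused[0] + x) / 2.0, (fused[1] + y) / 2.0)
    return fused
```

For example, `fuse_sequentially([(0, 0), (2, 0), (4, 0)])` returns `(2.5, 0.0)`, whereas the arithmetic mean would be `(2.0, 0.0)`. With two coordinates the two coincide; with more, repeated midpoints weight later coordinates more heavily (the last contributes one half), so the result depends on the fusion order.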
8. The positioning method according to any one of claims 5 to 7, wherein determining, based on the initial position coordinates of the target objects in the target place, the plurality of initial position coordinates associated with the same target object among the target objects comprises:

for any two video pictures among the plurality of video pictures, determining a target object in a first video picture of the two video pictures as a first target object, and determining a target object in a second video picture of the two video pictures as a second target object;

determining initial position coordinates of each first target object in the first video picture;

determining initial position coordinates of each second target object in the second video picture; and

for the initial position coordinates of each first target object:

determining a distance between the initial position coordinates of the first target object and the initial position coordinates of each second target object;

in a case where the minimum of these distances is less than a preset fusion distance threshold, determining that the second target object having the minimum distance from the first target object and the first target object are the same target object; and

taking the initial position coordinates of the first target object and the initial position coordinates of the second target object having the minimum distance from the first target object as the plurality of initial position coordinates associated with the same target object among the target objects.
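The association step of claim 8 can be sketched as a nearest-neighbour search with a distance gate. This sketch assumes 2-D world coordinates and Euclidean distance; the function name `match_across_pictures` and the index-pair return format are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def match_across_pictures(first: List[Point], second: List[Point],
                          fusion_threshold: float) -> List[Tuple[int, int]]:
    """Pair each first target object with its nearest second target object.

    Returns (i, j) index pairs; a pair is kept only when the minimum
    distance is below the preset fusion distance threshold.
    """
    pairs: List[Tuple[int, int]] = []
    if not second:
        return pairs
    for i, p in enumerate(first):
        # Euclidean distance from this first target object to every second target object
        dists = [math.dist(p, q) for q in second]
        j = min(range(len(second)), key=dists.__getitem__)
        if dists[j] < fusion_threshold:
            pairs.append((i, j))
    return pairs
```

With first objects at `(0, 0)`, `(10, 10)`, `(30, 0)`, second objects at `(0.5, 0)`, `(10, 11)`, `(50, 50)`, and a threshold of 2, the result is `[(0, 0), (1, 1)]`; the object at `(30, 0)` has no second object within the threshold and stays unmatched. As claimed, the rule does not force a one-to-one pairing, so two first objects could select the same second object. A stricter variant could solve a minimum-cost assignment instead (for example with SciPy's `linear_sum_assignment`) and then apply the same threshold.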
9. The positioning method according to any one of claims 1 to 8, wherein, after the target position coordinates of the target object in the target place are obtained, the positioning method further comprises:

determining, based on the target position coordinates corresponding to each of the target objects in the target place and a preset target area, whether any target object has entered the target area; and

issuing an early warning prompt when it is determined that a target object has entered the target area.
10. A positioning apparatus, comprising:

an acquisition module, configured to acquire a plurality of video pictures collected at the same moment by a plurality of collection devices disposed in a target place, wherein different collection devices have different collection angles of view in the target place, the plurality of video pictures include target objects, and the target objects are objects to be positioned in the target place;

a determining module, configured to determine, based on the plurality of video pictures, initial position coordinates of the target objects in the target place; and

a fusion module, configured to fuse the initial position coordinates of the same target object among the target objects to obtain target position coordinates of that target object in the target place.
11. An electronic device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the positioning method according to any one of claims 1 to 9.
12. A computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the positioning method according to any one of claims 1 to 9.