CN113129378A - Positioning method, positioning device, electronic equipment and storage medium - Google Patents

Positioning method, positioning device, electronic equipment and storage medium

Info

Publication number
CN113129378A
CN113129378A
Authority
CN
China
Prior art keywords
target object
target
initial position
position coordinates
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110467657.9A
Other languages
Chinese (zh)
Inventor
关英妲
刘文韬
钱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202110467657.9A priority Critical patent/CN113129378A/en
Publication of CN113129378A publication Critical patent/CN113129378A/en
Priority to PCT/CN2021/127625 priority patent/WO2022227462A1/en
Priority to TW111105018A priority patent/TW202242803A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06T5/80
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The disclosure provides a positioning method, a positioning apparatus, an electronic device and a storage medium, wherein the positioning method comprises the following steps: acquiring video pictures collected at the same moment by a plurality of acquisition devices arranged in a target place, wherein the video pictures comprise target objects and different acquisition devices have different acquisition visual angles in the target place; respectively determining initial position coordinates of a target object in the target place based on the video pictures acquired by the plurality of acquisition devices at the same moment; and fusing the initial position coordinates of the same target object to obtain target position coordinates of the target object in the target place.

Description

Positioning method, positioning device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a positioning method, an apparatus, an electronic device, and a storage medium.
Background
Artificial intelligence technology plays an increasingly important role in creating intelligent education, entertainment and daily life. Computer vision is one of its key technologies and is widely applied; for example, a positioning technology based on computer vision can be used for positioning target objects in target places in different scenes and determining the positions of the target objects in the target places.
In the process of positioning based on computer vision, the position of a target object in an image of a target place can be determined from the image acquired by a monocular camera, the position of the target object in the target place is then determined accordingly, and the tracking of the target object in the target place is completed.
For some target places with complex layouts, occlusion areas easily exist in the process of positioning the target object based on a monocular camera, and a target object in an occluded area cannot be positioned.
Disclosure of Invention
The disclosed embodiments provide at least one positioning scheme.
In a first aspect, an embodiment of the present disclosure provides a positioning method, including:
acquiring video pictures acquired by a plurality of acquisition devices arranged in a target place at the same moment; different acquisition equipment has different acquisition visual angles in the target place, and the video picture comprises a target object;
respectively determining initial position coordinates of the target object in the target place based on video pictures acquired by the plurality of acquisition devices at the same moment;
and fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target place.
In the embodiment of the disclosure, the initial position coordinates of the target object in different video pictures can be determined through the video pictures which are set in the target place and collected at the same time by a plurality of collection devices with different collection visual angles, the initial position coordinates of the same target object in different video pictures are further fused, and the target position coordinates of the target object in the target place are determined.
In a possible embodiment, the determining initial position coordinates of the target object in the target site based on the video pictures acquired by the plurality of acquisition devices at the same time respectively includes:
acquiring pixel coordinates of the target object in video pictures respectively acquired by a plurality of acquisition devices at the same moment;
and determining the initial position coordinates of the target object acquired by the acquisition equipment under a world coordinate system corresponding to the target place based on the pixel coordinates of the target object in the video picture acquired by each acquisition equipment and the parameter information of the acquisition equipment.
In the embodiment of the disclosure, it is proposed that the pixel coordinates of the target object in the video picture are determined first, then the initial position coordinates of the target object in the target place are obtained according to the parameter information of the acquisition device, and preparation is provided for subsequently determining the target position coordinates of the target object in the target place.
In a possible embodiment, the obtaining pixel coordinates of the target object in the video frames respectively captured by the multiple capturing devices at the same time includes:
inputting a plurality of video pictures into a pre-trained neural network to obtain a detection frame of a target object in each video picture; wherein the neural network comprises a plurality of target detection sub-networks for detecting target objects of different sizes;
and extracting the pixel coordinates of the target position point on the detection frame of the target object in each video picture in the video picture to obtain the pixel coordinates of the target object in the video picture.
In the embodiment of the disclosure, a plurality of target detection subnetworks for detecting target objects of different sizes are provided in a neural network, so that when the target objects in a video frame are detected through the neural network, the target objects of different sizes in the same video frame can be accurately detected.
In a possible implementation manner, the determining, based on the pixel coordinates of the target object in the video frame acquired by each acquisition device and the parameter information of the acquisition device, the initial position coordinates of the target object acquired by the acquisition device in a world coordinate system corresponding to the target location includes:
correcting the pixel coordinates of the target object in the video picture acquired by the acquisition equipment based on the predetermined internal reference matrix and distortion parameter of each acquisition equipment to obtain the corrected pixel coordinates of the target object in the video picture;
and determining the initial position coordinates of the target object in the video picture based on the predetermined homography matrix of the acquisition equipment and the corrected pixel coordinates of the target object in the video picture acquired by the acquisition equipment.
In the embodiment of the disclosure, after the pixel coordinates of the target object in the video picture are obtained, the pixel coordinates are corrected based on the internal reference matrix and the distortion coefficient of the acquisition device acquiring the video picture, so that corrected pixel coordinates with high accuracy can be obtained, and further, initial position coordinates of the target object in the target place with high accuracy are obtained.
In a possible implementation manner, the fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target location includes:
determining a plurality of initial position coordinates associated with the same target object based on the initial position coordinates of the target object determined by the plurality of video pictures;
and sequentially fusing a plurality of initial position coordinates associated with the same target object to obtain the target position coordinates of the same target object in the target place.
In the embodiment of the disclosure, in consideration of the fact that the initial position coordinates of the same target object determined based on the video pictures acquired by different acquisition devices have some errors, the initial position coordinates of the same target object acquired by a plurality of acquisition devices can be fused, so that the target position coordinates of the same target object with higher accuracy can be obtained.
In a possible implementation manner, the sequentially fusing the initial position coordinates associated with the same target object to obtain the target position coordinates of the same target object in the target location includes:
selecting any initial position coordinate from a plurality of initial position coordinates associated with the same target object, and taking the selected any initial position coordinate as a first intermediate fusion position coordinate;
and fusing the first intermediate fusion position coordinate with any other initial position coordinate to be fused to generate a second intermediate fusion position coordinate, taking the second intermediate fusion position coordinate as the updated first intermediate fusion position coordinate, and returning to the step of generating the second intermediate fusion position coordinate until no initial position coordinate to be fused exists.
In one possible embodiment, fusing the first intermediate fusion position coordinate with any other initial position coordinate to be fused to generate a second intermediate fusion position coordinate includes:
and determining the midpoint coordinate of the first intermediate fusion position coordinate and any other initial position coordinate to be fused, and taking the midpoint coordinate as the generated second intermediate fusion position coordinate.
In the embodiment of the disclosure, it is proposed that a plurality of initial position coordinates associated with the same target object may be fused in a manner of sequentially taking midpoints, so as to obtain a target position coordinate with higher accuracy.
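The sequential midpoint fusion described above can be sketched as follows; this is a minimal illustration, and the function and variable names are not taken from the disclosure:

```python
def fuse_midpoints(coords):
    """Fuse a list of (x, y) world coordinates by repeatedly taking the
    midpoint of the running estimate and the next coordinate."""
    if not coords:
        raise ValueError("need at least one coordinate")
    fused_x, fused_y = coords[0]        # first intermediate fusion coordinate
    for x, y in coords[1:]:             # fold in each remaining coordinate
        fused_x = (fused_x + x) / 2.0   # the midpoint becomes the new
        fused_y = (fused_y + y) / 2.0   # running (intermediate) estimate
    return (fused_x, fused_y)
```

Note that with this scheme later coordinates carry more weight than earlier ones, which differs from a plain average; the disclosure prescribes only the sequential midpoint rule.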
In one possible embodiment, the determining the initial position coordinates of the target object based on the initial position coordinates of the target object determined by the video pictures includes:
determining the distance between the initial position coordinate of each first target object in a first video picture in any two video pictures and the initial position coordinate of each second target object in a second video picture in any two video pictures aiming at any two video pictures;
and taking the initial position coordinates of the first target object and the initial position coordinates of a second target object forming a minimum distance with the first target object as a plurality of initial position coordinates associated with the same target object, wherein the minimum distance is smaller than a preset fusion distance threshold value.
In the embodiment of the disclosure, the initial position coordinates associated with the same target object can be quickly determined according to the initial position coordinates of different target objects in any two video pictures and the preset fusion distance threshold, so that a basis is provided for subsequently determining the target position coordinates of each target object.
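A minimal sketch of this nearest-distance association between two video pictures follows; the names and exact tie-breaking are assumptions, since the disclosure only specifies the minimum distance and the fusion distance threshold:

```python
import math

def associate(coords_a, coords_b, fusion_threshold):
    """Pair each initial coordinate from the first video picture with its
    nearest coordinate in the second, keeping the pair only if the minimum
    distance is below the preset fusion distance threshold."""
    pairs = []
    for pa in coords_a:
        best, best_d = None, float("inf")
        for pb in coords_b:           # find the closest candidate in view B
            d = math.dist(pa, pb)
            if d < best_d:
                best, best_d = pb, d
        if best is not None and best_d < fusion_threshold:
            pairs.append((pa, best))  # coordinates assumed to share one object
    return pairs
```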
In a possible implementation, after obtaining the target position coordinates of the target object in the target site, the positioning method further includes:
determining whether a target object entering the target area exists or not based on target position coordinates corresponding to each target object in the target place and a preset target area;
and under the condition that the target object entering the target area is determined, early warning prompt is carried out.
In the embodiment of the disclosure, after the target position coordinates of each target object in the target place with higher accuracy are obtained, whether the target object in the target place enters the target area or not can be judged based on the preset target area, such as the preset danger area, so as to prompt an early warning in time and improve the safety of the target place.
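The target-area check described above can be sketched as follows, assuming for illustration that the preset target area is an axis-aligned rectangle in the world coordinate system (the disclosure does not restrict the area's shape):

```python
def in_target_area(coord, area):
    """area is an axis-aligned rectangle (x_min, y_min, x_max, y_max)
    in the world coordinate system of the target place."""
    x, y = coord
    x_min, y_min, x_max, y_max = area
    return x_min <= x <= x_max and y_min <= y <= y_max

def check_warnings(target_coords, danger_area):
    """Return the target position coordinates of every target object
    inside the preset area, for which a warning should be raised."""
    return [c for c in target_coords if in_target_area(c, danger_area)]
```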
In a second aspect, an embodiment of the present disclosure provides a positioning apparatus, including:
the acquisition module is used for acquiring video pictures acquired by a plurality of acquisition devices arranged in a target site at the same moment; different acquisition equipment has different acquisition visual angles in the target place, and the video picture comprises a target object;
the determining module is used for respectively determining initial position coordinates of the target object in a target place based on video pictures acquired by the plurality of acquisition devices at the same time;
and the fusion module is used for fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target place.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the positioning method according to the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the positioning method according to the first aspect.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments will be briefly described below, and the drawings herein incorporated in and forming a part of the specification illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It is appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, for those skilled in the art will be able to derive additional related drawings therefrom without the benefit of the inventive faculty.
Fig. 1 shows a flow chart of a positioning method provided by an embodiment of the present disclosure;
FIG. 2 illustrates a flowchart of a method for determining initial position coordinates of a target object provided by an embodiment of the present disclosure;
fig. 3 illustrates a schematic diagram of a target object detected in a video picture according to an embodiment of the present disclosure;
FIG. 4 illustrates a flowchart of a method for determining target location coordinates of a target object provided by an embodiment of the present disclosure;
FIG. 5 is a flow chart of a method for warning a reminder according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a positioning device provided in an embodiment of the present disclosure;
fig. 7 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the group consisting of A, B and C.
In many application scenarios, it is usually necessary to locate a target object in a place. For example, in a factory it is necessary to detect whether staff work at specified positions or enter dangerous areas, and in a shopping mall the distribution of people flow can be detected by locating customers. In the process of locating a target object in a place, the position of the target object can be determined through images collected by a monocular camera; however, for some target places with complicated layouts and large areas, not all target objects can be captured in the process of locating based on a monocular camera, so the problem of incomplete positioning exists.
Based on the research, the disclosure provides a positioning method, which may determine initial position coordinates of a target object in different video frames through video frames acquired at the same time by a plurality of acquisition devices with different acquisition perspectives in a target site, further fuse the initial position coordinates of the same target object in different video frames, and determine target position coordinates of the target object in the target site.
To facilitate understanding of the present embodiment, first, a positioning method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the positioning method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, and the computer device includes, for example: a server or other processing device. In some possible implementations, the location method may be implemented by a processor calling computer readable instructions stored in a memory.
Referring to fig. 1, a flowchart of a positioning method provided in the embodiment of the present disclosure is shown, where the positioning method includes the following steps S101 to S103:
s101, acquiring video pictures acquired by a plurality of acquisition devices arranged in a target place at the same moment; the different acquisition equipment has different acquisition visual angles in the target place, and the video picture comprises the target object.
For example, for different application scenarios, the target place may be a place corresponding to the application scenario: the target place may be a factory where the employees in the factory need to be located, a shopping mall where the customers in the mall need to be located, or a gymnasium where the athletes in the gymnasium need to be located.
Illustratively, the target object is an object within the target site that needs to be located, such as the aforementioned employees, patrons, and athletes.
The acquisition device may be, for example, a monocular camera or a binocular camera, and a plurality of acquisition devices may be provided in the target place. For different target places, the installation positions of the plurality of acquisition devices can be determined according to the actual site of the target place; for example, the acquisition visual angles of the acquisition devices in the target place can be made different so that the whole area of the target place is covered without dead angles. In addition, considering that too many acquisition devices will result in too many video pictures acquired at the same moment and therefore affect the processing speed of the video pictures, when the acquisition devices are installed in the target place, the installation angle and the number of the acquisition devices need to be considered together; for example, each target object entering the target place may be acquired by two acquisition devices simultaneously, so that the plurality of acquisition devices arranged in the target place can completely acquire video pictures of the whole area of the target place.
And S102, respectively determining initial position coordinates of the target object in the target place based on video pictures acquired by a plurality of acquisition devices at the same time.
For example, after the video pictures acquired by the multiple acquiring devices at the same time are acquired, target detection may be further performed on the video pictures acquired by the multiple acquiring devices at the same time, and initial position coordinates of the target object in different video pictures in the world coordinate system corresponding to the target location are determined.
For example, the world coordinate system corresponding to the target location may be predetermined, such as taking a midpoint position of the ground of the target location as an origin of the world coordinate system, a direction perpendicular to the ground passing through the origin as a Z-axis direction, one direction passing through the origin on the ground of the target location as an X-axis direction, and a direction perpendicular to the X-axis passing through the origin on the ground of the target location as a Y-axis direction.
And S103, fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target place.
For example, in consideration of some errors existing between parameter information of different acquisition devices, initial position coordinates of the same target object determined based on video pictures acquired by different acquisition devices have some differences, and the initial position coordinates of the same target object may be fused to obtain target position coordinates of the same target object in a world coordinate system corresponding to a target location.
In the embodiment of the disclosure, the initial position coordinates of the target object in different video pictures can be determined through the video pictures which are set in the target place and collected at the same time by a plurality of collection devices with different collection visual angles, the initial position coordinates of the same target object in different video pictures are further fused, and the target position coordinates of the target object in the target place are determined.
With respect to S102 described above, when determining the initial position coordinates of the target object in the target location based on the video pictures captured by the plurality of capturing devices at the same time, as shown in fig. 2, the following S201 to S202 may be included:
s201, acquiring pixel coordinates of a target object in a video picture respectively acquired by a plurality of acquisition devices at the same moment.
For example, a target object in a video picture may be identified based on a pre-trained neural network for target detection; further, the pixel coordinates of a set position point of the target object in the image coordinate system corresponding to the video picture may be read, and the pixel coordinates corresponding to the set position point taken as the pixel coordinates of the target object.
Specifically, when acquiring the pixel coordinates of a target object included in the video pictures respectively acquired by the plurality of acquisition devices at the same moment, the following S2011 to S2012 may be included:
s2011, inputting a plurality of video pictures into a pre-trained neural network to obtain a detection frame of a target object in each video picture; wherein the neural network comprises a plurality of target detection sub-networks for detecting target objects of different sizes;
s2012, extracting the pixel coordinates of the target position point on the detection frame of the target object in each video frame in the video frame, and obtaining the pixel coordinates of the target object in the video frame.
For example, the neural network may detect each target object included in the video picture and mark a detection frame for each target object. Fig. 3 is a schematic diagram of the detection frames of target objects included in a video picture: the video picture contains detection frames corresponding to two target objects, namely the detection frame A1B1C1D1 of target object 1 and the detection frame A2B2C2D2 of target object 2. A position point on the detection frame of each target object may be extracted as the target position point; for example, the midpoint of the bottom edge of the detection frame may be used. In fig. 3, the pixel coordinates of the midpoint K1 of the bottom edge D1C1 of detection frame A1B1C1D1 represent the pixel coordinates of target object 1, and the pixel coordinates of the midpoint K2 of the bottom edge D2C2 of detection frame A2B2C2D2 represent the pixel coordinates of target object 2.
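Extracting the bottom-edge midpoint of a detection frame amounts to the following sketch; the (x1, y1, x2, y2) corner convention is an assumption, not specified by the disclosure:

```python
def target_position_point(box):
    """box = (x1, y1, x2, y2): top-left and bottom-right pixel corners of
    the detection frame. Returns the midpoint of the bottom edge, used as
    the target object's pixel coordinates (its ground-contact point)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)  # y2 is the bottom edge in image coordinates
```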
For example, considering that the position of a target object in a target site varies and the collection view angles of a plurality of collection devices arranged in the target site are different, so that sizes of target objects contained in video pictures collected by different collection devices at the same time may be different, in order to be able to accurately mark detection frames of target objects of different sizes, a neural network used in an embodiment of the present disclosure may include a plurality of target detection sub-networks for detecting target objects of different sizes, such as a feature pyramid network, each target detection sub-network in the feature pyramid network being used to detect and identify a target object of a size corresponding to the target detection sub-network in a video picture, and target objects of different sizes in the same video picture may be accurately detected through the neural network.
In the embodiment of the disclosure, a plurality of target detection subnetworks for detecting target objects of different sizes are provided in a neural network, so that when the target objects in a video frame are detected through the neural network, the target objects of different sizes in the same video frame can be accurately detected.
S202, based on the pixel coordinates of the target object in the video picture acquired by each acquisition device and the parameter information of the acquisition device, determining the initial position coordinates of the target object acquired by the acquisition device in a world coordinate system corresponding to a target place.
For example, the parameter information of each acquisition device may include a homography matrix of the acquisition device, where the homography matrix represents the conversion relationship between the image coordinate system corresponding to the video picture acquired by the acquisition device and the world coordinate system corresponding to the target place where the acquisition device is located. Therefore, after the pixel coordinates of the target object in the image coordinate system corresponding to the video picture are obtained, the initial position coordinates of the target object in the world coordinate system corresponding to the target place can be determined according to the parameter information of the acquisition device.
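Applying such a homography to map a pixel coordinate onto ground-plane world coordinates can be sketched as follows; the matrix itself is illustrative and would in practice be calibrated per acquisition device:

```python
import numpy as np

def pixel_to_world(h, pixel):
    """Map a pixel coordinate to ground-plane world coordinates with a
    3x3 homography matrix h, working in homogeneous coordinates."""
    u, v = pixel
    p = h @ np.array([u, v, 1.0])        # homogeneous transform
    return (p[0] / p[2], p[1] / p[2])    # dehomogenize to (X, Y)
```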
For example, the world coordinate system corresponding to the target location may use a fixed position in the target location as a coordinate origin to establish a unique world coordinate system, and for example, a midpoint on the ground of the target location may be used as the coordinate origin, a direction on the ground may be set as a positive direction of an X axis of the world coordinate system, a direction perpendicular to the X axis may be set as a positive direction of a Y axis of the world coordinate system, and a direction perpendicular to the ground may be set as a positive direction of a Z axis of the world coordinate system.
In the embodiment of the disclosure, it is proposed that the pixel coordinates of the target object in the video picture are determined first, then the initial position coordinates of the target object in the target place are obtained according to the parameter information of the acquisition device, and preparation is provided for subsequently determining the target position coordinates of the target object in the target place.
In one embodiment, regarding S202, determining the initial position coordinates of the target object captured by each capturing device in the world coordinate system corresponding to the target location, based on the pixel coordinates of the target object in the video picture captured by the capturing device and the parameter information of the capturing device, includes the following S2021 to S2022:
S2021, correcting the pixel coordinates of the target object in the video picture acquired by each acquisition device based on the predetermined internal reference matrix and distortion parameters of that acquisition device, to obtain the corrected pixel coordinates of the target object in the video picture.
Illustratively, the internal reference matrix of the acquisition device is

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where $(f_x, f_y)$ denotes the focal length of the acquisition device and $(c_x, c_y)$ denotes the pixel coordinates, in the image coordinate system, of the center point of the video picture acquired by the acquisition device. The distortion parameters of the acquisition device include radial distortion coefficients and tangential distortion coefficients. After the internal reference matrix and distortion coefficients of each acquisition device are obtained in advance, de-distortion processing can be performed on the pixel coordinates of the target object in the video picture acquired by the acquisition device according to the internal reference matrix and distortion coefficients. For example, the corrected pixel coordinates of the target object in the video picture acquired by the acquisition device can be obtained through the de-distortion function in OpenCV.
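As a minimal sketch of this de-distortion step (not the patent's implementation), the standard radial/tangential distortion model can be inverted by fixed-point iteration in plain Python; the function name and the coefficient names `k1, k2` (radial) and `p1, p2` (tangential) are assumptions following common camera-model notation:

```python
def undistort_pixel(u, v, fx, fy, cx, cy, k1, k2, p1, p2, iters=10):
    """Correct a distorted pixel coordinate using the intrinsic parameters
    (fx, fy, cx, cy) and radial (k1, k2) / tangential (p1, p2) coefficients."""
    # Normalize to the camera plane using the intrinsic parameters.
    x = (u - cx) / fx
    y = (v - cy) / fy
    x0, y0 = x, y
    # Fixed-point iteration: invert the forward distortion model.
    for _ in range(iters):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2 * r2
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        x = (x0 - dx) / radial
        y = (y0 - dy) / radial
    # Re-project to pixel coordinates.
    return (x * fx + cx, y * fy + cy)
```

With all coefficients zero the function returns the input pixel unchanged, which is a convenient sanity check when calibrating.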
For example, the internal reference matrix and the distortion parameters of each acquisition device may be determined in advance using Zhang Zhengyou's checkerboard calibration method. For example, multiple checkerboard images may be taken from different angles and the feature points in the images detected; the internal reference matrix and the distortion parameters of the acquisition device are then solved from the pixel coordinates of the feature points in the checkerboard images and continuously optimized. During optimization, the same pixel coordinate may be corrected using the internal reference matrix and distortion parameters obtained in two adjacent iterations, and whether optimization has finished is judged by the difference between the two corrected pixel coordinates; for example, optimization may be ended, yielding the internal reference matrix and distortion parameters of the acquisition device, once the difference no longer decreases.
S2022, determining the initial position coordinates of the target object in the video frame based on the predetermined homography matrix of the capturing device and the corrected pixel coordinates of the target object in the video frame captured by the capturing device.
For example, the homography matrix may represent the conversion relationship between the image coordinate system corresponding to a video picture captured by the capturing device and the world coordinate system corresponding to the target location where the capturing device is located, and it may also be determined in advance when the capturing device is calibrated. For example, a sample video picture containing a plurality of markers may be captured by the capturing device, and the world coordinates of the intersections of the plurality of markers with the ground (the plane of the X and Y axes of the world coordinate system) in the world coordinate system corresponding to the target location may be determined in advance. The corrected pixel coordinates of those intersections in the sample video picture are then determined in the manner described above, and the homography matrix of the capturing device is determined based on the corrected pixel coordinates and world coordinates corresponding to the plurality of markers.
For example, when determining the initial position coordinates of the target object in the video frame, the initial position coordinates of the target object in the video frame in the world coordinate system corresponding to the target location may be obtained according to the corrected pixel coordinates of the target object in the video frame and the homography matrix of the capturing device capturing the video frame.
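The mapping from corrected pixel coordinates to ground-plane world coordinates can be sketched as follows; `pixel_to_world` is a hypothetical helper (not from the patent) that applies a 3x3 homography, given as a row-major nested list, and normalizes by the projective scale:

```python
def pixel_to_world(H, u, v):
    """Map a corrected pixel coordinate (u, v) to ground-plane world
    coordinates using a 3x3 homography H (row-major nested lists)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    # Divide out the projective scale factor.
    return (x / w, y / w)
```

Because the result is normalized by `w`, scaling the whole homography by a constant leaves the mapped world coordinate unchanged, which is why a homography is only determined up to scale.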
In the embodiment of the disclosure, after the pixel coordinates of the target object in the video picture are obtained, the pixel coordinates are corrected based on the internal reference matrix and the distortion coefficient of the acquisition device acquiring the video picture, so that corrected pixel coordinates with high accuracy can be obtained, and further, initial position coordinates of the target object in the target place with high accuracy are obtained.
In one embodiment, in step S103, when the initial position coordinates of the same target object are fused to obtain the target position coordinates of the target object in the target location, as shown in fig. 4, the method may include steps S301 to S302 as follows:
S301, determining a plurality of initial position coordinates associated with the same target object based on the initial position coordinates of the target object determined from the plurality of video pictures.
For example, as described above, each target object in the target location is captured by at least two capturing devices at the same time. Since the parameter information of the capturing devices has a certain error, and the error differs between capturing devices, the initial position coordinates of the same target object determined from different video pictures may differ. Therefore, before fusing the initial position coordinates of the same target object, the plurality of initial position coordinates associated with that target object need to be determined.
And S302, sequentially fusing a plurality of initial position coordinates associated with the same target object to obtain the target position coordinates of the same target object in a target place.
For example, assuming that N initial position coordinates are associated with the same target object, the first two may be fused to obtain a fused initial position coordinate, which is then fused with the third initial position coordinate, and so on until the last initial position coordinate has been fused; the position coordinate obtained from the final fusion is used as the target position coordinate of the same target object.
In the embodiment of the disclosure, in consideration of the fact that the initial position coordinates of the same target object determined based on the video pictures acquired by different acquisition devices have some errors, the initial position coordinates of the same target object acquired by a plurality of acquisition devices can be fused, so that the target position coordinates of the same target object with higher accuracy can be obtained.
In one embodiment, the step S301, when determining a plurality of initial position coordinates associated with the same target object based on the initial position coordinates of the target object determined from the plurality of video pictures, includes the following S3011 to S3012:
S3011, for any two video pictures, determining the distance between the initial position coordinates of each first target object in the first of the two video pictures and the initial position coordinates of each second target object in the second of the two video pictures;
S3012, taking the initial position coordinates of a first target object and the initial position coordinates of the second target object forming the minimum distance with that first target object as a plurality of initial position coordinates associated with the same target object, where the minimum distance is smaller than a preset fusion distance threshold.
For example, suppose a target site is provided with A capturing devices and that the video pictures captured by the A capturing devices at the same time each include at least one target object. A set of initial position coordinates may then be obtained, forming an initial coordinate set S = {S1, S2, S3, ..., SA}, where S1, S2, S3, ..., SA denote, in order, the initial position coordinates of the target objects in the video pictures captured by the first, second, third, ..., A-th capturing devices. Taking the two video pictures captured by the first and second capturing devices at the same time as an example, the following describes how to determine a plurality of initial position coordinates associated with the same target object:
for example, the initial position coordinates of a first target objects are included in S1, the initial position coordinates of b second target objects are included in S2, and the euclidean distance between the initial position coordinates of each first target object and the initial position coordinates of each second target object can be determined, so as to obtain a distance matrix:
$$D = \begin{bmatrix} d_{11} & \cdots & d_{1b} \\ \vdots & d_{ij} & \vdots \\ d_{a1} & \cdots & d_{ab} \end{bmatrix}$$

where $d_{11}$ represents the distance between the initial position coordinates of the 1st first target object in S1 and the initial position coordinates of the 1st second target object in S2; $d_{1b}$ represents the distance between the initial position coordinates of the 1st first target object in S1 and the initial position coordinates of the b-th second target object in S2; $d_{ij}$ represents the distance between the initial position coordinates of the i-th first target object in S1 and the initial position coordinates of the j-th second target object in S2; $d_{a1}$ represents the distance between the initial position coordinates of the a-th first target object in S1 and the initial position coordinates of the 1st second target object in S2; and $d_{ab}$ represents the distance between the initial position coordinates of the a-th first target object in S1 and the initial position coordinates of the b-th second target object in S2.
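Building this distance matrix is a few lines of Python; `distance_matrix` is an illustrative helper (not from the patent), with each coordinate set given as a list of (x, y) tuples in the world coordinate system:

```python
import math

def distance_matrix(s1, s2):
    """Pairwise Euclidean distances between two lists of initial
    position coordinates, each coordinate an (x, y) tuple."""
    return [[math.dist(p, q) for q in s2] for p in s1]
```

Row i, column j of the result then corresponds to the element d_ij described above.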
Illustratively, in operation, the plurality of initial position coordinates associated with the same target object in S1 and S2 may be determined in the following manner, including S30121 to S30124:
S30121, searching for the current minimum distance among the elements of the current distance matrix;
Illustratively, when the minimum distance is searched for the first time, the elements of the current distance matrix are the Euclidean distances between the initial position coordinates of each first target object in S1 and the initial position coordinates of each second target object in S2.
S30122, judging whether the current minimum distance is smaller than the preset fusion distance threshold.
For example, the preset fusion distance threshold may be set empirically: the same target object is photographed in advance by different capturing devices, a plurality of position coordinates of that target object in the target site are determined from the video pictures captured by the different capturing devices, and the preset fusion distance threshold is determined from the distances between those position coordinates.
S30123, when it is determined that the current minimum distance is smaller than the preset fusion distance threshold, determining that the two initial position coordinates forming the current minimum distance are the initial position coordinates associated with the same target object.
Illustratively, suppose $d_{a1}$ is determined to be the current minimum distance and $d_{a1}$ is smaller than the preset fusion distance threshold; then the initial position coordinates of the a-th first target object in S1 and the initial position coordinates of the 1st second target object in S2 may be taken as initial position coordinates associated with the same target object.
S30124, after setting the current minimum distance in the current distance matrix and the other distance formed by any initial position coordinate in the two initial position coordinates associated with the current minimum distance as a preset fusion distance threshold, returning to execute S30121 until the current minimum distance in the current distance matrix is not smaller than the preset fusion distance threshold, and obtaining all the initial position coordinates associated with the same target object in S1 and S2.
For example, after setting the other distance formed by the current minimum distance in the current distance matrix and any one of the two initial position coordinates associated with the current minimum distance as the preset fusion distance threshold, in the process of continuously searching for the current minimum distance, the element set as the preset fusion distance threshold may be excluded, thereby improving the search efficiency.
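The greedy association of S30121 to S30124 can be sketched as follows; `greedy_match` is a hypothetical helper operating on the distance matrix above, masking each matched row and column by setting its entries to the threshold, as the steps describe:

```python
def greedy_match(dist, threshold):
    """Greedily associate rows (first target objects) with columns
    (second target objects): repeatedly take the smallest remaining
    distance below `threshold`, record the (i, j) pair, and mask out
    that row and column by setting them to `threshold`."""
    d = [row[:] for row in dist]  # work on a copy of the matrix
    pairs = []
    while True:
        best = None
        for i, row in enumerate(d):
            for j, v in enumerate(row):
                if v < threshold and (best is None or v < best[0]):
                    best = (v, i, j)
        if best is None:
            break  # current minimum is no longer below the threshold
        _, i, j = best
        pairs.append((i, j))
        for jj in range(len(d[i])):
            d[i][jj] = threshold
        for ii in range(len(d)):
            d[ii][j] = threshold
    return pairs
```

Each returned pair (i, j) identifies the initial position coordinates in S1 and S2 associated with the same target object; unmatched coordinates simply never appear in a pair.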
For example, in one embodiment, after the plurality of initial position coordinates associated with the same target object in S1 and S2 are obtained, the determination may continue for any other two video pictures until all video pictures captured by the A capturing devices at the same time have been processed, giving the different initial position coordinates of each target object across those video pictures. The initial position coordinates associated with the same target object are then fused to obtain the target position coordinates, in the target site, of each target object in the A video pictures captured by the A capturing devices at the same time.
For example, in another embodiment, after the initial position coordinates associated with the same target object in S1 and S2 are obtained, coordinate fusion may be performed on them to obtain updated initial position coordinates of that target object. The initial position coordinates in S1 and S2 that did not participate in fusion may then form, together with the updated initial position coordinates, a set S2′. A new current distance matrix is built from the initial position coordinates in S2′ and S3, and S30121 to S30124 are repeated to obtain the plurality of initial position coordinates associated with the same target object in S2′ and S3; S3′ is obtained in the same manner, a new current distance matrix is then built from the initial position coordinates in S3′ and S4, and S30121 to S30124 are repeated until the fusion with the initial position coordinates in the last set SA in the initial coordinate set is completed, giving the target position coordinates, in the target place, of each target object in the A video pictures captured by the A capturing devices at the same time.
In particular, after the fusion with the initial position coordinates in the last set SA in the initial coordinate set is completed, if it is detected that some initial position coordinate never participated in any fusion from beginning to end, then, considering that each target object in the target site is acquired by at least two acquisition devices simultaneously, that initial position coordinate may be filtered out as an erroneous initial position coordinate.
In the embodiment of the disclosure, the initial position coordinates associated with the same target object can be quickly determined according to the initial position coordinates of different target objects in any two video pictures and the preset fusion distance threshold, so that a basis is provided for subsequently determining the target position coordinates of each target object.
Specifically, in S302, when sequentially fusing a plurality of initial position coordinates associated with the same target object to obtain a target position coordinate of the same target object in the target location, the following S3021 to S3022 may be included:
S3021, selecting any initial position coordinate from the plurality of initial position coordinates associated with the same target object, and taking the selected initial position coordinate as the first intermediate fusion position coordinate.
And S3022, fusing the first intermediate fusion position coordinate with any other initial position coordinate to be fused to generate a second intermediate fusion position coordinate, taking the second intermediate fusion position coordinate as the updated first intermediate fusion position coordinate, and returning to the step of generating the second intermediate fusion position coordinate until the initial position coordinate to be fused does not exist.
The initial position coordinate to be fused refers to an initial position coordinate which does not participate in fusion.
Illustratively, fusing the first intermediate fusion position coordinate with any other initial position coordinate to be fused to generate the second intermediate fusion position coordinate includes:
and determining the midpoint coordinate of the first intermediate fusion position coordinate and any other initial position coordinate to be fused, and taking the midpoint coordinate as the generated second intermediate fusion position coordinate.
For example, suppose N initial position coordinates are determined to be associated with target object A. Any one of them may be used as the first intermediate fusion position coordinate, and the midpoint coordinate between the first intermediate fusion position coordinate and any other initial position coordinate to be fused is determined. That midpoint coordinate is then taken as the updated first intermediate fusion position coordinate, and fusion with the remaining initial position coordinates to be fused continues until none of the N initial position coordinates remains to be fused, at which point the target position coordinate of target object A is obtained.
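The sequential midpoint fusion can be sketched as below; `fuse_midpoints` is an illustrative name, and note that taking midpoints sequentially weights later coordinates more heavily than a plain mean (the last coordinate always contributes 1/2 of the result):

```python
def fuse_midpoints(coords):
    """Sequentially fuse N associated initial position coordinates by
    repeatedly taking the midpoint with the next coordinate to be fused."""
    fx, fy = coords[0]  # first intermediate fusion position coordinate
    for x, y in coords[1:]:
        # Midpoint becomes the updated intermediate fusion coordinate.
        fx, fy = (fx + x) / 2, (fy + y) / 2
    return (fx, fy)
```

Because of this weighting, the result depends on the order in which the coordinates are fused; an equally weighted average would be order-independent, which may or may not be desirable depending on how device errors are distributed.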
In the embodiment of the disclosure, it is proposed that a plurality of initial position coordinates associated with the same target object may be fused in a manner of sequentially taking midpoints, so as to obtain a target position coordinate with higher accuracy.
The positioning method provided by the embodiment of the present disclosure can accurately determine the target position coordinates of each target object in the target location, and this method can be applied to various application scenarios, taking application to a factory as an example, and after obtaining the target position coordinates of the target object in the target location, as shown in fig. 5, the positioning method further includes the following steps S401 to S402:
s401, determining whether a target object entering a target area exists or not based on target position coordinates corresponding to each target object in a target place and the preset target area;
s402, under the condition that the target object entering the target area is determined, early warning prompt is carried out.
For example, when the target location is a factory, the coordinate range corresponding to a dangerous target area in the factory may be set in advance in the world coordinate system corresponding to the target location. Whether any target object has entered the target area may then be determined from the target position coordinates corresponding to each target object in the target location and the coordinate range of the target area, and an early warning prompt may be issued when it is determined that a target object has entered the target area.
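Assuming the target area is an axis-aligned rectangle in the ground plane (an assumption for illustration; the patent only speaks of a coordinate range), the check of S401 to S402 can be sketched as:

```python
def in_target_area(coord, area):
    """Check whether a target position coordinate lies inside a
    rectangular target (danger) area ((xmin, ymin), (xmax, ymax))."""
    (xmin, ymin), (xmax, ymax) = area
    x, y = coord
    return xmin <= x <= xmax and ymin <= y <= ymax

def warn_entries(targets, area):
    """Return the ids of target objects whose target position
    coordinates fall inside the area; `targets` maps id -> (x, y)."""
    return [tid for tid, coord in targets.items() if in_target_area(coord, area)]
```

The caller would trigger the audible-visual or voice alarm whenever `warn_entries` returns a non-empty list.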
Illustratively, the early warning prompt may include, but is not limited to, an audible and visual alarm prompt, a voice alarm prompt, and the like, and through the early warning prompt, the safety of the staff in the target site may be ensured, and the safety of the target site may be improved.
In the embodiment of the disclosure, after the target position coordinates of each target object in the target place with higher accuracy are obtained, whether the target object in the target place enters the target area or not can be judged based on the preset target area, such as the preset danger area, so as to prompt an early warning in time and improve the safety of the target place.
The following takes a target site as an example, and a positioning method provided by the present disclosure is integrally introduced with specific embodiments:
1) Acquisition equipment is installed in the factory, for example a plurality of cameras. To accurately position targets in the scene and to guarantee the universality and robustness of the algorithm, different acquisition devices in the factory have different acquisition visual angles, and it is ensured that every worker entering the factory is captured by at least two acquisition devices at the same time.
2) And determining the internal reference matrix and distortion coefficients of each camera using Zhang Zhengyou's calibration method.
3) The method comprises the steps of setting a plurality of markers in a factory, determining position coordinates of intersection points of the markers and the ground in a world coordinate system corresponding to the factory, determining corrected pixel coordinates of the intersection points of the markers and the ground in a sample video picture according to an internal reference matrix and a distortion coefficient of a camera, and determining a homography matrix of each camera according to the position coordinates of the intersection points in the world coordinate system and the corrected pixel coordinates in the sample video picture.
4) And performing target detection on video pictures acquired by a camera in a factory by using the neural network added with the characteristic pyramid to obtain pixel coordinates of the staff contained in each video picture.
5) And correcting the pixel coordinates of the staff contained in the video picture according to the internal reference matrix and the distortion coefficient of the camera for collecting the video picture to obtain the corrected pixel coordinates of the staff contained in the video picture.
6) And determining the initial position coordinates of the staff in the factory, wherein the initial position coordinates are contained in the video picture according to the homography matrix of the camera for collecting the video picture and the corrected pixel coordinates of the staff contained in the video picture.
7) And fusing initial position coordinates of the same employee in the video pictures acquired at the same moment to obtain target position coordinates of the employee in the factory at the moment.
8) And determining whether the staff enters the dangerous area or not according to the target position coordinates of the staff in the factory at the moment and the preset dangerous area in the factory, and giving an early warning prompt under the condition that the staff enters the dangerous area.
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same technical concept, the embodiment of the present disclosure further provides a positioning apparatus corresponding to the positioning method, and since the principle of the apparatus in the embodiment of the present disclosure for solving the problem is similar to the positioning method described above in the embodiment of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 6, there is shown a schematic diagram of a positioning apparatus 500 according to an embodiment of the present disclosure, the positioning apparatus includes:
an obtaining module 501, configured to obtain video pictures captured at the same time by a plurality of capturing devices in a target location, wherein the video pictures include target objects and different capturing devices have different capturing visual angles in the target location;
a determining module 502, configured to determine initial position coordinates of a target object in a target location based on video frames acquired by multiple acquiring devices at the same time, respectively;
the fusion module 503 is configured to fuse the initial position coordinates of the same target object to obtain a target position coordinate of the target object in the target location.
In one embodiment, the determining module 502, when configured to respectively determine the initial position coordinates of the target object in the target site based on the video frames captured by the plurality of capturing devices at the same time, includes:
acquiring pixel coordinates of a target object in a video picture respectively acquired by a plurality of acquisition devices at the same moment;
and determining the initial position coordinates of the target object acquired by the acquisition equipment in a world coordinate system corresponding to the target place based on the pixel coordinates of the target object in the video picture acquired by each acquisition equipment and the parameter information of the acquisition equipment.
In one possible implementation, the determining module 502, when configured to acquire pixel coordinates of a target object in a video frame respectively acquired by a plurality of acquiring devices at the same time, includes:
inputting a plurality of video pictures into a pre-trained neural network to obtain a detection frame of a target object in each video picture; wherein the neural network comprises a plurality of target detection sub-networks for detecting target objects of different sizes;
and extracting the pixel coordinates of the target position point on the detection frame of the target object in each video picture in the video picture to obtain the pixel coordinates of the target object in the video picture.
In one possible implementation, the determining module 502, when configured to determine, based on the pixel coordinates of the target object in the video frame captured by each capturing device and the parameter information of the capturing device, the initial position coordinates of the target object captured by the capturing device in the world coordinate system corresponding to the target location, includes:
correcting the pixel coordinates of a target object in a video picture acquired by the acquisition equipment based on a predetermined internal reference matrix and distortion parameters of each acquisition equipment to obtain corrected pixel coordinates of the target object in the video picture;
and determining initial position coordinates of the target object in the video picture based on a predetermined homography matrix of the acquisition equipment and the corrected pixel coordinates of the target object in the video picture acquired by the acquisition equipment.
In a possible implementation manner, the fusion module 503 is configured to fuse the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target location, and includes:
determining a plurality of initial position coordinates associated with the same target object based on the initial position coordinates of the target object determined by the plurality of video pictures;
and sequentially fusing a plurality of initial position coordinates associated with the same target object to obtain the target position coordinates of the same target object in the target place.
In one possible implementation, the fusion module 503, when configured to sequentially fuse a plurality of initial position coordinates associated with the same target object to obtain target position coordinates of the same target object in the target location, includes:
selecting any initial position coordinate from a plurality of initial position coordinates associated with the same target object, and taking the selected any initial position coordinate as a first intermediate fusion position coordinate;
and fusing the first intermediate fusion position coordinate with any other initial position coordinate to be fused to generate a second intermediate fusion position coordinate, taking the second intermediate fusion position coordinate as the updated first intermediate fusion position coordinate, and returning to the step of generating the second intermediate fusion position coordinate until no initial position coordinate to be fused exists.
In one possible embodiment, the fusion module 503, when configured to fuse the first intermediate fusion position coordinate with any other initial position coordinate to be fused to generate a second intermediate fusion position coordinate, includes:
and determining the midpoint coordinate of the first intermediate fusion position coordinate and any other initial position coordinate to be fused, and taking the midpoint coordinate as the generated second intermediate fusion position coordinate.
In one possible implementation, the fusing module 503, when configured to determine a plurality of initial position coordinates associated with a same target object based on the initial position coordinates of the target object determined by the plurality of video pictures, includes:
aiming at any two video pictures, determining the distance between the initial position coordinate of each first target object in a first video picture in any two video pictures and the initial position coordinate of each second target object in a second video picture in any two video pictures;
and taking the initial position coordinates of the first target object and the initial position coordinates of a second target object forming a minimum distance with the first target object as a plurality of initial position coordinates associated with the same target object, wherein the minimum distance is smaller than a preset fusion distance threshold value.
In a possible implementation, after the fusion module 503 obtains the target position coordinates of the target object in the target location, the determination module 502 is further configured to:
determining whether a target object entering a target area exists or not based on target position coordinates corresponding to each target object in a target place and a preset target area;
and under the condition that the target object entering the target area is determined, early warning prompt is carried out.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Corresponding to the positioning method in fig. 1, an embodiment of the present disclosure further provides an electronic device 600. As shown in fig. 7, a schematic structural diagram of the electronic device 600 provided in the embodiment of the present disclosure, the electronic device includes:
a processor 61, a memory 62, and a bus 63. The memory 62 is used for storing execution instructions and includes an internal memory 621 and an external memory 622. The internal memory 621 is used for temporarily storing operation data of the processor 61 and data exchanged with the external memory 622, such as a hard disk; the processor 61 exchanges data with the external memory 622 through the internal memory 621. When the electronic device 600 operates, the processor 61 communicates with the memory 62 through the bus 63, so that the processor 61 executes the following instructions: acquiring video pictures captured at the same moment by a plurality of acquisition devices arranged in a target place, wherein different acquisition devices have different acquisition view angles in the target place and the video pictures include a target object; respectively determining initial position coordinates of the target object in the target place based on the video pictures captured by the plurality of acquisition devices at the same moment; and fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target place.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the positioning method described in the above method embodiments are performed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code; the instructions included in the program code may be used to perform the steps of the positioning method in the foregoing method embodiments. For details, reference may be made to the foregoing method embodiments, which are not repeated here.
The computer program product may be implemented by hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, it is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only one kind of logical division, and other divisions are possible in an actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If implemented in the form of software functional units and sold or used as a stand-alone product, the functions may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate rather than limit its technical solutions, and the scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the art may, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall be covered by its protection scope. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (12)

1. A method of positioning, comprising:
acquiring video pictures captured at the same moment by a plurality of acquisition devices arranged in a target place, wherein different acquisition devices have different acquisition view angles in the target place, and the video pictures include a target object;
respectively determining initial position coordinates of the target object in the target place based on video pictures acquired by the plurality of acquisition devices at the same moment;
and fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target place.
2. The positioning method according to claim 1, wherein the determining initial position coordinates of the target object in the target site based on the video pictures acquired by the plurality of acquisition devices at the same time respectively comprises:
acquiring pixel coordinates of the target object in video pictures respectively acquired by a plurality of acquisition devices at the same moment;
and determining the initial position coordinates of the target object acquired by the acquisition equipment under a world coordinate system corresponding to the target place based on the pixel coordinates of the target object in the video picture acquired by each acquisition equipment and the parameter information of the acquisition equipment.
3. The positioning method according to claim 2, wherein the obtaining of the pixel coordinates of the target object in the video frames respectively captured by the plurality of capturing devices at the same time comprises:
inputting a plurality of video pictures into a pre-trained neural network to obtain a detection frame of a target object in each video picture; wherein the neural network comprises a plurality of target detection sub-networks for detecting target objects of different sizes;
and extracting the pixel coordinates of the target position point on the detection frame of the target object in each video picture in the video picture to obtain the pixel coordinates of the target object in the video picture.
4. The positioning method according to claim 2 or 3, wherein the determining of the initial position coordinates of the target object acquired by each acquisition device in the world coordinate system corresponding to the target location based on the pixel coordinates of the target object in the video frame acquired by the acquisition device and the parameter information of the acquisition device comprises:
correcting the pixel coordinates of the target object in the video picture acquired by the acquisition equipment based on the predetermined internal reference matrix and distortion parameter of each acquisition equipment to obtain the corrected pixel coordinates of the target object in the video picture;
and determining the initial position coordinates of the target object in the video picture based on a predetermined homography matrix of the acquisition equipment and the corrected pixel coordinates of the target object in the video picture acquired by the acquisition equipment.
5. The positioning method according to any one of claims 1 to 4, wherein fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target place comprises:
determining a plurality of initial position coordinates associated with the same target object based on the initial position coordinates of the target object determined by the plurality of video pictures;
and sequentially fusing a plurality of initial position coordinates associated with the same target object to obtain the target position coordinates of the same target object in the target place.
6. The positioning method according to claim 5, wherein said sequentially fusing the initial position coordinates associated with the same target object to obtain the target position coordinates of the same target object in the target location comprises:
selecting any initial position coordinate from a plurality of initial position coordinates associated with the same target object, and taking the selected any initial position coordinate as a first intermediate fusion position coordinate;
and fusing the first intermediate fusion position coordinate with any other initial position coordinate to be fused to generate a second intermediate fusion position coordinate, taking the second intermediate fusion position coordinate as the updated first intermediate fusion position coordinate, and returning to the step of generating the second intermediate fusion position coordinate until no initial position coordinate to be fused exists.
7. The method of claim 6, wherein fusing the first intermediate fused position coordinate with any other initial position coordinate to be fused to generate a second intermediate fused position coordinate comprises:
and determining the midpoint coordinate of the first intermediate fusion position coordinate and any other initial position coordinate to be fused, and taking the midpoint coordinate as the generated second intermediate fusion position coordinate.
8. The positioning method according to any one of claims 5 to 7, wherein determining a plurality of initial position coordinates associated with the same target object based on the initial position coordinates of the target object determined from the plurality of video pictures comprises:
for any two video pictures, determining the distance between the initial position coordinates of each first target object in a first video picture of the two video pictures and the initial position coordinates of each second target object in a second video picture of the two video pictures;
taking the initial position coordinates of the first target object and the initial position coordinates of a second target object forming a minimum distance with the first target object as a plurality of initial position coordinates associated with the same target object, wherein the minimum distance is smaller than a preset fusion distance threshold.
9. The positioning method according to any one of claims 1 to 8, wherein after obtaining the target position coordinates of the target object in the target place, the positioning method further comprises:
determining, based on the target position coordinates corresponding to each target object in the target place and a preset target area, whether any target object has entered the target area;
and performing an early-warning prompt in a case where it is determined that a target object has entered the target area.
10. A positioning device, comprising:
the acquisition module is used for acquiring video pictures captured at the same moment by a plurality of acquisition devices arranged in a target place, wherein different acquisition devices have different acquisition view angles in the target place, and the video pictures include a target object;
the determining module is used for respectively determining initial position coordinates of the target object in the target place based on video pictures acquired by the plurality of acquisition devices at the same time;
and the fusion module is used for fusing the initial position coordinates of the same target object to obtain the target position coordinates of the target object in the target place.
11. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the positioning method according to any one of claims 1 to 9.
12. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, is adapted to carry out the steps of the positioning method according to any one of claims 1 to 9.
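As an illustrative aside (not part of the claims), the coordinate mapping recited in claims 2 and 4, correcting the pixel coordinates of the target object and then applying the acquisition device's precomputed homography to obtain world coordinates on the ground plane of the target place, can be sketched as follows. This is a minimal numpy sketch; the function name and the assumption of a planar ground with a known 3x3 homography H are illustrative, not the disclosure's implementation:

```python
import numpy as np

def pixel_to_world(pixel_uv, H):
    """Map a corrected (undistorted) pixel coordinate to an initial
    position coordinate in the world coordinate system of the target
    place, using the acquisition device's 3x3 ground-plane homography H."""
    u, v = pixel_uv
    x, y, w = H @ np.array([u, v, 1.0])  # homogeneous transform
    return (x / w, y / w)                # dehomogenize
```

In practice the raw pixel coordinate would first be corrected with the device's internal reference (intrinsic) matrix and distortion parameters, for example via OpenCV's `cv2.undistortPoints`, which corresponds to the correction step of claim 4.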
CN202110467657.9A 2021-04-28 2021-04-28 Positioning method, positioning device, electronic equipment and storage medium Pending CN113129378A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202110467657.9A CN113129378A (en) 2021-04-28 2021-04-28 Positioning method, positioning device, electronic equipment and storage medium
PCT/CN2021/127625 WO2022227462A1 (en) 2021-04-28 2021-10-29 Positioning method and apparatus, electronic device, and storage medium
TW111105018A TW202242803A (en) 2021-04-28 2022-02-11 Positioning method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110467657.9A CN113129378A (en) 2021-04-28 2021-04-28 Positioning method, positioning device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113129378A (en) 2021-07-16

Family

ID=76781086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110467657.9A Pending CN113129378A (en) 2021-04-28 2021-04-28 Positioning method, positioning device, electronic equipment and storage medium

Country Status (3)

Country Link
CN (1) CN113129378A (en)
TW (1) TW202242803A (en)
WO (1) WO2022227462A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022227462A1 (en) * 2021-04-28 2022-11-03 北京市商汤科技开发有限公司 Positioning method and apparatus, electronic device, and storage medium

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN117541910A (en) * 2023-10-27 2024-02-09 北京市城市规划设计研究院 Fusion method and device for urban road multi-radar data

Citations (3)

Publication number Priority date Publication date Assignee Title
CN111380502A (en) * 2020-03-13 2020-07-07 商汤集团有限公司 Calibration method, position determination method, device, electronic equipment and storage medium
US20200380717A1 (en) * 2018-08-01 2020-12-03 Boe Technology Group Co., Ltd. Positioning method, positioning device and nonvolatile computer-readable storage medium
CN112653848A (en) * 2020-12-23 2021-04-13 北京市商汤科技开发有限公司 Display method and device in augmented reality scene, electronic equipment and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US10310054B2 (en) * 2014-03-21 2019-06-04 The Boeing Company Relative object localization process for local positioning system
CN107015193B (en) * 2017-04-18 2019-10-11 中国矿业大学(北京) A kind of binocular CCD vision mine movable object localization method and system
CN110210446A (en) * 2019-06-12 2019-09-06 广东工业大学 A kind of sitting posture condition detection method, device, equipment and the medium of target object
CN110363179B (en) * 2019-07-23 2022-03-25 联想(北京)有限公司 Map acquisition method, map acquisition device, electronic equipment and storage medium
CN113129378A (en) * 2021-04-28 2021-07-16 北京市商汤科技开发有限公司 Positioning method, positioning device, electronic equipment and storage medium



Also Published As

Publication number Publication date
WO2022227462A1 (en) 2022-11-03
TW202242803A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN108629791B (en) Pedestrian tracking method and device and cross-camera pedestrian tracking method and device
CN113129339B (en) Target tracking method and device, electronic equipment and storage medium
CN108256404B (en) Pedestrian detection method and device
CN108985199A (en) Detection method, device and the storage medium of commodity loading or unloading operation
CN103246044A (en) Automatic focusing method, automatic focusing system, and camera and camcorder provided with automatic focusing system
EP2405393B1 (en) Device, method and program for creating information for object position estimation
CN111445531B (en) Multi-view camera navigation method, device, equipment and storage medium
JP2023015989A (en) Item identification and tracking system
WO2022121283A1 (en) Vehicle key point information detection and vehicle control
CN109815787B (en) Target identification method and device, storage medium and electronic equipment
WO2022227462A1 (en) Positioning method and apparatus, electronic device, and storage medium
CN111914635A (en) Human body temperature measurement method, device and system and electronic equipment
US11200406B2 (en) Customer flow statistical method, apparatus and device
CN111582204A (en) Attitude detection method and apparatus, computer device and storage medium
CN113240678A (en) Plane information detection method and system
JP2009245207A (en) Probability distribution constructing method, probability distribution constructing device, probability distribution constructing program, photographic subject detecting method, photographic subject detecting device and photographic subject detecting program
CN113240806B (en) Information processing method, information processing device, electronic equipment and storage medium
CN113188509B (en) Distance measurement method and device, electronic equipment and storage medium
CN109816628A (en) Face evaluation method and Related product
JP2018201146A (en) Image correction apparatus, image correction method, attention point recognition apparatus, attention point recognition method, and abnormality detection system
CN111753587A (en) Method and device for detecting falling to ground
CN112802112B (en) Visual positioning method, device, server and storage medium
JP5743646B2 (en) Anomaly detection device
CN114913470A (en) Event detection method and device
CN103546674A (en) Image acquisition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40049347)
RJ01 Rejection of invention patent application after publication (application publication date: 20210716)