CN114170324A - Calibration method and device, electronic equipment and storage medium - Google Patents

Calibration method and device, electronic equipment and storage medium

Info

Publication number
CN114170324A
Authority
CN
China
Prior art keywords
target object
image
points
objects
mapping matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111497801.XA
Other languages
Chinese (zh)
Inventor
刘思成
朱烽
赵瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN202111497801.XA
Publication of CN114170324A
Priority to PCT/CN2022/105545 (published as WO2023103377A1)
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Abstract

The disclosure relates to a calibration method and device, an electronic device and a storage medium, wherein the method includes: detecting an image to be processed to obtain a target object; determining preset feature points of the target object; determining a mapping matrix according to the preset feature points; and obtaining parameter information of the image acquisition device according to the mapping matrix and the preset feature points. According to the calibration method of the embodiments of the disclosure, the mapping matrix can be determined from the preset feature points of the region where any target object is located in the image, so that the internal reference information and pose information of the image acquisition device are determined. The self-calibration process can thus be completed without requiring the same target object to appear at multiple preset positions and without requiring the cooperation of target objects. The method reduces manual workload and calibration cost, and is suitable for scenarios with numerous, widely distributed image acquisition devices.

Description

Calibration method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a calibration method and apparatus, an electronic device, and a storage medium.
Background
The mainstream camera calibration method relies on a calibration board: the board must be manually placed in the camera's field of view, and the camera video stream at the corresponding time must be acquired. Because cameras in a city cover a wide area, this method is cumbersome and time-consuming to operate, requires huge labor cost, and is not suitable for city-level scenarios.
Disclosure of Invention
The disclosure provides a calibration method and device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a calibration method, including: detecting an image to be processed to obtain a target object in the image to be processed; determining preset feature points of the target object; determining a mapping matrix corresponding to the preset feature points according to the preset feature points; and acquiring parameter information of image acquisition equipment according to the mapping matrix and the preset characteristic points, wherein the image to be processed is acquired by the image acquisition equipment.
According to the calibration method of the embodiments of the present disclosure, the mapping matrix can be determined from the preset feature points of the region where any target object is located in the image, so that the internal reference information and pose information of the image acquisition device are determined. The self-calibration process can thus be completed without requiring the same target object to appear at multiple preset positions and without requiring the cooperation of target objects. This reduces manual workload and calibration cost, and is suitable for scenarios with numerous, widely distributed image acquisition devices, for example, self-calibration of the many cameras of a city monitoring system.
In a possible implementation manner, detecting an image to be processed to obtain a target object in the image to be processed includes: detecting a plurality of objects in the image to be processed; and screening the plurality of objects according to the postures and/or occlusion states of the plurality of objects to obtain the target object.

In a possible implementation manner, the screening the plurality of objects according to the postures and/or occlusion states of the plurality of objects to obtain the target object includes: acquiring key points of the plurality of objects; respectively determining the posture of each object according to the key points of each object; and screening the plurality of objects according to the posture of each object to obtain the target object.

In a possible implementation manner, the screening the plurality of objects according to the postures and/or occlusion states of the plurality of objects to obtain the target object includes: acquiring key points of the plurality of objects; respectively determining the confidence of the key points of each object; respectively determining the occlusion state of each object according to the confidence of the key points; and screening the plurality of objects according to the occlusion state to obtain the target object.
In one possible implementation, the obtaining the key points of the plurality of objects includes: obtaining location information for each of the objects; and obtaining key points of the plurality of objects according to the position information of each object.
In a possible implementation manner, determining the preset feature point of the target object includes: acquiring a mask image of the target object; acquiring covariance matrixes of a plurality of pixel points of the mask image; performing characteristic decomposition on the covariance matrix to obtain a characteristic vector; and determining the preset feature points according to the feature vectors and a plurality of pixel points of the mask image.
In this way, the preset feature points can be determined by matrix eigendecomposition, which can improve detection accuracy; moreover, compared with a deep learning approach, matrix eigendecomposition reduces the amount of computation and the consumption of computing resources.
In a possible implementation manner, the obtaining the parameter information of the image acquisition device according to the mapping matrix and the preset feature point includes: acquiring error information of the preset characteristic points according to the mapping matrix and the preset characteristic points; adjusting parameters of the mapping matrix according to the error information to obtain an adjusted mapping matrix; and obtaining internal reference information and pose information of the image acquisition equipment according to the adjusted mapping matrix.
In a possible implementation manner, the mapping matrix is a matrix used for representing a position relationship between different preset feature points of the same target object.
According to an aspect of the present disclosure, there is provided a calibration apparatus including: the target object acquisition module is used for detecting the image to be processed and acquiring a target object in the image to be processed; the characteristic point determining module is used for determining preset characteristic points of the target object; the mapping matrix determining module is used for determining a mapping matrix corresponding to the preset characteristic point according to the preset characteristic point; and the parameter information determining module is used for obtaining the parameter information of the image obtaining equipment according to the mapping matrix and the preset characteristic points, wherein the image to be processed is obtained by the image obtaining equipment.
In one possible implementation manner, the target object obtaining module is further configured to: detecting a plurality of objects in the image to be processed; and screening the plurality of objects according to the postures and/or occlusion states of the plurality of objects to obtain the target object.

In one possible implementation manner, the target object obtaining module is further configured to: acquiring key points of the plurality of objects; respectively determining the posture of each object according to the key points of each object; and screening the plurality of objects according to the posture of each object to obtain the target object.

In one possible implementation manner, the target object obtaining module is further configured to: acquiring key points of the plurality of objects; respectively determining the confidence of the key points of each object; respectively determining the occlusion state of each object according to the confidence of the key points; and screening the plurality of objects according to the occlusion state to obtain the target object.
In one possible implementation manner, the target object obtaining module is further configured to: obtaining location information for each of the objects; and obtaining key points of the plurality of objects according to the position information of each object.
In one possible implementation manner, the feature point determining module is further configured to: acquiring a mask image of the target object; acquiring covariance matrixes of a plurality of pixel points of the mask image; performing characteristic decomposition on the covariance matrix to obtain a characteristic vector; and determining the preset feature points according to the feature vectors and a plurality of pixel points of the mask image.
In one possible implementation manner, the parameter information includes internal reference information and pose information, and the parameter information determination module is further configured to: acquiring error information of the preset characteristic points according to the mapping matrix and the preset characteristic points; adjusting parameters of the mapping matrix according to the error information to obtain an adjusted mapping matrix; and obtaining internal reference information and pose information of the image acquisition equipment according to the adjusted mapping matrix.
In a possible implementation manner, the mapping matrix is a matrix used for representing a position relationship between different preset feature points of the same target object.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow chart of a calibration method according to an embodiment of the disclosure;
FIGS. 2A and 2B show schematic diagrams of keypoints, according to embodiments of the present disclosure;
FIG. 3 shows a schematic diagram of keypoints, according to an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of preset feature points according to an embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of an application of a calibration method according to an embodiment of the present disclosure;
FIG. 6 shows a block diagram of a calibration arrangement according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure;
FIG. 8 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flow chart of a calibration method according to an embodiment of the present disclosure, as shown in fig. 1, the method includes:
in step S11, detecting the image to be processed, and acquiring a target object in the image to be processed;
in step S12, determining preset feature points of the target object;
in step S13, determining a mapping matrix corresponding to the preset feature point according to the preset feature point;
in step S14, parameter information of an image acquisition device is obtained according to the mapping matrix and the preset feature point, where the image to be processed is obtained by the image acquisition device.
According to the calibration method of the embodiments of the present disclosure, the mapping matrix can be determined from the preset feature points of the region where any target object is located in the image, so that the internal reference information and pose information of the image acquisition device are determined. The self-calibration process can thus be completed without requiring the same target object to appear at multiple preset positions and without requiring the cooperation of target objects. This reduces manual workload and calibration cost, and is suitable for scenarios with numerous, widely distributed image acquisition devices, for example, self-calibration of the many cameras of a city monitoring system.
For scenarios with numerous, widely distributed image acquisition devices, calibrating the devices manually one by one, or having each device capture the required images with manual cooperation for self-calibration, involves a heavy workload and high calibration cost, and the resulting calibration is not robust: for example, if an image acquisition device is moved, the calibrated parameters may become invalid and calibration must be performed again.
In a possible implementation manner, to address the above problem, calibration may be performed based on preset feature points of the target object in the image acquired by the image acquisition device. The preset feature points may be pixel points in the region where the target object is located that can represent the size of the target object, for example, pixel points at the top of the head and the bottom of the feet of the target object, pixel points at the ends of the two shoulders of the target object, and the like. For example, the preset feature points of the target object may be acquired, and a mapping matrix may be obtained based on the relationship between the preset feature points (for example, the distance between the top of the head and the bottom of the feet of the target object, i.e., the height of the target object, may be considered fixed, so a definite distance relationship between the two points can be established); then, the internal reference information and pose information of the image acquisition device may be solved based on the mapping matrix, i.e., the image acquisition device is automatically calibrated.
In one possible implementation, for any image acquisition device (e.g., a surveillance camera in a monitoring system), the device may be placed at any location to capture video of that location. A plurality of video frames of the video may be obtained, and images to be processed in which an object is present (e.g., frames in which a pedestrian passes) may be acquired from the video frames.
In a possible implementation manner, in self-calibration the relationship between the preset feature points of the target object needs to be determined, for example, the distance relationship between the top of the head and the bottom of the feet of the target object, and the mapping matrix is obtained based on this relationship for calibration. Therefore, among the multiple objects in the image to be processed, a qualified target object can be determined, that is, an object whose preset feature points can be determined from the pixel points of the region where it is located, so that the mapping matrix can be determined based on those preset feature points. In an example, if the mapping matrix is determined based on the distance between the top of the head and the bottom of the feet, target objects whose head top and feet can both be detected (not occluded) and whose posture is a standing posture (i.e., the distance between the two points conforms to the preset fixed relationship, and the difference between this distance and the preset fixed distance is not made excessively large by the posture of the target object) may be screened out from the multiple objects. In another example, if the mapping matrix is determined based on the distance between the two shoulders, target objects that directly face or face away from the image acquisition device (i.e., the distance between the shoulders conforms to the preset fixed relationship, and the difference between this distance and the preset fixed distance is not made excessively large by the angle of the target object) and whose shoulders are both unoccluded may be screened out from the multiple objects.
In one possible implementation manner, in step S11 the screening may be performed based on the above factors. Step S11 may include: detecting a plurality of objects in the image to be processed; and screening the plurality of objects according to the postures and/or occlusion states of the plurality of objects to obtain the target object. That is, at least one of the posture and the occlusion state of each object in the image to be processed may be determined and used for the screening, for example, to select an unoccluded object in a standing posture as the target object, or to select an unoccluded target object that directly faces or faces away from the image acquisition device.
In one possible implementation, during the screening, the key points of each object may be detected, and the above-mentioned posture and/or occlusion state may be obtained based on the key points so as to screen the plurality of objects. Screening the plurality of objects according to the postures and/or occlusion states of the plurality of objects to obtain the target object includes: acquiring key points of the plurality of objects; respectively determining the posture of each object according to the key points of each object; and screening the plurality of objects according to the posture of each object to obtain the target object.
In one possible implementation, the key points of each subject may be obtained, for example, the key points may be key points representing the body structure of the target subject, the key points may include head key points, shoulder key points, elbow key points, hand key points, waist key points, knee key points, foot key points, and the like, and the present disclosure does not limit the types and positions of the key points.
In one possible implementation, the obtaining the key points of the plurality of objects may include: obtaining location information for each of the objects; and obtaining key points of the plurality of objects according to the position information of each object.
In an example, the to-be-processed image may be detected by a deep learning neural network, and the position information of the plurality of objects is obtained, and the detection method is not limited by the present disclosure. The position information may indicate the position of each object, and may be, for example, a detection frame for framing the object, a contour line for drawing the contour of the object, coordinate information indicating a key point of the target object, or the like. The present disclosure does not limit the specific form of the location information.
In an example, the key points of each object may be obtained according to position information of each object, for example, the position information may be a detection frame for framing the object, and when the key points of each object are detected, key point detection may be performed on image blocks in each detection frame to obtain the key points of each object. And only the image blocks in the detection frame are subjected to key point detection, so that the operation amount of key point detection processing can be reduced, and full-image detection is not required. Alternatively, if the position information is a contour line that describes the contour of the object, only the region inside the contour line may be detected to acquire the keypoints of each object, and this method may also reduce the amount of computation for keypoint detection. In an example, the keypoint detection may be performed by a deep learning neural network, and the disclosure does not limit the specific method of keypoint detection.
In one possible implementation, after the key points of each object are obtained, target objects meeting the conditions can be screened out based on the key points. For example, if the mapping matrix is determined by the distance between the top of the head and the bottom of the feet, target objects whose head top and feet can both be detected (i.e., are not occluded) and whose posture is a standing posture can be screened out. For another example, if the mapping matrix is determined by the distance between the two shoulders, target objects whose shoulders can both be detected (i.e., are not occluded) and that directly face or face away from the image acquisition device can be screened out.
In an example, the pose of each object may be determined from the angles of the lines between its key points. For example, when the angle between the upper body and the thigh is large, the object may be considered not to be standing, as in a sitting or bowing posture: the angle between the line connecting the shoulder key point and the waist key point and the line connecting the waist key point and the knee key point may be determined, and if this angle is large (for example, at or above an angle threshold of 30°), the object may be considered not to be standing.
Fig. 2A and 2B illustrate schematic diagrams of key points according to an embodiment of the present disclosure. As shown in Fig. 2A, if the angle between the line connecting the shoulder key point and the waist key point and the line connecting the waist key point and the knee key point is small, for example, less than 30°, the posture of the target object may be considered a standing posture. As shown in Fig. 2B, if that angle is large, for example, greater than 30°, the posture of the object may be considered a non-standing posture.
In an example, a plurality of objects may be filtered based on the posture of each object determined in the above manner, and objects whose postures are not standing postures may be excluded.
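As a minimal sketch of this angle-based screening (assuming 2D pixel coordinates for the shoulder, waist, and knee key points with the image v-axis pointing down; the function name and sample coordinates are illustrative only, and the 30° threshold follows the example above):

```python
import numpy as np

def is_standing(shoulder, waist, knee, threshold_deg=30.0):
    """Treat the pose as standing when the torso (shoulder->waist) and
    thigh (waist->knee) segments are nearly collinear."""
    torso = np.subtract(waist, shoulder).astype(float)
    thigh = np.subtract(knee, waist).astype(float)
    cos_a = torso @ thigh / (np.linalg.norm(torso) * np.linalg.norm(thigh))
    angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    return angle < threshold_deg

# Upright torso and thigh -> standing; bent hip (large angle) -> excluded.
print(is_standing((100, 50), (102, 120), (104, 190)))  # True
print(is_standing((100, 50), (102, 120), (160, 140)))  # False
```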
In one possible implementation, the pose of the object may be determined by other means. In an example, if target objects directly facing or facing away from the image acquisition device need to be screened out, the screening may be based on the traveling direction of the target object: if the traveling direction is parallel to the height direction (e.g., the Y-axis direction) of the image to be processed, the target object may directly face or face away from the image acquisition device; otherwise, there is an angle between the target object's facing direction and the image acquisition device. Target objects that directly face or face away from the image acquisition device may be screened out on this basis. The present disclosure does not limit the manner of screening.
In one possible implementation, in addition to determining the posture and/or angle of the target object, it may be determined whether the target object is occluded; if it is occluded, its preset feature points may not be obtainable. The key points of the objects can be used to determine whether an object is occluded. Screening the plurality of objects according to the postures and/or occlusion states of the plurality of objects to obtain the target object includes: acquiring key points of the plurality of objects; respectively determining the confidence of the key points of each object; respectively determining the occlusion state of each object according to the confidence of the key points; and screening the plurality of objects according to the occlusion state to obtain the target object.
In a possible implementation manner, the detection manner of the key points is as described above and is not repeated here. When the deep learning neural network detects the key points in the region where each object is located, it can determine the confidence of each key point in that region: the higher the confidence of a key point, for example, above the confidence threshold, the higher the probability that the key point is detected correctly. For example, if the confidence of a shoulder key point is 99%, the key point is considered correctly detected and can be used as the shoulder key point of the object. If the confidence of a key point is 10%, it cannot be determined whether the detection of that key point is correct. The confidence of the key points of each object can be determined separately. For example, for an unoccluded object, the confidences of its key points are all above the confidence threshold, and its key points can be considered accurately detected. For another example, if the confidence of some key points of an object is low, it is difficult to determine whether those key points are correctly detected; this may be because some regions of the object are occluded, making the key point detection inaccurate.
In one possible implementation, the occlusion status of each object may be determined based on the confidence of each keypoint, e.g., if one or more keypoints of an object have a low confidence, e.g., below a confidence threshold (e.g., 0.2), then a partial region of the object may be considered occluded. The object may be excluded.
Fig. 3 is a schematic diagram of key points according to an embodiment of the present disclosure, and as shown in fig. 3, a partial region of the object is blocked, so that a part of the key points is difficult to detect, or the detection result is inaccurate.
In a possible implementation manner, through the above-mentioned screening work, a target object meeting the condition, for example, a target object with no occlusion and a standing posture, or a target object with no occlusion and facing toward or away from the image acquisition device, may be selected. The present disclosure does not limit the above conditions.
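A corresponding sketch of the confidence-based occlusion check (assuming key points are provided as (u, v, confidence) triples by some upstream pose estimator; the 0.2 threshold follows the example above, and all names and values are illustrative):

```python
def is_unoccluded(keypoints, conf_threshold=0.2):
    """Keep an object only if every required key point is detected
    with confidence at or above the threshold."""
    return all(conf >= conf_threshold for _, _, conf in keypoints)

visible = [(120, 40, 0.99), (110, 80, 0.95), (130, 80, 0.97), (121, 260, 0.91)]
blocked = [(120, 40, 0.99), (110, 80, 0.10), (130, 80, 0.97), (121, 260, 0.91)]
print(is_unoccluded(visible), is_unoccluded(blocked))  # True False
```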
In a possible implementation manner, based on a plurality of pixel points in the area where the qualified target object is located, a preset feature point of the target object, that is, a feature point capable of expressing size information (for example, height or shoulder width) of the target object may be determined.
In one possible implementation, step S12 may include: acquiring a mask image of the target object; acquiring covariance matrixes of a plurality of pixel points of the mask image; performing characteristic decomposition on the covariance matrix to obtain a characteristic vector; and determining the preset feature points according to the feature vectors and a plurality of pixel points of the mask image.
In a possible implementation manner, a mask image of each target object may be obtained, where the mask image is an image representing a contour of the target object, for example, the contour of the target object may be detected, and an image in which pixel values of pixel points inside the contour are 1 and pixel values of pixel points outside the contour are 0 is obtained, and the image is the mask image of the target object. The mask image of each target object can be acquired in this manner. The present disclosure does not limit the pixel values of the pixel points of the mask image.
In one possible implementation, a covariance matrix of a plurality of pixel points of the mask image of the target object may be obtained; for example, the mean of the coordinates of the foreground pixel points may be determined, and the covariance between the pixel points may be determined about that mean. Since the mask image may include a plurality of pixel points, a covariance matrix over the plurality of pixel points may be obtained.
In one possible implementation, eigendecomposition may be performed on the covariance matrix using related-art techniques to obtain eigenvectors. In the decomposition, two eigenvectors may be obtained, corresponding respectively to two sets of pixel points in the mask image, and the pixel points corresponding to each eigenvector may form an axis. The eigenvector corresponding to the larger eigenvalue forms the longer axis, and the eigenvector corresponding to the smaller eigenvalue forms the shorter axis.
In one possible implementation, the intersections of the longer axis with the contour line and the region within it in the mask image may be determined as the preset feature points. The preset feature points may be feature points representing the height of the target object; that is, there are two intersection points, one located at the top of the head of the target object and the other at the bottom of the feet.
Fig. 4 is a schematic diagram illustrating preset feature points according to an embodiment of the disclosure. As shown in Fig. 4, the intersections of the longer axis with the contour of the target object in the mask image are located at the top of the head and the bottom of the feet, and these two intersections can be used as the preset feature points representing the height of the target object. The relationship between the two intersections is the fixed distance relationship mentioned above, i.e., the distance between them can be considered fixed and equal to the height of the target object. For example, for the target object on the right in Fig. 4, the longer axis is perpendicular to the ground, and the distance between the two intersections is the height of the target object. For the target object on the left in Fig. 4, the shooting angle makes the object appear tilted in the image, so its longer axis is not perpendicular to the ground; nevertheless, the straight-line distance between the two intersections is still taken as the height of the target object and considered a fixed value.
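The eigendecomposition step can be sketched as follows. This assumes the covariance is taken over the (u, v) coordinates of the foreground pixels of the binary mask, which is one natural reading of the description above; the upright rectangle standing in for a person mask is purely illustrative:

```python
import numpy as np

def head_foot_points(mask):
    """Head-top and foot-bottom feature points from the longer principal
    axis of a binary person mask."""
    vs, us = np.nonzero(mask)                    # rows (v) and columns (u) of foreground pixels
    pts = np.stack([us, vs], axis=1).astype(float)
    centered = pts - pts.mean(axis=0)
    cov = np.cov(centered.T)                     # 2x2 covariance of pixel coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    major = eigvecs[:, -1]                       # eigenvector of the larger eigenvalue
    proj = centered @ major                      # project pixels onto the longer axis
    p_a, p_b = pts[np.argmin(proj)], pts[np.argmax(proj)]
    # The extreme with the smaller image row (v) is the head top.
    return (p_a, p_b) if p_a[1] < p_b[1] else (p_b, p_a)

mask = np.zeros((200, 100), dtype=np.uint8)      # a tall rectangle as a stand-in mask
mask[20:180, 45:55] = 1
head, foot = head_foot_points(mask)
print(head, foot)                                # approximately (u, 20) and (u, 179)
```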
In a possible implementation manner, the preset feature points may also be obtained in other manners, for example, two points of the top and the bottom of the head of the target object may be used as detection targets, and the detection may be directly performed through a neural network to obtain the preset feature points. The present disclosure does not limit the detection manner of the preset feature points. In another example, a preset feature point representing the shoulder width of the target object may also be determined, for example, the shoulder of the target object may be used as a detection target to be detected by a neural network to obtain the preset feature point representing the shoulder width of the target object.
In this way, the preset feature points can be determined by matrix eigendecomposition, which can improve detection accuracy; moreover, compared with a deep learning approach, matrix eigendecomposition reduces the amount of computation and the consumption of computing resources.
In one possible implementation manner, in step S13, the mapping matrix may be determined based on the position information of the preset feature points. As described above, the relationship between the preset feature points may be considered to be a fixed relationship, for example, the distance relationship between the preset feature points of the top and bottom of the head and the bottom of the foot may be fixed, that is, the height of each target object may be considered to be fixed. Alternatively, the distance relationship between the preset feature points of both shoulders may be considered fixed, that is, the shoulder width of each target object may be considered fixed. The fixed relationship is not limited by the present disclosure, and for example, the fixed relationship may include an angular relationship or the like in addition to a distance relationship. The mapping matrix may be used to reflect the fixed relationship, and the mapping matrix is a matrix used to represent a position relationship between different preset feature points of the same target object.
In a possible implementation manner, taking the head-top and foot-bottom preset feature points as an example, when solving the mapping matrix, the distance between the top of the head and the bottom of the feet may be set as fixed, that is, the height of each target object is taken as a fixed value, which may be set to, for example, 1.65 meters or 1.7 meters; the present disclosure does not limit the set value of the height of the target object.
In a possible implementation manner, the distance between the head-top and foot-bottom preset feature points of each target object is this fixed value, so the head-top and foot-bottom preset feature points have a fixed mapping relationship, which can be expressed by the following formula (1):

(u_head, v_head, 1)^T ~ H · (u_foot, v_foot, 1)^T    (1)

where H is the mapping matrix representing the mapping relationship, (u_head, v_head) are the coordinates of the head-top preset feature point and (u_foot, v_foot) are the coordinates of the foot-bottom preset feature point, both written as homogeneous coordinates, and ~ denotes equality up to a homogeneous scale factor.
In one possible implementation, the coordinates of preset feature points of a plurality of target objects (for example, 4 or more) may be determined, and the mapping relationship thereof is determined according to formula (1), respectively. Further, the mapping matrix may be calculated based on coordinates of preset feature points of a plurality of target objects. In an example, the mapping matrix may be calculated by DLT (Direct Linear Transformation). The parameters in the mapping matrix are initial parameters, and the mapping matrix can be further optimized to determine internal reference information and pose information of the image acquisition equipment, namely, to calibrate the image acquisition equipment.
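A self-contained sketch of the DLT step follows; the point coordinates are made up for illustration, and at least four non-degenerate head/foot pairs are assumed, matching the "4 or more" above:

```python
import numpy as np

def homography_dlt(foot_pts, head_pts):
    """Solve head ~ H @ foot (formula (1)) by Direct Linear Transformation."""
    rows = []
    for (uf, vf), (uh, vh) in zip(foot_pts, head_pts):
        rows.append([uf, vf, 1, 0, 0, 0, -uh * uf, -uh * vf, -uh])
        rows.append([0, 0, 0, uf, vf, 1, -vh * uf, -vh * vf, -vh])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)                 # null-space vector, defined up to scale
    return H / H[2, 2]

foot = [(100, 400), (300, 420), (500, 380), (250, 350)]   # foot-bottom points
head = [(101, 300), (302, 315), (503, 290), (251, 265)]   # matching head-top points
H = homography_dlt(foot, head)
p = H @ np.array([100.0, 400.0, 1.0])
print(p[:2] / p[2])                          # reproduces the first head point (101, 300)
```

With exactly four non-collinear pairs the system has an exact solution; additional pairs turn it into a least-squares fit, which is why more screened pedestrians help.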
In a possible implementation manner, the mapping matrix may also be determined based on a relationship between other preset feature points, for example, the mapping matrix may be determined in a similar manner based on a fixed distance between the preset feature points of the shoulders, which is not limited by the present disclosure.
In one possible implementation, in step S14, parameters of the mapping matrix may be further optimized based on the mapping matrix determined above and the coordinates of the preset feature points to calibrate the image acquisition apparatus, that is, to obtain parameter information of the image acquisition apparatus, which may include internal reference information and external reference information (i.e., pose information) of the image acquisition apparatus.
In one possible implementation, the above mapping matrix is a square matrix, i.e., a matrix with an equal number of rows and columns, and it can be decomposed, for example, into the form of the following equation (2):

H = I − (h/z) · P_2 · l^T,  where l = (P_0 × P_1) / ((P_0 × P_1) · P_2)    (2)

where I is the 3×3 identity matrix, h is the fixed distance between the preset feature points, e.g., the height of the target object, z is the mounting height of the image acquisition device, and P_0, P_1, P_2 are the first three columns of the projection matrix P of the image acquisition device, which may be denoted as P = K(R | t), where R is the rotation matrix, t is the translation vector, and K is the internal reference matrix of the image acquisition device.
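The decomposition in equation (2) can be checked numerically on a synthetic camera. The sketch below builds P = K(R | t) for a camera mounted at height z and verifies that H maps each foot image point to the corresponding head image point; the specific camera pose, K = I, and the 1.7 m person height are assumptions for this test only:

```python
import numpy as np

z, h = 3.0, 1.7                                    # camera mounting height, person height
R = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]], float)   # camera looks along world +Y
t = -R @ np.array([0.0, 0.0, z])                   # camera centre at (0, 0, z)
P = np.hstack([R, t[:, None]])                     # P = K (R | t) with K = I
P0, P1, P2, P3 = P.T

l = np.cross(P0, P1)                               # ground-plane vanishing line
l = l / (l @ P2)                                   # scaled so that l . P2 = 1
H = np.eye(3) - (h / z) * np.outer(P2, l)          # equation (2)

foot_w = np.array([0.5, 4.0, 0.0, 1.0])            # a point on the ground plane
head_w = foot_w + np.array([0.0, 0.0, h, 0.0])     # the point h metres above it
foot_i, head_i = P @ foot_w, P @ head_w
mapped = H @ foot_i
print(mapped / mapped[2], head_i / head_i[2])      # the two image points coincide
```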
In one possible implementation, the mapping matrix may be optimized to improve the accuracy of the internal reference information and the pose information. In an example, the optimization may be performed by presetting coordinates of the feature points to obtain accurate parameter information, such as internal reference information and pose information, of the image acquisition apparatus. Step S14 may include: acquiring error information of the preset characteristic points according to the mapping matrix and the preset characteristic points; adjusting parameters of the mapping matrix according to the error information to obtain an adjusted mapping matrix; and obtaining internal reference information and pose information of the image acquisition equipment according to the adjusted mapping matrix.
In a possible implementation manner, taking the head-top and foot-bottom preset feature points as an example, the error information may be determined as follows. According to the mapping relationship described in formula (1), the coordinates of the head-top preset feature point should be obtainable by transforming the coordinates of the foot-bottom preset feature point through the mapping matrix. However, the initial parameters of the mapping matrix obtained above may contain errors, so the coordinates obtained by transforming the foot-bottom preset feature point through the mapping matrix may deviate from the detected coordinates of the head-top preset feature point. This error can be reduced by optimization, thereby reducing the error of the parameters in the mapping matrix and optimizing the mapping matrix.
In an example, the optimization can be performed by the following optimization function (3):
f = Σ‖P_head − P′_head‖ + Σ‖P_foot − P′_foot‖    (3)

where P_head are the coordinates of the head-top preset feature points, P_foot are the coordinates of the foot-bottom preset feature points, P′_head = H·P_foot, i.e., the coordinates obtained by transforming the foot-bottom preset feature points through the mapping matrix, and P′_foot = H⁻¹·P_head, i.e., the coordinates obtained by transforming the head-top preset feature points through the inverse of the mapping matrix. As described above, because the mapping matrix contains errors, P_head and P′_head are not equal, nor are P_foot and P′_foot. ‖P_head − P′_head‖ is the two-norm between P_head and P′_head and represents the error information between them; ‖P_foot − P′_foot‖ is the two-norm between P_foot and P′_foot and represents the error information between them. The error information may also be expressed in other forms, such as another norm or the Euclidean distance; the present disclosure does not limit the specific form of the error information.
In a possible implementation manner, formula (3) is the sum of the above error information over all target objects. The value of formula (3) may be minimized to obtain the mapping matrix that minimizes the sum of the error information, thereby realizing the optimization of the mapping matrix.
In a possible implementation manner, the parameters of the mapping matrix may be adjusted according to the value of formula (3); for example, they may be adjusted by a gradient descent method or the like to gradually reduce the value of formula (3), and when the value of formula (3) no longer decreases after multiple adjustments, the adjusted mapping matrix is obtained. Alternatively, the minimum value of formula (3) may be determined by a nonlinear programming method or the like, thereby determining the mapping matrix (i.e., the adjusted mapping matrix) at which the value of formula (3) reaches its minimum. The present disclosure does not limit the manner of adjustment.
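A sketch of this refinement step using a least-squares variant of formula (3); note that SciPy's least_squares minimises the sum of squared residuals rather than the sum of norms, and H0, foot_pts, head_pts are assumed to come from the DLT step above:

```python
import numpy as np
from scipy.optimize import least_squares

def refine_homography(H0, foot_pts, head_pts):
    """Minimise the symmetric transfer error of formula (3) over the eight
    free parameters of H (the last entry is fixed to 1)."""
    foot_pts = np.asarray(foot_pts, float)
    head_pts = np.asarray(head_pts, float)

    def map_pts(H, pts):
        q = (H @ np.column_stack([pts, np.ones(len(pts))]).T).T
        return q[:, :2] / q[:, 2:3]

    def residuals(params):
        H = np.append(params, 1.0).reshape(3, 3)
        r_head = map_pts(H, foot_pts) - head_pts                 # P'_head = H P_foot
        r_foot = map_pts(np.linalg.inv(H), head_pts) - foot_pts  # P'_foot = H^-1 P_head
        return np.concatenate([r_head.ravel(), r_foot.ravel()])

    x0 = (H0 / H0[2, 2]).ravel()[:8]
    return np.append(least_squares(residuals, x0).x, 1.0).reshape(3, 3)
```

A hand-written gradient-descent loop over the same residuals would serve equally; least_squares is used here only for brevity.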
In a possible implementation manner, based on the parameters of the adjusted mapping matrix, the mapping matrix H may be decomposed in the manner of formula (2) to obtain the rotation matrix, the translation vector, and the internal reference matrix, from which the internal reference information and pose information of the image acquisition device can be obtained. In an example, the internal reference information includes the focal length of the image acquisition device, and the pose information includes the height, pitch angle, yaw angle, and roll angle of the image acquisition device.
In a possible implementation manner, according to the adjusted parameters of the mapping matrix, an internal reference matrix may be obtained based on the decomposition result, where the parameters of the internal reference matrix may include the focal length of the image capturing device.
In a possible implementation manner, decomposing the adjusted mapping matrix in the form of equation (2) yields, among its factors, the ratio h/z. Since h is a fixed value (e.g., the set height of the target object), the mounting height z of the image acquisition device can be obtained.
In one possible implementation, initial parameters of the image acquisition device may be set, for example, the image acquisition device is located at the origin and its azimuth is due north, so that the pitch angle, roll angle, and yaw angle of the image acquisition device may be obtained based on the rotation matrix and translation vector obtained by the decomposition. The present disclosure does not limit the parameters included in the internal reference information and the pose information.
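The pitch, yaw, and roll can be read off the decomposed rotation matrix. The patent does not fix an Euler-angle convention, so the ZYX (yaw-pitch-roll) convention below is one assumed choice:

```python
import numpy as np

def euler_zyx_degrees(R):
    """Pitch, yaw, roll (degrees) of a rotation matrix R = Rz(yaw) Ry(pitch) Rx(roll)."""
    pitch = np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))
    yaw = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    roll = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    return pitch, yaw, roll

theta = np.radians(30.0)                       # a pure 30-degree yaw as a check
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
print(euler_zyx_degrees(Rz))                   # (0.0, 30.0, 0.0)
```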
In a possible implementation manner, the mapping matrix corresponding to the preset feature points of the shoulders can be optimized through the above manner, and then the internal reference information and the pose information are solved. The initial parameters of the mapping matrix obtained in step S13 may be used to solve the internal reference information and pose information without performing the optimization step, but the error of the solved parameters may be higher than the error of the optimized parameters. The present disclosure is not so limited.
According to the calibration method of the embodiments of the present disclosure, the preset feature points can be determined by matrix eigendecomposition, which can improve detection accuracy; the mapping matrix can be determined from the preset feature points and then optimized to reduce its error, so that the internal reference information and pose information of the image acquisition device are determined. The same target object does not need to appear at multiple preset positions, and the self-calibration process can be completed without the cooperation of target objects. This reduces manual workload and calibration cost, and is suitable for scenarios with numerous, widely distributed image acquisition devices, for example, self-calibration of the many cameras of a city monitoring system.
Fig. 5 is an application schematic diagram of the calibration method according to an embodiment of the disclosure. As shown in Fig. 5, the image acquisition device is any monitoring camera in a city monitoring system; when calibrating the camera, video shot by the image acquisition device can be acquired, and images to be processed in which an object is present can be determined.
In one possible implementation, the objects in the image to be processed may be screened to select target objects that are in a standing posture and are not occluded. For example, the key points of each object may be detected, and the posture of each object determined based on the key points so as to exclude objects that are not standing; whether a key point position is occluded may be determined based on the confidence of the key point, so as to exclude occluded objects. In this way, unoccluded target objects in a standing posture can be obtained.
In one possible implementation, the preset feature points of each target object may be acquired, for example, the preset feature points representing the height of the target object, i.e., the head-top and foot-bottom preset feature points. Eigendecomposition may be performed on the covariance matrix of the coordinates of the pixel points of the mask image of the region where each target object is located to obtain eigenvectors; the set of pixel points corresponding to the eigenvector with the larger eigenvalue forms the longer axis in the mask image, and the intersections of this axis with the contour of the target object in the mask image are the head-top and foot-bottom preset feature points.
In one possible implementation, the height of the target object may be set to a fixed value, and the mapping matrix representing the mapping relationship of each target object is obtained on this basis, as shown in formula (1). The initial parameters of the mapping matrix may then be solved based on the coordinates of the preset feature points of the plurality of target objects. The initial parameters may contain errors, which can be minimized through optimization.
In one possible implementation, the mapping matrix may be optimized via formula (3) to obtain a mapping matrix with minimized error, and the internal reference information and pose information can then be obtained based on this mapping matrix. For example, the decomposition may be performed according to equation (2) to obtain the internal reference matrix, rotation matrix, and translation vector of the image acquisition device. Internal reference information such as the focal length can be obtained from the parameters of the internal reference matrix; pose information such as the pitch angle, yaw angle, and roll angle can be obtained based on the rotation matrix and the translation vector; and pose information such as the height of the image acquisition device can be obtained based on the decomposition result of equation (2).
In a possible implementation manner, the calibration method can be used in a city monitoring system with numerous monitoring cameras and wide distribution, so as to reduce the workload of manually calibrating each monitoring camera. The method can also be used in other camera calibration scenes, and the application field of the calibration method is not limited by the disclosure.
It is understood that the above method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principles and logic; owing to space limitations, details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a calibration apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any calibration method provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding parts of the method section, which are not repeated here.
Fig. 6 shows a block diagram of a calibration apparatus according to an embodiment of the present disclosure, as shown in fig. 6, the apparatus includes: the target object acquisition module 11 is configured to detect an image to be processed and acquire a target object in the image to be processed; a feature point determining module 12, configured to determine preset feature points of the target object; a mapping matrix determining module 13, configured to determine, according to the preset feature point, a mapping matrix corresponding to the preset feature point; a parameter information determining module 14, configured to obtain parameter information of an image obtaining device according to the mapping matrix and the preset feature point, where the image to be processed is obtained by the image obtaining device.
In one possible implementation manner, the target object obtaining module is further configured to: detecting a plurality of objects in the image to be processed; and screening the plurality of objects according to the postures and/or occlusion states of the plurality of objects to obtain the target object.
In one possible implementation manner, the target object obtaining module is further configured to: acquiring key points of the plurality of objects; respectively determining the posture of each object according to the key point of each object; and screening the plurality of objects according to the posture of each object to obtain the target object.
In one possible implementation manner, the target object obtaining module is further configured to: acquiring key points of the plurality of objects; respectively determining the confidence of the key points of each object; respectively determining the occlusion state of each object according to the confidence of the key points; and screening the plurality of objects according to the occlusion state to obtain the target object.
In one possible implementation manner, the target object obtaining module is further configured to: obtaining location information for each of the objects; and obtaining key points of the plurality of objects according to the position information of each object.
In one possible implementation manner, the feature point determining module is further configured to: acquiring a mask image of the target object; acquiring covariance matrixes of a plurality of pixel points of the mask image; performing characteristic decomposition on the covariance matrix to obtain a characteristic vector; and determining the preset feature points according to the feature vectors and a plurality of pixel points of the mask image.
In one possible implementation manner, the parameter information includes internal reference information and pose information, and the parameter information determination module is further configured to: acquiring error information of the preset characteristic points according to the mapping matrix and the preset characteristic points; adjusting parameters of the mapping matrix according to the error information to obtain an adjusted mapping matrix; and obtaining internal reference information and pose information of the image acquisition equipment according to the adjusted mapping matrix.
In a possible implementation manner, the mapping matrix is a matrix used for representing a position relationship between different preset feature points of the same target object.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
Embodiments of the present disclosure also provide a computer program product, which includes computer readable code, and when the computer readable code runs on a device, a processor in the device executes instructions for implementing the calibration method provided in any of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the calibration method provided in any of the above embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 7 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like terminal.
Referring to fig. 7, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor assembly 814 may also detect a change in the position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in its temperature. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 8 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 8, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each of which corresponds to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK), or the like.
Having described embodiments of the present disclosure, the foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. A calibration method, comprising:
detecting an image to be processed to obtain a target object in the image to be processed;
determining preset feature points of the target object;
determining a mapping matrix corresponding to the preset feature points according to the preset feature points;
and acquiring parameter information of image acquisition equipment according to the mapping matrix and the preset feature points, wherein the image to be processed is acquired by the image acquisition equipment.
2. The method according to claim 1, wherein detecting the image to be processed and acquiring the target object in the image to be processed comprises:
detecting a plurality of objects in the image to be processed;
and screening the plurality of objects according to the postures and/or the occlusion states of the plurality of objects to obtain the target object.
3. The method according to claim 2, wherein the screening the plurality of objects according to the postures and/or the occlusion states of the plurality of objects to obtain the target object comprises:
acquiring key points of the plurality of objects;
respectively determining the posture of each object according to the key point of each object;
and screening the plurality of objects according to the posture of each object to obtain the target object.
4. The method according to claim 2, wherein the screening the plurality of objects according to the postures and/or the occlusion states of the plurality of objects to obtain the target object comprises:
acquiring key points of the plurality of objects;
respectively determining the confidence of the key points of each object;
respectively determining the occlusion state of each object according to the confidence of the key points;
and screening the plurality of objects according to the occlusion state to obtain the target object.
5. The method according to claim 3 or 4, wherein acquiring the key points of the plurality of objects comprises:
obtaining location information of each of the objects;
and obtaining the key points of the plurality of objects according to the location information of each object.
6. The method according to claim 1, wherein determining the preset feature points of the target object comprises:
acquiring a mask image of the target object;
acquiring a covariance matrix of a plurality of pixel points of the mask image;
performing eigen decomposition on the covariance matrix to obtain an eigenvector;
and determining the preset feature points according to the eigenvector and the plurality of pixel points of the mask image.
7. The method according to claim 1, wherein the parameter information includes internal reference information and pose information,
acquiring parameter information of the image acquisition equipment according to the mapping matrix and the preset feature points, wherein the parameter information comprises:
acquiring error information of the preset feature points according to the mapping matrix and the preset feature points;
adjusting parameters of the mapping matrix according to the error information to obtain an adjusted mapping matrix;
and obtaining internal reference information and pose information of the image acquisition equipment according to the adjusted mapping matrix.
8. The method according to claim 1, wherein the mapping matrix is a matrix for representing a positional relationship between different preset feature points of the same target object.
9. A calibration device, comprising:
the target object acquisition module is used for detecting the image to be processed and acquiring a target object in the image to be processed;
the characteristic point determining module is used for determining preset characteristic points of the target object;
the mapping matrix determining module is used for determining a mapping matrix corresponding to the preset feature points according to the preset feature points;
and the parameter information determining module is used for obtaining the parameter information of the image acquisition equipment according to the mapping matrix and the preset feature points, wherein the image to be processed is acquired by the image acquisition equipment.
10. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any one of claims 1 to 8.
11. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 8.
CN202111497801.XA 2021-12-09 2021-12-09 Calibration method and device, electronic equipment and storage medium Withdrawn CN114170324A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111497801.XA CN114170324A (en) 2021-12-09 2021-12-09 Calibration method and device, electronic equipment and storage medium
PCT/CN2022/105545 WO2023103377A1 (en) 2021-12-09 2022-07-13 Calibration method and apparatus, electronic device, storage medium, and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111497801.XA CN114170324A (en) 2021-12-09 2021-12-09 Calibration method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114170324A (en) 2022-03-11

Family

ID=80484849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111497801.XA Withdrawn CN114170324A (en) 2021-12-09 2021-12-09 Calibration method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114170324A (en)
WO (1) WO2023103377A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359132A (en) * 2022-10-21 2022-11-18 小米汽车科技有限公司 Camera calibration method and device for vehicle, electronic equipment and storage medium
WO2023103377A1 (en) * 2021-12-09 2023-06-15 上海商汤智能科技有限公司 Calibration method and apparatus, electronic device, storage medium, and computer program product
CN116563949A (en) * 2023-07-05 2023-08-08 四川弘和数智集团有限公司 Behavior recognition method, device, equipment and medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078735A (en) * 2023-08-14 2023-11-17 广州广电运通智能科技有限公司 Height detection method, system, electronic device and storage medium
CN117218212B (en) * 2023-11-09 2024-02-13 杭州巨岩欣成科技有限公司 Camera calibration self-adaptive adjustment method and device, computer equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163335B (en) * 2011-05-19 2013-02-13 北京航空航天大学 Multi-camera network structure parameter self-calibration method without inter-camera feature point matching
US10029622B2 (en) * 2015-07-23 2018-07-24 International Business Machines Corporation Self-calibration of a static camera from vehicle information
CN110570475A (en) * 2018-06-05 2019-12-13 上海商汤智能科技有限公司 vehicle-mounted camera self-calibration method and device and vehicle driving method and device
CN109343061B (en) * 2018-09-19 2021-04-02 百度在线网络技术(北京)有限公司 Sensor calibration method and device, computer equipment, medium and vehicle
CN113160325B (en) * 2021-04-01 2022-10-11 长春博立电子科技有限公司 Multi-camera high-precision automatic calibration method based on evolutionary algorithm
CN113066135A (en) * 2021-04-26 2021-07-02 深圳市商汤科技有限公司 Calibration method and device of image acquisition equipment, electronic equipment and storage medium
CN114170324A (en) * 2021-12-09 2022-03-11 深圳市商汤科技有限公司 Calibration method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
WO2023103377A1 (en) 2023-06-15

Similar Documents

Publication Publication Date Title
US11798190B2 (en) Position and pose determining method, apparatus, smart device, and storage medium
CN109697734B (en) Pose estimation method and device, electronic equipment and storage medium
CN114170324A (en) Calibration method and device, electronic equipment and storage medium
CN109522910B (en) Key point detection method and device, electronic equipment and storage medium
US11176687B2 (en) Method and apparatus for detecting moving target, and electronic equipment
CN109840939B (en) Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and storage medium
CN109819229B (en) Image processing method and device, electronic equipment and storage medium
KR20160021737A (en) Method, apparatus and device for image segmentation
CN107944367B (en) Face key point detection method and device
CN106503682B (en) Method and device for positioning key points in video data
CN109584362B (en) Three-dimensional model construction method and device, electronic equipment and storage medium
CN110930463B (en) Method and device for calibrating internal reference of monitoring camera and electronic equipment
CN108648280B (en) Virtual character driving method and device, electronic device and storage medium
CN112945207B (en) Target positioning method and device, electronic equipment and storage medium
CN114170302A (en) Camera external parameter calibration method and device, electronic equipment and storage medium
CN112184787A (en) Image registration method and device, electronic equipment and storage medium
CN112414400B (en) Information processing method and device, electronic equipment and storage medium
CN111860373A (en) Target detection method and device, electronic equipment and storage medium
CN113936154A (en) Image processing method and device, electronic equipment and storage medium
CN113344999A (en) Depth detection method and device, electronic equipment and storage medium
CN113345000A (en) Depth detection method and device, electronic equipment and storage medium
CN112529781B (en) Image processing method, device and readable storage medium
CN112365406B (en) Image processing method, device and readable storage medium
CN114519794A (en) Feature point matching method and device, electronic equipment and storage medium
CN111127539B (en) Parallax determination method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40062742

Country of ref document: HK

WW01 Invention patent application withdrawn after publication

Application publication date: 20220311