WO2023103377A1

WO2023103377A1 - Calibration method and apparatus, electronic device, storage medium, and computer program product

Info

Publication number: WO2023103377A1
Application number: PCT/CN2022/105545
Authority: WO
Inventors: 刘思成; 朱烽; 赵瑞
Original assignee: 上海商汤智能科技有限公司
Priority date: 2021-12-09
Filing date: 2022-07-13
Publication date: 2023-06-15
Also published as: CN114170324A

Abstract

The present invention relates to a calibration method and apparatus, an electronic device, a storage medium, and a computer program product. The method comprises: detecting an image to be processed to obtain a target object in said image; determining preset feature points of the target object; determining, according to the preset feature points, a mapping matrix corresponding to the preset feature points; and obtaining parameter information of an image acquisition device according to the mapping matrix and the preset feature points, said image being acquired by the image acquisition device.

Description

Calibration method and device, electronic equipment, storage medium and computer program product

Cross References to Related Applications

The present disclosure is based on, and claims priority to, Chinese patent application No. 202111497801.X, filed on December 9, 2021 and entitled "Calibration method and device, electronic equipment and storage medium", the entire content of which is hereby incorporated by reference into this disclosure.

Technical field

The present disclosure relates to, but is not limited to, the technical field of computer vision, and in particular relates to a calibration method and device, electronic equipment, a storage medium, and a computer program product.

Background

The monitoring system is one of the most widely used systems in the security field. Given the camera parameters of a surveillance camera in the monitoring system, a variety of information can be obtained from the monitored scene, such as the position, height, and walking speed of pedestrians. However, the large number and wide distribution of surveillance cameras in cities make it difficult to obtain the camera parameters of each camera. Therefore, there is an urgent need for a low-cost camera calibration method.

Summary of the invention

Embodiments of the present disclosure provide a calibration method and device, electronic equipment, a storage medium, and a computer program product.

An embodiment of the present disclosure provides a calibration method, including: detecting an image to be processed, and acquiring a target object in the image to be processed; determining preset feature points of the target object; determining, according to the preset feature points, a mapping matrix corresponding to the preset feature points; and obtaining parameter information of an image acquisition device according to the mapping matrix and the preset feature points, where the image to be processed is acquired by the image acquisition device.

According to the calibration method of the embodiment of the present disclosure, the mapping matrix can be determined from the preset feature points in the area where any target object is located in the image, and the intrinsic parameter information and pose information of the image acquisition device can then be determined. The self-calibration process can be completed without the same target object appearing at multiple preset positions and without the cooperation of the target object, which reduces the manual workload and the calibration cost. The method is suitable for scenes with a large number of widely distributed image acquisition devices, for example, the self-calibration of the many cameras in an urban surveillance system.

An embodiment of the present disclosure also provides a calibration device, including: a target object acquisition part configured to detect an image to be processed, and acquire a target object in the image to be processed; a feature point determination part configured to determine the preset feature points of the target object; a mapping matrix determining part configured to determine, according to the preset feature points, a mapping matrix corresponding to the preset feature points; and a parameter information determining part configured to obtain parameter information of an image acquisition device according to the mapping matrix and the preset feature points, where the image to be processed is acquired by the image acquisition device.

An embodiment of the present disclosure also provides an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.

An embodiment of the present disclosure also provides a computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.

An embodiment of the present disclosure also provides a computer program product, where the computer program product includes a computer program or an instruction, and when the computer program or instruction is run on an electronic device, the electronic device is made to execute the above method.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of the embodiments with reference to the accompanying drawings.

Brief description of the drawings

The accompanying drawings here are incorporated into the description and constitute a part of the present description. These drawings show embodiments consistent with the present disclosure, and are used together with the description to explain the technical solution of the present disclosure.

FIG. 1 is a schematic flowchart of a calibration method provided by an embodiment of the present disclosure;

FIG. 2A is a schematic diagram of key points of an object in an image to be processed provided by an embodiment of the present disclosure;

FIG. 2B is a schematic diagram of key points of an object in an image to be processed provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of key points of an object in an image to be processed provided by an embodiment of the present disclosure;

FIG. 4A is a schematic diagram of a mask image of a target object in an image to be processed provided by an embodiment of the present disclosure;

FIG. 4B is a schematic diagram of a mask image of a target object in an image to be processed provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of the application of the calibration method provided by the embodiment of the present disclosure;

FIG. 6 is a structural block diagram of a calibration device provided by an embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device provided by an embodiment of the present disclosure;

FIG. 8 is a block diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed description

Various embodiments, features, and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numbers in the figures indicate functionally identical or similar elements. While various aspects of the embodiments are shown in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior or better than other embodiments.

The term "and/or" herein describes an association relationship between associated objects and indicates that three relationships are possible. For example, A and/or B can mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of them. For example, including at least one of A, B, and C may mean including any one or more elements selected from the set formed by A, B, and C.

In addition, in order to better illustrate the present disclosure, numerous specific details are given in the following specific implementation manners. It will be understood by those skilled in the art that the present disclosure may be practiced without some of the specific details. In some instances, methods, means, components and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the gist of the present disclosure.

In the related art, the calibration of the monitoring camera includes the following two methods:

The first is calibration with a calibration board. This requires manually placing the calibration board in the field of view of the surveillance camera and obtaining the camera video stream at the corresponding time. However, because of the wide coverage of cameras in cities, this method is cumbersome and time-consuming and incurs huge labor costs, so it is not applicable in city-level scenarios;

The second is camera self-calibration, which calculates the camera parameters by detecting the positions of the heads and feet of people at different positions of the picture in the video and estimating the vanishing point of the camera through geometric methods. The camera self-calibration method requires the same person to be captured at different positions in the picture, and the robustness of the camera parameter estimation results is low. For example, if the image acquisition device moves, the calibrated parameters may become invalid and the device needs to be re-calibrated.

Based on this, the embodiment of the present disclosure provides a calibration method, and the technical solution in the embodiment of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiment of the present disclosure.

Fig. 1 is a schematic flowchart of a calibration method proposed by an embodiment of the present disclosure. As shown in Fig. 1, the method includes steps S11 to S14, wherein:

Step S11, detecting the image to be processed, and acquiring the target object in the image to be processed;

Step S12, determining preset feature points of the target object;

Step S13, determining a mapping matrix corresponding to the preset feature points according to the preset feature points;

Step S14, according to the mapping matrix and the preset feature points, obtain parameter information of an image acquisition device, and the image to be processed is acquired by the image acquisition device.

According to the calibration method of the embodiment of the present disclosure, the mapping matrix is determined from the preset feature points of the area where any target object is located in the image to be processed, and the parameter information of the image acquisition device is then determined according to the mapping matrix and the preset feature points. Therefore, the calibration method proposed by the embodiments of the present disclosure can complete the self-calibration process without the same target object appearing at multiple preset positions in the field of view of the image acquisition device and without the cooperation of the target object, thereby reducing the manual workload and the calibration cost. The method is suitable for scenes with a large number of widely distributed image acquisition devices, for example, the self-calibration of the many cameras in an urban surveillance system.

In some implementations, the calibration is performed based on the preset feature points of the target object in the image to be processed acquired by the image acquisition device. The preset feature points may be pixel points (or feature points) in the area where the target object is located that can represent the size of the target object, for example, the pixel points at the top of the head and the soles of the feet of the target object, or the pixel points at both ends of the shoulders of the target object. The present disclosure does not limit which part of the target object the preset feature points belong to. In some implementations, after the preset feature points of the target object are obtained, a mapping matrix is obtained according to the relationship between the preset feature points (for example, the distance between the top of the head and the soles of the feet of the target object, that is, the height of the target object, is fixed, so there is a specific distance relationship between the two points). The intrinsic parameter information and pose information of the image acquisition device are then solved based on the mapping matrix, that is, the image acquisition device is calibrated automatically.

In some implementations, any image acquisition device (for example, a surveillance camera in a surveillance system) can be set at any position to capture video within its field of view. When performing self-calibration of the image acquisition device, a plurality of video frames are obtained from the video captured by the image acquisition device, and a video frame in which an object is present (for example, a pedestrian passing by) is determined among the plurality of video frames as the image to be processed.

In some implementations, self-calibration needs to determine the relationship between the preset feature points of the target object, for example, the distance relationship between the top of the head and the soles of the feet of the target object, and a mapping matrix is obtained based on that relationship for calibration. Therefore, it is necessary to select a qualified object among the multiple objects in the image to be processed as the target object, that is, an object whose preset feature points can be determined from the pixel points of the area where it is located, so that the above mapping matrix can be determined from those preset feature points. In some implementations, if the mapping matrix is determined based on the distance between the top of the head and the soles of the feet of the target object, an object whose head top and soles can both be detected (are not occluded) and whose posture is standing can be selected among the multiple objects as the target object. For a standing object, the distance between the top of the head and the soles of the feet conforms to the preset fixed distance relationship, and the posture does not cause that distance to deviate too much from the preset fixed distance. In some implementations, if the mapping matrix is determined based on the distance between the shoulders of the target object, an object that is facing or facing away from the image acquisition device and whose shoulders are not occluded can be selected among the multiple objects as the target object. For such an object, the distance between the shoulders conforms to the preset fixed distance relationship, and the angle of the object does not cause that distance to deviate too much from the preset fixed distance.

In this way, by screening the objects in the image to be processed to obtain qualified objects as target objects, not only can the accuracy of calibration be improved, but the calculation amount of the subsequent image processing steps can also be reduced, which improves the calibration efficiency of the image acquisition device.

In some implementations, in step S11, when performing target object detection, the detection may be based on the above factors, that is, this step S11 may include step S111 and step S112, wherein:

Step S111, detecting multiple objects in the image to be processed;

Step S112, filtering the multiple objects according to at least one of the poses of the multiple objects and the occlusion states of the multiple objects to obtain the target object. That is, at least one of the pose and the occlusion state of each of the multiple objects in the image to be processed is determined, and the objects are filtered on that basis. For example, an object that is standing and not occluded is selected as the target object, or an object that is facing or facing away from the image acquisition device and is not occluded is selected as the target object.

In some implementation manners, when performing screening, key points of each object may be detected, and at least one of the aforementioned pose and occlusion state may be obtained based on the key points, so as to screen multiple objects. Therefore, step S112 includes step S1121 to step S1123, wherein:

Step S1121, acquiring key points of the plurality of objects;

Step S1122, according to the key points of each of the objects, determine the pose of each of the objects;

Step S1123, according to the posture of each of the objects, the multiple objects are screened to obtain the target object.

In some implementations, the key points of each object may be key points representing the body structure of the object, for example, head key points, shoulder key points, elbow key points, hand key points, waist key points, knee key points, foot key points, etc. The present disclosure does not limit the type and position of the key points of each object.

In some implementations, step S1121 may include step S1121A and step S1121B, wherein:

Step S1121A, obtaining the location information of each of the objects;

Step S1121B, according to the location information of each of the objects, the key points of the multiple objects are obtained.

In some implementation manners, the image to be processed may be detected through a deep learning neural network to obtain position information of multiple objects, and the disclosure does not limit the detection method. The position information may represent the position of each object. For example, the position information may be a detection frame that encloses the object, a contour line that delineates the outline of the object, or coordinate information representing key points of the object. The present disclosure does not limit the specific form of the position information.

In some implementations, the key points of each object can be obtained according to the position information of each object. For example, when the position information is a detection frame enclosing an object, key point detection can be performed on the image block inside each detection frame to obtain the key points of each object. In this way, key point detection is performed only on the image blocks in the detection frames rather than on the full image, which reduces the calculation amount of key point detection. Alternatively, when the position information is a contour line depicting the contour of the object, only the area inside the contour line needs to be detected to obtain the key points of each object, which also reduces the calculation amount of key point detection. In some implementations, key point detection may be performed by a deep learning neural network. The present disclosure does not limit the specific method of key point detection.
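As an illustration of restricting key point detection to the detection frames, the following minimal sketch crops one image block per frame. The function name `crop_detections` and the `(x1, y1, x2, y2)` pixel box format are assumptions made for this example and are not specified by the disclosure; the key point detector itself is omitted.

```python
import numpy as np


def crop_detections(image, boxes):
    """Return one image block per detection frame.

    image: H x W (x C) array.
    boxes: iterable of (x1, y1, x2, y2) in pixels (assumed format).
    Boxes are clipped to the image; empty boxes are skipped.
    """
    height, width = image.shape[:2]
    patches = []
    for x1, y1, x2, y2 in boxes:
        x1, x2 = max(0, int(x1)), min(width, int(x2))
        y1, y2 = max(0, int(y1)), min(height, int(y2))
        if x2 > x1 and y2 > y1:
            patches.append(image[y1:y2, x1:x2])
    return patches
```

Each returned block could then be passed to whichever key point detector is used, so that the detector never processes pixels outside the detection frames.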

In some implementation manners, after the key points of each object are acquired, target objects meeting the conditions may be filtered out based on the key points. For example, when the mapping matrix is determined by the distance between the top of the head and the soles of the feet, target objects whose tops of the head and soles of the feet can be detected (that is, not occluded) and whose posture is standing can be filtered out. For another example, when the mapping matrix is determined by the distance between the shoulders, target objects whose both shoulders can be detected (that is, are not blocked) and are facing or facing away from the image acquisition device can be screened out.

In some implementations, the pose of each object can be determined from the angle between lines connecting the key points. For example, when the angle between the line connecting the upper-body key points and the line connecting the thigh key points is large, the object's posture can be considered non-standing; for example, the object may be judged to be sitting or bowing. For another example, the angle between the line connecting the shoulder key point and the waist key point and the line connecting the waist key point and the knee key point can be evaluated. If the angle is large (for example, greater than or equal to an angle threshold such as 30°), the posture of the object can be considered a non-standing posture.

FIG. 2A and FIG. 2B are schematic diagrams of key points of an object in an image to be processed proposed by an embodiment of the present disclosure. As shown in FIG. 2A, the angle between the line 21 connecting the shoulder key point and the waist key point of the object and the line 22 connecting the waist key point and the knee key point is relatively small, for example, less than 30°, so the posture of the object can be considered a standing posture. As shown in FIG. 2B, the angle between the line 23 connecting the shoulder key point and the waist key point of the object and the line 24 connecting the waist key point and the knee key point is relatively large, for example, greater than 30°, so the posture of the object can be considered a non-standing posture.

In some implementation manners, multiple objects may be screened based on the poses of the objects determined in the above manner, and objects whose poses are not standing poses may be excluded.
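The angle-based posture check described above can be sketched as follows. This is a minimal example rather than the disclosed implementation: the function names are hypothetical, key points are assumed to be `(x, y)` pixel coordinates, and the 30° value is the example threshold mentioned in the text.

```python
import math

ANGLE_THRESHOLD_DEG = 30.0  # example angle threshold taken from the text


def torso_thigh_angle(shoulder, waist, knee):
    """Angle in degrees between the shoulder->waist and waist->knee lines."""
    ax, ay = waist[0] - shoulder[0], waist[1] - shoulder[1]
    bx, by = knee[0] - waist[0], knee[1] - waist[1]
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0:
        raise ValueError("key points must be distinct")
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos_angle = max(-1.0, min(1.0, (ax * bx + ay * by) / norm))
    return math.degrees(math.acos(cos_angle))


def is_standing(shoulder, waist, knee, threshold=ANGLE_THRESHOLD_DEG):
    """Treat the object as standing when the angle is below the threshold."""
    return torso_thigh_angle(shoulder, waist, knee) < threshold
```

For a nearly straight body, such as `is_standing((100, 50), (100, 150), (102, 250))`, the two lines are almost collinear and the check passes; for a seated pose in which the thigh line is close to horizontal, the angle approaches 90° and the object is excluded.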

In some implementations, the posture of the object can also be determined in other ways. In some implementations, when target objects facing or facing away from the image acquisition device need to be selected, the screening can be based on, for example, the direction of travel of the object. If the direction of travel of the object is parallel to the height direction (for example, the Y-axis direction) of the image to be processed, the object is facing or facing away from the image acquisition device; otherwise, the object is neither facing nor facing away from the image acquisition device. In this way, target objects facing or facing away from the image acquisition device can be screened out. The present disclosure does not limit the screening method.
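A travel-direction check of this kind might look like the sketch below. It assumes that the object has been tracked over several frames and that its image positions are available as a list of `(x, y)` points; the tracking step, the function name, and the 15° tolerance are all assumptions of this example, since the disclosure only states that the direction should be parallel to the image height direction.

```python
import math


def faces_or_backs_camera(track, tolerance_deg=15.0):
    """Check whether the travel direction is roughly parallel to the image Y axis.

    track: list of (x, y) image positions of one object over time (assumed input).
    tolerance_deg: allowed deviation from the Y axis (assumed value).
    """
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    if dx == 0 and dy == 0:
        return False  # no movement, so the direction is undefined
    # Angle between the displacement and the Y axis, folded into [0, 90] degrees.
    angle = math.degrees(math.atan2(abs(dx), abs(dy)))
    return angle <= tolerance_deg
```

Using only the first and last positions keeps the sketch short; a more robust variant could fit a line to all positions of the track.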

In some implementations, in addition to determining the pose of each object, it may also be determined whether each object is occluded. If occluded, it may not be possible to obtain the preset feature points of the object. In some implementations, the keypoints of each object can be utilized to determine whether each object is occluded. The above step S112 includes the following steps S112A to S112D, wherein:

Step S112A, acquiring key points of the plurality of objects;

Step S112B, respectively determining the confidence of each key point of the object;

Step S112C, according to the confidence of the key point, respectively determine the occlusion state of each of the objects;

Step S112D: Filter the multiple objects according to the occlusion state to obtain the target object.

In some implementations, key points are detected as described in detail above. When a deep learning neural network is used to detect key points in the area where each object is located, the confidence of each key point in the area can be determined. If the confidence of a key point is high, for example, higher than a confidence threshold, the probability that the key point has been accurately detected is high. In some implementation manners, if the confidence of a shoulder key point is 99%, the detection accuracy of that key point can be considered relatively high, and it can be used as the shoulder key point of the object. If the confidence of a certain key point is 10%, it cannot be determined whether the detection of that key point is accurate. The confidence of the key points of each object is determined separately. For example, for an unoccluded object, if the confidences of multiple key points are higher than the confidence threshold, multiple key points of the object can be considered to have been accurately detected. For another example, if the confidence of some key points of an object is low, it is difficult to confirm whether those key points are detected correctly; the reason may be that some areas of the object are occluded, resulting in inaccurate key point detection.

In some implementations, the occlusion state of each object may be determined based on the confidence of each key point. For example, if one or more key points of an object have a low confidence, for example, below a confidence threshold (e.g., 0.2), part of the object area can be considered occluded, and the object is excluded.

FIG. 3 is a schematic diagram of key points of an object in an image to be processed provided by an embodiment of the present disclosure. As shown in FIG. 3, a part of the area of the person object 3 is exposed in the field of view of the image acquisition device, and key points 31 in this part of the area can be detected; another part of the area of the object 3 is blocked by an obstruction 32, so that the key points in that part of the area are difficult to detect or the detection results are inaccurate.
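The confidence-based occlusion screening of steps S112A to S112D can be sketched as follows. The data layout (a dictionary mapping each object identifier to per-key-point confidences), the function names, and the choice to treat a missing key point as having confidence 0 are assumptions of this example; the 0.2 threshold is the example value from the text.

```python
CONFIDENCE_THRESHOLD = 0.2  # example confidence threshold taken from the text


def is_occluded(confidences, required=None, threshold=CONFIDENCE_THRESHOLD):
    """Return True if any required key point has a confidence below the threshold.

    confidences: dict mapping key point name -> confidence in [0, 1].
    required: key point names to check; defaults to all names present.
    A missing required key point is treated as confidence 0 (assumption).
    """
    names = required if required is not None else list(confidences)
    return any(confidences.get(name, 0.0) < threshold for name in names)


def filter_unoccluded(objects, required=None, threshold=CONFIDENCE_THRESHOLD):
    """Keep the identifiers of objects that are not considered occluded.

    objects: dict mapping object id -> key point confidence dict.
    """
    return [
        obj_id
        for obj_id, confidences in objects.items()
        if not is_occluded(confidences, required, threshold)
    ]
```

Passing `required=["head", "foot"]` would restrict the check to the key points that matter when the mapping matrix is based on height, so that an occluded elbow alone does not exclude an otherwise usable object.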

In some implementations, through the above screening work, eligible target objects can be selected, for example, target objects without occlusion and in a standing posture, or target objects without occlusion and facing directly or facing away from the image acquisition device. The present disclosure does not limit the above conditions.

In some implementations, the preset feature points of the target object, that is, the feature points that can express size information of the target object (for example, height or shoulder width), can be determined based on multiple pixel points in the area where the above-mentioned qualified target object is located.

In some implementations, step S12 may include the following steps S121 to S124, wherein:

Step S121, acquiring a mask image of the target object;

Step S122, obtaining the covariance matrix of multiple pixels of the mask image;

Step S123, performing eigendecomposition on the covariance matrix to obtain eigenvectors;

Step S124, determining the preset feature points according to the feature vector and a plurality of pixel points of the mask image.

In some implementations, the mask image of each target object is obtained. The mask image is an image representing the outline of the target object. For example, the outline of the target object is detected, the pixel values of the pixel points within the outline are set to 1, and the pixel values of the pixel points outside the outline are set to 0, so as to obtain the mask image of the target object. The present disclosure does not limit the pixel values of the pixel points of the mask image.

In some implementations, the covariance matrix of multiple pixel points of the mask image of the target object is obtained. For example, the mean of the pixel values of the pixel points is determined, and the covariance between the pixel points is determined based on this mean. Since the mask image may include a plurality of pixel points, a covariance matrix among the plurality of pixel points may be obtained.

In some implementations, the covariance matrix is subjected to eigendecomposition based on related techniques, for example, the covariance matrix is decomposed based on eigenvalues to obtain eigenvectors. In the process of decomposing the covariance matrix, two sets of eigenvectors can be obtained. These two sets of eigenvectors can respectively correspond to two sets of pixel points in the mask image, and the pixel points corresponding to each set of eigenvectors can form an axis, where the eigenvector corresponding to the larger eigenvalue forms the longer axis and the eigenvector corresponding to the smaller eigenvalue forms the shorter axis.

In some implementation manners, the intersection points of the longer axis and the contour line of the area in the mask image may be determined as the preset feature points. The preset feature points may be feature points representing the height of the target object, that is, there are two intersection points, one located at the top of the head of the target object and the other located at the soles of the feet of the target object.

FIG. 4A and FIG. 4B are schematic diagrams of a mask image of a target object in an image to be processed provided by an embodiment of the present disclosure. As shown in FIG. 4A and FIG. 4B, the intersection points of the longer axis and the contour line in the mask image are respectively located at the top of the head and the soles of the feet of the target object, and these two intersection points can be used as preset feature points representing the height of the target object. The relationship between the two intersection points is the above-mentioned preset fixed distance relationship, that is, the distance between the two intersection points can be considered fixed and equal to the height of the target object. For example, for the target object in FIG. 4A, the longer axis 41 is perpendicular to the ground, and the distance between the intersection point 42 and the intersection point 43 is the height of the target object. For the target object in FIG. 4B, due to factors such as the shooting angle, the target object in the image is not directly facing the image acquisition device, so it appears inclined and its longer axis 41' is not perpendicular to the ground. Nevertheless, the straight-line distance between the intersection point 42' and the intersection point 43' is still the height of the target object, and the distance between the two intersection points is considered a fixed value.
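Steps S121 to S124 can be illustrated with the following minimal NumPy sketch. It makes several assumptions that go beyond the text: the covariance matrix is taken over the `(x, y)` coordinates of the foreground pixels of the mask, the intersection of the longer axis with the contour is approximated by the outermost foreground pixels lying within 0.75 pixel of that axis, and the end point with the smaller image `y` coordinate is taken to be the top of the head. The function name is hypothetical.

```python
import numpy as np


def head_foot_from_mask(mask):
    """Estimate head-top and foot-sole points from a binary mask (1 = object)."""
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        raise ValueError("mask must contain at least two foreground pixels")
    points = np.stack([xs, ys], axis=1).astype(float)
    center = points.mean(axis=0)

    # 2 x 2 covariance of the foreground pixel coordinates (assumption).
    cov = np.cov(points.T)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(cov)
    minor, major = eigenvectors[:, 0], eigenvectors[:, 1]

    offsets = points - center
    along = offsets @ major    # position along the longer axis
    across = offsets @ minor   # signed distance from the longer axis

    # Foreground pixels that the longer axis passes through (approximation).
    on_axis = np.abs(across) <= 0.75
    if not on_axis.any():
        raise ValueError("longer axis does not cross the mask")
    axis_points, axis_along = points[on_axis], along[on_axis]
    end_a = axis_points[np.argmin(axis_along)]
    end_b = axis_points[np.argmax(axis_along)]

    # Image y grows downward, so the end with the smaller y is the head.
    head, foot = (end_a, end_b) if end_a[1] < end_b[1] else (end_b, end_a)
    return head, foot
```

For an upright rectangular mask covering rows 5 to 34, the returned points lie on rows 5 and 34 near the horizontal center of the rectangle, and their distance in pixels is the image-space height that the fixed-height assumption relates to the physical height.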

In some implementations, the preset feature points can also be obtained in other ways. For example, the top of the head and the soles of the feet of the target object can be used as detection targets and detected directly by a neural network to obtain the preset feature points. The present disclosure does not limit the detection method of the preset feature points. In some implementations, preset feature points representing the shoulder width of the target object can also be determined. For example, the shoulders of the target object can be used as detection targets and detected by a neural network to obtain the preset feature points representing the shoulder width of the target object.

In this way, the preset feature points can be determined by performing eigendecomposition on the covariance matrix, which can improve the detection accuracy. Compared with a deep learning method, the matrix eigendecomposition method can reduce the amount of computation and the occupancy of computing resources.

In some implementation manners, in step S13, a mapping matrix may be determined based on position information of the preset feature points. As mentioned above, the relationship between the preset feature points can be considered a fixed relationship. For example, the distance relationship between the preset feature points on the top of the head and the bottom of the feet is a preset fixed distance relationship, that is, the height of each target object can be considered fixed. Alternatively, the distance relationship between the preset feature points on both shoulders can be considered a preset fixed distance relationship, that is, the shoulder width of each target object can be considered fixed. The present disclosure does not limit the fixed relationship; for example, in addition to a distance relationship, the fixed relationship may also include an angle relationship and the like. The mapping matrix is a matrix used to represent the positional relationship between different preset feature points of the same target object, and can be used to reflect the fixed relationship.

In some implementations, taking the preset feature points of the top of the head and the soles of the feet as an example, when solving the mapping matrix, the distance between the top of the head and the soles of the feet can be set to a fixed value, that is, the height of each target object is fixed. For example, this fixed value can be set to 1.65 meters or 1.7 meters. The present disclosure does not limit the set value of the height of the target object.

In some implementations, the distance between the preset feature points of the top of the head and the soles of the feet of each target object is the above-mentioned fixed value, so there is a fixed mapping relationship between the preset feature point of the top of the head and the preset feature point of the soles of the feet of each target object. The mapping relationship can be expressed, in homogeneous coordinates, by the following formula (1-1):

$(u_{head}, v_{head}, 1)^{T} = H\,(u_{foot}, v_{foot}, 1)^{T}$ (1-1);

Wherein, H is the mapping matrix representing the mapping relationship, $(u_{head}, v_{head})$ represents the coordinates of the preset feature point on the top of the head, and $(u_{foot}, v_{foot})$ represents the coordinates of the preset feature point on the soles of the feet.

In some implementation manners, coordinates of the preset feature points of multiple target objects (for example, four or more) may be determined, and their mapping relationships are respectively determined according to formula (1-1). In some implementation manners, the mapping matrix may be calculated based on the coordinates of the preset feature points of the multiple target objects. In some implementation manners, the mapping matrix may be calculated through direct linear transformation (DLT). The parameters in the mapping matrix obtained in this way are initial parameters. In practical applications, the mapping matrix can be optimized to determine internal reference information and pose information of the image acquisition device, that is, to calibrate the image acquisition device.
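As a non-limiting illustration, the DLT computation mentioned above can be sketched as follows, assuming that the mapping matrix is a 3x3 matrix that maps the homogeneous coordinates of a foot point to those of the corresponding head point. The function names `estimate_mapping_dlt` and `apply_mapping` are hypothetical, and the final normalization assumes that the bottom-right entry of the matrix is non-zero.

```python
import numpy as np

def estimate_mapping_dlt(foot_pts, head_pts):
    """Estimate a 3x3 matrix H with head ~ H @ foot from four or more pairs."""
    foot_pts = np.asarray(foot_pts, dtype=float)
    head_pts = np.asarray(head_pts, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(foot_pts, head_pts):
        # Two linear equations per head/foot pair in the nine entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumption: H[2, 2] is non-zero

def apply_mapping(H, pts):
    """Apply H to an (N, 2) array of points and return (N, 2) points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With noisy detections, normalizing the pixel coordinates before building the linear system generally improves numerical conditioning; that step is omitted here for brevity.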

In some implementations, the mapping matrix may also be determined based on the relationship between other preset feature points, for example, the mapping matrix may be determined in a similar manner based on the fixed distance between the preset feature points on both shoulders. The disclosure does not limit the manner of determining the mapping matrix.

In some implementations, in step S14, the parameters of the mapping matrix can be optimized based on the mapping matrix determined above and the coordinates of the preset feature points, so as to calibrate the image acquisition device, that is, to obtain parameter information of the image acquisition device. The parameter information may include internal parameter information (also referred to herein as internal reference information) and external parameter information (that is, pose information) of the image acquisition device.

In some implementations, the above mapping matrix is a square matrix, that is, a matrix with the same number of rows and columns, which can be decomposed, for example, can be decomposed into the form of the following formula (1-2):

Wherein, h is the fixed distance between the preset feature points, for example, the height of the target object; z is the installation height of the image acquisition device; and (P₀ P₁ P₂) denotes the first three columns of the projection matrix P of the image acquisition device. The projection matrix P can be expressed as P = K(R|t), where R is the rotation matrix of the image acquisition device, t is the three-dimensional translation vector of the center position of the image acquisition device, and K is the internal reference matrix of the image acquisition device. That is, the projection matrix P can be expressed as the product of the internal reference matrix K and the matrix (R|t) formed from the rotation matrix R and the three-dimensional translation vector t.
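Formula (1-2) itself is not reproduced in this text. Purely as an illustration, one decomposition that uses the symbols defined above can be derived under assumptions that are not stated in the source: the ground is the plane Z = 0, the world origin lies directly below the image acquisition device so that its center is C = (0, 0, z)ᵀ, the head point is vertically above the foot point at height h, and t = -RC. Here P₃ denotes the fourth column of P. The result should be read as a sketch under these assumptions, not as the formula of the embodiment.

```latex
% Assumptions (not from the source): ground plane Z = 0, camera center C = (0, 0, z)^T,
% foot point (X, Y, 0), head point (X, Y, h), P = K (R | t) with t = -R C.
\begin{aligned}
p_{foot} &\sim P\,(X, Y, 0, 1)^{T} = X P_0 + Y P_1 + P_3, \\
p_{head} &\sim P\,(X, Y, h, 1)^{T} = X P_0 + Y P_1 + h P_2 + P_3, \\
P_3 &= K t = -K R C = -z P_2, \\
p_{foot} &\sim (P_0\ P_1\ P_2)\,\mathrm{diag}(1, 1, -z)\,(X, Y, 1)^{T}, \\
p_{head} &\sim (P_0\ P_1\ P_2)\,\mathrm{diag}(1, 1, h - z)\,(X, Y, 1)^{T}, \\
H &\sim (P_0\ P_1\ P_2)\,\mathrm{diag}\!\left(1, 1, 1 - \tfrac{h}{z}\right)(P_0\ P_1\ P_2)^{-1}.
\end{aligned}
```

Under these assumptions, the mapping matrix has a repeated eigenvalue and one distinct eigenvalue whose ratio to the repeated one is 1 - h/z, which is what links the matrix to the installation height z.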

In some implementations, the mapping matrix can be optimized to improve the accuracy of the internal reference information and the pose information. In some implementation manners, the optimization may be performed using the coordinates of the preset feature points, so as to obtain accurate parameter information of the image acquisition device, for example, internal reference information and pose information. The above step S14 may include the following steps S141 to S143, wherein:

Step S141, obtaining error information of the preset feature points according to the mapping matrix and the preset feature points;

Step S142, adjusting parameters of the mapping matrix according to the error information to obtain an adjusted mapping matrix;

Step S143, according to the adjusted mapping matrix, obtain internal reference information and pose information of the image acquisition device.

In some implementations, taking the preset feature points on the top of the head and the soles of the feet as an example, the error information of the preset feature points can be determined. According to the mapping relationship described in formula (1-1), the coordinates of the preset feature point on the top of the head can be obtained by transforming the coordinates of the preset feature point on the soles of the feet with the mapping matrix. However, because the initial parameters of the mapping matrix obtained by the above method may contain errors, there is an error between the coordinates obtained by transforming the preset feature point on the soles of the feet with the mapping matrix and the actual coordinates of the preset feature point on the top of the head. This error can be reduced through optimization, so that the error of the parameters in the mapping matrix is reduced and the mapping matrix is optimized.

In some implementations, optimization can be performed using the following optimization function (1-3):

$f = \sum \lVert P_{head} - P'_{head} \rVert + \sum \lVert P_{foot} - P'_{foot} \rVert$ (1-3);

Wherein, $P_{head}$ is the coordinates of the preset feature point on the top of the head, and $P_{foot}$ is the coordinates of the preset feature point on the soles of the feet. $P'_{head} = H P_{foot}$, that is, $P'_{head}$ is the coordinates obtained after transforming the preset feature point on the soles of the feet through the mapping matrix; $P'_{foot} = H^{-1} P_{head}$, that is, $P'_{foot}$ is the coordinates obtained after transforming the preset feature point on the top of the head through the inverse matrix of the mapping matrix. As mentioned above, due to the error in the mapping matrix, $P_{head}$ and $P'_{head}$ are not equal, and $P_{foot}$ and $P'_{foot}$ are not equal. $\lVert P_{head} - P'_{head} \rVert$ is the two-norm of the difference between $P_{head}$ and $P'_{head}$ and indicates the error information between the two; likewise, $\lVert P_{foot} - P'_{foot} \rVert$ is the two-norm of the difference between $P_{foot}$ and $P'_{foot}$ and indicates the error information between the two. The error information may also be expressed in other forms, for example, another norm or a Euclidean distance, and the present disclosure does not limit the specific form of the error information.

In some implementations, formula (1-3) is the sum of the above-mentioned error information over all target objects. The value of formula (1-3) can be minimized to obtain the mapping matrix that minimizes the sum of the error information, thereby optimizing the mapping matrix.

In some implementations, the parameters of the mapping matrix can be adjusted according to the value of the above formula (1-3). For example, the parameters of the mapping matrix can be adjusted by methods such as gradient descent to gradually reduce the value of formula (1-3). After multiple adjustments, when the value of formula (1-3) no longer decreases, the adjusted mapping matrix is obtained. Alternatively, the minimum value of formula (1-3) may be determined by means of nonlinear programming, so as to determine the mapping matrix (that is, the adjusted mapping matrix) at which the value of formula (1-3) reaches the minimum. The present disclosure does not limit the adjustment method.
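As a non-limiting illustration, the adjustment described above can be sketched as follows. The function names `transform`, `symmetric_error` and `refine_mapping` are hypothetical. The sketch assumes that the points are converted back from homogeneous coordinates before the two-norm is taken, keeps the bottom-right entry of the matrix fixed at 1, and uses a finite-difference gradient with a halving step size; these are illustrative choices, since the source only names gradient descent and nonlinear programming as options.

```python
import numpy as np

def transform(H, pts):
    """Map (N, 2) points through a 3x3 matrix in homogeneous coordinates."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    out = homog @ H.T
    return out[:, :2] / out[:, 2:3]

def symmetric_error(H, head, foot):
    """Value of the error function: head/foot errors summed over all objects."""
    e_head = np.linalg.norm(head - transform(H, foot), axis=1).sum()
    e_foot = np.linalg.norm(foot - transform(np.linalg.inv(H), head), axis=1).sum()
    return e_head + e_foot

def refine_mapping(H0, head, foot, iters=200, step=1e-3, eps=1e-6):
    """Reduce the error by gradient descent; returns (H, final_error)."""
    H = H0 / H0[2, 2]
    best = symmetric_error(H, head, foot)
    for _ in range(iters):
        grad = np.zeros_like(H)
        for i in range(3):
            for j in range(3):
                if (i, j) == (2, 2):
                    continue  # scale is fixed by H[2, 2] = 1
                d = np.zeros_like(H)
                d[i, j] = eps
                grad[i, j] = (symmetric_error(H + d, head, foot)
                              - symmetric_error(H - d, head, foot)) / (2 * eps)
        t, improved = step, False
        while t > 1e-12:  # halve the step until the error decreases
            cand = H - t * grad
            val = symmetric_error(cand, head, foot)
            if val < best:
                H, best, improved = cand, val, True
                break
            t *= 0.5
        if not improved:
            break  # the error no longer decreases
    return H, best
```

Because accepted steps always lower the error, the returned value is never worse than that of the initial matrix. In practice, a library optimizer and normalized coordinates would likely converge faster than this plain loop.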

In some implementations, based on the parameters of the adjusted mapping matrix, the mapping matrix H can be decomposed according to formula (1-2) to obtain the rotation matrix, the translation vector and the internal reference matrix. The internal reference information and the pose information of the image acquisition device can be obtained on this basis. In some implementation manners, the internal reference information includes the focal length of the image acquisition device, and the pose information includes the height, pitch angle, yaw angle, and roll angle of the image acquisition device.

In some implementation manners, according to the parameters of the adjusted mapping matrix, an internal reference matrix may be obtained based on its decomposition result, and the parameters of the internal reference matrix may include the focal length of the image acquisition device.

In some implementations, according to the parameters of the adjusted mapping matrix, the value of the term relating h and z in formula (1-2) can be obtained based on its decomposition result. Since h is a fixed value (for example, the set height of the target object), the height z of the image acquisition device can then be obtained.
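As a non-limiting illustration, the recovery of the height z can be sketched as follows. Because formula (1-2) is not reproduced in this text, the sketch assumes that the mapping matrix has the form H ∝ M diag(1, 1, 1 - h/z) M⁻¹, where M = (P₀ P₁ P₂); this form is an assumption made for the example only. Under it, H has a repeated eigenvalue and one distinct eigenvalue, and their ratio equals 1 - h/z. The function name `camera_height_from_mapping` is hypothetical.

```python
import numpy as np

def camera_height_from_mapping(H, h=1.7):
    """Recover z from H, assuming H ~ M diag(1, 1, 1 - h/z) M^-1 (illustrative form)."""
    w = np.real(np.linalg.eigvals(H))  # eigenvalues are assumed to be real
    # Find the two eigenvalues closest to each other (the repeated pair).
    pairs = [(0, 1, 2), (0, 2, 1), (1, 2, 0)]
    i, j, k = min(pairs, key=lambda p: abs(w[p[0]] - w[p[1]]))
    repeated = 0.5 * (w[i] + w[j])
    mu = w[k] / repeated               # equals 1 - h/z, independent of the scale of H
    return h / (1.0 - mu)
```

Taking the ratio of eigenvalues removes the unknown overall scale of H. The result is only meaningful when the target height h differs from z, so that the distinct eigenvalue can be separated from the repeated pair.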

In some implementations, initial parameters of the image acquisition device can be set. For example, the position of the image acquisition device is set as the origin in the image to be processed, and its azimuth is set as true north. The rotation matrix and the translation vector of the image acquisition device can then be obtained based on the decomposition of the mapping matrix, and the pitch angle, roll angle and yaw angle can be obtained from them. The present disclosure does not limit the parameters included in the internal reference information and the pose information.

In some implementation manners, the mapping matrix corresponding to the preset feature points of both shoulders may also be optimized in the above manner, and then the above internal parameter information and pose information are obtained. It is also possible to directly use the initial parameters of the mapping matrix obtained in step S13 to solve the internal reference information and pose information without performing the above optimization steps, but the error of the obtained parameters will be higher than the error of the optimized parameters. This disclosure does not limit this.

According to the calibration method of the embodiment of the present disclosure, the preset feature points can be determined through eigendecomposition of a matrix, the mapping matrix can then be determined using the preset feature points, and finally the internal reference information and the pose information of the image acquisition device can be determined according to the mapping matrix and the preset feature points. The self-calibration process can be completed without requiring the same target object to appear at multiple preset positions and without the cooperation of the target object, which reduces the manual workload and the calibration cost. The method is applicable to scenes with a large number of widely distributed image acquisition devices, for example, the self-calibration of the many cameras in an urban monitoring system. In addition, the mapping matrix can be optimized to reduce its error and improve the detection accuracy.

FIG. 5 is a schematic diagram of an application of the calibration method provided by an embodiment of the present disclosure. As shown in FIG. 5, the image acquisition device 51 is any monitoring camera in the monitoring system of a city, and a plurality of person objects 52 appear in the field of view of the image acquisition device 51. When the image acquisition device 51 is calibrated, the video taken by the image acquisition device 51 can be acquired, and the images to be processed in which the objects 52 are present can be determined from it.

In some implementation manners, the objects in the image to be processed may be screened, so as to select target objects that are standing and not occluded. For example, the key points of each object can be detected, and the posture of each object can be determined based on the key points so as to exclude non-standing objects; occluded objects can also be identified based on the confidence of the key points, so as to obtain target objects that are standing and not occluded.
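As a non-limiting illustration, the screening described above can be sketched as follows. The source does not specify how the posture is derived from the key points, so the standing test here (head above hip above ankle, with a limited tilt of the head-to-ankle line) is a hypothetical heuristic, as are the key point names, the thresholds, and the dictionary layout in which each key point is an (x, y, confidence) tuple.

```python
import math

REQUIRED = ("head", "hip", "ankle")  # hypothetical key point names

def is_unoccluded(kps, min_conf=0.5):
    """Treat an object as occluded if any required key point has low confidence."""
    return all(name in kps and kps[name][2] >= min_conf for name in REQUIRED)

def is_standing(kps, max_tilt_deg=30.0):
    """Heuristic posture test; image y is assumed to grow downward."""
    hx, hy, _ = kps["head"]
    _, py, _ = kps["hip"]
    ax, ay, _ = kps["ankle"]
    if not (hy < py < ay):
        return False
    tilt = math.degrees(math.atan2(abs(hx - ax), ay - hy))
    return tilt <= max_tilt_deg

def screen_objects(objects, min_conf=0.5, max_tilt_deg=30.0):
    """Keep only objects that are both unoccluded and standing."""
    return [o for o in objects
            if is_unoccluded(o, min_conf) and is_standing(o, max_tilt_deg)]
```

The occlusion check runs first, so the posture test is only evaluated on objects whose required key points are all present with sufficient confidence.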

In some implementation manners, preset feature points of each target object are acquired, for example, preset feature points representing the height of the target object, that is, the preset feature points of the top of the head and the soles of the feet. Eigendecomposition is performed on the covariance matrix of the pixels of the mask image in the area where each target object is located to obtain eigenvectors, wherein the series of pixels along the eigenvector corresponding to the larger eigenvalue forms the longer axis in the mask image, and the intersection points of this axis and the contour of the target object in the mask image are the preset feature points of the top of the head and the bottom of the feet.

In some implementation manners, the height of the target object is set as a fixed value, and a mapping matrix representing the mapping relationship of each target object is obtained based on the fixed value, as shown in formula (1-1). Based on the coordinates of the preset feature points of the plurality of target objects, the initial parameters of the mapping matrix can be solved. The initial parameters may contain errors, which can be minimized through optimization.

In some implementation manners, the mapping matrix is optimized by formula (1-3) to obtain a mapping matrix with minimized error. Internal reference information and pose information are then obtained based on this mapping matrix. For example, the mapping matrix is decomposed according to formula (1-2) to obtain the internal reference matrix, rotation matrix and translation vector of the image acquisition device. Internal reference information such as the focal length can be obtained based on the parameters of the internal reference matrix, and pose information such as the pitch angle, yaw angle, and roll angle can be obtained based on the rotation matrix and the translation vector. Pose information such as the height of the image acquisition device can be obtained based on the decomposition result of formula (1-2).

In some implementation manners, the calibration method can be used for self-calibration of surveillance cameras in a city surveillance system with a large number of surveillance cameras and wide distribution, so as to reduce the workload of manual calibration for each surveillance camera. It can also be used in other camera calibration scenarios, and the present disclosure does not limit the applicable field of the calibration method.

It can be understood that the above-mentioned method embodiments of the present disclosure can all be combined with each other to form combined embodiments without violating principle and logic. Those skilled in the art can understand that, in the above methods of the specific implementations, the specific execution order of the steps should be determined according to their functions and possible internal logic.

In addition, the present disclosure also provides calibration devices, electronic equipment, computer-readable storage media, and computer program products, all of which can be used to implement any of the calibration methods provided in the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding records in the method section.

FIG. 6 is a block diagram of a calibration device provided by an embodiment of the present disclosure. As shown in FIG. 6, the device includes: a target object acquisition part 61 configured to detect an image to be processed and acquire a target object in the image to be processed; a feature point determination part 62 configured to determine preset feature points of the target object; a mapping matrix determining part 63 configured to determine, according to the preset feature points, a mapping matrix corresponding to the preset feature points; and a parameter information determining part 64 configured to obtain parameter information of an image acquisition device according to the mapping matrix and the preset feature points, the image to be processed being acquired by the image acquisition device.

In some implementation manners, the target object acquisition part is further configured to: detect a plurality of objects in the image to be processed; and screen the plurality of objects according to at least one of the poses and the occlusion states of the plurality of objects to obtain the target object.

In some implementation manners, the target object acquiring part is further configured to: acquire the key points of the plurality of objects; respectively determine the pose of each of the objects according to the key points of each of the objects; and screen the plurality of objects according to the pose of each of the objects to obtain the target object.

In some implementations, the target object acquiring part is further configured to: acquire the key points of the plurality of objects; respectively determine the confidence of the key points of each of the objects; according to the confidence of the key points , respectively determine the occlusion state of each of the objects; and filter the plurality of objects according to the occlusion state to obtain the target object.

In some implementation manners, the target object obtaining part is further configured to: obtain position information of each of the objects; and obtain key points of the plurality of objects according to the position information of each of the objects.

In some implementations, the feature point determination part is further configured to: acquire a mask image of the target object; acquire a covariance matrix of multiple pixels of the mask image; perform eigendecomposition on the covariance matrix to obtain eigenvectors; and determine the preset feature points according to the eigenvectors and the multiple pixels of the mask image.

In some implementations, the parameter information includes internal reference information and pose information, and the parameter information determining part is further configured to: obtain error information of the preset feature points according to the mapping matrix and the preset feature points; adjust parameters of the mapping matrix according to the error information to obtain an adjusted mapping matrix; and obtain the internal reference information and the pose information of the image acquisition device according to the adjusted mapping matrix.

In some implementation manners, the mapping matrix is a matrix used to represent the positional relationship between different preset feature points of the same target object.

In the embodiments of the present disclosure and other embodiments, a "part" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a unit, a module or a non-modular one.

In some embodiments, the functions or modules included in the calibration device provided by the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments, and for specific implementation, refer to the descriptions of the above method embodiments.

Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer program instructions are stored, and the above-mentioned method is implemented when the computer program instructions are executed by a processor. The computer readable storage medium may be a non-volatile computer readable storage medium.

An embodiment of the present disclosure also proposes an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.

An embodiment of the present disclosure also provides a computer program product, including computer readable code. When the computer readable code runs on a device, a processor in the device executes instructions for implementing the calibration method provided in any of the above embodiments.

The embodiments of the present disclosure also provide another computer program product, which is used for storing computer-readable instructions. When the instructions are executed, the computer executes the operation of the calibration method provided by any of the above-mentioned embodiments.

Electronic devices may be provided as terminals, servers, or other forms of devices.

FIG. 7 is a block diagram of an electronic device 700 provided by an embodiment of the present disclosure. For example, the electronic device 700 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.

Referring to FIG. 7, the electronic device 700 may include one or more of the following components: a processing component 702, a memory 704, a power supply component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.

The processing component 702 generally controls the overall operations of the electronic device 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 718 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 702 may include one or more modules that facilitate interaction between processing component 702 and other components. For example, processing component 702 may include a multimedia module to facilitate interaction between multimedia component 708 and processing component 702 .

The memory 704 is configured to store various types of data to support operations at the electronic device 700 . Examples of such data include instructions for any application or method operating on the electronic device 700, contact data, phonebook data, messages, pictures, videos, and the like. Memory 704 can be realized by any type of volatile or non-volatile storage device or their combination, such as Static Random-Access Memory (Static Random-Access Memory, SRAM), Electrically Erasable Programmable Read-Only Memory (Electrically Erasable Programmable read only memory, EEPROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), programmable read-only memory (Programmable Read-only memory, PROM), read-only memory (Read-only memory , ROM), magnetic memory, flash memory, magnetic disk or optical disk.

The power supply component 706 provides power to various components of the electronic device 700 . Power components 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic device 700 .

The multimedia component 708 includes a screen providing an output interface between the electronic device 700 and the user. In some embodiments, the screen may include a liquid crystal display (Liquid Crystal Display, LCD) and a touch panel (TouchPanel, TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense an edge of a touch or slide action, but also detect a duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front camera and/or a rear camera. When the electronic device 700 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a microphone (microphone, MIC), and when the electronic device 700 is in an operation mode, such as a calling mode, a recording mode and a voice recognition mode, the microphone is configured to receive an external audio signal. Received audio signals may be stored in memory 704 or sent via communication component 716 . In some embodiments, the audio component 710 also includes a speaker for outputting audio signals.

The I/O interface 712 provides an interface between the processing component 702 and a peripheral interface module, which may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.

The sensor component 714 includes one or more sensors for providing status assessments of various aspects of the electronic device 700. For example, the sensor component 714 can detect the open/closed state of the electronic device 700 and the relative positioning of components, such as the display and the keypad of the electronic device 700. The sensor component 714 can also detect a change in position of the electronic device 700 or of one of its components, the presence or absence of user contact with the electronic device 700, the orientation or acceleration/deceleration of the electronic device 700, and temperature changes of the electronic device 700. The sensor component 714 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor component 714 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.

The communication component 716 is configured to facilitate wired or wireless communication between the electronic device 700 and other devices. The electronic device 700 can access a wireless network based on communication standards, such as WiFi, 2G or 3G, or a combination thereof. In some embodiments, the communication component 716 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In some embodiments, the communication component 716 also includes a near field communication (Near Field Communication, NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology and other technologies.

In some embodiments, the electronic device 700 may be implemented by one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (Digital Signal Processing Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field Programmable Gate Array, FPGA), controllers, microcontrollers, microprocessors or other electronic components, for performing the above method.

In some embodiments, there is also provided a non-volatile computer-readable storage medium, such as the memory 704 including computer program instructions, which can be executed by the processor 718 of the electronic device 700 to implement the above method.

FIG. 8 is a block diagram of an electronic device 800 provided by an embodiment of the present disclosure. For example, the electronic device 800 may be provided as a server. Referring to FIG. 8 , electronic device 800 includes processing component 802 , which also includes one or more processors, and a memory resource represented by memory 804 for storing instructions executable by processing component 802 , such as application programs. The application program stored in memory 804 may include one or more modules each corresponding to a set of instructions. In addition, the processing component 802 is configured to execute instructions to perform the above method.

The electronic device 800 may also include a power supply component 806 configured to perform power management of the electronic device 800, a wired or wireless network interface 808 configured to connect the electronic device 800 to a network, and an input-output (I/O) interface 810 . The electronic device 800 can operate based on an operating system stored in the memory 804, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.

In some embodiments, there is also provided a non-volatile computer-readable storage medium, such as a memory 804 including computer program instructions, which can be executed by the processing component 802 of the electronic device 800 to complete the above method. Wherein, the storage medium may be a volatile or non-volatile computer-readable storage medium.

The present disclosure can be a system, method and/or computer program product. A computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present disclosure.

A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. A computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), Static Random Access Memory (SRAM), Portable Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD), memory sticks, floppy disks, mechanically encoded devices such as punched cards or raised structures in grooves on which instructions are stored, and any suitable combination of the foregoing. As used herein, computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or electrical signals transmitted through wires.

Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in that computing/processing device.

Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the “C” language or similar programming languages. Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it can be connected to an external computer (for example, via the Internet using an Internet service provider). In some embodiments, electronic circuits, such as programmable logic circuits, field programmable gate arrays (FPGA) or programmable logic arrays (PLA), are personalized by utilizing state information of the computer-readable program instructions, and the electronic circuits can execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus for realizing the functions/actions specified in one or more blocks in the flowchart and/or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause computers, programmable data processing devices and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions constitutes an article of manufacture comprising instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

It is also possible to load computer-readable program instructions into a computer, other programmable data processing device, or other equipment, so that a series of operational steps are performed on the computer, other programmable data processing device, or other equipment to produce a computer-implemented process, so that instructions executed on computers, other programmable data processing devices, or other devices implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that includes one or more executable instructions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified function or action, or may be implemented by a combination of dedicated hardware and computer instructions.

The computer program product can be realized by means of hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).

Various embodiments of the present disclosure have been described above. The foregoing description is not exhaustive and is not limited to the disclosed embodiments. Many modifications and alterations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of each embodiment, the practical application, or the improvement over technology in the market, or to enable others of ordinary skill in the art to understand each embodiment disclosed herein.

Industrial Applicability

Embodiments of the present disclosure provide a calibration method and device, electronic equipment, a storage medium, and a computer program product, wherein the calibration method includes: detecting an image to be processed to acquire a target object in the image to be processed; determining preset feature points of the target object; determining, according to the preset feature points, a mapping matrix corresponding to the preset feature points; and obtaining parameter information of an image acquisition device according to the mapping matrix and the preset feature points, the image to be processed being acquired by the image acquisition device. According to the calibration method of the embodiments of the present disclosure, the mapping matrix can be determined from the preset feature points in the area where any target object is located in the image, and the internal reference (intrinsic parameter) information and pose information of the image acquisition device can then be determined, without requiring the same target object to appear at multiple preset positions. The self-calibration process can therefore be completed without the cooperation of the target object. The manual workload and calibration cost are reduced, and the method can be applied to scenes with a large number of widely distributed image acquisition devices.
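As an illustration of the step of determining preset feature points, the claims below describe computing a covariance matrix over the pixels of a mask image of the target object, eigendecomposing it, and deriving the feature points from the eigenvector and the mask pixels. The following is only a minimal sketch of that idea under stated assumptions, not the implementation of the disclosure: it assumes a binary mask stored as a NumPy array, takes the eigenvector with the largest eigenvalue as the principal axis, and picks the two extreme projections of the mask pixels onto that axis as candidate feature points (for an upright pedestrian, roughly a head point and a foot point). The function name `principal_axis_endpoints` and this endpoint rule are hypothetical.

```python
import numpy as np


def principal_axis_endpoints(mask):
    """Return (top, bottom) points at the two ends of the mask's principal axis.

    mask: 2-D array, non-zero where the target object is (assumed format).
    Points are returned as (x, y) arrays in image coordinates.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)  # shape (N, 2), columns are x, y
    if len(pts) < 2:
        raise ValueError("mask needs at least two foreground pixels")

    mean = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)         # 2x2 covariance of pixel coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition of a symmetric matrix
    axis = eigvecs[:, np.argmax(eigvals)]   # direction of largest variance

    # Project every mask pixel onto the principal axis and keep the extremes.
    proj = (pts - mean) @ axis
    end_a = mean + proj.min() * axis
    end_b = mean + proj.max() * axis

    # Order the endpoints so the first one is higher in the image (smaller y).
    top, bottom = sorted((end_a, end_b), key=lambda p: p[1])
    return top, bottom


# Usage: a vertical bar standing in for a pedestrian mask.
demo_mask = np.zeros((20, 10), dtype=np.uint8)
demo_mask[2:18, 4:6] = 1
demo_top, demo_bottom = principal_axis_endpoints(demo_mask)
```

Using the principal axis rather than the bounding box keeps the two points aligned with the body direction even when the object appears tilted in the image, which is one plausible reason for the covariance-based formulation.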

Claims

A calibration method, comprising:

detecting an image to be processed, and acquiring a target object in the image to be processed;

determining preset feature points of the target object;

determining a mapping matrix corresponding to the preset feature points according to the preset feature points;

obtaining parameter information of an image acquisition device according to the mapping matrix and the preset feature points, wherein the image to be processed is acquired by the image acquisition device.
The method according to claim 1, wherein the detecting the image to be processed and acquiring the target object in the image to be processed comprises:

detecting a plurality of objects in the image to be processed;

screening the plurality of objects according to at least one of postures of the plurality of objects and occlusion states of the plurality of objects, to obtain the target object.
The method according to claim 2, wherein the screening the plurality of objects according to at least one of the postures of the plurality of objects and the occlusion states of the plurality of objects, to obtain the target object, comprises:

acquiring key points of the plurality of objects;

determining the posture of each of the objects according to the key points of that object;

screening the plurality of objects according to the posture of each of the objects, to obtain the target object.
The method according to claim 2, wherein the screening the plurality of objects according to at least one of the postures of the plurality of objects and the occlusion states of the plurality of objects, to obtain the target object, comprises:

acquiring key points of the plurality of objects;

determining a confidence of each key point of each of the objects;

determining the occlusion state of each of the objects according to the confidences of its key points;

screening the plurality of objects according to the occlusion state of each of the objects, to obtain the target object.
The method according to claim 3 or 4, wherein the acquiring key points of the plurality of objects comprises:

obtaining position information of each of the objects;

obtaining the key points of the plurality of objects according to the position information of each of the objects.
The method according to any one of claims 1 to 5, wherein the determining preset feature points of the target object comprises:

acquiring a mask image of the target object;

obtaining a covariance matrix of a plurality of pixels of the mask image;

performing eigendecomposition on the covariance matrix to obtain an eigenvector;

determining the preset feature points according to the eigenvector and the plurality of pixels of the mask image.
The method according to any one of claims 1 to 6, wherein the parameter information includes internal reference information and pose information, and

the obtaining parameter information of the image acquisition device according to the mapping matrix and the preset feature points comprises:

obtaining error information of the preset feature points according to the mapping matrix and the preset feature points;

adjusting parameters of the mapping matrix according to the error information to obtain an adjusted mapping matrix;

obtaining the internal reference information and the pose information of the image acquisition device according to the adjusted mapping matrix.
The method according to any one of claims 1 to 7, wherein the mapping matrix is a matrix used to represent the positional relationship between different preset feature points of the same target object.
A calibration device, comprising:

a target object acquiring part configured to detect an image to be processed and acquire a target object in the image to be processed;

a feature point determining part configured to determine preset feature points of the target object;

a mapping matrix determining part configured to determine a mapping matrix corresponding to the preset feature points according to the preset feature points;

a parameter information determining part configured to obtain parameter information of an image acquisition device according to the mapping matrix and the preset feature points, wherein the image to be processed is acquired by the image acquisition device.
The device according to claim 9, wherein the target object acquiring part is further configured to:

detect a plurality of objects in the image to be processed;

screen the plurality of objects according to at least one of postures of the plurality of objects and occlusion states of the plurality of objects, to obtain the target object.
The device according to claim 10, wherein the target object acquiring part is further configured to:

acquire key points of the plurality of objects;

determine the posture of each of the objects according to the key points of that object;

screen the plurality of objects according to the posture of each of the objects, to obtain the target object.
The device according to claim 10, wherein the target object acquiring part is further configured to:

acquire key points of the plurality of objects;

determine a confidence of each key point of each of the objects;

determine the occlusion state of each of the objects according to the confidences of its key points;

screen the plurality of objects according to the occlusion state of each of the objects, to obtain the target object.
The device according to claim 11 or 12, wherein the target object acquiring part is further configured to:

obtain position information of each of the objects;

obtain the key points of the plurality of objects according to the position information of each of the objects.
The device according to any one of claims 9 to 13, wherein the feature point determining part is further configured to:

acquire a mask image of the target object;

obtain a covariance matrix of a plurality of pixels of the mask image;

perform eigendecomposition on the covariance matrix to obtain an eigenvector;

determine the preset feature points according to the eigenvector and the plurality of pixels of the mask image.
The device according to any one of claims 9 to 14, wherein the parameter information includes internal reference information and pose information, and the parameter information determining part is further configured to:

obtain error information of the preset feature points according to the mapping matrix and the preset feature points;

adjust parameters of the mapping matrix according to the error information to obtain an adjusted mapping matrix;

obtain the internal reference information and the pose information of the image acquisition device according to the adjusted mapping matrix.
The device according to any one of claims 9 to 15, wherein the mapping matrix is a matrix used to represent the positional relationship between different preset feature points of the same target object.
An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 8.
A computer-readable storage medium on which computer program instructions are stored, wherein, when the computer program instructions are executed by a processor, the method according to any one of claims 1 to 8 is implemented.
A computer program product, comprising a computer program or instructions, wherein, when the computer program or instructions are run on an electronic device, the electronic device is caused to execute the steps of the calibration method according to any one of claims 1 to 8.
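For illustration only, and not as part of the claims, the occlusion-based screening recited in the claims above (a confidence per key point, an occlusion state per object, then screening) could be sketched as follows. The sketch assumes that a separate key point detector has already produced one confidence score per key point. The class `DetectedObject`, the functions `is_occluded` and `screen_objects`, and both threshold values are hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DetectedObject:
    """Hypothetical container for one detected object and its key point scores."""
    object_id: int
    keypoint_confidences: Sequence[float]


def is_occluded(obj: DetectedObject,
                conf_threshold: float = 0.5,
                min_visible_ratio: float = 0.8) -> bool:
    """Treat an object as occluded when too few key points are confidently detected.

    Both thresholds are illustrative assumptions, not values from the disclosure.
    """
    total = len(obj.keypoint_confidences)
    if total == 0:
        return True  # no key points at all: nothing usable for calibration
    visible = sum(1 for c in obj.keypoint_confidences if c >= conf_threshold)
    return visible < min_visible_ratio * total


def screen_objects(objects: List[DetectedObject]) -> List[DetectedObject]:
    """Keep only objects whose occlusion state is 'not occluded'."""
    return [obj for obj in objects if not is_occluded(obj)]


# Usage: the second object has three low-confidence key points and is dropped.
candidates = [
    DetectedObject(0, [0.9, 0.8, 0.95, 0.7, 0.9]),
    DetectedObject(1, [0.9, 0.2, 0.1, 0.8, 0.3]),
]
targets = screen_objects(candidates)
```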