CN113688846A - Object size recognition method, readable storage medium, and object size recognition system - Google Patents

Object size recognition method, readable storage medium, and object size recognition system

Info

Publication number
CN113688846A
Authority
CN
China
Prior art keywords
image
vertex
line
lines
boundary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110975318.1A
Other languages
Chinese (zh)
Other versions
CN113688846B (en)
Inventor
罗欢
徐青松
李青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Ruiqi Technology Co ltd
Original Assignee
Chengdu Ruiqi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Ruiqi Technology Co ltd filed Critical Chengdu Ruiqi Technology Co ltd
Priority to CN202110975318.1A priority Critical patent/CN113688846B/en
Publication of CN113688846A publication Critical patent/CN113688846A/en
Priority to PCT/CN2022/106607 priority patent/WO2023024766A1/en
Application granted granted Critical
Publication of CN113688846B publication Critical patent/CN113688846B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/13 - Edge detection
    • G06T 7/181 - Segmentation; Edge detection involving edge growing; involving edge linking
    • G06T 7/60 - Analysis of geometric attributes
    • G06T 7/66 - Analysis of geometric attributes of image moments or centre of gravity
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/75 - Determining position or orientation of objects or cameras using feature-based methods involving models
    • G06T 7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 7/85 - Stereo camera calibration
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 - Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an object size identification method, a readable storage medium, and an object size identification system. The object size identification method includes: acquiring at least two images of an object from different viewing angles by shooting; acquiring two-dimensional position information of a plurality of object vertices in each image; establishing a three-dimensional spatial coordinate system from the at least two images using a feature-point matching method, and determining the spatial position of the camera; and selecting any one of the images, and obtaining three-dimensional spatial position information of the plurality of vertices based on the calibrated camera parameters and the spatial position of the camera, thereby obtaining the size of the object. With this configuration, at least two images of the object captured from different viewing angles, combined with the calibrated camera parameters, are sufficient to obtain the size of the object; the operation is simple and convenient, which solves the prior-art problem that the size of an object in space cannot be measured.

Description

Object size recognition method, readable storage medium, and object size recognition system
Technical Field
The present invention relates to the field of object recognition technologies, and in particular, to an object size recognition method, a readable storage medium, and an object size recognition system.
Background
When no measuring tool is at hand, or the object to be measured is not at hand, measuring the size of the object becomes a difficult problem. At present, people often photograph an object to obtain an image of it; however, the object in the captured image has no scale, so its actual size cannot be known. How to measure the size of an object in space in a simple way is therefore an urgent problem to be solved.
Disclosure of Invention
The invention aims to provide an object size identification method, a readable storage medium, and an object size identification system that solve the prior-art problem that the size of an object is difficult to measure.
In order to solve the above technical problem, according to a first aspect of the present invention, there is provided an object size identification method including:
acquiring at least two images of an object from different viewing angles by shooting;
acquiring two-dimensional position information of a plurality of object vertices in each image;
establishing a three-dimensional spatial coordinate system from the at least two images using a feature-point matching method, and determining the spatial position of the camera; and
selecting any one of the images, and obtaining three-dimensional spatial position information of the object vertices based on the calibrated camera parameters and the spatial position of the camera, thereby obtaining the size of the object.
Optionally, the step of acquiring the two-dimensional position information of the plurality of object vertices in the image includes:
inputting the image into a trained vertex recognition model to obtain the relative position of each object vertex with respect to its corresponding image vertex;
determining the actual position of each object vertex in the image according to the relative position of each object vertex with respect to its corresponding image vertex; and
obtaining, according to the actual position of each object vertex in the image and taking a reference point of the image as the coordinate origin of a two-dimensional image coordinate system, the two-dimensional position information of each object vertex in the two-dimensional image coordinate system.
Optionally, the step of determining the actual position of each object vertex in the image according to the relative position of each object vertex and the image vertex corresponding to the object vertex comprises:
determining the reference position of each object vertex in the image according to the relative position of each object vertex and the image vertex corresponding to the object vertex;
aiming at each object vertex, carrying out corner point detection in a preset area where the reference position of the object vertex is located;
and determining the actual position of each object vertex in the image according to the corner detection result.
Optionally, the preset region where the reference position of the object vertex is located is a circular region whose center is the pixel point at the reference position of the object vertex and whose radius is a first preset pixel;
for each object vertex, performing corner detection in the preset region where the reference position of the object vertex is located includes:
performing corner detection on the pixel points in the circular region corresponding to each object vertex, taking, during the corner detection, all pixel points whose characteristic-value variation amplitude is larger than a preset threshold as candidate corners, and determining a target corner corresponding to each object vertex from the candidate corners.
Optionally, the determining the actual position of each vertex of the object in the image according to the corner detection result includes:
and for each object vertex, if the corner detection result of the object vertex contains a corner, determining the position of the corner as the actual position of the object vertex in the image, and if the corner detection result of the object vertex does not contain a corner, determining the reference position of the object vertex in the image as the actual position of the object vertex in the image.
Optionally, the step of obtaining a plurality of object vertices in the image includes:
processing the image to obtain a line graph of a gray level contour in the image;
combining similar lines in the line graph to obtain a plurality of reference boundary lines;
processing the image through the trained boundary line region recognition model to obtain a plurality of boundary line regions of the object in the image;
for each boundary line region, determining a target boundary line corresponding to the boundary line region from a plurality of reference boundary lines;
determining the edge of an object in the image according to the determined plurality of target boundary lines;
and configuring the intersection points of the edges of the objects in the image as the object vertexes.
Optionally, the step of merging similar lines in the line graph to obtain a plurality of reference boundary lines includes:
merging similar lines in the line graph to obtain a plurality of initial merged lines, and determining a boundary matrix according to the plurality of initial merged lines;
merging similar lines among the plurality of initial merged lines to obtain target lines, and taking the unmerged initial merged lines as target lines as well; and
determining a plurality of reference boundary lines from the plurality of target lines according to the boundary matrix.
Optionally, the step of establishing a three-dimensional space coordinate system according to at least two images and a feature point matching method, and determining the spatial position of the camera includes:
extracting two-dimensional feature points which are matched with each other in at least two images;
obtaining a constraint relation of at least two images according to the two-dimensional feature points matched with each other;
and obtaining the three-dimensional space position of the two-dimensional feature point in each image based on the constraint relation, and further obtaining the space position of the camera corresponding to each image.
In order to solve the above technical problem, according to a second aspect of the present invention, there is also provided a readable storage medium having stored thereon a program which, when executed, implements the object size recognition method as described above.
In order to solve the above technical problem, according to a third aspect of the present invention, there is also provided an object size recognition system comprising a processor and a memory, the memory having stored thereon a program which, when executed by the processor, implements the object size recognition method as described above.
In summary, in the object size recognition method, the readable storage medium, and the object size recognition system provided by the present invention, the object size recognition method includes: acquiring at least two images of an object from different viewing angles by shooting; acquiring two-dimensional position information of a plurality of object vertices in each image; establishing a three-dimensional spatial coordinate system from the at least two images using a feature-point matching method, and determining the spatial position of the camera; and selecting any one of the images, and obtaining three-dimensional spatial position information of the plurality of vertices based on the calibrated camera parameters and the spatial position of the camera, thereby obtaining the size of the object.
With this configuration, at least two images of the object captured from different viewing angles, combined with the calibrated camera parameters, are sufficient to obtain the size of the object; the operation is simple and convenient, which solves the prior-art problem that the size of an object in space cannot be measured.
Drawings
It will be appreciated by those skilled in the art that the drawings are provided for a better understanding of the invention and do not constitute any limitation to the scope of the invention. Wherein:
FIG. 1 is a flow chart of an object size identification method of an embodiment of the present invention;
FIG. 2 is a schematic view of a photographed object according to an embodiment of the present invention;
FIG. 3 is a schematic view of another object being photographed in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of line merging according to an embodiment of the present invention.
Detailed Description
To further clarify the objects, advantages, and features of the present invention, a more particular description of the invention is given below with reference to specific embodiments illustrated in the appended drawings. It should be noted that the drawings are in greatly simplified form and not to scale, and are intended merely to facilitate and clarify the explanation of the embodiments of the present invention. Further, the structures illustrated in the drawings are often only part of the actual structures; in particular, different drawings may have different emphases and may be drawn at different scales.
As used in this specification, the singular forms "a", "an" and "the" include plural referents, the term "or" is generally employed in its sense including "and/or," the terms "a" and "an" are generally employed in their sense including "at least one," the terms "at least two" are generally employed in their sense including "two or more," and the terms "first", "second" and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first", "second", "third" may explicitly or implicitly include one or at least two of the features. The specific meanings of the above terms in the present specification can be understood by those of ordinary skill in the art as appropriate.
The invention aims to provide an object size identification method, a readable storage medium and an object size identification system, which aim to solve the problem that the size of an object is difficult to measure in the prior art.
The following description refers to the accompanying drawings.
Referring to fig. 1, an embodiment of the invention provides an object size identification method, which includes:
step S1: at least two images of an object from different viewing angles are acquired by shooting. It will be appreciated that each of the images has a plurality of object vertices representing the object. In some embodiments, the images may be captured using a binocular camera or a depth camera, and in other embodiments, the images may be captured using a cell phone having more than two cameras.
Step S2: acquiring two-dimensional position information of a plurality of object vertices in each image. The two-dimensional position information of an object vertex here refers to the coordinates of that object vertex in the image coordinate system.
Step S3: establishing a three-dimensional spatial coordinate system from the at least two images using a feature-point matching method, and determining the spatial position of the camera.
Step S4: selecting any one of the images, and obtaining three-dimensional spatial position information of the object vertices based on the calibrated camera parameters and the spatial position of the camera, thereby obtaining the size of the object.
With this configuration, at least two images of the object captured from different viewing angles, combined with the calibrated camera parameters, are sufficient to obtain the size of the object; the operation is simple and convenient, which solves the prior-art problem that the size of an object in space cannot be measured.
Referring to fig. 2, in an exemplary embodiment, the object to be photographed is a rectangle (e.g., a business card) having four edges (i.e., lines) A1 to A4, and the junction of two adjacent edges forms an object vertex, i.e., the business card in the image has four object vertices a1 to a4. Referring to fig. 3, in another example, the image does not capture the entire area of the object, so the corners corresponding to object vertices b1 and b3 are not included in the image. In this case, the four edge lines B1 to B4 of the business card in the image may be extended to obtain virtual vertices for the corners lying outside the image, which, together with the actually captured vertices, give the four object vertices b1 to b4 of the business card. Of course, the rectangle is only an example; the shape of the object to be photographed is not limited, and the object may have other planar or solid shapes. Preferably, however, the object to be photographed should have several vertices for subsequent recognition and calculation.
After the object vertices in the image are identified, step S2 is executed to acquire the two-dimensional position information of the object vertices. In an alternative example, the step of obtaining the two-dimensional position information of the plurality of object vertices in the image comprises:
Step SA21: inputting the image into the trained vertex recognition model to obtain the relative position of each object vertex with respect to its corresponding image vertex. The vertex recognition model here may be implemented using machine learning techniques and run, for example, on a general-purpose or special-purpose computing device. The object vertex recognition model is a neural network model obtained by pre-training; for example, it may be implemented using a deep convolutional neural network (DEEP-CNN). In some embodiments, the image is input to the object vertex recognition model, which recognizes the object vertices in the image and derives the relative position of each object vertex with respect to its corresponding image vertex. It is to be understood that the image vertices of the image are the vertices of the image edges; for example, in fig. 2 the image is rectangular and its image vertices are a5 to a8.
Alternatively, the vertex recognition model may be built by machine learning training. In one example, the training step of the object vertex recognition model includes:
step SA211, acquiring a training sample set, wherein each sample image in the training sample set is labeled with each object vertex of an object in an image and the relative position of each object vertex and the corresponding image vertex;
step SA212, obtaining a test sample set, where each sample image in the test sample set is also labeled with each object vertex of an object in the image and a relative position of each object vertex and its corresponding image vertex, where the test sample set is different from the training sample set;
step SA213 of training the object vertex recognition model based on the training sample set;
step SA214, testing the object vertex identification model based on the test sample set;
step SA215, when the test result indicates that the identification accuracy of the object vertex identification model is smaller than a preset accuracy, increasing the number of samples in the training sample set for retraining; and
step SA216, when the test result indicates that the identification accuracy of the object vertex identification model is greater than or equal to the preset accuracy, training is completed.
Alternatively, the type of the object to be measured is not particularly limited in the present invention, and the object may be a two-dimensional object such as a business card, a test paper, a laboratory sheet, a document, and an invoice, or may be a three-dimensional object. For each object type of each object, a certain number of sample images labeled with corresponding information are obtained, and the number of sample images prepared for each object type may be the same or different. Each sample image may include the entire region of the object (as shown in fig. 2) or may include only a partial region of the object (as shown in fig. 3). The sample images acquired for each object type may include images taken at different angles of capture, under different lighting conditions, as much as possible. In these cases, the corresponding information labeled for each sample image may also include information such as the shooting angle, illumination, and the like of the sample image.
The sample image subjected to the labeling processing may be divided into a training sample set for training the object vertex recognition model and a test sample set for testing a training result. Typically, the number of samples in the training sample set is significantly larger than the number of samples in the test sample set, e.g., the number of samples in the test sample set may be 5% to 20% of the total number of sample images, while the number of samples in the corresponding training sample set may be 80% to 95% of the total number of sample images. It will be appreciated by those skilled in the art that the number of samples in the training sample set and the test sample set may be adjusted as desired.
The object vertex recognition model may be trained using the training sample set, and the recognition accuracy of the trained model may then be tested using the test sample set. If the recognition accuracy does not meet the requirement, the number of sample images in the training sample set is increased and the object vertex recognition model is retrained with the updated training sample set, until the recognition accuracy of the trained model meets the requirement; once it does, training is complete. In one embodiment, whether training can end may be determined based on whether the recognition accuracy is below a preset accuracy. In this way, a trained object vertex recognition model whose output accuracy meets the requirement can be used to recognize the object vertices in an image.
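The split-train-test-retrain procedure described above can be sketched as follows; this is an assumption-laden illustration, where the test ratio and target accuracy follow the ranges mentioned in the text, and train_fn, eval_fn, and collect_more_fn are hypothetical placeholders for the actual training, evaluation, and additional-labeling steps.

```python
import random

def split_samples(samples, test_ratio=0.1):
    """Split labeled samples into training and test sets (test ratio e.g. 5%-20%)."""
    shuffled = samples[:]
    random.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]          # (training set, test set)

def train_until_accurate(samples, train_fn, eval_fn, collect_more_fn, target_acc=0.95):
    """train_fn/eval_fn/collect_more_fn are placeholders, not APIs from the patent."""
    train_set, test_set = split_samples(samples)
    model = train_fn(train_set)
    while eval_fn(model, test_set) < target_acc:
        train_set += collect_more_fn()                   # enlarge the training set
        model = train_fn(train_set)                      # retrain on the updated set
    return model
```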
When the image shown in fig. 3 is used as a sample image, in addition to labeling the object vertices b2 and b4 inside the sample image, the object vertices b1 and b3 lying outside the sample image may be obtained by extending the adjacent edge lines and then labeled as well, together with the relative positions of the object vertices b1 to b4 and their corresponding image vertices.
In this way, when sample images labeled according to this labeling method are used to train the object vertex recognition model, the model can recognize, for an image similar to fig. 3, not only the object vertices located inside the image but also those located outside it, along with the relative positions of the object vertices and their corresponding image vertices. Furthermore, although the adjacent edge lines are extended to obtain the out-of-image object vertices when labeling the sample images, the trained model does not need to perform such extension when recognizing an image: the relative coordinates of the out-of-image object vertices with respect to their corresponding image vertices can be obtained directly.
Preferably, in the training step of the object vertex recognition model, when labeling the relative position between each object vertex of the object in a sample image and its corresponding image vertex in step SA211, each object vertex is preferably labeled with its position relative to the image vertex closest to it. Taking the image shown in fig. 2 as a sample image: since object vertex a1 is closest to image vertex a5, the relative position between a1 and a5 is labeled, i.e., the coordinates of a1 are converted into coordinates with a5 as the origin; similarly, the coordinates of a2 are converted into coordinates with a6 as the origin, the coordinates of a3 into coordinates with a7 as the origin, and the coordinates of a4 into coordinates with a8 as the origin.
In this way, the sample image labeled according to the labeling method is used for training the object vertex identification model, and the identification result of the object vertex identification model identifies the relative position of each object vertex in the image with respect to the image vertex closest to the object vertex.
Taking the image shown in fig. 2 as an example, after the object vertex recognition model recognizes, the relative position of the object vertex a1 with respect to the image vertex a5 (i.e., the coordinates of the object vertex a1 with the image vertex a5 as the origin), the relative position of the object vertex a2 with respect to the image vertex a6 (i.e., the coordinates of the object vertex a2 with the image vertex a6 as the origin), the relative position of the object vertex a3 with respect to the image vertex a7 (i.e., the coordinates of the object vertex a3 with the image vertex a7 as the origin), and the relative position of the object vertex a4 with respect to the image vertex a8 (i.e., the coordinates of the object vertex a4 with the image vertex a8 as the origin) can be obtained.
Step SA 22: and determining the actual position of each object vertex in the image according to the relative position of each object vertex and the image vertex corresponding to the object vertex.
In some embodiments, the relative position of each object vertex and the image vertex closest to the object vertex in the image is converted into the coordinates of the object vertex in the target coordinate system, so as to obtain the actual position of each object vertex in the image.
Step SA 23: and according to the actual position of each object vertex in the image, taking a reference point of the image as a coordinate origin of a two-dimensional image coordinate system to obtain two-dimensional position information of each object vertex in the two-dimensional image coordinate system.
Preferably, the target coordinate system is the two-dimensional image coordinate system, whose origin is a position point in the image. Taking the image shown in fig. 2 as an example, step SA21 yields the coordinates of object vertex a1 with image vertex a5 as the origin, the coordinates of a2 with a6 as the origin, the coordinates of a3 with a7 as the origin, and the coordinates of a4 with a8 as the origin. Since these coordinates are not expressed in the same coordinate system, they need to be converted into a common coordinate system; specifically, in step SA23, the coordinates of the four object vertices may be converted into coordinates sharing the same position point as the common origin, so that the actual positions of the object vertices in the image can be determined.
Since this position point is a specific location in the image, the relative coordinates of each image vertex with respect to it are known, and therefore the relative coordinates of each object vertex with that position point as the coordinate origin can be obtained.
For example, in some embodiments the origin of the target coordinate system may be the center point of the image; in other embodiments it is one of the image vertices. Taking the image shown in fig. 2 as an example, the origin of the target coordinate system may be image vertex a5, so that with a5 as the coordinate origin the coordinate values of object vertices a1 to a4 can be obtained, and their actual positions in the image are thereby known.
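The coordinate conversion described above can be illustrated by the following sketch. It assumes a rectangular image whose vertices a5 to a8 are ordered top-left, top-right, bottom-right, bottom-left, and uses a5 (the top-left corner) as the common origin; the ordering and the sample offsets are assumptions for illustration only.

```python
# Convert per-vertex relative positions (each given w.r.t. its own image vertex)
# into one common two-dimensional image coordinate system with origin at a5.
def to_common_origin(relative_coords, image_w, image_h):
    # Assumed image-vertex order: a5 top-left, a6 top-right, a7 bottom-right, a8 bottom-left.
    image_vertices = [(0, 0), (image_w, 0), (image_w, image_h), (0, image_h)]
    absolute = []
    for (dx, dy), (vx, vy) in zip(relative_coords, image_vertices):
        absolute.append((vx + dx, vy + dy))   # actual position in the image
    return absolute

# Example: offsets of a1..a4 from a5..a8 for a 640x480 image (made-up numbers).
print(to_common_origin([(50, 40), (-60, 45), (-55, -35), (48, -42)], 640, 480))
```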
After the two-dimensional position information of the vertex of the object is acquired in step S2, a three-dimensional space coordinate system is established according to the feature point matching method in step S3. Preferably, the step of establishing a three-dimensional space coordinate system according to at least two images and a feature point matching method and determining the spatial position of the camera includes:
step S31: extracting two-dimensional feature points which are matched with each other in at least two images;
step S32: obtaining a constraint relation of at least two images according to the two-dimensional feature points matched with each other;
step S33: and obtaining the three-dimensional space position of the two-dimensional feature point in each image based on the constraint relation, and further obtaining the space position of the camera corresponding to each image.
In one example, the ORB algorithm is used to quickly find and extract, from each image, all two-dimensional feature points that do not change as the camera moves or rotates or as the illumination changes. The two-dimensional feature points of the images are then matched to extract the feature points that match between images. A two-dimensional feature point consists of two parts: a keypoint and a descriptor. The keypoint is the position of the feature point in the image (some keypoints also carry orientation and scale information); the descriptor is usually a vector that describes, in a hand-designed way, the information of the pixels around the keypoint. Descriptors are designed so that feature points with similar appearance have similar descriptors; therefore, during matching, two feature points can be considered a match as long as their descriptors are close in the vector space. In this embodiment, during matching, the keypoints in each image are extracted, the descriptor of each feature point is computed from its keypoint, and matching is performed according to the descriptors, thereby extracting the mutually matched two-dimensional feature points in the images. Of course, there are other ways to match feature points, such as coarse matching or nearest-neighbor search, which are not enumerated here; those skilled in the art can choose according to the actual application.
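One possible realization of this step, using OpenCV's ORB implementation, is sketched below; the file names, feature count, and the use of brute-force Hamming matching with cross-checking are assumptions for illustration, not requirements of the patent.

```python
import cv2

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)           # keypoints + binary descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance suits ORB's binary descriptors; crossCheck keeps mutual matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

pts1 = [kp1[m.queryIdx].pt for m in matches]           # matched 2D points in image 1
pts2 = [kp2[m.trainIdx].pt for m in matches]           # matched 2D points in image 2
```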
After the two-dimensional feature points of the images are matched, the three-dimensional spatial position of the camera corresponding to any one of the images can be obtained (the lens orientation of the camera is always perpendicular to the two-dimensional plane of the image it captures). Then, according to the camera position corresponding to each image, all the two-dimensional feature points in each image are converted into three-dimensional feature points, forming a three-dimensional space and establishing a three-dimensional spatial coordinate system.
It can be understood that the two-dimensional feature points corresponding to the same three-dimensional feature point, observed from different viewing angles (i.e., when the camera both rotates and translates), satisfy a constraint relationship known as the epipolar constraint. The fundamental matrix is the algebraic representation of this constraint; it is independent of the structure of the scene and depends only on the intrinsic and extrinsic parameters of the camera. For a pair of mutually matched two-dimensional feature points p1 and p2, the fundamental matrix F satisfies:

p2^T F p1 = 0, where F = K^(-T) [t]x R K^(-1)    (1)

where K is the intrinsic matrix of the camera, R is the rotation matrix, and [t]x is the skew-symmetric matrix of the translation vector t. That is, the fundamental matrix F of each image pair can be computed from the mutually matched two-dimensional feature-point pairs alone (at least 7 pairs), and the rotation matrix R and translation vector t of the camera can then be obtained by decomposing F, which gives the spatial position of the camera in the three-dimensional spatial coordinate system.
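A sketch of how R and t could be recovered in practice with OpenCV, assuming the intrinsic matrix K is already known from calibration, is given below; it goes through the essential matrix E = K^T F K, which is equivalent to decomposing F as in equation (1). The point lists are assumed to come from a matching step such as the ORB sketch above.

```python
import numpy as np
import cv2

def recover_camera_pose(pts1, pts2, K):
    """pts1, pts2: matched 2D points from two views; K: 3x3 intrinsic matrix."""
    pts1 = np.float32(pts1)
    pts2 = np.float32(pts2)
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)   # needs >= 7 pairs
    E = K.T @ F @ K                        # essential matrix, E = K^T F K (cf. eq. (1))
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    return R, t                            # camera rotation and unit-scale translation
```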
Further, the homography matrix H can provide additional constraints between images. When the camera captures two images of the same scene with rotation only and no translation, the epipolar constraint no longer applies, and the homography matrix H can be used to describe the relationship between the two images. Thus both the fundamental matrix F and the homography matrix H can represent the constraint relationship between two images, but each has its applicable scenario: the fundamental matrix represents the epipolar constraint and requires the camera to have both rotated and translated, whereas the homography matrix requires rotation only, without translation. The computation of the fundamental matrix and the homography matrix for each image follows the prior art and is not described in detail in this embodiment.
After the spatial position of the camera is determined in step S3, in step S4 any one of the images is selected, and the three-dimensional spatial position information of the plurality of object vertices in that image can be obtained based on the calibrated camera parameters and the spatial position of the camera, so that the actual size of the object can be obtained.
The purpose of camera calibration is to determine the values of some parameter information of the camera. In general, the parameter information may establish a mapping relationship between a three-dimensional coordinate system determined by a calibration board and a camera image coordinate system, in other words, the parameter information may be used to map a point in a three-dimensional space to an image space, or vice versa. The parameters of the camera needing calibration are generally divided into an internal parameter part and an external parameter part. The external parameters determine the position and orientation of the camera in a three-dimensional space, and the external parameter matrix represents how points (world coordinates) in a three-dimensional space undergo rotation and translation and then fall on the image space (camera coordinates). The rotation and translation of the camera belong to external parameters and are used for describing the motion of the camera in a static scene or the rigid motion of a moving object when the camera is fixed. Therefore, in image stitching or three-dimensional reconstruction, external parameters are required to solve the relative motion between several images, so that the images are registered in the same coordinate system.
The intrinsic parameters are parameters internal to the camera and are generally inherent to it. The intrinsic parameter matrix describes how a point in three-dimensional space, after passing through the camera lens, is projected onto the image plane and becomes a pixel through optical imaging and electronic conversion. It should be noted that a real camera lens also exhibits radial and tangential distortion; these distortion parameters likewise belong to the camera parameters and can be obtained by calibration in advance.
The specific calibration method can be chosen by those skilled in the art according to the prior art; for example, Zhang's calibration method can be used. With the extrinsic and intrinsic parameters of the camera calibrated, the three-dimensional spatial position information of the object vertices in the image can be obtained based on the spatial position of the camera, and the actual size of the object is then obtained by calculation.
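For illustration, a typical OpenCV implementation of Zhang's calibration with a chessboard target might look as follows; the board dimensions and image file pattern are placeholders, and this is a sketch rather than the patent's prescribed procedure.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)                                   # inner corners per row and column
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)  # board coordinates

obj_points, img_points = [], []
for path in glob.glob("calib_*.jpg"):              # placeholder calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the intrinsic matrix K, distortion coefficients, and per-view extrinsics.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```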
In another alternative example, the step SA22 of determining the actual position of each object vertex in the image according to the relative position of each object vertex and its corresponding image vertex comprises:
Step SA221: determining the reference position of each object vertex in the image according to the relative position of each object vertex and its corresponding image vertex;
Step SA222: performing, for each object vertex, corner detection in a preset region where the reference position of the object vertex is located;
Step SA223: determining the actual position of each object vertex in the image according to the corner detection result.
In this example, unlike the previous one, the position of each object vertex obtained from its relative position with respect to the corresponding image vertex is not taken directly as the actual position, but is instead treated as a reference position. Corner detection is then performed around the reference position of each object vertex, and the actual position of each object vertex in the image is finally determined from the corner detection result. Using corner detection to correct the positions of the object vertices improves the accuracy of edge and vertex detection for objects with edges in the image.
The following again takes fig. 2 and fig. 3 as examples. In step SA221, the relative position of each object vertex with respect to the image vertex closest to it is converted into a reference coordinate of that object vertex in the target coordinate system, thereby obtaining the reference position of each object vertex in the image.
In step SA222, a corner point is, in the general sense, an extreme point, i.e., a point whose attributes are particularly prominent in some respect, such as an isolated point or a line-segment endpoint with a locally maximal or minimal intensity. A corner is usually defined as the intersection of two edges; more strictly, the local neighborhood of a corner should contain the boundaries of two differently oriented regions. In practice, most so-called corner detection methods detect image points with specific features, not just "corners". These feature points have specific coordinates in the image and certain mathematical characteristics, such as locally maximal or minimal gray level or particular gradient features.
The basic idea of a corner detection algorithm is to slide a fixed window (a neighborhood of a pixel) over the image in arbitrary directions and compare the window content before and after sliding: if sliding in any direction produces a large change in the gray levels of the pixels within the window, a corner is considered to exist in the window.
Generally, every object vertex of an object with edges corresponds to a corner in the image. The corner corresponding to each object vertex is therefore detected by performing corner detection in the preset region where the reference position of that object vertex is located.
Preferably, the preset region where the reference position of the object vertex is located is a circular region with a pixel point at the reference position of the object vertex as a center of a circle and a first preset pixel as a radius; the range of the first preset pixels is, for example, 10-20 pixels, and preferably 15 pixels.
For each object vertex, performing corner detection in the preset region where the reference position of the object vertex is located includes: performing corner detection on the pixel points in the circular region corresponding to the object vertex, taking, during detection, all pixel points whose characteristic-value variation amplitude is larger than a preset threshold as candidate corners, and determining a target corner corresponding to each object vertex from the candidate corners. Here the characteristic-value variation amplitude refers to the degree of gray-level change within the fixed window used for corner detection. It can be understood that the smaller the variation amplitude, the less likely the pixel point is a corner. By comparing the variation amplitude with the preset threshold, pixel points with a low probability of being corners can be eliminated, while pixel points with a high probability are kept as candidate corners, so that the target corner can then be determined from the candidates. Specific corner detection algorithms include, for example, gray-scale-image-based, binary-image-based, and contour-curve-based corner detection algorithms; they follow the prior art and are not described in detail here.
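A sketch of corner detection restricted to such a circular region is given below, using the Harris response as the characteristic value; the Harris parameters, the 15-pixel radius, and the relative threshold are assumptions for illustration, not values fixed by the patent.

```python
import cv2
import numpy as np

def candidate_corners(gray, ref_xy, radius=15, rel_thresh=0.01):
    """Return (position, strength) pairs inside the circle around a reference position."""
    response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    ys, xs = np.nonzero(response > rel_thresh * response.max())   # strong responses only
    candidates = []
    for x, y in zip(xs, ys):
        if (x - ref_xy[0]) ** 2 + (y - ref_xy[1]) ** 2 <= radius ** 2:
            candidates.append(((int(x), int(y)), float(response[y, x])))
    return candidates
```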
Specifically, determining the target corner corresponding to each object vertex from the candidate corners includes:
Step SA2221: sorting the candidate corners in descending order of characteristic-value variation amplitude, determining the first-ranked candidate corner as a target corner, and taking the second-ranked candidate corner as the current corner to be selected;
Step SA2222: judging whether the distances between the current corner to be selected and all current target corners are all larger than a second preset pixel; if yes, executing step SA2223, otherwise executing step SA2224;
Step SA2223: determining the current corner to be selected as a target corner;
Step SA2224: discarding the current corner to be selected, taking the next candidate corner as the current corner to be selected, and returning to step SA2222.
It can be understood that, when sorted in descending order of characteristic-value variation amplitude, the first-ranked candidate corner has the largest variation amplitude and is therefore the most likely to be a corner, so it can be determined directly as a target corner. The second-ranked candidate corner may lie either in the circular region of the same object vertex as the first-ranked candidate (call it object vertex 1), or in the circular region of another object vertex (call it object vertex 2). In the first case, since the first-ranked candidate has already been determined as the target corner of object vertex 1, the second-ranked candidate cannot also be the target corner of object vertex 1. In the second case, the second-ranked candidate is necessarily the most likely corner within the circular region of object vertex 2, and therefore must be determined as the target corner of object vertex 2. Based on this, the present embodiment decides which case the second-ranked candidate belongs to by judging whether its distance from the existing target corner is larger than the second preset pixel: if the distance is larger, the second case applies and the candidate is determined as a target corner; otherwise the first case applies and the candidate is discarded. Each remaining candidate corner is judged in turn according to the same logic, and a plurality of target corners are finally determined from the candidate corners.
This processing ensures that at most one candidate corner remains around each object vertex, and the position of the remaining candidate corner is the actual position of that object vertex. Preferably, the second preset pixel may be greater than or equal to 50 pixels; its upper limit may be set according to the specific size of the image and is not limited here.
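The selection of target corners described above can be sketched as a simple greedy procedure, as follows; the 50-pixel minimum distance stands in for the second preset pixel mentioned in the text, and the candidate format matches the earlier corner-detection sketch.

```python
def select_target_corners(candidates, min_dist=50):
    """candidates: list of ((x, y), strength) pairs, e.g. from candidate_corners()."""
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)   # strongest first
    targets = []
    for (x, y), _ in ordered:
        # Keep the candidate only if it is far enough from every corner already kept.
        if all((x - tx) ** 2 + (y - ty) ** 2 > min_dist ** 2 for tx, ty in targets):
            targets.append((x, y))
    return targets
```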
It should be noted that a corner may not be detected for some object vertices, for example when the preset region around the object vertex differs too little from the image background for a corner to be detected, or when the object vertex lies outside the image (e.g., object vertices b1 and b3 in fig. 3) so that no corner exists at all. In the case where no corner is detected, the object vertex itself can still be regarded as the corner.
Preferably, in step SA223, the step of determining the actual position of each object vertex in the image according to the corner detection result includes:
for each object vertex, if the corner detection result of the object vertex contains a corner, determining the position of that corner as the actual position of the object vertex in the image; and if the corner detection result does not contain a corner, determining the reference position of the object vertex in the image as its actual position. In other words, for object vertices around which a candidate corner remains, the corresponding corner replaces the reference position as the actual object vertex.
Through the processing, the actual position of the object vertex in the image can be corrected according to the detected coordinates of the corner points, so that the position detection of the object vertex is more accurate.
In another alternative example, the identification of the object vertices in the image may differ from the previous example: instead of being identified directly, the object vertices are obtained as the intersections of the edges after the edges have been identified. Specifically, the step of acquiring the plurality of object vertices in the image in step S2 includes:
Step SB21: processing the image to obtain a line graph of the gray-level contours in the image;
Step SB22: merging similar lines in the line graph to obtain a plurality of reference boundary lines;
Step SB23: processing the image through a trained boundary line region recognition model to obtain a plurality of boundary line regions of the object in the image;
Step SB24: determining, for each boundary line region, a target boundary line corresponding to that boundary line region from the plurality of reference boundary lines;
Step SB25: determining the edges of the object in the image according to the determined target boundary lines;
Step SB26: taking the intersection points of the edges of the object in the image as the object vertices.
In step SB21, the image includes an object having an edge, the line graph includes a plurality of lines, and the line graph is a grayscale graph. Here, the edge is not limited to a straight edge, and may be an arc line, a line segment having a shape of a fine wave, a zigzag, or the like. The image may be a grayscale image or a color image. For example, the image may be an original image obtained by directly capturing an image with a camera, or may be an image obtained by preprocessing the original image. For example, to avoid the influence of the data quality, data imbalance, and the like of the image on the object edge detection, an operation of preprocessing the image may be further included before processing the image. Preprocessing may eliminate extraneous or noisy information in the image to facilitate better processing of the image.
Further, step SB21 may include: and processing the image through an edge detection algorithm to obtain a line graph of the gray level profile in the image.
In some embodiments, the input image may be processed, for example, by an OpenCV-based edge detection algorithm to obtain the line graph of the gray-level contours in the input image. OpenCV is an open-source computer vision library, and its edge detection algorithms include Sobel, Scharr, Canny, Laplacian, Prewitt, Marr-Hildreth, and others. A person skilled in the art can select a suitable edge detection algorithm according to the prior art, which is not described further herein.
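A minimal example of this step with OpenCV's Canny detector is shown below; the file name, blur kernel, and thresholds are typical values chosen for illustration, not values fixed by the patent.

```python
import cv2

gray = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder input image
blurred = cv2.GaussianBlur(gray, (5, 5), 0)             # suppress noise before edge detection
edges = cv2.Canny(blurred, 50, 150)                     # binary edge map (the "line graph")
```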
In other embodiments, step SB21 may comprise: processing the image through a boundary area identification model to obtain a plurality of boundary areas; and processing the plurality of boundary areas through an edge detection algorithm to obtain a line graph of the gray level contour in the image. For example, processing the plurality of boundary regions to obtain a plurality of boundary region labeling frames; and processing the plurality of boundary region labeling frames through an edge detection algorithm to obtain a line graph of the gray level outline in the image.
The boundary region recognition model may be implemented using machine learning techniques and run, for example, on a general-purpose or special-purpose computing device. It is a neural network model obtained by pre-training and may, for example, be implemented using a deep convolutional neural network (DEEP-CNN). In some embodiments, the image is input to the boundary region recognition model, which identifies the edges of the object in the image to obtain a plurality of boundary regions (i.e., the mask regions of the respective boundaries of the object); the identified boundary regions are then labeled, for example by drawing a circumscribed rectangular box around each boundary region, to determine a plurality of boundary region labeling boxes; finally, the labeled boundary region labeling boxes are processed with an edge detection algorithm (for example, the Canny edge detection algorithm) to obtain the line graph of the gray-level contours in the image.
In this embodiment, the edge detection algorithm only needs to perform edge detection on the labeled boundary region labeling frame, and does not need to perform edge detection on the whole image, so that the calculation amount can be reduced, and the processing speed can be increased. Note that the boundary region labeling box labels a partial region in the image.
In some other embodiments, step SB21 may include: carrying out binarization processing on the image to obtain a binarized image of the image; and filtering noise lines in the binary image to obtain a line graph of the gray level profile in the image. For example, a corresponding filtering rule may be preset to filter various line segments and various relatively small lines inside the object in the binarized image, so as to obtain a line drawing of a gray level contour in the image.
In an alternative example, the step SB22 of merging similar lines in the line graph to obtain a plurality of reference boundary lines includes:
Step SB221: merging similar lines in the line graph to obtain a plurality of initial merged line groups, where the initial merged line groups correspond one-to-one to the boundary regions and each initial merged line group comprises at least one initial merged line; determining a plurality of boundary connecting lines according to the plurality of initial merged line groups, where the boundary connecting lines correspond one-to-one to the boundary regions and to the initial merged line groups; converting the plurality of boundary regions into a plurality of straight-line groups, where the straight-line groups correspond one-to-one to the boundary regions and each straight-line group comprises at least one straight line; calculating a plurality of average slopes corresponding one-to-one to the straight-line groups; calculating the slope of each boundary connecting line; for the i-th boundary connecting line among the plurality of boundary connecting lines (i being a positive integer not larger than the number of boundary connecting lines), judging whether the difference between the slope of the i-th boundary connecting line and the average slope corresponding to it is higher than a second slope threshold; in response to the difference being lower than or equal to the second slope threshold, taking the i-th boundary connecting line, together with the initial merged lines in the initial merged line group corresponding to it, as reference boundary lines in the reference boundary line group corresponding to the boundary region of the i-th boundary connecting line; in response to the difference being higher than the second slope threshold, taking only the initial merged lines in the initial merged line group corresponding to the i-th boundary connecting line as the reference boundary lines in that reference boundary line group; and performing the above operations on each of the plurality of boundary connecting lines, thereby determining the plurality of reference boundary lines. In some embodiments, the second slope threshold may range from 0 to 20 degrees, preferably 0 to 10 degrees; more preferably, the second slope threshold may be, for example, 5 degrees or 15 degrees.
It is noted that, in the embodiment of the present disclosure, "the difference between two slopes" means the difference between the inclination angles corresponding to the two slopes. For example, the inclination angle corresponding to the slope of the ith boundary connecting line may represent the angle between the ith boundary connecting line and a given direction (e.g., a horizontal direction or a vertical direction), and the inclination angle corresponding to the average slope may represent the angle between a straight line determined based on the average slope and the given direction. For example, the inclination angle of the ith boundary connecting line (e.g., a first inclination angle) and the inclination angle corresponding to the average slope that corresponds to the ith boundary connecting line among the plurality of average slopes (e.g., a second inclination angle) may be calculated; if the difference between the first inclination angle and the second inclination angle is higher than the second slope threshold, the ith boundary connecting line is not used as a reference boundary line; and if the difference between the first inclination angle and the second inclination angle is lower than or equal to the second slope threshold, the ith boundary connecting line may be used as a reference boundary line.
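As a small, non-authoritative helper (NumPy assumed; all names are illustrative), this definition can be computed directly from the two slopes by converting each slope to its inclination angle:

```python
import numpy as np

def slope_difference_in_degrees(slope_a, slope_b):
    """Difference between two slopes, expressed as the difference
    between their corresponding inclination angles (in degrees)."""
    angle_a = np.degrees(np.arctan(slope_a))
    angle_b = np.degrees(np.arctan(slope_b))
    return abs(angle_a - angle_b)

# Example: a boundary connecting line with slope 1.0 (45 degrees) compared with
# an average slope of about 0.84 (roughly 40 degrees) differs by about 5 degrees,
# which would pass a 10-degree second slope threshold.
```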
It should be noted that the straight line group, the average slope, the boundary region, and the like will be described later, and are not described herein again.
For example, in step SB221, similar lines among the plurality of lines are merged to obtain the plurality of initial merged line groups, and a boundary matrix is determined according to the plurality of initial merged lines. The step of merging similar lines among the plurality of lines comprises: obtaining a plurality of long lines among the plurality of lines, wherein each of the plurality of long lines is a line whose length exceeds a length threshold; obtaining a plurality of merged line groups according to the plurality of long lines, wherein each of the plurality of merged line groups comprises at least two sequentially adjacent long lines, and the included angle between any two adjacent long lines in each merged line group is smaller than an angle threshold; and, for each of the plurality of merged line groups, sequentially merging the long lines in the merged line group to obtain the initial merged line corresponding to that merged line group, the plurality of merged line groups being merged respectively so as to obtain the initial merged lines in the plurality of initial merged line groups.
For example, the number of all initial merged lines included in the plurality of initial merged line groups is the same as the number of the plurality of merged line groups, and all the initial merged lines included in the plurality of initial merged line groups correspond to the plurality of merged line groups one to one. It should be noted that, after the initial merged line corresponding to a merged line group is obtained based on that merged line group, the boundary region corresponding to the initial merged line may be determined based on the position of the initial merged line, so as to determine the initial merged line group to which the initial merged line belongs.
It should be noted that, in the embodiment of the present disclosure, a "similar line" indicates that an included angle between two lines is smaller than an angle threshold.
For example, a long line in the line graph refers to a line with a length exceeding a length threshold value in a plurality of lines in the line graph, for example, a line with a length exceeding 2 pixels is defined as a long line, that is, the length threshold value is 2 pixels, and embodiments of the present disclosure include but are not limited thereto, and in other embodiments, the length threshold value may also be 3 pixels, 4 pixels, and the like. Only the long lines in the line drawing are acquired for subsequent merging processing, and some shorter lines in the line drawing are not considered, so that line interference inside the object and outside the object can be avoided when the lines are merged, for example, corresponding lines of characters and graphics inside the object, other objects outside the object, and the like can be removed.
For example, the merged line groups may be obtained in the following manner: first, a long line T1 is selected; then, starting from the long line T1, whether the included angle between two adjacent long lines is smaller than the angle threshold is judged in sequence; if it is determined that the included angle between a certain long line T2 and the long line adjacent to the long line T2 is not smaller than the angle threshold, the long line T1, the long line T2, and all the sequentially adjacent long lines between the long line T1 and the long line T2 may be combined into one merged line group. Then, the above process is repeated, that is, starting from the long line adjacent to the long line T2, whether the included angle between two adjacent long lines is smaller than the angle threshold is judged in sequence, and so on until all the long lines have been traversed, so as to obtain the plurality of merged line groups. It should be noted that "two adjacent long lines" means two physically adjacent long lines, that is, there is no other long line between the two adjacent long lines.
For example, the initial merged lines are lines that are longer than the individual long lines from which they are merged.
Fig. 4 is a schematic diagram of a line merging process according to an embodiment of the present disclosure.
The above procedure for obtaining the merged line groups will be described below with reference to fig. 4 as an example. In one embodiment, first, a first long line A is selected, and it is determined whether the included angle between the long line A and the adjacent long line B is smaller than the angle threshold; if so, the long line A and the long line B belong to the same merged line group. Then, it is determined whether the included angle between the long line B and the adjacent long line C is smaller than the angle threshold; if so, the long lines A, B and C all belong to the same merged line group. Next, the included angle between the long line C and the adjacent long line D is determined; if it is also smaller than the angle threshold, the long lines A, B, C and D all belong to the same merged line group. Then, the included angle between the long line D and the adjacent long line E is determined; if this angle is greater than or equal to the angle threshold, the long line E does not belong to the same merged line group as the long lines A/B/C/D. At this point, the long lines A, B, C and D may be taken as one merged line group, for example a first merged line group. Then, starting from the long line E, whether the included angle between two adjacent long lines is smaller than the angle threshold is judged in sequence, so that the long lines G, H, I and J are found to belong to one merged line group, for example a second merged line group, and the long lines M, N and O belong to another merged line group, for example a third merged line group.
For example, in another embodiment, one long line may first be selected arbitrarily from the plurality of long lines, for example the long line D, whose adjacent long lines are the long line C and the long line E. It is determined whether the included angle between the long line D and the long line C is smaller than the angle threshold, and whether the included angle between the long line D and the long line E is smaller than the angle threshold. Since the angle between the long line D and the long line C is smaller than the angle threshold, the long lines D and C belong to the same merged line group; since the angle between the long line D and the long line E is larger than the angle threshold, the long lines D and E belong to different merged line groups. Then, on the one hand, the included angles between the other sequentially adjacent long lines may be judged starting from the long line C, so as to determine the other long lines belonging to the same merged line group as the long line D, and also to determine other merged line groups; on the other hand, the included angles between the other sequentially adjacent long lines may be judged starting from the long line E, so as to determine other merged line groups. By analogy, it can finally be determined that the long lines A, B, C and D belong to one merged line group, the long lines G, H, I and J belong to one merged line group, and the long lines M, N and O belong to another merged line group.
For example, the included angle $\theta$ between two adjacent long lines is calculated by the following formula:

$$\theta = \arccos\left(\frac{\vec{v}_1 \cdot \vec{v}_2}{\lVert \vec{v}_1 \rVert \, \lVert \vec{v}_2 \rVert}\right)$$

where $\vec{v}_1$ and $\vec{v}_2$ respectively represent the vectors of the two adjacent long lines. For example, the value of the angle threshold may be set according to the actual situation; in some embodiments, the angle threshold may range from 0 to 20 degrees, preferably from 0 to 10 degrees, more preferably 5 degrees, 15 degrees, or the like.
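A sketch of how this formula and the angle threshold might be used to group sequentially adjacent long lines (an illustration only; the storage of each long line as a start/end coordinate pair and the 10-degree default are assumptions of the sketch, not values from the disclosure):

```python
import numpy as np

def included_angle_deg(line_a, line_b):
    """Included angle between the direction vectors of two lines, in degrees.
    Each line is assumed to be stored as ((x0, y0), (x1, y1))."""
    v1 = np.subtract(line_a[1], line_a[0])
    v2 = np.subtract(line_b[1], line_b[0])
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Note: this uses the stored direction of each line; if stored directions
    # may be reversed, the angle can be folded into [0, 90] degrees first.
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def group_long_lines(long_lines, angle_threshold_deg=10.0):
    """Walk the sequentially adjacent long lines and start a new merged
    line group whenever the included angle reaches the threshold."""
    groups, current = [], [long_lines[0]]
    for previous, following in zip(long_lines, long_lines[1:]):
        if included_angle_deg(previous, following) < angle_threshold_deg:
            current.append(following)
        else:
            groups.append(current)
            current = [following]
    groups.append(current)
    # Groups containing a single line (e.g. long lines E and F in fig. 4)
    # simply stay un-merged.
    return groups
```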
For example, merging two long lines means averaging the slopes of the two long lines to obtain a slope average value, which is taken as the slope of the merged line. In practical applications, the merging of two long lines is calculated according to the array form of the two long lines. For example, let the two long lines be a first long line and a second long line; merging them means that the start point (i.e., the head of the line segment) of the first long line and the end point (i.e., the tail of the line segment) of the second long line are directly connected to form a new, longer line. That is, the start point of the first long line and the end point of the second long line are directly connected by a straight line in the coordinate system corresponding to the line drawing to obtain the merged line: the coordinate value of the pixel point corresponding to the start point of the first long line is taken as the coordinate value of the pixel point corresponding to the start point of the merged line, and the coordinate value of the pixel point corresponding to the end point of the second long line is taken as the coordinate value of the pixel point corresponding to the end point of the merged line. Finally, the coordinate values of the pixel points corresponding to the start point and the end point of the merged line are formed into an array of the merged line, and the array is stored. The long lines in each merged line group are merged sequentially in this way to obtain the corresponding initial merged line.
For example, as shown in fig. 4, the long line A, the long line B, the long line C, and the long line D in the first merged line group are sequentially merged to obtain the initial merged line corresponding to that merged line group: first, the long line A and the long line B may be merged to obtain a first merged line; then, the first merged line and the long line C may be merged to obtain a second merged line; and then, the second merged line and the long line D may be merged to obtain the initial merged line 1 corresponding to the first merged line group. Similarly, the long lines in the second merged line group are merged to obtain the initial merged line 2 corresponding to the second merged line group, and the long lines in the third merged line group are merged to obtain the initial merged line 3 corresponding to the third merged line group. After the merged line groups have been merged, the long line E, the long line F, the long line K, and the long line L remain un-merged.
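Under the array representation described above (start point and end point of each line), sequentially merging the lines of one group can be sketched as follows; the conceptual slope averaging is omitted because, in the stored array form, only the first start point and the last end point survive (an assumption-laden sketch, not the claimed procedure):

```python
def merge_group(group):
    """Sequentially merge the long lines of one merged line group.
    Each line is stored as ((x0, y0), (x1, y1)); merging two lines keeps
    the start point of the first and the end point of the second."""
    merged = group[0]
    for line in group[1:]:
        merged = (merged[0], line[1])
    return merged  # e.g. the initial merged line 1 for the group {A, B, C, D}
```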
In addition, the boundary matrix is determined in the following manner: the initial merged lines and the un-merged lines among the long lines are redrawn, the position information of the pixel points in all the redrawn lines is mapped onto the whole image matrix, the values at the positions of the pixel points of the lines in the image matrix are set to a first numerical value, and the values at the positions of the pixel points other than the lines are set to a second numerical value, thereby forming the boundary matrix. Specifically, the boundary matrix may be a matrix having the same size as the image matrix. For example, if the size of the image is 1024 × 1024 pixels, the image matrix is a 1024 × 1024 matrix and the boundary matrix is also a 1024 × 1024 matrix. The initial merged lines and the un-merged lines among the long lines are redrawn with a certain line width (for example, a line width of 2), and the boundary matrix is filled with values according to the positions in the matrix corresponding to the pixel points of the redrawn lines: the positions corresponding to pixel points on the lines are all set to the first numerical value, for example 255, and the positions corresponding to pixel points not on any line are set to the second numerical value, for example 0, so as to form a large matrix covering the entire picture, that is, the boundary matrix. It should be noted that, since the plurality of initial merged lines and the un-merged lines among the long lines are stored in the form of arrays, these lines need to be turned into actual line data when the boundary matrix is determined; therefore, the lines are redrawn, for example with a line width of 2, so as to obtain the coordinate values of the pixel points corresponding to each point on each line, and the boundary matrix is then filled with values according to the obtained coordinate values, for example, the values at the positions corresponding to the coordinate values in the boundary matrix are set to 255, and the values at the remaining positions are set to 0.
In the following, a boundary matrix is provided as an example. The boundary matrix is a 10 × 10 matrix in which all positions with a value of 255 are connected to form the plurality of initial merged lines and the un-merged lines among the long lines.
[Example 10 × 10 boundary matrix: the original drawing is not reproduced here.]
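A sketch of the boundary-matrix construction described above (OpenCV and NumPy assumed; the 255/0 fill values and the line width of 2 follow the description, everything else is illustrative):

```python
import cv2
import numpy as np

def build_boundary_matrix(lines, image_shape, line_width=2):
    """Redraw the stored lines into a matrix of the same size as the image:
    positions of pixel points on a line are set to 255, all others to 0."""
    matrix = np.zeros(image_shape[:2], dtype=np.uint8)
    for start, end in lines:  # each line stored as ((x0, y0), (x1, y1))
        cv2.line(matrix,
                 (int(start[0]), int(start[1])),
                 (int(end[0]), int(end[1])),
                 color=255, thickness=line_width)
    return matrix

# `lines` would contain the initial merged lines plus the un-merged long lines.
```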
Step SB222: merging similar lines among the plurality of initial merged lines to obtain target lines, and taking the un-merged initial merged lines as target lines as well.
The initial merged lines obtained in step SB221 are a number of longer lines. In step SB222, whether similar lines exist among these initial merged lines may be further determined according to the merging rule of step SB221, so as to merge the similar lines again to obtain a plurality of target lines; meanwhile, any initial merged line that cannot be merged is also used as a target line.
The specific steps of merging similar lines among the plurality of initial merged lines to obtain the target lines are as follows. Step a: acquiring a plurality of groups of second lines from the plurality of initial merged lines, wherein each group of second lines comprises at least two sequentially adjacent initial merged lines, and the included angle between any two adjacent initial merged lines in the group is smaller than a third preset threshold. Step b: for each group of second lines, sequentially merging the initial merged lines in the group to obtain a target line.
The principle of merging the initial merged lines is the same as that of merging the lines of the line drawing in step SB221, so reference may be made to the related description of step SB221, which is not repeated here. The third preset threshold may be the same as or different from the second preset threshold, which is not limited in this embodiment; for example, the third preset threshold is set to an included angle of 10 degrees. As shown in the before-and-after comparison of line merging in fig. 4, after the step of merging the initial merged lines 1, 2 and 3, since the included angle between the initial merged lines 1 and 2 is smaller than the third preset threshold while the included angle between the initial merged line 3 and the initial merged line 2 is larger than the third preset threshold, the initial merged lines 1 and 2 can be further merged into the target line 12, whereas the initial merged line 3 cannot be merged and is directly used as a target line.
In this way, a plurality of target lines are obtained. Among the plurality of target lines there are not only the reference boundary lines but also some long interference lines, for example long interference lines obtained by merging the lines corresponding to characters and graphics inside the object, other objects outside the object, and the like; these interference lines are removed by the subsequent processing (specifically, the processing in step SB223 and step SB23) and rules.
Step SB223: determining the plurality of reference boundary lines from the plurality of target lines according to the boundary matrix. Specifically, determining the plurality of reference boundary lines from the plurality of target lines according to the boundary matrix includes: first, for each target line, extending the target line, determining a line matrix according to the extended target line, then comparing the line matrix with the boundary matrix, and counting the number of pixel points on the extended target line that belong to the boundary matrix as the score of the target line, wherein the size of the line matrix is the same as that of the boundary matrix; then, determining the plurality of reference boundary lines from the plurality of target lines according to the score of each target line.
Wherein the line matrix may be determined in the following manner: redrawing the extended target line, corresponding position information of pixel points in the redrawn line to the whole image matrix, setting the value of the position of the pixel points of the line in the image matrix as a first numerical value, and setting the value of the position of the pixel points outside the line as a second numerical value, thereby forming the line matrix. The forming manner of the line matrix is similar to that of the boundary matrix, and is not described herein again. It should be noted that the target line is stored in an array form, that is, coordinate values of a start point and an end point of the target line are stored, after the target line is extended, the extended target line forms an array with the coordinate values of the start point and the end point of the extended target line when being stored, so that when the extended target line is redrawn, the target line is redrawn according to the same line width, for example, the line width is 2, so as to obtain coordinate values of pixel points corresponding to each point on the extended target line, and then the line matrix is value-filled according to the coordinate values, that is, values of positions corresponding to the coordinate values in the line matrix are set to 255, and values of the rest positions are set to 0.
The merged target lines are extended, and the target lines whose pixel points fall the most onto the redrawn lines (the initial merged lines and the un-merged lines among the long lines) are used as reference boundary lines. For each target line, how many of its pixel points belong to the boundary matrix is judged and a score is calculated, specifically as follows: the target line is extended, the line obtained after extension is formed into a line matrix in the same manner as the boundary matrix is formed, and the line matrix is compared with the boundary matrix to judge how many pixel points fall into the boundary matrix, that is, how many positions at the same location in the two matrices both hold the same first numerical value, such as 255; the score is calculated accordingly. Since there may be a plurality of lines with the best score, a plurality of best-scoring target lines are determined as the reference boundary lines from the plurality of target lines according to the score of each target line.
For example, the line matrix formed by one extended target line is as follows; by comparing this line matrix with the boundary matrix, it can be seen that 7 pixel points on the extended target line fall into the boundary matrix, which gives the score of that target line.
[Example 10 × 10 line matrix: the original drawing is not reproduced here.]
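The score of step SB223 can then be sketched as the count of positions holding 255 in both matrices (reusing `build_boundary_matrix` from the earlier sketch; `extend_line` is a hypothetical helper, assumed here to lengthen a target line to the image borders):

```python
import numpy as np

def target_line_score(target_line, boundary_matrix, line_width=2):
    """Extend a target line, draw it into its own line matrix, and count
    how many of its pixel points fall onto the boundary matrix."""
    extended = extend_line(target_line, boundary_matrix.shape)   # hypothetical helper
    line_matrix = build_boundary_matrix([extended], boundary_matrix.shape, line_width)
    return int(np.count_nonzero((line_matrix == 255) & (boundary_matrix == 255)))

# The target line(s) with the highest score are kept as reference boundary lines,
# e.g. scores = {line: target_line_score(line, boundary) for line in target_lines}.
```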
Preferably, in step SB23, the boundary region identification model may be implemented using machine learning techniques and run, for example, on a general-purpose computing device or a special-purpose computing device. The boundary region identification model is a neural network model obtained by pre-training. For example, the boundary region identification model may be implemented by using a neural network such as a deep convolutional neural network (DEEP-CNN). Note that the boundary region identification model here and the boundary region identification model in step SB21 may be the same model or different models.
First, the boundary region identification model is established through machine learning training. The boundary region identification model can be obtained through the following training process: performing labeling processing on each image sample in an image sample set so as to label the boundary line region, the inner region and the outer region of the object in each image sample; and training a neural network with the labeled image sample set to obtain the boundary region identification model.
For example, through a boundary region identification model established by machine learning training, 3 parts of a boundary region, an inner region (i.e., a region where an object is located), and an outer region (i.e., an outer region of the object) in an image can be identified, so as to obtain each boundary region of the image, and at this time, an edge contour in the boundary region is thick. For example, in some embodiments, the shape of the object may be a rectangle, and the number of the boundary regions may be 4, that is, the input image is recognized by the boundary region recognition model, so that four boundary regions corresponding to four sides of the rectangle, respectively, may be obtained.
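A non-authoritative sketch of how such a three-class model might be applied at inference time (PyTorch assumed; the network itself, the class indices and the preprocessing are placeholders, not the disclosed model):

```python
import torch

BOUNDARY, INTERIOR, EXTERIOR = 0, 1, 2  # illustrative class indices

def predict_boundary_mask(model, image_tensor):
    """Run a trained three-class segmentation network and return a boolean
    mask of the (thick) boundary regions of the object.
    `image_tensor` is assumed to be a normalized (1, C, H, W) tensor and the
    model to output per-pixel logits of shape (1, 3, H, W)."""
    model.eval()
    with torch.no_grad():
        logits = model(image_tensor)          # (1, 3, H, W)
        classes = logits.argmax(dim=1)[0]     # (H, W) per-pixel class labels
    boundary_mask = (classes == BOUNDARY).cpu().numpy()
    # For a rectangular object the mask can be split into the four boundary
    # regions, e.g. by connected-component analysis.
    return boundary_mask
```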
In some embodiments, the plurality of bounding regions includes a first bounding region, a second bounding region, a third bounding region, and a fourth bounding region. In some embodiments, as shown in fig. 2, the first boundary region may represent a region corresponding to boundary line a1, the second boundary region may represent a region corresponding to boundary line a2, the third boundary region may represent a region corresponding to boundary line A3, and the fourth boundary region may represent a region corresponding to boundary line a 4; in other embodiments, as shown in fig. 3, the first boundary region may represent a region corresponding to the boundary line B1, the second boundary region may represent a region corresponding to the boundary line B2, the third boundary region may represent a region corresponding to the boundary line B3, and the fourth boundary region may represent a region corresponding to the boundary line B4.
It is understood that the boundary region of the object in the image is identified by the boundary region identification model, and then the target boundary line is determined from the plurality of reference boundary lines based on the boundary region, so that the misrecognized interference lines, such as lines falling in the middle of a business card or document, lines in the middle of a table, and the like, can be removed.
Preferably, step SB24: for each boundary line region, determining a target boundary line corresponding to the boundary line region from the plurality of reference boundary lines; this may comprise the following steps: first, calculating the slope of each reference boundary line; then, for each boundary line region, converting the boundary line region into a plurality of straight lines, calculating the average slope of the plurality of straight lines, judging whether a reference boundary line whose slope matches the average slope exists among the plurality of reference boundary lines, and if so, determining that reference boundary line as the target boundary line corresponding to the boundary line region. The boundary line region may be converted into a plurality of straight lines by Hough transform; of course, other methods may also be used for the conversion, which is not limited in this embodiment.
In this embodiment, the edge contour in the boundary line region is relatively thick. For each boundary line region, the boundary line region may be converted into a plurality of straight lines by using the Hough transform; these lines have approximately the same slope, so an average slope is obtained, and the average slope is then compared with the slope of each reference boundary line to determine whether a reference boundary line whose slope matches the average slope exists among the plurality of reference boundary lines, that is, the most similar reference boundary line is found from the plurality of reference boundary lines and is used as the target boundary line corresponding to the boundary line region.
Since the difference between the slope of the determined target boundary line and the average slope cannot be too large, a comparison threshold is set when the average slope is compared with the slope of each reference boundary line: when the absolute value of the difference between the slope of a certain reference boundary line and the average slope is smaller than the comparison threshold, that reference boundary line is determined to be a reference boundary line whose slope matches the average slope, and it is further determined to be the target boundary line corresponding to the boundary line region.
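A sketch of this matching step under stated assumptions (OpenCV's standard Hough transform applied to a uint8 mask of the boundary line region; reference boundary lines stored as start/end pairs; the vote threshold of 50 and the 10-degree comparison threshold are illustrative, not values from the disclosure):

```python
import cv2
import numpy as np

def inclination_deg(line):
    """Inclination angle of a ((x0, y0), (x1, y1)) line, in [0, 180) degrees."""
    (x0, y0), (x1, y1) = line
    return np.degrees(np.arctan2(y1 - y0, x1 - x0)) % 180.0

def angle_difference_deg(a, b):
    """Smallest difference between two inclination angles."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def match_reference_line(region_mask, reference_lines, comparison_threshold_deg=10.0):
    """Convert one boundary line region into straight lines with the Hough
    transform, average their inclination angles, and return the reference
    boundary line whose angle matches within the comparison threshold."""
    hough = cv2.HoughLines(region_mask, rho=1, theta=np.pi / 180, threshold=50)
    if hough is None:
        return None
    # HoughLines returns (rho, theta) where theta is the normal direction,
    # so the line's own inclination is theta rotated by 90 degrees.
    angles = [(np.degrees(theta) + 90.0) % 180.0 for _, theta in hough[:, 0]]
    mean_angle = float(np.mean(angles))  # simple mean; assumes no 0/180 wrap-around
    best = min(reference_lines,
               key=lambda line: angle_difference_deg(inclination_deg(line), mean_angle))
    if angle_difference_deg(inclination_deg(best), mean_angle) < comparison_threshold_deg:
        return best
    return None  # no match: fall back to scoring the Hough lines against the boundary matrix
```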
Further, for each boundary line region, if it is determined that no reference boundary line whose slope matches the average slope exists among the plurality of reference boundary lines, the following processing is performed: for each straight line obtained by converting the boundary line region, the line matrix formed by that straight line is compared with the boundary matrix, and the number of pixel points on the straight line that belong to the boundary matrix is counted as the score of the straight line; the straight line with the best score is determined as the target boundary line corresponding to the boundary line region. If there are several straight lines with the best score, the one that appears first according to the sorting algorithm is used as the target boundary line. The line matrix is determined in the following manner: the straight line is redrawn, the position information of the pixel points in the redrawn line is mapped onto the whole image matrix, the values at the positions of the pixel points of the line in the image matrix are set to the first numerical value, and the values at the positions of the pixel points outside the line are set to the second numerical value, thereby forming the line matrix. The manner of forming the line matrix is similar to that of the boundary matrix and is not repeated here.
In other words, if the target boundary line corresponding to a certain boundary line region cannot be found among the reference boundary lines, corresponding line matrices are formed for the plurality of straight lines obtained by the Hough transform according to the matrix forming manner in step SB222 and step SB223, and the straight line having the most pixel points falling into the boundary matrix is taken as the target boundary line corresponding to that boundary line region. For the manner of calculating the score of a straight line by comparing its line matrix with the boundary matrix, reference may be made to the related description of step SB223, which is not repeated here.
Step SB25: since each target boundary line corresponds to one boundary line region of the object in the image, after the plurality of target boundary lines are determined, the plurality of target boundary lines constitute the edges of the object in the image. As shown in fig. 2, the edges of the object in the image are constituted by the four longer lines in fig. 2, i.e., the target boundary lines A1, A2, A3 and A4; as shown in fig. 3, the edges of the object in the image are constituted by the four longer lines in fig. 3, i.e., the target boundary lines B1, B2, B3 and B4.
Further, in step SB26, after the edges of the object in the image are obtained, the intersection points of the edges are configured as the object vertices. The subsequent steps refer to steps S2 to S4 and are not repeated here.
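The intersection of two target boundary lines can be computed, for example, with homogeneous coordinates (a sketch only, NumPy assumed; lines stored as two points each):

```python
import numpy as np

def line_intersection(line_a, line_b):
    """Intersection point of two non-parallel lines, each given by two points,
    via the cross product of their homogeneous representations."""
    def to_homogeneous(line):
        (x0, y0), (x1, y1) = line
        return np.cross([x0, y0, 1.0], [x1, y1, 1.0])

    p = np.cross(to_homogeneous(line_a), to_homogeneous(line_b))
    if abs(p[2]) < 1e-9:
        return None  # (nearly) parallel lines have no finite intersection
    return (p[0] / p[2], p[1] / p[2])

# e.g. the object vertex between boundary lines A1 and A2 of fig. 2 would be
# line_intersection(target_line_a1, target_line_a2).
```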
The present embodiment also provides a readable storage medium on which a program is stored, which when executed, implements the object size identification method as described above. Further, the present embodiment also provides an object size recognition system, which includes a processor and a memory, where the memory stores a program, and the program, when executed by the processor, implements the object size recognition method as described above.
In summary, in the object size recognition method, the readable storage medium and the object size recognition system provided by the present invention, the object size recognition method includes: acquiring at least two images of an object from different visual angles by shooting; respectively acquiring two-dimensional position information of a plurality of object vertices of each image; establishing a three-dimensional space coordinate system according to the at least two images and a feature point matching method, and determining the space position of the camera; and selecting any one of the images, and obtaining three-dimensional space position information of the plurality of vertices based on the parameter information calibrated by the camera and the space position of the camera, so as to obtain the size of the object. With this configuration, at least two images of the object at different visual angles are obtained by shooting, and the size of the object can be obtained by combining them with the parameter information calibrated by the camera; the operation steps are simple and convenient, which solves the problem in the prior art that the size of an object in space cannot be measured.
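For orientation only, the overall pipeline summarized above can be sketched with OpenCV's feature matching and pose recovery; the ORB detector, the calibrated camera matrix K and all names are assumptions of this sketch, not the claimed implementation:

```python
import cv2
import numpy as np

def recover_relative_camera_pose(image_1, image_2, K):
    """Match feature points between two views of the object and recover the
    relative camera pose (rotation R and translation t, up to scale)."""
    orb = cv2.ORB_create(5000)
    keypoints_1, descriptors_1 = orb.detectAndCompute(image_1, None)
    keypoints_2, descriptors_2 = orb.detectAndCompute(image_2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(descriptors_1, descriptors_2)

    points_1 = np.float32([keypoints_1[m.queryIdx].pt for m in matches])
    points_2 = np.float32([keypoints_2[m.trainIdx].pt for m in matches])

    E, inlier_mask = cv2.findEssentialMat(points_1, points_2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, points_1, points_2, K, mask=inlier_mask)
    return R, t

# With the camera poses and the calibrated matrix K, the two-dimensional object
# vertices of a chosen image can be triangulated (e.g. cv2.triangulatePoints)
# into three-dimensional positions, from which the object size follows.
```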
The above description is only for the purpose of describing the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention, and any variations and modifications made by those skilled in the art based on the above disclosure are within the scope of the appended claims.

Claims (10)

1. An object size recognition method, comprising:
acquiring at least two images of an object from different visual angles by shooting;
respectively acquiring two-dimensional position information of a plurality of object vertexes of each image;
establishing a three-dimensional space coordinate system according to at least two of the images and a feature point matching method, and determining the space position of a camera; and
selecting any one of the images, and obtaining three-dimensional space position information of the plurality of object vertexes based on the parameter information calibrated by the camera and the space position of the camera, so as to obtain the size of the object.
2. The object size recognition method according to claim 1, wherein the step of acquiring two-dimensional position information of vertices of the plurality of objects in the image includes:
inputting the images into the trained vertex recognition model to obtain the relative positions of each object vertex and the image vertex corresponding to the object vertex;
determining the actual position of each object vertex in the image according to the relative position of each object vertex and the image vertex corresponding to the object vertex;
and according to the actual position of each object vertex in the image, taking a reference point of the image as a coordinate origin of a two-dimensional image coordinate system to obtain two-dimensional position information of each object vertex in the two-dimensional image coordinate system.
3. The object size recognition method of claim 2, wherein the step of determining the actual position of each object vertex in the image based on the relative position of each object vertex and its corresponding image vertex comprises:
determining the reference position of each object vertex in the image according to the relative position of each object vertex and the image vertex corresponding to the object vertex;
aiming at each object vertex, carrying out corner point detection in a preset area where the reference position of the object vertex is located;
and determining the actual position of each object vertex in the image according to the corner detection result.
4. The object size recognition method according to claim 3, wherein the preset region where the reference position of the object vertex is located is a circular region with a pixel point at the reference position of the object vertex as a center of a circle and a first preset pixel as a radius;
for each object vertex, performing corner detection in a preset area where a reference position of the object vertex is located, including:
and performing corner detection on pixel points in the circular area corresponding to each object vertex, and in the corner detection process, all the pixel points with the characteristic value change amplitude larger than a preset threshold value are taken as candidate corners, and determining a target corner corresponding to each object vertex from the candidate corners.
5. The object size recognition method according to claim 4, wherein the determining the actual position of each vertex of the object in the image according to the corner detection result comprises:
and for each object vertex, if the corner detection result of the object vertex contains a corner, determining the position of the corner as the actual position of the object vertex in the image, and if the corner detection result of the object vertex does not contain a corner, determining the reference position of the object vertex in the image as the actual position of the object vertex in the image.
6. The object size recognition method of claim 1, wherein the step of obtaining a plurality of object vertices in the image comprises:
processing the image to obtain a line graph of a gray level contour in the image;
combining similar lines in the line graph to obtain a plurality of reference boundary lines;
processing the image through the trained boundary line region recognition model to obtain a plurality of boundary line regions of the object in the image;
for each boundary line region, determining a target boundary line corresponding to the boundary line region from a plurality of reference boundary lines;
determining the edge of an object in the image according to the determined plurality of target boundary lines;
and configuring the intersection points of the edges of the objects in the image as the object vertexes.
7. The object size recognition method according to claim 6, wherein the step of combining similar lines in the line drawing to obtain a plurality of reference boundary lines comprises:
merging similar lines in the line graph to obtain a plurality of initial merging lines, and determining a boundary matrix according to the plurality of initial merging lines;
combining similar lines in the plurality of initial combination lines to obtain a target line, and taking the uncombined initial combination lines as the target line;
and determining a plurality of reference boundary lines from the plurality of target lines according to the boundary matrix.
8. The object size recognition method according to claim 1, wherein a three-dimensional space coordinate system is established based on at least two of the images and a feature point matching method, and the step of determining the spatial position of the camera includes:
extracting two-dimensional feature points which are matched with each other in at least two images;
obtaining a constraint relation of at least two images according to the two-dimensional feature points matched with each other;
and obtaining the three-dimensional space position of the two-dimensional feature point in each image based on the constraint relation, and further obtaining the space position of the camera corresponding to each image.
9. A readable storage medium on which a program is stored, wherein the program, when executed, implements the object size recognition method according to any one of claims 1 to 8.
10. An object size recognition system comprising a processor and a memory, the memory having stored thereon a program which, when executed by the processor, implements an object size recognition method according to any one of claims 1 to 8.
CN202110975318.1A 2021-08-24 2021-08-24 Object size recognition method, readable storage medium, and object size recognition system Active CN113688846B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110975318.1A CN113688846B (en) 2021-08-24 2021-08-24 Object size recognition method, readable storage medium, and object size recognition system
PCT/CN2022/106607 WO2023024766A1 (en) 2021-08-24 2022-07-20 Object size identification method, readable storage medium and object size identification system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110975318.1A CN113688846B (en) 2021-08-24 2021-08-24 Object size recognition method, readable storage medium, and object size recognition system

Publications (2)

Publication Number Publication Date
CN113688846A true CN113688846A (en) 2021-11-23
CN113688846B CN113688846B (en) 2023-11-03

Family

ID=78581917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110975318.1A Active CN113688846B (en) 2021-08-24 2021-08-24 Object size recognition method, readable storage medium, and object size recognition system

Country Status (2)

Country Link
CN (1) CN113688846B (en)
WO (1) WO2023024766A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821497A (en) * 2022-02-24 2022-07-29 广州文远知行科技有限公司 Method, device and equipment for determining position of target object and storage medium
WO2023024766A1 (en) * 2021-08-24 2023-03-02 成都睿琪科技有限责任公司 Object size identification method, readable storage medium and object size identification system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117315664B (en) * 2023-09-18 2024-04-02 山东博昂信息科技有限公司 Scrap steel bucket number identification method based on image sequence

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110455215A (en) * 2019-08-13 2019-11-15 利生活(上海)智能科技有限公司 It is a kind of that object is obtained in the method and device of physical three-dimensional bulk by image
CN113688846B (en) * 2021-08-24 2023-11-03 成都睿琪科技有限责任公司 Object size recognition method, readable storage medium, and object size recognition system

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006075528A1 (en) * 2005-01-13 2006-07-20 National University Corporation NARA Institute of Science and Technology Three-dimensional object measuring device
JP2006300656A (en) * 2005-04-19 2006-11-02 Nippon Telegr & Teleph Corp <Ntt> Image measuring technique, device, program, and recording medium
US20110311109A1 (en) * 2010-06-17 2011-12-22 Stephen Demarais Method and system for estimating antler, horn, and pronghorn size of an animal
CN104236478A (en) * 2014-09-19 2014-12-24 山东交通学院 Automatic vehicle overall size measuring system and method based on vision
US20190012807A1 (en) * 2017-07-04 2019-01-10 Baidu Online Network Technology (Beijing) Co., Ltd.. Three-dimensional posture estimating method and apparatus, device and computer storage medium
CN110533774A (en) * 2019-09-09 2019-12-03 江苏海洋大学 A kind of method for reconstructing three-dimensional model based on smart phone
CN112683169A (en) * 2020-12-17 2021-04-20 深圳依时货拉拉科技有限公司 Object size measuring method, device, equipment and storage medium
CN112991369A (en) * 2021-03-25 2021-06-18 湖北工业大学 Method for detecting overall dimension of running vehicle based on binocular vision
CN113177977A (en) * 2021-04-09 2021-07-27 上海工程技术大学 Non-contact three-dimensional human body size measuring method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
于婕; 许立成; 李文书: "基于相似三角的药品盒尺寸测量" (Measurement of medicine box dimensions based on similar triangles), 电子测量技术 (Electronic Measurement Technology), no. 23 *

Also Published As

Publication number Publication date
WO2023024766A1 (en) 2023-03-02
CN113688846B (en) 2023-11-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant