CN108320290B - Target picture extraction and correction method and device, computer equipment and recording medium - Google Patents

Target picture extraction and correction method and device, computer equipment and recording medium

Info

Publication number
CN108320290B
CN108320290B (application CN201711483213.4A)
Authority
CN
China
Prior art keywords
coordinates
picture
unit
vertex
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711483213.4A
Other languages
Chinese (zh)
Other versions
CN108320290A (en)
Inventor
杨波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd filed Critical China Unionpay Co Ltd
Priority to CN201711483213.4A priority Critical patent/CN108320290B/en
Publication of CN108320290A publication Critical patent/CN108320290A/en
Application granted granted Critical
Publication of CN108320290B publication Critical patent/CN108320290B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06T 5/70

Abstract

The invention relates to a method and a device for extracting and correcting a target picture, a computer device and a recording medium. The target picture extracting and correcting method comprises the following steps: predicting the scene picture using a plurality of custom-trained neural network models to generate a plurality of target label maps; obtaining, based on the plurality of target label maps, a comprehensive label map capable of covering the vertices of the four corners of the quadrangular target picture; acquiring vertex coordinates of the four corners from the comprehensive label map using a clustering algorithm; determining the correspondence between the four acquired vertex coordinates and the four corners of the quadrangle to adjust the order of the four vertex coordinates; and performing perspective transformation on the scene picture based on the adjusted four vertex coordinates to extract and correct the quadrangular target picture.

Description

Target picture extraction and correction method and device, computer equipment and recording medium
Technical Field
The invention belongs to the technical field of image processing, and relates to a method and a device for extracting and correcting a quadrilateral target picture from a scene picture, computer equipment and a recording medium.
Background
In many services, a desired target picture needs to be extracted from a scene picture. At present, methods for extracting a target picture from a scene picture are mostly target detection and recognition methods based on opencv. For quadrangular targets such as paper documents and bank cards, HoughLine is generally used, after binarization or Laplacian transformation, to detect the boundary straight lines of the target in the picture; the coordinates of the corner points are then obtained to extract and correct the target image. However, this method has weak interference resistance: for example, when the background is not clearly distinguished from the foreground, or when the foreground picture contains long color transitions, it is difficult to distinguish the boundaries correctly, which makes target extraction very difficult. In addition, errors in line detection are amplified when the corner positions are acquired, which affects the accurate extraction and correction of the desired target picture.
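By way of illustration only, a rough sketch of this conventional opencv pipeline might look as follows; the file name and all threshold parameters are assumptions for illustration, not values prescribed by any prior-art document:

```python
import cv2
import numpy as np

# binarize the scene picture (Otsu threshold assumed for illustration)
gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
edges = cv2.Canny(binary, 50, 150)

# HoughLine detection of candidate boundary straight lines
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=100, maxLineGap=10)
# corner coordinates would then be obtained by intersecting the detected
# boundary lines -- the step where, as noted above, line-detection errors
# are amplified
```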
Disclosure of Invention
The present invention has been made to overcome one or more of the above-mentioned disadvantages, or other disadvantages, and the technical solutions adopted are as follows.
According to an aspect of the present invention, there is provided a method for extracting and rectifying a quadrangular target picture from a scene picture, including: step S1: predicting the scene picture using a plurality of custom-trained neural network models to generate a plurality of target label maps; step S2: obtaining, based on the plurality of target label maps, a comprehensive label map capable of covering the vertices of the four corners of the quadrangular target picture; step S3: acquiring vertex coordinates of the four corners from the comprehensive label map using a clustering algorithm; step S4: determining the correspondence between the four acquired vertex coordinates and the four corners of the quadrangle to adjust the order of the four vertex coordinates; and step S5: performing perspective transformation on the scene picture based on the adjusted four vertex coordinates to extract and correct the quadrangular target picture.
Further, the method in one aspect according to the present invention further comprises: step S00: generating a training input picture and a training label map using a random background picture and a training quadrangular target picture; step S01: setting different parameters to build the plurality of neural network models; and step S02: training the plurality of neural network models using the training input picture and the training label map.
Further, in one aspect according to the invention, the neural network model is an HED model.
Further, in an aspect according to the present invention, the step S2 includes: step S21: retaining all points simultaneously predicted as corner vertices in the plurality of target label maps; step S22: taking a matrix kernel of a certain size to perform erosion and dilation operations on all the points; and step S23: obtaining the comprehensive label map based on the results of the steps S21 and S22.
Further, in an aspect according to the present invention, the step S3 includes: step S31: obtaining the coordinates of all vertices judged as corners in the comprehensive label map; and step S32: applying the clustering algorithm to the coordinates to obtain the vertex coordinates of the four corners.
Further, in one aspect according to the invention, the clustering algorithm is a kmeans algorithm.
Further, in an aspect according to the present invention, the step S4 includes: step S41: for each of the acquired four vertex coordinates, summing its horizontal and vertical coordinates to obtain four coordinate sums corresponding to the four vertices; step S42: determining the coordinates corresponding to the minimum value and the maximum value in the four coordinate sums as the coordinates of the lower left corner vertex and the upper right corner vertex of the quadrangle, respectively; step S43: comparing the abscissas of the remaining two coordinates; step S44: determining, according to the comparison result of the step S43, the correspondence between the two coordinates and the coordinates of the upper left corner vertex and the lower right corner vertex of the quadrangle; and step S45: adjusting the order of the four vertex coordinates accordingly according to the results of the step S42 and the step S44.
Further, in an aspect according to the present invention, the step S5 includes: step S51: determining a perspective transformation operator based on the adjusted four vertex coordinates and predetermined correction target vertex coordinates; and step S52: performing perspective transformation on the scene picture using the perspective transformation operator to extract and correct the quadrangular target picture.
According to another aspect of the present invention, there is provided an apparatus for extracting and rectifying a quadrangular target picture from a scene picture, including: a 1st unit for predicting the scene picture using a plurality of custom-trained neural network models to generate a plurality of target label maps; a 2nd unit for obtaining, based on the plurality of target label maps, a comprehensive label map capable of covering the vertices of the four corners of the quadrangular target picture; a 3rd unit for acquiring vertex coordinates of the four corners from the comprehensive label map using a clustering algorithm; a 4th unit for determining the correspondence between the acquired four vertex coordinates and the four corners of the quadrangle to adjust the order of the four vertex coordinates; and a 5th unit for performing perspective transformation on the scene picture based on the adjusted four vertex coordinates to extract and rectify the quadrangular target picture.
Further, the apparatus in another aspect according to the present invention further comprises: means for generating a training input picture and a training label map using a random background picture and a training quadrangular target picture; means for setting different parameters to build the plurality of neural network models; and means for training the plurality of neural network models using the training input picture and the training label map.
Further, in another aspect according to the invention, the neural network model is an HED model.
Further, in another aspect according to the present invention, the 2nd unit includes: a 2A unit for retaining all points simultaneously predicted as corner vertices in the plurality of target label maps; a 2B unit for taking a matrix kernel of a certain size to perform erosion and dilation operations on all the points; and a 2C unit for obtaining the comprehensive label map based on the results of the 2A and 2B units.
Further, in another aspect according to the present invention, the 3rd unit includes: a 3A unit for obtaining the coordinates of all vertices judged as corners in the comprehensive label map; and a 3B unit for applying the clustering algorithm to the coordinates to obtain the vertex coordinates of the four corners.
Further, in another aspect according to the present invention, the clustering algorithm is a kmeans algorithm.
Further, in another aspect according to the present invention, the 4th unit includes: a 4A unit for summing the horizontal and vertical coordinates of each of the acquired four vertex coordinates to obtain four coordinate sums corresponding to the four vertices; a 4B unit for determining the coordinates corresponding to the minimum value and the maximum value in the four coordinate sums as the coordinates of the lower left corner vertex and the upper right corner vertex of the quadrangle, respectively; a 4C unit for comparing the abscissas of the remaining two coordinates; a 4D unit for determining, according to the comparison result of the 4C unit, the correspondence between the two coordinates and the coordinates of the upper left corner vertex and the lower right corner vertex of the quadrangle; and a 4E unit for adjusting the order of the four vertex coordinates accordingly according to the results of the 4B unit and the 4D unit.
Further, in another aspect according to the present invention, the 5th unit includes: a 5A unit for determining a perspective transformation operator based on the adjusted four vertex coordinates and predetermined correction target vertex coordinates; and a 5B unit for performing perspective transformation on the scene picture using the perspective transformation operator to extract and correct the quadrangular target picture.
According to a further aspect of the invention, there is provided a computer apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of a method according to an aspect of the invention are carried out when the program is executed by the processor.
According to yet another aspect of the present invention, there is provided a computer-readable storage medium having a computer program stored thereon, characterized in that the program is executed by a computer to implement the steps of a method according to an aspect of the present invention.
Compared with the prior art, the invention can obtain one or more of the following beneficial effects:
1) according to the invention, the quadrangular target picture can be effectively extracted from any background (especially a complex background), with strong robustness to changes such as perspective and displacement;
2) according to the invention, generating the label map with a plurality of models reduces the influence of errors and noise;
3) the invention is highly portable: for different application scenes and services, corresponding training data can be used to obtain a model meeting the requirements.
Drawings
Fig. 1 is an exemplary flowchart of a method for extracting and rectifying a quadrangular target picture from a scene picture according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of generation of an input picture for training according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of generation of a training label map according to an embodiment of the invention.
FIG. 4 is an exemplary diagram of target label maps and a comprehensive label map according to one embodiment of the invention.
Fig. 5 is a schematic diagram of the outline of a quadrangular target picture drawn according to the coordinate correspondence, according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of extracting and correcting a quadrangular target picture according to an embodiment of the present invention.
Fig. 7 is an exemplary block diagram of an apparatus for extracting and rectifying a quadrangular target picture from a scene picture according to an embodiment of the present invention.
FIG. 8 is an exemplary block diagram of a computer device for performing the method shown in FIG. 1, according to one embodiment of the invention.
Detailed Description
The present invention relates to a method and apparatus for extracting and correcting a quadrangular target picture from a scene picture, a computer device, and a recording medium, which will be described in further detail below with reference to the accompanying drawings. It is to be noted that the following detailed description is exemplary rather than limiting, is intended to provide a basic understanding of the invention, and is not intended to identify key or critical elements of the invention or to delineate the scope of the invention.
The present invention is described below with reference to block diagram illustrations, block diagrams, and/or flow diagrams of methods and apparatus of embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block and/or flow diagram block or blocks.
These computer program instructions may be stored in a computer-readable memory that can direct a computer or other programmable processor to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may be loaded onto a computer or other programmable data processor to cause a series of operational steps to be performed on the computer or other programmable processor to produce a computer implemented process such that the instructions which execute on the computer or other programmable processor provide steps for implementing the functions or acts specified in the flowchart and/or block diagram block or blocks. It should also be noted that, in some alternative implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The method and the device for extracting and correcting a quadrangular target picture from a scene picture according to the invention can be used in various business scenes; for example, quadrangular target pictures such as identity cards, bank cards, membership cards and various documents can be extracted and corrected from scene pictures acquired by an image acquisition device such as a camera or a video camera. The invention is described below by taking the extraction and rectification of a bank card picture from a scene picture as an example; those skilled in the art will understand that the method and apparatus exemplified below can likewise be applied to the extraction and rectification of other types of target pictures.
Fig. 1 is a flow chart illustrating a method for extracting and rectifying a quadrangular target picture from a scene picture according to an embodiment of the present invention. As shown in fig. 1, the method S100 includes the following steps: the scene picture is predicted using a plurality of custom trained neural network models to generate a plurality of target label maps (step S1).
In an embodiment, as shown in fig. 1, the method S100 may further include the following steps: a comprehensive label map that can cover the vertices of the four corners of the quadrangular target picture is obtained based on the plurality of target label maps (step S2).
In an embodiment, as shown in fig. 1, the method S100 may further include the following steps: the vertex coordinates of the four corners are acquired from the comprehensive label map using a clustering algorithm (step S3).
In an embodiment, as shown in fig. 1, the method S100 may further include the following steps: the correspondence relationship of the acquired four vertex coordinates and the four corners of the quadrangle is determined to adjust the order of the four vertex coordinates (step S4).
In an embodiment, as shown in fig. 1, the method S100 may further include the following steps: and performing perspective transformation on the scene picture based on the adjusted four vertex coordinates to extract and rectify the quadrangular target picture (step S5).
Before describing the above steps S1 to S5 in detail, a process of how to obtain a plurality of customized trained neural network models in step S1 is briefly illustrated.
First, a training input picture and a training label picture are generated from a random background picture and a training quadrangular target picture by artificial synthesis (step S00).
In one embodiment, in step S00, a large number of random pictures are selected and respectively cropped to obtain random background pictures consistent with the input size of the neural network model described later, for example, the background picture shown in fig. 2. Further, a large number of training quadrangular target pictures are selected, for example, the bank card foreground pictures shown in fig. 2 (taking the foreground pictures of two bank cards as an example). The following perspective transformation is carried out on the bank card foreground pictures:
(1) randomly selecting an initial position for each bank card foreground image, expanding the canvas to the size of the background image with zero padding at the border, and recording the initial coordinates of the four vertices;
(2) randomly moving the four vertices to obtain their new coordinates, ensuring that the vertices do not overflow the boundary of the background image during the movement and that the geometric relative order of the four vertices is unchanged (i.e., the lower-left, upper-left, upper-right, lower-right relation);
(3) obtaining a perspective transformation operator from the initial coordinates and the new coordinates;
(4) applying the perspective transformation operator to the expanded foreground picture. Finally, the perspective-transformed bank card foreground image is fused with the background image to obtain the final training input picture (e.g., the composite image shown in fig. 2).
In addition, the new coordinates after the random movement in (2) are saved, and the four new vertices are plotted on a blank image without the background to serve as the final training label image, for example the label image shown in fig. 3.
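A minimal sketch of this synthesis in Python with opencv might look as follows; the model input size (H, W), the file names and the jitter range are illustrative assumptions, the foreground is assumed smaller than the background, and a full implementation would also verify that the jittered vertices keep their geometric relative order:

```python
import cv2
import numpy as np

H, W = 256, 256                                   # assumed model input size
bg = cv2.resize(cv2.imread("background.jpg"), (W, H))
fg = cv2.imread("bank_card.jpg")                  # assumed smaller than bg
h, w = fg.shape[:2]

# (1) random initial position on a zero-padded canvas of background size
x0, y0 = np.random.randint(0, W - w), np.random.randint(0, H - h)
canvas = np.zeros_like(bg)
canvas[y0:y0 + h, x0:x0 + w] = fg
src = np.float32([[x0, y0], [x0, y0 + h], [x0 + w, y0 + h], [x0 + w, y0]])

# (2) randomly move the four vertices without leaving the image
dst = np.float32([[np.clip(x + np.random.randint(-20, 21), 0, W - 1),
                   np.clip(y + np.random.randint(-20, 21), 0, H - 1)]
                  for x, y in src])

# (3) + (4) perspective transformation of the expanded foreground
M = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(canvas, M, (W, H))
mask = np.zeros((H, W), np.uint8)
mask[y0:y0 + h, x0:x0 + w] = 255
mask = cv2.warpPerspective(mask, M, (W, H))
train_input = np.where(mask[..., None] > 0, warped, bg)   # fused picture

# training label image: the four new vertices plotted on a blank image
label = np.zeros((H, W), np.uint8)
for x, y in dst:
    cv2.circle(label, (int(x), int(y)), 3, 255, -1)
```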
Next, different parameters are set to build a plurality (e.g., n, where n is a positive integer) of neural network models (step S01). The neural network model may be an HED model, however, one skilled in the art will recognize that the neural network model herein is not limited to an HED model, and other types of neural network models having edge detection functions may be applied thereto to achieve substantially the same function or effect.
Then, the n HED models are trained using the plurality of training input pictures obtained in step S00 and the corresponding training label maps (step S02).
In an embodiment, each training input picture obtained in step S00 is input into each of the n HED models, and a corresponding label map is generated through the internal operations and processing of the model. Specifically, the generated label map is compared with the corresponding training label map obtained in step S00, the difference between them is expressed by a loss function J, and the parameters of each HED model are adjusted to their optimal values by minimizing J, so that n HED models with different trained parameters are generated. Since the label data in fig. 3 is extremely asymmetric, the loss function J can be expressed, for example, by the following weighted cross-entropy equation (1):

J = -∑ᵢ [ p · yᵢ · log(ŷᵢ) + (1 - yᵢ) · log(1 - ŷᵢ) ]    (1)

where y is the label data in the training label map used in the final training, ŷ is the label data in the label map generated by the model, and p is the positive label weight. The label data is a matrix with values in {0, 1}; for example, the positions of the vertex coordinates are labeled 1 and the remaining positions are labeled 0.
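A minimal sketch of this weighted cross-entropy in numpy, following the stated definitions of y, ŷ and p (the epsilon clamp is an assumption added for numerical stability):

```python
import numpy as np

def weighted_cross_entropy(y, y_hat, p, eps=1e-7):
    """Loss J of equation (1); y in {0, 1}, y_hat in (0, 1), p > 1."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)        # avoid log(0)
    # positive (vertex) pixels are up-weighted by p to counter the
    # extreme label imbalance described above
    return -np.sum(p * y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```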
After the trained n HED models with different parameters are obtained, the scene picture requiring target picture extraction and rectification is processed: the scene picture is passed through each of the trained n HED models to generate n target label maps, for example the 5 target label maps (a)-(e) shown in fig. 4 (step S1). The purpose of generating the target label maps with n models simultaneously is to reduce the influence of errors and noise, where n ≥ 2 and the specific value of n can be set according to requirements and computing resources.
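A hedged sketch of this prediction step, assuming each trained model exposes a predict method returning a per-pixel probability map (the actual inference API depends on the framework used to train the HED models, and the 0.5 threshold is an assumption):

```python
import numpy as np

def predict_label_maps(models, scene_picture, threshold=0.5):
    """Step S1: one binary target label map per trained model."""
    label_maps = []
    for model in models:
        prob = model.predict(scene_picture)   # HxW corner probabilities
        label_maps.append((prob >= threshold).astype(np.uint8))
    return label_maps
```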
Next, the n (5 in fig. 4) target label maps are merged. Specifically, first, all points that are simultaneously predicted as corner vertices in the 5 target label maps are retained, giving a result I1 (step S21). Next, a matrix kernel of a certain size (e.g., size (2, 2)) is taken to perform erosion and dilation on I1 in sequence to eliminate noise, giving a result I2 (step S22). Then, to ensure that the dilation does not add new points, I1 and I2 are multiplied to obtain a comprehensive label map that covers the vertices of the four corners of the bank card picture, as shown in fig. 4(f) (step S23).
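A minimal sketch of steps S21-S23 with opencv, assuming the n label maps are binary {0, 1} arrays of equal size:

```python
import cv2
import numpy as np

def merge_label_maps(label_maps, kernel_size=(2, 2)):
    """Steps S21-S23: comprehensive label map from n target label maps."""
    i1 = label_maps[0].copy()
    for m in label_maps[1:]:
        i1 &= m              # keep points predicted as corners by all models
    kernel = np.ones(kernel_size, np.uint8)
    i2 = cv2.dilate(cv2.erode(i1, kernel), kernel)   # erosion, then dilation
    return i1 * i2           # multiplication: dilation adds no new points
```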
Then, the vertex coordinates of the four corners are acquired from the comprehensive label map using a clustering algorithm (step S3). Specifically, the coordinates C1 of all vertices determined as corners in the comprehensive label map shown in fig. 4(f) are obtained (step S31); the clustering algorithm is then applied to C1 to obtain the vertex coordinates C of the four corners (step S32). The clustering algorithm may be the kmeans algorithm; however, those skilled in the art will recognize that the clustering algorithm here is not limited thereto.
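A sketch of steps S31-S32 using opencv's built-in kmeans; the termination criteria and attempt count are illustrative assumptions:

```python
import cv2
import numpy as np

def cluster_vertices(comprehensive_label_map):
    """Steps S31-S32: four corner vertex coordinates C from the map."""
    ys, xs = np.nonzero(comprehensive_label_map)
    c1 = np.float32(np.column_stack([xs, ys]))   # C1: all corner-point coords
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.1)
    _, _, centers = cv2.kmeans(c1, 4, None, criteria, 10,
                               cv2.KMEANS_RANDOM_CENTERS)
    return centers                               # C: 4x2 array, random order
```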
Since the order of the vertex coordinates C of the four corners obtained through the above steps is generally random, that is, the correspondence of C to the lower left, upper left, upper right and lower right vertices of the bank card picture is not yet determined, the order of the four vertex coordinates C needs to be adjusted to establish this correspondence (step S4).
In one embodiment, the final coordinate order is denoted id = [id1, id2, id3, id4]. First, for each of the obtained four vertex coordinates, its horizontal and vertical coordinates are summed to obtain the four coordinate sums S = [S1, S2, S3, S4] corresponding to the four vertices (step S41). The coordinates corresponding to the minimum value and the maximum value in S are determined as the coordinates of the lower left corner vertex (corresponding to id1) and the upper right corner vertex (corresponding to id3) of the quadrangle, respectively, i.e., id1 = argmin(S) and id3 = argmax(S), where argmin denotes the index of the minimum value and argmax the index of the maximum value (step S42). Then, the remaining two coordinates in C are denoted Ca and Cb, with indices ida and idb. If the abscissa of Ca is smaller than that of Cb, then id2 = ida and id4 = idb; otherwise, id2 = idb and id4 = ida. That is, the coordinate with the smaller x component is id2, corresponding to the upper left corner vertex, and the coordinate with the larger x component is id4, corresponding to the lower right corner vertex (steps S43 and S44). The outline of the bank card is then drawn according to the correct correspondence, for example as shown in fig. 5.
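A sketch of this ordering step under the correspondence just described (minimum and maximum coordinate sum give id1 and id3; the smaller and larger abscissa of the remaining two give id2 and id4):

```python
import numpy as np

def order_vertices(centers):
    """Step S4: return the four vertices in the order id1, id2, id3, id4."""
    s = centers.sum(axis=1)                      # S = [S1, S2, S3, S4]
    id1, id3 = int(np.argmin(s)), int(np.argmax(s))
    rest = [i for i in range(4) if i not in (id1, id3)]
    id2, id4 = sorted(rest, key=lambda i: centers[i, 0])   # compare abscissas
    return centers[[id1, id2, id3, id4]]
```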
Through the above adjustment, the four vertex coordinates in the final order id are obtained. On the other hand, the target length and width of the extracted and corrected quadrangular target picture are known, i.e., the correction target vertex coordinates C0 of the four vertices, e.g., C0 = [[0, 0], [0, 107], [170, 107], [170, 0]], which correspond one-to-one to the lower left, upper left, upper right and lower right vertices of the corrected bank card picture. A perspective transformation operator M is determined from the mapping relation between the correction target vertex coordinates C0 and the four vertex coordinates obtained in the order id1, id2, id3, id4 (step S51). The perspective transformation operator M is then used to perform perspective transformation on the scene picture requiring bank card picture extraction and rectification, shown on the left of fig. 6, and the bank card picture is cut out, as shown on the right of fig. 6 (step S52).
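A sketch of steps S51-S52, reusing the example correction target coordinates C0 above; the output size is widened by one pixel as an assumption so that the corner (170, 107) lies inside the cropped picture:

```python
import cv2
import numpy as np

def extract_and_rectify(scene, ordered_vertices):
    """Steps S51-S52: perspective-transform the scene and crop the card."""
    c0 = np.float32([[0, 0], [0, 107], [170, 107], [170, 0]])
    m = cv2.getPerspectiveTransform(np.float32(ordered_vertices), c0)
    return cv2.warpPerspective(scene, m, (171, 108))
```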
Through the steps, the required quadrilateral target picture can be extracted and corrected from any scene picture.
Next, an apparatus for performing the extraction and rectification of the quadrangular target picture from the scene picture shown in fig. 1 will be described with reference to fig. 7. As shown in fig. 7, the apparatus includes 1 st to 5 th units. Although only 5 units are shown in fig. 7, the apparatus may further include other units, preferably, a unit for generating an input picture for training and a label picture for training using a random background picture and a quadrangular target picture for training, a unit for setting different parameters to create the plurality of neural network models, and a unit for training the plurality of neural network models using the input picture for training and the label picture for training. The neural network model may be an HED model, however, those skilled in the art should recognize that the neural network model herein is not limited to the HED model, and any neural network model with an edge detection function falls within the scope of the present invention.
The functions of the 1st to 5th units are explained in detail below.
The 1st unit is a unit for predicting the scene picture with the plurality of custom-trained neural network models to generate the plurality of target label maps.
The 2nd unit is a unit for obtaining, based on the plurality of target label maps, a comprehensive label map capable of covering the vertices of the four corners of the quadrangular target picture. Preferably, the 2nd unit includes: a 2A unit for retaining all points simultaneously predicted as corner vertices in the plurality of target label maps, a 2B unit for taking a matrix kernel of a certain size to perform erosion and dilation operations on all the points, and a 2C unit for obtaining the comprehensive label map based on the results of the 2A and 2B units.
The 3rd unit is a unit for obtaining the vertex coordinates of the four corners from the comprehensive label map using a clustering algorithm. Preferably, the 3rd unit includes: a 3A unit for obtaining the coordinates of all the vertices judged as corners in the comprehensive label map, and a 3B unit for applying the clustering algorithm to the coordinates to obtain the vertex coordinates of the four corners. The clustering algorithm may be the kmeans algorithm; however, those skilled in the art will recognize that the clustering algorithm here is not limited thereto.
The 4th unit is a unit for determining the correspondence between the acquired four vertex coordinates and the four corners of the quadrangle to adjust the order of the four vertex coordinates. Preferably, the 4th unit includes: a 4A unit for summing the horizontal and vertical coordinates of each of the acquired four vertex coordinates to obtain four coordinate sums corresponding to the four vertices; a 4B unit for determining the coordinates corresponding to the minimum value and the maximum value in the four coordinate sums as the coordinates of the lower left corner vertex and the upper right corner vertex of the quadrangle, respectively; a 4C unit for comparing the abscissas of the remaining two coordinates; a 4D unit for determining, according to the comparison result of the 4C unit, the correspondence between the two coordinates and the coordinates of the upper left corner vertex and the lower right corner vertex of the quadrangle; and a 4E unit for adjusting the order of the four vertex coordinates accordingly according to the results of the 4B unit and the 4D unit.
The 5th unit is a unit for performing perspective transformation on the scene picture based on the adjusted four vertex coordinates to extract and correct the quadrangular target picture. Preferably, the 5th unit includes: a 5A unit for determining a perspective transformation operator based on the adjusted four vertex coordinates and the predetermined correction target vertex coordinates, and a 5B unit for performing perspective transformation on the scene picture by using the perspective transformation operator to extract and correct the quadrangular target picture.
Although the description has been made mainly on embodiments of a method and apparatus for extracting and correcting a quadrangular target picture from a scene picture, the present invention is not limited to these embodiments, and may be implemented as follows: a computer device for executing the above method, or a computer program for realizing the functions of the above apparatus, or a computer-readable recording medium on which the computer program is recorded.
A computer device for performing the method for extracting and rectifying the quadrangular target picture from the scene picture shown in fig. 1 according to an embodiment of the present invention is shown in fig. 8. As shown in fig. 8, the computer device 200 includes a memory 201 and a processor 202. Although not shown, the computer device 200 also includes a computer program stored on the memory 201 and executable on the processor 202. The processor implements the following steps when executing the program: predicting the scene picture by using a plurality of customized trained neural network models to generate a plurality of target label maps (step S1); obtaining a comprehensive label map capable of covering the vertices of the four corners of the quadrangular target picture based on the plurality of target label maps (step S2); acquiring vertex coordinates of the four corners from the comprehensive label map using a clustering algorithm (step S3); determining the correspondence between the acquired four vertex coordinates and the four corners of the quadrangle to adjust the order of the four vertex coordinates (step S4); and performing perspective transformation on the scene picture based on the adjusted four vertex coordinates to extract and rectify the quadrangular target picture (step S5).
In addition to the above steps S1 to S5, the processor 202, when executing the program, further implements the steps of: generating a training input picture and a training label picture using the random background picture and the training quadrangular target picture (step S00); setting different parameters to build the plurality of neural network models (step S01); and training the plurality of neural network models using the training input picture and the training label graph (step S02).
It should be noted that the neural network model may be an HED model, but those skilled in the art should recognize that the neural network model herein is not limited to the HED model, and any neural network model with an edge detection function falls within the scope of the present invention.
Preferably, the step S2 includes: retaining all points simultaneously predicted as corner vertices in the plurality of target label maps (step S21); taking a matrix kernel of a certain size to perform erosion and dilation operations on all the points (step S22); and obtaining the comprehensive label map based on the results of the steps S21 and S22 (step S23).
Preferably, the step S3 includes: obtaining the coordinates of all vertices judged as corners in the comprehensive label map (step S31); and applying the clustering algorithm to the coordinates to obtain the vertex coordinates of the four corners (step S32).
It should be noted that the above-mentioned clustering algorithm may be a kmeans algorithm, but those skilled in the art should recognize that the clustering algorithm herein is not limited thereto.
Further, it is preferable that the step S4 includes: for each of the acquired four vertex coordinates, summing its horizontal and vertical coordinates to obtain four coordinate sums corresponding to the four vertices (step S41); determining the coordinates corresponding to the minimum value and the maximum value in the four coordinate sums as the coordinates of the lower left corner vertex and the upper right corner vertex of the quadrangle, respectively (step S42); comparing the abscissas of the remaining two coordinates (step S43); determining the correspondence between the two coordinates and the coordinates of the upper left corner vertex and the lower right corner vertex of the quadrangle according to the comparison result of the step S43 (step S44); and adjusting the order of the four vertex coordinates accordingly according to the results of the step S42 and the step S44 (step S45).
Preferably, the step S5 includes: determining a perspective transformation operator based on the adjusted four vertex coordinates and predetermined correction target vertex coordinates (step S51); and performing perspective transformation on the scene picture by using the perspective transformation operator to extract and rectify the quadrangular target picture (step S52).
In addition, as described above, the present invention can also be embodied as a recording medium in which a program for causing a computer to execute the method for extracting and correcting a quadrangular target picture from a scene picture shown in fig. 1 is stored.
As the recording medium, various types of recording media such as a disk (e.g., a magnetic disk, an optical disk, etc.), a card (e.g., a memory card, an optical card, etc.), a semiconductor memory (e.g., a ROM, a nonvolatile memory, etc.), a tape (e.g., a magnetic tape, a cassette tape, etc.), and the like can be used.
By recording a computer program that causes a computer to execute the method of extracting and correcting a quadrangular target picture from a scene picture or a computer program that causes a computer to realize the function of the apparatus of extracting and correcting a quadrangular target picture from a scene picture in the above-described embodiments in these recording media and circulating them, it is possible to reduce costs and improve portability and versatility.
The recording medium is loaded into a computer; the computer program recorded on the recording medium is read by the computer and stored in its memory; and a processor (a CPU: Central Processing Unit, or an MPU: Micro Processing Unit) included in the computer reads the computer program from the memory and executes it. In this way, the method of extracting and correcting a quadrangular target picture from a scene picture in the above embodiment can be executed, and the function of the apparatus of extracting and correcting a quadrangular target picture from a scene picture in the above embodiment can be realized.
It will be appreciated by persons skilled in the art that the present invention is not limited to the embodiments described above, but that the invention may be embodied in many other forms without departing from the spirit or scope of the invention. Accordingly, the present examples and embodiments are to be considered as illustrative and not restrictive, and various modifications and substitutions may be made thereto without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims (16)

1. A method for extracting and rectifying a quadrangular target picture from a scene picture, comprising:
step S1: predicting the scene picture using a plurality of custom-trained neural network models to generate a plurality of target label maps;
step S2: obtaining, based on the plurality of target label maps, a comprehensive label map capable of covering the vertices of the four corners of the quadrangular target picture;
step S3: acquiring vertex coordinates of the four corners from the comprehensive label map using a clustering algorithm;
step S4: determining the correspondence between the four acquired vertex coordinates and the four corners of the quadrangle to adjust the order of the four vertex coordinates; and
step S5: performing perspective transformation on the scene picture based on the adjusted four vertex coordinates to extract and correct the quadrangular target picture,
the method further comprising:
step S00: generating a training input picture and a training label map using a random background picture and a training quadrangular target picture;
step S01: setting different parameters to build the plurality of neural network models; and
step S02: training the plurality of neural network models using the training input picture and the training label map.
2. The method of claim 1, wherein the neural network model is an HED model.
3. The method according to claim 1, wherein the step S2 includes:
step S21: retaining all points simultaneously predicted as corner vertices in the plurality of target label maps;
step S22: taking a matrix kernel of a certain size to perform erosion and dilation operations on all the points; and
step S23: obtaining the comprehensive label map based on the results of the steps S21 and S22.
4. The method according to claim 1, wherein the step S3 includes:
step S31: obtaining the coordinates of all vertices judged as corners in the comprehensive label map; and
step S32: applying the clustering algorithm to the coordinates to obtain vertex coordinates of the four corners.
5. The method of claim 4, wherein the clustering algorithm is a kmeans algorithm.
6. The method according to claim 1, wherein the step S4 includes:
step S41: for each of the acquired four vertex coordinates, summing its horizontal and vertical coordinates to obtain four coordinate sums corresponding to the four vertices;
step S42: determining the coordinates corresponding to the minimum value and the maximum value in the four coordinate sums as the coordinates of the lower left corner vertex and the upper right corner vertex of the quadrangle, respectively;
step S43: comparing the abscissas of the remaining two coordinates;
step S44: determining the correspondence between the two coordinates and the coordinates of the upper left corner vertex and the lower right corner vertex of the quadrangle according to the comparison result of the step S43; and
step S45: adjusting the order of the four vertex coordinates accordingly according to the results of the step S42 and the step S44.
7. The method according to claim 1, wherein the step S5 includes:
step S51: determining a perspective transformation operator based on the adjusted four vertex coordinates and predetermined correction target vertex coordinates; and
step S52: performing perspective transformation on the scene picture using the perspective transformation operator to extract and correct the quadrangular target picture.
8. An apparatus for extracting and rectifying a quadrangular target picture from a scene picture, comprising:
a 1st unit for predicting the scene picture using a plurality of custom-trained neural network models to generate a plurality of target label maps;
a 2nd unit for obtaining, based on the plurality of target label maps, a comprehensive label map capable of covering the vertices of the four corners of the quadrangular target picture;
a 3rd unit for acquiring vertex coordinates of the four corners from the comprehensive label map using a clustering algorithm;
a 4th unit for determining the correspondence between the acquired four vertex coordinates and the four corners of the quadrangle to adjust the order of the four vertex coordinates; and
a 5th unit for performing perspective transformation on the scene picture based on the adjusted four vertex coordinates to extract and rectify the quadrangular target picture,
the apparatus further comprising:
means for generating a training input picture and a training label map using a random background picture and a training quadrangular target picture;
means for setting different parameters to build the plurality of neural network models; and
means for training the plurality of neural network models using the training input picture and the training label map.
9. The apparatus of claim 8, wherein the neural network model is an HED model.
10. The apparatus of claim 8, wherein the 2nd unit comprises:
a 2A unit for retaining all points simultaneously predicted as corner vertices in the plurality of target label maps;
a 2B unit for taking a matrix kernel of a certain size to perform erosion and dilation operations on all the points; and
a 2C unit for obtaining the comprehensive label map based on the results of the 2A and 2B units.
11. The apparatus of claim 8, wherein the 3rd unit comprises:
a 3A unit for obtaining the coordinates of all vertices judged as corners in the comprehensive label map; and
a 3B unit for applying the clustering algorithm to the coordinates to obtain vertex coordinates of the four corners.
12. The apparatus of claim 11, wherein the clustering algorithm is a kmeans algorithm.
13. The apparatus of claim 8, wherein the 4th unit comprises:
a 4A unit for summing the horizontal and vertical coordinates of each of the acquired four vertex coordinates to obtain four coordinate sums corresponding to the four vertices;
a 4B unit for determining the coordinates corresponding to the minimum value and the maximum value in the four coordinate sums as the coordinates of the lower left corner vertex and the upper right corner vertex of the quadrangle, respectively;
a 4C unit for comparing the abscissas of the remaining two coordinates;
a 4D unit for determining, according to the comparison result of the 4C unit, the correspondence between the two coordinates and the coordinates of the upper left corner vertex and the lower right corner vertex of the quadrangle; and
a 4E unit for adjusting the order of the four vertex coordinates accordingly according to the results of the 4B unit and the 4D unit.
14. The apparatus of claim 8, wherein the 5th unit comprises:
a 5A unit for determining a perspective transformation operator based on the adjusted four vertex coordinates and predetermined correction target vertex coordinates; and
a 5B unit for performing perspective transformation on the scene picture using the perspective transformation operator to extract and rectify the quadrangular target picture.
15. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the program is executed by the processor.
16. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a computer to implement the steps of the method according to any one of claims 1 to 7.
CN201711483213.4A 2017-12-29 2017-12-29 Target picture extraction and correction method and device, computer equipment and recording medium Active CN108320290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711483213.4A CN108320290B (en) 2017-12-29 2017-12-29 Target picture extraction and correction method and device, computer equipment and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711483213.4A CN108320290B (en) 2017-12-29 2017-12-29 Target picture extraction and correction method and device, computer equipment and recording medium

Publications (2)

Publication Number Publication Date
CN108320290A CN108320290A (en) 2018-07-24
CN108320290B 2021-10-22

Family

ID=62893505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711483213.4A Active CN108320290B (en) 2017-12-29 2017-12-29 Target picture extraction and correction method and device, computer equipment and recording medium

Country Status (1)

Country Link
CN (1) CN108320290B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325415A (en) * 2018-08-22 2019-02-12 吴昌议 A method of all target areas are predicted based on image column alignment feature
CN109389038A (en) 2018-09-04 2019-02-26 阿里巴巴集团控股有限公司 A kind of detection method of information, device and equipment
CN109464148B (en) * 2018-11-12 2021-09-14 深圳码隆科技有限公司 Device and system for measuring spinal curvature
CN109544496A (en) * 2018-11-19 2019-03-29 南京旷云科技有限公司 Generation method, the training method and device of object detection model of training data
CN109635743A (en) * 2018-12-13 2019-04-16 深源恒际科技有限公司 A kind of text detection deep learning method and system of combination STN module
CN110047061B (en) * 2019-04-26 2021-05-07 杭州智趣智能信息技术有限公司 Multi-angle multi-background image fusion method, device and medium
CN110414511B (en) * 2019-07-30 2022-05-03 深圳市普渡科技有限公司 Cooperative sign recognition method and system for robot
CN111080694A (en) * 2019-12-20 2020-04-28 上海眼控科技股份有限公司 Training and positioning method, device, equipment and storage medium of positioning model
CN113221897B (en) * 2020-02-06 2023-04-18 马上消费金融股份有限公司 Image correction method, image text recognition method, identity verification method and device
CN111353422B (en) * 2020-02-27 2023-08-22 维沃移动通信有限公司 Information extraction method and device and electronic equipment
CN111445566B (en) * 2020-03-27 2022-05-06 腾讯科技(深圳)有限公司 Information processing method and device and computer readable storage medium
CN114092690A (en) * 2020-08-06 2022-02-25 杭州睿琪软件有限公司 Object edge recognition and processing method, system and computer readable storage medium
CN112433662B (en) * 2020-11-26 2022-08-12 北京达佳互联信息技术有限公司 Image correction method, image correction device, electronic device and storage medium
CN112464852B (en) * 2020-12-09 2023-12-05 重庆大学 Vehicle driving license picture self-adaptive correction and identification method
CN113743492B (en) * 2021-08-30 2024-04-19 许继集团有限公司 Method and device for ordering positions of rows and columns of pressing plates

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005011070A (en) * 2003-06-19 2005-01-13 Victor Co Of Japan Ltd Image synthesis device
CN107169493A (en) * 2017-05-31 2017-09-15 北京小米移动软件有限公司 information identifying method and device
CN107506765A (en) * 2017-10-13 2017-12-22 厦门大学 A method of license plate slope correction based on neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10204299B2 (en) * 2015-11-04 2019-02-12 Nec Corporation Unsupervised matching in fine-grained datasets for single-view object reconstruction

Also Published As

Publication number Publication date
CN108320290A (en) 2018-07-24

Similar Documents

Publication Publication Date Title
CN108320290B (en) Target picture extraction and correction method and device, computer equipment and recording medium
US9947077B2 (en) Video object tracking in traffic monitoring
US8331619B2 (en) Image processing apparatus and image processing method
KR101997500B1 (en) Method and apparatus for generating personalized 3d face model
WO2018086607A1 (en) Target tracking method, electronic device, and storage medium
KR101333871B1 (en) Method and arrangement for multi-camera calibration
EP1693783B1 (en) Fast method of object detection by statistical template matching
US9978119B2 (en) Method for automatic facial impression transformation, recording medium and device for performing the method
US8331616B2 (en) Face image processing apparatus, face image processing method, and computer program
US11037325B2 (en) Information processing apparatus and method of controlling the same
KR102285376B1 (en) 3d face modeling method and 3d face modeling apparatus
JP5227629B2 (en) Object detection method, object detection apparatus, and object detection program
WO2014144408A2 (en) Systems, methods, and software for detecting an object in an image
US20140050392A1 (en) Method and apparatus for detecting and tracking lips
US20150154471A1 (en) Image processing device and method, and computer readable medium
US20180137630A1 (en) Image processing apparatus and method
CN103279936A (en) Human face fake photo automatic combining and modifying method based on portrayal
US20180357212A1 (en) Detecting occlusion of digital ink
CN110807459A (en) License plate correction method and device and readable storage medium
JP2009230704A (en) Object detection method, object detection device, and object detection program
WO2014205787A1 (en) Vehicle detecting method based on hybrid image template
KR101904480B1 (en) Object recognition system and method considering camera distortion
CN109690555B (en) Curvature-based face detector
CN115937003A (en) Image processing method, image processing device, terminal equipment and readable storage medium
WO2017179728A1 (en) Image recognition device, image recognition method, and image recognition program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant