CN113850805B - Multi-document detection method and device, electronic equipment and storage medium - Google Patents

Multi-document detection method and device, electronic equipment and storage medium

Info

Publication number
CN113850805B
CN113850805B (application CN202111433043.5A)
Authority
CN
China
Prior art keywords
point
corner
corner point
image
prediction
Prior art date
Legal status
Active
Application number
CN202111433043.5A
Other languages
Chinese (zh)
Other versions
CN113850805A (en)
Inventor
张子浩
Current Assignee
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd
Priority to CN202111433043.5A
Publication of CN113850805A
Application granted
Publication of CN113850805B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20112 Image segmentation details
    • G06T 2207/20164 Salient point detection; Corner detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30176 Document

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a multi-document detection method, apparatus, electronic device and storage medium. The method comprises: acquiring an image to be detected that contains at least one document; inputting the image to be detected into a pre-trained center point and corner point detection model to obtain a center point prediction map and a corner point prediction map of the image to be detected; determining at least one center point from the center point prediction map and a plurality of corner points from the corner point prediction map; matching each of the at least one center point with the plurality of corner points to determine the target corner points associated with each center point; and cropping the image to be detected by perspective transformation based on the target corner points associated with each center point, to obtain a document image corresponding to each document in the at least one document.

Description

Multi-document detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a method and an apparatus for detecting multiple documents, an electronic device, and a storage medium.
Background
With the continuous development of computer technology and artificial intelligence, artificial intelligence has gradually been applied in education and teaching scenarios, such as photo-based question search, intelligent homework correction, and question entry.
In an educational scenario, when a student or a teacher photographs a test paper, homework, or the like, several document pages may be captured at the same time, so that one image contains multiple documents. However, existing document detection technology can only detect a single-page document in an image and does not support multi-page document detection, so the accuracy of correcting an image containing multiple documents is low.
Therefore, how to split multiple documents in an image is an urgent problem to be solved.
Disclosure of Invention
In order to solve the technical problem or at least partially solve the technical problem, embodiments of the present disclosure provide a multi-document detection method, apparatus, electronic device, and storage medium.
According to an aspect of the present disclosure, there is provided a multi-document detection method including:
acquiring an image to be detected, wherein the image to be detected comprises at least one document;
inputting the image to be detected into a pre-trained central point and corner point detection model to obtain a central point prediction image and a corner point prediction image of the image to be detected;
determining at least one center point from the center point prediction map, and determining a plurality of corner points from the corner point prediction map;
matching each central point of the at least one central point with the plurality of corner points to determine a target corner point associated with each central point;
and based on the target corner point associated with each central point, cutting the image to be detected by adopting perspective transformation to obtain a document image corresponding to each document in the at least one document.
According to another aspect of the present disclosure, there is provided a multi-document detecting apparatus including:
the image acquisition module is used for acquiring an image to be detected, wherein the image to be detected comprises at least one document;
the prediction image acquisition module is used for inputting the image to be detected into a pre-trained central point and corner point detection model so as to acquire a central point prediction image and a corner point prediction image of the image to be detected;
a determining module, configured to determine at least one center point from the center point prediction map, and determine a plurality of corner points from the corner point prediction map;
a matching module, configured to match each central point of the at least one central point with the plurality of corner points, so as to determine a target corner point associated with each central point;
and the cutting module is used for cutting the image to be detected by adopting perspective transformation based on the target corner point associated with each central point to obtain a document image corresponding to each document in the at least one document.
According to another aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing a program, wherein the program is stored in the memory,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the multiple document detection method according to the aforementioned aspect.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the multi-document detection method according to the foregoing one aspect.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is adapted to cause the computer to carry out the multiple document detection method according to the preceding aspect.
One or more technical solutions provided in the embodiments of the present disclosure can implement detection on at least one document included in an image, and implement splitting of multiple documents in the image.
Drawings
Further details, features and advantages of the disclosure are disclosed in the following description of exemplary embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 shows a flow diagram of a multiple document detection method according to an example embodiment of the present disclosure;
FIG. 2 illustrates a flow diagram of a multiple document detection method according to another exemplary embodiment of the present disclosure;
FIG. 3 shows a block diagram of a center point and corner point detection model according to an exemplary embodiment of the present disclosure;
FIG. 4 illustrates an example diagram of an image to be detected according to an example embodiment of the present disclosure;
FIG. 5A illustrates an example graph of a center point prediction graph according to an example embodiment of the present disclosure;
FIG. 5B illustrates an exemplary diagram of a corner prediction graph according to an exemplary embodiment of the present disclosure;
FIG. 6 illustrates a post-processing flow diagram according to an exemplary embodiment of the present disclosure;
FIG. 7 illustrates a cropping result diagram of a document image according to an exemplary embodiment of the present disclosure;
FIG. 8 shows a schematic block diagram of a multiple document detection apparatus according to an example embodiment of the present disclosure;
FIG. 9 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
A multi-document detection method, apparatus, electronic device, and storage medium provided by the present disclosure are described below with reference to the accompanying drawings.
Document detection technology plays an important role in education and teaching scenarios. Existing document detection technology supports only single-page document detection and cannot support multi-page document detection; for an image containing multiple documents, each document in the image cannot be accurately identified, so the accuracy of correcting an image containing multiple documents is low.
In addition, conventional document detection technology only supports detecting an axis-aligned or rotated rectangular box. After a document page is photographed, however, the document may appear in the image as an irregular shape such as a trapezoid or another non-rectangular quadrilateral, which conventional document detection technology cannot detect accurately.
To solve these problems, the present disclosure provides a multi-document detection method. The specific scheme is as follows: the center point and corner points of each document in an image to be detected are obtained, and each document in the image is cropped out according to the center point and corner points. Center point detection accurately yields the number of documents contained in the image, corner point detection yields the corner coordinates of each document in the image, and an image corresponding to each document is then obtained, thereby achieving the separation of multiple documents in the image.
Fig. 1 shows a flowchart of a multi-document detection method according to an exemplary embodiment of the present disclosure, which may be performed by a multi-document detection apparatus, wherein the apparatus may be implemented in software and/or hardware, and may be generally integrated in an electronic device, which may be, but is not limited to, a smartphone, a tablet computer, a laptop computer, a server, a wearable device, and the like. As shown in fig. 1, the multi-document detection method includes:
step 101, obtaining an image to be detected, wherein the image to be detected comprises at least one document.
The image to be detected is an image which needs to be subjected to document detection.
For example, in a scenario where a student performs self-checking on a homework completed by the student, the image to be detected may be an image obtained after the student photographs at least one document page through a camera of the electronic device.
For example, in a scene where the teacher corrects the paper homework or test paper handed over by the student, the image to be detected may be an image obtained after the teacher photographs the paper homework or test paper handed over by the student through a camera of the electronic device.
For example, in a scene in which a student submits a finished paper job in the form of a photo, the image to be detected may be an image obtained from a storage space of the electronic device, and the image in the storage space of the electronic device is uploaded to the electronic device and stored in the storage space by the electronic device after the student photographs at least one document page of the paper job.
And 102, inputting the image to be detected into a pre-trained central point and corner point detection model to obtain a central point prediction image and a corner point prediction image of the image to be detected.
The center point and corner point detection model is obtained by pre-training, and the specific training process will be described in detail in the following embodiments, which are not repeated herein. And obtaining a central point prediction graph and a corner point prediction graph of each document contained in the image by using the trained central point and corner point detection model.
In the embodiment of the disclosure, for the acquired image to be detected, the image to be detected may be input into a pre-trained center point and corner point detection model, and a center point prediction graph and a corner point prediction graph corresponding to the image to be detected are output by the center point and corner point detection model.
The central point prediction graph includes prediction information of a central point of at least one document in the image to be detected, the corner point prediction graph includes prediction information of corner points of at least one document in the image to be detected, and the prediction information may include, but is not limited to, a prediction score of each feature point in the central point prediction graph and the corner point prediction graph, a coordinate value of each feature point, and the like.
It can be understood that the central point prediction graph includes a plurality of feature points, and the feature point with a higher prediction score may be the central point; the corner prediction map also contains a plurality of feature points, and the feature points with higher prediction scores may be corner points.
Step 103, determining at least one center point from the center point prediction map, and determining a plurality of corner points from the corner point prediction map.
In the embodiment of the present disclosure, after the center point prediction image and the corner point prediction image of the image to be detected are obtained, at least one center point may be determined from the center point prediction image, and a plurality of corner points may be determined from the corner point prediction image.
For example, according to the prediction score of each feature point in the center point prediction graph, the feature point with the largest prediction score is selected from the feature points as the determined center point, and according to the prediction score of each feature point in the corner point prediction graph, the feature point with the prediction score larger than the preset value is selected from the feature points as the corner point.
And 104, matching each central point of the at least one central point with the plurality of corner points to determine a target corner point associated with each central point.
In the embodiment of the present disclosure, for each determined center point, the center point may be matched with the determined plurality of corner points, and a target corner point associated with the center point is determined.
For example, for each center point, a distance from each corner point to the center point may be calculated, and four corner points closest to the center point are selected from the plurality of corner points as target corner points associated with the center point.
Further, in order to improve the accuracy of the determined target corner point, in an optional embodiment of the present disclosure, the determined target corner point may also be verified. It can be understood that the determined target corner points should be located in different directions of the central point, and if at least two target corner points of the plurality of target corner points associated with a certain central point are located in the same direction (for example, lower left) of the central point, the document image cannot be accurately cropped by using the determined target corner points, in which case, the target corner point associated with the central point may be determined again.
And 105, based on the target corner point associated with each central point, cutting the image to be detected by adopting perspective transformation to obtain a document image corresponding to each document in the at least one document.
In the embodiment of the disclosure, after the target corner point associated with each central point is determined, based on the determined target corner point, a document image corresponding to each document in at least one document can be cut out from the image to be detected by adopting perspective transformation, so as to obtain at least one document image, wherein the number of the document images is consistent with the number of the documents contained in the image to be detected.
It should be noted that image cropping using perspective transformation is a mature technique in image processing technology, and this disclosure does not describe this in detail.
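As an illustration only, the perspective-transform cropping of step 105 could be sketched as follows with OpenCV; the function name, the assumed corner ordering, and the way the output size is estimated are illustrative assumptions and not taken from the patent.

```python
import cv2
import numpy as np

def crop_document(image, corners):
    """Crop one document from the image given its four target corner points.

    `corners` is assumed to be ordered as [upper-left, upper-right,
    lower-right, lower-left], each an (x, y) pair in pixel coordinates.
    """
    src = np.array(corners, dtype=np.float32)

    # Estimate an output size from the quadrilateral's edge lengths.
    width = int(max(np.linalg.norm(src[0] - src[1]), np.linalg.norm(src[3] - src[2])))
    height = int(max(np.linalg.norm(src[0] - src[3]), np.linalg.norm(src[1] - src[2])))
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]], dtype=np.float32)

    # The perspective transform maps the (possibly trapezoidal) document
    # quadrilateral to an axis-aligned rectangle, i.e. the document image.
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (width, height))
```

Calling this once per center point, with that center point's associated target corner points, yields one document image per document in the image to be detected.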
The multi-document detection method of the embodiment of the disclosure obtains an image to be detected containing at least one document, inputs the image to be detected into a pre-trained center point and corner point detection model to obtain a center point prediction map and a corner point prediction map of the image to be detected, then determines at least one center point from the center point prediction map and a plurality of corner points from the corner point prediction map, matches each of the at least one center point with the plurality of corner points to determine the target corner points associated with each center point, and then crops the image to be detected by perspective transformation based on the target corner points associated with each center point, obtaining a document image corresponding to each document in the at least one document. With this technical scheme, the number of documents contained in the image to be detected can be accurately obtained through center point detection, the corner coordinates of each document in the image to be detected can be obtained through corner point detection, and a document image corresponding to each document is then obtained, so that the multiple documents in the image to be detected are split, which helps improve the accuracy of correcting an image containing multiple documents.
In an optional implementation manner of the present disclosure, the determining at least one center point from the center point prediction graph may include:
according to the first prediction score of each first feature point in the central point prediction graph, acquiring the first feature point with the first prediction score larger than a first score threshold value as a candidate central point;
and merging the candidate central points of which the distance between the candidate central points is smaller than a first distance threshold value to obtain at least one central point in the central point prediction graph.
The first score threshold may be preset, for example, the first score threshold may be set to 0.3, 0.5, and the like, which is not limited by the present disclosure.
In the embodiment of the present disclosure, for the center point prediction graph and the corner point prediction graph obtained from the center point and the corner point detection model, for convenience of distinguishing, each feature point in the center point prediction graph is referred to as a first feature point, a prediction score corresponding to each first feature point is referred to as a first prediction score, correspondingly, each feature point in the corner point prediction graph is referred to as a second feature point, and a prediction score corresponding to each second feature point is referred to as a second prediction score.
Furthermore, when the center point is determined, the first prediction score corresponding to each first feature point in the center point prediction graph may be compared with a preset first score threshold, and the first feature points with the first prediction scores larger than the first score threshold are screened out from all the first feature points as candidate center points. Then, performing Non-Maximum Suppression (NMS) processing on the determined candidate center points, calculating distances between the candidate center points, and performing merging processing on the candidate center points with the distances smaller than a preset first distance threshold to obtain at least one center point in the center point prediction graph. It can be understood that the obtained number of the at least one center point reflects the number of the documents contained in the image to be detected.
The first distance threshold may be preset, and the first distance threshold may refer to the number of first feature points spaced between two candidate center points, for example, the first distance threshold may be set to 5, and if the number of first feature points spaced between two candidate center points is less than 5, the two candidate center points are merged.
For example, when merging the candidate center points, mean merging may be performed on the coordinates of the candidate center points to be merged, that is, the average of the abscissas and the average of the ordinates of the candidate center points to be merged are calculated, and a new point with those averaged coordinates is taken as the center point.
In the embodiment of the disclosure, the first feature points with the first prediction scores larger than the first score threshold are obtained as candidate center points according to the first prediction scores of each first feature point in the center point prediction graph, and the candidate center points with the distances between the candidate center points smaller than the first distance threshold are merged to obtain at least one center point in the center point prediction graph, so that the filtering and merging processing of the points in the center point prediction graph is realized to determine the at least one center point, and the number of pages of the document contained in the image can be determined.
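Purely as an illustration of the score filtering and mean merging just described, a minimal Python sketch follows; the threshold values, the greedy merging order, and the use of Euclidean pixel distance as the "distance between points" are assumptions of the example, not requirements of the patent.

```python
import numpy as np

def merge_points(points, scores, score_thresh=0.3, dist_thresh=5):
    """Filter feature points by prediction score, then mean-merge points
    closer than `dist_thresh` (an NMS-like grouping), returning one
    representative point per group."""
    candidates = [p for p, s in zip(points, scores) if s > score_thresh]
    merged = []
    for p in candidates:
        for group in merged:
            # Compare against the current mean of the group.
            if np.linalg.norm(np.mean(group, axis=0) - np.array(p)) < dist_thresh:
                group.append(p)
                break
        else:
            merged.append([p])
    # Each resulting point is the coordinate-wise mean of its group.
    return [tuple(np.mean(group, axis=0)) for group in merged]
```

Applied to the center point prediction map, the number of returned points corresponds to the number of documents; the same routine can be applied to each corner point channel with the second score threshold.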
Typically, the corner points are distributed in different directions of the center point, including above left, below left, above right and below right of the center point, and accordingly, the corner points include upper left, lower left, upper right and lower right corner points. Accordingly, in an optional embodiment of the present disclosure, the corner point prediction map includes an upper left corner point prediction map, a lower left corner point prediction map, an upper right corner point prediction map, and a lower right corner point prediction map, and accordingly, the determining a plurality of corner points from the corner point prediction map may include:
according to a second prediction score of each second feature point in the corner prediction image, acquiring second feature points with the second prediction scores larger than a second score threshold value as candidate corners;
and combining the candidate corner points with the distances between the candidate corner points smaller than a second distance threshold value to obtain a plurality of corner points in the corner point prediction graph, wherein the plurality of corner points comprise at least one upper left corner point in the upper left corner point prediction graph, at least one lower left corner point in the lower left corner point prediction graph, at least one upper right corner point in the upper right corner point prediction graph and at least one lower right corner point in the lower right corner point prediction graph.
The second score threshold may be preset, for example, the second score threshold may be set to 0.3, 0.5, and the like, which is not limited by the present disclosure.
In the embodiment of the present disclosure, when determining the corner points, for each corner point prediction graph (i.e., each of the upper left corner point prediction graph, the lower left corner point prediction graph, the upper right corner point prediction graph, and the lower right corner point prediction graph), the corner point of the corresponding type in each corner point prediction graph may be determined, that is, the upper left corner point is determined from the upper left corner point prediction graph, the lower left corner point is determined from the lower left corner point prediction graph, the upper right corner point is determined from the upper right corner point prediction graph, and the lower right corner point is determined from the lower right corner point prediction graph. And when the corner points of the corresponding type in each corner point prediction image are determined, comparing a second prediction score corresponding to each second feature point in the corner point prediction image with a preset second score threshold value, and screening out second feature points with the second prediction scores larger than the second score threshold value from all the second feature points as candidate corner points. And then, for the determined candidate corner points, calculating the distance between the candidate corner points, and combining the candidate corner points of which the distance is smaller than a preset second distance threshold value to obtain at least one corner point in each corner point prediction image.
The second distance threshold may be preset, and the second distance threshold may refer to the number of second feature points spaced between two candidate corner points, for example, the second distance threshold may be set to 5, and if the number of second feature points spaced between two candidate corner points is less than 5, the two candidate corner points are combined.
Taking the determination of at least one upper left corner point from the upper left corner point prediction image as an example, comparing the second prediction score of each second feature point in the upper left corner point prediction image with a preset second score threshold, and screening out second feature points with second prediction scores larger than the second score threshold from all the second feature points in the upper left corner point prediction image as candidate upper left corner points. And then, NMS processing is carried out on the candidate upper left corner points, the distance between the candidate upper left corner points is calculated, and the candidate upper left corner points of which the distance is smaller than a preset second distance threshold value are combined to obtain at least one upper left corner point in the upper left corner point prediction graph. Similarly, at least one lower left corner point in the lower left corner point prediction map, at least one upper right corner point in the upper right corner point prediction map, and at least one lower right corner point in the lower right corner point prediction map can be obtained by using the similar processing method.
For example, in the embodiment of the present disclosure, when merging the candidate corner points, mean merging may be performed on the coordinates of the candidate corner points to be merged, that is, the average of the abscissas and the average of the ordinates of the candidate corner points to be merged are calculated, and a new point with those averaged coordinates is taken as the corner point, thereby obtaining the corner point data in each corner point prediction map.
In the embodiment of the disclosure, according to the second prediction score of each second feature point in the corner prediction graph, the second feature point of which the second prediction score is greater than the second score threshold is obtained as the candidate corner point, and the candidate corner points of which the distance between the candidate corner points is smaller than the second distance threshold are merged to obtain the multiple corner points in the corner prediction graph.
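Continuing the sketch above (and reusing the assumed `merge_points` helper), the four corner channels could be processed independently, one set of corner points per channel; the dictionary layout and variable names are assumptions of the example.

```python
# corner_maps is assumed to be a dict of the four predicted channels,
# each an (H, W) NumPy array of second prediction scores.
corner_points = {}
for name in ("upper_left", "lower_left", "upper_right", "lower_right"):
    score_map = corner_maps[name]
    ys, xs = np.nonzero(score_map > 0.3)          # second score threshold
    points = list(zip(xs.tolist(), ys.tolist()))   # (x, y) coordinates
    scores = score_map[ys, xs].tolist()
    corner_points[name] = merge_points(points, scores,
                                       score_thresh=0.3, dist_thresh=5)
```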
In an optional implementation manner of the present disclosure, the matching each central point of the at least one central point with the plurality of corner points to determine a target corner point associated with each central point may include:
calculating a distance between each of the at least one center point and each of the plurality of corner points;
and determining a target corner associated with each central point from the plurality of corner points according to the distance between each central point and each corner point, wherein the target corner comprises a target upper left corner, a target lower left corner, a target upper right corner and a target lower right corner.
In the embodiment of the present disclosure, for each determined center point, a distance between the center point and each corner point may be calculated, and then according to the distance between each center point and each corner point, four corner points with the smallest distance from the center point are determined from the plurality of corner points, and are used as a target upper left corner point, a target lower left corner point, a target upper right corner point, and a target lower right corner point associated with the center point.
It can be understood that, when the image to be detected contains a plurality of documents, if the plurality of documents are not overlapped, the distance between the center point of each document and each corner point of the same document is the minimum, therefore, by calculating the distance between each center point and each corner point, the four corner points with the minimum distance are determined as the target corner points associated with the center point, and the determined target corner points are the corner points which belong to the same document with the center point.
In an optional implementation manner of the present disclosure, each center point may be respectively matched with each corner point of a corresponding type in each corner point prediction graph, and a target corner point of a corresponding type is determined from each corner point prediction graph. Thus, the determining, from the plurality of corner points, a target corner point associated with each central point according to the distance between each central point and each corner point may include:
determining an upper left corner point closest to each central point from the at least one upper left corner point according to the distance between each central point and the at least one upper left corner point, and using the upper left corner point as a target upper left corner point associated with each central point;
determining a lower left corner point closest to each central point from the at least one lower left corner point according to the distance between each central point and the at least one lower left corner point, and using the lower left corner point as a target lower left corner point associated with each central point;
determining an upper right corner point closest to each central point from the at least one upper right corner point according to the distance between each central point and the at least one upper right corner point, and using the upper right corner point as a target upper right corner point associated with each central point; and
and determining a lower right corner point closest to each central point from the at least one lower right corner point according to the distance between each central point and the at least one lower right corner point, and using the lower right corner point as a target lower right corner point associated with each central point.
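A minimal sketch of the per-channel nearest-corner matching just described; the data layout (a dict of candidate corner lists keyed by channel name, as in the earlier sketch) is an assumption of the example.

```python
import numpy as np

def match_corners(center, corner_points):
    """For one center point, pick from each corner channel the corner point
    closest to that center, giving the four target corner points."""
    target = {}
    for name, candidates in corner_points.items():
        dists = [np.linalg.norm(np.subtract(corner, center)) for corner in candidates]
        target[name] = candidates[int(np.argmin(dists))]
    return target  # e.g. {"upper_left": (x, y), ..., "lower_right": (x, y)}
```

When several candidates in a channel sit at the same minimum distance, the tie would be broken using the corner's position relative to the center point, as the following paragraphs describe.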
Further, in an optional implementation manner of the present disclosure, in a case that there are a plurality of top left corner points closest to each central point, according to a position of the top left corner point closest to each central point relative to each central point, a target top left corner point associated with each central point is determined from the plurality of top left corner points closest to each central point.
For example, assuming that the upper-left corner point prediction map includes an upper-left corner point a and an upper-left corner point B, and the distance between the center point O and the upper-left corner point a is the same as the distance between the center point O and the upper-left corner point B, where the upper-left corner point a is on the left side of the center point O, and the upper-left corner point B is on the right side of the center point O, the upper-left corner point a may be determined as the target upper-left corner point associated with the center point O according to the prior knowledge of the upper-left corner point on the left side of the center point.
Similarly, when there are a plurality of lower left corner points closest to each central point, determining a target lower left corner point associated with each central point from the plurality of lower left corner points closest to each central point according to the position of the lower left corner point closest to each central point relative to each central point; under the condition that the number of the upper right corner points closest to each central point is multiple, determining a target upper right corner point associated with each central point from the upper right corner points closest to each central point according to the position of the upper right corner point closest to each central point relative to each central point; and under the condition that the lower right corner point closest to each central point is multiple, determining a target lower right corner point associated with each central point from the lower right corner points closest to each central point according to the position of the lower right corner point closest to each central point relative to each central point.
It can be understood that the position relationship between the corner points and the center point in each direction can be determined according to the coordinates of the corner points and the coordinates of the center point. For example, if the upper left corner of the image to be detected is the origin of coordinates, the horizontal direction is the horizontal coordinate, and the vertical direction is the vertical coordinate, if the horizontal coordinate of a certain corner point is greater than the horizontal coordinate of the center point, and the vertical coordinate of the corner point is less than the vertical coordinate of the center point, the position of the corner point above the right side of the center point can be determined.
In the embodiment of the disclosure, when the number of the upper left corner points closest to each center point is multiple, the target upper left corner point associated with each center point is determined from the multiple upper left corner points closest to each center point according to the position of the upper left corner point closest to each center point relative to each center point, so that when the associated target upper left corner point cannot be uniquely determined according to the minimum distance, the target upper left corner point can be finally determined according to the position relationship of each upper left corner point relative to the center point, and the feasibility of the scheme is improved.
Further, in order to improve the accuracy of the determined target corner point, in an optional embodiment of the present disclosure, the determined target corner point may also be corrected. Specifically, for a certain central point, if the abscissa or ordinate of a certain target corner point in the target corner points determined in the manner of minimizing the distance from the central point is different from the abscissa or ordinate of the target corner point on the same side, the coordinates of the target corner point on the same side may be used to correct the coordinates of the target corner point, so as to improve the accuracy of document splitting.
For example, let the determined coordinates of the target corner points associated with a certain center point be: target upper left corner point (x_min, y_min), target upper right corner point (x_max, y_min), target lower left corner point (x_min, y) and target lower right corner point (x_max, y_max), where y_min < y < y_max. It can be seen that the lower left corner point and the lower right corner point are not on the same horizontal line, and if the image were cropped in this way, the cropped document image might be incomplete. In this case, the coordinates of the target lower left corner point may be corrected according to the coordinates of the target lower right corner point, i.e. corrected to (x_min, y_max).
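For instance, the correction in the example above amounts to aligning the two bottom corner points on the same horizontal line; a tiny sketch with illustrative coordinates (the concrete numbers are assumptions) follows.

```python
# Illustrative coordinates: the two bottom corner points are not on the
# same horizontal line.
lower_left, lower_right = (40, 500), (600, 520)

# Correct the lower left corner point with the lower right corner point's
# ordinate so that both bottom corners share a horizontal line.
if lower_left[1] != lower_right[1]:
    lower_left = (lower_left[0], lower_right[1])  # -> (40, 520)
```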
Fig. 2 is a flowchart illustrating a multi-document detection method according to another exemplary embodiment of the disclosure, and as shown in fig. 2, a center point and corner point detection model in the embodiment of the disclosure may be obtained by training:
step 201, a training sample set is obtained, where the training sample set includes a sample image and center point labeling data and corner point labeling data of a document in the sample image.
In the embodiment of the disclosure, a plurality of sample images including at least one document may be obtained from pictures published on the internet or by offline collection, the obtained sample images include but are not limited to educational scene documents, the obtained sample images are labeled, center point labeling data and corner point labeling data corresponding to each document in the sample images are labeled, and a plurality of labeled sample images are obtained to form a training sample set.
Illustratively, when labeling the sample image, the center point of each document in the sample image is selected and assigned as 1, and the remaining points are assigned as 0, and the corner points of each document in the sample image are selected and assigned as 1, and the remaining points are assigned as 0. It can be understood that the center point labeling data and the corner point labeling data include not only the assignment of each point, but also the coordinates corresponding to each point.
Optionally, because different sample images differ in size, in order to facilitate model training, the collected sample images may be resized to a uniform size, for example 320 × 320 × 3.
Step 202, inputting the sample image into a model to be trained for feature extraction, and obtaining an output feature map corresponding to the sample image.
For example, a lightweight backbone network (backbone) may be used to perform feature extraction on an input sample image, so as to obtain an output feature map corresponding to the sample image.
And 203, performing convolution processing on the output characteristic graph by using a convolution kernel of the central point prediction network in the model to be trained to obtain a central point prediction graph.
And 204, performing convolution processing on the output characteristic graph by using a plurality of convolution kernels of the corner prediction network in the model to be trained to obtain a corner prediction graph.
In the embodiment of the disclosure, the model to be trained may include two branches, a central point prediction network and a corner point prediction network, where the central point prediction network includes a convolution kernel for obtaining a central point prediction graph according to the output feature graph prediction, and the corner point prediction network includes a plurality of convolution kernels for obtaining a corner point prediction graph according to the output feature graph prediction.
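As a rough, non-authoritative sketch of this two-branch structure (the backbone is left abstract, and the channel count, 1×1 convolutions, and sigmoid activation are assumptions of the example; the patent only specifies a lightweight backbone, one convolution kernel for the center point branch and several for the corner branch):

```python
import torch
import torch.nn as nn

class CenterCornerNet(nn.Module):
    """Shared backbone feature map with two prediction branches: one channel
    for the center point map and four channels for the corner point maps
    (upper left, lower left, upper right, lower right)."""
    def __init__(self, backbone, feat_channels=64):
        super().__init__()
        self.backbone = backbone                    # lightweight feature extractor
        self.center_head = nn.Conv2d(feat_channels, 1, kernel_size=1)
        self.corner_head = nn.Conv2d(feat_channels, 4, kernel_size=1)

    def forward(self, image):
        feat = self.backbone(image)                        # e.g. (N, 64, 80, 80)
        center_map = torch.sigmoid(self.center_head(feat))   # (N, 1, 80, 80)
        corner_maps = torch.sigmoid(self.corner_head(feat))  # (N, 4, 80, 80)
        return center_map, corner_maps
```

Here the four output channels of the corner branch play the role of the several corner-prediction convolution kernels mentioned above.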
Step 205, updating the network parameters of the model to be trained according to the difference between the feature point in the center point prediction graph and the center point labeling data and the difference between the feature point in the corner point prediction graph and the corner point labeling data until the loss function value of the model to be trained is less than or equal to a preset value, and obtaining the center point and corner point detection model.
The preset value may be preset, for example, the preset value is set to 0.01, 0.001, and the like.
It can be understood that training the model is an iterative process: the network parameters of the model are adjusted continuously until the overall loss function value of the model is smaller than a preset value, or until the loss no longer changes or changes only slightly, at which point the model has converged and the trained model is obtained.
In the embodiment of the present disclosure, after the center point prediction map and the corner point prediction map are obtained, the network parameters of the model to be trained may be updated according to the difference between the feature point in the center point prediction map and the center point labeling data and the difference between the feature point in the corner point prediction map and the corner point labeling data until the loss function value of the model to be trained is less than or equal to the preset value, so as to obtain the trained center point and corner point detection model.
For example, in the embodiment of the present disclosure, the center point prediction network and the corner point prediction network of the model to be trained may be trained separately, and the two network branches may adopt the same loss function, for example a Dice loss function (denoted as L_dice), as shown in equation (1).
L_dice = 1 − 2|X ∩ Y| / (|X| + |Y|)          (1)
Wherein X represents the set of feature points in a prediction map (the center point prediction map or a corner point prediction map), Y represents the annotation data (center point annotation data or corner point annotation data), |X| represents the number of elements in X, |Y| represents the number of elements in Y, and |X ∩ Y| represents the number of elements in the intersection of X and Y.
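A sketch of the Dice loss in equation (1) for dense prediction maps; treating soft prediction scores as set membership and adding a small smoothing constant to avoid division by zero are common implementation choices assumed here, not details given in the patent.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Dice loss between a predicted score map and a binary label map,
    following L_dice = 1 - 2|X ∩ Y| / (|X| + |Y|)."""
    pred = pred.reshape(-1)
    target = target.reshape(-1)
    intersection = (pred * target).sum()        # soft |X ∩ Y|
    return 1.0 - 2.0 * intersection / (pred.sum() + target.sum() + eps)
```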
In the embodiment of the disclosure, in each iterative training process, a loss function value of a central point prediction network and a corner point prediction network in a model to be trained can be calculated according to a difference between a feature point and central point labeling data in a central point prediction graph and a difference between the feature point and the corner point labeling data in the corner point prediction graph, the calculated loss function value is compared with a preset value, if the loss function value is greater than the preset value, a network parameter of the model to be trained is updated, the central point prediction graph and the corner point prediction graph are re-acquired based on the model to be trained after the network parameter is updated, the loss function value of the model to be trained is re-calculated according to the newly acquired central point prediction graph and the newly acquired corner point prediction graph, and iteration is performed until the loss function value is less than the preset value, and a trained central point and corner point detection model is obtained.
In the embodiment of the disclosure, the center point and corner point detection model is obtained through pre-training, and conditions are provided for subsequently splitting at least one document in the image according to the center point and the corner points.
Fig. 3 is a structural diagram of a center point and corner point detection model according to an exemplary embodiment of the present disclosure, and a center point prediction map and a corner point prediction map of an image may be obtained by using the model shown in fig. 3. As shown in fig. 3, the center point prediction map output by the center point and corner point detection model is 80 × 80 × 1, and the corner point prediction map output by the model is 80 × 80 × 4, that is, the corner point prediction map has 4 channels, namely the upper left corner point prediction map, the upper right corner point prediction map, the lower left corner point prediction map, and the lower right corner point prediction map. Fig. 4 shows an example of an image to be detected according to an exemplary embodiment of the present disclosure; as shown in fig. 4, the image to be detected contains two documents. The image to be detected shown in fig. 4 is resized to 320 × 320 × 3 and then input into the center point and corner point detection model shown in fig. 3. The backbone network of the model performs feature extraction on the image to be detected to obtain an output feature map, and the output feature map is then convolved twice: one convolution kernel is used to convolve the output feature map to obtain the 80 × 80 × 1 center point prediction map, as shown in fig. 5A; four convolution kernels are used to convolve the output feature map to obtain the 80 × 80 × 4 corner point prediction maps, as shown in fig. 5B. Then, the center point prediction map and corner point prediction maps output by the model are post-processed to complete the splitting of the documents in the image to be detected and obtain at least one document image.
Fig. 6 is a schematic diagram of a post-processing flow according to an exemplary embodiment of the disclosure. As shown in fig. 6, for the obtained center point prediction map, the points whose prediction scores are greater than a first score threshold (for example, 0.3 in fig. 6) may be taken from the map and NMS processing performed on them to obtain the coordinates of n center points, where n is the number of documents contained in the image to be detected. For each obtained corner point prediction map, the points whose prediction scores are greater than a second score threshold (for example, also 0.3 in fig. 6) are taken from the map and NMS processing performed on them to obtain the corner points of each channel. During NMS processing, points that are close to each other (that is, whose mutual distance is smaller than a preset distance threshold; in fig. 6 both the first and the second distance thresholds are set to 5 as an example) are mean-merged, and the resulting new point is taken as a center point or corner point. The center points are then matched with the corner points: for each center point, one corner point is selected from the corner point prediction map of each channel, giving an upper left, upper right, lower left and lower right corner point, so that the coordinates of the 4 corner points corresponding to each document are obtained. The image is cropped by perspective transformation according to the 4 corner points corresponding to each document to obtain the document image corresponding to each document; the cropping result is shown in fig. 7. With the scheme of the present disclosure, the number of documents contained in the image can be accurately obtained through center point detection, the 4 corner coordinates of each document can be obtained through corner point detection, and the image corresponding to each document is then cropped out, thereby achieving the splitting of multiple documents in the image.
The disclosed exemplary embodiment also provides a multi-document detection device. Fig. 8 shows a schematic block diagram of a multiple document detecting apparatus according to an exemplary embodiment of the present disclosure, and as shown in fig. 8, the multiple document detecting apparatus 80 includes: an image acquisition module 801, a prediction map acquisition module 802, a determination module 803, a matching module 804, and a cropping module 805.
The image acquisition module 801 is configured to acquire an image to be detected, where the image to be detected includes at least one document;
a prediction image obtaining module 802, configured to input the image to be detected to a pre-trained center point and corner point detection model, so as to obtain a center point prediction image and a corner point prediction image of the image to be detected;
a determining module 803, configured to determine at least one center point from the center point prediction map, and determine a plurality of corner points from the corner point prediction map;
a matching module 804, configured to match each central point of the at least one central point with the plurality of corner points, so as to determine a target corner point associated with each central point;
and a cropping module 805, configured to crop the image to be detected by using perspective transformation based on the target corner point associated with each center point, so as to obtain a document image corresponding to each document in the at least one document.
Optionally, the determining module 803 may be further configured to:
according to the first prediction score of each first feature point in the central point prediction graph, acquiring the first feature point with the first prediction score larger than a first score threshold value as a candidate central point;
and merging the candidate central points of which the distance between the candidate central points is smaller than a first distance threshold value to obtain at least one central point in the central point prediction graph.
Optionally, the corner point prediction graph includes an upper left corner point prediction graph, a lower left corner point prediction graph, an upper right corner point prediction graph, and a lower right corner point prediction graph; the determining module 803 may be further configured to:
according to a second prediction score of each second feature point in the corner prediction image, acquiring second feature points with the second prediction scores larger than a second score threshold value as candidate corners;
and combining the candidate corner points with the distances between the candidate corner points smaller than a second distance threshold value to obtain a plurality of corner points in the corner point prediction graph, wherein the plurality of corner points comprise at least one upper left corner point in the upper left corner point prediction graph, at least one lower left corner point in the lower left corner point prediction graph, at least one upper right corner point in the upper right corner point prediction graph and at least one lower right corner point in the lower right corner point prediction graph.
Optionally, the matching module 804 may include:
a distance calculation unit for calculating a distance between each of the at least one center point and each of the plurality of corner points;
and the matching unit is used for determining a target corner associated with each central point from the plurality of corner points according to the distance between each central point and each corner point, wherein the target corner comprises a target upper left corner, a target lower left corner, a target upper right corner and a target lower right corner.
Optionally, the matching unit may be further configured to:
determining an upper left corner point closest to each central point from the at least one upper left corner point according to the distance between each central point and the at least one upper left corner point, and using the upper left corner point as a target upper left corner point associated with each central point;
determining a lower left corner point closest to each central point from the at least one lower left corner point according to the distance between each central point and the at least one lower left corner point, and using the lower left corner point as a target lower left corner point associated with each central point;
determining an upper right corner point closest to each central point from the at least one upper right corner point according to the distance between each central point and the at least one upper right corner point, and using the upper right corner point as a target upper right corner point associated with each central point; and
and determining a lower right corner point closest to each central point from the at least one lower right corner point according to the distance between each central point and the at least one lower right corner point, and using the lower right corner point as a target lower right corner point associated with each central point.
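A minimal sketch of this matching step follows, assuming Euclidean distance (the disclosure only speaks of "distance") and the corners_by_type structure from the previous sketch. The four matched corners would then feed the perspective-transform cropping sketched earlier.

```python
import numpy as np

def match_corners(center, corners_by_type):
    """For one center point, pick the nearest corner candidate of each of the
    four corner types (top_left, bottom_left, top_right, bottom_right)."""
    cx, cy = center
    matched = {}
    for corner_type, candidates in corners_by_type.items():
        distances = [np.hypot(x - cx, y - cy) for x, y in candidates]
        matched[corner_type] = candidates[int(np.argmin(distances))]
    return matched
```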
Optionally, the apparatus further comprises:
and a screening module, configured to, in a case where a plurality of upper left corner points are equally closest to a central point, determine the target upper left corner point associated with that central point from the plurality of closest upper left corner points according to their positions relative to the central point.
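One possible reading of this position-based screening is sketched below for the upper left corner type only; treating "above and to the left of the center" as smaller x and smaller y in image coordinates is an assumption of this sketch, not the disclosure's stated rule.

```python
def screen_top_left(center, tied_corners):
    """Break a tie between equally close top-left candidates by preferring one
    that actually lies above and to the left of the center point."""
    cx, cy = center
    plausible = [(x, y) for x, y in tied_corners if x < cx and y < cy]
    # Fall back to the first tied candidate if none satisfies the position check.
    return plausible[0] if plausible else tied_corners[0]
```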
Optionally, the apparatus further comprises: a model training module; the model training module is configured to:
acquiring a training sample set, wherein the training sample set comprises a sample image and central point marking data and corner point marking data of a document in the sample image;
inputting the sample image into a model to be trained for feature extraction, and acquiring an output feature map corresponding to the sample image;
performing convolution processing on the output characteristic graph by using a convolution kernel of a central point prediction network in the model to be trained to obtain a central point prediction graph;
performing convolution processing on the output characteristic graph by utilizing a plurality of convolution kernels of the corner point prediction network in the model to be trained to obtain a corner point prediction graph;
and updating the network parameters of the model to be trained according to the difference between the feature point in the central point prediction image and the central point marking data and the difference between the feature point in the corner point prediction image and the corner point marking data until the loss function value of the model to be trained is less than or equal to a preset value, so as to obtain the central point and corner point detection model.
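For illustration, a toy PyTorch version of the two prediction heads and a single training step follows; the backbone features are faked with random tensors, and the 1x1 convolutions, sigmoid activations, and MSE loss are editorial choices rather than the disclosure's exact network structure or loss function.

```python
import torch
import torch.nn as nn

class CenterCornerHeads(nn.Module):
    """Toy prediction heads on top of a backbone feature map: one convolution kernel
    for the center point heatmap and four for the corner point heatmaps."""

    def __init__(self, in_channels=64):
        super().__init__()
        self.center_head = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.corner_head = nn.Conv2d(in_channels, 4, kernel_size=1)  # tl, bl, tr, br

    def forward(self, feature_map):
        center_pred = torch.sigmoid(self.center_head(feature_map))
        corner_pred = torch.sigmoid(self.corner_head(feature_map))
        return center_pred, corner_pred

# One illustrative training step against labeled heatmaps.
heads = CenterCornerHeads(in_channels=64)
optimizer = torch.optim.Adam(heads.parameters(), lr=1e-3)
criterion = nn.MSELoss()

features = torch.randn(2, 64, 128, 128)   # stand-in backbone output feature map
center_gt = torch.rand(2, 1, 128, 128)    # stand-in center point labeling data
corner_gt = torch.rand(2, 4, 128, 128)    # stand-in corner point labeling data

center_pred, corner_pred = heads(features)
loss = criterion(center_pred, center_gt) + criterion(corner_pred, corner_gt)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```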
The multi-document detection apparatus provided by the embodiments of the present disclosure can execute any multi-document detection method of the present disclosure that is applicable to an electronic device such as a server, and has the functional modules and beneficial effects corresponding to the executed method. For details not described in the apparatus embodiments of the present disclosure, reference may be made to the description of any method embodiment of the present disclosure.
An exemplary embodiment of the present disclosure also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor and storing a program. The program includes instructions that, when executed by the at least one processor, cause the processor to perform a multi-document detection method according to an embodiment of the present disclosure.
The disclosed exemplary embodiments also provide a non-transitory computer readable storage medium storing computer instructions, wherein the computer instructions, when executed by a processor of a computer, are configured to cause the computer to perform a multi-document detection method according to an embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a multiple document detection method according to an embodiment of the present disclosure.
Referring to fig. 9, a block diagram of an electronic device 1100 will now be described. The electronic device 1100 may be a server or a client of the present disclosure, and is an example of a hardware device that may be applied to aspects of the present disclosure. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106, an output unit 1107, a storage unit 1108, and a communication unit 1109. The input unit 1106 may be any type of device capable of inputting information to the electronic device 1100; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 1107 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 1108 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth (TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 1101 can be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the respective methods and processes described above. For example, in some embodiments, the multi-document detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1100 via the ROM 1102 and/or the communication unit 1109. In some embodiments, the computing unit 1101 may be configured to perform the multi-document detection method by any other suitable means (e.g., by means of firmware).
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (10)

1. A method of multiple document detection, the method comprising:
acquiring an image to be detected, wherein the image to be detected comprises at least one document;
inputting the image to be detected into a pre-trained central point and corner point detection model to obtain a central point prediction image and a corner point prediction image of the image to be detected;
determining at least one center point from the center point prediction map, and determining a plurality of corner points from the corner point prediction map;
matching each central point of the at least one central point with the plurality of corner points to determine a target corner point associated with each central point, wherein four corner points with the smallest distance from the central point are determined as the target corner points associated with the central point by calculating the distance between each central point and each corner point;
and based on the target corner point associated with each central point, cutting the image to be detected by adopting perspective transformation to obtain a document image corresponding to each document in the at least one document.
2. The method of claim 1, wherein the determining at least one center point from the center point prediction graph comprises:
according to the first prediction score of each first feature point in the central point prediction graph, acquiring the first feature point with the first prediction score larger than a first score threshold value as a candidate central point;
and merging the candidate central points of which the distance between the candidate central points is smaller than a first distance threshold value to obtain at least one central point in the central point prediction graph.
3. The multi-document detection method according to claim 1, wherein the corner point prediction graph includes an upper left corner point prediction graph, a lower left corner point prediction graph, an upper right corner point prediction graph, and a lower right corner point prediction graph;
correspondingly, the determining a plurality of corner points from the corner point prediction map includes:
according to a second prediction score of each second feature point in the corner prediction image, acquiring second feature points with the second prediction scores larger than a second score threshold value as candidate corners;
and combining the candidate corner points with the distances between the candidate corner points smaller than a second distance threshold value to obtain a plurality of corner points in the corner point prediction graph, wherein the plurality of corner points comprise at least one upper left corner point in the upper left corner point prediction graph, at least one lower left corner point in the lower left corner point prediction graph, at least one upper right corner point in the upper right corner point prediction graph and at least one lower right corner point in the lower right corner point prediction graph.
4. The multi-document detection method according to claim 3, wherein the target corner points include a target upper left corner point, a target lower left corner point, a target upper right corner point, and a target lower right corner point.
5. The multi-document detection method according to claim 4, wherein the matching each of the at least one center point with the plurality of corner points to determine a target corner point associated with the each center point comprises:
determining an upper left corner point closest to each central point from the at least one upper left corner point according to the distance between each central point and the at least one upper left corner point, and using the upper left corner point as a target upper left corner point associated with each central point;
determining a lower left corner point closest to each central point from the at least one lower left corner point according to the distance between each central point and the at least one lower left corner point, and using the lower left corner point as a target lower left corner point associated with each central point;
determining an upper right corner point closest to each central point from the at least one upper right corner point according to the distance between each central point and the at least one upper right corner point, and using the upper right corner point as a target upper right corner point associated with each central point; and
and determining a lower right corner point closest to each central point from the at least one lower right corner point according to the distance between each central point and the at least one lower right corner point, and using the lower right corner point as a target lower right corner point associated with each central point.
6. The multiple document detection method according to claim 5, wherein the method further comprises:
and under the condition that a plurality of upper left corner points are nearest to each central point, determining a target upper left corner point associated with each central point from the plurality of upper left corner points nearest to each central point according to the position of the upper left corner point nearest to each central point relative to each central point.
7. The method of any of claims 1-6, wherein the center point and corner point detection model is trained by:
acquiring a training sample set, wherein the training sample set comprises a sample image and central point marking data and corner point marking data of a document in the sample image;
inputting the sample image into a model to be trained for feature extraction, and acquiring an output feature map corresponding to the sample image;
performing convolution processing on the output characteristic graph by using a convolution kernel of a central point prediction network in the model to be trained to obtain a central point prediction graph;
performing convolution processing on the output characteristic graph by utilizing a plurality of convolution kernels of the corner point prediction network in the model to be trained to obtain a corner point prediction graph;
and updating the network parameters of the model to be trained according to the difference between the feature point in the central point prediction image and the central point marking data and the difference between the feature point in the corner point prediction image and the corner point marking data until the loss function value of the model to be trained is less than or equal to a preset value, so as to obtain the central point and corner point detection model.
8. A multiple document detection apparatus comprising:
the image acquisition module is used for acquiring an image to be detected, wherein the image to be detected comprises at least one document;
the prediction image acquisition module is used for inputting the image to be detected into a pre-trained central point and corner point detection model so as to acquire a central point prediction image and a corner point prediction image of the image to be detected;
a determining module, configured to determine at least one center point from the center point prediction map, and determine a plurality of corner points from the corner point prediction map;
a matching module, configured to match each center point of the at least one center point with the plurality of corner points to determine a target corner point associated with each center point, where four corner points having a smallest distance from the center point are determined as target corner points associated with the center point by calculating a distance between each center point and each corner point;
and the cutting module is used for cutting the image to be detected by adopting perspective transformation based on the target corner point associated with each central point to obtain a document image corresponding to each document in the at least one document.
9. An electronic device, comprising:
a processor; and
a memory for storing a program, wherein the program is stored in the memory,
wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the multiple document detection method according to any one of claims 1-7.
10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the multi-document detection method according to any one of claims 1-7.
CN202111433043.5A 2021-11-29 2021-11-29 Multi-document detection method and device, electronic equipment and storage medium Active CN113850805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111433043.5A CN113850805B (en) 2021-11-29 2021-11-29 Multi-document detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111433043.5A CN113850805B (en) 2021-11-29 2021-11-29 Multi-document detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113850805A CN113850805A (en) 2021-12-28
CN113850805B true CN113850805B (en) 2022-02-22

Family

ID=78982224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111433043.5A Active CN113850805B (en) 2021-11-29 2021-11-29 Multi-document detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113850805B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871844A (en) * 2019-01-09 2019-06-11 东南大学 A kind of correction of shooting receipt image text and extracting method
CN111145305A (en) * 2019-12-04 2020-05-12 宁波华高信息科技有限公司 Document image processing method
CN111523553A (en) * 2020-04-03 2020-08-11 中国计量大学 Central point network multi-target detection method based on similarity matrix
CN112132054A (en) * 2020-09-24 2020-12-25 重庆邮电大学 Document positioning and segmenting method based on deep learning
US20210256680A1 (en) * 2020-02-14 2021-08-19 Huawei Technologies Co., Ltd. Target Detection Method, Training Method, Electronic Device, and Computer-Readable Medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313083B (en) * 2021-07-28 2021-12-03 北京世纪好未来教育科技有限公司 Text detection method and device


Also Published As

Publication number Publication date
CN113850805A (en) 2021-12-28

Similar Documents

Publication Publication Date Title
US11551377B2 (en) Eye gaze tracking using neural networks
US11164323B2 (en) Method for obtaining image tracking points and device and storage medium thereof
KR102399508B1 (en) Layout analysis method, reading assisting device, circuit and medium
US10317777B2 (en) Automatic zooming method and apparatus
CN111582085A (en) Document shooting image identification method and device
EP3998576A2 (en) Image stitching method and apparatus, device, and medium
EP3182365B1 (en) Writing board detection and correction
CN109598185B (en) Image recognition translation method, device and equipment and readable storage medium
CN113313083B (en) Text detection method and device
CN110827301B (en) Method and apparatus for processing image
CN111612004A (en) Image clipping method and device based on semantic content
CN113781356A (en) Training method of image denoising model, image denoising method, device and equipment
CN113850238B (en) Document detection method and device, electronic equipment and storage medium
CN113516697B (en) Image registration method, device, electronic equipment and computer readable storage medium
CN113850805B (en) Multi-document detection method and device, electronic equipment and storage medium
CN113850239B (en) Multi-document detection method and device, electronic equipment and storage medium
CN113255629B (en) Document processing method and device, electronic equipment and computer readable storage medium
CN108304840B (en) Image data processing method and device
CN115273057A (en) Text recognition method and device, dictation correction method and device and electronic equipment
CN113011409A (en) Image identification method and device, electronic equipment and storage medium
CN115063822A (en) Document detection method and device, electronic equipment and storage medium
CN113657311B (en) Identification region ordering method, identification region ordering system, electronic equipment and storage medium
CN113326815B (en) Document processing method and device, electronic equipment and readable storage medium
CN116912849A (en) Document recovery method and device, electronic equipment and storage medium
CN116777734A (en) Method, device, equipment and storage medium for generating background penetration image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant