WO2024078304A1 - A document detection and correction method and terminal
- Publication number
- WO2024078304A1 (PCT/CN2023/120852)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document
- page
- image
- correction
- user interface
- Prior art date
Classifications
- G—PHYSICS; G06—COMPUTING; G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; G06V30/00—Character recognition; recognising digital ink; document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/43—Editing text-bitmaps, e.g. alignment, spacing; semantic analysis of bitmaps of text without OCR
- G06V30/148—Segmentation of character regions
- G06V30/16—Image preprocessing
- G06V30/1607—Correcting image deformation, e.g. trapezoidal deformation caused by perspective
- G06V30/19—Recognition using electronic means
- G06V30/19147—Obtaining sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
Definitions
- the present application relates to the field of computer vision technology, and in particular to a document detection and correction method and terminal.
- in the prior art, a semi-automatic single- and double-page correction solution is built on a traditional algorithm: single and double pages are distinguished by judging whether the width of the image is greater than its height and whether a sufficiently long center-seam straight line can be found in the middle part of the image.
- this heuristic cannot distinguish single and double pages when the document has no obvious center line. Therefore, in the prior art, for document types such as bent invoices, open books, test papers and exercise books, only one page can be detected, which loses the other page's document information and seriously degrades the user experience.
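The prior-art heuristic above (wider than tall, plus a long dark "center seam" near the middle) can be sketched as follows; the darkness and seam-length thresholds are illustrative assumptions, not values from the patent:

```python
import numpy as np

def is_double_page(gray: np.ndarray, seam_ratio: float = 0.6) -> bool:
    """Prior-art heuristic: treat the image as a double page only if it is
    wider than tall AND a sufficiently long dark gutter column exists near
    the horizontal middle. Thresholds here are illustrative."""
    h, w = gray.shape
    if w <= h:
        return False
    # Look for a dark vertical seam in the middle fifth of the image.
    mid = gray[:, int(w * 0.4):int(w * 0.6)]
    dark = mid < 64                  # pixels dark enough to be the gutter
    col_runs = dark.sum(axis=0)      # dark-pixel count per column
    return bool(col_runs.max() >= seam_ratio * h)
```

As the text notes, this check fails for documents without a visible gutter, which motivates the learned classification model described below.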
- the embodiment of the present application provides a document detection and correction method and terminal, which are used to detect and correct various document types such as single-page documents or multi-page documents.
- an embodiment of the present application provides a document detection and correction method, which can be executed by a terminal. The method includes: in response to a first operation on a first application, displaying a scan preview interface, which displays a previewed first image including a first document; receiving a second operation; in response to the second operation, acquiring a second image including the first document; inputting the second image into a trained classification model and a trained segmentation model respectively, to obtain a classification result of the first document output by the classification model and an edge segmentation map of the first document output by the segmentation model; correcting the first document according to the classification result and the edge segmentation map to obtain a first target document image; and displaying a first user interface that displays the first target document image.
- the document type of the first document contained in the second image can be accurately detected, such as a single-page document or a multi-page document.
- edge detection can be performed on the first document to obtain an edge segmentation map. Then, combining the classification results and the edge segmentation map, document images of various different document types can be corrected.
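As a sketch of how an edge segmentation map can be turned into corner points for correction, the four page corners can be read off a binary mask with a simple extreme-point rule. This is a common lightweight post-processing choice, not a step the patent prescribes:

```python
import numpy as np

def quad_corners(mask: np.ndarray) -> np.ndarray:
    """Estimate the four page corners of a binary segmentation mask:
    the top-left/bottom-right corners minimize/maximize x+y, and the
    top-right/bottom-left corners maximize/minimize x-y."""
    ys, xs = np.nonzero(mask)
    s, d = xs + ys, xs - ys
    tl = np.array([xs[s.argmin()], ys[s.argmin()]])  # top-left: min(x+y)
    tr = np.array([xs[d.argmax()], ys[d.argmax()]])  # top-right: max(x-y)
    br = np.array([xs[s.argmax()], ys[s.argmax()]])  # bottom-right: max(x+y)
    bl = np.array([xs[d.argmin()], ys[d.argmin()]])  # bottom-left: min(x-y)
    return np.stack([tl, tr, br, bl])
```

For a roughly rectangular page mask this yields corners in a fixed order, ready to feed a perspective warp; heavily curved pages need the de-distortion step discussed later.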
- the method further includes: in response to the third operation, displaying a second user interface, the second user interface displaying a second image and corner points of the first document, the corner points of the first document being used to correct the first document; receiving a fourth operation on at least one of the corner points of the first document; and in response to the fourth operation, adjusting the position of at least one corner point.
- the position of the corner point of the first document can be adjusted through user operation.
- after the position of at least one corner point is adjusted in response to the fourth operation, the first document may be corrected, in response to a fifth operation, according to the classification result of the first document and the adjusted corner-point coordinates, to obtain a second target document image; and a third user interface is displayed for displaying the second target document image.
- the first document is corrected based on the classification result of the first document and the coordinates of the corner points after the position is adjusted, and the accuracy of the correction can be improved.
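Before correcting from user-adjusted corner coordinates, the four points must be put into a consistent order so the warp gets the right correspondences. A standard ordering rule (the rule itself is an assumption, not from the patent) is:

```python
import numpy as np

def order_corners(pts) -> np.ndarray:
    """Sort four corner points into TL, TR, BR, BL order using the
    sum and difference of coordinates, so a perspective transform
    receives a consistent point correspondence."""
    pts = np.asarray(pts, float)
    s = pts.sum(axis=1)           # x + y: small at TL, large at BR
    d = pts[:, 0] - pts[:, 1]     # x - y: large at TR, small at BL
    return np.stack([pts[s.argmin()], pts[d.argmax()],
                     pts[s.argmax()], pts[d.argmin()]])
```

This makes the correction robust to the order in which the user drags the corner handles.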
- the method further includes: displaying a fourth user interface, the fourth user interface being used to display the second image and the corner points of the first document.
- the method further includes: receiving a sixth operation on at least one of the corner points of the first document; adjusting the position of the at least one corner point in response to the sixth operation; in response to a seventh operation, correcting the first document to obtain a third target document image according to the classification result of the first document and the coordinates of the corner points after the position adjustment; and displaying a fifth user interface for displaying the third target document image.
- the first document is corrected based on the classification result of the first document and the coordinates of the corner points after the position adjustment, which can improve the accuracy of the correction.
- the classification result of the first document includes at least one document type and the probability of each document type; the at least one document type includes at least one of the following: a single-page document type with a flat page, a single-page document type with curved pages, and double-page document types in which the left and right pages are each either flat or curved.
- the type of the first document can be determined according to the classification result, which is helpful for the subsequent selection of an appropriate correction algorithm for correction and obtaining a better correction effect.
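Selecting the type from the classifier's per-type probabilities can be sketched as an argmax with a confidence fallback; the 0.5 threshold, the type names, and the fallback choice are illustrative assumptions:

```python
from typing import Dict

def select_doc_type(probs: Dict[str, float], min_conf: float = 0.5) -> str:
    """Pick the most probable document type from the classifier output;
    fall back to the generic single-page path when no class is confident."""
    best = max(probs, key=probs.get)
    return best if probs[best] >= min_conf else "single_flat"
```

The selected type then determines which correction algorithm the terminal applies, as described next.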
- the first document is corrected according to the classification result and edge segmentation map of the first document to obtain a first target document image, including: determining the type of the first document according to the classification result and edge segmentation map of the first document; determining a document correction algorithm corresponding to the first document according to the type of the first document; and inputting the second image into the document correction algorithm corresponding to the first document to obtain the first target document image.
- the second image is input into the document correction algorithm corresponding to the first document to obtain the first target document image, which can be achieved by any of the following methods 1 to 6:
- Method 1: if the type of the first document is a single-page document type with a flat page, the second image is input into a general document correction algorithm for correction to obtain the first target document image. For example, when capturing an image of a single-page document with a flat page, the shooting angle may not be perpendicular to the page, so the first document in the captured second image has a flat page but tilted content. Method 1 corrects the first document so that the corrected first target document image looks as if it had been shot perpendicular to the plane of the page, improving the user's viewing experience of the target document image.
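The general document correction for a flat, tilted page is a perspective (homography) warp. A minimal sketch of solving the 3x3 transform from four corner correspondences with plain linear algebra (fixing h33 = 1; the corner values in the test are made up):

```python
import numpy as np

def homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve the 3x3 perspective transform mapping 4 source corners to
    4 destination corners via an 8x8 linear system (h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H: np.ndarray, p) -> np.ndarray:
    """Apply the homography to one point (homogeneous divide)."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return np.array([x / w, y / w])
```

In practice each output pixel of the target document image is mapped back through the inverse transform and sampled from the second image.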
- Method 2: if the type of the first document is a single-page document type with curved pages, the second image is input into the first document correction algorithm for correction to obtain a single-page document image, and the single-page document image is then input into a de-distortion correction algorithm to obtain the first target document image. For example, in the captured second image, the first document is a single-page document whose edges are curved and whose text lines are not straight.
- the document edges in the first target document image obtained after correction can be straight lines, and the text is in a straight line, which can improve the user's perception of the target document image.
- Method 3: if the type of the first document is a double-page document type with both left and right pages flat, the left page of the first document in the second image is input into a general document correction algorithm for correction to obtain a first page image, the right page is input into the general document correction algorithm to obtain a second page image, and the first and second page images are merged to obtain the first target document image. For example, when capturing an image of a double-page document with flat pages, the shooting angle may not be perpendicular to the page, so the first document in the captured second image has flat pages but tilted content.
- Method 3 corrects the first document so that the corrected first target document image looks as if it had been shot perpendicular to the plane of the double-page document, improving the user's perception of the target document image.
- Method 4: if the type of the first document is a double-page document type with a flat left page and a curved right page, the left page of the first document in the second image is input into a general document correction algorithm to obtain a third page image, the right page is input into the general document correction algorithm to obtain a fourth page image, the fourth page image is input into a de-distortion correction algorithm to obtain a fifth page image, and the third and fifth page images are merged to obtain the first target document image. With Method 4, when the acquired second image contains a double-page document with a flat left page and a curved right page, the two pages are corrected separately and the corrected page images are then merged, which improves the user's perception of the double-page document.
- Method 5: if the type of the first document is a double-page document type with a curved left page and a flat right page, the left page of the first document in the second image is input into a general document correction algorithm to obtain a sixth page image, the sixth page image is input into a de-distortion correction algorithm to obtain a seventh page image, the right page is input into the general document correction algorithm to obtain an eighth page image, and the seventh and eighth page images are merged to obtain the first target document image. With Method 5, when the acquired second image contains a double-page document with a curved left page and a flat right page, the two pages are corrected separately and the corrected page images are then merged, which improves the user's perception of the double-page document.
- Method 6: if the type of the first document is a double-page document type with both left and right pages curved, the left page of the first document in the second image is input into a general document correction algorithm to obtain a ninth page image, the ninth page image is input into a de-distortion (dewarping) correction algorithm to obtain a tenth page image, the right page is input into the general document correction algorithm to obtain an eleventh page image, the eleventh page image is input into the de-distortion correction algorithm to obtain a twelfth page image, and the tenth and twelfth page images are merged to obtain the first target document image.
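Methods 1 to 6 compose three stages: a general (perspective) correction, a de-distortion step for curved pages, and a left/right merge. Their routing can be sketched with placeholder stage functions; the type names and string stand-ins are illustrative, and only the composition order comes from the text:

```python
def flatten(page):   # general document correction (perspective warp)
    return f"flat({page})"

def dewarp(page):    # de-distortion correction for curved pages
    return f"dewarp({page})"

def merge(left, right):
    return f"merge({left},{right})"

def correct(doc_type, left, right=None):
    """Route a detected document type to one of the six pipelines."""
    if doc_type == "single_flat":                       # method 1
        return flatten(left)
    if doc_type == "single_curved":                     # method 2
        return dewarp(flatten(left))
    l, r = flatten(left), flatten(right)                # methods 3-6: per page
    if doc_type == "double_flat_flat":                  # method 3
        return merge(l, r)
    if doc_type == "double_flat_curved":                # method 4
        return merge(l, dewarp(r))
    if doc_type == "double_curved_flat":                # method 5
        return merge(dewarp(l), r)
    return merge(dewarp(l), dewarp(r))                  # method 6
```

The dispatch makes explicit that de-distortion is applied only to pages the classifier marks as curved, and always after the perspective correction.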
- an embodiment of the present application further provides a device, which includes modules/units for executing the method of the first aspect and any possible design of the first aspect. These modules/units can be implemented by hardware, or by hardware executing corresponding software.
- an embodiment of the present application provides a terminal, including a processor and a memory.
- the memory is used to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the terminal can implement the method of the above first aspect and any possible design of the first aspect.
- a computer-readable storage medium is also provided in an embodiment of the present application; the computer-readable storage medium includes a computer program which, when run on a terminal, causes the terminal to execute the method of the above first aspect and any possible design of the first aspect.
- an embodiment of the present application further provides a computer program product which, when run on a terminal, enables the terminal to execute the method of the above first aspect and any possible design of the first aspect.
- FIG1 is a schematic diagram of a hardware structure of a terminal provided in an embodiment of the present application.
- FIG2 is a schematic diagram of a process of document detection provided by an embodiment of the present application.
- FIG3 is a schematic diagram of a process of training a classification module provided in an embodiment of the present application.
- FIG4 is a schematic diagram of document types provided in an embodiment of the present application.
- FIG5 is a schematic diagram of the process of a training segmentation module provided in an embodiment of the present application.
- FIG6 is a schematic diagram of corner points of a single-page document provided in an embodiment of the present application.
- FIG7 is a schematic diagram of corner points of a double-page document provided in an embodiment of the present application.
- FIG8 is a schematic diagram of corner points of a double-page document provided in an embodiment of the present application.
- FIG9 is a schematic diagram of a process of document detection provided by an embodiment of the present application.
- FIG10 is a schematic diagram of the structure of a classification module provided in an embodiment of the present application.
- FIG11 is a schematic diagram of the structure of a segmentation module provided in an embodiment of the present application.
- FIG12 is a schematic diagram of a document detection process in scenario 1 provided in an embodiment of the present application.
- FIG13 is a schematic diagram of a document detection process in scenario 1 provided in an embodiment of the present application.
- FIG14 is a schematic diagram of a document detection process in scenario 1 provided in an embodiment of the present application.
- FIG15 is a schematic diagram of a document detection process in scenario 2 provided in an embodiment of the present application.
- FIG16 is a schematic diagram of a document detection process in scenario 3 provided in an embodiment of the present application.
- FIG17 is a schematic diagram of a document detection process in scenario 4 provided in an embodiment of the present application.
- FIG18 is a schematic diagram of a document detection process in scenario 5 provided in an embodiment of the present application.
- FIG19 is a schematic diagram of a document detection process in scenario 6 provided in an embodiment of the present application.
- FIG20 is a schematic diagram of the hardware structure of the terminal provided in an embodiment of the present application.
- "at least one" in the embodiments of the present application means one or more, where "more than one" means two or more.
- the word "exemplary" is used to indicate an example, illustration or description. Any embodiment or implementation described as "exemplary" in the present application should not be interpreted as more preferred or advantageous than other embodiments or implementations. Rather, the word "exemplary" is intended to present concepts in a concrete way.
- a terminal can be, for example, a vehicle or an electronic device that can be located on a vehicle, or a mobile phone, a tablet computer, a notebook computer, or a wearable device with wireless communication function (such as a smart watch or smart glasses, etc.).
- the terminal includes a device capable of performing data processing functions (such as a processor, or an application processor, or an image processor, or other processors), and a device capable of displaying a user interface (such as a display screen).
- exemplary embodiments of the terminal include, but are not limited to, devices equipped with the HarmonyOS (Hongmeng) operating system or devices with other operating systems.
- the above-mentioned terminal may also be other portable devices, such as a laptop computer with a touch-sensitive surface (e.g., a touch panel). It should also be understood that in some other embodiments of the present application, the above-mentioned terminal may not be a portable device, but a desktop computer with a touch-sensitive surface (e.g., a touch panel).
- the structure of the terminal is further described below in conjunction with FIG. 1 .
- the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc.
- the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
- the processor 110 may include one or more processing units, for example: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc.
- different processing units can be independent devices or integrated in one or more processors.
- the solution provided by the embodiment of the present application can be controlled by the processor 110 or call other components to complete.
- the controller can be the nerve center and command center of the terminal 100.
- the controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of fetching and executing instructions.
- the terminal 100 can realize the display function through the GPU, the display screen 194, and the application processor.
- the display screen 194 is used to display images, videos, etc.
- the main interface of the terminal screen, the lock screen interface, the minus-one screen interface (the leftmost home-screen panel), the user interface of the communication application provided by the system, or the user interfaces of other third-party applications, etc., are displayed on the display screen 194, and the service cards described in the embodiments of the present application are displayed on these display interfaces.
- the GPU is a microprocessor for image processing, connected to the display screen 194 and the application processor.
- the GPU is used to perform mathematical and geometric calculations for graphics rendering.
- the processor 110 may include one or more GPUs, which execute program instructions to generate or change display information. For example, the GPU performs graphics rendering based on card information, data, etc., and generates cards to be displayed.
- the display screen 194 includes a display panel.
- the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
- the terminal 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
- the wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
- Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals.
- each antenna in the terminal 100 can be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
- antenna 1 can be multiplexed as a diversity antenna for a wireless local area network.
- the antenna may be used in conjunction with a tuning switch.
- the mobile communication module 150 can provide solutions for wireless communications including 2G/3G/4G/5G applied on the terminal 100.
- the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), etc.
- the mobile communication module 150 can receive electromagnetic waves via the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
- the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation via the antenna 1.
- at least some of the functional modules of the mobile communication module 150 can be set in the processor 110.
- at least some of the functional modules of the mobile communication module 150 can be set in the same device as at least some of the modules of the processor 110.
- the modem processor may include a modulator and a demodulator.
- the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
- the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
- the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
- the application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display screen 194.
- the modem processor may be an independent device.
- the modem processor may be independent of the processor 110 and be set in the same device as the mobile communication module 150 or other functional modules.
- the wireless communication module 160 can provide wireless communication solutions including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) network), bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) and the like applied on the terminal 100.
- the wireless communication module 160 can be one or more devices integrating at least one communication processing module.
- the wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the frequency of the electromagnetic wave signal and performs filtering, and sends the processed signal to the processor 110.
- the wireless communication module 160 can also receive the signal to be sent from the processor 110, modulate the frequency of the signal, amplify the signal, and convert it into electromagnetic waves for radiation through the antenna 2.
- the charging management module 140 is used to receive charging input from the charger.
- the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
- the power management module 141 receives input from the battery 142 and/or the charging management module 140, and provides power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
- the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle number, battery health status (leakage, impedance), etc.
- the power management module 141 can also be set in the processor 110.
- the power management module 141 and the charging management module 140 can also be set in the same device.
- the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100.
- the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and videos can be stored in the external memory card.
- the internal memory 121 can be used to store computer executable program codes, which include instructions.
- the processor 110 executes various functional applications and data processing of the terminal 100 by running the instructions stored in the internal memory 121.
- the internal memory 121 may include a program storage area and a data storage area.
- the program storage area can store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.), etc.
- the data storage area can store data created during the use of the terminal 100 (such as audio data, a phone book, etc.), etc.
- the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (UFS), etc.
- the touch sensor 180K is also called a "touch panel".
- the touch sensor 180K can be arranged on the display screen 194.
- the touch sensor 180K and the display screen 194 form a touch screen, also called a "touchscreen".
- the touch sensor 180K is used to detect touch operations acting on or near it.
- the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
- Visual output related to the touch operation can be provided through the display screen 194.
- the touch sensor 180K can also be arranged on the surface of the terminal 100, at a position different from that of the display screen 194.
- the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the terminal 100.
- the terminal 100 may include more or fewer components than shown in the figure, or combine certain components, or split certain components, or arrange the components differently.
- the components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
- the document detection method provided in the present application can be applicable to the detection and correction of single-page documents or multi-page documents.
- in the following, multi-page documents are introduced by taking double-page documents as an example.
- FIG. 2 provides a schematic diagram of the document detection and correction process of the present application.
- an original image is obtained, the original image includes a first document, and a document type detection is performed on the original image to obtain a classification of the first document.
- the classification result and the edge segmentation map of the first document are used to correct the first document to obtain a corrected target document image.
- the original image can be input into the classification module in the document image detection system to detect the document type, and obtain the classification result of the first document; on the other hand, the original image can be input into the segmentation module in the document image detection system to obtain the edge segmentation map of the first document. Then, the classification result and the edge segmentation map are input into the post-processing module in the document image correction system.
- the post-processing module determines the corner point coordinates of the first document according to the classification result and the edge segmentation map.
- the corner point coordinates of the first document can be understood as the coordinates of each corner point of the first document. For the first document with a flat edge, the corner point is the vertex of the first document.
- for a first document with a curved edge, the corner point is the intersection of the tangent of the curved edge and the adjacent edge. This will not be repeated below.
- the original image and the corner point coordinates of the first document are input into the correction module in the document image correction system. If the pages of the first document in the original image are all flat pages, the post-processing module can input the original image and the corner point coordinates of the first document into the correction module to obtain the corrected target document image. If the pages of the first document in the original image include curved pages, the post-processing module can input the curved pages in the original image and the corner point coordinates of the first document into the correction module; after the correction module performs perspective transformation, the result is further input into the de-distortion module to obtain the corrected target document image after de-distortion processing.
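The data flow between the modules described above can be sketched roughly as follows. All callables here are hypothetical stand-ins for the classification, segmentation, post-processing, correction, and de-distortion modules, not the actual system's API.

```python
def detect_and_correct(original_image, classify, segment, corners_of,
                       perspective_correct, dewarp):
    """Hypothetical orchestration of the modules described above."""
    doc_type = classify(original_image)                   # classification module
    edge_map = segment(original_image)                    # segmentation module
    corners = corners_of(doc_type, edge_map)              # post-processing module
    image = perspective_correct(original_image, corners)  # correction module
    if "curved" in doc_type:                              # curved pages also need de-distortion
        image = dewarp(image)
    return image

# Toy stand-ins that only trace the control flow:
result = detect_and_correct(
    "img",
    classify=lambda im: "single-curved",
    segment=lambda im: "edge-map",
    corners_of=lambda t, e: [(0, 0), (1, 0), (1, 1), (0, 1)],
    perspective_correct=lambda im, c: im + ":perspective",
    dewarp=lambda im: im + ":dewarped",
)
print(result)  # img:perspective:dewarped
```

For a flat document type, the `dewarp` stage is skipped and only the perspective correction is applied.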
- Step 301 Acquire at least one first sample image.
- the user can obtain a batch of original images in advance.
- the original images can be images obtained by taking documents by a terminal camera or other camera device.
- the user can mark the probability of at least one document type on each original image in the batch of original images.
- the original image marked with the probability of at least one document type by the user can be called a first sample image.
- At least one document type may be predefined.
- the six document types may include a single-page document with a flat page (e.g., referred to as single-flat), a single-page document with a curved page (e.g., referred to as single-curved), a double-page document with a flat left page and a curved right page (e.g., referred to as left-flat-right-curved), a double-page document with both the left and right pages being flat (e.g., referred to as left-flat-right-flat), a double-page document with a curved left page and a flat right page (e.g., referred to as left-curved-right-flat), and a double-page document with both the left and right pages being curved (e.g., referred to as left-curved-right-curved).
- document types may be represented by words, such as single-flat, single-curved, left-flat-right-curved, left-flat-right-flat, left-curved-right-flat, and left-curved-right-curved. They may also be replaced by different symbols or numbers, such as 0, 1, 2, 3, 4, and 5, or by different letters.
- the present application does not impose any limitation on this.
- more or fewer document types may be predefined. For example, when a document with more pages needs to be detected, more document types may be predefined. This application is not limited to six document types.
- the probability of one document type can be marked in the first sample image.
- if the first sample image 1 is of the single flat type, it can be marked as single flat: 1.0; that is, the probability of the first sample image 1 belonging to the single flat type is 1.0, and the probabilities of the first sample image 1 belonging to the other document types are all 0.
- the probabilities of six document types can also be marked in the first sample image, that is, marked as single flat: 1.0, single curved: 0.0, left flat and right curved: 0.0, left flat and right flat: 0.0, left curved and right flat: 0.0, and left curved and right curved: 0.0.
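The two labeling schemes above (one marked type, or six probabilities) are equivalent to a one-hot encoding over the six predefined types. A hypothetical helper illustrating this, using the type names from this text:

```python
# Hypothetical helper: expand a single marked document type into the six
# probabilities described above (one-hot encoding over the predefined types).
DOC_TYPES = ["single-flat", "single-curved", "left-flat-right-curved",
             "left-flat-right-flat", "left-curved-right-flat",
             "left-curved-right-curved"]

def one_hot_label(marked_type):
    """Probability 1.0 for the marked type, 0.0 for every other type."""
    return [1.0 if t == marked_type else 0.0 for t in DOC_TYPES]

print(one_hot_label("single-flat"))
# [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```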
- Step 302 Input at least one first sample image into a classification model to be trained to obtain a prediction result of the document type of each first sample image.
- the classification model to be trained may include a feature extractor and a classifier. After at least one first sample image is input into the feature extractor, classification features can be obtained, and then the classification features are input into the classifier to obtain a prediction result of the document type. For a first sample image, the prediction result of the first sample image includes the probabilities of six document types.
- Step 303 Input the predicted result and the true result of each first sample image in at least one first sample image into a cross entropy loss function calculation model to obtain a loss value between the predicted result and the true result of each first sample image.
- the true result of a first sample image is the probability of the marked document type. Taking a first sample image whose document type is single flat as an example: if the first sample image marks the probability of one document type (such as single flat) as 1.0, then the probability of single flat included in the true result is 1.0, and the probabilities of the other five document types can be counted as 0.0 when the loss value between the predicted result and the true result is calculated later. If the first sample image marks six document types, with the probability of single flat being 1.0 and the probabilities of the other five document types being 0.0, then the true result includes six probabilities, that is, the probability of single flat is 1.0 and the probabilities of the other five document types are 0.0.
- the loss value between the prediction result and the true result of each first sample image can be determined based on the loss value between the probability of each document type in the prediction result of the first sample image and the probability of the corresponding document type in the true result. Six document types correspond to six loss values; for example, the average of the six loss values can be used as the loss value between the predicted result and the true result of the first sample image, which can also be called the loss value of the first sample image.
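One plausible reading of the loss described above is a cross-entropy term per document type, averaged over the six types. The binary form used per type here is an assumption for illustration, not necessarily the patent's exact loss:

```python
import math

def per_type_loss(pred, true):
    """Binary cross-entropy between one predicted and one true probability."""
    eps = 1e-12
    p = min(max(pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(true * math.log(p) + (1.0 - true) * math.log(1.0 - p))

def sample_loss(pred_probs, true_probs):
    """Average the per-type losses into one loss value for the sample."""
    losses = [per_type_loss(p, t) for p, t in zip(pred_probs, true_probs)]
    return sum(losses) / len(losses)

# A confident correct prediction yields a smaller loss than a diffuse one:
good = sample_loss([0.9, 0.02, 0.02, 0.02, 0.02, 0.02],
                   [1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
bad = sample_loss([0.2, 0.16, 0.16, 0.16, 0.16, 0.16],
                  [1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(good < bad)  # True
```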
- Step 304 Use the back propagation algorithm to update the weights of the classification model until the training of the classification model is completed.
- the principle of the back propagation algorithm is to first determine whether the loss value of at least one first sample image is less than a preset expected value. If so, a trained classification model is obtained and the training is terminated. If not, the loss value of one or more first sample images is propagated back into the classification model being trained, and the weights of the classification model are updated to obtain an updated classification model. Then, the above steps 302 to 304 are repeated: the loss value is continuously propagated back, the weights of the classification model are continuously updated, and the loss value of each first sample image is obtained again through the classification model with the updated weights, until the loss value of at least one first sample image is less than the preset expected value and the training is terminated.
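The stopping rule described above can be sketched as a toy loop. The "model" here is a stand-in whose loss shrinks with each weight update; it is not a real classifier:

```python
def train(update_weights, loss_of, samples, expected, max_iters=1000):
    """Repeat forward pass + weight update until every sample's loss is
    below the preset expected value."""
    for _ in range(max_iters):
        losses = [loss_of(s) for s in samples]
        if all(l < expected for l in losses):
            return "trained"         # loss small enough: training terminates
        update_weights(losses)       # otherwise propagate the loss back
    return "not-converged"

# Stand-in "model" whose loss halves with every weight update:
state = {"loss": 1.0}
status = train(
    update_weights=lambda losses: state.update(loss=state["loss"] / 2),
    loss_of=lambda s: state["loss"],
    samples=[0],
    expected=0.01,
)
print(status)
```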
- the structure of the feature extractor and classifier in the classification model, the weight initialization method and the loss function can be set as needed without too many restrictions.
- the classification model trained in the above step 304 is the classification module used for subsequent document detection.
- Step 501 Acquire at least one second sample image.
- the user can obtain a batch of original images in advance.
- the original images can be images taken by a terminal camera or other camera devices.
- the user can mark at least one corner point on each original image in the batch of original images.
- at least one corner point can be connected to obtain document edge lines marked.
- for a curved edge, the outer edge tangent of the curved edge is marked as the edge line of the curved edge.
- the outer edge tangent forms an intersection with the extension line of the adjacent edge.
- the document edge line can be obtained by connecting the intersections.
- the original image marked with the document edge line can be called the second sample image.
- the user may also mark at least one corner point on each original image, and then refer to the original image marked with at least one corner point as the second sample image.
- the segmentation model automatically connects the corner points to obtain the second sample image marked with the document edge line, and then continues the subsequent training process.
- for different document types, the number of corner points to be marked is also different.
- single-page documents such as single flat and single curved can be marked with 4 corner points
- the single-page document shown in Figure 6 is marked with 4 corner points.
- Double-page documents such as left flat and right curved, left flat and right flat, left curved and right flat, left curved and right curved, etc. can be marked with 8 corner points
- the double-page document shown in Figure 7 is marked with 8 corner points.
- Step 502 Input at least one second sample image into the segmentation model to be trained to obtain a prediction result of the document edge line in each second sample image.
- the segmentation model to be trained may include a U-shaped encoder and decoder. After at least one second sample image is input into the encoder, a segmentation feature can be obtained, and then the segmentation feature is input into the decoder to obtain a prediction result of the document edge line, which may be a predicted edge segmentation map including the predicted document edge line.
- Step 503 Input the predicted result and the true result of each second sample image in at least one second sample image into a cross entropy loss function calculation model to obtain a loss value between the predicted result and the true result of each second sample image.
- the loss value between the predicted result and the true result of each second sample image can be determined based on the loss value between the grayscale value of each position in the predicted edge segmentation map corresponding to the second sample image and the grayscale value of the corresponding position in the labeled edge segmentation map.
- for example, if the resolution of the predicted edge segmentation map and the labeled edge segmentation map is both 100*100, there will be a loss value corresponding to each of the 10,000 pixels.
- the average value of these 10,000 loss values can be used as the loss value between the predicted result and the true result of the second sample image, which can also be called the loss value of the second sample image.
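The per-pixel averaging described above can be illustrated as follows. The absolute difference used per pixel is a placeholder for the cross-entropy term on grayscale values; only the averaging over all pixels is the point being shown:

```python
def map_loss(pred_map, label_map):
    """Average a per-pixel loss over the whole edge segmentation map."""
    total, count = 0.0, 0
    for pred_row, label_row in zip(pred_map, label_map):
        for p, l in zip(pred_row, label_row):
            total += abs(p - l)  # one loss value per pixel (placeholder loss)
            count += 1
    return total / count

# 2x2 maps: one of the four pixels disagrees by 1.0, so the average is 0.25
print(map_loss([[0, 1], [1, 0]], [[0, 1], [0, 0]]))  # 0.25
```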
- Step 504 Use the back propagation algorithm to update the weights of the segmentation model until the training of the segmentation model is completed.
- the principle of the back-propagation algorithm is to first determine whether the loss value of at least one second sample image is less than a preset expected value. If so, a trained segmentation model is obtained and the training is terminated. If not, the loss value of one or more second sample images is propagated back into the segmentation model being trained, and the weights of the segmentation model are updated to obtain an updated segmentation model. Then, the above steps 502 to 504 are repeated: the loss value is continuously propagated back, the weights of the segmentation model are continuously updated, and the loss value of each second sample image is obtained again through the segmentation model with the updated weights, until the loss value of at least one second sample image is less than the preset expected value and the training is terminated.
- the structures of the encoder and decoder in the segmentation model, the weight initialization method and the loss function can be set as needed without too many restrictions.
- the segmentation model trained in the above step 504 is the segmentation module used for subsequent document detection.
- FIG. 9 provides a schematic diagram of the document detection and correction process of the present application.
- Step 901 Acquire an original image, where the original image includes a first document.
- the original image may be an image captured by a terminal camera or other camera device, or may be an image obtained by other means, and the embodiment of the present application is not limited thereto.
- Step 902 Input the original image into the trained classification module, which can output a classification result of the first document.
- the original image including the double-page document is first input into the classification module, where the original image is downsampled several times to obtain classification features, and then the classification features are passed through the softmax function to calculate the classification result of the double-page document, that is, the probabilities of the double-page document being each of the six pre-defined document types.
- for example, the probabilities of the six document types corresponding to the double-page document calculated by the softmax function are single flat: 0.0, single curved: 0.0, left flat and right curved: 0.0, left flat and right flat: 0.0, left curved and right flat: 1.0, left curved and right curved: 0.0. Based on the probabilities of the six document types in the classification result, it can be determined that the first document is of the left curved and right flat document type.
- the classification result can also be the maximum value of the probabilities of the six document types, for example, left curved and right flat: 1.0. It should be understood that the probabilities of the six document types output by the classification module are the probabilities that the original image belongs to each document type.
- among the probabilities of the six document types, the probability of one document type is the highest.
- for example, if the probability of the left-curved and right-flat type is the highest, the first document is most likely to belong to the left-curved and right-flat document type.
- the probability value corresponding to the document type with the highest probability is 1.0; in some other embodiments, the probability value of the document type with the highest probability (left-curved and right-flat) may also be less than 1.0.
- the classification results are single-flat: 0.0, single-curved: 0.0, left-flat and right-curved: 0.0, left-flat and right-flat: 0.2, left-curved and right-flat: 0.7, left-curved and right-curved: 0.1.
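Turning raw class scores into probabilities like those above and picking the most likely type can be sketched as follows. The logit values are made up for illustration:

```python
import math

DOC_TYPES = ["single-flat", "single-curved", "left-flat-right-curved",
             "left-flat-right-flat", "left-curved-right-flat",
             "left-curved-right-curved"]

def softmax(logits):
    """Normalize scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predicted_type(logits):
    """Return the document type with the highest probability."""
    probs = softmax(logits)
    return DOC_TYPES[probs.index(max(probs))]

logits = [-2.0, -3.0, -1.0, 1.0, 3.0, 0.0]
print(predicted_type(logits))  # left-curved-right-flat
```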
- Step 903 Input the original image into the trained segmentation module, and the segmentation module can output an edge segmentation map of the first document.
- the original image including the double-page document is first input into the segmentation module, and the original image is downsampled several times by the encoder in the segmentation module to obtain segmentation features, and then the segmentation features are upsampled several times by the decoder to obtain an edge segmentation map with the same resolution as the original image, and the edge segmentation map includes the document edge line.
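The resolution behavior of the U-shaped encoder/decoder described above can be traced shape-only: the resolution is halved at each encoder stage, then doubled back by the decoder so the edge segmentation map matches the original image. The depth of three stages here is an arbitrary example:

```python
def unet_resolutions(input_size, depth):
    """Resolutions through a U-shaped encoder/decoder of the given depth."""
    down = [input_size // (2 ** i) for i in range(depth + 1)]  # encoder halves
    up = down[-2::-1]                                          # decoder doubles back
    return down + up

print(unet_resolutions(256, 3))  # [256, 128, 64, 32, 64, 128, 256]
```

The first and last entries are equal, matching the statement that the output edge segmentation map has the same resolution as the original image.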
- either step 902 or step 903 may be performed first; this application does not impose any limitation on the order.
- Step 904 Determine a correction algorithm corresponding to the document type of the first document according to the classification result of the first document and the edge segmentation map of the first document.
- the number of vertical lines in the document edge line can be determined based on the edge segmentation map of the first document. If the number of vertical lines is three, it indicates that it is a double page. If the number of vertical lines is two, it indicates that it is a single page. Double verification can be performed with the classification result. For example, if the segmentation result and the classification result both indicate the same number of pages, the document type can be directly determined. If the segmentation result is different from the classification result, the classification result can be used to accurately determine the document type, thereby improving the accuracy of classification of single-page and double-page documents.
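The double verification described above might be sketched as follows; the function names are illustrative, and the fallback to the classification result on disagreement follows the rule stated in the text:

```python
def pages_from_vertical_lines(n_vertical):
    """Two vertical edge lines suggest a single page, three a double page."""
    return {2: "single", 3: "double"}.get(n_vertical)

def verify_page_count(n_vertical, classified_type):
    """Cross-check segmentation against classification; on disagreement,
    fall back on the classification result."""
    seg_pages = pages_from_vertical_lines(n_vertical)
    cls_pages = "single" if classified_type.startswith("single") else "double"
    return cls_pages if seg_pages != cls_pages else seg_pages

print(verify_page_count(3, "left-curved-right-flat"))  # double (both agree)
print(verify_page_count(2, "left-flat-right-flat"))    # double (classification wins)
```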
- different document types may correspond to different correction algorithms.
- a document of single flat type or a document of left flat and right flat type corresponds to a common document correction algorithm.
- a document of single curved type, a document of left curved and right flat type, a document of left flat and right curved type, or a document of left curved and right curved type corresponds to a common document correction algorithm and a dewarping correction algorithm.
- the common document correction algorithm involved in the present application is, for example, a perspective transformation algorithm
- the dewarping correction algorithm is, for example, a dewarping algorithm.
- the specific algorithm names are not limited here.
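The mapping from document type to correction chain described above can be stated compactly: flat pages need only the common (perspective transformation) correction, while any type containing a curved page additionally passes through the dewarping correction. The function and stage names here are illustrative:

```python
# Types whose pages are all flat need only the common correction algorithm.
FLAT_TYPES = {"single-flat", "left-flat-right-flat"}

def correction_algorithms(doc_type):
    """Return the ordered list of correction stages for a document type."""
    if doc_type in FLAT_TYPES:
        return ["perspective"]
    return ["perspective", "dewarp"]

print(correction_algorithms("left-flat-right-flat"))  # ['perspective']
print(correction_algorithms("single-curved"))         # ['perspective', 'dewarp']
```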
- Step 905 Correct the first document in the original image according to the correction algorithm corresponding to the document type of the first document to obtain a target document image.
- the coordinates of the corner points of the first document may be determined according to the classification result of the first document and the edge segmentation map of the first document before step 904.
- the number of corner points is determined according to the classification result of the first document in the original image, and if it is a single-page document, the number of corner points is 4, and if it is a double-page document, the number of corner points is 8; then, the coordinates of each corner point of the first document may be determined in combination with the determined number of corner points and the edge segmentation map output by the segmentation module.
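The corner-count rule above, written as a hypothetical helper:

```python
def corner_count(doc_type):
    """Single-page types have 4 corner points; double-page types have 8."""
    return 4 if doc_type.startswith("single") else 8

print(corner_count("single-curved"), corner_count("left-curved-right-flat"))  # 4 8
```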
- in step 905, based on the original image, the coordinates of each corner point of the first document, and the document type of the first document, the correction algorithm corresponding to the document type is used to correct the first document in the original image to obtain the target document image.
- the process of correcting the single-flat document in the original image includes: inputting the single-flat document image and the coordinates of the four corner points of the single-flat document into the correction module for correction, and obtaining a target document image with a rectangular shape.
- the correction module involved in the embodiment of the present application can be implemented by a common document correction algorithm, which will not be described in detail below.
- the first document in the original image is a document of the single-curved type.
- the implementation process of correcting the single-curved document in the original image includes: inputting the single-curved document image and the coordinates of the four corner points of the single-curved document into the correction module for correction to obtain a single-page document image with a regular shape.
- the correction module can be implemented by a common document correction algorithm.
- the single-page document image with a regular shape is input into the dewarping module for further correction to obtain a flat and regularly shaped target document image.
- the dewarping module involved in the embodiment of the present application can be implemented by a dewarping correction algorithm, which will not be repeated in the following text.
- the first document in the original image is a double-page document of the left-flat and right-flat type.
- the implementation process of correcting the double-page document of the left-flat and right-flat type in the original image includes: inputting the left page image of the double-page document of the left-flat and right-flat type and the coordinates of the four corner points of the left page into the correction module for correction to obtain a left page image with a regular shape; and inputting the right page image and the coordinates of the four corner points of the right page into the correction module for correction to obtain a right page image with a regular shape.
- the left page image with a regular shape and the right page image with a regular shape are merged to obtain the target document image.
- the first document in the original image is a double-page document of the left-curved and right-flat type.
- the implementation process of correcting the double-page document of the left-curved and right-flat type in the original image includes: inputting the left page image of the double-page document of the left-curved and right-flat type and the coordinates of the four corner points of the left page into the correction module for correction to obtain a left page image with a regular shape, and then inputting the left page image with a regular shape into the de-distortion module for further correction to obtain a flat and regularly shaped corrected left page image; inputting the right page image and the coordinates of the four corner points of the right page into the correction module for correction to obtain a right page image with a regular shape; and merging the corrected left page image and the right page image with a regular shape to obtain the target document image.
- the first document in the original image is a double-page document of the left-flat-right-curved type.
- the implementation process of correcting the double-page document of the left-flat-right-curved type in the original image includes: inputting the left page image of the double-page document of the left-flat-right-curved type and the coordinates of the four corner points of the left page into the correction module for correction to obtain a corrected left page image with a regular shape; inputting the right page image and the coordinates of the four corner points of the right page into the correction module for correction to obtain a right page image with a regular shape; then inputting the right page image with a regular shape into the de-distortion module for further correction to obtain a flat and regularly shaped corrected right page image; and merging the corrected left page image and the corrected right page image to obtain the target document image.
- the first document in the original image is a double-page document of the left-curved and right-curved type.
- the implementation process of correcting the double-page document of the left-curved and right-curved type in the original image includes: inputting the left page image of the double-page document of the left-curved and right-curved type and the coordinates of the four corner points of the left page into the correction module for correction to obtain a left page image with a regular shape; then inputting the left page image with a regular shape into the dewarping module for further correction to obtain a corrected left page image that is flat and has a regular shape; correcting the right page in the same way to obtain a corrected right page image; and merging the corrected left page image and the corrected right page image to obtain the target document image.
- the user interface can display the target document image, and the user interface can also switch to an editing mode so that the user can manually perform secondary editing on both the left and right sides or the top and bottom sides, thereby improving the accuracy of the correction.
- the classification module can be used to identify the document type in the original image. Not only single-page documents but also multi-page documents can be detected, thereby improving the accuracy of the document detection results.
- the segmentation module can be used to detect the edges of single-page or multi-page documents to obtain edge segmentation maps. Combining the classification results and the edge segmentation maps, correction of various types of document images such as single-page and multi-page documents can be achieved.
- Scenario 1: the type of the first document is a single-page type.
- the camera preview interface 12a includes a button 1201, which is used to enter the scan preview interface 12b.
- the shooting angle of the terminal's camera is tilted towards the single-page paper 1202 laid flat on the desktop, so the scan preview interface 12b displays the preview of the single-page paper 1202 with a smooth edge.
- the terminal can be used to perform document detection and correction.
- the edge line 1203 of the single-page paper 1202 with a smooth edge can also be displayed in the scan preview interface 12b.
- the scan preview interface 12b further includes a shooting button 1204.
- the user can click the shooting button 1204.
- the terminal responds to the user's click operation on the shooting button 1204, captures the original image of the single-page paper 1202 with a smooth edge, and performs document detection and correction on the original image of the single-page paper 1202 with a smooth edge.
- the terminal inputs the original image of the single-page paper 1202 with a smooth edge into the classification module to obtain a classification result.
- the classification result is, for example, single flat: 1.0, single curved: 0.0, left flat and right curved: 0.0, left flat and right flat: 0.0, left curved and right flat: 0.0, left curved and right curved: 0.0.
- it can thus be determined that the document type of the single-page paper 1202 with a smooth edge is a single flat type.
- on the other hand, the terminal inputs the original image of the single-page paper 1202 with a smooth edge into the segmentation module to obtain an edge segmentation map of the single-page paper 1202 with smooth edges.
- the single-page paper 1202 with smooth edges has four corner points, and the coordinates of the four corner points of the single-page paper 1202 with smooth edges are obtained.
- the original image of the single-page paper 1202 with smooth edges and the coordinates of the four corner points are input into a common correction algorithm for correction.
- the common correction algorithm is, for example, a perspective transformation algorithm, and a corrected single-page paper 1205 is obtained.
- the terminal displays a user interface 12c, and the user interface 12c displays the corrected single-page paper 1205.
- the edge shape of the corrected single-page paper 1205 is a rectangle, and the text and pictures in the single-page paper 1205 are no longer tilted.
- the viewing effect of the corrected single-page paper 1205 is equivalent to the effect obtained by shooting with the shooting angle perpendicular to the plane where the single-page paper with smooth edges is located, which can improve the user's perception of the content in the single-page paper with smooth edges.
- the user interface 12c may further include a confirmation button 1206.
- in response to a click operation on the confirmation button 1206, the terminal displays a user interface 12d.
- the user interface 12d includes the corrected single-page paper 1205 and a dialog box 1208.
- the dialog box 1208 includes a save button.
- the terminal responds to the click operation on the save button and stores the corrected single-page paper 1205 in the album.
- the terminal may also display a user interface 12e, which displays the single-page paper 1205 in the album.
- the dialog box 1208 may also include function buttons such as "Export to PDF" and "Cancel".
- in other embodiments, the corrected single-page paper 1205 is directly stored without switching the user interface, that is, the user interface 12c is still displayed.
- the user interface 12c displayed by the terminal may also include other function buttons, such as a crop button, an effect button, and a rotation button.
- the crop button is used to enter the editing mode, and the document image that needs to be corrected in the original image can be cropped by adjusting the corner point position;
- the effect button is used to select a variety of document display effects, and the rotation button is used to rotate the displayed document direction.
- the user can click the crop button 1207 in the user interface 12c (the button can also be named other names as long as the corresponding function can be achieved).
- the terminal displays the user interface 13a.
- the user interface 13a displays the original image of the single sheet of paper 1202, the four corner points of the single sheet of paper 1202 (respectively located at position A, position B, position C, and position D), and the edge line 1203 connecting the four corner points.
- the user can move any one or more corner points on the user interface 13a to change the size of the edge line.
- the user can long press the upper left corner point and, as shown in user interface 13b, move the upper left corner point from position A to position A′.
- the user can also move the positions of several other corner points, for example, in user interface 13b, move the upper right corner point from position B to position B′, move the lower left corner point from position D to position D′, and move the lower right corner point from position C to position C′, to obtain the edge line 1212 of the single sheet of paper 1202.
- the user interface 13a and the user interface 13b also include a confirmation button 1209.
- the user can click the confirmation button 1209.
- the terminal can respond to the click operation on the confirmation button 1209 and display the user interface 13c.
- the user interface 13c includes the corrected single-sheet paper 1213.
- the user interface 13c also includes a confirmation button 1214.
- the terminal displays the user interface 13d.
- the user interface 13d includes a dialog box 1215.
- the dialog box 1215 includes a save button. The user clicks the save button. The terminal responds to the click operation on the save button and stores the corrected single-sheet paper 1213 in the photo album.
- the specific implementation of the dialog box 1215 can refer to the relevant description of the dialog box 1208 in Figure 12, which will not be repeated here.
- the user interface 13a and the user interface 13b further include a cancel button 1210 and a restore button 1211.
- the terminal displays the user interface 12c in response to the click operation on the cancel button 1210;
- the terminal restores each corner to the original state in response to the click operation on the restore button 1211. That is to say, the adjustments made by the user to the corner points are not saved, and the corner points return to their original positions.
- the terminal may also display a user interface 13e as shown in FIG. 13 , which displays the corrected single-page paper 1213 in the album.
- the user interface 13c is switched to the user interface 13e, and the user interface 13e includes the corrected single-page paper 1213 in the album.
- the corrected single-page paper 1213 is directly stored without switching the user interface, that is, the user interface 13c is still displayed, and the user interface 13d or the user interface 13e is not displayed.
- the shooting angle of the terminal camera is tilted toward the single-page paper 1401 laid flat on the desktop, so the scanning preview interface 14a displays the preview of the single-page paper 1401 with a smooth edge. Since the shooting angle of the camera is not perpendicular to the plane where the single-page paper 1401 is located, the text and pictures in the single-page paper 1401 displayed in the scanning preview interface 14a seen by the user are tilted, and the edge shape of the single-page paper 1401 is not a rectangle. For this reason, the terminal can be used to perform document detection and correction.
- the edge line 1402 of the single-page paper 1401 with a smooth edge can also be displayed in the scanning preview interface 14a.
- the scan preview interface 14a further includes a shooting button 1403.
- the user can click the shooting button 1403.
- the terminal responds to the user's click operation on the shooting button 1403, shoots the original image of the single-page paper 1401, and performs document detection on the original image of the single-page paper 1401.
- the original image of the single-page paper 1401 is input into the classification module to obtain a classification result.
- the classification result is, for example, single flat: 1.0, single curved: 0.0, left flat and right curved: 0.0, left flat and right flat: 0.0, left curved and right flat: 0.0, left curved and right curved: 0.0.
- the document type of the single-page paper 1401 is a single-flat type; on the other hand, the original image of the single-page paper 1401 is input into the segmentation module to obtain an edge segmentation map of the single-page paper 1401, and then, combined with the classification result and the edge segmentation map, it can be determined that the single-page paper 1401 has four corner points, the coordinates of the four corner points of the single-page paper 1401 are obtained, and then the user interface 14b is displayed.
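The two-branch flow above (classification scores in, document type and corner count out) can be sketched as follows. The class names, the score format, and the function name are illustrative assumptions modeled on the example output; they are not taken from the patent's actual implementation.

```python
# Hypothetical sketch: pick the document type from the classifier's scores
# and derive the expected number of corner points (single page: 4, double
# page: 8), as described in the embodiments above.

SINGLE_PAGE_TYPES = {"single_flat", "single_curved"}

def pick_document_type(scores: dict) -> tuple:
    """Return (document_type, corner_count) from the classification scores."""
    doc_type = max(scores, key=scores.get)               # highest-probability class
    corners = 4 if doc_type in SINGLE_PAGE_TYPES else 8  # 4 vs. 8 corner points
    return doc_type, corners

scores = {
    "single_flat": 1.0, "single_curved": 0.0,
    "left_flat_right_curved": 0.0, "left_flat_right_flat": 0.0,
    "left_curved_right_flat": 0.0, "left_curved_right_curved": 0.0,
}
doc_type, corners = pick_document_type(scores)  # → ("single_flat", 4)
```

The corner count then tells the segmentation post-processing how many corner coordinates to extract from the edge segmentation map.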
- the user interface 14b includes the original image of the single-page paper 1401, the four corner points of the single-page paper 1401 (such as position A, position B, position C, position D) and the edge line 1404 of the single-page paper 1401.
- the user interface 14b is in an editable mode, and the user can move any one or more corner points on the user interface 14b to change the size of the edge line. As shown in the user interface 14c, the user moves the upper-left corner point of the single-page paper 1401 from position A to position B, and the edge line 1405 shown in the user interface 14d can be obtained. If the user moves the positions of multiple corner points, the size of the edge line of the single-page paper 1401 can be made larger. For the specific implementation, refer to the relevant descriptions of the user interface 13a and the user interface 13b in Figure 13, which will not be repeated here.
- the user interfaces 14b to 14d also include a confirmation button 1406. After moving the corner point, the user can click the confirmation button 1406.
- the terminal corrects the original image of the single-page paper 1401.
- the original image of the single-page paper 1401 and the coordinates of the four corner points are input into a common correction algorithm for correction.
- the common correction algorithm is a perspective transformation algorithm, and a corrected single-page paper 1408 is obtained, and the user interface 14e is displayed.
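The perspective transformation named here as the common correction algorithm maps the four detected corner points onto the corners of an upright rectangle. A minimal, self-contained sketch of estimating and applying such a transform is shown below; the corner coordinates and output size are made-up illustration values, and the direct-linear-transform estimator stands in for whatever solver the implementation actually uses.

```python
import numpy as np

def homography(src_pts, dst_pts):
    """Estimate the 3x3 perspective transform mapping src -> dst from four
    point pairs via the direct linear transform; h33 is fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    """Map one (x, y) point through the homography H."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Tilted quadrilateral of the photographed page (corner points A, B, C, D)
src = [(120, 80), (610, 130), (580, 890), (90, 820)]
# Target upright rectangle of the corrected document image
dst = [(0, 0), (800, 0), (800, 1100), (0, 1100)]
H = homography(src, dst)
# warp_point(H, src[i]) now lands on dst[i]; warping every pixel this way
# (or with an optimized library routine) yields the corrected page.
```

In practice a library routine such as OpenCV's `getPerspectiveTransform`/`warpPerspective` pair performs the same computation over the whole image.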
- the user interface 14e includes the corrected single-page paper 1408.
- the edge shape of the corrected single-page paper 1408 is a rectangle, and the text and pictures in the single-page paper 1408 are no longer tilted.
- the viewing effect of the corrected single-page paper 1408 is equivalent to the effect obtained by shooting with the shooting angle perpendicular to the plane where the single-page paper is located. This can enhance the user's perception of the single-page paper content in the image.
- the user interface 14e also includes a confirmation button 1407.
- the terminal displays the user interface 14f.
- the user interface 14f includes a dialog box 1409.
- the dialog box 1409 includes a save button.
- the dialog box 1409 may also include function buttons such as "Export to PDF" and "Cancel".
- the terminal stores the corrected single-page paper 1408 in the album.
- the terminal may also display a user interface 12e as shown in FIG. 12, which displays the corrected single-page paper in the album.
- the user interface 14e is switched to the user interface 12e as shown in FIG. 12.
- the user interface 12e includes the corrected single-page paper 1408 in the album.
- the corrected single-page paper 1408 is directly stored without switching the user interface, that is, the user interface 14e is still displayed.
- the above-mentioned camera application can also implement the document correction solutions of the following scenarios 2 to 6.
- the document correction solutions of scenarios 2 to 6 are introduced by taking the scanning application as an example.
- Scenario 2 The document type is a single-curved type.
- the shooting angle of the terminal camera is tilted toward the single-page paper 1501 with curved edges.
- the scanning preview interface 15a displays the previewed single-page paper 1501 with curved edges.
- Each line of text in the single-page paper 1501 displayed in the scanning preview interface 15a, as seen by the user, is not on a straight line.
- the edge line of the single-page paper 1501 is curved, and the edge shape of the single-page paper 1501 is not a rectangle. Therefore, the terminal can be used to perform document detection and correction.
- the edge line 1502 of the single-page paper 1501 with curved edges can also be displayed in the scan preview interface 15a.
- the scan preview interface 15a further includes a shooting button 1503.
- the user can click the shooting button 1503.
- the terminal responds to the user's click operation on the shooting button 1503, shoots the original image of the single-page paper 1501, and performs document detection on the original image of the single-page paper 1501.
- the original image of the single-page paper 1501 is input into the classification module to obtain a classification result.
- the classification result is, for example, single flat: 0.0, single curved: 1.0, left flat and right curved: 0.0, left flat and right flat: 0.0, left curved and right flat: 0.0, left curved and right curved: 0.0.
- the document type of the single-page paper 1501 is a single-curved type; on the other hand, the original image of the single-page paper 1501 is input into the segmentation module to obtain an edge segmentation map of the single-page paper 1501, and then, combined with the classification result and the edge segmentation map, it can be determined that the single-page paper 1501 has four corner points, the coordinates of the four corner points of the single-page paper 1501 are obtained, and then the user interface 15b is displayed.
- the user interface 15b includes the original image of the single-page paper 1501, the four corner points of the single-page paper 1501 (such as position A, position B, position C, position D) and the edge line 1504 of the single-page paper 1501.
- the user interface 15b is in editable mode, and the user can move any one or more corner points on the user interface 15b to change the size of the edge line.
- the user can long-press the upper-left corner point on the user interface 15b and move it from position A to position A'.
- the user can also move the positions of several other corner points, for example, in the user interface 15b, the corner point of the upper right corner is moved from position B to position B', the corner point of the lower left corner is moved from position D to position D', and the corner point of the lower right corner is moved from position C to position C', to obtain the edge line 1505 of the single-sheet paper 1501.
- the user interface 15b and the user interface 15c also include a confirmation button 1506. After moving the corner point, the user can click the confirmation button 1506.
- the terminal corrects the original image of the single-sheet paper 1501.
- the original image of the single-sheet paper 1501 and the coordinates of the four corner points are first input into a common correction algorithm for perspective transformation correction, and then the result after perspective transformation correction is input into a de-distortion correction algorithm for further correction to obtain a corrected single-sheet paper image 1508, and a user interface 15d is displayed.
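For curved pages the embodiment chains two stages: the perspective correction, followed by a de-distortion (dewarping) correction. The patent does not disclose the dewarping model itself, so the sketch below uses a stand-in that flattens a curved text baseline by fitting and removing a quadratic; it only illustrates the role of the second stage, not the actual algorithm.

```python
import numpy as np

def perspective_stage(points, H):
    """Apply a 3x3 homography to an (N, 2) array of points (stage one)."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def dewarp_stage(points):
    """Stand-in dewarping (stage two): fit a quadratic baseline to the
    points' y-coordinates and subtract it, straightening a curved line.
    The real de-distortion correction algorithm is not disclosed."""
    x, y = points[:, 0], points[:, 1]
    coeffs = np.polyfit(x, y, 2)          # curved baseline y(x)
    baseline = np.polyval(coeffs, x)
    flattened = points.copy()
    flattened[:, 1] = y - baseline + baseline.mean()
    return flattened

# A curved "text line": y bows in the middle, as on a curved page
xs = np.linspace(0, 100, 11)
curve = np.stack([xs, 0.01 * (xs - 50) ** 2], axis=1)
flat = dewarp_stage(curve)
# after dewarping, all y-coordinates of the line coincide (it is straight)
```

The full pipeline for a single-curved page would be `dewarp_stage(perspective_stage(points, H))`, matching the order described above.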
- the user interface 15d includes the corrected single-sheet paper image 1508.
- the edge shape of the corrected single-sheet paper image 1508 is a rectangle, and the text in the single-sheet paper image 1508 is on a straight line.
- the edge of the picture is a straight line, and the text and the picture are no longer tilted.
- the viewing effect of the corrected single-sheet paper image 1508 is equivalent to the effect obtained by shooting with the shooting angle perpendicular to the plane where the flat single-sheet paper is located. This can enhance the user's perception of the single-sheet paper content in the image.
- the user interface 15d also includes a confirmation button 1507.
- the terminal displays the user interface 15e.
- the user interface 15e includes a dialog box 1509.
- the dialog box 1509 includes a save button.
- the dialog box 1509 may also include function buttons such as "Export to PDF" and "Cancel".
- the terminal stores the corrected single-sheet image 1508 in the album.
- the terminal may also display a user interface 15f, which displays the corrected single-sheet in the album.
- the user interface 15d is switched to the user interface 15f, and the user interface 15f includes the corrected single-sheet image 1508 in the album.
- the corrected single-sheet image 1508 is directly stored without switching the user interface, that is, the user interface 15d is still displayed, and the user interface 15e or the user interface 15f is not displayed.
- Scenario 3 The document type is a left-curved and right-flat type.
- the shooting angle of the terminal camera is tilted toward the opened book.
- the opened book includes two pages of documents, wherein the edge of the left page is curved and the edge of the right page is flat.
- the scanning preview interface 16a displays the previewed double-page book 1601 with curved left and flat right.
- the text lines of the left page displayed in the scanning preview interface 16a seen by the user are not on a straight line, and the edge line of the picture is also curved.
- the edge of the right page is flat, but the picture in it is tilted.
- the edge shape of the double-page book 1601 is not a rectangle. Therefore, the terminal can be used to perform document detection and correction.
- the edge line 1602 of the double-page book 1601 can also be displayed in the scanning preview interface 16a.
- the scan preview interface 16a further includes a shooting button 1603.
- the user can click the shooting button 1603.
- the terminal responds to the user's click operation on the shooting button 1603 by shooting an original image of the double-page book 1601, and performs document detection on the original image of the double-page book 1601 with left curve and right flat.
- the original image of the double-page book 1601 is input into the classification module to obtain a classification result.
- the classification result is, for example, single flat: 0.0, single curved: 0.0, left flat and right curved: 0.0, left flat and right flat: 0.0, left curved and right flat: 0.9, left curved and right curved: 0.1.
- the document type of the double-page book 1601 is a left-curved and right-flat type.
- the original image of the double-page book 1601 is input into the segmentation module to obtain an edge segmentation map of the double-page book 1601, and then, combined with the classification result and the edge segmentation map, it can be determined that the double-page book 1601 has eight corner points, and the coordinates of the eight corner points of the double-page book 1601 are obtained; then the user interface 16b is displayed, and the user interface 16b includes the original image of the double-page book 1601, the eight corner points of the double-page book 1601 (such as the eight corner points at positions A to H), and the edge line 1604 of the double-page book 1601.
- the user interface 16b is in editable mode.
- the user can move any one or more corner points on the user interface 16b to change the size of the edge line.
- the specific implementation method of adjusting the corner points on the user interface 16b can be found in the user interfaces 13a to 13b in Figure 13, which will not be repeated here.
- the user interface 16b also includes a confirmation button 1605, and the user can click the confirmation button 1605.
- the terminal corrects the original image of the double-page book 1601 that is curved on the left and flat on the right.
- the coordinates of the left page image of the double-page book 1601 that is curved on the left and flat on the right and the four corner points (the corner points at positions A, B, C, and D) corresponding to the left page image are first input into a common correction algorithm for perspective transformation correction, and then the result after perspective transformation correction is input into a de-distortion correction algorithm for further correction, and the corrected left page image is output.
- the coordinates of the right page image of the double-page book 1601 that is curved on the left and flat on the right and the four corner points (the corner points at positions E, F, G, and H) corresponding to the right page image are input into a common correction algorithm, and the corrected right page image is output. Then, the corrected page image on the left and the corrected page image on the right are merged to obtain a corrected double-page book image 1606 as shown in the user interface 16c.
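The per-page flow just described (correct each half with the chain of algorithms its type requires, then merge the two results) can be sketched as below. The stub functions stand in for the perspective and de-distortion correction algorithms referred to above, and the page sizes are illustration values.

```python
import numpy as np

def perspective_correct(page):
    """Stub: the real version warps the page by its corner homography."""
    return page

def dewarp(page):
    """Stub: the real version removes the curvature of a curved page."""
    return page

# Each edge type selects its correction chain: a flat page only needs the
# perspective correction; a curved page additionally needs dewarping.
CHAINS = {
    "flat":   [perspective_correct],
    "curved": [perspective_correct, dewarp],
}

def correct_double_page(left, right, left_type, right_type):
    for step in CHAINS[left_type]:
        left = step(left)
    for step in CHAINS[right_type]:
        right = step(right)
    return np.hstack([left, right])     # merge the corrected halves

left = np.zeros((1100, 800), dtype=np.uint8)   # left page image (curved)
right = np.ones((1100, 800), dtype=np.uint8)   # right page image (flat)
merged = correct_double_page(left, right, "curved", "flat")
# merged.shape == (1100, 1600): one corrected double-page image
```

The same dispatch covers scenarios 3 to 6: only the `left_type`/`right_type` pair changes with the classification result.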
- both pages are flat rectangles, and the text in the double-page book image 1606 is on a straight line, the edge of the picture is a straight line, the text and the picture are no longer tilted, and the viewing effect of the corrected double-page book image 1606 is equivalent to the effect obtained by shooting with the shooting angle perpendicular to the plane where the flat double-page book with curved left and flat right is located, which can enhance the user's perception of the content of the double-page book with curved left and flat right in the image.
- the user interface 16c further includes a confirmation button 1607.
- the terminal displays the user interface 16d.
- the user interface 16d includes a dialog box 1608.
- the dialog box 1608 includes a save button.
- the dialog box 1608 may also include function buttons such as "Export to PDF" and "Cancel".
- the terminal stores the corrected double-page book image 1606 in the album.
- the terminal may also display a user interface 16e in response to the click operation.
- the user interface 16e displays the corrected double-page book image 1606 in the album.
- the user interface 16c is switched to the user interface 16e.
- the user interface 16e includes the corrected double-page book image 1606 in the album.
- the corrected double-page book image 1606 is directly stored without switching the user interface, that is, the user interface 16c is still displayed.
- Scenario 4 The document type is a left-curved and right-curved type.
- the shooting angle of the terminal camera is tilted toward the opened book.
- the opened book includes two pages of documents, wherein the edges of the left page and the right page are curved.
- the scanning preview interface 17a displays the previewed double-page book 1701 that is curved left and right. Each line of text in the left page or the right page displayed in the scanning preview interface 17a seen by the user is not on a straight line, and the edge line of the picture is also curved.
- the edge shape of the double-page book 1701 is not a rectangle. Therefore, the terminal can be used to perform document detection and correction.
- the edge line 1702 of the double-page book 1701 can also be displayed in the scanning preview interface 17a.
- the scan preview interface 17a further includes a shooting button 1703.
- the user can click the shooting button 1703.
- the terminal responds to the user's click operation on the shooting button 1703, shoots the original image of the double-page book 1701, and performs document detection on the original image of the double-page book 1701 that is curved left and right.
- the original image of the double-page book 1701 is input into the classification module to obtain a classification result.
- the classification result is, for example, single flat: 0.0, single curved: 0.0, left flat and right curved: 0.0, left flat and right flat: 0.0, left curved and right flat: 0.0, left curved and right curved: 1.0.
- the document type of the double-page book 1701 is a left-curved and right-curved type; on the other hand, the original image of the double-page book 1701 is input into the segmentation module to obtain an edge segmentation map of the double-page book 1701, and then combined with the classification result and the edge segmentation map, it can be determined that the double-page book 1701 has eight corner points, the coordinates of the eight corner points of the double-page book 1701 are obtained, and then the user interface 17b is displayed, and the user interface 17b includes the original image of the double-page book 1701, the eight corner points of the double-page book 1701 (such as the eight corner points at positions A to H) and the edge line 1704 of the double-page book 1701.
- the user interface 17b is in editable mode, and the user can move any one or more corner points on the user interface 17b to change the size of the edge line.
- the specific implementation method of adjusting the corner points on the user interface 17b can be seen in the user interfaces 13a-13b in Figure 13, which will not be repeated here.
- the user interface 17b also includes a confirmation button 1705.
- the user can click the confirmation button 1705.
- the terminal corrects the original image of the double-page book 1701 that is bent left and right.
- the coordinates of the left page image of the double-page book 1701 that is curved left and right and the four corner points (the corner points at positions A, B, C, and D) corresponding to the left page image are first input into a common correction algorithm to perform perspective transformation correction, and then the result after perspective transformation correction is input into the dewarping correction algorithm for further correction, and the corrected page image on the left is output.
- the right page image of the double-page book 1701 that is curved left and right and the coordinates of the four corner points (corner points at positions E, F, G, and H) corresponding to the right page image are first input into the common correction algorithm for perspective transformation correction, and then the result after perspective transformation correction is input into the de-distortion correction algorithm for further correction, and the corrected page image on the right is output. Then, the corrected page image on the left and the corrected page image on the right are merged to obtain a corrected double-page book image 1706 as shown in the user interface 17c.
- both pages are flat rectangles, the text in the corrected double-page book image 1706 is on a straight line, the edges of the pictures are straight lines, and the text and pictures are no longer tilted; the viewing effect of the corrected double-page book image 1706 is equivalent to the effect obtained by shooting with the shooting angle perpendicular to the plane where the flattened double-page book that is curved left and right is located. This can enhance the user's perception of the content of the double-page book in the image.
- the user interface 17c also includes a confirmation button 1707.
- the terminal displays the user interface 17d.
- the user interface 17d includes a dialog box 1708.
- the dialog box 1708 includes a save button.
- the dialog box 1708 may also include function buttons such as "Export to PDF" and "Cancel".
- the terminal stores the corrected double-page book image 1706 in the album.
- the terminal may also display a user interface 17e as shown in FIG. 17.
- the user interface 17e displays the corrected double-page book image 1706 in the album.
- the user interface 17c is switched to the user interface 17e.
- the user interface 17e includes the corrected double-page book image 1706 in the album.
- the corrected double-page book image 1706 is directly stored without switching the user interface, that is, the user interface 17c is still displayed.
- Scenario 5 The document type is a left-flat and right-curved type.
- the shooting angle of the terminal camera is tilted toward the opened book.
- the opened book includes two pages of documents, wherein the edge of the left page is flat and the edge of the right page is curved.
- the scanning preview interface 18a displays the preview of the double-page book 1801 with a flat left and a curved right.
- the left page displayed in the scanning preview interface 18a seen by the user is tilted, and each line of text in the right page is not on a straight line.
- the edge line of the picture is also curved, and the edge shape of the double-page book 1801 is not a rectangle. Therefore, the terminal can be used to perform document detection and correction.
- the edge line 1802 of the double-page book 1801 can also be displayed in the scanning preview interface 18a.
- the scan preview interface 18a further includes a shooting button 1803.
- the user can click the shooting button 1803.
- the terminal responds to the user's click operation on the shooting button 1803, shoots and obtains the original image of the double-page book 1801, and performs document detection on the original image of the double-page book 1801 with left flat and right curved.
- the original image of the double-page book 1801 is input into the classification module to obtain a classification result.
- the classification result is, for example, single flat: 0.0, single curved: 0.0, left flat and right curved: 1.0, left flat and right flat: 0.0, left curved and right flat: 0.0, left curved and right curved: 0.0.
- the document type of the double-page book 1801 is the left-flat and right-curved type; on the other hand, the original image of the double-page book 1801 is input into the segmentation module to obtain an edge segmentation map of the double-page book 1801, and then, combined with the classification result and the edge segmentation map, it can be determined that the double-page book 1801 has eight corner points, the coordinates of the eight corner points of the double-page book 1801 are obtained, and then the user interface 18b is displayed, and the user interface 18b includes the original image of the double-page book 1801, the eight corner points of the double-page book 1801 (such as the eight corner points at positions A to H), and the edge line 1804 of the double-page book 1801, wherein the corner point of the lower right corner (at position C) of the left page of the double-page book 1801 coincides with the corner point of the lower left corner (at position H) of the right page.
- the user interface 18b is in editable mode.
- the user can move any one or more corner points on the user interface 18b to change the size of the edge line.
- the specific implementation method of adjusting the corner points on the user interface 18b can be found in the user interfaces 13a to 13b in Figure 13, which will not be repeated here.
- the user interface 18b also includes a confirmation button 1805, and the user can click the confirmation button 1805.
- the terminal corrects the original image of the left-flat and right-curved double-page book 1801. For example, the coordinates of the left page image of the left-flat and right-curved double-page book 1801 and the four corner points (corner points at positions A, B, C, and D) corresponding to the left page image are input into a common correction algorithm for perspective transformation correction, and the corrected left page image is output.
- the coordinates of the right page image of the left-flat and right-curved double-page book 1801 and the four corner points (corner points at positions E, F, G, and H) corresponding to the right page image are first input into a common correction algorithm for perspective transformation correction, and then the result after perspective transformation correction is input into the de-distortion correction algorithm for further correction, and the corrected right page image is output. Then, the corrected page image on the left and the corrected page image on the right are merged to obtain a corrected double-page book image 1806 as shown in the user interface 18c.
- both pages are flat rectangles, and the text in the corrected double-page book image 1806 is on a straight line, the edge of the picture is a straight line, and the text and the picture are no longer tilted.
- the viewing effect of the corrected double-page book image 1806 is equivalent to the effect obtained by shooting at a viewing angle perpendicular to the plane where the flattened double-page book that is flat left and curved right is located, which can improve the user's perception of the content of the double-page book in the image.
- the user interface 18c further includes a confirmation button 1807.
- the terminal displays the user interface 18d.
- the user interface 18d includes a dialog box 1808.
- the dialog box 1808 includes a save button.
- the dialog box 1808 may also include function buttons such as "Export to PDF" and "Cancel".
- the terminal stores the corrected double-page book image 1806 in the album and displays the user interface 18e shown in FIG. 18.
- the user interface 18e displays the corrected double-page book image 1806 in the album.
- the user interface 18c is switched to the user interface 18e.
- the user interface 18e includes the corrected double-page book image 1806 in the album.
- the corrected double-page book image 1806 is directly stored without switching the user interface, that is, the user interface 18c is still displayed.
- Scenario 6 The document type is a left-flat and right-flat type.
- the shooting angle of the terminal camera is tilted toward the double-page paper 1901 laid flat on the desktop, so the scanning preview interface 19a displays the preview of the double-page paper 1901 with a smooth edge. Since the shooting angle of the camera is not perpendicular to the plane where the double-page paper 1901 is located, the text and pictures in the double-page paper 1901 displayed in the scanning preview interface 19a seen by the user are tilted, and the edge shape of the double-page paper 1901 is not a rectangle. For this reason, the terminal can be used to perform document detection and correction.
- the edge line 1902 of the double-page paper 1901 with a smooth edge can also be displayed in the scanning preview interface 19a.
- the scan preview interface 19a further includes a shooting button 1903.
- the user can click the shooting button 1903.
- the terminal responds to the user's click operation on the shooting button 1903, shoots the original image of the double-page paper 1901, and performs document detection on the original image of the double-page paper 1901.
- the original image of the double-page paper 1901 is input into the classification module to obtain a classification result.
- the classification result is, for example, single flat: 0.0, single curved: 0.0, left flat and right curved: 0.0, left flat and right flat: 1.0, left curved and right flat: 0.0, left curved and right curved: 0.0.
- the document type of the double-page paper 1901 is a left flat and right flat type.
- the original image of the double-page paper 1901 is input into the segmentation module to obtain an edge segmentation map of the double-page paper 1901, and then combined with the classification result and the edge segmentation map, it can be determined that the double-page paper 1901 has eight corner points, the coordinates of the eight corner points of the double-page paper 1901 are obtained, and then the user interface 19b is displayed.
- the user interface 19b includes the original image of the double-page paper 1901, the eight corner points of the double-page paper 1901 (such as position A, position B, position C, position D, position E, position F, position G, position H) and the edge line 1904 of the double-page paper 1901, wherein position B coincides with position G, and position C coincides with position H.
- the user interface 19b is in editable mode, and the user can move any one or more corner points on the user interface 19b to change the size of the edge line.
- the user interface 19b also includes a confirmation button 1905.
- the user can click the confirmation button 1905.
- the terminal corrects the original image of the double-page paper 1901.
- the left page of the original image of the double-page paper 1901 and the coordinates of the four corner points of the left page are input into a common correction algorithm for correction.
- the common correction algorithm is a perspective transformation algorithm to obtain the corrected left page; the right page of the original image of the double-page paper 1901 and the four corner points of the right page (position E, position F, position G, position D) are input into a common correction algorithm for correction.
- the coordinates of the corner point at which H is set are input into a common correction algorithm for correction to obtain a corrected right page; the corrected left page and the corrected right page are merged to obtain a corrected double-page image 1906, and a user interface 19c is displayed.
- the user interface 19c includes the corrected double-page image 1906.
- the edge shape of the corrected double-page image 1906 is a rectangle, and the text and pictures in the double-page image 1906 are no longer tilted.
- the viewing effect of the corrected double-page image 1906 is equivalent to the effect obtained by shooting at a viewing angle perpendicular to the plane where the double-page paper is located, which can enhance the user's perception of the double-page content in the image.
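The per-page correction referred to above is a four-point perspective (homography) mapping from the detected page corners to a rectangle. As a minimal illustrative sketch (the corner coordinates below are invented for the example, not taken from FIG. 19), the transform can be estimated by solving the eight linear equations given by four point correspondences:

```python
import numpy as np

def estimate_homography(src, dst):
    """Solve for the 3x3 perspective transform H mapping each src corner to dst.

    src, dst: lists of four (x, y) points. H has 8 degrees of freedom, so the
    four point pairs give the 8 linear equations solved below (h33 fixed to 1).
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, point):
    """Apply H to one (x, y) point using homogeneous coordinates."""
    x, y = point
    u, v, w = H @ np.array([x, y, 1.0])
    return (u / w, v / w)
```

A production implementation would typically rely on an image-processing library's perspective-transform routines rather than solving the system by hand.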
- the user interface 19c also includes a confirmation button 1907.
- in response to the user's click operation on the confirmation button 1907, the terminal displays the user interface 19d.
- the user interface 19d includes a dialog box 1908.
- the dialog box 1908 includes a save button.
- the dialog box 1908 may also include function buttons such as "Export to PDF" and "Cancel".
- the terminal stores the corrected double-page image 1906 in the album.
- the terminal may also display a user interface 19e, which displays the corrected double-page image 1906 in the album.
- the user interface 19c is switched to the user interface 19e, and the user interface 19e includes the corrected double-page image 1906 in the album.
- the corrected double-page image 1906 is directly stored without switching the user interface, that is, the user interface 19c is still displayed.
- the names of the above applications can be other names, and the embodiments of the present application are not limited thereto, as long as the same functions can be achieved.
- the method provided in the embodiments of the present application is introduced from the perspective of the terminal as the execution subject.
- the terminal may include a hardware structure and/or a software module, and implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given one of the above functions is executed in the form of a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
- the embodiments of the present application also provide a terminal for executing the steps executed by the terminal in the above method embodiments.
- the relevant features can be found in the above method embodiments and will not be repeated here.
- the terminal includes: one or more processors 2001 and a memory 2002, wherein the memory 2002 stores program instructions which, when executed by the terminal, implement the method steps in the embodiments of the present application.
- the processor 2001 can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed by the processor.
- the general-purpose processor can be a microprocessor, or the processor can also be any conventional processor.
- the steps of the method disclosed in the embodiments of the present application can be directly embodied as being performed by a hardware decoding processor, or performed by a combination of the hardware and software modules in the decoding processor.
- the software module can be located in a storage medium mature in the field, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory or an electrically erasable programmable memory, a register, etc.
- the storage medium is located in the memory, and the processor reads the instructions in the memory and completes the steps of the above method in combination with its hardware.
- an embodiment of the present application also provides a chip, which is coupled to a memory in a device so that the chip calls program instructions stored in the memory during operation to implement the above method of the embodiment of the present application.
- an embodiment of the present application also provides a computer storage medium, wherein the computer-readable storage medium includes a computer program which, when run on an electronic device, causes the electronic device to execute the above method of the embodiments of the present application.
- an embodiment of the present application also provides a computer program product, which includes instructions that, when executed by a computer, cause the computer to execute the above method of the embodiments of the present application.
- the term "when" may be interpreted to mean "if", "after", "in response to determining", or "in response to detecting", depending on the context.
- the phrases "upon determining" or "if (the stated condition or event) is detected" may be interpreted to mean "if determining", "in response to determining", "upon detecting (the stated condition or event)", or "in response to detecting (the stated condition or event)", depending on the context.
- the computer program product includes one or more computer instructions.
- the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
- the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions can be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless means (e.g., infrared, radio, microwave).
- the computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media.
- the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive), etc.
- the disclosed systems, devices and methods can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division; there may be other division methods in actual implementation, for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
- another point is that the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the technical solution of the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Processing Or Creating Images (AREA)
- Facsimiles In General (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
A document detection and correction method and a terminal. The method includes: in response to a first operation on a first application, displaying a scan preview interface, where the scan preview interface displays a previewed first image containing a first document; receiving a second operation; in response to the second operation, capturing a second image containing the first document; then inputting the second image separately into a trained classification model and a trained segmentation model to obtain a classification result of the first document output by the classification model and an edge segmentation map of the first document output by the segmentation model, so that the document type of the first document can be accurately detected; and then correcting the first document according to the classification result and the edge segmentation map to obtain a first target document image, and displaying a first user interface that displays the first target document image, thereby enabling correction of document images of different document types.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. 202211246325.9, filed with the China National Intellectual Property Administration on October 12, 2022 and entitled "Document detection and correction method and terminal", which is incorporated herein by reference in its entirety.
This application relates to the field of computer vision, and in particular to a document detection and correction method and a terminal.
With the growing demand for online education and online office work, users often need to scan or photograph invoices, books, test papers, and exercise books to obtain corresponding document images, and then detect the document images. Existing document scanning or detection solutions can only handle single-page scenarios. For example, deep-learning-based image edge detection and segmentation algorithms treat the document edge as the object to be detected and segmented and finally produce a binarized document edge segmentation image; this approach cannot distinguish between single-page and double-page types and can only detect and correct flat documents. As another example, a semi-automatic single/double-page correction scheme based on traditional algorithms has been constructed using the number of vertical lines together with prior information from the user indicating single or double page; it distinguishes single and double pages by judging whether the image width is greater than the height and whether a sufficiently long spine line can be found in the middle of the image, and this approach cannot distinguish single from double pages when the document has no obvious center line. Therefore, for document types such as bent invoices, opened books, test papers, and exercise books, the prior art can often detect only one of the pages, causing the information of the other page to be lost and severely degrading the user experience.
SUMMARY
Embodiments of the present application provide a document detection and correction method and a terminal, so as to detect and correct multiple document types such as single-page documents and multi-page documents.
In a first aspect, an embodiment of the present application provides a document detection and correction method, which may be executed by a terminal. The method includes: in response to a first operation on a first application, displaying a scan preview interface, where the scan preview interface displays a previewed first image containing a first document; receiving a second operation; in response to the second operation, capturing a second image containing the first document; inputting the second image separately into a trained classification model and a trained segmentation model to obtain a classification result of the first document output by the classification model and an edge segmentation map of the first document output by the segmentation model; correcting the first document according to the classification result and the edge segmentation map of the first document to obtain a first target document image; and displaying a first user interface that displays the first target document image.
In this embodiment of the present application, by inputting the captured second image into the classification module, the document type of the first document contained in the second image, for example a single-page document or a multi-page document, can be accurately detected; by inputting the second image into the segmentation module, edge detection can be performed on the first document to obtain an edge segmentation map; and by combining the classification result with the edge segmentation map, document images of multiple different document types can be corrected.
In a possible design, the method further includes: in response to a third operation, displaying a second user interface that displays the second image and the corner points of the first document, where the corner points of the first document are used to correct the first document; receiving a fourth operation on at least one of the corner points of the first document; and in response to the fourth operation, adjusting the position of the at least one corner point. With this design, the positions of the corner points of the first document can be adjusted through user operations.
In a possible design, after adjusting the position of the at least one corner point in response to the fourth operation, the terminal may further, in response to a fifth operation, correct the first document according to the classification result of the first document and the adjusted corner coordinates to obtain a second target document image, and display a third user interface that displays the second target document image. With this design, correcting the first document based on its classification result and the adjusted corner coordinates can improve the accuracy of the correction.
In a possible design, after capturing the second image in response to the second operation, the method further includes: displaying a fourth user interface that displays the second image and the corner points of the first document. With this design, after the second image is obtained, the second image and the corner points of the first document can be shown on the fourth user interface, making it convenient for the user to adjust the corner points.
In a possible design, the method further includes: receiving a sixth operation on at least one of the corner points of the first document; in response to the sixth operation, adjusting the position of the at least one corner point; in response to a seventh operation, correcting the first document according to the classification result of the first document and the adjusted corner coordinates to obtain a third target document image; and displaying a fifth user interface that displays the third target document image. With this design, correcting the first document based on its classification result and the adjusted corner coordinates can improve the accuracy of the correction.
In a possible design, the classification result of the first document includes at least one document type and the probability of each document type; the at least one document type includes at least one of the following:
(1) a single-page document type with a flat page;
(2) a single-page document type with a curved page;
(3) a double-page document type with both the left and right pages flat;
(4) a double-page document type with a flat left page and a curved right page;
(5) a double-page document type with a curved left page and a flat right page;
(6) a double-page document type with both the left and right pages curved.
With this design, the type of the first document can be determined from the classification result, which helps select a suitable correction algorithm later and obtain a better correction effect.
In a possible design, correcting the first document according to its classification result and edge segmentation map to obtain the first target document image includes: determining the type of the first document according to the classification result and the edge segmentation map; determining the document correction algorithm corresponding to the type of the first document; and inputting the second image into the document correction algorithm corresponding to the first document to obtain the first target document image.
In a possible design, inputting the second image into the document correction algorithm corresponding to the first document to obtain the first target document image may be implemented in any one of the following Mode 1 to Mode 6:
Mode 1: if the first document is of the flat single-page type, the second image is input into a common document correction algorithm for correction to obtain the first target document image. For example, when an image of a flat single-page document is captured with a shooting angle that is not perpendicular to the page, the first document in the captured second image is flat but its content is tilted; Mode 1 corrects the first document so that viewing the resulting first target document image is equivalent to viewing an image shot perpendicular to the plane of the flat single-page document, which improves the user's perception of the target document image.
Mode 2: if the first document is of the curved single-page type, the second image is input into a first document correction algorithm to obtain a single-page document image, and the single-page document image is input into a de-warping correction algorithm for correction to obtain the first target document image. For example, when the first document in the second image is a single-page document whose edges are curved and whose text does not lie on a straight line, correcting it with Mode 2 makes the document edges straight and the text aligned in the first target document image, improving the user's perception of the target document image.
Mode 3: if the first document is of the double-page type with both pages flat, the left page of the first document in the second image is input into the common document correction algorithm to obtain a first page image, the right page is input into the common document correction algorithm to obtain a second page image, and the first and second page images are merged to obtain the first target document image. For example, when an image of a flat double-page document is captured with a shooting angle that is not perpendicular to the page, the content is tilted even though the pages are flat; Mode 3 corrects the first document so that viewing the first target document image is equivalent to viewing an image shot perpendicular to the plane of the flat double-page document, improving the user's perception of the target document image.
Mode 4: if the first document is of the double-page type with a flat left page and a curved right page, the left page of the first document in the second image is input into the common document correction algorithm to obtain a third page image, the right page is input into the common document correction algorithm to obtain a fourth page image, the fourth page image is input into the de-warping correction algorithm to obtain a fifth page image, and the third and fifth page images are merged to obtain the first target document image. With Mode 4, when the captured second image contains a double-page document with a flat left page and a curved right page, the left and right pages are corrected separately and the corrected page images are then merged, improving the user's perception of the double-page document.
Mode 5: if the first document is of the double-page type with a curved left page and a flat right page, the left page of the first document in the second image is input into the common document correction algorithm to obtain a sixth page image, the sixth page image is input into the de-warping correction algorithm to obtain a seventh page image, the right page is input into the common document correction algorithm to obtain an eighth page image, and the seventh and eighth page images are merged to obtain the first target document image. With Mode 5, when the captured second image contains a double-page document with a curved left page and a flat right page, the left and right pages are corrected separately and the corrected page images are then merged, improving the user's perception of the double-page document.
Mode 6: if the first document is of the double-page type with both pages curved, the left page of the first document in the second image is input into the common document correction algorithm to obtain a ninth page image, the ninth page image is input into the de-warping correction algorithm to obtain a tenth page image, the right page is input into the common document correction algorithm to obtain an eleventh page image, the eleventh page image is input into the de-warping correction algorithm to obtain a twelfth page image, and the tenth and twelfth page images are merged to obtain the first target document image. With Mode 6, when the captured second image contains a double-page document with both pages curved, the left and right pages are corrected separately and the corrected page images are then merged, improving the user's perception of the double-page document.
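Modes 1 to 6 above amount to a dispatch from the classified document type to a per-page sequence of correction steps. A minimal sketch of that dispatch (the type names and step labels are illustrative, not terms from the source):

```python
def correction_plan(doc_type):
    """Map a classified document type to per-page correction steps:
    'perspective' stands for the common document correction algorithm,
    'dewarp' for the de-warping correction applied to curved pages."""
    flat = ["perspective"]
    curved = ["perspective", "dewarp"]
    plans = {
        "single_flat": {"single": flat},
        "single_curved": {"single": curved},
        "left_flat_right_flat": {"left": flat, "right": flat},
        "left_flat_right_curved": {"left": flat, "right": curved},
        "left_curved_right_flat": {"left": curved, "right": flat},
        "left_curved_right_curved": {"left": curved, "right": curved},
    }
    return plans[doc_type]
```

For double-page types, the two per-page results are merged into one target document image after all steps have run.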
In a second aspect, an embodiment of the present application further provides an apparatus, which includes modules/units for executing the method of the first aspect or any possible design of the first aspect. These modules/units may be implemented by hardware, or by hardware executing corresponding software.
In a third aspect, an embodiment of the present application provides a terminal, including a processor and a memory, where the memory is configured to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the electronic device is enabled to implement the method of the first aspect or any possible design of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, the computer-readable storage medium including a computer program which, when run on a terminal, causes the terminal to execute the method of the first aspect or any possible design of the first aspect.
In a fifth aspect, an embodiment of the present application further provides a computer program product which, when run on a terminal, causes the terminal to execute the method of the first aspect or any possible design of the first aspect.
These and other aspects of the present application will be more concise and understandable in the description of the following embodiments.
FIG. 1 is a schematic diagram of the hardware structure of a terminal according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of document detection according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the process of training the classification module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of document types according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the process of training the segmentation module according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the corner points of a single-page document according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the corner points of a double-page document according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the corner points of a double-page document according to an embodiment of the present application;
FIG. 9 is a schematic flowchart of document detection according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of the classification module according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of the segmentation module according to an embodiment of the present application;
FIG. 12 is a schematic diagram of the document detection process of Scenario 1 according to an embodiment of the present application;
FIG. 13 is a schematic diagram of the document detection process of Scenario 1 according to an embodiment of the present application;
FIG. 14 is a schematic diagram of the document detection process of Scenario 1 according to an embodiment of the present application;
FIG. 15 is a schematic diagram of the document detection process of Scenario 2 according to an embodiment of the present application;
FIG. 16 is a schematic diagram of the document detection process of Scenario 3 according to an embodiment of the present application;
FIG. 17 is a schematic diagram of the document detection process of Scenario 4 according to an embodiment of the present application;
FIG. 18 is a schematic diagram of the document detection process of Scenario 5 according to an embodiment of the present application;
FIG. 19 is a schematic diagram of the document detection process of Scenario 6 according to an embodiment of the present application;
FIG. 20 is a schematic diagram of the hardware structure of a terminal according to an embodiment of the present application.
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application are further described in detail below with reference to the accompanying drawings.
First, some terms used in this application are explained so that those skilled in the art can understand them.
In this application, "and/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A alone, both A and B, and B alone. The character "/" generally indicates an "or" relationship between the associated objects.
"At least one" in the embodiments of this application includes one or more, where "multiple" means two or more.
In addition, in the description of this application, terms such as "first" and "second" are used only for the purpose of distinguishing the description and shall not be understood as indicating or implying relative importance or order.
In addition, in the embodiments of this application, the word "exemplary" is used to mean serving as an example, illustration, or explanation. Any embodiment or implementation described as an "example" in this application should not be construed as preferred over or advantageous compared to other embodiments or implementations. Rather, the use of the word "exemplary" is intended to present a concept in a concrete manner.
The technical solutions in the embodiments of this application can be applied to a terminal. The terminal may be, for example, a vehicle or an electronic device that can be located in a vehicle, or, as another example, a mobile phone, a tablet computer, a notebook computer, or a wearable device with wireless communication capability (such as a smart watch or smart glasses). The terminal includes a component capable of data processing (such as a processor, an application processor, an image processor, or another processor) and a component capable of displaying a user interface (such as a display screen). Exemplary embodiments of the terminal include, but are not limited to, devices running HarmonyOS or other operating systems. The terminal may also be another portable device, such as a laptop with a touch-sensitive surface (for example, a touch panel). It should also be understood that, in some other embodiments of this application, the terminal may not be a portable device but a desktop computer with a touch-sensitive surface (for example, a touch panel).
The structure of the terminal is further described below with reference to FIG. 1.
As shown in FIG. 1, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent components or may be integrated into one or more processors. The solutions provided in the embodiments of this application may be completed by the processor 110 controlling or invoking other components. The controller may be the nerve center and command center of the terminal 100; it can generate operation control signals according to instruction operation codes and timing signals, and complete the control of instruction fetching and execution.
The terminal 100 can implement the display function through the GPU, the display screen 194, the application processor, and the like. The display screen 194 is used to display images, videos, and the like, for example, the home screen of the terminal, a lock screen interface, a leftmost-screen interface, the user interface of a built-in communication application or of other third-party applications, as well as the service cards described in the embodiments of this application shown on these display interfaces.
The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information. For example, the GPU performs graphics rendering based on card information and data to generate the cards to be displayed.
The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light emitting diodes (QLED), or the like. In some embodiments, the terminal 100 may include one or N display screens 194, where N is a positive integer greater than 1.
The wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the terminal 100 may be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas may also be reused to improve antenna utilization; for example, the antenna 1 may be reused as a diversity antenna of a wireless local area network. In some other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions applied to the terminal 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 may also amplify signals modulated by the modem processor and convert them into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be provided in the same component.
The modem processor may include a modulator and a demodulator. The modulator modulates the low-frequency baseband signal to be sent into a medium- or high-frequency signal; the demodulator demodulates the received electromagnetic wave signal into a low-frequency baseband signal and then transmits it to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor, which outputs a sound signal through an audio device (not limited to the speaker 170A and the receiver 170B) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent component; in other embodiments, it may be independent of the processor 110 and provided in the same component as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide wireless communication solutions applied to the terminal 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more components integrating at least one communication processing module. It receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive signals to be sent from the processor 110, perform frequency modulation and amplification on them, and convert them into electromagnetic waves for radiation through the antenna 2.
The charging management module 140 is used to receive charging input from a charger. The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like. It can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110; in still other embodiments, the power management module 141 and the charging management module 140 may be provided in the same component.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos on the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. By running the instructions stored in the internal memory 121, the processor 110 executes various functional applications and data processing of the terminal 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and the applications required by at least one function (such as a sound playback function and an image playback function); the data storage area may store data created during the use of the terminal 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be provided on the display screen 194, and the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch screen". The touch sensor 180K is used to detect touch operations acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display screen 194. In some other embodiments, the touch sensor 180K may also be provided on the surface of the terminal 100 at a position different from that of the display screen 194.
It can be understood that the structure illustrated in the embodiments of this application does not constitute a specific limitation on the terminal 100. In other embodiments of this application, the terminal 100 may include more or fewer components than shown, combine certain components, split certain components, or have a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The document detection method provided in this application is applicable to the detection and correction of single-page documents or multi-page documents. In the following, a double-page document is used as an example of a multi-page document.
As shown in FIG. 2, a schematic flowchart of the document detection and correction of this application is provided.
First, an original image containing a first document is obtained, and document type detection is performed on the original image to obtain the classification result of the first document and the edge segmentation map of the first document. The first document is then corrected according to the classification result and the edge segmentation map to obtain a corrected target document image.
Exemplarily, as shown in FIG. 2, on one hand, the original image may be input into the classification module of the document image detection system for document type detection to obtain the classification result of the first document; on the other hand, the original image is input into the segmentation module of the document image detection system to obtain the edge segmentation map of the first document. Then, the classification result and the edge segmentation map are input into the post-processing module of the document image correction system, which determines the corner coordinates of the first document according to the classification result and the edge segmentation map. The corner coordinates of the first document may be understood as the coordinates of each corner point of the first document: for a first document with flat edges, the corner points are the vertices of the document; for a first document with curved edges, a corner point is the intersection of the tangent of a curved edge with the adjacent edge, which will not be repeated below. The original image and the corner coordinates of the first document are then input into the correction module of the document image correction system. If all pages of the first document in the original image are flat, the post-processing module may input the original image and the corner coordinates of the first document into the correction module to obtain the corrected target document image. If the first document in the original image includes a curved page, the post-processing module may input the curved page of the original image and the corner coordinates of the first document into the correction module; after perspective transformation in the correction module, the result is further input into the de-warping module, and the corrected target document image is obtained after de-warping.
In the embodiments of this application, before performing document detection and correction on the original image, the classification module and the segmentation module need to be trained first.
Referring to FIG. 3, the specific process of training the classification module is as follows:
Step 301: obtain at least one first sample image.
A user may obtain a batch of original images in advance; an original image may be an image of a document captured by the terminal camera or another imaging device. The user may annotate the probability of at least one document type on each original image in this batch; an original image annotated with the probability of at least one document type may be called a first sample image.
In the embodiments of this application, at least one document type may be predefined. Exemplarily, six predefined document types are used as an example for description. As shown in FIG. 4, the six document types may include: a single-page document with a flat page ("single flat" for short), a single-page document with a curved page ("single curved" for short), a double-page document with a flat left page and a curved right page ("left flat right curved" for short), a double-page document with both the left and right pages flat ("left flat right flat" for short), a double-page document with a curved left page and a flat right page ("left curved right flat" for short), and a double-page document with both the left and right pages curved ("left curved right curved" for short). These six document types may be represented by words, for example single flat, single curved, left flat right curved, left flat right flat, left curved right flat, and left curved right curved; they may also be represented by different symbols or numbers, for example 0, 1, 2, 3, 4, and 5, or by different letters, which is not limited in this application. Of course, in a specific implementation, more or fewer document types may also be predefined; for example, when documents with more pages need to be detected, more document types may be predefined. This application is not limited to six document types.
Exemplarily, a first sample image may be annotated with the probability of one document type. For example, if first sample image 1 is of the single flat type, it may be annotated as single flat: 1.0, which means that the probability of first sample image 1 belonging to the single flat type is 1.0 and the probabilities of it belonging to the other document types are all 0. A first sample image may also be annotated with the probabilities of all six document types, that is, annotated as single flat: 1.0, single curved: 0.0, left flat right curved: 0.0, left flat right flat: 0.0, left curved right flat: 0.0, left curved right curved: 0.0.
Step 302: input the at least one first sample image into the classification model to be trained to obtain a predicted document-type result for each first sample image.
Exemplarily, the classification model to be trained may include a feature extractor and a classifier. After the at least one first sample image is input into the feature extractor, classification features are obtained; the classification features are then input into the classifier to obtain the predicted document-type result. For one first sample image, the prediction result includes the probabilities of the six document types.
Step 303: input the prediction result and the ground-truth result of each first sample image into a cross-entropy loss function calculation model to obtain the loss value between the prediction result and the ground-truth result of each first sample image.
For one first sample image, the ground-truth result is the annotated probability of the document type. Taking a first sample image of the single flat type as an example, if the image is annotated with a probability of 1.0 for one document type (for example, single flat), then the ground-truth result includes a probability of 1.0 for single flat, and when subsequently computing the loss between the prediction result and the ground-truth result, the probabilities of the other five document types may be counted as 0.0. If the first sample image is annotated with six document types, namely a probability of 1.0 for single flat and 0.0 for the other five types, then the ground-truth result includes six probabilities: 1.0 for single flat and 0.0 for the other five types.
Exemplarily, the loss value between the prediction result and the ground-truth result of each first sample image may be determined from the loss between the probability of each document type in the prediction result and the probability of the corresponding document type in the ground-truth result; the six document types correspond to six loss values, and, for example, the average of these loss values may be taken as the loss value between the prediction result and the ground-truth result of the first sample image, which may also be called the loss value of the first sample image.
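The loss described in step 303 can be sketched as a cross-entropy between the predicted probability vector and the annotated (ground-truth) probability vector over the six document types. This is a simplified stand-in for the cross-entropy loss function calculation model, shown only for illustration:

```python
import numpy as np

def cross_entropy(pred, target, eps=1e-12):
    """Cross-entropy between a predicted probability vector and an annotated
    probability vector over the six document types; eps guards against log(0)."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0)
    return float(-np.sum(np.asarray(target, dtype=float) * np.log(pred)))
```

A perfect prediction (probability 1.0 on the annotated type) yields a loss of essentially zero, and the loss grows as the predicted distribution moves away from the annotation.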
Step 304: use the backpropagation algorithm to update the weights of the classification model until the trained classification model is obtained, at which point training ends.
The principle of the backpropagation algorithm is to first judge whether the loss value of the at least one first sample image is smaller than a preset expected value. If so, the trained classification model is obtained and training ends; if not, the loss values of one or more first sample images are propagated back into the classification model being trained, the weights of the classification model are updated to obtain an updated classification model, and steps 302 to 304 are then repeated. By continually returning loss values, continually updating the weights of the classification model, and continually obtaining the loss values of the first sample images through the weight-updated classification model, training ends when the loss value of the at least one first sample image is smaller than the preset expected value.
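The iterate-until-below-threshold logic of steps 302 to 304 can be sketched with a toy model; here logistic regression stands in for the real classification network, which is an assumption made purely for illustration:

```python
import numpy as np

def train_until_converged(X, y, lr=0.5, expected_loss=0.1, max_epochs=5000):
    """Toy loop mirroring steps 302-304: forward pass, loss computation, and
    a backpropagated weight update, repeated until the loss drops below the
    preset expected value (logistic regression stands in for the network)."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    loss = float("inf")
    for _ in range(max_epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))                      # forward pass
        loss = -np.mean(y * np.log(p + 1e-12)
                        + (1 - y) * np.log(1 - p + 1e-12))    # loss value
        if loss < expected_loss:                              # expected value reached
            break
        w -= lr * X.T @ (p - y) / len(y)                      # gradient update
    return w, loss
```

The stopping criterion (loss below the expected value) is the point the text calls "training ends".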
In practical applications, the structures of the feature extractor and the classifier in the classification model, the weight initialization method, and the loss function may be set as needed and are not particularly limited.
The trained classification model in step 304 above is the classification module used for subsequent document detection.
Referring to FIG. 5, the specific process of training the segmentation module is as follows:
Step 501: obtain at least one second sample image.
A user may obtain a batch of original images in advance; an original image may be an image captured by the terminal camera or another imaging device. The user may annotate at least one corner point on each original image in the batch. For a document image with flat edges, the at least one corner point may then be connected to obtain annotated document edge lines. As shown in FIG. 6, for a document image with curved edges, the outer tangent of a curved edge is annotated as the edge line of the curved edge; this outer tangent forms an intersection with the extension of the adjacent edge, and connecting these intersection points yields the document edge lines. In the embodiments of this application, an original image annotated with document edge lines may be called a second sample image.
In some other embodiments, the user may also annotate at least one corner point on each original image and call the original image annotated with at least one corner point a second sample image. In this case, after the second sample image is input into the segmentation model being trained, the segmentation model automatically connects the corner points to obtain a second sample image annotated with document edge lines, and the subsequent training process then continues.
In the embodiments of this application, the number of corner points that can be annotated differs for original images of different document types. For example, single-page documents such as single flat and single curved documents can be annotated with 4 corner points, as with the single-page document shown in FIG. 6 annotated with 4 corner points. Double-page documents such as left flat right curved, left flat right flat, left curved right flat, and left curved right curved documents can be annotated with 8 corner points, as with the double-page document shown in FIG. 7 annotated with 8 corner points. In some scenarios, if one edge of the left page of a double-page document completely overlaps one edge of the right page, as shown in FIG. 8, the coordinates of the corner points at both ends of these two completely overlapping edges are the same, and 6 corner points may be annotated in this case.
Step 502: input the at least one second sample image into the segmentation model to be trained to obtain a prediction result of the document edge lines in each second sample image.
Exemplarily, the segmentation model to be trained may include a U-shaped encoder and decoder. After the at least one second sample image is input into the encoder, segmentation features are obtained; the segmentation features are then input into the decoder to obtain the prediction result of the document edge lines, which may be a predicted edge segmentation map containing the predicted document edge lines.
Step 503: input the prediction result and the ground-truth result of each second sample image into the cross-entropy loss function calculation model to obtain the loss value between the prediction result and the ground-truth result of each second sample image.
For one second sample image, the ground-truth result is the annotated edge segmentation map containing the annotated document edge lines. Exemplarily, the loss value between the prediction result and the ground-truth result of each second sample image may be determined from the loss between the grayscale value at each position in the predicted edge segmentation map and the grayscale value at the corresponding position in the annotated edge segmentation map. For example, if both the predicted and annotated edge segmentation maps have a resolution of 100*100, each of the 10000 pixels corresponds to one loss value, and the average of these 10000 loss values may be taken as the loss value between the prediction result and the ground-truth result of the second sample image, which may also be called the loss value of the second sample image.
Step 504: use the backpropagation algorithm to update the weights of the segmentation model until the trained segmentation model is obtained, at which point training ends.
The principle of the backpropagation algorithm is to first judge whether the loss value of the at least one second sample image is smaller than a preset expected value. If so, the trained segmentation model is obtained and training ends; if not, the loss values of one or more second sample images are propagated back into the segmentation model being trained, the weights of the segmentation model are updated to obtain an updated segmentation model, and steps 502 to 504 are then repeated. By continually returning loss values, continually updating the weights of the segmentation model, and continually obtaining the loss values of the second sample images through the weight-updated segmentation model, training ends when the loss value of the at least one second sample image is smaller than the preset expected value.
In practical applications, the structures of the encoder and decoder in the segmentation model, the weight initialization method, and the loss function may be set as needed and are not particularly limited.
The trained segmentation model in step 504 above is the segmentation module used for subsequent document detection.
After the classification module and the segmentation module of the document detection system have been trained, the document detection and correction process of this application is described in detail next.
As shown in FIG. 9, a schematic flowchart of the document detection and correction of this application is provided.
Step 901: obtain an original image containing a first document.
The original image may be an image captured by the terminal camera or another imaging device, or an image obtained through other means; the embodiments of this application do not limit this.
Step 902: input the original image into the trained classification module, which can output the classification result of the first document.
Exemplarily, taking a double-page document whose left page edge is curved and whose right page edge is flat as the first document, as shown in FIG. 10, the original image containing the double-page document is first input into the classification module. The classification module performs several downsampling operations to obtain classification features, and the classification features are then passed through a softmax function to compute the classification result of the double-page document, that is, the probabilities that the double-page document belongs to each of the six predefined document types. As shown in FIG. 10, the probabilities computed by the softmax function for the six document types are single flat: 0.0, single curved: 0.0, left flat right curved: 0.0, left flat right flat: 0.0, left curved right flat: 1.0, left curved right curved: 0.0; from these six probabilities in the classification result it can be determined that the first document is of the left curved right flat type. In some other examples, the classification result may also be the maximum of the six probabilities, for example left curved right flat: 1.0. It should be understood that the six probabilities output by the classification module are the probabilities that the original image belongs to each of the document types, and one of these probabilities is the largest; for example, if the probability of the left curved right flat type is the largest among the six, the first document is most likely of the left curved right flat type. In the above example, the probability value of this most probable document type (left curved right flat) is 1.0; in some other embodiments, the probability of this most probable document type may also be smaller than 1.0, and exemplarily the classification result may be single flat: 0.0, single curved: 0.0, left flat right curved: 0.0, left flat right flat: 0.2, left curved right flat: 0.7, left curved right curved: 0.1.
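The softmax step described above can be sketched as follows. The logit values in the example are invented; only the six-type probability normalization and arg-max selection mirror the text:

```python
import numpy as np

DOC_TYPES = ["single_flat", "single_curved", "left_flat_right_curved",
             "left_flat_right_flat", "left_curved_right_flat",
             "left_curved_right_curved"]

def classify(logits):
    """Softmax over the six-type logits, then pick the most probable type."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    probs = e / e.sum()
    return DOC_TYPES[int(probs.argmax())], probs
```

The probabilities always sum to 1, and the returned type is the one the text calls the most probable document type.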
In the embodiments of this application, by inputting the original image into the classification module, different document types can be intelligently recognized, which helps select a correction algorithm suited to the document type of the original image later, so that a better document correction effect can be obtained.
Step 903: input the original image into the trained segmentation module, which can output the edge segmentation map of the first document.
Exemplarily, taking a double-page document with a curved left side and a flat right side as the first document, as shown in FIG. 11, the original image containing the double-page document is first input into the segmentation module. In the segmentation module, the original image is downsampled several times by the encoder to obtain segmentation features, and the segmentation features are then upsampled several times by the decoder to obtain an edge segmentation map with the same resolution as the original image; the edge segmentation map contains the document edge lines.
Steps 902 and 903 are not performed in any particular order; step 902 may be performed first, or step 903 may be performed first, and this application does not limit this.
Step 904: determine the correction algorithm corresponding to the document type of the first document according to the classification result and the edge segmentation map of the first document.
In one possible implementation of step 904, the number of vertical lines among the document edge lines may be judged from the edge segmentation map of the first document: three vertical lines indicate a double page, and two vertical lines indicate a single page. This can be double-checked against the classification result. For example, if the segmentation result and the classification result indicate the same number of pages, the document type can be determined directly; if the segmentation result and the classification result differ, the classification result may prevail in determining the document type, thereby improving the accuracy of single/double-page classification.
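The double-check described in step 904 can be sketched as a small reconciliation rule; the function name and its return convention are illustrative assumptions, not part of the described method:

```python
def verify_page_count(num_vertical_edges, classified_pages):
    """Cross-check the segmentation result (three vertical edge lines imply a
    double page, two imply a single page) against the classifier's page count.
    Returns (page_count, consistent); on disagreement the classifier prevails."""
    seg_pages = 2 if num_vertical_edges >= 3 else 1
    return classified_pages, (seg_pages == classified_pages)
```

The `consistent` flag only records whether the two modules agreed; the page count itself always follows the classification result, as the text specifies.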
In the embodiments of this application, different document types may correspond to different correction algorithms. For example, a document of the single flat type or the left flat right flat type corresponds to the common document correction algorithm, while a document of any of the single curved, left curved right flat, left flat right curved, and left curved right curved types corresponds to the common document correction algorithm plus the de-warping correction algorithm. Exemplarily, the common document correction algorithm involved in this application is, for example, a perspective transformation algorithm, and the de-warping correction algorithm is, for example, a dewarping algorithm; the specific algorithm names are not limited here.
Step 905: correct the first document in the original image according to the correction algorithm corresponding to the document type of the first document to obtain the target document image.
In a possible implementation, before step 904, the corner coordinates of the first document may also be determined according to the classification result and the edge segmentation map of the first document. Exemplarily, the number of corner points is determined according to the classification result of the first document in the original image: 4 corner points for a single-page document and 8 corner points for a double-page document; then, combining the determined number of corner points with the edge segmentation map output by the segmentation module, the coordinates of each corner point of the first document can be determined.
In one possible implementation of step 905, the first document in the original image is corrected according to the original image, the coordinates of each corner point of the first document, and the correction algorithm corresponding to the document type of the first document, to obtain the target document image.
Exemplarily, if the first document in the original image is a single flat document, the process of correcting it includes: inputting the single flat document and the coordinates of its four corner points into the correction module for correction to obtain a rectangular target document image. The correction module involved in the embodiments of this application may be implemented by the common document correction algorithm, which will not be repeated below.
As another example, if the first document in the original image is a single curved document, the process of correcting it includes: inputting the single curved document and the coordinates of its four corner points into the correction module for correction to obtain a regularly shaped single-page document image, where the correction module may be implemented by the common document correction algorithm; and then inputting the regularly shaped single-page document image into the de-warping module for further correction to obtain a flat and regularly shaped target document image. The de-warping module involved in the embodiments of this application may be implemented by the de-warping correction algorithm, which will not be repeated below.
As another example, if the first document in the original image is a left flat right flat double-page document, the process of correcting it includes: inputting the left page image and the coordinates of the four corner points of the left page into the correction module for correction to obtain a regularly shaped left page image; inputting the right page image and the coordinates of the four corner points of the right page into the correction module for correction to obtain a regularly shaped right page image; and then merging the regularly shaped left page image and the regularly shaped right page image to obtain the target document image.
As another example, if the first document in the original image is a left curved right flat double-page document, the process of correcting it includes: inputting the left page image and the coordinates of its four corner points into the correction module for correction to obtain a regularly shaped left page image, and then inputting the regularly shaped left page image into the de-warping module for further correction to obtain a flat and regularly shaped corrected left page image; inputting the right page image and the coordinates of its four corner points into the correction module for correction to obtain a regularly shaped corrected right page image; and then merging the corrected left page image and the corrected right page image to obtain the target document image.
As another example, if the first document in the original image is a left flat right curved double-page document, the process of correcting it includes: inputting the left page image and the coordinates of its four corner points into the correction module for correction to obtain a regularly shaped corrected left page image; inputting the right page image and the coordinates of its four corner points into the correction module for correction to obtain a regularly shaped right page image, and then inputting the regularly shaped right page image into the de-warping module for further correction to obtain a flat and regularly shaped corrected right page image; and then merging the corrected left page image and the corrected right page image to obtain the target document image.
As another example, if the first document in the original image is a left curved right curved double-page document, the process of correcting it includes: inputting the left page image and the coordinates of its four corner points into the correction module for correction to obtain a regularly shaped left page image, and then inputting the regularly shaped left page image into the de-warping module for further correction to obtain a flat and regularly shaped corrected left page image; inputting the right page image and the coordinates of its four corner points into the correction module for correction to obtain a regularly shaped right page image, and then inputting the regularly shaped right page image into the de-warping module for further correction to obtain a flat and regularly shaped corrected right page image; and then merging the corrected left page image and the corrected right page image to obtain the target document image.
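The final merging step shared by the double-page cases can be sketched as a horizontal concatenation of the two corrected page images, padding the shorter page so the heights match (padding with a white background is an assumption made for illustration):

```python
import numpy as np

def merge_pages(left, right, fill=255):
    """Horizontally concatenate two corrected page images; if their heights
    differ, the shorter one is padded with background pixels at the bottom."""
    h = max(left.shape[0], right.shape[0])
    def pad(img):
        out = np.full((h,) + img.shape[1:], fill, dtype=img.dtype)
        out[:img.shape[0]] = img
        return out
    return np.hstack([pad(left), pad(right)])
```

The merged result has the height of the taller page and the combined width of both pages.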
In a possible implementation, after the first document in the original image is corrected in step 905 and the correction module outputs the target document image, the user interface may display the target document image. The user interface may also be switched to an editing mode so that the user can manually re-edit both the left and right sides or the top and bottom sides, thereby improving the accuracy of the correction.
In the embodiments of this application, the classification module can recognize the document type in the original image and can detect not only single-page documents but also multi-page documents, improving the accuracy of document detection results; the segmentation module supports edge detection of single-page or multi-page documents to obtain the edge segmentation map; and combining the classification result with the edge segmentation map enables correction of document images of multiple types, such as single-page and multi-page documents.
To facilitate understanding of the embodiments of this application, various application scenarios of this application are introduced next. The application scenarios described in the embodiments of this application are intended to explain the technical solutions of the embodiments more clearly and do not constitute a limitation on the technical solutions provided in the embodiments of this application. A person of ordinary skill in the art will know that, as new application scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
Scenario 1: the first document is of the single flat type.
The following uses a camera application as an example to describe a specific implementation of document detection and correction for the single flat type.
The user taps the camera application on the terminal and enters the photo preview interface 12a shown in FIG. 12. The photo preview interface 12a includes a button 1201 for entering the scan preview interface 12b. The user taps the button 1201, and the terminal displays the scan preview interface 12b. The shooting angle of the terminal camera is tilted toward a single-page paper 1202 lying flat on the desktop, so the scan preview interface 12b displays a preview of the flat-edged single-page paper 1202. Because the shooting angle of the camera is not perpendicular to the plane of the single-page paper 1202, the text and pictures of the single-page paper 1202 shown on the scan preview interface 12b appear tilted to the user, and the edge shape of the single-page paper 1202 is not a rectangle; for this reason, document detection and correction can be performed with the terminal. Optionally, the scan preview interface 12b may also display the edge line 1203 of the flat-edged single-page paper 1202.
Exemplarily, the scan preview interface 12b further includes a shooting button 1204. The user may tap the shooting button 1204; in response to the user's tap on the shooting button 1204, the terminal captures an original image containing the flat-edged single-page paper 1202 and performs document detection and correction on the original image of the flat-edged single-page paper 1202. In one possible implementation of performing document detection and correction on the original image of the single-page paper 1202: on one hand, the terminal inputs the original image of the flat-edged single-page paper 1202 into the classification module to obtain a classification result, for example single flat: 1.0, single curved: 0.0, left flat right curved: 0.0, left flat right flat: 0.0, left curved right flat: 0.0, left curved right curved: 0.0, from which it can be seen that the document type of the flat-edged single-page paper 1202 is the single flat type; on the other hand, the terminal inputs the original image of the flat-edged single-page paper 1202 into the segmentation module to obtain the edge segmentation map of the flat-edged single-page paper 1202. Combining the classification result with the edge segmentation map, it can be determined that the flat-edged single-page paper 1202 has four corner points, and the coordinates of the four corner points of the flat-edged single-page paper 1202 are obtained. The original image containing the flat-edged single-page paper 1202 and the coordinates of the four corner points are then input into the common correction algorithm for correction, where the common correction algorithm is, for example, a perspective transformation algorithm, to obtain the corrected single-page paper 1205. The terminal then displays the user interface 12c, which displays the corrected single-page paper 1205; the edge shape of the corrected single-page paper 1205 is a rectangle, the text and pictures in the single-page paper 1205 are no longer tilted, and the viewing effect of the corrected single-page paper 1205 is equivalent to that obtained by shooting perpendicular to the plane of the flat-edged single-page paper, which improves the user's perception of the content of the flat-edged single-page paper.
As shown in FIG. 12, the user interface 12c may further include a confirmation button 1206. After the user taps the confirmation button 1206, the terminal displays the user interface 12d, which includes the corrected single-page paper 1205 and a dialog box 1208. The dialog box 1208 includes a save button; when the user taps the save button, the terminal, in response to the tap on the save button, stores the corrected single-page paper 1205 in the album. Optionally, after correcting the captured photo (for example, the single-page paper 1202), the terminal may also save it directly without displaying it on the preview interface. Optionally, the terminal may also display the user interface 12e, which displays the single-page paper 1205 in the album. Optionally, the dialog box 1208 may also include function buttons such as "Export to PDF" and "Cancel"; if the user taps the Export to PDF button, the terminal exports and saves the corrected single-page paper 1205 as a PDF file, and if the user taps the cancel button, the terminal cancels saving to the album.
In some other embodiments, when the user taps the confirmation button 1206, the terminal switches from the user interface 12c to displaying the user interface 12e, which includes the single-page paper 1205 in the album. In still other embodiments, after the user taps the confirmation button 1206, the corrected single-page paper 1205 is stored directly without switching the user interface, that is, the user interface 12c is still displayed.
In some other embodiments, the user interface 12c displayed by the terminal may further include other function buttons, such as a crop button, an effect button, and a rotate button. The crop button is used to enter the editing mode, in which the corner positions can be adjusted to crop the document image to be corrected in the original image; the effect button is used to select among multiple document display effects; and the rotate button is used to rotate the orientation of the displayed document.
As shown in FIG. 13, the user may tap the crop button 1207 on the user interface 12c (the button may also have another name, as long as the corresponding function can be implemented). In response to the tap on the crop button 1207, the terminal displays the user interface 13a, which displays the original image containing the single-page paper 1202, the four corner points of the single-page paper 1202 (at position A, position B, position C, and position D, respectively), and the edge line 1203 connecting the four corner points. The user may move any one or more corner points on the user interface 13a to change the extent of the edge lines. With reference to the user interfaces 13a and 13b, the user may long-press the top-left corner point on the user interface 13b and move it from position A to position A'; the user may also move the positions of the other corner points, for example, on the user interface 13b, moving the top-right corner point from position B to position B', the bottom-left corner point from position D to position D', and the bottom-right corner point from position C to position C', obtaining the edge line 1212 of the single-page paper 1202.
The user interfaces 13a and 13b also include a confirmation button 1209. The user may tap the confirmation button 1209, and in response to the tap on the confirmation button 1209, the terminal displays the user interface 13c, which includes the corrected single-page paper 1213. The user interface 13c also includes a confirmation button 1214; after the user taps the confirmation button 1214, the terminal displays the user interface 13d, which includes a dialog box 1215 containing a save button. When the user taps the save button, the terminal, in response to the tap on the save button, stores the corrected single-page paper 1213 in the album. For the specific implementation of the dialog box 1215, refer to the description of the dialog box 1208 in FIG. 12, which is not repeated here. Optionally, the user interfaces 13a and 13b also include a cancel button 1210 and a restore button 1211. When the user taps the cancel button 1210, the terminal, in response to the tap on the cancel button 1210, displays the user interface 12c; when the user taps the restore button 1211, the terminal, in response to the tap on the restore button 1211, restores the positions of the corner points, that is, the adjustments the user made to the corner points are not saved and each corner point returns to its original position.
Optionally, the terminal may also display the user interface 13e in FIG. 13, which displays the corrected single-page paper 1213 in the album. In some other embodiments, after the user taps the confirmation button 1209, the terminal switches from the user interface 13c to displaying the user interface 13e, which includes the corrected single-page paper 1213 in the album. In still other embodiments, after the user taps the confirmation button 1209, the corrected single-page paper 1213 is stored directly without switching the user interface, that is, the user interface 13c is still displayed and neither the user interface 13d nor the user interface 13e is displayed.
The following uses a scanning application as an example to describe a specific implementation of document detection and correction for the single flat type.
The user taps the scanning application on the terminal and enters the scan preview interface 14a shown in FIG. 14. The shooting angle of the terminal camera is tilted toward a single-page paper 1401 lying flat on the desktop, so the scan preview interface 14a displays a preview of the flat-edged single-page paper 1401. Because the shooting angle of the camera is not perpendicular to the plane of the single-page paper 1401, the text and pictures of the single-page paper 1401 shown on the scan preview interface 14a appear tilted, and the edge shape of the single-page paper 1401 is not a rectangle; for this reason, document detection and correction can be performed with the terminal. Optionally, the scan preview interface 14a may also display the edge line 1402 of the flat-edged single-page paper 1401.
The scan preview interface 14a further includes a shooting button 1403. The user may tap the shooting button 1403; in response to the user's tap on the shooting button 1403, the terminal captures the original image of the single-page paper 1401 and performs document detection on it. On one hand, the original image of the single-page paper 1401 is input into the classification module to obtain a classification result, for example single flat: 1.0, single curved: 0.0, left flat right curved: 0.0, left flat right flat: 0.0, left curved right flat: 0.0, left curved right curved: 0.0, from which it can be seen that the document type of the single-page paper 1401 is the single flat type; on the other hand, the original image of the single-page paper 1401 is input into the segmentation module to obtain the edge segmentation map of the single-page paper 1401. Combining the classification result with the edge segmentation map, it can be determined that the single-page paper 1401 has four corner points, and the coordinates of the four corner points of the single-page paper 1401 are obtained. The terminal then displays the user interface 14b, which includes the original image of the single-page paper 1401, the four corner points of the single-page paper 1401 (such as position A, position B, position C, and position D), and the edge line 1404 of the single-page paper 1401.
The user interface 14b is in an editable mode, and the user may move any one or more corner points on the user interface 14b to change the extent of the edge lines. As shown on the user interface 14c, the user moves the top-left corner point of the single-page paper 1401 from position A to position B, obtaining the edge line 1405 shown on the user interface 14d; if the user moves the positions of multiple corner points, the edge lines of the single-page paper 1401 can be made larger. For the specific implementation, refer to the descriptions of the user interfaces 13a and 13b in FIG. 13, which are not repeated here.
The user interfaces 14b to 14d also include a confirmation button 1406. After moving the corner points, the user may tap the confirmation button 1406; in response to the user's tap on the confirmation button 1406, the terminal corrects the original image of the single-page paper 1401. Exemplarily, the original image containing the single-page paper 1401 and the coordinates of the four corner points are input into the common correction algorithm for correction, where the common correction algorithm is, for example, a perspective transformation algorithm, to obtain the corrected single-page paper 1408, and the user interface 14e is displayed. The user interface 14e includes the corrected single-page paper 1408; the edge shape of the corrected single-page paper 1408 is a rectangle, its text and pictures are no longer tilted, and its viewing effect is equivalent to that obtained by shooting perpendicular to the plane of the single-page paper, which improves the user's perception of the single-page paper content in the image.
The user interface 14e also includes a confirmation button 1407. In response to the user's tap on the confirmation button 1407, the terminal displays the user interface 14f, which includes a dialog box 1409 containing a save button; optionally, the dialog box 1409 may also include function buttons such as "Export to PDF" and "Cancel". When the user taps the save button, the terminal, in response to the tap on the save button, stores the corrected single-page paper 1408 in the album. Optionally, the terminal may also display the user interface 12e in FIG. 12, which displays the corrected single-page paper in the album. In some other embodiments, after the user taps the confirmation button 1407, the terminal switches from the user interface 14e to displaying the user interface 12e in FIG. 12, which includes the corrected single-page paper 1408 in the album. In still other embodiments, after the user taps the confirmation button 1407, the corrected single-page paper 1408 is stored directly without switching the user interface, that is, the user interface 14e is still displayed.
The camera application described above can also implement the document correction solutions of the following Scenario 2 to Scenario 6; the following embodiments use the scanning application as an example to introduce the document correction solutions of Scenario 2 to Scenario 6.
Scenario 2: the document type is the single curved type.
The following uses a scanning application as an example to describe a specific implementation of document detection and correction for the single curved type.
The user taps the scanning application on the terminal (or a scan button, or performs another operation) and enters the scan preview interface 15a shown in FIG. 15. The shooting angle of the terminal camera is tilted toward a single-page paper 1501 with curved edges, and the scan preview interface 15a displays a preview of the curved-edged single-page paper 1501. Each line of text of the single-page paper 1501 shown on the scan preview interface 15a does not lie on a straight line, the edge lines of the single-page paper 1501 are curved, and the edge shape of the single-page paper 1501 is not a rectangle; for this reason, document detection and correction can be performed with the terminal. Optionally, the scan preview interface 15a may also display the edge line 1502 of the curved-edged single-page paper 1501.
The scan preview interface 15a further includes a shooting button 1503. The user may tap the shooting button 1503; in response to the user's tap on the shooting button 1503, the terminal captures the original image of the single-page paper 1501 and performs document detection on it. On one hand, the original image of the single-page paper 1501 is input into the classification module to obtain a classification result, for example single flat: 0.0, single curved: 1.0, left flat right curved: 0.0, left flat right flat: 0.0, left curved right flat: 0.0, left curved right curved: 0.0, from which it can be seen that the document type of the single-page paper 1501 is the single curved type; on the other hand, the original image of the single-page paper 1501 is input into the segmentation module to obtain the edge segmentation map of the single-page paper 1501. Combining the classification result with the edge segmentation map, it can be determined that the single-page paper 1501 has four corner points, and the coordinates of the four corner points of the single-page paper 1501 are obtained. The terminal then displays the user interface 15b, which includes the original image of the single-page paper 1501, the four corner points of the single-page paper 1501 (such as position A, position B, position C, and position D), and the edge line 1504 of the single-page paper 1501.
The user interface 15b is in an editable mode, and the user may move any one or more corner points on the user interface 15b to change the extent of the edge lines. Exemplarily, the user may long-press the top-left corner point on the user interface 15b and move it from position A to position A'; the user may also move the positions of the other corner points, for example, on the user interface 15b, moving the top-right corner point from position B to position B', the bottom-left corner point from position D to position D', and the bottom-right corner point from position C to position C', obtaining the edge line 1505 of the single-page paper 1501.
The user interfaces 15b and 15c also include a confirmation button 1506. After moving the corner points, the user may tap the confirmation button 1506; in response to the user's tap on the confirmation button 1506, the terminal corrects the original image of the single-page paper 1501. Exemplarily, the original image containing the single-page paper 1501 and the coordinates of the four corner points are first input into the common correction algorithm for perspective transformation correction, and the result of the perspective transformation correction is then input into the de-warping correction algorithm for further correction to obtain the corrected single-page paper image 1508, and the user interface 15d is displayed. The user interface 15d includes the corrected single-page paper image 1508; the edge shape of the corrected single-page paper image 1508 is a rectangle, the text in the single-page paper image 1508 lies on a straight line, the edges of the pictures are straight, the text and pictures are no longer tilted, and the viewing effect of the corrected single-page paper image 1508 is equivalent to that obtained by shooting perpendicular to the plane of a flat single-page paper, which improves the user's perception of the single-page paper content in the image.
The user interface 15d also includes a confirmation button 1507. In response to the user's tap on the confirmation button 1507, the terminal displays the user interface 15e, which includes a dialog box 1509 containing a save button; optionally, the dialog box 1509 may also include function buttons such as "Export to PDF" and "Cancel". When the user taps the save button, the terminal, in response to the tap on the save button, stores the corrected single-page paper image 1508 in the album. Optionally, the terminal may also display the user interface 15f, which displays the corrected single-page paper in the album. In some other embodiments, after the user taps the confirmation button 1507, the terminal switches from the user interface 15d to displaying the user interface 15f, which includes the corrected single-page paper image 1508 in the album. In still other embodiments, after the user taps the confirmation button 1507, the corrected single-page paper image 1508 is stored directly without switching the user interface, that is, the user interface 15d is still displayed and neither the user interface 15e nor the user interface 15f is displayed.
场景三,文档类型为左曲右平类型的场景。
下面以扫描应用为例,介绍单曲类型的文档检测矫正的具体实现方式。
用户点击终端上的扫描应用,进入如图16所示的扫描预览界面16a,终端的摄像头的拍摄视角倾斜对着已翻开的书本,已翻开的书本包括两页文档,其中左侧页面的边缘弯曲,右侧页面的边缘平整,扫描预览界面16a显示预览的左曲右平的双页书本1601,用户看到的扫描预览界面16a中显示的左侧页面的每一行文字并不在一条直线上,图片的边缘线也是弯曲的,右侧页面虽然边缘平整,但是图片也是倾斜的,双页书本1601的边缘形状并不是一个矩形,为此,可以利用终端进行文档检测矫正。可选的,扫描预览界面16a中还可以显示双页书本1601的边缘线1602。
扫描预览界面16a还包括拍摄按钮1603,用户可以点击拍摄按钮1603,终端响应于用户针对拍摄按钮1603的点击操作,拍摄得到双页书本1601的原始图像,并对左曲右平的双页书本1601的原始图像进行文档检测,一方面,将双页书本1601的原始图像输入至分类模块得到分类结果,该分类结果例如为单平:0.0,单曲:0.0,左平右曲:0.0,左平右平:0.0,左曲右平:0.9,左曲右曲:0.1,根据分类结果可知该双页书本1601的文档类型为左曲右平类型;另一方面,将双页书本1601的原始图像输入至分割模块得到双页书本1601的边缘分割图,然后结合分类结果以及边缘分割图,可以确定该双页书本1601为八个角点,获取该双页书本1601的八个角点的坐标,然后显示用户界面16b,用户界面16b包括双页书本1601的原始图像、双页书本1601的八个角点(如位置A~位置H处的八个角点)以及双
页书本1601的边缘线1604。
用户界面16b为可编辑模式,用户可以移动用户界面16b上的任一个或多个角点,以实现改变边缘线的大小,调整用户界面16b上的角点的具体实现方式可参见图13中的用户界面13a~13b,此处不再赘述。
用户界面16b中还包括确认按钮1605,用户可以点击确认按钮1605,终端响应于用户点击该确认按钮1605,对该左曲右平的双页书本1601的原始图像进行矫正,示例性的,将左曲右平的双页书本1601的左侧页面图像以及左侧页面图像对应的4个角点(位置A、位置B、位置C以及位置D处的角点)的坐标先输入普通矫正算法,进行透视变换矫正,之后将经过透视变换矫正后的结果输入至去扭曲矫正算法中进一步矫正,输出左侧矫正后页面图像。将左曲右平的双页书本1601的右侧页面图像以及右侧页面图像对应的4个角点(位置E、位置F、位置G以及位置H处的角点)的坐标输入普通矫正算法,输出右侧矫正后页面图像。然后对左侧矫正后页面图像以及右侧矫正后页面图像进行合并,得到如用户界面16c所示的矫正后的双页书本图像1606,矫正后的双页书本图像1606中双页都是平整的矩形,而且双页书本图像1606中的文字在一条直线上,图片的边缘为直线,文字和图片不再倾斜,矫正后的双页书本图像1606的观看效果相当于拍摄视角垂直于平整的左曲右平的双页书本所在平面而拍摄得到的效果,这样可以提升用户对于图像中的左曲右平的双页书本内容的观感。
User interface 16c further includes a confirm button 1607. In response to the user's tap on the confirm button 1607, the terminal displays user interface 16d, which includes a dialog box 1608. The dialog box 1608 includes a save button and, optionally, function buttons such as "Export as PDF" and "Cancel". When the user taps the save button, the terminal, in response to the tap on the save button, stores the corrected two-page book image 1606 in the photo album. Optionally, in response to the tap, the terminal may also display user interface 16e, which shows the corrected two-page book image 1606 in the album. In some other embodiments, after the user taps the confirm button 1607, the display switches from user interface 16c to user interface 16e, which includes the corrected two-page book image 1606 in the album. In still other embodiments, after the user taps the confirm button 1607, the corrected two-page book image 1606 is stored directly without switching the user interface; that is, user interface 16c remains displayed.
Scenario 4: the document type is the left-curved-right-curved type.
The following takes a scanning application as an example to describe a specific implementation of document detection and correction for the left-curved-right-curved type.
The user taps the scanning application on the terminal and enters the scan preview interface 17a shown in FIG. 17. The shooting angle of the terminal's camera is tilted toward an opened book that includes two document pages, where the edges of both the left page and the right page are curved. The scan preview interface 17a displays a preview of the left-curved-right-curved two-page book 1701. In the scan preview interface 17a seen by the user, the lines of text on the left page and the right page do not lie on a straight line, the edge lines of the pictures are curved, and the edge shape of the two-page book 1701 is not a rectangle. Document detection and correction can therefore be performed using the terminal. Optionally, the scan preview interface 17a may also display the edge line 1702 of the two-page book 1701.
The scan preview interface 17a further includes a shoot button 1703. The user may tap the shoot button 1703, and in response to the user's tap on the shoot button 1703, the terminal captures an original image of the two-page book 1701 and performs document detection on the original image of the left-curved-right-curved two-page book 1701. On the one hand, the original image of the two-page book 1701 is input into the classification module to obtain a classification result, for example: single-flat: 0.0, single-curved: 0.0, left-flat-right-curved: 0.0, left-flat-right-flat: 0.0, left-curved-right-flat: 0.0, left-curved-right-curved: 1.0; from this classification result it can be determined that the document type of the two-page book 1701 is the left-curved-right-curved type. On the other hand, the original image of the two-page book 1701 is input into the segmentation module to obtain an edge segmentation map of the two-page book 1701. Combining the classification result and the edge segmentation map, it can be determined that the two-page book 1701 has eight corner points, and the coordinates of the eight corner points are obtained. User interface 17b is then displayed, which includes the original image of the two-page book 1701, its eight corner points (for example, the eight corner points at positions A to H), and its edge line 1704.
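The embodiment does not specify how corner points are extracted from the edge segmentation map, so as an illustrative sketch (not the patented method), one common post-processing trick recovers the four outer corners of a page region from a binary mask via the extremes of the coordinate sum and difference; for an eight-corner double page this would be applied once per half:

```python
import numpy as np

def quad_corners(mask):
    """Estimate the four outer corners of a page region in a binary
    segmentation mask using the classic sum/difference extremes:
    top-left minimizes x+y, bottom-right maximizes x+y,
    top-right maximizes x-y, bottom-left minimizes x-y."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1)          # (x, y) pairs
    s = pts.sum(axis=1)                        # x + y
    d = pts[:, 0] - pts[:, 1]                  # x - y
    return {
        "top_left": tuple(pts[s.argmin()]),
        "bottom_right": tuple(pts[s.argmax()]),
        "top_right": tuple(pts[d.argmax()]),
        "bottom_left": tuple(pts[d.argmin()]),
    }

# A toy mask with one filled page region: rows 2..7, columns 3..9.
mask = np.zeros((10, 12), dtype=np.uint8)
mask[2:8, 3:10] = 1
c = quad_corners(mask)
```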
User interface 17b is in an editable mode. The user may move any one or more corner points on user interface 17b to change the size of the edge line. For a specific implementation of adjusting the corner points on user interface 17b, refer to user interfaces 13a to 13b in FIG. 13; details are not repeated here.
User interface 17b further includes a confirm button 1705. The user may tap the confirm button 1705, and in response to the user's tap on the confirm button 1705, the terminal corrects the original image of the left-curved-right-curved two-page book 1701. Exemplarily, the left-page image of the two-page book 1701 and the coordinates of its four corresponding corner points (the corner points at positions A, B, C, and D) are first input into the ordinary correction algorithm for perspective-transform correction, and the perspective-corrected result is then input into the dewarping correction algorithm for further correction, outputting the corrected left-page image. The right-page image of the left-curved-right-curved two-page book 1701 and the coordinates of its four corresponding corner points (the corner points at positions E, F, G, and H) are likewise first input into the ordinary correction algorithm for perspective-transform correction, and the perspective-corrected result is then input into the dewarping correction algorithm for further correction, outputting the corrected right-page image. The corrected left-page image and the corrected right-page image are then merged to obtain the corrected two-page book image 1706 shown in user interface 17c. In the corrected two-page book image 1706, both pages are flat rectangles, each line of text lies on a straight line, the edges of the pictures are straight, and the text and pictures are no longer tilted. The viewing effect of the corrected two-page book image 1706 is equivalent to that of an image captured with the shooting angle perpendicular to the plane of a flat left-curved-right-curved two-page book, which improves the user's viewing experience of the book content in the image.
User interface 17c further includes a confirm button 1707. In response to the user's tap on the confirm button 1707, the terminal displays user interface 17d, which includes a dialog box 1708. The dialog box 1708 includes a save button and, optionally, function buttons such as "Export as PDF" and "Cancel". When the user taps the save button, the terminal, in response to the tap on the save button, stores the corrected two-page book image 1706 in the photo album. Optionally, the terminal may also display user interface 17e in FIG. 17, which shows the corrected two-page book image 1706 in the album. In some other embodiments, after the user taps the confirm button 1707, the display switches from user interface 17c to user interface 17e, which includes the corrected two-page book image 1706 in the album. In still other embodiments, after the user taps the confirm button 1707, the corrected two-page book image 1706 is stored directly without switching the user interface; that is, user interface 17c remains displayed.
Scenario 5: the document type is the left-flat-right-curved type.
The user taps the scanning application on the terminal and enters the scan preview interface 18a shown in FIG. 18. The shooting angle of the terminal's camera is tilted toward an opened book that includes two document pages, where the edge of the left page is flat and the edge of the right page is curved. The scan preview interface 18a displays a preview of the left-flat-right-curved two-page book 1801. In the scan preview interface 18a seen by the user, the left page is tilted, the lines of text on the right page do not lie on a straight line, the edge lines of the pictures are curved, and the edge shape of the two-page book 1801 is not a rectangle. Document detection and correction can therefore be performed using the terminal. Optionally, the scan preview interface 18a may also display the edge line 1802 of the two-page book 1801.
The scan preview interface 18a further includes a shoot button 1803. The user may tap the shoot button 1803, and in response to the user's tap on the shoot button 1803, the terminal captures an original image of the two-page book 1801 and performs document detection on the original image of the left-flat-right-curved two-page book 1801. On the one hand, the original image of the two-page book 1801 is input into the classification module to obtain a classification result, for example: single-flat: 0.0, single-curved: 0.0, left-flat-right-curved: 1.0, left-flat-right-flat: 0.0, left-curved-right-flat: 0.0, left-curved-right-curved: 0.0; from this classification result it can be determined that the document type of the two-page book 1801 is the left-flat-right-curved type. On the other hand, the original image of the two-page book 1801 is input into the segmentation module to obtain an edge segmentation map of the two-page book 1801. Combining the classification result and the edge segmentation map, it can be determined that the two-page book 1801 has eight corner points, and the coordinates of the eight corner points are obtained. User interface 18b is then displayed, which includes the original image of the two-page book 1801, its eight corner points (for example, the eight corner points at positions A to H), and its edge line 1804, where the corner point at the bottom-right of the left page (at position C) coincides with the corner point at the bottom-left of the right page (at position H).
User interface 18b is in an editable mode. The user may move any one or more corner points on user interface 18b to change the size of the edge line. For a specific implementation of adjusting the corner points on user interface 18b, refer to user interfaces 13a to 13b in FIG. 13; details are not repeated here.
User interface 18b further includes a confirm button 1805. The user may tap the confirm button 1805, and in response to the user's tap on the confirm button 1805, the terminal corrects the original image of the left-flat-right-curved two-page book 1801. Exemplarily, the left-page image of the two-page book 1801 and the coordinates of its four corresponding corner points (the corner points at positions A, B, C, and D) are input into the ordinary correction algorithm for perspective-transform correction, outputting the corrected left-page image. The right-page image of the two-page book 1801 and the coordinates of its four corresponding corner points (the corner points at positions E, F, G, and H) are first input into the ordinary correction algorithm for perspective-transform correction, and the perspective-corrected result is then input into the dewarping correction algorithm for further correction, outputting the corrected right-page image. The corrected left-page image and the corrected right-page image are then merged to obtain the corrected two-page book image 1806 shown in user interface 18c. In the corrected two-page book image 1806, both pages are flat rectangles, each line of text lies on a straight line, the edges of the pictures are straight, and the text and pictures are no longer tilted. The viewing effect of the corrected two-page book image 1806 is equivalent to that of an image captured with the shooting angle perpendicular to the plane of a flat left-flat-right-curved two-page book, which improves the user's viewing experience of the book content in the image.
User interface 18c further includes a confirm button 1807. In response to the user's tap on the confirm button 1807, the terminal displays user interface 18d, which includes a dialog box 1808. The dialog box 1808 includes a save button and, optionally, function buttons such as "Export as PDF" and "Cancel". When the user taps the save button, the terminal, in response to the tap on the save button, stores the corrected two-page book image 1806 in the photo album and displays user interface 18e in FIG. 18, which shows the corrected two-page book image 1806 in the album. In some other embodiments, after the user taps the confirm button 1807, the display switches from user interface 18c to user interface 18e, which includes the corrected two-page book image 1806 in the album. In still other embodiments, after the user taps the confirm button 1807, the corrected two-page book image 1806 is stored directly without switching the user interface; that is, user interface 18c remains displayed.
Scenario 6: the document type is the left-flat-right-flat type.
The user taps the scanning application on the terminal and enters the scan preview interface 19a shown in FIG. 19. The shooting angle of the terminal's camera is tilted toward a two-page paper 1901 laid flat on a desk, so the scan preview interface 19a displays a preview of the flat-edged two-page paper 1901. Because the camera's shooting angle is not perpendicular to the plane of the two-page paper 1901, the text and pictures of the two-page paper 1901 shown in the scan preview interface 19a appear tilted to the user, and the edge shape of the two-page paper 1901 is not a rectangle. Document detection and correction can therefore be performed using the terminal. Optionally, the scan preview interface 19a may also display the edge line 1902 of the flat-edged two-page paper 1901.
The scan preview interface 19a further includes a shoot button 1903. The user may tap the shoot button 1903, and in response to the user's tap on the shoot button 1903, the terminal captures an original image of the two-page paper 1901 and performs document detection on the original image of the two-page paper 1901. On the one hand, the original image of the two-page paper 1901 is input into the classification module to obtain a classification result, for example: single-flat: 0.0, single-curved: 0.0, left-flat-right-curved: 0.0, left-flat-right-flat: 1.0, left-curved-right-flat: 0.0, left-curved-right-curved: 0.0; from this classification result it can be determined that the document type of the two-page paper 1901 is the left-flat-right-flat type. On the other hand, the original image of the two-page paper 1901 is input into the segmentation module to obtain an edge segmentation map of the two-page paper 1901. Combining the classification result and the edge segmentation map, it can be determined that the two-page paper 1901 has eight corner points, and the coordinates of the eight corner points are obtained. User interface 19b is then displayed, which includes the original image of the two-page paper 1901, its eight corner points (for example, at positions A, B, C, D, E, F, G, and H), and its edge line 1904, where position B coincides with position G and position C coincides with position H.
User interface 19b is in an editable mode. The user may move any one or more corner points on user interface 19b to change the size of the edge line. For a specific implementation, refer to the related descriptions of user interface 13a and user interface 13b in FIG. 13; details are not repeated here.
User interface 19b further includes a confirm button 1905. The user may tap the confirm button 1905, and in response to the user's tap on the confirm button 1905, the terminal corrects the original image of the two-page paper 1901. Exemplarily, the left page of the original image containing the two-page paper 1901 and the coordinates of the four corner points of the left page (the corner points at positions A, B, C, and D) are input into the ordinary correction algorithm for correction (for example, the ordinary correction algorithm is a perspective-transform algorithm) to obtain the corrected left page; the right page of the original image containing the two-page paper 1901 and the coordinates of the four corner points of the right page (the corner points at positions E, F, G, and H) are input into the ordinary correction algorithm for correction to obtain the corrected right page; and the corrected left page and the corrected right page are merged to obtain the corrected two-page paper image 1906. User interface 19c is then displayed, which includes the corrected two-page paper image 1906. The edge shape of the corrected two-page paper image 1906 is a rectangle, the text and pictures in it are no longer tilted, and the viewing effect of the corrected two-page paper image 1906 is equivalent to that of an image captured with the shooting angle perpendicular to the plane of the two-page paper, which improves the user's viewing experience of the two-page paper content in the image.
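The final step of merging the two corrected pages can be sketched as a side-by-side concatenation. This is a minimal illustration assuming the corrected pages share the same height; a production pipeline would first resample both pages to a common height:

```python
import numpy as np

def merge_pages(left, right):
    """Concatenate two corrected page images of equal height side by side,
    mirroring the merge step of the two-page correction flow."""
    if left.shape[0] != right.shape[0]:
        raise ValueError("corrected pages must share the same height")
    return np.hstack([left, right])

# Illustrative corrected pages: white 280-pixel-tall grayscale images.
left_page  = np.full((280, 200), 255, dtype=np.uint8)
right_page = np.full((280, 210), 255, dtype=np.uint8)
merged = merge_pages(left_page, right_page)
```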
User interface 19c further includes a confirm button 1907. In response to the user's tap on the confirm button 1907, the terminal displays user interface 19d, which includes a dialog box 1908. The dialog box 1908 includes a save button and, optionally, function buttons such as "Export as PDF" and "Cancel". When the user taps the save button, the terminal, in response to the tap on the save button, stores the corrected two-page paper image 1906 in the photo album. Optionally, the terminal may also display user interface 19e, which shows the corrected two-page paper image 1906 in the album. In some other embodiments, after the user taps the confirm button 1907, the display switches from user interface 19c to user interface 19e, which includes the corrected two-page paper image 1906 in the album. In still other embodiments, after the user taps the confirm button 1907, the corrected two-page paper image 1906 is stored directly without switching the user interface; that is, user interface 19c remains displayed.
In a specific implementation, the names of the above applications may be other names, which is not limited in the embodiments of this application, as long as the same functions can be implemented.
In the embodiments of this application provided above, the methods provided by the embodiments of this application are described from the perspective of the terminal as the execution subject.
To implement the functions in the methods provided by the above embodiments of this application, the terminal may include a hardware structure and/or a software module, and implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is performed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
Based on the above embodiments and the same concept, an embodiment of this application further provides a terminal, configured to perform the steps performed by the terminal in the above method embodiments. For related features, refer to the above method embodiments; details are not repeated here.
Referring to FIG. 20, the terminal includes one or more processors 2001 and a memory 2002, where the memory 2002 stores program instructions. When the program instructions are executed by the device, the method steps in the embodiments of this application can be implemented.
The processor 2001 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of this application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the instructions in the memory and completes the steps of the above methods in combination with its hardware.
For related features of the specific implementation of the apparatus, refer to the method part above; details are not repeated here.
Based on the same technical concept, an embodiment of this application further provides a chip. The chip is coupled to a memory in a device, so that when running, the chip invokes the program instructions stored in the memory to implement the above methods of the embodiments of this application.
Based on the same technical concept, an embodiment of this application further provides a computer storage medium. The computer-readable storage medium includes a computer program, and when the computer program runs on an electronic device, the electronic device is caused to perform the above methods of the embodiments of this application.
Based on the same technical concept, an embodiment of this application further provides a computer program product. The computer program product includes instructions, and when the instructions are executed, a computer is caused to perform the above methods of the embodiments of this application.
The embodiments of this application may be used individually or in combination with one another to achieve different technical effects.
The above embodiments are merely intended to describe the technical solutions of this application in detail. The descriptions of the above embodiments are only intended to help understand the methods of the embodiments of this application and shall not be construed as limiting the embodiments of this application. Any variation or replacement readily conceivable by a person skilled in the art shall fall within the protection scope of the embodiments of this application.
As used in the above embodiments, depending on the context, the term "when" may be interpreted to mean "if", "after", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "upon determining" or "if (a stated condition or event) is detected" may be interpreted to mean "if it is determined", "in response to determining", "upon detecting (the stated condition or event)", or "in response to detecting (the stated condition or event)".
All or some of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented fully or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are fully or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a server or a data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive), or the like.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the technical solutions of this application.
The above is merely a specific implementation of this application. Any variation or replacement conceivable, based on the specific implementations provided by this application, by a person familiar with the technical field shall fall within the protection scope of this application.
Claims (11)
- A document detection and correction method, characterized in that the method comprises: in response to a first operation on a first application, displaying a scan preview interface, the scan preview interface being used to display a previewed first image, the first image containing a first document; receiving a second operation; in response to the second operation, capturing a second image, the second image including the first document; inputting the second image separately into a trained classification model and a trained segmentation model to obtain a classification result of the first document output by the classification model and an edge segmentation map of the first document output by the segmentation model; correcting the first document according to the classification result of the first document and the edge segmentation map to obtain a first target document image; and displaying a first user interface, the first user interface being used to display the first target document image.
- The method according to claim 1, characterized in that the method further comprises: in response to a third operation, displaying a second user interface, the second user interface displaying the second image and corner points of the first document, the corner points of the first document being used to correct the first document; receiving a fourth operation on at least one of the corner points of the first document; and in response to the fourth operation, adjusting the position of the at least one corner point.
- The method according to claim 2, characterized in that, after adjusting the position of the at least one corner point in response to the fourth operation, the method further comprises: in response to a fifth operation, correcting the first document according to the classification result of the first document and the coordinates of the position-adjusted corner points to obtain a second target document image; and displaying a third user interface, the third user interface being used to display the second target document image.
- The method according to claim 1, characterized in that, after capturing the second image in response to the second operation, the method further comprises: displaying a fourth user interface, the fourth user interface being used to display the second image and corner points of the first document.
- The method according to claim 4, characterized in that the method further comprises: receiving a sixth operation on at least one of the corner points of the first document; in response to the sixth operation, adjusting the position of the at least one corner point; in response to a seventh operation, correcting the first document according to the classification result of the first document and the coordinates of the position-adjusted corner points to obtain a third target document image; and displaying a fifth user interface, the fifth user interface being used to display the third target document image.
- The method according to any one of claims 1 to 5, characterized in that the classification result of the first document includes at least one document type and the probability of each document type, the at least one document type including at least one of the following types: a flat single-page document type; a curved single-page document type; a two-page document type with both the left page and the right page flat; a two-page document type with the left page flat and the right page curved; a two-page document type with the left page curved and the right page flat; and a two-page document type with both the left page and the right page curved.
- The method according to any one of claims 1 to 6, characterized in that correcting the first document according to the classification result of the first document and the edge segmentation map to obtain the first target document image comprises: determining the type of the first document according to the classification result of the first document and the edge segmentation map; determining, according to the type of the first document, a document correction algorithm corresponding to the first document; and inputting the second image into the document correction algorithm corresponding to the first document to obtain the first target document image.
- The method according to claim 7, characterized in that inputting the second image into the document correction algorithm corresponding to the first document to obtain the first target document image comprises: if the type of the first document is the flat single-page document type, inputting the second image into an ordinary document correction algorithm for correction to obtain the first target document image; or, if the type of the first document is the curved single-page document type, inputting the second image into a first document correction algorithm for correction to obtain a single-page document image, and inputting the single-page document image into a dewarping correction algorithm for correction to obtain the first target document image; or, if the type of the first document is the two-page document type with both the left page and the right page flat, inputting the left page of the first document in the second image into the ordinary document correction algorithm for correction to obtain a first page image, inputting the right page of the first document into the ordinary document correction algorithm for correction to obtain a second page image, and merging the first page image and the second page image to obtain the first target document image; or, if the type of the first document is the two-page document type with the left page flat and the right page curved, inputting the left page of the first document in the second image into the ordinary document correction algorithm for correction to obtain a third page image, inputting the right page of the first document into the ordinary document correction algorithm for correction to obtain a fourth page image, inputting the fourth page image into the dewarping correction algorithm for correction to obtain a fifth page image, and merging the third page image and the fifth page image to obtain the first target document image; or, if the type of the first document is the two-page document type with the left page curved and the right page flat, inputting the left page of the first document in the second image into the ordinary document correction algorithm for correction to obtain a sixth page image, inputting the sixth page image into the dewarping correction algorithm for correction to obtain a seventh page image, inputting the right page of the first document into the ordinary document correction algorithm for correction to obtain an eighth page image, and merging the seventh page image and the eighth page image to obtain the first target document image; or, if the type of the first document is the two-page document type with both the left page and the right page curved, inputting the left page of the first document in the second image into the ordinary document correction algorithm to obtain a ninth page image, inputting the ninth page image into the dewarping correction algorithm for correction to obtain a tenth page image, inputting the right page of the first document into the ordinary document correction algorithm to obtain an eleventh page image, inputting the eleventh page image into the dewarping correction algorithm for correction to obtain a twelfth page image, and merging the tenth page image and the twelfth page image to obtain the first target document image.
- A terminal, characterized by comprising a processor and a memory, wherein the memory stores a computer program or instructions, and when executing the computer program or instructions, the processor causes the terminal to perform the method according to any one of claims 1 to 8.
- A computer-readable storage medium, characterized in that the computer-readable storage medium includes a computer program, and when the computer program runs on a terminal, the terminal is caused to perform the method according to any one of claims 1 to 8.
- A computer program product, characterized in that the computer program product includes a computer program or instructions, and when the computer program or instructions are executed by a terminal, the method according to any one of claims 1 to 8 is implemented.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202211246325.9A | 2022-10-12 | 2022-10-12 | Document detection and correction method and terminal (一种文档检测矫正方法及终端) |
| CN202211246325.9 | 2022-10-12 | | |
Publications (1)
| Publication Number | Publication Date |
| --- | --- |
| WO2024078304A1 (zh) | 2024-04-18 |
Family
ID=90579865
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| PCT/CN2023/120852 (WO2024078304A1) | Document detection and correction method and terminal | 2022-10-12 | 2023-09-22 |

Country Status (2)

| Country | Link |
| --- | --- |
| CN (1) | CN117877051A (zh) |
| WO (1) | WO2024078304A1 (zh) |
Also Published As
| Publication Number | Publication Date |
| --- | --- |
| CN117877051A (zh) | 2024-04-12 |
Legal Events
| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23876509; Country of ref document: EP; Kind code of ref document: A1 |