WO2024082976A1 - OCR recognition method for text images, electronic device and medium - Google Patents

OCR recognition method for text images, electronic device and medium

Info

Publication number
WO2024082976A1
WO2024082976A1 (PCT/CN2023/123403, CN2023123403W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
text image
text
quality assessment
quality
Prior art date
Application number
PCT/CN2023/123403
Other languages
English (en)
French (fr)
Inventor
臧振飞
滕腾
陈玉梅
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024082976A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/451 Execution arrangements for user interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/146 Aligning or centring of the image pick-up or image-field
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/19 Recognition using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content

Definitions

  • the present application relates to the field of image processing, and in particular to an OCR recognition method, electronic device and medium for text images.
  • OCR optical character recognition
  • the embodiments of the present application provide an OCR recognition method, electronic device, and medium for text images, which are used to solve the problems of low recognition efficiency and poor recognition results caused by directly performing OCR recognition on text images in existing technologies.
  • an embodiment of the present application provides an OCR recognition method for a text image, for use in an electronic device, the method comprising:
  • the electronic device performs quality assessment on at least one text image
  • OCR recognition is performed on the at least one text image.
  • an image quality assessment process is added before OCR recognition of text images is performed; the quality assessment results of the text images are displayed to the user once the assessment is completed, and subsequent OCR recognition is performed based on these results, which improves the recognition efficiency and recognition effect for text images and thus the user experience.
  • displaying a first mark corresponding to at least one text image in the text image display interface further includes:
  • Quality prompt information corresponding to at least one text image is displayed in the text image display interface.
  • the image quality of the text image after the preliminary image quality assessment is displayed to the user, so that the user can understand the quality of the text image, and the user can have corresponding expectations for the subsequent OCR recognition efficiency and recognition results, thereby improving the user experience of OCR recognition.
  • the quality prompt information includes at least one of the following: image quality assessment dimensions and image quality improvement suggestions.
  • the electronic device performs quality assessment on at least one text image, including:
  • a second mark corresponding to at least one text image is displayed in the text image display interface, where the second mark is used to indicate that image quality assessment is performed on the text image in at least one image quality assessment dimension.
  • the status of image quality assessment of the text image is displayed to the user, so as to avoid the user mistakenly thinking that the OCR recognition process has stopped responding, thereby improving the user experience.
  • the method further includes:
  • the corresponding first mark or the second mark is displayed on the thumbnail of at least one text image in the text image display interface.
  • the quality assessment process or quality assessment result of the text image is displayed to the user in a simple and convenient form, which reduces the computing resources used by the display interface and makes it easier for the user to understand the current situation, thereby improving the user experience.
  • the image quality assessment dimension includes at least one of the following: shooting shake degree, document tilt degree, shadow occlusion degree, focus accuracy and light brightness.
  • the image quality of the text image is evaluated from multiple dimensions that affect OCR recognition, which can comprehensively evaluate the quality status of the text image.
  • performing image quality assessment on a text image in at least one image quality assessment dimension includes:
  • the focus accuracy is evaluated by the gradient function, and the average value of the sum of the gradients of the pixels in the text image in the horizontal and vertical directions is taken as the quality evaluation value of the focus accuracy.
  • the image quality of the text image can be evaluated from the dimension of focus accuracy, thereby improving the accuracy of image quality evaluation.
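The gradient-function focus measure described above can be sketched in a few lines of NumPy. The use of simple forward differences and the exact normalization are assumptions, since the text only specifies averaging the sum of horizontal and vertical pixel gradients:

```python
import numpy as np

def focus_accuracy_score(gray: np.ndarray) -> float:
    """Energy-of-gradient focus measure: average the sum of absolute
    horizontal and vertical pixel differences. Higher scores suggest
    sharper focus. Function name and normalization are illustrative."""
    g = gray.astype(np.float64)
    dx = np.abs(np.diff(g, axis=1))[:-1, :]  # horizontal gradient, cropped to common shape
    dy = np.abs(np.diff(g, axis=0))[:, :-1]  # vertical gradient, cropped to common shape
    return float(np.mean(dx + dy))
```

A sharp, high-contrast image scores higher than a flat or blurred one, so the score can be compared against a preset focus threshold.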
  • performing image quality assessment on a text image in at least one image quality assessment dimension includes:
  • the quality of shooting jitter is evaluated by the local grayscale variance product function, and the average value of the product of two grayscale differences in the neighborhood of each pixel in the text image is taken as the quality evaluation value of the shooting jitter.
  • the image quality of the text image can be evaluated from the dimension of shooting jitter degree, thereby improving the accuracy of image quality evaluation.
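The local grayscale variance product (often called SMD2) can be sketched as follows. Interpreting the "two grayscale differences in the neighborhood" as the differences to the right and lower neighbours is an assumption:

```python
import numpy as np

def shake_score_smd2(gray: np.ndarray) -> float:
    """Local grayscale variance product (SMD2): for each pixel, multiply
    the grayscale difference to its right neighbour by the difference to
    its lower neighbour, then average over the image. Blurred (shaken)
    images give lower values. Name and neighbour choice are illustrative."""
    g = gray.astype(np.float64)
    dh = np.abs(g[:-1, :-1] - g[:-1, 1:])  # difference to right neighbour
    dv = np.abs(g[:-1, :-1] - g[1:, :-1])  # difference to lower neighbour
    return float(np.mean(dh * dv))
```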
  • performing image quality assessment on a text image in at least one image quality assessment dimension includes:
  • the OTSU algorithm is used to evaluate the quality of shadow occlusion.
  • the image quality of the text image can be evaluated from the dimension of shadow occlusion degree, thereby improving the accuracy of image quality assessment.
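A minimal NumPy implementation of the OTSU algorithm, plus one plausible way to turn the threshold into a shadow-occlusion value (the fraction of pixels at or below it). The patent does not spell out how the threshold becomes a score, so `shadow_ratio` is illustrative:

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's method: choose the grayscale threshold that maximizes the
    between-class variance of the histogram."""
    hist = np.bincount(gray.astype(np.uint8).ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                # cumulative class-0 probability
    mu = np.cumsum(p * np.arange(256))  # cumulative class-0 mean mass
    mu_t = mu[-1]                       # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

def shadow_ratio(gray: np.ndarray) -> float:
    """Fraction of pixels at or below the Otsu threshold, used here as a
    proxy for shadow occlusion; this scoring rule is an assumption."""
    return float(np.mean(gray <= otsu_threshold(gray)))
```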
  • performing image quality assessment on a text image in at least one image quality assessment dimension includes:
  • the quality of light brightness is evaluated by color space conversion.
  • the image quality of the text image can be evaluated from the dimension of light brightness, thereby improving the accuracy of image quality evaluation.
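One common color space conversion for this purpose is RGB to a luma channel. Using BT.601 luma weights and a [0, 1] mean as the brightness value is an assumption, since the text only says "color space conversion":

```python
import numpy as np

def brightness_score(rgb: np.ndarray) -> float:
    """Mean luma of the image, scaled to [0, 1]. Converting RGB to the
    BT.601 luma channel is one plausible colour space conversion; the
    patent does not name the target colour space."""
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return float(np.mean(y) / 255.0)
```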
  • performing image quality assessment on a text image in at least one image quality assessment dimension includes:
  • the document tilt degree is evaluated by straight line detection, and the average of the absolute values of the differences between the horizontal tilt angle of the straight line detected in the text image and the horizontal line and the vertical tilt angle of the straight line detected in the text image and the vertical line is taken as the quality evaluation value of the document tilt degree.
  • the image quality of the text image can be evaluated from the dimension of the document tilt degree, thereby improving the accuracy of image quality evaluation.
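Assuming straight lines have already been detected (e.g. with a Hough transform, omitted here), the averaging rule above can be sketched as follows; comparing each line against the nearer of the horizontal or vertical reference is an assumption:

```python
import numpy as np

def tilt_score(line_angles_deg) -> float:
    """Average absolute deviation of detected straight lines from the
    nearer of the horizontal (0 deg) or vertical (90 deg) reference.
    0.0 means a perfectly aligned document. Line detection itself is
    assumed to have been done upstream."""
    angles = np.asarray(line_angles_deg, dtype=np.float64) % 180.0
    dev_h = np.minimum(angles, 180.0 - angles)  # deviation from horizontal
    dev_v = np.abs(angles - 90.0)               # deviation from vertical
    return float(np.mean(np.minimum(dev_h, dev_v)))
```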
  • the first operation includes at least one of the following: real-time image capture and image file selection.
  • the image quality assessment result of the text image is determined according to a quality assessment value of at least one text image in at least one image quality assessment dimension and a corresponding preset quality threshold.
  • the image quality assessment result of the text image can be determined in a simple and easy-to-understand manner without using too many computing resources, reducing the time required for image quality assessment, avoiding users from waiting too long, and improving user experience.
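The threshold comparison described above can be sketched as a small helper. The dimension names, limits, and the min/max direction per dimension are all illustrative, not from the patent text:

```python
def assess_quality(scores, thresholds):
    """Compare each dimension's quality value with its preset threshold.
    thresholds maps dimension -> (limit, mode), where mode "min" means the
    value must reach the limit (e.g. focus accuracy) and "max" means it
    must not exceed it (e.g. document tilt). Returns per-dimension results
    and an overall pass/fail verdict."""
    per_dim = {}
    for dim, (limit, mode) in thresholds.items():
        per_dim[dim] = scores[dim] >= limit if mode == "min" else scores[dim] <= limit
    return per_dim, all(per_dim.values())
```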
  • when the first mark indicates that the image quality assessment result of the text image does not meet a preset condition, the method further includes:
  • Image quality improvement is performed on at least one text image whose quality evaluation does not meet a preset condition.
  • the image quality of text images with poor quality assessment can be improved in an automated manner, which can improve the efficiency of the subsequent OCR recognition process and improve the recognition results.
  • improving the image quality of at least one text image with poor quality assessment includes:
  • quality improvement is performed on each image quality assessment dimension of the text image.
  • the quality of text images with poor quality assessment in different quality assessment dimensions can be targetedly improved, which can comprehensively improve the image quality of text images with poor quality assessment and facilitate subsequent OCR recognition.
  • An embodiment of the present application provides an OCR recognition method for text images.
  • the method displays a text image selection interface and receives a first operation input by a user.
  • the electronic device performs quality assessment on at least one text image, and in response to the completion of the image quality assessment of at least one text image, displays a first mark corresponding to at least one text image in the text image display interface.
  • OCR recognition is performed on at least one text image, which improves the recognition efficiency and recognition effect for the text image, improves the user experience, and avoids a decline in user experience due to long waiting times and poor recognition results.
  • an electronic device including:
  • a memory for storing instructions to be executed by one or more processors of the electronic device
  • the processor is one of the processors of the electronic device, and is used to execute the OCR recognition method of text images in the first aspect and any one of the various possible implementations of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, on which instructions are stored.
  • when the instructions are executed on a computer, the computer performs the OCR recognition method for text images according to the first aspect and any one of its various possible implementations.
  • an embodiment of the present application provides a computer program product, including a computer program/instruction, which, when executed on a computer, enables the computer to execute the OCR recognition method of text images in the above-mentioned first aspect and any one of the various possible implementations of the first aspect.
  • FIG. 1 is a schematic diagram showing a scenario of an OCR recognition method for text images according to some embodiments of the present application.
  • FIG. 2 is a schematic flow chart of a method for assessing the quality of text images taken with a mobile phone according to some embodiments of the present application.
  • FIG. 3 is a schematic flow chart of an OCR recognition method for text images according to some embodiments of the present application.
  • FIG. 4 shows a hardware structure diagram of an electronic device for an OCR recognition method for text images according to some embodiments of the present application.
  • FIG. 5 is a schematic flow chart of an OCR recognition method for text images according to some embodiments of the present application.
  • FIG. 6 shows a schematic diagram of an implementation architecture of an OCR recognition method for text images according to some embodiments of the present application.
  • FIG. 7 shows a schematic diagram of an interface display process of an OCR recognition method for text images according to some embodiments of the present application.
  • FIG. 8(a) and FIG. 8(b) are schematic diagrams showing a text image selection interface according to some embodiments of the present application.
  • FIG. 8(c) to FIG. 8(g) are schematic diagrams showing a text image display interface according to some embodiments of the present application.
  • FIG. 9 shows a flow chart of an application scenario of an OCR recognition method for text images according to some embodiments of the present application.
  • FIG. 10 is a schematic flow chart of a method for improving text image quality in an OCR recognition method for text images according to some embodiments of the present application.
  • FIG. 11 is a schematic diagram showing a display interface for improving the quality of a text image with a poor quality assessment according to some embodiments of the present application.
  • FIG. 12 shows a hardware structure block diagram of an evaluation device for an OCR recognition method for text images according to some embodiments of the present application.
  • the illustrative embodiments of the present application include, but are not limited to, an OCR recognition method for text images, an electronic device, and a medium.
  • the OCR recognition method of text images of the present application is applicable to the scenario where a user performs OCR recognition on a text image through a mobile device.
  • the existing OCR recognition technology adopts the method of directly performing text recognition on the input text image.
  • the recognition effect is highly dependent on the quality of the input text image. If the quality of the input text image is low, it will lead to poor text recognition effect, waste recognition time and the computing resources required for recognition, and greatly affect the user experience.
  • the embodiment of the present application provides an OCR recognition method for text images, which pre-evaluates the input text image before performing OCR recognition and informs the user when the text image has a problem of poor quality. This can avoid the waste of recognition time and computing resources caused by performing OCR recognition on text images of poor quality, and can also improve user experience.
  • the quality of the text image can be evaluated in real time from the following five image quality evaluation dimensions: shooting shake, document tilt, shadow occlusion, focus accuracy, and light brightness, and the quality of each dimension can be judged to see whether it exceeds the preset threshold. If it exceeds the preset threshold, the quality of the corresponding dimension is determined to be poor, and feedback is then given to the user. This process can identify poor quality text images that may affect the recognition effect before OCR recognition and remind the user.
  • FIG. 1 is a schematic diagram of a scenario in which an electronic device performs OCR recognition in an OCR recognition method for text images in an embodiment of the present application.
  • the scenario may include an electronic device 100, an evaluation device 200, and a user 300.
  • the electronic device 100 is used to provide an OCR recognition related interface to the user 300, and obtain a text image selected by the user 300 for OCR recognition.
  • the electronic device 100 may perform image quality assessment on the selected text image locally, and display the quality assessment result of the text image to the user 300.
  • the electronic device 100 may send the selected text image to the evaluation device 200, and the evaluation device 200 performs image quality assessment on the received text image, and returns the quality assessment result of the text image to the electronic device 100, and the electronic device 100 displays the quality assessment result of the text image to the user 300 based on the received quality assessment result.
  • the user 300 obtains the OCR recognition related interface by operating the electronic device 100, for example, running an OCR recognition APP.
  • the OCR recognition related interface provided by the electronic device 100 may include multiple interfaces, such as a text image selection interface, a text image display interface, etc.
  • the user 300 can select the text image that he wants to perform OCR recognition on through the text image selection interface, and obtain the image quality assessment result of the text image through the text image display interface.
  • the user 300 can select one text image for OCR recognition or multiple text images for OCR recognition through the text image selection interface, and the embodiment of the present application does not impose any specific limitation on this.
  • the electronic device 100 receives an operation of the user 300, such as a double-click operation of the user 300 on the OCR recognition application icon, and displays an OCR recognition related interface such as a text image selection interface according to the received user operation, and displays a thumbnail of the selected text image on the text image display interface according to the image selection operation of the user 300.
  • the electronic device 100 can display an image quality assessment status mark on the thumbnail of the text image, as shown in the display interface (a), which includes thumbnails of 5 text images selected by the user 300, and a quality assessment status mark is displayed on each thumbnail, indicating that these 5 text images are in the image quality assessment state.
  • the electronic device 100 obtains the quality assessment result of the text image locally, and in other embodiments, the electronic device 100 receives the quality assessment result from the assessment device 200. After obtaining the quality assessment result, the electronic device 100 displays the quality assessment result of the selected text image to the user 300, so that the user 300 can understand the quality assessment of the selected text image.
  • the quality assessment result can be a qualitative result, such as "good" or "poor", or a quantitative result, such as "90" or "75", and the embodiments of the present application do not impose specific limitations on this.
  • the electronic device 100 marks the corresponding text image according to the quality assessment result. If the quality assessment result is a qualitative result, the mark follows that result directly: if the quality assessment result is "good", the electronic device 100 can display a "√" mark on the thumbnail of the text image, and if the quality assessment result is "poor", the electronic device 100 can display an "×" mark on the thumbnail. If the quality assessment result is a quantitative result, it is compared with the preset quality threshold of the corresponding dimension: if it is greater than the preset quality threshold, the electronic device 100 can display a "√" mark on the thumbnail, and if it is less than the preset quality threshold, an "×" mark.
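The mark-selection logic just described can be summarised in a small helper; the "√"/"×" characters and the strict greater-than comparison follow the description above, while the function name is illustrative:

```python
def result_mark(result, threshold=None):
    """Choose the thumbnail mark: qualitative results map directly
    ("good" -> "√", otherwise "×"); quantitative results are compared
    against the preset quality threshold of the corresponding dimension."""
    if isinstance(result, str):
        return "√" if result == "good" else "×"
    return "√" if result > threshold else "×"
```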
  • the electronic device 100 can display a quality assessment result mark on the thumbnail of the text image.
  • As shown in the display interface (b), among the quality assessment results of the five text images, three have good quality and two have poor quality.
  • the thumbnails corresponding to the text images with good quality have a "√" mark, and the thumbnails corresponding to the text images with poor quality have an "×" mark.
  • the user 300 can decide whether to perform OCR recognition on the text images with poor quality based on the image quality assessment result.
  • the OCR recognition effect of the text images with poor quality may be poor.
  • the user 300 can decide to still perform OCR recognition on the text images with poor quality, or decide to delete the text images with poor quality in the selection interface and only perform OCR recognition on the text images with good quality.
  • the electronic device 100 can automatically start OCR recognition for text images with good image quality assessment results, and can perform OCR recognition on text images with poor image quality assessment results after confirmation by the user 300, or can not perform OCR recognition based on the user 300's cancellation of recognition behavior.
  • the evaluation device 200 may be used to receive the selected text image sent by the electronic device 100, perform image quality evaluation on the selected text image, and then return the image quality evaluation result to the electronic device 100. Further, the evaluation device 200 may perform image quality evaluation on the selected text image from multiple image quality evaluation dimensions, determine the quality evaluation result of the selected text image in the corresponding dimension, and return the quality evaluation result to the electronic device 100.
  • the evaluation device 200 may be other electronic devices capable of establishing a communication connection with the electronic device 100, for example, a tablet computer, a personal computer, a server, etc. that communicates with the electronic device 100 via a near field connection, and the present embodiment of the application does not impose specific limitations on this.
  • the electronic device 100 and the evaluation device 200 can be the same electronic device or different electronic devices, and the embodiment of the present application does not impose any specific limitation on this.
  • the electronic device 100 in the embodiment of the present application may include but is not limited to mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, laptop computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), etc.
  • AR augmented reality
  • VR virtual reality
  • UMPC ultra-mobile personal computers
  • PDA personal digital assistants
  • a quality assessment of the text image is performed before OCR recognition, and a quality assessment result mark is displayed once the assessment is complete. The image quality assessment result is thus shown to the user in advance: OCR recognition proceeds for text images with better assessment results, while for text images with poor results it is performed only based on user feedback. This avoids wasting computing resources and excessively long recognition times on poor-quality images, and improves the OCR recognition efficiency and quality for text images, thereby improving the user experience.
  • the user 300 can collect text images for OCR recognition through an image acquisition device of the electronic device 100, such as a camera.
  • the electronic device 100 provides the user 300 with an interface for text image acquisition, and the user 300 collects text images through the image acquisition interface.
  • the electronic device 100 obtains the collected text images, performs image quality assessment on the collected text images, and then performs OCR recognition on the text images with good quality assessment results.
  • the process of image quality assessment and OCR recognition of the collected text images by the electronic device 100 is referred to the above solution, and will not be repeated here.
  • a method is adopted that samples the pictures taken by a mobile phone in blocks, randomly extracts sample blocks for OCR recognition, and uses the confidence of the text recognition result on each sample block as the basis for text image quality assessment.
  • this method for assessing the quality of text images taken with a mobile phone includes the following steps. Step S201, OCR sampling and recognition: sample the document pictures taken by the mobile phone, randomly extract n image blocks, take each image block as a sampling area for OCR recognition, and obtain the confidence of the recognized characters in the sampling area. Step S202, confidence calculation: calculate the confidence of the mobile phone photo according to the confidence of each recognized character obtained in the previous step. Step S203, image quality evaluation: obtain the image quality value from a pre-stored image quality-confidence lookup table according to the sampling-area confidence, and attach the judgment result of the image quality.
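A sketch of this block-sampling scheme, with `ocr_confidence` standing in for a real OCR engine's per-block confidence output (the lookup-table step S203 is omitted); all names are hypothetical:

```python
import random

def sampled_confidence(image_blocks, ocr_confidence, n=5, seed=0):
    """Randomly draw n blocks from the tiled photo, obtain a per-block OCR
    confidence, and average them as the whole-image confidence (steps S201
    and S202). A fixed seed keeps the sketch deterministic."""
    rng = random.Random(seed)
    chosen = rng.sample(image_blocks, min(n, len(image_blocks)))
    return sum(ocr_confidence(b) for b in chosen) / len(chosen)
```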
  • another method is adopted that stratifies the clarity of text images and calculates and evaluates the image clarity at different levels.
  • this text image quality assessment method, apparatus, device and medium include the following steps. Step S301: receive a text image to be assessed, where the text image to be assessed includes a medical text image or a text image involved in insurance business. Step S302: determine a target text image corresponding to the received text image to be assessed according to a preset ideal image size. Step S303: input the target text image into a pre-trained text quality assessment model and determine a first clarity value corresponding to each pixel in the text area of the target text image. Step S304: determine the average of the first clarity values corresponding to the pixels in the text area of the target text image as a second clarity value of the target text image, and determine the average of the second clarity values of the target text images as a third clarity value of the text image to be assessed. Step S305: determining
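The two-level averaging of steps S303-S304 can be sketched as follows; the trained clarity model itself is omitted and replaced by precomputed per-pixel clarity maps:

```python
import numpy as np

def hierarchical_clarity(per_pixel_clarity_maps):
    """Average per-pixel clarity within each target image's text area
    (second clarity value), then average across target images (third
    clarity value). The maps are assumed to come from a trained model."""
    second = [float(np.mean(m)) for m in per_pixel_clarity_maps]
    return second, float(np.mean(second))
```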
  • the present application scheme can perform real-time and rapid text image quality assessment on a variety of high-frequency disturbance factors that affect the image quality of text images, and quantify the assessment results of each dimension.
  • it provides a seamless interactive process and real-time feedback on the evaluation results, which can significantly improve the focus and efficiency of text image quality assessment, provide users with a text image quality assessment with good experience and comprehensive information, and indirectly improve the accuracy of subsequent OCR recognition.
  • FIG. 4 shows a schematic diagram of the structure of an electronic device 100 according to an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc.
  • SIM subscriber identification module
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently.
  • the components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • AP application processor
  • GPU graphics processor
  • ISP image signal processor
  • DSP digital signal processor
  • NPU neural-network processing unit
  • Different processing units may be independent devices or integrated into one or more processors.
  • the controller can generate operation control signals according to the instruction operation code and timing signal to complete the control of instruction fetching and execution.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated access and reduces the waiting time of the processor 110, thereby improving system efficiency.
  • the processor 110 may include one or more interfaces.
  • the interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • I2C inter-integrated circuit
  • I2S inter-integrated circuit sound
  • PCM pulse code modulation
  • UART universal asynchronous receiver/transmitter
  • MIPI mobile industry processor interface
  • GPIO general-purpose input/output
  • SIM subscriber identity module
  • USB universal serial bus
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may include multiple groups of I2C buses.
  • the processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces.
  • the processor 110 may be coupled to the touch sensor 180K through the I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface, thereby realizing the touch function of the electronic device 100.
  • the I2S interface can be used for audio communication.
  • the processor 110 can include multiple I2S buses.
  • the processor 110 can be coupled to the audio module 170 via the I2S bus to achieve communication between the processor 110 and the audio module 170.
  • the audio module 170 can transmit an audio signal to the wireless communication module 160 via the I2S interface to achieve the function of answering a call through a Bluetooth headset.
  • the PCM interface can also be used for audio communication, sampling, quantizing and encoding analog signals.
  • the audio module 170 and the wireless communication module 160 can be coupled via a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 via the PCM interface to realize the function of answering calls via a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 can transmit an audio signal to the wireless communication module 160 through the UART interface to implement the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193.
  • the MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), etc.
  • the processor 110 and the camera 193 communicate via the CSI interface to implement the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate via the DSI interface to implement the display function of the electronic device 100.
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, etc.
  • the GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, etc.
  • the USB interface 130 is an interface that complies with the USB standard specification, and specifically can be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and a peripheral device. It can also be used to connect headphones to play audio through the headphones.
  • the interface can also be used to connect other electronic devices, such as AR devices, etc.
• the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration and does not constitute a structural limitation on the electronic device 100.
  • the electronic device 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from a charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from a wired charger through the USB interface 130.
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While the charging management module 140 is charging the battery 142, it may also power the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and provides power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle number, battery health status (leakage, impedance), etc.
  • the power management module 141 can also be set in the processor 110.
• the power management module 141 and the charging management module 140 can also be provided in the same device.
  • the wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve the utilization of antennas.
  • antenna 1 can be reused as a diversity antenna for a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 can provide solutions for wireless communications including 2G/3G/4G/5G, etc., applied to the electronic device 100.
  • the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), etc.
  • the mobile communication module 150 may receive electromagnetic waves from the antenna 1, and perform filtering, amplification, and other processing on the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 may also amplify the signal modulated by the modulation and demodulation processor, and convert it into electromagnetic waves for radiation through the antenna 1.
  • at least some of the functional modules of the mobile communication module 150 may be arranged in the processor 110.
  • at least some of the functional modules of the mobile communication module 150 may be arranged in the same device as at least some of the modules of the processor 110.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110 and be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) network), bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) and the like applied to the electronic device 100.
  • the wireless communication module 160 can be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the frequency of the electromagnetic wave signal and performs filtering processing, and sends the processed signal to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, modulate the frequency of the signal, amplify the signal, and convert it into electromagnetic waves for radiation through the antenna 2.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a Beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS) and/or a satellite based augmentation system (SBAS).
  • the electronic device 100 implements the display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode or an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
• the electronic device 100 can realize the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, etc.
  • the ISP is used to process the data fed back by the camera 193. For example, when taking a photo, the shutter is opened, and the light is transmitted to the camera photosensitive element through the lens. The light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converts it into an image visible to the naked eye.
  • the ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image. The ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP can be set in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and projects it onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to be converted into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard RGB, YUV or other format.
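The format conversion performed by the DSP can be illustrated with a minimal sketch of a full-range BT.601 YUV-to-RGB conversion for a single pixel. The exact pixel format and coefficients used by the DSP are not specified in this application; full-range BT.601 is an assumption made here for illustration only.

```python
def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV sample (0-255) to an RGB triple."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    # Clamp each channel to the valid 8-bit range.
    clamp = lambda x: max(0, min(255, int(round(x))))
    return clamp(r), clamp(g), clamp(b)

print(yuv_to_rgb(128, 128, 128))  # a neutral gray maps to equal RGB channels
```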
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • the digital signal processor is used to process digital signals, and can process not only digital image signals but also other digital signals. For example, when the electronic device 100 is selecting a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
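The Fourier-transform step mentioned above can be sketched as follows. The sampling rate, the test signal, and the use of NumPy are illustrative assumptions, not details from this application; the sketch only shows how a transform exposes the energy at each frequency point.

```python
import numpy as np

fs = 1000                              # assumed sampling rate in Hz
t = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 50 * t)    # a pure 50 Hz tone

spectrum = np.fft.rfft(signal)
energy = np.abs(spectrum) ** 2         # energy per frequency bin
freqs = np.fft.rfftfreq(len(signal), 1 / fs)

peak = freqs[np.argmax(energy)]
print(peak)                            # the 50 Hz component dominates
```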
  • Video codecs are used to compress or decompress digital videos.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record videos in a variety of coding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • NPU is a neural network (NN) computing processor.
• through the NPU, intelligent cognition applications of the electronic device 100 can be realized, such as image recognition, face recognition, voice recognition, text understanding, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music and videos can be stored in the external memory card.
  • the internal memory 121 can be used to store computer executable program codes, which include instructions.
  • the internal memory 121 may include a program storage area and a data storage area.
  • the program storage area may store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.), etc.
  • the data storage area may store data created during the use of the electronic device 100 (such as audio data, a phone book, etc.), etc.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (UFS), etc.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 can be arranged in the processor 110, or some functional modules of the audio module 170 can be arranged in the processor 110.
• the speaker 170A, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal.
  • the electronic device 100 can listen to music or listen to a hands-free call through the speaker 170A.
• the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be received by placing the receiver 170B close to the human ear.
• the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
  • the electronic device 100 can be provided with at least one microphone 170C. In other embodiments, the electronic device 100 can be provided with two microphones 170C, which can not only collect sound signals but also realize noise reduction function. In other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify the sound source, realize directional recording function, etc.
  • the earphone interface 170D is used to connect a wired earphone and can be a USB interface 130 or a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A can be set on the display screen 194.
  • a capacitive pressure sensor can be a parallel plate including at least two conductive materials.
  • the gyro sensor 180B can be used to determine the motion posture of the electronic device 100.
• the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for anti-shake shooting. For example, when the shutter is pressed, the gyro sensor 180B detects the angle of the electronic device 100 shaking, calculates the distance that the lens module needs to compensate based on the angle, and allows the lens to offset the shaking of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
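The pressure-to-altitude calculation can be sketched with the international barometric formula. The exact formula and sea-level reference pressure used by the electronic device 100 are not given in this application; the values below are standard assumptions for illustration.

```python
def pressure_to_altitude(p_hpa, p0_hpa=1013.25):
    """Approximate altitude in meters from measured air pressure (hPa)
    using the international barometric formula; p0_hpa is the assumed
    sea-level reference pressure."""
    return 44330.0 * (1.0 - (p_hpa / p0_hpa) ** (1.0 / 5.255))

print(round(pressure_to_altitude(1013.25)))  # sea level -> 0 m
print(round(pressure_to_altitude(899.0)))    # roughly 1 km up
```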
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of the flip leather case.
• when the electronic device 100 is a flip phone, it can detect the opening and closing of the flip cover according to the magnetic sensor 180D, and then automatically unlock the flip cover according to the detected opening and closing state of the leather case or of the flip cover.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in all directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of the electronic device and is applied to applications such as horizontal and vertical screen switching and pedometers.
  • the distance sensor 180F is used to measure the distance.
  • the electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light outward through the light emitting diode.
  • the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of the ambient light.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touches.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access application locks, fingerprint photography, fingerprint call answering, etc.
  • the temperature sensor 180J is used to detect temperature.
• the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid an abnormal shutdown caused by low temperature. In still other embodiments, when the temperature is lower than yet another threshold, the electronic device 100 boosts the output voltage of the battery 142, likewise to avoid an abnormal shutdown caused by low temperature.
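A threshold-based temperature processing strategy of this kind can be sketched as follows. The thresholds and action names are purely illustrative assumptions; the application does not specify concrete values.

```python
def thermal_action(temp_c, high=45.0, low=0.0):
    """Toy sketch of a threshold-based temperature processing strategy.
    The thresholds are illustrative, not values from the application."""
    if temp_c > high:
        return "throttle_cpu"   # reduce nearby processor performance
    if temp_c < low:
        return "heat_battery"   # warm the battery to avoid shutdown
    return "normal"

print(thermal_action(50))
```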
• the touch sensor 180K is also called a "touch control device".
• the touch sensor 180K can be set on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a "touch control screen".
  • the touch sensor 180K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K can also be set on the surface of the electronic device 100, which is different from the position of the display screen 194.
  • the bone conduction sensor 180M can obtain vibration signals. In some embodiments, the bone conduction sensor 180M can obtain vibration signals of the vibrating bones of the human body. The bone conduction sensor 180M can also contact the human pulse to receive blood pressure beating signals. In some embodiments, the bone conduction sensor 180M can also be set in the earphones to form bone conduction earphones.
• the audio module 170 can parse a voice signal based on the vibration signal of the vocal bone obtained by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor can parse the heart rate information based on the blood pressure beat signal obtained by the bone conduction sensor 180M to realize the heart rate detection function.
  • the key 190 includes a power key, a volume key, etc.
  • the key 190 may be a mechanical key or a touch key.
  • the electronic device 100 may receive key input and generate key signal input related to user settings and function control of the electronic device 100.
  • Motor 191 can generate vibration prompts.
  • Motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects.
  • Different application scenarios for example: time reminders, receiving messages, alarm clocks, games, etc.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status, power changes, messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be connected to and separated from the electronic device 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195.
  • the electronic device 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 can also be compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communications.
  • the electronic device 100 uses an eSIM, i.e., an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
  • the execution subject of the OCR recognition method of text images may be a processor of the electronic device 100, and may include the following steps:
  • Step S501 Acquire a text image.
  • the text image refers to an image including various texts, and the texts may include but are not limited to: English characters, numbers, Chinese characters, Japanese characters and other characters in different languages, mathematical symbols and the like.
  • the text image may be an image captured by the user through the electronic device 100 , for example, an image captured by the user through a camera or other optical sensor in a mobile phone, tablet computer or other mobile text recognition device.
  • the electronic device 100 may obtain a text image that needs to be OCR recognized according to an image selection operation of the user.
  • the image selection operation of the user may include but is not limited to: taking a photo with the electronic device 100, selecting an image in a local album of the electronic device 100, selecting a file including an image such as a PDF file or a DOC file from a file directory, etc.
  • the electronic device 100 can obtain the text image from the image acquisition cache of the camera or other sensors, or from a local storage device, such as a secure digital card (Secure Digital, SD) that stores local photo albums, or from an external storage device, such as a mobile hard disk, etc.
  • the electronic device 100 obtains the text image selected by the user through the application interface.
  • the electronic device 100 provides the user with an interface for selecting a text image for OCR recognition, and the user selects the text image desired for OCR recognition through the image selection interface, and the electronic device 100 obtains the text image according to the image selection operation of the user in the selection interface.
  • Step S502 pre-process the text image to determine the target text image.
  • the electronic device 100 can pre-process the acquired text image based on the user's display confirmation operation, such as based on the user's click operation on the image quality assessment button on the image selection interface, or it can pre-process the text image in real time after acquiring the text image without the need for user confirmation operation.
  • the embodiments of the present application do not impose specific restrictions on this.
  • the electronic device 100 performs image morphological processing on the acquired text image to determine the range of the text area of the text image.
  • image morphology refers to a series of image processing technologies that process image shape features. It is an image analysis discipline based on topology. The basic idea is to use a special structural element to measure or extract the corresponding shape or feature in the input image for further image analysis and target recognition.
  • Image morphological operations may include but are not limited to: erosion, dilation, opening operation, closing operation, image transformation, etc.
  • Image transformation may include but is not limited to geometric transformation, scale transformation, etc.
  • Geometric transformations of images include translation, rotation, mirroring, transposition, etc.
  • Scale transformations of images include scaling and interpolation, etc.
  • Image erosion is used to delete some pixels at the edge of the image, which has the effect of shrinking the image and can eliminate image edges and noise.
• Image dilation is used to add some pixels to the image boundary, which has the effect of expanding the image.
  • Image opening is equivalent to performing an erosion operation on the image first and then a dilation operation, which can eliminate discrete points and burrs and separate two objects.
• Image closing is equivalent to performing a dilation operation on the image first and then an erosion operation, which can fill the internal holes and concave corners of the image and connect two adjacent objects.
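The morphological operations above can be sketched for binary images as follows. This is a minimal NumPy illustration with a k×k all-ones structuring element; the function names are illustrative, and production code would typically use an optimized library routine.

```python
import numpy as np

def dilate(img, k=3):
    """Binary dilation: a pixel is set if any pixel in its k x k
    neighbourhood is set (expands the image)."""
    pad = k // 2
    padded = np.pad(img, pad, constant_values=0)
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def erode(img, k=3):
    """Binary erosion: a pixel survives only if its whole k x k
    neighbourhood is set (shrinks the image, removes edge noise)."""
    pad = k // 2
    padded = np.pad(img, pad, constant_values=0)
    out = np.ones_like(img)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def opening(img, k=3):
    # erosion first, then dilation: removes discrete points and burrs
    return dilate(erode(img, k), k)

def closing(img, k=3):
    # dilation first, then erosion: fills small holes and gaps
    return erode(dilate(img, k), k)

img = np.zeros((7, 7), dtype=np.uint8)
img[2:5, 2:5] = 1          # a 3x3 block of "text" pixels
img[0, 0] = 1              # an isolated noise pixel
print(int(opening(img).sum()))  # the block survives; the noise pixel is gone
```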
  • the electronic device 100 intercepts the range of the text area of the text image as the target text image.
  • only part of the image area in the text image may include text.
  • the electronic device 100 identifies the range of the text area from the text image through detection, takes the range of the text area as the region of interest, and takes the region of interest as the target text image for subsequent image quality assessment.
  • the region of interest (ROI) is an image area selected from the entire image that has key analysis value.
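Cropping the detected text area out of the full image as the region of interest can be sketched as follows. The bounding-box approach and the function name are illustrative assumptions; the application does not prescribe a specific cropping method.

```python
import numpy as np

def crop_text_roi(image, mask):
    """Crop the bounding box of all nonzero mask pixels (the detected
    text area) out of the image; a minimal sketch of ROI selection."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return image          # no text detected: fall back to the full image
    top, bottom = ys.min(), ys.max() + 1
    left, right = xs.min(), xs.max() + 1
    return image[top:bottom, left:right]

img = np.arange(36).reshape(6, 6)     # stand-in for a grayscale image
mask = np.zeros((6, 6), dtype=bool)
mask[1:4, 2:5] = True                 # detected text-area pixels
roi = crop_text_roi(img, mask)
print(roi.shape)                      # (3, 3)
```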
  • Step S503 performing a quality assessment of the focus accuracy of the target text image.
  • steps S503 to S507 respectively evaluate the quality of the target text image from different image quality evaluation dimensions: focus accuracy, shooting jitter, shadow occlusion, light brightness and document tilt, and determine the quality evaluation result of the target text image in each dimension.
• the quality assessment result of the target text image in each dimension is expressed using an assessment score, and the units of the assessment scores can differ; for example, the assessment score of the document tilt degree of the target text image may be 10°, while the assessment score of the shadow occlusion degree may be 0.4.
  • the quality assessment of the target text image can be performed using any one of steps S503 to S507, that is, the quality assessment of the target text image is performed from one image quality assessment dimension, or the quality assessment can be performed using any two or more of steps S503 to S507, that is, the quality assessment of the target text image is performed from two or more image quality assessment dimensions.
  • the embodiment of the present application does not impose any specific restrictions on this.
  • the two or more quality assessment steps can be executed in parallel or in serial, and the embodiments of the present application do not impose specific restrictions on this.
  • the quality assessment of the focus accuracy can be performed based on the gradients of the target text image in the horizontal and vertical directions to determine the assessment result of the focus accuracy.
  • the average of the sum of the gradients of the pixels in the target text image in the horizontal and vertical directions is used as the evaluation result of the focus accuracy. That is, the gradient value of each pixel in the horizontal and vertical directions is calculated, all pixels in the target text image are traversed to obtain the sum of the gradient values of all pixels, and the sum is divided by the number of pixels in the target text image to obtain the average gradient value, which is determined as the quality assessment value of the focus accuracy of the target text image.
  • a target text image with better focus accuracy has a sharper edge, that is, has a larger gradient in a certain direction.
  • the focus accuracy of the target text image can be quality evaluated.
  • the Tenengrad gradient function is used to calculate the gradient of the pixel points of the target text image.
  • the Tenengrad function uses the Sobel operator to extract the gradient values in the horizontal and vertical directions. The specific formula is as follows:
  • the gradient S(x, y) of the target text image I at the pixel point (x, y) is defined as follows: S(x, y) = √( (Gx ∗ I(x, y))² + (Gy ∗ I(x, y))² ), where ∗ denotes the convolution operation.
  • Gx and Gy are Sobel convolution kernels.
  • the Tenengrad function value Ten of the target text image is defined as follows: Ten = (1/n) · Σ S(x, y)², where the sum is taken over all pixel points (x, y).
  • n is the number of all pixels in the target text image.
  • the calculated Ten value is used as the quality evaluation value of the target text image in the dimension of focus accuracy.
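A rough sketch of the Tenengrad computation described above, assuming the image is a list of lists of grayscale values: the Sobel responses Gx and Gy are computed per pixel, S(x, y)² is accumulated, and the mean over the visited pixels is returned. Skipping the one-pixel border is a simplification of this sketch, not something the text specifies.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def tenengrad(img):
    """Mean of S(x, y)^2 over the interior pixels of a grayscale image."""
    h, w = len(img), len(img[0])
    total, n = 0.0, 0
    for y in range(1, h - 1):          # skip the 1-pixel border
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            total += gx * gx + gy * gy   # S(x, y)^2
            n += 1
    return total / n if n else 0.0
```

A well-focused image with sharp edges yields a larger value than a flat or blurred one, matching the observation that better focus means larger gradients.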
  • Step S504 performing a quality assessment of the degree of shooting jitter on the target text image.
  • the image quality of the target text image will be affected by the degree of hand shaking of the user during shooting.
  • the text in the image will produce ghosting.
  • the greater the hand shake, the lower the quality of the captured image.
  • the image quality evaluation of the shooting jitter degree is performed by using the local grayscale variance product function (SMD2) to determine the evaluation result of the shooting jitter degree. Specifically, for each pixel in the target text image, the two grayscale differences with its neighborhood are multiplied, and the products are accumulated pixel by pixel.
  • the SMD2 function is defined as follows: D(f) = Σx Σy |f(x, y) − f(x + 1, y)| · |f(x, y) − f(x, y + 1)|, where:
  • D(f) is the image quality evaluation value of the shooting jitter degree
  • f(x,y) is the grayscale value of the pixel point (x,y) of the target text image
  • f(x+1,y) is the grayscale value of the pixel point (x+1,y) to the right of the pixel point (x,y) of the target text image
  • f(x,y+1) is the grayscale value of the pixel point (x,y+1) below the pixel point (x,y) of the target text image.
  • the target text image can be first converted into a grayscale image, and a convolution operation is performed on the grayscale image of the target text image through a preset convolution kernel.
  • the size of the convolution kernel here can be preset, such as 3*3 or 5*5, etc.
  • the convolution kernel can also be defined by the user according to actual needs. After the convolution operation, the image convolution matrix corresponding to the grayscale image is obtained; the absolute values of the image convolution matrix are taken and multiplied, the mean value of the grayscale image on channel 0 is then calculated, and the obtained mean value is used as the quality assessment value of the degree of shooting jitter.
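The SMD2 accumulation itself reduces to a few lines. The sketch below assumes a grayscale image given as a list of lists of integers; a sharper image yields a larger value, and a jitter-blurred one a smaller value.

```python
def smd2(img):
    """Local grayscale variance product: for each pixel, multiply the absolute
    difference with the right neighbor by the absolute difference with the
    neighbor below, and accumulate over the image."""
    h, w = len(img), len(img[0])
    total = 0
    for y in range(h - 1):
        for x in range(w - 1):
            total += abs(img[y][x] - img[y][x + 1]) * abs(img[y][x] - img[y + 1][x])
    return total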
  • Step S505 Perform a quality assessment of the shadow occlusion degree of the target text image.
  • the shadow occlusion range in the target text image is too large, it will also affect the image quality, making it difficult for the subsequent OCR recognition to proceed smoothly.
  • the image quality of the target text image can also be evaluated.
  • the image quality assessment of the shadow occlusion degree is performed by using the Otsu algorithm (OTSU) to determine the assessment result of the shadow occlusion degree.
  • the Otsu algorithm is an image binarization algorithm proposed by the Japanese scholar Otsu in 1979; it uses a threshold to divide the original image into two images, a foreground image and a background image.
  • specifically, the target text image can be converted into a grayscale image through the OTSU algorithm, and the grayscale image can be binarized to obtain its foreground and background images. The histograms of the foreground and background images are calculated; the pixel value of channel 0 is then computed from the histograms, and the ratio of the pixel value of channel 0 to the number of pixels of the target text image is determined as the quality evaluation value of the shadow occlusion degree.
  • the processing logic can be simplified by calculating the local and global pixel connected domains of the grayscale image of the target text image, thereby reducing the amount of calculation for a single target text image.
  • the region of interest can be randomly cropped from the grayscale image and the histogram of the region of interest can be calculated.
  • the first three high-frequency RGB pixel intervals are then selected, and the pixel point areas in the region of interest distributed in the high-frequency RGB pixel intervals are divided into connected domains with approximate RGB values, and are processed as connected domain units in the subsequent shadow occlusion assessment.
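The threshold selection at the core of the Otsu step can be sketched as follows, operating directly on the grayscale histogram: it picks the threshold that maximizes the between-class variance. How the resulting foreground/background split feeds the channel-0 ratio above is left out; this is only the thresholding.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 0..255 grayscale values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    best_t, best_var, w_b, sum_b = 0, -1.0, 0, 0
    for t in range(256):
        w_b += hist[t]                 # background weight (pixels <= t)
        if w_b == 0:
            continue
        w_f = total - w_b              # foreground weight
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Pixels with values at most the returned threshold form one class and the rest form the other, giving the binarized foreground/background images.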
  • Step S506 Performing a quality assessment of the light brightness of the target text image.
  • the quality evaluation of light brightness is performed by converting the color space of the target text image to determine the evaluation result of light brightness.
  • the target text image can be converted from the RGB color space to the HSL color space, and the L component in the HSL color space is determined according to the average value, maximum value and minimum value of the three channels R, G and B in the RGB color space, and the obtained L component is used as the quality evaluation result of light brightness.
  • RGB color space is the most commonly used color space.
  • An image is represented by three channels, namely red (R), green (G) and blue (B). Different linear combinations of these three colors can form almost all other colors.
  • the three channels of RGB color space are closely related to brightness, that is, as long as the brightness changes, the three channels will change accordingly. Therefore, RGB color space is a color space with poor uniformity. It is suitable for display systems but not for image processing.
  • HSL color space also uses three channels to represent an image, namely hue (H), saturation (S) and brightness (L). Among them, brightness L of 100 represents white, and brightness L of 0 represents black.
  • the processing logic can be simplified by performing local and global pixel connected domain calculations on the grayscale image of the target text image, thereby reducing the amount of calculation for a single target text image.
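As a sketch of the lightness estimate above: the standard HSL conversion gives L = (max(R, G, B) + min(R, G, B)) / 2 per pixel, and averaging L over the image yields a brightness score. The 0..1 scaling and the per-image averaging are illustrative choices of this sketch, not mandated by the text.

```python
def lightness_score(rgb_pixels):
    """Average HSL L component over an iterable of (r, g, b) tuples (0..255)."""
    total, n = 0.0, 0
    for r, g, b in rgb_pixels:
        l = (max(r, g, b) + min(r, g, b)) / 2 / 255.0   # HSL L, scaled to 0..1
        total += l
        n += 1
    return total / n if n else 0.0
```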
  • Step S507 Performing a quality assessment of the degree of document tilt on the target text image.
  • the image quality assessment of the document tilt is performed based on the average of the absolute values of the differences between each detected straight line's horizontal inclination angle (relative to the horizontal line) and its vertical inclination angle (relative to the vertical line), so as to determine the assessment result of the document tilt.
  • specifically, a region of interest is determined randomly in the target text image, and straight line detection is performed in the region of interest. For each detected straight line, the horizontal inclination angle between the line and the horizontal line and the vertical inclination angle between the line and the vertical line are calculated, and the absolute value of the difference between the two inclination angles is computed.
  • the sum of the absolute values of the differences between the horizontal inclination angles and the vertical inclination angles of all detected straight lines is calculated, and then the sum of the absolute values is divided by the number of all detected straight lines to obtain the average value of the absolute values of the differences between the horizontal inclination angles and the vertical inclination angles of the straight lines.
  • the average value is determined as the quality evaluation value of the target text image in the document tilt dimension.
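The tilt metric in the text is somewhat ambiguous, so the sketch below takes a simplified reading: each detected line segment's angle to the horizontal is reduced to its deviation from the nearest axis (0° or 90°), and the absolute deviations are averaged. Line detection itself (for example, a Hough transform) is assumed to have already produced the segments; both the segment format and the nearest-axis reduction are assumptions of this sketch.

```python
import math

def tilt_score(segments):
    """Average deviation (degrees) from the nearest axis over detected line
    segments, each given as (x1, y1, x2, y2)."""
    devs = []
    for x1, y1, x2, y2 in segments:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180  # 0..180
        dev = min(angle % 90, 90 - angle % 90)   # distance to nearest axis
        devs.append(dev)
    return sum(devs) / len(devs) if devs else 0.0
```

Perfectly horizontal or vertical text lines score 0, while a document rotated by roughly 10° scores roughly 10, which is the kind of value compared against the 8° tilt threshold mentioned below.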
  • Step S508 Determine the quality assessment result of the target text image.
  • each quality assessment value is compared with the corresponding preset threshold value, so as to obtain the quality status of the target text image in each image quality assessment dimension.
  • the preset threshold value in each image quality assessment dimension is obtained according to a pre-conducted independent perturbation analysis test, and the preset threshold value can be a critical value suitable for OCR recognition scenarios that is manually screened after classifying, distinguishing and evaluating a large number of real text images, such as a document inclination threshold of 8°, a shadow threshold of 0.6, etc.
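Step S508's comparison against per-dimension thresholds might look like the following. The tilt (8°) and shadow (0.6) thresholds are the examples from the text; the focus threshold and the assumed "worse if above / worse if below" direction of each dimension are illustrative, since the text does not state them.

```python
THRESHOLDS = {
    "document_tilt": (8.0, "above"),    # poor if value is above 8 degrees (from text)
    "shadow_occlusion": (0.6, "above"), # poor if value is above 0.6 (from text)
    "focus_accuracy": (100.0, "below"), # illustrative: poor if sharpness below 100
}

def poor_dimensions(scores):
    """Return the assessment dimensions whose quality status is poor."""
    poor = []
    for dim, value in scores.items():
        threshold, worse_if = THRESHOLDS[dim]
        if (worse_if == "above" and value > threshold) or \
           (worse_if == "below" and value < threshold):
            poor.append(dim)
    return poor
```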
  • the OCR recognition method of the text image can be implemented as two parts: background processing and interface display.
  • the background processing is used to perform quality assessment on the text image
  • the interface display is used to display the quality assessment status when performing quality assessment in the background, so that the user has an intuitive feeling.
  • Figure 6 shows the implementation architecture of the OCR recognition method of the text image in some embodiments of the present application.
  • the background processing includes a data preprocessing module and a quality assessment module.
  • the data preprocessing module is used to complete the functions of image scale transformation, morphological preprocessing, text area detection, and region of interest capture.
  • the quality assessment module is used to complete the quality assessment of the degree of focus accuracy, the quality assessment of the degree of shooting jitter, the quality assessment of the degree of shadow occlusion, the quality assessment of the degree of light brightness, and the quality assessment of the degree of document tilt.
  • the interaction process between the data preprocessing module and the quality assessment module in the background processing is as described in the above steps S501 to S508, which will not be repeated here.
  • the interface display includes the display of the text image selection interface, the display of the quality assessment process, and the display of the evaluation results. The following describes the process of interface display.
  • FIG. 7 shows an interface display process of the OCR recognition method for text images in some embodiments of the present application.
  • the execution subject of the interface display process of the OCR recognition method for text images in some embodiments of the present application may be a processor of the electronic device 100, and may include the following steps:
  • Step S701 Displaying a text image selection interface.
  • the electronic device 100 displays a text image selection interface to the user, and the text image selection interface is used to provide the user with a graphical interface for performing text image selection operations.
  • the user selects one or more text images that are desired to be OCR recognized through operations on the text image selection interface.
  • FIG. 8(a) shows a text image selection interface in some embodiments of the present application.
  • the text image selection interface provides an add button for text images, and the user can select a text image that is desired to be OCR recognized by clicking or touching the add button.
  • the electronic device 100 provides the user with multiple ways of selecting text images through a text image selection interface, such as photographing the text image through the electronic device 100, selecting the text image through a local photo album, etc.
  • the electronic device 100 displays the text image selected by the user according to the user's image selection operation. Specifically, the electronic device 100 displays a thumbnail of the text image selected by the user.
  • FIG. 8(b) shows a display interface of a text image selected by the user in some embodiments of the present application. As shown in FIG. 8(b), the user has selected 4 text images that are desired to be OCR recognized, and the user can use the 4 text images as images for subsequent OCR recognition by clicking the Done button in the interface.
  • Step S702 Displaying a UX mark indicating that quality assessment is in progress on the selected text image.
  • the electronic device 100 performs image quality assessment on the text image selected by the user, which can be performed after detecting in real time that the user has selected the text image, or it can be performed based on the user's confirmation operation for performing the image quality assessment, such as the user clicking the image quality assessment button provided in the interface.
  • the embodiment of the present application does not impose specific restrictions on this.
  • when the electronic device 100 detects that the user has selected a text image, the electronic device 100 performs image quality assessment on the text image selected by the user in real time.
  • the image quality assessment of the selected text image can be performed on the electronic device 100, or the electronic device 100 can send the selected text image to the assessment device 200 for assessment, and the embodiment of the present application does not impose any specific limitation on this.
  • when the electronic device 100 or the evaluation device 200 performs image quality evaluation on the selected text image, the electronic device 100 displays a corresponding UX mark for each text image, and the UX mark is used to indicate that the text image is undergoing image quality evaluation.
  • a UX mark is a graphical element in UX design that can be used to indicate a process or the result of a process.
  • UX design is the design of interactive experiences that deals with the interaction between users and products or services. Unlike User Interface (UI), UX focuses on the process of users solving problems, while UI focuses on the appearance and functionality of the product surface.
  • UI refers to the actual interface of the product, such as the visual design of the screens that users navigate when using a mobile app, or the buttons they click when browsing a website.
  • UX focuses on all the visual and interactive elements of the product interface, covering everything from typography and color palette to animations and navigation touch points (such as buttons and scroll bars).
  • FIG. 8(c) shows a text image display interface for evaluating the image quality of a selected text image in some embodiments of the present application.
  • a rotating UX mark is displayed on the thumbnails of the five text images selected by the user, and the rotating UX mark indicates that the image quality evaluation is in progress.
  • the corresponding UX mark is displayed on all five thumbnails, indicating that the image quality evaluation of these five text images is in progress and has not been completed.
  • Step S703 Displaying a UX mark indicating a quality evaluation result on the text image after quality evaluation is completed.
  • the electronic device 100 performs image quality assessment on two or more selected text images in a parallel processing manner.
  • by performing image quality assessment in a parallel processing manner, the time for performing image quality assessment on text images can be greatly shortened, the efficiency of the image quality assessment can be improved, and the degradation of user experience caused by excessive assessment time can be avoided.
  • the time of the entire image quality assessment process can be controlled to be completed within 0.5 seconds.
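The parallel assessment of several selected images can be sketched with a thread pool. Here, assess_image is a placeholder standing in for the full multi-dimension pipeline described earlier; the pool size and the "good"/"poor" result shape are assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def assess_image(image_id):
    # placeholder for the real multi-dimension quality-assessment pipeline
    return image_id, "good"

def assess_all(image_ids):
    """Assess several images concurrently; return {image_id: result}."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(assess_image, image_ids))
```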
  • the electronic device 100 displays a corresponding UX mark for the text image whose image quality assessment is completed in real time, and the UX mark is used to indicate the quality assessment result.
  • the quality assessment result can be good or poor, which is indicated by different UX marks.
  • FIG. 8(d) shows a text image display interface for completing image quality assessment of selected text images in some embodiments of the present application.
  • two text images have completed image quality assessment
  • three text images have not yet completed image quality assessment.
  • the two text images that have completed quality assessment each display a quality assessment result UX mark indicating that the quality assessment result is good.
  • the electronic device 100 may display prompt information of the image quality assessment according to the user's operation.
  • when the image quality assessment of the text images is not yet fully completed and the user performs interface operations such as click or touch operations (for example, a click on the OCR recognition button), the electronic device 100, after receiving these operations, displays prompt information to the user indicating that the image quality assessment is still in progress.
  • FIG. 8(e) shows a text image display interface that reminds the user that image quality assessment is still in progress in some embodiments of the present application.
  • 3 of the 5 text images are still undergoing image quality assessment, and the user performs an OCR recognition operation.
  • the electronic device 100 displays a prompt message "Image quality assessment in progress, please wait" and provides an "Identify now" button for the user to choose to start OCR recognition immediately without waiting for the image quality assessment to be completed.
  • FIG. 8(f) shows a text image display interface including image quality assessment results of text images in some embodiments of the present application.
  • image quality assessment has been completed for all five text images, and UX marks of corresponding image quality assessment results are displayed respectively, wherein UX marks of three text images indicate good quality assessment results, and UX marks of two text images indicate poor quality assessment results.
  • one type of UX mark indicates that the image quality assessment has been completed and the result of the image quality assessment is good, and another type of UX mark indicates that the image quality assessment has been completed and the result of the image quality assessment is poor, which may indicate that there are problems to be improved in some image quality assessment dimensions.
  • Step S704 displaying quality prompt information of text images with poor quality assessment.
  • the quality prompt information may include, but is not limited to: image quality assessment dimensions where the image quality assessment is poor, image quality improvement suggestions corresponding to the dimensions, and the like.
  • the electronic device 100 displays quality prompt information of a text image with poor quality assessment to the user, so that the user can understand which image quality assessment dimension the text image has a problem in and how to solve the problem, thereby improving the user experience.
  • FIG. 8(g) shows a text image display interface including quality prompt information of text images with poor quality assessment in some embodiments of the present application.
  • the displayed quality prompt information includes quality prompt information of pictures 2 and 4 with poor quality assessment, wherein picture 2 has poor quality assessment in the dimension of shooting jitter, and the quality improvement suggestion given is to hold the device steady and retake the picture; picture 4 has poor quality assessment in the dimensions of shadow occlusion and focus accuracy, and the quality improvement suggestion given is to adjust the position and maintain focus before shooting again.
  • a flowchart of an application scenario of an OCR recognition method for text images is also provided. As shown in FIG9 , applying the OCR recognition method for text images to an OCR recognition scenario may include the following steps:
  • Step S901 Displaying a text image selection interface for OCR recognition to the user.
  • the user runs a mobile application (APP) that implements the text image OCR recognition method in this application on the electronic device 100, and the electronic device 100 displays a text image selection interface according to the user's operation on the APP, and the user selects a text image for OCR recognition through the interface.
  • Step S902 Acquire the text image selected by the user through the text image selection interface.
  • the user can shoot a text image through the electronic device 100 or select a text image from a local album. After the user completes the text image selection, the electronic device 100 acquires the text image selected by the user.
  • Step S903 Perform image quality assessment on the text image selected by the user. After acquiring the text image selected by the user, the electronic device 100 performs image quality assessment on the text image from multiple image quality assessment dimensions to determine the quality assessment value of the text image in each image quality assessment dimension.
  • Step S904 Determine the evaluation dimension in which the text image with poor quality evaluation has quality problems.
  • the electronic device 100 compares the quality evaluation value of the text image in each image quality evaluation dimension with the corresponding preset quality evaluation threshold, and determines the image quality evaluation dimensions that do not meet the preset quality evaluation condition, where the preset quality evaluation condition can be defined in terms of whether the quality evaluation value exceeds the preset quality evaluation threshold; these dimensions are determined as the evaluation dimensions with quality problems.
  • Step S905 Display the quality assessment result of the text image to the user.
  • if the text image meets the preset quality assessment conditions in all quality assessment dimensions, the electronic device 100 determines the quality assessment result of the text image as good, and displays a UX mark indicating good quality assessment on the thumbnail of the text image; if the text image does not meet the preset quality assessment conditions in any quality assessment dimension, the electronic device 100 determines the quality assessment result of the text image as poor, and displays a UX mark indicating poor quality assessment on the thumbnail of the text image.
  • Step S906 Determine whether the quality assessment results of all selected text images are good. If so, the electronic device 100 executes step S907; if not, the electronic device 100 executes step S908.
  • Step S907 Perform OCR recognition of the text image.
  • the electronic device 100 directly performs the OCR recognition process on the text image.
  • Step S908 Displaying quality prompt information of the text image with poor quality assessment to the user.
  • the electronic device 100 displays the assessment dimensions of the quality problems of the text image with poor quality assessment and the corresponding quality improvement methods to the user, and provides the user with two options of improving the image quality and forcibly performing OCR recognition on the text image with poor quality.
  • Step S909 Determine whether the received user selection operation is forced recognition. If so, the electronic device 100 executes step S907 to perform OCR recognition of the text image; if not, the electronic device 100 executes step S902, and the user can re-select the text image.
  • a relevant interface for OCR recognition is provided to the user, the text image for OCR recognition is obtained through the user's operation on the interface, and the quality assessment result is displayed to the user after the quality assessment of the text image is completed.
  • OCR recognition is performed in real time when the quality assessment results of all text images are good.
  • when any text image has a poor quality assessment result, the user is prompted, and forced recognition or reselection is performed according to the user's operation, thereby indirectly improving the accuracy of subsequent OCR recognition and improving the user experience.
  • a method for improving the quality of text images based on an OCR recognition method for text images comprises the following steps:
  • Step S1001 Obtaining a quality assessment value of a text image with poor quality assessment.
  • the quality assessment value of the text image with poor quality assessment is obtained by executing the OCR recognition method of the text image.
  • the specific implementation of the OCR recognition method of the text image is referred to the above description and will not be repeated here.
  • the quality assessment value obtained for a text image with poor quality assessment is the quality assessment value corresponding to the image quality assessment dimension of the text image that does not meet the preset quality assessment conditions. For example, if the text image has quality problems in the shooting jitter and shadow occlusion dimensions, the quality assessment value of the text image in the shooting jitter and shadow occlusion dimensions is obtained.
  • Step S1002 Determine the quality improvement priority coefficient corresponding to each dimension of the text image.
  • a measurement factor of the quality evaluation value of the text image in each image quality evaluation dimension relative to the comprehensive evaluation score can be determined.
  • the measurement factor can be determined based on the quality assessment value of the text image in each image quality assessment dimension and the threshold of the corresponding dimension. For example, the difference between the quality assessment value in each image quality assessment dimension and the threshold of the corresponding dimension can be calculated, the ratio between the difference and the threshold of the corresponding dimension can then be calculated, and the ratio is determined as the measurement factor.
  • after determining the measurement factor corresponding to each image quality assessment dimension, the measurement factors are sorted from large to small to determine the degree of influence of each image quality assessment dimension on the comprehensive quality assessment of the text image, and the corresponding weight is determined for each image quality assessment dimension according to its degree of influence. For example, a text image with poor quality assessment has document tilt and shadow occlusion problems, and the measurement factor of the document tilt dimension is greater than the measurement factor of the shadow occlusion dimension.
  • the weights of the two dimensions can be determined based on the ratio between the measurement factor of the document tilt dimension and the measurement factor of the shadow occlusion dimension.
  • the quality improvement priority coefficients corresponding to each image quality assessment dimension are determined.
  • the priority coefficients corresponding to the dimensions are determined from high to low according to the weights corresponding to each image quality assessment dimension, and in the subsequent stage, the priority processing order for image quality improvement of different dimensions can be determined according to the quality improvement priority coefficients of each image quality assessment dimension.
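Following the difference-over-threshold ratio described above, the priority ordering of step S1002 can be sketched as follows; the dict-based input format is an assumption of this sketch.

```python
def improvement_priority(scores, thresholds):
    """Rank the poor dimensions by measurement factor, largest first.

    scores/thresholds: dicts mapping dimension name -> value for the
    dimensions with quality problems."""
    factors = {dim: (scores[dim] - thresholds[dim]) / thresholds[dim]
               for dim in scores}
    # larger factor -> greater influence on the comprehensive assessment
    # -> higher quality improvement priority
    return sorted(factors, key=factors.get, reverse=True)
```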
  • Step S1003 Determine the quality improvement intensity coefficient of each dimension according to the quality improvement priority coefficient of each dimension and the comprehensive evaluation score.
  • the quality level of the text image is determined according to its comprehensive evaluation score; the correspondence between the comprehensive evaluation score and the quality level is predetermined, and the quality improvement intensity coefficients corresponding to the quality improvement priority coefficients differ between quality levels.
  • the quality level of the text image is determined according to the interval where the comprehensive evaluation score of the text image is located, and the quality level of the text image can be divided into multiple levels, such as level 1, level 2, level 3, etc.
  • the corresponding quality improvement strength coefficient is determined according to the quality improvement priority coefficient of each dimension and the quality level of the text image, and the quality improvement strength coefficient corresponds to the preset quality improvement strength parameter set.
  • the comprehensive evaluation score of the text image corresponds to the highest level 3 of the preset quality levels.
  • the quality improvement strength coefficient corresponding to the document tilt dimension is determined as the quality improvement strength coefficient corresponding to the level 3 quality level
  • the quality improvement strength coefficient corresponding to the shadow occlusion dimension is determined as a quality improvement strength coefficient one level lower than the quality improvement strength coefficient of the document tilt dimension, that is, a quality improvement strength coefficient corresponding to the level 2 quality level.
  • the quality improvement intensity coefficients corresponding to different quality levels correspond to the quality improvement intensity parameter sets of the corresponding quality levels.
  • Step S1004 improving the quality of each dimension of the text image according to the quality improvement strength coefficient of each dimension.
  • the quality of each dimension of the text image can be improved according to the quality improvement priority coefficient of each dimension of the text image and the quality improvement strength coefficient of each dimension.
  • for example, if the dimensions to be improved, in order of priority, are focus accuracy, light brightness, and document tilt, the image quality improvement processing is performed on each dimension in the order of focus accuracy, light brightness, and document tilt.
  • the sharpening method can be used to improve the quality of the focus accuracy dimension
  • the equalization method can be used to improve the quality of the light brightness dimension
  • the correction method can be used to improve the quality of the document tilt dimension.
  • the quality improvement of each dimension is performed through a quality improvement intensity parameter corresponding to the quality improvement intensity coefficient.
  • the corresponding preset quality improvement intensity parameter is obtained according to the quality improvement intensity coefficient, and the obtained quality improvement intensity parameter is used to improve the quality of the corresponding dimension of the text image.
  • For example, the mean document tilt angle over multiple randomly sampled batches can first be obtained, and the image region of the text image in which document tilt exists can be reversely rotated, with the correction angle being the mean tilt angle.
  • The corrected text image is then separated into the shadow region and the regular region based on the shadow region coordinates and the average Light component of the region obtained during shadow occlusion detection, and the percentage of the Light component in the HSL space is increased in the shadow region to raise its brightness, with the percentage determined by the preset parameters of the corresponding level.
  • Next, the global mean brightness difference between the brightness-improved text image and the original text image is calculated, along with the per-pixel difference between them; pixels whose Light component difference is greater than the global mean difference are retained, and the processed text image is output as the corrected text image.
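The shadow-region brightening and pixel-retention rule just described can be sketched as follows, operating only on the HSL Light component. The boolean mask, the percentage value, and the [0, 1] value range are assumptions for illustration; the application does not specify them.

```python
import numpy as np

def lift_shadow(light, shadow_mask, pct):
    """light: HSL L-channel in [0, 1]; shadow_mask: bool array marking the
    shadow region; pct: preset percentage increase for the chosen level."""
    lifted = light.copy()
    lifted[shadow_mask] = np.clip(light[shadow_mask] * (1.0 + pct), 0.0, 1.0)
    # Global mean brightness difference between lifted and original image.
    global_diff = float(np.mean(lifted - light))
    # Retain the lifted value only where the per-pixel difference exceeds
    # the global mean difference; elsewhere fall back to the original.
    return np.where((lifted - light) > global_diff, lifted, light)
```

Because the global mean difference averages over both lifted and untouched pixels, only pixels that were actually brightened beyond the average keep their lifted value.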
  • Step S1005: Acquire the quality-improved text image.
  • The text image whose quality has been improved through the above steps is obtained, and OCR recognition is performed on it, which improves the accuracy of OCR recognition.
  • FIG11 shows a display interface for improving the quality of text images with poor quality assessment in some embodiments of the present application.
  • If the image quality assessment of two text images does not meet the preset condition, that is, their quality assessment results are poor, the electronic device 100 improves the image quality of the two text images in real time and displays, to the user, prompt information about improving the image quality of picture 2 and picture 4.
  • FIG12 shows a hardware structure block diagram of an evaluation device 200 for an OCR recognition method for text images according to some embodiments of the present application.
  • the evaluation device 200 may include one or more processors 201, a system control logic 202 connected to at least one of the processors 201, a system memory 203 connected to the system control logic 202, a non-volatile memory (NVM) 204 connected to the system control logic 202, and a network interface 206 connected to the system control logic 202.
  • Processor 201 may include one or more single-core or multi-core processors. In some embodiments, processor 201 may include any combination of general-purpose processors and special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In embodiments where the evaluation device 200 employs an enhanced Node B (eNB) or a radio access network (RAN) controller, processor 201 may be configured to perform the various disclosed embodiments. For example, processor 201 may be used to implement an OCR recognition method for text images.
  • System control logic 202 may include any suitable interface controller to provide any suitable interface to at least one of the processors 201 and/or to any suitable device or component in communication with system control logic 202.
  • the system control logic 202 may include one or more memory controllers to provide an interface to the system memory 203.
  • the system memory 203 may be used to load and store data and/or instructions.
  • the system memory 203 may load text image data in the embodiments of the present application.
  • system memory 203 of the evaluation device 200 may include any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).
  • the NVM memory 204 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions.
  • the NVM memory 204 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as at least one of a hard disk drive (HDD), a compact disc (CD) drive, and a digital versatile disc (DVD) drive.
  • the NVM memory 204 may be used to store text image files, etc.
  • NVM storage 204 may occupy a portion of the storage resources of the apparatus on which evaluation device 200 is installed, or it may be accessible by the device without necessarily being part of it. For example, NVM storage 204 may be accessed over a network via network interface 206.
  • System memory 203 and NVM memory 204 may include, respectively, a temporary copy and a permanent copy of instructions 205.
  • the instruction 205 may include: when executed by at least one of the processors 201, causing the evaluation device 200 to implement the text image pre-processing operation of the method shown in FIG5, etc.
  • the instruction 205, hardware, firmware and/or software components thereof may be additionally/alternatively placed in the system control logic 202, the network interface 206 and/or the processor 201.
  • the network interface 206 may include a transceiver for providing a radio interface for the evaluation device 200, and then communicating with any other suitable device (such as a front-end module, an antenna, etc.) through one or more networks.
  • the network interface 206 may be integrated with other components of the evaluation device 200.
  • the network interface 206 may be integrated with at least one of the processor 201, the system memory 203, the NVM memory 204, and a firmware device (not shown) with instructions.
  • the evaluation device 200 implements the method shown in the method embodiment.
  • the network interface 206 can be used to receive text images, etc. sent by an electronic device.
  • the network interface 206 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface.
  • the network interface 206 may be a network adapter, a wireless network adapter, a telephone modem and/or a wireless modem.
  • At least one of the processors 201 may be packaged with the logic of one or more controllers for the system control logic 202 to form a system in a package (SiP). In some embodiments, at least one of the processors 201 may be integrated with the logic of one or more controllers for the system control logic 202 on the same die to form a system on chip (SoC).
  • the evaluation device 200 may further include an input/output (I/O) device 207.
  • The I/O device 207 may include a user interface that enables the user to interact with the evaluation device 200, and a peripheral component interface designed so that peripheral components can also interact with the evaluation device 200.
  • the evaluation device 200 further includes a sensor for determining at least one of an environmental condition and location information related to the evaluation device 200.
  • the user interface may include, but is not limited to, a display (e.g., an LCD display, a touch screen display, etc.), a speaker, a microphone, one or more cameras (e.g., a still image camera and/or a video camera), a flashlight (e.g., an LED flash), and a keyboard.
  • the peripheral component interface may include, but is not limited to, a non-volatile memory port, an audio jack, and a power interface.
  • the sensors may include, but are not limited to, gyroscope sensors, accelerometers, proximity sensors, ambient light sensors, and positioning units.
  • the positioning unit may also be part of the network interface 206 or interact with the network interface 206 to communicate with components of the positioning network (e.g., Beidou satellites).
  • the structure shown in FIG12 does not constitute a specific limitation on the evaluation device 200.
  • the evaluation device 200 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently.
  • the components shown in the figure may be implemented by hardware or software, or a combination of software and hardware.
  • the various embodiments of the mechanism disclosed in the present application can be implemented in hardware, software, firmware or a combination of these implementation methods.
  • the embodiments of the present application can be implemented as a computer program or program code executed on a programmable system, which includes at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device and at least one output device.
  • Program code may be applied to input instructions to perform the functions described herein and to generate output information.
  • the output information may be applied to one or more output devices in a known manner.
  • a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
  • Program code can be implemented with high-level programming language or object-oriented programming language to communicate with the processing system.
  • program code can also be implemented with assembly language or machine language.
  • the mechanism described in this application is not limited to the scope of any specific programming language. In either case, the language can be a compiled language or an interpreted language.
  • the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof.
  • the disclosed embodiments may also be implemented as instructions carried by or stored on one or more temporary or non-temporary machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors.
  • the instructions may be distributed over a network or through other computer-readable media.
  • A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy disks, optical disks, CD-ROMs, magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, flash memory, or a tangible machine-readable medium for transmitting information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.).
  • a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
  • a logical unit/module can be a physical unit/module, or a part of a physical unit/module, or can be implemented as a combination of multiple physical units/modules.
  • the physical implementation method of these logical units/modules themselves is not the most important.
  • the combination of functions implemented by these logical units/modules is the key to solving the technical problems proposed by the present application.
  • The fact that the above device embodiments of the present application do not introduce units/modules that are not closely related to solving the technical problems addressed by the present application does not mean that no other units/modules exist in those embodiments.


Abstract

The present application relates to the field of image processing and discloses an OCR recognition method for text images, an electronic device, and a medium. The method includes: displaying a text image selection interface; receiving a first operation input by a user; in response to the first operation, performing, by the electronic device, quality assessment on at least one text image; in response to completion of the image quality assessment of the at least one text image, displaying, in a text image display interface, a first mark corresponding to the at least one text image; and then performing OCR recognition on the at least one text image according to the image quality assessment result of the at least one text image. This can improve the recognition efficiency and recognition effect of text images, thereby improving the user experience and avoiding the degraded experience caused by excessively long waiting times and poor recognition results.

Description

OCR recognition method for text images, electronic device, and medium
This application claims priority to the Chinese patent application No. 202211298105.0, filed with the Chinese Patent Office on October 21, 2022 and entitled "OCR recognition method for text images, electronic device, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing, and in particular to an OCR recognition method for text images, an electronic device, and a medium.
Background
In optical character recognition (OCR) scenarios, the quality of the text image being recognized largely determines the effect of OCR text recognition. A high-quality text image makes the OCR recognition result more accurate, whereas a low-quality text image causes the recognition result to deviate from expectations, wastes recognition time and computing power, and severely degrades the user experience of OCR text recognition.
Summary of the Invention
Embodiments of the present application provide an OCR recognition method for text images, an electronic device, and a medium, which are used to solve the problems of low recognition efficiency and poor recognition results caused by directly performing OCR recognition on text images in the prior art.
In a first aspect, an embodiment of the present application provides an OCR recognition method for text images, applied to an electronic device, the method including:
displaying a text image selection interface;
receiving a first operation input by a user, where the first operation is used to select, in the text image selection interface, at least one text image for OCR recognition;
in response to the first operation, performing, by the electronic device, quality assessment on the at least one text image;
in response to completion of the image quality assessment of the at least one text image, displaying, in a text image display interface, a first mark corresponding to the at least one text image, where the first mark indicates the image quality assessment result of the text image; and
performing OCR recognition on the at least one text image according to the image quality assessment result of the at least one text image.
It can be understood that when OCR recognition is performed on text images, especially on multiple text images, poor-quality text images greatly affect the recognition time and recognition effect. Performing OCR recognition directly, without a prior quality assessment, leads to long recognition times and poor results after considerable computing resources have been spent, wasting those resources. With the OCR recognition method of the present application, image quality can be assessed before OCR recognition, so that OCR recognition of poor-quality text images is avoided; this improves OCR recognition efficiency, avoids wasting computing resources, and improves the user experience by avoiding the degraded experience caused by long waiting times and poor recognition results.
With the above method, an image quality assessment process is added before OCR recognition of text images, the quality assessment result is shown to the user once the assessment is complete, and subsequent OCR recognition is performed according to the assessment result, thereby improving the recognition efficiency and recognition effect of text images and hence the user experience.
In a possible implementation of the first aspect, displaying the first mark corresponding to the at least one text image in the text image display interface further includes:
displaying, in the text image display interface, quality prompt information corresponding to the at least one text image.
With the above method, the image quality determined by the prior assessment is shown to the user, so that the user understands the quality of the text images and can form corresponding expectations about subsequent OCR recognition efficiency and results, improving the user experience of OCR recognition.
In a possible implementation of the first aspect, the quality prompt information includes at least one of the following: an image quality assessment dimension and an image quality improvement suggestion.
With the above method, the user learns the basis of the image quality assessment and the direction for image quality improvement, improving the user experience.
In a possible implementation of the first aspect, performing, by the electronic device, quality assessment on the at least one text image includes:
displaying, in the text image display interface, a second mark corresponding to the at least one text image, where the second mark indicates that image quality assessment is being performed on the text image in at least one image quality assessment dimension.
With the above method, the user is shown that image quality assessment is in progress, preventing the user from mistakenly believing that the OCR recognition process has stopped responding, which improves the user experience.
In a possible implementation of the first aspect, the method further includes:
displaying the corresponding first mark or second mark on a thumbnail of the at least one text image in the text image display interface.
With the above method, the quality assessment process or result of a text image is shown to the user in a simple and convenient form, reducing the computing resources used by the display interface while keeping the user informed of the current status, improving the user experience.
In a possible implementation of the first aspect, the image quality assessment dimensions include at least one of the following: degree of shooting shake, degree of document tilt, degree of shadow occlusion, degree of focus accuracy, and degree of light brightness.
With the above method, the text image is assessed in multiple dimensions that affect OCR recognition, so that the quality of the text image can be evaluated comprehensively.
In a possible implementation of the first aspect, performing image quality assessment on the text image in at least one image quality assessment dimension includes:
performing quality assessment of the focus accuracy by means of a gradient function, using the average of the sums of the horizontal and vertical gradients of the pixels in the text image as the quality assessment value of the focus accuracy.
With the above method, the image quality of the text image can be assessed in the focus accuracy dimension, improving the accuracy of the image quality assessment.
In a possible implementation of the first aspect, performing image quality assessment on the text image in at least one image quality assessment dimension includes:
performing quality assessment of the shooting shake by means of a local gray variance product function, using the average of the products of two gray differences in the neighborhood of each pixel in the text image as the quality assessment value of the shooting shake.
With the above method, the image quality of the text image can be assessed in the shooting shake dimension, improving the accuracy of the image quality assessment.
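One common form of the gray-variance-product function matching this description is the SMD2 sharpness measure, sketched below; averaging the products rather than summing them follows the text, and treating lower scores as indicating shake blur is an assumption of this sketch.

```python
import numpy as np

def smd2(gray):
    """Gray-variance-product (SMD2) score: for every pixel, multiply the
    absolute gray difference to the neighbour below by the difference to
    the neighbour on the right, and average the products over the image.
    Lower values suggest a blurrier (shakier) image."""
    g = gray.astype(np.float64)
    dx = np.abs(g[:-1, :-1] - g[1:, :-1])   # difference to the pixel below
    dy = np.abs(g[:-1, :-1] - g[:-1, 1:])   # difference to the pixel right
    return float(np.mean(dx * dy))
```

A flat image scores 0, while a high-contrast checkerboard scores the maximum product of neighbouring gray differences.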
In a possible implementation of the first aspect, performing image quality assessment on the text image in at least one image quality assessment dimension includes:
performing quality assessment of the shadow occlusion by means of the OTSU algorithm.
With the above method, the image quality of the text image can be assessed in the shadow occlusion dimension, improving the accuracy of the image quality assessment.
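A minimal histogram-based implementation of the OTSU algorithm is sketched below. Using the resulting threshold to separate shadowed from unshadowed pixels is one reading of this passage; the code is not taken from the application.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold that maximises the between-class
    variance of the gray-level histogram (here, to separate darker shadowed
    pixels from brighter unshadowed ones)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()     # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0          # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2                    # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

For a bimodal image with peaks at gray levels 50 and 200, the returned threshold falls in the gap between the two peaks.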
In a possible implementation of the first aspect, performing image quality assessment on the text image in at least one image quality assessment dimension includes:
performing quality assessment of the light brightness by means of color space conversion.
With the above method, the image quality of the text image can be assessed in the light brightness dimension, improving the accuracy of the image quality assessment.
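This passage does not name the target color space; assuming HSL (whose Light component is used elsewhere in this application for shadow correction), a minimal conversion to the L component and its mean as the brightness quality value could look like:

```python
import numpy as np

def lightness_score(rgb):
    """Convert an RGB image (uint8, H x W x 3) to the HSL L component,
    L = (max(R, G, B) + min(R, G, B)) / 2, and return the mean lightness
    in [0, 1] as the light-brightness quality value."""
    arr = rgb.astype(np.float64) / 255.0
    light = (arr.max(axis=-1) + arr.min(axis=-1)) / 2.0
    return float(light.mean())
```

A pure white image scores 1.0, a black image 0.0, and a saturated primary color 0.5, matching the standard HSL definition.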
In a possible implementation of the first aspect, performing image quality assessment on the text image in at least one image quality assessment dimension includes:
performing quality assessment of the document tilt by means of line detection, using the average of the absolute values of the differences between the horizontal inclination of the detected lines relative to the horizontal and their vertical inclination relative to the vertical as the quality assessment value of the document tilt.
With the above method, the image quality of the text image can be assessed in the document tilt dimension, improving the accuracy of the image quality assessment.
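Assuming a line detector (e.g. a Hough transform) has already produced line segments as endpoint pairs, the following sketch measures each segment's deviation from the nearest axis and averages the absolute deviations. Folding each angle to the nearest of the horizontal and vertical axes is one plausible reading of the formula in this passage, not a definitive one.

```python
import math

def tilt_score(segments):
    """segments: list of ((x1, y1), (x2, y2)) endpoints from a line detector.
    Returns the mean absolute deviation (degrees) of the segments from the
    nearest axis; 0 means perfectly horizontal/vertical text lines."""
    devs = []
    for (x1, y1), (x2, y2) in segments:
        ang = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        # Deviation from the nearest of 0 deg (horizontal) and 90 deg (vertical).
        dev = min(ang % 90.0, 90.0 - ang % 90.0)
        devs.append(dev)
    return sum(devs) / len(devs) if devs else 0.0
```

Axis-aligned segments score 0, and a 45-degree segment scores the maximum deviation of 45.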
In a possible implementation of the first aspect, the first operation includes at least one of the following: real-time image shooting and image file selection.
With the above method, the user can specify the text images to be OCR-recognized in multiple ways, making text image selection more convenient.
In a possible implementation of the first aspect, the image quality assessment result of a text image is determined according to the quality assessment value of the at least one text image in at least one image quality assessment dimension and the corresponding preset quality threshold.
With the above method, the image quality assessment result can be determined in a simple, easily understood way without excessive computing resources, reducing the time required for image quality assessment and avoiding long user waits, which improves the user experience.
In a possible implementation of the first aspect, in a case where the first mark indicates that the image quality assessment result of a text image does not meet a preset condition, the method further includes:
performing image quality improvement on the at least one text image whose quality assessment does not meet the preset condition.
With the above method, text images with poor quality assessment can be improved automatically, which increases the efficiency of the subsequent OCR recognition process and improves the recognition result.
In a possible implementation of the first aspect, performing image quality improvement on the at least one text image with poor quality assessment includes:
obtaining the quality assessment values, in the image quality assessment dimensions, of the at least one text image with poor quality assessment;
determining a quality improvement priority coefficient corresponding to each image quality assessment dimension of the at least one text image, where the quality improvement priority coefficient is used to determine the processing order of the image quality assessment dimensions during quality improvement;
determining a quality improvement strength coefficient for each image quality assessment dimension according to the quality improvement priority coefficient of each dimension and the comprehensive evaluation score; and
improving the quality of each image quality assessment dimension of the text image according to the quality improvement strength coefficient of each dimension.
With the above method, targeted quality improvement can be performed on a poorly assessed text image in the different quality assessment dimensions, comprehensively improving its image quality and facilitating subsequent OCR recognition.
An embodiment of the present application provides an OCR recognition method for text images. The method displays a text image selection interface, receives a first operation input by a user, performs, in response to the first operation, quality assessment on at least one text image, displays, in response to completion of the image quality assessment of the at least one text image, a first mark corresponding to the at least one text image in a text image display interface, and then performs OCR recognition on the at least one text image according to its image quality assessment result, thereby improving the recognition efficiency and recognition effect of text images and hence the user experience, avoiding the degraded experience caused by long waiting times and poor recognition results.
In a second aspect, an embodiment of the present application provides an electronic device, including:
a memory configured to store instructions executed by one or more processors of the electronic device; and
a processor, which is one of the processors of the electronic device, configured to perform the OCR recognition method for text images of the first aspect or any of its possible implementations.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the OCR recognition method for text images of the first aspect or any of its possible implementations.
In a fourth aspect, an embodiment of the present application provides a computer program product including a computer program/instructions that, when executed on a computer, cause the computer to perform the OCR recognition method for text images of the first aspect or any of its possible implementations.
Brief Description of the Drawings
FIG. 1 shows a schematic diagram of a scenario of an OCR recognition method for text images according to some embodiments of the present application.
FIG. 2 shows a schematic flowchart of a mobile-phone-photo text image quality assessment method according to some embodiments of the present application.
FIG. 3 shows a schematic flowchart of an OCR recognition method for text images according to some embodiments of the present application.
FIG. 4 shows a hardware structure diagram of an electronic device for an OCR recognition method for text images according to some embodiments of the present application.
FIG. 5 shows a schematic flowchart of an OCR recognition method for text images according to some embodiments of the present application.
FIG. 6 shows a schematic diagram of an implementation architecture of an OCR recognition method for text images according to some embodiments of the present application.
FIG. 7 shows a schematic interface display flow of an OCR recognition method for text images according to some embodiments of the present application.
FIG. 8(a) and FIG. 8(b) show schematic diagrams of a text image selection interface according to some embodiments of the present application.
FIG. 8(c) to FIG. 8(g) show schematic diagrams of a text image display interface according to some embodiments of the present application.
FIG. 9 shows a flowchart of an application scenario of an OCR recognition method for text images according to some embodiments of the present application.
FIG. 10 shows a schematic flowchart of a text image quality improvement method based on an OCR recognition method for text images according to some embodiments of the present application.
FIG. 11 shows a schematic display interface for improving the quality of text images with poor quality assessment according to some embodiments of the present application.
FIG. 12 shows a hardware structure block diagram of an evaluation device for an OCR recognition method for text images according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, an OCR recognition method for text images, an electronic device, and a medium.
It can be understood that the OCR recognition method for text images of the present application is applicable to scenarios in which a user performs OCR recognition on text images through a mobile device.
As described above, existing OCR recognition technology performs text recognition directly on the input text image, so the recognition effect depends heavily on the quality of the input text image. If the input text image is of low quality, the text recognition result will be poor, recognition time and the computing resources required for recognition will be wasted, and the user experience will be greatly harmed.
To solve this problem, an embodiment of the present application provides an OCR recognition method for text images that pre-assesses the input text image before OCR recognition and informs the user when the text image has quality problems. This avoids the waste of recognition time and computing resources caused by performing OCR recognition on poor-quality text images, and also improves the user experience.
Further, the text image can be quality-assessed in real time and in parallel in the following five image quality assessment dimensions: degree of shooting shake, degree of document tilt, degree of shadow occlusion, degree of focus accuracy, and degree of light brightness. Whether the quality assessment result of each dimension exceeds a preset threshold is determined; if it does, the quality in that dimension is determined to be poor, and feedback is given to the user. In this way, poor-quality text images that may affect the recognition result can be identified before OCR recognition, and the user can be reminded.
Embodiments of the present application are described in further detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a scenario in which an electronic device performs OCR recognition in the OCR recognition method for text images of an embodiment of the present application. As shown in FIG. 1, the scenario may include an electronic device 100, an evaluation device 200, and a user 300. The electronic device 100 is configured to provide the user 300 with OCR-recognition-related interfaces and to obtain the text images selected by the user 300 for OCR recognition. In some embodiments, the electronic device 100 may assess the image quality of the selected text images locally and show the quality assessment results to the user 300. In other embodiments, the electronic device 100 may send the selected text images to the evaluation device 200, which assesses the image quality of the received text images and returns the quality assessment results to the electronic device 100; the electronic device 100 then shows the results to the user 300 according to the received quality assessment results.
The user 300 obtains the OCR-recognition-related interfaces by operating the electronic device 100, for example by running an OCR recognition app. The OCR-recognition-related interfaces provided by the electronic device 100 may include multiple interfaces, such as a text image selection interface and a text image display interface. The user 300 can select the text images to be OCR-recognized through the text image selection interface and learn their image quality assessment results through the text image display interface.
It can be understood that the user 300 may select one text image or multiple text images for OCR recognition through the text image selection interface, which is not specifically limited in the embodiments of the present application.
In some embodiments, the electronic device 100 receives an operation of the user 300, for example a double-tap on the OCR recognition application icon, displays the OCR-recognition-related interfaces such as the text image selection interface accordingly, and displays thumbnails of the selected text images on the text image display interface according to the image selection operation of the user 300. Here, the electronic device 100 may display an image quality assessment status mark on the thumbnails of the text images, as shown in display interface (a), which includes thumbnails of the 5 text images selected by the user 300; each thumbnail bears a quality assessment status mark indicating that all 5 text images are undergoing image quality assessment.
In some embodiments, the electronic device 100 obtains the quality assessment results of the text images locally; in other embodiments, it receives the quality assessment results from the evaluation device 200. After obtaining the quality assessment results, the electronic device 100 shows the user 300 the quality assessment results of the selected text images, so that the user 300 understands their assessed quality.
It can be understood that a quality assessment result may be qualitative, such as "good" or "poor", or quantitative, such as "90" or "75", which is not specifically limited in the embodiments of the present application.
In some embodiments, the electronic device 100 marks each text image with its quality assessment result. If the result is qualitative, the mark is applied directly according to the qualitative result: for example, for a result of "good" the electronic device 100 may display a "√" mark on the thumbnail of the text image, and for "poor" a "×" mark. If the result is quantitative, it is compared with the preset quality threshold of the corresponding dimension: if it is above the threshold, the electronic device 100 may display a "√" mark on the thumbnail; if below, a "×" mark.
For example, the electronic device 100 may display quality assessment result marks on the thumbnails, as shown in display interface (b): among the quality assessment results of the 5 text images, 3 images are of good quality and 2 of poor quality; thumbnails of the good images bear a "√" mark and those of the poor images a "×" mark. The user 300 can decide, based on the image quality assessment results, whether to perform OCR recognition on the poor-quality text images, whose OCR recognition results may be poor. For example, after seeing display interface (b), the user 300 may decide to proceed with OCR recognition of the poor-quality text images anyway, or delete them in the selection interface and perform OCR recognition only on the good-quality ones.
Here, the electronic device 100 may automatically start OCR recognition of the text images with good image quality assessment results; text images with poor results may be recognized after confirmation by the user 300, or not recognized if the user 300 cancels the recognition.
In some embodiments, the evaluation device 200 may receive the selected text images sent by the electronic device 100, assess their image quality, and return the image quality assessment results to the electronic device 100. Further, the evaluation device 200 may assess the selected text images in multiple image quality assessment dimensions, determine the quality assessment result in each dimension, and return the results to the electronic device 100.
It can be understood that the evaluation device 200 may be another electronic device able to establish a communication connection with the electronic device 100, for example a tablet computer, personal computer, or server communicating with the electronic device 100 over a near-field connection, which is not specifically limited in the embodiments of the present application.
It can be understood that the electronic device 100 and the evaluation device 200 may be the same electronic device or different electronic devices, which is not specifically limited in the embodiments of the present application.
It can be understood that the electronic device 100 in the embodiments of the present application may include, but is not limited to, a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), and the like; the specific type of the electronic device is not limited in the embodiments of the present application.
In the embodiments of the present application, after the user selects the text images for OCR recognition, quality assessment is performed on the text images before OCR recognition, and a quality assessment result mark is displayed once the assessment is complete. Image quality is thus assessed in advance of OCR recognition, the results are shown to the user, OCR recognition is performed on the text images with good assessment results, and for those with poor results, whether to perform OCR recognition is determined according to user feedback. This avoids the waste of computing resources and the long recognition times caused by recognizing poor-quality text images, improves OCR recognition efficiency and quality, and improves the user experience.
In other embodiments, the user 300 may capture text images for OCR recognition through an image capture apparatus of the electronic device 100, such as a camera. Here, the electronic device 100 provides the user 300 with an interface for text image capture; the user 300 captures text images through the image capture interface; the electronic device 100 obtains the captured text images, assesses their image quality, and then performs OCR recognition on those with good assessment results. The processes by which the electronic device 100 assesses and recognizes the captured text images are as described above and are not repeated here.
In some embodiments, a method is used in which a mobile-phone photo is sampled in blocks, sampled blocks are randomly drawn for OCR recognition, and the confidence of the text recognized in each sampled block is used as the basis for assessing text image quality. For example, in the mobile-phone-photo text image quality assessment method shown in FIG. 2, the patent application includes the following steps: step S201, OCR sampling recognition: the document picture taken by the mobile phone is sampled, n image blocks are randomly drawn, each image block is OCR-recognized as a sampling region, and the confidence of the characters recognized in the sampling region is obtained; step S202, confidence calculation: the confidence of the mobile-phone photo document is calculated from the confidence of each character recognized in the sampling regions in the previous step; step S203, image quality evaluation: an image quality value is obtained from a pre-stored image-quality/confidence lookup table according to the sampling-region confidence, together with a good/bad image quality judgment.
In other embodiments, a method is used in which the sharpness of the text image is layered, and image sharpness is computed and assessed at the different layers. For example, in the text image quality assessment method, apparatus, device, and medium shown in FIG. 3, the patent application includes the following steps: step S301, receiving a text image to be assessed, where the image to be assessed includes a medical text image or a text image involved in insurance business; step S302, determining, according to a preset ideal image size, the target text image corresponding to the received image to be assessed; step S303, inputting the target text image into a pre-trained text quality assessment model and determining a first sharpness value for each pixel in the text region of the target text image; step S304, taking the average of the first sharpness values of the pixels in the text region of the target text image as the second sharpness value of that target text image, and the average of the second sharpness values of all target text images as the third sharpness value of the image to be assessed; step S305, determining whether the third sharpness value is greater than a preset sharpness threshold, and if not, determining that the image quality of the image to be assessed does not meet the condition; step S306, outputting prompt information asking the user to re-upload a text image whose image quality meets the condition.
Compared with the first and second implementations above, the solution of the present application can assess text image quality quickly and in real time with respect to the multiple high-frequency disturbance factors that affect image quality, and quantifies the assessment result in each dimension. It also provides an unobtrusive interaction flow and real-time feedback of assessment results, which can significantly broaden the scope and efficiency of text image quality assessment, offer the user a comfortable and comprehensive text image quality assessment, and indirectly improve the accuracy of subsequent OCR recognition.
FIG. 4 shows a schematic structural diagram of an electronic device 100 according to an embodiment of the present application. As shown in FIG. 4, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of the invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors.
The controller may generate operation control signals according to instruction operation codes and timing signals, completing the control of fetching and executing instructions.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instruction or data again, it can be called directly from this memory, avoiding repeated accesses and reducing the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be separately coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface, implementing the touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the I2S interface, implementing the function of answering calls through a Bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, implementing the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus that converts the data to be transmitted between serial and parallel communication. In some embodiments, the UART interface is typically used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the UART interface, implementing the function of playing music through a Bluetooth headset.
The MIPI interface may be used to connect the processor 110 with peripheral devices such as the display 194 and the camera 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and so on. In some embodiments, the processor 110 and the camera 193 communicate through the CSI interface to implement the shooting function of the electronic device 100, and the processor 110 and the display 194 communicate through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface may be configured by software as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and so on.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, or to transfer data between the electronic device 100 and peripheral devices. It may also be used to connect headphones and play audio through them, and to connect other electronic devices, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of the invention are merely illustrative and do not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt interface connection manners different from those of the above embodiment, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through the wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 may also supply power to the electronic device through the power management module 141.
The power management module 141 is configured to connect the battery 142 and the charging management module 140 with the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.
The wireless communication function of the electronic device 100 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are configured to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single communication band or multiple communication bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 may provide solutions for wireless communication including 2G/3G/4G/5G applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 may also amplify signals modulated by the modem processor and convert them into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be provided in the same device.
The modem processor may include a modulator and a demodulator. The modulator is configured to modulate the low-frequency baseband signal to be transmitted into a medium- or high-frequency signal. The demodulator is configured to demodulate the received electromagnetic wave signal into a low-frequency baseband signal and then transmit the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A and the receiver 170B) or displays images or video through the display 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or other functional modules.
The wireless communication module 160 may provide solutions for wireless communication applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive signals to be transmitted from the processor 110, frequency-modulate and amplify them, and convert them into electromagnetic waves for radiation through the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 with the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or the satellite based augmentation systems (SBAS).
The electronic device 100 implements the display function through the GPU, the display 194, the application processor, and the like. The GPU is a microprocessor for image processing connecting the display 194 and the application processor, and is configured to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 194 is configured to display images, video, and the like. The display 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light emitting diode (AMOLED), a flex light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light emitting diodes (QLED), and so on. In some embodiments, the electronic device 100 may include 1 or N displays 194, where N is a positive integer greater than 1.
The electronic device 100 may implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is configured to process data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye. The ISP may also perform algorithmic optimization on the noise, brightness, and skin tone of the image, and may optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is configured to capture still images or video. An object generates an optical image through the lens that is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal and transmits it to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is configured to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is configured to perform a Fourier transform or the like on the frequency point energy.
The video codec is configured to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that the electronic device 100 can play or record video in multiple encoding formats, for example moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor that processes input information rapidly by drawing on the structure of biological neural networks, for example the transfer mode between human brain neurons, and can also continuously self-learn. Applications such as intelligent cognition of the electronic device 100, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example saving music, video, and other files in the external memory card.
The internal memory 121 may be used to store computer-executable program code, where the executable program code includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function (such as a sound playback function and an image playback function). The data storage area may store data (such as audio data and a phone book) created during use of the electronic device 100. In addition, the internal memory 121 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, and a universal flash storage (UFS). By running the instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor, the processor 110 performs the various functional applications and data processing of the electronic device 100.
The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is configured to convert digital audio information into an analog audio signal output and to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
The speaker 170A, also called a "loudspeaker", is configured to convert an audio electrical signal into a sound signal. The electronic device 100 can listen to music or a hands-free call through the speaker 170A.
The receiver 170B, also called an "earpiece", is configured to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or a voice message, the voice can be heard by bringing the receiver 170B close to the ear.
The microphone 170C, also called a "mic" or "mouthpiece", is configured to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording, among other functions.
The headset jack 170D is configured to connect a wired headset. The headset jack 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is configured to sense pressure signals and can convert them into electrical signals. In some embodiments, the pressure sensor 180A may be provided on the display 194. There are many types of pressure sensor 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the strength of the pressure according to the change in capacitance. When a touch operation acts on the display 194, the electronic device 100 detects the strength of the touch operation according to the pressure sensor 180A, and may also calculate the touch position according to the detection signal of the pressure sensor 180A. In some embodiments, touch operations acting on the same touch position but with different touch operation strengths may correspond to different operation instructions.
The gyroscope sensor 180B may be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) may be determined through the gyroscope sensor 180B. The gyroscope sensor 180B may be used for shooting stabilization. For example, when the shutter is pressed, the gyroscope sensor 180B detects the shaking angle of the electronic device 100 and calculates, according to the angle, the distance the lens module needs to compensate, allowing the lens to counteract the shaking of the electronic device 100 through reverse motion to implement stabilization. The gyroscope sensor 180B may also be used for navigation and motion-sensing game scenarios.
The barometric pressure sensor 180C is configured to measure air pressure. In some embodiments, the electronic device 100 calculates altitude from the air pressure value measured by the barometric pressure sensor 180C to assist positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip leather case. In some embodiments, when the electronic device 100 is a flip phone, it may detect the opening and closing of the flip cover through the magnetic sensor 180D, and set features such as automatic unlocking upon flip-open according to the detected open/closed state of the case or cover.
The acceleration sensor 180E may detect the magnitude of the acceleration of the electronic device 100 in all directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It may also be used to identify the attitude of the electronic device, and is applied to landscape/portrait switching, pedometers, and similar applications.
The distance sensor 180F is configured to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, in a shooting scenario, the electronic device 100 may use the distance sensor 180F to measure distance to achieve fast focusing.
The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 100 emits infrared light outward through the light emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100; when insufficient reflected light is detected, the electronic device 100 can determine that there is no object nearby. The electronic device 100 may use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear during a call, so as to automatically turn off the screen to save power. The proximity light sensor 180G may also be used in leather case mode and pocket mode for automatic unlocking and screen locking.
The ambient light sensor 180L is configured to sense the ambient light brightness. The electronic device 100 may adaptively adjust the brightness of the display 194 according to the sensed ambient light brightness. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking photos, and may cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is configured to collect fingerprints. The electronic device 100 may use the collected fingerprint characteristics to implement fingerprint unlocking, application-lock access, fingerprint photographing, fingerprint call answering, and the like.
The temperature sensor 180J is configured to detect temperature. In some embodiments, the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature handling policy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J to lower power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to prevent abnormal shutdown of the electronic device 100 caused by low temperature. In still other embodiments, when the temperature is below yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to prevent abnormal shutdown caused by low temperature.
The touch sensor 180K is also called a "touch device". The touch sensor 180K may be provided on the display 194, and the touch sensor 180K and the display 194 form a touch screen, also called a "touch-control screen". The touch sensor 180K is configured to detect touch operations acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type, and visual output related to the touch operation may be provided through the display 194. In other embodiments, the touch sensor 180K may also be provided on the surface of the electronic device 100 at a position different from that of the display 194.
The bone conduction sensor 180M may acquire vibration signals. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of the vibrating bone mass of the human vocal part. The bone conduction sensor 180M may also contact the human pulse and receive the blood pressure beat signal. In some embodiments, the bone conduction sensor 180M may also be provided in a headset, combined into a bone conduction headset. The audio module 170 may parse out a voice signal based on the vibration signal of the vocal-part vibrating bone mass acquired by the bone conduction sensor 180M to implement a voice function. The application processor may parse heart rate information based on the blood pressure beat signal acquired by the bone conduction sensor 180M to implement a heart rate detection function.
The buttons 190 include a power button, volume buttons, and the like. The buttons 190 may be mechanical buttons or touch buttons. The electronic device 100 may receive button input and generate button signal input related to user settings and function control of the electronic device 100.
The motor 191 may generate vibration alerts. The motor 191 may be used for incoming-call vibration alerts as well as touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) may correspond to different vibration feedback effects, and the motor 191 may also produce different vibration feedback effects for touch operations acting on different areas of the display 194. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 192 may be an indicator light, which may be used to indicate the charging state and battery level changes, and may also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is configured to connect a SIM card. A SIM card can be inserted into or removed from the SIM card interface 195 to make contact with or be separated from the electronic device 100. The electronic device 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support a Nano SIM card, a Micro SIM card, a SIM card, and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time; the types of the multiple cards may be the same or different. The SIM card interface 195 is also compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM, i.e., an embedded SIM card; the eSIM card can be embedded in the electronic device 100 and cannot be separated from it.
下面结合上述图4所示的结构,根据图5并结合具体场景,详细介绍本申请的技术方案。如图5所示,本申请的一些实施例中文本图像的OCR识别方法的执行主体可以是电子设备100的处理器,并且可以包括如下步骤:
步骤S501:获取文本图像。
在此,文本图像是指包括各种文本的图像,文本可以包括但不限于:英文字符、数字、中文字符、日文字符等不同语言的字符、数学符号等。
在一些实施例中,文本图像可以是用户通过电子设备100采集的图像,例如用户通过手机、平板电脑或其它移动端文字识别设备中的摄像头或其它光传感器采集的图像。
在一些实施例中,电子设备100可以根据用户的图像选择操作获取需要进行OCR识别的文本图像。在此,用户的图像选择操作可以包括但不限于:使用电子设备100拍照、在电子设备100的本地相册选择图像、从文件目录中选择包括图像的文件如PDF文件或DOC文件等。
可以理解,电子设备100获取文本图像可以从摄像头或其它传感器的图像采集缓存中获取,也可以从本地存储设备获取,例如存放本地相册的安全数字卡(Secure Digital,SD),也可以从外部存储设备获取,例如移动硬盘等,本申请实施例对此不作具体限制。
在一些实施例中,电子设备100通过应用界面获取用户选择的文本图像。在此,电子设备100向用户提供用于选择进行OCR识别的文本图像的界面,用户通过图像选择界面选择期望进行OCR识别的文本图像,电子设备100根据用户在选择界面的图像选择操作获取文本图像。
步骤S502:对文本图像进行预处理,确定目标文本图像。
可以理解,电子设备100对获取的文本图像进行预处理,可以根据用户的显示确认操作进行,例如根据图像选择界面上用户对图像质量评估按钮的点击操作进行,也可以在获取文本图像后实时进行预处理,无需用户的确认操作,本申请实施例对此不作具体限制。
在一些实施例中,电子设备100对获取的文本图像进行图像形态学处理,确定文本图像的文本区域所在范围。在此,图像形态学是指一系列处理图像形状特征的图像处理技术,是建立在拓扑学基础上的图像分析学科,基本思想是利用一种特殊的结构元来测量或提取输入图像中相应的形状或特征,以便进一步进行图像分析和目标识别。图像形态学运算可以包括但不限于:腐蚀、膨胀、开运算、闭运算、图像变换等。图像变换可以包括但不限于几何变换、尺度变换等。图像的几何变换例如图像的平移、旋转、镜像、转置等。图像的尺度变换例如图像的缩放、插值等。图像的腐蚀用于删除图像边界的一些像素,具有收缩图像的作用,可以消除图像边缘和杂点。图像的膨胀用于添加图像边界的一些像素,具有扩大图像的作用。图像的开运算相当于对图像先进行腐蚀运算再进行膨胀运算,可以消除离散点和毛刺,可以将两个物体分开。图像的闭运算相当于对图像先进行膨胀运算再进行腐蚀运算,可以填充图像的内部孔洞和图像的凹角点,可以连接两个邻近的目标。
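上述以腐蚀、膨胀为基础的开、闭运算,可以用如下示意性Python代码理解(针对二值图像的朴素实现,结构元为k×k方形,函数名与参数均为本示例的假设,并非对本申请实现方式的限定):

```python
import numpy as np

def binary_erode(img, k=3):
    """腐蚀:k×k 方形结构元内取最小值,删除边界像素、消除杂点。"""
    pad = k // 2
    p = np.pad(img, pad)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = p[i:i + k, j:j + k].min()
    return out

def binary_dilate(img, k=3):
    """膨胀:k×k 方形结构元内取最大值,添加边界像素、扩大目标。"""
    pad = k // 2
    p = np.pad(img, pad)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = p[i:i + k, j:j + k].max()
    return out

def opening(img, k=3):
    """开运算:先腐蚀再膨胀,可消除离散点和毛刺。"""
    return binary_dilate(binary_erode(img, k), k)

def closing(img, k=3):
    """闭运算:先膨胀再腐蚀,可填充内部孔洞和凹角点。"""
    return binary_erode(binary_dilate(img, k), k)
```

例如,对包含孤立杂点的二值图做开运算可将杂点消除;对内部存在小孔洞的目标区域做闭运算可将孔洞填充。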
在一些实施例中,电子设备100截取文本图像的文本区域所在范围作为目标文本图像。在此,文本图像中可以只有部分图像区域包括文本,电子设备100通过检测从文本图像中识别出文本区域所在范围,将文本区域所在范围作为感兴趣区域,将感兴趣区域作为后续进行图像质量评估的目标文本图像。感兴趣区域(Region of Interest,ROI)是从整个图像中选择的一个具有重点分析价值的图像区域。通过截取目标文本图像,可以避免使用整个文本图像进行图像质量评估,减少图像质量评估使用的图像大小,降低图像质量评估所需的计算资源消耗,提高图像质量评估的效率。
步骤S503:对目标文本图像进行对焦准确程度的质量评估。
在此,步骤S503~S507分别从不同的图像质量评估维度:对焦准确程度、拍摄抖动程度、阴影遮挡程度、光线明亮程度和文档倾斜程度对目标文本图像进行质量评估,确定目标文本图像在每个维度上的质量评估结果。
在一些实施例中,目标文本图像在每个维度上的质量评估结果使用评估分数来表示,评估分数的单位可以不同,例如目标文本图像的文档倾斜程度的评估分数为10°,阴影遮挡程度的评估分数为0.4等。
可以理解,对目标文本图像进行质量评估,可以使用步骤S503~S507中的任意一个步骤进行质量评估,即从一个图像质量评估维度对目标文本图像进行质量评估,也可以使用步骤S503~S507中的任意两个或两个以上步骤进行质量评估,即从两个或两个以上图像质量评估维度对目标文本图像进行质量评估,本申请实施例对此不作具体限制。
另外,可以理解,在选择使用的两个或两个以上质量评估步骤后,在执行两个或两个以上质量评估步骤时,可以并行方式执行两个或两个以上质量评估步骤,也可以串行方式执行两个或两个以上质量评估步骤,本申请实施例对此不作具体限制。
在一些实施例中,可以根据目标文本图像在水平和垂直方向上的梯度进行对焦准确程度的质量评估,确定对焦准确程度的评估结果。具体来说,是将目标文本图像中像素点在水平和垂直方向上的梯度之和的平均值作为对焦准确程度的评价结果,即计算每个像素点在水平方向和垂直方向上的梯度值,并遍历目标文本图像中所有的像素点计算梯度,得到所有像素点的梯度值之和,再除以目标文本图像中像素点的个数得到目标文本图像中像素点的平均梯度值,将该平均梯度值确定为目标文本图像的对焦准确程度的质量评估值。
在此,对焦准确程度更好的目标文本图像具有更尖锐的边缘,即在某个方向上具有更大的梯度,通过梯度函数计算目标文本图像中像素点的梯度值,可以对目标文本图像的对焦准确程度进行质量评估。
在一些实施例中,使用Tenengrad梯度函数对目标文本图像的像素点计算梯度。其中,Tenengrad函数使用Sobel算子提取水平和垂直方向的梯度值。具体公式表示如下:
目标文本图像I在像素点(x,y)处的梯度S(x,y)定义如下:

S(x,y) = \sqrt{\left(G_x * I(x,y)\right)^2 + \left(G_y * I(x,y)\right)^2}

其中,G_x和G_y为Sobel卷积核,*表示卷积运算。

目标文本图像的Tenengrad函数值Ten定义如下:

Ten = \frac{1}{n}\sum_{x}\sum_{y} S(x,y)

其中,n为目标文本图像中所有像素点的个数。
在此,将计算得到的Ten值作为目标文本图像在对焦准确程度维度上的质量评估值。
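按照上述定义,Ten值的计算过程可以用如下Python草图示意(其中conv2d_valid为本示例假设的朴素滑窗实现,只保留有效区域;实际实现可以替换为任意等价的Sobel梯度算子):

```python
import numpy as np

# Sobel 卷积核,对应文中的 Gx、Gy
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, kernel):
    """朴素滑窗实现(相关形式),只保留有效区域;仅用于演示。"""
    h, w = img.shape
    k = kernel.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return out

def tenengrad(gray):
    """Ten 值:各像素点梯度幅值 S(x,y) 的平均值,作为对焦准确程度的质量评估值。"""
    g = gray.astype(np.float64)
    gx = conv2d_valid(g, SOBEL_X)
    gy = conv2d_valid(g, SOBEL_Y)
    s = np.sqrt(gx ** 2 + gy ** 2)  # S(x,y)
    return float(s.mean())          # Ten = (1/n) * Σ S(x,y)
```

对焦越准确、边缘越尖锐的图像,其Ten值越大;完全平坦的图像Ten值为0。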
步骤S504:对目标文本图像进行拍摄抖动程度的质量评估。
在此,目标文本图像的图像质量会受到拍摄时用户手部抖动程度的影响,手部抖动会导致图像中的文本产生重影,手部抖动程度越大,拍摄图像的质量越低,通过检测目标文本图像的拍摄抖动程度,可以对目标文本图像进行图像质量评估。
在一些实施例中,通过局部灰度方差乘积函数(SMD2)进行拍摄抖动程度的图像质量评估,确定拍摄抖动程度的评估结果。具体来说,是对目标文本图像中每个像素邻域两个灰度差相乘后再逐个像素累加,具体使用公式如下:

D(f) = \sum_{x}\sum_{y} \left|f(x,y)-f(x+1,y)\right| \cdot \left|f(x,y)-f(x,y+1)\right|
其中,D(f)为拍摄抖动程度的图像质量评估值,f(x,y)是目标文本图像的像素点(x,y)的灰度值,f(x+1,y)为目标文本图像的像素点(x,y)的正右像素点(x+1,y)的灰度值,f(x,y+1)为目标文本图像的像素点(x,y)的正下像素点(x,y+1)的灰度值。
在具体进行拍摄抖动程度的质量评估时,可以先将目标文本图像转换为灰度图,并通过预设的卷积核对目标文本图像的灰度图像进行卷积操作,这里的卷积核的尺寸可以预先设定,例如3*3或5*5等,卷积核由用户根据自身需求定义。进行卷积操作后得到灰度图像对应的图像卷积矩阵,计算该图像卷积矩阵的绝对值并进行累乘,从而计算得到灰度图像在通道0上的均值,将得到的均值作为拍摄抖动程度的质量评估值。
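上述SMD2的计算可以向量化为如下Python草图(仅为示例性实现,输入假设为灰度图数组):

```python
import numpy as np

def smd2(gray):
    """局部灰度方差乘积(SMD2):每个像素与其正右、正下邻点的
    灰度差相乘后逐像素累加,得到拍摄抖动程度的质量评估值 D(f)。"""
    g = gray.astype(np.float64)
    dx = np.abs(g[:-1, :-1] - g[1:, :-1])   # |f(x,y) - f(x+1,y)|
    dy = np.abs(g[:-1, :-1] - g[:-1, 1:])   # |f(x,y) - f(x,y+1)|
    return float(np.sum(dx * dy))           # D(f)
```

图像越清晰、边缘越锐利,D(f)越大;重影和模糊会使相邻像素灰度差减小,D(f)随之降低。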
步骤S505:对目标文本图像进行阴影遮挡程度的质量评估。
在此,目标文本图像中阴影遮挡的范围过大也会对图像质量造成影响,使得后续的OCR识别难以顺利进行。通过检测目标文本图像的阴影遮挡程度,同样可以评估目标文本图像的图像质量。
在一些实施例中,通过大津算法(OTSU)进行阴影遮挡程度的图像质量评估,确定阴影遮挡程度的评估结果。大津算法(OTSU)是由日本学者OTSU于1979年提出的一种对图像进行二值化的算法,可以通过阈值将原始图像分为前景、背景两张图像。
具体来说,可以通过OTSU算法将目标文本图像转换为灰度图像,并对该灰度图像进行二值化处理得到灰度图像的前景图像和背景图像,并计算出前景图像和背景图像的直方图,再根据直方图计算出通道0的像素值,将通道0的像素值与目标文本图像的像素点数量的比值确定为阴影遮挡程度的质量评估值。
在具体进行阴影遮挡程度的质量评估时,可以通过对目标文本图像的灰度图像进行局部和全局像素连通域计算来简化处理逻辑,降低单张目标文本图像的计算量。在计算局部和全局像素连通域时,可以在灰度图像中以随机方式裁剪得到感兴趣区域并计算感兴趣区域的直方图,再选取前三个高频RGB像素区间,将感兴趣区域中分布在高频RGB像素区间的像素点区域划分为具有近似RGB值的连通域,并在后续的阴影遮挡评估中作为连通域单元进行整体处理。
步骤S506:对目标文本图像进行光线明亮程度的质量评估。
在一些实施例中,通过转换目标文本图像的颜色空间进行光线明亮程度的质量评估,确定光线明亮程度的评估结果。具体来说,可以将目标文本图像由RGB颜色空间转换为HSL颜色空间,根据RGB颜色空间中R、G、B三个通道的平均值、最大值和最小值确定HSL颜色空间中的L分量,将得到的L分量作为光线明亮程度的质量评估结果。使用的公式表示如下:
L=(max(R,G,B)+min(R,G,B))/2
RGB颜色空间是最常用的颜色空间,由三个通道表示一幅图像,分别为红色(R),绿色(G)和蓝色(B),这三种颜色的不同线性组合可以形成几乎所有的其它颜色。RGB颜色空间的三个通道都与亮度密切相关,即只要亮度改变,三个通道都会随之相应地改变,因此RGB颜色空间是一种均匀性较差的颜色空间,适用于显示系统,不太适用于图像处理。HSL颜色空间同样用三个通道来表示一幅图像,分别为色相(H),饱和度(S)和亮度(L),其中,亮度L为100表示白色,亮度L为0表示黑色。
类似地,在具体进行光线明亮程度的质量评估时,可以通过对目标文本图像的灰度图像进行局部和全局像素连通域计算来简化处理逻辑,降低单张目标文本图像的计算量。
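L分量的计算可以用如下Python草图示意(对每个像素按L=(max(R,G,B)+min(R,G,B))/2计算,再取全图平均作为光线明亮程度的评估值;取全图平均是本示例的假设):

```python
import numpy as np

def lightness(rgb):
    """逐像素计算 HSL 的 L 分量 L=(max(R,G,B)+min(R,G,B))/2,
    再取全图平均,作为光线明亮程度的质量评估值。"""
    c_max = rgb.max(axis=-1).astype(np.float64)   # max(R, G, B)
    c_min = rgb.min(axis=-1).astype(np.float64)   # min(R, G, B)
    return float(((c_max + c_min) / 2.0).mean())
```

纯白图像的评估值为255(对应L=100%的白色),纯黑图像为0,与文中对L分量取值范围的描述一致。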
步骤S507:对目标文本图像进行文档倾斜程度的质量评估。
在一些实施例中,根据目标文本图像中检测到的直线与水平线的水平倾角和与垂直线的垂直倾角之差的绝对值的平均值进行文档倾斜程度的图像质量评估,确定文档倾斜程度的评估结果。
具体来说,是在目标文本图像中以随机方式确定感兴趣区域,并在感兴趣区域中进行直线检测,在检测到直线后分别计算该直线与水平线之间的水平倾角和与垂直线之间的垂直倾角,并计算两个倾角之间差值的绝对值,每检测到一根直线就计算一次该直线与水平线的水平倾角和与垂直线的垂直倾角之间差值的绝对值,最终计算所有检测到的直线的水平倾角与垂直倾角之间差值的绝对值的和,再使用该绝对值的和除以所有检测到的直线的数量,即可得到直线的水平倾角与垂直倾角之间差值的绝对值的平均值,将该平均值确定为目标文本图像在文档倾斜程度维度上的质量评估值。
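按照上述定义,文档倾斜程度评估值可以用如下Python草图示意(直线端点列表假设已由前置的直线检测步骤给出,此处按文中的字面定义计算水平倾角与垂直倾角之差的绝对值的平均值):

```python
import math

def tilt_score(lines):
    """lines 为直线检测得到的端点列表 [(x1, y1, x2, y2), ...]
    (本示例假设其已由前置的直线检测步骤给出)。
    对每条直线计算与水平线的水平倾角和与垂直线的垂直倾角,
    取二者之差的绝对值,再对所有直线求平均,作为文档倾斜程度评估值。"""
    diffs = []
    for x1, y1, x2, y2 in lines:
        h_angle = math.degrees(math.atan2(abs(y2 - y1), abs(x2 - x1)))  # 与水平线的夹角
        v_angle = 90.0 - h_angle                                        # 与垂直线的夹角
        diffs.append(abs(h_angle - v_angle))
    return sum(diffs) / len(diffs)
```

例如,一条45°方向的直线其水平倾角与垂直倾角相等,差值为0;一条水平直线的差值则为90°。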
步骤S508:确定目标文本图像的质量评估结果。
在此,通过上述步骤得到目标文本图像在每个图像质量评估维度上的质量评估值后,将每个质量评估值与相对应的预设阈值进行比较,从而得到目标文本图像在每个图像质量评估维度上的质量状况。这里,每个图像质量评估维度上的预设阈值根据预先进行的独立扰动分析试验得到,预设阈值可以是通过对大量真实的文本图像进行分类判别评估后以人工方式筛选出的适用于OCR识别场景的临界值,例如文档倾角阈值为8°,阴影阈值为0.6等。
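将各维度质量评估值与预设阈值比较的逻辑可以用如下Python草图示意(此处假设评估值越大表示该维度问题越严重,阈值数值沿用文中给出的文档倾角8°和阴影0.6;对焦准确程度等"越大越好"的维度需要反向比较,本示例未展开):

```python
# 示例阈值:文档倾角 8°、阴影 0.6(沿用文中示例数值)
THRESHOLDS = {"文档倾斜": 8.0, "阴影遮挡": 0.6}

def quality_verdict(scores, thresholds):
    """将各维度质量评估值与对应预设阈值比较,
    返回评估值超过阈值(即不满足预设质量评估条件)的维度列表。"""
    return [dim for dim, value in scores.items() if value > thresholds[dim]]
```

返回列表为空时,文本图像的质量评估结果为较好;否则列表给出存在质量问题的评估维度。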
本申请的实施例中,文本图像的OCR识别方法可以实现为后台处理和界面显示两个部分,后台处理用于对文本图像进行质量评估,界面显示用于在后台进行质量评估时展示质量评估状态,让用户有直观感受。图6示出了本申请的一些实施例中文本图像的OCR识别方法的实现架构。如图6所示,后台处理包括数据预处理模块和质量评估模块。其中,数据预处理模块用于完成图像尺度变换、形态学预处理、文本区域检测、感兴趣区域截取等功能。质量评估模块用于完成对焦准确程度的质量评估、拍摄抖动程度的质量评估、阴影遮挡程度的质量评估、光线明亮程度的质量评估和文档倾斜程度的质量评估。后台处理中数据预处理模块与质量评估模块之间的交互过程如上述步骤S501至步骤S508所述,在此不再赘述。评估结果展示包括对文本图像选择界面的显示,对质量评估过程的展示和评估结果的展示等。以下对界面显示的过程进行描述。
图7示出了本申请的一些实施例中文本图像的OCR识别方法的界面显示流程,如图7所示,本申请的一些实施例中文本图像的OCR识别方法的界面显示流程的执行主体可以是电子设备100的处理器,并且可以包括如下步骤:
步骤S701:展示文本图像选择界面。
在此,电子设备100向用户展示文本图像的选择界面,文本图像的选择界面用于向用户提供进行文本图像选择操作的图形化界面,用户通过文本图像选择界面上的操作选择期望进行OCR识别的一个或多个文本图像。图8(a)示出了本申请的一些实施例中的文本图像选择界面。如图8(a)所示,文本图像选择界面提供了文本图像的添加按钮,用户可以通过对添加按钮的点击或触摸操作来选择期望进行OCR识别的文本图像。
在一些实施例中,电子设备100通过文本图像选择界面向用户提供多种选择文本图像的方式,例如通过电子设备100拍摄文本图像、通过本地相册选取文本图像等。
在一些实施例中,电子设备100根据用户的图像选择操作展示用户选择的文本图像。具体来说,电子设备100展示用户选择的文本图像的缩略图。图8(b)示出了本申请的一些实施例中用户选择的文本图像的展示界面。如图8(b)所示,用户已经选择了4张期望进行OCR识别的文本图像,用户通过点击界面中的完成按钮即可将4张文本图像作为后续OCR识别的图像。
步骤S702:在选择的文本图像上显示表示质量评估进行中的UX标记。
可以理解,电子设备100对用户选择的文本图像进行图像质量评估,可以在实时检测到用户选择了文本图像后进行,也可以根据用户对进行图像质量评估的确认操作进行,例如用户点击界面提供的图像质量评估按钮,本申请实施例对此不作具体限制。
在一些实施例中,电子设备100检测到用户选择了文本图像,则电子设备100实时对用户选择的文本图像进行图像质量评估。
可以理解,对选择的文本图像进行图像质量评估,可以在电子设备100上进行,也可以由电子设备100将选择的文本图像发送至评估设备200上进行,本申请实施例对此不作具体限制。
在一些实施例中,电子设备100或评估设备200对选择的文本图像进行图像质量评估时,电子设备100为每张文本图像显示对应的UX标记,UX标记用于表示该张文本图像正在进行图像质量评估。在此,UX标记是UX设计中一种图形化表示,可以用于描述处理过程或处理结果。
用户体验(User Experience,UX)设计即UX交互体验设计,用于处理用户与产品或服务之间的交互。与用户界面(User Interface,UI)不同的是,UX专注于用户解决问题的过程,而UI专注于产品表面的外观和功能。在此,UI是指产品的实际界面,例如用户在使用移动应用程序时浏览的屏幕的视觉设计,或在浏览网站时单击的按钮等。UX关注产品界面的所有视觉和交互元素,涵盖从排版和调色板到动画和导航触摸点(如按钮和滚动条)的所有内容。
图8(c)示出了本申请的一些实施例中表示对选择的文本图像进行图像质量评估的文本图像显示界面。如图8(c)所示,在用户选择的5张文本图像的缩略图上显示转圈的UX标记,转圈UX标记表示正在进行图像质量评估,5张缩略图上均显示有相应UX标记,表明这5张文本图像的图像质量评估均在进行中,还没有完成。
步骤S703:在质量评估完成的文本图像上显示表示质量评估结果的UX标记。
在一些实施例中,电子设备100以并行处理方式对选择的两张或两张以上文本图像进行图像质量评估。在此,通过以并行处理方式进行图像质量评估,可以大大缩短对文本图像进行图像质量评估的时间,提高进行图像质量评估的效率,避免评估时间过长导致的用户体验降低。在实际的图像质量评估过程中,可以将整个图像质量评估过程的时间控制在0.5秒以内完成。
由于采用并行处理方式进行两张或两张以上文本图像的图像质量评估,不同的文本图像的质量评估过程的结束时间可以不同。在一些实施例中,电子设备100实时对图像质量评估完成的文本图像显示相应的UX标记,该UX标记用于表示质量评估结果。在此,质量评估结果可以是较好或较差,使用不同的UX标记表示。
图8(d)示出了本申请的一些实施例中选择的部分文本图像完成图像质量评估的文本图像显示界面。如图8(d)所示,在5张进行图像质量评估的文本图像中,有2张文本图像已经完成了图像质量评估,有3张文本图像还没有完成图像质量评估。2张已经完成质量评估的文本图像对应的质量评估结果UX标记为“√”,表示质量评估结果为较好。
在一些实施例中,在对选择的文本图像进行图像质量评估时,电子设备100可以根据用户的操作展示图像质量评估的提示信息。在此,文本图像的图像质量评估没有全部完成,用户进行了一些界面操作如点击或触摸操作、对OCR识别按钮的点击操作等,电子设备100接收到这些操作后,向用户显示图像质量评估还在进行中的提示信息。
图8(e)示出了本申请的一些实施例中提醒用户图像质量评估还在进行中的文本图像显示界面。如图8(e)所示,全部5张文本图像中还有3张图像正在进行图像质量评估,用户进行了进行OCR识别的操作,电子设备100显示“图像质量评估中,请稍候”的提示信息,并提供“立即识别”按钮供用户选择不等待图像质量评估完成立即开始OCR识别。
图8(f)示出了本申请的一些实施例中包括文本图像的图像质量评估结果的文本图像显示界面。如图8(f)所示,5张文本图像均已经完成图像质量评估,并分别显示对应图像质量评估结果的UX标记,其中有3张文本图像的UX标记表示质量评估结果较好,有2张文本图像的UX标记表示质量评估结果较差。其中UX标记为“√”表示图像质量评估已完成,并且图像质量评估的结果为较好,UX标记为“×”表示图像质量评估已完成,并且图像质量评估的结果为较差,可能在某些图像质量评估维度上存在待改善问题。
步骤S704:展示质量评估不佳的文本图像的质量提示信息。
在此,质量提示信息可以包括但不限于:图像质量评估不佳的图像质量评估维度、对应该维度的图像质量改进建议等。
在一些实施例中,电子设备100向用户显示质量评估不佳的文本图像的质量提示信息,供用户了解该文本图像在哪个图像质量评估维度上存在问题,及如何解决该问题,从而可以提升用户体验。
图8(g)示出了本申请的一些实施例中包括质量评估不佳的文本图像的质量提示信息的文本图像显示界面。如图8(g)所示,显示的质量提示信息包括对质量评估不佳的图片2和图片4的质量提示信息,其中图片2是在拍摄抖动维度上质量评估不佳,给出的质量改进建议是持稳设备重新拍摄,图片4是在阴影遮挡和对焦准确两个维度上质量评估不佳,给出的质量改进建议是调整位置和保持对焦后重新拍摄。
本申请的一些实施例中,还提供了一种文本图像的OCR识别方法的应用场景的流程图。如图9所示,将文本图像的OCR识别方法应用到OCR识别场景,可以包括如下步骤:
步骤S901:向用户展示用于OCR识别的文本图像选择界面。在此,用户在电子设备100上运行实现本申请中文本图像的OCR识别方法的移动端应用(APP),电子设备100根据用户在APP上的操作展示文本图像选择界面,用户通过该界面选择进行OCR识别的文本图像。
步骤S902:通过文本图像选择界面获取用户选择的文本图像。在文本图像选择界面中,用户可以通过电子设备100拍摄或从本地相册选择文本图像,用户完成文本图像的选择后,电子设备100获取用户选择的文本图像。
步骤S903:对用户选择的文本图像进行图像质量评估。电子设备100获取到用户选择的文本图像后,从多个图像质量评估维度对文本图像进行图像质量评估,确定文本图像在每个图像质量评估维度上的质量评估值。
步骤S904:确定质量评估不佳的文本图像存在质量问题的评估维度。在此,电子设备100将文本图像在每个图像质量评估维度上的质量评估值与对应的预设质量评估阈值相比较,确定没有达到预设质量评估条件的图像质量评估维度,这里的预设质量评估条件可以是质量评估值超过预设的质量评估阈值等,将该图像质量评估维度确定为存在质量问题的评估维度。
步骤S905:向用户展示文本图像的质量评估结果。在此,如果文本图像在所有质量评估维度上均满足对应的预设质量评估条件,则电子设备100将该文本图像的质量评估结果确定为较好,并在该文本图像的缩略图上显示表示质量评估较好的UX标记;如果文本图像在任一质量评估维度上不满足预设的质量评估条件,则电子设备100将该文本图像的质量评估结果确定为较差,并在该文本图像的缩略图上显示表示质量评估较差的UX标记。
步骤S906:判断是否所有选择的文本图像的质量评估结果均为较好,如果是,电子设备100执行步骤S907;如果否,则电子设备100执行步骤S908。
步骤S907:进行文本图像的OCR识别。在所有选择的文本图像的质量评估结果均为较好的情况下,电子设备100直接执行对文本图像的OCR识别过程。
步骤S908:向用户展示质量评估不佳的文本图像的质量提示信息。在此,电子设备100向用户展示质量评估不佳的文本图像存在质量问题的评估维度和对应的质量提升方法,并向用户提供进行图像质量提升和强行对质量不佳的文本图像进行OCR识别两种选择操作。
步骤S909:判断接收的用户选择操作是否为强行识别,如果是,电子设备100执行步骤S907进行文本图像的OCR识别;如果否,则电子设备100执行步骤S902,用户可以重新对文本图像进行选择。
在上述应用场景中,向用户提供用于OCR识别的相关界面,通过用户在相关界面上的操作获取进行OCR识别的文本图像,并在文本图像的质量识别完成后向用户展示质量评估结果,在所有文本图像的质量评估结果较好的情况下实时进行OCR识别,在存在文本图像的质量评估结果不佳的情况下对用户进行提示,并根据用户的操作进行相应的强行识别或重新选择,从而能够间接提高后续OCR识别的准确率,提高用户体验。
本申请的一些实施例中,还提供了基于文本图像的OCR识别方法的文本图像质量改善方法。如图10所示,基于文本图像的OCR识别方法的文本图像质量改善方法包括如下步骤:
步骤S1001:获取质量评估不佳的文本图像的质量评估值。
在此,质量评估不佳的文本图像的质量评估值由执行文本图像的OCR识别方法所得到,文本图像的OCR识别方法的具体实现方式参考前述描述,在此不再赘述。
获取的质量评估不佳的文本图像的质量评估值,是该文本图像不满足预设质量评估条件的图像质量评估维度所对应的质量评估值,例如文本图像在拍摄抖动和阴影遮挡维度上存在质量问题,则获取该文本图像在拍摄抖动和阴影遮挡维度上的质量评估值。
步骤S1002:确定文本图像的各维度对应的质量改善优先级系数。
获取文本图像的质量评估值之后,可以确定文本图像在各图像质量评估维度上的质量评估值相对于综合评估得分的衡量因子。在一些实施例中,衡量因子可以根据文本图像在各个图像质量评估维度上的质量评估值与对应维度的阈值来确定。例如,可以计算各个图像质量评估维度上的质量评估值与对应维度的阈值之间的差值,再计算该差值与对应维度的阈值之间的比值,将该比值确定为衡量因子。
确定各图像质量评估维度对应的衡量因子后,将衡量因子按照从大到小的顺序进行排序,从而确定各图像质量评估维度对文本图像的综合质量评估的影响程度,并根据影响程度大小为对应的图像质量评估维度确定对应的权重。例如,质量评估不佳的文本图像存在文档倾斜和阴影遮挡问题,文档倾斜维度的衡量因子大于阴影遮挡维度的衡量因子,则可以根据文档倾斜维度的衡量因子与阴影遮挡维度的衡量因子之间的比值确定两个维度所占的权重。
再根据不同图像质量评估维度对应的权重,确定各图像质量评估维度对应的质量改善优先级系数。例如,根据各图像质量评估维度对应权重由高到低确定对应于维度的优先级系数,在后续阶段可以根据各图像质量评估维度的质量改善优先级系数决定对不同维度进行图像质量改善时的优先处理顺序。
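步骤S1002中衡量因子与优先级的确定可以用如下Python草图示意(以(评估值-阈值)/阈值作为衡量因子,按从大到小排序后以名次作为优先级,名次越小优先级越高;以名次直接表示优先级系数是本示例的简化假设):

```python
def improvement_priorities(scores, thresholds):
    """衡量因子取 (评估值 - 阈值) / 阈值,按从大到小排序;
    名次越靠前,该维度的质量改善优先级越高(名次作为优先级系数)。"""
    factors = {d: (scores[d] - thresholds[d]) / thresholds[d] for d in scores}
    ordered = sorted(factors, key=factors.get, reverse=True)
    priorities = {d: rank + 1 for rank, d in enumerate(ordered)}
    return priorities, factors
```

例如,文档倾斜的评估值为16°而阈值为8°,衡量因子为1.0;阴影遮挡评估值0.9而阈值0.6,衡量因子为0.5,此时文档倾斜维度的改善优先级更高。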
步骤S1003:根据各维度的质量改善优先级系数和综合评估得分确定各维度的质量改善强度系数。
在此,首先根据文本图像的综合评估得分确定文本图像的质量等级,综合评估得分与质量等级之间的对应关系预先确定,不同的质量等级中质量改善优先级系数对应的质量改善强度系数不同。例如,根据文本图像的综合评估得分所在的区间确定文本图像的质量等级,文本图像的质量等级可以划分为多个等级,例如1级、2级、3级等。
随后,根据各维度的质量改善优先级系数和文本图像的质量等级确定相应的质量改善强度系数,质量改善强度系数与预设的质量改善强度参数集合对应。例如,文本图像的综合评估得分对应预设的质量等级中最高的3级,由于该文本图像的文档倾斜维度对应的权重和质量改善优先级系数大于阴影遮挡维度对应的权重和质量改善优先级系数,因此将文档倾斜维度对应的质量改善强度系数确定为对应3级质量等级的质量改善强度系数,将阴影遮挡维度对应的质量改善强度系数确定为比文档倾斜维度的质量改善强度系数低一级的质量改善强度系数,即对应2级质量等级的质量改善强度系数。
在此,对应不同质量等级的质量改善强度系数与相应质量等级的质量改善强度参数集合相对应。
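步骤S1003中质量改善强度系数的确定可以用如下Python草图示意(按文中示例的规则:优先级最高的维度取图像质量等级对应的强度系数,其余维度依次降一级,最低为1级;将该降级规则推广到任意维度数是本示例的假设):

```python
def improvement_strengths(priorities, image_grade):
    """priorities 为各维度的优先级名次(1 为最高),image_grade 为图像质量等级。
    优先级最高的维度取 image_grade 对应的强度系数,
    其后每个维度依次降一级,最低为 1 级。"""
    return {d: max(image_grade - (rank - 1), 1) for d, rank in priorities.items()}
```

例如,图像质量等级为3级时,优先级第一的文档倾斜维度得到3级强度系数,优先级第二的阴影遮挡维度得到2级强度系数,与文中示例一致。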
步骤S1004:根据各维度的质量改善强度系数,对文本图像的各维度进行质量改善。
在此,可以根据文本图像各维度的质量改善优先级系数和各维度的质量改善强度系数,对文本图像的各维度进行质量改善。例如,将质量改善优先级系数从高到低进行排序后,相应维度为对焦准确、光线明暗、文档倾斜,则按照对焦准确、光线明暗、文档倾斜的先后顺序对各维度进行图像质量改善的处理。
另外,对各维度进行质量改善的处理方法也存在不同,例如对对焦准确维度进行质量改善可以使用锐化的方法,对光线明亮维度进行质量改善可以使用均衡化的方法,对文档倾斜维度进行质量改善可以使用校正的方法等。
具体来说,对各维度进行质量改善,是通过与质量改善强度系数对应的质量改善强度参数进行,根据质量改善强度系数获取对应的预设质量改善强度参数,并使用得到的质量改善强度参数对文本图像的相应维度进行质量改善。
例如,确定文本图像需要进行文档倾斜维度上的3级质量改善,阴影遮挡维度上的2级质量改善,则可以首先获取多批次通过随机采样得到的文档倾角均值,对文本图像中存在文档倾斜的图像区域进行逆向校正,校正角度为该文档倾角均值;再对校正后的文本图像根据阴影遮挡检测中得到的阴影区域坐标和区域平均Light分量均值进行阴影区域和常规区域的分离,对阴影区域进行HSL空间中Light分量的百分比提升以提升明暗度,提升的百分比比例由对应等级的预设参数确定;再计算明暗度提升后的文本图像与原始文本图像的全局明暗均值差,并计算明暗度提升后的文本图像与原始文本图像中每个像素点之间的差值,保留Light分量的差值大于全局明暗均值差的像素点,再将处理得到的文本图像作为校正后文本图像输出。
步骤S1005:获取质量改善后的文本图像。
在此,获取经过前述步骤进行图像质量改善后的文本图像,并对质量改善后的文本图像进行OCR识别,从而可以提高OCR识别的准确性。
图11示出了本申请的一些实施例中对质量评估不佳的文本图像进行质量改善的显示界面。如图11所示,5张文本图像中,有2张文本图像的图像质量评估不满足预设条件,即质量评估结果不佳,则电子设备100实时对这2张文本图像进行图像质量改善,并向用户展示对图片2和图片4进行图像质量改善的提示信息。
图12根据本申请的一些实施例,示出了一种用于文本图像的OCR识别方法的评估设备200的硬件结构框图。在图12所示的实施例中,评估设备200可以包括一个或多个处理器201,与处理器201中的至少一个连接的系统控制逻辑202,与系统控制逻辑202连接的系统内存203,与系统控制逻辑202连接的非易失性存储器(Non-Volatile Memory,NVM)204,以及与系统控制逻辑202连接的网络接口206。
在一些实施例中,处理器201可以包括一个或多个单核或多核处理器。在一些实施例中,处理器201可以包括通用处理器和专用处理器(例如,图形处理器,应用处理器,基带处理器等)的任意组合。在评估设备200采用增强型基站(Evolved Node B,eNB)或无线接入网(Radio Access Network,RAN)控制器的实施例中,处理器201可以被配置为执行符合各种实施例的操作。例如,处理器201可以用于实现文本图像的OCR识别方法。
在一些实施例中,系统控制逻辑202可以包括任意合适的接口控制器,以向处理器201中至少一个与系统控制逻辑202通信的、任意合适的设备或组件提供任意合适的接口。
在一些实施例中,系统控制逻辑202可以包括一个或多个存储器控制器,以提供连接到系统内存203的接口。系统内存203可以用于加载以及存储数据和/或指令。例如,系统内存203可以加载本申请实施例中的文本图像数据。
在一些实施例中评估设备200的系统内存203可以包括任意合适的易失性存储器,例如合适的动态随机存取存储器(Dynamic Random Access Memory,DRAM)。
NVM存储器204可以包括用于存储数据和/或指令的一个或多个有形的、非暂时性的计算机可读介质。在一些实施例中,NVM存储器204可以包括闪存等任意合适的非易失性存储器和/或任意合适的非易失性存储设备,例如硬盘驱动器(Hard Disk Drive,HDD),光盘(Compact Disc,CD)驱动器,数字通用光盘(Digital Versatile Disc,DVD)驱动器中的至少一个。在本申请实施例中,NVM存储器204可以用于存储文本图像文件等。
NVM存储器204可以包括安装评估设备200的装置上的一部分存储资源,或者它可以由设备访问,但不一定是设备的一部分。例如,可以经由网络接口206通过网络访问NVM存储器204。
特别地,系统内存203和NVM存储器204可以分别包括:指令205的暂时副本和永久副本。指令205可以包括:由处理器201中的至少一个执行时导致评估设备200实施如图5所示的方法的文本图像预处理操作等。在一些实施例中,指令205、硬件、固件和/或其软件组件可另外地/替代地置于系统控制逻辑202,网络接口206和/或处理器201中。
网络接口206可以包括收发器,用于为评估设备200提供无线电接口,进而通过一个或多个网络与任意其他合适的设备(如前端模块,天线等)进行通信。在一些实施例中,网络接口206可以集成于评估设备200的其他组件。例如,网络接口206可以集成于处理器201,系统内存203,NVM存储器204,和具有指令的固件设备(未示出)中的至少一种,当处理器201中的至少一个执行所述指令时,评估设备200实现如方法实施例中示出的方法。在本申请实施例中,网络接口206可以用于接收电子设备发送的文本图像等。
网络接口206可以进一步包括任意合适的硬件和/或固件,以提供多输入多输出无线电接口。例如,网络接口206可以是网络适配器,无线网络适配器,电话调制解调器和/或无线调制解调器。
在一些实施例中,处理器201中的至少一个可以与用于系统控制逻辑202的一个或多个控制器的逻辑封装在一起,以形成系统封装(System In a Package,SiP)。在一些实施例中,处理器201中的至少一个可以与用于系统控制逻辑202的一个或多个控制器的逻辑集成在同一管芯上,以形成片上系统(System on Chip,SoC)。
评估设备200可以进一步包括:输入/输出(I/O)设备207。I/O设备207可以包括用户界面,使得用户能够与评估设备200进行交互;外围组件接口的设计使得外围组件也能够与评估设备200交互。在一些实施例中,评估设备200还包括传感器,用于确定与评估设备200相关的环境条件和位置信息的至少一种。
在一些实施例中,用户界面可包括但不限于显示器(例如,液晶显示器,触摸屏显示器等),扬声器,麦克风,一个或多个相机(例如,静止图像照相机和/或摄像机),手电筒(例如,发光二极管闪光灯)和键盘。
在一些实施例中,外围组件接口可以包括但不限于非易失性存储器端口、音频插孔和电源接口。
在一些实施例中,传感器可包括但不限于陀螺仪传感器,加速度计,近程传感器,环境光线传感器和定位单元。定位单元还可以是网络接口206的一部分或与网络接口206交互,以与定位网络的组件(例如,北斗卫星)进行通信。
可以理解的是,图12示意的结构并不构成对评估设备200的具体限定。在本申请另外一些实施例中评估设备200可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以由硬件或软件,或软件和硬件的组合实现。
本申请公开的机制的各实施例可以被实现在硬件、软件、固件或这些实现方法的组合中。本申请的实施例可实现为在可编程系统上执行的计算机程序或程序代码,该可编程系统包括至少一个处理器、存储系统(包括易失性和非易失性存储器和/或存储元件)、至少一个输入设备以及至少一个输出设备。
可将程序代码应用于输入指令,以执行本申请描述的各功能并生成输出信息。可以按已知方式将输出信息应用于一个或多个输出设备。为了本申请的目的,处理系统包括具有诸如例如数字信号处理器(Digital Signal Processor,DSP)、微控制器、专用集成电路(Application Specific Integrated Circuit,ASIC)或微处理器之类的处理器的任何系统。
程序代码可以用高级程序化语言或面向对象的编程语言来实现,以便与处理系统通信。在需要时,也可用汇编语言或机器语言来实现程序代码。事实上,本申请中描述的机制不限于任何特定编程语言的范围。在任一情形下,该语言可以是编译语言或解释语言。
在一些情况下,所公开的实施例可以以硬件、固件、软件或其任何组合来实现。所公开的实施例还可以被实现为由一个或多个暂时或非暂时性机器可读(例如,计算机可读)存储介质承载或存储在其上的指令,其可以由一个或多个处理器读取和执行。例如,指令可以通过网络或通过其他计算机可读介质分发。因此,机器可读介质可以包括用于以机器(例如,计算机)可读的形式存储或传输信息的任何机制,包括但不限于,软盘、光盘、光碟、只读存储器(CD-ROMs)、磁光盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、电可擦除可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、磁卡或光卡、闪存、或用于利用因特网以电、光、声或其他形式的传播信号来传输信息(例如,载波、红外信号数字信号等)的有形的机器可读存储器。因此,机器可读介质包括适合于以机器(例如计算机)可读的形式存储或传输电子指令或信息的任何类型的机器可读介质。
在附图中,可以以特定布置和/或顺序示出一些结构或方法特征。然而,应该理解,可能不需要这样的特定布置和/或排序。而是,在一些实施例中,这些特征可以以不同于说明性附图中所示的方式和/或顺序来布置。另外,在特定图中包括结构或方法特征并不意味着暗示在所有实施例中都需要这样的特征,并且在一些实施例中,可以不包括这些特征或者可以与其他特征组合。
需要说明的是,本申请各设备实施例中提到的各单元/模块都是逻辑单元/模块,在物理上,一个逻辑单元/模块可以是一个物理单元/模块,也可以是一个物理单元/模块的一部分,还可以以多个物理单元/模块的组合实现,这些逻辑单元/模块本身的物理实现方式并不是最重要的,这些逻辑单元/模块所实现的功能的组合才是解决本申请所提出的技术问题的关键。此外,为了突出本申请的创新部分,本申请上述各设备实施例并没有将与解决本申请所提出的技术问题关系不太密切的单元/模块引入,这并不表明上述设备实施例并不存在其它的单元/模块。
需要说明的是,在本专利的示例和说明书中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
虽然通过参照本申请的某些优选实施例,已经对本申请进行了图示和描述,但本领域的普通技术人员应该明白,可以在形式上和细节上对其作各种改变,而不偏离本申请的精神和范围。

Claims (18)

  1. 一种文本图像的OCR识别方法,用于电子设备,其特征在于,该方法包括:
    显示文本图像选择界面;
    接收用户输入的第一操作,其中,所述第一操作用于在所述文本图像选择界面选择进行OCR识别的至少一张文本图像;
    响应于所述第一操作,所述电子设备对所述至少一张文本图像进行质量评估;
    响应于所述至少一张文本图像的图像质量评估完成,在文本图像显示界面中显示所述至少一张文本图像对应的第一标记,所述第一标记用于表示文本图像的图像质量评估结果;
    根据所述至少一张文本图像的图像质量评估结果,对所述至少一张文本图像进行OCR识别。
  2. 根据权利要求1所述的方法,其特征在于,在所述文本图像显示界面中显示所述至少一张文本图像对应的第一标记,还包括:
    在所述文本图像显示界面中显示所述至少一张文本图像对应的质量提示信息。
  3. 根据权利要求2所述的方法,其特征在于,所述质量提示信息至少包括如下一种:
    图像质量评估维度、图像质量改进建议。
  4. 根据权利要求1所述的方法,其特征在于,所述电子设备对所述至少一张文本图像进行质量评估,包括:
    在所述文本图像显示界面中显示所述至少一张文本图像对应的第二标记,所述第二标记用于表示在至少一个图像质量评估维度上对文本图像进行图像质量评估。
  5. 根据权利要求4所述的方法,其特征在于,该方法还包括:
    在所述文本图像显示界面中所述至少一张文本图像的缩略图上显示对应的所述第一标记或所述第二标记。
  6. 根据权利要求4所述的方法,其特征在于,所述图像质量评估维度至少包括如下一种:拍摄抖动程度、文档倾斜程度、阴影遮挡程度、对焦准确程度和光线明亮程度。
  7. 根据权利要求4所述的方法,其特征在于,在至少一个图像质量评估维度上对文本图像进行图像质量评估,包括:
    通过梯度函数对对焦准确程度进行质量评估,将文本图像中像素点在水平和垂直方向上的梯度之和的平均值作为对焦准确程度的质量评估值。
  8. 根据权利要求4所述的方法,其特征在于,在至少一个图像质量评估维度上对文本图像进行图像质量评估,包括:
    通过局部灰度方差乘积函数对拍摄抖动程度进行质量评估,将文本图像中每个像素邻域两个灰度差的乘积的平均值作为拍摄抖动程度的质量评估值。
  9. 根据权利要求4所述的方法,其特征在于,在至少一个图像质量评估维度上对文本图像进行图像质量评估,包括:
    通过OTSU算法对阴影遮挡程度进行质量评估。
  10. 根据权利要求4所述的方法,其特征在于,在至少一个图像质量评估维度上对文本图像进行图像质量评估,包括:
    通过颜色空间转换对光线明亮程度进行质量评估。
  11. 根据权利要求4所述的方法,其特征在于,在至少一个图像质量评估维度上对文本图像进行图像质量评估,包括:
    通过直线检测对文档倾斜程度进行质量评估,将文本图像中检测到的直线与水平线的水平倾角和与垂直线的垂直倾角之差的绝对值的平均值作为文档倾斜程度的质量评估值。
  12. 根据权利要求1所述的方法,其特征在于,所述第一操作至少包括如下一种:
    实时图像拍摄、图像文件选择。
  13. 根据权利要求1所述的方法,其特征在于,所述文本图像的图像质量评估结果,根据所述至少一张文本图像在至少一个图像质量评估维度上的质量评估值和对应的预设质量阈值所确定。
  14. 根据权利要求1所述的方法,其特征在于,在所述第一标记显示文本图像的图像质量评估结果不满足预设条件的情况下,该方法还包括:
    对质量评估不满足预设条件的所述至少一个文本图像进行图像质量改善。
  15. 根据权利要求14所述的方法,其特征在于,对质量评估不佳的所述至少一个文本图像进行图像质量改善,包括:
    获取质量评估不佳的所述至少一个文本图像在所述图像质量评估维度上的质量评估值;
    确定所述至少一个文本图像的各所述图像质量评估维度对应的质量改善优先级系数,其中,所述质量改善优先级系数用于确定进行质量改善时所述图像质量评估维度的优先处理顺序;
    根据各所述图像质量评估维度的质量改善优先级系数和综合评估得分确定各所述图像质量评估维度的质量改善强度系数;
    根据各所述图像质量评估维度的质量改善强度系数,对文本图像的各所述图像质量评估维度进行质量改善。
  16. 一种电子设备,其特征在于,包括:
    存储器,用于存储由电子设备的一个或多个处理器执行的指令,以及
    处理器,是电子设备的处理器之一,用于执行权利要求1-15中任一项所述的文本图像的OCR识别方法。
  17. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有指令,该指令在计算机上执行时使计算机执行权利要求1-15中任一项所述的文本图像的OCR识别方法。
  18. 一种计算机程序产品,包括计算机程序/指令,其特征在于,该计算机程序/指令在计算机上执行时使计算机执行权利要求1-15中任一项所述的文本图像的OCR识别方法。
PCT/CN2023/123403 2022-10-21 2023-10-08 文本图像的ocr识别方法、电子设备及介质 WO2024082976A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211298105.0A CN117953508A (zh) 2022-10-21 2022-10-21 文本图像的ocr识别方法、电子设备及介质
CN202211298105.0 2022-10-21

Publications (1)

Publication Number Publication Date
WO2024082976A1

