WO2023193607A1 - Camera-based text recognition method, apparatus, device, and storage medium

Camera-based text recognition method, apparatus, device, and storage medium

Info

Publication number
WO2023193607A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
focus mode
preset
blur
text
Prior art date
Application number
PCT/CN2023/083265
Other languages
English (en)
French (fr)
Inventor
叶运林
Original Assignee
广州视源电子科技股份有限公司
广州视睿电子科技有限公司
Priority date
Filing date
Publication date
Application filed by 广州视源电子科技股份有限公司, 广州视睿电子科技有限公司
Publication of WO2023193607A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04812 Interaction techniques based on cursor appearance or behaviour, e.g. being affected by the presence of displayed objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842 Selection of displayed objects or displayed text elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/20 Combination of acquisition, preprocessing or recognition functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10141 Special mode during image acquisition
    • G06T2207/10148 Varying focus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection

Definitions

  • the present application relates to the field of terminal technology, and more specifically, to a camera-based text recognition method, device, equipment and storage medium.
  • the terminal device can be connected to the camera to collect images through the camera; or, a camera is provided on the terminal device, and the terminal device can collect images through the camera to automatically recognize text on the image.
  • the terminal device automatically focuses through the camera to collect the image to be recognized.
  • The terminal device uses the front camera to collect images of items located in front of the screen of the terminal device or on the desktop, and recognizes the text in the images.
  • the purpose of this application is to provide a camera-based text recognition method, device, equipment and storage medium to solve the problem of camera focus errors and improve text recognition accuracy.
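  • This application provides a camera-based text recognition method, which is applied to a terminal device.
  • The method includes:
  • in the current focus mode, collecting a first image of the text to be recognized located within the preset camera viewing range, and determining the first blur degree of the first image;
  • if it is determined according to the first blur degree that the current focus mode satisfies the preset focus mode switching condition, displaying prompt information on the visual interface, where the prompt information is used to prompt the user to issue a trigger instruction to the first image on the visual interface;
  • in response to the user's trigger instruction, switching the current focus mode to the target focus mode, collecting a second image of the text to be recognized located within the preset camera viewing range, and performing text recognition on the second image.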
  • this application provides a camera-based text recognition device, which is configured on a terminal device.
  • the device includes:
  • a first blur degree determination module configured to collect, in the current focus mode, the first image of the text to be recognized within the preset camera viewing range, and determine the first blur degree of the first image;
  • a prompt information display module configured to display prompt information on the visual interface if it is determined according to the first blur degree that the current focus mode satisfies the preset focus mode switching condition, where the prompt information is used to prompt the user to issue a trigger instruction to the first image on the visual interface;
  • a text recognition module configured to, in response to the user's trigger instruction, switch the current focus mode to the target focus mode, collect a second image of the text to be recognized located within the preset camera viewing range, and perform text recognition on the second image.
  • the present application provides an electronic device, including: a processor, and a memory communicatively connected to the processor;
  • the memory stores computer execution instructions
  • the processor executes the computer execution instructions stored in the memory to implement the camera-based text recognition method as described in any embodiment of this application.
  • The present application provides a computer-readable storage medium in which computer-executable instructions are stored. When executed by a processor, the computer-executable instructions implement the camera-based text recognition method described in any embodiment of the present application.
  • the present application provides a computer program product, which includes a computer program.
  • When executed by a processor, the computer program implements the camera-based text recognition method described in any embodiment of the present application.
  • this application obtains the first blur of the first image by collecting the first image of the text to be recognized in the current focus mode.
  • prompt information can be displayed on the visual interface to prompt the user to make a triggering instruction for the first image on the visual interface.
  • After the terminal device responds to the trigger instruction, it automatically switches the current focus mode to the preset target focus mode. The image is re-collected in the target focus mode to obtain a second image, and text recognition is performed on the second image.
  • This application can determine whether the first image is blurred through the first blur degree and focus mode switching conditions, thereby changing the focus mode to improve the blur condition of the image.
  • Figure 1 is a schematic diagram of an image displayed on a terminal device provided by this application.
  • Figure 2 is a flow chart of a camera-based text recognition method provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of the prompt information in the image displayed by the terminal device provided by this application.
  • Figure 4 is a flow chart of a camera-based text recognition method provided by an embodiment of the present application.
  • Figure 5 is a flow chart of a camera-based text recognition method provided by an embodiment of the present application.
  • Figure 6 is a flow chart of a camera-based text recognition method provided by an embodiment of the present application.
  • Figure 7 is a schematic structural diagram of a camera-based text recognition device provided by an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of another terminal device provided by an embodiment of the present application.
  • the terminal device involved in this application is provided with a USB interface, a memory, a processor and buttons, and the processor is connected to the USB interface, memory and buttons respectively.
  • the attribute of the USB interface is USB device, which is used to connect to the display device.
  • the terminal device is connected to the display device (such as a laptop) through a USB interface.
  • the terminal device can be combined with a USB cable to form a customized special USB cable, or it can be a combination of a USB dongle with a terminal device and a standard USB cable.
  • the terminal device can be powered directly through the USB interface of the display device.
  • One of the uses of the memory is to store specific programs or downloaders of specific programs that are required for image processing.
  • One of the purposes of the processor is to load a specific program stored in the memory or to control the downloader of a specific program.
  • the button is used to trigger the processor to generate relevant control instructions. For example, after the button is clicked, the processor will receive the operation data sent by the button and generate a corresponding instruction according to the operation data.
  • the terminal device can be connected to the camera and then collect images through the camera; or, a camera is provided on the terminal device, and the terminal device can collect images through the camera.
  • the terminal device can collect images through a camera, and the terminal device can be a product such as a learning machine or a computer.
  • the terminal device is placed on the desktop, and objects are placed in front of the terminal on the desktop.
  • the objects can be documents or books waiting to be recognized as text.
  • the terminal device collects images of the text to be recognized through the front camera. That is, the terminal device can collect images of items located in front of the screen of the terminal device through the camera.
  • the placement direction of the object in front of the screen is not limited. It can be placed horizontally with the desktop or vertically with the desktop, as long as it is within the viewing range of the camera.
  • Figure 1 is a schematic diagram of a terminal device displaying an image provided by this application.
  • The terminal device can collect, through a camera, an image of the text to be recognized located in front of the screen of the terminal device; the terminal device then displays the image and performs character recognition on the text to be recognized.
  • In scenarios such as a fixed desk, terminal devices such as learning machines cannot be moved or shaken freely to trigger fast autofocus, which can easily lead to abnormal focus, blurred images, and inaccurate text recognition.
  • This application provides a camera-based text recognition method, device, equipment and storage medium to solve the above problems, as described below.
  • Figure 2 is a flow chart of a camera-based text recognition method provided by an embodiment of the present application. As shown in Figure 2, the method provided by this embodiment can be applied to a terminal device, and the terminal device is equipped with a camera. The method includes the following steps:
  • S201 In the current focus mode, collect the first image of the text to be recognized located within the preset camera viewing range, and determine the first blur of the first image.
  • the text to be recognized is placed in front of the terminal device, and the text to be recognized can be a document or book with characters on it.
  • the camera on the terminal device collects images of the text to be recognized according to the preset camera viewing range, and the obtained image is the first image of the text to be recognized.
  • the camera is a front-facing camera installed in the middle of the top of the terminal device screen.
  • The preset camera viewing range can be a semicircle: the bottom edge of the terminal device screen is the diameter side, and the radius is a distance of 30 cm in front of the terminal device.
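  • As an illustration only, the semicircular viewing range described above can be checked with a simple geometric test. The coordinate system is an assumption not stated in the application: the origin is placed at the midpoint of the bottom edge of the screen, `x_cm` runs along that edge, and `y_cm` points away from the screen toward the desk. The function name `in_viewing_range` is hypothetical.

```python
def in_viewing_range(x_cm: float, y_cm: float, radius_cm: float = 30.0) -> bool:
    """Return True if a desk point lies inside the semicircular viewing range.

    Assumed coordinates (not from the application): origin at the midpoint of
    the bottom edge of the screen, x along the edge, y pointing in front of it.
    """
    if y_cm < 0:
        # Points behind the screen edge are outside the semicircle.
        return False
    # Inside (or on) the circle of the given radius centered on the origin.
    return x_cm * x_cm + y_cm * y_cm <= radius_cm * radius_cm


# A document corner 20 cm to the right and 20 cm in front of the screen edge.
print(in_viewing_range(20.0, 20.0))
```

The 30 cm default follows the example radius in the text; a real device would derive the range from its camera's field of view.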
  • the camera may have multiple focus modes, for example, it may include an automatic focus mode and a manual focus mode.
  • the focus mode of the camera when collecting the first image is the current focus mode. For example, set the autofocus mode to the default focus mode, place the text to be recognized in front of the terminal device, and after the camera is started, the current focus mode when the first image is collected is the autofocus mode.
  • The actual focus mode of the camera may be independent of the current focus mode; that is, the camera's actual focus mode at a given moment may or may not be the current focus mode.
  • For example, the camera collects the first image with the manual focus mode as the current focus mode. The manual focus mode then automatically reverts to the default autofocus mode, so the actual focus mode of the camera becomes the autofocus mode and differs from the current focus mode.
  • Alternatively, the camera's focus mode can remain unchanged. For example, the camera is always kept in the autofocus mode, in which case the current focus mode is the same as the actual focus mode after the first image is collected.
  • determining the first blurriness of the first image includes: obtaining the first blurriness of the first image according to a preset blurriness determination algorithm.
  • a blur determination algorithm can be set in advance to calculate the image and determine the blur of the image.
  • the blur of the first image is the first blur.
  • Blur can also be expressed as the sharpness of an image.
  • the relationship between the blur level and the image clarity can be set in the blur determination algorithm. For example, the lower the blur level, the clearer the image, and the higher the text recognition accuracy.
  • The blur determination algorithm may include at least one of the Brenner gradient function, a gray difference function, or an entropy function.
  • The gray difference function may be the SMD (Sum of Modulus of gray Difference) function, that is, the sum of the absolute values of gray differences. This embodiment places no specific restriction on which blur determination algorithm is used.
  • The blur determination algorithm can output a blur value to represent the blur degree of the image. For example, for the first image of one text to be recognized, the blur value of the first blur degree is 6.0; for the first image of another text to be recognized, the blur value of the first blur degree is 5.2.
  • The beneficial effect of this setting is that the first blur degree of the first image can be obtained automatically and quickly, so the degree of blur or clarity of the first image is determined without human judgment. This facilitates subsequent refocusing when the first image is blurred and improves focus accuracy and text recognition efficiency.
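  • The three functions named above can be sketched in plain Python on a grayscale image stored as a list of rows. This is a minimal illustration, not the application's implementation. Brenner and SMD are sharpness measures (larger means sharper), so the `blur_value` mapping at the end, in which a lower value means a clearer image, is a hypothetical choice made only to match the convention in the text.

```python
import math
from collections import Counter


def brenner(img):
    """Brenner gradient: sum of squared differences between pixels two columns apart."""
    return sum((row[x + 2] - row[x]) ** 2
               for row in img for x in range(len(row) - 2))


def smd(img):
    """SMD: sum of absolute gray differences to the left and upper neighbours."""
    total = 0
    for y, row in enumerate(img):
        for x, pixel in enumerate(row):
            if x > 0:
                total += abs(pixel - row[x - 1])
            if y > 0:
                total += abs(pixel - img[y - 1][x])
    return total


def entropy(img):
    """Shannon entropy (in bits) of the gray-level histogram."""
    counts = Counter(pixel for row in img for pixel in row)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def blur_value(img):
    """Hypothetical mapping: higher Brenner sharpness gives a lower blur value in (0, 1]."""
    n = sum(len(row) for row in img)
    return 1.0 / (1.0 + brenner(img) / n)


sharp = [[0, 0, 255, 255] for _ in range(4)]   # hard vertical edge
flat = [[128, 128, 128, 128] for _ in range(4)]  # no detail at all
print(brenner(sharp), smd(sharp), entropy(sharp), blur_value(flat))
```

In practice these measures would run on a real camera frame (often on a cropped text region), and the scale of the resulting blur value depends entirely on the chosen mapping.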
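  • If it is determined according to the first blur degree that the current focus mode satisfies the preset focus mode switching condition, prompt information is displayed on the visual interface, and the prompt information is used to prompt the user to issue a trigger instruction to the first image on the visual interface.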
  • the focus mode switching condition is preset, and the focus mode switching condition can be used to determine whether the current focus mode of the camera needs to be switched.
  • the focus mode switching condition may be that if the first blur is within a preset blur value range, it is determined that the current focus mode satisfies the preset focus mode switching condition.
  • For example, the current focus mode is the autofocus mode and the first blur degree of the first image is 6.0. If the focus mode switching condition treats an image as blurry when its blur value is greater than or equal to 0.6, it is determined that the autofocus mode needs to be switched to another focus mode.
  • According to the first blur degree, it can be determined whether the current focus mode satisfies the preset focus mode switching condition. If it does, the focus mode needs to be switched and the image is re-collected in the new focus mode.
  • Prompt information can be displayed on the visual interface of the terminal device.
  • the prompt information can be used to prompt the user to issue a trigger instruction to the first image on the visual interface, and the trigger instruction can be used to change the current focus mode.
  • The prompt information may be a pop-up window displayed on the screen of the terminal device. Text in the pop-up window can prompt the user to click the desired focus position on the first image on the screen, which enables the manual focus mode for precise focusing.
  • the prompt information includes voice and/or animation reminders
  • the triggering instructions include touch screen operations.
  • the prompt information may be a voice reminder and/or an animation reminder.
  • The terminal device may play a voice message saying "Please click on the desired text position on the screen to focus", or a shaking hand icon may appear on the first image on the screen, to remind the user to touch any position on the first image and thereby issue a trigger instruction.
  • the triggering instruction may be a touch screen operation, and the position where the user touches the screen is the position of the text that needs to be focused.
  • the triggering instruction may also be an instruction clicked by a mouse, etc.
  • Figure 3 is a schematic diagram of the prompt information in the image displayed by the terminal device provided by this application.
  • the hand in Figure 3 is an animated reminder, reminding the user to point to the screen and perform touch screen operations.
  • the beneficial effect of this setting is that it can promptly remind the user that the focus mode needs to be switched, allowing the user to specify the position where the focus is required, which facilitates precise focusing, improves the accuracy of text recognition, and enhances the user experience.
  • the user issues a triggering instruction through the visual interface. For example, the user clicks any position on the first image on the screen with his finger.
  • the terminal device responds to the user's trigger command and switches the current focus mode to the target focus mode.
  • the terminal device refocuses in the target focus mode and obtains the image of the text to be recognized within the camera's viewing range again as the second image.
  • the target focus mode is set in advance.
  • the target focus mode is manual focus mode.
  • The current focus mode is switched to the manual focus mode. It follows that, in this embodiment, the current focus mode is not the manual focus mode, that is, the current focus mode is not the target focus mode. If the current focus mode were already the target focus mode, the first image collected in the current mode would likely be the same as the second image collected in the target mode, so no mode switch would be needed.
  • the current focus mode and the target focus mode can be judged in the focus mode switching condition of S102. If the current focus mode is not the target focus mode, it is determined that the current focus mode meets the focus mode switching condition.
  • For example, the current focus mode is the autofocus mode and the target focus mode is the manual focus mode. In the manual focus mode, the camera focuses on the text position corresponding to the coordinates where the user's finger touches the screen. Because the camera is located above the text to be recognized, the movement of the finger can easily trigger the camera's autofocus. Blocking the autofocus mode prevents the camera from autofocusing on the user's finger, which reduces manual focus failures and improves the efficiency and precision of manual focusing.
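  • Focusing on the touched text position requires translating screen coordinates into image coordinates. The sketch below is one hypothetical way to do this; the function name, the square region, and the `half_size` parameter are illustrative assumptions, and it further assumes the preview fills the view with no letterboxing or rotation.

```python
def touch_to_focus_region(touch_x, touch_y, view_w, view_h,
                          img_w, img_h, half_size=50):
    """Map a touch point on the preview to a clamped focus rectangle in image pixels.

    Assumes the preview image is stretched to fill the whole view
    (no letterboxing, no rotation). Returns (left, top, right, bottom).
    """
    # Scale the touch point from view coordinates to image coordinates.
    cx = touch_x * img_w / view_w
    cy = touch_y * img_h / view_h
    # Build a square region around the point and clamp it to the image bounds.
    left = max(0, int(cx - half_size))
    top = max(0, int(cy - half_size))
    right = min(img_w, int(cx + half_size))
    bottom = min(img_h, int(cy + half_size))
    return left, top, right, bottom


# A tap in the middle of an 800x600 preview of a 1600x1200 frame.
print(touch_to_focus_region(400, 300, 800, 600, 1600, 1200))
```

The resulting rectangle would then be handed to whatever focus-region mechanism the camera driver exposes, which the application does not specify.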
  • the text in the second image can be recognized according to a preset text recognition algorithm to obtain a text recognition result.
  • the text recognition algorithm may be an OCR (Optical Character Recognition) algorithm.
  • the text recognition algorithm is not specifically limited.
  • Text recognition can be performed directly after the second image is obtained, without calculating the blur degree of the second image (the second blur degree). If the image is still blurry after the focus adjustment, the cause is likely that the text to be recognized is itself unclear, or that the lighting, the camera, or a similar factor is abnormal, so there is no need to guide the user to adjust the focus again. If the text cannot be recognized, an error prompt can be issued on the visual interface to remind the user to check factors such as the text to be recognized, the camera, or the environment.
  • This application obtains the first blur of the first image by collecting the first image of the text to be recognized in the current focus mode.
  • prompt information can be displayed on the visual interface to prompt the user to make a triggering instruction for the first image on the visual interface.
  • After the terminal device responds to the trigger instruction, it automatically switches the current focus mode to the preset target focus mode. The image is re-collected in the target focus mode to obtain a second image, and text recognition is performed on the second image.
  • This application can determine whether the first image is blurred through the first blur degree and focus mode switching conditions, thereby changing the focus mode to improve the blur condition of the image.
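  • The overall flow of this embodiment can be sketched as follows. Everything here is a hypothetical stand-in: `FakeCamera`, the mode labels, and the injected callables (`blur_of`, `needs_switch`, `prompt_user`, `wait_for_touch`, `ocr`) are not APIs from the application or from any real camera library. Recognizing the first image directly when no switch is needed is borrowed from the refined embodiment described later.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

AUTO, MANUAL = "auto", "manual"  # hypothetical focus-mode labels


@dataclass
class FakeCamera:
    """Hypothetical stand-in for a camera driver; it replays canned frames."""
    frames: List[str]
    focus_mode: str = AUTO
    focus_point: Optional[Tuple[int, int]] = None

    def capture(self) -> str:
        return self.frames.pop(0)


def recognize_with_refocus(camera, blur_of, needs_switch, prompt_user,
                           wait_for_touch, ocr, target_mode=MANUAL):
    # Collect the first image in the current focus mode and score its blur.
    first_image = camera.capture()
    first_blur = blur_of(first_image)
    if not needs_switch(first_blur, camera.focus_mode):
        # Switching condition not met: recognize the first image directly.
        return ocr(first_image)
    # Prompt the user, then wait for the trigger instruction (a screen touch).
    prompt_user("Please click on the desired text position on the screen to focus")
    camera.focus_point = wait_for_touch()
    # Switch to the target focus mode, collect the second image, and recognize it.
    camera.focus_mode = target_mode
    second_image = camera.capture()
    return ocr(second_image)


camera = FakeCamera(frames=["blurry page", "sharp page"])
prompts = []
result = recognize_with_refocus(
    camera,
    blur_of=lambda img: 6.0 if img.startswith("blurry") else 1.0,
    needs_switch=lambda blur, mode: blur >= 5.0 and mode != MANUAL,
    prompt_user=prompts.append,
    wait_for_touch=lambda: (120, 340),
    ocr=lambda img: f"text from {img}",
)
print(result)
```

Passing the blur metric, prompt, and OCR step in as callables keeps the control flow separate from any particular camera or recognition library.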
  • Figure 4 is a flow chart of a camera-based text recognition method provided by an embodiment of the present application. This embodiment is an optional embodiment based on the above embodiment, and the method is applied to mobile terminals.
  • In this embodiment, determining according to the first blur degree that the current focus mode satisfies the preset focus mode switching condition can be refined as follows: if it is determined that the first blur degree satisfies the preset blur threshold comparison condition, the current focus mode is compared with the preset target focus mode; if the current focus mode is not the preset target focus mode, it is determined that the current focus mode satisfies the preset focus mode switching condition.
  • the method includes the following steps:
  • S401 In the current focus mode, collect the first image of the text to be recognized located within the preset camera viewing range, and determine the first blur of the first image.
  • The focus mode switching condition may include multiple matching conditions. For example, the first blur degree can be used to determine whether the first image is blurred, and the current focus mode can be checked to determine whether it is already the target focus mode.
  • the blur threshold comparison condition may be set in advance, and after the first blur degree is obtained, it is determined whether the first blur degree satisfies the preset blur threshold comparison condition. For example, it can be determined whether the first blur is higher than a preset value or lower than a preset value; for another example, it can be determined whether the first blur is within a preset value range.
  • Determining that the first blur degree satisfies the preset blur threshold comparison condition includes: if the first blur degree exceeds the preset blur threshold, determining that the first blur degree satisfies the preset blur threshold comparison condition.
  • A blur threshold can be preset, and the blur threshold comparison condition can be that, when the first blur degree exceeds the blur threshold, the first image is considered blurry and the first blur degree satisfies the blur threshold comparison condition.
  • After the first blur degree is obtained, it is compared with the blur threshold to determine whether it exceeds the preset blur threshold. If so, it is determined that the first blur degree satisfies the preset blur threshold comparison condition; if not, it is determined that the first blur degree does not satisfy the blur threshold comparison condition.
  • the beneficial effect of this setting is that by performing threshold comparison, it can be quickly determined whether the first blur meets the blur threshold comparison condition.
  • the judgment process is simple and fast, and the blur threshold can be flexibly adjusted to improve the accuracy and flexibility of blur judgment.
  • The method further includes: if the first blur degree does not satisfy the preset blur threshold comparison condition, performing text recognition on the first image based on the preset text recognition algorithm.
  • If the first blur degree does not satisfy the blur threshold comparison condition, for example because it does not exceed the preset blur threshold, it is determined that the first image is clear, no focus adjustment is needed, and text recognition can be performed directly.
  • text recognition is performed on the first image to obtain a text recognition result.
  • the beneficial effect of this setting is that when the first image is relatively clear, subsequent condition judgments are no longer performed, and focus mode switching is not performed, and recognition is performed directly, effectively improving the efficiency of text recognition.
  • Before determining whether the first blur degree satisfies the preset blur threshold comparison condition and comparing the current focus mode with the preset target focus mode, the method further includes: obtaining a pre-collected sample image set; determining the blur value of any sample image in the sample image set according to the preset blur determination algorithm; and determining the blur threshold based on the blur values and a preset blur threshold rule.
  • The blur threshold needs to be determined in advance.
  • The blur threshold may be determined before S401 or before S402; this embodiment imposes no specific limitation on this.
  • The sample image set may include clear images or blurred images. The sample image set is obtained, and the blur value of each sample image in the set is obtained according to the preset blur determination algorithm.
  • The blur threshold rule is set in advance, and the blur threshold is determined based on the blur values and the blur threshold rule.
  • The blur threshold rule may be to take the maximum blur value of the blurred images in the sample image set as the blur threshold.
  • Alternatively, staff can inspect the clarity and blur value of each sample image and pick out the blur value corresponding to a blurry sample image as the target blur value.
  • If the sample images whose blur values are above the target blur value are all blurry, with the degree of blur increasing as the value rises, the target blur value can be set as the blur threshold.
  • the beneficial effect of this setting is that the blur threshold is pre-set according to actual needs, which improves the flexibility of determining the blur threshold, makes it easier to determine whether the first image is blurred, and performs subsequent operations, which is beneficial to improving the accuracy of text recognition.
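  • A minimal sketch of deriving the blur threshold from a labeled sample set is shown below. The function name, the `(image, is_blurry)` pair layout, and the pluggable `rule` argument are assumptions for illustration. The default rule follows the text's example of taking the maximum blur value among the blurred samples.

```python
def blur_threshold_from_samples(samples, blur_of, rule=max):
    """Derive a blur threshold from samples labeled as blurry or clear.

    samples: iterable of (image, is_blurry) pairs (labels assumed to come from staff).
    blur_of: the blur determination algorithm, mapping an image to a blur value.
    rule:    how to reduce the blurry images' values to one threshold.
    """
    blurry_values = [blur_of(image) for image, is_blurry in samples if is_blurry]
    if not blurry_values:
        raise ValueError("the sample set contains no blurry images")
    return rule(blurry_values)


# Toy data: image names stand in for real frames, with precomputed blur values.
values = {"a.png": 2.0, "b.png": 5.5, "c.png": 7.0}
samples = [("a.png", False), ("b.png", True), ("c.png", True)]
print(blur_threshold_from_samples(samples, values.get))
```

Keeping the rule as a parameter reflects the point that the threshold can be adjusted flexibly; passing `rule=min`, for instance, would instead mark every labeled blurry sample at or above the threshold.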
  • It is determined whether the current focus mode is the preset target focus mode. If the current focus mode is not the target focus mode, it is determined that the current focus mode satisfies the preset focus mode switching condition, and the focus mode needs to be adjusted. For example, if the current focus mode is the default autofocus mode and the target focus mode is the manual focus mode, the current focus mode is not the target focus mode.
  • The method further includes: if it is determined that the current focus mode is the target focus mode, performing text recognition on the first image according to the preset text recognition algorithm.
  • If the current focus mode is the target focus mode, no focus mode switch is performed and the first image is recognized directly.
  • The beneficial effect of this setting is that, if the first image was obtained in the target focus mode, the blur of the first image is likely caused by the document itself being unclear, or by abnormal lighting, an abnormal camera, or similar factors, so there is no need to guide the user to focus. Recognition can therefore be performed directly, which skips the focusing procedure and improves the efficiency of text recognition.
  • After it is determined that the current focus mode is the target focus mode, the method further includes: determining the current number of image acquisitions of the text to be recognized by the camera in the target focus mode; if the current number of image acquisitions exceeds the preset count threshold, performing text recognition on the first image according to the preset text recognition algorithm; if the current number of image acquisitions does not exceed the preset count threshold, determining that the current focus mode meets the preset focus mode switching condition.
  • A count threshold can be set in advance; it represents the maximum number of times that the text to be recognized may be captured in the target focus mode.
  • the current number of image acquisitions of the text to be recognized by the camera in the target focus mode can be determined, that is, the number of times the text to be recognized has been captured in the target focus mode.
  • the current number of image collections may be the number of consecutive collections within a preset time period.
  • the beneficial effect of this setting is to preset a threshold of times to avoid multiple useless image acquisitions in target focus mode, reduce the user's repeated manual focusing process, improve focusing efficiency, and thereby improve the efficiency of text recognition.
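The checks described above (blur threshold comparison, target focus mode comparison, acquisition count comparison) can be read as one decision function. The sketch below is an illustration under stated assumptions: the enum names, the function name `decide` and its parameters are hypothetical, and "meets the blur threshold comparison condition" is taken to mean "blur exceeds the threshold", as in the example given in this description.

```python
from enum import Enum


class FocusMode(Enum):
    AUTO = "auto"
    MANUAL = "manual"


class Action(Enum):
    RECOGNIZE_FIRST_IMAGE = "recognize_first_image"
    PROMPT_USER = "prompt_user"  # focus mode switching condition is met


def decide(first_blur: float,
           blur_threshold: float,
           current_mode: FocusMode,
           target_mode: FocusMode,
           captures_in_target_mode: int,
           max_captures: int) -> Action:
    # 1. Image is sharp enough: recognize it directly.
    if first_blur <= blur_threshold:
        return Action.RECOGNIZE_FIRST_IMAGE
    # 2. Blurred and not yet in the target mode: ask the user to trigger a switch.
    if current_mode is not target_mode:
        return Action.PROMPT_USER
    # 3. Blurred while already in the target mode: give up after too many
    #    captures, since the blur is probably not caused by focusing.
    if captures_in_target_mode > max_captures:
        return Action.RECOGNIZE_FIRST_IMAGE
    return Action.PROMPT_USER


print(decide(0.8, 0.5, FocusMode.AUTO, FocusMode.MANUAL, 0, 3))
```

Keeping the three checks in a single pure function makes the order of evaluation explicit and easy to unit-test without a camera.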
  • the prompt information is used to prompt the user to issue a triggering instruction to the first image on the visual interface.
  • The embodiment of the present application collects a first image of the text to be recognized in the current focus mode and obtains the first blur degree of the first image. According to the first blur degree and the preset focus mode switching condition, it is determined whether the first blur degree meets the blur threshold comparison condition and whether the current focus mode is the target focus mode. This two-stage judgment improves the accuracy with which the focus mode switching condition is evaluated. If the focus mode switching condition is met, prompt information can be displayed on the visual interface to prompt the user to issue a trigger instruction for the first image on the visual interface. After the terminal device responds to the trigger instruction, it automatically switches the current focus mode to the preset target focus mode. The image is re-acquired in the target focus mode to obtain a second image, and text recognition is performed on the second image.
  • This application can determine whether the first image is blurred through the first blur degree and focus mode switching conditions, thereby changing the focus mode to improve the blur condition of the image.
  • Switching the focus mode blocks the current focus mode, so that it cannot interfere while the target focus mode is active.
  • Figure 5 is a flow chart of a camera-based text recognition method provided by an embodiment of the present application. This embodiment is an optional embodiment based on the above embodiment, and the method is applied to mobile terminals.
  • In this embodiment, the current focus mode is the autofocus mode and the target focus mode is the manual focus mode.
  • The step of switching the current focus mode to the target focus mode and collecting the second image of the text to be recognized within the preset camera viewing range can be refined into: determining the target focus position in response to the user's touch screen operation on any coordinate point in the first image on the visual interface; and switching the autofocus mode to the manual focus mode and, according to the target focus position, collecting an image of the text to be recognized within the camera viewing range to obtain the second image.
  • the method includes the following steps:
  • In the current focus mode, collect the first image of the text to be recognized located within the preset camera viewing range, and determine the first blur degree of the first image; the current focus mode is the autofocus mode.
  • prompt information is displayed on the visual interface, and the prompt information is used to prompt the user to issue a trigger instruction to the first image on the visual interface.
  • the user sees and/or hears prompt information on the visual interface, and makes triggering instructions based on the prompt information.
  • the user can click any coordinate position in the first image on the screen with a finger, and the clicked position is the position that needs to be focused.
  • the triggering instruction made by the user may be a touch screen operation, that is, the user touches any coordinate point on the first image.
  • the terminal device responds to the user's touch screen operation, determines the coordinate point touched by the user, and determines the position of the coordinate point as the target focus position.
  • Determining the target focus position in response to the user's touch screen operation on any coordinate point in the first image on the visual interface includes: determining the target coordinate position specified by the user according to the touch screen operation; and determining the target focus position in the text to be recognized based on the target coordinate position in the first image.
  • the user performs a touch screen operation on any coordinate point in the first image on the visual interface.
  • the visual interface displays a moving finger, prompting the user to use the finger to click on the position that needs to be focused.
  • the user clicks on the location to focus on through the touch screen.
  • the terminal device responds to the user's touch screen operation and determines the location clicked by the user as the specified target coordinate location.
  • The first image is an image of the text to be recognized, and each coordinate on the first image corresponds one to one with an actual position on the text to be recognized. According to the target coordinate position on the first image, the position on the text to be recognized that needs to be focused can be determined as the target focus position.
  • the target coordinate position is the position in the first image on the visualization interface
  • the target focus position is the actual position on the text to be recognized. For example, if the user clicks on the coordinate position of the first word in the first row in the first image, it is determined that the target focus position is the position of the first word in the first row on the text to be recognized.
  • the beneficial effect of this setting is that the user can touch the screen directly on the visual interface, and directly determine the position that needs to be focused on the text to be recognized based on the user's operation on the visual interface. It is convenient for users to perform focusing operations. Through the association between the target coordinate position and the target focus position, the accuracy of determining the target focus position is improved, thereby improving the focus efficiency and focus accuracy, and facilitating the subsequent text recognition process.
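One way to picture the association between the target coordinate position and the target focus position is a normalization of the touch point. The sketch below is an assumption-laden illustration: it supposes the first image fills the preview view without cropping or rotation, and the function name and parameters are hypothetical. Real camera APIs typically need additional sensor-orientation and crop handling that the description does not cover.

```python
from typing import Tuple


def touch_to_focus_point(touch_x: float, touch_y: float,
                         view_width: float, view_height: float) -> Tuple[float, float]:
    """Map a touch point in view pixels to a normalized (0..1, 0..1) focus point.

    Assumes the first image is displayed across the whole view, so that a
    position in the view corresponds one to one to a position in the image.
    """
    if view_width <= 0 or view_height <= 0:
        raise ValueError("view dimensions must be positive")
    # Clamp so that touches on the view border never leave the image.
    nx = min(max(touch_x / view_width, 0.0), 1.0)
    ny = min(max(touch_y / view_height, 0.0), 1.0)
    return nx, ny


print(touch_to_focus_point(540, 960, 1080, 1920))
```

The normalized point could then be handed to whatever focus-region interface the platform camera exposes.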
  • S540 Switch the autofocus mode to the manual focus mode, and collect images of the text to be recognized within the camera's viewing range according to the target focus position to obtain a second image.
  • After determining the target focus position, the terminal device determines that the focus mode needs to be switched, and switches the focus mode from the original autofocus mode to the set manual focus mode.
  • the auto focus mode automatically focuses on the text to be recognized, while the manual focus mode focuses on the target focus position according to the user's specification.
  • the target focus position is the position that needs to be focused in manual focus mode, and the camera focuses on the target focus position.
  • the terminal device collects images of the text to be recognized within the viewing range of the camera to obtain a second image. That is, the second image is an image in which the target focus position is focused.
  • Switching the autofocus mode to the manual focus mode and collecting an image of the text to be recognized within the camera's viewing range according to the target focus position to obtain the second image includes: turning off the autofocus mode and, in the manual focus mode, focusing on the target focus position of the text to be recognized within the viewing range of the camera, collecting an image of the text to be recognized, and obtaining the second image.
  • the autofocus mode is switched to the manual focus mode.
  • the camera turns off the autofocus mode and turns on the manual focus mode. That is, in the case of manual focus mode, the autofocus mode is disabled, and the camera only focuses in the manual focus mode.
  • the camera on the terminal device only focuses on the target focus position of the text to be recognized within the viewing range in manual focus mode.
  • the user's finger moves on the screen, and the camera acquires the user's moving finger and does not automatically focus on the finger.
  • image collection can be performed to obtain a second image.
  • the second image is an image of the text to be recognized, and there is no user's finger.
  • The beneficial effect of this setting is that blocking the autofocus mode avoids the situation in which the user's arm happens to be within the camera's viewing range when the user clicks on the screen and inadvertently triggers the camera's autofocus function, causing the focus to land on the hand. This effectively prevents blurry images and improves focus accuracy, thereby improving text recognition accuracy and enhancing the user experience.
  • the method further includes: if the maintenance time of the manual focus mode exceeds a preset time threshold, switching the manual focus mode to the automatic focus mode.
  • the start time of the manual focus mode is recorded, and the maintenance time of the manual focus mode is determined in real time.
  • a time threshold is set in advance, and the time threshold may be the maximum maintenance time allowed for the manual focus mode.
  • the time threshold is one second, and manual focus is performed within one second after it is determined that the user clicks on the first image on the screen. After one second, the user has finished clicking the screen and their arms leave the camera's viewing range, and can automatically switch back to autofocus mode.
  • the time threshold can be adjusted according to actual needs. For example, the time threshold can be set to a longer time to provide the user with more time to perform trigger operations and facilitate the user to focus.
  • The beneficial effect of this setting is that the focus mode can be switched automatically. After the manual focus mode has been maintained for a certain period of time, the device automatically returns to the autofocus mode, which allows autofocus to continue on subsequent text to be recognized, reduces user operations, and effectively improves focusing efficiency.
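The timed fallback from manual focus to autofocus can be sketched as a small state holder. This is only an illustration under assumptions: the class and method names are hypothetical, no real camera API is called, and the clock is injectable so that the behavior can be checked without waiting.

```python
import time
from typing import Callable, Optional


class FocusModeController:
    """Tracks the focus mode and reverts manual focus to autofocus after a hold time."""

    def __init__(self, hold_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.mode = "auto"
        self._hold_seconds = hold_seconds
        self._clock = clock
        self._manual_since: Optional[float] = None

    def enter_manual(self) -> None:
        # Record the start time of the manual focus mode.
        self.mode = "manual"
        self._manual_since = self._clock()

    def tick(self) -> str:
        # Call periodically (for example once per preview frame).
        if self.mode == "manual" and self._manual_since is not None:
            if self._clock() - self._manual_since > self._hold_seconds:
                self.mode = "auto"
                self._manual_since = None
        return self.mode


now = [0.0]                                   # fake clock for demonstration
controller = FocusModeController(1.0, clock=lambda: now[0])
controller.enter_manual()
now[0] = 0.5
print(controller.tick())                      # still within the hold time
now[0] = 1.5
print(controller.tick())                      # hold time exceeded
```

A monotonic clock is used by default so that wall-clock adjustments cannot shorten or extend the hold time.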
  • The method further includes: determining the second blurriness of the second image; if the second blurriness meets the preset blurriness threshold comparison condition, determining the current number of image acquisitions of the text to be recognized by the camera in the target focus mode; if the current number of image acquisitions does not exceed the preset count threshold, displaying prompt information on the visual interface to guide the user to issue a trigger instruction for the second image on the visual interface.
  • text recognition can be directly performed on the second image.
  • the blur degree of the second image may also be determined again as the second blur degree.
  • the second blurriness of the second image can be obtained according to a preset blurriness determination algorithm, for example, Brenner gradient function, grayscale difference function or entropy function.
  • The second blurriness is compared with a preset blurriness threshold to determine whether it satisfies the preset blurriness threshold comparison condition. For example, it is determined whether the second blurriness exceeds the preset blurriness threshold, and if so, it is determined that the second blurriness meets the comparison condition. If the second blurriness does not meet the condition, the second image is determined to be clear and text recognition can be performed directly; if it meets the condition, the current number of image acquisitions of the text to be recognized by the camera in the target focus mode is further determined.
  • the beneficial effect of this setting is that after the second blur degree is obtained, it can be continued to determine whether text recognition can be performed on the second image. If not, manual focusing can be continued to ensure the clarity of the image and improve the accuracy of text recognition. If the text to be recognized has been collected too many times in manual focus mode and the image is still blurred, it is determined that the reason for the blurred image is that the file itself is not clear or due to abnormal light or camera, etc., and the focus mode will not be switched to effectively improve Focus efficiency and recognition efficiency.
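Of the blur determination algorithms named above, the Brenner gradient is the simplest to sketch. The version below is a plain-Python illustration on a grayscale image given as nested lists; the function name is hypothetical. Note that the Brenner gradient is conventionally a focus measure, where a larger score means sharper edges. How the description converts such a score into a "blurriness" that grows with blur is not stated, so that mapping is left open here.

```python
from typing import Sequence


def brenner_gradient(gray: Sequence[Sequence[float]]) -> float:
    """Sum of squared differences between pixels two columns apart.

    `gray` is a grayscale image as rows of intensity values.
    Larger results indicate stronger horizontal edges (a sharper image).
    """
    total = 0.0
    for row in gray:
        for x in range(len(row) - 2):
            diff = row[x + 2] - row[x]
            total += diff * diff
    return total


flat = [[10, 10, 10, 10, 10]] * 3          # no detail at all
edged = [[0, 0, 255, 255, 0]] * 3          # hard edges in every row
print(brenner_gradient(flat), brenner_gradient(edged))
```

In practice the score is usually normalized by the number of pixels so that images of different sizes can be compared against one threshold; that normalization is omitted here for brevity.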
  • the embodiment of the present application obtains the first blur of the first image by collecting the first image of the text to be recognized in the current focus mode.
  • prompt information can be displayed on the visual interface to prompt the user to make a triggering instruction for the first image on the visual interface.
  • After the terminal device responds to the trigger instruction, it determines the target focus position that needs to be focused, automatically switches the current focus mode to the preset target focus mode, and blocks the autofocus mode. According to the target focus position, the image is re-acquired in the manual focus mode to obtain the second image, and text recognition is performed on the second image.
  • This application can determine whether the first image is blurred through the first blur degree and focus mode switching conditions, thereby changing the focus mode to improve the blur condition of the image.
  • Switching the focus mode blocks the current focus mode, so that it cannot interfere while the target focus mode is active.
  • Figure 6 is a flow chart of a camera-based text recognition method provided by an embodiment of the present application. This embodiment is an optional embodiment based on the above embodiment, and the method is applied to mobile terminals. As shown in Figure 6, the method includes the following steps:
  • S602. In the autofocus mode, collect the first image of the text to be recognized located within the preset camera viewing range, and determine the text blur of the first image.
  • the prompt information is used to prompt the user to issue a triggering instruction to the first image on the visual interface.
  • the embodiment of the present application obtains the first blur of the first image by collecting the first image of the text to be recognized in the current focus mode.
  • prompt information can be displayed on the visual interface to prompt the user to make a triggering instruction for the first image on the visual interface.
  • After the terminal device responds to the trigger instruction, it automatically switches the current focus mode to the preset target focus mode. The image is re-acquired in the target focus mode to obtain a second image, and text recognition is performed on the second image.
  • This application can determine whether the first image is blurred through the first blur degree and focus mode switching conditions, thereby changing the focus mode to improve the blur condition of the image.
  • Figure 7 is a schematic structural diagram of a camera-based text recognition device provided by an embodiment of the present application.
  • the device is applied to a terminal device, and a camera is installed on the terminal device; the device can be implemented through software, hardware, or a combination of both.
  • the device includes: a first blur determination module 701, a prompt information display module 702 and a text recognition module 703.
  • the first blur determination module 701 is used to collect the first image of the text to be recognized within the preset camera viewing range in the current focus mode, and determine the first blur of the first image;
  • the prompt information display module 702 is used to display prompt information on the visual interface if it is determined, according to the first blur degree, that the current focus mode satisfies the preset focus mode switching condition, the prompt information being used to prompt the user to issue a trigger instruction for the first image on the visual interface;
  • the text recognition module 703 is configured to respond to the user's trigger instruction, switch the current focus mode to the target focus mode, collect a second image of the text to be recognized located within the preset camera viewing range, and perform text recognition on the second image.
  • Optionally, the prompt information display module 702 includes:
  • a blur threshold comparison unit configured to compare the current focus mode with a preset target focus mode if it is determined that the first blur meets a preset blur threshold comparison condition
  • the target focus mode comparison unit is configured to determine that the current focus mode satisfies the preset focus mode switching condition if the current focus mode is not a preset target focus mode.
  • the blur threshold comparison unit is specifically used for:
  • if the first blur degree exceeds the preset blur threshold, determining that the first blur degree meets the preset blur threshold comparison condition.
  • the device also includes:
  • the first image recognition module is configured to, after determining the first blur degree of the first image, if the first blur degree does not meet the preset blur threshold comparison condition, according to the preset text recognition algorithm, Perform text recognition on the first image.
  • the device also includes:
  • a sample image set acquisition module configured to obtain a pre-collected sample image set before the current focus mode is compared with the preset target focus mode upon determining that the first blur degree meets the preset blur threshold comparison condition;
  • a blur value determination module configured to determine the blur value of any sample image in the sample image set according to a preset blur determination algorithm;
  • a blur threshold determination module configured to determine the blur threshold according to the blur value and a preset blur threshold value rule.
  • the device also includes:
  • a first image text recognition module configured to, after comparing whether the current focus mode is the preset target focus mode, and if it is determined that the current focus mode is the target focus mode, perform text recognition on the first image according to the preset text recognition algorithm.
  • the device also includes:
  • a current image collection number determination module configured to determine the current number of image collections of the text to be recognized by the camera in the target focus mode after determining that the current focus mode is the target focus mode;
  • a count comparison module configured to perform text recognition on the first image according to a preset text recognition algorithm if the current number of image collections exceeds a preset count threshold;
  • a focus mode switching condition satisfying module is configured to determine that the current focus mode satisfies the preset focus mode switching condition if the current number of image acquisitions does not exceed a preset threshold.
  • the prompt information includes voice and/or animation reminders
  • the triggering instructions include touch screen operations.
  • the current focus mode is the auto focus mode
  • the target focus mode is the manual focus mode
  • the text recognition module 703 includes:
  • a target focus position determination unit configured to determine the target focus position in response to the user's touch screen operation on any coordinate point in the first image on the visual interface
  • the second image obtaining unit is used to switch the automatic focus mode to the manual focus mode, and collect images of the text to be recognized within the viewing range of the camera according to the target focus position to obtain a second image.
  • the target focus position determination unit is specifically used for:
  • the target focus position in the text to be recognized is determined according to the target coordinate position in the first image.
  • the second image obtaining unit is specifically used for:
  • the device also includes:
  • a manual focus mode switching module configured to, after the automatic focus mode is switched to the manual focus mode, switch the manual focus mode back to the automatic focus mode if the maintenance time of the manual focus mode exceeds a preset time threshold.
  • the device also includes:
  • a second blur degree determination module configured to determine the second blur degree of the second image after collecting the second image of the text to be recognized located within the preset camera viewing range;
  • a collection number determination module configured to determine the number of current image collections of the text to be recognized by the camera in the target focus mode if the second blur meets a preset blur threshold comparison condition
  • a second image triggering module configured to display prompt information on the visual interface if the current number of image collections does not exceed a preset threshold to guide the user to issue a triggering instruction to the second image on the visual interface.
  • the first blur determination module 701 is specifically used for:
  • the first blur of the first image is obtained.
  • the blur determination algorithm includes at least one of the Brenner gradient function, the grayscale difference function or the entropy function.
  • the embodiment of the present application obtains the first blur of the first image by collecting the first image of the text to be recognized in the current focus mode.
  • prompt information can be displayed on the visual interface to prompt the user to make a triggering instruction for the first image on the visual interface.
  • After the terminal device responds to the trigger instruction, it automatically switches the current focus mode to the preset target focus mode. The image is re-acquired in the target focus mode to obtain a second image, and text recognition is performed on the second image.
  • This application can determine whether the first image is blurred through the first blur degree and focus mode switching conditions, thereby changing the focus mode to improve the blur condition of the image.
  • Figure 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device may include: a processor 81 and a memory 82, wherein the memory 82 stores a computer program adapted to be loaded by the processor 81 to execute the above method steps.
  • the terminal device may also include a transmitter 83 and a receiver 84.
  • the terminal device does not have the ISP function.
  • the processor of the terminal device is equipped with an ISP function, or the terminal device also includes an ISP chip.
  • Embodiments of the present application also provide a computer storage medium.
  • the computer storage medium can store multiple instructions.
  • the instructions are suitable for being loaded by the processor and executing the method steps of the above embodiments.
  • For the specific execution process, please refer to the detailed description of the above embodiments; no further details are given here.
  • the device where the storage medium is located may be a camera or a terminal device.
  • Figure 9 is a schematic structural diagram of another terminal device according to an embodiment of the present application.
  • the terminal device 1000 may include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
  • the communication bus 1002 is used to realize connection communication between these components.
  • the user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the processor 1001 may include one or more processing cores.
  • the processor 1001 uses various interfaces and lines to connect various parts of the entire terminal device 1000, and executes by running or executing instructions, programs, code sets or instruction sets stored in the memory 1005, and calling data stored in the memory 1005.
  • the processor 1001 can be implemented in at least one hardware form among digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA).
  • the processor 1001 can integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a modem, etc.
  • the CPU mainly handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content that needs to be displayed on the display; and the modem is used to handle wireless communications. It can be understood that the above-mentioned modem may not be integrated into the processor 1001 and may be implemented by a separate chip.
  • the memory 1005 may include random access memory (RAM) or read-only memory (Read-Only Memory).
  • the memory 1005 includes non-transitory computer-readable storage medium.
  • Memory 1005 may be used to store instructions, programs, codes, sets of codes, or sets of instructions.
  • the memory 1005 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing each of the above method embodiments, and so on; the data storage area may store the data involved in each of the above method embodiments.
  • the memory 1005 may optionally be at least one storage device located remotely from the aforementioned processor 1001. As shown in Figure 9, memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and an operating application program of a terminal device.
  • the user interface 1003 is mainly used to provide an input interface for the user and obtain data input by the user; and the processor 1001 can be used to call the operation application program of the terminal device stored in the memory 1005 and specifically implement the method provided by the above embodiments.
  • Embodiments of the present application also provide a computer program product.
  • the computer program product includes: a computer program.
  • the computer program is stored in a readable storage medium.
  • At least one processor of the terminal device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the terminal device executes the solution provided by any of the above embodiments.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that causes a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include both persistent and non-persistent, removable and non-removable media that can store information by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD), magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Studio Devices (AREA)

Abstract

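The present application provides a camera-based text recognition method, apparatus, device, and storage medium. The method is applied to a terminal device and includes: in a current focus mode, capturing a first image of text to be recognized that is located within a preset camera framing range, and determining a first blur degree of the first image; if it is determined, according to the first blur degree, that the current focus mode satisfies a preset focus mode switching condition, displaying prompt information on a visual interface, the prompt information being used to prompt a user to issue a trigger instruction for the first image on the visual interface; and in response to the user's trigger instruction, switching the current focus mode to a target focus mode, capturing a second image of the text to be recognized within the preset camera framing range, and performing text recognition on the second image. The method of the present application switches the focus mode automatically, ensures image sharpness, and improves text recognition accuracy.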

Description

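Camera-based text recognition method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 202210366298.2, filed with the China Patent Office on April 8, 2022 and entitled "Camera-based text recognition method, apparatus, device, and storage medium", the entire contents of which are incorporated herein by reference.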
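Technical Field
The present application relates to the field of terminal technology, and more specifically to a camera-based text recognition method, apparatus, device, and storage medium.
Background
With the development of terminal technology, terminal devices have become important tools in daily life. A terminal device may be connected to a camera and capture images through it; alternatively, a camera may be provided on the terminal device itself, so that the terminal device captures images through the camera and automatically recognizes the text in them.
In the prior art, a terminal device captures the image to be recognized using the camera's autofocus. For example, the terminal device uses its front camera to capture an image of an object located in front of its screen or on the desktop, and recognizes the text in the image.
However, when a prior-art terminal device captures an image of an object through the camera for text recognition, the camera position is fixed and cannot be flexibly adjusted for focusing. As a result, the object image has low sharpness and appears blurred, which degrades text recognition accuracy.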
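Summary
The purpose of the present application is to provide a camera-based text recognition method, apparatus, device, and storage medium that solve the problem of incorrect camera focusing and improve text recognition accuracy.
In one aspect, the present application provides a camera-based text recognition method applied to a terminal device, the method including:
in a current focus mode, capturing a first image of text to be recognized located within a preset camera framing range, and determining a first blur degree of the first image;
if it is determined, according to the first blur degree, that the current focus mode satisfies a preset focus mode switching condition, displaying prompt information on a visual interface, the prompt information being used to prompt a user to issue a trigger instruction for the first image on the visual interface; and
in response to the user's trigger instruction, switching the current focus mode to a target focus mode, capturing a second image of the text to be recognized located within the preset camera framing range, and performing text recognition on the second image.
In another aspect, the present application provides a camera-based text recognition apparatus configured on a terminal device, the apparatus including:
a first blur degree determination module, configured to capture, in a current focus mode, a first image of text to be recognized within a preset camera framing range, and determine a first blur degree of the first image;
a prompt information display module, configured to display prompt information on a visual interface if it is determined, according to the first blur degree, that the current focus mode satisfies a preset focus mode switching condition, the prompt information being used to prompt a user to issue a trigger instruction for the first image on the visual interface; and
a text recognition module, configured to, in response to the user's trigger instruction, switch the current focus mode to a target focus mode, capture a second image of the text to be recognized located within the preset camera framing range, and perform text recognition on the second image.
In another aspect, the present application provides an electronic device, including a processor and a memory communicatively connected to the processor;
the memory stores computer-executable instructions; and
the processor executes the computer-executable instructions stored in the memory to implement the camera-based text recognition method according to any embodiment of the present application.
In another aspect, the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the camera-based text recognition method according to any embodiment of the present application.
In another aspect, the present application provides a computer program product including a computer program which, when executed by a processor, implements the camera-based text recognition method according to any embodiment of the present application.
With the above technical solutions, the present application captures a first image of the text to be recognized in the current focus mode and obtains a first blur degree of the first image. According to the first blur degree and the preset focus mode switching condition, prompt information can be displayed on the visual interface to prompt the user to issue a trigger instruction for the first image on the visual interface. After responding to the trigger instruction, the terminal device automatically switches the current focus mode to the preset target focus mode, captures again in the target focus mode to obtain a second image, and performs text recognition on the second image. Using the first blur degree and the focus mode switching condition, the present application can determine whether the first image is blurred and change the focus mode accordingly to reduce image blur. When the image is blurred, switching the focus mode masks the current focus mode so that it cannot interfere while the target focus mode is in effect. This solves the prior-art problem that the camera autofocuses on the user's hand while the user is focusing manually, reduces repeated manual focusing by the user, effectively improves focusing accuracy and image sharpness, and thereby improves text recognition accuracy.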
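Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the present application.
Fig. 1 is a schematic diagram of a terminal device displaying an image, provided by the present application;
Fig. 2 is a flowchart of a camera-based text recognition method provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of prompt information in an image displayed by a terminal device, provided by the present application;
Fig. 4 is a flowchart of a camera-based text recognition method provided by an embodiment of the present application;
Fig. 5 is a flowchart of a camera-based text recognition method provided by an embodiment of the present application;
Fig. 6 is a flowchart of a camera-based text recognition method provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a camera-based text recognition apparatus provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application;
Fig. 9 is a schematic structural diagram of another terminal device provided by an embodiment of the present application.
The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the concept of the present application in any way, but to explain the concept of the present application to those skilled in the art by reference to particular embodiments.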
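Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
It should be clear that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
In the description of the present application, it should be understood that the terms "first", "second", "third", and the like are used only to distinguish similar objects; they do not necessarily describe a particular order or sequence, nor should they be understood as indicating or implying relative importance. A person of ordinary skill in the art can understand the specific meanings of these terms in the present application according to the specific situation. In addition, in the description of the present application, unless otherwise stated, "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
The terminal device involved in the present application is provided with a USB interface, a memory, a processor, and a button, and the processor is connected to the USB interface, the memory, and the button respectively. The USB interface has the USB device attribute and is used to connect to a display device. The terminal device is connected to the display device (for example, a laptop computer) through the USB interface. The terminal device may be combined with a USB cable into a single custom USB cable, or may take the form of a USB dongle containing the terminal device combined with a standard USB cable. The terminal device can be powered directly through the USB interface of the display device. One purpose of the memory is to store a specific program needed for image processing, or a downloader for that program. One purpose of the processor is to perform control after loading the specific program, or its downloader, stored in the memory. The button is used to trigger the processor to generate related control instructions; for example, after the button is clicked, the processor receives the operation data sent by the button and generates a corresponding instruction according to that operation data.
With the development of terminal technology, terminal devices have become important tools in daily life. A terminal device may be connected to a camera and capture images through it; alternatively, a camera may be provided on the terminal device itself, so that the terminal device captures images through the camera.
In one example, the terminal device, which may be a product such as a learning machine or a computer, captures images through a camera. The terminal device is placed on a desktop, and an object is placed on the desktop in front of the terminal. The object may be text to be recognized, such as a document or a book, and the terminal device captures an image of the text to be recognized through its front camera. That is, the terminal device can use the camera to capture an image of an object located in front of its screen. This embodiment does not limit the orientation of the object in front of the screen: it may lie flat on the desktop or stand perpendicular to it, as long as it is within the camera's framing range.
Fig. 1 is a schematic diagram of a terminal device displaying an image, provided by the present application. As shown in Fig. 1, the terminal device can use the camera to capture an image of the text to be recognized located in front of its screen; the terminal device then displays the image and performs character recognition on the text to be recognized. As also shown in Fig. 1, a terminal device such as a learning machine that is fixed in place, for example on a school desk, cannot be freely moved or shaken to trigger fast autofocus. This easily leads to abnormal focusing, blurred captured images, and inaccurate text recognition.
The present application provides a camera-based text recognition method, apparatus, device, and storage medium to solve the above problem, as described below.
It should be noted that, for reasons of space, this specification does not exhaustively list all optional implementations. After reading this specification, a person skilled in the art should appreciate that any combination of technical features can constitute an optional implementation, as long as the technical features do not contradict one another. The embodiments are described in detail below.
Fig. 2 is a flowchart of a camera-based text recognition method provided by an embodiment of the present application. As shown in Fig. 2, the method provided by this embodiment can be applied to a terminal device on which a camera is installed. The method includes the following steps: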
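S201. In the current focus mode, capture a first image of the text to be recognized located within the preset camera framing range, and determine a first blur degree of the first image.
The text to be recognized is placed in front of the terminal device and may be a document, a book, or the like bearing characters. The camera on the terminal device captures an image of the text to be recognized according to the preset camera framing range, and the resulting image is the first image of the text to be recognized. For example, the camera is a front camera installed at the top center of the terminal device's screen, and the preset camera framing range may be a semicircle, for example a semicircular region whose diameter lies along the bottom edge of the terminal device's screen and whose radius extends 30 cm in front of the terminal device.
The camera may have multiple focus modes, for example an autofocus mode and a manual focus mode. The focus mode in effect when the camera captures the first image is the current focus mode. For example, if the autofocus mode is set as the default focus mode and the text to be recognized is placed in front of the terminal device, then after the camera starts, the current focus mode at the time the first image is captured is the autofocus mode.
After the first image is obtained, the camera's actual focus mode may be independent of the current focus mode; that is, the focus mode actually in effect may or may not be the current focus mode. For example, the camera captures the first image with the manual focus mode as the current focus mode, and after the capture the manual focus mode automatically reverts to the default autofocus mode. The camera's actual focus mode thus becomes the autofocus mode and differs from the current focus mode. If the focus mode is not changed, the camera's focus mode may also remain the same throughout. For example, the camera stays in the autofocus mode, so the current focus mode and the actual focus mode after the first image is captured are the same.
In this embodiment, determining the first blur degree of the first image includes: obtaining the first blur degree of the first image according to a preset blur determination algorithm.
Specifically, a blur determination algorithm may be set in advance to compute the blur degree of an image. The blur degree of the first image is the first blur degree. The blur degree may equivalently be expressed as the sharpness of the image. The relationship between the blur degree and image sharpness can be defined in the blur determination algorithm; for example, the lower the blur degree, the sharper the image and the higher the text recognition accuracy. The blur determination algorithm may include at least one of the Brenner gradient function, a gray-level difference function, or an entropy function, where the gray-level difference function may be the SMD (Sum of Modulus of gray Difference) function. This embodiment does not specifically limit how the blur determination algorithm is applied.
The blur determination algorithm may output a blur value indicating how blurred the image is. For example, for the first image of one text to be recognized, the blur value of the first blur degree may be 6.0; for the first image of another text to be recognized, the blur value of the first blur degree may be 5.2.
The benefit of this arrangement is that the first blur degree of the first image can be obtained automatically and quickly, so that how blurred or sharp the first image is can be determined without human judgment. This makes it easy to refocus when the first image is blurred, improving focusing accuracy and text recognition efficiency.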
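The three families of functions named above can be sketched in plain Python. This is only an illustrative sketch, not the applicant's implementation: images are assumed to be lists of rows of integer gray levels, and `blur_degree` is a hypothetical mapping (not given in the source) that turns a sharpness score into a value where higher means blurrier.

```python
import math

def brenner(img):
    """Brenner gradient: sum of squared differences between pixels two columns apart."""
    return sum((row[x + 2] - row[x]) ** 2
               for row in img for x in range(len(row) - 2))

def smd(img):
    """SMD: sum of absolute gray-level differences to the left and upper neighbours."""
    total = 0
    for y in range(1, len(img)):
        for x in range(1, len(img[y])):
            total += abs(img[y][x] - img[y][x - 1]) + abs(img[y][x] - img[y - 1][x])
    return total

def entropy(img):
    """Shannon entropy (in bits) of the gray-level histogram."""
    counts = {}
    n = 0
    for row in img:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
            n += 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def blur_degree(img, focus_measure=brenner):
    """Hypothetical mapping (an assumption): a larger sharpness score gives a smaller blur degree."""
    return 1.0 / (1.0 + focus_measure(img))

# Tiny synthetic images: a hard edge, a soft ramp, and a featureless patch.
sharp = [[0, 0, 255, 255]] * 3
soft = [[0, 85, 170, 255]] * 3
flat = [[128, 128, 128, 128]] * 3
```

With these inputs the Brenner score is largest for the hard edge and zero for the featureless patch, so `blur_degree` orders them from least to most blurred. A real device would presumably compute the score over the text region of a camera frame and calibrate the scale of the blur value separately.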
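S202. If it is determined, according to the first blur degree, that the current focus mode satisfies the preset focus mode switching condition, display prompt information on the visual interface, the prompt information being used to prompt the user to issue a trigger instruction for the first image on the visual interface.
A focus mode switching condition is set in advance and can be used to determine whether the camera's current focus mode needs to be switched. For example, the focus mode switching condition may be that, if the first blur degree falls within a preset range of blur values, the current focus mode is determined to satisfy the preset focus mode switching condition.
For example, suppose the current focus mode is the autofocus mode and the first blur degree of the first image is 6.0. If the focus mode switching condition treats an image as blurred when its blur value is greater than or equal to 0.6, it is determined that the autofocus mode needs to be switched to another focus mode.
According to the first blur degree, it can be determined whether the current focus mode satisfies the preset focus mode switching condition. If so, it is determined that the focus mode needs to be switched and the image captured again in the new focus mode. Prompt information may be displayed on the visual interface of the terminal device. The prompt information can be used to prompt the user to issue a trigger instruction for the first image on the visual interface, and the trigger instruction can be used to change the current focus mode. For example, when the current focus mode is the autofocus mode, the prompt information may be a pop-up window on the terminal device's screen containing text that prompts the user to tap the position on the first image that should be in focus, thereby enabling the manual focus mode and achieving precise focusing.
In this embodiment, the prompt information includes a voice and/or animation reminder, and the trigger instruction includes a touch-screen operation.
Specifically, the prompt information may be a voice reminder and/or an animation reminder. For example, the terminal device may play the voice message "Please tap the text position on the screen that you want to focus on", or a shaking hand icon may appear on the first image on the screen, reminding the user to touch any position on the first image to issue the trigger instruction. That is, the trigger instruction may be a touch-screen operation, and the position the user touches is the text position to be focused on. In this embodiment, the trigger instruction may also be, for example, a mouse click.
Fig. 3 is a schematic diagram of prompt information in an image displayed by a terminal device, provided by the present application. The hand in Fig. 3 is an animation reminder that prompts the user to point a finger at the screen and perform a touch-screen operation.
The benefit of this arrangement is that the user is promptly reminded that the focus mode needs to be switched and can specify the position to focus on, which facilitates precise focusing, improves text recognition accuracy, and improves the user experience.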
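S203. In response to the user's trigger instruction, switch the current focus mode to the target focus mode, capture a second image of the text to be recognized located within the preset camera framing range, and perform text recognition on the second image.
The user issues the trigger instruction through the visual interface, for example by tapping any position on the first image on the screen with a finger. In response to the user's trigger instruction, the terminal device switches the current focus mode to the target focus mode, refocuses in the target focus mode, and again obtains an image of the text to be recognized within the camera framing range as the second image.
The target focus mode is set in advance. For example, if the target focus mode is the manual focus mode, the current focus mode is switched to the manual focus mode after the trigger instruction is responded to. It follows that, in this embodiment, the current focus mode is not the manual focus mode; that is, the current focus mode is not the target focus mode. If the current focus mode were the target focus mode, the first image obtained in the current mode and the second image obtained in the target mode could be identical, so the mode switch may be skipped. The comparison between the current focus mode and the target focus mode can be made as part of the focus mode switching condition in S202: if the current focus mode is not the target focus mode, the current focus mode is determined to satisfy the focus mode switching condition.
After the current focus mode is switched to the target focus mode, the camera captures images only in the target focus mode, and the current focus mode is masked. That is, only one focus mode, the target focus mode, is in effect when the camera obtains the second image, which prevents the current focus mode from interfering. For example, when the current focus mode is the autofocus mode and the target focus mode is the manual focus mode, the camera focuses on the text position corresponding to the coordinates at which the user's finger touched the screen. Because the camera is located above the text to be recognized, finger movement easily triggers the camera's autofocus. With the autofocus mode masked, the camera does not autofocus on the user's finger in its view, which reduces manual focus failures and improves the efficiency and accuracy of manual focusing.
After the second image is obtained, the text in the second image can be recognized according to a preset text recognition algorithm to obtain a text recognition result. The text recognition algorithm may be an OCR (Optical Character Recognition) algorithm; this embodiment does not specifically limit the text recognition algorithm.
In this embodiment, text recognition can be performed directly after the second image is obtained, without computing the blur degree of the second image, that is, the second blur degree. If the image is still blurred after the focus adjustment, the cause can be assumed to be that the text to be recognized is itself unclear, or that lighting, a camera fault, or a similar factor is responsible, so there is no need to guide the user through another focus adjustment. If no text can be recognized, an error prompt may be issued on the visual interface, reminding the user to check the text to be recognized, the camera, the environment, and other factors.
The present application captures a first image of the text to be recognized in the current focus mode and obtains a first blur degree of the first image. According to the first blur degree and the preset focus mode switching condition, prompt information can be displayed on the visual interface to prompt the user to issue a trigger instruction for the first image on the visual interface. After responding to the trigger instruction, the terminal device automatically switches the current focus mode to the preset target focus mode, captures again in the target focus mode to obtain a second image, and performs text recognition on the second image. Using the first blur degree and the focus mode switching condition, the present application can determine whether the first image is blurred and change the focus mode accordingly to reduce image blur. When the image is blurred, switching the focus mode masks the current focus mode so that it cannot interfere while the target focus mode is in effect. This solves the prior-art problem that the camera autofocuses on the user's hand while the user is focusing manually, reduces repeated manual focusing by the user, effectively improves focusing accuracy and image sharpness, and thereby improves text recognition accuracy.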
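Fig. 4 is a flowchart of a camera-based text recognition method provided by an embodiment of the present application. This embodiment is an optional embodiment based on the above embodiments, and the method is applied to a mobile terminal.
In this embodiment, determining, according to the first blur degree, that the current focus mode satisfies the preset focus mode switching condition can be refined as: if it is determined that the first blur degree satisfies a preset blur threshold comparison condition, comparing the current focus mode with the preset target focus mode; and if the current focus mode is not the preset target focus mode, determining that the current focus mode satisfies the preset focus mode switching condition.
As shown in Fig. 4, the method includes the following steps:
S401. In the current focus mode, capture a first image of the text to be recognized located within the preset camera framing range, and determine a first blur degree of the first image.
S402. If it is determined that the first blur degree satisfies the preset blur threshold comparison condition, compare the current focus mode with the preset target focus mode.
The focus mode switching condition may include multiple matching conditions. For example, the first blur degree can be used to determine whether the first image is blurred, and it can also be determined whether the current focus mode is already the target focus mode.
A blur threshold comparison condition may be set in advance. After the first blur degree is obtained, it is determined whether the first blur degree satisfies the preset blur threshold comparison condition. For example, it may be determined whether the first blur degree is above or below a preset value, or whether it falls within a preset range of values.
If it is determined that the first blur degree satisfies the preset blur threshold comparison condition, it can then be determined whether the current focus mode is the target focus mode. That is, the current focus mode can be compared with the preset target focus mode as a second check of the focus mode switching condition.
In this embodiment, determining that the first blur degree satisfies the preset blur threshold comparison condition includes: if the first blur degree exceeds a preset blur threshold, determining that the first blur degree satisfies the preset blur threshold comparison condition.
Specifically, a blur threshold may be preset, and the blur threshold comparison condition may be that, when the first blur degree exceeds the blur threshold, the first image is considered blurred and the first blur degree satisfies the blur threshold comparison condition.
After the first blur degree is obtained, it is compared with the blur threshold to determine whether it exceeds the preset blur threshold. If so, the first blur degree is determined to satisfy the preset blur threshold comparison condition; if not, it is determined not to satisfy the condition.
The benefit of this arrangement is that a threshold comparison quickly determines whether the first blur degree satisfies the blur threshold comparison condition. The check is simple and fast, and the blur threshold can be adjusted flexibly, which improves both the accuracy and the flexibility of the blur judgment.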
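In this embodiment, after the first blur degree of the first image is determined, the method further includes: if the first blur degree does not satisfy the preset blur threshold comparison condition, performing text recognition on the first image according to a preset text recognition algorithm.
Specifically, if the first blur degree does not satisfy the blur threshold comparison condition, for example because it does not exceed the preset blur threshold, the first image is determined to be sharp. No focus adjustment is needed, and text recognition can be performed directly. Text recognition is performed on the first image according to the preset text recognition algorithm to obtain a text recognition result.
The benefit of this arrangement is that, when the first image is reasonably sharp, no further conditions are checked and the focus mode is not switched; recognition proceeds directly, which effectively improves text recognition efficiency.
In this embodiment, before the current focus mode is compared with the preset target focus mode upon determining that the first blur degree satisfies the preset blur threshold comparison condition, the method further includes: obtaining a pre-collected sample image set; determining, according to the preset blur determination algorithm, the blur value of any sample image in the sample image set; and determining the blur threshold according to the blur values and a preset blur threshold selection rule.
Specifically, the blur threshold must be determined before the first blur degree is compared with it. In this embodiment, the blur threshold may be determined before S401 or before S402; this embodiment does not specifically limit this.
Multiple sample images are collected in advance as a sample image set, which may include both sharp and blurred images. The sample image set is obtained, and the blur value of each sample image in the set is obtained according to the preset blur determination algorithm.
A blur threshold selection rule is set in advance, and the blur threshold is determined according to the blur values and the selection rule. For example, the rule may take the maximum blur value among the blurred images in the sample image set as the blur threshold. Alternatively, staff may review the sharpness and blur value of each sample image and pick out the blur value corresponding to a fairly blurred sample image as a target blur value. If the sample images whose blur values are at or above the target blur value are all blurred, with blurriness increasing as the value rises, the target blur value can be set as the blur threshold.
The benefit of this arrangement is that the blur threshold is set in advance according to actual needs, which makes the threshold more flexible to determine and makes it easy to judge whether the first image is blurred before subsequent operations, helping to improve text recognition accuracy.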
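The second selection rule described above (find the blur value at or above which every sample is blurred) can be sketched as follows. This is one possible reading only; the function name, the `(blur_value, is_blurry)` input format, and the idea that a reviewer labels each sample are assumptions made for illustration.

```python
def choose_blur_threshold(samples):
    """Pick a blur threshold from reviewer-labelled samples.

    samples: list of (blur_value, is_blurry) pairs (hypothetical format).
    Returns the smallest blur value such that every sample at or above it
    is labelled blurry, or None if the highest-valued sample is not blurry.
    """
    threshold = None
    # Walk from the blurriest sample downwards until a sharp sample is met.
    for value, is_blurry in sorted(samples, reverse=True):
        if not is_blurry:
            break
        threshold = value
    return threshold

labelled = [(1.2, False), (2.5, False), (4.8, True), (6.0, True), (5.2, True)]
threshold = choose_blur_threshold(labelled)
```

For the labelled set above, `threshold` is 4.8. A blurry sample with an unusually low blur value would not pull the threshold down, because the scan stops at the first sharp sample it meets.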
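S403. If the current focus mode is not the preset target focus mode, determine that the current focus mode satisfies the preset focus mode switching condition.
It is determined whether the current focus mode is the preset target focus mode. If it is not, the current focus mode is determined to satisfy the preset focus mode switching condition, and the focus mode needs to be adjusted. For example, if the current focus mode is the default autofocus mode and the target focus mode is the manual focus mode, the current focus mode is not the target focus mode.
In this embodiment, after the current focus mode is compared with the preset target focus mode, the method further includes: if it is determined that the current focus mode is the target focus mode, performing text recognition on the first image according to the preset text recognition algorithm.
Specifically, if the current focus mode is the target focus mode, the first image was already captured in the target focus mode and does not need to be captured in that mode again. Text recognition can therefore be performed on the first image directly, according to the preset text recognition algorithm, to obtain a recognition result.
The benefit of this arrangement is that, if the first image was obtained in the target focus mode, the blur is probably caused by the document itself being unclear, or by lighting, a camera fault, or a similar factor. There is no need to guide the user through another focus adjustment, so recognition can proceed directly, saving a focusing pass and improving text recognition efficiency.
In this embodiment, after it is determined that the current focus mode is the target focus mode, the method further includes: determining the camera's current image capture count for the text to be recognized in the target focus mode; if the current image capture count exceeds a preset count threshold, performing text recognition on the first image according to the preset text recognition algorithm; and if the current image capture count does not exceed the preset count threshold, determining that the current focus mode satisfies the preset focus mode switching condition.
Specifically, a count threshold may be set in advance, representing the maximum number of times the text to be recognized may be captured in the target focus mode. After it is determined that the current focus mode is the target focus mode, the camera's current image capture count for the text to be recognized in the target focus mode can be determined, that is, the number of times the text to be recognized has already been captured in the target focus mode. In this embodiment, the current image capture count may be the number of consecutive captures within a preset time period.
The current image capture count is compared with the count threshold to determine whether it exceeds the threshold. If the current image capture count exceeds the preset count threshold, it is determined that the text to be recognized may no longer be captured in the target focus mode, and text recognition is performed on the first image directly according to the preset text recognition algorithm. If the current image capture count does not exceed the preset count threshold, the current focus mode is determined to satisfy the preset focus mode switching condition, and execution can proceed to S404 and then capture the second image in the target focus mode.
The benefit of this arrangement is that the preset count threshold avoids repeated, useless image captures in the target focus mode and reduces repeated manual focusing by the user, improving focusing efficiency and hence text recognition efficiency.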
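The checks in S402 and S403, together with the optional capture-count check, can be condensed into a single decision function. The sketch below is an interpretation under stated assumptions: the names `Action` and `decide` are hypothetical, the "exceeds the threshold" variant of the blur comparison is used, and the count check applies only when a count threshold is supplied.

```python
from enum import Enum

class Action(Enum):
    RECOGNIZE_FIRST_IMAGE = "recognize_first_image"  # run OCR on the first image
    PROMPT_USER = "prompt_user"                      # switching condition met, go to S404

def decide(first_blur, blur_threshold, current_mode, target_mode,
           capture_count=0, count_threshold=None):
    # S402: an image whose blur degree does not exceed the threshold is treated as sharp.
    if first_blur <= blur_threshold:
        return Action.RECOGNIZE_FIRST_IMAGE
    # S403: blurred and not yet in the target mode, so the switching condition is met.
    if current_mode != target_mode:
        return Action.PROMPT_USER
    # Already in the target mode: retry only while the capture count allows it.
    if count_threshold is not None and capture_count <= count_threshold:
        return Action.PROMPT_USER
    return Action.RECOGNIZE_FIRST_IMAGE
```

For example, `decide(6.5, 6.0, "auto", "manual")` returns `Action.PROMPT_USER`, while `decide(6.5, 6.0, "manual", "manual")` falls through to recognition because the image was already captured in the target mode and no count threshold was given.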
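S404. Display prompt information on the visual interface, the prompt information being used to prompt the user to issue a trigger instruction for the first image on the visual interface.
S405. In response to the user's trigger instruction, switch the current focus mode to the target focus mode, capture a second image of the text to be recognized located within the preset camera framing range, and perform text recognition on the second image.
The embodiment of the present application captures a first image of the text to be recognized in the current focus mode and obtains a first blur degree of the first image. According to the first blur degree and the preset focus mode switching condition, it is determined whether the first blur degree satisfies the blur threshold comparison condition and whether the current focus mode is the target focus mode. This two-stage check improves the accuracy with which the focus mode switching condition is evaluated. If the focus mode switching condition is satisfied, prompt information can be displayed on the visual interface to prompt the user to issue a trigger instruction for the first image on the visual interface. After responding to the trigger instruction, the terminal device automatically switches the current focus mode to the preset target focus mode, captures again in the target focus mode to obtain a second image, and performs text recognition on the second image. Using the first blur degree and the focus mode switching condition, the present application can determine whether the first image is blurred and change the focus mode accordingly to reduce image blur. When the image is blurred, switching the focus mode masks the current focus mode so that it cannot interfere while the target focus mode is in effect. This solves the prior-art problem that the camera autofocuses on the user's hand while the user is focusing manually, reduces repeated manual focusing by the user, effectively improves focusing accuracy and image sharpness, and thereby improves text recognition accuracy.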
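Fig. 5 is a flowchart of a camera-based text recognition method provided by an embodiment of the present application. This embodiment is an optional embodiment based on the above embodiments, and the method is applied to a mobile terminal.
In this embodiment, the current focus mode is the autofocus mode and the target focus mode is the manual focus mode. Accordingly, switching the current focus mode to the target focus mode in response to the user's trigger instruction and capturing the second image of the text to be recognized located within the preset camera framing range can be refined as: in response to a touch-screen operation performed by the user on any coordinate point in the first image on the visual interface, determining a target focus position; and switching the autofocus mode to the manual focus mode and, according to the target focus position, capturing an image of the text to be recognized within the camera framing range to obtain the second image.
As shown in Fig. 5, the method includes the following steps:
S510. In the current focus mode, capture a first image of the text to be recognized located within the preset camera framing range, and determine a first blur degree of the first image; the current focus mode is the autofocus mode.
S520. If it is determined, according to the first blur degree, that the current focus mode satisfies the preset focus mode switching condition, display prompt information on the visual interface, the prompt information being used to prompt the user to issue a trigger instruction for the first image on the visual interface.
S530. In response to a touch-screen operation performed by the user on any coordinate point in the first image on the visual interface, determine the target focus position.
The user sees and/or hears the prompt information on the visual interface and issues the trigger instruction accordingly. For example, the user may tap any coordinate position in the first image on the screen with a finger, and the tapped position is the position to be focused on. In this embodiment, the trigger instruction issued by the user may be a touch-screen operation, that is, the user touches any coordinate point on the first image. In response to the user's touch-screen operation, the terminal device determines the coordinate point the user touched and takes its position as the target focus position.
In this embodiment, determining the target focus position in response to the touch-screen operation performed by the user on any coordinate point in the first image on the visual interface includes: determining, according to the touch-screen operation, a target coordinate position specified by the user; and determining, according to the target coordinate position in the first image, the target focus position in the text to be recognized.
Specifically, the user performs a touch-screen operation on any coordinate point in the first image on the visual interface. For example, the visual interface displays a moving finger that prompts the user to tap the position to be focused on, and the user taps that position on the touch screen. In response to the user's touch-screen operation, the terminal device determines the tapped position as the specified target coordinate position. The first image is an image of the text to be recognized, and each coordinate on the first image corresponds one-to-one to the actual position of a piece of text on the text to be recognized. According to the target coordinate position on the first image, the position on the text to be recognized that needs to be in focus can be determined as the target focus position. That is, the target coordinate position is a position in the first image on the visual interface, whereas the target focus position is an actual position on the text to be recognized. For example, if the user taps the coordinate position of the first character in the first row of the first image, the target focus position is determined to be the position of the first character in the first row of the text to be recognized.
The benefit of this arrangement is that the user can touch the visual interface directly, and the position on the text to be recognized that needs to be in focus is determined directly from that operation. This makes focusing convenient for the user. Linking the target coordinate position to the target focus position improves the accuracy with which the target focus position is determined, which in turn improves focusing efficiency and accuracy and facilitates the subsequent text recognition.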
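The link between the tapped screen coordinate and a position in the first image can be illustrated with a small coordinate conversion. This is a sketch under simplifying assumptions that the source does not state: the first image is shown in an axis-aligned rectangle on the screen, without rotation or mirroring, and the camera accepts a focus point as normalized `(u, v)` coordinates. The function and parameter names are hypothetical.

```python
def tap_to_focus_point(tap_x, tap_y, view_left, view_top, view_width, view_height):
    """Convert a screen tap into normalized image coordinates.

    Returns (u, v) with both values in [0, 1], measured from the top-left
    corner of the displayed first image, or None if the tap is outside it.
    """
    u = (tap_x - view_left) / view_width
    v = (tap_y - view_top) / view_height
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None  # the tap did not land on the first image
    return (u, v)
```

For instance, a tap at pixel (400, 300) on an 800 by 600 preview anchored at the screen origin maps to (0.5, 0.5), the center of the first image. A front camera preview is often mirrored, in which case `u` would need to be flipped before it is handed to the camera.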
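S540. Switch the autofocus mode to the manual focus mode and, according to the target focus position, capture an image of the text to be recognized within the camera framing range to obtain the second image.
After determining the target focus position, the terminal device determines that the focus mode needs to be switched and switches it from the original autofocus mode to the configured manual focus mode. In the autofocus mode the camera focuses on the text to be recognized automatically; in the manual focus mode it focuses on the target focus position specified by the user.
After the focus mode is switched to the manual focus mode, an image of the text to be recognized is captured in the manual focus mode, and the captured image is the second image.
The target focus position is the position to be focused on in the manual focus mode, and the camera focuses on it. After the camera has focused, the terminal device captures an image of the text to be recognized within the camera framing range to obtain the second image. That is, the second image is the image obtained after focusing on the target focus position.
In this embodiment, switching the autofocus mode to the manual focus mode and, according to the target focus position, capturing an image of the text to be recognized within the camera framing range to obtain the second image includes: turning off the autofocus mode and, in the manual focus mode, focusing on the target focus position of the text to be recognized within the camera framing range and capturing an image of the text to be recognized to obtain the second image.
Specifically, when the autofocus mode is switched to the manual focus mode, the camera turns off the autofocus mode and turns on the manual focus mode. That is, while the manual focus mode is in effect, the autofocus mode is masked and the camera focuses only in the manual focus mode.
The camera on the terminal device focuses on the target focus position of the text to be recognized within the framing range using only the manual focus mode. When the user's finger moves across the screen and the camera picks up the moving finger, it does not autofocus on the finger. Once the target focus position is in focus, the image can be captured to obtain the second image, which is an image of the text to be recognized and does not contain the user's finger.
The benefit of this arrangement is that masking the autofocus mode avoids the problem in which, when the user taps the screen, the user's arm happens to be within the camera framing range and accidentally triggers the camera's autofocus, so that the focus lands on the hand by mistake. This effectively prevents a blurred picture, improves focusing accuracy and hence text recognition accuracy, and improves the user experience.
In this embodiment, after the autofocus mode is switched to the manual focus mode, the method further includes: if the duration of the manual focus mode exceeds a preset time threshold, switching the manual focus mode back to the autofocus mode.
Specifically, when the autofocus mode is switched to the manual focus mode, the start time of the manual focus mode is recorded and its duration is tracked in real time. A time threshold is set in advance, which may be the maximum time the manual focus mode is allowed to last.
The measured duration is compared with the time threshold in real time to determine whether it exceeds the threshold. If the duration of the manual focus mode has not exceeded the preset time threshold, the manual focus mode is maintained and the user's focusing operations are responded to. If the duration exceeds the preset time threshold, focusing in the manual focus mode stops, and the manual focus mode is switched back to the default autofocus mode so that the text to be recognized is focused automatically. For example, with a preset time threshold of one second, manual focusing is performed within one second after the user is determined to have tapped the first image on the screen. After one second, the user has finished tapping the screen and the arm has left the camera framing range, so the device can switch back to the autofocus mode automatically. The time threshold can be adjusted as needed; for example, it can be set to a longer time to give the user more time to perform the trigger operation, making focusing easier.
The benefit of this arrangement is that the focus mode is switched automatically: after the manual focus mode has lasted for a certain time, the device returns to the autofocus mode on its own, so that subsequent text to be recognized can be autofocused. This reduces user operations and effectively improves focusing efficiency.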
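The timed fallback from manual focus to autofocus can be sketched as a small state holder. The class below is illustrative only: `FocusController`, `enter_manual`, and `tick` are hypothetical names, the one-second default mirrors the example in the text, and the clock is injectable so the behavior can be exercised without waiting.

```python
import time

class FocusController:
    AUTO = "auto"
    MANUAL = "manual"

    def __init__(self, timeout_s=1.0, clock=time.monotonic):
        self.timeout_s = timeout_s      # preset time threshold
        self.clock = clock              # injectable time source
        self.mode = self.AUTO           # autofocus is the default mode
        self._manual_since = None

    def enter_manual(self):
        """Switch to manual focus and record when it started."""
        self.mode = self.MANUAL
        self._manual_since = self.clock()

    def tick(self):
        """Call periodically; reverts to autofocus once the threshold is exceeded."""
        if self.mode == self.MANUAL and self.clock() - self._manual_since > self.timeout_s:
            self.mode = self.AUTO
            self._manual_since = None
        return self.mode
```

A caller would invoke `enter_manual()` when the user taps the first image and call `tick()` from the preview loop. With a one-second threshold, `tick()` keeps returning `"manual"` until more than one second has elapsed and `"auto"` afterwards.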
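In this embodiment, after the second image of the text to be recognized located within the preset camera framing range is captured, the method further includes: determining a second blur degree of the second image; if the second blur degree satisfies the preset blur threshold comparison condition, determining the camera's current image capture count for the text to be recognized in the target focus mode; and if the current image capture count does not exceed the preset count threshold, displaying prompt information on the visual interface to guide the user to issue a trigger instruction for the second image on the visual interface.
Specifically, after the second image is obtained, text recognition can be performed on it directly. Alternatively, the blur degree of the second image can be determined as the second blur degree, according to the preset blur determination algorithm, for example the Brenner gradient function, a gray-level difference function, or an entropy function.
The second blur degree is compared with the preset blur threshold to determine whether it satisfies the preset blur threshold comparison condition. For example, it is determined whether the second blur degree exceeds the preset blur threshold; if so, the second blur degree is determined to satisfy the preset blur threshold comparison condition. If the second blur degree does not satisfy the condition, the second image is determined to be sharp and text recognition can be performed directly. If the second blur degree satisfies the condition, the camera's current image capture count for the text to be recognized in the target focus mode is then determined.
It is determined whether the current image capture count exceeds the preset count threshold. If so, the device does not switch to the manual focus mode again and performs text recognition on the second image directly. If not, prompt information can be displayed on the visual interface to guide the user to issue another trigger instruction for the second image on the visual interface. In response to the user's trigger instruction, the camera's actual focus mode at that moment is switched to the manual focus mode, a third image of the text to be recognized located within the preset camera framing range is captured, and a third blur degree of the third image is determined. This repeats until the captured image no longer satisfies the preset blur threshold comparison condition (that is, the image is sharp enough) or the current image capture count exceeds the preset count threshold, at which point text recognition is performed.
The benefit of this arrangement is that, once the second blur degree is obtained, it can be determined whether text recognition can be performed on the second image. If not, manual focusing can continue, which ensures image sharpness and improves text recognition accuracy. If the text to be recognized has been captured many times in the manual focus mode and the image is still blurred, the blur is attributed to the document itself being unclear, or to lighting or a camera fault, and the focus mode is no longer switched, which effectively improves focusing and recognition efficiency.
S550. Perform text recognition on the second image.
The embodiment of the present application captures a first image of the text to be recognized in the current focus mode and obtains a first blur degree of the first image. According to the first blur degree and the preset focus mode switching condition, prompt information can be displayed on the visual interface to prompt the user to issue a trigger instruction for the first image on the visual interface. After responding to the trigger instruction, the terminal device determines the target focus position to be focused on, automatically switches the current focus mode to the preset target focus mode, and masks the autofocus mode. According to the target focus position, it captures again in the manual focus mode to obtain a second image and performs text recognition on the second image. Using the first blur degree and the focus mode switching condition, the present application can determine whether the first image is blurred and change the focus mode accordingly to reduce image blur. When the image is blurred, switching the focus mode masks the current focus mode so that it cannot interfere while the target focus mode is in effect. This solves the prior-art problem that the camera autofocuses on the user's hand while the user is focusing manually, reduces repeated manual focusing by the user, effectively improves focusing accuracy and image sharpness, and thereby improves text recognition accuracy.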
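Fig. 6 is a flowchart of a camera-based text recognition method provided by an embodiment of the present application. This embodiment is an optional embodiment based on the above embodiments, and the method is applied to a mobile terminal. As shown in Fig. 6, the method includes the following steps:
S601. Enter the shooting interface.
S602. In the autofocus mode, capture a first image of the text to be recognized located within the preset camera framing range, and determine the text blur degree of the first image.
S603. Evaluate the text blur degree against the blur threshold comparison condition. If the text blur degree satisfies the preset blur threshold comparison condition, perform S604; if it does not, perform S605.
S604. Determine whether an image of the text to be recognized has already been captured in the manual focus mode. If so, perform S605; if not, perform S606.
S605. Perform text recognition.
S606. Display prompt information on the visual interface, the prompt information being used to prompt the user to issue a trigger instruction for the first image on the visual interface.
S607. In response to the user's trigger instruction, switch the autofocus mode to the manual focus mode and capture a second image of the text to be recognized located within the preset camera framing range.
S608. Determine the text blur degree of the second image, and perform S603.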
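Steps S602 to S608 form a small loop, which can be sketched with the camera, blur metric, prompt, and recognizer passed in as callables. Everything about those callables (names, signatures, return types) is an assumption made for illustration; only the control flow follows the steps listed above.

```python
def recognize_with_refocus(capture, blur_of, is_blurry, prompt_and_wait, recognize):
    """Control flow of S602 to S608 with hypothetical, injected callables.

    capture(mode, focus_point) -> image
    blur_of(image) -> blur value
    is_blurry(blur_value) -> bool   (the blur threshold comparison condition)
    prompt_and_wait(image) -> focus point chosen by the user
    recognize(image) -> recognition result
    """
    used_manual = False
    image = capture("auto", None)                      # S602
    while True:
        if is_blurry(blur_of(image)) and not used_manual:   # S603, S604
            focus_point = prompt_and_wait(image)       # S606
            image = capture("manual", focus_point)     # S607
            used_manual = True
            continue                                   # S608, back to S603
        return recognize(image)                        # S605
```

Because S604 allows at most one manual-focus capture, the loop body runs at most twice: a frame that is still blurred after manual focusing is passed to recognition anyway, matching the flow in Fig. 6.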
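The embodiment of the present application captures a first image of the text to be recognized in the current focus mode and obtains a first blur degree of the first image. According to the first blur degree and the preset focus mode switching condition, prompt information can be displayed on the visual interface to prompt the user to issue a trigger instruction for the first image on the visual interface. After responding to the trigger instruction, the terminal device automatically switches the current focus mode to the preset target focus mode, captures again in the target focus mode to obtain a second image, and performs text recognition on the second image. Using the first blur degree and the focus mode switching condition, the present application can determine whether the first image is blurred and change the focus mode accordingly to reduce image blur. When the image is blurred, switching the focus mode masks the current focus mode so that it cannot interfere while the target focus mode is in effect. This solves the prior-art problem that the camera autofocuses on the user's hand while the user is focusing manually, reduces repeated manual focusing by the user, effectively improves focusing accuracy and image sharpness, and thereby improves text recognition accuracy.
Fig. 7 is a schematic structural diagram of a camera-based text recognition apparatus provided by an embodiment of the present application. The apparatus is applied to a terminal device on which a camera is installed, and may be implemented in software, hardware, or a combination of both. As shown in Fig. 7, the apparatus includes a first blur degree determination module 701, a prompt information display module 702, and a text recognition module 703.
The first blur degree determination module 701 is configured to capture, in a current focus mode, a first image of text to be recognized within a preset camera framing range, and determine a first blur degree of the first image;
the prompt information display module 702 is configured to display prompt information on a visual interface if it is determined, according to the first blur degree, that the current focus mode satisfies a preset focus mode switching condition, the prompt information being used to prompt a user to issue a trigger instruction for the first image on the visual interface; and
the text recognition module 703 is configured to, in response to the user's trigger instruction, switch the current focus mode to a target focus mode, capture a second image of the text to be recognized located within the preset camera framing range, and perform text recognition on the second image.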
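Optionally, the prompt information display module 702 includes:
a blur threshold comparison unit, configured to compare the current focus mode with the preset target focus mode if it is determined that the first blur degree satisfies a preset blur threshold comparison condition; and
a target focus mode comparison unit, configured to determine that the current focus mode satisfies the preset focus mode switching condition if the current focus mode is not the preset target focus mode.
Optionally, the blur threshold comparison unit is specifically configured to:
determine that the first blur degree satisfies the preset blur threshold comparison condition if the first blur degree exceeds a preset blur threshold.
Optionally, the apparatus further includes:
a first image recognition module, configured to, after the first blur degree of the first image is determined, perform text recognition on the first image according to a preset text recognition algorithm if the first blur degree does not satisfy the preset blur threshold comparison condition.
Optionally, the apparatus further includes:
a sample image set acquisition module, configured to obtain a pre-collected sample image set before the current focus mode is compared with the preset target focus mode upon determining that the first blur degree satisfies the preset blur threshold comparison condition;
a blur value determination module, configured to determine, according to a preset blur determination algorithm, the blur value of any sample image in the sample image set; and
a blur threshold determination module, configured to determine the blur threshold according to the blur values and a preset blur threshold selection rule.
Optionally, the apparatus further includes:
a first image text recognition module, configured to, after the current focus mode is compared with the preset target focus mode, perform text recognition on the first image according to the preset text recognition algorithm if it is determined that the current focus mode is the target focus mode.
Optionally, the apparatus further includes:
a current image capture count determination module, configured to determine, after it is determined that the current focus mode is the target focus mode, the camera's current image capture count for the text to be recognized in the target focus mode;
a count comparison module, configured to perform text recognition on the first image according to the preset text recognition algorithm if the current image capture count exceeds a preset count threshold; and
a focus mode switching condition satisfaction module, configured to determine that the current focus mode satisfies the preset focus mode switching condition if the current image capture count does not exceed the preset count threshold.
Optionally, the prompt information includes a voice and/or animation reminder, and the trigger instruction includes a touch-screen operation.
Optionally, the current focus mode is the autofocus mode and the target focus mode is the manual focus mode;
accordingly, the text recognition module 703 includes:
a target focus position determination unit, configured to determine a target focus position in response to a touch-screen operation performed by the user on any coordinate point in the first image on the visual interface; and
a second image obtaining unit, configured to switch the autofocus mode to the manual focus mode and, according to the target focus position, capture an image of the text to be recognized within the camera framing range to obtain the second image.
Optionally, the target focus position determination unit is specifically configured to:
determine, according to the touch-screen operation performed by the user on any coordinate point in the first image on the visual interface, a target coordinate position specified by the user; and
determine, according to the target coordinate position in the first image, the target focus position in the text to be recognized.
Optionally, the second image obtaining unit is specifically configured to:
turn off the autofocus mode and, in the manual focus mode, focus on the target focus position of the text to be recognized within the camera framing range and capture an image of the text to be recognized to obtain the second image.
Optionally, the apparatus further includes:
a manual focus mode switching module, configured to, after the autofocus mode is switched to the manual focus mode, switch the manual focus mode back to the autofocus mode if the duration of the manual focus mode exceeds a preset time threshold.
Optionally, the apparatus further includes:
a second blur degree determination module, configured to determine a second blur degree of the second image after the second image of the text to be recognized located within the preset camera framing range is captured;
a capture count determination module, configured to determine the camera's current image capture count for the text to be recognized in the target focus mode if the second blur degree satisfies the preset blur threshold comparison condition; and
a second image trigger module, configured to display prompt information on the visual interface if the current image capture count does not exceed the preset count threshold, so as to guide the user to issue a trigger instruction for the second image on the visual interface.
Optionally, the first blur degree determination module 701 is specifically configured to:
obtain the first blur degree of the first image according to a preset blur determination algorithm.
Optionally, the blur determination algorithm includes at least one of the Brenner gradient function, a gray-level difference function, or an entropy function.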
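The embodiment of the present application captures a first image of the text to be recognized in the current focus mode and obtains a first blur degree of the first image. According to the first blur degree and the preset focus mode switching condition, prompt information can be displayed on the visual interface to prompt the user to issue a trigger instruction for the first image on the visual interface. After responding to the trigger instruction, the terminal device automatically switches the current focus mode to the preset target focus mode, captures again in the target focus mode to obtain a second image, and performs text recognition on the second image. Using the first blur degree and the focus mode switching condition, the present application can determine whether the first image is blurred and change the focus mode accordingly to reduce image blur. When the image is blurred, switching the focus mode masks the current focus mode so that it cannot interfere while the target focus mode is in effect. This solves the prior-art problem that the camera autofocuses on the user's hand while the user is focusing manually, reduces repeated manual focusing by the user, effectively improves focusing accuracy and image sharpness, and thereby improves text recognition accuracy.
Fig. 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. As shown in Fig. 8, the terminal device may include a processor 81 and a memory 82, where the memory 82 stores a computer program adapted to be loaded by the processor 81 to execute the above method steps. The terminal device may further include a transmitter 83 and a receiver 84.
In one example, the terminal device does not have an ISP function. In another example, the processor of the terminal device has an ISP function, or the terminal device further includes an ISP chip.
An embodiment of the present application further provides a computer storage medium. The computer storage medium may store a plurality of instructions adapted to be loaded by a processor to execute the method steps of the above embodiments. For the specific execution process, refer to the detailed description of the above embodiments, which is not repeated here.
The device in which the storage medium is located may be a camera or a terminal device.
Fig. 9 is a schematic structural diagram of another terminal device provided by an embodiment of the present application. As shown in Fig. 9, the terminal device 1000 may include at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
The communication bus 1002 is used to implement connection and communication between these components.
The user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
The processor 1001 may include one or more processing cores. The processor 1001 connects the various parts of the entire terminal device 1000 using various interfaces and lines, and performs the various functions of the terminal device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005 and invoking data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form among digital signal processing (Digital Signal Processing, DSP), a field-programmable gate array (Field-Programmable Gate Array, FPGA), and a programmable logic array (Programmable Logic Array, PLA). The processor 1001 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and so on; the GPU is responsible for rendering and drawing the content to be shown on the display screen; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 1001 and may instead be implemented as a separate chip.
The memory 1005 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the above method embodiments, and so on; the data storage area may store the data involved in the above method embodiments. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in Fig. 9, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an operating application program of the terminal device.
In the terminal device 1000 shown in Fig. 9, the user interface 1003 is mainly used to provide an input interface for the user and obtain the data the user enters, while the processor 1001 may be used to invoke the operating application program of the terminal device stored in the memory 1005 and specifically execute the method provided by the above embodiments.
An embodiment of the present application further provides a computer program product including a computer program stored in a readable storage medium. At least one processor of a terminal device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the terminal device carries out the solution provided by any of the above embodiments.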
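Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, and the instruction means implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.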

Claims (19)

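  1. A camera-based text recognition method, characterized in that the method is applied to a terminal device and comprises:
    in a current focus mode, capturing a first image of text to be recognized located within a preset camera framing range, and determining a first blur degree of the first image;
    if it is determined, according to the first blur degree, that the current focus mode satisfies a preset focus mode switching condition, displaying prompt information on a visual interface, the prompt information being used to prompt a user to issue a trigger instruction for the first image on the visual interface; and
    in response to the user's trigger instruction, switching the current focus mode to a target focus mode, capturing a second image of the text to be recognized located within the preset camera framing range, and performing text recognition on the second image.
  2. The method according to claim 1, characterized in that determining, according to the first blur degree, that the current focus mode satisfies the preset focus mode switching condition comprises:
    if it is determined that the first blur degree satisfies a preset blur threshold comparison condition, comparing the current focus mode with a preset target focus mode; and
    if the current focus mode is not the preset target focus mode, determining that the current focus mode satisfies the preset focus mode switching condition.
  3. The method according to claim 2, characterized in that determining that the first blur degree satisfies the preset blur threshold comparison condition comprises:
    if the first blur degree exceeds a preset blur threshold, determining that the first blur degree satisfies the preset blur threshold comparison condition.
  4. The method according to claim 2, characterized in that, after determining the first blur degree of the first image, the method further comprises:
    if the first blur degree does not satisfy the preset blur threshold comparison condition, performing text recognition on the first image according to a preset text recognition algorithm.
  5. The method according to claim 2, characterized in that, before the current focus mode is compared with the preset target focus mode upon determining that the first blur degree satisfies the preset blur threshold comparison condition, the method further comprises:
    obtaining a pre-collected sample image set;
    determining, according to a preset blur determination algorithm, the blur value of any sample image in the sample image set; and
    determining the blur threshold according to the blur values and a preset blur threshold selection rule.
  6. The method according to claim 2, characterized in that, after the current focus mode is compared with the preset target focus mode, the method further comprises:
    if it is determined that the current focus mode is the target focus mode, performing text recognition on the first image according to a preset text recognition algorithm.
  7. The method according to claim 6, characterized in that, after determining that the current focus mode is the target focus mode, the method further comprises:
    determining the camera's current image capture count for the text to be recognized in the target focus mode;
    if the current image capture count exceeds a preset count threshold, performing text recognition on the first image according to the preset text recognition algorithm; and
    if the current image capture count does not exceed the preset count threshold, determining that the current focus mode satisfies the preset focus mode switching condition.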
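  8. The method according to any one of claims 1-7, characterized in that the prompt information includes a voice and/or animation reminder, and the trigger instruction includes a touch-screen operation.
  9. The method according to any one of claims 1-8, characterized in that the current focus mode is an autofocus mode and the target focus mode is a manual focus mode;
    accordingly, switching the current focus mode to the target focus mode in response to the user's trigger instruction and capturing the second image of the text to be recognized located within the preset camera framing range comprises:
    in response to a touch-screen operation performed by the user on any coordinate point in the first image on the visual interface, determining a target focus position; and
    switching the autofocus mode to the manual focus mode and, according to the target focus position, capturing an image of the text to be recognized within the camera framing range to obtain the second image.
  10. The method according to claim 9, characterized in that determining the target focus position in response to the touch-screen operation performed by the user on any coordinate point in the first image on the visual interface comprises:
    determining, according to the touch-screen operation performed by the user on any coordinate point in the first image on the visual interface, a target coordinate position specified by the user; and
    determining, according to the target coordinate position in the first image, the target focus position in the text to be recognized.
  11. The method according to claim 9, characterized in that switching the autofocus mode to the manual focus mode and, according to the target focus position, capturing an image of the text to be recognized within the camera framing range to obtain the second image comprises:
    turning off the autofocus mode and, in the manual focus mode, focusing on the target focus position of the text to be recognized within the camera framing range and capturing an image of the text to be recognized to obtain the second image.
  12. The method according to claim 9, characterized in that, after switching the autofocus mode to the manual focus mode, the method further comprises:
    if the duration of the manual focus mode exceeds a preset time threshold, switching the manual focus mode to the autofocus mode.
  13. The method according to any one of claims 1-12, characterized in that, after capturing the second image of the text to be recognized located within the preset camera framing range, the method further comprises:
    determining a second blur degree of the second image;
    if the second blur degree satisfies the preset blur threshold comparison condition, determining the camera's current image capture count for the text to be recognized in the target focus mode; and
    if the current image capture count does not exceed the preset count threshold, displaying prompt information on the visual interface to guide the user to issue a trigger instruction for the second image on the visual interface.
  14. The method according to any one of claims 1-13, characterized in that determining the first blur degree of the first image comprises:
    obtaining the first blur degree of the first image according to a preset blur determination algorithm.
  15. The method according to claim 14, characterized in that the blur determination algorithm includes at least one of the Brenner gradient function, a gray-level difference function, or an entropy function.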
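  16. A camera-based text recognition apparatus, characterized in that the apparatus is configured on a terminal device and comprises:
    a first blur degree determination module, configured to capture, in a current focus mode, a first image of text to be recognized within a preset camera framing range, and determine a first blur degree of the first image;
    a prompt information display module, configured to display prompt information on a visual interface if it is determined, according to the first blur degree, that the current focus mode satisfies a preset focus mode switching condition, the prompt information being used to prompt a user to issue a trigger instruction for the first image on the visual interface; and
    a text recognition module, configured to, in response to the user's trigger instruction, switch the current focus mode to a target focus mode, capture a second image of the text to be recognized located within the preset camera framing range, and perform text recognition on the second image.
  17. An electronic device, characterized by comprising: a processor, and a memory communicatively connected to the processor;
    the memory stores computer-executable instructions; and
    the processor executes the computer-executable instructions stored in the memory to implement the method according to any one of claims 1-15.
  18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 15.
  19. A computer program product, characterized by comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-15.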
PCT/CN2023/083265 2022-04-08 2023-03-23 Camera-based text recognition method, apparatus, device, and storage medium WO2023193607A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210366298.2 2022-04-08
CN202210366298.2A CN116935391A (zh) 2022-04-08 2022-04-08 Camera-based text recognition method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023193607A1 true WO2023193607A1 (zh) 2023-10-12

Family

ID=88243982

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083265 WO2023193607A1 (zh) 2022-04-08 2023-03-23 Camera-based text recognition method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN116935391A (zh)
WO (1) WO2023193607A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
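CN104601879A (zh) * 2014-11-29 2015-05-06 深圳市金立通信设备有限公司 Focusing method
CN105704378A (zh) * 2016-02-29 2016-06-22 广东欧珀移动通信有限公司 Control method, control apparatus, and electronic apparatus
CN111970437A (zh) * 2020-08-03 2020-11-20 广东小天才科技有限公司 Text photographing method, wearable device, and storage medium
CN112312016A (zh) * 2020-10-28 2021-02-02 维沃移动通信有限公司 Photographing processing method and apparatus, electronic device, and readable storage medium
CN112822391A (zh) * 2020-07-28 2021-05-18 腾讯科技(深圳)有限公司 Focus mode control method, apparatus, device, and computer-readable storage medium
CN113132620A (zh) * 2019-12-31 2021-07-16 华为技术有限公司 Image capturing method and related apparatus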

Also Published As

Publication number Publication date
CN116935391A (zh) 2023-10-24

Similar Documents

Publication Publication Date Title
US11635876B2 (en) Devices, methods, and graphical user interfaces for moving a current focus using a touch-sensitive remote control
US11488406B2 (en) Text detection using global geometry estimators
AU2019338180B2 (en) User interfaces for simulated depth effects
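KR102054633B1 (ko) Devices, methods, and graphical user interfaces for wirelessly pairing with peripheral devices and displaying status information about the peripheral devices
JP7032572B2 (ja) Devices, methods, and graphical user interfaces for managing authentication credentials for user accounts
CN110795018B (zh) Device, method, and graphical user interface for switching between camera interfaces
CN111698230B (zh) Related communication mode selection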
US20120174029A1 (en) Dynamically magnifying logical segments of a view
US20130159903A1 (en) Method of displaying graphic user interface using time difference and terminal supporting the same
WO2020055613A1 (en) User interfaces for simulated depth effects
US9930287B2 (en) Virtual noticeboard user interaction
WO2023193607A1 (zh) Camera-based text recognition method, apparatus, device, and storage medium
EP3407174B1 (en) Method and apparatus for operating a plurality of objects on pressure touch-control screen
CN113110770B (zh) Control method and apparatus
CA3003002A1 (en) Systems and methods for using image searching with voice recognition commands

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23784171

Country of ref document: EP

Kind code of ref document: A1