WO2020097909A1 - Text detection method, device and storage medium - Google Patents

Text detection method, device and storage medium

Info

Publication number
WO2020097909A1
Authority
WO
WIPO (PCT)
Prior art keywords
detection frame
text
detection
image
frame
Prior art date
Application number
PCT/CN2018/115874
Other languages
English (en)
French (fr)
Inventor
柯福全
王喜顺
王俊
Original Assignee
北京比特大陆科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京比特大陆科技有限公司
Priority to PCT/CN2018/115874 (WO2020097909A1)
Priority to CN201880098360.6A (CN112789623B)
Publication of WO2020097909A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Definitions

  • Embodiments of the present invention relate to the field of image processing technology, and in particular, to a text detection method, device, and storage medium.
  • the smart terminal can recognize the text contained in the image, and then convert the text in the image into editable text according to the recognition result, so as to realize the secondary editing and quick sharing of the text information in the image.
  • Text detection is a prerequisite step for text recognition. Text detection is used to determine where the text is in the image.
  • the current detection methods can be divided into two categories: one is single-character detection followed by merging of the detection frames; the other is detection-frame regression, which outputs many candidate rectangular frames through a neural network and then applies non-maximum suppression to these candidates to select the final detection frame.
  • the annotation workload for single-character detection is very heavy, making it difficult to obtain large-scale training data.
  • the rectangular frames selected by detection-frame regression either have intersecting regions or fail to completely cover the original text region, resulting in false detections or missed detections.
  • the text detection method, device and storage medium provided by the embodiments of the present invention improve the accuracy of acquiring the text detection frame.
  • the present invention provides the following technical solutions:
  • a first aspect of the present invention provides a text detection method, including:
  • if the first detection frame satisfies the preset cutting condition, the first detection frame is cut to obtain a second detection frame;
  • the image corresponding to the second detection frame is used as the text detection result.
  • the neural network model is obtained by training a convolutional neural network with a U-Net structure on image data annotated with text ground-truth boxes.
  • the acquiring the first detection frame of the text area based on the mask image includes:
  • cutting the first detection frame to obtain a second detection frame includes:
  • if the ratio of the area of the outer contour to the area of the first detection frame is less than a preset ratio, and the aspect ratio of the first detection frame is greater than a preset aspect ratio, the first detection frame is cut to obtain the second detection frame.
  • the cutting the first detection frame to obtain a second detection frame includes:
  • the using the image corresponding to the second detection frame as a text detection result includes:
  • the image corresponding to the adjusted second detection frame is used as the text detection result.
  • the adjusting the position of the cutting point includes:
  • a new cutting point position is determined according to the average gradient curve.
  • the determining a new cutting point location according to the average gradient curve includes:
  • the position of the first image corresponding to the smallest average gradient value in the average gradient curve is used as the new cutting point position.
  • a second aspect of the present invention provides a text detection device, including:
  • the acquisition module is used to acquire the mask image including the text area in the target image through the neural network model;
  • the acquiring module is further configured to acquire the first detection frame of the text area based on the mask image;
  • a cutting module configured to cut the first detection frame to obtain a second detection frame if the first detection frame meets a preset cutting condition
  • the determination module is configured to use the image corresponding to the second detection frame as a text detection result.
  • a third aspect of the present invention provides a text detection device, including:
  • the computer program is stored in the memory, and is configured to be executed by the processor to implement the text detection method according to any one of the first aspects of the present invention.
  • a fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, which is executed by a processor to implement the text detection method according to any one of the first aspects of the present invention.
  • Embodiments of the present invention provide a text detection method, device, and storage medium.
  • a neural network model is used to obtain a mask image including a text area in a target image; a first detection frame of the text area is obtained based on the mask image; if the first detection frame If the preset cutting conditions are met, the first detection frame is cut to obtain a second detection frame; the image corresponding to the second detection frame is used as the text detection result.
  • the above text detection method can be used to process long text boxes and curved text boxes, which improves the accuracy of acquiring text detection boxes.
  • FIG. 1 is a schematic flowchart of a text detection method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a target image provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a mask diagram corresponding to a target image provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an outer contour of a white area in a mask diagram provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a first detection frame of a target image provided by an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a second detection frame after cutting according to an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of a text detection method according to another embodiment of the present invention.
  • FIG. 8 is a schematic diagram of adjusting the position of the cutting point of the second detection frame according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a text detection device according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a hardware structure of a text detection device according to an embodiment of the present invention.
  • the "and / or” in the present invention describes the association relationship of the associated objects, indicating that there can be three relationships, for example, A and / or B, which can mean: there are A alone, A and B exist simultaneously, and B alone exists Kind of situation.
  • the character "/" generally indicates that the related object is a "or" relationship.
  • "One embodiment" or "another embodiment" mentioned throughout the specification of the present invention means that a specific feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, "in some embodiments" or "in this embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. It should be noted that the embodiments of the present invention and the features in the embodiments can be combined with each other as long as they do not conflict.
  • the text detection method provided by the embodiment of the present invention specifically proposes a new detection frame generation method. After generating a mask image of text through a neural network model, image processing is performed based on the mask image to determine the final text detection frame. Use the image corresponding to the text detection frame as the final text detection result for subsequent text recognition and other processing.
  • the text detection method provided in this embodiment can process long text boxes and curved text boxes, and has higher detection accuracy.
  • FIG. 1 is a schematic flowchart of a text detection method provided by an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a target image provided by an embodiment of the present invention
  • FIG. 3 is a mask diagram corresponding to a target image provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an outer contour of a white area in a mask diagram provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a first detection frame of a target image provided by an embodiment of the present invention;
  • the text detection method provided in this embodiment includes the following steps:
  • the target image in this embodiment is a color or black-and-white image captured by a user through a smart terminal, and the image includes text information.
  • the target image includes cartoon characters and text description content, as shown in FIG. 2.
  • the text information in the image may be bent and deformed.
  • for example, when a user photographs a children's picture book, the book itself may not lie flat, so the text information in the captured target image is bent and deformed.
  • the text detection method provided in this embodiment can accurately confirm the deformed text area in the image.
  • the neural network model in this embodiment is obtained by training convolutional neural network U-Net structure on the image data marked with the text truth box.
  • the training process is as follows:
  • the U-Net structure in this embodiment actually solves a binary classification problem.
  • the text frame of the sample image is a positive sample, and the background is a negative sample. Because the samples are not balanced, the neural network model is trained using dice loss as the loss function.
  • U-Net is a variant of the convolutional neural network, and its structure resembles the letter U, hence the name U-Net.
  • U-Net improves on the FCN (Fully Convolutional Network) and, with data augmentation, can be trained on relatively small datasets.
  • the entire neural network is mainly composed of two parts: a contracting path and an expanding path.
  • the contracting path is mainly used to capture context information in the target image, while the symmetric expanding path precisely localizes the parts of the target image that need to be segmented.
  • the mask image including the text region in the target image can be obtained; in other words, the multiple candidate regions corresponding to the text information in the target image, together with their positions in the target image, can be obtained.
  • the mask image is a picture composed of black and white.
  • the black area of the mask image is a non-text area in the target image
  • the white area of the mask image is a text area in the target image.
  • the oblique lines in the figure represent the black area of the mask image, that is, the non-text area in the target image
  • the white area is the text area in the target image.
  • the solid black rectangular frame shown in FIG. 5 is the first detection frame of the text area. It should be noted that the first detection frame is the initial detection frame of the text area in the target image.
  • the acquired first detection frames may have intersecting regions.
  • the acquired first detection frame may not cover the original text area, or may include too many non-text areas.
  • further image processing is performed on the acquired first detection frame to obtain a more accurate detection frame. For details, refer to S103.
  • the preset cutting conditions include a first preset cutting condition and a second preset cutting condition. Only when the first detection frame satisfies both the first preset cutting condition and the second preset cutting condition is the first detection frame cut.
  • in that case, the first detection frame is cut to obtain the second detection frame.
  • if the text information in the target image is deformed, the area of the outer contour of the text region extracted in S102 is necessarily smaller than the area of the fitted first detection frame.
  • for the first detection frame '0', the ratio of the area of its outer contour to the area of the detection frame is 0.6, which is less than the preset ratio (e.g., 0.8), so the first detection frame '0' satisfies the first preset cutting condition; in addition, the size of the first detection frame '0' is 24*2, i.e., 24 px in the length direction and 2 px in the width direction, and with a preset aspect ratio of 8 it can be determined that the first detection frame '0' has an aspect ratio of 12, which is greater than the preset aspect ratio of 8. At this point the first detection frame '0' also satisfies the second preset cutting condition, so the first detection frame '0' needs to be cut. Similarly, based on the above preset cutting conditions, it is determined that the first detection frame '1' needs to be cut.
  • the size of the first detection frame '2' in FIG. 5 is 28*2 and its aspect ratio is 14, which is greater than the preset aspect ratio of 8.
  • however, the ratio of the area of its outer contour to the area of the first detection frame '2' is 0.9, which is greater than the preset ratio of 0.8, indicating that this detection frame already fully covers the text region in the target image; when only the second preset cutting condition is satisfied, no further cutting of the detection frame is performed.
  • besides the first detection frame '2' in FIG. 5, there is also a possible case in which a first detection frame satisfies the first preset cutting condition but not the second preset cutting condition; in this case, too, no further cutting of the detection frame is performed.
  • cutting the first detection frame means dividing it in equal proportions according to a preset aspect ratio to obtain at least two second detection frames.
  • for example, the size of the first detection frame '0' in FIG. 5 is 24*2 and the preset aspect ratio is 8, so the first detection frame '0' is cut proportionally into two parts with sizes 16*2 and 8*2, yielding the second detection frames '3' and '4', as shown in FIG. 6.
  • the image corresponding to the second detection frame is used as the text detection result for subsequent text recognition and other processing.
  • compared with the prior art, the detection frame obtained through the above process is more accurate; unnecessary background image content is eliminated, reducing the computational workload of subsequent text recognition.
  • the text detection method obtains a mask image of a text region in a target image through a neural network model; obtains a first detection frame of the text region based on the mask image; cuts the first detection frame to obtain a second detection frame if the first detection frame satisfies the preset cutting condition; and uses the image corresponding to the second detection frame as the text detection result.
  • the above text detection method can be used to process long text boxes and curved text boxes, which improves the accuracy of acquiring text detection boxes.
  • the text detection method provided in this embodiment mainly solves a problem that arises when cutting the first detection frame in the above embodiments.
  • when the first detection frame is cut according to the proportional cutting method, the line connecting the cutting points of the detection frames may fall on text in the target image, which would cause text recognition to fail; therefore the position of the cutting points needs to be adjusted.
  • FIG. 7 is a schematic flowchart of a text detection method according to another embodiment of the present invention
  • FIG. 8 is a schematic diagram of adjusting the position of a cutting point of a second detection frame according to an embodiment of the present invention.
  • the text detection method provided in this embodiment includes the following steps:
  • S201-S203 in this embodiment are the same as S101-S103 in the above embodiments, and the implementation principles and technical effects are the same. For details, refer to the above embodiments, and details are not described here.
  • the position of the cutting point needs to be adjusted.
  • the specific adjustment rules are as follows:
  • the position of the first image corresponding to the smallest average gradient value in the average gradient curve is taken as the new cutting point position.
  • the adjustment process adjusts the positions of the two edges on the left and right sides of the cutting position, and includes the following steps:
  • the image on the original target image corresponding to the position-adjustment rectangular frame (i.e., the first image) is intercepted and scaled to a preset height; for example, if the original image height is 8 px, an image with a height of 32 px is obtained after scaling.
  • based on the scaled image, the gradient map of the image is computed; for example, a small window with a height of 32 px and a width of 4 px slides along the horizontal direction of the image to compute the average gradient at all positions of the image.
  • the average gradient at a position equals the sum of the gradients of the pixels inside the sliding window at that position divided by the number of pixels in the sliding window.
  • after determining the updated cutting point position, the adjusted second detection frame is obtained, and the image corresponding to the adjusted second detection frame is used as the text detection result.
  • the adjusted second detection frame obtained by the text detection method provided in this embodiment does not have the problem of cutting characters, which improves the accuracy of text detection.
  • the text detection method obtains a mask image of a text region in a target image through a neural network model; obtains a first detection frame of the text region based on the mask image; cuts the first detection frame to obtain a second detection frame if the first detection frame satisfies the preset cutting condition; adjusts the position of the cutting points when it is determined that the line connecting the cutting points of the second detection frame cuts through text; and uses the image corresponding to the adjusted second detection frame as the text detection result.
  • the text detection method of this embodiment has higher text detection accuracy than the above embodiments.
  • An embodiment of the present invention also provides a text detection device. As shown in FIG. 9, the embodiment of the present invention only uses FIG. 9 as an example for description, and does not mean that the present invention is limited to this.
  • FIG. 9 is a schematic structural diagram of a text detection device according to an embodiment of the present invention. As shown in FIG. 9, the text detection device 30 provided in this embodiment includes:
  • the obtaining module 31 is used to obtain a mask image including a text area in the target image through a neural network model
  • the acquisition module 31 is further configured to acquire the first detection frame of the text region based on the mask image;
  • a cutting module 33 configured to cut the first detection frame to obtain a second detection frame if the first detection frame meets the preset cutting conditions
  • the determination module 34 is configured to use the image corresponding to the second detection frame as a text detection result.
  • the text detection device includes an acquisition module, a cutting module, and a determination module: the acquisition module is configured to obtain, through a neural network model, a mask image including a text region in a target image and to obtain a first detection frame of the text region based on the mask image; if the first detection frame satisfies the preset cutting condition, the cutting module cuts the first detection frame to obtain a second detection frame; and the determination module uses the image corresponding to the second detection frame as the text detection result.
  • the above text detection device can be used to process long text boxes and curved text boxes, which improves the accuracy of acquiring text detection boxes.
  • the neural network model is obtained by training a convolutional neural network with a U-Net structure on image data annotated with text ground-truth boxes.
  • the obtaining module 31 is specifically used to:
  • the cutting module 33 is specifically used for:
  • if the ratio of the area of the outer contour to the area of the first detection frame is less than the preset ratio, and the aspect ratio of the first detection frame is greater than the preset aspect ratio, the first detection frame is cut to obtain the second detection frame.
  • the cutting module 33 is specifically used for:
  • the determination module 34 is specifically used to:
  • the image corresponding to the adjusted second detection frame is used as the text detection result.
  • the adjusting the position of the cutting point includes:
  • a new cutting point position is determined according to the average gradient curve.
  • the determining a new cutting point location according to the average gradient curve includes:
  • the position of the first image corresponding to the smallest average gradient value in the average gradient curve is used as the new cutting point position.
  • the text detection device provided in this embodiment can execute the technical solutions of the foregoing method embodiments, and its implementation principles and technical effects are similar, and will not be repeated here.
  • An embodiment of the present invention also provides a text detection device. As shown in FIG. 10, the embodiment of the present invention only uses FIG. 10 as an example for description, and does not mean that the present invention is limited to this.
  • FIG. 10 is a schematic diagram of a hardware structure of a text detection device provided by an embodiment of the present invention. As shown in FIG. 10, the text detection device 40 provided by this embodiment includes:
  • the computer program is stored in the memory 41 and is configured to be executed by the processor 42 to implement the technical solution of any one of the foregoing method embodiments.
  • the implementation principles and technical effects are similar, and are not repeated here.
  • the memory 41 may be independent or integrated with the processor 42.
  • the text detection apparatus 40 further includes:
  • the bus 43 is used to connect the memory 41 and the processor 42.
  • An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, and the computer program is executed by the processor 42 to implement various steps performed by the text detection apparatus 40 in the above method embodiment.
  • the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in conjunction with the invention can be directly embodied and executed by a hardware processor, or can be executed and completed by a combination of hardware and software modules in the processor.
  • the memory may include high-speed RAM and may also include non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, or an optical disc.
  • the bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on.
  • the bus in the drawings of this application is not limited to only one bus or one type of bus.
  • the above storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc.
  • the storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
  • An exemplary storage medium is coupled to the processor so that the processor can read information from the storage medium and can write information to the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and the storage medium may be located in an application-specific integrated circuit (ASIC).
  • the processor and the storage medium may also exist as discrete components in the electronic device or the main control device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The present invention provides a text detection method, device, and storage medium. A mask image including a text region in a target image is obtained through a neural network model; a first detection frame of the text region is obtained based on the mask image; if the first detection frame satisfies a preset cutting condition, the first detection frame is cut to obtain a second detection frame; and the image corresponding to the second detection frame is used as the text detection result. The above text detection method can handle long text boxes and curved text boxes, improving the accuracy of obtaining text detection frames.

Description

Text detection method, device and storage medium
Technical Field
Embodiments of the present invention relate to the field of image processing technology, and in particular to a text detection method, device, and storage medium.
Background Art
With the development of communication technology, users can conveniently capture images of interest through a smart terminal and obtain the text information contained in those images. The smart terminal can recognize the text contained in an image and, based on the recognition result, convert the text in the image into editable text, enabling secondary editing and quick sharing of the text information in the image.
Text detection is a prerequisite step for text recognition; it determines the region of the image in which the text is located. Current detection methods fall into two categories: one is single-character detection followed by merging of the detection frames; the other is detection-frame regression, which outputs many candidate rectangular frames through a neural network and then applies non-maximum suppression to these candidates to select the final detection frame.
The annotation workload for single-character detection is very heavy, making it difficult to obtain large-scale training data. The rectangular frames selected by detection-frame regression either have intersecting regions or fail to completely cover the original text region, resulting in false detections or missed detections.
Summary of the Invention
The text detection method, device, and storage medium provided by the embodiments of the present invention improve the accuracy of obtaining text detection frames.
To achieve the above purpose, the present invention provides the following technical solutions:
A first aspect of the present invention provides a text detection method, including:
obtaining, through a neural network model, a mask image including a text region in a target image;
obtaining a first detection frame of the text region based on the mask image;
if the first detection frame satisfies a preset cutting condition, cutting the first detection frame to obtain a second detection frame; and
using the image corresponding to the second detection frame as the text detection result.
In a possible implementation, the neural network model is obtained by training a convolutional neural network with a U-Net structure on image data annotated with text ground-truth boxes.
In a possible implementation, obtaining the first detection frame of the text region based on the mask image includes:
extracting the outer contour of the mask image;
fitting the outer contour to obtain the first detection frame of the text region.
In a possible implementation, if the first detection frame satisfies the preset cutting condition, cutting the first detection frame to obtain the second detection frame includes:
if the ratio of the area of the outer contour to the area of the first detection frame is less than a preset ratio, and the aspect ratio of the first detection frame is greater than a preset aspect ratio, cutting the first detection frame to obtain the second detection frame.
In a possible implementation, cutting the first detection frame to obtain the second detection frame includes:
dividing the first detection frame in equal proportions according to the preset aspect ratio to obtain at least two second detection frames.
In a possible implementation, using the image corresponding to the second detection frame as the text detection result includes:
determining whether the line connecting the cutting points of the second detection frame cuts through text, and if so, adjusting the position of the cutting points;
using the image corresponding to the adjusted second detection frame as the text detection result.
In a possible implementation, adjusting the position of the cutting points includes:
intercepting a first image within a preset range of the line connecting the cutting points in the second detection frame;
obtaining an average gradient curve corresponding to the first image;
determining a new cutting point position according to the average gradient curve.
In a possible implementation, determining a new cutting point position according to the average gradient curve includes:
using the position in the first image corresponding to the minimum average gradient value in the average gradient curve as the new cutting point position.
A second aspect of the present invention provides a text detection device, including:
an acquisition module, configured to obtain, through a neural network model, a mask image including a text region in a target image;
the acquisition module being further configured to obtain a first detection frame of the text region based on the mask image;
a cutting module, configured to cut the first detection frame to obtain a second detection frame if the first detection frame satisfies a preset cutting condition;
a determination module, configured to use the image corresponding to the second detection frame as the text detection result.
A third aspect of the present invention provides a text detection device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the text detection method according to any one of the first aspects of the present invention.
A fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the text detection method according to any one of the first aspects of the present invention.
Embodiments of the present invention provide a text detection method, device, and storage medium: a mask image including a text region in a target image is obtained through a neural network model; a first detection frame of the text region is obtained based on the mask image; if the first detection frame satisfies a preset cutting condition, the first detection frame is cut to obtain a second detection frame; and the image corresponding to the second detection frame is used as the text detection result. The above text detection method can handle long text boxes and curved text boxes, improving the accuracy of obtaining text detection frames.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely some exemplary embodiments; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a text detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a target image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the mask image corresponding to the target image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the outer contours of the white regions in the mask image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the first detection frames of the target image according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the second detection frames after cutting according to an embodiment of the present invention;
FIG. 7 is a schematic flowchart of a text detection method according to another embodiment of the present invention;
FIG. 8 is a schematic diagram of adjusting the cutting point positions of the second detection frames according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a text detection device according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the hardware structure of a text detection device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments are described in detail here, examples of which are shown in the drawings. Where the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of devices and methods consistent with some aspects of the present invention as detailed in the appended claims.
The terms "including" and "having" and any variations thereof in the specification and claims of the present invention are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
"And/or" in the present invention describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects.
"One embodiment" or "another embodiment" mentioned throughout the specification of the present invention means that a specific feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, "in some embodiments" or "in this embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. It should be noted that the embodiments of the present invention and the features in the embodiments can be combined with each other as long as they do not conflict.
The text detection method provided by the embodiments of the present invention proposes a new way of generating detection frames: after a mask image of the text is generated through a neural network model, image processing is performed based on the mask image to determine the final text detection frame, and the image corresponding to that text detection frame is used as the final text detection result for subsequent processing such as text recognition. Compared with prior-art solutions, the text detection method provided by this embodiment can handle long text boxes and curved text boxes, and has higher detection accuracy.
The technical solutions of the present invention are described in detail below with specific embodiments. The following specific embodiments can be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
FIG. 1 is a schematic flowchart of a text detection method according to an embodiment of the present invention; FIG. 2 is a schematic diagram of a target image; FIG. 3 is a schematic diagram of the mask image corresponding to the target image; FIG. 4 is a schematic diagram of the outer contours of the white regions in the mask image; FIG. 5 is a schematic diagram of the first detection frames of the target image; FIG. 6 is a schematic diagram of the second detection frames after cutting.
As shown in FIG. 1, the text detection method provided by this embodiment includes the following steps:
S101: Obtain, through a neural network model, a mask image including a text region in a target image.
The target image in this embodiment is a color or black-and-white image captured by a user through a smart terminal, and the image contains text information. For example, when a user photographs a children's picture book, the target image contains cartoon characters and descriptive text, as shown in FIG. 2.
It should be pointed out that, depending on the shooting angle or the state of the photographed object, the text information in the target image may be bent and deformed; for example, when a user photographs a children's picture book, the book itself may not lie flat, so the text information in the captured target image is bent and deformed. The text detection method provided by this embodiment can accurately identify such deformed text regions in the image.
The neural network model in this embodiment is obtained by training a convolutional neural network with a U-Net structure on image data annotated with text ground-truth boxes. The training process is as follows:
The text in the sample images is annotated line by line, with one ground-truth box drawn per line of text, and the annotated ground-truth boxes are shrunk slightly (mainly to account for deformation); the sample images annotated with ground-truth boxes are then fed into the convolutional neural network with the U-Net structure for training. The U-Net structure in this embodiment actually solves a binary classification problem: the text boxes of the sample images are positive samples and the background is the negative sample. Since the samples are imbalanced, the neural network model is trained using dice loss as the loss function.
U-Net is a variant of the convolutional neural network; its structure resembles the letter U, hence the name. U-Net improves on the FCN (Fully Convolutional Network) and, with data augmentation, can be trained on relatively small datasets. The entire network consists of two main parts: a contracting path and an expanding path. The contracting path is mainly used to capture context information in the target image, while the symmetric expanding path precisely localizes the parts of the target image that need to be segmented.
In this step, by inputting the target image into the above convolutional neural network model, the mask image including the text region in the target image is obtained; in other words, the multiple candidate regions corresponding to the text information in the target image, together with their positions in the target image, are obtained.
The mask image is a picture composed of black and white: the black regions of the mask image are the non-text regions of the target image, and the white regions are the text regions. As shown in FIG. 3, the hatched portions of the figure represent the black regions of the mask image, i.e., the non-text regions of the target image, while the white regions are the text regions of the target image.
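As a minimal sketch of how the model output might be turned into such a black-and-white mask, assuming the network emits a per-pixel text probability map (the 0.5 threshold and uint8 encoding are assumptions, not specified by the patent):

```python
import numpy as np

def probability_map_to_mask(prob_map, threshold=0.5):
    """Binarize a (H, W) text-probability map into a mask image.

    White (255) marks text regions, black (0) marks non-text regions.
    """
    return np.where(prob_map > threshold, 255, 0).astype(np.uint8)
```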
S102: Obtain the first detection frames of the text region based on the mask image.
After the mask image including the text region in the target image is obtained in S101, the outer contours of the mask image are extracted based on the mask image; specifically, the outer contours of the white regions in the mask image are extracted, as shown by the three dashed boxes in FIG. 4.
The outer contours are fitted to obtain the first detection frames of the text region; the solid black rectangular boxes shown in FIG. 5 are the first detection frames of the text region. It should be pointed out that the first detection frames are the initial detection frames of the text region in the target image.
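One possible realization of this step, sketched with OpenCV; the use of cv2.RETR_EXTERNAL and an axis-aligned bounding rectangle as the fit are assumptions, since the patent only requires extracting the outer contours and fitting them:

```python
import cv2

def first_detection_frames(mask):
    """Extract the outer contours of the white regions and fit rectangles.

    mask: (H, W) uint8 mask image, 255 = text, 0 = background.
    Returns a list of (x, y, w, h) first detection frames.
    """
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Fit each outer contour with its bounding rectangle (FIG. 4 -> FIG. 5).
    return [cv2.boundingRect(c) for c in contours]
```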
Usually, more than one first detection frame of the target image is obtained. Therefore, in one possible situation, the obtained first detection frames may have intersecting regions. In another possible situation, an obtained first detection frame may fail to cover the original text region, or may include too many non-text regions. The prior art does not solve these problems well. For this reason, this embodiment performs further image processing on the obtained first detection frames to obtain more accurate detection frames; see S103 for details.
S103: If a first detection frame satisfies the preset cutting condition, cut the first detection frame to obtain second detection frames.
In this step, the preset cutting condition includes a first preset cutting condition and a second preset cutting condition. Only when a first detection frame satisfies both the first preset cutting condition and the second preset cutting condition is the first detection frame cut.
Specifically, if the ratio of the area of the outer contour to the area of the first detection frame is less than a preset ratio, and the aspect ratio of the first detection frame is greater than a preset aspect ratio, the first detection frame is cut to obtain second detection frames.
Those skilled in the art will understand that if the text information in the target image captured by the user is deformed, the area of the outer contour of the text region extracted in S102 is necessarily smaller than the area of the fitted first detection frame.
Take the first detection frame '0' in FIG. 5: the ratio of the area of its outer contour to the area of the detection frame is 0.6, which is less than the preset ratio (e.g., 0.8), so the first detection frame '0' satisfies the first preset cutting condition. In addition, the size of the first detection frame '0' is 24*2, i.e., 24 px in the length direction and 2 px in the width direction; with a preset aspect ratio of 8, the aspect ratio of the first detection frame '0' is determined to be 12, which is greater than the preset aspect ratio of 8, so the first detection frame '0' also satisfies the second preset cutting condition. Therefore, the first detection frame '0' needs to be cut. Likewise, based on the above preset cutting conditions, it is determined that the first detection frame '1' needs to be cut.
It should be pointed out that the size of the first detection frame '2' in FIG. 5 is 28*2 and its aspect ratio is 14, greater than the preset aspect ratio of 8; however, the ratio of the area of its outer contour to the area of the first detection frame '2' is 0.9, greater than the preset ratio of 0.8, indicating that this detection frame already fully covers the text region in the target image. When only the second preset cutting condition is satisfied, no further cutting of the detection frame is performed.
Besides the first detection frame '2' in FIG. 5, there is also a possible case in which a first detection frame satisfies the first preset cutting condition but not the second preset cutting condition; in this case, too, no further cutting of the detection frame is performed, for example, a relatively short first detection frame with some deformation.
In this embodiment, cutting a first detection frame means dividing it in equal proportions according to the preset aspect ratio to obtain at least two second detection frames. For example, the first detection frame '0' in FIG. 5 has a size of 24*2 and the preset aspect ratio is 8, so the first detection frame '0' is cut proportionally into two parts with sizes 16*2 and 8*2, yielding the second detection frames '3' and '4', as shown in FIG. 6.
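A minimal sketch of the two preset cutting conditions and the proportional split; how the remainder piece is handled is an assumption inferred from the 16*2 and 8*2 example above:

```python
def should_cut(contour_area, frame_w, frame_h,
               preset_ratio=0.8, preset_aspect=8):
    """First condition: contour area / frame area < preset_ratio.
    Second condition: frame aspect ratio > preset_aspect.
    Both must hold for the frame to be cut."""
    area_ratio = contour_area / float(frame_w * frame_h)
    aspect_ratio = frame_w / float(frame_h)
    return area_ratio < preset_ratio and aspect_ratio > preset_aspect

def split_frame(x, y, w, h, preset_aspect=8):
    """Cut a frame into pieces whose length is preset_aspect * height.

    A 24*2 frame with preset_aspect 8 yields 16*2 and 8*2 pieces,
    matching the FIG. 6 example.
    """
    step = preset_aspect * h
    pieces = []
    cur = x
    while cur < x + w:
        pieces.append((cur, y, min(step, x + w - cur), h))
        cur += step
    return pieces
```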
S104: Use the images corresponding to the second detection frames as the text detection result.
After the second detection frames are determined, the images corresponding to the second detection frames are used as the text detection result for subsequent processing such as text recognition. Compared with the prior art, the detection frames obtained through the above process are more accurate; unnecessary background image content is eliminated, reducing the computational workload of subsequent text recognition.
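Putting the steps together, a hypothetical end-to-end pass over one target image might look like the following; all helper names (probability_map_to_mask, should_cut, split_frame) refer to the illustrative sketches above, not to APIs defined by the patent:

```python
import cv2

def detect_text_regions(target_image, prob_map):
    """S101-S104 glue: mask -> contours -> conditional cut -> crops."""
    mask = probability_map_to_mask(prob_map)                    # S101
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)     # S102
    results = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if should_cut(cv2.contourArea(c), w, h):                # S103
            boxes = split_frame(x, y, w, h)
        else:
            boxes = [(x, y, w, h)]
        for bx, by, bw, bh in boxes:                            # S104
            results.append(target_image[by:by + bh, bx:bx + bw])
    return results
```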
The text detection method provided by the embodiment of the present invention obtains, through a neural network model, a mask image including a text region in a target image; obtains a first detection frame of the text region based on the mask image; cuts the first detection frame to obtain a second detection frame if the first detection frame satisfies the preset cutting condition; and uses the image corresponding to the second detection frame as the text detection result. The above text detection method can handle long text boxes and curved text boxes, improving the accuracy of obtaining text detection frames.
Building on the above embodiment, the text detection method provided by this embodiment mainly solves a problem that arises when cutting the first detection frame in the above embodiment: when the first detection frame is cut proportionally, the line connecting the cutting points of the detection frames may fall on text in the target image, which would cause text recognition to fail; therefore the position of the cutting points needs to be adjusted.
The text detection method provided by this embodiment is described in detail below with reference to the drawings.
FIG. 7 is a schematic flowchart of a text detection method according to another embodiment of the present invention; FIG. 8 is a schematic diagram of adjusting the cutting point positions of the second detection frames according to an embodiment of the present invention.
As shown in FIG. 7, the text detection method provided by this embodiment includes the following steps:
S201: Obtain, through a neural network model, a mask image including a text region in a target image.
S202: Obtain the first detection frames of the text region based on the mask image.
S203: If a first detection frame satisfies the preset cutting condition, cut the first detection frame to obtain second detection frames.
S201-S203 of this embodiment are the same as S101-S103 of the above embodiment, with the same implementation principles and technical effects; see the above embodiment for details, which are not repeated here.
S204: Determine whether the line connecting the cutting points of the second detection frames cuts through text, and if so, adjust the position of the cutting points.
In this embodiment, when it is determined that the line connecting the cutting points of the second detection frames cuts through text in the target image, the position of the cutting points needs to be adjusted. The specific adjustment rules are as follows:
intercept a first image within a preset range of the line connecting the cutting points in the second detection frames;
obtain the average gradient curve corresponding to the first image;
determine a new cutting point position according to the average gradient curve. Specifically,
use the position in the first image corresponding to the minimum average gradient value in the average gradient curve as the new cutting point position.
As shown in FIG. 8, after the first detection frame '0' of the above embodiment is cut, two second detection frames '3' and '4' are obtained, and the two second detection frames happen to cut through the character "你" in the target image. The figure includes four cutting points p0, p1, p2, and p3, where the edge corresponding to the left second detection frame '3' is p1p2 and the edge corresponding to the right second detection frame '4' is p0p3. The adjustment process adjusts the positions of the two edges on the left and right sides of the cutting position, and includes the following steps (a code sketch follows this list):
1) Centered on the edge p1p2 on the left of the cutting point, expand by h pixels along the horizontal axis on each side, where h is the height of the second detection frame containing the edge p1p2, to obtain a position-adjustment rectangular frame.
2) Intercept the image on the original target image corresponding to the position-adjustment rectangular frame (i.e., the first image) and scale it to a preset height; for example, if the original image height is 8 px, an image with a height of 32 px is obtained after scaling. Based on the scaled image, compute its gradient map; for example, slide a small window with a height of 32 px and a width of 4 px along the horizontal direction of the image and compute the average gradient at every position of the image, where the average gradient at a position equals the sum of the gradients of the pixels inside the sliding window at that position divided by the number of pixels in the sliding window.
3) Take the position with the minimum average gradient as the new cutting point position, convert this position proportionally back to the scale of the second detection frames to obtain the new cutting point positions of the second detection frames, and update the cutting points of the two second detection frames with this position to obtain p0', p1', p2', and p3'.
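A minimal sketch of steps 2) and 3) under stated assumptions: the Sobel gradient magnitude, a 1 px window stride, and returning the window-center column are all choices the patent leaves open:

```python
import cv2
import numpy as np

def best_cut_column(first_image, target_h=32, win_w=4):
    """Slide a target_h x win_w window horizontally over the scaled
    first image and return the column (in the original scale) with the
    lowest average gradient, i.e. the place least likely to cross a stroke.
    """
    h, w = first_image.shape[:2]
    scale = target_h / float(h)
    img = cv2.resize(first_image,
                     (max(win_w, int(round(w * scale))), target_h))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    grad = np.sqrt(gx ** 2 + gy ** 2)
    # Average gradient at each window position: sum of pixel gradients in
    # the window divided by the pixel count -> the "average gradient curve".
    curve = [grad[:, i:i + win_w].mean()
             for i in range(grad.shape[1] - win_w + 1)]
    best = int(np.argmin(curve)) + win_w // 2   # window-centre column
    return int(round(best / scale))             # map back to original scale
```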
S205: Use the images corresponding to the adjusted second detection frames as the text detection result.
After the updated cutting point positions are determined, the adjusted second detection frames are obtained, and the images corresponding to the adjusted second detection frames are used as the text detection result. The adjusted second detection frames obtained by the text detection method provided in this embodiment no longer suffer from cutting through characters, improving the accuracy of text detection.
The text detection method provided by the embodiment of the present invention obtains, through a neural network model, a mask image including a text region in a target image; obtains first detection frames of the text region based on the mask image; cuts a first detection frame to obtain second detection frames if the first detection frame satisfies the preset cutting condition; adjusts the position of the cutting points when it is determined that the line connecting the cutting points of the second detection frames cuts through text; and uses the images corresponding to the adjusted second detection frames as the text detection result. The text detection method of this embodiment has higher text detection accuracy than the above embodiment.
An embodiment of the present invention also provides a text detection device; see FIG. 9. The embodiment of the present invention uses FIG. 9 merely as an example for description, which does not mean that the present invention is limited thereto.
FIG. 9 is a schematic structural diagram of a text detection device according to an embodiment of the present invention. As shown in FIG. 9, the text detection device 30 provided by this embodiment includes:
an acquisition module 31, configured to obtain, through a neural network model, a mask image including a text region in a target image;
the acquisition module 31 being further configured to obtain a first detection frame of the text region based on the mask image;
a cutting module 33, configured to cut the first detection frame to obtain a second detection frame if the first detection frame satisfies the preset cutting condition;
a determination module 34, configured to use the image corresponding to the second detection frame as the text detection result.
The text detection device provided by the embodiment of the present invention includes an acquisition module, a cutting module, and a determination module. The acquisition module is configured to obtain, through a neural network model, a mask image including a text region in a target image, and to obtain a first detection frame of the text region based on the mask image; if the first detection frame satisfies the preset cutting condition, the cutting module cuts the first detection frame to obtain a second detection frame; and the determination module uses the image corresponding to the second detection frame as the text detection result. The above text detection device can handle long text boxes and curved text boxes, improving the accuracy of obtaining text detection frames.
On the basis of the above embodiment, optionally, the neural network model is obtained by training a convolutional neural network with a U-Net structure on image data annotated with text ground-truth boxes.
Optionally, the acquisition module 31 is specifically configured to:
extract the outer contour of the mask image;
fit the outer contour to obtain the first detection frame of the text region.
Optionally, the cutting module 33 is specifically configured to:
cut the first detection frame to obtain the second detection frame if the ratio of the area of the outer contour to the area of the first detection frame is less than the preset ratio and the aspect ratio of the first detection frame is greater than the preset aspect ratio.
Optionally, the cutting module 33 is specifically configured to:
divide the first detection frame in equal proportions according to the preset aspect ratio to obtain at least two second detection frames.
The determination module 34 is specifically configured to:
determine whether the line connecting the cutting points of the second detection frame cuts through text, and if so, adjust the position of the cutting points;
use the image corresponding to the adjusted second detection frame as the text detection result.
Optionally, adjusting the position of the cutting points includes:
intercepting a first image within a preset range of the line connecting the cutting points in the second detection frame;
obtaining the average gradient curve corresponding to the first image;
determining a new cutting point position according to the average gradient curve.
Optionally, determining a new cutting point position according to the average gradient curve includes:
using the position in the first image corresponding to the minimum average gradient value in the average gradient curve as the new cutting point position.
The text detection device provided by this embodiment can execute the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.
An embodiment of the present invention also provides a text detection device; see FIG. 10. The embodiment of the present invention uses FIG. 10 merely as an example for description, which does not mean that the present invention is limited thereto.
FIG. 10 is a schematic diagram of the hardware structure of a text detection device according to an embodiment of the present invention. As shown in FIG. 10, the text detection device 40 provided by this embodiment includes:
a memory 41;
a processor 42; and
a computer program;
wherein the computer program is stored in the memory 41 and configured to be executed by the processor 42 to implement the technical solution of any of the foregoing method embodiments; its implementation principles and technical effects are similar and are not repeated here.
Optionally, the memory 41 may be independent or integrated with the processor 42.
When the memory 41 is a device independent of the processor 42, the text detection device 40 further includes:
a bus 43, configured to connect the memory 41 and the processor 42.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; the computer program is executed by the processor 42 to implement the steps performed by the text detection device 40 in the above method embodiments.
It should be understood that the above processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the present invention may be embodied directly as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
The memory may include high-speed RAM and may also include non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, etc. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus in the drawings of this application is not limited to only one bus or one type of bus.
The above storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an application-specific integrated circuit (ASIC). Of course, the processor and the storage medium may also exist as discrete components in an electronic device or main control device.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some or all of the technical features therein; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

  1. A text detection method, characterized by comprising:
    obtaining, through a neural network model, a mask image including a text region in a target image;
    obtaining a first detection frame of the text region based on the mask image;
    if the first detection frame satisfies a preset cutting condition, cutting the first detection frame to obtain a second detection frame; and
    using the image corresponding to the second detection frame as the text detection result.
  2. The method according to claim 1, characterized in that the neural network model is obtained by training a convolutional neural network with a U-Net structure on image data annotated with text ground-truth boxes.
  3. The method according to claim 1, characterized in that obtaining the first detection frame of the text region based on the mask image comprises:
    extracting the outer contour of the mask image; and
    fitting the outer contour to obtain the first detection frame of the text region.
  4. The method according to claim 3, characterized in that, if the first detection frame satisfies the preset cutting condition, cutting the first detection frame to obtain the second detection frame comprises:
    if the ratio of the area of the outer contour to the area of the first detection frame is less than a preset ratio, and the aspect ratio of the first detection frame is greater than a preset aspect ratio, cutting the first detection frame to obtain the second detection frame.
  5. The method according to claim 4, characterized in that cutting the first detection frame to obtain the second detection frame comprises:
    dividing the first detection frame in equal proportions according to the preset aspect ratio to obtain at least two second detection frames.
  6. The method according to claim 1, characterized in that using the image corresponding to the second detection frame as the text detection result comprises:
    determining whether the line connecting the cutting points of the second detection frame cuts through text, and if so, adjusting the position of the cutting points; and
    using the image corresponding to the adjusted second detection frame as the text detection result.
  7. The method according to claim 6, characterized in that adjusting the position of the cutting points comprises:
    intercepting a first image within a preset range of the line connecting the cutting points in the second detection frame;
    obtaining an average gradient curve corresponding to the first image; and
    determining a new cutting point position according to the average gradient curve.
  8. The method according to claim 7, characterized in that determining a new cutting point position according to the average gradient curve comprises:
    using the position in the first image corresponding to the minimum average gradient value in the average gradient curve as the new cutting point position.
  9. A text detection device, characterized by comprising:
    an acquisition module, configured to obtain, through a neural network model, a mask image including a text region in a target image;
    the acquisition module being further configured to obtain a first detection frame of the text region based on the mask image;
    a cutting module, configured to cut the first detection frame to obtain a second detection frame if the first detection frame satisfies a preset cutting condition; and
    a determination module, configured to use the image corresponding to the second detection frame as the text detection result.
  10. A text detection device, characterized by comprising:
    a memory;
    a processor; and
    a computer program;
    wherein the computer program is stored in the memory and configured to be executed by the processor to implement the text detection method according to any one of claims 1 to 8.
  11. A computer-readable storage medium, characterized in that a computer program is stored thereon, the computer program being executed by a processor to implement the text detection method according to any one of claims 1 to 8.
PCT/CN2018/115874 2018-11-16 2018-11-16 Text detection method, device and storage medium WO2020097909A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/115874 WO2020097909A1 (zh) 2018-11-16 2018-11-16 Text detection method, device and storage medium
CN201880098360.6A CN112789623B (zh) 2018-11-16 2018-11-16 Text detection method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/115874 WO2020097909A1 (zh) 2018-11-16 2018-11-16 Text detection method, device and storage medium

Publications (1)

Publication Number Publication Date
WO2020097909A1 true WO2020097909A1 (zh) 2020-05-22

Family

ID=70731920

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/115874 WO2020097909A1 (zh) 2018-11-16 2018-11-16 Text detection method, device and storage medium

Country Status (2)

Country Link
CN (1) CN112789623B (zh)
WO (1) WO2020097909A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753812A (zh) * 2020-07-30 2020-10-09 上海眼控科技股份有限公司 Text recognition method and device
CN111881050A (zh) * 2020-07-31 2020-11-03 北京爱奇艺科技有限公司 Text layer cropping method and apparatus, and electronic device
CN112085010A (zh) * 2020-10-28 2020-12-15 成都信息工程大学 Face mask detection and deployment system and method based on image recognition
CN112528889A (zh) * 2020-12-16 2021-03-19 中国平安财产保险股份有限公司 OCR information detection and correction method, apparatus, terminal, and storage medium
CN112651394A (zh) * 2020-12-31 2021-04-13 北京一起教育科技有限责任公司 Image detection method and apparatus, and electronic device
CN112949642A (zh) * 2021-02-23 2021-06-11 北京三快在线科技有限公司 Character generation method and apparatus, storage medium, and electronic device
CN112966678A (zh) * 2021-03-11 2021-06-15 南昌航空大学 Text detection method and system
CN113033543A (zh) * 2021-04-27 2021-06-25 中国平安人寿保险股份有限公司 Curved text recognition method, apparatus, device, and medium
CN113449724A (zh) * 2021-06-09 2021-09-28 浙江大华技术股份有限公司 Image text correction method, apparatus, device, and storage medium
CN114973268A (zh) * 2022-04-29 2022-08-30 北京智通东方软件科技有限公司 Text recognition method, apparatus, storage medium, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301414A (zh) * 2017-06-23 2017-10-27 厦门商集企业咨询有限责任公司 Chinese text localization, segmentation, and recognition method for natural scene images
US10007863B1 (en) * 2015-06-05 2018-06-26 Gracenote, Inc. Logo recognition in images and videos
CN108520254A (zh) * 2018-03-01 2018-09-11 腾讯科技(深圳)有限公司 Text detection method and apparatus based on formatted images, and related device
CN108549893A (zh) * 2018-04-04 2018-09-18 华中科技大学 End-to-end recognition method for scene text of arbitrary shape

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8494284B2 (en) * 2011-11-21 2013-07-23 Nokia Corporation Methods and apparatuses for facilitating detection of text within an image
CN103699895B (zh) * 2013-12-12 2018-02-09 天津大学 Method for detecting and extracting text in video
CN105574513B (zh) * 2015-12-22 2017-11-24 北京旷视科技有限公司 Text detection method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10007863B1 (en) * 2015-06-05 2018-06-26 Gracenote, Inc. Logo recognition in images and videos
CN107301414A (zh) * 2017-06-23 2017-10-27 厦门商集企业咨询有限责任公司 Chinese text localization, segmentation, and recognition method for natural scene images
CN108520254A (zh) * 2018-03-01 2018-09-11 腾讯科技(深圳)有限公司 Text detection method and apparatus based on formatted images, and related device
CN108549893A (zh) * 2018-04-04 2018-09-18 华中科技大学 End-to-end recognition method for scene text of arbitrary shape

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753812A (zh) * 2020-07-30 2020-10-09 上海眼控科技股份有限公司 Text recognition method and device
CN111881050A (zh) * 2020-07-31 2020-11-03 北京爱奇艺科技有限公司 Text layer cropping method and apparatus, and electronic device
CN111881050B (zh) * 2020-07-31 2024-06-04 北京爱奇艺科技有限公司 Text layer cropping method and apparatus, and electronic device
CN112085010A (zh) * 2020-10-28 2020-12-15 成都信息工程大学 Face mask detection and deployment system and method based on image recognition
CN112528889A (zh) * 2020-12-16 2021-03-19 中国平安财产保险股份有限公司 OCR information detection and correction method, apparatus, terminal, and storage medium
CN112528889B (zh) * 2020-12-16 2024-02-06 中国平安财产保险股份有限公司 OCR information detection and correction method, apparatus, terminal, and storage medium
CN112651394B (zh) * 2020-12-31 2023-11-14 北京一起教育科技有限责任公司 Image detection method and apparatus, and electronic device
CN112651394A (zh) * 2020-12-31 2021-04-13 北京一起教育科技有限责任公司 Image detection method and apparatus, and electronic device
CN112949642A (zh) * 2021-02-23 2021-06-11 北京三快在线科技有限公司 Character generation method and apparatus, storage medium, and electronic device
CN112966678A (zh) * 2021-03-11 2021-06-15 南昌航空大学 Text detection method and system
CN113033543B (zh) * 2021-04-27 2024-04-05 中国平安人寿保险股份有限公司 Curved text recognition method, apparatus, device, and medium
CN113033543A (zh) * 2021-04-27 2021-06-25 中国平安人寿保险股份有限公司 Curved text recognition method, apparatus, device, and medium
CN113449724B (zh) * 2021-06-09 2023-06-16 浙江大华技术股份有限公司 Image text correction method, apparatus, device, and storage medium
CN113449724A (zh) * 2021-06-09 2021-09-28 浙江大华技术股份有限公司 Image text correction method, apparatus, device, and storage medium
CN114973268A (zh) * 2022-04-29 2022-08-30 北京智通东方软件科技有限公司 Text recognition method, apparatus, storage medium, and electronic device

Also Published As

Publication number Publication date
CN112789623B (zh) 2024-08-16
CN112789623A (zh) 2021-05-11

Similar Documents

Publication Publication Date Title
WO2020097909A1 (zh) Text detection method, device and storage medium
CN110348294B (zh) Method and apparatus for locating charts in PDF documents, and computer device
CN109685055B (zh) Method and apparatus for detecting text regions in an image
US20200160040A1 (en) Three-dimensional living-body face detection method, face authentication recognition method, and apparatuses
US11636604B2 (en) Edge detection method and device, electronic equipment, and computer-readable storage medium
US10902283B2 (en) Method and device for determining handwriting similarity
WO2018010657A1 (zh) Structured text detection method and system, and computing device
CN110866871A (zh) Text image rectification method and apparatus, computer device, and storage medium
CN109961040B (zh) ID card region localization method and apparatus, computer device, and storage medium
WO2022057607A1 (zh) Method and system for recognizing object edges, and computer-readable storage medium
Mahesh et al. Sign language translator for mobile platforms
CN1937698A (zh) Image processing method for automatic correction of image distortion
CN110647882A (zh) Image correction method, apparatus, device, and storage medium
CN112597940B (zh) Certificate image recognition method, apparatus, and storage medium
CN112396047B (zh) Training sample generation method and apparatus, computer device, and storage medium
CN114359932B (zh) Text detection method, text recognition method, and apparatus
CN111325798A (zh) Camera model correction method and apparatus, AR implementation device, and readable storage medium
WO2022002262A1 (zh) Computer-vision-based character sequence recognition method, apparatus, device, and medium
US20210027045A1 (en) Method and device for face selection, recognition and comparison
CN111179287A (zh) Portrait instance segmentation method, apparatus, device, and storage medium
CN112183250A (zh) Character recognition method and apparatus, storage medium, and electronic device
CN111738272A (zh) Target feature extraction method and apparatus, and electronic device
CN113129298A (zh) Method for recognizing the sharpness of text images
WO2020244076A1 (zh) Face recognition method and apparatus, electronic device, and storage medium
US20220335704A1 (en) Method and system of recognizing and processing object edges and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18940437

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 09.09.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18940437

Country of ref document: EP

Kind code of ref document: A1