WO2023284502A1 - Image processing method and apparatus, device, and storage medium

Publication number
WO2023284502A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
bounding boxes
initial
blocks
image blocks
Application number
PCT/CN2022/100269
Inventor
徐青松
李青
Original Assignee
杭州睿胜软件有限公司
Priority claimed from CN202110788327.XA (CN113486828B)
Application filed by 杭州睿胜软件有限公司
Publication of WO2023284502A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/155 Segmentation; Edge detection involving morphological operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20036 Morphological image processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection

Abstract

An image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium. The image processing method comprises: obtaining an initial image, wherein the initial image comprises at least one target object; processing the initial image to obtain an intermediate image; recognizing the intermediate image by using a region detection model to obtain a connected image comprising M object connected regions; determining M bounding boxes in the connected image respectively corresponding to the M object connected regions; capturing N image blocks from the initial image on the basis of the M bounding boxes, each image block comprising at least one target object; and recognizing the N image blocks by using an object recognition model to obtain the target object in the initial image.

Description

Image processing method, apparatus, device, and storage medium
Technical Field
Embodiments of the present disclosure relate to an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of digitization technology, text images can be recognized with text recognition techniques to obtain the information they record. For example, OCR (Optical Character Recognition) converts the text in pictures and photos directly into editable text. However, current text recognition algorithms are complex and computationally expensive, which restricts where they can run: they are suitable only for devices with high hardware specifications, such as servers. On devices with lower hardware specifications, such as terminal devices, recognition is very slow or fails altogether, so text recognition is difficult when a terminal device is offline.
Summary
At least one embodiment of the present disclosure provides an image processing method, including: obtaining an initial image, the initial image including at least one target object; processing the initial image to obtain an intermediate image; recognizing the intermediate image with a region detection model to obtain a connected image including M object connected regions; determining M bounding boxes in the connected image that respectively correspond to the M object connected regions; cropping N image blocks from the initial image based on the M bounding boxes, each image block including at least one target object; and recognizing the N image blocks with an object recognition model to obtain the target object in the initial image, where M and N are both positive integers.
For example, in the image processing method provided by an embodiment of the present disclosure, recognizing the intermediate image with the region detection model to obtain the connected image including the M object connected regions includes: processing the intermediate image with the region detection model to obtain a connected image including a plurality of initial object connected regions; and performing a morphological transformation on that connected image to obtain the connected image including the M object connected regions.
For example, in the image processing method provided by an embodiment of the present disclosure, processing the initial image to obtain the intermediate image includes: reducing the initial image from its initial size to a predetermined size; and binarizing the initial image of the predetermined size to obtain the intermediate image.
For example, in the image processing method provided by an embodiment of the present disclosure, determining the M bounding boxes in the connected image that respectively correspond to the M object connected regions includes: extracting contour information of each of the M object connected regions; and determining, based on the contour information, a bounding box for each of the M object connected regions.
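As a rough illustration of deriving one bounding box per connected region, the sketch below works on a small binary grid in which 1 marks a pixel of an object connected region. It is a simplification under stated assumptions: the text above extracts contour information first, whereas this sketch labels 4-connected regions with a breadth-first flood fill and takes the extreme coordinates of each region. The function name `bounding_boxes` and the half-open `(left, top, right, bottom)` box format are illustrative choices, not taken from the source.

```python
from collections import deque


def bounding_boxes(grid):
    """Return one half-open (left, top, right, bottom) box per 4-connected region of 1s.

    Regions are reported in the order their first pixel is met in a row-major scan.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill the region that starts at (x, y), tracking its extent.
            left, top, right, bottom = x, y, x, y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Convert the inclusive extent to a half-open box.
            boxes.append((left, top, right + 1, bottom + 1))
    return boxes
```

A production implementation would more likely rely on an image library's contour and bounding-rectangle routines; the flood fill is used here only to keep the example free of dependencies.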
For example, in the image processing method provided by an embodiment of the present disclosure, cropping the N image blocks from the initial image based on the M bounding boxes includes: cropping, according to the correspondence between the intermediate image and the initial image, one image block from the initial image for each of the M bounding boxes, in which case M equals N; or performing predetermined processing on the M bounding boxes to obtain N processed bounding boxes, and cropping, according to the correspondence between the intermediate image and the initial image, one image block from the initial image for each processed bounding box.
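One way to picture the correspondence between the intermediate image and the initial image is as a pair of scale factors. The sketch below assumes that the intermediate image is a plain resize of the initial image (no padding or cropping), that images are lists of pixel rows, and that boxes are half-open `(left, top, right, bottom)` tuples; the names `map_box_to_initial` and `crop` are hypothetical.

```python
import math


def map_box_to_initial(box, intermediate_size, initial_size):
    """Scale a box from intermediate-image pixels to initial-image pixels.

    Sizes are (width, height). Edges are rounded outward and clamped so the
    mapped box stays inside the initial image.
    """
    inter_w, inter_h = intermediate_size
    init_w, init_h = initial_size
    sx, sy = init_w / inter_w, init_h / inter_h
    left, top, right, bottom = box
    return (
        max(0, math.floor(left * sx)),
        max(0, math.floor(top * sy)),
        min(init_w, math.ceil(right * sx)),
        min(init_h, math.ceil(bottom * sy)),
    )


def crop(image, box):
    """Cut the half-open box out of an image stored as a list of rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

Cropping from the initial image rather than from the reduced intermediate image keeps the full resolution of each block for the later recognition step.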
For example, in the image processing method provided by an embodiment of the present disclosure, performing the predetermined processing on the M bounding boxes includes: scoring the M bounding boxes to obtain a quality score for each of them; and treating each bounding box whose quality score is below a score threshold as an invalid bounding box and deleting it.
For example, in the image processing method provided by an embodiment of the present disclosure, scoring the M bounding boxes includes performing the following for each of the M bounding boxes: determining the area of the bounding box and the area of the pixels corresponding to the target object located in the bounding box; and determining the quality score of the bounding box based on the ratio of the pixel area to the bounding-box area.
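The scoring rule can be sketched as a fill ratio. The code below assumes a binary mask in which 1 marks a pixel belonging to a target object, half-open `(left, top, right, bottom)` boxes, and that the ratio itself is used as the quality score; the source only says the score is based on this ratio, and the function names are illustrative.

```python
def quality_score(mask, box):
    """Ratio of object pixels (value 1) inside the box to the box area."""
    left, top, right, bottom = box
    area = (right - left) * (bottom - top)
    if area <= 0:
        return 0.0
    object_pixels = sum(sum(row[left:right]) for row in mask[top:bottom])
    return object_pixels / area


def drop_invalid_boxes(mask, boxes, score_threshold):
    """Delete boxes whose quality score is below the score threshold."""
    return [box for box in boxes if quality_score(mask, box) >= score_threshold]
```

A box that is mostly background scores low and is removed, which filters out regions unlikely to contain a recognizable object.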
For example, in the image processing method provided by an embodiment of the present disclosure, performing the predetermined processing on the M bounding boxes includes: enlarging one or more of the M bounding boxes by a first predetermined factor.
For example, in the image processing method provided by an embodiment of the present disclosure, performing the predetermined processing on the M bounding boxes further includes: detecting whether any two adjacent bounding boxes among the M bounding boxes overlap at least partially and, if so, shrinking each of the two overlapping bounding boxes by a second predetermined factor, so that the two shrunken bounding boxes no longer overlap or overlap over a smaller area.
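The enlarging and shrinking steps can be sketched with a single scaling helper. The source does not say how the factors are applied, so the code assumes that a factor scales the side lengths of a box about its center, that enlarging uses a factor above 1 and shrinking a factor between 0 and 1, and that boxes are `(left, top, right, bottom)` tuples; all names are hypothetical.

```python
def scale_box(box, factor):
    """Scale the width and height of a box about its center."""
    left, top, right, bottom = box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w = (right - left) / 2 * factor
    half_h = (bottom - top) / 2 * factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def overlaps(a, b):
    """True if two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def shrink_if_overlapping(a, b, shrink_factor):
    """Shrink both boxes when they overlap; otherwise return them unchanged."""
    if overlaps(a, b):
        return scale_box(a, shrink_factor), scale_box(b, shrink_factor)
    return a, b
```

For instance, `scale_box((10, 10, 30, 20), 1.5)` gives `(5.0, 7.5, 35.0, 22.5)`. After mapping to pixel coordinates the results would still need rounding and clamping to the image bounds.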
For example, in the image processing method provided by an embodiment of the present disclosure, recognizing the N image blocks with the object recognition model to obtain the target object in the initial image includes: determining, among the N image blocks, P first image blocks whose length in a first direction is greater than a recognition length threshold, and dividing each first image block into at least two sub-image blocks to obtain a plurality of sub-image blocks corresponding to the P first image blocks, the length of each sub-image block being less than or equal to the recognition length threshold; and recognizing the plurality of sub-image blocks with the object recognition model to obtain the target objects in the P first image blocks, where the target object in the initial image includes the target objects in the P first image blocks and P is a positive integer.
For example, in the image processing method provided by an embodiment of the present disclosure, recognizing the N image blocks with the object recognition model to obtain the target object in the initial image further includes: determining, among the N image blocks, Q second image blocks whose length in the first direction is less than the recognition length threshold, and processing each second image block to obtain Q processed second image blocks, the length of each processed second image block in the first direction being equal to the recognition length threshold; and recognizing the Q processed second image blocks with the object recognition model to obtain the target objects in the Q second image blocks, where the target object in the initial image further includes the target objects in the Q second image blocks and Q is a positive integer.
For example, in the image processing method provided by an embodiment of the present disclosure, dividing each first image block into at least two sub-image blocks includes performing the following for the i-th first image block among the N image blocks: setting, in the first direction, one candidate split point at every interval of the recognition length threshold, to determine at least one candidate split point for the i-th first image block; determining at least one split point for the i-th first image block based on the at least one candidate split point; and dividing the i-th first image block into at least two sub-image blocks based on the at least one split point, where i is a positive integer less than or equal to P.
For example, in the image processing method provided by an embodiment of the present disclosure, determining the at least one split point for the i-th first image block based on the at least one candidate split point includes: if a gap region lies within a distance threshold of any candidate split point of the i-th first image block, taking a point in the gap region as a split point of the i-th first image block; and if no gap region lies within the distance threshold of that candidate split point, taking the candidate split point itself as a split point of the i-th first image block.
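The split-point rule can be sketched in one dimension along the first direction. The code assumes that gap regions are supplied as non-empty half-open `(start, end)` position ranges containing no object pixels (for example, blank columns between characters) and that the gap point closest to the candidate is chosen; the source only requires some point in the gap region. The names `split_points` and `split_block` are hypothetical.

```python
def split_points(block_length, length_threshold, distance_threshold, gaps):
    """Choose split positions along the first direction of one image block."""
    points = []
    candidate = length_threshold
    while candidate < block_length:
        best = None
        for start, end in gaps:
            # Closest position inside this gap to the candidate split point.
            nearest = min(max(candidate, start), end - 1)
            distance = abs(nearest - candidate)
            if distance <= distance_threshold:
                if best is None or distance < abs(best - candidate):
                    best = nearest
        # Fall back to the candidate itself when no gap is close enough.
        points.append(candidate if best is None else best)
        candidate += length_threshold
    return points


def split_block(block_length, points):
    """Turn split positions into half-open (start, end) sub-block ranges."""
    edges = [0] + sorted(set(points)) + [block_length]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]
```

With a block of length 250, a threshold of 100, a distance threshold of 10, and gaps `[(93, 97), (150, 160)]`, the first candidate at 100 moves to 96 and the second stays at 200. Because candidates sit at fixed multiples of the threshold, moving a split into a gap can leave a neighbouring sub-block slightly longer than the threshold; this sketch does not correct for that.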
For example, in the image processing method provided by an embodiment of the present disclosure, processing each second image block includes: splicing, in the first direction, an end image block onto at least one end of each second image block to obtain the corresponding processed second image block, where the pixel value of every pixel in the end image block differs from the pixel values of the pixels corresponding to the objects in the second image block.
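Splicing an end image block amounts to padding the block with background pixels. The sketch below assumes the first direction is horizontal, pads only the right-hand end, and leaves the choice of `fill_value` to the caller, who must pick a value that differs from the object pixels; `pad_block` is a hypothetical name.

```python
def pad_block(block, length_threshold, fill_value):
    """Right-pad every row of a block (list of rows) to length_threshold pixels."""
    width = len(block[0]) if block else 0
    if width >= length_threshold:
        # Already long enough: return a copy unchanged.
        return [list(row) for row in block]
    padding = [fill_value] * (length_threshold - width)
    return [list(row) + padding for row in block]
```

Padding with a value distinct from the object pixels keeps the added area from being mistaken for part of a character.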
For example, in the image processing method provided by an embodiment of the present disclosure, each first image block includes a plurality of target objects arranged in sequence along the first direction.
For example, in the image processing method provided by an embodiment of the present disclosure, the at least one target object includes characters.
An embodiment of the present disclosure provides an image processing apparatus, including: an image acquisition module configured to obtain an initial image, the initial image including at least one target object; an image processing module configured to process the initial image to obtain an intermediate image; a region recognition module configured to recognize the intermediate image with a region detection model to obtain a connected image including M object connected regions; a determination module configured to determine M bounding boxes in the connected image that respectively correspond to the M object connected regions; a cropping module configured to crop N image blocks from the initial image based on the M bounding boxes, each image block including at least one target object; and an object recognition module configured to recognize the N image blocks with an object recognition model to obtain the target object in the initial image, where M and N are both positive integers.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory storing one or more computer program modules, where the one or more computer program modules are configured to be executed by the processor and are for implementing the image processing method according to any one of the above embodiments.
An embodiment of the present disclosure further provides a computer-readable storage medium for non-transitory storage of computer-readable instructions that, when executed by a computer, implement the image processing method according to any one of the above embodiments.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. The drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure.
Fig. 1 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure;
Fig. 2 is a schematic diagram of an initial image provided by at least one embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a target object provided by at least one embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a binarized image provided by at least one embodiment of the present disclosure;
Fig. 5 is a schematic diagram of a connected image provided by at least one embodiment of the present disclosure;
Fig. 6 is a schematic diagram of bounding boxes provided by at least one embodiment of the present disclosure;
Fig. 7A is a schematic diagram of cropping image blocks from an initial image, provided by at least one embodiment of the present disclosure;
Fig. 7B is a schematic diagram of image blocks provided by at least one embodiment of the present disclosure;
Fig. 8 is a schematic diagram of a connected image including a plurality of initial object connected regions, provided by at least one embodiment of the present disclosure;
Fig. 9 is a schematic flowchart of recognizing N image blocks, provided by at least one embodiment of the present disclosure;
Fig. 10A is a schematic diagram of splitting an image block, provided by at least one embodiment of the present disclosure;
Fig. 10B is a schematic diagram of splicing an end image block, provided by at least one embodiment of the present disclosure;
Fig. 11 is a schematic diagram of a target object recognition result provided by at least one embodiment of the present disclosure;
Fig. 12 is a schematic block diagram of an image processing apparatus provided by at least one embodiment of the present disclosure;
Fig. 13 is a schematic block diagram of an electronic device provided by at least one embodiment of the present disclosure;
Fig. 14 is a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure;
Fig. 15 is a schematic diagram of a computer-readable storage medium provided by at least one embodiment of the present disclosure; and
Fig. 16 is a schematic diagram of a hardware environment provided by at least one embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present disclosure. All other embodiments that a person of ordinary skill in the art obtains from the described embodiments without creative effort fall within the scope of protection of the present disclosure.
Unless otherwise defined, technical and scientific terms used in the present disclosure have the ordinary meaning understood by a person of ordinary skill in the field to which the present disclosure belongs. "First", "second", and similar words used in the present disclosure do not indicate any order, quantity, or importance; they only distinguish different components. "Include", "comprise", and similar words mean that the element or item before the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. "Connected", "coupled", and similar words are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like indicate only relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
At least one embodiment of the present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium. The image processing method includes: obtaining an initial image, the initial image including at least one target object; processing the initial image to obtain an intermediate image; recognizing the intermediate image with a region detection model to obtain a connected image including M object connected regions; determining M bounding boxes in the connected image that respectively correspond to the M object connected regions; cropping N image blocks from the initial image based on the M bounding boxes, each image block including at least one target object; and recognizing the N image blocks with an object recognition model to obtain the target object in the initial image, where M and N are both positive integers.
The image processing method provided by the embodiments of the present disclosure first converts the initial image into an intermediate image, then uses the region detection model to convert the intermediate image into a connected image containing a number of object connected regions, determines the bounding box of each object connected region, and finally returns to the initial image to crop the image blocks corresponding to the bounding boxes. Compared with related-art algorithms that determine the regions containing objects directly from the initial image, this approach requires less computation and a simpler processing flow, which addresses the problems of high complexity and heavy computation. As a result, the object recognition algorithm can run on terminal devices with modest hardware, such as mobile phones, and such devices can perform object recognition even when offline.
The image processing method of the embodiments of the present disclosure can be applied to the image processing apparatus of the embodiments of the present disclosure, and the image processing apparatus can be deployed on an electronic device. The electronic device may be a personal computer, a mobile terminal, or the like, and the mobile terminal may be a hardware device such as a mobile phone or a tablet computer.
Embodiments of the present disclosure are described in detail below with reference to the drawings, but the present disclosure is not limited to these specific embodiments.
Fig. 1 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure.
As shown in Fig. 1, the method includes steps S110 to S160.
Step S110: obtain an initial image, the initial image including at least one target object.
Step S120: process the initial image to obtain an intermediate image.
Step S130: recognize the intermediate image with a region detection model to obtain a connected image including M object connected regions.
Step S140: determine M bounding boxes in the connected image that respectively correspond to the M object connected regions.
Step S150: crop N image blocks from the initial image based on the M bounding boxes, each image block including at least one target object.
Step S160: recognize the N image blocks with an object recognition model to obtain the target object in the initial image.
For example, M and N are both positive integers.
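The data flow of steps S110 to S160 can be sketched as a short pipeline. This is only a skeleton under stated assumptions: each stage is passed in as a caller-supplied function, and the names `process_image`, `preprocess`, `detect_regions`, `find_boxes`, `crop_blocks`, and `recognize` are hypothetical placeholders for the processing, the region detection model, and the object recognition model described in the text.

```python
def process_image(initial_image, preprocess, detect_regions, find_boxes,
                  crop_blocks, recognize):
    """Run steps S120 to S160 on an initial image obtained in step S110."""
    intermediate = preprocess(initial_image)        # S120: intermediate image
    connected = detect_regions(intermediate)        # S130: connected image
    boxes = find_boxes(connected)                   # S140: M bounding boxes
    blocks = crop_blocks(initial_image, boxes)      # S150: N image blocks
    return [recognize(block) for block in blocks]   # S160: recognized objects
```

Note that the image blocks are cropped from the initial image, not from the intermediate or connected image; only the bounding boxes come from the cheaper intermediate representation.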
For example, in step S110, the initial image may take many forms, such as a photo, a scan, a screenshot, a PDF page image, or any other electronic file in image form. The initial image may be a grayscale image or a color image.
Fig. 2 is a schematic diagram of an initial image 201 provided by at least one embodiment of the present disclosure. As shown in Fig. 2, the initial image 201 includes at least one target object, and the at least one target object may include characters. For example, each character may be a digit, Chinese text (for example, a Chinese character or a Chinese word), foreign-language text (for example, a foreign letter or a foreign word), a special character (for example, the percent sign "%"), a punctuation mark, a graphic (for example, a triangle or an arrow), and so on. Characters may be in various fonts, either printed or handwritten. Printed fonts may include known fonts such as Song, Hei, Kai, Times New Roman, and Arial, and may also include decorative fonts. In the example shown in Fig. 2, the target objects include English letters and digits.
Fig. 3 is a schematic diagram of a target object provided by at least one embodiment of the present disclosure. As shown in Fig. 3, in another example, the target object may also include various patterns, such as a heart pattern, a smiley-face pattern, a cloud pattern, a sun pattern, or a moon pattern. The target object may also take forms other than characters and patterns. The following description takes characters as the target object; other types of target objects can be processed in the same way as characters.
For example, the type of target object can be chosen according to actual needs. The types of target object to be recognized can be set in advance, and the corresponding region detection model and object recognition model can be trained for those types, so that the region detection model includes the positions of objects of those types in the object connected regions and the object recognition model recognizes objects of those types. For example, in an application scenario that requires recognizing English words and punctuation marks, the region detection model and the object recognition model can be trained on sample images containing English words and punctuation marks, so that the trained region detection model connects the regions where English words and punctuation marks are located and the trained object recognition model recognizes English words and punctuation marks.
For example, in step S120, processing the initial image to obtain the intermediate image may include: reducing the initial image from its initial size to a predetermined size; and binarizing the initial image of the predetermined size to obtain the intermediate image.
For example, different initial images may have different sizes. To facilitate processing, each initial image may first be reduced from its original size to a uniform predetermined size, for example 640*640 pixels. On the one hand, this reduces the subsequent amount of computation; on the other hand, a uniform size facilitates subsequent processing, for example region recognition by the region detection model.
For example, the image reduced to the predetermined size (that is, the initial image of the predetermined size) may be normalized. In one example, each pixel value (for example, gray value) of the initial image of the predetermined size is mapped to the range 0 to 1, that is, each pixel value is divided by 255 to convert it into a value between 0 and 1. In another example, each pixel value of the initial image of the predetermined size may be mapped to the range -1.0 to 1.0.
For example, the normalized image may be binarized to obtain a binarized image, and the binarized image may be used as the above-mentioned intermediate image. FIG. 4 is a schematic diagram of a binarized image provided by at least one embodiment of the present disclosure; the binarized image shown in FIG. 4 is the binarized image of the initial image shown in FIG. 2. As shown in FIG. 4, a binarization threshold may be set in advance (for example 0.3; the threshold may be set according to actual conditions and is not specifically limited in the present disclosure), and each normalized pixel value is compared with the binarization threshold. If the pixel value is greater than or equal to the binarization threshold, it is converted to 1, that is, the corresponding pixel becomes pure white; if the pixel value is lower than the binarization threshold, it is converted to 0, that is, the corresponding pixel becomes pure black. In this way a pure black-and-white image is obtained, and this pure black-and-white image is the binarized image.
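The normalization and binarization described above can be sketched as follows. This is a minimal illustration in plain Python, assuming an 8-bit grayscale image stored as a list of rows; the function names `normalize` and `binarize` are hypothetical and do not come from the disclosure, and the threshold 0.3 is the example value given in the text.

```python
def normalize(gray, low=0.0, high=1.0):
    """Map 8-bit gray values (0..255) linearly into [low, high]."""
    return [[low + (high - low) * (p / 255.0) for p in row] for row in gray]


def binarize(normalized, threshold=0.3):
    """Pixels >= threshold become 1 (white); all others become 0 (black)."""
    return [[1 if p >= threshold else 0 for p in row] for row in normalized]


# Tiny assumed example image (2 rows, 3 columns).
gray = [[0, 64, 128],
        [200, 255, 76]]
binary = binarize(normalize(gray))  # [[0, 0, 1], [1, 1, 0]]
```

Passing `low=-1.0, high=1.0` to `normalize` gives the alternative mapping to the range -1.0 to 1.0; the binarization threshold would then have to be chosen for that range.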
For example, in some embodiments, tilt correction may also be applied to the image (the initial image, the initial image of the predetermined size, the normalized initial image, or the binarized image) before or after any of the size reduction, normalization, and binarization steps described above, so that the characters in the image are arranged in the horizontal direction (for example, the X direction shown in FIG. 4) or the vertical direction (for example, the Y direction shown in FIG. 4). In addition, the initial image may be cropped to remove the background region around its periphery.
For example, in step S130, the region detection model may be implemented using machine learning technology and run, for example, on a general-purpose or special-purpose computing device. The region detection model is a neural network model obtained by training in advance. For example, the region detection model may be implemented with a neural network such as a deep convolutional neural network (DEEP-CNN). When the intermediate image is input into the region detection model, the model identifies the region where each object in the intermediate image is located and marks the identified object connected regions. In a scenario where the target objects are characters, the object connected regions may be character connected regions. For example, the region detection model may be implemented with the DBNet (Driving Behavior Net) architecture, and the backbone network of the DBNet architecture may be the MobileNetV3 Large network, which is a lightweight network. In some embodiments, the number of parameters of the MobileNetV3 Large network may be reduced relative to the original amount, for example to r times the original amount, where r is a positive number greater than 0 and less than 1, for example r=0.75 (r may be set according to actual conditions). In other embodiments of the present disclosure, according to actual needs, the region detection model may adopt a network architecture other than the DBNet architecture, and the backbone network may be a network other than the MobileNetV3 Large network.
It should be noted that the position, type, and so on of each object in the initial image are the same as those of the corresponding object in the intermediate image. As shown in FIG. 2 and FIG. 4, the initial image includes the object "DECLARATION AND ASSIGNMENT", located at the upper side of the initial image; the intermediate image also includes the object "DECLARATION AND ASSIGNMENT", likewise located at the upper side of the intermediate image.
FIG. 5 is a schematic diagram of a connected image provided by at least one embodiment of the present disclosure. The connected image shown in FIG. 5 is obtained by processing the intermediate image shown in FIG. 4 and includes M object connected regions. For example, the connected image shown in FIG. 5 has the same size as the intermediate image shown in FIG. 4.
With reference to FIG. 4 and FIG. 5, each line of characters may correspond to one or more object connected regions (which may also be called character connected regions). For example, if the characters in a line are arranged continuously, that is, the interval between every two adjacent characters in the line does not exceed a predetermined interval (for example, the width of two (or three, etc.) spaces), the line of characters corresponds to one object connected region. For example, FIG. 4 shows the character line "DECLARATION AND ASSIGNMENT"; since none of the intervals between adjacent characters in this line exceeds the predetermined interval, one object connected region 501 is formed for the character line "DECLARATION AND ASSIGNMENT". It should be noted that the predetermined interval may be set according to actual conditions, which is not limited in the present disclosure. In addition, for the character line "DECLARATION AND ASSIGNMENT", either a single English letter or a single English word may be treated as one character.
For example, if the characters in a line are not arranged continuously, that is, the interval between two adjacent characters in a character line exceeds the predetermined interval (for example, the width of two (or three, etc.) spaces), several object connected regions may be formed according to the number of such intervals. For example, if the a-th to (a+b)-th characters in a character line are arranged continuously, the interval between the (a+b)-th character and the (a+b+1)-th character exceeds the predetermined interval, and the (a+b+1)-th to (a+b+c)-th characters are arranged continuously, then the a-th to (a+b)-th characters form one object connected region and the (a+b+1)-th to (a+b+c)-th characters form another object connected region, where a, b, and c are all positive integers. For example, FIG. 4 shows the character line "Signature:_____Date:_____". If, in this embodiment, underscores are not objects to be detected and recognized, the characters in this line are, in order, the first character "Signature", the second character ":", the third character "Date", and the fourth character ":". The interval between the second character ":" and the third character "Date" exceeds the predetermined interval, whereas the first character "Signature" and the second character ":" are continuous, and the third character "Date" and the fourth character ":" are continuous. Therefore the first character "Signature" and the second character ":" form one object connected region 502, and the third character "Date" and the fourth character ":" form another object connected region 503.
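The grouping rule described above (adjacent characters stay in one region unless their gap exceeds a predetermined interval) can be illustrated with a short sketch. Note that in the disclosure the connected regions are produced by the trained region detection model; the rule-based function `group_by_gap`, the pixel spans, and the `max_gap` value below are assumptions made only to show the rule.

```python
def group_by_gap(char_spans, max_gap):
    """Group (x_start, x_end) spans of one text line into connected regions.

    Adjacent spans whose gap is <= max_gap stay in the same region;
    a larger gap starts a new region.
    """
    regions = []
    for start, end in sorted(char_spans):
        if regions and start - regions[-1][-1][1] <= max_gap:
            regions[-1].append((start, end))
        else:
            regions.append([(start, end)])
    return regions


# Hypothetical spans for "Signature", ":", "Date", ":" (underscores ignored).
spans = [(0, 90), (92, 96), (200, 240), (242, 246)]
regions = group_by_gap(spans, max_gap=20)
# Two regions: one for "Signature" + ":", one for "Date" + ":".
```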
For example, in step S140, a corresponding bounding box may be determined for each of the M object connected regions.
FIG. 6 is a schematic diagram of bounding boxes provided by at least one embodiment of the present disclosure. With reference to FIG. 5 and FIG. 6, in this embodiment a bounding box is, for example, a rectangular box, and may be the minimum bounding box, that is, the smallest box that completely encloses the object connected region. The size of the minimum bounding box may be determined from the length and height of the object connected region. For example, as shown in FIG. 5 and FIG. 6, to determine the size in the X direction of the bounding box of object connected region 501, the X coordinate of the leftmost end point and the X coordinate of the rightmost end point of object connected region 501 are determined, and the absolute value of the difference between these two X coordinates is taken as the size of bounding box 601 in the X direction. To determine the size in the Y direction, the Y coordinate of the lowest point and the Y coordinate of the highest point of object connected region 501 are determined, and the absolute value of the difference between these two Y coordinates is taken as the size of bounding box 601 in the Y direction. In this way, bounding box 601 enclosing object connected region 501 is obtained. Similarly, the bounding box corresponding to each object connected region can be determined, for example bounding box 602 corresponding to object connected region 502 and bounding box 603 corresponding to object connected region 503. It is worth noting that, to show the bounding boxes clearly, each bounding box in FIG. 6 is drawn larger in the X and Y directions than the size determined in the above manner; it should be understood, however, that the size of each bounding box in the X and Y directions may be equal to the size determined in the above manner.
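As a minimal sketch of the minimum bounding box computation described above, assuming a connected region is given as a list of (x, y) pixel coordinates (the function name `bounding_box` and the sample points are hypothetical):

```python
def bounding_box(pixels):
    """Return (x_min, y_min, width, height) for a list of (x, y) points.

    Width and height are the absolute differences between the extreme
    X and Y coordinates, as in the description above.
    """
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return x_min, y_min, abs(x_max - x_min), abs(y_max - y_min)


# Assumed sample region: extreme points at x = 12 and 40, y = 5 and 9.
region = [(12, 5), (40, 5), (12, 9), (40, 9), (25, 7)]
box = bounding_box(region)  # (12, 5, 28, 4)
```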
For example, in other embodiments, the bounding box may have a shape other than a rectangle, for example an ellipse, a triangle, a trapezoid, and so on.
It should be noted that the bounding box of an object connected region may also be determined in other suitable ways.
For example, in step S150, corresponding image blocks may be cropped from the initial image 201 according to one or more (N) of the M bounding boxes.
FIG. 7A is a schematic diagram of cropping image blocks from the initial image provided by at least one embodiment of the present disclosure, and FIG. 7B is a schematic diagram of image blocks provided by at least one embodiment of the present disclosure. With reference to FIG. 2, FIG. 7A, and FIG. 7B, if tilt correction was performed while obtaining the intermediate image or the connected image, then before image blocks are cropped from the initial image, tilt correction may first be applied to the initial image 201 to obtain a corrected initial image 201′, and the image blocks are then cropped from the corrected initial image 201′. In one example, for each of the M bounding boxes, one image block of the corresponding region is cropped from the initial image 201; in this case M equals N. For example, according to the coordinate parameters of the M bounding boxes and the correspondence (for example, mapping relationship) between the intermediate image and the initial image, all M bounding boxes are mapped to the initial image 201′, and the image block framed by each bounding box in the initial image 201′ is cropped, yielding M image blocks: for example, image block 701 cropped according to bounding box 601, image block 702 according to bounding box 602, image block 703 according to bounding box 603, image block 704 according to bounding box 604, image block 705 according to bounding box 605, and so on. In another example, N may be smaller than M, that is, some of the M bounding boxes are selected, and the image blocks defined by the selected bounding boxes are cropped from the initial image 201.
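Cropping one image block per bounding box can be sketched as below. This assumes the boxes have already been mapped into the coordinate system of the initial image and are given as (x, y, width, height) with width and height counted in pixels; the names `crop` and `crop_blocks` and the toy image are illustrative only.

```python
def crop(image, box):
    """Cut the rectangle (x, y, width, height) out of a row-major image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def crop_blocks(image, boxes):
    """Return one image block per bounding box, in the same order."""
    return [crop(image, box) for box in boxes]


# Toy 4x6 "initial image" whose pixel value encodes its position (10*row + col).
image = [[10 * r + c for c in range(6)] for r in range(4)]
blocks = crop_blocks(image, [(1, 0, 3, 2), (4, 2, 2, 2)])
# blocks[0] == [[1, 2, 3], [11, 12, 13]]
# blocks[1] == [[24, 25], [34, 35]]
```

Selecting only some of the boxes before calling `crop_blocks` corresponds to the case where N is smaller than M.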
For example, in step S160, the object recognition model may be used to recognize each image block and obtain the character content in each image block. When the target objects include characters, the object recognition model may include a character recognition model. For example, the character recognition model may be implemented based on technologies such as optical character recognition and run, for example, on a general-purpose or special-purpose computing device; the character recognition model may also be a neural network model trained in advance. In some embodiments, the recognized character contents may contain semantic errors, logical errors, and the like. Therefore, the character content recognized by the character recognition model needs to be verified, and semantic errors, logical errors, and the like in it corrected, to obtain accurate character content. For example, the character recognition model may adopt a CRNN (Convolutional Recurrent Neural Network) + CTC (Connectionist Temporal Classification) architecture, and the backbone network of the CRNN+CTC architecture may be the MobileNetV3 Small network. To suit the recognition of image blocks in the embodiments of the present disclosure, adaptive adjustments may be made, for example to the inverted_res_block part of the MobileNetV3 Small network.
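To illustrate how a CTC output is typically turned into text, the sketch below shows greedy CTC decoding: take the best label per time frame, merge consecutive repeats, and drop the blank label. The disclosure does not specify a decoding method, so this is an assumption; the label set, the blank index 0, and the helper `one_hot` are likewise hypothetical.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    chars = []
    previous = blank
    for index in best:
        if index != previous and index != blank:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)


def one_hot(index, size):
    """Build a fake per-frame score vector that peaks at `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]


alphabet = ["<blank>", "D", "a", "t", "e"]  # hypothetical label set, index 0 = blank
path = [1, 1, 0, 2, 3, 3, 0, 4]             # per-frame best labels
text = ctc_greedy_decode([one_hot(i, len(alphabet)) for i in path], alphabet)
# text == "Date"
```

A blank between two identical labels keeps them as separate characters, which is how repeated letters survive the merge step.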
For example, at least one character can be recognized from each image block. Each character may be a single Chinese character, a single foreign-language character (for example, a single English letter or a single English word), a single digit, a single symbol, a single graphic, a single punctuation mark, and so on. For example, the character content "DECLARATION AND ASSIGNMENT" can be recognized from image block 701, the character content "Signature:" from image block 702, and the character content "Date:" from image block 703.
For example, in other embodiments, the target objects may include objects other than characters, such as patterns. In this case the object recognition model may also include a pattern recognition model, which runs, for example, on a general-purpose or special-purpose computing device and may also be a neural network model trained in advance. In one example, the pattern recognition model recognizes a pattern as the corresponding English or Chinese word, for example recognizing a sun pattern as the word "sun". In another example, the pattern recognition model may be used to convert a pattern into a corresponding stick-figure graphic. For example, a variety of stick-figure graphics may be stored in advance; if the pattern recognition model recognizes the pattern to be recognized as a sun pattern, the stick-figure graphic corresponding to the sun pattern is selected from the graphic library and used as the recognition result.
For example, when the target objects are of multiple types, different recognition models may be used to recognize the different types of target objects, and the recognition results of the multiple recognition models may be combined to obtain the recognition result for all target objects in the initial image.
In the image processing method provided by the embodiments of the present disclosure, the initial image is first converted into an intermediate image; the region detection model then converts the intermediate image into a connected image to obtain several object connected regions; the bounding boxes corresponding to the object connected regions are determined; and the image blocks corresponding to the bounding boxes are then cropped from the initial image. Compared with algorithms in the related art that determine the regions where objects are located directly from the initial image, this approach requires less computation and involves a simpler processing flow. It therefore at least partially solves the problems of high complexity and heavy computation, allows the object recognition algorithm to be applied on terminal devices with modest hardware such as mobile phones, and allows such terminal devices to perform object recognition even when offline.
For example, in step S130 (recognizing the intermediate image with the region detection model to obtain a connected image including M object connected regions), the region detection model may process the intermediate image to obtain a connected image including a plurality of initial object connected regions; a morphological transformation is then applied to that connected image to obtain, from it, the connected image including M object connected regions.
FIG. 8 is a schematic diagram of a connected image including a plurality of initial object connected regions provided by at least one embodiment of the present disclosure. As shown in FIG. 8, the connected image including a plurality of initial object connected regions obtained from the region detection model may exhibit problems such as small white dots 801 and stuck-together lines 802, for example two adjacent text lines joined by a single pixel between them. In this case, a morphological transformation may be applied to the connected image including a plurality of initial object connected regions to obtain the corrected connected image shown in FIG. 5 (that is, the connected image of M object connected regions). In the corrected connected image, the small white dots 801 are removed and the stuck-together lines 802 are split into line 504 and line 505 shown in FIG. 5. The morphological transformation may include a closing operation and an opening operation. The opening operation smooths contours, breaks narrow necks (for example, thin white lines), and eliminates small protrusions, for example removing the protrusions of stuck-together lines. The closing operation also smooths object contours but, in contrast to the opening operation, bridges narrow breaks and long thin gaps, eliminates small holes, and fills breaks in contour lines, for example removing small white dots.
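The opening and closing operations mentioned above can be sketched in plain Python as erosion and dilation with a 3×3 square structuring element. The element size, the function names, and the toy image are assumptions for illustration; the disclosure does not state them. In this toy image, with white (1) as foreground, an opening is enough to drop both the isolated white speck and the one-pixel bridge between two text lines.

```python
def _apply(binary, combine):
    """Apply `combine` (min or max) over each pixel's 3x3 neighbourhood,
    clipped at the image border."""
    h, w = len(binary), len(binary[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [binary[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(combine(window))
        out.append(row)
    return out


def erode(binary):
    return _apply(binary, min)


def dilate(binary):
    return _apply(binary, max)


def opening(binary):
    """Erosion followed by dilation."""
    return dilate(erode(binary))


def closing(binary):
    """Dilation followed by erosion."""
    return erode(dilate(binary))


W = 7
image = ([[1] * W for _ in range(3)]        # text line 1
         + [[0, 0, 0, 1, 0, 0, 0]]          # one-pixel bridge gluing the lines
         + [[1] * W for _ in range(3)]      # text line 2
         + [[0] * W, [0, 0, 0, 1, 0, 0, 0], [0] * W])  # isolated white speck
cleaned = opening(image)
# Rows 0-2 and 4-6 stay white; the bridge (row 3) and the speck (row 8) are gone.
```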
For example, in step S140 (determining the M bounding boxes in the connected image that respectively correspond to the M object connected regions), the contour information of each of the M object connected regions may be extracted, and the bounding box of each of the M object connected regions is determined based on the contour information.
For example, the contour information may be contour line information, such as the coordinates of the contour line. The contour line information may be extracted for each object connected region; from it, the boundary points of the object connected region in the X and Y directions can be determined, and from the boundary points the minimum bounding box of the object connected region, that is, its bounding box, can be determined. For example, the contour line information may be extracted with any of several contour extraction algorithms in opencv (a computer vision and machine learning software library), including, for example, the Canny edge detection algorithm and the Sobel edge detection algorithm.
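The chain from connected regions to contours to bounding boxes can be sketched without external libraries as follows. This is a plain-Python stand-in, not the OpenCV routines named in the text: regions are found with a breadth-first flood fill under 4-connectivity, and a contour is taken to be the region pixels that touch a non-region neighbour. All function names and the sample image are assumptions.

```python
from collections import deque


def connected_regions(binary):
    """Return regions as lists of (x, y) foreground pixels (4-connectivity)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, region = deque([(x0, y0)]), []
            while queue:
                x, y = queue.popleft()
                region.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append(region)
    return regions


def contour(region):
    """Keep pixels that have at least one 4-neighbour outside the region."""
    inside = set(region)
    return [(x, y) for x, y in region
            if any(n not in inside
                   for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))]


def box_from_contour(points):
    """Return (x_min, y_min, x_max, y_max) from the contour's extreme points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


connected = [[0, 0, 0, 0, 0, 0, 0, 0],
             [0, 1, 1, 1, 0, 0, 0, 0],
             [0, 1, 1, 1, 0, 1, 1, 0],
             [0, 1, 1, 1, 0, 1, 1, 0],
             [0, 0, 0, 0, 0, 0, 0, 0]]
regions = connected_regions(connected)
boxes = [box_from_contour(contour(r)) for r in regions]
# boxes == [(1, 1, 3, 3), (5, 2, 6, 3)]
```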
For example, in step S150 (cropping N image blocks from the initial image based on the M bounding boxes), as described above, in one example M may equal N: according to the correspondence between the intermediate image and the initial image, one image block is cropped from the initial image for each of the M bounding boxes.
For example, in one example, before the bounding boxes are determined, the connected image may first be scaled (for example, enlarged) to the original size of the initial image so that the two sizes match; the bounding boxes of the object connected regions are then determined from their contour information in the connected image of the original size, and each bounding box is mapped to the initial image. After the connected image is scaled to the original size, the values of some newly added pixels are obtained by interpolation and lie between 0 and 1. Therefore, to facilitate processing, the scaled connected image may be binarized (for example, with a threshold of 127 when the gray-level range of the image is 0 to 255, or a threshold of 0.5 when the gray-level range is 0 to 1) to convert it into a pure black-and-white image, and the bounding boxes are then determined in the binarized connected image. In another example, the connected image is not scaled; instead, the bounding boxes are determined in the connected image of the predetermined size, and each bounding box is then enlarged according to the ratio between the original size and the predetermined size to obtain an enlarged bounding box corresponding to the original size, so that the enlarged bounding box can be mapped to the corresponding region of the initial image. It should be noted that other suitable ways may also be used to map the bounding boxes in the connected image to the initial image, which is not specifically limited in the present disclosure.
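The second mapping approach above (enlarging a box by the ratio between the original size and the predetermined size) can be sketched as follows. The function name `scale_box`, the (x_min, y_min, x_max, y_max) box layout, and the choice to round outward with floor and ceil are assumptions, not details from the disclosure.

```python
import math


def scale_box(box, predetermined_size, original_size):
    """Map (x_min, y_min, x_max, y_max) from the resized image to the original.

    Sizes are (width, height). Minimum coordinates are rounded down and
    maximum coordinates rounded up so the scaled box still covers the region.
    """
    pw, ph = predetermined_size
    ow, oh = original_size
    sx, sy = ow / pw, oh / ph
    x_min, y_min, x_max, y_max = box
    return (math.floor(x_min * sx), math.floor(y_min * sy),
            math.ceil(x_max * sx), math.ceil(y_max * sy))


# Box found in a 640*640 connected image; assumed original image of 1280*960.
scaled = scale_box((64, 32, 320, 96), (640, 640), (1280, 960))
# scaled == (128, 48, 640, 144)
```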
In another example, predetermined processing may be applied to the M bounding boxes to obtain N processed bounding boxes, and, according to the correspondence between the intermediate image and the initial image, one image block is cropped from the initial image for each processed bounding box, where M and N may or may not be equal.
For example, the predetermined processing of the M bounding boxes may include: scoring the M bounding boxes to obtain a quality score for each of the M bounding boxes; and treating each bounding box whose quality score is less than a score threshold as an invalid bounding box and deleting the invalid bounding boxes.
For example, scoring the M bounding boxes may include performing the following operations for each of the M bounding boxes: determining the area of the bounding box and the area of the pixels corresponding to the target object located in the bounding box; and determining the quality score of the bounding box based on the ratio of the area of those pixels to the area of the bounding box.
For example, the bounding box may be mapped to the binarized image shown in FIG. 4, in which the color of the characters differs from the background color, that is, the pixel value of the characters differs from that of the background; for example, character pixels have the value 1 and background pixels have the value 0. To compute the ratio of the area of the target object in the bounding box to the area of the bounding box, each pixel in the bounding box is traversed and the pixels whose value equals the pixel value of the target object are counted, giving the number of pixels corresponding to the target object. Dividing this count by the total number of pixels in the bounding box gives the ratio of the area of the target object's pixels to the area of the bounding box. In one example, this ratio is used directly as the quality score of the bounding box. In another example, the ratio is divided into several ranges, each corresponding to a score; for example, the range [0, 0.2) may correspond to a score of 1, [0.2, 0.4) to a score of 2, …, and [0.8, 1] to a score of 5.
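The area-ratio score and its bucketed variant can be sketched as follows, assuming a binarized image with target pixels equal to 1 and a box given as inclusive pixel coordinates (x_min, y_min, x_max, y_max). The function names and the five-bucket default are illustrative, following the example ranges in the text.

```python
def foreground_ratio(binary, box, target_value=1):
    """Fraction of pixels inside the box whose value equals target_value."""
    x_min, y_min, x_max, y_max = box  # inclusive pixel coordinates
    total = (x_max - x_min + 1) * (y_max - y_min + 1)
    hits = sum(1
               for y in range(y_min, y_max + 1)
               for x in range(x_min, x_max + 1)
               if binary[y][x] == target_value)
    return hits / total


def bucket_score(ratio, buckets=5):
    """Map [0, 0.2) -> 1, [0.2, 0.4) -> 2, ..., [0.8, 1] -> 5."""
    return min(int(ratio * buckets) + 1, buckets)


binary = [[1, 1, 0, 0],
          [1, 0, 0, 0]]
ratio = foreground_ratio(binary, (0, 0, 3, 1))  # 3 of 8 pixels -> 0.375
score = bucket_score(ratio)                     # falls in [0.2, 0.4) -> 2
```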
For example, in other embodiments, the quality score may be determined according to the inclination of the bounding box. For example, for the binarized image shown in FIG. 4, the characters are all arranged along the X direction, so the quality score of the bounding box may be determined from the angle between the axis of the bounding box and the X direction (or the Y direction); for example, the angle may be used directly as the quality score, or several angle ranges may be defined, each corresponding to a score. In addition, those skilled in the art may score the image blocks in other ways.
For example, after the quality scores of the bounding boxes are obtained, the bounding boxes whose quality scores are below a predetermined score threshold may be removed and the high-quality bounding boxes retained. Scoring the bounding boxes and removing the invalid ones filters out invalid content, avoids wasted computation in subsequent steps, and safeguards the accuracy of the recognition result.
For example, the score threshold may be set according to actual conditions. In some examples, the score threshold may be s times the highest predetermined score, where s is, for example, between 0.3 and 0.8. For example, if the quality score is a value between 0 and 1, the highest predetermined score is 1 and the score threshold may be a value between 0.3 and 0.8 (for example, 0.5). A bounding box whose quality score is greater than or equal to the score threshold may be regarded as a high-quality bounding box, and a bounding box whose quality score is below the score threshold may be regarded as an invalid bounding box.
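As a small sketch of the thresholding rule above, assuming each bounding box has already been paired with its quality score (the name `filter_boxes` and the `(box, score)` pair layout are hypothetical):

```python
def filter_boxes(scored_boxes, max_score=1.0, s=0.5):
    """Keep the boxes whose quality score reaches s times the highest score.

    scored_boxes: list of (box, score) pairs.
    s:            fraction of the highest predetermined score, e.g. 0.3 to 0.8.
    """
    threshold = s * max_score
    # Scores greater than or equal to the threshold count as high quality.
    return [box for box, score in scored_boxes if score >= threshold]


kept = filter_boxes([("box_a", 0.72), ("box_b", 0.5), ("box_c", 0.18)])
```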
For example, performing the predetermined processing on the M bounding boxes may further include: enlarging one or more of the M bounding boxes by a first predetermined multiple.
For example, some bounding boxes may enclose too small a range, so that the target object is not completely enclosed; for example, some characters in a text line are not enclosed by the bounding box, or parts of some characters fall outside it. To solve this problem, these bounding boxes may be enlarged so that the target objects not yet contained in them are included. For example, a bounding box may be enlarged by k times its area-to-perimeter ratio, where k is the first predetermined multiple and the center of enlargement is the center of the bounding box; k is, for example, a positive number greater than 1 and less than 2, for example 1.6. For example, for any bounding box, the corresponding enlarged bounding box may completely cover the original bounding box.
For example, all M bounding boxes may be enlarged, or several bounding boxes with a smaller range may be selected from the M bounding boxes for enlargement. For example, it may be detected whether a target object that is not enclosed in any bounding box exists within a predetermined peripheral range of each bounding box; for example, it may be detected whether a certain number of target-object pixels exist within the predetermined peripheral range of the bounding box, and if so, that bounding box may be enlarged. For example, the predetermined peripheral range may be the annular region between the bounding box and a virtual bounding box obtained by proportionally enlarging the bounding box t times about its center point, where t is, for example, greater than 1 and less than 2. For example, the M bounding boxes include a first bounding box; enlarging the first bounding box t times about its center point yields a first virtual bounding box, and the annular region between the first virtual bounding box and the first bounding box may serve as the predetermined peripheral range of the first bounding box. For example, the bounding box includes two first sides extending along the X direction and two second sides extending along the Y direction, the first sides being perpendicular to the second sides; proportional enlargement may mean enlarging both the first sides and the second sides t times to obtain the virtual bounding box. In this case, the distance between the center point of the bounding box and a first side of the virtual bounding box is t times the distance between the center point and the corresponding first side of the bounding box; for example, if the distance between the center point and the first side of the bounding box is 5 mm, the distance between the center point and the first side of the virtual bounding box is 5t mm. Likewise, the distance between the center point of the bounding box and a second side of the virtual bounding box is t times the distance between the center point and the corresponding second side of the bounding box.
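The virtual bounding box and the annular peripheral range described above can be illustrated with a short sketch. It assumes axis-aligned boxes given as `(x0, y0, x1, y1)` and a binary image stored as nested lists with 1 for target-object pixels; the helper names `scale_box` and `ring_foreground` are hypothetical.

```python
import math


def scale_box(box, factor):
    """Scale an axis-aligned box (x0, y0, x1, y1) about its center point."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) / 2 * factor
    half_h = (y1 - y0) / 2 * factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def ring_foreground(binary, box, t=1.5):
    """Count target-object pixels in the ring between `box` and its
    t-times virtual bounding box, clipped to the image borders."""
    height, width = len(binary), len(binary[0])
    vx0, vy0, vx1, vy1 = scale_box(box, t)
    ix0, iy0 = max(0, math.floor(vx0)), max(0, math.floor(vy0))
    ix1, iy1 = min(width, math.ceil(vx1)), min(height, math.ceil(vy1))
    x0, y0, x1, y1 = box
    count = 0
    for y in range(iy0, iy1):
        for x in range(ix0, ix1):
            inside_box = x0 <= x < x1 and y0 <= y < y1
            if not inside_box and binary[y][x] == 1:
                count += 1
    return count


virtual = scale_box((10, 10, 20, 20), 1.5)   # center-to-side distance 5 becomes 7.5
```

A box could then be selected for enlargement when `ring_foreground` exceeds some chosen pixel count; the text leaves that count open, so any concrete value is a tuning decision.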
For example, the enlargement of the bounding boxes may be performed after the removal of the invalid bounding boxes. In this case, N bounding boxes remain after the invalid bounding boxes are removed; all N bounding boxes may be enlarged, or several bounding boxes with a smaller range may be selected from the N bounding boxes for enlargement.
For example, performing the predetermined processing on the M bounding boxes may further include: detecting whether every two adjacent bounding boxes among the M bounding boxes at least partially overlap, and if so, reducing each of the two at least partially overlapping bounding boxes based on a second predetermined multiple, so that the two reduced bounding boxes do not overlap or their overlapping region is reduced.
For example, some bounding boxes may enclose too large a range, so that two adjacent bounding boxes partially overlap. To solve this problem, these bounding boxes may be reduced so that the two reduced bounding boxes do not overlap or overlap less. For example, the intersection between every two adjacent bounding boxes may be calculated; the intersection is, for example, the MIoU value (Mean Intersection over Union, an evaluation metric for semantic segmentation) between the two adjacent bounding boxes, and the bounding boxes are reduced by a multiple of 0.9*(1-MIoU). The second predetermined multiple is, for example, this 0.9*(1-MIoU); the second predetermined multiple is, for example, a value between 0.5 and 0.9, that is, the bounding box is reduced to 0.5 to 0.9 times its original size. The reduction of the bounding boxes may be performed after the enlargement of the bounding boxes, which avoids adjacent bounding boxes touching or overlapping as a result of the enlargement, gives each bounding box a suitable range, and thus allows image blocks of suitable size to be intercepted from the initial image.
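The reduction multiple 0.9*(1-MIoU) can be worked through numerically. The sketch below is an assumption-laden illustration: it substitutes the plain intersection over union of two axis-aligned boxes for the MIoU value named in the text, and the function names `iou` and `shrink_factor` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def shrink_factor(a, b):
    """Reduction multiple 0.9 * (1 - IoU) for a pair of adjacent boxes."""
    return 0.9 * (1.0 - iou(a, b))


# Two 10 x 10 boxes overlapping by half: IoU = 50 / 150 = 1/3, multiple = 0.6.
factor = shrink_factor((0, 0, 10, 10), (5, 0, 15, 10))
```

For this pair the multiple is about 0.6, inside the 0.5 to 0.9 range given as an example; a larger overlap gives a smaller multiple and therefore a stronger reduction.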
For example, after the N image blocks are intercepted, at least some of the N image blocks may be scaled so that the processed N image blocks have the same size in the Y direction; for example, the size of the N image blocks in the Y direction may be uniformly scaled to the size corresponding to 32 pixels, to facilitate processing by the subsequent object recognition model.
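A minimal sketch of the size computation for this scaling step follows. The text only fixes the Y-direction size at 32 pixels; keeping the aspect ratio when deriving the X-direction size is an assumption of this example, and `scaled_size` is a hypothetical helper name.

```python
def scaled_size(width, height, target_height=32):
    """Return (new_width, new_height) with the height fixed at target_height.

    The width is scaled by the same factor (aspect ratio preserved), which is
    an assumption of this sketch rather than something the text specifies.
    """
    scale = target_height / height
    return max(1, round(width * scale)), target_height


size = scaled_size(200, 64)   # a 200 x 64 block is halved in both directions
```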
FIG. 9 is a schematic flowchart of recognizing N image blocks provided by at least one embodiment of the present disclosure. As shown in FIG. 9, for example, step S160 (recognizing the N image blocks using the object recognition model to obtain the target objects in the initial image) may include steps S161 to S164.
Step S161: determining, among the N image blocks, P first image blocks whose length in the first direction is greater than a recognition length threshold, and dividing each first image block into at least two sub-image blocks to obtain a plurality of sub-image blocks corresponding to the P first image blocks, the length of each sub-image block being equal to or less than the recognition length threshold. For example, P is a positive integer.
Step S162: recognizing the plurality of sub-image blocks using the object recognition model to obtain the target objects in the P first image blocks. For example, the target objects in the initial image include the target objects in the P first image blocks.
Step S163: determining, among the N image blocks, Q second image blocks whose length in the first direction is less than the recognition length threshold, and processing each second image block to obtain Q processed second image blocks, the length of each processed second image block in the first direction being equal to the recognition length threshold. For example, Q is a positive integer.
Step S164: recognizing the Q processed second image blocks using the object recognition model to obtain the target objects in the Q second image blocks. For example, the target objects in the initial image further include the target objects in the Q second image blocks.
For example, in some embodiments, the N image blocks may include only first image blocks and no second image blocks; in this case, only steps S161 and S162 may be performed in step S160, and steps S163 and S164 need not be performed. In other embodiments, the N image blocks may include only second image blocks and no first image blocks; in this case, only steps S163 and S164 may be performed in step S160, and steps S161 and S162 need not be performed.
For example, each first image block includes a plurality of target objects arranged in sequence along the first direction. The first direction may be the length direction of the image block, which may be determined from the direction in which the target objects in the image block are arranged; for example, as shown in FIG. 7B, the characters in the image block are arranged along the X direction, so the first direction may be the X direction.
For example, the length of an image block may be expressed as a number of pixels, and a recognition length threshold may be preset; the recognition length threshold may be, for example, 400 to 1000 pixels, for example 640 pixels. An image block among the N image blocks that is longer than the recognition length threshold may be divided, for example into several sub-image blocks whose length is less than or equal to the recognition length threshold. An image block among the N image blocks that is shorter than the recognition length threshold may be processed so that its length equals the recognition length threshold. In this way, on the one hand, processing the image blocks to an approximately uniform size facilitates model processing; on the other hand, dividing larger image blocks into smaller ones reduces the computation required of the model and allows a simple recognition model to be used, which increases recognition speed.
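The grouping of image blocks by length relative to the recognition length threshold can be sketched as below. The function name `classify_blocks` is hypothetical, and the third group (blocks whose length already equals the threshold, which need neither dividing nor lengthening) is an inference from the strict "greater than" and "less than" conditions of steps S161 and S163.

```python
def classify_blocks(lengths, length_threshold=640):
    """Group block indices by length relative to the recognition length threshold.

    Returns (first, second, exact):
      first  - longer than the threshold, to be divided into sub-image blocks
      second - shorter than the threshold, to be lengthened to the threshold
      exact  - already equal to the threshold, usable as-is
    """
    first = [i for i, n in enumerate(lengths) if n > length_threshold]
    second = [i for i, n in enumerate(lengths) if n < length_threshold]
    exact = [i for i, n in enumerate(lengths) if n == length_threshold]
    return first, second, exact


first, second, exact = classify_blocks([700, 640, 300, 1300])
```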
For example, in step S161, dividing each first image block into at least two sub-image blocks may include performing the following operations for the i-th first image block among the N image blocks: setting a candidate segmentation point at every interval of the recognition length threshold in the first direction, to determine at least one candidate segmentation point corresponding to the i-th first image block; determining at least one segmentation point corresponding to the i-th first image block based on the at least one candidate segmentation point; and dividing the i-th first image block into at least two sub-image blocks based on the at least one segmentation point. For example, i is a positive integer less than or equal to P.
FIG. 10A is a schematic diagram of dividing an image block provided by at least one embodiment of the present disclosure. As shown in FIG. 10A, the division process is described by taking image block 704 as an example. Starting from the starting point 901 of image block 704, a candidate segmentation point may be set at every interval of the recognition length threshold L, yielding, for example, candidate segmentation points 902 and 903. One segmentation point may be determined from each candidate segmentation point. In one example, a candidate segmentation point may be used directly as a segmentation point; for example, candidate segmentation point 902 may be used as a segmentation point. In another example, a point within a predetermined distance range of the candidate segmentation point may be used as a segmentation point; the predetermined distance range may be, for example, the range [pc-lg, pc+lg] in the X direction, where pc is the X coordinate of the candidate segmentation point and lg is the size of g pixels, g being, for example, between 12 and 60.
For example, a segmentation point 903′ is determined within the predetermined distance range of candidate segmentation point 903. After the segmentation points are obtained, the image block may be cut at the segmentation points, yielding, for example, sub-image blocks 7041, 7042, and 7043.
For example, determining at least one segmentation point corresponding to the i-th first image block based on the at least one candidate segmentation point may include: if a spacing region is contained within the distance-threshold range of any candidate segmentation point among the at least one candidate segmentation point in the i-th first image block, taking a point in the spacing region as a segmentation point corresponding to the i-th first image block; and if no spacing region is contained within the distance-threshold range of that candidate segmentation point, taking that candidate segmentation point as a segmentation point corresponding to the i-th first image block.
For example, in a scenario where the target objects include characters, if a candidate segmentation point lies exactly within the spacing region between two adjacent characters, the candidate segmentation point may be used as a segmentation point; for example, candidate segmentation point 902 lies within the spacing region between the character "," and the character string "Building", so it may be used as a segmentation point. If a candidate segmentation point does not lie within the spacing region between two adjacent characters, a spacing region near the candidate segmentation point may be determined and a point within that spacing region used as the segmentation point. For example, candidate segmentation point 903 lies inside the character string "Beijing" rather than in a spacing region between characters; the pixels within a certain distance of candidate segmentation point 903 may therefore be traversed to find a spacing region near the character string "Beijing". For example, the spacing region between the character "," (located between the character string "Road" and the character string "Beijing") and the character string "Beijing" lies within a certain distance of "Beijing", so a point in that spacing region may be determined as the segmentation point, for example the midpoint of the character spacing region. If no character spacing region exists within the predetermined distance range of the candidate segmentation point, the candidate segmentation point may be used as a segmentation point.
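The selection of segmentation points from candidate points can be sketched as follows. This is one possible reading, not the disclosed implementation: it assumes a binary block stored as nested lists (1 = character pixel), treats a spacing region as a run of columns containing no character pixels, and picks the midpoint of the nearest such run inside [pc-g, pc+g]. The names `blank_columns`, `gap_runs`, and `split_points` are hypothetical.

```python
def blank_columns(binary):
    """True for each column that contains no character (value 1) pixel."""
    width = len(binary[0])
    return [all(row[x] == 0 for row in binary) for x in range(width)]


def gap_runs(blank):
    """Return half-open (start, end) runs of consecutive blank columns."""
    runs, start = [], None
    for x, is_blank in enumerate(blank):
        if is_blank and start is None:
            start = x
        elif not is_blank and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(blank)))
    return runs


def split_points(binary, length_threshold, g):
    """Choose one segmentation point per candidate point.

    Candidates sit every `length_threshold` columns. A candidate already in a
    spacing region is kept; otherwise the midpoint of the nearest spacing
    region overlapping [pc - g, pc + g] is used, clamped to that window; if
    there is none, the candidate itself is used.
    """
    blank = blank_columns(binary)
    runs = gap_runs(blank)
    points = []
    for pc in range(length_threshold, len(blank), length_threshold):
        if blank[pc]:
            points.append(pc)
            continue
        options = []
        for start, end in runs:
            if start <= pc + g and end - 1 >= pc - g:
                mid = (start + end - 1) // 2
                options.append(min(max(mid, pc - g), pc + g))
        points.append(min(options, key=lambda p: abs(p - pc)) if options else pc)
    return points


row = [1] * 25
row[8] = row[9] = 0            # a two-column spacing region just before column 10
points = split_points([row], length_threshold=10, g=3)
```

One consequence worth noting: a point moved to the right of its candidate makes the preceding sub-image block slightly longer than the threshold, so a stricter variant might search only [pc - g, pc].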
For example, after the segmentation points are determined, the first image block may be cut at the positions of the segmentation points to obtain several sub-image blocks.
For example, processing each second image block may include: splicing an end image block onto at least one end of each second image block in the first direction, to obtain the processed second image block corresponding to each second image block. For example, the pixel value of each pixel in the end image block differs from the pixel value of the pixels corresponding to each object in the second image block.
FIG. 10B is a schematic diagram of splicing end image blocks provided by at least one embodiment of the present disclosure. As shown in FIG. 10B, the process is described by taking image block 702 as an example. For example, image block 702, whose length is less than the recognition length threshold L, may be lengthened; for example, an end image block may be spliced onto one side or both sides of image block 702 in the X direction. The pixel value of the end image block differs from the pixel value of the target object and may, for example, be the same as the pixel value of the background portion of image block 702. The length of the new image block 702′ obtained after splicing is, for example, equal to the recognition length threshold L.
For example, a sub-image block obtained by cutting whose length is less than the recognition length threshold L may also be lengthened by splicing. For example, as shown in FIG. 10A, if the length of sub-image block 7043 obtained by cutting is less than the recognition length threshold L, sub-image block 7043 may be processed in the splicing manner described above, so that the length of the processed sub-image block 7043 equals the recognition length threshold L.
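The lengthening by splicing end image blocks can be sketched in a few lines, assuming a block stored as nested lists and a background pixel value of 0; the name `pad_to_length` and the `both_sides` switch are hypothetical.

```python
def pad_to_length(block, length_threshold, background=0, both_sides=False):
    """Splice background-valued columns onto a block until its length
    (number of columns) equals the recognition length threshold.

    block: 2-D list of pixel values; rows are returned as new lists.
    """
    missing = length_threshold - len(block[0])
    if missing <= 0:
        return [list(row) for row in block]
    left = missing // 2 if both_sides else 0   # columns added before the block
    right = missing - left                     # columns added after the block
    return [[background] * left + list(row) + [background] * right
            for row in block]


padded = pad_to_length([[1, 1], [1, 0]], length_threshold=5)
```

The same helper would apply both to a second image block and to a short trailing sub-image block such as 7043.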
For example, after the sub-image blocks and second image blocks corresponding to the recognition length threshold are obtained by cutting and/or splicing, each sub-image block and second image block may be recognized using the object recognition model. Taking the recognition of English letters as an example, if the length of each English letter and punctuation mark is, for example, 4 pixels, then 640/4=160 English letters can be recognized from a 32*640*3 image block. In 32*640*3, 32 indicates, for example, that the height of the image block is 32 pixels, 640 indicates that the length of the image block is 640 pixels, and 3 indicates that the image block has 3 channels.
For example, the object recognition model may be trained to output d possible candidate recognition results for each target object, where d is an integer greater than 0 and less than 5. For example, taking the recognition of English letters with d equal to 2, for the English letter "m" in an image block, the object recognition model may output the candidate recognition results "m" and "n". For example, a 32*640*3 image block may return 160*d recognition results: each character occupies 4 pixels, so there are 160 characters, and d is the number of candidate recognition results the object recognition model produces for each character. An argmax function may then be applied to the 160*d recognition results to return 160 recognition results, which is equivalent to finding the most likely recognition result among the candidate recognition results of each character. For example, in some embodiments, recognition is performed on successive 4-pixel segments of the image block, so repeated recognition results may occur; a deduplication operation may therefore be applied to remove the repeated recognition results and obtain the final recognition result of the image block.
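The argmax and deduplication steps can be illustrated with a small sketch. The data layout (a list of positions, each holding d `(character, score)` pairs) and the rule of collapsing only consecutive repeats are assumptions of this example; the text does not specify how deduplication is performed, and `decode` is a hypothetical name.

```python
def decode(candidates):
    """Pick the best candidate per position, then drop consecutive repeats.

    candidates: one entry per 4-pixel position, each a list of d
                (character, score) pairs produced by the recognition model.
    """
    # Argmax over the d candidates of each position.
    best = [max(options, key=lambda pair: pair[1])[0] for options in candidates]
    # Deduplication: keep a character only when it differs from the previous one.
    result = []
    for ch in best:
        if not result or result[-1] != ch:
            result.append(ch)
    return "".join(result)


positions = 640 // 4   # 160 positions for a 640-pixel-long block
text = decode([
    [("m", 0.9), ("n", 0.1)],
    [("m", 0.6), ("n", 0.4)],
    [("a", 0.8), ("o", 0.2)],
])
```

Collapsing consecutive repeats would also merge genuine double letters such as "ll", so a practical decoder needs some way to tell a repeated reading of one character from two identical characters; the text leaves that mechanism open.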
FIG. 11 is a schematic diagram of a target object recognition result provided by at least one embodiment of the present disclosure. With reference to FIGS. 2, 7A, 7B, and 11, for example, the object recognition results obtained from each sub-image block and each processed second image block may be combined and spliced to obtain the object recognition result 1100 corresponding to the initial image; that is, FIG. 11 shows the result of processing the initial image 201 with the image processing method provided by the present disclosure. As shown in FIG. 11, all characters (that is, the target objects) in the initial image 201 are recognized in the object recognition result 1100, and the relative positions of the characters in the object recognition result 1100 are the same as their relative positions in the initial image 201.
At least one embodiment of the present disclosure further provides an image processing apparatus. FIG. 12 is a schematic block diagram of an image processing apparatus provided by at least one embodiment of the present disclosure.
As shown in FIG. 12, the image processing apparatus may include an image acquisition module 1201, an image processing module 1202, a region recognition module 1203, a determination module 1204, an interception module 1205, and an object recognition module 1206.
For example, the image acquisition module 1201 is configured to obtain an initial image that includes at least one target object. For example, the image acquisition module 1201 may perform step S110 described with reference to FIG. 1; for details, refer to the description of step S110, which is not repeated here.
For example, the image processing module 1202 is configured to process the initial image to obtain an intermediate image. For example, the image processing module 1202 may perform step S120 described with reference to FIG. 1; for details, refer to the description of step S120, which is not repeated here.
For example, the region recognition module 1203 is configured to recognize the intermediate image using a region detection model to obtain a connected image that includes M object connected regions, where M is a positive integer. For example, the region recognition module 1203 may perform step S130 described with reference to FIG. 1; for details, refer to the description of step S130, which is not repeated here.
For example, the determination module 1204 is configured to determine, in the connected image, M bounding boxes respectively corresponding to the M object connected regions. For example, the determination module 1204 may perform step S140 described with reference to FIG. 1; for details, refer to the description of step S140, which is not repeated here.
For example, the interception module 1205 is configured to intercept N image blocks from the initial image based on the M bounding boxes, each image block including at least one target object, where N is a positive integer. For example, the interception module 1205 may perform step S150 described with reference to FIG. 1; for details, refer to the description of step S150, which is not repeated here.
For example, the object recognition module 1206 is configured to recognize the N image blocks using an object recognition model to obtain the target objects in the initial image. For example, the object recognition module 1206 may perform step S160 described with reference to FIG. 1; for details, refer to the description of step S160, which is not repeated here.
In addition, the image processing apparatus can achieve technical effects similar to those of the foregoing image processing method, which are not repeated here.
For example, the image acquisition module 1201, the image processing module 1202, the region recognition module 1203, the determination module 1204, the interception module 1205, and/or the object recognition module 1206 include code and programs stored in a memory, and a processor may execute the code and programs to implement some or all of the functions of these modules as described above. For example, the image acquisition module 1201, the image processing module 1202, the region recognition module 1203, the determination module 1204, the interception module 1205, and/or the object recognition module 1206 may be dedicated hardware devices that implement some or all of the functions of these modules as described above. For example, the image acquisition module 1201, the image processing module 1202, the region recognition module 1203, the determination module 1204, the interception module 1205, and/or the object recognition module 1206 may be one circuit board or a combination of multiple circuit boards for implementing the functions described above. In the embodiments of the present application, the one circuit board or the combination of multiple circuit boards may include: (1) one or more processors; (2) one or more non-transitory memories connected to the processors; and (3) processor-executable firmware stored in the memories.
At least one embodiment of the present disclosure further provides an electronic device. FIG. 13 is a schematic block diagram of an electronic device provided by at least one embodiment of the present disclosure.
For example, as shown in FIG. 13, the electronic device 1300 includes a processor 1301, a communication interface 1302, a memory 1303, and a communication bus 1304. The processor 1301, the communication interface 1302, and the memory 1303 communicate with one another through the communication bus 1304; components such as the processor 1301, the communication interface 1302, and the memory 1303 may also communicate through a network connection. The present disclosure places no limitation on the type or function of the network.
例如,存储器1303用于存储计算机可读指令。处理器1301用于执行计算机可读指令时,实现根据上述任一实施例所述的图像处理方法。关于该图像处理方法的各个步骤的具体实现以及相关解释内容可以参见上述图像处理方法的实施例,在此不做赘述。For example, memory 1303 is used to store computer readable instructions. When the processor 1301 is configured to execute computer-readable instructions, implement the image processing method according to any of the foregoing embodiments. For the specific implementation of each step of the image processing method and related explanations, reference may be made to the above-mentioned embodiment of the image processing method, and details are not repeated here.
例如,处理器1301执行存储器1303上所存放的程序而实现的图像处理方法的其他实现方式,与前述方法实施例部分所提及的实现方式相同,这里也不再赘述。For example, other implementations of the image processing method implemented by the processor 1301 executing the program stored in the memory 1303 are the same as the implementations mentioned in the foregoing method embodiments, and will not be repeated here.
例如,通信总线1304可以是外设部件互连标准(PCI)总线或扩展工业标准结构(EISA)总线等。该通信总线可以分为地址总线、数据总线、控制总线等。为便于表示,图中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。For example, communication bus 1304 may be a Peripheral Component Interconnect Standard (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
例如,通信接口1302用于实现电子设备与其他设备之间的通信。For example, the communication interface 1302 is used to implement communication between the electronic device and other devices.
例如,处理器1301和存储器1303可以设置在服务器端(或云端)。For example, the processor 1301 and the memory 1303 may be set at the server (or cloud).
例如,处理器1301可以控制电子设备中的其它组件以执行期望的功能。处理器1301可以是中央处理器(CPU)、网络处理器(NP)等;还可以是数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。中央处理元(CPU)可以为X86或ARM架构等。For example, the processor 1301 may control other components in the electronic device to perform desired functions. The processor 1301 can be a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable Logic devices, discrete gate or transistor logic devices, discrete hardware components. The central processing unit (CPU) may be an X86 or ARM architecture or the like.
例如,存储器1303可以包括一个或多个计算机程序产品的任意组合,计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。非易失性存储器例如可以包括只读存储器(ROM)、硬盘、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器、闪存等。在所述计算机可读存储介质上可以存储一个或多个计算机可读指令,处理器1301可以运行所述计算机可读指令,以实现电子设备的各种功能。在存储介质中还可以存储各种应用程序和各种数据等。For example, memory 1303 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include random access memory (RAM) and/or cache memory (cache), etc., for example. Non-volatile memory may include, for example, read only memory (ROM), hard disks, erasable programmable read only memory (EPROM), compact disc read only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer-readable instructions can be stored on the computer-readable storage medium, and the processor 1301 can execute the computer-readable instructions to realize various functions of the electronic device. Various application programs, various data, and the like can also be stored in the storage medium.
例如,关于电子设备执行图像处理的过程的详细说明可以参考图像处理方法的实施例中的相关描述,重复之处不再赘述。For example, for a detailed description of the process of image processing performed by the electronic device, reference may be made to relevant descriptions in the embodiments of the image processing method, and repeated descriptions will not be repeated.
图14为本公开至少一实施例提供的另一种电子设备的示意性框图。Fig. 14 is a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure.
本公开至少一实施例还提供另一种电子设备。如图14所示,电子设备1400可以包括存储器1401、处理器1402和图像获取部件1403。应当注意,图14所示的电子设备1400的组件只是示例性的,而非限制性的,根据实际应用需要,该电子设备1400还可以具有其他组件。At least one embodiment of the present disclosure also provides another electronic device. As shown in FIG. 14 , an electronic device 1400 may include a memory 1401 , a processor 1402 and an image acquisition component 1403 . It should be noted that the components of the electronic device 1400 shown in FIG. 14 are exemplary rather than limiting, and the electronic device 1400 may also have other components according to actual application requirements.
例如,图像获取部件1403用于获得初始图像。存储器1401用于存储初始图像以及计算机可读指令。处理器1402用于读取初始图像,并运行计算机可读指令。计算机可读指令被处理器1402运行时执行根据上述任一实施例所述的图像处理方法中的一个或多个步骤。For example, the image acquiring component 1403 is used to acquire an initial image. The memory 1401 is used to store initial images and computer readable instructions. Processor 1402 is used to read the initial image and execute computer readable instructions. When the computer-readable instructions are executed by the processor 1402, one or more steps in the image processing method according to any of the above-mentioned embodiments are executed.
例如,图像获取部件1403可以是图像采集装置,例如,图像获取部件1403可以是智能手机的摄像头、平板电脑的摄像头、个人计算机的摄像头、 数码照相机的镜头、网络摄像头以及其它用于图像采集的装置。For example, the image acquisition component 1403 can be an image acquisition device, for example, the image acquisition component 1403 can be a camera of a smart phone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a network camera, and other devices for image acquisition .
例如,初始图像可以是图像获取部件1403直接采集到的原始图像,也可以是对原始图像进行预处理之后获得的图像。预处理可以消除原始图像中的无关信息或噪声信息,以便于更好地对图像进行处理。预处理例如可以包括对原始图像进行图像扩充(Data Augment)、图像缩放、伽玛(Gamma)校正、图像增强或降噪滤波等处理。For example, the initial image may be an original image directly collected by the image acquisition component 1403, or may be an image obtained after preprocessing the original image. Preprocessing can eliminate irrelevant information or noise information in the original image, so as to process the image better. Preprocessing may include, for example, performing image augmentation (Data Augment), image scaling, gamma (Gamma) correction, image enhancement, or noise reduction filtering on the original image.
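As an illustration of one of the preprocessing operations listed above, the following is a minimal sketch of gamma correction on an 8-bit grayscale image. The function name `gamma_correct`, the list-of-rows image representation, and the convention `out = 255 * (in / 255) ** (1 / gamma)` are assumptions made for this example; the disclosure does not specify how gamma correction is carried out.

```python
def gamma_correct(image, gamma):
    """Apply gamma correction to an 8-bit grayscale image given as a list of rows.

    Hypothetical helper: uses out = 255 * (in / 255) ** (1 / gamma), so
    gamma > 1 brightens mid-tones and gamma < 1 darkens them.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Precompute a lookup table so each pixel costs one index operation.
    lut = [round(255 * (v / 255) ** (1.0 / gamma)) for v in range(256)]
    return [[lut[p] for p in row] for row in image]


if __name__ == "__main__":
    print(gamma_correct([[0, 64, 255]], 2.0))
```

Black and white are left unchanged by this mapping; only the intermediate gray levels move.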
For example, the processor 1402 may control other components in the electronic device 1400 to perform desired functions. The processor 1402 may be a device with data processing capability and/or program execution capability, such as a central processing unit (CPU), a tensor processing unit (TPU), or a graphics processing unit (GPU).
For example, the memory 1401 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. One or more computer-readable instructions may be stored on the computer-readable storage medium, and the processor 1402 may run the computer-readable instructions to implement various functions of the electronic device 1400.
For example, for a detailed description of how the electronic device 1400 performs image processing, refer to the relevant description in the embodiments of the image processing method; repeated content is omitted here.
FIG. 15 is a schematic diagram of a computer-readable storage medium provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 15, one or more computer-readable instructions 1501 may be stored non-transitorily on a storage medium 1500. For example, when executed by a processor, the computer-readable instructions 1501 may perform one or more steps of the image processing method described above.
For example, the storage medium 1500 may be applied to the electronic device 1300 and/or the electronic device 1400 described above; for example, it may include the memory 1303 in the electronic device 1300 and/or the memory 1401 in the electronic device 1400.
For example, for a description of the storage medium 1500, refer to the description of the memory in the embodiments of the electronic device 1300 and/or the electronic device 1400; repeated content is omitted here.
FIG. 16 is a schematic diagram of a hardware environment provided by at least one embodiment of the present disclosure. The electronic device provided by the present disclosure may be applied in an Internet system.
The image processing apparatus, the electronic device 1300, and/or the electronic device 1400 of the present disclosure may be implemented using the computer system shown in FIG. 16. Such a computer system may include a personal computer, a laptop, a tablet, a mobile phone, a personal digital assistant, smart glasses, a smart watch, a smart ring, a smart helmet, or any other smart portable or wearable device. The specific system in this embodiment uses a functional block diagram to illustrate a hardware platform that includes a user interface. Such a computer device may be a general-purpose computer device or a special-purpose computer device; either kind may be used to implement the image processing apparatus and the electronic devices of this embodiment. The computer system may implement any of the components described here that provide the information required for image processing and recognition. For example, the computer system may be implemented by a computer device through its hardware, software programs, firmware, or a combination of these. For convenience, only one computer device is drawn in FIG. 16, but the computer functions described in this embodiment that provide the information required for image processing may be implemented in a distributed manner by a group of similar platforms, which spreads the processing load of the computer system.
As shown in FIG. 16, the computer system may include a communication port 1650 connected to a network for data communication. For example, the computer system may send and receive information and data through the communication port 1650; that is, the communication port 1650 allows the computer system to communicate with other electronic devices, wirelessly or by wire, to exchange data. The computer system may also include a processor group 1620 (that is, the processor described above) for executing program instructions. The processor group 1620 may consist of at least one processor (for example, a CPU). The computer system may include an internal communication bus 1610. The computer system may include different forms of program storage units and data storage units (that is, the memory or storage medium described above), such as a hard disk 1670, a read-only memory (ROM) 1630, and a random access memory (RAM) 1640, which can store the various data files used by the computer for processing and/or communication, as well as the program instructions that the processor group 1620 may execute. The computer system may also include an input/output component 1660, which carries the input/output data flow between the computer system and other components (for example, a user interface 1680).
In general, the following devices may be connected to the input/output component 1660: input devices such as a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; output devices such as a liquid crystal display (LCD), speaker, or vibrator; storage devices such as magnetic tape or a hard disk; and a communication interface.
Although FIG. 16 shows a computer system with various devices, it should be understood that the computer system is not required to have all of the devices shown; it may instead have more or fewer devices.
The following points should also be noted regarding the present disclosure:
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in those embodiments; for other structures, refer to conventional designs.
(2) Where no conflict arises, the embodiments of the present disclosure and the features of the embodiments may be combined with one another to obtain new embodiments.
The above are merely specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto; the protection scope of the present disclosure shall be determined by the protection scope of the claims.

Claims (19)

  1. An image processing method, comprising:
    obtaining an initial image, wherein the initial image includes at least one target object;
    processing the initial image to obtain an intermediate image;
    recognizing the intermediate image using a region detection model to obtain a connected image including M object connected regions;
    determining M bounding boxes in the connected image that respectively correspond to the M object connected regions;
    cropping N image blocks from the initial image based on the M bounding boxes, wherein each of the image blocks includes at least one target object; and
    recognizing the N image blocks using an object recognition model to obtain the target objects in the initial image,
    wherein M and N are both positive integers.
  2. The method according to claim 1, wherein recognizing the intermediate image using a region detection model to obtain a connected image including M object connected regions comprises:
    processing the intermediate image using the region detection model to obtain a connected image including a plurality of initial object connected regions;
    performing a morphological transformation on the connected image including the plurality of initial object connected regions, so as to obtain, based on that connected image, the connected image including the M object connected regions.
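Claim 2 does not say which morphological transformation is used. The sketch below assumes a binary dilation with a rectangular structuring element, one common way to merge nearby initial regions (for example, adjacent characters) into a single connected region. The function name `dilate` and the 0/1 list-of-rows mask are assumptions made for this example.

```python
def dilate(mask, kernel_h=1, kernel_w=3):
    """Binary dilation of a 0/1 mask with a kernel_h x kernel_w rectangle.

    Hypothetical helper: every foreground pixel is spread over the
    rectangle centred on it, so regions separated by small gaps merge.
    """
    h, w = len(mask), len(mask[0])
    rh, rw = kernel_h // 2, kernel_w // 2
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for rr in range(max(0, r - rh), min(h, r + rh + 1)):
                    for cc in range(max(0, c - rw), min(w, c + rw + 1)):
                        out[rr][cc] = 1
    return out


if __name__ == "__main__":
    # Two foreground pixels one gap apart become one run after dilation.
    print(dilate([[1, 0, 1, 0, 0]]))
```

A wide, short kernel favours merging along a text line while keeping separate lines apart; that choice is an illustration, not something the claim states.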
  3. The method according to claim 2, wherein processing the initial image to obtain an intermediate image comprises:
    reducing the size of the initial image from an initial size to a predetermined size;
    binarizing the initial image of the predetermined size to obtain the intermediate image.
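The two steps of claim 3 can be sketched as follows. Nearest-neighbour resampling and a fixed global threshold are assumptions chosen for brevity; the claim fixes neither the interpolation method nor the binarization rule. The names `resize_nearest` and `binarize`, and the convention that dark pixels become foreground (1), are likewise hypothetical.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a grayscale image (list of rows) by nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def binarize(image, threshold=128):
    """Map pixels darker than the threshold to 1 (foreground), others to 0."""
    return [[1 if p < threshold else 0 for p in row] for row in image]


if __name__ == "__main__":
    initial = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
        [255, 255, 0, 0],
        [255, 255, 0, 0],
    ]
    intermediate = binarize(resize_nearest(initial, 2, 2))
    print(intermediate)
```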
  4. The method according to claim 2, wherein determining M bounding boxes in the connected image that respectively correspond to the M object connected regions comprises:
    extracting contour information of each of the M object connected regions;
    determining, based on the contour information, a bounding box for each of the M object connected regions.
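A simplified stand-in for claim 4 is sketched below. Instead of tracing contours, it labels 4-connected regions with a breadth-first search and takes the pixel extent of each region as its axis-aligned bounding box; for an axis-aligned box the result is the same as the box of the region's outer contour. The function name `bounding_boxes` and the `(x0, y0, x1, y1)` format with exclusive right and bottom edges are assumptions made for this example.

```python
from collections import deque


def bounding_boxes(mask):
    """Return one (x0, y0, x1, y1) box per 4-connected region of a 0/1 mask.

    Boxes are listed in the order their regions are met in a row-major scan;
    x1 and y1 are exclusive.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            x0, y0, x1, y1 = c, r, c, r
            while queue:
                y, x = queue.popleft()
                x0, y0 = min(x0, x), min(y0, y)
                x1, y1 = max(x1, x), max(y1, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1 + 1, y1 + 1))
    return boxes


if __name__ == "__main__":
    print(bounding_boxes([[1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]]))
```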
  5. The method according to claim 1, wherein cropping N image blocks from the initial image based on the M bounding boxes comprises:
    according to the correspondence between the intermediate image and the initial image, cropping one image block from the initial image for each of the M bounding boxes, wherein M is equal to N; or
    performing predetermined processing on the M bounding boxes to obtain N processed bounding boxes, and, according to the correspondence between the intermediate image and the initial image, cropping one image block from the initial image for each of the processed bounding boxes.
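The "correspondence between the intermediate image and the initial image" in claim 5 is not spelled out in the claim. The sketch below assumes the simplest case, where the intermediate image is a uniformly scaled copy of the initial image, so a box is mapped back by multiplying its coordinates by the scale factors. The names `map_box_to_initial` and `crop`, and the `(width, height)` size tuples, are hypothetical.

```python
import math


def map_box_to_initial(box, intermediate_size, initial_size):
    """Scale an (x0, y0, x1, y1) box from intermediate to initial coordinates.

    Sizes are (width, height). The box is rounded outward and clamped so
    the mapped box stays inside the initial image.
    """
    x0, y0, x1, y1 = box
    small_w, small_h = intermediate_size
    full_w, full_h = initial_size
    sx, sy = full_w / small_w, full_h / small_h
    return (
        max(0, math.floor(x0 * sx)),
        max(0, math.floor(y0 * sy)),
        min(full_w, math.ceil(x1 * sx)),
        min(full_h, math.ceil(y1 * sy)),
    )


def crop(image, box):
    """Cut the image block covered by an (x0, y0, x1, y1) box (x1, y1 exclusive)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


if __name__ == "__main__":
    initial = [[r * 8 + c for c in range(8)] for r in range(8)]
    box = map_box_to_initial((1, 1, 3, 2), (4, 4), (8, 8))
    print(box, crop(initial, box))
```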
  6. The method according to claim 5, wherein performing predetermined processing on the M bounding boxes comprises:
    scoring the M bounding boxes to obtain a quality score for each of the M bounding boxes;
    treating each bounding box whose quality score is below a score threshold as an invalid bounding box, and deleting the invalid bounding boxes.
  7. The method according to claim 6, wherein scoring the M bounding boxes comprises performing the following operations for each of the M bounding boxes:
    determining the area of the bounding box and the area of the pixels corresponding to the target object located in the bounding box;
    determining the quality score of the bounding box based on the ratio of the area of the pixels to the area of the bounding box.
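Claims 6 and 7 together describe a fill-ratio filter, which can be sketched as follows. The code assumes that target-object pixels are marked 1 in a 0/1 mask and that the score is exactly the ratio of claim 7; the claim only says the score is "based on" that ratio, so other mappings are possible. The names `quality_score` and `drop_invalid` are hypothetical.

```python
def quality_score(mask, box):
    """Ratio of foreground pixels inside the box to the box area."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    foreground = sum(sum(row[x0:x1]) for row in mask[y0:y1])
    return foreground / area


def drop_invalid(mask, boxes, score_threshold):
    """Keep only boxes whose score is not below the threshold."""
    return [b for b in boxes if quality_score(mask, b) >= score_threshold]


if __name__ == "__main__":
    mask = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
    print(drop_invalid(mask, [(0, 0, 2, 2), (2, 0, 4, 3)], 0.5))
```

A box that is mostly background, such as one produced by stray noise pixels, scores low and is discarded.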
  8. The method according to claim 5, wherein performing predetermined processing on the M bounding boxes comprises:
    enlarging one or more of the M bounding boxes by a first predetermined factor.
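One way to read the enlargement in claim 8 is to scale a box about its centre and clamp it to the image, as sketched below. Scaling about the centre, applying the same factor to width and height, and clamping to the image bounds are all assumptions; the claim states only that a box is enlarged by a first predetermined factor. The name `scale_box` is hypothetical.

```python
def scale_box(box, factor, image_size):
    """Scale an (x0, y0, x1, y1) box about its centre, clamped to the image.

    image_size is (width, height). A factor above 1 enlarges the box.
    """
    x0, y0, x1, y1 = box
    width, height = image_size
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) * factor / 2
    half_h = (y1 - y0) * factor / 2
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(width), cx + half_w),
        min(float(height), cy + half_h),
    )


if __name__ == "__main__":
    print(scale_box((10, 10, 20, 20), 1.5, (100, 100)))
```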
  9. The method according to any one of claims 6-8, wherein performing predetermined processing on the M bounding boxes further comprises:
    detecting whether every two adjacent bounding boxes among the M bounding boxes at least partially overlap,
    and if so, shrinking each of the two at least partially overlapping bounding boxes based on a second predetermined factor, so that the two shrunk bounding boxes do not overlap or their overlapping area is reduced.
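The overlap check and shrink step of claim 9 can be sketched for a single pair of boxes as below. Shrinking about the box centre with a factor between 0 and 1 is an assumption, as are the names `boxes_overlap`, `shrink_box`, and `resolve_overlap`; how "adjacent" pairs are selected is left to the caller.

```python
def boxes_overlap(a, b):
    """True if two (x0, y0, x1, y1) boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def shrink_box(box, factor):
    """Shrink a box about its centre by a factor in (0, 1)."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) * factor / 2
    half_h = (y1 - y0) * factor / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def resolve_overlap(a, b, factor):
    """Shrink both boxes if they overlap; otherwise return them unchanged."""
    if not 0 < factor < 1:
        raise ValueError("factor must be between 0 and 1")
    if boxes_overlap(a, b):
        return shrink_box(a, factor), shrink_box(b, factor)
    return a, b


if __name__ == "__main__":
    print(resolve_overlap((0, 0, 10, 10), (9, 0, 19, 10), 0.5))
```

A single shrink may only reduce the overlap rather than remove it, which matches the two outcomes the claim allows.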
  10. The method according to any one of claims 1-6, wherein recognizing the N image blocks using an object recognition model to obtain the target objects in the initial image comprises:
    determining, among the N image blocks, P first image blocks whose length in a first direction is greater than a recognition length threshold, and dividing each of the first image blocks into at least two sub-image blocks to obtain a plurality of sub-image blocks corresponding to the P first image blocks, wherein the length of each of the sub-image blocks is equal to or less than the recognition length threshold; and
    recognizing the plurality of sub-image blocks using the object recognition model to obtain the target objects in the P first image blocks,
    wherein the target objects in the initial image include the target objects in the P first image blocks, and P is a positive integer.
  11. The method according to claim 10, wherein recognizing the N image blocks using an object recognition model to obtain the target objects in the initial image further comprises:
    determining, among the N image blocks, Q second image blocks whose length in the first direction is less than the recognition length threshold, and processing each of the second image blocks to obtain Q processed second image blocks, wherein the length of each of the processed second image blocks in the first direction is the recognition length threshold;
    recognizing the Q processed second image blocks using the object recognition model to obtain the target objects in the Q second image blocks,
    wherein the target objects in the initial image further include the target objects in the Q second image blocks, and Q is a positive integer.
  12. The method according to claim 10, wherein dividing each of the first image blocks into at least two sub-image blocks comprises:
    performing the following operations for the i-th first image block among the N image blocks:
    setting, in the first direction, one candidate segmentation point at every interval of the recognition length threshold, so as to determine at least one candidate segmentation point corresponding to the i-th first image block;
    determining, based on the at least one candidate segmentation point, at least one segmentation point corresponding to the i-th first image block;
    dividing, based on the at least one segmentation point, the i-th first image block into at least two sub-image blocks,
    wherein i is a positive integer less than or equal to P.
  13. The method according to claim 12, wherein determining, based on the at least one candidate segmentation point, at least one segmentation point corresponding to the i-th first image block comprises:
    if a gap region lies within a distance threshold of any candidate segmentation point among the at least one candidate segmentation point in the i-th first image block, taking a point in the gap region as a segmentation point corresponding to the i-th first image block;
    if no gap region lies within the distance threshold of any candidate segmentation point among the at least one candidate segmentation point in the i-th first image block, taking that candidate segmentation point as a segmentation point corresponding to the i-th first image block.
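The segmentation-point selection of claims 12 and 13 can be sketched in one dimension along the first direction, as below. The representation of gap regions as half-open `(start, end)` intervals, the choice of the gap position nearest to the candidate as "a point in the gap region", and the names `choose_split_points` and `split_block` are all assumptions made for this example.

```python
def choose_split_points(length, max_len, gaps, distance):
    """Pick split positions for a block of the given length.

    Candidates sit at every multiple of max_len below length. If a
    non-empty gap (start, end) lies within `distance` of a candidate, the
    nearest gap position replaces the candidate; otherwise the candidate
    itself is used.
    """
    points = []
    candidate = max_len
    while candidate < length:
        best = None
        for start, end in gaps:
            # Position inside [start, end - 1] closest to the candidate.
            nearest = min(max(candidate, start), end - 1)
            offset = abs(nearest - candidate)
            if offset <= distance and (best is None or offset < abs(best - candidate)):
                best = nearest
        points.append(candidate if best is None else best)
        candidate += max_len
    return points


def split_block(length, points):
    """Turn split positions into (start, end) sub-block ranges."""
    edges = [0] + points + [length]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]


if __name__ == "__main__":
    pts = choose_split_points(250, 100, [(95, 98), (140, 150)], 10)
    print(pts, split_block(250, pts))
```

Moving a split point into a gap avoids cutting through a character, but in this literal sketch a shifted point can leave a neighbouring sub-block slightly longer than `max_len`; an implementation that must satisfy the length limit of claim 10 would need an extra check, which is not shown here.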
  14. The method according to claim 11, wherein processing each of the second image blocks comprises:
    splicing, in the first direction, an end image block onto at least one end of each of the second image blocks to obtain a processed second image block corresponding to each of the second image blocks, wherein the pixel value of each pixel in the end image block differs from the pixel values of the pixels corresponding to each object in the second image block.
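The padding of claim 14 can be sketched as follows for a grayscale block whose first direction is horizontal. Padding only the right-hand end and using a white fill value of 255 (which assumes dark objects on a light background) are choices made for this example; the claim allows either or both ends and requires only that the fill differ from the object pixels. The name `pad_block` is hypothetical.

```python
def pad_block(block, target_len, fill=255):
    """Pad each row on the right with `fill` until the width is target_len.

    Blocks that are already long enough are returned as an unchanged copy.
    """
    width = len(block[0])
    if width >= target_len:
        return [row[:] for row in block]
    return [row + [fill] * (target_len - width) for row in block]


if __name__ == "__main__":
    print(pad_block([[0, 0], [0, 255]], 4))
```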
  15. The method according to claim 10, wherein each of the first image blocks includes a plurality of target objects, and the plurality of target objects are arranged in sequence along the first direction.
  16. The method according to any one of claims 1-6, wherein the at least one target object includes a character.
  17. An image processing apparatus, comprising:
    an image acquisition module configured to obtain an initial image, wherein the initial image includes at least one target object;
    an image processing module configured to process the initial image to obtain an intermediate image;
    a region recognition module configured to recognize the intermediate image using a region detection model to obtain a connected image including M object connected regions;
    a determination module configured to determine M bounding boxes in the connected image that respectively correspond to the M object connected regions;
    a cropping module configured to crop N image blocks from the initial image based on the M bounding boxes, wherein each of the image blocks includes at least one target object; and
    an object recognition module configured to recognize the N image blocks using an object recognition model to obtain the target objects in the initial image,
    wherein M and N are both positive integers.
  18. An electronic device, comprising:
    a processor;
    a memory storing one or more computer program modules;
    wherein the one or more computer program modules are configured to be executed by the processor, and the one or more computer program modules include instructions for implementing the image processing method according to any one of claims 1-16.
  19. A computer-readable storage medium for non-transitory storage of computer-readable instructions, wherein the computer-readable instructions, when executed by a computer, implement the image processing method according to any one of claims 1-16.
PCT/CN2022/100269 2021-07-13 2022-06-22 Image processing method and apparatus, device, and storage medium WO2023284502A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110788327.XA CN113486828B (en) 2021-07-13 Image processing method, device, equipment and storage medium
CN202110788327.X 2021-07-13

Publications (1)

Publication Number Publication Date
WO2023284502A1 true WO2023284502A1 (en) 2023-01-19

Family

ID=77938189

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100269 WO2023284502A1 (en) 2021-07-13 2022-06-22 Image processing method and apparatus, device, and storage medium

Country Status (1)

Country Link
WO (1) WO2023284502A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140072219A1 (en) * 2012-09-08 2014-03-13 Konica Minolta Laboratory U.S.A., Inc. Document image binarization and segmentation using image phase congruency
CN110348449A (en) * 2019-07-10 2019-10-18 电子科技大学 A kind of identity card character recognition method neural network based
CN111860479A (en) * 2020-06-16 2020-10-30 北京百度网讯科技有限公司 Optical character recognition method, device, electronic equipment and storage medium
CN112464931A (en) * 2020-11-06 2021-03-09 马上消费金融股份有限公司 Text detection method, model training method and related equipment
CN112560847A (en) * 2020-12-25 2021-03-26 中国建设银行股份有限公司 Image text region positioning method and device, storage medium and electronic equipment
CN113486828A (en) * 2021-07-13 2021-10-08 杭州睿胜软件有限公司 Image processing method, device, equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116189194A (en) * 2023-04-27 2023-05-30 北京中昌工程咨询有限公司 Drawing enhancement segmentation method for engineering modeling
CN116204105A (en) * 2023-05-05 2023-06-02 北京睿企信息科技有限公司 Processing system for associated image presentation
CN117409428A (en) * 2023-12-13 2024-01-16 南昌理工学院 Test paper information processing method, system, computer and storage medium
CN117409428B (en) * 2023-12-13 2024-03-01 南昌理工学院 Test paper information processing method, system, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113486828A (en) 2021-10-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22841142

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE