CN115578729B - AI intelligent process arrangement method for digital staff
- Publication number
- CN115578729B (application CN202211457579.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- grayscale
- rotated
- value
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1475—Inclination or skew detection or correction of characters or of image to be recognised
- G06V30/1478—Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
- G06V30/1463—Orientation detection or correction, e.g. rotation of multiples of 90 degrees
- G06V30/16—Image preprocessing
- G06V30/1607—Correcting image deformation, e.g. trapezoidal deformation caused by perspective
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/18105—Extraction of features or characteristics of the image related to colour
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Input (AREA)
Abstract
The invention discloses an AI intelligent process arrangement method for digital employees, comprising: capturing an original image of a paper document that carries process information, and applying differential grayscale conversion to the original image to obtain several differential grayscale images; rotating the differential grayscale images by preset angles to obtain several rotated grayscale images; dilating the rotated grayscale images and using the Hough transform to detect the character line formed by each row of characters after dilation, yielding a character direction map; applying a perspective transformation to the pre-dilation rotated grayscale image according to the character direction map to obtain a rectified image; and extracting the arrow marks in the rectified image, applying an affine transformation to the rectified image with the arrow marks as auxiliary information, rotating it to obtain a restored image, binarizing the restored image and feeding it to a character recognition module, and extracting the process information in order to complete the arrangement. The invention obtains an accurate text direction, avoids recognition errors caused by unusual angles and similar factors, and helps improve processing speed and accuracy.
Description
Technical field
The invention relates to the technical field of data processing, and in particular to an AI intelligent process arrangement method for digital employees.
Background
At present, manually entering the process information recorded in paper documents into a computer is inefficient, so image recognition is a common solution. OCR (Optical Character Recognition) is the process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses character recognition methods to translate those shapes into computer data. For printed characters, it optically converts the text of a paper document into a black-and-white bitmap image file, and recognition software then converts the text in the image into a text format that word-processing software can edit further. The main indicators of OCR system performance are rejection rate, misrecognition rate, recognition speed, product stability, and ease of use. RPA digital employees deeply integrate traditional character recognition with machine learning, can parse data from non-standard documents, and help convert handwritten characters into a machine-readable format. In most cases, OCR is used to simplify paper-based business and turn it into digital business, for example PDFs, scanned files, paper invoices, faxes, and handwritten documents.
When recognizing paper documents, however, a document may be placed untidily or at a skewed angle, so the direction of the characters actually captured may vary. In addition, some documents contain tables or flowcharts, for which conventional character-direction detection cannot be used. The prior art therefore struggles to recognize such documents accurately, especially when the angular deviation is large.
Summary of the invention
To address the difficulty the prior art has in determining character angle or direction when recognizing paper documents, the invention provides an AI intelligent process arrangement method for digital employees. It mainly targets the preprocessing stage of character recognition and automatically corrects the angle and direction of the characters, avoiding recognition errors or failures caused by special tables or flowcharts. This helps improve processing speed and accuracy, and the results are accurate and clear enough for subsequent recognition.
The technical solution of the invention is as follows.
An AI intelligent process arrangement method for digital employees, comprising the following steps:
S1: Capture an original image of a paper document that carries process information, and apply differential grayscale conversion to the original image to obtain several differential grayscale images;
S2: Rotate the differential grayscale images by preset angles to obtain several rotated grayscale images;
S3: Dilate the rotated grayscale images, and use the Hough transform to detect the character line formed by each row of characters after dilation, yielding a character direction map;
S4: Apply a perspective transformation to the pre-dilation rotated grayscale image according to the character direction map to obtain a rectified image;
S5: Extract the arrow marks in the rectified image, apply an affine transformation to the rectified image with the arrow marks as auxiliary information, rotate it to obtain a restored image, binarize the restored image and feed it to the character recognition module for recognition, and extract the process information in order to complete the arrangement.
Differential grayscale conversion guards against the unclear image that a single grayscale method may produce. Rotation by preset angles ensures that at least one image deviates only slightly from the upright orientation, which reduces the probability of errors in the subsequent transformations. Finally, a series of transformations, aided by the arrow marks, is used to recognize the process information. This avoids recognition errors or failures caused by special tables or flowcharts and helps improve processing speed and accuracy.
Preferably, applying differential grayscale conversion to the original image comprises:
Converting the original image with the average of the RGB values as the gray value, to obtain an average grayscale image;
Converting the original image with the maximum of the RGB values as the gray value, to obtain a maximum grayscale image;
Converting the original image with a weighted average of the RGB values under preset weights, to obtain a weighted-average grayscale image.
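The three conversions listed above can be sketched per pixel as follows. This is a minimal illustration in plain Python rather than the patent's implementation: the nested-list image representation, the function names, and the demo weights `(0.5, 0.3, 0.2)` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Image = List[List[Pixel]]      # rows of pixels (assumed representation)

def gray_mean(px: Pixel) -> float:
    """Average grayscale: mean of the three channel values."""
    return sum(px) / 3.0

def gray_max(px: Pixel) -> float:
    """Maximum grayscale: the largest channel value."""
    return float(max(px))

def gray_weighted(px: Pixel, weights: Tuple[float, float, float]) -> float:
    """Weighted-average grayscale; the weights are assumed to sum to 1."""
    return sum(w * c for w, c in zip(weights, px))

def differential_grayscale(img: Image, weights) -> Dict[str, List[List[float]]]:
    """Return the three grayscale variants of one RGB image."""
    return {
        "mean": [[gray_mean(p) for p in row] for row in img],
        "max": [[gray_max(p) for p in row] for row in img],
        "weighted": [[gray_weighted(p, weights) for p in row] for row in img],
    }

# Tiny 1x2 demo image; the weights are placeholders, not values from the text.
demo = [[(200, 100, 0), (10, 20, 30)]]
variants = differential_grayscale(demo, (0.5, 0.3, 0.2))
print(variants["mean"][0], variants["max"][0])
```

Keeping all three variants, rather than picking one up front, matches the stated goal of having at least one grayscale image in which text and background remain distinguishable.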
Preferably, the preset weights are obtained as follows:
In the original image, compute the ratio of pixels whose R value exceeds a threshold to the total number of pixels, giving the first ratio; compute the ratio of pixels whose G value exceeds the threshold to the total number of pixels, giving the second ratio; and compute the ratio of pixels whose B value exceeds the threshold to the total number of pixels, giving the third ratio;
Determine the preset weight of each RGB channel in proportion to the first, second, and third ratios.
Take the ratio of pixels whose R value exceeds the threshold as an example. The larger the first ratio, the larger the share of the R component in the image as a whole and the greater its influence on the image. When the weights are set in proportion to the ratios, the R channel therefore receives a larger weight; conversely, a smaller ratio yields a smaller weight. This approach amplifies the differences arising from the color characteristics of the image and is especially suitable for image processing tasks such as text recognition: compared with general images, text-dominated images usually show a clear gap between the color parameters of the text and those of the background, and this approach magnifies the difference that gap produces. The threshold is generally set to about 128 and can be adjusted as needed.
Preferably, rotating the differential grayscale images by preset angles to obtain several rotated grayscale images comprises:
Setting the preset angles to -90 degrees, 90 degrees, and 180 degrees; each differential grayscale image is rotated by a preset angle selected in turn, yielding several rotated grayscale images. In general, an image at an unknown angle is easiest to recognize when it deviates from the desired upright orientation by less than 45 degrees. In practice, however, the image may be lying sideways or upside down, which makes recognition much harder. The rotations above guarantee at least one image that deviates from the upright orientation by less than 45 degrees, which raises the probability of accurate recognition and benefits text recognition.
Preferably, applying a perspective transformation to the pre-dilation rotated grayscale image according to the character direction map to obtain a rectified image comprises:
Taking any one character line in the character direction map as the reference line, and locally stretching or compressing the pixels of the pre-dilation rotated grayscale image so that all other character lines become parallel to the reference line, which yields the rectified image.
Preferably, extracting the arrow marks in the rectified image, applying an affine transformation to the rectified image with the arrow marks as auxiliary information, and rotating it to obtain a restored image comprises:
Determining the direction of each arrow mark in the same rectified image to obtain several unit vectors, computing the total vector of these unit vectors, and determining the pointing direction (x, y) of the total vector;
Rotating the rectified image until the character lines in it are horizontal and y in the pointing direction (x, y) of the total vector is less than or equal to 0, which yields a candidate image;
Screening the candidate images by their actual rotation angle relative to the original image, and retaining at least one qualified candidate image as the restored image.
This scheme is specifically optimized for recognizing flowcharts with arrows. A flowchart generally runs from top to bottom overall, but the arrows at local branches do not all point the same way, so the judgment is based on the pointing direction of the total vector. If y is less than or equal to 0 after rotation, the total vector has a downward component and the condition is met, whether the vector leans left or right. This step filters out images that end up inverted after rotation.
Preferably, screening the candidate images by their actual rotation angle relative to the original image and retaining at least one qualified candidate image as the restored image comprises:
Determining the actual rotation angle, relative to the original image, of each candidate image derived from the same original image; computing the distribution of these actual rotation angles; retaining the actual rotation angles whose values differ by no more than 10%; deleting the candidate images that correspond to the remaining actual rotation angles; and taking the remaining candidate images as restored images. Even when the character lines are horizontal, the image may still have been processed into an inverted state. The arrow-based check reduces this possibility, and the further screening largely eliminates it.
Preferably, the actual rotation angle is computed as follows:
Record the preset angle p by which each rotated grayscale image was rotated;
Record the rotation angle q applied when the rectified image was rotated to obtain the candidate image;
The actual rotation angle is C = q + p, where clockwise rotation is positive and counterclockwise rotation is negative.
The substantive effects of the invention include the following. Digital employees perform AI text recognition on the image to be recognized. Differential grayscale conversion yields several grayscale images that emphasize different color characteristics, making it easier to obtain the result with the clearest features. Rotation by preset angles yields at least one image that deviates from the upright orientation by less than 45 degrees, which raises the probability of accurate recognition. An overall judgment of the arrow directions assists the rectification process. Because these steps build on one another and work together, errors in angle and direction are progressively reduced and the success rate of rectification rises, so the text direction is ultimately determined accurately and inverted text does not occur. The method is suitable for the initial recognition of flowcharts.
Description of the drawings
Fig. 1 is a flowchart of an embodiment of the invention.
Detailed description
To make the purpose, technical solution, and advantages of the embodiments of the invention clearer, the technical solution is described clearly and completely below with reference to the embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
It should be understood that, in the various embodiments of the invention, the sequence numbers of the processes do not imply an order of execution. The order of execution is determined by the function and internal logic of each process, and the numbering does not limit the implementation of the embodiments in any way.
It should be understood that, in the invention, "comprising" and "having" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
It should be understood that, in the invention, "a plurality of" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "Comprising A, B and C" and "comprising A, B, C" mean that all three of A, B, and C are included; "comprising A, B or C" means that one of A, B, and C is included; "comprising A, B and/or C" means that any one, any two, or all three of A, B, and C are included.
The technical solution of the invention is described in detail below with a specific embodiment. Embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments.
Embodiment:
An AI intelligent process arrangement method for digital employees, as shown in Fig. 1, comprising the following steps:
S1: Capture an original image of a paper document that carries process information, and apply differential grayscale conversion to the original image to obtain several differential grayscale images. This comprises:
Converting the original image with the average of the RGB values as the gray value, to obtain an average grayscale image;
Converting the original image with the maximum of the RGB values as the gray value, to obtain a maximum grayscale image;
Converting the original image with a weighted average of the RGB values under preset weights, to obtain a weighted-average grayscale image.
The preset weights are obtained as follows:
In the original image, compute the ratio of pixels whose R value exceeds a threshold to the total number of pixels, giving the first ratio; compute the ratio of pixels whose G value exceeds the threshold to the total number of pixels, giving the second ratio; and compute the ratio of pixels whose B value exceeds the threshold to the total number of pixels, giving the third ratio;
Determine the preset weight of each RGB channel in proportion to the first, second, and third ratios.
Take the ratio of pixels whose R value exceeds the threshold as an example. The larger the first ratio, the larger the share of the R component in the image as a whole and the greater its influence on the image. When the weights are set in proportion to the ratios, the R channel therefore receives a larger weight; conversely, a smaller ratio yields a smaller weight. This approach amplifies the differences arising from the color characteristics of the image and is especially suitable for image processing tasks such as text recognition: compared with general images, text-dominated images usually show a clear gap between the color parameters of the text and those of the background, and this approach magnifies the difference that gap produces.
In most cases a conventional grayscale conversion gives the expected result, but sometimes it does not. Suppose, for example, that in a captured image the background and text colors are very close. The background is greenish, with RGB value (180, 250, 100), and covers 70% of the image; the text is bluish, with RGB value (100, 180, 250), and covers 30% of the image. With average grayscale conversion, the background and the text receive the same gray value (530/3, about 176.7), which clearly hinders subsequent recognition. With maximum grayscale conversion they again receive the same gray value (250), which is equally unhelpful. In most cases these methods do not produce identical gray values, but when they do, another grayscale method is required.
With the weighted-average grayscale conversion of this embodiment and a threshold of 128, pixels with an R value above 128 make up 70% of the image, so the first ratio is 0.7; every pixel has a G value above 128, so the second ratio is 1; and likewise the third ratio is 0.3. Setting the weights in proportion to these ratios (which sum to 2) gives a preset weight of 0.35 for R, 0.5 for G, and 0.15 for B. The gray value of the background is therefore 0.35 × 180 + 0.5 × 250 + 0.15 × 100 = 203, and the gray value of the text is 0.35 × 100 + 0.5 × 180 + 0.15 × 250 = 162.5. Because the colors in the original image are very close, the difference after conversion is not striking to the naked eye, but unlike the conventional methods it clearly separates the two, and adjusting the threshold can produce an even clearer image.
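The worked example above can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, the image is modelled as a flat list of 100 pixels (70 background, 30 text), and the equal-weight fallback for an all-zero ratio is an assumption that the text does not address.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)

def channel_ratios(pixels: List[Pixel], threshold: int = 128):
    """Fraction of pixels whose R, G and B value exceeds the threshold."""
    n = len(pixels)
    return tuple(sum(1 for p in pixels if p[c] > threshold) / n for c in range(3))

def proportional_weights(ratios):
    """Scale the three ratios so that they sum to 1."""
    total = sum(ratios)
    if total == 0:
        return (1 / 3, 1 / 3, 1 / 3)  # assumed fallback, not from the text
    return tuple(r / total for r in ratios)

def weighted_gray(px: Pixel, weights) -> float:
    """Weighted-average gray value of one pixel."""
    return sum(w * c for w, c in zip(weights, px))

background = (180, 250, 100)  # greenish, 70% of the image
text = (100, 180, 250)        # bluish, 30% of the image
pixels = [background] * 70 + [text] * 30

ratios = channel_ratios(pixels)            # approximately (0.7, 1.0, 0.3)
weights = proportional_weights(ratios)     # approximately (0.35, 0.5, 0.15)

# Mean and maximum conversions cannot separate the two colours...
print(sum(background) / 3 == sum(text) / 3, max(background) == max(text))
# ...while the weighted conversion gives about 203 and 162.5.
print(round(weighted_gray(background, weights), 2),
      round(weighted_gray(text, weights), 2))
```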
S2: Rotate the differential grayscale images by preset angles to obtain several rotated grayscale images. This comprises:
Setting the preset angles to -90 degrees, 90 degrees, and 180 degrees; each differential grayscale image is rotated by a preset angle selected in turn, yielding several rotated grayscale images. In general, an image at an unknown angle is easiest to recognize when it deviates from the desired upright orientation by less than 45 degrees. In practice, however, the image may be lying sideways or upside down, which makes recognition much harder. The rotations above guarantee at least one image that deviates from the upright orientation by less than 45 degrees, which raises the probability of accurate recognition and benefits text recognition.
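The 45-degree argument can be illustrated with a few lines of angle arithmetic. The sketch assumes that the unrotated image (0 degrees) is considered alongside the three preset rotations, which the text implies but does not state. The function names are hypothetical.

```python
def residual_skew(skew_deg: float, rotation_deg: float) -> float:
    """Angle to upright after rotating, folded into the range 0..180 degrees."""
    a = (skew_deg + rotation_deg) % 360.0
    return min(a, 360.0 - a)

def best_rotation(skew_deg: float, candidates=(0, -90, 90, 180)) -> int:
    """Preset rotation that leaves the smallest residual skew.

    The 0 entry stands for the unrotated image (an assumption of this sketch).
    """
    return min(candidates, key=lambda r: residual_skew(skew_deg, r))

# Scan every whole-degree skew and record the worst remaining deviation.
worst = max(residual_skew(s, best_rotation(s)) for s in range(360))
print(worst)                 # never exceeds 45 degrees
print(best_rotation(170))    # an almost upside-down page is best rotated by 180
```

In this idealized model the bound is reached exactly at skews of 45, 135, 225, and 315 degrees, so the residual is at most 45 degrees rather than strictly below it.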
S3: Dilate the rotated grayscale images, and use the Hough transform to detect the character line formed by each row of characters after dilation, yielding a character direction map.
The most common method for skew correction is the Hough transform. The image is first dilated so that the separate characters of each row merge into a continuous band, which the Hough transform can then detect as a straight line.
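The voting idea behind the Hough transform can be sketched in plain Python. This is a simplified illustration, not the patent's implementation: it skips dilation, takes foreground pixel coordinates directly, and returns only the direction of the single strongest line. The function name and the synthetic point sets are hypothetical, and coordinates are treated as ordinary (x, y) pairs with y pointing up.

```python
import math
from collections import Counter

def dominant_line_angle(points, theta_step_deg: int = 1) -> int:
    """Direction in degrees (0..179) of the strongest line through the points.

    Each point votes for every (theta, rho) bin that satisfies
    rho = x*cos(theta) + y*sin(theta), where theta is the angle of the
    line's normal. The bin with the most votes is taken as the line.
    """
    votes = Counter()
    for theta in range(0, 180, theta_step_deg):
        t = math.radians(theta)
        c, s = math.cos(t), math.sin(t)
        for x, y in points:
            votes[(theta, round(x * c + y * s))] += 1
    (theta, _rho), _count = votes.most_common(1)[0]
    return (theta - 90) % 180  # the line is perpendicular to its normal

# Synthetic stand-ins for dilated text rows (hypothetical data).
horizontal_rows = [(x, y) for y in (10, 30, 50) for x in range(40)]
diagonal_row = [(i, i) for i in range(60)]

print(dominant_line_angle(horizontal_rows))  # 0
print(dominant_line_angle(diagonal_row))     # 45
```

In a real pipeline the points would come from the dilated image, and a library routine would normally replace this loop. The sketch only shows why merging characters into continuous rows produces a strong peak at the text direction.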
S4: Apply a perspective transformation to the pre-dilation rotated grayscale image according to the character direction map to obtain a rectified image. This comprises:
Taking any one character line in the character direction map as the reference line, and locally stretching or compressing the pixels of the pre-dilation rotated grayscale image so that all other character lines become parallel to the reference line, which yields the rectified image. The process resembles keystone correction and corrects the angular distortion caused by the shooting position.
S5: Extract the arrow marks in the corrected image, use them as auxiliary information to apply an affine transformation to the corrected image, and rotate it to obtain the restored image. The restored image is binarized and passed to the character recognition module, and the process information is extracted in sequence to complete the arrangement. This step brings a skewed image to the horizontal position. It includes:
Determine the direction of each arrow mark in the same corrected image to obtain several unit vectors, compute the total vector of these unit vectors, and determine the direction (x, y) of the total vector;
Rotate the corrected image until its character lines are horizontal and the y component of the total vector direction (x, y) is less than or equal to 0, which yields a candidate image;
Screen the candidate images by their actual rotation angle relative to the original image, and retain at least one qualifying candidate image as the restored image.
This scheme is specifically optimized for recognizing flowcharts that contain arrows. A flowchart generally runs from top to bottom overall, but the arrows at local branches do not all point the same way, so the decision is based on the direction of the total vector. If y is less than or equal to 0 after rotation, the total vector has no upward component, and the condition is met whether the vector leans left or right. This step filters out images that end up upside down after rotation.
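The arrow check above can be sketched as follows. The sketch assumes that arrow directions are already available as angles measured counterclockwise from the +x axis, with the y axis pointing up (consistent with "y ≤ 0 means downward" in the text). The function names and the sample angles are hypothetical.

```python
import math

def total_vector(arrow_angles_deg):
    """Sum of the unit vectors of all arrows (y axis pointing up)."""
    x = sum(math.cos(math.radians(a)) for a in arrow_angles_deg)
    y = sum(math.sin(math.radians(a)) for a in arrow_angles_deg)
    return x, y

def rotate_clockwise(vec, degrees):
    """Rotate a vector clockwise by the given angle."""
    t = math.radians(-degrees)
    x, y = vec
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def accepted_rotations(arrow_angles_deg, leveling_angle_deg, eps=1e-9):
    """Rotations that level the text AND leave the total vector with y <= 0.

    Level text is reached by two rotations that differ by 180 degrees;
    the arrow check decides which of them is kept.
    """
    vec = total_vector(arrow_angles_deg)
    accepted = []
    for q in (leveling_angle_deg % 360, (leveling_angle_deg + 180) % 360):
        _x, y = rotate_clockwise(vec, q)
        if y <= eps:
            accepted.append(q)
    return accepted

# Assumed example: an upside-down flowchart whose text lines are already level.
# Upright, its arrows would point down, down, down-right, down-left and right.
upside_down_arrows = [90, 90, 45, 135, 180]
result = accepted_rotations(upside_down_arrows, leveling_angle_deg=0)
```

Here the unrotated image is rejected because its total vector points upward, and only the 180-degree rotation survives as a candidate.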
In addition, the restored image is extracted as follows:
Determine the actual rotation angle, relative to the original image, of each candidate image obtained from the same original image; compute the numerical distribution of these actual rotation angles; retain the actual rotation angles whose values differ by no more than 10%; delete the candidate images corresponding to the other actual rotation angles; and use the remaining candidate images as restored images.
The actual rotation angle is calculated as follows:
Record the preset angle p by which each rotated grayscale image was rotated;
Record the rotation angle q applied when rotating the corrected image to obtain the candidate image;
The actual rotation angle is C = q + p, where clockwise rotation is positive and counterclockwise rotation is negative.
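The angle bookkeeping and the 10% screening can be sketched together. The text does not say what the 10% is measured against, so this sketch assumes the median of the actual rotation angles as the reference value. That choice, the function names, and the sample (p, q) pairs are illustrative assumptions only.

```python
from statistics import median

def actual_rotation(p, q):
    """C = q + p, with clockwise rotation positive and counterclockwise negative."""
    return q + p

def screen_candidates(candidates, rel_tol=0.10):
    """Keep the (p, q) candidates whose actual rotation angle C lies within
    rel_tol of the median C (assumed reference; a median near zero would
    need an absolute tolerance instead)."""
    angles = [actual_rotation(p, q) for p, q in candidates]
    ref = median(angles)
    return [cand for cand, c in zip(candidates, angles)
            if abs(c - ref) <= rel_tol * abs(ref)]

# Assumed (p, q) pairs derived from one original image, in degrees.
candidates = [(0, 30), (90, -60), (-90, 120), (180, 30)]
kept = screen_candidates(candidates)
```

Three of the four candidates agree on C = 30 degrees. The fourth gives C = 210 degrees, which is the upside-down result, and is discarded.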
Even when the character lines are horizontal, the image may still have been processed into an upside-down orientation. The arrow-based check reduces this possibility, and the further screening essentially eliminates it.
Note that the steps above build on one another and work together to progressively reduce angle and direction errors and raise the correction success rate, so that the text direction is determined accurately and text inversion is avoided. This makes the method suitable for the initial recognition of flowcharts. Together the steps achieve more than the sum of their parts; omitting any one step degrades the effect of the others and leads to inaccurate results.
In this embodiment, differential grayscale conversion avoids the unclear images that a single grayscale conversion may produce. Rotation by preset angles guarantees that at least one image deviates only slightly from the upright orientation, which lowers the probability of errors in the subsequent transformations. Finally, a series of transformations, aided by the arrow marks, is used to recognize the process information without text inversion. This avoids recognition errors or failures caused by special tables or flowcharts and helps improve processing speed and accuracy.
In the embodiments provided in this application, it should be understood that the disclosed structures and methods may be implemented in other ways, and that some features may be omitted or not performed.
In addition, the units in the embodiments of this application may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. On this understanding, the technical solution of the embodiments of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and includes several instructions that cause a device (such as a single-chip microcomputer or a chip) or a processor to execute all or part of the steps of the methods in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above is only a specific implementation of this application, but the scope of protection of this application is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the scope of protection of this application. The scope of protection of this application shall therefore be determined by the scope of protection of the claims.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211457579.5A CN115578729B (en) | 2022-11-21 | 2022-11-21 | AI intelligent process arrangement method for digital staff |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115578729A (en) | 2023-01-06 |
CN115578729B (en) | 2023-03-21 |
Family
ID=84588185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211457579.5A Active CN115578729B (en) | 2022-11-21 | 2022-11-21 | AI intelligent process arrangement method for digital staff |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115578729B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113160222A (en) * | 2021-05-14 | 2021-07-23 | 电子科技大学 | Production data identification method for industrial information image |
WO2021179485A1 (en) * | 2020-03-11 | 2021-09-16 | 平安科技(深圳)有限公司 | Image rectification processing method and apparatus, storage medium, and computer device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6317223B1 (en) * | 1998-12-14 | 2001-11-13 | Eastman Kodak Company | Image processing system for reducing vertically disposed patterns on images produced by scanning |
CN114787934A (en) * | 2019-11-25 | 2022-07-22 | 通用电气精准医疗有限责任公司 | Algorithm orchestration of workflows to facilitate healthcare imaging diagnostics |
CN114926839B (en) * | 2022-07-22 | 2022-10-14 | 富璟科技(深圳)有限公司 | Image identification method based on RPA and AI and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN115578729A (en) | 2023-01-06 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111507251B (en) | Method and device for positioning answering area in test question image, electronic equipment and computer storage medium | |
CN108717545B (en) | Bill identification method and system based on mobile phone photographing | |
CN103034848B (en) | A kind of recognition methods of form types | |
Luo et al. | Design and implementation of a card reader based on build-in camera | |
US9324073B2 (en) | Systems for mobile image capture and remittance processing | |
CN111476109A (en) | Bill processing method, bill processing apparatus, and computer-readable storage medium | |
JP5500480B2 (en) | Form recognition device and form recognition method | |
CN111353492B (en) | Image recognition and information extraction method and device for standardized document | |
CN112183038A (en) | Form identification and typing method, computer equipment and computer readable storage medium | |
CN103606220B (en) | A kind of check printing digit recognizing method based on White-light image and infrared image | |
CN108416355A (en) | A kind of acquisition method of the industry spot creation data based on machine vision | |
CN107195069A (en) | A kind of RMB crown word number automatic identifying method | |
CN111814780B (en) | Bill image processing method, device, equipment and storage medium | |
CN112418210B (en) | Intelligent classification method for tower inspection information | |
CN107679479A (en) | A kind of objective full-filling recognition methods based on morphological image process | |
CN111814576A (en) | A deep learning-based image recognition method for shopping receipts | |
CN113095307B (en) | Automatic identification method for financial voucher information | |
WO2019071476A1 (en) | Express information input method and system based on intelligent terminal | |
CN115909375A (en) | Report form analysis method based on intelligent recognition | |
CN115588208A (en) | A recognition method of full-line table structure based on digital image processing technology | |
CN115376149A (en) | Reimbursement invoice identification method | |
CN115578729B (en) | AI intelligent process arrangement method for digital staff | |
Shukla et al. | An approach for skew detection using hough transform | |
CN111783888A (en) | A system and method for checking duplication of electronic work of pictures | |
CN105956590A (en) | Character recognition method and character recognition system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||