CN113780260B - Barrier-free character intelligent detection method based on computer vision - Google Patents


Info

Publication number
CN113780260B
CN113780260B
Authority
CN
China
Prior art keywords
text box
text
box
calculating
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110849867.4A
Other languages
Chinese (zh)
Other versions
CN113780260A (en)
Inventor
卜佳俊
燕雪雅
周晟
王炜
于智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110849867.4A priority Critical patent/CN113780260B/en
Publication of CN113780260A publication Critical patent/CN113780260A/en
Application granted granted Critical
Publication of CN113780260B publication Critical patent/CN113780260B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G06F 18/232: Non-hierarchical techniques
    • G06F 18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

A barrier-free text intelligent detection method based on computer vision: first, a web page or App is opened and a screenshot of it is captured; the image is given basic preprocessing and passed to an OCR model, which automatically predicts candidate text-box positions and their confidence; similar text boxes are then compared pairwise and merged, and the result is filtered by confidence; next, shape regularization and edge detection determine the minimal extent of each text box; finally, the confirmed text boxes are checked against the barrier-free rules, including word-size detection and color-contrast detection, and text boxes that do not comply with the barrier-free specification are reported to developers for correction. The method is applicable to any web page or App, provides a unified intelligent detection scheme for barrier-free text, offers high accuracy, strong applicability and wide generality, and helps advance the aging-friendly and barrier-free transformation of Internet applications in China.

Description

Barrier-free character intelligent detection method based on computer vision
Technical field: the invention belongs to the field of information accessibility (barrier-free information) and is applicable to intelligently detecting whether text complies with barrier-free application specifications.
Background:
With the rapid development of Internet technology, Internet applications such as web pages and Apps play an increasingly important role in people's lives. While ordinary users enjoy the convenience brought by information technology, groups such as visually impaired people and the elderly face greater challenges. In March 2020, China formally implemented "Information technology - Requirements and testing methods for accessibility of Internet content" (GB/T 37668-2019), establishing the national standard for barrier-free Internet information in China and greatly advancing construction in fields such as information accessibility and aging-friendly design. The standard emphasizes that information must remain readable at all times; implemented as rules, this means the visual presentation of text size and color must meet defined criteria.
However, owing to the diversity of programming languages and technologies, the confidentiality of product code, and the complexity of applications, the text size and its color contrast against the background generally cannot be obtained accurately from the code; the few approaches that recognize text by OCR usually require the text regions to be prepared manually in advance; and no text recognition technology has yet been applied to the information accessibility field in accordance with the national standard. Efficiently and accurately identifying candidate text boxes in arbitrarily complex scenes, checking the text, and locating non-compliant text is therefore an important safeguard for the readability of Internet application information and promotes barrier-free improvement of products.
Summary of the invention:
To address these technical difficulties, the invention provides an intelligent barrier-free text detection method based on computer vision. The method performs non-invasive intelligent detection: without touching the source code, it rapidly and accurately identifies candidate text-box regions in any complex interface, checks them one by one against the information accessibility specification, and clearly marks the positions of non-compliant text boxes. In addition, the method applies universally to Internet applications such as any web page, App or mini-program, provides a unified intelligent detection scheme for barrier-free text, offers high accuracy, strong applicability and wide generality, and helps advance the aging-friendly and barrier-free transformation of Internet applications in China.
The intelligent barrier-free text detection method based on computer vision comprises the following specific steps:
S1: open the web page or App, capture a screenshot of the interface, and preprocess it into the model input format: scale the width and the height of the image down to integer multiples of 32, and add an extra all-zero (batch) dimension to the image matrix;
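As a minimal illustrative sketch of step S1 (the function and parameter names are assumptions, not taken from the patent; the extra "all-zero dimension" is interpreted here as the batch dimension), the preprocessing might look like:

```python
import cv2
import numpy as np

def preprocess_screenshot(image_path, stride=32):
    """Scale width/height down to multiples of `stride` and add a
    batch dimension, as described in step S1 (names are illustrative)."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    new_h = max(stride, (h // stride) * stride)  # nearest lower multiple of 32
    new_w = max(stride, (w // stride) * stride)
    resized = cv2.resize(img, (new_w, new_h))
    # Extra leading dimension so the OCR model receives a batch of 1.
    batch = np.expand_dims(resized.astype(np.float32), axis=0)
    return batch, (w / new_w, h / new_h)  # ratios to map boxes back later
```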
S2: feed the preprocessed image from S1 into an OCR model to identify candidate text boxes and their confidence. The result is represented by a geometry map and a score map: the geometry map records each pixel's distances to the four sides of its text box, and the score map records the confidence that each pixel in the image belongs to text;
S3: screen the candidate text boxes identified in S2 to determine the final text-box set;
S31: first screen the candidates by confidence, discarding text boxes whose confidence is below the confidence threshold score_map_thresh;
S32: from the geometry map in S2, compute the rectangular coordinates of the four vertices of each text box remaining after the preliminary screening;
S33: merge similar text boxes by local non-maximum suppression and compute a new confidence;
S331: compute the IoU values used for local non-maximum suppression between text boxes;
S332: if the IoU of two text boxes exceeds the IoU threshold nms_thresh, merge them; otherwise continue traversing the next pair of text boxes;
S333: for the text boxes g and p to be merged in S332, the vertex-coordinate values of the merged text box q are given by (1) and its confidence by (2), where the first 8 entries of a text-box vector are the x and y coordinates of the four vertices and the 9th entry is the box confidence;
q[:8]=(g[8]*g[:8]+p[8]*p[:8])/(g[8]+p[8]) (1)
q[8]=(g[8]+p[8]) (2)
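Formulas (1) and (2) translate directly into code. A minimal sketch (the helper name is an assumption) that merges two 9-element box vectors:

```python
import numpy as np

def merge_boxes(g, p):
    """Merge two text-box vectors per formulas (1) and (2):
    entries 0..7 are the vertex x/y coordinates, entry 8 the confidence."""
    q = np.empty(9)
    q[:8] = (g[8] * g[:8] + p[8] * p[:8]) / (g[8] + p[8])  # (1): confidence-weighted vertices
    q[8] = g[8] + p[8]                                     # (2): confidences accumulate
    return q

g = np.array([0, 0, 100, 0, 100, 20, 0, 20, 0.9])
p = np.array([1, 1, 101, 1, 101, 21, 1, 21, 0.8])
q = merge_boxes(g, p)  # vertices pulled toward the higher-confidence box
```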
S34: screen the merged text boxes a second time against the new confidence threshold box_thresh, discarding text boxes whose confidence is below box_thresh;
S35: from the rectangular coordinates obtained in S32, compute the width and the height of each text box, and discard the box if either the width or the height is below the length threshold length_thresh;
S4: regularize the shapes of the text boxes screened in S3, expanding non-rectangular boxes into rectangles; then perform text edge detection: take the RGB matrix of the pixels on the upper boundary, examine the variation of each color channel, and move the boundary downward in steps of 1 pixel; when the channel values along a boundary vary sharply, the boundary is considered to have touched the text edge. The upper and lower edges of the text box are thus made roughly flush with the characters, determining the minimal extent of the text box;
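One possible reading of the S4 edge search is sketched below; the variation threshold, step limit and helper names are assumptions, since the patent does not fix them:

```python
import numpy as np

def shrink_top_edge(img, y_top, x_left, x_right, diff_thresh=30, max_steps=50):
    """Move the upper boundary of a text box downward 1 pixel at a time
    until the color channels along the boundary row vary sharply,
    i.e. the row touches the character edge (step S4)."""
    y = y_top
    for _ in range(max_steps):
        row = img[y, x_left:x_right, :].astype(int)  # RGB values on the boundary row
        # Largest per-channel variation along the row.
        variation = (row.max(axis=0) - row.min(axis=0)).max()
        if variation > diff_thresh:                  # sharp change: glyphs reached
            break
        y += 1
    return y
```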
S5: check the word-size rule of the barrier-free specification for the text boxes determined in S4. With the width and height of a text box measured in pixels, convert the height to pt; if the height is below 18 pt, the box violates the word-size requirement of the barrier-free rules, and the position of the non-compliant text box is marked on the image;
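For the word-size check in S5, the pixel-to-pt conversion depends on the screen density, which the patent leaves implicit; a sketch assuming a 96 dpi display (1 pt = 1/72 inch):

```python
def violates_word_size(box_height_px, dpi=96, min_pt=18):
    """Convert a text-box height from pixels to points and flag
    heights below 18 pt, per step S5. The dpi value is an assumption."""
    height_pt = box_height_px * 72.0 / dpi
    return height_pt < min_pt
```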
S6: check the color-contrast rule for the text boxes determined in S4 by evaluating the contrast between the text and the background. If the contrast ratio is below 4.5, the text color is hard to distinguish from the background, violating the color-contrast requirement of the barrier-free rules, and the position of the non-compliant text box is marked on the image;
S61: every pixel in the text box has R, G and B color values; divide each color value by 255 to obtain sR, sG and sB respectively;
S62: from sR obtained in S61, compute the linearized value tR of channel R: if sR ≤ 0.03928, tR is given by (3); otherwise tR is given by (4); compute tG and tB analogously;
tR=sR/12.92,(sR≤0.03928) (3)
tR=((sR+0.055)/1.055)^2.4, (sR>0.03928) (4)
S63: compute the relative luminance of a single pixel by (5); computing it for every pixel in the text box compresses the three-channel RGB matrix into a one-dimensional matrix;
t=0.2126*tR+0.7152*tG+0.0722*tB (5)
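Steps S61-S63 (formulas (3)-(5)) can be vectorized over a whole text box; a sketch with assumed names:

```python
import numpy as np

def relative_luminance(rgb):
    """Per-pixel luminance of an (H, W, 3) uint8 RGB crop,
    following formulas (3)-(5)."""
    s = rgb.astype(np.float64) / 255.0             # S61: scale to [0, 1]
    t = np.where(s <= 0.03928,
                 s / 12.92,                        # (3)
                 ((s + 0.055) / 1.055) ** 2.4)     # (4)
    # (5): weighted sum collapses the 3 channels to one value per pixel.
    return 0.2126 * t[..., 0] + 0.7152 * t[..., 1] + 0.0722 * t[..., 2]
```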
S64: cluster the per-pixel luminance values within the text box into two groups using the K-Means algorithm with K=2, and compute the mean luminance of each cluster; the two means represent the luminance of the text and of the background respectively;
S65: compute the contrast ratio of the text and background luminances according to the contrast-ratio formula (6).
ratio=(max(lum1,lum2)+0.05)/(min(lum1,lum2)+0.05) (6)
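Steps S64-S65 then reduce to a two-cluster K-Means over the luminance values and formula (6) over the cluster means; a sketch using scikit-learn (an assumed choice of library):

```python
import numpy as np
from sklearn.cluster import KMeans

def contrast_ratio(luminance):
    """Split a text box's per-pixel luminances into two clusters
    (text vs. background) and apply formula (6) to the cluster means."""
    values = luminance.reshape(-1, 1)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(values)
    lum1 = values[labels == 0].mean()
    lum2 = values[labels == 1].mean()
    return (max(lum1, lum2) + 0.05) / (min(lum1, lum2) + 0.05)  # (6)
```

A returned ratio below 4.5 would then flag the box as non-compliant under S6.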
Specifically, the OCR model used in step S2 is EAST (An Efficient and Accurate Scene Text Detector). Its backbone network is PVANet; features are extracted through 4 convolutional stages and fused, and the prediction result is produced from the fused features.
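For reference only, a hedged sketch of driving an EAST detector through OpenCV's dnn module; the frozen graph file and output layer names follow OpenCV's public EAST sample, not the patent, whose PVANet-based variant would be loaded analogously:

```python
import cv2

image = cv2.imread("screenshot.png")
new_w, new_h = 320, 320                      # multiples of 32, per step S1
net = cv2.dnn.readNet("frozen_east_text_detection.pb")
blob = cv2.dnn.blobFromImage(image, 1.0, (new_w, new_h),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
score_map, geometry_map = net.forward(
    ["feature_fusion/Conv_7/Sigmoid",        # score map: per-pixel text confidence
     "feature_fusion/concat_3"])             # geometry map: box-side distances + angle
```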
Preferably, the confidence threshold score_map_thresh for the preliminary text-box screening in step S31 is 0.8.
Preferably, the IoU threshold nms_thresh for deciding to merge text boxes in step S332 is 0.2.
Preferably, the threshold box_thresh for the secondary screening of merged text boxes in step S34 is 0.1.
Preferably, the length threshold length_thresh for filtering undersized text boxes in step S35 is 5.
In summary, the intelligent barrier-free text detection method based on computer vision created by the invention has the following beneficial effects: (1) Non-invasive: barrier-free text detection is performed without touching the application source code. (2) Universal: the method applies to Internet applications such as any web page, App or mini-program, providing a unified barrier-free text detection scheme. (3) Intelligent: no manual preprocessing of text regions in the interface is needed; OCR effectively extracts image features and text-box positions are identified automatically, enabling accurate detection in complex and varied business scenarios. (4) Standard-based: detection follows the national standard for information accessibility and effectively locates non-compliant text, promoting barrier-free improvement of product information.
Description of the drawings:
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the drawings required by the embodiments or the prior-art description are briefly introduced below. The drawings described below are only some embodiments of the invention; a person skilled in the art could obtain other drawings from them without inventive effort.
FIG. 1 is the overall flowchart of the computer-vision-based intelligent barrier-free text detection method provided by the invention;
FIG. 2 is the flowchart of text-box screening in the method;
FIG. 3 is the flowchart of word-size rule detection in the method;
FIG. 4 is the flowchart of color-contrast rule detection in the method;
FIG. 5 is an example of word-size rule detection in the method;
FIG. 6 is an example of color-contrast rule detection in the method.
Detailed description of the embodiments:
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be embodied in various forms and should not be limited to the embodiments set forth here; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey its scope to those skilled in the art.
Taking a certain App as an example, detection proceeds through steps S1-S65 exactly as described above.
Fig. 1 is the overall flowchart of the computer-vision-based intelligent barrier-free text detection method provided by the invention.
Fig. 2 is the flowchart of text-box screening, corresponding to steps S3-S35 above.
Fig. 3 is the flowchart of word-size rule detection, corresponding to step S5 above.
Fig. 4 is the flowchart of color-contrast rule detection, corresponding to steps S6-S65 above.

Claims (6)

1. An intelligent barrier-free text detection method based on computer vision, characterized by comprising the following steps:
S1: opening a web page or App, capturing a screenshot of the interface, and preprocessing it into the model input format: scaling the width and the height of the image down to integer multiples of 32 and adding an extra all-zero (batch) dimension to the image matrix;
S2: feeding the preprocessed image from S1 into an OCR model to identify candidate text boxes and their confidence, the result being represented by a geometry map and a score map, wherein the geometry map records each pixel's distances to the four sides of its text box and the score map records the confidence that each pixel in the image belongs to text;
S3: screening the candidate text boxes identified in S2 to determine the final text-box set;
S31: first screening the candidates by confidence and discarding text boxes whose confidence is below the confidence threshold score_map_thresh;
S32: computing, from the geometry map in S2, the rectangular coordinates of the four vertices of each text box remaining after the preliminary screening;
S33: merging similar text boxes by local non-maximum suppression and computing a new confidence;
S331: computing the IoU values used for local non-maximum suppression between text boxes;
S332: if the IoU of two text boxes exceeds the IoU threshold nms_thresh, merging them; otherwise continuing to traverse the next pair of text boxes;
S333: for the text boxes g and p to be merged in S332, obtaining the vertex-coordinate values of the merged text box q from (1) and its confidence from (2), wherein the first 8 entries of a text-box vector are the x and y coordinates of the four vertices and the 9th entry is the box confidence;
q[:8]=(g[8]*g[:8]+p[8]*p[:8])/(g[8]+p[8]) (1)
q[8]=(g[8]+p[8]) (2)
S34: screening the merged text boxes a second time against the new confidence threshold box_thresh and discarding text boxes whose confidence is below box_thresh;
S35: computing, from the rectangular coordinates obtained in S32, the width and the height of each text box, and discarding the box if either the width or the height is below the length threshold length_thresh;
S4: regularizing the shapes of the text boxes screened in S3 and expanding non-rectangular boxes into rectangles; then performing text edge detection: taking the RGB matrix of the pixels on the upper boundary, examining the variation of each color channel, and moving the boundary downward in steps of 1 pixel, a boundary being considered to have touched the text edge when its channel values vary sharply; the upper and lower edges of the text box thus being roughly flush with the characters, determining the minimal extent of the text box;
S5: checking the word-size rule of the barrier-free specification for the text boxes determined in S4: with the width and height of a text box measured in pixels, converting the height to pt; if the height is below 18 pt, the box violates the word-size requirement of the barrier-free rules and the position of the non-compliant text box is marked on the image;
S6: checking the color-contrast rule for the text boxes determined in S4 by evaluating the contrast between the text and the background: if the contrast ratio is below 4.5, the text color is hard to distinguish from the background, violating the color-contrast requirement of the barrier-free rules, and the position of the non-compliant text box is marked on the image;
S61: every pixel in the text box having R, G and B color values, dividing each color value by 255 to obtain sR, sG and sB respectively;
S62: computing, from sR obtained in S61, the linearized value tR of channel R: if sR ≤ 0.03928, tR is given by (3), otherwise by (4); computing tG and tB analogously;
tR=sR/12.92,(sR≤0.03928) (3)
tR=((sR+0.055)/1.055)^2.4, (sR>0.03928) (4)
S63: computing the relative luminance of a single pixel by (5), and computing it for every pixel in the text box, thereby compressing the three-channel RGB matrix into a one-dimensional matrix;
t=0.2126*tR+0.7152*tG+0.0722*tB (5)
S64: clustering the per-pixel luminance values within the text box into two groups using the K-Means algorithm with K=2 and computing the mean luminance of each cluster, the two means representing the luminance of the text and of the background respectively;
S65: computing the contrast ratio of the text and background luminances according to the contrast-ratio formula (6);
ratio=(max(lum1,lum2)+0.05)/(min(lum1,lum2)+0.05) (6)。
2. The intelligent barrier-free text detection method based on computer vision as claimed in claim 1, wherein in step S2 the OCR model is EAST (An Efficient and Accurate Scene Text Detector), its backbone network is PVANet, features are extracted through 4 convolutional stages and fused, and the prediction result is obtained from the fused features.
3. The intelligent barrier-free text detection method based on computer vision as claimed in claim 1, wherein the confidence threshold score_map_thresh for the preliminary text-box screening in step S31 is preferably 0.8.
4. The intelligent barrier-free text detection method based on computer vision as claimed in claim 1, wherein the IoU threshold nms_thresh for deciding to merge text boxes in step S332 is preferably 0.2.
5. The intelligent barrier-free text detection method based on computer vision as claimed in claim 1, wherein the threshold box_thresh for the secondary screening of merged text boxes in step S34 is preferably 0.1.
6. The intelligent barrier-free text detection method based on computer vision as claimed in claim 1, wherein the length threshold length_thresh for filtering undersized text boxes in step S35 is preferably 5.
CN202110849867.4A 2021-07-27 2021-07-27 Barrier-free character intelligent detection method based on computer vision Active CN113780260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110849867.4A CN113780260B (en) 2021-07-27 2021-07-27 Barrier-free character intelligent detection method based on computer vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110849867.4A CN113780260B (en) 2021-07-27 2021-07-27 Barrier-free character intelligent detection method based on computer vision

Publications (2)

Publication Number Publication Date
CN113780260A CN113780260A (en) 2021-12-10
CN113780260B 2023-09-19

Family

ID=78836122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110849867.4A Active CN113780260B (en) 2021-07-27 2021-07-27 Barrier-free character intelligent detection method based on computer vision

Country Status (1)

Country Link
CN (1) CN113780260B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909732B (en) * 2019-10-14 2022-03-25 杭州电子科技大学上虞科学与工程研究院有限公司 Automatic extraction method of data in graph

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0411388D0 (en) * 2000-03-14 2004-06-23 Intel Corp Generalized text localization in images
CN101989303A (en) * 2010-11-02 2011-03-23 浙江大学 Automatic barrier-free network detection method
KR20140049525A (en) * 2014-01-22 2014-04-25 가천대학교 산학협력단 System and method for displaying visual information based on haptic display for blind person
CN103838823A (en) * 2014-01-22 2014-06-04 浙江大学 Website content accessible detection method based on web page templates
CN108229397A (en) * 2018-01-04 2018-06-29 华南理工大学 Method for text detection in image based on Faster R-CNN
CN109740542A (en) * 2019-01-07 2019-05-10 福建博思软件股份有限公司 Method for text detection based on modified EAST algorithm
CN110619331A (en) * 2019-09-20 2019-12-27 江苏鸿信系统集成有限公司 Color distance-based color image field positioning method
CN110874618A (en) * 2020-01-19 2020-03-10 同盾控股有限公司 OCR template learning method and device based on small sample, electronic equipment and medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Research on a Semi-automatic Web Accessibility Evaluation Tool Based on Subjective and Objective Detection Methods; Zhao Ying; Fu Peilei; Journal of Intelligence (No. 8); full text *
A Video Text Recognition Method Based on Color Clustering and Multi-frame Fusion; Yi Jian; Peng Yuxin; Xiao Jianguo; Journal of Software, Vol. 22 (No. 12); full text *
Development and Challenges of Text Detection Algorithms; Li Yixin; Ma Jinwen; Signal Processing (No. 4); full text *
A Web Page Distance Weight Learning Method for Information Accessibility Detection; Wang Yinghan; Gao Fei; Bu Jiajun; Yu Zhi; Chen Ronghua; Bulletin of Science and Technology (No. 9); full text *

Also Published As

Publication number Publication date
CN113780260A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
WO2020119420A1 (en) Front-end page generation method and apparatus, computer device, and storage medium
US8019164B2 (en) Apparatus, method and program product for matching with a template
CN110866871A (en) Text image correction method and device, computer equipment and storage medium
CN111368903B (en) Model performance optimization method, device, equipment and storage medium
CN112906463A (en) Image-based fire detection method, device, equipment and storage medium
CN112101317B (en) Page direction identification method, device, equipment and computer readable storage medium
CN109977337B (en) Webpage design comparison method, device and equipment and readable storage medium
CN109360179B (en) Image fusion method and device and readable storage medium
CN110390261B (en) Target detection method and device, computer readable storage medium and electronic equipment
CN112989995B (en) Text detection method and device and electronic equipment
CN111985465A (en) Text recognition method, device, equipment and storage medium
CN106372624A (en) Human face recognition method and human face recognition system
CN112232368A (en) Target recognition model training method, target recognition method and related device thereof
CN116704542A (en) Layer classification method, device, equipment and storage medium
KR20180013777A (en) Apparatus and method for analyzing irregular data, a recording medium on which a program / application for implementing the same
CN113780260B (en) Barrier-free character intelligent detection method based on computer vision
CN112001336A (en) Pedestrian boundary crossing alarm method, device, equipment and system
KR20140137254A (en) Terminal, server, system and method for providing location information using character recognition
CN114399617B (en) Method, device, equipment and medium for identifying shielding pattern
CN113221649B (en) Method for solving wired table identification and analysis
CN112036268B (en) Component identification method and related device
CN114494678A (en) Character recognition method and electronic equipment
CN114048539A (en) CAD file analysis and rule judgment method and related device
CN115050086B (en) Sample image generation method, model training method, image processing method and device
KR102572995B1 (en) Method for simplification of polygon and apparatus thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant