CN113780260A - Computer vision-based intelligent barrier-free character detection method - Google Patents


Publication number: CN113780260A (application CN202110849867.4A)
Authority: CN (China)
Prior art keywords: text box, free, text, barrier, color
Legal status: Granted
Application number: CN202110849867.4A
Other languages: Chinese (zh)
Other versions: CN113780260B (en)
Inventors: 卜佳俊, 燕雪雅, 周晟, 王炜, 于智
Current Assignee: Zhejiang University (ZJU)
Original Assignee: Zhejiang University (ZJU)
Application filed by Zhejiang University (ZJU) on 2021-07-27, with priority to CN202110849867.4A
Publication of CN113780260A; application granted and published as CN113780260B
Legal status: Active


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/23 — Clustering techniques
    • G06F 18/232 — Non-hierarchical techniques
    • G06F 18/2321 — Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 — Non-hierarchical techniques using statistics or function optimisation, with a fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

A computer vision-based intelligent barrier-free text detection method: first, a web page or app is opened and a screenshot is taken; the image is given basic preprocessing and passed to an OCR (optical character recognition) model, which automatically predicts candidate text box positions and their confidence scores. Similar text boxes are then compared pairwise and fused, and the results are filtered by confidence. Next, the boxes are shape-normalized and edge detection is applied to determine the minimal extent of each text box. Finally, the confirmed text boxes are checked against barrier-free rules, including font size detection and color contrast detection, and boxes that violate the accessibility specification are flagged for developers to review and correct. The method applies to any web page or app, provides a unified intelligent detection scheme for barrier-free text, offers high accuracy, strong applicability and wide generality, and helps advance accessible and aging-friendly internet applications in China.

Description

Computer vision-based intelligent barrier-free character detection method
Technical field: the invention belongs to the field of information accessibility and is applicable to intelligently detecting whether text meets barrier-free application specifications.
Background art:
With the rapid development of internet technology, internet applications such as web pages and apps play an increasingly important role in people's lives. While ordinary users enjoy the convenience of information technology, disadvantaged groups such as visually impaired people and the elderly face greater challenges. In March 2020 China formally implemented the national standard "Information technology — Requirements and testing methods for the accessibility of internet content" (GB/T 37668-2019), establishing a national standard for internet information accessibility and vigorously advancing work on information accessibility and aging-friendly design. The standard emphasizes that information must remain readable at all times, implemented as concrete rules: the visual presentation of text size and color must meet defined criteria.
However, owing to the diversity of programming languages and technologies, the confidentiality of product code, and the complexity of applications, text size and text-background color contrast generally cannot be obtained accurately from source code. Moreover, in the few cases where text is recognized via OCR, the text regions are usually prepared manually in advance, and no existing character recognition technology has been applied against the national standard in the information accessibility field. Therefore, efficiently and accurately identifying candidate text boxes in arbitrarily complex scenes, detecting the text, and locating non-compliant text to drive accessibility improvements of product information has become an important guarantee for reading internet application information.
Summary of the invention:
Aiming at these technical difficulties, the invention provides a computer vision-based intelligent barrier-free text detection method. The method performs non-invasive intelligent detection: without accessing the source code, it rapidly and accurately identifies candidate text box regions in any complex interface, checks each one for violations of the information accessibility specification, and clearly marks the positions of offending text boxes. In addition, the method applies universally to web pages, apps, mini-programs and other internet applications, providing a unified intelligent detection scheme for barrier-free text, with high accuracy, strong applicability and wide generality, helping to advance accessible and aging-friendly internet applications in China.
A computer vision-based intelligent barrier-free text detection method comprises the following steps:
S1: open a web page or app, capture an interface screenshot, and preprocess it into the model's input format: reduce the image width and height to multiples of 32 and add a new leading (batch) dimension to the image matrix;
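As a minimal illustrative sketch of the S1 preprocessing, assuming a NumPy image array; the function name and the use of cropping in place of true interpolated resizing are this sketch's own choices, not from the patent:

```python
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """Shrink height and width to the nearest lower multiples of 32 and
    add a leading batch dimension, as in step S1. Cropping is a stand-in
    for resizing; a real pipeline would interpolate."""
    h, w = image.shape[:2]
    new_h = max(32, (h // 32) * 32)
    new_w = max(32, (w // 32) * 32)
    resized = image[:new_h, :new_w]          # placeholder for a true resize
    return np.expand_dims(resized, axis=0)   # shape: (1, new_h, new_w, 3)

img = np.zeros((100, 210, 3), dtype=np.uint8)
batch = preprocess(img)
# batch.shape == (1, 96, 192, 3)
```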
S2: pass the preprocessed image from S1 into an OCR model to identify candidate text boxes and their confidence, representing the result with a geometry map and a score map: the geometry map gives each pixel's distances to the four edges of its text box, and the score map gives the confidence that each pixel belongs to text;
s3: screening the possible text boxes identified in the S2 to determine a final text box set;
S31: preliminarily screen the candidate text boxes identified in S2 by confidence, discarding boxes below the confidence threshold score_map_thresh;
s32: calculating rectangular coordinates corresponding to four vertexes of the text box after primary screening according to a geometry map in S2;
S33: suppress and fuse similar text boxes by local non-maximum suppression and compute a new confidence;
s331: computing IoU values for local non-maximum suppression for the text box;
S332: if the IoU of two text boxes exceeds the IoU threshold nms_thresh, fuse them; otherwise continue traversing the next pair of text boxes;
S333: for the text boxes g and p to be fused in S332, the vertex coordinate matrix of the fused text box q is given by (1) and its confidence by (2), where the first 8 entries of a text box vector are the x and y coordinates of its four vertices and the 9th entry is the box confidence;
q[:8]=(g[8]*g[:8]+p[8]*p[:8])/(g[8]+p[8]) (1)
q[8]=(g[8]+p[8]) (2)
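Equations (1) and (2) — the confidence-weighted vertex merge used in EAST-style locality-aware NMS — can be sketched as follows (the helper name is this sketch's own; the 9-element box vector layout follows the patent's description):

```python
import numpy as np

def weighted_merge(g: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Fuse two text boxes per equations (1) and (2): entries 0..7 are the
    four vertex (x, y) coordinates, entry 8 is the confidence. Vertices
    are averaged weighted by confidence; confidences are summed."""
    q = np.zeros(9)
    q[:8] = (g[8] * g[:8] + p[8] * p[:8]) / (g[8] + p[8])
    q[8] = g[8] + p[8]
    return q

g = np.array([0, 0, 10, 0, 10, 4, 0, 4, 0.9])
p = np.array([2, 0, 12, 0, 12, 4, 2, 4, 0.1])
q = weighted_merge(g, p)
# q[0] -> 0.2 (pulled mostly toward g), q[8] -> 1.0
```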
S34: screen the fused text boxes a second time against the new confidence threshold box_thresh, discarding boxes whose confidence is below box_thresh;
S35: compute each text box's width and height from the rectangular coordinates obtained in S32; if either the width or the height is smaller than the length threshold length_thresh, discard the box;
S4: normalize the shapes of the text boxes screened in S3, expanding non-rectangular boxes into rectangles; then perform character edge detection: take the RGB matrix of the pixels along the box's upper boundary, examine how the color channel values vary, and move the boundary downward in steps of 1 pixel; when the color channel values along a boundary row differ sharply, the boundary is considered to have touched the character edge; detect upward from the lower boundary in the same way, so that the upper and lower edges of the text box are essentially flush with the characters, determining the minimal extent of the text box;
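The boundary-shrinking idea in S4 can be sketched roughly as below; the patent does not specify how a "large difference of the color channel values" is measured, so the per-row max-minus-min spread and the diff_thresh value are assumptions of this sketch:

```python
import numpy as np

def shrink_vertical(box_pixels: np.ndarray, diff_thresh: float = 30.0):
    """Tighten the top and bottom of a rectangular text box crop (H, W, 3),
    per step S4: walk the top boundary down (and the bottom boundary up)
    one pixel row at a time until a row's color values vary sharply, which
    is taken as touching the character edge. diff_thresh is an assumed
    tuning parameter, not from the patent."""
    h = box_pixels.shape[0]

    def row_varies(r: int) -> bool:
        # a row whose channel values span a wide range is assumed to hit text
        row = box_pixels[r].astype(float)
        return (row.max(axis=0) - row.min(axis=0)).max() > diff_thresh

    top, bottom = 0, h - 1
    while top < bottom and not row_varies(top):
        top += 1
    while bottom > top and not row_varies(bottom):
        bottom -= 1
    return top, bottom

# white background with a black "stroke" occupying rows 3..5
crop = np.full((10, 8, 3), 255, dtype=np.uint8)
crop[3:6, 2:6] = 0
top, bottom = shrink_vertical(crop)
# top == 3, bottom == 5
```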
S5: perform barrier-free font size detection on the text boxes determined in S4: convert the box height from pixels to pt, and if the height is less than 18 pt, mark on the image the position of each text box that violates the font size requirement of the barrier-free rules;
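The pixel-to-pt conversion in S5 depends on an assumed screen density, which the patent does not state; the sketch below assumes the common 96 dpi convention (1 px = 0.75 pt):

```python
def px_to_pt(height_px: float, dpi: float = 96.0) -> float:
    """Convert a text box height from pixels to points (1 pt = 1/72 inch).
    At the assumed 96 dpi, pt = px * 72 / 96 = px * 0.75."""
    return height_px * 72.0 / dpi

def violates_font_size_rule(height_px: float, min_pt: float = 18.0) -> bool:
    """Step S5: flag a box whose height converts to less than 18 pt."""
    return px_to_pt(height_px) < min_pt

# a 20 px tall box is 15 pt at 96 dpi -> flagged
# a 32 px tall box is 24 pt -> passes
```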
S6: perform barrier-free color contrast detection on the text boxes determined in S4, judging the color contrast between the characters and the background; if the contrast ratio is less than 4.5, the text color is hard to distinguish from the background, violating the text color contrast requirement of the barrier-free rules, and the position of each non-compliant text box is marked on the image;
S61: each pixel in the text box has three color values R, G and B; divide each value by 255 to obtain sR, sG and sB respectively;
S62: compute the linearized value tR of channel R from the sR obtained in S61: if sR ≤ 0.03928, tR is given by (3), otherwise by (4); compute tG and tB in the same way;
tR=sR/12.92,(sR≤0.03928) (3)
tR=((sR+0.055)/1.055)^2.4,(sR>0.03928) (4)
S63: compute the relative luminance t of a single pixel by (5); computing t for every pixel in the text box compresses the three-channel RGB matrix into a one-dimensional matrix;
t=0.2126*tR+0.7152*tG+0.0722*tB (5)
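Equations (3) through (5) match the WCAG relative-luminance computation; a direct transcription follows (the function names are this sketch's own):

```python
def channel_linearize(s: float) -> float:
    """Equations (3) and (4): sRGB channel linearization, s in [0, 1]."""
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4

def relative_luminance(r: int, g: int, b: int) -> float:
    """Steps S61-S63: scale 8-bit channels to [0, 1], linearize each,
    then combine with the weights of equation (5)."""
    tR = channel_linearize(r / 255.0)
    tG = channel_linearize(g / 255.0)
    tB = channel_linearize(b / 255.0)
    return 0.2126 * tR + 0.7152 * tG + 0.0722 * tB

# relative_luminance(255, 255, 255) -> 1.0 (white)
# relative_luminance(0, 0, 0) -> 0.0 (black)
```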
S64: apply K-means clustering with K = 2 to the luminance values of the pixels within the text box, and take the mean value of each cluster to represent the luminance of the characters and of the background respectively;
S65: compute the contrast ratio of the characters and the background according to the contrast ratio formula (6).
ratio=(max(lum1,lum2)+0.05)/(min(lum1,lum2)+0.05) (6)
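Steps S64 and S65 can be sketched with a minimal hand-rolled two-cluster K-means over the luminance values followed by equation (6); the initialization and iteration count here are this sketch's choices (a real pipeline might call scikit-learn's KMeans instead):

```python
import numpy as np

def contrast_ratio(lums: np.ndarray, iters: int = 20) -> float:
    """Steps S64-S65: split per-pixel luminances into two clusters with a
    minimal K-means (K = 2), take each cluster's mean as the text /
    background luminance, and apply equation (6)."""
    c = np.array([lums.min(), lums.max()], dtype=float)  # initial centers
    for _ in range(iters):
        # assign each luminance to its nearest center, then recenter
        labels = np.abs(lums[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                c[k] = lums[labels == k].mean()
    lum1, lum2 = c
    return (max(lum1, lum2) + 0.05) / (min(lum1, lum2) + 0.05)

# pure black text on a pure white background -> ratio of ~21, the maximum
lums = np.array([0.0] * 30 + [1.0] * 70)
```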
Specifically, the OCR model used in step S2 is EAST (An Efficient and Accurate Scene Text Detector); its basic network structure is PVANet, features are extracted by 4 convolutional stages and fused to produce the final prediction.
Preferably, the confidence threshold score_map_thresh for the preliminary text box screening in step S31 is 0.8.
Preferably, the IoU threshold nms_thresh for fusing text boxes in step S332 is 0.2.
Preferably, the threshold box_thresh for the secondary screening of fused text boxes in step S34 is 0.1.
Preferably, the length threshold length_thresh for filtering undersized text boxes in step S35 is 5.
In summary, the invention provides a computer vision-based intelligent barrier-free text detection method with the following beneficial effects: (1) Non-invasive: barrier-free text detection is performed without touching the application source code. (2) Universal: the method applies to web pages, apps, mini-programs and other internet applications, providing a unified barrier-free text detection scheme. (3) Intelligent: no manual preprocessing of text regions in the interface is required; OCR technology effectively extracts image features and automatically locates text box positions, supporting accurate detection in complex and varied business scenarios. (4) Standards-driven: targeting the information accessibility field, detection follows the national standard, effectively locating non-compliant text and driving barrier-free improvements of product information.
Description of the drawings:
To more clearly illustrate the embodiments of the invention and the prior art, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is the overall flow chart of the computer vision-based intelligent barrier-free text detection method provided by the invention;
FIG. 2 is the flow chart of text box screening in the method;
FIG. 3 is the flow chart of font size rule detection in the method;
FIG. 4 is the flow chart of color contrast rule detection in the method;
FIG. 5 is an example diagram of font size rule detection in the method;
FIG. 6 is an example diagram of color contrast rule detection in the method.
Detailed description of the embodiments:
exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the embodiment, taking a certain APP as an example, the method comprises the following specific steps:
Steps S1 to S6 are carried out as described above.
FIG. 1 is the overall flow chart of the computer vision-based intelligent barrier-free text detection method.
FIG. 2 is the flow chart of text box screening in the computer vision-based intelligent barrier-free text detection method; steps S3 to S35 are carried out as described above.
FIG. 3 is the flow chart of font size rule detection in the computer vision-based intelligent barrier-free text detection method; step S5 is carried out as described above.
FIG. 4 is the flow chart of color contrast rule detection in the computer vision-based intelligent barrier-free text detection method; steps S6 to S65 are carried out as described above.

Claims (6)

1. A barrier-free character intelligent detection method based on computer vision is characterized by comprising the following steps:
s1: opening a webpage or App, acquiring an interface screenshot, preprocessing the data format of a model input image, respectively reducing the width and the height of the image to integral multiples of 32, and adding a full 0 dimension to an image matrix;
s2: transferring the preprocessed image in the S1 into an OCR model, identifying a text box which may exist and the confidence coefficient of the text box, and representing the identification result by using a geometry map and a score map, wherein the geometry map represents the distance from the pixel point to four edges of the text box, and the score map represents the confidence coefficient that each pixel point in the image is a character;
s3: screening the possible text boxes identified in the S2 to determine a final text box set;
s31: primarily screening the possible text boxes identified in the step S2 according to the confidence level, and discarding the text boxes below the confidence threshold score_map_thresh;
s32: calculating rectangular coordinates corresponding to four vertexes of the text box after primary screening according to a geometry map in S2;
s33: inhibiting and fusing similar text boxes according to the local maximum value, and calculating a new confidence coefficient;
s331: computing IoU values for local non-maximum suppression for the text box;
s332: if the IoU values of the two text boxes are larger than the IoU threshold nms_thresh, fusion is needed, otherwise, the next two text boxes are continuously traversed;
s333: for the text box g and the text box p needing to be fused in the S332, the vertex coordinate matrix value of the fused text box q is (1), and the confidence coefficient is (2), wherein the first 8 parameters of the text box matrix represent the x and y coordinates of four vertices, and the 9 th parameter represents the confidence coefficient of the text box;
q[:8]=(g[8]*g[:8]+p[8]*p[:8])/(g[8]+p[8]) (1)
q[8]=(g[8]+p[8]) (2)
s34: performing secondary screening on the fused text box according to the new confidence threshold box_thresh, and discarding the text box with the confidence lower than box_thresh;
s35: calculating the width and height of the text box according to the rectangular coordinates obtained in the step S32, and if either the width or the height is smaller than the length threshold length_thresh, discarding the text box;
s4: the shapes of the text boxes screened in the S3 are regulated, and the non-rectangular text boxes are expanded to be rectangular; then, character edge detection is carried out, an RGB matrix of a pixel point corresponding to the upper boundary of the character is taken, the change of the values of the color channels of all colors is judged, the boundary is downwards moved by taking 1 pixel as a step length, and if the difference of the color channel values of a certain boundary is large, the boundary is considered to be contacted with the character edge; in the same way, upward detection is carried out on the lower boundary of the matrix, so that the upper edge and the lower edge of the text box are basically flush with the characters, and the minimum range of the shape of the text box is determined;
s5: carrying out obstacle-free font size rule detection on the text box determined in the S4, converting the height of the text box into pt units by taking pixels as units, and marking out the position of the text box which does not conform to the requirements on the font size in the obstacle-free rule if the height of the text box is less than 18 pt;
s6: detecting the text box determined in the S4 according to the barrier-free color contrast rule, judging the color contrast difference between the characters and the background, if the difference value is less than 4.5, indicating that the color difference between the characters and the background is not easy to distinguish, namely, the requirements for the color contrast of the characters in the barrier-free rule are violated, and marking out the position of the text box which does not conform to the standard on the image;
s61: in the text box, R, G, B three color numerical values exist in each pixel point, and each color numerical value is divided by 255 to obtain sR, sG and sB respectively;
s62: calculating the contrast tR of the color R for sR obtained in S61, wherein if sR is less than or equal to 0.03928, tR is obtained by calculating in (3), otherwise, tR is obtained by calculating in (4); computing tG and tB in the same way;
tR=sR/12.92,(sR≤0.03928) (3)
tR=((sR+0.055)/1.055)^2.4,(sR>0.03928) (4)
s63: calculating the color contrast of a single pixel to obtain (5), and calculating the contrast of each pixel point in the text box, thereby compressing the RGB three-dimensional matrix into a one-dimensional matrix;
t=0.2126*tR+0.7152*tG+0.0722*tB (5)
s64: and performing binary clustering on the color contrast of the pixels in the text box range by using a K-Means clustering algorithm with K being 2, and calculating the average value of the contrast in each cluster to respectively represent the color contrast of characters and background.
S65: and (6) calculating the contrast difference value of the characters and the background color according to the contrast ratio calculation standard (6).
ratio=(max(lum1,lum2)+0.05)/(min(lum1,lum2)+0.05) (6)
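Equation (6) and the 4.5 threshold from step S6 together give the pass/fail decision. A sketch with illustrative function names:

```python
def contrast_ratio(lum1, lum2):
    """Equation (6): contrast ratio between two relative luminances."""
    hi, lo = max(lum1, lum2), min(lum1, lum2)
    return (hi + 0.05) / (lo + 0.05)

def violates_contrast_rule(lum_text, lum_bg, min_ratio=4.5):
    """A text box fails the barrier-free color rule of step S6 when the
    character/background contrast ratio falls below 4.5."""
    return contrast_ratio(lum_text, lum_bg) < min_ratio
```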
2. The computer vision-based intelligent barrier-free character detection method is characterized in that: in step S2, the OCR model used is EAST (An Efficient and Accurate Scene Text Detector); its base network structure is PVANet, features are extracted through four convolution stages and then fused to produce the final prediction.
3. The computer vision-based intelligent barrier-free character detection method is characterized in that: in step S31, the confidence threshold score_map_thresh for the preliminary screening of text boxes is preferably 0.8.
4. The computer vision-based intelligent barrier-free character detection method is characterized in that: in step S332, the IoU threshold nms_thresh for deciding whether to merge text boxes is preferably 0.2.
5. The computer vision-based intelligent barrier-free character detection method is characterized in that: in step S34, the threshold box_thresh for the second filtering of merged text boxes is preferably 0.1.
6. The computer vision-based intelligent barrier-free character detection method is characterized in that: in step S35, the side-length threshold length_thresh for filtering out text boxes with too small an area is preferably 5.
CN202110849867.4A 2021-07-27 2021-07-27 Barrier-free character intelligent detection method based on computer vision Active CN113780260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110849867.4A CN113780260B (en) 2021-07-27 2021-07-27 Barrier-free character intelligent detection method based on computer vision


Publications (2)

Publication Number Publication Date
CN113780260A true CN113780260A (en) 2021-12-10
CN113780260B CN113780260B (en) 2023-09-19

Family

ID=78836122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110849867.4A Active CN113780260B (en) 2021-07-27 2021-07-27 Barrier-free character intelligent detection method based on computer vision

Country Status (1)

Country Link
CN (1) CN113780260B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0411388D0 (en) * 2000-03-14 2004-06-23 Intel Corp Generalized text localization in images
CN101989303A (en) * 2010-11-02 2011-03-23 浙江大学 Automatic barrier-free network detection method
KR20140049525A (en) * 2014-01-22 2014-04-25 가천대학교 산학협력단 System and method for displaying visual information based on haptic display for blind person
CN103838823A (en) * 2014-01-22 2014-06-04 浙江大学 Website content accessible detection method based on web page templates
CN108229397A (en) * 2018-01-04 2018-06-29 华南理工大学 Method for text detection in image based on Faster R-CNN
CN109740542A (en) * 2019-01-07 2019-05-10 福建博思软件股份有限公司 Method for text detection based on modified EAST algorithm
CN110619331A (en) * 2019-09-20 2019-12-27 江苏鸿信系统集成有限公司 Color distance-based color image field positioning method
CN110874618A (en) * 2020-01-19 2020-03-10 同盾控股有限公司 OCR template learning method and device based on small sample, electronic equipment and medium
US20210110194A1 (en) * 2019-10-14 2021-04-15 Hangzhou Dianzi University Method for automatic extraction of data from graph


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Yi Jian; Peng Yuxin; Xiao Jianguo: "Video Text Recognition Method Based on Color Clustering and Multi-Frame Fusion", Journal of Software, vol. 22, no. 012 *
Li Yixin; Ma Jinwen: "Development and Challenges of Text Detection Algorithms", Journal of Signal Processing, no. 04 *
Wang Yinghan; Gao Fei; Bu Jiajun; Yu Zhi; Chen Ronghua: "A Web Page Distance Weight Learning Method for Information Accessibility Detection", Bulletin of Science and Technology, no. 09 *
Zhao Ying; Fu Peilei: "Research on a Semi-Automatic Web Accessibility Evaluation Tool Based on Subjective and Objective Detection Methods", Journal of Intelligence, no. 008 *

Also Published As

Publication number Publication date
CN113780260B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
KR100248917B1 (en) Pattern recognizing apparatus and method
CN111563509B (en) Tesseract-based substation terminal row identification method and system
CN108229485B (en) Method and apparatus for testing user interface
CN111368903B (en) Model performance optimization method, device, equipment and storage medium
US8019164B2 (en) Apparatus, method and program product for matching with a template
CN107256379A (en) Information collecting method, mobile terminal and storage medium based on image recognition
CN113139445A (en) Table recognition method, apparatus and computer-readable storage medium
CN114549993B (en) Method, system and device for grading line segment image in experiment and readable storage medium
CN110196917B (en) Personalized LOGO format customization method, system and storage medium
CN106372624A (en) Human face recognition method and human face recognition system
CN111368682A (en) Method and system for detecting and identifying station caption based on faster RCNN
CN112241730A (en) Form extraction method and system based on machine learning
CN114663904A (en) PDF document layout detection method, device, equipment and medium
CN111738252B (en) Text line detection method, device and computer system in image
CN113221878A (en) Detection frame adjusting method and device applied to signal lamp detection and road side equipment
CN112052730A (en) 3D dynamic portrait recognition monitoring device and method
CN115205883A (en) Data auditing method, device, equipment and storage medium based on OCR (optical character recognition) and NLP (non-line language)
CN116704542A (en) Layer classification method, device, equipment and storage medium
KR20140137254A (en) Terminal, server, system and method for providing location information using character recognition
CN113780260A (en) Computer vision-based intelligent barrier-free character detection method
CN113807315B (en) Method, device, equipment and medium for constructing object recognition model to be recognized
CN113705559B (en) Character recognition method and device based on artificial intelligence and electronic equipment
CN112084103A (en) Interface test method, device, equipment and medium
US20050060329A1 (en) Data classification supporting method and apparatus, program and recording medium recording the program
CN111950354A (en) Seal home country identification method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant