CN113434072B - Mobile terminal application control identification method based on computer vision - Google Patents

Mobile terminal application control identification method based on computer vision

Info

Publication number
CN113434072B
CN113434072B
Authority
CN
China
Prior art keywords
control
computer vision
image
mobile
straight line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110597673.XA
Other languages
Chinese (zh)
Other versions
CN113434072A (en)
Inventor
卜佳俊
张建锋
周晟
刘美含
王炜
于智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110597673.XA (CN113434072B)
Priority to PCT/CN2021/098490 (WO2022252239A1)
Publication of CN113434072A
Application granted
Publication of CN113434072B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a computer-vision-based method for identifying the controls of mobile applications. Combining hardware and software and exploiting the system's accessibility features, it realizes a non-intrusive control identification method with strong generality and a low error rate. First, the screen reader and the target software are opened, and after each mechanical-arm operation a screenshot is uploaded to the server. Second, the screenshot is preprocessed, and its focus-frame color is extracted and the image expanded to obtain a single-channel image. Edge detection and straight-line detection are then performed on the single-channel image, and noise is filtered out to obtain the center coordinates of the control. Finally, the function of the control is determined by computer-vision means. Repeating these steps builds the app's page control tree. The method applies to complex scenes, achieves an understanding of each control's functional meaning, has strong generality, and can be used in scenarios such as automated testing of mobile applications, page structure decomposition, and human-computer interaction analysis.

Description

Mobile terminal application control identification method based on computer vision
Technical field: the invention relates to a non-intrusive, computer-vision-based algorithm for identifying controls in mobile applications, and belongs to the technical field of computer software.
Background art:
With the development of the mobile internet, the number of mobile applications has grown explosively and software design has become increasingly complex. Demands such as automated testing, page structure decomposition, and human-computer interaction analysis of mobile applications therefore grow by the day, and all of them rest on control identification based on the graphical user interface (GUI), i.e., automatically identifying the interactive visual components in a GUI. For example, to guarantee the product quality of mobile applications, automated GUI testing is often required, and the currently mainstream "record and replay" approach needs to know in advance the number and positions of controls in the GUI and the interactive operations available.
Most common control identification methods identify controls by their attributes and fall into three categories: coordinate-based, source-code-based, and control-tree-based identification. They suffer mainly from the following shortcomings: (1) they cannot achieve an understanding of each control's functional meaning. Existing methods merely classify controls by their attributes but cannot truly identify the specific purpose of each control; moreover, they fail when attribute values are null or duplicated. (2) They cannot handle complex scenes, such as pages with interaction logic like pop-ups and sub-pages. (3) They lack generality: because Android and iOS differ in control identification logic, control invocation logic, and so on, a control identification scheme cannot be reused across platforms.
Summary of the invention:
To address the above problems and difficulties, the invention provides a mobile terminal application control identification method based on computer vision. Compared with coordinate-based control identification, the method adapts to devices of different platforms and resolutions and is more general; compared with source-code-based identification, it is non-intrusive, i.e., it needs no access to the software's source code, can be used in scenarios such as black-box testing, and has a wider range of application; compared with control-tree-based identification, it is unaffected by the page hierarchy and control positions and can flexibly handle a variety of complex scenes. In addition, the method achieves semantic understanding of each control: beyond determining a control's position and attributes, it can identify the specific purpose of each control.
A mobile terminal application control identification method based on computer vision specifically comprises the following steps:
S101: open the screen reader. By inspecting the GUI of the mobile application and the additional information the application provides for its accessibility features, its main role is to describe on-screen elements by voice and to frame them with a focus frame.
S102: open the corresponding software; the mechanical arm performs one operation on the screen, after which a screenshot is captured, uploaded to the server, and preprocessed.
S103: for the screenshot obtained in S102, determine the RGB matrix corresponding to the color of the focus frame, and superimpose it on different backgrounds to obtain an RGB range.
S104: extract the pixels within the RGB range obtained in S103 from the screenshot obtained in S102 to obtain a single-channel image.
S105: expand the image of S104, perform edge detection on the single-channel image obtained in S104, and detect the edges of the focus frame.
S106: perform straight-line detection on the edges obtained in S105, and obtain the rectangular coordinate equation of each straight line.
S107: screen the equations obtained in S106 to obtain the straight lines corresponding to the focus frame, calculate the center coordinates, length, and width of the control, and convert them into proportions of the screen.
S108: according to the center coordinates of the control and the screen proportions of its length and width obtained in S107, determine the function of the rectangular screenshot framed by the focus frame by computer-vision means.
S109: pair the control coordinates obtained in S107 with the text recognized in S108, and construct a page tree whose nodes are these key-value pairs.
S110: if the control to be clicked is known, traverse the page tree obtained in S109 to find the node corresponding to the control; the path from the parent node to the target node is the operation path after opening the APP. The physical on-screen coordinates are obtained from the control's percentage position in the image, and the mechanical arm double-clicks the controls in turn until the target control is found.
In step S101, the screen reader used by the method takes the following specific form: S201, on Android the screen reader is TalkBack, and on iOS it is VoiceOver.
Specifically, in step S102, the screenshot must satisfy the following requirement: S301, the image must be in PNG format.
Specifically, in step S102, the operations of the mechanical arm are: S401, swipe left, swipe right, and double-click.
Specifically, in step S102, the image preprocessing consists of: S501, cropping out the portion of the screen occupied by the graphical user interface of the mobile software.
Specifically, in step S103, the RGB matrix range is obtained as follows: S601, determine the several RGB matrices corresponding to the focus frame to obtain an initial range; S602, superimpose backgrounds of different gray levels on the initial range obtained in S601 to obtain the final range.
Specifically, in step S104, the single-channel image is obtained as follows: S701, traverse the image matrix obtained in S102 against the RGB matrix range obtained in S103; S702, set pixels whose values fall within the RGB matrix range to 1; S703, set pixels whose values fall outside the RGB matrix range to 0.
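For illustration, S701–S703 can be sketched with OpenCV's inRange, which performs exactly this per-pixel range test; the BGR bounds below are hypothetical stand-ins for the focus-frame range measured in S601–S602, and in-range pixels are marked 255 rather than 1, which is equivalent up to scaling:

```python
import cv2
import numpy as np

# A minimal sketch of S701-S703. OpenCV loads images in BGR channel order,
# so the (hypothetical) bounds below are given as BGR, not RGB.
img = cv2.imread("screenshot.png")             # screenshot from S102
lower_bgr = np.array([0, 100, 0])              # assumed lower bound
upper_bgr = np.array([80, 255, 80])            # assumed upper bound
mask = cv2.inRange(img, lower_bgr, upper_bgr)  # 255 inside the range, 0 outside
```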
Specifically, in step S105, the image is expanded as follows: S801, splice matrices of width 50 with all pixel values 0 onto the left and right sides of the image; S802, splice matrices of height 50 with all pixel values 0 onto the upper and lower sides of the image spliced in S801.
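A minimal sketch of S801–S802 using OpenCV's border padding, continuing from the mask above; the expansion ensures that a focus frame flush with a screen edge still yields four detectable straight lines:

```python
import cv2

# S801-S802: splice 50-pixel-wide all-zero (black) borders onto all four
# sides of the single-channel image `mask` obtained in S104.
padded = cv2.copyMakeBorder(mask, top=50, bottom=50, left=50, right=50,
                            borderType=cv2.BORDER_CONSTANT, value=0)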
Specifically, in step S105, edge detection proceeds as follows: S901, apply Gaussian denoising to the image; S902, calculate the gradient of the denoised image obtained in S901, and from the gradient calculate the edge magnitude and angle of the image; S903, perform non-maximum suppression along the gradient direction according to the edge magnitude and angle obtained in S902; S904, perform double-threshold edge linking to obtain the edges.
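The S901–S904 pipeline (Gaussian denoising, gradient magnitude and direction, non-maximum suppression, double-threshold edge linking) is the classical Canny detector, so a sketch can lean on OpenCV directly; the kernel size and the two thresholds are assumed values that would need tuning:

```python
import cv2

blurred = cv2.GaussianBlur(padded, (5, 5), 0)  # S901: Gaussian denoising
# S902-S904: gradient computation, non-maximum suppression, and
# double-threshold hysteresis linking are all performed inside cv2.Canny.
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)
```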
Specifically, in step S106, straight-line detection proceeds as follows: S1001, convert the coordinates of each edge point of the image obtained in S904 to polar coordinates; S1002, calculate the line equation corresponding to each point, points sharing a common line equation lying on the same straight line; S1003, count the number of pixels on each line; S1004, if the pixel count of a line obtained in S1003 exceeds a certain threshold, keep the line; S1005, if it does not exceed the threshold, discard the line.
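The voting scheme of S1001–S1005 is the standard Hough transform; a sketch over the edge image above, where the vote threshold of 200 is an assumed value:

```python
import cv2
import numpy as np

# Each detected line is returned as (rho, theta) in the polar form
# rho = x*cos(theta) + y*sin(theta); `threshold` is the minimum pixel
# (vote) count of S1004-S1005.
raw = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=200)
lines = [] if raw is None else [tuple(line[0]) for line in raw]
```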
Specifically, in step S106, the rectangular coordinate equation of a straight line is obtained as follows: S1101, convert the polar coordinate equation into a rectangular coordinate equation.
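Because a focus frame's lines are axis-aligned, the conversion of S1101 reduces to two cases: theta ≈ 0 gives a vertical line x = rho, and theta ≈ π/2 gives a horizontal line y = rho. A sketch, with the 0.1-radian tolerance being an assumption:

```python
import numpy as np

def split_axis_aligned(lines, tol=0.1):
    """Convert (rho, theta) pairs into rectangular-coordinate equations,
    keeping only (near-)vertical lines x = rho and horizontal lines y = rho."""
    vertical = [rho for rho, theta in lines if abs(theta) < tol]
    horizontal = [rho for rho, theta in lines if abs(theta - np.pi / 2) < tol]
    return vertical, horizontal
```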
In step S107, the straight lines corresponding to the focus frame are screened as follows: S1201, if the spacing in pixels between two adjacent lines matches a certain fixed value, the lines are taken to be those of the focus frame; S1202, if it does not, the line is regarded as an interference line.
Specifically, in step S107, the center coordinates of the control are calculated as follows: S1301, average the vertical lines to obtain the abscissa of the control; S1302, average the horizontal lines to obtain the ordinate of the control.
In step S107, the length and width of the control are calculated as follows: S1401, for the vertical lines, the difference between the maximum and the minimum gives the width of the control; S1402, for the horizontal lines, the difference between the maximum and the minimum gives the length of the control.
Specifically, in step S107, the percentage position of the control's center on the screen is calculated as follows: S1501, divide the abscissa by the image width to obtain the x-axis percentage; S1502, divide the ordinate by the image height to obtain the y-axis percentage.
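Combining S1301–S1502, the control's geometry follows directly from the kept lines; a sketch that assumes the four frame lines have already been screened per S1201–S1202, and that undoes the 50-pixel expansion of S801–S802:

```python
def control_geometry(vertical, horizontal, img_w, img_h, pad=50):
    xs = [v - pad for v in vertical]    # undo the S801 left/right padding
    ys = [h - pad for h in horizontal]  # undo the S802 top/bottom padding
    cx = sum(xs) / len(xs)              # S1301: abscissa = mean of verticals
    cy = sum(ys) / len(ys)              # S1302: ordinate = mean of horizontals
    width = max(xs) - min(xs)           # S1401: width of the control
    length = max(ys) - min(ys)          # S1402: length of the control
    return (cx / img_w, cy / img_h), (width, length)  # S1501-S1502

vertical, horizontal = split_axis_aligned(lines)
# 1080x2340 is an assumed screen resolution, purely for illustration.
center_pct, size = control_geometry(vertical, horizontal, 1080, 2340)
```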
specifically, in step S108, the method for specifically determining the control function is: s1601, performing character recognition by using an OCR (optical character recognition) to obtain characters corresponding to the control; and S1602, if no characters are detected in S1601, performing image matching, and determining functions of the database according to the constructed database.
Specifically, in step S109, the page tree is constructed as follows: S1701, take the key-value pair formed by the control's center coordinates and its function as a node of the tree; S1702, set an empty node as the root node, serving as the parent of all controls on the mobile application's home page; S1703, the controls of the page reached by clicking a control become children of the clicked control, and the page tree is built by analogy.
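A minimal sketch of the S1701–S1703 page tree, with nodes pairing a control's center coordinates and recognized function; the field and variable names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple, List

@dataclass
class ControlNode:
    center_pct: Optional[Tuple[float, float]]  # (x%, y%), None for the root
    function: Optional[str]                    # recognized label, None for root
    children: List["ControlNode"] = field(default_factory=list)

root = ControlNode(None, None)                 # S1702: empty root node
search = ControlNode((0.50, 0.06), "search")   # a home-page control
root.children.append(search)
# S1703: controls on the page reached by clicking `search` become its children
search.children.append(ControlNode((0.50, 0.50), "search history"))
```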
In summary, the invention creates a non-intrusive, computer-vision-based control identification method for mobile applications, with the following beneficial effects: (1) it achieves an understanding of the functional meaning of each control: besides locating a control, the page hierarchy and the control's function can be known; (2) it can be applied to complex scenes, such as pages with interaction logic like pop-ups and sub-pages; (3) it is general, and suits different platforms and device models.
Description of the drawings:
To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in their description are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of the hardware-software interaction of the computer-vision-based non-intrusive control identification algorithm for mobile applications provided by the invention;
FIG. 2 is the general flow chart of the algorithm;
FIG. 3 is an example of the image expansion method in the general flow chart;
FIG. 4 is the flow chart of edge detection in the general flow chart;
FIG. 5 is the flow chart of straight-line detection in the general flow chart.
The specific embodiments are as follows:
exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In this embodiment, taking a certain APP as an example, the specific steps are as follows:
S101: turn on the screen reader.
S102: open the APP; the mechanical arm performs one "swipe right" operation on the screen, after which a screenshot is captured and uploaded.
S103: for the screenshot obtained in S102, determine the RGB matrix range corresponding to the color of the focus frame.
S104: extract the pixels within the RGB range obtained in S103 from the screenshot obtained in S102 to obtain a single-channel image.
S105: expand the single-channel image of S104 and perform edge detection on it.
S106: perform straight-line detection on the edges obtained in S105 and obtain the rectangular coordinate equation of each straight line.
S107: screen the equations obtained in S106 to obtain the straight lines corresponding to the focus frame, calculate the center coordinates, length, and width of the control, and convert them into proportions of the screen.
S108: according to the center coordinates of the control and the screen proportions of its length and width obtained in S107, determine the function of the rectangular screenshot framed by the focus frame by computer-vision means.
S109: pair the control coordinates obtained in S107 with the text recognized in S108, and construct a page tree whose nodes are these key-value pairs.
S110: for the control to be clicked, traverse the page tree obtained in S109 to find its node, obtain the physical on-screen coordinates from the control's percentage position in the image, and let the mechanical arm double-click the controls in turn until the target control is found.
FIG. 1 shows the hardware-software interaction of the algorithm, and FIG. 2 its general flow.
FIG. 3 illustrates the image expansion method in the general flow: S801, splice matrices of width 50 with all pixel values 0 onto the left and right sides of the image; S802, splice matrices of height 50 with all pixel values 0 onto the upper and lower sides of the image spliced in S801.
FIG. 4 shows the edge detection flow: S901, apply Gaussian denoising to the image; S902, calculate the gradient of the denoised image obtained in S901, and from the gradient calculate the edge magnitude and angle of the image; S903, perform non-maximum suppression along the gradient direction according to the edge magnitude and angle obtained in S902; S904, perform double-threshold edge linking to obtain the edges.
FIG. 5 shows the straight-line detection flow: S1001, convert the coordinates of each edge point of the image obtained in S904 to polar coordinates; S1002, calculate the line equation corresponding to each point, points sharing a common line equation lying on the same straight line; S1003, count the number of pixels on each line; S1004, if the pixel count of a line obtained in S1003 exceeds a certain threshold, keep the line; S1005, if it does not exceed the threshold, discard the line.

Claims (17)

1. A mobile terminal application control identification method based on computer vision, characterized by comprising the following steps:
S101: opening a screen reader, whose main role, by inspecting the GUI (graphical user interface) of the mobile application and the additional information the application provides for its accessibility features, is to describe on-screen elements by voice and to frame them with a focus frame;
S102: opening the corresponding software; the mechanical arm performs one operation on the screen, after which a screenshot is captured, uploaded to the server, and preprocessed;
S103: for the screenshot obtained in S102, determining the RGB matrix corresponding to the color of the focus frame, and superimposing it on different backgrounds to obtain an RGB range;
S104: extracting the pixels within the RGB range obtained in S103 from the screenshot obtained in S102 to obtain a single-channel image;
S105: expanding the image of S104, performing edge detection on the single-channel image obtained in S104, and detecting the edges of the focus frame;
S106: performing straight-line detection on the edges obtained in S105, and obtaining the rectangular coordinate equation of each straight line;
S107: screening the equations obtained in S106 to obtain the straight lines corresponding to the focus frame, calculating the center coordinates, length, and width of the control, and converting them into proportions of the screen;
S108: according to the center coordinates of the control and the screen proportions of its length and width obtained in S107, determining the function of the rectangular screenshot framed by the focus frame by computer-vision means;
S109: pairing the control coordinates obtained in S107 with the text recognized in S108, and constructing a page tree whose nodes are these key-value pairs;
S110: if the control to be clicked is known, traversing the page tree obtained in S109 to find the node corresponding to the control, the path from the parent node to the target node being the operation path after opening the APP; obtaining the physical on-screen coordinates from the control's percentage position in the image, the mechanical arm double-clicking the controls in turn until the target control is found.
2. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S101 the screen reader used takes the following specific form:
S201: the Android screen reader is TalkBack, and the iOS screen reader is VoiceOver.
3. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S102 the screenshot must satisfy:
S301: the image must be in PNG format.
4. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S102 the operations of the mechanical arm are:
S401: swipe left, swipe right, and double-click.
5. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S102 the image preprocessing consists of:
S501: cropping out the portion of the screen occupied by the graphical user interface of the mobile software.
6. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S103 the RGB matrix range is obtained as follows:
S601: determining the several RGB matrices corresponding to the focus frame to obtain an initial range; S602: superimposing backgrounds of different gray levels on the initial range obtained in S601 to obtain the final range.
7. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S104 the single-channel image is obtained as follows:
S701: traversing the image matrix obtained in S102 against the RGB matrix range obtained in S103; S702: setting pixels whose values fall within the RGB matrix range to 1; S703: setting pixels whose values fall outside the RGB matrix range to 0.
8. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S105 the image is expanded as follows:
S801: splicing matrices of width 50 with all pixel values 0 onto the left and right sides of the image; S802: splicing matrices of height 50 with all pixel values 0 onto the upper and lower sides of the image spliced in S801.
9. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S105 edge detection proceeds as follows:
S901: applying Gaussian denoising to the image; S902: calculating the gradient of the denoised image obtained in S901, and from the gradient the edge magnitude and angle of the image; S903: performing non-maximum suppression along the gradient direction according to the edge magnitude and angle obtained in S902; S904: performing double-threshold edge linking to obtain the edges.
10. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S106 straight-line detection proceeds as follows:
S1001: converting the coordinates of each edge point of the image obtained in S904 to polar coordinates; S1002: calculating the line equation corresponding to each point, points sharing a common line equation lying on the same straight line; S1003: counting the number of pixels on each line; S1004: if the pixel count of a line obtained in S1003 exceeds a certain threshold, keeping the line; S1005: if it does not exceed the threshold, discarding the line.
11. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S106 the rectangular coordinate equation of a straight line is obtained as follows:
S1101: converting the polar coordinate equation into a rectangular coordinate equation.
12. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S107 the straight lines corresponding to the focus frame are screened as follows:
S1201: if the spacing in pixels between two adjacent lines matches a certain fixed value, the lines are taken to be those of the focus frame; S1202: if it does not, the line is regarded as an interference line.
13. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S107 the center coordinates of the control are calculated as follows:
S1301: averaging the vertical lines to obtain the abscissa of the control; S1302: averaging the horizontal lines to obtain the ordinate of the control.
14. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S107 the length and width of the control are calculated as follows:
S1401: for the vertical lines, the difference between the maximum and the minimum gives the width of the control; S1402: for the horizontal lines, the difference between the maximum and the minimum gives the length of the control.
15. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S107 the percentage position of the control's center on the screen is calculated as follows:
S1501: dividing the abscissa by the image width to obtain the x-axis percentage; S1502: dividing the ordinate by the image height to obtain the y-axis percentage.
16. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S108 the function of the control is determined as follows:
S1601: performing optical character recognition (OCR) to obtain the text corresponding to the control; S1602: if no text is detected in S1601, performing image matching and determining the control's function from the pre-built database.
17. The computer-vision-based mobile terminal application control identification method of claim 1, wherein in step S109 the page tree is constructed as follows:
S1701: taking the key-value pair formed by the control's center coordinates and its function as a node of the tree; S1702: setting an empty node as the root node to serve as the parent node of all controls of the mobile application's home page; S1703: the controls of the page reached by clicking a control becoming children of the clicked control, the page tree being built by analogy.
CN202110597673.XA 2021-05-31 2021-05-31 Mobile terminal application control identification method based on computer vision Active CN113434072B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110597673.XA CN113434072B (en) 2021-05-31 2021-05-31 Mobile terminal application control identification method based on computer vision
PCT/CN2021/098490 WO2022252239A1 (en) 2021-05-31 2021-06-05 Computer vision-based mobile terminal application control identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110597673.XA CN113434072B (en) 2021-05-31 2021-05-31 Mobile terminal application control identification method based on computer vision

Publications (2)

Publication Number Publication Date
CN113434072A CN113434072A (en) 2021-09-24
CN113434072B true CN113434072B (en) 2022-06-07

Family

Family ID: 77803292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110597673.XA Active CN113434072B (en) 2021-05-31 2021-05-31 Mobile terminal application control identification method based on computer vision

Country Status (2)

Country Link
CN (1) CN113434072B (en)
WO (1) WO2022252239A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007148040A1 (en) * 2006-06-19 2007-12-27 British Telecommunications Public Limited Company Apparatus & method for selecting menu items
CN105045489A (en) * 2015-08-27 2015-11-11 广东欧珀移动通信有限公司 Button control method and apparatus
CN109922363A (en) * 2019-03-15 2019-06-21 青岛海信电器股份有限公司 A kind of graphical user interface method and display equipment of display screen shot

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080195958A1 (en) * 2007-02-09 2008-08-14 Detiege Patrick J Visual recognition of user interface objects on computer
US8918739B2 (en) * 2009-08-24 2014-12-23 Kryon Systems Ltd. Display-independent recognition of graphical user interface control
CN108509342A (en) * 2018-04-04 2018-09-07 成都中云天下科技有限公司 A kind of precisely quick App automated testing methods
CN110990238B (en) * 2019-11-13 2021-09-21 南京航空航天大学 Non-invasive visual test script automatic recording method based on video shooting
CN112181255A (en) * 2020-10-12 2021-01-05 深圳市欢太科技有限公司 Control identification method and device, terminal equipment and storage medium
CN112657176A (en) * 2020-12-31 2021-04-16 华南理工大学 Binocular projection man-machine interaction method combined with portrait behavior information
CN112597065B (en) * 2021-03-03 2021-05-18 浙江口碑网络技术有限公司 Page testing method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007148040A1 (en) * 2006-06-19 2007-12-27 British Telecommunications Public Limited Company Apparatus & method for selecting menu items
CN105045489A (en) * 2015-08-27 2015-11-11 广东欧珀移动通信有限公司 Button control method and apparatus
CN109922363A (en) * 2019-03-15 2019-06-21 青岛海信电器股份有限公司 A kind of graphical user interface method and display equipment of display screen shot

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TinyLink 2.0: Integrating Device, Cloud, and Client Development for IoT Applications; Gaoyang Guan et al.; Proceedings of the 26th Annual International Conference on Mobile Computing and Networking; 2020-04-13; full text *
Research and Implementation of Robot Binocular Stereo Vision Ranging Technology; Zhang Peng et al.; Computer Measurement & Control; 2013-12-31; full text *

Also Published As

Publication number Publication date
WO2022252239A1 (en) 2022-12-08
CN113434072A (en) 2021-09-24

Similar Documents

Publication Publication Date Title
US11886799B2 (en) Determining functional and descriptive elements of application images for intelligent screen automation
CN110942074B (en) Character segmentation recognition method and device, electronic equipment and storage medium
CN109684803B (en) Man-machine verification method based on gesture sliding
KR20220013298A (en) Method and device for recognizing characters
CN110598686B (en) Invoice identification method, system, electronic equipment and medium
CN110379020B (en) Laser point cloud coloring method and device based on generation countermeasure network
CN114549993B (en) Method, system and device for grading line segment image in experiment and readable storage medium
CN111274957A (en) Webpage verification code identification method, device, terminal and computer storage medium
CN106991303B (en) Gesture verification code identification method and device
CN111460355A (en) Page parsing method and device
KR20210113620A (en) Object recognition method and device, electronic device, storage medium
CN115115740A (en) Thinking guide graph recognition method, device, equipment, medium and program product
CN113434072B (en) Mobile terminal application control identification method based on computer vision
CN111241897A (en) Industrial checklist digitization by inferring visual relationships
CN112052730A (en) 3D dynamic portrait recognition monitoring device and method
CN116052193B (en) RPA interface dynamic form picking and matching method and system
CN111444834A (en) Image text line detection method, device, equipment and storage medium
WO2024021081A1 (en) Method and apparatus for detecting defect on surface of product
CN115082941A (en) Form information acquisition method and device for form document image
CN114821596A (en) Text recognition method and device, electronic equipment and medium
CN115147752A (en) Video analysis method and device and computer equipment
CN114067145A (en) Passive optical splitter detection method, device, equipment and medium
Tonge et al. Automatic Number Plate Recognition
CN111817916A (en) Test method, device, equipment and storage medium based on mobile terminal cluster
CN110826564A (en) Small target semantic segmentation method and system in complex scene image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant