WO2022252239A1 - Computer vision-based mobile terminal application control identification method - Google Patents
- Publication number
- WO2022252239A1 (PCT/CN2021/098490)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- control
- mobile terminal
- computer vision
- image
- terminal application
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Definitions
- the invention relates to a computer vision-based, non-intrusive control recognition algorithm for mobile applications, belonging to the field of computer software technology.
- GUI (Graphical User Interface)
- control identification methods mostly identify controls by their properties and fall into three main categories: coordinate-based, source-code-based, and control-tree-based identification.
- the present invention proposes a computer vision-based mobile terminal application control recognition method.
- compared with coordinate-based control identification, this method adapts to different platforms and devices with different resolutions, and is therefore more universal;
- compared with source-code-based identification, this method is non-intrusive: it does not require access to the software's source code, so it can be used in scenarios such as black-box testing and has a wider range of applications;
- compared with control-tree-based identification, this method is not affected by the page hierarchy or control positions, and can flexibly handle various complex scenes.
- in addition, this method achieves semantic understanding of each control: beyond judging a control's position and attributes, it can identify the control's specific purpose.
- S101 Open the screen reader, whose main function is to describe on-screen elements by voice and frame them with a focus frame, using the application's GUI and the extra information the application provides for accessibility features.
- S102 Open the corresponding software; after the robotic arm performs one operation on the screen, take a screenshot, upload it to the server, and preprocess the image.
- S103 For the screenshot obtained in S102, determine the RGB matrices corresponding to the color of the focus frame, and derive the RGB range by superimposing different backgrounds.
- S104 Extract the pixels in the RGB range obtained in S103 from the screenshot obtained in S102 to obtain a single-channel image.
- S105 Expand the image in S104, perform edge detection according to the single-channel image obtained in S104, and detect the edge of the focus frame.
- S106 Perform line detection according to the edge obtained in S105, and obtain a Cartesian coordinate system equation of the line.
- S107 Filter the equations obtained in S106 to find the lines corresponding to the focus frame, compute the control's center coordinates, length, and width, and convert them into proportions of the screen.
- S108 Using the center coordinates and length/width proportions from S107, crop the rectangle framed by the focus frame and determine its function by computer vision.
- S109 Build a correspondence between the control coordinates from S107 and the text recognized in S108, and construct a page tree whose nodes are these key-value pairs.
- in step S101, the screen readers used are as follows: S201: the screen reader for Android is TalkBack, and the screen reader for iOS is VoiceOver.
- in step S102, the requirement for the screenshot is: S301: the image must be in PNG format.
- in step S102, the specific operations of the robotic arm are: S401: swipe left, swipe right, and double-tap.
- in step S102, the image preprocessing scheme is: S501: crop the portion of the screen occupied by the mobile application's graphical user interface.
- in step S103, the RGB matrix range is obtained as follows: S601: determine the several RGB matrices corresponding to the focus frame to obtain an initial range; S602: superimpose different grayscale backgrounds on the initial range obtained in S601 to get the final range.
- in step S104, the single-channel image is obtained as follows: S701: using the RGB matrix range from S103, traverse the image matrix obtained in S102; S702: set the value of each pixel within the RGB range to 1; S703: set the value of each pixel outside the range to 0.
- in step S105, the image is expanded as follows: S801: splice a zero-valued matrix of width 50 onto each of the left and right sides of the image; S802: onto the image spliced in S801, splice a zero-valued matrix of height 50 on each of the top and bottom.
- in step S105, the edge detection scheme is: S901: apply Gaussian denoising to the image; S902: compute the gradient of the denoised image from S901 and, from the gradient, the edge magnitude and angle; S903: perform non-maximum suppression along the gradient direction using the magnitude and angle from S902; S904: perform double-threshold edge linking to obtain the edges.
- in step S106, the line detection scheme is: S1001: convert the coordinates of each point of the edge image obtained in S904 into polar coordinates; S1002: compute the line equation corresponding to each coordinate (points sharing a common line equation lie on the same line); S1003: count the pixels on each line; S1004: if a line's pixel count from S1003 exceeds a certain threshold, keep the line; S1005: if it does not exceed the threshold, discard the line.
- in step S106, the Cartesian equation of a line is obtained by: S1101: converting the polar-coordinate equation into a rectangular-coordinate equation.
- in step S107, the lines corresponding to the focus frame are screened as follows: S1201: if the difference between the pixel values of two adjacent lines matches a certain fixed value, they are taken to be the lines of the focus frame; S1202: if it does not, they are treated as interfering lines.
- in step S107, the control's center coordinates are computed as follows: S1301: for the vertical lines, take the mean to obtain the control's abscissa; S1302: for the horizontal lines, take the mean to obtain the ordinate.
- in step S107, the control's length and width are computed as follows: S1401: for the vertical lines, the difference between the maximum and minimum gives the control's width; S1402: for the horizontal lines, the difference between the maximum and minimum gives the control's length.
- in step S107, the control center's screen percentage is computed as follows: S1501: divide the abscissa by the image width to obtain the x-axis percentage; S1502: divide the ordinate by the image height to obtain the y-axis percentage.
- in step S108, the control's function is determined as follows: S1601: perform OCR text recognition to obtain the control's text; S1602: if no text is detected in S1601, perform image matching and determine the function from a pre-built database.
- in step S109, the page tree is constructed as follows: S1701: combine each control's center coordinates and function into a key-value pair, which serves as a tree node; S1702: set an empty node as the root node, and treat all controls on the application's home page as its child nodes; S1703: the controls on the page reached by clicking a control become that control's child nodes, and so on, building the page tree.
- in summary, the present invention creates a computer vision-based, non-intrusive control recognition algorithm for mobile applications, with the following beneficial effects: (1) it understands each control's functional meaning: beyond locating a control, it also knows the control's page level and function; (2) it can be applied to complex scenarios, such as pages with interactive logic like pop-up windows and sub-pages; (3) it is universal, applicable to different platforms and device models.
- Fig. 1 is the hardware-software interaction diagram of the mobile terminal application non-intrusive control recognition algorithm based on computer vision provided by the present invention
- Fig. 2 is the overall flow chart of the computer vision-based non-intrusive control recognition algorithm for mobile terminal applications provided by the present invention
- Fig. 3 is an example of an image expansion method in the overall flow chart of the computer vision-based mobile terminal application non-intrusive control recognition algorithm provided by the present invention
- Fig. 4 is a flow chart of edge detection in the overall flow chart of the computer vision-based mobile terminal application non-intrusive control recognition algorithm provided by the present invention
- Fig. 5 is a flow chart of straight line detection in the overall flow chart of the computer vision-based mobile terminal application non-intrusive control recognition algorithm provided by the present invention
- This example takes an APP as an example.
- the method includes the following specific steps:
- S102 Open the APP, perform one "swipe right" operation on the screen with the robotic arm, then take a screenshot, upload it, and crop the image.
- S104 Extract pixels in the RGB range obtained in S103 from the screenshot obtained in S102 to obtain a single-channel image.
- S105 Expand the single-channel image in S104, and perform edge detection according to the single-channel image obtained in S104.
- S106 Perform line detection according to the edge obtained in S105, and obtain a Cartesian coordinate system equation of the line.
- S107 Filter the equations obtained in S106 to find the lines corresponding to the focus frame, compute the control's center coordinates, length, and width, and convert them into proportions of the screen.
- S108 Using the center coordinates and length/width proportions from S107, crop the rectangle framed by the focus frame and determine its function by computer vision.
- S109 Build a correspondence between the control coordinates from S107 and the text recognized in S108, and construct a page tree whose nodes are these key-value pairs.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
Disclosed is a computer vision-based mobile terminal application control identification method. The method combines hardware and software and uses the system's accessibility features, achieving non-invasive, universal, low-error-rate identification of mobile application controls. First, a screen reader and the corresponding software are opened, and a screenshot is uploaded to a server after each robotic-arm operation. Next, the screenshot is preprocessed, and color extraction and expansion are performed on it to obtain a single-channel image. Then, edge detection and line detection are performed on the single-channel image, and the center coordinates of a control are obtained after noise is filtered out. Finally, the function of the control is determined by computer vision. Repeating the steps above builds the app's page control tree. The method applies to complex scenarios, understands the functional meaning of each control, is highly universal, and can be used in scenarios such as automated testing of mobile applications, page structure decomposition, and human-computer interaction analysis.
Description
The invention relates to a computer vision-based, non-intrusive control recognition algorithm for mobile applications, belonging to the field of computer software technology.
The number of mobile applications has exploded with the development of the mobile Internet, and software design has become increasingly complex. As a result, demand for automated testing, page structure decomposition, and human-computer interaction analysis of mobile applications is growing, and all of these depend on control identification based on the graphical user interface (GUI), i.e., automatically identifying the interactive visual components of a GUI. For example, to guarantee the product quality of mobile applications, GUI automation testing is often required, yet the current mainstream "record and replay" method needs the number, locations, and possible interactions of the controls in the GUI to be determined in advance.
At present, common control identification methods mostly identify controls by their properties and fall into three main categories: coordinate-based, source-code-based, and control-tree-based identification. They have the following main defects: (1) They cannot understand the functional meaning of each control. Existing methods only classify controls by attributes and cannot truly identify each control's specific purpose; moreover, when attribute values are empty or duplicated, the methods fail. (2) They cannot be applied to complex scenarios, such as pages with interactive logic like pop-up windows and sub-pages. (3) They are not universal. Because the control identification and invocation logic differ between the Android and iOS platforms, a control identification scheme cannot be reused across them.
Summary of the invention:
To address the above problems and difficulties, the present invention proposes a computer vision-based method for identifying mobile application controls. Compared with coordinate-based control identification, the method adapts to different platforms and devices with different resolutions and is therefore more universal. Compared with source-code-based identification, the method is non-intrusive: it does not require access to the software's source code, so it can be used in scenarios such as black-box testing and has a wider range of applications. Compared with control-tree-based identification, the method is unaffected by page hierarchy and control position and can flexibly handle various complex scenes. In addition, the method achieves semantic understanding of each control: beyond judging a control's position and attributes, it can identify the control's specific purpose.
The specific steps of the computer vision-based mobile application control identification method are as follows: S101: Open the screen reader; by inspecting the mobile application's GUI and the extra information the application provides for accessibility features, it describes on-screen elements by voice and frames them with a focus frame. S102: Open the corresponding software; after the robotic arm performs one operation on the screen, take a screenshot, upload it to the server, and preprocess the image. S103: For the screenshot obtained in S102, determine the RGB matrices corresponding to the color of the focus frame, and derive the RGB range by superimposing different backgrounds. S104: Extract the pixels within the RGB range from S103 out of the screenshot from S102 to obtain a single-channel image. S105: Expand the image from S104 and perform edge detection on the resulting single-channel image to detect the edges of the focus frame. S106: Perform line detection on the edges obtained in S105 and obtain the Cartesian equation of each line. S107: Screen the equations obtained in S106 to find the lines corresponding to the focus frame, compute the control's center coordinates, length, and width, and convert them into proportions of the screen. S108: Using the center coordinates and length/width proportions from S107, crop the rectangle framed by the focus frame and determine its function by computer vision. S109: Build a correspondence between the control coordinates from S107 and the text recognized in S108, and construct a page tree whose nodes are these key-value pairs. S110: If the control to be clicked is known, traverse the page tree obtained in S109 to find its node; the path from the root node to the target node is the operation path after opening the APP, and from the control's coordinates as a percentage of the image the physical coordinates on the screen are obtained, so the robotic arm can double-tap the controls along the path until the target control is reached.
Specifically, in step S101, the screen readers used are as follows: S201: the screen reader for Android is TalkBack, and the screen reader for iOS is VoiceOver.

Specifically, in step S102, the requirement for the screenshot is: S301: the image must be in PNG format.

Specifically, in step S102, the specific operations of the robotic arm are: S401: swipe left, swipe right, and double-tap.

Specifically, in step S102, the image preprocessing scheme is: S501: crop the portion of the screen occupied by the mobile application's graphical user interface.
Specifically, in step S103, the RGB matrix range is obtained as follows: S601: determine the several RGB matrices corresponding to the focus frame to obtain an initial range; S602: superimpose different grayscale backgrounds on the initial range obtained in S601 to get the final range.
Specifically, in step S104, the single-channel image is obtained as follows: S701: using the RGB matrix range from S103, traverse the image matrix obtained in S102; S702: set the value of each pixel within the RGB range to 1; S703: set the value of each pixel outside the range to 0.
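The masking in S701-S703 amounts to a per-pixel range test. A minimal NumPy sketch follows; the color bounds used here are illustrative placeholders, not the patent's actual focus-frame values:

```python
import numpy as np

def extract_mask(img, lower, upper):
    """Return a single-channel 0/1 image marking pixels whose RGB values
    fall inside [lower, upper] on every channel (S701-S703)."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = np.all((img >= lower) & (img <= upper), axis=-1)
    return inside.astype(np.uint8)

# Toy 2x2 RGB image: two pixels match a hypothetical green focus-frame range.
img = np.array([[[0, 255, 0], [10, 10, 10]],
                [[5, 250, 5], [200, 200, 200]]], dtype=np.uint8)
mask = extract_mask(img, lower=(0, 240, 0), upper=(20, 255, 20))
print(mask.tolist())  # [[1, 0], [1, 0]]
```

Vectorizing the range test this way replaces the explicit per-pixel traversal of S701 while producing the same 0/1 matrix.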
Specifically, in step S105, the image is expanded as follows: S801: splice a zero-valued matrix of width 50 onto each of the left and right sides of the image; S802: onto the image spliced in S801, splice a zero-valued matrix of height 50 on each of the top and bottom.
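The expansion in S801-S802 is zero-padding, so that a focus frame touching the screen edge still produces closed edges. With NumPy this is a single call (the 50-pixel border width is the one stated in the text):

```python
import numpy as np

def expand(mask, border=50):
    """Splice zero-valued borders of the given width onto all four sides
    of the single-channel image (S801: left/right, S802: top/bottom)."""
    return np.pad(mask, pad_width=border, mode="constant", constant_values=0)

mask = np.ones((100, 60), dtype=np.uint8)
padded = expand(mask)
print(padded.shape)  # (200, 160)
```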
Specifically, in step S105, the edge detection scheme is: S901: apply Gaussian denoising to the image; S902: compute the gradient of the denoised image from S901 and, from the gradient, the edge magnitude and angle; S903: perform non-maximum suppression along the gradient direction using the magnitude and angle from S902; S904: perform double-threshold edge linking to obtain the edges.
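S901-S904 describe the classic Canny pipeline: smoothing, gradient magnitude and angle, non-maximum suppression, and double-threshold linking. The gradient stage S902 can be sketched with finite differences as below; a complete implementation would add the NMS and hysteresis stages (or call an existing Canny routine):

```python
import numpy as np

def gradient_magnitude_angle(img):
    """S902: central-difference gradients, then edge magnitude and angle."""
    img = img.astype(float)
    gy, gx = np.gradient(img)               # row (y) and column (x) derivatives
    magnitude = np.hypot(gx, gy)            # edge strength
    angle = np.degrees(np.arctan2(gy, gx))  # gradient direction, used by S903's NMS
    return magnitude, angle

# A vertical step edge: the magnitude peaks on the transition columns.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
mag, ang = gradient_magnitude_angle(img)
print(mag[2].tolist())  # strongest response around columns 2-3
```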
Specifically, in step S106, the line detection scheme is: S1001: convert the coordinates of each point of the edge image obtained in S904 into polar coordinates; S1002: compute the line equation corresponding to each coordinate (points sharing a common line equation lie on the same line); S1003: count the pixels on each line; S1004: if a line's pixel count from S1003 exceeds a certain threshold, keep the line; S1005: if it does not exceed the threshold, discard the line.
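S1001-S1005 describe a Hough-style vote: each edge pixel votes for the lines through it, and lines whose vote count exceeds a threshold survive. Since a focus frame is an axis-aligned rectangle, the same idea reduces to counting edge pixels per row and per column. A simplified sketch (the threshold value is illustrative):

```python
import numpy as np

def detect_axis_lines(edges, threshold):
    """Count edge pixels along each row and column (S1003) and keep the
    indices whose count exceeds the threshold (S1004/S1005)."""
    rows = np.where(edges.sum(axis=1) > threshold)[0]  # horizontal lines y = r
    cols = np.where(edges.sum(axis=0) > threshold)[0]  # vertical lines x = c
    return rows.tolist(), cols.tolist()

# Edge image of a hollow rectangle with corners at (1, 1) and (4, 6).
edges = np.zeros((6, 8), dtype=np.uint8)
edges[1, 1:7] = 1; edges[4, 1:7] = 1   # top and bottom borders
edges[1:5, 1] = 1; edges[1:5, 6] = 1   # left and right borders
rows, cols = detect_axis_lines(edges, threshold=3)
print(rows, cols)  # [1, 4] [1, 6]
```

A general-purpose implementation would instead accumulate votes over the full (rho, theta) parameter space, as in the standard Hough transform the step describes.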
Specifically, in step S106, the Cartesian equation of a line is obtained by: S1101: converting the polar-coordinate equation into a rectangular-coordinate equation.
Specifically, in step S107, the lines corresponding to the focus frame are screened as follows: S1201: if the difference between the pixel values of two adjacent lines matches a certain fixed value, they are taken to be the lines of the focus frame; S1202: if it does not, they are treated as interfering lines.
Specifically, in step S107, the control's center coordinates are computed as follows: S1301: for the vertical lines, take the mean to obtain the control's abscissa; S1302: for the horizontal lines, take the mean to obtain the ordinate.
Specifically, in step S107, the control's length and width are computed as follows: S1401: for the vertical lines, the difference between the maximum and minimum gives the control's width; S1402: for the horizontal lines, the difference between the maximum and minimum gives the control's length.
Specifically, in step S107, the control center's screen percentage is computed as follows: S1501: divide the abscissa by the image width to obtain the x-axis percentage; S1502: divide the ordinate by the image height to obtain the y-axis percentage.
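Steps S1301-S1502 turn the screened frame lines into the control's geometry: means give the center, max-minus-min differences give width and height, and dividing by the image size yields resolution-independent screen proportions. A compact sketch with illustrative coordinates:

```python
def control_geometry(vertical_xs, horizontal_ys, img_w, img_h):
    """Center (S1301/S1302), width/height (S1401/S1402), and screen
    proportions (S1501/S1502) from the focus frame's line coordinates."""
    cx = sum(vertical_xs) / len(vertical_xs)      # mean of vertical lines
    cy = sum(horizontal_ys) / len(horizontal_ys)  # mean of horizontal lines
    width = max(vertical_xs) - min(vertical_xs)
    height = max(horizontal_ys) - min(horizontal_ys)
    return {"center": (cx / img_w, cy / img_h),   # proportions of the screen
            "size": (width, height)}

geom = control_geometry([100, 300], [50, 150], img_w=400, img_h=800)
print(geom)  # {'center': (0.5, 0.125), 'size': (200, 100)}
```

Expressing the center as a proportion is what lets the same page tree drive the robotic arm on devices with different resolutions.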
Specifically, in step S108, the control's function is determined as follows: S1601: perform OCR text recognition to obtain the control's text; S1602: if no text is detected in S1601, perform image matching and determine the function from a pre-built database.
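S1601/S1602 is a two-stage decision: try OCR first, then fall back to matching against the pre-built icon database. A sketch with both engines injected as callables; `ocr` and `match_db` are hypothetical stand-ins (a real system might use, for example, Tesseract for the first stage):

```python
def control_function(crop, ocr, match_db):
    """S1601: OCR the cropped control; S1602: if no text is found,
    fall back to image matching against a pre-built database."""
    text = ocr(crop)
    if text and text.strip():
        return text.strip()
    return match_db(crop)  # e.g. nearest template in an icon database

# Stub engines standing in for real OCR / template matching.
label = control_function("crop-with-text", ocr=lambda c: "Settings",
                         match_db=lambda c: "unknown")
icon = control_function("icon-only-crop", ocr=lambda c: "",
                        match_db=lambda c: "search icon")
print(label, icon)  # Settings search icon
```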
Specifically, in step S109, the page tree is constructed as follows: S1701: combine each control's center coordinates and function into a key-value pair, which serves as a tree node; S1702: set an empty node as the root node, and treat all controls on the application's home page as its child nodes; S1703: the controls on the page reached by clicking a control become that control's child nodes, and so on, building the page tree.
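S1701-S1703 build a tree whose nodes are (center-coordinates, function) key-value pairs, and S110 then walks it to obtain the tap sequence leading to a target control. A minimal sketch using nested dicts; the control names and coordinates are illustrative:

```python
def find_path(tree, target, path=None):
    """Depth-first search of the page tree (S110): return the sequence of
    controls to tap, from the root's children down to the target."""
    path = path or []
    for control, subtree in tree.items():
        if control[1] == target:            # control = (center, function)
            return path + [control]
        found = find_path(subtree, target, path + [control])
        if found:
            return found
    return None

# The root is an empty node (S1702); home-page controls are its children,
# and the controls of each sub-page are that control's children (S1703).
page_tree = {
    ((0.5, 0.2), "Settings"): {
        ((0.5, 0.4), "Wi-Fi"): {},
        ((0.5, 0.6), "About"): {},
    },
    ((0.5, 0.8), "Search"): {},
}
path = find_path(page_tree, "Wi-Fi")
print([fn for _, fn in path])  # ['Settings', 'Wi-Fi']
```

Each center stored as a screen proportion can then be scaled to the device resolution to give the robotic arm its physical tap coordinates.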
In summary, the present invention creates a computer vision-based, non-intrusive control recognition algorithm for mobile applications, with the following beneficial effects: (1) it understands each control's functional meaning: beyond locating a control, it also knows the control's page level and function; (2) it can be applied to complex scenarios, such as pages with interactive logic like pop-up windows and sub-pages; (3) it is universal, applicable to different platforms and device models.
In order to illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative work.
Fig. 1 is the hardware-software interaction diagram of the computer vision-based non-intrusive control recognition algorithm for mobile applications provided by the present invention;
Fig. 2 is the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile applications provided by the present invention;
Fig. 3 is an example of the image expansion method in the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile applications provided by the present invention;
Fig. 4 is the flowchart of edge detection in the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile applications provided by the present invention;
Fig. 5 is the flowchart of straight-line detection in the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile applications provided by the present invention.
Detailed description of the embodiments:
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope conveyed fully to those skilled in the art.
This embodiment takes a particular APP as an example; the method includes the following specific steps:
S101: Turn on the screen reader.
S102: Open the APP; the robotic arm performs one "swipe right" operation on the screen, after which a screenshot is taken, uploaded, and cropped.
S103: For the screenshot obtained in S102, determine the RGB matrix range corresponding to the color of the focus frame.
S104: Extract the pixels within the RGB range obtained in S103 from the screenshot obtained in S102 to obtain a single-channel image.
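The extraction in S104 (detailed as S701-S703 in the claims) amounts to a per-channel range test. A NumPy sketch follows; the focus-frame RGB range used here is chosen arbitrarily for illustration, not taken from the patent:

```python
import numpy as np

def rgb_range_mask(img, lo, hi):
    """S701-S703: pixels whose (R, G, B) lies inside [lo, hi] per channel
    become 1 in the single-channel output; all other pixels become 0."""
    lo, hi = np.asarray(lo), np.asarray(hi)
    in_range = np.all((img >= lo) & (img <= hi), axis=-1)
    return in_range.astype(np.uint8)

# A 1x2 image: one focus-frame-green pixel, one dark background pixel.
img = np.array([[[0, 255, 0], [10, 10, 10]]], dtype=np.uint8)
mask = rgb_range_mask(img, lo=(0, 200, 0), hi=(50, 255, 50))  # mask == [[1, 0]]
```

The vectorized comparison replaces the explicit traversal of S701 but produces the same 0/1 single-channel image.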
S105: Expand the single-channel image from S104, then perform edge detection on it.
S106: Perform straight-line detection on the edges obtained in S105 and obtain the Cartesian equation of each line.
S107: Filter the equations obtained in S106 to find the lines corresponding to the focus frame, calculate the center coordinates and the length and width of the control, and convert them into screen percentages.
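The geometry of S107 (S1301-S1402 in the claims) reduces to means and ranges of the detected line positions. A small sketch, with made-up pixel positions for the focus-frame lines:

```python
def control_box(vert_xs, horiz_ys):
    """S1301-S1402: from the x-positions of the vertical focus-frame lines
    and the y-positions of the horizontal ones, recover center and size."""
    cx = sum(vert_xs) / len(vert_xs)        # S1301: mean of vertical lines
    cy = sum(horiz_ys) / len(horiz_ys)      # S1302: mean of horizontal lines
    width = max(vert_xs) - min(vert_xs)     # S1401: spread of vertical lines
    length = max(horiz_ys) - min(horiz_ys)  # S1402: spread of horizontal lines
    return (cx, cy), (width, length)

center, size = control_box([100, 300], [50, 150])
# center == (200.0, 100.0), size == (200, 100); dividing the center by the
# image width and height (S1501-S1502) then gives the screen percentages.
```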
S108: According to the center coordinates and the length/width screen percentages obtained in S107, crop the rectangle enclosed by the focus frame and determine its function by computer vision.
S109: Associate the control coordinates obtained in S107 with the text recognized in S108, and construct a page tree whose nodes are these key-value pairs.
S110: For the control to be clicked, traverse the page tree obtained in S109 to find the corresponding node; from the control coordinates expressed as a percentage of the image, obtain the physical coordinates on the screen, and the robotic arm can double-tap the controls directly until the target control is found.
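The lookup in S110 can be sketched as a depth-first search over the page tree, followed by the percentage-to-pixel conversion. The dict-based node shape and the 1080x1920 screen size are assumptions for illustration only:

```python
def find_path(node, target, path=None):
    """S110: return the sequence of controls to tap, in order, to reach
    the control whose function matches `target` (None if absent)."""
    path = list(path or [])
    if node.get("function") is not None:
        path = path + [node]
    if node.get("function") == target:
        return path
    for child in node.get("children", []):
        found = find_path(child, target, path)
        if found:
            return found
    return None

def to_physical(center_pct, screen_w, screen_h):
    """Convert a control center given as (x%, y%) back to screen pixels."""
    return (round(center_pct[0] * screen_w), round(center_pct[1] * screen_h))

tree = {"function": None, "children": [          # empty root node (S1702)
    {"function": "settings", "center": (0.9, 0.05), "children": [
        {"function": "about", "center": (0.5, 0.3), "children": []}]}]}
path = find_path(tree, "about")
taps = [to_physical(n["center"], 1080, 1920) for n in path]
```

`taps` is the ordered list of physical coordinates the robotic arm would double-tap to reach the target control.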
Fig. 1 is the hardware-software interaction diagram of the computer vision-based non-intrusive control recognition algorithm for mobile applications provided by the present invention.
Fig. 2 is the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile applications provided by the present invention.
Fig. 3 is an example of the image expansion method in the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile applications provided by the present invention: S801: splice a matrix of pixel value 0 and width 50 onto the left and right sides of the image; S802: onto the image spliced in S801, splice a matrix of pixel value 0 and height 50 onto its upper and lower sides.
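S801-S802 can be read as zero-padding the binary mask so that a focus frame touching the screen border still forms a closed contour for edge detection. A NumPy sketch; the 50-pixel border is the value stated in the text:

```python
import numpy as np

def expand(mask, border=50):
    """S801: splice zero columns of width 50 on the left and right;
    S802: then splice zero rows of height 50 on the top and bottom."""
    padded = np.pad(mask, ((0, 0), (border, border)), constant_values=0)
    padded = np.pad(padded, ((border, border), (0, 0)), constant_values=0)
    return padded

m = np.ones((4, 6), dtype=np.uint8)  # toy 4x6 single-channel mask
p = expand(m)                        # shape grows to (104, 106)
```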
Fig. 4 is the flowchart of edge detection in the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile applications provided by the present invention: S901: apply Gaussian denoising to the image; S902: compute the gradient of the denoised image from S901 and, from the gradient, compute the edge magnitude and angle; S903: using the edge magnitude and angle from S902, perform non-maximum suppression along the gradient direction; S904: perform double-threshold edge linking to obtain the edges.
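Steps S901-S904 describe the classic Canny edge detector; in practice a single library call (such as OpenCV's `Canny`) performs all four steps. The S902 gradient step alone can be sketched with a hand-rolled Sobel operator; this is a didactic, unoptimized version:

```python
import numpy as np

def sobel_gradients(img):
    """S902: convolve with the Sobel kernels and return the edge
    magnitude and angle (in degrees) at every pixel."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical-change kernel
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros((h, w)); gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = padded[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    mag = np.hypot(gx, gy)                # edge magnitude (S902)
    ang = np.degrees(np.arctan2(gy, gx))  # edge angle (S902)
    return mag, ang

# A vertical step edge: magnitude peaks at the step, angle is 0 degrees.
img = np.array([[0, 0, 0, 255, 255, 255]] * 5)
mag, ang = sobel_gradients(img)
```

S901 (Gaussian denoising) would run before this, and S903 (non-maximum suppression along the gradient) and S904 (double-threshold linking) would then operate on `mag` and `ang`.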
Fig. 5 is the flowchart of straight-line detection in the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile applications provided by the present invention: S1001: convert the coordinates of each point of the image obtained in S904 into polar coordinates; S1002: compute the line equation corresponding to each coordinate, where coordinates sharing a common line equation lie on the same line; S1003: count the pixels on each line; S1004: if the count on a line from S1003 exceeds a threshold, keep the line; S1005: if the count on a line from S1003 does not exceed the threshold, discard the line.
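S1001-S1005 describe the standard Hough transform (OpenCV's `HoughLines` is the usual implementation). A minimal accumulator sketch follows; restricting the sampled angles to 0 and 90 degrees is our own simplification, reasonable here only because focus frames are axis-aligned rectangles:

```python
import numpy as np

def hough_lines(edge, thetas_deg=(0, 90), threshold=3):
    """S1001-S1002: map each edge pixel to (rho, theta) polar bins;
    S1003: count votes per bin; S1004-S1005: keep or discard by threshold."""
    ys, xs = np.nonzero(edge)
    votes = {}
    for t in thetas_deg:
        rad = np.deg2rad(t)
        for x, y in zip(xs, ys):
            rho = int(round(x * np.cos(rad) + y * np.sin(rad)))
            votes[(rho, t)] = votes.get((rho, t), 0) + 1
    return [line for line, v in votes.items() if v >= threshold]

edge = np.zeros((5, 5), dtype=np.uint8)
edge[:, 2] = 1                          # a vertical edge at x = 2
lines = hough_lines(edge, threshold=4)  # only (rho=2, theta=0) survives
```

Each surviving `(rho, theta)` pair is then converted back to a Cartesian equation (S1101) for the filtering in S107.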
Claims (17)
- A computer vision-based mobile terminal application control identification method, characterized by comprising the following steps: S101: turn on the screen reader, whose main function is to inspect the GUI of the mobile application and the additional information the application provides for accessibility features, describe the on-screen elements by speech, and frame them with a focus frame; S102: open the corresponding software; the robotic arm performs one operation on the screen, after which a screenshot is taken, uploaded to the server, and preprocessed; S103: for the screenshot obtained in S102, determine the RGB matrix corresponding to the color of the focus frame, and obtain the RGB range by superimposing it on different backgrounds; S104: extract the pixels within the RGB range obtained in S103 from the screenshot obtained in S102 to obtain a single-channel image; S105: expand the image from S104 and perform edge detection on the single-channel image to detect the edges of the focus frame; S106: perform straight-line detection on the edges obtained in S105 and obtain the Cartesian equation of each line; S107: filter the equations obtained in S106 to find the lines corresponding to the focus frame, calculate the center coordinates and the length and width of the control, and convert them into screen percentages; S108: according to the center coordinates and the length/width screen percentages obtained in S107, crop the rectangle enclosed by the focus frame and determine its function by computer vision; S109: associate the control coordinates obtained in S107 with the text recognized in S108, and construct a page tree whose nodes are these key-value pairs; S110: if the control to be clicked is known, traverse the page tree obtained in S109 to find the corresponding node; the path from the parent node to the target node is the operation path after opening the APP; from the control coordinates expressed as a percentage of the image, obtain the physical coordinates on the screen, and the robotic arm can double-tap the controls directly until the target control is found.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S101 the screen reader used takes the following specific form: S201: the screen reader on Android is TalkBack, and the screen reader on iOS is VoiceOver.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S102 the specific requirement for the screenshot is: S301: the image must be in PNG format.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S102 the specific operations of the robotic arm are: S401: swipe left, swipe right, and double-tap.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S102 the specific image preprocessing scheme is: S501: crop the portion of the screen occupied by the graphical user interface of the mobile software.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S103 the specific method for obtaining the RGB matrix range is: S601: determine the several RGB matrices corresponding to the focus frame to obtain the initial range; S602: superimpose different gray levels on the background over the initial range obtained in S601 to obtain the final range.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S104 the single-channel image is obtained as follows: S701: traverse the image matrix obtained in S102 against the RGB matrix range obtained in S103; S702: for pixels within the RGB matrix range, set the corresponding value to 1; S703: for pixels outside the RGB matrix range, set the corresponding value to 0.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S105 the image is expanded as follows: S801: splice a matrix of pixel value 0 and width 50 onto the left and right sides of the image; S802: onto the image spliced in S801, splice a matrix of pixel value 0 and height 50 onto its upper and lower sides.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S105 the edge detection scheme is: S901: apply Gaussian denoising to the image; S902: compute the gradient of the denoised image from S901 and, from the gradient, compute the edge magnitude and angle; S903: using the edge magnitude and angle from S902, perform non-maximum suppression along the gradient direction; S904: perform double-threshold edge linking to obtain the edges.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S106 the specific straight-line detection scheme is: S1001: convert the coordinates of each point of the image obtained in S904 into polar coordinates; S1002: compute the line equation corresponding to each coordinate, where coordinates sharing a common line equation lie on the same line; S1003: count the pixels on each line; S1004: if the count on a line from S1003 exceeds a threshold, keep the line; S1005: if the count on a line from S1003 does not exceed the threshold, discard the line.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S106 the Cartesian equation of a line is obtained as follows: S1101: convert the polar-coordinate equation into a rectangular-coordinate equation.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S107 the lines corresponding to the focus frame are screened as follows: S1201: if the difference between the pixel values of two adjacent lines satisfies a certain fixed value, they are taken as lines corresponding to the focus frame; S1202: if the difference between the pixel values of two adjacent lines does not satisfy that fixed value, they are taken as interference lines.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S107 the center coordinates of the control are calculated as follows: S1301: for the vertical lines, take the mean to obtain the abscissa of the control; S1302: for the horizontal lines, take the mean to obtain the ordinate of the control.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S107 the length and width of the control are calculated as follows: S1401: for the vertical lines, compute the difference between the maximum and minimum to obtain the width of the control; S1402: for the horizontal lines, compute the difference between the maximum and minimum to obtain the length of the control.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S107 the percentage of the screen occupied by the control center is calculated as follows: S1501: divide the abscissa by the image width to obtain the x-axis percentage; S1502: divide the ordinate by the image height to obtain the y-axis percentage.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S108 the control function is determined as follows: S1601: perform text recognition with OCR to obtain the text corresponding to the control; S1602: if S1601 detects no text, perform image matching and determine the function from the pre-built database.
- The computer vision-based mobile terminal application control identification method according to claim 1, characterized in that in step S109 the page tree is constructed as follows: S1701: combine the center coordinates of the control and the control function into a key-value pair that serves as a tree node; S1702: set an empty node as the root node, and all controls on the home page of the mobile application take the root node as their parent; S1703: the controls on the page reached by clicking a given control become child nodes of the clicked control, and the page tree is built up by analogy.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110597673.X | 2021-05-31 | ||
CN202110597673.XA CN113434072B (en) | 2021-05-31 | 2021-05-31 | Mobile terminal application control identification method based on computer vision |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022252239A1 true WO2022252239A1 (en) | 2022-12-08 |
Family
ID=77803292
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/098490 WO2022252239A1 (en) | 2021-05-31 | 2021-06-05 | Computer vision-based mobile terminal application control identification method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113434072B (en) |
WO (1) | WO2022252239A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080195958A1 (en) * | 2007-02-09 | 2008-08-14 | Detiege Patrick J | Visual recognition of user interface objects on computer |
US20110047488A1 (en) * | 2009-08-24 | 2011-02-24 | Emma Butin | Display-independent recognition of graphical user interface control |
CN108509342A (en) * | 2018-04-04 | 2018-09-07 | 成都中云天下科技有限公司 | A kind of precisely quick App automated testing methods |
CN110990238A (en) * | 2019-11-13 | 2020-04-10 | 南京航空航天大学 | Non-invasive visual test script automatic recording method based on video shooting |
CN112181255A (en) * | 2020-10-12 | 2021-01-05 | 深圳市欢太科技有限公司 | Control identification method and device, terminal equipment and storage medium |
CN112597065A (en) * | 2021-03-03 | 2021-04-02 | 浙江口碑网络技术有限公司 | Page testing method and device |
CN112657176A (en) * | 2020-12-31 | 2021-04-16 | 华南理工大学 | Binocular projection man-machine interaction method combined with portrait behavior information |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB0612128D0 (en) * | 2006-06-19 | 2006-07-26 | British Telecomm | Apparatus & Method for Selecting Menu Items |
CN105045489B (en) * | 2015-08-27 | 2018-05-29 | 广东欧珀移动通信有限公司 | A kind of button control method and device |
CN109922363A (en) * | 2019-03-15 | 2019-06-21 | 青岛海信电器股份有限公司 | A kind of graphical user interface method and display equipment of display screen shot |
2021 events:
- 2021-05-31: CN CN202110597673.XA patent/CN113434072B/en active Active
- 2021-06-05: WO PCT/CN2021/098490 patent/WO2022252239A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN113434072A (en) | 2021-09-24 |
CN113434072B (en) | 2022-06-07 |
Legal Events

- 121 (EP): the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21943600; Country of ref document: EP; Kind code of ref document: A1.
- NENP: Non-entry into the national phase. Ref country code: DE.