WO2022252239A1 - Computer vision-based mobile terminal application control identification method - Google Patents


Info

Publication number
WO2022252239A1
WO2022252239A1 (PCT/CN2021/098490)
Authority
WO
WIPO (PCT)
Prior art keywords
control
mobile terminal
computer vision
image
terminal application
Prior art date
Application number
PCT/CN2021/098490
Other languages
French (fr)
Chinese (zh)
Inventor
卜佳俊
张建锋
周晟
刘美含
王炜
于智
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 filed Critical 浙江大学
Publication of WO2022252239A1 publication Critical patent/WO2022252239A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • The invention relates to a computer vision-based, non-intrusive control recognition algorithm for mobile terminal applications, and belongs to the field of computer software technology.
  • GUI: Graphical User Interface.
  • Common control identification methods mostly identify controls from control properties and fall into three main categories: coordinate-based, source-code-based, and control-tree-based identification.
  • To address these limitations, the present invention proposes a computer vision-based mobile terminal application control recognition method.
  • Compared with coordinate-based identification, this method adapts to different platforms and to devices with different resolutions, and is therefore more universal.
  • Compared with source-code-based identification, this method is non-intrusive: it does not require the source code of the software, so it can be used in scenarios such as black-box testing and has a wider range of applications.
  • Compared with control-tree-based identification, this method is not affected by the page hierarchy or control positions, and can flexibly handle various complex scenes.
  • In addition, this method achieves semantic understanding of each control: beyond determining a control's position and attributes, it can also identify the control's specific purpose.
  • S101: Open the screen reader, whose main function is to describe on-screen elements by voice and outline them with a focus frame.
  • S102: Open the target software, perform one operation on the screen with the robotic arm, take a screenshot, upload it to the server, and preprocess the image.
  • S103: For the screenshot obtained in S102, determine the RGB values corresponding to the color of the focus frame, and derive the RGB range by superimposing that color on different backgrounds.
  • S104: Extract from the screenshot of S102 the pixels that fall inside the RGB range obtained in S103, yielding a single-channel image.
  • S105: Pad the single-channel image from S104 and perform edge detection on it to find the edges of the focus frame.
  • S106: Perform straight-line detection on the edges obtained in S105 and derive each line's Cartesian (rectangular-coordinate) equation.
  • S107: Filter the equations from S106 to keep the lines belonging to the focus frame, compute the control's center coordinates, width, and height, and convert them into screen ratios.
  • S108: Using the center coordinates and the width/height screen ratios from S107, crop the rectangle outlined by the focus frame and determine its function by computer vision.
  • S109: Pair the control coordinates from S107 with the text recognized in S108, and build a page tree whose nodes are these key-value pairs.
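The steps above can be sketched as one pipeline. This is an illustrative outline only: the stage names are hypothetical, each stage is an injected callable, and the concrete implementations (color masking, edge and line detection, OCR) are the ones the individual steps describe.

```python
def recognize_control(screenshot, stages):
    """High-level data flow of S102-S108; `stages` maps hypothetical
    stage names to callables supplied by the caller."""
    img = stages["preprocess"](screenshot)        # S102: crop to the app GUI
    mask = stages["color_mask"](img)              # S103-S104: focus-frame color mask
    edges = stages["edges"](stages["pad"](mask))  # S105: pad, then edge detection
    lines = stages["lines"](edges)                # S106-S107: focus-frame lines
    return stages["classify"](img, lines)         # S108: control function
```

Because every stage is injected, each can be swapped or tested in isolation.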
  • In step S101, the screen readers used are as follows. S201: Talkback on Android and VoiceOver on iOS.
  • In step S102, the screenshot requirement is as follows. S301: the image must be in PNG format.
  • In step S102, the robotic-arm operations are as follows. S401: swipe left, swipe right, and double tap.
  • In step S102, the image-preprocessing scheme is as follows. S501: crop the screen region occupied by the graphical user interface of the mobile software.
  • In step S103, the RGB range is obtained as follows. S601: determine the several RGB values corresponding to the focus frame, yielding an initial range; S602: superimpose different gray levels onto the background within the initial range obtained in S601 to get the final range.
  • In step S104, the single-channel image is obtained as follows. S701: traverse the image matrix from S102 against the RGB range obtained in S103; S702: set the value of each pixel inside the RGB range to 1; S703: set the value of each pixel outside the RGB range to 0.
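A minimal sketch of S701-S703, assuming NumPy and an H×W×3 RGB array; the function and variable names are illustrative, not from the patent:

```python
import numpy as np

def extract_focus_mask(img, lo, hi):
    """S701-S703: mark pixels whose RGB values lie inside the focus-frame
    color range [lo, hi] with 1 and all others with 0 (single channel)."""
    lo, hi = np.asarray(lo), np.asarray(hi)
    inside = np.all((img >= lo) & (img <= hi), axis=-1)
    return inside.astype(np.uint8)
```

Vectorizing the per-pixel test with `np.all` over the channel axis replaces the explicit traversal of S701 without changing the result.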
  • In step S105, the image is padded as follows. S801: append a zero-valued matrix 50 pixels wide to each of the left and right sides of the image; S802: to the image obtained in S801, append a zero-valued matrix 50 pixels high to each of its top and bottom sides.
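S801-S802 amount to zero-padding the mask by a 50-pixel border, which keeps focus-frame lines at the screen edge detectable. A sketch, assuming NumPy:

```python
import numpy as np

def pad_mask(mask, border=50):
    """S801-S802: splice zero-valued borders (width/height 50) around the
    single-channel image; np.pad handles all four sides at once."""
    return np.pad(mask, border, mode="constant", constant_values=0)
```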
  • In step S105, the edge-detection scheme is as follows. S901: apply Gaussian denoising to the image; S902: compute the gradient of the denoised image obtained in S901 and, from the gradient, the edge magnitude and angle; S903: apply non-maximum suppression along the gradient direction using the magnitude and angle obtained in S902; S904: apply double-threshold edge linking to obtain the edges.
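S901-S904 is the classic Canny-style scheme. A deliberately simplified NumPy stand-in for the gradient stage (S902) might look like the following; it omits denoising, non-maximum suppression, and hysteresis, so it is a sketch of one stage rather than the full pipeline:

```python
import numpy as np

def gradient_edges(img, thresh=0.25):
    """S902 only: central-difference gradients, edge magnitude, and a
    single threshold. The full method adds Gaussian denoising (S901),
    non-maximum suppression along `angle` (S903), and double-threshold
    edge linking (S904)."""
    gy, gx = np.gradient(img.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)  # consumed by S903 in the full pipeline
    return (magnitude > thresh).astype(np.uint8), angle
```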
  • In step S106, straight-line detection proceeds as follows. S1001: convert the coordinates of each point of the image obtained in S904 into polar coordinates; S1002: compute the line equation corresponding to each point, points sharing a common line equation lying on the same line; S1003: count the pixels on each line; S1004: if the pixel count of a line obtained in S1003 exceeds a given threshold, keep the line; S1005: if it does not exceed the threshold, discard the line.
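Since the focus frame is axis-aligned, the Hough-style vote of S1001-S1005 reduces to counting set pixels along each row and column (the θ = 0° and θ = 90° bins only). A sketch under that simplifying assumption:

```python
import numpy as np

def detect_frame_lines(mask, thresh):
    """S1003-S1005 restricted to horizontal/vertical lines: keep the rows
    and columns whose set-pixel count reaches `thresh`, discard the rest."""
    rows = np.where(mask.sum(axis=1) >= thresh)[0].tolist()
    cols = np.where(mask.sum(axis=0) >= thresh)[0].tolist()
    return rows, cols
```

A general Hough transform would vote over all (ρ, θ) bins; the restriction here is an assumption justified only because screen-reader focus frames are rectangular.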
  • In step S106, the Cartesian equation of each line is obtained as follows. S1101: convert the polar-coordinate equation into a rectangular-coordinate equation.
  • In step S107, the lines belonging to the focus frame are selected as follows. S1201: if the difference between the pixel counts of two adjacent lines equals a certain fixed value, they are taken as focus-frame lines; S1202: if not, they are treated as interfering lines.
  • In step S107, the control's center coordinates are computed as follows. S1301: for the vertical lines, take the mean to obtain the control's abscissa; S1302: for the horizontal lines, take the mean to obtain the control's ordinate.
  • In step S107, the control's width and height are computed as follows. S1401: for the vertical lines, the difference between the maximum and minimum gives the control's width; S1402: for the horizontal lines, the difference between the maximum and minimum gives the control's height.
  • In step S107, the control center is converted to screen percentages as follows. S1501: divide the abscissa by the image width to obtain the x-axis percentage; S1502: divide the ordinate by the image height to obtain the y-axis percentage.
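S1301-S1502 combine into a few lines of arithmetic. A sketch with hypothetical names, assuming the horizontal and vertical frame-line positions found in S107:

```python
def control_geometry(rows, cols, img_h, img_w):
    """rows: y positions of the horizontal frame lines; cols: x positions
    of the vertical ones. Center = mean (S1301-S1302), size = max - min
    (S1401-S1402), then normalize to screen ratios (S1501-S1502)."""
    cx, cy = sum(cols) / len(cols), sum(rows) / len(rows)
    width, height = max(cols) - min(cols), max(rows) - min(rows)
    return {"cx": cx / img_w, "cy": cy / img_h,
            "w": width / img_w, "h": height / img_h}
```

Expressing the result as ratios is what makes the method resolution-independent: the same ratios map back to physical screen coordinates on any device.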
  • In step S108, the control's function is determined as follows. S1601: perform text recognition with OCR to obtain the control's text; S1602: if S1601 detects no text, perform image matching and determine the function from a pre-built database.
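The fallback logic of S1601-S1602 in outline; `ocr` and `match_database` are injected stand-ins for an OCR engine and the pre-built icon-matching database, neither of which is specified in the text:

```python
def control_function(crop, ocr, match_database):
    """S1601: try text recognition first; S1602: if no text is found,
    fall back to image matching against the pre-built database."""
    text = ocr(crop)
    return text if text else match_database(crop)
```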
  • In step S109, the page tree is built as follows. S1701: combine each control's center coordinates and function into a key-value pair serving as a tree node; S1702: set an empty node as the root, with all controls on the application's home page taking the root as their parent; S1703: make all the controls on the page reached by clicking a control the children of the clicked control, and build the page tree by analogy.
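A minimal sketch of the page tree of S1701-S1703; the class and field names are illustrative. The path lookup at the end corresponds to the traversal that S110 uses to drive the robotic arm:

```python
class PageNode:
    """Pairs a control's center (screen-ratio key) with its recognized
    function (value); the root is an empty node (S1702)."""
    def __init__(self, center=None, label=None):
        self.center, self.label, self.children = center, label, []

    def add(self, center, label):
        child = PageNode(center, label)  # S1703: controls revealed by a
        self.children.append(child)      # click become the child nodes
        return child

def find_path(node, label):
    """Return the label sequence to click from the home page to reach the
    control named `label`, or None if it is not in the tree."""
    if node.label == label:
        return []
    for child in node.children:
        sub = find_path(child, label)
        if sub is not None:
            return [child.label] + sub
    return None
```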
  • In summary, the present invention provides a computer vision-based non-intrusive control recognition algorithm for mobile terminal applications, with the following beneficial effects: (1) it understands each control's functional meaning, so that beyond locating a control one also knows its page level and function; (2) it can be applied to complex scenarios, such as pages with interaction logic like pop-up windows and sub-pages; (3) it is universal, applicable to different platforms and device models.
  • Fig. 1 is the hardware-software interaction diagram of the mobile terminal application non-intrusive control recognition algorithm based on computer vision provided by the present invention
  • Fig. 2 is the overall flow chart of the computer vision-based non-intrusive control recognition algorithm for mobile terminal applications provided by the present invention
  • Fig. 3 is an example of an image expansion method in the overall flow chart of the computer vision-based mobile terminal application non-intrusive control recognition algorithm provided by the present invention
  • Fig. 4 is a flow chart of edge detection in the overall flow chart of the computer vision-based mobile terminal application non-intrusive control recognition algorithm provided by the present invention
  • Fig. 5 is a flow chart of straight line detection in the overall flow chart of the computer vision-based mobile terminal application non-intrusive control recognition algorithm provided by the present invention
  • This example takes a certain APP as an example.
  • The method includes the following specific steps:
  • S102: Open the APP, perform a "swipe right" operation on the screen with the robotic arm, then take a screenshot, upload it, and crop the image.
  • S104: Extract from the screenshot of S102 the pixels inside the RGB range obtained in S103, yielding a single-channel image.
  • S105: Pad the single-channel image from S104 and perform edge detection on it.
  • S106: Perform straight-line detection on the edges obtained in S105 and derive each line's Cartesian equation.
  • S107: Filter the equations from S106 to keep the lines belonging to the focus frame, compute the control's center coordinates, width, and height, and convert them into screen ratios.
  • S108: Using the center coordinates and the width/height screen ratios from S107, crop the rectangle outlined by the focus frame and determine its function by computer vision.
  • S109: Pair the control coordinates from S107 with the text recognized in S108, and build a page tree whose nodes are these key-value pairs.
  • Fig. 1 is the hardware-software interaction diagram of the mobile terminal application non-intrusive control recognition algorithm based on computer vision provided by the present invention
  • Fig. 2 is the overall flow chart of the computer vision-based non-intrusive control recognition algorithm for mobile terminal applications provided by the present invention
  • Fig. 3 is an example of the image-padding method in the overall flow chart of the computer vision-based mobile terminal application non-intrusive control recognition algorithm provided by the present invention: S801: append a zero-valued matrix 50 pixels wide to each of the left and right sides of the image; S802: to the image obtained in S801, append a zero-valued matrix 50 pixels high to each of its top and bottom sides.
  • Fig. 4 is the flow chart of edge detection in the overall flow chart of the algorithm: S901: apply Gaussian denoising to the image; S902: compute the gradient of the denoised image and, from the gradient, the edge magnitude and angle; S903: apply non-maximum suppression along the gradient direction using the magnitude and angle obtained in S902; S904: apply double-threshold edge linking to obtain the edges.
  • Fig. 5 is the flow chart of straight-line detection in the overall flow chart of the algorithm: S1001: convert the coordinates of each point of the image obtained in S904 into polar coordinates; S1002: compute the line equation corresponding to each point, points sharing a common line equation lying on the same line; S1003: count the pixels on each line; S1004: if the pixel count of a line obtained in S1003 exceeds a given threshold, keep the line; S1005: if it does not, discard the line.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a computer vision-based mobile terminal application control identification method. In the method, hardware and software methods are combined, and an accessibility function of a system is used, thus achieving a non-invasive, universal and low-error-rate mobile terminal application control identification method. First, a screen reader and corresponding software are opened, and a screenshot is uploaded to a server once a robotic arm is operated. Next, once the screenshot is preprocessed, color extraction and expansion are performed thereon to obtain a single-channel image. Then, edge detection and line detection are performed on the single-channel image, and center coordinates of a control are obtained after noise is filtered. Finally, the function of the control is distinguished by using a computer vision method. The steps above are repeated to build a page control tree of an app. The method may be applied to complex scenarios, achieves the understanding of the functional meaning of each control and has strong universality, and may be applied to scenarios such as the automated testing of mobile applications, page structure decomposition, and human-computer interaction analysis.

Description

Recognition method of mobile terminal application controls based on computer vision

Technical field:
The invention relates to a computer vision-based, non-intrusive control recognition algorithm for mobile terminal applications, and belongs to the field of computer software technology.
Background technique:
The number of mobile applications has grown explosively with the development of the mobile Internet, and software design has become increasingly complex. Consequently, demands for automated testing, page-structure decomposition, and human-computer interaction analysis of mobile applications are also rising, and all of them depend on control identification methods based on the graphical user interface (GUI), i.e. automatically identifying the interactive visual components in a GUI. For example, to guarantee the product quality of mobile applications, automated GUI testing is often required, while the current mainstream "record and playback" method needs the number, positions, and possible interactions of the controls in the GUI to be determined in advance.
At present, common control identification methods mostly identify controls from control properties and fall into three main categories: coordinate-based, source-code-based, and control-tree-based identification. They have the following main defects. (1) They cannot understand the functional meaning of each control: existing methods only classify controls by attributes and cannot truly identify each control's specific purpose; moreover, they fail when attribute values are empty or duplicated. (2) They cannot be applied to complex scenarios, such as pages with interaction logic like pop-up windows and sub-pages. (3) They are not universal: because control identification and control invocation logic differ between the Android and iOS platforms, a control identification scheme cannot be reused across them.
Invention content:
To address the above problems and difficulties, the present invention proposes a computer vision-based mobile terminal application control recognition method. Compared with coordinate-based control identification, this method adapts to different platforms and to devices with different resolutions, and is therefore more universal. Compared with source-code-based identification, this method is non-intrusive: it does not require the source code of the software, so it can be used in scenarios such as black-box testing and has a wider range of applications. Compared with control-tree-based identification, this method is not affected by the page hierarchy or control positions, and can flexibly handle various complex scenes. In addition, the method achieves semantic understanding of each control: beyond determining a control's position and attributes, it also identifies the control's specific purpose.
The specific steps of the computer vision-based mobile terminal application control recognition method are as follows. S101: Open the screen reader, whose main function is, by inspecting the mobile application's GUI and the additional information the application exposes for its accessibility features, to describe on-screen elements by voice and outline them with a focus frame. S102: Open the target software, perform one operation on the screen with the robotic arm, take a screenshot, upload it to the server, and preprocess the image. S103: For the screenshot obtained in S102, determine the RGB values corresponding to the color of the focus frame, and derive the RGB range by superimposing that color on different backgrounds. S104: Extract from the screenshot of S102 the pixels inside the RGB range obtained in S103, yielding a single-channel image. S105: Pad the image from S104 and perform edge detection on the single-channel image to find the edges of the focus frame. S106: Perform straight-line detection on the edges obtained in S105 and derive each line's Cartesian equation. S107: Filter the equations from S106 to keep the lines belonging to the focus frame, compute the control's center coordinates, width, and height, and convert them into screen ratios. S108: Using the center coordinates and the width/height screen ratios from S107, crop the rectangle outlined by the focus frame and determine its function by computer vision. S109: Pair the control coordinates from S107 with the text recognized in S108, and build a page tree whose nodes are these key-value pairs. S110: If the control to be clicked is known, traverse the page tree obtained in S109 to find the node corresponding to the control; the path from the parent node to the target node is the operation path after opening the APP. From the control's coordinates as a percentage of the image, the physical coordinates on the screen are obtained, and the robotic arm can directly double-tap controls until the target control is reached.
In step S101, the screen readers used are as follows. S201: Talkback on Android and VoiceOver on iOS.

In step S102, the screenshot requirement is as follows. S301: the image must be in PNG format.

In step S102, the robotic-arm operations are as follows. S401: swipe left, swipe right, and double tap.

In step S102, the image-preprocessing scheme is as follows. S501: crop the screen region occupied by the graphical user interface of the mobile software.

In step S103, the RGB range is obtained as follows. S601: determine the several RGB values corresponding to the focus frame, yielding an initial range; S602: superimpose different gray levels onto the background within the initial range obtained in S601 to get the final range.

In step S104, the single-channel image is obtained as follows. S701: traverse the image matrix from S102 against the RGB range obtained in S103; S702: set the value of each pixel inside the RGB range to 1; S703: set the value of each pixel outside the RGB range to 0.

In step S105, the image is padded as follows. S801: append a zero-valued matrix 50 pixels wide to each of the left and right sides of the image; S802: to the image obtained in S801, append a zero-valued matrix 50 pixels high to each of its top and bottom sides.

In step S105, the edge-detection scheme is as follows. S901: apply Gaussian denoising to the image; S902: compute the gradient of the denoised image obtained in S901 and, from the gradient, the edge magnitude and angle; S903: apply non-maximum suppression along the gradient direction using the magnitude and angle obtained in S902; S904: apply double-threshold edge linking to obtain the edges.

In step S106, straight-line detection proceeds as follows. S1001: convert the coordinates of each point of the image obtained in S904 into polar coordinates; S1002: compute the line equation corresponding to each point, points sharing a common line equation lying on the same line; S1003: count the pixels on each line; S1004: if the pixel count of a line obtained in S1003 exceeds a given threshold, keep the line; S1005: if it does not exceed the threshold, discard the line.

In step S106, the Cartesian equation of each line is obtained as follows. S1101: convert the polar-coordinate equation into a rectangular-coordinate equation.

In step S107, the lines belonging to the focus frame are selected as follows. S1201: if the difference between the pixel counts of two adjacent lines equals a certain fixed value, they are taken as focus-frame lines; S1202: if not, they are treated as interfering lines.

In step S107, the control's center coordinates are computed as follows. S1301: for the vertical lines, take the mean to obtain the control's abscissa; S1302: for the horizontal lines, take the mean to obtain the control's ordinate.

In step S107, the control's width and height are computed as follows. S1401: for the vertical lines, the difference between the maximum and minimum gives the control's width; S1402: for the horizontal lines, the difference between the maximum and minimum gives the control's height.

In step S107, the control center is converted to screen percentages as follows. S1501: divide the abscissa by the image width to obtain the x-axis percentage; S1502: divide the ordinate by the image height to obtain the y-axis percentage.

In step S108, the control's function is determined as follows. S1601: perform text recognition with OCR to obtain the control's text; S1602: if S1601 detects no text, perform image matching and determine the function from a pre-built database.

In step S109, the page tree is built as follows. S1701: combine each control's center coordinates and function into a key-value pair serving as a tree node; S1702: set an empty node as the root, with all controls on the application's home page taking the root as their parent; S1703: make all the controls on the page reached by clicking a control the children of the clicked control, and build the page tree by analogy.
In summary, the present invention provides a computer vision-based non-intrusive control recognition algorithm for mobile terminal applications, with the following beneficial effects: (1) it understands each control's functional meaning, so that beyond locating a control one also knows its page level and function; (2) it can be applied to complex scenarios, such as pages with interaction logic like pop-up windows and sub-pages; (3) it is universal, applicable to different platforms and device models.
Description of drawings:
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative work.
图1是本发明提供的基于计算机视觉的移动端应用非侵入式控件识别算法的硬软件交互图;Fig. 1 is the hardware-software interaction diagram of the mobile terminal application non-intrusive control recognition algorithm based on computer vision provided by the present invention;
图2是本发明提供的基于计算机视觉的移动端应用非侵入式控件识别算法的总体流程图;Fig. 2 is the overall flow chart of the computer vision-based non-intrusive control recognition algorithm for mobile terminal applications provided by the present invention;
图3是是本发明提供的基于计算机视觉的移动端应用非侵入式控件识别算法的总体流程图中图像扩充方法示例;Fig. 3 is an example of an image expansion method in the overall flow chart of the computer vision-based mobile terminal application non-intrusive control recognition algorithm provided by the present invention;
图4是本发明提供的基于计算机视觉的移动端应用非侵入式控件识别算法的总体流程图中边缘检测的流程图;Fig. 4 is a flow chart of edge detection in the overall flow chart of the computer vision-based mobile terminal application non-intrusive control recognition algorithm provided by the present invention;
图5是本发明提供的基于计算机视觉的移动端应用非侵入式控件识别算法的总体流程图中直线检测的流程图;Fig. 5 is a flow chart of straight line detection in the overall flow chart of the computer vision-based mobile terminal application non-intrusive control recognition algorithm provided by the present invention;
Detailed description of the embodiments:
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
This example takes a particular app as an illustration; the method comprises the following specific steps:
S101: Turn on the screen reader.
S102: Open the app; the robotic arm performs one "swipe right" operation on the screen, after which a screenshot is taken, uploaded, and cropped.
S103: For the screenshot obtained in S102, determine the RGB matrix range corresponding to the color of the focus frame.
S104: Extract the pixels within the RGB range from S103 out of the screenshot from S102 to obtain a single-channel image.
S105: Expand the single-channel image from S104 and perform edge detection on it.
S106: Perform line detection on the edges obtained in S105, and derive the lines' equations in Cartesian coordinates.
S107: Filter the equations obtained in S106 to find the lines corresponding to the focus frame, compute the control's center coordinates and its width and height, and convert them into screen ratios.
S108: Using the control's center coordinates and width/height screen ratios from S107, crop the rectangle outlined by the focus frame and determine its function by computer vision.
S109: Pair the control coordinates from S107 with the text recognized in S108, and build a page tree whose nodes are these key-value pairs.
S110: Given the control to be tapped, traverse the page tree from S109 to find its node; convert the control's coordinates, expressed as percentages of the image, into physical screen coordinates, so that the robotic arm can double-tap controls directly until the target control is reached.
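Steps S103 and S104 reduce to a per-pixel range test. A NumPy sketch, where the green lower/upper bounds are illustrative placeholders for the calibrated focus-frame color range produced by S103:

```python
import numpy as np

def focus_mask(screenshot, lo, hi):
    """S104: build a binary single-channel image whose pixels are 1 where
    the screenshot's RGB value lies inside [lo, hi] and 0 elsewhere
    (the per-pixel rule of S702/S703)."""
    lo = np.asarray(lo, dtype=np.uint8)
    hi = np.asarray(hi, dtype=np.uint8)
    inside = np.all((screenshot >= lo) & (screenshot <= hi), axis=-1)
    return inside.astype(np.uint8)
```

The resulting mask contains (ideally) only the focus frame, which is what the later expansion, edge-detection, and line-detection stages operate on.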
Fig. 1 is the hardware-software interaction diagram of the computer vision-based non-intrusive control recognition algorithm for mobile terminal applications provided by the present invention;
Fig. 2 is the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile terminal applications provided by the present invention;
Fig. 3 is an example of the image expansion method within the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile terminal applications provided by the present invention. S801: splice a matrix of pixel value 0 and width 50 onto each of the left and right sides of the image; S802: onto the image spliced in S801, splice a matrix of pixel value 0 and height 50 onto each of its top and bottom sides.
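The splicing in S801/S802 is plain zero-padding; in NumPy it might look like the following (cv2.copyMakeBorder with a constant border would be an equivalent), the point being that a focus frame touching the screen edge still yields a closed rectangle after padding:

```python
import numpy as np

def expand_image(mask, pad=50):
    """S801: splice zero-valued columns of width 50 onto the left and right;
    S802: splice zero-valued rows of height 50 onto the top and bottom."""
    h, w = mask.shape
    cols = np.zeros((h, pad), dtype=mask.dtype)
    widened = np.hstack([cols, mask, cols])               # S801
    rows = np.zeros((pad, w + 2 * pad), dtype=mask.dtype)
    return np.vstack([rows, widened, rows])               # S802
```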
Fig. 4 is the flowchart of edge detection within the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile terminal applications provided by the present invention. S901: apply Gaussian denoising to the image; S902: compute the gradient of the denoised image from S901, and from the gradient compute the edge magnitude and angle; S903: using the edge magnitude and angle from S902, perform non-maximum suppression along the gradient direction; S904: perform double-threshold edge linking to obtain the edges.
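S901–S904 together describe a Canny-style detector (in practice one would likely call cv2.Canny for the whole pipeline). As an isolated sketch of the S902 stage only, computing the gradient and the edge magnitude and angle with simplified central-difference kernels:

```python
import numpy as np

def gradient_magnitude_angle(img):
    """S902: image gradient, then edge magnitude and angle. The magnitude
    and angle feed non-maximum suppression (S903) and double-threshold
    edge linking (S904) in the full detector."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal central difference
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical central difference
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx))   # gradient direction for S903
    return magnitude, angle
```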
Fig. 5 is the flowchart of line detection within the overall flowchart of the computer vision-based non-intrusive control recognition algorithm for mobile terminal applications provided by the present invention. S1001: convert the coordinates of each point of the image obtained in S904 into polar coordinates; S1002: compute the line equation corresponding to each coordinate, where coordinates sharing a common line equation lie on the same line; S1003: count the pixels on each line; S1004: if a line's pixel count from S1003 exceeds a threshold, keep that line; S1005: if it does not, discard the line.
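The voting scheme of S1001–S1005 is a standard Hough transform; a compact accumulator sketch follows (real code would typically use cv2.HoughLines), returning (rho, theta) pairs that S1101 then converts to Cartesian form:

```python
import numpy as np

def hough_lines(edges, threshold):
    """S1001/S1002: map each edge pixel to its family of polar lines
    rho = x*cos(theta) + y*sin(theta); S1003: vote into an accumulator;
    S1004/S1005: keep only lines whose vote count reaches the threshold."""
    thetas = np.deg2rad(np.arange(180))                 # 1-degree resolution
    diag = int(np.ceil(np.hypot(*edges.shape)))
    accumulator = np.zeros((2 * diag + 1, len(thetas)), dtype=int)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        accumulator[rhos + diag, np.arange(len(thetas))] += 1
    keep = np.argwhere(accumulator >= threshold)
    return [(int(r) - diag, float(thetas[t])) for r, t in keep]
```

For the near-axis-aligned lines of a focus frame, theta near 0 gives the vertical lines and theta near 90 degrees the horizontal ones, which is what the filtering in S107 exploits.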

Claims (17)

  1. A computer vision-based mobile terminal application control recognition method, characterized by comprising the following steps:
    S101: turn on the screen reader, whose main function is to inspect the GUI of the mobile application, together with the additional information the application provides for its accessibility features, describe the on-screen elements by speech, and outline them with a focus frame;
    S102: open the corresponding software; the robotic arm performs one operation on the screen, after which a screenshot is taken, uploaded to the server, and preprocessed;
    S103: for the screenshot obtained in S102, determine the RGB matrix corresponding to the color of the focus frame, and obtain the RGB range by superimposing different backgrounds;
    S104: extract the pixels within the RGB range from S103 out of the screenshot from S102 to obtain a single-channel image;
    S105: expand the image from S104 and perform edge detection on the resulting single-channel image to detect the edges of the focus frame;
    S106: perform line detection on the edges obtained in S105, and derive the lines' equations in Cartesian coordinates;
    S107: filter the equations obtained in S106 to find the lines corresponding to the focus frame, compute the control's center coordinates and width and height, and convert them into screen ratios;
    S108: using the control's center coordinates and width/height screen ratios from S107, crop the rectangle outlined by the focus frame and determine its function by computer vision;
    S109: pair the control coordinates from S107 with the text recognized in S108, and build a page tree whose nodes are these key-value pairs;
    S110: if the control to be tapped is known, traverse the page tree from S109 to find its node, the path from the parent node to the target node being the operation path after the app is opened; convert the control's coordinates, expressed as percentages of the image, into physical screen coordinates, so that the robotic arm can double-tap controls directly until the target control is reached.
  2. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S101 the screen reader takes the following specific form:
    S201: the screen reader is TalkBack on Android and VoiceOver on iOS.
  3. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S102 the specific requirement for the screenshot is:
    S301: the image must be in PNG format.
  4. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S102 the specific operations of the robotic arm are:
    S401: swipe left, swipe right, and double-tap.
  5. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S102 the specific image preprocessing scheme is:
    S501: crop the portion of the screen occupied by the mobile software's graphical user interface.
  6. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S103 the specific method for obtaining the RGB matrix range is:
    S601: determine the several RGB matrices corresponding to the focus frame to obtain an initial range; S602: superimpose different gray levels on the background according to the initial range obtained in S601 to obtain the final range.
  7. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S104 the single-channel image is obtained as follows:
    S701: traverse the image matrix from S102 using the RGB matrix range from S103; S702: for pixels within the RGB matrix range, set the corresponding value to 1; S703: for pixels outside the RGB matrix range, set the corresponding value to 0.
  8. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S105 the image is expanded as follows:
    S801: splice a matrix of pixel value 0 and width 50 onto each of the left and right sides of the image; S802: onto the image spliced in S801, splice a matrix of pixel value 0 and height 50 onto each of its top and bottom sides.
  9. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S105 the edge detection scheme is:
    S901: apply Gaussian denoising to the image; S902: compute the gradient of the denoised image from S901, and from the gradient compute the edge magnitude and angle; S903: using the edge magnitude and angle from S902, perform non-maximum suppression along the gradient direction; S904: perform double-threshold edge linking to obtain the edges.
  10. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S106 the specific line detection scheme is:
    S1001: convert the coordinates of each point of the image obtained in S904 into polar coordinates; S1002: compute the line equation corresponding to each coordinate, where coordinates sharing a common line equation lie on the same line; S1003: count the pixels on each line; S1004: if a line's pixel count from S1003 exceeds a threshold, keep that line; S1005: if it does not, discard the line.
  11. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S106 the specific method for obtaining the lines' Cartesian equations is:
    S1101: convert the polar-coordinate equations into Cartesian equations.
  12. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S107 the specific method for selecting the lines corresponding to the focus frame is:
    S1201: if the difference between the pixel values of two adjacent lines satisfies a fixed value, they are considered lines corresponding to the focus frame; S1202: if it does not, the line is considered an interference line.
  13. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S107 the specific method for computing the control's center coordinates is:
    S1301: for the vertical lines, take the mean to obtain the control's x-coordinate; S1302: for the horizontal lines, take the mean to obtain the control's y-coordinate.
  14. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S107 the specific method for computing the control's width and height is:
    S1401: for the vertical lines, compute the difference between the maximum and the minimum to obtain the control's width; S1402: for the horizontal lines, compute the difference between the maximum and the minimum to obtain the control's height.
  15. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S107 the method for computing the percentage of the screen occupied by the control's center is:
    S1501: divide the x-coordinate by the image width to obtain the x-axis percentage; S1502: divide the y-coordinate by the image height to obtain the y-axis percentage.
  16. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S108 the specific method for determining the control's function is:
    S1601: perform text recognition with OCR to obtain the text corresponding to the control; S1602: if S1601 detects no text, perform image matching and determine the function against the pre-built database.
  17. The computer vision-based mobile terminal application control recognition method according to claim 1, characterized in that in step S109 the specific method for constructing the page tree is:
    S1701: combine the control's center coordinates and its function into a key-value pair, which serves as a tree node; S1702: set an empty node as the root; all controls on the mobile application's home page take the root as their parent; S1703: all controls on the page reached by tapping a given control become child nodes of that control, and so on, building up the page tree.
PCT/CN2021/098490 2021-05-31 2021-06-05 Computer vision-based mobile terminal application control identification method WO2022252239A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110597673.X 2021-05-31
CN202110597673.XA CN113434072B (en) 2021-05-31 2021-05-31 Mobile terminal application control identification method based on computer vision

Publications (1)

Publication Number Publication Date
WO2022252239A1 true WO2022252239A1 (en) 2022-12-08

Family

ID=77803292


Country Status (2)

Country Link
CN (1) CN113434072B (en)
WO (1) WO2022252239A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080195958A1 (en) * 2007-02-09 2008-08-14 Detiege Patrick J Visual recognition of user interface objects on computer
US20110047488A1 (en) * 2009-08-24 2011-02-24 Emma Butin Display-independent recognition of graphical user interface control
CN108509342A (en) * 2018-04-04 2018-09-07 成都中云天下科技有限公司 A kind of precisely quick App automated testing methods
CN110990238A (en) * 2019-11-13 2020-04-10 南京航空航天大学 Non-invasive visual test script automatic recording method based on video shooting
CN112181255A (en) * 2020-10-12 2021-01-05 深圳市欢太科技有限公司 Control identification method and device, terminal equipment and storage medium
CN112597065A (en) * 2021-03-03 2021-04-02 浙江口碑网络技术有限公司 Page testing method and device
CN112657176A (en) * 2020-12-31 2021-04-16 华南理工大学 Binocular projection man-machine interaction method combined with portrait behavior information

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0612128D0 (en) * 2006-06-19 2006-07-26 British Telecomm Apparatus & Method for Selecting Menu Items
CN105045489B (en) * 2015-08-27 2018-05-29 广东欧珀移动通信有限公司 A kind of button control method and device
CN109922363A (en) * 2019-03-15 2019-06-21 青岛海信电器股份有限公司 A kind of graphical user interface method and display equipment of display screen shot


Also Published As

Publication number Publication date
CN113434072A (en) 2021-09-24
CN113434072B (en) 2022-06-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21943600

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE