WO2024067039A1 - Application automated testing method, electronic device, and computer readable medium

Info

Publication number: WO2024067039A1
Application number: PCT/CN2023/117993
Authority: WO
Inventors: 杨治国; 黄来锋
Original assignee: 中兴通讯股份有限公司
Priority date: 2022-09-27
Filing date: 2023-09-11
Publication date: 2024-04-04
Also published as: CN117827623A

Abstract

The present disclosure provides an application automated testing method, comprising: acquiring an interface image of an application, and inputting the interface image into a target detection model to identify a control in the interface image and obtain control information; determining a traversal path according to the control information and the interface image by using a deep reinforcement learning model, and traversing a service of the application; and acquiring traffic information generated by the application in the traversal process. The present disclosure also provides an electronic device and a computer readable medium.

Description

Application program dialing method, electronic device, and computer-readable medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese patent application CN202211181563.6, entitled “Dialing method for application, electronic device, and computer-readable medium,” filed on September 27, 2022, the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a dialing method for an application program, an electronic device, and a computer-readable medium.

Background

With the vigorous development of mobile Internet in recent years, the total amount and types of traffic in operators' pipelines have shown explosive growth, posing great challenges to operators' network security and network operation and maintenance. For operators, sensing and identifying the traffic flowing through the pipeline is the basis for ensuring pipeline security and performing pipeline maintenance.

Traffic identification technology, which is used to sense and identify traffic, usually depends heavily on traffic samples, and the technical means of obtaining traffic samples is a byproduct of application (APP) automated testing. During testing, APP automated testing focuses on whether the front-end interface or back-end logic of the APP meets developers' requirements; for network security and operation and maintenance, however, it is more important to capture, in as much detail as possible, the traffic generated at every stage of APP use for subsequent analysis.

In order to meet the needs of network security and operation and maintenance, the performance of traffic sample acquisition needs to be further improved.

Summary of the invention

The embodiments of the present disclosure provide a method for dialing an application program, an electronic device, and a computer-readable medium.

The present disclosure provides a dialing method for an application program, comprising: obtaining an interface image of the application program and inputting the interface image into a target detection model to identify controls in the interface image and obtain control information; determining a traversal path using a deep reinforcement learning model according to the control information and the interface image, and traversing the business of the application; and obtaining traffic information generated by the application during the traversal process.

An embodiment of the present disclosure also provides an electronic device, including: one or more processors; a memory, on which one or more programs are stored, and when the one or more programs are executed by the one or more processors, the one or more processors implement the dialing method of the application program of the embodiment of the present disclosure.

The embodiment of the present disclosure further provides a computer-readable medium on which a computer program is stored. When the program is executed by a processor, the processor implements the dialing method of the application program of the embodiment of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of a dialing method according to an embodiment of the present disclosure;

FIG. 2 is another flow chart of the dialing method according to an embodiment of the present disclosure;

FIG. 3 is another flow chart of the dialing method according to an embodiment of the present disclosure;

FIG. 4 is a block diagram of an electronic device according to an embodiment of the present disclosure;

FIG. 5 is a block diagram of a computer-readable medium according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a conventional automatic dialing system according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of a training automatic dialing system according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of an end-to-end application traffic production system according to an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of a training interface classification model according to an embodiment of the present disclosure;

FIG. 10 is a schematic diagram of a target detection model according to an embodiment of the present disclosure;

FIG. 11 is a schematic diagram of an action prediction network of a deep reinforcement learning model according to an embodiment of the present disclosure;

FIG. 12 is a schematic diagram of a deep reinforcement learning model according to an embodiment of the present disclosure.

Detailed Description

In order to enable those skilled in the art to better understand the technical solution of the present disclosure, the application program dialing method, electronic device, and computer-readable medium provided by the present disclosure are described in detail below in conjunction with the accompanying drawings.

Example embodiments will be described more fully below with reference to the accompanying drawings, but the example embodiments may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. On the contrary, the purpose of providing these embodiments is to make the present disclosure thorough and complete and to enable those skilled in the art to fully understand the scope of the present disclosure.

In the absence of conflict, the various embodiments of the present disclosure and the various features therein may be combined with each other.

As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

The terms used herein are only used to describe specific embodiments and are not intended to limit the present disclosure. As used herein, the singular forms "a", "an" and "the" are also intended to include the plural forms, unless the context clearly indicates otherwise. It will also be understood that the terms "comprising" and/or "made of", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by those of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted as having an idealized or overly formal meaning unless explicitly defined as such herein.

FIG. 1 is a flow chart of a dialing method according to an embodiment of the present disclosure.

Referring to FIG. 1, the application program dialing method according to an embodiment of the present disclosure includes the following steps S1 to S3.

In step S1, an interface image of an application is acquired and input into a target detection model to identify controls in the interface image and obtain control information.

In step S2, a traversal path is determined using a deep reinforcement learning model according to the control information and the interface image, and the business of the application is traversed.

In step S3, traffic information generated by the application during the traversal process is obtained.

In the embodiment of the present disclosure, an interface image may be acquired for any user interface (UI) of the application, or an interface image of the new user interface may be acquired each time the application switches to a new user interface. The embodiment of the present disclosure does not specifically limit this.

In the embodiment of the present disclosure, each user interface may have one or more controls, which are not specifically limited in the embodiment of the present disclosure.

It should be noted that in the disclosed embodiment, the target detection model identifies controls in the interface image based on a target detection algorithm. Unlike image recognition and image matching algorithms, which rely on detailed prior knowledge of controls, and unlike algorithms that identify controls from interface description information (e.g., XML description information) of the user interface, a target detection algorithm that works on interface images is compatible with a wider variety of applications. The disclosed embodiment does not specifically limit the target detection model. For example, the target detection model is a Yolo model or a Yolo-V4 model.

In the disclosed embodiments, traversing the services of an application refers to traversing the various service logics of the application by simulating the process of a person using the application. In some embodiments, by following the traversal path determined by the deep reinforcement learning model, it is possible to traverse as many services of the application as possible while covering as comprehensively as possible the services that generate a large amount of traffic, thereby obtaining as much traffic information as possible. The disclosed embodiments do not specifically limit the deep reinforcement learning model. For example, the deep reinforcement learning model is a deep deterministic policy gradient (DDPG) reinforcement learning model based on the Actor-Critic (AC) architecture.

In the disclosed embodiments, the traffic information obtained during the traversal process can provide offline analysis training data for upper-layer applications such as network security and network operation and maintenance.

In the application dialing method provided by the embodiment of the present disclosure, the controls in the user interface of the application are identified based on the target detection model, and the services of the application are traversed based on the deep reinforcement learning model. Therefore, it does not rely on detailed prior knowledge of the controls or the interface description information of the user interface, and can also be compatible with a variety of applications, thus achieving highly automated and highly versatile application dialing. In addition, the coverage of the application's services is more comprehensive, and the services that will generate a large amount of traffic can be fully traversed, so that richer and more accurate traffic information can be obtained, which is conducive to providing comprehensive and detailed offline analysis training data for upper-level applications such as network security and network operation and maintenance.
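As an illustration only, the loop formed by steps S1 to S3 can be sketched as follows. This is a minimal sketch rather than the disclosed implementation: every callable (`capture_screen`, `detect_controls`, `choose_action`, `perform`, `capture_traffic`) is a hypothetical placeholder for the screenshot source, the target detection model, the deep reinforcement learning model, the device driver, and the packet capture respectively, and stopping when no control is detected or after `max_steps` is an assumption.

```python
def run_dialing_test(capture_screen, detect_controls, choose_action,
                     perform, capture_traffic, max_steps=100):
    """Run a simplified S1 -> S2 -> S3 loop and return the collected traffic."""
    traffic = []
    for _ in range(max_steps):
        # S1: acquire the interface image and identify its controls.
        image = capture_screen()
        controls = detect_controls(image)
        if not controls:
            # Assumption: nothing operable on screen ends the traversal.
            break
        # S2: let the policy pick a target control and action, then execute it.
        control, action = choose_action(image, controls)
        perform(control, action)
        # S3: collect the traffic generated by this step.
        traffic.extend(capture_traffic())
    return traffic
```

Passing the models in as callables keeps the sketch independent of any particular detection or reinforcement learning library.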

In some scenarios, the application includes a user interface that relies on human-computer logical interaction. The user needs to participate in completing the human-computer interaction process in the user interface before entering the new user interface through the user interface. For example, the login interface and the registration interface are both user interfaces that rely on human-computer logical interaction. In some embodiments, the category of the user interface can be identified from the perspective of human-computer interaction, and the corresponding interactive operation is performed according to the category of the user interface.

FIG. 2 is another flow chart of the dialing method according to an embodiment of the present disclosure.

In some embodiments, referring to FIG. 2, the dialing method of the application according to the embodiment of the present disclosure further includes steps S41 to S42.

In step S41, the interface image is input into the interface classification model to determine the category of the user interface corresponding to the interface image.

In step S42, an interactive operation is performed in the user interface according to the category of the user interface.

In some embodiments, the categories of user interfaces include: user interfaces that require human-computer interaction and user interfaces that do not require human-computer interaction. In some embodiments, the categories of user interfaces include: login interface, registration interface, and other interfaces, wherein the login interface and registration interface are user interfaces that require human-computer interaction, and other interfaces are user interfaces that do not require human-computer interaction. The embodiments of the present disclosure do not specifically limit this.

It should be noted that in the embodiments of the present disclosure, the categories of user interfaces are identified from the perspective of human-computer interaction, and corresponding interactive operations are performed according to the categories of user interfaces, so that interfaces that require human-computer interaction can be bypassed, and the application or new user interface can be entered to complete a deep traversal of the entire application. For example, the embodiments of the present disclosure can identify the login interface (or registration interface) and automatically perform interactive operations for login (or registration), avoiding the inability to enter the application or only being able to use some of the functions of the application due to failure to log in (or register), thereby being able to fully traverse all the services of the application.

The disclosed embodiments do not specifically limit the interface classification model. In some embodiments, the interface classification model is a convolutional neural network (CNN) model. It should be noted that the CNN-based interface classification model does not rely on the interface description information of the user interface and has higher universality.

In some embodiments, when the user interface is a login interface, performing an interactive operation in the user interface according to the category of the user interface includes: determining whether there is login information for the user interface; if there is login information, performing a login operation in the user interface using the login information; and if there is no login information, locating the registration jump control in the user interface according to the interface image, and clicking it to jump to the registration interface.

In some embodiments, if there is no login information, a mobile phone number or social account is used to verify the login. If the login cannot be verified, optical character recognition (OCR) is used to locate the registration jump control through text and jump to the registration interface.

In some embodiments, when the user interface is a registration interface, performing interactive operations in the user interface according to the category of the user interface includes: locating the registration information control in the user interface according to the interface image; filling the registration information in the registration information control according to a preconfigured filling template; and if the registration is successful, storing the registration information as the login information of the user interface.

In some embodiments, the user interface is scanned and recognized by OCR, and the registration information control in the user interface is located.
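The login and registration handling described above can be sketched as a small decision routine. This is a hedged illustration under stated assumptions: the function names, the `REGISTER_KEYWORDS` list, the `(text, box)` shape of the OCR results, and the final `"stop"` fallback are all hypothetical and are not taken from the disclosure.

```python
# Assumption: keywords that suggest a "jump to registration" control.
REGISTER_KEYWORDS = ("register", "sign up", "注册")


def handle_login_interface(app_id, credentials, verified_login_ok, ocr_results):
    """Decide what to do on a login interface.

    credentials: dict mapping app id -> stored login information.
    verified_login_ok: whether phone number / social account login succeeded.
    ocr_results: list of (text, box) pairs recognized on the interface image.
    """
    if app_id in credentials:
        # Login information exists: log in with it.
        return ("login", credentials[app_id])
    if verified_login_ok:
        # No stored login information, but verified login worked.
        return ("verified_login", None)
    # Otherwise locate the registration jump control through its text.
    for text, box in ocr_results:
        if any(keyword in text.lower() for keyword in REGISTER_KEYWORDS):
            return ("click_register", box)
    return ("stop", None)  # Assumption: no registration control was found.


def save_registration(app_id, credentials, registration_info):
    """After a successful registration, keep the information for later logins."""
    credentials[app_id] = dict(registration_info)
```

For example, once `save_registration` has stored an account for an application, a later call to `handle_login_interface` for the same application returns the `"login"` action with that account.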

In some embodiments, the deep reinforcement learning model is a deep reinforcement learning model based on an AC architecture.

Accordingly, in some embodiments, determining a traversal path using a deep reinforcement learning model based on control information and an interface image, and traversing the application's business (i.e., step S2) includes: inputting the interface image and control information into an action prediction network of the deep reinforcement learning model to determine a target control and a target action, wherein the target control is one of the controls in the interface image; inputting the interface image, control information, and target action into an action evaluation network of the deep reinforcement learning model to evaluate the target action; and calculating the error using a reward function to optimize the action prediction network.

The present disclosed embodiment does not impose any special limitation on the reward function.

In some embodiments, the reward function is expressed as follows:
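$$\mathrm{Reward} = \sum \alpha \cdot \mathrm{State}_{\mathrm{change\_percent}} + \beta \cdot \mathrm{Traffic}_{\mathrm{inc}} + \lambda \cdot \left|\mathrm{depth}_{\mathrm{change}}\right| \tag{1}$$

where α, β, and λ are weight factors, $\mathrm{State}_{\mathrm{change\_percent}}$ is the percentage of interface pixel change, $\mathrm{Traffic}_{\mathrm{inc}}$ indicates the increase in traffic, and $\mathrm{depth}_{\mathrm{change}}$ is the change in traversal depth.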

It should be noted that in the embodiment of the present disclosure, the reward function based on formula (1) can enable the deep reinforcement learning model to traverse as many services of the application as possible, and at the same time can cover the services that will generate a large amount of traffic as comprehensively as possible, thereby obtaining as much traffic information as possible.
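A minimal sketch of formula (1) in code is given below. It assumes that the summation runs over the traversal steps and covers all three weighted terms, which the formula as printed does not state explicitly; the function names, the default weights, and the units of each quantity are likewise illustrative assumptions.

```python
def step_reward(state_change_percent, traffic_inc, depth_change,
                alpha=1.0, beta=1.0, lam=1.0):
    """Weighted reward contribution of a single traversal step."""
    return (alpha * state_change_percent
            + beta * traffic_inc
            + lam * abs(depth_change))


def total_reward(steps, alpha=1.0, beta=1.0, lam=1.0):
    """Sum the per-step contributions.

    steps: iterable of (state_change_percent, traffic_inc, depth_change).
    """
    return sum(step_reward(s, t, d, alpha, beta, lam) for s, t, d in steps)
```

Because the depth term uses an absolute value, a step that moves back up the interface hierarchy is rewarded in the same way as a step that moves deeper.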

The embodiment of the present disclosure does not specifically limit the architecture of the deep reinforcement learning model based on the AC architecture. In some embodiments, the deep reinforcement learning model based on the AC architecture includes a convolutional neural sub-network and a long short-term memory (LSTM) sub-network.

Accordingly, in some embodiments, inputting the interface image and the control information into the action prediction network of the deep reinforcement learning model to determine the target control and the target action includes: inputting the interface image into the convolutional neural subnetwork in the action prediction network to extract features of the interface image and obtain a user interface vector; inputting the control information into the long short-term memory subnetwork in the action prediction network to extract features of the control information and obtain a control information vector; and merging features of the user interface vector and the control information vector to obtain the target control and the target action.

The disclosed embodiment does not specifically limit how to obtain the traffic information generated by the application during the traversal process.

In some embodiments, obtaining the traffic information generated by the application during the traversal process (i.e., step S3) includes: capturing packets of the application during the traversal process to obtain a packet capture message sequence; inserting action information, control location information, and control screenshots during the traversal process into the packet capture message sequence in the form of a piling message to obtain traffic information.

In some embodiments, the piling message carries identification information, for example, the media access control (MAC) address of the piling message is set to all 0s to identify the piling message.

It should be noted that in the embodiment of the present disclosure, when dialing and capturing packets, the action information, control location information, and control screenshots during the traversal process are inserted into the packet capture message sequence in the form of piling messages, so as to facilitate the acquisition of traffic information generated by specific segmented services during upper-level application analysis such as network security and network operation and maintenance.
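The piling-message idea can be illustrated with raw Ethernet frames. The sketch below is not the disclosed format: setting both the source and destination MAC addresses to zero, using the IEEE local experimental EtherType 0x88B5, encoding the metadata as JSON, referring to the control screenshot by file name, and placing each piling message before the traffic it describes are all assumptions made for the example.

```python
import json
import struct

ZERO_MAC = bytes(6)        # all-zero MAC address identifies a piling message
ETHERTYPE_LOCAL = 0x88B5   # assumption: IEEE local experimental EtherType


def build_piling_frame(action, control_box, screenshot_name):
    """Pack traversal metadata into an Ethernet frame with all-zero MACs."""
    payload = json.dumps({
        "action": action,
        "control_box": control_box,
        "screenshot": screenshot_name,
    }).encode("utf-8")
    header = struct.pack("!6s6sH", ZERO_MAC, ZERO_MAC, ETHERTYPE_LOCAL)
    return header + payload


def is_piling_frame(frame):
    """A frame is a piling message if both MAC fields are all zeros."""
    return len(frame) >= 14 and frame[0:6] == ZERO_MAC and frame[6:12] == ZERO_MAC


def split_by_piling(frames):
    """Group captured frames under the most recent piling message."""
    segments = []
    current = None
    for frame in frames:
        if is_piling_frame(frame):
            meta = json.loads(frame[14:].decode("utf-8"))
            current = {"meta": meta, "packets": []}
            segments.append(current)
        elif current is not None:
            current["packets"].append(frame)
    return segments
```

An analysis tool can then call `split_by_piling` on the capture to recover which packets followed which action on which control.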

FIG. 3 is another flow chart of the dialing method according to an embodiment of the present disclosure.

In some embodiments, referring to FIG. 3, the dialing method of the application according to the embodiment of the present disclosure further includes steps S51 to S52.

In step S51, a conventional automatic dialing test system is used to perform dialing test on at least one application to be tested based on the interface description information of the user interface to obtain a training sample set.

In step S52, the initial target detection model and the initial deep reinforcement learning model are trained using the training sample set to obtain a target detection model and a deep reinforcement learning model.

In some embodiments, steps S51 to S52 are equivalent to training the initial target detection model and the initial deep reinforcement learning model with a teacher-student framework. The traditional automatic dialing system, which relies on the interface description information of the user interface, is the teacher end; the initial target detection model and the initial deep reinforcement learning model are the student end. The teacher end performs dialing on at least one application to be tested based on the interface description information of the user interface. The student end receives the training sample set obtained during the teacher end's dialing process, which includes the current user interface, action information, control position information, next user interface, etc. of the application to be tested, and is trained to obtain the target detection model, the deep reinforcement learning model, and the interface classification model. An evaluation function is designed in the initial deep reinforcement learning model to evaluate the execution results of the teacher end, so that the dialing test not only traverses as many user interfaces as possible but also acquires as much traffic information as possible. The teacher end also collects manual operation data and uses it to correct the student end, so that the student end's operations better match how users actually use the application. Training the initial models with the teacher-student framework gives the resulting target detection model and deep reinforcement learning model a better starting point, higher completeness of automated testing, and a shorter time to traverse all user interfaces of the application, which is conducive to obtaining a target detection model and a deep reinforcement learning model with better performance.

FIG. 4 is a block diagram of an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 4, an electronic device provided according to an embodiment of the present disclosure includes: one or more processors 101; and a memory 102 on which one or more programs are stored. When the one or more programs are executed by the one or more processors 101, the one or more processors 101 implement the dialing method of the application according to each embodiment of the present disclosure.

In addition, one or more I/O interfaces 103 are connected between the processor 101 and the memory 102 to implement information interaction between the processor 101 and the memory 102.

The processor 101 is a device with data processing capabilities, including but not limited to a central processing unit (CPU); the memory 102 is a device with data storage capabilities, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I/O interface (read-write interface) 103 is connected between the processor 101 and the memory 102, and can realize information exchange between the processor 101 and the memory 102, including but not limited to a data bus (Bus), etc.

In some embodiments, the processor 101, the memory 102, and the I/O interface 103 are connected to each other via a bus 104, and further connected to other components of the computing device.

FIG. 5 is a block diagram of a computer-readable medium according to an embodiment of the present disclosure.

Referring to FIG. 5, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored. When the program is executed by a processor, the processor implements the dialing method of the application program of each embodiment of the present disclosure.

In order to enable those skilled in the art to more clearly understand the technical solutions provided by the embodiments of the present disclosure, the technical solutions provided by the embodiments of the present disclosure are described in detail below through specific examples.

Embodiment 1

This embodiment provides an automatic dialing test system, the purpose of which is to obtain traffic information generated by an application during a dialing test.

In this embodiment, the traditional automatic dialing test system is shown in FIG. 6, and the traditional automatic dialing test system can perform dialing test on the application based on the interface description information. For example, the traditional automatic dialing test system parses the XML description information of the UI of the application, and makes a decision based on the parsed controls to determine the next operation of the dialing test until the termination condition is reached (for example, the page traversal is completed or the time setting expires).

As shown in Figure 7, the automated dialing system according to the embodiment of the present disclosure will be trained with the operation guidance of the traditional automated dialing system before being independently put online, which can effectively avoid the trial and error stage in the early stage of training and accelerate system optimization. In actual operation, the automated dialing system is significantly better than the traditional automated dialing system in terms of the time to reach the maximum depth, the time consumed to traverse all interfaces, and the amount of traffic obtained.

As shown in Figure 8, the automatic dialing system according to the embodiment of the present disclosure is part of an end-to-end application traffic production system. For example, the end-to-end application traffic production system obtains the update status of the APP by monitoring the application market, and then automatically dials the updated APP through the automatic dialing system to obtain the traffic generated when the APP is running. Since the traffic is also mixed with some public traffic, advertising traffic, etc., it is necessary to clean the traffic to remove public traffic, advertising traffic, etc., and obtain a training test set, so that it can be used for practical purposes such as rule extraction and verification.

The automatic dialing test system proposed in the embodiment of the present disclosure includes components and modules such as a traditional automatic dialing test system, registration interface and login interface judgment, automatic registration and login, a target detection model for control positioning, and a deep reinforcement learning model for action recommendation. In the early stage, the whole system is based on the traditional automatic dialing system. The target detection model and the deep reinforcement learning model need to go through a long period of observing and learning from the traditional automatic dialing system. When the expression and recommendation capabilities of the target detection model and the deep reinforcement learning model are comparable to those of the traditional automatic dialing system, the components based on the target detection model and the deep reinforcement learning model are used as core modules to undertake APP dialing tasks.

Traditional automatic dialing system

The traditional automatic dialing test system is an automated testing system built on the XML description information of the UI. The system obtains the location of each control by parsing the XML description information, and then infers and decides which action (single click, double click, left or right swipe, up or down swipe, long press, input, etc.) to perform so that the interface is refreshed or a new interface is entered. The entire APP is tested by depth-first search (DFS) traversal, and complete traffic is collected at the same time. The advantage of the traditional automatic dialing test system is that it is simple to implement, and the DFS traversal of the APP can be completed by following the interface design logic. However, the traditional automatic dialing test system is highly dependent on the XML description information of the UI, and it cannot perform dialing tests on APPs for which XML description information is unavailable.
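The depth-first traversal used by the traditional system can be sketched as follows. This is a generic DFS over a hypothetical interface graph, not the system's actual code: `neighbors` stands in for "the interfaces reachable by operating the controls parsed from the XML description", and `max_depth` is an assumed stand-in for a termination condition.

```python
def dfs_traverse(start, neighbors, max_depth=None):
    """Visit interfaces depth-first and return them in visit order.

    neighbors: callable returning the interfaces reachable from a given one.
    max_depth: optional depth limit (assumption, used here as a stop rule).
    """
    visited = []
    seen = set()

    def visit(ui, depth):
        seen.add(ui)
        visited.append(ui)
        if max_depth is not None and depth >= max_depth:
            return
        for nxt in neighbors(ui):
            if nxt not in seen:
                visit(nxt, depth + 1)

    visit(start, 0)
    return visited
```

With a toy graph such as `{"home": ["list", "settings"], "list": ["detail"], ...}`, the traversal enters `list` and everything below it before it returns to `home` and tries `settings`.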

Registration interface & login interface judgment

The registration interface and login interface are interfaces that rely on human-computer logical interaction. In many apps, it is impossible to bypass the registration interface and login interface to enter the app without entering prior information, and it is impossible to complete a deep traversal of the entire app. Therefore, it is very important to identify whether the UI is a registration interface or a login interface during the dialing process. In the traditional automatic dialing system, by parsing and identifying the XML description file of the UI, a three-category (registration interface, login interface, other) random forest model is trained based on features such as whether the UI contains specific characters and/or icons (for example, registration, mobile phone number, password, verification code, login, WeChat icon, skip, enter, QQ icon, Sina icon, etc.), the number of clickable controls, and the current interface depth. This model can be used to identify whether the UI is a registration interface or a login interface.
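The feature vector fed to such a three-category classifier might be assembled as in the sketch below. The keyword list is an abbreviated, English-only assumption based on the examples in the text, icon detection is omitted, and the function name and feature order are hypothetical; the resulting vectors could then be passed to any random forest implementation.

```python
# Assumption: a subset of the characteristic strings named in the text.
KEYWORDS = (
    "register",
    "mobile phone number",
    "password",
    "verification code",
    "login",
    "skip",
    "enter",
)


def extract_features(ui_texts, clickable_count, depth):
    """Build a feature vector for registration/login/other classification.

    ui_texts: text strings parsed from the UI description.
    clickable_count: number of clickable controls on the interface.
    depth: current interface depth in the traversal.
    """
    joined = " ".join(ui_texts).lower()
    keyword_flags = [1 if keyword in joined else 0 for keyword in KEYWORDS]
    return keyword_flags + [clickable_count, depth]
```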

However, complete XML description information cannot be obtained for every APP UI. To make the judgment logic universal, the labeled data generated by the existing system, together with manually collected registration and login interfaces, was used to train a convolutional neural network (CNN) that takes APP interface images as input as the interface classification model, as shown in Figure 9. In actual verification, for UIs that have background description information, the CNN model's judgment accuracy is slightly higher than that of the traditional method. Because the CNN-based judgment model does not depend on the background description information, it is highly universal.

Automatic registration and login

The login interface and registration interface are strongly interactive interfaces that rely on people's prior knowledge. Because these interfaces are designed according to human factors engineering, automatic registration and login of the APP can be achieved as follows.

For the identified login interface, check whether there is a registered account. If so, use the registered account to log in. If not, use the verification based on mobile phone number, WeChat, or QQ to log in. If the interface still cannot be logged in, locate the control suspected to be "Jump to registration" through text and click it to enter the registration interface.

There are dozens of designs and layout patterns for common registration interfaces. The disclosed embodiment designs a set of filling templates for common registration interfaces. If an APP has never been registered in the dialing system, the registration interface will be scanned and identified by OCR, the key information controls will be located, and then the pre-configured registration information will be filled in. After successful registration, the relevant registration information will be saved, and the registered information will be used directly to log in during subsequent dialing.

Control positioning

Traditional automatic dialing systems use UI description files to obtain control information. From the perspective of human factors engineering, the interaction logic of the front-end UI should be consistent with people's subjective perception, so that people can intuitively tell which areas are operable controls and which are only background. Therefore, treating control positioning as a target detection problem, and building on the control recognition and parsing of the existing traditional automatic dialing system, a target detection model is trained to identify controls and mark their locations based on interface images.

The training data of the target detection model comes from the traditional automatic dialing system's dialing runs on APPs that have UI description information. The traditional automatic dialing system cuts the recognized controls out of the interface image. Each cut-out is slightly larger than the actual size of the control (generally enlarged by 20%), because testing found that including the information around the control effectively increases the accuracy of control recognition. To increase the amount of negative sample data, the system also crops the interface image at random and takes crops that contain only background, or in which the control area is less than 20%, as non-control samples. The collected sample data is used to train the Yolo-V4 target detection model (as shown in Figure 10). The trained model takes the UI image directly as input and outputs the recognized control information (single-click control, double-click control, left-swipe control, right-swipe control, up-swipe control, down-swipe control, input control, long-press control).
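The two sampling rules above (enlarged control crops and low-coverage negative crops) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: "enlarged by 20%" is read as scaling each side by 1.2 around the control center, and the helper names `enlarge_box`, `control_fraction`, and `is_negative_sample` are hypothetical. Overlapping controls are counted twice in the coverage estimate, which makes the negative-sample check conservative.

```python
from typing import Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def enlarge_box(box: Box, image_size: Tuple[int, int], scale: float = 1.2) -> Box:
    """Scale a control box around its center and clamp it to the image bounds."""
    left, top, right, bottom = box
    width, height = image_size
    center_x, center_y = (left + right) / 2, (top + bottom) / 2
    half_w = (right - left) * scale / 2
    half_h = (bottom - top) * scale / 2
    return (
        max(0, round(center_x - half_w)),
        max(0, round(center_y - half_h)),
        min(width, round(center_x + half_w)),
        min(height, round(center_y + half_h)),
    )


def control_fraction(crop: Box, controls: Sequence[Box]) -> float:
    """Estimate the share of the crop area that is covered by control boxes."""
    crop_left, crop_top, crop_right, crop_bottom = crop
    crop_area = (crop_right - crop_left) * (crop_bottom - crop_top)
    if crop_area <= 0:
        raise ValueError("crop must have a positive area")
    covered = 0
    for left, top, right, bottom in controls:
        overlap_w = min(crop_right, right) - max(crop_left, left)
        overlap_h = min(crop_bottom, bottom) - max(crop_top, top)
        if overlap_w > 0 and overlap_h > 0:
            covered += overlap_w * overlap_h
    return min(1.0, covered / crop_area)


def is_negative_sample(crop: Box, controls: Sequence[Box], max_fraction: float = 0.2) -> bool:
    """A crop qualifies as a non-control sample when controls cover less than 20% of it."""
    return control_fraction(crop, controls) < max_fraction


if __name__ == "__main__":
    print(enlarge_box((100, 100, 200, 150), (1080, 1920)))
    print(is_negative_sample((0, 0, 100, 100), [(0, 0, 10, 10)]))
```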

Automatic dialing test based on reinforcement learning

Traditional automatic dialing systems use depth-first traversal to traverse applications in order to cover all interfaces. The purpose of the automatic dialing system proposed in the disclosed embodiment is to obtain the traffic information generated by the APP, so it needs to traverse as many business processes as possible and, in particular, to fully cover business scenarios that can generate large traffic.

In principle, APP automatic dialing is a typical Markov decision process (MDP): accept the current perceived state, execute an operation, and jump to a new perceived state.

The automatic dialing system proposed in the embodiment of the present disclosure implements the deep reinforcement learning model with the DDPG (deep deterministic policy gradient) algorithm, which uses an Actor-Critic (AC) architecture. The AC architecture separates action prediction from action evaluation, which makes it easier to learn prior experience from other, homogeneous traditional automatic dialing systems. As a result, the deep reinforcement learning model already starts from a strong policy when it is put to work, avoiding the random-walk exploration of the early stage.

Reinforcement learning is based on the four-tuple (S_t, R_t, S_{t+1}, A_t), where S_t represents the state currently perceived by reinforcement learning, S_{t+1} represents the new environmental state after reinforcement learning performs the action A_t, and R_t is the feedback value obtained after switching to the new state. In the automatic dialing system, the current UI and the information of all controls (up to 50 controls) are used as the environmental information perceived by reinforcement learning. After perceiving the environmental information, reinforcement learning makes a selection and chooses one of the controls on which to perform an action.

In automated system testing, the goal is to traverse all UIs as much as possible and test the business that generates the largest data traffic as much as possible. Therefore, the reward function is designed as follows:
Reward = ∑ α*State_{change_percent} + β*Traffic_{inc} + λ*|depth_{change}|;

Here, α, β, and λ are weight factors, State_{change_percent} is the percentage of interface pixel change, Traffic_{inc} is the increase in traffic, and depth_{change} is the change in traversal depth. In some scenarios, α can be 1, β can be 0.01, and λ can be 0.05. This configuration pays more attention to the degree of interface change in the early stage of dialing and more attention to traffic in the later stage.
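The reward formula can be written as a short Python function. This is a sketch under assumptions: the function and parameter names are hypothetical, the unit of the traffic increase is not specified in the text, and the summation is read here as adding the per-step terms over a traversal. The default weights are the example values given above.

```python
from typing import Iterable, Tuple


def step_reward(
    state_change_percent: float,
    traffic_inc: float,
    depth_change: int,
    alpha: float = 1.0,
    beta: float = 0.01,
    lam: float = 0.05,
) -> float:
    """Weighted sum of interface change, traffic increase, and absolute depth change."""
    return alpha * state_change_percent + beta * traffic_inc + lam * abs(depth_change)


def traversal_reward(steps: Iterable[Tuple[float, float, int]]) -> float:
    """Sum the per-step rewards over (state_change_percent, traffic_inc, depth_change) triples."""
    return sum(step_reward(change, traffic, depth) for change, traffic, depth in steps)


if __name__ == "__main__":
    print(step_reward(0.5, 100.0, -1))
    print(traversal_reward([(0.5, 100.0, -1), (0.1, 0.0, 1)]))
```

With these weights, a step that changes half of the interface pixels contributes 0.5, while a traffic increase of 100 (in whatever unit is used) contributes 1.0, so the balance between the terms depends heavily on how traffic is measured.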

S_t is the state of the environment perceived by reinforcement learning. It includes the current UI image on which the action is to be performed and the position and type information (X_p, Y_p, T) of all controls, where X_p is the horizontal position of the center of the control, Y_p is the vertical position of the center of the control, and T is the control type, which is normalized to the range of 0-1 according to the total number of control types. The action prediction (Actor) network in the deep reinforcement learning model shown in Figure 11 includes a CNN subnetwork and an LSTM subnetwork. The Actor network takes the UI and the control information obtained by detection as input, extracts features through the CNN subnetwork and the LSTM subnetwork respectively to obtain the user interface vector and the control information vector, and outputs the predicted action through feature merging.
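One possible encoding of the control information (X_p, Y_p, T) is sketched below in Python. Several details are assumptions that the text does not fix: the center coordinates are divided by the screen size, the type index is mapped to (index + 1) / 8 so that zero can serve as padding, and the sequence is padded with zero rows up to the 50-control limit. The names `CONTROL_TYPES`, `MAX_CONTROLS`, and `encode_controls` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# The eight control categories named in the text; the ordering is an assumption.
CONTROL_TYPES = [
    "single_click", "double_click", "left_swipe", "right_swipe",
    "up_swipe", "down_swipe", "input", "long_press",
]
MAX_CONTROLS = 50  # the text states that up to 50 controls are perceived


def encode_controls(
    controls: Sequence[Tuple[Box, str]],
    screen_size: Tuple[int, int],
) -> List[Tuple[float, float, float]]:
    """Encode each control as (X_p, Y_p, T) and pad the list to MAX_CONTROLS rows."""
    width, height = screen_size
    rows: List[Tuple[float, float, float]] = []
    for (left, top, right, bottom), control_type in controls[:MAX_CONTROLS]:
        x_p = (left + right) / 2 / width   # assumed: center normalized by screen width
        y_p = (top + bottom) / 2 / height  # assumed: center normalized by screen height
        t = (CONTROL_TYPES.index(control_type) + 1) / len(CONTROL_TYPES)
        rows.append((x_p, y_p, t))
    # Zero rows mark unused slots so every state has the same shape.
    rows.extend([(0.0, 0.0, 0.0)] * (MAX_CONTROLS - len(rows)))
    return rows


if __name__ == "__main__":
    encoded = encode_controls([((0, 0, 540, 960), "single_click")], (1080, 1920))
    print(encoded[0], len(encoded))
```

A fixed-length sequence of this kind is a convenient input for a recurrent subnetwork such as the LSTM described above, although the actual input layout used in the disclosure is not specified.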

In the initial training stage, the traditional automatic dialing system passes the process data generated by the operation process to the DDPG reinforcement learning network. The Actor and Critic networks process the input images using the CNN structure shown in Figure 11. The specific training process is shown in Figure 12. Compared with the Actor network, the Critic network has an additional action vector input. The feature merging part of the Critic network merges the user interface vector, the control information vector and the action vector to obtain the output result, and then optimizes the Actor network by calculating the error through the reward function.

Embodiment 2

The traditional automatic dialing test system has a large reliance on the XML file describing the front-end UI, and cannot be applied in the scenario where the front-end UI description file cannot be obtained. The disclosed embodiment replaces the reliance of the traditional automatic dialing test system on the XML file by training the control calibration model, thereby realizing a more universal dialing test system.

During a dialing test, the traditional dialing test system marks the operable controls by obtaining XML description information and divides the controls into 8 categories according to the operable actions, namely: single-click control, double-click control, long-press control, left-swipe control, right-swipe control, up-swipe control, down-swipe control, and input control. The corresponding data samples are saved in the data set.

From the saved data samples, a large number of non-control background images (containing less than 20% control content) are cropped in a controlled random manner as negative samples.

Use the collected sample data to train the YOLO-V4 target detection model until the model accuracy is higher than 90%.

The traditional dialing test system continues to collect new data, and when the data accumulates to a certain level, the step of training the YOLO-V4 target detection model is repeated. The general update cycle is more than 1 week.

The control recognition model replaces the module of the traditional automatic dialing system that obtains XML information and parses it for control information.

During the traversal process, to handle the strongly interactive registration and login logic, a classification model based on a convolutional neural network (CNN) is trained. After the APP jumps to a new interface, the interface image is first passed through this network to determine whether it is a login or registration interface.

If it is a login or registration interface, registration or login is achieved using the method described in the aforementioned technical solution.

During the application test, packet capture is enabled to obtain all message data sent from the test application. At the same time, the actions, location information, and control screenshots performed during the test are inserted into the packet capture message sequence in the form of spiking messages. The spiking message can be identified by filling the MAC address with 0s.
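The idea of marking test actions inside the capture can be sketched in Python as follows. This is a simplified illustration, not the capture format of the disclosure: the `Frame` record, the JSON payload, and the helper names are hypothetical, both MAC fields are zero-filled (the text does not say which address is used), and the control screenshot is represented by a file name rather than image bytes.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List

ZERO_MAC = "00:00:00:00:00:00"


@dataclass(frozen=True)
class Frame:
    timestamp: float
    src_mac: str
    dst_mac: str
    payload: bytes


def make_spike(timestamp: float, action: str, x: int, y: int, screenshot: str) -> Frame:
    """Build a marker frame that describes one test action; its MAC fields are all zeros."""
    body = json.dumps({"action": action, "x": x, "y": y, "screenshot": screenshot})
    return Frame(timestamp, ZERO_MAC, ZERO_MAC, body.encode("utf-8"))


def is_spike(frame: Frame) -> bool:
    """Marker frames are recognized by their zero-filled MAC addresses."""
    return frame.src_mac == ZERO_MAC and frame.dst_mac == ZERO_MAC


def merge_capture(frames: Iterable[Frame], spikes: Iterable[Frame]) -> List[Frame]:
    """Interleave captured frames and marker frames in timestamp order."""
    return sorted(list(frames) + list(spikes), key=lambda frame: frame.timestamp)


if __name__ == "__main__":
    captured = [
        Frame(1.0, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb", b"request"),
        Frame(3.0, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa", b"response"),
    ]
    spike = make_spike(2.0, "single_click", 540, 960, "shot_0001.png")
    for frame in merge_capture(captured, [spike]):
        print(frame.timestamp, "spike" if is_spike(frame) else "traffic")
```

Because the markers sit in the same sequence as the traffic, the packets that follow a marker can later be attributed to the action it records.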

The dialing test system based on the control identification model will implement the dialing test of the entire application according to the depth-first traversal (DFS) logic.
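Depth-first traversal over interfaces can be illustrated with the Python sketch below. It assumes, for simplicity, that the transitions between interfaces are already known as a static map; in a real dialing run the map would be discovered step by step as controls are operated. The function name and the interface labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Maps an interface name to the (control, next interface) pairs reachable from it.
Transitions = Dict[str, List[Tuple[str, str]]]


def dfs_traverse(start: str, transitions: Transitions) -> List[str]:
    """Return the interfaces in the order a depth-first traversal first reaches them."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(interface: str) -> None:
        seen.add(interface)
        order.append(interface)
        for _control, next_interface in transitions.get(interface, []):
            if next_interface not in seen:
                visit(next_interface)

    visit(start)
    return order


if __name__ == "__main__":
    app_map: Transitions = {
        "home": [("btn_list", "list"), ("btn_settings", "settings")],
        "list": [("item", "detail")],
        "detail": [("back", "list")],
    }
    print(dfs_traverse("home", app_map))
```

The recursive form goes as deep as possible along the first control before returning to try the next one; very deep apps would need an explicit stack to avoid Python's recursion limit.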

Embodiment 3

Traditional automatic dialing tests generally use depth-first traversal to traverse the application in order to cover all interfaces. This method is generally used in APP front-end UI automation testing. However, the purpose of the disclosed embodiment is not only to test the APP, but also to obtain the data packets generated by the APP. Therefore, it is necessary to traverse as many business processes as possible and to cover as many business scenarios that can generate large traffic as possible. In the disclosed embodiment, reinforcement learning replaces the depth-first traversal logic in order to balance these goals as far as possible.

Each time the traditional dialing test system executes an action, it collects the APP interface before the action, the control operated, the new interface reached after the action, the traffic gain, the APP depth, and so on, and records them in the data set.
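A record of this kind might look like the following Python sketch. The field names, the JSON-lines serialization, and the use of screenshot file names in place of image data are assumptions for illustration; the disclosure does not describe the storage format of the data set.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Transition:
    screen_before: str    # e.g. file name of the screenshot taken before the action
    control: List[float]  # assumed encoding of the operated control: [X_p, Y_p, T]
    screen_after: str     # screenshot of the interface reached after the action
    traffic_gain: int     # traffic observed for this step (unit not specified in the text)
    depth: int            # APP depth at which the action was executed


def to_line(transition: Transition) -> str:
    """Serialize one transition as a single JSON line."""
    return json.dumps(asdict(transition))


def from_line(line: str) -> Transition:
    """Rebuild a transition from a JSON line written by to_line."""
    return Transition(**json.loads(line))


if __name__ == "__main__":
    record = Transition("shot_0001.png", [0.25, 0.25, 0.125], "shot_0002.png", 2048, 3)
    print(to_line(record))
```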

A DDPG reinforcement learning model, covering both action prediction and action evaluation, is trained on the data in the data set.

The action generation (Actor) model of the reinforcement learning model replaces the depth-first traversal process of the traditional automatic dialing system.

The automatic dialing system in which the Actor model has been substituted remains in online operation, and the model continues to be trained online.

During the traversal process, to handle the strongly interactive registration and login logic, a classification model based on a convolutional neural network is trained. After the APP jumps to a new interface, the interface image is first passed through this network to determine whether it is a login or registration interface.

The dialing test system based on the control identification model will complete the dialing test of the entire application according to the traversal path recommended by reinforcement learning.

A person of ordinary skill in the art will appreciate that all or some of the steps, systems, and functional modules/units in the methods disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed by several physical components in cooperation. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or may be implemented as hardware, or may be implemented as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on a computer-readable medium, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer.
In addition, it is well known to those of ordinary skill in the art that communication media typically contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium.

Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted only in a general illustrative sense and not for limiting purposes. In some instances, it will be apparent to those skilled in the art that, unless otherwise expressly noted, features, characteristics, and/or elements described in conjunction with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in conjunction with other embodiments. Therefore, those skilled in the art will appreciate that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims

A dialing test method for an application program, comprising:

Acquiring an interface image of the application and inputting it into a target detection model to identify controls in the interface image and obtain control information;

Determining a traversal path using a deep reinforcement learning model according to the control information and the interface image, and traversing the business of the application;

Obtaining traffic information generated by the application during the traversal process.
The dialing method according to claim 1, further comprising:

Inputting the interface image into an interface classification model, and calibrating the category of the user interface corresponding to the interface image;

Performing an interactive operation in the user interface according to the category of the user interface.
The dialing method according to claim 2, wherein, when the user interface is a login interface, performing an interactive operation in the user interface according to the category of the user interface comprises:

Determining whether there is login information for the user interface;

In the case where the login information exists, performing a login operation in the user interface using the login information;

In the case where the login information does not exist, locating the registration jump control in the user interface according to the interface image, and jumping to the registration interface.
The dialing method according to claim 2, wherein, when the user interface is a registration interface, performing an interactive operation in the user interface according to the category of the user interface comprises:

Locating a registration information control in the user interface according to the interface image;

Filling registration information in the registration information control according to a preconfigured filling template;

When the registration is successful, storing the registration information as the login information of the user interface.
The dialing method according to any one of claims 1 to 4, wherein determining a traversal path using a deep reinforcement learning model according to the control information and the interface image, and traversing the business of the application program comprises:

Inputting the interface image and the control information into the action prediction network of the deep reinforcement learning model to determine a target control and a target action, wherein the target control is one of the controls in the interface image;

Inputting the interface image, the control information and the target action into the action evaluation network of the deep reinforcement learning model to evaluate the target action;

Calculating an error using a reward function to optimize the action prediction network.
The dialing method according to claim 5, wherein the reward function is expressed as:
Reward = ∑ α*State_{change_percent} + β*Traffic_{inc} + λ*|depth_{change}|;

Here, α, β, and λ are weight factors, State_{change_percent} is the percentage of interface pixel change, Traffic_{inc} is the increase in traffic, and depth_{change} is the change in traversal depth.
The dialing method according to claim 5, wherein inputting the interface image and the control information into the action prediction network of the deep reinforcement learning model to determine the target control and the target action comprises:

Inputting the interface image into the convolutional neural sub-network in the action prediction network to extract features of the interface image and obtain a user interface vector;

Inputting the control information into the long short-term memory subnetwork in the action prediction network to extract features of the control information and obtain a control information vector;

Performing feature merging on the user interface vector and the control information vector to obtain the target control and the target action.
The dialing method according to any one of claims 1 to 4, wherein obtaining the traffic information generated by the application during the traversal process comprises:

Capturing packets of the application during the traversal process to obtain a packet capture message sequence;

Inserting the action information, control location information, and control screenshots from the traversal process into the packet capture message sequence in the form of spiking messages to obtain the traffic information.
The dialing method according to any one of claims 1 to 4, further comprising:

Using a traditional automatic dialing test system to perform dialing test on at least one application to be tested based on the interface description information of the user interface to obtain a training sample set;

Training the initial target detection model and the initial deep reinforcement learning model using the training sample set to obtain the target detection model and the deep reinforcement learning model.
An electronic device, comprising:

one or more processors;

A memory having one or more programs stored thereon, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the application program dialing method according to any one of claims 1 to 9.
A computer-readable medium having a computer program stored thereon, wherein when the program is executed by a processor, the processor implements the application program dialing method according to any one of claims 1 to 9.