CN111930622A - Interface control testing method and system based on deep learning - Google Patents


Info

Publication number
CN111930622A
Authority
CN
China
Prior art keywords
text
image data
data
target
set image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010793876.1A
Other languages
Chinese (zh)
Other versions
CN111930622B (en)
Inventor
吴思奥
张�浩
傅媛媛
丘士丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202010793876.1A priority Critical patent/CN111930622B/en
Publication of CN111930622A publication Critical patent/CN111930622A/en
Application granted granted Critical
Publication of CN111930622B publication Critical patent/CN111930622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3688Test management for test execution, e.g. scheduling of test suites
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Abstract

The invention provides an interface control testing method and system based on deep learning. The method comprises the following steps: acquiring the interface serial number, target text, target control position and input information for the test process of the control to be tested; capturing the corresponding screen interface data according to the interface serial number, and feeding the screen interface data into a text detection model trained on a scene text detection public data set to obtain text position information; taking a screenshot of the corresponding area of the screen interface data according to the text position information to obtain text area screenshot data, and feeding the text area screenshot data into a text recognition model trained on a text recognition public data set to obtain text information; matching the text information with the target text, and obtaining the corresponding target control position and input information according to the matched target text; and executing the corresponding instruction operation on the target control at the target control position according to the input information.

Description

Interface control testing method and system based on deep learning
Technical Field
The invention relates to the field of interface control testing, in particular to an interface control testing method and system based on deep learning.
Background
To ensure the quality of the software product, the software product is subjected to a great deal of testing work before being released. In the prior art, a tester can test an interface by means of an automatic test tool or other means so as to save manpower and reduce time cost. One of the most important things for interface automation testing is to get the control object to be operated, such as a text box, click box, drop-down box, etc. The accurate and quick acquisition of the target control object is the key of the automatic test of the interface control.
Traditional interface control testing generally follows one of two methods. In the first method, pictures of the interface controls are recorded and stored under a specific path, and the position of the target control in the interface is then found by writing a test script, so that the target control can be tested according to the corresponding action instructions. In the second method, driven directly by test case data, the rough position coordinates of the control and the action instructions for operating it are filled into the case data; according to the target text position coordinates around the target control and the target text information around the control filled in the case, the text box of the target text picture is traversed pixel by pixel over a range of sizes, each traversed picture is sent to an OCR recognition service, the recognition result is matched with the target text around the control to find the text position coordinates around the target control, and the target control is then located from the obtained text position coordinates and the relative position, filled in the case, between the control and its surrounding target text.
The first method has the following disadvantages: 1. When the interface changes, the recorded interface control positions and the written test scripts must change accordingly. 2. When the interface is displayed on a lower-resolution display, a control picture originally recorded at a higher resolution may not be recognized, in which case the script must be recorded again. Repeatedly re-recording and modifying scripts wastes labor and time, resulting in low test efficiency.
The second method has the following disadvantages: 1. The test case must be filled with the starting-point and end-point coordinates of the target text around the target control, but in the actual test process a tester can hardly know where the target text sits in the interface: the starting-point coordinates can only be estimated low and the end-point coordinates estimated high when filling, which increases the number of picture traversals and makes them very time-consuming. 2. When searching for the target text picture around the control, the size of the target text box must be specified and traversed pixel by pixel; if no picture of the target text is found at the specified text box size, the size must be changed and the traversal repeated. This approach is neither intelligent nor efficient, and if the specified target text box is too small or too large, the target text picture very likely cannot be found.
Disclosure of Invention
The invention aims to provide an interface control testing method and system based on deep learning, which improve the accuracy of identifying interface controls and offer stronger usability, thereby saving test labor and time costs.
To achieve the above object, the interface control testing method based on deep learning provided by the present invention specifically includes: acquiring the interface serial number, target text, target control position and input information for the test process of the control to be tested; capturing the corresponding screen interface data according to the interface serial number, and feeding the screen interface data into a text detection model trained on a scene text detection public data set to obtain text position information; taking a screenshot of the corresponding area of the screen interface data according to the text position information to obtain text area screenshot data, and feeding the text area screenshot data into a text recognition model trained on a text recognition public data set to obtain text information; matching the text information with the target text, and obtaining the corresponding target control position and input information according to the matched target text; and executing the corresponding instruction operation on the target control at the target control position according to the input information.
In the interface control testing method based on deep learning, preferably, the obtaining of the interface serial number, the target text, the target control position and the input information of the test process of the control to be tested further includes: generating a test case according to the interface serial number, the target text, the target control position and the input information; case template data is generated from one or more test cases.
In the interface control testing method based on deep learning, preferably, the target control position includes: the relative position and the shortest distance of the target control and the target text.
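The claim above defines the target control position as a direction plus a shortest distance relative to the target text. The patent gives no code for this; the following is a hypothetical sketch (all names are assumptions) of how the control's click point could be derived from a recognized text bounding box, the relative direction, and a distance expressed as a multiple of the target text width:

```python
# Hypothetical sketch: derive the target control's click point from the
# recognized target-text bounding box, the relative direction
# (up/down/left/right), and a distance given as a multiple of the
# target text's width (the patent states the distance as a multiple of
# the target text font width; the whole-text width is used here as an
# approximation).

def control_point(text_box, direction, distance_multiple):
    """text_box is (left, top, right, bottom) in screen pixels."""
    left, top, right, bottom = text_box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    offset = distance_multiple * (right - left)
    if direction == "left":
        return (left - offset, cy)
    if direction == "right":
        return (right + offset, cy)
    if direction == "up":
        return (cx, top - offset)
    if direction == "down":
        return (cx, bottom + offset)
    raise ValueError("direction must be up/down/left/right")
```

For example, a control half a text-width to the right of a text box at (100, 50, 200, 70) would be clicked at (250, 60).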
In the interface control testing method based on deep learning, preferably, intercepting the corresponding screen interface data according to the interface serial number includes: and intercepting corresponding screen interface data by a GUI automatic screen capturing method according to the interface serial number.
In the interface control testing method based on deep learning, preferably, the text detection model construction process includes: dividing the image data in the scene text detection public data set into training set image data and verification set image data according to a preset proportion; extracting picture features of the training set image data and the verification set image data through a convolutional neural network algorithm; performing binary text/non-text classification prediction for preset pixels and binary connection prediction for the neighbouring directions of the preset pixels on the picture features, obtaining a connected domain set according to the pixel predictions and the connection predictions, and obtaining text block instance segmentation data from the connected domain set; extracting a circumscribed rectangular box with direction information through OpenCV according to the text block instance segmentation data to obtain text bounding boxes; and constructing the text detection model according to the training set image data, the text bounding boxes and the verification set image data.
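The post-processing described in this claim merges text pixels into connected domains through positive link predictions. As an illustrative sketch (not the patent's implementation), the connected-domain step can be reduced to a flood fill over a binary text mask guided by a link predicate; the patent then extracts an oriented circumscribed rectangle with OpenCV, whereas this stdlib-only sketch returns axis-aligned boxes for simplicity:

```python
from collections import deque

# Illustrative sketch of the post-processing described above: pixels
# predicted as text are merged into connected domains through positive
# link predictions, and each connected domain yields a text block whose
# bounding box approximates the text boundary.

def text_boxes(text_mask, link):
    """text_mask[r][c] is truthy for text pixels; link((r1,c1),(r2,c2))
    says whether two neighbouring pixels are predicted as connected.
    Returns (min_x, min_y, max_x, max_y) boxes, one per text block."""
    h, w = len(text_mask), len(text_mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if text_mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                comp, q = [(r, c)], deque([(r, c)])
                while q:  # BFS flood fill over linked text pixels
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and text_mask[ny][nx] and not seen[ny][nx]
                                and link((y, x), (ny, nx))):
                            seen[ny][nx] = True
                            comp.append((ny, nx))
                            q.append((ny, nx))
                ys = [p[0] for p in comp]
                xs = [p[1] for p in comp]
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

A real model would supply per-pixel and per-direction link scores; the `link` predicate stands in for thresholded link predictions.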
In the interface control testing method based on deep learning, preferably, dividing the image data in the scene text detection public data set into training set image data and verification set image data according to a preset ratio further includes: converting the training set image data and the verification set image data into the tfrecord file format.
In the interface control testing method based on deep learning, preferably, the text recognition model construction process includes: dividing the image data in the text recognition public data set into training set image data and verification set image data according to a preset proportion; extracting image convolution features of the training set image data and the verification set image data through a convolutional neural network algorithm; analysing the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain text character sequence probabilities; transcribing the text character sequence probabilities through the CTC algorithm to obtain text data; and constructing the text recognition model according to the text data, the training set image data and the verification set image data.
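The CTC transcription step in this claim turns per-frame character probabilities into text. As a minimal sketch of that step (greedy decoding; beam search is another common choice the patent does not rule out), the most likely symbol is taken per frame, repeats are collapsed, and the CTC blank is removed:

```python
# Minimal sketch of the CTC transcription step: the per-frame character
# probabilities emitted by the recurrent network are decoded greedily by
# taking the most likely symbol per frame, collapsing repeats, and then
# removing the CTC blank symbol (index 0 by convention here).

BLANK = ""  # stands in for the CTC blank label

def ctc_greedy_decode(frame_probs, alphabet):
    """frame_probs: list of per-frame probability lists over the label
    set [blank] + alphabet; returns the transcribed text."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    symbols = [BLANK if i == 0 else alphabet[i - 1] for i in best]
    out, prev = [], None
    for s in symbols:
        if s != prev and s != BLANK:  # collapse repeats, drop blanks
            out.append(s)
        prev = s
    return "".join(out)
```

Note how a blank between two identical symbols keeps them distinct: frames decoding to `a, blank, a` yield "aa", while `a, a` collapses to "a".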
In the interface control testing method based on deep learning, preferably, dividing the image data in the text recognition public data set into training set image data and verification set image data according to a preset proportion further includes: normalizing the image data in the text recognition public data set into standard image data of a preset size; and after converting the standard image data into the tfrecord file format, dividing the standard image data into training set image data and verification set image data according to the preset proportion.
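The preset size used for normalization is not specified in the claim. A hedged sketch of one common convention for CRNN-style text recognizers (an assumption, not the patent's stated values) is to fix the input height and scale the width to preserve the aspect ratio, capped at a maximum:

```python
# Hedged sketch of the normalization step: the preset size is not given
# in the patent, so this follows the common CRNN convention of a fixed
# input height (assumed 32 px) with the width scaled to keep the aspect
# ratio, capped at an assumed maximum width.

def normalized_size(width, height, target_h=32, max_w=256):
    """Return the (width, height) a text image would be resized to."""
    scaled_w = max(1, round(width * target_h / height))
    return (min(scaled_w, max_w), target_h)
```

The actual pixel resampling would then be done by an image library before writing the tfrecord files.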
In the interface control testing method based on deep learning, preferably, the executing, according to the input information, a corresponding instruction operation on the target control corresponding to the target control position includes: and calling a GUI automation technology according to the input information to execute corresponding instruction operation on the target control corresponding to the position of the target control.
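The claim above maps input information to a GUI automation call on the target control. The following dry-run dispatcher is an illustrative sketch: the concrete GUI automation library is not named in the claim, so the real clicks and typing (e.g. via a library such as pyautogui) appear only as comments, and the function returns the action it would execute:

```python
# Illustrative dispatcher for the instruction-execution step.  This is
# a dry-run sketch: instead of driving a real GUI, it builds the action
# tuple that a GUI automation call would perform.  Control names follow
# the case template described later (input box, click, drop-down box).

def build_action(control_name, point, input_text=""):
    x, y = point
    if control_name == "input box":
        # real run might be: pyautogui.click(x, y); pyautogui.write(input_text)
        return ("click", x, y, "type", input_text)
    if control_name in ("click", "drop-down box"):
        # real run might be: pyautogui.click(x, y)
        return ("click", x, y)
    raise ValueError("unsupported control name: %s" % control_name)
```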
The invention also provides an interface control testing system based on deep learning, which comprises a setting module, a text position detection module, a text information extraction module, a matching module and a testing module; the setting module is used for acquiring an interface serial number, a target text, a target control position and input information of a test process of a control to be tested; the text position detection module is used for intercepting corresponding screen interface data according to the interface serial number and bringing the screen interface data into a text detection model obtained through scene text detection public data set training to obtain text position information; the text information extraction module is used for capturing a corresponding area in the screen interface data according to the text position information to obtain text area capture data, and bringing the text area capture data into a text recognition model obtained through training of a text recognition public data set to obtain text information; the matching module is used for matching the text information with the target text and obtaining the corresponding target control position and input information according to the target text obtained by matching; and the test module is used for executing corresponding instruction operation on the target control corresponding to the position of the target control according to the input information.
In the interface control testing system based on deep learning, preferably, the setting module further includes: generating a test case according to the interface serial number, the target text, the target control position and the input information; generating case template data according to one or more test cases; wherein the target control position comprises: the relative position and the shortest distance of the target control and the target text.
In the interface control testing system based on deep learning, preferably, the text position detection module further includes a text detection model construction unit, and the text detection model construction unit is configured to: divide the image data in the scene text detection public data set into training set image data and verification set image data according to a preset proportion; extract picture features of the training set image data and the verification set image data through a convolutional neural network algorithm; perform binary text/non-text classification prediction for preset pixels and binary connection prediction for the neighbouring directions of the preset pixels on the picture features, obtain a connected domain set according to the pixel predictions and the connection predictions, and obtain text block instance segmentation data from the connected domain set; extract a circumscribed rectangular box with direction information through OpenCV according to the text block instance segmentation data to obtain text bounding boxes; and construct the text detection model according to the training set image data, the text bounding boxes and the verification set image data.
In the interface control testing system based on deep learning, preferably, the text detection model construction unit further includes: converting the training set image data and the verification set image data into the tfrecord file format to construct the text detection model.
In the interface control testing system based on deep learning, preferably, the text information extraction module further includes a text recognition model construction unit, and the text recognition model construction unit is configured to: divide the image data in the text recognition public data set into training set image data and verification set image data according to a preset proportion; extract image convolution features of the training set image data and the verification set image data through a convolutional neural network algorithm; analyse the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain text character sequence probabilities; transcribe the text character sequence probabilities through the CTC algorithm to obtain text data; and construct the text recognition model according to the text data, the training set image data and the verification set image data.
In the interface control testing system based on deep learning, preferably, the text recognition model construction unit further includes: normalizing the image data in the text recognition public data set into standard image data of a preset size; and after converting the standard image data into the tfrecord file format, dividing the standard image data into training set image data and verification set image data according to the preset proportion, and then constructing the text recognition model.
In the interface control testing system based on deep learning, preferably, the testing module includes: and calling a GUI automation technology according to the input information to execute corresponding instruction operation on the target control corresponding to the position of the target control.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the computer program.
The present invention also provides a computer-readable storage medium storing a computer program for executing the above method.
The invention has the beneficial technical effects that: the tester can test interface controls simply by filling in test case data, without writing test scripts; when the software interface design changes, if the relative position of the target control to be tested and the target text is unchanged, and the sequence and logic of the tested controls are also unchanged, the tester can directly reuse the previous case data for the changed interface test; the deep learning text detection and text recognition algorithms improve the accuracy of locating the interface target text position and reduce the search time, so that the interface target control can be found accurately and quickly without being affected by a low-resolution display; the labor and time costs of testing are thus further saved, and the quality of the product is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
fig. 1A is a schematic flowchart of a deep learning-based interface control testing method according to an embodiment of the present invention;
fig. 1B is a schematic application flow diagram of a deep learning-based interface control testing method according to an embodiment of the present invention;
fig. 2A is a schematic view of a construction process of a text detection model according to an embodiment of the present invention;
fig. 2B is a schematic diagram illustrating a construction principle of a text detection model according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a flow of acquiring screenshot data of a text area according to an embodiment of the present invention;
fig. 4A is a schematic view of a process of constructing a text recognition model according to an embodiment of the present invention;
FIG. 4B is a schematic diagram illustrating a principle of constructing a text recognition model according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a text recognition model recognizing text information according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a deep learning-based interface control testing system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following detailed description of the embodiments of the present invention will be provided with reference to the drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, unless otherwise specified, the embodiments and features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are within the scope of the present invention.
Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions and, although a logical order is illustrated in the flow charts, in some cases, the steps illustrated or described may be performed in an order different than here.
Referring to fig. 1A, a method for testing an interface control based on deep learning provided by the present invention specifically includes:
s101, acquiring an interface serial number, a target text, a target control position and input information of a test process of a control to be tested;
s102, intercepting corresponding screen interface data according to the interface serial number, and bringing the screen interface data into a text detection model obtained through scene text detection public data set training to obtain text position information;
s103, capturing a corresponding area in the screen interface data according to the text position information to obtain text area capturing data, and bringing the text area capturing data into a text recognition model obtained through training of a text recognition public data set to obtain text information;
s104, matching the text information with the target text, and obtaining a corresponding target control position and input information according to the target text obtained through matching; and executing corresponding instruction operation on the target control corresponding to the position of the target control according to the input information.
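Step S104 compares each recognized text snippet with the target text from the case data. The patent does not specify the matching rule; the sketch below (an assumption) accepts an exact match first and falls back to a fuzzy match to tolerate OCR noise, with the similarity threshold chosen for illustration:

```python
import difflib

# Sketch of step S104: each (recognized text, position) pair is compared
# with the target text from the case data.  Exact matches win outright;
# otherwise the closest fuzzy match above an assumed threshold is used
# as a fallback for OCR noise.

def match_target(recognized_items, target_text, threshold=0.8):
    """recognized_items: list of (text, position) pairs; returns the
    position of the best match for target_text, or None."""
    best, best_ratio = None, threshold
    for text, position in recognized_items:
        if text == target_text:
            return position
        ratio = difflib.SequenceMatcher(None, text, target_text).ratio()
        if ratio >= best_ratio:
            best, best_ratio = position, ratio
    return best
```

The returned position is then combined with the relative position and distance from the case row to locate the target control.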
In the above embodiment, obtaining the interface serial number, the target text, the target control position and the input information of the test process of the control to be tested further includes: generating a test case according to the interface serial number, the target text, the target control position and the input information; and generating case template data from one or more test cases. The target control position comprises the relative position and the shortest distance between the target control and the target text. In practical work, the case template data in this embodiment may take the form of an Excel sheet. The first column is the interface serial number, an Arabic numeral: the first row of case data corresponds to tested interface serial number 1, and the serial number of each subsequent row depends on whether the interface changes after the previous row's target control is operated; if it does not change, the row keeps the same serial number as the previous row, and if it does change, the row's serial number is the previous row's serial number plus 1. The second column is the target text, filled with the target characters to be recognized in the test interface. The third column is the relative position of the target control and the target text, filled with the direction (up, down, left or right) of the control to be tested relative to the target text. The fourth column is the distance, filled with the distance between the target control and the target characters expressed as a multiple of the width of the target text font.
The fifth column is the target control name, whose content is a control name the tester selects from the list predesigned in Excel (input box, click, drop-down box and other common control names). The sixth column is the input text; if the target control name in the fifth column is an input box, the text content to be entered must be filled in the input text column. Specific reference may be made to Table 1 below.
TABLE 1
[Table 1 is provided as an image in the original document; its columns are: interface serial number, target text, relative position of the target control and the target text, distance, target control name, and input text.]
The data in the table are read one by one according to the Excel test case: each row comprises a serial number, target text, the relative position of the target control and the target text, the distance, a control name and an input text, and each row of data is stored in a Map set. Then the capturing operation of step S102 is executed; specifically, the corresponding screen interface data is captured by the GUI automatic screen-capture method according to the interface serial number, in preparation for the subsequent text detection and recognition model predictions.
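The row-by-row reading described above can be sketched as follows. The original reads an Excel workbook; to keep this sketch self-contained with only the standard library, a CSV export with the same six columns stands in for the workbook, and the column keys are hypothetical names:

```python
import csv
import io

# Sketch of reading the case template row by row into one Map-like dict
# per row.  A CSV export with the same six columns stands in for the
# Excel workbook of the embodiment; the key names are assumptions.

COLUMNS = ["serial", "target_text", "relative_position",
           "distance", "control_name", "input_text"]

def read_cases(csv_text):
    reader = csv.reader(io.StringIO(csv_text))
    return [dict(zip(COLUMNS, row)) for row in reader]
```

Each resulting dict plays the role of the per-row Map set: the test driver iterates over the list, capturing the screen whenever the serial number advances.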
In one embodiment of the invention, in order to detect text position information accurately, the text detection network model is built by combining region proposals with semantic segmentation: on top of the region proposal method, the text segmentation image detection result is used to screen the text boxes, and a multi-model integration method combines the results of the region proposal prediction and the semantic segmentation prediction to obtain a more accurate text detection result. Specifically, as shown in fig. 2A, the text detection model building process includes:
s201, dividing image data in a scene text detection public data set into training set image data and verification set image data according to a preset proportion;
s202, extracting picture characteristics of the training set image data and the verification set image data through a convolutional neural network algorithm;
s203, performing binary text/non-text classification prediction for preset pixels and binary connection prediction for the neighbouring directions of the preset pixels on the picture features, obtaining a connected domain set according to the pixel predictions and the connection predictions, and obtaining text block instance segmentation data from the connected domain set;
s204, extracting a circumscribed rectangular box with direction information through OpenCV according to the text block instance segmentation data to obtain a text bounding box;
s205, a text detection model is constructed according to the training set image data, the text bounding box and the verification set image data.
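The dataset split of S201 can be sketched as follows; the 9:1 ratio and fixed seed are illustrative choices, not mandated by the method:

```python
import random

def split_dataset(samples, train_ratio=0.9, seed=42):
    """Randomly divide image samples into a training set and a verification set
    according to a preset proportion (S201)."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, val = split_dataset([f"img_{i:04d}.jpg" for i in range(1000)], 0.9)
print(len(train), len(val))  # 900 100
```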
Dividing the image data in the scene text detection public data set into the training set image data and the verification set image data according to the preset proportion further comprises: converting the training set image data and the verification set image data into the TFRecord file format. In practice, the construction of the text detection model may proceed as shown in fig. 2B, and each step is implemented as follows:
(1) Acquiring input image data of the text detection network model:
The scene text detection public data set IC15 is proportionally and randomly divided into a training set and a verification set, used for training and verifying the subsequent text detection network model. The pictures in the data set are converted into the TFRecord file format, which can be loaded into memory quickly and thus saves training time for the text detection network model.
(2) CNN extracts picture features:
Feature extraction is performed on the basis of VGG16. The input layer receives the preprocessed image data, followed by a combination of 5 convolutional-plus-pooling layer groups and 2 convolutional layers of 1 × 1. A 1 × 1 convolution is applied to the feature maps extracted from different layers, and the feature maps extracted each time are fused to obtain richer features; there are 4 such feature fusion operations in total. When the feature maps output by the two 1 × 1 convolutional layers take part in a fusion operation, if a pooling layer with a 2 × 2 pooling kernel lies between the feature maps, a 2 × 2 up-sampling operation must follow the later 1 × 1 convolutional layer to restore the feature map to the same size as the feature map to be fused.
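The fusion described above (a 1 × 1 convolution to align channel counts, 2 × 2 up-sampling to restore the spatial size halved by pooling, then element-wise addition) can be sketched in NumPy; the shapes and random weights are illustrative and do not reproduce the actual VGG16 configuration:

```python
import numpy as np

def conv1x1(feat, w):
    """A 1x1 convolution is a per-pixel linear map over channels.
    feat: (H, W, C_in), w: (C_in, C_out)."""
    return feat @ w

def upsample2x(feat):
    """2x2 nearest-neighbour up-sampling, restoring the spatial size
    halved by a 2x2 pooling layer."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def fuse(shallow, deep, w_shallow, w_deep):
    """Feature fusion: project both maps to the same channel count with
    1x1 convolutions, up-sample the deeper (smaller) map, then add."""
    return conv1x1(shallow, w_shallow) + upsample2x(conv1x1(deep, w_deep))

rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 8, 64))   # feature map before pooling
deep = rng.standard_normal((4, 4, 128))     # feature map after 2x2 pooling
fused = fuse(shallow, deep, rng.standard_normal((64, 2)), rng.standard_normal((128, 2)))
print(fused.shape)  # (8, 8, 2)
```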
(3) Text two-class prediction of pixels and connected text two-class prediction:
After all feature extraction operations are finished, text two-class prediction is performed for each pixel, together with a prediction of whether a text connection exists in each of the pixel's 8 neighborhood directions (up, down, left, right, upper-left, lower-left, upper-right and lower-right).
(4) Obtaining an example segmentation result:
A connected domain set is obtained from the text two-class predictions of the pixels and the connected text two-class predictions. Each element in the set is a connected domain representing one detected text instance, thereby yielding the text block instance segmentation result.
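The grouping of positively classified pixels into connected domains can be sketched with a union-find structure; a 4-neighbourhood is used here for brevity, whereas the method described above joins pixels across 8 directions:

```python
def connected_text_instances(mask):
    """Group positively classified text pixels into connected domains.
    mask: 2-D list of 0/1 pixel predictions. Returns a list of pixel
    sets, one per detected text instance."""
    parent = {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    def union(a, b):
        parent[find(a)] = find(b)

    h, w = len(mask), len(mask[0])
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                parent[(i, j)] = (i, j)
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                for di, dj in ((0, 1), (1, 0)):  # right and down neighbours
                    ni, nj = i + di, j + dj
                    if ni < h and nj < w and mask[ni][nj]:
                        union((i, j), (ni, nj))
    groups = {}
    for p in parent:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

mask = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
instances = connected_text_instances(mask)
print(len(instances))  # 2
```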
(5) Acquiring a text bounding box:
The circumscribed rectangular boxes with direction information for texts of different sizes are extracted using OpenCV's minAreaRect (minimum circumscribed rectangle); the specific format is ((x, y), (w, h), theta), where (x, y) is the center point coordinate, (w, h) is the width and height of the rectangle box, and theta is the rotation angle. A noise filtering operation is then performed, and the final text bounding box position is obtained through a union-find (disjoint-set) data structure. Compared with other text detection network models, the above process omits the bounding-box regression step, so the model converges faster during training. The text detection network is trained with an SGD (stochastic gradient descent) optimizer to obtain the trained text detection model.
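The box format returned by minAreaRect can be illustrated with a simplified axis-aligned stand-in; the real OpenCV call also recovers the rotation angle of slanted text, whereas this sketch fixes theta at 0:

```python
def bounding_rect(pixels):
    """Axis-aligned stand-in for OpenCV's minAreaRect: returns the box of
    a text instance in the same ((x, y), (w, h), theta) format, with
    theta fixed at 0. pixels is a set of (row, col) pairs."""
    xs = [x for _, x in pixels]   # column coordinates
    ys = [y for y, _ in pixels]   # row coordinates
    w, h = max(xs) - min(xs) + 1, max(ys) - min(ys) + 1
    cx, cy = min(xs) + (w - 1) / 2.0, min(ys) + (h - 1) / 2.0
    return ((cx, cy), (w, h), 0.0)

box = bounding_rect({(0, 0), (0, 1), (1, 0), (1, 3)})
print(box)  # ((1.5, 0.5), (4, 2), 0.0)
```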
On the basis of the text detection model, the captured screen interface can be sent as input to the text detection model to predict position information, and all text regions of the screen interface are cropped according to that position information. Specifically, in actual work, the whole screen interface is captured and the whole picture is used as input to the trained text detection model; the position information (the four vertex coordinates of each text region) of all text regions in the screen is obtained from the regions detected by the model, and each text region is then cropped into a picture according to its coordinate position, so that pictures of all text regions in the screen are obtained directly at one time. A flow chart of capturing pictures of all text regions in a screen using the text detection model is shown in fig. 3.
In order to improve text recognition accuracy, the text recognition model provided by the invention comprises a CNN convolution module, an RNN recurrent network module and a CTC transcription module. The CNN convolution module learns image convolution features, the RNN recurrent network module further extracts sequence features from the image convolution features, and the CTC module transcribes and merges the text sequence to obtain the text contained in the image. Because the text recognition model captures the contextual features of the text sequence, its recognition performance is superior to methods based on a plain convolutional neural network; text recognition accuracy is effectively improved and the model is more robust. Specifically, in an embodiment of the present invention, referring to fig. 4A, the text recognition model building process includes:
S401, dividing image data in a text recognition public data set into training set image data and verification set image data according to a preset proportion;
S402, extracting image convolution features of the training set image data and the verification set image data through a convolutional neural network algorithm;
S403, analyzing the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain a text character sequence probability;
S404, transcribing the text character sequence probability through a CTC algorithm to obtain text data;
S405, constructing a text recognition model according to the text data, the training set image data and the verification set image data.
Dividing the image data in the text recognition public data set into the training set image data and the verification set image data according to the preset proportion further comprises: normalizing the image data in the text recognition public data set into standard image data of a preset size; converting the standard image data into the TFRecord file format; and then dividing the standard image data into training set image data and verification set image data according to the preset proportion. In practice, the construction of the text recognition model may proceed as shown in fig. 4B, and each step is implemented as follows:
(1) Acquiring input image data of the text recognition network model:
The pictures of the text recognition public data set are normalized to a size of 32 × 256, i.e. a height of 32 pixels and a width of 256 pixels, and the RGB three-channel pictures are read as gray-scale images, which improves the training speed of the model. The label matrix is then processed and converted into a data format supported by TensorFlow. The data are randomly divided into a training set and a verification set in proportion, used for training and verifying the subsequent text recognition network model.
(2) Performing CNN convolution learning to obtain image convolution features:
The CNN convolution module is a small CNN network containing 7 convolutional layers and 4 pooling layers, with batch normalization applied twice in the middle, which avoids vanishing gradients, accelerates model convergence and shortens the training process. The image convolution features are obtained through this CNN learning.
(3) RNN recurrent network predicts the text character sequence probability:
The image convolution features learned by the CNN are input into the RNN recurrent network. The RNN adopts a multilayer LSTM structure to learn the bidirectional dependency of the feature sequence, extracts a feature vector sequence from the generated image features, and predicts the probability of the text character sequence.
(4) CTC transcribes the text:
Usually the predicted text character sequence cannot be aligned with the ground-truth text, so the predicted character probability sequence is transcribed into text using the CTC algorithm. The text recognition network is trained with an SGD optimizer to obtain the trained text recognition model.
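The transcription step can be illustrated with a greedy CTC decoder, which takes the arg-max label per time step, collapses consecutive repeats and drops blanks; the alphabet and probabilities below are illustrative, and production decoders often use beam search instead:

```python
def ctc_greedy_decode(prob_seq, alphabet, blank=0):
    """Greedy CTC transcription: arg-max label at each time step,
    collapse consecutive repeats, then drop the blank label."""
    best = [max(range(len(step)), key=step.__getitem__) for step in prob_seq]
    out, prev = [], None
    for label in best:
        if label != prev and label != blank:
            out.append(alphabet[label])
        prev = label
    return "".join(out)

# Alphabet index 0 is the CTC blank; per-step probabilities are made up.
alphabet = ["-", "o", "k"]
probs = [[0.1, 0.8, 0.1],    # o
         [0.1, 0.7, 0.2],    # o (repeat, collapsed)
         [0.9, 0.05, 0.05],  # blank
         [0.1, 0.1, 0.8],    # k
         [0.2, 0.1, 0.7]]    # k (repeat, collapsed)
print(ctc_greedy_decode(probs, alphabet))  # ok
```

A blank between two identical labels keeps both letters, which is how CTC distinguishes e.g. "oo" from "o".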
As will be understood by those skilled in the art, the above numerical values are exemplary data selected and set according to actual needs, and are not specific limitations on the manner of application; those skilled in the art may adjust them as needed, and the present invention is not limited thereto.
In an embodiment of the present invention, in step S103, the text region screenshot data is fed into the text recognition model obtained through training on the text recognition public data set, and the text region screenshots can be preprocessed so that the model completes text recognition more accurately. In actual operation, since the input picture size of the text recognition model is 32 × 256, the pictures cropped by the text detection model must be scaled and padded before prediction. If a picture's pixel width is greater than 256, the picture is scaled to 32 × 256. If the picture's pixel width is less than 256, the picture is scaled proportionally so that its height becomes 32; if the resulting width is still less than 256, the remaining pixels are padded with 0. The preprocessed text region pictures of the screen are sent into the trained text recognition model for prediction to obtain the text information contained in each picture. A flow chart of predicting text information in a text region screenshot using the text recognition model is shown in fig. 5. As will be understood by those skilled in the art, the above numerical values are exemplary data selected and set according to actual needs, and are not specific limitations on the manner of application; those skilled in the art may adjust them as needed, and the present invention is not limited thereto.
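The scale-and-pad preprocessing can be sketched as follows; nearest-neighbour resampling over pure-Python lists is used only to keep the sketch dependency-free, whereas an implementation would normally resize with an image library:

```python
def preprocess(img, target_h=32, target_w=256):
    """Scale a grayscale text-region crop (2-D list) so its height becomes
    target_h, then right-pad with zeros when the scaled width is below
    target_w; wider crops are squeezed to exactly target_h x target_w."""
    h, w = len(img), len(img[0])
    new_w = min(target_w, max(1, round(w * target_h / h)))

    def sample(i, j):  # nearest-neighbour lookup into the source image
        return img[min(h - 1, i * h // target_h)][min(w - 1, j * w // new_w)]

    return [[sample(i, j) if j < new_w else 0 for j in range(target_w)]
            for i in range(target_h)]

img = [[255] * 64 for _ in range(16)]   # a 16 x 64 text crop
out = preprocess(img)
print(len(out), len(out[0]))  # 32 256
```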
Therefore, after text information is obtained from the text recognition model, the matching step can be entered, i.e. the text information is matched against the target text. In actual work, the characters predicted from each text region picture are matched against the target texts in the case data; if the predicted text information is exactly equal to a target text, the coordinate position of that text region picture is recorded in one-to-one correspondence with the target text. The position of the target control is then further determined from the coordinates of the successfully matched text information and the relative position between it and the corresponding target control. Specifically, in actual operation, the distance between the target control and the target text, expressed as a multiple of the font height of the target text, is denoted as n; the upper-left, lower-left, upper-right and lower-right position coordinates of the successfully matched target text are denoted as (x, y), (x, s), (v, y) and (v, s); the height is denoted as h = s - y and the width as w = v - x. The relative position of the target control and the target text is then obtained from the data read in step 2.
When the relative position content is "middle", the position coordinate of the target control is recorded directly as the center position coordinate of the target text, i.e. ((x + v)/2, (y + s)/2); when it is "left", the position coordinate is recorded as (x - h × n, (y + s)/2); when it is "right", as (v + h × n, (y + s)/2); when it is "up", as ((x + v)/2, y - h × n); and when it is "down", as ((x + v)/2, s + h × n).
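The five relative-position rules above translate directly into code; the function name, argument order and the key names for the five positions are illustrative:

```python
def control_position(x, y, v, s, relative, n):
    """Derive the target-control coordinate from the matched text box.
    (x, y), (x, s), (v, y), (v, s) are the upper-left, lower-left,
    upper-right and lower-right corners; n is the control-to-text
    distance expressed as a multiple of the font height h = s - y."""
    h = s - y
    cx, cy = (x + v) / 2, (y + s) / 2   # center of the matched text
    return {
        "middle": (cx, cy),
        "left":   (x - h * n, cy),
        "right":  (v + h * n, cy),
        "up":     (cx, y - h * n),
        "down":   (cx, s + h * n),
    }[relative]

# Text box spanning x = 100..180, y = 50..70 (font height 20), n = 2
print(control_position(100, 50, 180, 70, "right", 2))  # (220, 60.0)
```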
After the specific position of the target control is obtained, the corresponding operation on the target control can be executed; in an embodiment of the invention, a GUI automation technique can be invoked to execute the corresponding instruction operation on the target control at the target control position according to the input information.
In summary, the overall technical flow of the interface control testing method based on deep learning provided by the present invention is shown in fig. 1B and subdivided into the following 10 steps:
step 1: and setting an Excel case template.
Step 2: and reading the data in the form item by item according to the case data filled by the testers.
And step 3: and intercepting a screen interface to be tested.
And 4, step 4: and building and training a text detection network to obtain a trained text detection model.
And 5: and taking the intercepted screen interface as input, sending the input to a text detection model for predicting position information, and capturing all text areas of the screen interface according to the position information.
Step 6: and building and training a text recognition network to obtain a trained text recognition model.
And 7: and sending the screenshot of the text area as input to a text recognition model for prediction.
And 8: and matching the text information predicted by the text recognition model with the target text contained in the data read in the step 2.
And step 9: and (3) determining the position of the target control according to the coordinate of the successfully matched target text and the relative position of the target control and the target text in the data read in the step (2).
Step 10: and testing the instruction action corresponding to the target control according to the position of the target control determined in the step 9.
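The flow of steps 2 through 10 can be sketched end to end as follows; `detect_text_regions`, `recognize_text` and `click_at` are hypothetical stand-ins for the trained detection model, the trained recognition model and the GUI automation call, injected here as stubs so the sketch is self-contained:

```python
def run_case(case, screenshot, detect_text_regions, recognize_text, click_at):
    """Match each detected text region against the case's target text,
    derive the control position from the match, and act on it."""
    for (x, y, v, s) in detect_text_regions(screenshot):                     # steps 4-5
        if recognize_text(screenshot, (x, y, v, s)) == case["target_text"]:  # steps 6-8
            h, n = s - y, case["distance_multiple"]                          # step 9
            pos = {"middle": ((x + v) / 2, (y + s) / 2),
                   "right": (v + h * n, (y + s) / 2)}[case["relative_position"]]
            click_at(pos, case["input_text"])                                # step 10
            return pos
    return None  # target text not found on this screen

clicks = []
pos = run_case(
    {"target_text": "Login", "relative_position": "right",
     "distance_multiple": 1, "input_text": ""},
    screenshot=None,
    detect_text_regions=lambda img: [(10, 10, 60, 30), (10, 40, 60, 60)],
    recognize_text=lambda img, box: "Login" if box == (10, 40, 60, 60) else "User",
    click_at=lambda p, text: clicks.append(p))
print(pos)  # (80, 50.0)
```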
The above steps can be appropriately combined or replaced according to actual needs by those skilled in the art, and the present invention is not further limited thereto.
Referring to fig. 6, the present invention further provides a system for testing an interface control based on deep learning, where the system includes a setting module, a text position detection module, a text information extraction module, a matching module, and a testing module; the setting module is used for acquiring an interface serial number, a target text, a target control position and input information of a test process of a control to be tested; the text position detection module is used for intercepting corresponding screen interface data according to the interface serial number and bringing the screen interface data into a text detection model obtained through scene text detection public data set training to obtain text position information; the text information extraction module is used for capturing a corresponding area in the screen interface data according to the text position information to obtain text area capture data, and bringing the text area capture data into a text recognition model obtained through training of a text recognition public data set to obtain text information; the matching module is used for matching the text information with the target text and obtaining the corresponding target control position and input information according to the target text obtained by matching; and the test module is used for executing corresponding instruction operation on the target control corresponding to the position of the target control according to the input information.
In the above embodiment, the setting module further includes: generating a test case according to the interface serial number, the target text, the target control position and the input information; generating case template data according to one or more test cases; wherein the target control position comprises: the relative position and the shortest distance of the target control and the target text. The test module includes: and calling a GUI automation technology according to the input information to execute corresponding instruction operation on the target control corresponding to the position of the target control.
In an embodiment of the present invention, the text position detection module further includes a text detection model construction unit, and the text detection model construction unit is configured to: divide image data in a scene text detection public data set into training set image data and verification set image data according to a preset proportion; extract picture features of the training set image data and the verification set image data through a convolutional neural network algorithm; perform text two-class prediction of preset pixels and text two-class prediction of connection in the adjacent directions of the preset pixels on the picture features, obtain a connected domain set according to the text two-class prediction of the preset pixels and the text two-class prediction of the connection, and obtain text block instance segmentation data according to the connected domain set; extract a circumscribed rectangular box with direction information through OpenCV according to the text block instance segmentation data to obtain a text bounding box; and construct a text detection model according to the training set image data, the text bounding box and the verification set image data. The text detection model construction unit may further be configured to convert the training set image data and the verification set image data into the TFRecord file format when constructing the text detection model.
In an embodiment of the present invention, the text information extraction module further includes a text recognition model construction unit, and the text recognition model construction unit is configured to: divide image data in the text recognition public data set into training set image data and verification set image data according to a preset proportion; extract image convolution features of the training set image data and the verification set image data through a convolutional neural network algorithm; analyze the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain a text character sequence probability; transcribe the text character sequence probability through a CTC algorithm to obtain text data; and construct a text recognition model according to the text data, the training set image data and the verification set image data. The text recognition model construction unit may further be configured to: normalize the image data in the text recognition public data set into standard image data of a preset size; convert the standard image data into the TFRecord file format; and then divide the standard image data into training set image data and verification set image data according to the preset proportion before constructing the text recognition model.
The beneficial technical effects of the invention are as follows: workers can test interface controls simply by filling in test case data, without writing test scripts; when the software interface design changes, if the relative position of the target control under test and the target text is unchanged and the order and logic of the tested controls are also unchanged, testers can directly reuse the previous case data to test the changed interface; the deep-learning text detection and text recognition algorithms improve the accuracy of locating the target text position on the interface and reduce the search time, so that the interface target control can be found accurately and quickly without being affected by low-resolution displays; the labor and time cost of testing are further reduced, and product quality is improved.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the computer program.
The present invention also provides a computer-readable storage medium storing a computer program for executing the above method.
As shown in fig. 7, the electronic device 600 may further include: communication module 110, input unit 120, audio processing unit 130, display 160, power supply 170. It is noted that the electronic device 600 does not necessarily include all of the components shown in fig. 7; furthermore, the electronic device 600 may also comprise components not shown in fig. 7, which may be referred to in the prior art.
As shown in fig. 7, the central processor 100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device; the central processor 100 receives input and controls the operation of the various components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store information relating to failures, as well as programs for processing such information, and the central processing unit 100 may execute the programs stored in the memory 140 to realize information storage or processing.
The input unit 120 provides input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used to display objects to be displayed, such as images or characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 140 may be a solid state memory such as a Read Only Memory (ROM), a Random Access Memory (RAM), a SIM card, or the like. There may also be a memory that retains information even when powered off, can be selectively erased, and can be rewritten with new data; such a memory is sometimes called an EPROM or the like. The memory 140 may also be some other type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage section 142 used to store application programs and function programs, or a flow for executing the operations of the electronic device 600 by the central processing unit 100.
The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, address book application, etc.).
The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and receive audio input from the microphone 132 to implement general telecommunications functions. Audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, an audio processor 130 is also coupled to the central processor 100, so that recording on the local can be enabled through a microphone 132, and so that sound stored on the local can be played through a speaker 131.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (18)

1. An interface control testing method based on deep learning is characterized by comprising the following steps:
acquiring an interface serial number, a target text, a target control position and input information of a test process of a control to be tested;
intercepting corresponding screen interface data according to the interface serial number, and bringing the screen interface data into a text detection model obtained through training of a scene text detection public data set to obtain text position information;
screenshot is carried out on the corresponding area in the screen interface data according to the text position information to obtain text area screenshot data, and the text area screenshot data is brought into a text recognition model obtained through text recognition public data set training to obtain text information;
matching the text information with the target text, and obtaining a corresponding target control position and input information according to the target text obtained by matching;
and executing corresponding instruction operation on the target control corresponding to the position of the target control according to the input information.
2. The interface control testing method based on deep learning of claim 1, wherein obtaining the interface serial number, the target text, the target control position and the input information of the test process of the control to be tested further comprises: generating a test case according to the interface serial number, the target text, the target control position and the input information; case template data is generated from one or more test cases.
3. The deep learning based interface control testing method of claim 2, wherein the target control position comprises: the relative position and the shortest distance of the target control and the target text.
4. The interface control testing method based on deep learning of claim 1, wherein intercepting the corresponding screen interface data according to the interface sequence number comprises: and intercepting corresponding screen interface data by a GUI automatic screen capturing method according to the interface serial number.
5. The deep learning-based interface control testing method according to claim 1, wherein the text detection model building process comprises:
dividing image data in a scene text detection public data set into training set image data and verification set image data according to a preset proportion;
extracting picture characteristics of the training set image data and the verification set image data through a convolutional neural network algorithm;
performing text two-class prediction of preset pixels and text two-class prediction of connection in the adjacent directions of the preset pixels on the picture features, obtaining a connected domain set according to the text two-class prediction of the preset pixels and the text two-class prediction of the connection, and obtaining text block instance segmentation data according to the connected domain set;
extracting a circumscribed rectangular box with direction information through OpenCV according to the text block instance segmentation data to obtain a text bounding box;
and constructing a text detection model according to the training set image data, the text bounding box and the verification set image data.
6. The interface control testing method based on deep learning of claim 5, wherein dividing the image data in the scene text detection public data set into training set image data and verification set image data according to a preset proportion further comprises: converting the training set image data and the verification set image data into a TFRecord file format.
7. The deep learning-based interface control testing method according to claim 1, wherein the text recognition model building process comprises:
dividing image data in the text recognition public data set into training set image data and verification set image data according to a preset proportion;
extracting image convolution features of the training set image data and the verification set image data through a convolutional neural network algorithm;
analyzing the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain text character sequence probabilities;
transcribing the text character sequence probabilities through a CTC algorithm to obtain text data;
and constructing a text recognition model according to the text data, the training set image data and the verification set image data.
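The CTC transcription step in claim 7 (turning per-frame character sequence probabilities into text) can be sketched with a minimal greedy decoder: collapse repeated labels, then drop the blank symbol. The blank index, the toy character table, and the probability matrix are invented for illustration.

```python
import numpy as np

BLANK = 0  # index of the CTC blank label (assumption)
CHARS = {1: "c", 2: "a", 3: "t"}  # toy character table, invented

def ctc_greedy_decode(probs):
    """probs: (timesteps, num_labels) per-frame label probabilities."""
    best = probs.argmax(axis=1)  # most likely label per frame
    out, prev = [], None
    for label in best:
        if label != prev and label != BLANK:  # merge repeats, skip blanks
            out.append(label)
        prev = label
    return "".join(CHARS[l] for l in out)

# Frame-wise probabilities whose best path is "c c blank a t t".
probs = np.array([
    [0.1, 0.8, 0.05, 0.05],
    [0.1, 0.8, 0.05, 0.05],
    [0.9, 0.03, 0.03, 0.04],
    [0.1, 0.05, 0.8, 0.05],
    [0.1, 0.05, 0.05, 0.8],
    [0.1, 0.05, 0.05, 0.8],
])
print(ctc_greedy_decode(probs))  # → cat
```

Collapsing "cc" and removing the blank recovers "cat", which is exactly the many-to-one mapping the CTC transcription layer implements.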
8. The interface control testing method based on deep learning of claim 7, wherein dividing image data in a text recognition public data set into training set image data and verification set image data according to a preset proportion further comprises:
normalizing the image data in the text recognition public data set into standard image data with a preset size;
and after converting the standard image data into the TFRecord file format, dividing the standard image data into training set image data and verification set image data according to the preset proportion.
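A minimal sketch of the data set split described in claims 7–8 (dividing samples into training and verification sets by a preset proportion). The file names, the 9:1 ratio, and the shuffle seed are assumptions; the size normalization and TFRecord serialization steps are omitted here.

```python
import random

TRAIN_RATIO = 0.9  # preset proportion, assumption

def split_dataset(samples, ratio=TRAIN_RATIO, seed=42):
    """Shuffle deterministically, then split by the preset proportion."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * ratio)
    return samples[:cut], samples[cut:]  # (training set, verification set)

# Invented sample file names standing in for the public data set images.
images = [f"img_{i:04d}.png" for i in range(1000)]
train, val = split_dataset(images)
print(len(train), len(val))  # → 900 100
```

The deterministic seed makes the split reproducible, so the same training/verification partition is used across model-building runs.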
9. The interface control testing method based on deep learning of claim 1, wherein executing the corresponding instruction operation on the target control corresponding to the target control position according to the input information comprises:
calling a GUI automation technology according to the input information to execute the corresponding instruction operation on the target control corresponding to the target control position.
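A sketch of the instruction dispatch described in claim 9. The patent does not name a specific GUI automation library; a library such as pyautogui (`pyautogui.click(x, y)`, `pyautogui.typewrite(text)`) would be one choice. A recording stub backend is used here so the control flow is runnable anywhere, and the coordinates and input text are invented.

```python
def run_instruction(backend, control_pos, input_info):
    """Click the target control, then type any input text the step carries."""
    x, y = control_pos
    backend.click(x, y)  # e.g. pyautogui.click(x, y)
    if input_info.get("text"):
        backend.typewrite(input_info["text"])  # e.g. pyautogui.typewrite(...)

class RecordingBackend:
    """Stand-in backend that records calls instead of moving the mouse."""
    def __init__(self):
        self.calls = []
    def click(self, x, y):
        self.calls.append(("click", x, y))
    def typewrite(self, text):
        self.calls.append(("type", text))

backend = RecordingBackend()
run_instruction(backend, (120, 340), {"text": "hello"})
print(backend.calls)  # → [('click', 120, 340), ('type', 'hello')]
```

Injecting the backend keeps the dispatch testable without a display, while a production run would pass the real GUI automation module instead.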
10. An interface control testing system based on deep learning is characterized by comprising a setting module, a text position detection module, a text information extraction module, a matching module and a testing module;
the setting module is used for acquiring an interface serial number, a target text, a target control position and input information of a test process of a control to be tested;
the text position detection module is used for intercepting corresponding screen interface data according to the interface serial number and inputting the screen interface data into a text detection model trained on a scene text detection public data set to obtain text position information;
the text information extraction module is used for capturing a corresponding area in the screen interface data according to the text position information to obtain text area capture data, and inputting the text area capture data into a text recognition model trained on a text recognition public data set to obtain text information;
the matching module is used for matching the text information with the target text and obtaining the corresponding target control position and input information according to the target text obtained by matching;
and the test module is used for executing corresponding instruction operation on the target control corresponding to the position of the target control according to the input information.
11. The deep learning based interface control testing system of claim 10, wherein the setup module further comprises: generating a test case according to the interface serial number, the target text, the target control position and the input information; generating case template data according to one or more test cases; wherein the target control position comprises: the relative position and the shortest distance of the target control and the target text.
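The case template data in claim 11 can be sketched as a simple record type: each test case bundles the interface serial number, target text, target control position (relative position plus shortest distance to the target text), and input information. All field names and sample values below are invented for illustration.

```python
from dataclasses import dataclass, asdict

@dataclass
class TestCase:
    interface_no: int        # interface serial number
    target_text: str         # text label the control is matched against
    relative_position: str   # relative position of control to target text
    shortest_distance: int   # shortest distance to target text (pixels, assumption)
    input_info: str          # input executed on the matched control

def make_template(cases):
    """Case template data: one plain record per test case."""
    return [asdict(c) for c in cases]

template = make_template([
    TestCase(1, "Username", "right", 12, "alice"),
    TestCase(1, "Password", "right", 12, "secret"),
])
print(len(template))  # → 2
```

Serializing the cases to plain dictionaries keeps the template easy to store and replay as a test sequence.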
12. The deep learning based interface control testing system of claim 10, wherein the text position detection module further comprises a text detection model construction unit, and the text detection model construction unit is configured to:
dividing image data in a scene text detection public data set into training set image data and verification set image data according to a preset proportion;
extracting picture features of the training set image data and the verification set image data through a convolutional neural network algorithm;
performing, on the picture features, binary text/non-text classification prediction for preset pixels and binary link prediction for the connections between each preset pixel and its adjacent directions; obtaining a connected domain set according to the pixel predictions and the link predictions; and obtaining text block instance segmentation data according to the connected domain set;
extracting a minimum bounding rectangle with orientation information through OpenCV according to the text block instance segmentation data to obtain a text bounding box;
and constructing a text detection model according to the training set image data, the text bounding box and the verification set image data.
13. The deep learning based interface control testing system of claim 12, wherein the text detection model building unit further comprises: converting the training set image data and the verification set image data into the TFRecord file format to construct the text detection model.
14. The deep learning based interface control testing system of claim 10, wherein the text information extraction module further comprises a text recognition model construction unit, and the text recognition model construction unit is configured to:
dividing image data in the text recognition public data set into training set image data and verification set image data according to a preset proportion;
extracting image convolution features of the training set image data and the verification set image data through a convolutional neural network algorithm;
analyzing the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain text character sequence probabilities;
transcribing the text character sequence probabilities through a CTC algorithm to obtain text data;
and constructing a text recognition model according to the text data, the training set image data and the verification set image data.
15. The deep learning based interface control testing system of claim 14, wherein the text recognition model building unit further comprises:
normalizing the image data in the text recognition public data set into standard image data with a preset size;
and after converting the standard image data into the TFRecord file format, dividing the standard image data into training set image data and verification set image data according to the preset proportion, and then constructing the text recognition model.
16. The deep learning based interface control testing system of claim 10, wherein the testing module comprises: calling a GUI automation technology according to the input information to execute the corresponding instruction operation on the target control corresponding to the target control position.
17. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 9 when executing the computer program.
18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 9.
CN202010793876.1A 2020-08-10 2020-08-10 Interface control testing method and system based on deep learning Active CN111930622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010793876.1A CN111930622B (en) 2020-08-10 2020-08-10 Interface control testing method and system based on deep learning


Publications (2)

Publication Number Publication Date
CN111930622A true CN111930622A (en) 2020-11-13
CN111930622B CN111930622B (en) 2023-10-13

Family

ID=73307102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010793876.1A Active CN111930622B (en) 2020-08-10 2020-08-10 Interface control testing method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN111930622B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871316A (en) * 2019-01-10 2019-06-11 北京云测信息技术有限公司 A kind of control recognition methods and device
CN109947650A (en) * 2019-03-20 2019-06-28 广州云测信息技术有限公司 Script step process methods, devices and systems
CN110245681A (en) * 2019-05-10 2019-09-17 北京奇艺世纪科技有限公司 Model generating method, application interface method for detecting abnormality, device, terminal device and computer readable storage medium
CN110309073A (en) * 2019-06-28 2019-10-08 上海交通大学 Mobile applications user interface mistake automated detection method, system and terminal
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110442521A (en) * 2019-08-02 2019-11-12 腾讯科技(深圳)有限公司 Control element detection method and device


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114564141A (en) * 2020-11-27 2022-05-31 华为技术有限公司 Text extraction method and device
CN112818987A (en) * 2021-01-29 2021-05-18 浙江嘉科电子有限公司 Method and system for identifying and correcting screen display content of bus electronic stop board
CN112884441A (en) * 2021-03-02 2021-06-01 岭东核电有限公司 Test result recording method and device, computer equipment and storage medium
CN113657444A (en) * 2021-07-13 2021-11-16 珠海金智维信息科技有限公司 Interface element identification method and system
CN116403199A (en) * 2023-06-07 2023-07-07 杭州实在智能科技有限公司 Screen icon semantic recognition method and system based on deep learning
CN116403199B (en) * 2023-06-07 2023-09-08 杭州实在智能科技有限公司 Screen icon semantic recognition method and system based on deep learning
CN116662211A (en) * 2023-07-31 2023-08-29 四川弘和数智集团有限公司 Display interface testing method, device, equipment and medium
CN116662211B (en) * 2023-07-31 2023-11-03 四川弘和数智集团有限公司 Display interface testing method, device, equipment and medium

Also Published As

Publication number Publication date
CN111930622B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN111930622B (en) Interface control testing method and system based on deep learning
WO2020133442A1 (en) Text recognition method and terminal device
WO2021208617A1 (en) Method and apparatus for recognizing station entering and exiting, terminal, and storage medium
CN114549993B (en) Method, system and device for grading line segment image in experiment and readable storage medium
CN111598863A (en) Defect detection method, device, equipment and readable storage medium
US20230334880A1 (en) Hot word extraction method and apparatus, electronic device, and medium
CN112396032A (en) Writing detection method and device, storage medium and electronic equipment
CN112308143A (en) Sample screening method, system, equipment and medium based on diversity
CN112183250A (en) Character recognition method and device, storage medium and electronic equipment
CN113469148B (en) Text erasing method, model training method, device and storage medium
CN113297986A (en) Handwritten character recognition method, device, medium and electronic equipment
CN111709338A (en) Method and device for detecting table and training method of detection model
CN112883953B (en) Card recognition device and method based on joint learning
CN113128470B (en) Stroke recognition method and device, readable medium and electronic equipment
CN113191251B (en) Method and device for detecting stroke order, electronic equipment and storage medium
US20220207724A1 (en) Method of determining a distribution of stem cells in a cell image, electronic device, and storage medium
CN112801960B (en) Image processing method and device, storage medium and electronic equipment
CN111291758B (en) Method and device for recognizing seal characters
CN113837157A (en) Topic type identification method, system and storage medium
CN113849415A (en) Control testing method and device, storage medium and electronic equipment
CN113221718A (en) Formula identification method and device, storage medium and electronic equipment
CN113792567A (en) Data processing method and training method of data processing model
CN113033539A (en) Calligraphy practicing grid detection method and device, readable medium and electronic equipment
CN111046243A (en) Method, equipment and medium for configuring SONIC (self-organizing network) based on network topology map
CN112309180A (en) Text processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant