CN111930622B - Interface control testing method and system based on deep learning


Info

Publication number: CN111930622B
Application number: CN202010793876.1A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN111930622A
Inventors: 吴思奥, 张浩, 傅媛媛, 丘士丹
Assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Legal status: Active (granted)


Classifications

    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G06F16/334 Query execution
    • G06F16/5846 Image retrieval using metadata automatically derived from the content, using extracted text
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition


Abstract

The invention provides an interface control testing method and system based on deep learning. The method comprises the following steps: obtaining the interface serial number, target text, target control position and input information of the control under test during testing; capturing the corresponding screen interface data according to the interface serial number, and feeding the screen interface data into a text detection model trained on a public scene text detection dataset to obtain text position information; cropping the corresponding regions of the screen interface data according to the text position information to obtain text region screenshot data, and feeding the text region screenshot data into a text recognition model trained on a public text recognition dataset to obtain text information; matching the text information against the target text, and obtaining the corresponding target control position and input information from the matched target text; and executing the corresponding instruction operation on the target control corresponding to the target control position according to the input information.

Description

Interface control testing method and system based on deep learning
Technical Field
The invention relates to the field of interface control testing, in particular to an interface control testing method and system based on deep learning.
Background
To ensure the quality of a software product, a great deal of testing is performed before release. In the prior art, testers test interfaces by means of automated test tools or other means to save labor and reduce time cost. One of the most important tasks in automated interface testing is obtaining the control object to be manipulated, such as a text box, click box, or drop-down box. Accurately and rapidly acquiring the target control object is the key to automated testing of interface controls.
Conventional interface control testing generally falls into two methods. In the first, pictures of the interface controls are recorded and stored under a specific path, and test scripts are then written to locate the target control's position in the interface and test the target control's corresponding action instructions. In the second, tests are driven directly by test cases: rough position coordinates of the control and the action instructions to operate it are filled into the case data, together with the position coordinates and content of the target text surrounding the target control. A text box of a given size is then traversed pixel by pixel within those coordinates to search for the target text picture; the picture obtained in each traversal is sent to an OCR recognition service, the recognition result is matched against the target text around the control to find the position coordinates of the text surrounding the target control, and the target control is finally located by traversing from those text position coordinates according to the relative position, filled into the case, between the control and its surrounding target text.
The first method has two disadvantages. 1. When the interface changes, the recorded interface control pictures and the written test scripts must change with it. 2. When the interface is displayed on a lower-resolution display, a control picture originally recorded at a higher resolution may not be recognized, in which case the script must be re-recorded. Repeated re-recording and modification of scripts wastes labor and time and makes testing inefficient.
The second method also has two disadvantages. 1. The test case must be filled with the start-point and end-point coordinates of the target text position around the target control; in the actual testing process, however, the tester can hardly know the target text's position in the interface, so the start coordinates can only be estimated very small and the end coordinates very large, which increases the number of picture traversals and makes them very time-consuming. 2. When searching for the target text picture around the control, a text box of a specified size must be traversed pixel by pixel, and if no target text picture is found at that size, the text box size must be changed and the traversal repeated. This approach is not intelligent and is inefficient; if the specified target text box is too small or too large, there is a high probability that the target text picture will not be found.
Disclosure of Invention
The invention aims to provide an interface control testing method and system based on deep learning, which improve the accuracy and usability of interface control identification, thereby saving testing labor and time cost.
In order to achieve the above purpose, the deep-learning-based interface control testing method provided by the invention specifically comprises the following steps: obtaining the interface serial number, target text, target control position and input information of the control under test during testing; capturing the corresponding screen interface data according to the interface serial number, and feeding the screen interface data into a text detection model trained on a public scene text detection dataset to obtain text position information; cropping the corresponding regions of the screen interface data according to the text position information to obtain text region screenshot data, and feeding the text region screenshot data into a text recognition model trained on a public text recognition dataset to obtain text information; matching the text information against the target text, and obtaining the corresponding target control position and input information from the matched target text; and executing the corresponding instruction operation on the target control at the target control position according to the input information.
In the above deep-learning-based interface control testing method, preferably, obtaining the interface serial number, target text, target control position and input information of the control under test further includes: generating a test case from the interface serial number, target text, target control position and input information; and generating case template data from one or more test cases.
In the above deep-learning-based interface control testing method, preferably, the target control position includes: the relative position and the shortest distance between the target control and the target text.
In the above deep-learning-based interface control testing method, preferably, capturing the corresponding screen interface data according to the interface serial number includes: capturing the corresponding screen interface data through a GUI automation screenshot method according to the interface serial number.
In the above deep-learning-based interface control testing method, preferably, the text detection model building process includes: dividing the image data in the public scene text detection dataset into training set image data and verification set image data according to a preset proportion; extracting picture features of the training set image data and the verification set image data through a convolutional neural network algorithm; performing, on the picture features, text/non-text binary classification prediction for preset pixels and link binary classification prediction for the adjacent directions of the preset pixels, obtaining a set of connected domains from these two predictions, and obtaining text block instance segmentation data from the connected domain set; extracting a bounding rectangle with direction information through OpenCV from the text block instance segmentation data to obtain a text bounding box; and constructing a text detection model from the training set image data, the text bounding box and the verification set image data.
In the above deep-learning-based interface control testing method, preferably, dividing the image data in the public scene text detection dataset into training set image data and verification set image data according to a preset proportion further includes: converting the training set image data and the verification set image data into the tfrecord file format.
In the above deep-learning-based interface control testing method, preferably, the text recognition model construction flow includes: dividing the image data in the public text recognition dataset into training set image data and verification set image data according to a preset proportion; extracting image convolution features of the training set image data and the verification set image data through a convolutional neural network algorithm; analyzing the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain text character sequence probabilities; transcribing the text character sequence probabilities through the CTC algorithm to obtain text data; and constructing a text recognition model from the text data, the training set image data and the verification set image data.
In the above deep-learning-based interface control testing method, preferably, dividing the image data in the public text recognition dataset into training set image data and verification set image data according to a preset proportion further includes: normalizing the image data in the public text recognition dataset into standard image data of a preset size; and, after converting the standard image data into the tfrecord file format, dividing it into training set image data and verification set image data according to the preset proportion.
In the above deep-learning-based interface control testing method, preferably, executing the corresponding instruction operation on the target control corresponding to the target control position according to the input information includes: invoking GUI automation technology according to the input information to execute the corresponding instruction operation on the target control at the target control position.
The invention also provides a deep-learning-based interface control testing system, which comprises a setting module, a text position detection module, a text information extraction module, a matching module and a testing module. The setting module is used to obtain the interface serial number, target text, target control position and input information of the control under test during testing; the text position detection module is used to capture the corresponding screen interface data according to the interface serial number and feed it into a text detection model trained on a public scene text detection dataset to obtain text position information; the text information extraction module is used to crop the corresponding regions of the screen interface data according to the text position information to obtain text region screenshot data, and to feed the text region screenshot data into a text recognition model trained on a public text recognition dataset to obtain text information; the matching module is used to match the text information against the target text and obtain the corresponding target control position and input information from the matched target text; and the testing module is used to execute the corresponding instruction operation on the target control at the target control position according to the input information.
In the above deep-learning-based interface control testing system, preferably, the setting module further includes: generating a test case from the interface serial number, target text, target control position and input information; and generating case template data from one or more test cases; wherein the target control position includes the relative position and the shortest distance between the target control and the target text.
In the above deep-learning-based interface control testing system, preferably, the text position detection module further includes a text detection model construction unit, which is configured to: divide the image data in the public scene text detection dataset into training set image data and verification set image data according to a preset proportion; extract picture features of the training set image data and the verification set image data through a convolutional neural network algorithm; perform, on the picture features, text/non-text binary classification prediction for preset pixels and link binary classification prediction for the adjacent directions of the preset pixels, obtain a set of connected domains from these two predictions, and obtain text block instance segmentation data from the connected domain set; extract a bounding rectangle with direction information through OpenCV from the text block instance segmentation data to obtain a text bounding box; and construct a text detection model from the training set image data, the text bounding box and the verification set image data.
In the above deep-learning-based interface control testing system, preferably, the text detection model construction unit further includes: converting the training set image data and the verification set image data into the tfrecord file format before constructing the text detection model.
In the above deep-learning-based interface control testing system, preferably, the text information extraction module further includes a text recognition model construction unit, which is configured to: divide the image data in the public text recognition dataset into training set image data and verification set image data according to a preset proportion; extract image convolution features of the training set image data and the verification set image data through a convolutional neural network algorithm; analyze the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain text character sequence probabilities; transcribe the text character sequence probabilities through the CTC algorithm to obtain text data; and construct a text recognition model from the text data, the training set image data and the verification set image data.
In the above deep-learning-based interface control testing system, preferably, the text recognition model construction unit further includes: normalizing the image data in the public text recognition dataset into standard image data of a preset size, converting the standard image data into the tfrecord file format, and dividing it into training set image data and verification set image data according to the preset proportion before constructing the text recognition model.
In the above deep-learning-based interface control testing system, preferably, the testing module includes: invoking GUI automation technology according to the input information to execute the corresponding instruction operation on the target control at the target control position.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above method when executing the computer program.
The present invention also provides a computer readable storage medium storing a computer program for executing the above method.
The beneficial technical effects of the invention are as follows: testers can test interface controls merely by filling in test case data, without writing test scripts; when the software interface design changes, if the relative position of the tested target control and the target text does not change and the order and logic of the tested controls are unchanged, testers can directly reuse the previous case data to test the changed interface; the deep-learning text detection and text recognition algorithms improve the accuracy of locating the target text in the interface and reduce the search time, so that the target interface control can be found accurately and rapidly without being affected by low-resolution displays; this further saves testing labor and time and improves product quality.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate and together with the description serve to explain the application. In the drawings:
FIG. 1A is a flow chart of a method for testing an interface control based on deep learning according to an embodiment of the present application;
FIG. 1B is a schematic diagram of an application flow of a deep learning-based interface control testing method according to an embodiment of the present application;
FIG. 2A is a schematic diagram of a text detection model according to an embodiment of the present application;
FIG. 2B is a schematic diagram illustrating a construction principle of a text detection model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a text region screenshot data acquisition process according to an embodiment of the present application;
FIG. 4A is a schematic diagram of a text recognition model according to an embodiment of the present application;
FIG. 4B is a schematic diagram illustrating a text recognition model according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of text information recognition by a text recognition model according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a deep learning-based interface control testing system according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented accordingly. It should be noted that, as long as no conflict arises, the embodiments of the present invention and the features of each embodiment may be combined with each other, and the resulting technical solutions all fall within the protection scope of the present invention.
Additionally, the steps illustrated in the flowcharts of the figures may be performed in a computer system as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order than presented here.
Referring to fig. 1A, the method for testing an interface control based on deep learning provided by the present invention specifically includes:
S101, obtaining the interface serial number, target text, target control position and input information of the control under test during testing;
S102, capturing the corresponding screen interface data according to the interface serial number, and feeding the screen interface data into a text detection model trained on a public scene text detection dataset to obtain text position information;
S103, cropping the corresponding regions of the screen interface data according to the text position information to obtain text region screenshot data, and feeding the text region screenshot data into a text recognition model trained on a public text recognition dataset to obtain text information;
S104, matching the text information against the target text, obtaining the corresponding target control position and input information from the matched target text, and executing the corresponding instruction operation on the target control at the target control position according to the input information.
In the above embodiment, obtaining the interface serial number, target text, target control position and input information of the control under test during testing further includes: generating a test case from the interface serial number, target text, target control position and input information; and generating case template data from one or more test cases, wherein the target control position includes the relative position and the shortest distance between the target control and the target text. In actual work, the case template data in this embodiment may take the form of an Excel sheet. The first column is the interface number, an Arabic numeral: the interface number of the first row of case data is 1, and the number of each subsequent row depends on whether the interface changes after the previous row's target control is operated; if the interface does not change, the row's number is the same as the previous row's, and if it does change, the row's number is the previous row's number plus 1. The second column is the target text, filled with the target text to be recognized in the test interface. The third column is the relative position of the target control to the target text, filled with the direction (up, down, left or right) of the control under test relative to the target text, or 'same' when the target text is shown on the target control itself; to prevent testers from filling in arbitrary directions, this column is designed as a selection rather than free input, which facilitates subsequently locating the control in the designated direction. The fourth column is the distance, filled with the nearest distance from the target control to the target text as a multiple of the target text's font width. The fifth column is the target control name, also designed as a selection in Excel (commonly used control names such as input box, click, and drop-down box). The sixth column is the input text: if the fifth column's target control name is an input box, the text content to be entered is filled into this column. Specific reference is made to Table 1 below.
TABLE 1 — columns: interface number; target text; relative position of target control to target text; distance; target control name; input text
According to the Excel test cases, the data in the table are read row by row: each row comprises a serial number, target text, the relative position of the target control to the target text, the distance, a control name and input text, and each row's data are stored in a Map collection. The capture operation of step S102 is then executed: the corresponding screen interface data are captured through a GUI automation screenshot method according to the interface serial number, in preparation for feeding the subsequent text detection model; a minimal sketch of these two steps follows below.
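For illustration, reading the case rows into Map-like dictionaries and capturing the screen can be sketched as follows; the openpyxl and pyautogui libraries, the file name, and the field names are assumptions, since the patent does not name the libraries it uses.

```python
# Sketch only: openpyxl/pyautogui and the column layout of Table 1
# are assumptions, not named by the patent.
import openpyxl
import pyautogui

FIELDS = ["interface_no", "target_text", "relative_position",
          "distance", "control_name", "input_text"]

def read_cases(xlsx_path):
    sheet = openpyxl.load_workbook(xlsx_path).active
    rows = []
    for row in sheet.iter_rows(min_row=2, values_only=True):  # skip header
        rows.append(dict(zip(FIELDS, row)))  # one Map per case row
    return rows

def capture_screen(png_path="screen.png"):
    pyautogui.screenshot(png_path)  # GUI automation screenshot
    return png_path
```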
In an embodiment of the invention, in order to detect text position information accurately, a text detection network model is built by combining region proposal and semantic segmentation: on the basis of the region-proposal method, the text segmentation map detection results are used to filter the text boxes, and a multi-model integration method combines the region-proposal and semantic-segmentation predictions to obtain a more accurate text detection result. Referring specifically to fig. 2A, the text detection model building process includes:
S201, dividing the image data in the public scene text detection dataset into training set image data and verification set image data according to a preset proportion;
S202, extracting picture features of the training set image data and the verification set image data through a convolutional neural network algorithm;
S203, performing, on the picture features, text/non-text binary classification prediction for preset pixels and link binary classification prediction for the adjacent directions of the preset pixels, obtaining a set of connected domains from these two predictions, and obtaining text block instance segmentation data from the connected domain set;
S204, extracting a bounding rectangle with direction information through OpenCV from the text block instance segmentation data to obtain a text bounding box;
S205, constructing a text detection model from the training set image data, the text bounding box and the verification set image data.
Dividing the image data in the public scene text detection dataset into training set image data and verification set image data according to the preset proportion may further include converting the training set image data and the verification set image data into the tfrecord file format. In practical implementation, the construction of the text detection model may proceed as follows, with reference to fig. 2B:
(1) Text detection network model input image data acquisition:
The public scene text detection dataset IC15 is randomly divided, in proportion, into a training set and a verification set for training and verifying the subsequent text detection network model. The pictures in the dataset are converted into the tfrecord file format, which can be loaded into memory rapidly, saving training time for the text detection network model.
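For illustration, such a conversion can be sketched with the TensorFlow TFRecord API as follows; the feature names and the (image path, label bytes) input format are assumptions.

```python
# Sketch: write (image, label) pairs to a TFRecord file; the feature
# names "image" and "label" are illustrative assumptions.
import tensorflow as tf

def write_tfrecord(examples, out_path):
    with tf.io.TFRecordWriter(out_path) as writer:
        for image_path, label_bytes in examples:
            with tf.io.gfile.GFile(image_path, "rb") as f:
                image_bytes = f.read()
            feature = {
                "image": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[image_bytes])),
                "label": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[label_bytes])),
            }
            example = tf.train.Example(
                features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())
```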
(2) CNN extracts picture features:
Feature extraction is performed on the basis of VGG16. The input layer takes the data preprocessed from the image dataset described above, followed by five combinations of a convolutional layer plus a pooling layer, and two 1x1 convolutional layers. A 1x1 convolution operation is applied to the feature maps extracted from different layers, and the feature maps are fused at each extraction to obtain richer features; four feature fusion operations are performed on the feature maps output by the two 1x1 convolutional layers. If a pooling layer with a 2x2 pooling kernel lies between two feature maps to be fused, a 2x2 up-sampling operation is applied after the later 1x1 convolutional layer to restore the feature map to the same size as the feature map it is fused with.
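For illustration, this kind of 1x1-projection-plus-upsampling fusion over a VGG16 backbone can be sketched in Keras as follows; the chosen stages, channel count and number of fusion steps are assumptions, not the patent's exact configuration.

```python
# Sketch of multi-scale feature fusion over VGG16; stage names and the
# output channel count are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

def build_fusion_backbone(num_outputs=16):
    backbone = VGG16(include_top=False, weights=None,
                     input_shape=(None, None, 3))
    # Feature maps at 1/8, 1/16 and 1/32 resolution (illustrative stages).
    stages = ["block3_pool", "block4_pool", "block5_pool"]
    feats = [backbone.get_layer(n).output for n in stages]
    # Project each stage to the same channel count with 1x1 convolutions.
    projected = [layers.Conv2D(num_outputs, 1)(f) for f in feats]
    # Fuse from coarse to fine: 2x2 upsample, then add, as described above.
    fused = projected[-1]
    for f in reversed(projected[:-1]):
        fused = layers.UpSampling2D(size=(2, 2))(fused)
        fused = layers.Add()([fused, f])
    return tf.keras.Model(backbone.input, fused)
```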
(3) Text binary prediction for pixels and link binary prediction:
After all feature extraction operations are finished, a text/non-text binary prediction is made for each pixel, together with a binary prediction of whether a link to connected text exists in each of the pixel's 8 neighborhood directions (up, down, left, right, upper-left, lower-left, upper-right and lower-right).
(4) Obtaining an instance segmentation result:
A set of connected domains is obtained from the pixel text predictions and the link predictions; each connected domain in the set represents one detected text instance, yielding the text block instance segmentation result.
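For illustration, the grouping of positive pixels into connected domains via a disjoint-set ("merging set") can be sketched as follows, assuming thresholded text and link prediction maps; the array shapes and neighbor ordering are illustrative.

```python
# Sketch: group positive pixels into connected domains using a
# disjoint-set; text_mask and link_masks are assumed thresholded maps.
import numpy as np

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def decode_instances(text_mask, link_masks):
    """text_mask: (H, W) bool; link_masks: (8, H, W) bool."""
    h, w = text_mask.shape
    parent = {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for y in range(h):
        for x in range(w):
            if not text_mask[y, x]:
                continue
            parent.setdefault((y, x), (y, x))
            for k, (dy, dx) in enumerate(NEIGHBORS):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and text_mask[ny, nx] and link_masks[k, y, x]):
                    parent.setdefault((ny, nx), (ny, nx))
                    union((y, x), (ny, nx))

    labels = np.zeros((h, w), dtype=np.int32)
    roots = {}
    for p in parent:
        labels[p] = roots.setdefault(find(p), len(roots) + 1)
    return labels  # 0 = background, 1..N = text instances
```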
(5) Acquiring a text bounding box:
OpenCV's minAreaRect (minimum area bounding rectangle) is used to extract bounding rectangles with direction information for texts of different sizes. The specific format is ((x, y), (w, h), θ), where (x, y) are the center point coordinates, (w, h) the width and height of the current rectangle, and θ the rotation angle. A noise filtering operation is then performed, and the final text bounding box positions are obtained through a 'merging set' (disjoint-set data structure). This process omits the bounding-box regression step found in other text detection network models, so the model converges faster during training. The text detection network is trained with an SGD (stochastic gradient descent) optimizer to obtain the trained text detection model.
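A minimal sketch of this step, using OpenCV's minAreaRect on the instance labels (for example, those produced by the previous sketch); the area threshold used for noise filtering is an illustrative assumption.

```python
# Sketch: oriented text boxes from instance labels via minAreaRect;
# the min_area noise threshold is an illustrative assumption.
import cv2
import numpy as np

def extract_boxes(labels, min_area=10.0):
    boxes = []
    for inst_id in range(1, labels.max() + 1):
        # (y, x) pixel coordinates of this instance, flipped to (x, y).
        points = np.column_stack(np.where(labels == inst_id))[:, ::-1]
        rect = cv2.minAreaRect(points.astype(np.float32))  # ((x,y),(w,h),theta)
        (cx, cy), (w, h), theta = rect
        if w * h >= min_area:  # noise filtering
            boxes.append(cv2.boxPoints(rect))  # four vertex coordinates
    return boxes
```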
On the basis of the text detection model, the captured screen interface can be sent to the model as input to predict position information, and screenshots of all text regions of the screen interface are taken according to that information. Specifically, in actual work, a screenshot is taken of the whole screen interface, and the whole picture is used as input to the trained text detection model; the positions of all text regions in the screen (the four vertex coordinates of each text region) are obtained from the detected text regions, and each text region is then cropped out according to its coordinates, so that pictures of all text regions in the screen are obtained directly in one pass. A flowchart of obtaining the pictures of all text areas in the screen using the text detection model is shown in fig. 3.
In order to improve the accuracy of text recognition, the text recognition model provided by the invention comprises a CNN convolution module, an RNN recurrent network module and a CTC transcription module. The CNN convolution module learns the image convolution features, the RNN recurrent network module further extracts sequence features from the image convolution features, and CTC transcribes and merges the character sequence to obtain the text in the picture. Because the text recognition model captures the contextual features of the text sequence, its recognition performance is superior to methods based on a plain convolutional neural network; text recognition accuracy is effectively improved, and the model is more robust. Specifically, in one embodiment of the present invention, referring to fig. 4A, the text recognition model building process includes:
S401, dividing the image data in the public text recognition dataset into training set image data and verification set image data according to a preset proportion;
S402, extracting image convolution features of the training set image data and the verification set image data through a convolutional neural network algorithm;
S403, analyzing the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain text character sequence probabilities;
S404, transcribing the text character sequence probabilities through the CTC algorithm to obtain text data;
S405, constructing a text recognition model from the text data, the training set image data and the verification set image data.
Dividing the image data in the public text recognition dataset into training set image data and verification set image data according to a preset proportion may further include: normalizing the image data in the public text recognition dataset into standard image data of a preset size; and, after converting the standard image data into the tfrecord file format, dividing it into training set image data and verification set image data according to the preset proportion. In practical implementation, the construction of the text recognition model may proceed as follows, with reference to fig. 4B:
(1) Text recognition network model input image data acquisition:
The training speed of the model can be improved by normalizing the pictures of the public text recognition dataset to 32x256 pixels, i.e. a picture height of 32 pixels and a width of 256 pixels, and reading the pictures' three RGB channels as a grayscale image. The label matrix is then processed and converted into a data format supported by TensorFlow. The data are randomly divided, in proportion, into a training set and a verification set for training and verifying the subsequent text recognition network model.
(2) CNN convolution learning obtains image convolution features:
The CNN convolution module is a small CNN network comprising 7 convolutional layers and 4 pooling layers, with batch normalization added twice in the middle to avoid gradient dispersion, accelerate model convergence and shorten the training process. The image convolution features are obtained through CNN learning.
(3) The RNN recurrent network predicts text character sequence probabilities:
The image convolution features learned by the CNN are fed into the RNN recurrent network, which uses a multi-layer LSTM structure to learn the bidirectional dependencies of the feature sequence, extracts a feature vector sequence from the generated image features, and predicts the text character sequence probabilities.
(4) CTC transcribes the text:
Usually the predicted text character sequence cannot be aligned exactly with the true text, so the predicted character probability sequence is transcribed into text using the CTC algorithm. The text recognition network is trained with an SGD optimizer to obtain the trained text recognition model.
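For illustration, a compact CRNN of the kind described above (CNN, then bidirectional LSTM, then a per-timestep softmax trained with CTC) can be sketched in Keras as follows; the layer counts here are smaller than the 7-convolution configuration described above, and the charset size is an assumption.

```python
# Compact CRNN sketch (CNN -> BiLSTM -> softmax, trained with CTC);
# layer counts and sizes are illustrative, not the patent's exact config.
import tensorflow as tf
from tensorflow.keras import layers

def build_crnn(num_classes):  # num_classes includes the CTC blank label
    inp = layers.Input(shape=(32, 256, 1))  # 32x256 grayscale input
    x = inp
    for filters in (64, 128, 256, 256):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.BatchNormalization()(x)    # feature map is now (2, 16, 256)
    # Treat each remaining image column as one timestep of the sequence.
    x = layers.Permute((2, 1, 3))(x)      # -> (16, 2, 256)
    x = layers.Reshape((16, 2 * 256))(x)  # -> (16, 512): 16 timesteps
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inp, out)  # train with a CTC loss (tf.nn.ctc_loss)
```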
As will be appreciated by those skilled in the art, each of the above values is exemplary, selected and set according to actual needs, and does not limit the specific application; those skilled in the art may adjust the settings as needed, and the invention is not limited thereto.
In an embodiment of the present invention, before step S103 feeds the text region screenshot data into the text recognition model trained on the public text recognition dataset, the text region screenshots may also be preprocessed so that the text recognition model can complete text recognition more accurately. In actual work, since the input picture size of the text recognition model is 32x256, the pictures cropped out by the text detection model must be scaled and padded before prediction. If a picture's pixel width is greater than 256, it is scaled to 32x256. If its pixel width is less than 256, it is scaled proportionally with the height adjusted to 32, and the width portion short of 256 pixels is filled entirely with 0. The preprocessed text region pictures of the screen are then sent into the trained text recognition model for prediction to obtain the text information contained in each picture. Predicting the text information in a text region screenshot using the text recognition model is illustrated in fig. 5, and a sketch of the preprocessing follows below. As will be appreciated by those skilled in the art, each of the above values is exemplary, selected and set according to actual needs, and does not limit the specific application; those skilled in the art may adjust the settings as needed, and the invention is not limited thereto.
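A minimal sketch of this scale-and-pad preprocessing, assuming OpenCV for resizing (the patent does not name the library used here):

```python
# Sketch of the scale-and-pad step described above; OpenCV is an
# assumed choice of resizing library.
import cv2
import numpy as np

def preprocess_crop(img_gray):
    h, w = img_gray.shape
    if w > 256:
        # Wider than 256 pixels: scale directly to 32x256.
        return cv2.resize(img_gray, (256, 32))
    # Otherwise scale proportionally to height 32, then zero-pad the width.
    new_w = min(256, max(1, round(w * 32 / h)))
    scaled = cv2.resize(img_gray, (new_w, 32))
    out = np.zeros((32, 256), dtype=img_gray.dtype)
    out[:, :new_w] = scaled
    return out
```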
Therefore, after the text information is obtained from the text recognition model, the matching stage can begin, i.e. matching the text information against the target text. In actual work, the predicted text of each text region picture is matched against the target text in the case data; if the predicted text information is exactly equal to a target text, the coordinate position of that text region picture is recorded, in one-to-one correspondence with each target text. The target control position is then further determined from the coordinates of the successfully matched text information and the relative position between the target text and its corresponding target control. Specifically, in actual work, denote by n the multiple relating the distance between the target control and its target text to the height of the target text characters, and denote the upper-left, lower-left, upper-right and lower-right position coordinates of the successfully matched target text by (x, y), (x, s), (v, y) and (v, s) respectively, with height h = y - s and width w = v - x. The relative position of the target control to the target text is then obtained from the data read in step 2. When the relative position content is 'same', the target control's position coordinates are recorded directly as the center of the target text, i.e. ((x+v)/2, (y+s)/2); when the relative position content is 'left', the target control's position coordinates are recorded as (x - h*n, (y+s)/2); when it is 'right', as (v + h*n, (y+s)/2); when it is 'up', as ((x+v)/2, y - h*n); and when it is 'down', as ((x+v)/2, s + h*n). A sketch of this computation is given below.
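The following is a minimal sketch of the position arithmetic just described; the relative-position strings and the function name are illustrative, and the formulas mirror the text above.

```python
# Sketch of the control-position arithmetic described above; the
# relative-position labels ("same", "left", ...) are illustrative.
def control_position(x, y, v, s, relative, n):
    """(x, y)/(v, s): upper-left / lower-right corners of the matched text."""
    h = y - s                          # target text height, as in the text
    cx, cy = (x + v) / 2, (y + s) / 2  # center of the target text
    if relative == "same":
        return (cx, cy)
    if relative == "left":
        return (x - h * n, cy)
    if relative == "right":
        return (v + h * n, cy)
    if relative == "up":
        return (cx, y - h * n)
    if relative == "down":
        return (cx, s + h * n)
    raise ValueError("unknown relative position: %s" % relative)
```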
After the specific position of the target control is obtained, the corresponding operation on the target control can be executed; in one embodiment of the invention, GUI automation technology is invoked according to the input information to execute the corresponding instruction operation on the target control at the target control position, for example as sketched below.
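A minimal sketch of this final step follows; the patent only says "GUI automation technology", so the use of the pyautogui library and the control-name strings here are assumptions.

```python
# Sketch only: pyautogui and the control-name strings are assumptions;
# the patent does not name a specific GUI automation library.
import pyautogui

def execute_action(position, control_name, input_text=None):
    x, y = position
    pyautogui.click(x, y)  # click/focus the target control
    if control_name == "input box" and input_text:
        # Type the case's input text into the focused input box.
        pyautogui.typewrite(input_text, interval=0.05)
```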
In summary, with reference to fig. 1B, the overall technical flow of the interface control testing method based on deep learning provided by the invention can be subdivided into the following 10 steps:
step 1: excel case template setup.
Step 2: and reading the data in the form one by one according to the case data filled in by the tester.
Step 3: and intercepting a screen interface to be tested.
Step 4: and constructing and training a text detection network to obtain a trained text detection model.
Step 5: and sending the intercepted screen interface to a text detection model as input to predict the position information, and capturing all text areas of the screen interface according to the position information.
Step 6: and building and training a text recognition network to obtain a trained text recognition model.
Step 7: and taking the screenshot of the text area as input to a text recognition model for prediction.
Step 8: and (3) matching the text information predicted according to the text recognition model with the target text contained in the data read in the step (2).
Step 9: and (3) determining the position of the target control according to the coordinate of the successfully matched target text and the relative position of the target control and the target text in the data read in the step (2).
Step 10: and (3) testing the instruction action corresponding to the target control according to the position of the target control determined in the step (9).
Those skilled in the art may combine or replace the above steps appropriately according to actual needs, and the present invention is not limited thereto.
Referring to fig. 6, the invention further provides a deep-learning-based interface control testing system, which comprises a setting module, a text position detection module, a text information extraction module, a matching module and a testing module. The setting module is used to obtain the interface serial number, target text, target control position and input information of the control under test during testing; the text position detection module is used to capture the corresponding screen interface data according to the interface serial number and feed it into a text detection model trained on a public scene text detection dataset to obtain text position information; the text information extraction module is used to crop the corresponding regions of the screen interface data according to the text position information to obtain text region screenshot data, and to feed the text region screenshot data into a text recognition model trained on a public text recognition dataset to obtain text information; the matching module is used to match the text information against the target text and obtain the corresponding target control position and input information from the matched target text; and the testing module is used to execute the corresponding instruction operation on the target control at the target control position according to the input information.
In the above embodiment, the setting module further includes: generating a test case from the interface serial number, target text, target control position and input information; and generating case template data from one or more test cases, wherein the target control position includes the relative position and the shortest distance between the target control and the target text. The testing module includes: invoking GUI automation technology according to the input information to execute the corresponding instruction operation on the target control at the target control position.
In an embodiment of the present invention, the text position detection module further includes a text detection model construction unit, which is configured to: divide the image data in the public scene text detection dataset into training set image data and verification set image data according to a preset proportion; extract picture features of the training set image data and the verification set image data through a convolutional neural network algorithm; perform, on the picture features, text/non-text binary classification prediction for preset pixels and link binary classification prediction for the adjacent directions of the preset pixels, obtain a set of connected domains from these two predictions, and obtain text block instance segmentation data from the connected domain set; extract a bounding rectangle with direction information through OpenCV from the text block instance segmentation data to obtain a text bounding box; and construct a text detection model from the training set image data, the text bounding box and the verification set image data. The text detection model construction unit may further include: converting the training set image data and the verification set image data into the tfrecord file format before constructing the text detection model.
In an embodiment of the present invention, the text information extraction module further includes a text recognition model construction unit, which is configured to: divide the image data in the public text recognition dataset into training set image data and verification set image data according to a preset proportion; extract image convolution features of the training set image data and the verification set image data through a convolutional neural network algorithm; analyze the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain text character sequence probabilities; transcribe the text character sequence probabilities through the CTC algorithm to obtain text data; and construct a text recognition model from the text data, the training set image data and the verification set image data. The text recognition model construction unit may further include: normalizing the image data in the public text recognition dataset into standard image data of a preset size, converting the standard image data into the tfrecord file format, and dividing it into training set image data and verification set image data according to the preset proportion before constructing the text recognition model.
The beneficial technical effects of the invention are as follows: testers can test interface controls merely by filling in test case data, without writing test scripts; when the software interface design changes, if the relative position of the tested target control and the target text does not change and the order and logic of the tested controls are unchanged, testers can directly reuse the previous case data to test the changed interface; the deep-learning text detection and text recognition algorithms improve the accuracy of locating the target text in the interface and reduce the search time, so that the target interface control can be found accurately and rapidly without being affected by low-resolution displays; this further saves testing labor and time and improves product quality.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above method when executing the computer program.
The present invention also provides a computer readable storage medium storing a computer program for executing the above method.
As shown in fig. 7, the electronic device 600 may further include: a communication module 110, an input unit 120, an audio processing unit 130, a display 160, a power supply 170. It is noted that the electronic device 600 need not include all of the components shown in fig. 7; in addition, the electronic device 600 may further include components not shown in fig. 7, to which reference is made to the related art.
As shown in fig. 7, the central processor 100, sometimes also referred to as a controller or operation controller, may include a microprocessor or other processor device and/or logic device; the central processor 100 receives inputs and controls the operation of the various components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store the information mentioned above, as well as programs for processing such information, and the central processor 100 can execute programs stored in the memory 140 to realize information storage or processing.
The input unit 120 provides an input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.
The memory 140 may be a solid state memory such as a read-only memory (ROM), a random access memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered down, that can be selectively erased and supplied with further data, an example of which is sometimes referred to as an EPROM or the like. The memory 140 may also be some other type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage 142 for storing application programs and function programs, or a flow for executing operations of the electronic device 600 by the central processor 100.
The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, address book applications, etc.).
The communication module 110 is a transmitter/receiver that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and to receive audio input from the microphone 132 to implement usual telecommunication functions. The audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 130 is also coupled to the central processor 100 so that sound can be recorded locally through the microphone 132 and so that sound stored locally can be played through the speaker 131.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention; it is not intended to limit the scope of the invention, nor to limit the invention to the particular embodiments described. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the invention shall fall within the scope of the invention.

Claims (14)

1. An interface control testing method based on deep learning, which is characterized by comprising the following steps:
acquiring an interface serial number, a target text, a target control position and input information of a control to be tested in a testing process;
intercepting corresponding screen interface data according to the interface serial number, and inputting the screen interface data into a text detection model obtained by training on a scene text detection public data set to obtain text position information;
capturing a screenshot of the corresponding region in the screen interface data according to the text position information to obtain text region screenshot data, and inputting the text region screenshot data into a text recognition model obtained by training on a text recognition public data set to obtain text information;
matching the text information with the target text, and obtaining the corresponding target control position and input information according to the matched target text;
executing a corresponding instruction operation on the target control corresponding to the target control position according to the input information;
the text detection model construction flow comprises the following steps: dividing image data in the scene text detection public data set into training set image data and verification set image data according to a preset proportion; extracting picture features from the training set image data and the verification set image data through a convolutional neural network algorithm; performing, on the picture features, binary text/non-text classification prediction for preset pixels and binary link classification prediction for the adjacent directions of the preset pixels; obtaining a connected domain set according to the two binary classification predictions, and obtaining text block instance segmentation data according to the connected domain set; extracting a circumscribed rectangular frame with direction information through OpenCV according to the text block instance segmentation data to obtain text bounding boxes; and constructing the text detection model according to the training set image data, the text bounding boxes and the verification set image data;
the text recognition model construction flow comprises the following steps: dividing image data in the text recognition public data set into training set image data and verification set image data according to a preset proportion; extracting image convolution features from the training set image data and the verification set image data through a convolutional neural network algorithm; analyzing the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain text character sequence probabilities; transcribing the text character sequence probabilities through a CTC algorithm to obtain text data; and constructing the text recognition model according to the text data, the training set image data and the verification set image data.
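For illustration only: the connected-domain and bounding-box steps of the text detection flow in claim 1 are commonly implemented in the style of PixelLink. The minimal Python sketch below assumes a model that outputs per-pixel text scores and eight-direction link scores; the array names, the 0.5 threshold and the simplified link handling are illustrative assumptions rather than part of the claim.

import cv2
import numpy as np

def boxes_from_predictions(pixel_scores, link_scores, thresh=0.5):
    # pixel_scores: (H, W) text/non-text probability per pixel (assumed model output)
    # link_scores: (H, W, 8) probability of a link to each neighbouring direction
    text_mask = (pixel_scores > thresh).astype(np.uint8)

    # Simplified connected-domain step: a faithful implementation would join
    # pixels by union-find over positive links; here pixels are kept only if
    # they have at least one confident link, then grouped by 8-connectivity.
    linked = (link_scores > thresh).any(axis=-1).astype(np.uint8)
    num_labels, labels = cv2.connectedComponents(text_mask & linked, connectivity=8)

    boxes = []
    for label in range(1, num_labels):  # label 0 is the background
        ys, xs = np.where(labels == label)
        points = np.stack([xs, ys], axis=1).astype(np.float32)
        rect = cv2.minAreaRect(points)       # circumscribed rectangle with angle
        boxes.append(cv2.boxPoints(rect))    # four corner points, direction-aware
    return boxes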
2. The deep learning based interface control testing method according to claim 1, wherein obtaining the interface serial number, the target text, the target control position and the input information of the control to be tested in the testing process further comprises: generating a test case according to the interface serial number, the target text, the target control position and the input information; and generating case template data from one or more test cases.
3. The deep learning based interface control testing method of claim 2, wherein the target control position comprises: the relative position and the shortest distance between the target control and the target text.
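As a concrete illustration of claims 2 and 3, case template data can be as simple as a list of records like the one sketched below; every field name here is a hypothetical choice, since the claims do not fix a storage format.

test_case = {
    "interface_serial_number": 3,            # which screen to capture
    "target_text": "Account number",         # label text the models must locate
    "target_control_position": {
        "relative_position": "right",        # control lies to the right of the text
        "shortest_distance": 40,             # pixels from the text to the control
    },
    "input_information": "6222020200012345",  # value to type into the control
}

case_template_data = [test_case]  # one or more test cases form the template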
4. The deep learning based interface control testing method according to claim 1, wherein intercepting the corresponding screen interface data according to the interface serial number comprises: intercepting the corresponding screen interface data through a GUI automatic screenshot method according to the interface serial number.
5. The deep learning based interface control testing method according to claim 1, wherein dividing the image data in the scene text detection public data set into the training set image data and the verification set image data according to the preset proportion further comprises: converting the training set image data and the verification set image data into the tfrecord file format.
6. The deep learning based interface control testing method according to claim 1, wherein dividing the image data in the text recognition public data set into the training set image data and the verification set image data according to the preset proportion further comprises:
normalizing the image data in the text recognition public data set into standard image data of a preset size;
and after converting the standard image data into the tfrecord file format, dividing the standard image data into the training set image data and the verification set image data according to the preset proportion.
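The tfrecord file format named in claims 5 and 6 is TensorFlow's record serialization format, so the preprocessing could be sketched as below. The 100x32 preset size and the feature keys are illustrative assumptions; only the tf.io and tf.train calls are actual TensorFlow APIs.

import tensorflow as tf

def write_tfrecord(image_paths, labels, out_path, size=(100, 32)):
    # size is (width, height); tf.image.resize expects (height, width)
    with tf.io.TFRecordWriter(out_path) as writer:
        for path, label in zip(image_paths, labels):
            img = tf.io.decode_image(tf.io.read_file(path), channels=3,
                                     expand_animations=False)
            img = tf.image.resize(img, (size[1], size[0]))  # normalize to preset size
            img = tf.cast(img, tf.uint8)
            example = tf.train.Example(features=tf.train.Features(feature={
                "image": tf.train.Feature(bytes_list=tf.train.BytesList(
                    value=[tf.io.encode_jpeg(img).numpy()])),
                "label": tf.train.Feature(bytes_list=tf.train.BytesList(
                    value=[label.encode("utf-8")])),
            }))
            writer.write(example.SerializeToString())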
7. The deep learning-based interface control testing method according to claim 1, wherein executing the corresponding instruction operation on the target control corresponding to the target control position according to the input information includes:
calling a GUI automation technology according to the input information to execute the corresponding instruction operation on the target control corresponding to the target control position.
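Claim 7 leaves the GUI automation technology unspecified; pyautogui is one widely used library that matches the description, so the instruction execution, combined with the relative position and shortest distance of claim 3, might be sketched as follows. The offset directions and the typing interval are assumptions.

import pyautogui

def run_test_step(text_center, relative_position, shortest_distance, input_info):
    x, y = text_center                         # center of the matched target text
    offsets = {"right": (shortest_distance, 0), "left": (-shortest_distance, 0),
               "above": (0, -shortest_distance), "below": (0, shortest_distance)}
    dx, dy = offsets[relative_position]
    pyautogui.click(x + dx, y + dy)            # focus the target control
    if input_info:
        pyautogui.write(str(input_info), interval=0.05)  # type the test input

# The screen capture of claim 4 could likewise be a single call:
# screenshot = pyautogui.screenshot()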
8. An interface control testing system based on deep learning, characterized by comprising a setting module, a text position detection module, a text information extraction module, a matching module and a testing module;
the setting module is used for acquiring an interface serial number, a target text, a target control position and input information of a control to be tested in the testing process;
the text position detection module is used for intercepting corresponding screen interface data according to the interface serial number, and inputting the screen interface data into a text detection model obtained by training on a scene text detection public data set to obtain text position information;
the text information extraction module is used for capturing a screenshot of the corresponding region in the screen interface data according to the text position information to obtain text region screenshot data, and inputting the text region screenshot data into a text recognition model obtained by training on a text recognition public data set to obtain text information;
the matching module is used for matching the text information with the target text, and obtaining the corresponding target control position and input information according to the matched target text;
the test module is used for executing the corresponding instruction operation on the target control corresponding to the target control position according to the input information;
the text position detection module further comprises a text detection model construction unit for: dividing image data in the scene text detection public data set into training set image data and verification set image data according to a preset proportion; extracting picture features from the training set image data and the verification set image data through a convolutional neural network algorithm; performing, on the picture features, binary text/non-text classification prediction for preset pixels and binary link classification prediction for the adjacent directions of the preset pixels; obtaining a connected domain set according to the two binary classification predictions, and obtaining text block instance segmentation data according to the connected domain set; extracting a circumscribed rectangular frame with direction information through OpenCV according to the text block instance segmentation data to obtain text bounding boxes; and constructing the text detection model according to the training set image data, the text bounding boxes and the verification set image data;
the text information extraction module further comprises a text recognition model construction unit for: dividing image data in the text recognition public data set into training set image data and verification set image data according to a preset proportion; extracting image convolution features from the training set image data and the verification set image data through a convolutional neural network algorithm; analyzing the feature vector sequence of the image convolution features through a recurrent neural network algorithm to obtain text character sequence probabilities; transcribing the text character sequence probabilities through a CTC algorithm to obtain text data; and constructing the text recognition model according to the text data, the training set image data and the verification set image data.
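For the recognition flow restated in claim 8 (convolutional features, recurrent sequence analysis, CTC transcription), a CRNN is the usual realization. The TensorFlow/Keras sketch below is illustrative only: the layer sizes, the 37-class character set and the 32x100 input shape are assumptions, not taken from the claims.

import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 37  # assumed character set: 26 letters + 10 digits + CTC blank

def build_crnn(height=32, width=100):
    inp = layers.Input(shape=(height, width, 3))
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)  # convolution features
    x = layers.MaxPooling2D((2, 2))(x)                                # -> 16 x 50
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)                                # -> 8 x 25
    x = layers.Permute((2, 1, 3))(x)       # make width the sequence axis
    x = layers.Reshape((25, 8 * 128))(x)   # feature vector sequence
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)  # recurrent analysis
    out = layers.Dense(NUM_CLASSES, activation="softmax")(x)  # character sequence probabilities
    return tf.keras.Model(inp, out)

model = build_crnn()
probs = model(tf.zeros((1, 32, 100, 3)))  # dummy batch of one interface text crop
decoded, _ = tf.keras.backend.ctc_decode(probs, input_length=[25])  # CTC transcription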
9. The deep learning based interface control test system of claim 8, wherein the setting module is further used for: generating a test case according to the interface serial number, the target text, the target control position and the input information; and generating case template data from one or more test cases; wherein the target control position comprises: the relative position and the shortest distance between the target control and the target text.
10. The deep learning based interface control test system of claim 8, wherein the text detection model construction unit is further used for: converting the training set image data and the verification set image data into the tfrecord file format, and then constructing the text detection model.
11. The deep learning based interface control test system of claim 8, wherein the text recognition model construction unit is further used for:
normalizing the image data in the text recognition public data set into standard image data of a preset size;
and converting the standard image data into the tfrecord file format, dividing the standard image data into the training set image data and the verification set image data according to the preset proportion, and then constructing the text recognition model.
12. The deep learning based interface control test system of claim 8, wherein the test module is used for: calling a GUI automation technology according to the input information to execute the corresponding instruction operation on the target control corresponding to the target control position.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 7 when executing the computer program.
14. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 7.
CN202010793876.1A 2020-08-10 2020-08-10 Interface control testing method and system based on deep learning Active CN111930622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010793876.1A CN111930622B (en) 2020-08-10 2020-08-10 Interface control testing method and system based on deep learning


Publications (2)

Publication Number Publication Date
CN111930622A CN111930622A (en) 2020-11-13
CN111930622B true CN111930622B (en) 2023-10-13

Family

ID=73307102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010793876.1A Active CN111930622B (en) 2020-08-10 2020-08-10 Interface control testing method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN111930622B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114564141A (en) * 2020-11-27 2022-05-31 华为技术有限公司 Text extraction method and device
CN112818987B (en) * 2021-01-29 2024-05-14 浙江嘉科电子有限公司 Method and system for identifying and correcting display content of electronic bus stop board
CN112884441A (en) * 2021-03-02 2021-06-01 岭东核电有限公司 Test result recording method and device, computer equipment and storage medium
CN113657444A (en) * 2021-07-13 2021-11-16 珠海金智维信息科技有限公司 Interface element identification method and system
CN116403199B (en) * 2023-06-07 2023-09-08 杭州实在智能科技有限公司 Screen icon semantic recognition method and system based on deep learning
CN116662211B (en) * 2023-07-31 2023-11-03 四川弘和数智集团有限公司 Display interface testing method, device, equipment and medium
CN117873907B (en) * 2024-03-12 2024-06-07 麒麟软件有限公司 Control element testing method and device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871316A (en) * 2019-01-10 2019-06-11 北京云测信息技术有限公司 A kind of control recognition methods and device
CN109947650A (en) * 2019-03-20 2019-06-28 广州云测信息技术有限公司 Script step process methods, devices and systems
CN110245681A (en) * 2019-05-10 2019-09-17 北京奇艺世纪科技有限公司 Model generating method, application interface method for detecting abnormality, device, terminal device and computer readable storage medium
CN110309073A (en) * 2019-06-28 2019-10-08 上海交通大学 Mobile applications user interface mistake automated detection method, system and terminal
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110442521A (en) * 2019-08-02 2019-11-12 腾讯科技(深圳)有限公司 Control element detection method and device


Also Published As

Publication number Publication date
CN111930622A (en) 2020-11-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant