CN111124902A - Object operating method and device, computer-readable storage medium and electronic device - Google Patents

Object operating method and device, computer-readable storage medium and electronic device

Info

Publication number
CN111124902A
Authority
CN
China
Prior art keywords
image
target
training
images
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911288866.6A
Other languages
Chinese (zh)
Inventor
黄超
荆彦青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911288866.6A priority Critical patent/CN111124902A/en
Publication of CN111124902A publication Critical patent/CN111124902A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3688Test management for test execution, e.g. scheduling of test suites
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Biology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an object operation method and device, a computer-readable storage medium, and an electronic device. The method trains an initial neural network on a plurality of training images in advance to obtain a target network model, and uses that model to determine a target training image among the training images. Because the target training image corresponds to a target position and to the target operation performed at that position, performing the target operation at the corresponding position in the target test image completes the test of the target test image.

Description

Object operating method and device, computer-readable storage medium and electronic device
Technical Field
The present invention relates to the field of computers, and in particular, to an object operating method and apparatus, a computer-readable storage medium, and an electronic apparatus.
Background
Existing technical solutions provide a user interface (UI) test method based on image template matching. The method calculates the distance between a template and a specific region of the image and, if the distance is below a threshold, executes a preset action. The method comprises the following steps: manually recording a sample set of UI operations, selecting a template-matching region for each scene, and manually designating the position to click after a match. After a test image is input, the distance between the specific region of the image and the template is calculated; if the distance is smaller than the threshold, the corresponding position is clicked. This method is sensitive to background changes, requires the configurator to have good prior knowledge of template matching, is difficult to master, and consumes a great deal of time and effort.
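The template-matching baseline described above can be sketched in a few lines. The sketch below is illustrative only and not part of the disclosed invention: images are modeled as plain 2-D grayscale grids, the region and threshold are configurator-supplied, and `action` is a hypothetical stand-in for the preset click.

```python
def region_distance(image, template, top, left, h, w):
    """Euclidean distance between a template and a fixed region of the image.

    `image` is a 2-D grayscale grid (list of rows); the matching region is
    chosen by the configurator in advance, as in the baseline method.
    """
    total = 0.0
    for i in range(h):
        for j in range(w):
            d = image[top + i][left + j] - template[i][j]
            total += d * d
    return total ** 0.5

def match_and_act(image, template, region, threshold, action):
    """Run `action` at the preset position if the region matches the template."""
    top, left, h, w = region
    if region_distance(image, template, top, left, h, w) < threshold:
        action()  # e.g. click the manually designated position
        return True
    return False
```

Note how brittle this is: any background change inside the chosen region inflates the distance and silently suppresses the click, which is the weakness the invention addresses.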
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide an object operation method and device, a computer-readable storage medium, and an electronic device, to at least solve the technical problem in the related art that UI testing of a user interface is time-consuming.
According to an aspect of an embodiment of the present invention, there is provided an object operation method, including: acquiring a target binary group set, wherein each binary group in the set comprises the same target test image and a different training image from a first image set, the first image set comprises a plurality of training images, and each training image corresponds to a position in that training image and an operation performed at the position; inputting each binary group in the target binary group set into a target neural network model to obtain a group of image similarities output by the model, wherein the group of image similarities corresponds one-to-one to the target binary group set, each image similarity represents the similarity between the target test image and the training image in the corresponding binary group, the target neural network model is obtained by training an initial neural network model with the plurality of training images, and the model determines the similarity between a training image and a positive sample image to be greater than a first threshold and the similarity between a training image and a negative sample image to be smaller than a second threshold, the positive sample image and the training image being images of the same scene, the negative sample image and the training image being images of different scenes, and the first threshold being larger than the second threshold; and, when a target training image is determined from the plurality of training images, performing a target operation at the position in the target test image corresponding to the target position, wherein the similarity between the target test image and the target training image is the maximum value in the group of image similarities, and the target training image corresponds to the target position in the target training image and to the target operation performed at that position.
According to another aspect of the embodiments of the present invention, there is also provided an object operating apparatus, including: a first acquisition unit, configured to acquire a target binary group set, where each binary group in the set includes the same target test image and a different training image from a first image set, the first image set includes a plurality of training images, and each training image corresponds to a position in that training image and an operation performed at the position; a first obtaining unit, configured to input each binary group in the target binary group set into a target neural network model and obtain a group of image similarities output by the model, where the group of image similarities corresponds one-to-one to the target binary group set, each image similarity represents the similarity between the target test image and the training image in the corresponding binary group, the target neural network model is obtained by training an initial neural network model with the plurality of training images, and the model determines the similarity between a training image and a positive sample image to be greater than a first threshold and the similarity between a training image and a negative sample image to be smaller than a second threshold, the positive sample image and the training image being images of the same scene, the negative sample image and the training image being images of different scenes, and the first threshold being larger than the second threshold; and an execution unit, configured to, when a target training image is determined from the plurality of training images, perform a target operation at the position in the target test image corresponding to the target position, where the similarity between the target test image and the target training image is the maximum value in the group of image similarities, and the target training image corresponds to the target position in the target training image and to the target operation performed at that position.
According to still another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the above object operating method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the object operation method through the computer program.
In the embodiments of the present invention, a target binary group set is acquired, each binary group in the set is input into the target neural network model to obtain a group of image similarities, and, when a target training image is determined from the plurality of training images, the target operation is performed at the position in the target test image corresponding to the target position; the similarity between the target test image and the target training image is the maximum value in the group of image similarities, and the target training image corresponds to the target position in the target training image and to the target operation performed at that position. In other words, a target network model is obtained by training an initial neural network on a plurality of training images in advance, and the target training image is determined among those images through the model; because the target training image corresponds to the target position and the target operation, performing the target operation at the target position completes the test of the corresponding position in the target test image. That is to say, the object operation method based on the target neural network can reproduce previously recorded operations from a small number of manually recorded samples, consuming less manpower and time, thereby solving the technical problem in the related art that UI testing of a user interface is time-consuming.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment for an alternative method of object manipulation according to an embodiment of the present invention;
FIG. 2 is a flow diagram of an alternative method of object manipulation according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of an alternative target test image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative training pattern and negative and positive sample images according to an embodiment of the present invention;
FIG. 5 is a flow diagram of an alternative UI action execution according to an embodiment of the invention;
FIG. 6 is a schematic diagram of an alternative network architecture according to embodiments of the present invention;
FIG. 7 shows optional UI images of the same scene at different times according to embodiments of the invention;
FIG. 8 is a schematic diagram of an alternative object manipulating device according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an electronic device for an alternative object operation method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For a better understanding of the embodiments of the present invention, some terms are explained as follows:
AI: the acronym for Artificial Intelligence.
Action execution: predicting the action for a test sample according to the labeled training samples.
CNN: the abbreviation of Convolutional Neural Network.
Triplet: a training image together with a positive sample image and a negative sample image, i.e., one positive sample pair and one negative sample pair sharing the same anchor image.
Triplet loss: a loss designed over triplets, whose goal is to make the positive sample pairs similar and to push the negative sample pairs far apart.
Similarity network: a deep network for computing the similarity between images.
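The triplet loss defined above can be written down concretely. The following sketch uses toy feature vectors, and the margin value is an assumption: the patent only states the goal (positive pairs similar, negative pairs far apart), not the exact formula.

```python
def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the positive pair is closer than the negative
    pair by at least `margin`, matching the stated training goal: pull
    same-scene pairs together, push different-scene pairs apart.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

A well-trained embedding drives this loss to zero for most triplets, which is exactly the convergence condition the training procedure below checks via the first and second thresholds.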
According to an aspect of the embodiments of the present invention, an object operating method is provided. Optionally, as an optional implementation, the object operating method may be applied, but is not limited, to an object operating system in the hardware environment shown in FIG. 1, where the object operating system may include, but is not limited to, a terminal device 102, a network 110, and a server 112. A target client runs in the terminal device 102 and is used for displaying the user interface (UI) of an application in the target client.
The terminal device 102 may include, but is not limited to: a human-computer interaction screen 104, a processor 106, and a memory 108. The human-computer interaction screen 104 is used for acquiring human-computer interaction instructions through a human-computer interaction interface and presenting the user interface of the application; the processor 106 is configured to complete the testing of the user interface of the target application in response to the human-computer interaction instructions; and the memory 108 is used for storing attribute information of the user interface of the target application, image information of each binary group in the target binary group set, and information of the target neural network model. The server may include, but is not limited to: a database 114 and a processing engine 116. The processing engine 116 is configured to invoke the target neural network model stored in the database 114, input each binary group in the target binary group set into the model to obtain a group of image similarities output by the model, and, when the target training image is determined from the plurality of training images, perform the target operation at the position corresponding to the target position in the target test image. The object operation method based on the target neural network can reproduce previously recorded operations from a small number of manually recorded samples, consuming less manpower and time, thereby solving the technical problem in the related art that UI testing of a user interface is time-consuming.
The specific process comprises the following steps. The human-computer interaction screen 104 in the terminal device 102 displays the user interface of a target application (shown in FIG. 1 as a shooting game in which a target virtual character is sniping a distant target object). In steps S102 to S112, a user interface in the target application (which may be a picture of the game at any moment, i.e., a target test image) is obtained and sent to the server 112 via the network 110. At the server 112, a target binary group set is acquired, where each binary group in the set comprises the same target test image and a different training image from a first image set, the first image set comprises a plurality of training images, and each training image corresponds to a position in that training image and an operation performed at the position. Each binary group in the set is input into the target neural network model to obtain a group of image similarities output by the model, where the group of image similarities corresponds one-to-one to the target binary group set, each image similarity represents the similarity between the target test image and the training image in the corresponding binary group, the target neural network model is obtained by training an initial neural network model with the plurality of training images, and the model determines the similarity between a training image and a positive sample image to be greater than a first threshold and the similarity between a training image and a negative sample image to be smaller than a second threshold; the positive sample image and the training image are images of the same scene, the negative sample image and the training image are images of different scenes, and the first threshold is larger than the second threshold. When a target training image is determined from the plurality of training images, the target operation is performed at the position in the target test image corresponding to the target position, where the similarity between the target test image and the target training image is the maximum value in the group of image similarities, and the target training image corresponds to the target position in the target training image and to the target operation performed at that position. The determined result is then returned to the terminal device 102.
Then, in steps S114 to S116, the terminal device 102 acquires the target test image. The target test image is input into the target neural network, the target training image corresponding to the target test image is determined, and the operation recorded for the target position in the target training image is performed, thereby completing the test of the target test image. This avoids the large amount of time and effort otherwise spent on manually selecting a suitable image template and setting the matching region in the image. In other words, the object operation method based on the target neural network can reproduce previously recorded operations from a small number of manually recorded samples, consuming less manpower and time, thus solving the technical problem in the related art that UI testing of a user interface is time-consuming.
Optionally, as an optional implementation manner, as shown in fig. 2, the object operation method includes:
Step S202: acquire a target binary group set, where each binary group in the set comprises the same target test image and a different training image from a first image set, the first image set comprises a plurality of training images, and each training image corresponds to a position in that training image and an operation performed at the position.
Step S204: input each binary group in the target binary group set into the target neural network model to obtain a group of image similarities output by the model, where the group of image similarities corresponds one-to-one to the target binary group set, each image similarity represents the similarity between the target test image and the training image in the corresponding binary group, the target neural network model is obtained by training an initial neural network model with the plurality of training images, and the model determines the similarity between a training image and a positive sample image to be greater than a first threshold and the similarity between a training image and a negative sample image to be smaller than a second threshold; the positive sample image and the training image are images of the same scene, the negative sample image and the training image are images of different scenes, and the first threshold is larger than the second threshold.
Step S206: when the target training image is determined from the plurality of training images, perform the target operation at the position in the target test image corresponding to the target position, where the similarity between the target test image and the target training image is the maximum value in the group of image similarities, and the target training image corresponds to the target position in the target test image and to the target operation performed at that position.
Optionally, in this embodiment, each binary group in the target binary group set comprises the same target test image and a different training image. For example, a binary group may be represented as (1, 0), where 1 denotes the target test image and 0 denotes a training image; the binary group set contains multiple such binary groups, for example (1, 0), (1, 2), (1, 3), (1, 4), and (1, 5), where 0, 2, 3, 4, and 5 denote different training images in the first image set.
Optionally, in this embodiment, each binary group in the target binary group set is input into the target neural network model to obtain a group of image similarities output by the model. For example, if the binary group set contains five binary groups, namely {(1, 0), (1, 2), (1, 3), (1, 4), (1, 5)}, five similarities are obtained from the target neural network model, and the training image with the largest similarity is taken as the target training image for the target test image.
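The pairing-and-selection procedure in the two paragraphs above can be sketched as follows, with integers standing in for images and a toy similarity function standing in for the target neural network model (both are illustrative assumptions):

```python
def build_pair_set(test_image, training_images):
    """Pair the single test image with every training image (the 'binary group set')."""
    return [(test_image, t) for t in training_images]

def select_target_training_image(pairs, similarity):
    """Score every pair with the model's similarity and keep the best match."""
    scores = [similarity(test, train) for test, train in pairs]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return pairs[best][1], scores[best]
```

In the real system, `similarity` would be the trained network's output for a (test image, training image) pair; the argmax over the group of similarities picks the target training image.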
The target test image may be a user interface (UI) of a target application, which may include, but is not limited to, game applications, education applications, travel applications, and the like. The target test image may also be a web page. The target test image has a target position at which an operation can be performed, that is, a touch position at which a UI test can be carried out. Taking a game as an example, the game interface includes touch areas, areas where shortcut keys can be used, and so on. FIG. 3 shows a schematic view of a target test image.
That is, the test of the user interface can be performed by the present embodiment.
It should be noted that, in this embodiment, the positive sample image and the training image being images of the same scene may be understood as follows: the types of UI objects browsed by the user in the positive sample image and the training image are the same, and the positions of the browsed objects are the same. Browsing here may include, but is not limited to, browsing between windows and between fields using access methods (Tab key, mouse movement, and shortcut keys), as well as the objects and features of a window (menu, size, position, status, and center).
The negative sample image and the training image being images of different scenes may be understood as follows: the types and positions of the UI objects browsed by the user differ between the negative sample image and the training image, and the difference is greater than a predetermined threshold. FIG. 4 shows a schematic diagram of a training image together with its negative and positive sample images.
Optionally, the position in each training image and the operation performed at that position constitute the UI test for the training image.
Optionally, in this embodiment, inputting each binary in the target binary set into the target neural network model to obtain a set of image similarities output by the target neural network model, where the obtaining may include:
performing the following operations on each binary group in the target binary group set, where the binary group being processed is regarded as the current binary group, the current binary group includes the target test image and a current training image, and the image similarity corresponding to the current binary group is the current image similarity:
s1, inputting the current binary into the target neural network model;
s2, respectively acquiring a first image feature in a target test image and a second image feature in a current training image through a target neural network model;
S3, determining the distance between the first image feature and the second image feature through the target neural network model, where the current image similarity is the similarity represented by this distance.
It should be noted that the distance between the first image feature and the second image feature is determined by the target neural network model, and this distance is used to represent the similarity between the target test image and the current training image. For example, the similarity may be calculated from the cosine distance.
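A minimal cosine-similarity computation, as mentioned above, in plain Python (assuming the two inputs are feature vectors of equal length already extracted by the network):

```python
def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (1.0 = identical direction).

    The cosine distance mentioned in the text is simply 1 - cosine_similarity.
    """
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sum(x * x for x in u) ** 0.5
    norm_v = sum(y * y for y in v) ** 0.5
    return dot / (norm_u * norm_v)
```

Features of images from the same scene should score near 1.0; features of images from different scenes should score noticeably lower.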
Optionally, in this embodiment, before acquiring the target binary group set, the method further includes: acquiring a triplet set, where each triplet in the set comprises a different training image from the first image set, a positive sample image corresponding to that training image, and a negative sample image corresponding to that training image; and training the initial neural network model with the triplet set to obtain the target neural network model. As shown in FIG. 4, a training image forms a triplet with its negative sample image and positive sample image.
Training the initial neural network model with the triplet set to obtain the target neural network model may include repeating the following operations until the resulting model determines that the similarity between a first number of training images in the first image set and their corresponding positive sample images is greater than the first threshold, and that the similarity between a second number of training images in the first image set and their corresponding negative sample images is smaller than the second threshold: acquiring a group of triplets from the triplet set; sequentially inputting the group of triplets into the initial neural network model, where the initial neural network model comprises a convolutional layer, a pooling layer, and a fully-connected layer used to extract the image features of the training image, the positive sample image, and the negative sample image, and to compute the first distance between the features of the training image and the positive sample image and the second distance between the features of the training image and the negative sample image; and, when the number of training images in the first image set whose first distance is larger than the first threshold is smaller than the first number, or the number of training images whose second distance is larger than the second threshold is smaller than the second number, adjusting some parameters of the convolutional, pooling, and fully-connected layers, and/or adjusting the connection relations of those layers.
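The stopping criterion of the training loop above can be sketched as a standalone check. Here `embed` is a hypothetical stand-in for the convolutional/pooling/fully-connected layers, and the check is phrased in terms of similarities as in the first sentence of the paragraph (the patent text mixes distance and similarity phrasing):

```python
def training_converged(triplets, embed, similarity,
                       first_threshold, second_threshold,
                       first_count, second_count):
    """Return True once enough triplets satisfy both threshold conditions.

    Training stops when at least `first_count` training images have
    positive-pair similarity above `first_threshold` and at least
    `second_count` have negative-pair similarity below `second_threshold`;
    otherwise parameters/topology are adjusted and another group of
    triplets is drawn.
    """
    pos_ok = sum(1 for anchor, pos, _ in triplets
                 if similarity(embed(anchor), embed(pos)) > first_threshold)
    neg_ok = sum(1 for anchor, _, neg in triplets
                 if similarity(embed(anchor), embed(neg)) < second_threshold)
    return pos_ok >= first_count and neg_ok >= second_count
```

The parameter-adjustment step itself (backpropagation through the triplet loss) is omitted here; this sketch only captures when the loop terminates.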
Optionally, in this embodiment, in a case that a target training image is determined from the plurality of training images, performing the target operation on the position corresponding to the target position in the target test image includes: extracting the features of the target test image through the target neural network model, and calculating the distance between the features of the target test image and the features of each training image; in a case that the distance is smaller than a third threshold, storing the corresponding training image into a pending database; and determining the training image in the pending database that is earliest in the recorded test flow as the target training image corresponding to the target test image.
Optionally, in this embodiment, after the target training image corresponding to the target test image is determined, the method further includes: in a case that the distance between the features of the target test image and the features of the target training image is greater than a fourth threshold, matching the target test image against all the training images, and re-determining the target training image corresponding to the target test image.
It should be noted that, in order to test the test image accurately, the test image needs to be resized to the size of the target training image. That is, performing the target operation on the position corresponding to the target position in the target test image includes: in a case that the size of the acquired target test image is different from the size of the target training image, scaling the size of the target test image to the size of the target training image.
According to this embodiment, the initial neural network is trained in advance on a plurality of training images to obtain the target neural network model, and the target training image is determined from the training images through the target neural network model. Since the target training image corresponds to the target position in the target test image and the target operation performed on that position, the technical effect of testing the target position in the target test image by performing the recorded target operation is achieved.
In this embodiment, the UI flow only needs to be completed manually once; the neural network model can then judge the difference between the test images and the training images, so that the UI test flow is reproduced. The method is easy to get started with, and a UI flow test can be completed in a short time.
In combination with the above embodiments, the present invention further provides an alternative embodiment: a UI action execution method based on a similarity network.
The similarity-network-based UI action execution method mainly addresses automatic testing of a game UI: based on only a few manually recorded UI test samples, the AI can carry out automatic UI testing and can run on multiple mobile phones simultaneously, improving the efficiency of UI testing. The method can be applied to UI automation, for example having the AI automatically click interface icons during testing to reach a specified interface.
FIG. 5 shows a flow chart of UI action execution. The specific operations are as follows:
step 1, recording a user interface UI test sample;
step 2, constructing a triple sample set;
firstly, collecting image frames and corresponding actions from a sample set of a manually recorded UI test, and then constructing a triple sample set, wherein each triple sample is composed of a reference image, a positive sample image and a negative sample image, the positive sample image is an image of the same scene as the reference image, and the negative sample image is an image of a different scene from the reference image.
step 3, training a similarity network (equivalent to training the initial neural network model);
step 4, calculating the distance (similarity) between the test image and the training images;
step 5, selecting the training image whose action is to be transferred (determining the target training image);
step 6, executing the action (target operation) corresponding to the matched training image.
After the triplet set is collected, the structure of the similarity network and the triplet loss are designed so that the distance between positive sample pairs is reduced and the distance between negative sample pairs is increased. After the similarity network is trained, the current test image is input, its abstract features are extracted and compared with the features of the training images, and the differences between the images are computed; if the distance is smaller than a threshold, the action corresponding to that training image is transferred to the AI, and the action corresponding to the matched training image is executed.
First, samples of the UI test process are collected: the UI scenes and corresponding actions of the game are recorded manually, with the game's UI scene and the clicked UI position recorded every 0.5 seconds.
After the training sample set is collected, a plurality of reference images are randomly selected. For each reference image, an image in the same scene is used as its positive sample and a UI image in a different scene is used as its negative sample, so as to construct the triplet. An example of a triplet is shown in fig. 4.
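The triplet construction described above can be sketched as follows (a hypothetical helper; the scene grouping and random selection strategy are our assumptions):

```python
import random

def build_triplets(frames_by_scene, n_triplets, rng=random):
    # frames_by_scene: mapping from UI scene name to the recorded image
    # frames of that scene (each scene needs at least two frames).
    scenes = list(frames_by_scene)
    triplets = []
    for _ in range(n_triplets):
        scene = rng.choice(scenes)
        # Positive sample: another image from the same scene as the reference.
        reference, positive = rng.sample(frames_by_scene[scene], 2)
        # Negative sample: an image from a different scene.
        other = rng.choice([s for s in scenes if s != scene])
        negative = rng.choice(frames_by_scene[other])
        triplets.append((reference, positive, negative))
    return triplets
```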
After the triplet set is obtained, a lightweight deep network is constructed, mainly to accelerate feature extraction so that the deep network can run in real time on a Central Processing Unit (CPU). FIG. 6 is a schematic diagram of the network structure: the model takes 100×100-pixel images as input, the network employs 7 convolutional layers, and batch normalization layers are used to accelerate convergence of the network.
When the model is trained, a triplet loss is defined, the purpose being to bring the feature distances of the positive sample pairs of the triplets close together and to push the feature distances of the negative sample pairs apart. It is represented by the formula:

$$L=\frac{1}{N}\sum_{i=1}^{N}\left[\max\left(0,\ \left\|f(x_i^a)-f(x_i^p)\right\|_2^2-\alpha\right)+\max\left(0,\ \beta-\left\|f(x_i^a)-f(x_i^n)\right\|_2^2\right)\right]$$

wherein $N$ represents the number of triplets, $\alpha$ and $\beta$ are set hyper-parameters, $f$ represents the network for extracting depth features, $x_i^a$ represents the reference image of the $i$-th triplet, $x_i^p$ represents the positive sample image of the $i$-th triplet, and $x_i^n$ represents the negative sample image of the $i$-th triplet.
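A loss that pulls positive pairs within a margin and pushes negative pairs beyond one can be sketched in numpy as follows (a minimal sketch under our reading of the patent's formula, which is not fully legible here; the double-margin form and the values of alpha and beta are assumptions):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2, beta=1.0):
    # f_a, f_p, f_n: (N, D) arrays of features for the reference, positive
    # and negative images of N triplets.
    d_pos = np.sum((f_a - f_p) ** 2, axis=1)  # squared distances to positives
    d_neg = np.sum((f_a - f_n) ** 2, axis=1)  # squared distances to negatives
    # Positive pairs are pulled within margin alpha; negative pairs are
    # pushed beyond margin beta.
    return float(np.mean(np.maximum(0.0, d_pos - alpha) +
                         np.maximum(0.0, beta - d_neg)))
```

The loss is zero exactly when every positive pair is within the margin alpha and every negative pair is beyond the margin beta.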
After the similarity network is trained, the features of each training image are extracted. At the start of testing, a given test image is scaled to 100×100 pixels, its abstract features are extracted through the similarity network, and the distance between its features and the features of each training image is calculated. If the distance is smaller than a threshold, the corresponding training image is added to a pending set; the image that comes earliest in the UI test flow is selected from the pending set as the matched training image D, and the corresponding click action is executed. After the image D is determined, the search range over the training images can be narrowed: the test image is matched only against the training samples after image D, the earliest training sample is selected from the pending set, and the corresponding click action is executed. If the distances between the test image and all training images after image D are greater than the threshold, the test flow may have fallen out of sync, and the test image must be matched against all training samples to find the best matching training image again.
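The matching strategy above — pending set, earliest-image selection, narrowed search range, and out-of-sync fallback — can be sketched as follows (function names and the Euclidean distance choice are ours):

```python
def euclidean(a, b):
    # Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def match_test_image(test_feat, train_feats, start_idx, threshold):
    # Pending set: training images (from the last matched image D onward)
    # whose feature distance to the test image is below the threshold.
    pending = [i for i in range(start_idx, len(train_feats))
               if euclidean(test_feat, train_feats[i]) < threshold]
    if pending:
        return min(pending)  # earliest image in the recorded UI flow
    # No match after D: the flow may be out of sync, so fall back to
    # matching against all training samples and take the closest one.
    dists = [euclidean(test_feat, f) for f in train_feats]
    best = min(range(len(train_feats)), key=dists.__getitem__)
    return best if dists[best] < threshold else None
```

The returned index identifies the training image whose recorded click action would then be executed.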
As can be seen from the above, the scheme in this embodiment is mainly divided into three steps:
Firstly, image frames and the corresponding click positions are collected from the manually recorded UI (user interface) test samples to construct a training sample set.
Then, images in the same scene are taken as positive sample pairs, images in different scenes are taken as negative sample pairs, and the similarity network is trained. The goal is to make the distance between positive sample pairs as small as possible and the distance between negative sample pairs large.
Finally, after the similarity network is trained, the test image is input into the network, its depth features are extracted and compared with the depth features of the training images, and the distances between the features are calculated. If the minimum distance is smaller than a threshold, the training image corresponding to the minimum distance is selected, and the action corresponding to the matched training image is executed.
It should be noted that the similarity-network-based UI action execution method transfers actions by comparing the similarity between images. In fact, control information inside the game may also be extracted and actions transferred based on the UI controls, which may achieve the same or a better effect.
It should be further noted that, in automatic UI testing, quickly reproducing a UI test flow on a variety of mobile phones is a challenging problem. UI pages are often rendered with animation effects and some buttons are semi-transparent, so the pictures of the tested UI scenes are relatively complex; as shown in FIG. 7, the illumination of the picture and the posture of the character change greatly between UI images of the same scene captured at different times. The traditional template matching method is sensitive to background interference, needs more prior knowledge, and is time-consuming. This embodiment learns the abstract relationship between UI scenes and click actions from a small number of manually recorded UI test samples, thereby realizing UI action testing.
As shown in fig. 6, the collected triplet samples are trained through convolutional layers, pooling layers and a fully-connected layer. The first convolutional layer has a kernel size of 3 and a stride of 2, and outputs 8 channels;
then the first batch normalization follows, and the second convolutional layer, with a kernel size of 1 and a stride of 1, outputs 16 channels;
then the second batch normalization follows, and the third convolutional layer, with a kernel size of 3 and a stride of 2, outputs 32 channels;
then the third batch normalization follows, and the fourth convolutional layer, with a kernel size of 1 and a stride of 1, outputs 64 channels;
then the fourth batch normalization follows, and the fifth convolutional layer, with a kernel size of 3 and a stride of 2, outputs 128 channels;
then the fifth batch normalization follows, and the sixth convolutional layer, with a kernel size of 1 and a stride of 1, outputs 128 channels;
then the sixth batch normalization follows, and the seventh convolutional layer, with a kernel size of 3 and a stride of 2, outputs 128 channels;
finally, the output of the seventh convolutional layer is fed into a Flatten operation.
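Assuming "same"-style padding of 1 on the 3×3 layers (the patent does not state the padding), the feature-map size produced by these seven layers can be checked with the standard convolution output formula:

```python
def conv_out(size, kernel, stride, padding):
    # Standard convolution output-size formula.
    return (size + 2 * padding - kernel) // stride + 1

# (kernel size, stride, output channels) of the seven convolutional layers
# described above; padding=1 on the 3x3 layers is our assumption.
LAYERS = [(3, 2, 8), (1, 1, 16), (3, 2, 32), (1, 1, 64),
          (3, 2, 128), (1, 1, 128), (3, 2, 128)]

def feature_shape(input_size=100):
    size = input_size
    for kernel, stride, channels in LAYERS:
        size = conv_out(size, kernel, stride, padding=1 if kernel == 3 else 0)
    return size, size, channels
```

Under these assumptions, a 100×100 input yields a 7×7×128 feature map, i.e. a 6272-dimensional vector after the Flatten operation.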
In this embodiment, the similarity network is mainly used to compute the difference between test images and training images, so that the manually recorded UI test process is reproduced; the UI test process can run on multiple mobile phones simultaneously, improving UI test efficiency. This implementation of the similarity-network-based UI action transfer method can automate UI testing with a small number of manually recorded UI test samples and improves the efficiency of UI testing across various mobile phones. In addition, computing image similarity through the similarity network is robust to background changes and matches test images with training images more reliably.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present invention, there is also provided an object operating apparatus for implementing the above object operating method. As shown in fig. 8, the apparatus includes: a first acquiring unit 801, a first obtaining unit 803, and an execution unit 805.
A first acquiring unit 801, configured to acquire a target binary set, wherein each binary group in the target binary set includes the same target test image and a different training image in a first image set, the first image set includes a plurality of training images, and each training image corresponds to a position in that training image and an operation performed on the position.
A first obtaining unit 803, configured to input each binary group in the target binary group set into a target neural network model, and obtain a set of image similarities output by the target neural network model, where the set of image similarities corresponds to the target binary group set one by one, each image similarity is used to represent a similarity between a target test image and a training image in the corresponding binary group, the target neural network model is a network model obtained by training an initial neural network model using a plurality of training images, the target neural network model is used to determine that the similarity between the training image and a positive sample image is greater than a first threshold, and determine that the similarity between the training image and a negative sample image is smaller than a second threshold, the positive sample image and the training image are images in the same scene, and the negative sample image and the training image are images in different scenes, the first threshold is greater than the second threshold.
The execution unit 805 is configured to, when a target training image is determined from a plurality of training images, execute a target operation at a position in the target test image corresponding to the target position, where a similarity between the target test image and the target training image is a maximum value among a set of image similarities, and the target training image corresponds to the target position in the target test image and the target operation executed at the target position.
Optionally, the first obtaining unit 803 is further configured to perform the following operations: performing the following operations on each binary group in the target binary group set, wherein each binary group is regarded as a current binary group when the following operations are performed, the current binary group includes a target test image and a current training image, and the image similarity corresponding to the current binary group is the current image similarity: inputting the current binary group into a target neural network model; respectively acquiring a first image feature in a target test image and a second image feature in a current training image through a target neural network model; determining a distance between the first image feature and the second image feature through the target neural network model, wherein the current image similarity is a similarity expressed by using the distance.
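The patent only says the current image similarity is "a similarity expressed by using the distance"; one common (hypothetical) monotone mapping from distance to similarity is:

```python
def distance_to_similarity(distance):
    # Maps a non-negative feature distance to a similarity in (0, 1]:
    # distance 0 gives similarity 1, and similarity decreases as the
    # distance grows. The exact mapping is an assumption of ours.
    return 1.0 / (1.0 + distance)
```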
According to this embodiment, the initial neural network is trained in advance on a plurality of training images to obtain the target neural network model, and the target training image is determined from the training images through the target neural network model. Since the target training image corresponds to the target position in the target test image and the target operation performed on that position, the technical effect of testing the target position in the target test image by performing the recorded target operation is achieved.
As an alternative embodiment, the apparatus may further include:
a second acquiring unit, configured to acquire a triplet set before the target binary set is acquired, wherein each triplet in the triplet set includes a different training image in the first image set, a positive sample image corresponding to that training image, and a negative sample image corresponding to that training image;
and a second obtaining unit, configured to train the initial neural network model by using the triplet set to obtain the target neural network model.
Wherein the second obtaining unit is further configured to repeatedly perform the following operations until the obtained target neural network model determines that the similarity between a first number of training images in the first image set and their corresponding positive sample images is greater than the first threshold, and determines that the similarity between a second number of training images in the first image set and their corresponding negative sample images is less than the second threshold: acquiring a group of triplets from the triplet set; sequentially inputting the group of triplets into the initial neural network model, wherein the initial neural network model includes a convolutional layer, a pooling layer and a fully-connected layer, which are used for extracting the image features of the training image, the image features of the positive sample image and the image features of the negative sample image, and for determining a first distance between the image features of the training image and the image features of the positive sample image and a second distance between the image features of the training image and the image features of the negative sample image; and in a case that the number of training images in the first image set whose first distance is greater than the first threshold is smaller than the first number, or the number of training images in the first image set whose second distance is greater than the second threshold is smaller than the second number, adjusting some of the parameters in the convolutional layer, the pooling layer and the fully-connected layer, and/or adjusting the connection relationship among the convolutional layer, the pooling layer and the fully-connected layer.
As an alternative embodiment, the execution unit 805 may include: a calculation module, configured to extract the features of the target test image through the target neural network model and calculate the distance between the features of the target test image and the features of each training image; a storage module, configured to store the corresponding training image into a pending database in a case that the distance is smaller than a third threshold; and a determining module, configured to determine the training image in the pending database that is earliest in the recorded test flow as the target training image corresponding to the target test image.
As an alternative embodiment, the apparatus is further configured to perform the following operations: after the target training image corresponding to the target test image is determined, in a case that the distance between the features of the target test image and the features of the target training image is greater than a fourth threshold, matching the target test image against all the training images and re-determining the target training image corresponding to the target test image.
As an alternative embodiment, the execution unit 805 includes: an acquiring module, configured to acquire the size of the target training image in a case that the size of the acquired target test image is different from the size of the target training image; and a scaling module, configured to scale the size of the target test image to the size of the target training image.
For example, the size of the acquired target test image is enlarged to 100 × 100, or the size of the acquired target test image is reduced to 100 × 100.
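As an illustration of this scaling step, a minimal nearest-neighbour resize (a stand-in for a library call such as OpenCV's cv2.resize or PIL's Image.resize, which would normally be used):

```python
def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour resize of a 2-D list image to (out_h, out_w):
    # each output pixel copies the nearest source pixel.
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]
```

The same function both enlarges and shrinks, e.g. mapping a captured screenshot of any size onto the 100×100 input expected by the network.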
According to a further aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the method for operating an object, as shown in fig. 9, the electronic device includes a memory 902 and a processor 904, the memory 902 stores a computer program, and the processor 904 is configured to execute the steps in any one of the method embodiments by the computer program.
Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, acquiring a target binary set, wherein each binary in the target binary set comprises the same target test image and a different training image in a first image set, the first image set comprises a plurality of training images, and each training image corresponds to a position in each training image and an operation performed on the position;
s2, inputting each binary group in the target binary group set into the target neural network model to obtain a set of image similarities output by the target neural network model, wherein the set of image similarities corresponds to the target binary group set one by one, each image similarity is used for representing the similarity between the target test image and the training image in the corresponding binary group, the target neural network model is a network model obtained by training an initial neural network model using the plurality of training images, and the target neural network model is used for determining that the similarity between a training image and a positive sample image is greater than a first threshold and determining that the similarity between a training image and a negative sample image is smaller than a second threshold, wherein the positive sample image and the training image are images in the same scene, the negative sample image and the training image are images in different scenes, and the first threshold is greater than the second threshold;
and S3, in a case that a target training image is determined from the plurality of training images, executing the target operation on the position corresponding to the target position in the target test image, wherein the similarity between the target test image and the target training image is the maximum value among the set of image similarities, and the target training image corresponds to the target position in the target test image and the target operation executed on the target position.
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 9 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 9 is a diagram illustrating a structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in FIG. 9, or have a different configuration than shown in FIG. 9.
The memory 902 may be used to store software programs and modules, such as the program instructions/modules corresponding to the object operating method and apparatus in the embodiments of the present invention, and the processor 904 executes various functional applications and data processing by running the software programs and modules stored in the memory 902, that is, implements the object operating method described above. The memory 902 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 902 may further include memory located remotely from the processor 904, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 902 may specifically store, but is not limited to, attribute information of the target test image, attribute information of the target training image, and the like. As an example, as shown in fig. 9, the memory 902 may include, but is not limited to, the first acquiring unit 801, the first obtaining unit 803, and the execution unit 805 of the object operating apparatus. In addition, the memory may further include, but is not limited to, other module units in the object operating apparatus, which are not described again in this example.
Optionally, the transmission device 906 is used for receiving or sending data via a network. Examples of the network may include wired networks and wireless networks. In one example, the transmission device 906 includes a Network Interface Controller (NIC) that can be connected to a router via a network cable and other network devices so as to communicate with the internet or a local area network. In one example, the transmission device 906 is a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In addition, the electronic device further includes: a display 908 for displaying the target test image; and a connection bus 910 for connecting the respective module parts in the above-described electronic apparatus.
According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
s1, acquiring a target binary set, wherein each binary in the target binary set comprises the same target test image and a different training image in a first image set, the first image set comprises a plurality of training images, and each training image corresponds to a position in each training image and an operation performed on the position;
s2, inputting each binary group in the target binary group set into the target neural network model to obtain a set of image similarities output by the target neural network model, wherein the set of image similarities corresponds to the target binary group set one by one, each image similarity is used for representing the similarity between the target test image and the training image in the corresponding binary group, the target neural network model is a network model obtained by training an initial neural network model using the plurality of training images, and the target neural network model is used for determining that the similarity between a training image and a positive sample image is greater than a first threshold and determining that the similarity between a training image and a negative sample image is smaller than a second threshold, wherein the positive sample image and the training image are images in the same scene, the negative sample image and the training image are images in different scenes, and the first threshold is greater than the second threshold;
and S3, in a case that a target training image is determined from the plurality of training images, executing the target operation on the position corresponding to the target position in the target test image, wherein the similarity between the target test image and the target training image is the maximum value among the set of image similarities, and the target training image corresponds to the target position in the target test image and the target operation executed on the target position.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (13)

1. An object operation method, comprising:
acquiring a target two-tuple set, wherein each two-tuple in the target two-tuple set comprises the same target test image and a different training image from a first image set, the first image set comprises a plurality of training images, and each training image corresponds to a position in that training image and to an operation performed on the position;
inputting each two-tuple in the target two-tuple set into a target neural network model to obtain a group of image similarities output by the target neural network model, wherein the group of image similarities corresponds to the target two-tuple set one by one, each image similarity is used for representing the similarity between the target test image and the training image in the corresponding two-tuple, the target neural network model is a network model obtained by training an initial neural network model using the plurality of training images, the target neural network model is used for determining that the similarity between a training image and a positive sample image is greater than a first threshold and that the similarity between a training image and a negative sample image is less than a second threshold, the positive sample image and the training image are images of the same scene, the negative sample image and the training image are images of different scenes, and the first threshold is greater than the second threshold;
and in a case that a target training image is determined from the plurality of training images, performing a target operation at a position in the target test image corresponding to a target position, wherein the similarity between the target test image and the target training image is the maximum value among the group of image similarities, and the target training image corresponds to the target position in the target test image and to the target operation performed at the target position.
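The claim above can be read as an argmax over pairwise scores. Below is a minimal, hypothetical sketch of that flow; `similarity`, the toy tuple "images", and all names are illustrative stand-ins, not the patent's actual model.

```python
# Hypothetical sketch of the claim-1 flow: pair the test image with every
# training image, score each pair, and return the position/operation that
# was recorded for the best-scoring training image. `similarity` stands in
# for the trained neural network model described in the claims.

def select_operation(test_image, training_set, similarity):
    """training_set: list of (training_image, position, operation) tuples."""
    best = max(training_set, key=lambda entry: similarity(test_image, entry[0]))
    _, position, operation = best
    return position, operation  # the caller performs `operation` at `position`

# Toy usage: "images" are pixel tuples, similarity counts matching pixels.
sim = lambda a, b: sum(x == y for x, y in zip(a, b))
training_set = [((0, 0, 1), (10, 20), "tap"), ((1, 1, 1), (30, 40), "swipe")]
print(select_operation((1, 1, 0), training_set, sim))  # -> ((30, 40), 'swipe')
```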
2. The method of claim 1, wherein the inputting each two-tuple in the target two-tuple set into a target neural network model to obtain a group of image similarities output by the target neural network model comprises:
performing the following operations on each two-tuple in the target two-tuple set, wherein the two-tuple being processed is regarded as a current two-tuple, the current two-tuple comprises the target test image and a current training image, and the image similarity corresponding to the current two-tuple is a current image similarity:
inputting the current two-tuple into the target neural network model;
respectively acquiring a first image feature in the target test image and a second image feature in the current training image through the target neural network model;
determining, by the target neural network model, a distance between the first image feature and the second image feature, wherein the current image similarity is a similarity represented using the distance.
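A distance between two feature vectors can be turned into a similarity in many ways; the sketch below uses a fixed random linear map as a stand-in "network" and the reciprocal transform, both of which are assumptions for illustration only.

```python
import numpy as np

# Illustrative pairwise scoring in the spirit of claim 2: embed both images
# with the same (here: fixed random linear) "network" and convert the
# Euclidean distance between the two feature vectors into a similarity.
# The linear map is a stand-in for the trained model of the patent.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))  # toy embedding: 4-pixel image -> 8-d feature

def embed(image):
    return image @ W

def pair_similarity(test_image, train_image):
    d = np.linalg.norm(embed(test_image) - embed(train_image))
    return 1.0 / (1.0 + d)  # strictly decreasing in the feature distance

a = np.array([1.0, 0.0, 0.0, 1.0])
print(pair_similarity(a, a))  # identical images -> distance 0 -> similarity 1.0
```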
3. The method of claim 1, wherein before the acquiring a target two-tuple set, the method further comprises:
obtaining a set of triples, wherein each triplet in the set of triples includes a different training image in the first set of images, one of the positive sample images corresponding to the different training image, and one of the negative sample images corresponding to the different training image;
and training the initial neural network model by using the triple set to obtain the target neural network model.
4. The method of claim 3, wherein the training the initial neural network model using the set of triples to obtain the target neural network model comprises:
repeatedly performing the following operations until the resulting target neural network model determines that the similarity between each of a first number or more of the training images in the first image set and the corresponding positive sample image is greater than the first threshold, and determines that the similarity between each of a second number or more of the training images in the first image set and the corresponding negative sample image is less than the second threshold:
obtaining a group of triples from the triple set;
sequentially inputting the group of triples to the initial neural network model, wherein the initial neural network model comprises a convolutional layer, a pooling layer and a full-connection layer, the convolutional layer, the pooling layer and the full-connection layer are used together for obtaining image features of the training images, image features of the positive sample images and image features of the negative sample images, the image features of the training images, the image features of the positive sample images and the image features of the negative sample images are included in the triples, a first distance between the image features of the training images and the image features of the positive sample images is determined, and a second distance between the image features of the training images and the image features of the negative sample images is determined;
adjusting some parameters of the convolutional layer, the pooling layer and the fully-connected layer and/or adjusting a connection relationship among the convolutional layer, the pooling layer and the fully-connected layer in a case that the number of the training images in the first image set for which the first distance is less than the first threshold is less than the first number, or the number of the training images in the first image set for which the second distance is greater than the second threshold is less than the second number.
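The training loop of claims 3–4 resembles a standard triplet margin loss. The toy below swaps the claimed conv/pool/fully-connected stack for a linear embedding and uses plain gradient descent; the margin, learning rate, and sample values are all illustrative assumptions.

```python
import numpy as np

# Toy triplet training in the spirit of claims 3-4: pull the anchor
# (training image) toward its positive sample (same scene) and push it away
# from its negative sample (different scene) with a margin loss. A linear
# embedding stands in for the conv/pool/fully-connected stack of the claim.

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 3)) * 0.1  # 4-pixel image -> 3-d feature

def triplet_loss_and_grad(W, anchor, positive, negative, margin=1.0):
    d_pos = (anchor - positive) @ W   # feature-space offset to the positive
    d_neg = (anchor - negative) @ W   # feature-space offset to the negative
    loss = max(0.0, d_pos @ d_pos - d_neg @ d_neg + margin)
    if loss == 0.0:
        return 0.0, np.zeros_like(W)
    grad = 2.0 * (np.outer(anchor - positive, d_pos)
                  - np.outer(anchor - negative, d_neg))
    return loss, grad

anchor = np.array([1.0, 0.0, 0.0, 0.0])
pos = np.array([0.9, 0.1, 0.0, 0.0])   # same scene, slight change
neg = np.array([0.0, 0.0, 1.0, 0.0])   # different scene
for _ in range(200):                   # plain gradient descent
    _, g = triplet_loss_and_grad(W, anchor, pos, neg)
    W -= 0.05 * g
# After training, the positive pair is closer in feature space than the
# negative pair, mirroring the stopping condition of claim 4.
```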
5. The method of claim 1, wherein the performing a target operation at a position in the target test image corresponding to a target position in a case that a target training image is determined from the plurality of training images comprises:
extracting features of the target test image through the target neural network model, and calculating a distance between the features of the target test image and features of the target training image;
in a case that the distance is less than a third threshold, storing the target training image corresponding to the target test image into a candidate database;
and determining the target training image stored earliest in the candidate database as the target training image corresponding to the target test image.
6. The method of claim 5, wherein after the determining the target training image stored earliest in the candidate database as the target training image corresponding to the target test image, the method further comprises:
in a case that the distance between the features of the target test image and the features of the target training image is greater than a fourth threshold, matching the target test image against all of the training images, and re-determining the target training image corresponding to the target test image.
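Claims 5–6 together describe a match cache with an accept threshold and a looser reject threshold. A hypothetical sketch, in which the threshold values, the 1-d "features", and all names are assumptions:

```python
# Illustrative match cache in the spirit of claims 5-6: once a training
# image's feature distance to the test image falls below a threshold it is
# stored in a candidate database; the earliest stored candidate is reused
# until its distance degrades past a looser threshold, which triggers a
# full re-match. Threshold values and the distance function are assumptions.

def match_with_cache(distance, test_feat, train_feats, cache,
                     accept_thresh=0.5, reject_thresh=2.0):
    """train_feats: dict of id -> feature; cache: candidate ids, earliest first."""
    if cache:
        earliest = cache[0]
        if distance(test_feat, train_feats[earliest]) <= reject_thresh:
            return earliest      # cached match is still close enough
        cache.clear()            # stale: fall back to matching everything
    best = min(train_feats, key=lambda k: distance(test_feat, train_feats[k]))
    if distance(test_feat, train_feats[best]) < accept_thresh:
        cache.append(best)       # remember the confident match
    return best

dist = lambda a, b: abs(a - b)  # 1-d stand-in for a feature-space distance
feats = {"scene_a": 0.0, "scene_b": 5.0}
cache = []
print(match_with_cache(dist, 0.1, feats, cache))  # -> 'scene_a' (and cached)
```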
7. The method of claim 1, wherein the performing a target operation at a position in the target test image corresponding to a target position comprises:
in a case that a size of the acquired target test image is different from a size of the target training image, scaling the target test image to the size of the target training image.
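The rescaling step of claim 7 can be sketched with nearest-neighbour index mapping; the patent does not specify a resampling method, so this choice is an assumption.

```python
import numpy as np

# Dependency-free nearest-neighbour rescaling in the spirit of claim 7:
# bring the test screenshot to the training image's size before matching.
# A production system would more likely use a library resampler such as
# OpenCV or Pillow; this is only a sketch.

def scale_to(image, target_h, target_w):
    h, w = image.shape[:2]
    rows = np.arange(target_h) * h // target_h  # source row for each target row
    cols = np.arange(target_w) * w // target_w  # source column for each target column
    return image[rows][:, cols]

img = np.arange(16).reshape(4, 4)
print(scale_to(img, 2, 2))  # picks pixels (0,0), (0,2), (2,0), (2,2)
```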
8. An object operation device, comprising:
a first obtaining unit, configured to obtain a target two-tuple set, wherein each two-tuple in the target two-tuple set comprises the same target test image and a different training image from a first image set, the first image set comprises a plurality of training images, and each training image corresponds to a position in that training image and to an operation performed on the position;
a first deriving unit, configured to input each two-tuple in the target two-tuple set into a target neural network model to obtain a group of image similarities output by the target neural network model, wherein the group of image similarities corresponds to the target two-tuple set one by one, each image similarity is used for representing the similarity between the target test image and the training image in the corresponding two-tuple, the target neural network model is a network model obtained by training an initial neural network model using the plurality of training images, the target neural network model is used for determining that the similarity between a training image and a positive sample image is greater than a first threshold and that the similarity between a training image and a negative sample image is less than a second threshold, the positive sample image and the training image are images of the same scene, the negative sample image and the training image are images of different scenes, and the first threshold is greater than the second threshold;
an execution unit, configured to, in a case that a target training image is determined from the plurality of training images, perform a target operation at a position in the target test image corresponding to a target position, wherein the similarity between the target test image and the target training image is the maximum value among the group of image similarities, and the target training image corresponds to the target position in the target test image and to the target operation performed at the target position.
9. The apparatus of claim 8, wherein the first deriving unit is further configured to:
performing the following operations on each two-tuple in the target two-tuple set, wherein the two-tuple being processed is regarded as a current two-tuple, the current two-tuple comprises the target test image and a current training image, and the image similarity corresponding to the current two-tuple is a current image similarity:
inputting the current two-tuple into the target neural network model;
respectively acquiring a first image feature in the target test image and a second image feature in the current training image through the target neural network model;
determining, by the target neural network model, a distance between the first image feature and the second image feature, wherein the current image similarity is a similarity represented using the distance.
10. The apparatus of claim 8, further comprising:
a second obtaining unit, configured to obtain a triple set before the target two-tuple set is obtained, wherein each triple in the triple set comprises a different training image in the first image set, one positive sample image corresponding to the different training image, and one negative sample image corresponding to the different training image;
and a second deriving unit, configured to train the initial neural network model using the triple set to obtain the target neural network model.
11. The apparatus of claim 10, wherein the second deriving unit is further configured to:
repeatedly performing the following operations until the resulting target neural network model determines that the similarity between each of a first number or more of the training images in the first image set and the corresponding positive sample image is greater than the first threshold, and determines that the similarity between each of a second number or more of the training images in the first image set and the corresponding negative sample image is less than the second threshold:
obtaining a group of triples from the triple set;
sequentially inputting the group of triples to the initial neural network model, wherein the initial neural network model comprises a convolutional layer, a pooling layer and a full-connection layer, the convolutional layer, the pooling layer and the full-connection layer are used together for obtaining image features of the training images, image features of the positive sample images and image features of the negative sample images, the image features of the training images, the image features of the positive sample images and the image features of the negative sample images are included in the triples, a first distance between the image features of the training images and the image features of the positive sample images is determined, and a second distance between the image features of the training images and the image features of the negative sample images is determined;
adjusting some parameters of the convolutional layer, the pooling layer and the fully-connected layer and/or adjusting a connection relationship among the convolutional layer, the pooling layer and the fully-connected layer in a case that the number of the training images in the first image set for which the first distance is less than the first threshold is less than the first number, or the number of the training images in the first image set for which the second distance is greater than the second threshold is less than the second number.
12. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 7.
13. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 7 by means of the computer program.
CN201911288866.6A 2019-12-12 2019-12-12 Object operating method and device, computer-readable storage medium and electronic device Pending CN111124902A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911288866.6A CN111124902A (en) 2019-12-12 2019-12-12 Object operating method and device, computer-readable storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911288866.6A CN111124902A (en) 2019-12-12 2019-12-12 Object operating method and device, computer-readable storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN111124902A true CN111124902A (en) 2020-05-08

Family

ID=70498864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911288866.6A Pending CN111124902A (en) 2019-12-12 2019-12-12 Object operating method and device, computer-readable storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN111124902A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738321A (en) * 2020-06-12 2020-10-02 腾讯音乐娱乐科技(深圳)有限公司 Data processing method, device, terminal equipment and storage medium
CN111985565A (en) * 2020-08-20 2020-11-24 上海风秩科技有限公司 Picture analysis method and device, storage medium and electronic equipment
CN112766406A (en) * 2021-01-29 2021-05-07 北京达佳互联信息技术有限公司 Article image processing method and device, computer equipment and storage medium
CN112069075B (en) * 2020-09-09 2023-06-30 网易(杭州)网络有限公司 Fashionable dress test method and device for game roles and game client

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104252410A (en) * 2013-06-27 2014-12-31 阿里巴巴集团控股有限公司 Method and equipment for testing control in page
CN104899146A (en) * 2015-06-19 2015-09-09 安一恒通(北京)科技有限公司 Image matching technology based software stability test method and device
CN106021108A (en) * 2016-05-19 2016-10-12 努比亚技术有限公司 Application program testing device, mobile terminal and method
US20170330054A1 (en) * 2016-05-10 2017-11-16 Baidu Online Network Technology (Beijing) Co., Ltd. Method And Apparatus Of Establishing Image Search Relevance Prediction Model, And Image Search Method And Apparatus
CN109939439A (en) * 2019-03-01 2019-06-28 腾讯科技(深圳)有限公司 Virtual role blocks detection method, model training method, device and equipment
CN110162454A (en) * 2018-11-30 2019-08-23 腾讯科技(深圳)有限公司 Game running method and device, storage medium and electronic device


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738321A (en) * 2020-06-12 2020-10-02 腾讯音乐娱乐科技(深圳)有限公司 Data processing method, device, terminal equipment and storage medium
CN111738321B (en) * 2020-06-12 2023-08-08 腾讯音乐娱乐科技(深圳)有限公司 Data processing method, device, terminal equipment and storage medium
CN111985565A (en) * 2020-08-20 2020-11-24 上海风秩科技有限公司 Picture analysis method and device, storage medium and electronic equipment
CN112069075B (en) * 2020-09-09 2023-06-30 网易(杭州)网络有限公司 Fashionable dress test method and device for game roles and game client
CN112766406A (en) * 2021-01-29 2021-05-07 北京达佳互联信息技术有限公司 Article image processing method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111124902A (en) Object operating method and device, computer-readable storage medium and electronic device
CN111652121A (en) Training method of expression migration model, and expression migration method and device
CN112434721A (en) Image classification method, system, storage medium and terminal based on small sample learning
CN110339569B (en) Method and device for controlling virtual role in game scene
CN113240778B (en) Method, device, electronic equipment and storage medium for generating virtual image
CN110162454B (en) Game running method and device, storage medium and electronic device
CN110766081B (en) Interface image detection method, model training method and related device
CN112990390B (en) Training method of image recognition model, and image recognition method and device
CN108090561A (en) Storage medium, electronic device, the execution method and apparatus of game operation
CN111672109B (en) Game map generation method, game testing method and related device
CN110251942B (en) Method and device for controlling virtual character in game scene
CN114297730A (en) Countermeasure image generation method, device and storage medium
CN110149551B (en) Media file playing method and device, storage medium and electronic device
CN112001274A (en) Crowd density determination method, device, storage medium and processor
CN112257645B (en) Method and device for positioning key points of face, storage medium and electronic device
CN108197203A (en) A kind of shop front head figure selection method, device, server and storage medium
CN109271228A (en) Interface function recognition methods, device and the electronic equipment of application
CN115131604A (en) Multi-label image classification method and device, electronic equipment and storage medium
CN109934845B (en) Time sequence behavior capturing frame generation method and device based on self-attention network
CN116229188B (en) Image processing display method, classification model generation method and equipment thereof
CN110472537B (en) Self-adaptive identification method, device, equipment and medium
CN110516153B (en) Intelligent video pushing method and device, storage medium and electronic device
CN115496911B (en) Target point detection method, device, equipment and storage medium
CN112749711B (en) Video acquisition method and device and storage medium
CN111461228B (en) Image recommendation method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508