CN115775386A - User interface component identification method and device, computer equipment and storage medium

Info

Publication number: CN115775386A
Application number: CN202211515372.9A
Authority: CN
Inventors: 曾晗; 曹思; 黄浩
Original assignee: Shanghai Pudong Development Bank Co Ltd
Current assignee: Shanghai Pudong Development Bank Co Ltd
Priority date: 2022-11-30
Filing date: 2022-11-30
Publication date: 2023-03-10

Abstract

The application relates to a method, an apparatus, a computer device, a storage medium and a computer program product for identifying a user interface component. The method comprises: preprocessing a plurality of interface images to obtain an interface image sample set, wherein the interface image sample set comprises a positive sample and a negative sample, the positive sample refers to an interface image sample containing a user interface component identifier, and the negative sample refers to an interface image sample not containing the user interface component identifier; training a recognition model based on a loss function and the interface image sample set to adjust parameters in the recognition model and obtain a trained recognition model; and performing user interface component recognition on a target interface image through the trained recognition model. The method can improve the accuracy of identifying user interface components.

Description

User interface component identification method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for identifying a user interface component, a computer device, and a storage medium.

Background

With the rise of mobile applications (apps) and mini programs, development engineers need to design various kinds of software and write code in programming languages. Accurate recognition of user interface components (UI components) in a design draft is a prerequisite for writing that code.

In the prior art, existing UI component recognition algorithms are mostly implemented with machine learning or deep learning algorithms. However, because there are many types of UI components and the visual features of some UI components are similar, misidentification and classification errors occur during recognition, which degrades the UI component recognition result.

Disclosure of Invention

In view of the above, it is necessary to provide a method, an apparatus, a computer device, a computer readable storage medium and a computer program product for identifying a user interface component.

In a first aspect, the present application provides a method for identifying a user interface component. The method comprises the following steps:

preprocessing a plurality of interface images to obtain an interface image sample set, wherein the interface image sample set comprises a positive sample and a negative sample, the positive sample refers to an interface image sample containing a user interface component identifier, and the negative sample refers to an interface image sample not containing the user interface component identifier;

training the recognition model based on the loss function and the interface image sample set to adjust parameters in the recognition model and obtain the trained recognition model;

and identifying the user interface component of the target interface image through the trained identification model.

In one embodiment, the pre-processing includes screening, deduplication, and classification; preprocessing a plurality of interface images to obtain an interface image sample set comprises the following steps:

screening the plurality of interface images to obtain a reference interface image;

classifying the user interface components in the reference interface image, and determining a plurality of user interface component types and the number of the user interface components under each user interface component type in the reference interface image;

and adjusting the content of each reference interface image based on the number of the user interface components in each reference interface image to obtain an interface image sample set.

In one embodiment, adjusting the content of each reference interface image based on the number of user interface components in each reference interface image to obtain a sample set of interface images includes:

determining a preset number of each user interface component type;

determining the actual number of each user interface component type in any reference interface image aiming at any reference interface image;

if the actual number corresponding to the user interface component type is smaller than the corresponding preset number, expanding the number of the components of the corresponding user interface component type in any reference interface image according to the component attribute of the user interface component in any reference interface image of the corresponding user interface component type, wherein the component attribute comprises component color information and component position information;

if the actual number corresponding to the user interface component type is larger than the corresponding preset number, the existing user interface components of the corresponding user interface component type in any reference interface image are subjected to covering processing, so that the component number of the existing user interface components of the corresponding user interface component type is reduced.

In one embodiment, the recognition model comprises an input layer, a backbone network layer, a detection neck network layer and a detection head output layer; the backbone network layer comprises a pooling operation layer and an attention mechanism layer; the attention mechanism layer is positioned behind the pooling operation layer and is used for carrying out weight distribution on the characteristics output by the pooling operation layer.

In one embodiment, training the recognition model based on the loss function and the interface image sample set to adjust parameters in the recognition model, and obtaining the trained recognition model includes:

acquiring a training label of each interface image sample in an interface image sample set;

calculating a loss function corresponding to each interface image sample based on the training label of each interface image sample and each training sample;

and adjusting parameters in the user interface component recognition model based on the loss function to obtain the trained user interface component recognition model.

In one embodiment, the method for identifying the user interface component of the target interface image through the trained identification model comprises the following steps:

acquiring a target interface image;

inputting the target interface image into the trained user interface component recognition model, and outputting a plurality of prior frames, wherein the prior frames are used for indicating the position of each type of user interface component in the target interface image;

determining a confidence threshold for a prior box of each type of user interface component; determining a target prior frame set of each type of user interface component according to a confidence threshold of a prior frame of each type of user interface component and the confidence of a plurality of prior frames of the position of each type of user interface component;

and screening the target prior frame set of each type of user interface component using weighted non-maximum suppression, and determining a final prediction frame of each type of user interface component.

In a second aspect, the application further provides an identification apparatus for a user interface component. The device comprises:

the processing module is used for preprocessing the plurality of interface images to obtain an interface image sample set, wherein the interface image sample set comprises a positive sample and a negative sample, the positive sample refers to an interface image sample containing a user interface component identifier, and the negative sample refers to an interface image sample not containing the user interface component identifier;

the adjusting module is used for training the recognition model based on the loss function and the interface image sample set so as to adjust parameters in the recognition model and obtain the trained recognition model;

and the recognition module is used for carrying out user interface component recognition on the target interface image through the trained recognition model.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:

In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:

In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program which when executed by a processor performs the steps of:

According to the method, the device, the computer equipment, the storage medium and the computer program product for identifying the user interface component, the interface image sample set is obtained by preprocessing a plurality of interface images, wherein the interface image sample set comprises a positive sample and a negative sample, the positive sample refers to an interface image sample containing the user interface component identification, and the negative sample refers to an interface image sample not containing the user interface component identification; training the recognition model based on the loss function and the interface image sample set to adjust parameters in the recognition model and obtain the trained recognition model; through the trained recognition model, the user interface component recognition is carried out on the target interface image, and the accuracy of recognizing the UI component can be improved.

Drawings

FIG. 1 is a flow diagram that illustrates a method for identifying user interface components, according to one embodiment;

FIG. 2 is a diagram of a recognition model in one embodiment;

FIG. 3 is a flow diagram that illustrates a methodology for identifying user interface components in one embodiment;

FIG. 4 is a block diagram of an apparatus for identifying user interface components in one embodiment;

FIG. 5 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

It will be understood that the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another. For example, the third preset threshold and the fourth preset threshold may be the same or different without departing from the scope of the present application.

In one embodiment, as shown in fig. 1, a method for identifying a user interface component is provided, which is exemplified by applying the method to a terminal, wherein the terminal may be, but is not limited to, various personal computers, laptops, smartphones, tablets, internet of things devices and portable wearable devices. It is understood that the method can also be applied to a server, and can also be applied to a system comprising a terminal and a server, and is realized through the interaction of the terminal and the server. In this embodiment, the method includes the steps of:

101. preprocessing a plurality of interface images to obtain an interface image sample set, wherein the interface image sample set comprises a positive sample and a negative sample, the positive sample refers to an interface image sample containing a user interface component identifier, and the negative sample refers to an interface image sample not containing the user interface component identifier;

102. training the recognition model based on the loss function and the interface image sample set to adjust parameters in the recognition model and obtain the trained recognition model;

103. and identifying the user interface component of the target interface image through the trained identification model.

The interface image refers to an interface image for application software or a design draft image of an application software interface, such as a user interface image of a mobile banking APP or a design draft image of a user interface component of the mobile banking APP.

The purpose of preprocessing the interface images is to filter out interface images with irregular sizes and contents. For example, in order to fit the size of the APP end, the width of the design image of the mobile banking APP should be not greater than 750px.

The interface image sample set is divided into a training set and a test set according to a preset ratio. For example, if the interface image sample set contains 1000 interface image samples and is divided at a ratio of 8:2, a training set containing 800 interface image samples and a test set containing 200 interface image samples are obtained.
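The split in this example can be sketched as follows. This is a minimal illustration only: the function name `split_samples`, the shuffling step and the fixed seed are assumptions and are not specified in the application.

```python
import random

def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle a copy of `samples` and split it into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffling is an assumption, not from the text
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# 1000 hypothetical sample file names split at a ratio of 8:2
train_set, test_set = split_samples([f"sample_{i}.png" for i in range(1000)])
print(len(train_set), len(test_set))  # 800 200
```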

User interface component identification means that the UI components on an interface image are marked with labeling boxes using an image labeling tool, and each labeling box also carries a type identifier corresponding to the UI component. That is, each UI component in a positive sample (an interface image sample carrying user interface component identifiers) has a corresponding labeling box and type identifier, whereas negative samples are interface image samples that have not been processed by the image labeling tool. The type identifier describes the category of the UI component; for example, it may be a button, an input box, an icon, text, a carousel, a popup, or a countdown tool. The image labeling tool may be labelImg, labelme, or the like. In addition, each interface image sample includes at least one UI component, and the interface image sample set as a whole contains at least two types of UI components.

The embodiment of the present invention does not specifically limit the loss function, which includes but is not limited to: the GIoU_Loss (Generalized Intersection over Union) loss function, the DIoU_Loss (Distance Intersection over Union) loss function, and the CIoU_Loss (Complete Intersection over Union) loss function.

In one embodiment, the loss function includes a classification loss function and a localization loss function. The classification loss function is used to determine whether the type predicted for a prediction box output by the recognition model matches the corresponding labeled type. The localization loss function is used to determine the error between a prediction box output by the recognition model and the labeling box in the corresponding positive sample.
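For reference, the three IoU-based localization losses named above are commonly written as follows. The notation is an assumption introduced here and does not appear in the application: $A$ is the prediction box, $B$ the labeling box, $C$ the smallest box enclosing both, $\rho$ the distance between the two box centers, $c$ the diagonal length of $C$, and $(w, h)$, $(w^{gt}, h^{gt})$ the width and height of the prediction box and the labeling box.

```latex
\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}

L_{\mathrm{GIoU}} = 1 - \mathrm{IoU} + \frac{|C \setminus (A \cup B)|}{|C|}

L_{\mathrm{DIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}}{c^{2}}

L_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}}{c^{2}} + \alpha v,
\qquad
v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2},
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
```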

The target interface image refers to an interface image for the application software or a design draft image of an application software interface, and at least one UI component is contained on the target interface image.

Specifically, a plurality of interface images are preprocessed, and the preprocessed interface images are used as an interface image sample set. Inputting the interface image samples in the training set into an initial recognition model, training the initial recognition model, reducing the value of the loss function by adjusting parameters in the recognition model, and finishing the training of the recognition model when the value of the loss function is not greater than a preset loss value to obtain the trained recognition model. And inputting the target interface image into the trained recognition model, and outputting the target interface image with the labeling box and the type identifier.

It is worth mentioning that, after the trained recognition model is obtained, its performance can be evaluated on the test set as follows. The interface image samples in the test set are input into the trained recognition model, and IoU (Intersection over Union) is used as the index for evaluating how well the model predicts the positions of UI components. The IoU value is calculated from the positional relationship and the overlapping area between a prediction box and the corresponding labeling box; the larger the IoU value, the closer the prediction box is to the ground-truth box. In this embodiment, mAP (Mean Average Precision) is used to measure the accuracy of the predicted types: the AP (Average Precision) of each UI component type in the test set is computed, and the mean of these AP values (i.e., the mAP value) is used to evaluate the trained recognition model. If the IoU value of the trained recognition model is smaller than a preset IoU value, or the mAP value is smaller than a preset mAP value, the recognition model is retrained through steps 101-103 above until the IoU value is not smaller than the preset IoU value and the mAP value is not smaller than the preset mAP value.
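The IoU computation between a prediction box and a labeling box can be sketched as follows. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (0, 0, 10, 10)
labeled = (5, 0, 15, 10)
print(iou(predicted, labeled))  # overlap 50, union 150
```

A value of 1.0 means the two boxes coincide, and 0.0 means they do not overlap at all.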

According to the method provided by the embodiment of the invention, the trained recognition model is used for recognizing various types of UI components of the target interface image, so that the recognition efficiency of the recognition model can be improved. In addition, the accuracy of the positioning of the UI component predicted by the recognition model can be improved by determining the difference between the prediction box and the marking box through the IOU intersection ratio. Compared with the component edge detection algorithm based on machine learning, the embodiment of the invention solves the problem that the UI component in the complex interface image formed by superposition and intersection is difficult to detect by the edge detection algorithm, improves the positioning accuracy and the component recall ratio of the UI component, and solves the problems of difficult identification and low positioning accuracy of the UI component in the prior art.

In combination with the above embodiments, in one embodiment, the pre-processing includes screening and classification; preprocessing a plurality of interface images to obtain an interface image sample set, comprising:

Screening the plurality of interface images to obtain a reference interface image means performing screening and deduplication on the interface images. The screening process deletes interface images that do not conform to the preset size. The deduplication process calculates the similarity between any two interface images and, if the similarity is greater than a preset similarity, deletes one of them. The interface images remaining after screening and deduplication are taken as reference interface images.

Specifically, the user interface components in the reference interface image are classified according to preset classification rules to obtain a plurality of user interface component types. When the user interface component types are determined, the number of components of each user interface component type is also determined, so that the content of each reference interface image can be adjusted according to these numbers to obtain the interface image sample set. The preset classification rules include: treating indivisible components such as icons, text and shapes as one type of user interface component; treating basic components such as buttons and input boxes as one type of user interface component; and treating business-oriented components such as carousels, popups and countdown tools as one type of user interface component. Of course, other classification rules may also be used, and the embodiment of the present invention does not specifically limit the preset classification rules.
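The screening and deduplication steps can be sketched as follows. The application does not say how similarity is computed; this sketch assumes a simple pixel-difference measure on equally sized grayscale thumbnails. The names `screen_and_dedup`, `similarity`, `MAX_WIDTH` and `SIM_THRESHOLD`, and the threshold value, are hypothetical; the 750px width comes from the example given earlier.

```python
MAX_WIDTH = 750        # width limit from the mobile banking example
SIM_THRESHOLD = 0.95   # hypothetical preset similarity

def similarity(a, b):
    """Similarity in [0, 1] between two equally sized grayscale thumbnails."""
    if len(a) != len(b) or not a:
        return 0.0
    diff = sum(abs(p - q) for p, q in zip(a, b))
    return 1.0 - diff / (255.0 * len(a))

def screen_and_dedup(images):
    """images: dicts with 'name', 'width' and 'thumb' (flat list of 0-255 values)."""
    kept = []
    for img in images:
        if img["width"] > MAX_WIDTH:
            continue  # screening: size does not conform
        if any(similarity(img["thumb"], k["thumb"]) > SIM_THRESHOLD for k in kept):
            continue  # deduplication: too similar to an image already kept
        kept.append(img)
    return kept

images = [
    {"name": "home.png", "width": 750, "thumb": [0, 128, 255, 64]},
    {"name": "home_copy.png", "width": 750, "thumb": [0, 128, 255, 66]},
    {"name": "wide.png", "width": 1080, "thumb": [10, 10, 10, 10]},
    {"name": "login.png", "width": 720, "thumb": [255, 0, 0, 255]},
]
reference_images = screen_and_dedup(images)
print([img["name"] for img in reference_images])  # ['home.png', 'login.png']
```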

According to the method provided by the embodiment of the invention, the interface image sample set is obtained by preprocessing the plurality of interface images, and the quality of the interface image samples in the interface image sample set can be improved.

With reference to the content of the foregoing embodiment, in an embodiment, adjusting content of each reference interface image based on the number of user interface components in each reference interface image to obtain an interface image sample set includes:

determining a preset number of each user interface component type;

The preset number of each user interface component type can be adjusted, and is an integer not less than 1. The predetermined number of each user interface element type may be the same or different. For example, there are three types of user interface components in each reference interface image, namely, user interface component type a, user interface component type B, and user interface component type C. The preset number of user interface component types a, the preset number of user interface component types B, and the preset number of user interface component types C are all the same. Or the preset number of the user interface component types A, the preset number of the user interface component types B and the preset number of the user interface component types C are different; or the preset number of the user interface component types A is the same as that of the user interface component types B, and the preset number of the user interface component types A is different from that of the user interface component types C.

Specifically, for any reference interface image, the actual number of each user interface component type in any reference interface image is determined, and the actual number of each user interface component type in all reference interface images is added to obtain the total component number of each user interface component type (i.e., the corresponding actual number of each user interface component type).

If the total component number of any user interface component type is smaller than the preset number of the corresponding user interface component type, the component number of the corresponding user interface component type is increased so that the total component number equals the preset number. For example, if the total number of user interface component type A is smaller than the preset number of user interface component type A, the number of user interface component type A is increased so that the total number of user interface component type A equals the preset number of user interface component type A.

The number of components of the corresponding user interface component type is increased by changing the position, size and color of components of that type. For example, taking the user interface component type whose component number needs to be increased as the type to be added, the methods for increasing the number of components of the type to be added include:

Method (1): change the position of the component to be added in any interface image sample, subject to the constraints of the background and nodes of the interface image sample. First, the area where the component to be added is located is obtained and defined as a rectangular area by specifying its upper-left corner coordinates and its length and width; the rectangular area is delimited to obtain coordinate point information. The rectangular area is then extracted through the alpha channel to generate a new image of the component to be added, and the height and width of the area are scaled to obtain components to be added of different sizes, thereby increasing the number of components to be added.

Method (2): randomly change the image color of the interface image sample containing the component to be added by adjusting the brightness, contrast and saturation of the interface image sample. The output pixel values of the interface image sample are calculated from its input pixel values and constants such as the brightness value, contrast value and color correction.
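Method (2) can be sketched as follows. The application does not give the pixel formula; this sketch assumes the common linear model `output = contrast * input + brightness`, clamped to the 0-255 range, and leaves saturation out. The function names and the parameter ranges are hypothetical.

```python
import random

def adjust_pixel(value, contrast, brightness):
    """Linear brightness/contrast adjustment of one channel value, clamped to 0-255."""
    return max(0, min(255, round(contrast * value + brightness)))

def jitter_image(pixels, rng, contrast_range=(0.8, 1.2), brightness_range=(-20, 20)):
    """pixels: rows of (r, g, b) tuples. Returns a recolored copy of the image."""
    contrast = rng.uniform(*contrast_range)      # one random contrast gain per image
    brightness = rng.uniform(*brightness_range)  # one random brightness offset per image
    return [
        [tuple(adjust_pixel(c, contrast, brightness) for c in px) for px in row]
        for row in pixels
    ]

sample = [[(100, 150, 200), (0, 0, 0)], [(255, 255, 255), (30, 60, 90)]]
augmented = jitter_image(sample, random.Random(42))
print(adjust_pixel(100, 1.2, 10))  # 130
```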

If the total component number of a user interface component type is greater than the preset number of the corresponding user interface component type, the component number of the corresponding user interface component type is reduced so that the total component number equals the preset number. For example, if the total number of user interface component type B is greater than the preset number of user interface component type B, the number of user interface component type B is reduced so that the total number of user interface component type B equals the preset number of user interface component type B.

The number of components of the type to be reduced is decreased by masking components of that type in the interface image sample. For example, a mask with the same size as the original interface image sample containing the component to be reduced is generated. The mask is divided into two parts, a white part and a black part: the area covered by the white part contains the user interface components to be retained, and the area covered by the black part is the area of the component to be reduced (the pixels of this area are set to 0). The mask is overlaid on the original interface image sample to cover the component to be reduced.
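The masking step can be sketched as follows, assuming a single-channel image stored as nested lists and component areas given as hypothetical `(x, y, width, height)` boxes.

```python
def mask_components(image, boxes_to_remove):
    """Zero out the given (x, y, w, h) boxes in a 2D image and return a new image."""
    height = len(image)
    width = len(image[0]) if image else 0
    # 1 = white part of the mask (keep), 0 = black part (component to be reduced)
    mask = [[1] * width for _ in range(height)]
    for x, y, w, h in boxes_to_remove:
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                mask[row][col] = 0
    # Overlaying the mask sets the pixels under the black part to 0
    return [[image[r][c] * mask[r][c] for c in range(width)] for r in range(height)]

original = [[9] * 4 for _ in range(4)]
masked = mask_components(original, [(1, 1, 2, 2)])
for row in masked:
    print(row)
```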

According to the method provided by the embodiment of the invention, whether the total component number of any user interface component type is equal to the corresponding preset number or not is determined, the component number of the corresponding user interface component type can be adjusted, and the total component number of the corresponding user interface component type can reach the corresponding preset number, so that the total component number of each user interface component type can be balanced, the problem of overfitting of the identification model caused by uneven component number of each type in an interface image sample set is avoided, and the identification accuracy of the identification model is improved.
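The balancing decision described above can be sketched as follows: the per-type totals are counted over all reference interface images and compared with the preset numbers. The function name `balance_plan`, the action labels and the sample data are hypothetical.

```python
from collections import Counter

def balance_plan(images, preset):
    """images: one list of component type labels per reference image.
    preset: dict mapping component type to its preset number.
    Returns, per type, the action needed and how many components it affects."""
    totals = Counter(label for labels in images for label in labels)
    plan = {}
    for comp_type, target in preset.items():
        delta = target - totals.get(comp_type, 0)
        if delta > 0:
            plan[comp_type] = ("expand", delta)  # too few: add components
        elif delta < 0:
            plan[comp_type] = ("mask", -delta)   # too many: mask components
        else:
            plan[comp_type] = ("keep", 0)
    return plan

images = [["button", "button", "icon"], ["icon", "icon", "icon", "popup"]]
preset = {"button": 3, "icon": 3, "popup": 1}
plan = balance_plan(images, preset)
print(plan)
```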

In combination with the above embodiments, in one embodiment, the recognition model includes an input layer, a backbone network layer, a detection neck network layer, and a detection head output layer; the backbone network layer comprises a pooling operation layer and an attention mechanism layer; the attention mechanism layer is positioned behind the pooling operation layer and is used for carrying out weight distribution on the characteristics output by the pooling operation layer.

Specifically, the recognition model is constructed based on the YOLOv5 model. As shown in FIG. 2, the recognition model includes four parts: an input layer, a backbone network layer (Backbone), a detection neck network layer (Neck), and a detection head (Head) output layer. In FIG. 2, the Attention layer is the attention mechanism layer and SPP is the pooling operation layer.

In the input layer, the YOLOv5 model applies Mosaic data augmentation to the interface image samples in the training set: 4 interface image samples are stitched into 1 image by random scaling, cropping and arrangement, which enriches the data, greatly improves the network training speed, and reduces the memory requirement of the YOLOv5 model. Focus is used as the reference network in the Backbone layer, which slices and concatenates the input feature map to extract feature information. The Neck layer uses a spatial pyramid pooling operation with max pooling at four sizes (1 × 1, 5 × 5, 9 × 9 and 13 × 13) followed by a Concat operation. The output layer (Head) outputs the recognition result.

The SPP layer added in the Backbone solves the problem of inconsistent image feature sizes, but it does not perform weighted fusion of the feature channels, so an attention mechanism layer is added after the SPP layer.

To handle UI components whose global features (size, shape and color) are similar while the color, texture and shape of some local regions differ, this embodiment introduces an attention mechanism layer. The convolutional feature channels are re-weighted according to feature importance, which strengthens the interdependence among important features, learns the importance of the different channel features, and better fuses the local and global features of the input feature map. Finally, component edge information extracted by the shallow layers, such as shape, color and position details, is fused with the semantic features extracted by the deep layers using an Add operation.
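Channel re-weighting of this kind can be illustrated with the sketch below. The application does not name a specific attention module; the sketch assumes a squeeze-and-excitation-style gate, and replaces the learned layers of such a module with hypothetical fixed per-channel `scale` and `bias` values so that the example stays self-contained.

```python
import math

def channel_attention(feature_map, scale, bias):
    """feature_map: list of channels, each a 2D list of floats.
    scale, bias: per-channel floats standing in for learned parameters (assumption).
    Returns (re-weighted feature map, per-channel weights)."""
    weights = []
    for channel, s, b in zip(feature_map, scale, bias):
        values = [v for row in channel for v in row]
        pooled = sum(values) / len(values)                  # global average pooling
        weights.append(1.0 / (1.0 + math.exp(-(s * pooled + b))))  # sigmoid gate
    reweighted = [
        [[v * w for v in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]
    return reweighted, weights

features = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
out, channel_weights = channel_attention(features, scale=[1.0, 1.0], bias=[0.0, 0.0])
print(channel_weights)
```

Channels with a stronger average response receive a weight closer to 1, so their features contribute more to the subsequent layers.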

According to the method provided by the embodiment of the invention, the attention mechanism layer is introduced to perform weighted fusion on the characteristics output by the SPP layer, so that the calculation time in the process of identifying the model training can be reduced, the difficulty in identifying the model training is reduced, and the training efficiency of the identifying model is improved.

With reference to the content of the foregoing embodiment, in an embodiment, training a recognition model based on a loss function and an interface image sample set to adjust parameters in the recognition model, so as to obtain a trained recognition model, includes:

Wherein the training labels refer to user interface component identifications.

Specifically, each training sample in the training set is input into the recognition model to obtain an output image for that training sample, and the loss function value corresponding to each interface image sample is determined based on the difference between the output image and the training label of the training sample. The parameters of the recognition model are adjusted so as to reduce the loss function value, and training ends when the loss function value corresponding to each interface image sample has been less than the preset loss value multiple times.
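The stopping rule can be sketched as follows. The text does not say how many times the loss must fall below the preset loss value or whether those occurrences must be consecutive; this sketch assumes a hypothetical `patience` of 3 consecutive checks.

```python
def should_stop(loss_history, preset_loss, patience=3):
    """True when the last `patience` recorded losses are all below `preset_loss`."""
    recent = loss_history[-patience:]
    return len(recent) == patience and all(loss < preset_loss for loss in recent)

# Hypothetical loss values recorded after successive training rounds
history = [0.9, 0.4, 0.04, 0.03, 0.02]
print(should_stop(history, preset_loss=0.05))  # True
```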

According to the method provided by the embodiment of the invention, the training effect of the recognition model can be determined through the loss function, and the parameter adjustment of the recognition model is realized based on the loss function, so that the training efficiency of the recognition model is improved.
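The stopping rule described above (adjust parameters to reduce the loss, and stop once every sample's loss has been below the preset loss value several times) can be sketched with a toy model. The one-parameter model `y = w * x`, the squared-error loss, and the names `train`, `preset_loss` and `required_hits` are illustrative assumptions; the actual recognition model and loss function are not reproduced here.

```python
def train(samples, labels, lr=0.1, preset_loss=1e-4,
          required_hits=3, max_epochs=1000):
    """Fit y = w * x by gradient descent (toy stand-in for the recognition model).

    Training stops once the loss of every sample has been below
    `preset_loss` in `required_hits` epochs (assumed reading of
    "multiple times").
    """
    w = 0.0
    hits = 0
    for epoch in range(max_epochs):
        # Loss per sample: squared difference between output and label.
        losses = [(w * x - y) ** 2 for x, y in zip(samples, labels)]
        if all(loss < preset_loss for loss in losses):
            hits += 1
            if hits >= required_hits:
                return w, epoch
        # Adjust the parameter in the direction that reduces the mean loss.
        grad = sum(2 * (w * x - y) * x
                   for x, y in zip(samples, labels)) / len(samples)
        w -= lr * grad
    return w, max_epochs

w, epochs = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Requiring several hits rather than one guards against stopping on a single epoch in which the loss happens to dip below the threshold.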

With reference to the foregoing embodiments, in one embodiment, the performing user interface component recognition on a target interface image through a trained recognition model includes:

acquiring a target interface image;

Specifically, the recognition model comprises three output channels. When the target interface image is input into the trained recognition model, each output channel outputs an image containing a plurality of prior frames. Based on the confidence threshold of the prior frames of each type of user interface component, weighted non-maximum suppression (Weighted NMS) is used to select, from all the prior frames, a corresponding prior frame for each user interface component in the target interface image as the final prediction frame of that component. The target interface image containing the final prediction frame of each user interface component is taken as the final recognition result.

According to the method provided by the embodiment of the invention, the results output by the recognition model are screened by weighted non-maximum suppression, so that redundant recognition results can be removed and the accuracy of the final recognition result is improved.
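As a rough illustration of weighted non-maximum suppression, the sketch below merges each cluster of overlapping boxes into one box whose coordinates are a weighted average. The weighting by `score * IoU`, the `(x1, y1, x2, y2)` box format, the IoU threshold of 0.5 and the function names are assumptions; the text does not specify these details.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def weighted_nms(boxes, scores, iou_threshold=0.5):
    """Return a list of (merged_box, score) pairs, one per cluster."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    results = []
    while order:
        best = order[0]
        # Boxes that overlap the highest-scoring box form one cluster.
        others = [i for i in order[1:]
                  if iou(boxes[best], boxes[i]) >= iou_threshold]
        cluster = [best] + others
        # Assumed weighting: confidence times overlap with the best box.
        weights = [scores[best]] + [scores[i] * iou(boxes[best], boxes[i])
                                    for i in others]
        total = sum(weights)
        if total > 0:
            merged = tuple(
                sum(w * boxes[i][k] for w, i in zip(weights, cluster)) / total
                for k in range(4))
        else:
            merged = tuple(boxes[best])
        results.append((merged, scores[best]))
        # Remove the whole cluster so each box is used once.
        order = [i for i in order if i not in cluster]
    return results

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
final_boxes = weighted_nms(boxes, scores)
```

Unlike plain NMS, which simply discards the lower-scoring overlapping boxes, this variant lets them contribute to the position of the surviving box.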

In combination with the above embodiments, in one embodiment, as shown in fig. 3, a method for identifying a user interface component includes:

301. Screening the plurality of interface images to obtain a reference interface image; and classifying the user interface components in the reference interface image, and determining a plurality of user interface component types and the number of the user interface components under each user interface component type in the reference interface image.

302. Adjusting the content of each reference interface image based on the number of the user interface components in each reference interface image to obtain an interface image sample set. The interface image sample set comprises a positive sample and a negative sample, wherein the positive sample refers to an interface image sample containing a user interface component identifier, and the negative sample refers to an interface image sample not containing a user interface component identifier.

303. Training the recognition model based on the loss function and the interface image sample set to adjust parameters in the recognition model and obtain the trained recognition model; the identification model comprises an input layer, a backbone network layer, a detection neck network layer and a detection head output layer; the backbone network layer comprises a pooling operation layer and an attention mechanism layer; the attention mechanism layer is positioned behind the pooling operation layer and is used for carrying out weight distribution on the characteristics output by the pooling operation layer.

304. Acquiring a target interface image; and inputting the target interface image into the trained user interface component recognition model, and outputting a plurality of prior frames, wherein the prior frames are used for indicating the position of each type of user interface component in the target interface image.

305. Determining a confidence threshold for a prior box of each type of user interface component; and determining a target prior frame set of each type of user interface component according to the confidence threshold of the prior frame of each type of user interface component and the confidence of the plurality of prior frames of the position of each type of user interface component.

306. Screening the target prior frame set of each type of user interface component by weighted non-maximum suppression, and determining a final prediction frame of each type of user interface component.
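Step 305 above, in which a confidence threshold per component type determines the target prior frame set, can be sketched as a simple filter. The tuple layout `(component_type, box, confidence)`, the threshold values, the fallback `default_threshold` and the function name `select_target_boxes` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def select_target_boxes(predictions, thresholds, default_threshold=0.5):
    """Group prior frames by component type, keeping only those whose
    confidence reaches the threshold configured for that type."""
    targets = defaultdict(list)
    for component_type, box, confidence in predictions:
        threshold = thresholds.get(component_type, default_threshold)
        if confidence >= threshold:
            targets[component_type].append((box, confidence))
    return dict(targets)

# Hypothetical model output: (type, (x1, y1, x2, y2), confidence).
predictions = [
    ("button", (10, 10, 50, 30), 0.92),
    ("button", (12, 11, 52, 31), 0.40),
    ("checkbox", (80, 10, 95, 25), 0.65),
    ("checkbox", (81, 11, 96, 26), 0.55),
]
thresholds = {"button": 0.5, "checkbox": 0.6}
target_sets = select_target_boxes(predictions, thresholds)
```

The resulting per-type sets would then be passed to the weighted non-maximum suppression of step 306.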

According to the method provided by the embodiment of the invention, the trained recognition model is obtained by training a recognition model that includes the attention mechanism layer; by performing user interface component recognition on the target interface image through the trained recognition model, the accuracy of recognizing the position and the type of the UI component can be improved.

It should be understood that, although the steps in the flowcharts related to the embodiments described above are displayed sequentially as indicated by the arrows, the steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a part of the steps in the flowcharts related to the embodiments described above may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the present application further provides an identification apparatus for a user interface component, which is used for implementing the above-mentioned identification method for a user interface component. The solution provided by the apparatus is similar to that described in the above method, so for the specific limitations in the one or more embodiments of the identification apparatus provided below, reference may be made to the limitations on the identification method above, and details are not repeated here.

In one embodiment, as shown in fig. 4, there is provided an identification apparatus of a user interface component, including: a processing module 401, an adjustment module 402, and an identification module 403, wherein:

the processing module 401 is configured to pre-process a plurality of interface images to obtain an interface image sample set, where the interface image sample set includes a positive sample and a negative sample, the positive sample refers to an interface image sample containing a user interface component identifier, and the negative sample refers to an interface image sample not containing a user interface component identifier;

an adjusting module 402, configured to train the recognition model based on the loss function and the interface image sample set to adjust parameters in the recognition model, so as to obtain a trained recognition model;

and the recognition module 403 is configured to perform user interface component recognition on the target interface image through the trained recognition model.

In one embodiment, the processing module 401 includes:

the screening submodule is used for screening the plurality of interface images to obtain a reference interface image;

the classification submodule is used for classifying the user interface components in the reference interface image and determining a plurality of user interface component types and the number of the user interface components under each user interface component type in the reference interface image;

and the adjusting submodule is used for adjusting the content of each reference interface image based on the number of the user interface components in each reference interface image to obtain an interface image sample set.

In one embodiment, the adjustment submodule includes:

a first determining unit for determining a preset number of each user interface component type;

the second determining unit is used for determining the actual number of each user interface component type in any reference interface image according to any reference interface image;

the expansion unit is used for expanding the number of the components of the corresponding user interface component type in any reference interface image according to the component attribute of the user interface component in any reference interface image of the corresponding user interface component type if the actual number corresponding to the user interface component type is smaller than the corresponding preset number, wherein the component attribute comprises component color information and component position information;

and the processing unit is used for covering the existing user interface components of the corresponding user interface component type in any reference interface image if the corresponding actual number of the user interface component types is larger than the corresponding preset number, so as to reduce the component number of the existing user interface components of the corresponding user interface component types.
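The expand-or-cover logic of the adjusting submodule can be sketched on component annotations rather than pixels. The dictionary fields (`type`, `color`, `position`, `covered`), the fixed placement `offset` for new copies and the function name `balance_components` are assumptions for illustration; how new components are actually drawn or existing ones masked in the image is not shown.

```python
import copy

def balance_components(components, preset_counts, offset=(0, 40)):
    """Bring the number of visible components of each type to its preset count.

    Too few: add copies that reuse the color and (shifted) position of
    existing components of the same type. Too many: mark the surplus
    components as covered.
    """
    balanced = copy.deepcopy(components)
    for ctype, preset in preset_counts.items():
        visible = [c for c in balanced
                   if c["type"] == ctype and not c.get("covered", False)]
        actual = len(visible)
        if 0 < actual < preset:
            for n in range(preset - actual):
                template = visible[n % actual]
                x, y = template["position"]
                balanced.append({
                    "type": ctype,
                    "color": template["color"],
                    # Assumed placement rule: shift each copy by a fixed offset.
                    "position": (x + offset[0] * (n + 1),
                                 y + offset[1] * (n + 1)),
                    "covered": False,
                })
        elif actual > preset:
            for surplus in visible[preset:]:
                surplus["covered"] = True
    return balanced

components = [
    {"type": "button", "color": "#1E90FF", "position": (20, 20)},
    {"type": "checkbox", "color": "#333333", "position": (20, 80)},
    {"type": "checkbox", "color": "#333333", "position": (20, 110)},
    {"type": "checkbox", "color": "#333333", "position": (20, 140)},
]
balanced = balance_components(components, {"button": 2, "checkbox": 2})
```

In this sketch a type with no existing component in the image is left unchanged, since there are no component attributes to copy from.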

In one embodiment, for the adjusting module 402, the recognition model comprises an input layer, a backbone network layer, a detection neck network layer and a detection head output layer; the backbone network layer comprises a pooling operation layer and an attention mechanism layer; the attention mechanism layer is positioned after the pooling operation layer and is used for assigning weights to the features output by the pooling operation layer.

In one embodiment, the adjusting module 402 further comprises:

the first obtaining submodule is used for obtaining a training label of each interface image sample in the interface image sample set;

the calculation submodule is used for calculating a loss function corresponding to each interface image sample based on the training label of each interface image sample and each training sample;

and the second obtaining submodule is used for adjusting parameters in the user interface component recognition model based on the loss function and obtaining the trained user interface component recognition model.

In one embodiment, the identification module 403 includes:

the third acquisition sub-module is used for acquiring a target interface image;

the output sub-module is used for inputting the target interface image into the trained user interface component recognition model and outputting a plurality of prior frames, and the prior frames are used for indicating the position of each type of user interface component in the target interface image;

a first determining submodule for determining a confidence threshold for a prior box of each type of user interface component; determining a target prior frame set of each type of user interface component according to a confidence threshold of a prior frame of each type of user interface component and the confidence of a plurality of prior frames of the position of each type of user interface component;

and the second determining submodule is used for screening the target prior frame set of each type of user interface component by weighted non-maximum suppression and determining a final prediction frame of each type of user interface component.

The modules in the above-mentioned identification apparatus of the user interface component may be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected by a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements a method for identifying a user interface component. The display unit of the computer device is used for forming a visual picture and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device can be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.

Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory having a computer program stored therein and a processor that when executing the computer program performs the steps of:

In one embodiment, the processor, when executing the computer program, further performs the steps of:

determining a preset number of each user interface component type;

determining, for any reference interface image, an actual number of each user interface component type in any reference interface image;

In one embodiment, when the processor executes the computer program, the recognition model comprises an input layer, a backbone network layer, a detection neck network layer and a detection head output layer; the backbone network layer comprises a pooling operation layer and an attention mechanism layer; the attention mechanism layer is positioned after the pooling operation layer and is used for assigning weights to the features output by the pooling operation layer.

acquiring a target interface image;

inputting a target interface image into a trained user interface component recognition model, and outputting a plurality of prior frames, wherein the prior frames are used for indicating the position of each type of user interface component in the target interface image;

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

determining a preset number of each user interface component type;

In one embodiment, when the computer program is executed by the processor, the recognition model comprises an input layer, a backbone network layer, a detection neck network layer and a detection head output layer; the backbone network layer comprises a pooling operation layer and an attention mechanism layer; the attention mechanism layer is positioned after the pooling operation layer and is used for assigning weights to the features output by the pooling operation layer.

acquiring a target interface image;

In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:

determining a preset number of each user interface component type;

the identification model comprises an input layer, a backbone network layer, a detection neck network layer and a detection head output layer; the backbone network layer comprises a pooling operation layer and an attention mechanism layer; the attention mechanism layer is positioned behind the pooling operation layer and is used for carrying out weight distribution on the characteristics output by the pooling operation layer.

acquiring a target interface image;

and screening the target prior frame set of each type of user interface component by adopting a weighted non-maximum suppression screening method, and determining a final prediction frame of each type of user interface component.

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant country and region.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include a Read-Only Memory (ROM), a magnetic tape, a floppy disk, a flash memory, an optical memory, a high-density embedded non-volatile memory, a Resistive Random Access Memory (ReRAM), a Magnetic Random Access Memory (MRAM), a Ferroelectric Random Access Memory (FRAM), a Phase Change Memory (PCM), a graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum computing based data processing logic devices, etc., without limitation.

For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as the combinations of these technical features are not contradictory, they should be considered to be within the scope of the present disclosure.

The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims

1. A method for identifying a user interface component, the method comprising:

training a recognition model based on a loss function and the interface image sample set to adjust parameters in the recognition model to obtain a trained recognition model;

2. The method of claim 1, wherein the pre-processing comprises screening and classification; the preprocessing is performed on the plurality of interface images to obtain an interface image sample set, and the method comprises the following steps:

3. The method of claim 2, wherein the adjusting the content of each reference interface image based on the number of user interface components in each reference interface image to obtain a sample set of interface images comprises:

determining a preset number of each user interface component type;

determining, for any reference interface image, an actual number of each user interface component type in the any reference interface image;

if the actual number corresponding to the user interface component type is smaller than the corresponding preset number, expanding the component number of the corresponding user interface component type in any reference interface image according to the component attribute of the user interface component in any reference interface image of the corresponding user interface component type, wherein the component attribute comprises component color information and component position information;

if the actual number corresponding to the user interface component type is larger than the corresponding preset number, the existing user interface components of the corresponding user interface component type in any reference interface image are covered, so that the component number of the existing user interface components of the corresponding user interface component type is reduced.

4. The method of claim 1, wherein the recognition model comprises an input layer, a backbone network layer, a detection neck network layer, and a detection head output layer; the backbone network layer comprises a pooling operation layer and an attention mechanism layer; the attention mechanism layer is positioned behind the pooling operation layer and is used for carrying out weight distribution on the characteristics output by the pooling operation layer.

5. The method of claim 1, wherein training a recognition model based on the loss function and the interface image sample set to adjust parameters in the recognition model to obtain a trained recognition model comprises:

acquiring a training label of each interface image sample in the interface image sample set;

6. The method of claim 1, wherein the performing user interface component recognition on the target interface image through the trained recognition model comprises:

acquiring a target interface image;

inputting the target interface image into a trained user interface component recognition model, and outputting a plurality of prior frames, wherein the prior frames are used for indicating the position of each type of user interface component in the target interface image;

determining a confidence threshold for a prior box of each type of user interface component; determining a target prior frame set of each type of user interface component according to the confidence coefficient threshold of the prior frame of each type of user interface component and the confidence coefficients of a plurality of prior frames of the position of each type of user interface component;

7. An apparatus for identifying a user interface component, the apparatus comprising:

the processing module is used for preprocessing a plurality of interface images to obtain an interface image sample set, wherein the interface image sample set comprises a positive sample and a negative sample, the positive sample refers to an interface image sample containing a user interface component identifier, and the negative sample refers to an interface image sample not containing the user interface component identifier;

the adjusting module is used for training the recognition model based on a loss function and the interface image sample set so as to adjust parameters in the recognition model and obtain the trained recognition model;

8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.

10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.