CN112433729A - Automatic UI image labeling method and device - Google Patents
Automatic UI image labeling method and device
- Publication number
- CN112433729A (application CN202011466762.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- label
- labels
- features
- annotated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/38—Creation or generation of source code for implementing user interfaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an automatic UI image annotation method, which comprises the following steps: S100: extracting visual features of an image to be annotated by using a deep learning technique; S200: constructing a candidate label set for the image to be annotated by using an image library; S300: fusing the visual features and semantic features of the image to be annotated to obtain high-level features of the image to be annotated; S400: calculating, according to the high-level features of the image to be annotated, the probability of annotating the image with each label in the image library; S500: predicting, according to the high-level features of the image to be annotated, the number of labels the image needs; S600: annotating the image with the first N labels with the highest probability, according to the calculated label probabilities and the predicted label number, where N is the predicted number of labels. The invention reduces the mechanical, repetitive annotation work in UI design, cuts the time spent on repeated communication with engineers to confirm missed or incorrect annotations, and improves working efficiency.
Description
Technical Field
The invention relates to the field of image annotation, in particular to a method and a device for automatically annotating a UI image.
Background
Excellent design can effectively convey product value, but it cannot do so without technical implementation to support it. As the market environment changes and the industry level improves, more and more designers are beginning to realize the importance of restoring the UI design draft faithfully. In the collaboration process, however, the deviation between design thinking and development thinking gradually appears: requirements cannot be met, spacing is adjusted poorly, and a design draft usually needs several rounds of testing and face-to-face alignment after implementation before it achieves the expected effect. It can be said that restoring the design draft is a problem every UI designer repeatedly encounters and finds difficult to solve. Traditional delivery of design drafts by UI designers has two drawbacks.
(1) Slicing and annotating the design drawing are entirely manual operations, and missed or incorrect annotations often occur.
(2) During manual slicing and annotation, when annotations are missed or wrong, engineers only discover the problems during development and must find the designer to communicate and confirm repeatedly. This is mentally stressful and extremely inefficient, and working efficiency cannot be effectively improved.
Disclosure of Invention
The invention aims to provide a method and a device for automatically annotating a UI image, so as to solve the problems described in the background art. By establishing the relationship between low-level features and high-level semantic labels, the probability of each label for the image to be annotated is obtained and the number of labels to assign is predicted flexibly, so that automatic annotation of the image is completed and the accuracy of image annotation is improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
a UI image automatic annotation method comprises the following steps:
s100: extracting visual features of an image to be annotated by utilizing a deep learning technology;
s200: constructing a candidate label set of an image to be marked by using an image library;
s300: fusing the visual features and the semantic features of the image to be annotated to obtain the high-level features of the image to be annotated;
s400: according to the high-level characteristics of the image to be marked, calculating the probability of marking the image to be marked by using each label in the image library;
s500: predicting the number of labels needed by the image to be marked according to the high-level characteristics of the image to be marked;
s600: according to the calculated label probability and the predicted label number, marking the image to be marked by utilizing the first N labels with the highest probability; wherein N is the predicted number of tags.
Further, the S100 utilizes a deep learning technique to extract visual features of an image to be annotated, including: extracting visual features of the image to be marked by using a visual feature extraction model based on a convolutional neural network; the training method of the visual feature extraction model comprises the following steps: and constructing a first neural network model based on the convolutional neural network, extracting visual features of the image, storing the visual features in an image library, and training the first neural network model by using the visual features of the image extracted from the image library so as to obtain a visual feature extraction model.
Further, the step S200: the method for constructing the candidate label set of the image to be labeled by utilizing the image library comprises the following steps: acquiring the occurrence frequency of each label in an image library; for the image to be annotated, calculating the similarity between the image to be annotated and other images in the image library according to the image distance, thereby obtaining m images with the highest similarity with the image to be annotated; the image distance used for calculating the image similarity is a block distance, an Euclidean distance, an infinite norm, a histogram intersection, a quadratic distance, a Mahalanobis distance and an EMD distance;
obtaining n images with the highest similarity to the image to be annotated from the m images, and obtaining p1 labels appearing in the n images; if the p1 is not less than k, obtaining k labels with the highest occurrence frequency from the p1 labels according to the occurrence frequency of each label in the image library, and using the k labels as k candidate labels, so as to construct a candidate label set of the image to be labeled; otherwise, obtaining p2 labels appearing in the m images, and obtaining k labels with the highest occurrence frequency from the p2 labels according to the occurrence frequency of each label in the image library, wherein the k labels are used as k candidate labels, so that a candidate label set of the image is obtained; wherein k is a preset candidate tag set size, and m, n and k satisfy: k is less than or equal to m, and n is less than or equal to m.
Further, the step S300: fusing the visual features and the semantic features of the image to be annotated to obtain the high-level features of the image to be annotated, comprising the following steps: fusing the visual features and semantic features of the image to be annotated by utilizing a full-connection layer to obtain high-level features of the image to be annotated; the method specifically comprises the following steps:
s301: for an image I in an image library, extracting visual features of the image I by using a visual feature extraction model;
s302: constructing a candidate tag set L of the image I, and extracting semantic features of the image I by using the candidate tag set L and a semantic feature extraction model;
s303: fusing the visual features and the semantic features of the image I to obtain the high-level features of the image I; steps S301 to S303 are performed for each image in the image library so as to extract the high-level features of each image in the image library.
Further, the step S400: according to the high-level characteristics of the image to be labeled, calculating the probability when the image to be labeled is labeled by utilizing each label in the image library, wherein the method comprises the following steps: and constructing a third neural network model based on the multilayer perceptron, and calculating the probability of each label in the image library when the label is marked with the image according to the high-level features of the image.
Further, the step S500: predicting the number of labels needed by the image to be labeled according to the high-level characteristics of the image to be labeled, and the method comprises the following steps: constructing a fourth neural network model based on a multilayer perceptron, and predicting the number of labels needed by the image according to the high-level features of the image; the fourth neural network model comprises two hidden layers, 512 neurons and 256 neurons are respectively arranged, dropout is carried out on all the neurons in the hidden layers, and the probability is set to be 0.5; and training a fourth neural network model by using the image library with the extracted high-level features of the images so as to obtain a label number prediction model.
The invention also provides a UI image automatic annotation device, applying the annotation method of any one of claims 1 to 6, and comprising:
the visual characteristic extraction module is used for extracting the visual characteristics of the image to be marked;
the candidate tag set constructing module is used for constructing a candidate tag set of the image to be marked by utilizing the image library;
the semantic feature extraction module is used for extracting the semantic features of the image to be annotated from the candidate label set of the image to be annotated by utilizing a deep learning technology;
the characteristic fusion module is used for fusing the visual characteristic and the semantic characteristic of the image to be annotated to obtain the high-level characteristic of the image to be annotated;
the multi-target classification module is used for calculating the probability of each label in the image library when the image to be labeled is labeled by utilizing a deep learning technology according to the high-level characteristics of the image to be labeled;
the label number prediction module is used for predicting the number of labels needed by the image to be labeled by utilizing a deep learning technology according to the high-level characteristics of the image to be labeled;
and the labeling module is used for labeling the image to be labeled by utilizing the first N labels with the highest probability according to the label probability calculated by the multi-target classification module and the label number predicted by the label number prediction module.
Furthermore, the image in the image library is an image labeled with a label, the candidate label set includes a plurality of labels in the image library, and N is the number of labels predicted by the label number prediction module.
Compared with the prior art, the invention has the beneficial effects that:
according to the method, the image of the UI designer is intelligently cut and labeled without manual cutting and labeling, the design of the UI designer is not influenced, the style information such as the size, the position, the color, the spacing, the word size and the like of each element in the PS design drawing can be completely and clearly synchronized to a system platform, an engineer can check the style information at any time, and percentage labeling is supported; the layers can be selected singly, a plurality of layers can be selected continuously, and the needed measurement can be labeled intelligently. What information the engineer needs to view by himself. Repeated marking work of UI design machinery is reduced, time cost for repeated communication and confirmation of missed marks, wrong marks and engineers is reduced, and working efficiency is improved.
Drawings
FIG. 1 is a flowchart of a method for automatically labeling a UI image according to the present invention;
Detailed Description
The present invention will be further described with reference to the following examples, which illustrate only some, but not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Example 1:
the invention provides an automatic UI image labeling method, which has the overall thought that: respectively extracting visual features and semantic features of the image, and obtaining high-level features of the image by fusing the visual features and the semantic features of the image; and calculating the probability of each label in the image library when the image to be labeled is labeled according to the high-level characteristics of the image, predicting the number of the labels of the image to be labeled, and then combining the calculated probability of the labels and the predicted number of the labels to finish the automatic labeling of the image.
The invention provides an automatic UI image annotation device, which completes automatic annotation of the image to be annotated and, in order to make collaborative design more efficient and adapt flexibly to product teams of different sizes and types, solves the difficulty of cooperation through a workflow consisting of a main line and several nodes. Annotation is an important vehicle for delivery cooperation between design and development, and "automatic + manual" annotation is carried out along the design main line of a flexible workflow. A common workflow tends to mix everything together and impose a fixed thread to achieve a business goal; it emphasizes system specification and circulates the business in a fixed pattern.
A flexible workflow, in the most popular and easily understood terms, sits between a fixed flow (regularity) and a free flow (flexibility). That is, the main line is fixed and the flow has fixed steps, but one or more nodes on the main line are handed over in a free-flow manner and do not interfere with one another. Compared with a common workflow, the flexible workflow builds on it to support hand-over and business reuse, improving the business flow and team efficiency.
As shown in fig. 1, the method comprises the following steps:
s100: extracting visual features of an image to be annotated by using a deep learning technique; specifically: extracting visual features of the image to be annotated by using a visual feature extraction model based on a convolutional neural network (CNN). The training method of the visual feature extraction model comprises: constructing a first neural network model based on a convolutional neural network and extracting visual features of the image; the convolutional neural network may be an AlexNet, LeNet, GoogLeNet, VGG, Inception, ResNet, Inception-ResNet-V2 or other convolutional neural network. In this embodiment the convolutional neural network is an Inception-ResNet-V2 network; using Inception-ResNet-V2 to extract visual features of images can, on the one hand, greatly improve training speed while also greatly improving classification accuracy, and on the other hand, increase the nonlinearity of the network. The first neural network is trained with the MMRF image library so as to obtain the visual feature extraction model; a candidate label set of the image to be annotated is constructed from the MMRF image library, and semantic features of the image to be annotated are extracted by using this candidate label set;
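For illustration only, the following is a minimal sketch of the visual feature extraction in S100, assuming a TensorFlow/Keras implementation with a pretrained Inception-ResNet-V2 backbone; the patent names the network but not the framework, input size, pooling, or fine-tuning regime, so those choices and all names here are assumptions.

```python
# A minimal sketch of S100 (assumed framework: TensorFlow/Keras).
# The 299x299 input size and global-average pooling are assumptions;
# the patent only names the Inception-ResNet-V2 network.
import numpy as np
import tensorflow as tf

backbone = tf.keras.applications.InceptionResNetV2(
    weights="imagenet", include_top=False, pooling="avg")  # one 1536-d vector per image

def extract_visual_features(image_path: str) -> np.ndarray:
    img = tf.keras.utils.load_img(image_path, target_size=(299, 299))
    x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]
    x = tf.keras.applications.inception_resnet_v2.preprocess_input(x)
    return backbone.predict(x, verbose=0)[0]  # visual feature vector
```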
s200: constructing a candidate label set of an image to be marked by using an image library; the method comprises the following steps: acquiring the occurrence frequency of each label in an image library; for the image to be annotated, calculating the similarity between the image to be annotated and other images in the image library according to the image distance, thereby obtaining m images with the highest similarity with the image to be annotated; the image distance used for calculating the image similarity may be a block distance, a euclidean distance, an infinite norm, a histogram intersection, a quadratic distance, a mahalanobis distance, an EMD distance, or another image distance.
Obtaining n images with the highest similarity to the image to be annotated from the m images, and obtaining p1 labels appearing in the n images; if the p1 is not less than k, obtaining k labels with the highest occurrence frequency from the p1 labels according to the occurrence frequency of each label in the image library, and using the k labels as k candidate labels, so as to construct a candidate label set of the image to be labeled; otherwise, obtaining p2 labels appearing in the m images, and obtaining k labels with the highest occurrence frequency from the p2 labels according to the occurrence frequency of each label in the image library, wherein the k labels are used as k candidate labels, so that a candidate label set of the image is obtained; wherein k is a preset candidate tag set size, and m, n and k satisfy: k is less than or equal to m, n is less than or equal to m;
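As a hedged illustration of the candidate-label-set construction just described, the sketch below uses the Euclidean distance and illustrative values of m, n and k; the function and variable names are hypothetical, and any of the other listed image distances could be substituted.

```python
# Sketch of S200: take the m most similar library images by Euclidean distance,
# gather the labels of the n closest (p1); if fewer than k, fall back to the
# labels of all m images (p2); keep the k labels with the highest frequency
# in the whole library as candidates.
from collections import Counter
import numpy as np

def candidate_labels(query_feat, library_feats, library_labels, m=20, n=5, k=10):
    label_freq = Counter(l for labels in library_labels for l in labels)
    dists = np.linalg.norm(library_feats - query_feat, axis=1)  # image distance (Euclidean)
    top_m = np.argsort(dists)[:m]
    top_n = top_m[:n]
    p1 = {l for i in top_n for l in library_labels[i]}
    pool = p1 if len(p1) >= k else {l for i in top_m for l in library_labels[i]}
    return sorted(pool, key=lambda l: -label_freq[l])[:k]  # k candidate labels
```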
the method for constructing the candidate tag set based on the above method specifically further includes: constructing a candidate tag set of an image to be labeled by using an image library, and extracting semantic features of the image to be labeled from the candidate tag set of the image to be labeled by using a semantic feature extraction model based on a Multi-layer perceptron (MLP);
the training method of the semantic feature extraction model comprises the following steps: constructing a candidate label set of each image in an image library; constructing a second neural network model based on a multilayer perceptron, and extracting semantic features of the image from a candidate label set of the image; the second neural network model comprises two hidden layers, and the Relu function is adopted as the activation function; training a second neural network model by using the image library of the constructed candidate label set so as to obtain a semantic feature extraction model;
s300: fusing the visual features and the semantic features of the image to be annotated to obtain the high-level features of the image to be annotated; in an alternative embodiment, step S300 specifically comprises: fusing the visual features and semantic features of the image to be annotated by using a fully connected layer (FC) to obtain the high-level features of the image to be annotated, which specifically comprises the following steps:
s301: for an image I in an image library, extracting visual features of the image I by using a visual feature extraction model;
s302: constructing a candidate tag set L of the image I, and extracting semantic features of the image I by using the candidate tag set L and a semantic feature extraction model;
s303: fusing the visual features and the semantic features of the image I to obtain the high-level features of the image I; steps S301 to S303 are performed for each image in the image library so as to extract the high-level features of each image in the image library. It should be understood that, besides the fully connected layer, other feature-fusion mechanisms can also be used to fuse the visual features and semantic features of the image to be annotated so as to obtain its high-level features.
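The following sketch illustrates one possible fusion of visual and semantic features through a fully connected layer; the feature dimensions and the fused width are assumptions, and, as noted above, other fusion mechanisms could be used instead.

```python
# Sketch of S300: concatenate visual and semantic features and fuse them with
# a fully connected layer (dimensions assumed).
import tensorflow as tf

def build_fusion(visual_dim: int = 1536, semantic_dim: int = 128,
                 fused_dim: int = 512) -> tf.keras.Model:
    v = tf.keras.layers.Input(shape=(visual_dim,))
    s = tf.keras.layers.Input(shape=(semantic_dim,))
    x = tf.keras.layers.Concatenate()([v, s])
    high_level = tf.keras.layers.Dense(fused_dim, activation="relu")(x)  # FC fusion
    return tf.keras.Model([v, s], high_level)                            # high-level features
```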
S400: according to the high-level characteristics of the image to be marked, calculating the probability of marking the image to be marked by using each label in the image library; in an optional embodiment, the method specifically includes: according to the high-level characteristics of the image to be labeled, calculating the probability of each label in the image library when the image to be labeled is labeled by using a multi-target classification model based on a multi-level perception machine; the training method of the multi-target classification model comprises the following steps: for an image I in an image library, extracting visual features of the image I by using a visual feature extraction model; the image labeling algorithm based on the generated model obtains the appearance or non-appearance of the ith label in the kth image, the training set contains the joint probability P (d, w) of the common appearance of the features d of the K images and the observed image, if 2 labels appear in the same image, the 2 labels are considered to be related, and thus a relational graph zeta (S, epsilon) based on the point S ═ {1,2, …, m } can be constructed, wherein epsilon represents the edge set of the relational graph, and then the model parameters theta in MMRF are substituted.
Input: an image I to be annotated, a word list S, and a training image set X;
Output: MMRF model parameters θ.
① for each word i ∈ S do
② construct the semantic concept relation graph;
③ construct the corresponding training image set;
④ solve for the MMRF model parameters θ;
⑤ end for
For image feature extraction, 2 global image features are used, Gist and the color histogram (whose advantage is that it simply describes the global distribution of colors in an image, i.e. the proportion of each color over the whole image, and is especially suitable for images that are hard to segment automatically and images for which the spatial position of objects need not be considered); the color histogram is computed separately in the RGB, LAB and HSV color spaces, giving 3 color histograms.
For local features, SIFT and dense hue features are used; the 2 features are computed both on a dense multi-scale grid and on the image regions detected by a Harris interest-point detector. In order to introduce layout information about the image content, the image is further divided into 3 regions along the horizontal direction, features are extracted for each region with every method except Gist, and the 3 resulting features are combined into a complete global feature description. In total 15 features are used; different features contribute differently to image annotation, and all of them need to be considered comprehensively when computing the distance between 2 images. As a concrete example, let the Euclidean distance between 2 images on the i-th feature be d_i; then the distance between the 2 images over the 15 features is:
D = Σ_{i=1}^{15} W_i · d_i
where W_i is the weighting coefficient of the Euclidean distance on the i-th feature, and the weight vector is W = (W_1, W_2, …, W_15). A third neural network model based on a multilayer perceptron is constructed to calculate, according to the high-level features of an image, the probability of each label in the image library for that image; the third neural network model is trained with the image library whose high-level image features have been extracted, so as to obtain the multi-target classification model; cross-entropy is used as the loss function during training.
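A minimal sketch of such a third model is shown below, assuming one hidden layer and a sigmoid output per label trained with binary cross-entropy, which is one common realization of multi-target classification with a cross-entropy loss; the hidden width and optimizer are assumptions.

```python
# Sketch of the third (multi-target classification) model: a multilayer
# perceptron over the high-level features with one sigmoid output per library
# label, trained with binary cross-entropy (hidden width and optimizer assumed).
import tensorflow as tf

def build_label_classifier(fused_dim: int, num_labels: int) -> tf.keras.Model:
    inp = tf.keras.layers.Input(shape=(fused_dim,))
    h = tf.keras.layers.Dense(512, activation="relu")(inp)
    probs = tf.keras.layers.Dense(num_labels, activation="sigmoid")(h)  # P(label | image)
    model = tf.keras.Model(inp, probs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```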
S500: predicting the number of labels needed by the image to be marked according to the high-level characteristics of the image to be marked; in an optional embodiment, the method specifically includes: according to the high-level characteristics of the image to be labeled, predicting the number of labels needed by the image to be labeled by using a label number prediction model based on a multilayer perceptron; the training method of the label number prediction model comprises the following steps: extracting high-level features of each image in an image library, and constructing a fourth neural network model based on a multilayer perceptron, wherein the fourth neural network model is used for predicting the number of labels needed by the image according to the high-level features of the image; the fourth neural network model comprises two hidden layers, 512 and 256 neurons respectively, and in order to avoid the over-fitting situation, dropout is carried out on all the neurons in the hidden layers, and the probability is set to be 0.5; and training a fourth neural network model by using the image library with the extracted high-level features of the images so as to obtain a label number prediction model.
S600: according to the calculated label probability and the predicted label number, marking the image to be marked by utilizing the first N labels with the highest probability; the image in the image library is the image marked with the label, the candidate label set comprises a plurality of labels in the image library, and N is the number of the labels predicted by using the deep learning technology.
The number of labels needed by the image to be annotated is predicted by using a deep learning technique. The key point is training with a TinyML-style algorithm on the user's computer or in the cloud; the post-training stage of TinyML, generally called deep compression, is where the label annotation process really starts. The image to be annotated is then annotated with the first N labels with the highest probability; the images in the image library are images already annotated with labels, the candidate label set comprises a plurality of labels from the image library, and N is the number of labels predicted with the deep learning technique.
The TinyML algorithm works in essentially the same way as a traditional machine learning model: the model is typically trained on the user's computer or in the cloud, and the post-training stage, often referred to as deep compression, is where TinyML's work really begins.
The invention also provides an automatic image annotation device, which is used for completing the automatic annotation of the image to be annotated and comprises the following components: the visual characteristic extraction module is used for extracting the visual characteristics of the image to be marked; the candidate tag set constructing module is used for constructing a candidate tag set of the image to be marked by utilizing the image library;
the semantic feature extraction module is used for extracting the semantic features of the image to be annotated from the candidate label set of the image to be annotated by utilizing a deep learning technology;
the characteristic fusion module is used for fusing the visual characteristic and the semantic characteristic of the image to be annotated to obtain the high-level characteristic of the image to be annotated;
the multi-target classification module is used for calculating the probability of each label in the image library when the image to be labeled is labeled by utilizing a deep learning technology according to the high-level characteristics of the image to be labeled;
the label number prediction module is used for predicting the number of labels needed by the image to be labeled by utilizing a deep learning technology according to the high-level characteristics of the image to be labeled;
the labeling module is used for labeling the image to be labeled by utilizing the first N labels with the highest probability according to the label probability calculated by the multi-target classification module and the label number predicted by the label number prediction module;
the image in the image library is an image marked with a label, the candidate label set comprises a plurality of labels in the image library, and N is the number of the labels predicted by the label number prediction module;
Specifically, two nodes, finalization and development, are positioned under the design main line. Dividing the finalization and development modes into separate scenes ensures that designers and development engineers cooperate as efficiently as possible without disturbing each other, and the two modes can be freely switched and selected as required so as to complement each other. The annotated information covers area, text, coordinates, size and color. The text annotation tool effectively handles what automatic annotation cannot express, such as the description of an adaptation scheme. In addition, the finalization mode can also view pins and change the state of the pin tool, and can mark all pin states as resolved with one click so as to connect with the commenting mode.
In this embodiment, the specific implementation of each module may refer to the related explanation in the above method embodiment, and will not be repeated here.
The NUS-WIDE image library, which has 81 subject labels, is used for the annotation performance test; the parameters of the NUS-WIDE image library are shown in Table 1:
TABLE 1 NUS-WIDE image library parameters
Existing relatively classical automatic image annotation models based on deep networks include: (1) the CNN model, i.e. a model that annotates images using only the visual features extracted by a convolutional neural network; (2) the CNN + softmax model, whose main idea is to perform multi-target classification on the CNN features through a softmax function and annotate accordingly.
Using the NUS-WIDE image library, the automatic image annotation method provided by the invention is compared with image annotation using the two automatic annotation models above. The evaluation indices include: recall (c_R) and precision (c_P) per label, recall (i_R) and precision (i_P) per image, and the F1-score per label (c_F1) and per image (i_F1). The results of the comparative analysis are shown in Table 2:
TABLE 2 comparative analysis results
The results shown in Table 2 show that the evaluation indices of the automatic UI image annotation and slicing method, and of its system and process, provided by the present invention are superior to those of the other two existing models. The automatic image annotation method provided by the invention therefore fuses the visual features and semantic features of the image to obtain its high-level features, calculates from those high-level features the probability of annotating the image with each label in the image library, predicts the number of labels for the image, and then combines the calculated label probabilities with the predicted label number to complete automatic annotation, effectively improving the accuracy and performance of image annotation.
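For reference, the sketch below computes the per-label and per-image recall, precision and F1 indices used in Table 2 under the usual averaging conventions for image annotation, which the patent does not spell out; the example matrices are hypothetical.

```python
# Sketch of the evaluation indices: per-label (c_R, c_P, c_F1) and per-image
# (i_R, i_P, i_F1) recall, precision and F1 over binary annotation matrices.
import numpy as np

def prf(true: np.ndarray, pred: np.ndarray, axis: int):
    tp = (true & pred).sum(axis=axis).astype(float)
    r = tp / np.maximum(true.sum(axis=axis), 1)   # recall per label/image
    p = tp / np.maximum(pred.sum(axis=axis), 1)   # precision per label/image
    f1 = np.where(r + p > 0, 2 * r * p / np.maximum(r + p, 1e-9), 0.0)
    return r.mean(), p.mean(), f1.mean()

Y_true = np.array([[1, 0, 1], [0, 1, 1]], dtype=bool)   # ground-truth annotations (example)
Y_pred = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)   # predicted annotations (example)
c_R, c_P, c_F1 = prf(Y_true, Y_pred, axis=0)  # averaged over labels
i_R, i_P, i_F1 = prf(Y_true, Y_pred, axis=1)  # averaged over images
```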
The above description is only a preferred embodiment of the present invention, and the present invention is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.
Claims (8)
1. An automatic labeling method for UI images is characterized by comprising the following steps:
s100: extracting visual features of an image to be annotated by utilizing a deep learning technology;
s200: constructing a candidate label set of an image to be marked by using an image library;
s300: fusing the visual features and the semantic features of the image to be annotated to obtain the high-level features of the image to be annotated;
s400: according to the high-level characteristics of the image to be marked, calculating the probability of marking the image to be marked by using each label in the image library;
s500: predicting the number of labels needed by the image to be marked according to the high-level characteristics of the image to be marked;
s600: according to the calculated label probability and the predicted label number, marking the image to be marked by utilizing the first N labels with the highest probability; wherein N is the predicted number of tags.
2. The method according to claim 1, wherein the S100 utilizes a deep learning technique to extract visual features of an image to be annotated, and includes: extracting visual features of the image to be marked by using a visual feature extraction model based on a convolutional neural network; the training method of the visual feature extraction model comprises the following steps: and constructing a first neural network model based on the convolutional neural network, extracting visual features of the image, storing the visual features in an image library, and training the first neural network model by using the visual features of the image extracted from the image library so as to obtain a visual feature extraction model.
3. The method for automatically labeling the UI image according to claim 1, wherein the S200: the method for constructing the candidate label set of the image to be labeled by utilizing the image library comprises the following steps: acquiring the occurrence frequency of each label in an image library; for the image to be annotated, calculating the similarity between the image to be annotated and other images in the image library according to the image distance, thereby obtaining m images with the highest similarity with the image to be annotated; the image distance used for calculating the image similarity is a block distance, an Euclidean distance, an infinite norm, a histogram intersection, a quadratic distance, a Mahalanobis distance and an EMD distance;
obtaining n images with the highest similarity to the image to be annotated from the m images, and obtaining p1 labels appearing in the n images; if the p1 is not less than k, obtaining k labels with the highest occurrence frequency from the p1 labels according to the occurrence frequency of each label in the image library, and using the k labels as k candidate labels, so as to construct a candidate label set of the image to be labeled; otherwise, obtaining p2 labels appearing in the m images, and obtaining k labels with the highest occurrence frequency from the p2 labels according to the occurrence frequency of each label in the image library, wherein the k labels are used as k candidate labels, so that a candidate label set of the image is obtained; wherein k is a preset candidate tag set size, and m, n and k satisfy: k is less than or equal to m, and n is less than or equal to m.
4. The method for automatically labeling the UI image according to claim 1, wherein the S300: fusing the visual features and the semantic features of the image to be annotated to obtain the high-level features of the image to be annotated, comprising the following steps: fusing the visual features and semantic features of the image to be annotated by utilizing a full-connection layer to obtain high-level features of the image to be annotated; the method specifically comprises the following steps:
s301: for an image I in an image library, extracting visual features of the image I by using a visual feature extraction model;
s302: constructing a candidate tag set L of the image I, and extracting semantic features of the image I by using the candidate tag set L and a semantic feature extraction model;
s303: fusing the visual features and the semantic features of the image I to obtain high-level features of the image I; for each image in the image library, step S301 is performed: s303 to extract high-level features of each image in the image library.
5. The method for automatically labeling the UI image according to claim 4, wherein the S400: according to the high-level characteristics of the image to be labeled, calculating the probability when the image to be labeled is labeled by utilizing each label in the image library, wherein the method comprises the following steps: and constructing a third neural network model based on the multilayer perceptron, and calculating the probability of each label in the image library when the label is marked with the image according to the high-level features of the image.
6. The method for automatically labeling the UI image according to claim 4, wherein the S500: predicting the number of labels needed by the image to be labeled according to the high-level characteristics of the image to be labeled, and the method comprises the following steps: constructing a fourth neural network model based on a multilayer perceptron, and predicting the number of labels needed by the image according to the high-level features of the image; the fourth neural network model comprises two hidden layers, 512 neurons and 256 neurons are respectively arranged, dropout is carried out on all the neurons in the hidden layers, and the probability is set to be 0.5; and training a fourth neural network model by using the image library with the extracted high-level features of the images so as to obtain a label number prediction model.
7. An automatic UI image annotation device, characterized in that it applies the annotation method of any one of claims 1 to 6 and comprises:
the visual characteristic extraction module is used for extracting the visual characteristics of the image to be marked;
the candidate tag set constructing module is used for constructing a candidate tag set of the image to be marked by utilizing the image library;
the semantic feature extraction module is used for extracting the semantic features of the image to be annotated from the candidate label set of the image to be annotated by utilizing a deep learning technology;
the characteristic fusion module is used for fusing the visual characteristic and the semantic characteristic of the image to be annotated to obtain the high-level characteristic of the image to be annotated;
the multi-target classification module is used for calculating the probability of each label in the image library when the image to be labeled is labeled by utilizing a deep learning technology according to the high-level characteristics of the image to be labeled;
the label number prediction module is used for predicting the number of labels needed by the image to be labeled by utilizing a deep learning technology according to the high-level characteristics of the image to be labeled;
and the labeling module is used for labeling the image to be labeled by utilizing the first N labels with the highest probability according to the label probability calculated by the multi-target classification module and the label number predicted by the label number prediction module.
8. The apparatus for automatically labeling a UI image according to claim 7, wherein: the image in the image library is the image marked with the label, the candidate label set comprises a plurality of labels in the image library, and N is the number of the labels predicted by the label number prediction module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011466762.2A CN112433729A (en) | 2020-12-14 | 2020-12-14 | Automatic UI image labeling method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011466762.2A CN112433729A (en) | 2020-12-14 | 2020-12-14 | Automatic UI image labeling method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112433729A true CN112433729A (en) | 2021-03-02 |
Family
ID=74691482
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011466762.2A Pending CN112433729A (en) | 2020-12-14 | 2020-12-14 | Automatic UI image labeling method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112433729A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107122375A (en) * | 2016-12-12 | 2017-09-01 | 南京理工大学 | The recognition methods of image subject based on characteristics of image |
CN109271539A (en) * | 2018-08-31 | 2019-01-25 | 华中科技大学 | A kind of image automatic annotation method and device based on deep learning |
CN109345575A (en) * | 2018-09-17 | 2019-02-15 | 中国科学院深圳先进技术研究院 | A kind of method for registering images and device based on deep learning |
US10222942B1 (en) * | 2015-01-22 | 2019-03-05 | Clarifai, Inc. | User interface for context labeling of multimedia items |
-
2020
- 2020-12-14 CN CN202011466762.2A patent/CN112433729A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10222942B1 (en) * | 2015-01-22 | 2019-03-05 | Clarifai, Inc. | User interface for context labeling of multimedia items |
CN107122375A (en) * | 2016-12-12 | 2017-09-01 | 南京理工大学 | The recognition methods of image subject based on characteristics of image |
CN109271539A (en) * | 2018-08-31 | 2019-01-25 | 华中科技大学 | A kind of image automatic annotation method and device based on deep learning |
CN109345575A (en) * | 2018-09-17 | 2019-02-15 | 中国科学院深圳先进技术研究院 | A kind of method for registering images and device based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10157502B2 (en) | Method and apparatus for sharing augmented reality applications to multiple clients | |
CN109271539B (en) | Image automatic labeling method and device based on deep learning | |
CN106204522A (en) | The combined depth of single image is estimated and semantic tagger | |
CN110399800B (en) | License plate detection method and system based on deep learning VGG16 framework and storage medium | |
CN112529026A (en) | Method for providing AI model, AI platform, computing device and storage medium | |
CN105210085A (en) | Image labeling using geodesic features | |
CN103577831B (en) | For the method and apparatus generating training pattern based on feedback | |
CN113505261B (en) | Data labeling method and device and data labeling model training method and device | |
CN111062441A (en) | Scene classification method and device based on self-supervision mechanism and regional suggestion network | |
CN107194672B (en) | Review distribution method integrating academic expertise and social network | |
CN116403058B (en) | Remote sensing cross-scene multispectral laser radar point cloud classification method | |
Bastani et al. | Machine-assisted map editing | |
Bianchi et al. | Development of extendable open-source structural inspection datasets | |
CN106485272A (en) | The zero sample classification method being embedded based on the cross-module state of manifold constraint | |
Ashok et al. | Drill bit damage assessment using image analysis and deep learning as an alternative to traditional IADC dull grading | |
CN111159241B (en) | Click conversion estimation method and device | |
CN117235866A (en) | Building design data management method, system and storage medium | |
CN116310647A (en) | Labor insurance object target detection method and system based on incremental learning | |
CN106056627A (en) | Robustness object tracking method based on local identification sparse representation | |
Prajapati et al. | Enabling industry 4.0: Assessing technologies and prioritization framework for agile manufacturing in India | |
Alkhaled et al. | Supportive environment for better data management stage in the cycle of ML process | |
CN112433729A (en) | Automatic UI image labeling method and device | |
Chalup et al. | A computational approach to fractal analysis of a cityscape's skyline | |
Golovnin et al. | Universal convolutional neural network for recognition of traffic lights and road signs in video frames | |
Chuang et al. | Automatic fish segmentation and recognition for trawl-based cameras |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210302 |