CN115373658A - Method and device for automatically generating front-end code based on Web picture - Google Patents

Method and device for automatically generating front-end code based on Web picture

Info

Publication number
CN115373658A
CN115373658A (application CN202210983859.3A)
Authority
CN
China
Prior art keywords
container
information
style
web
css
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210983859.3A
Other languages
Chinese (zh)
Inventor
杨溢龙
陈晨
张洁
张莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202210983859.3A
Publication of CN115373658A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/30 Creation or generation of source code
    • G06F 8/33 Intelligent editors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/30 Creation or generation of source code
    • G06F 8/38 Creation or generation of source code for implementing user interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a method and a device for automatically generating front-end code based on a Web picture. The method comprises the following steps: performing element identification on a Web picture input into the system with a trained neural network to obtain the category and position information of the elements; generating containers from the identified element information with a container generation algorithm to obtain a reasonable layout of the Web page; and performing independent style inference on each identified element with a neural network to generate the style information of the element. The method solves the problems of the prior art, namely the limited range of generated elements, uniform element styles, and mismatched layout, by realizing the generation of various elements, a reasonable layout of the Web page, and the generation of style information, thereby making up for the deficiencies of the prior art.

Description

Method and device for automatically generating front-end code based on Web picture
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for automatically generating a front-end code based on a Web picture.
Background
The widespread use of the GUI (Graphical User Interface) is one of the major achievements in the development of modern computing. It greatly lowers the barrier for non-professional users: instead of having to memorize a large number of commands, people can operate a computer conveniently through windows, menus, buttons, and the like. A deep neural network model named Pix2code can output corresponding code directly from an input GUI screenshot, eliminating the process of manually writing the front-end code. Its principle is as follows: a convolutional neural network extracts a feature map from the input GUI picture; the feature map is flattened and fed into a recurrent neural network, which outputs a DSL (Domain-Specific Language) sequence; and the DSL sequence is input to a compiler designed for the DSL to obtain the final front-end code.
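For illustration only, a minimal PyTorch sketch of such a CNN-plus-RNN pipeline is shown below. It is not the actual Pix2code implementation; the layer sizes, vocabulary handling, and class name are assumptions, and the DSL compiler is omitted.

```python
import torch
import torch.nn as nn

class Pix2CodeLike(nn.Module):
    """Sketch of a pix2code-style model: a CNN extracts a feature map from the
    GUI screenshot, the flattened features initialize an LSTM decoder that
    emits DSL tokens, and a separate DSL compiler (not shown) produces code."""
    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(                        # feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.img_proj = nn.Linear(64 * 8 * 8, hidden)    # flatten + project
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)        # next-token logits

    def forward(self, images, token_ids):
        feats = self.cnn(images).flatten(1)              # (B, 64*8*8)
        h0 = self.img_proj(feats).unsqueeze(0)           # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        out, _ = self.rnn(self.embed(token_ids), (h0, c0))
        return self.head(out)                            # (B, T, vocab_size)
```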
However, the types of Web page elements that this prior-art method can extract are limited, and many necessary elements are missing, such as the input box (input), selection box (select), radio button (radio), and check box (checkbox). The generated elements also have no individual styles; every element uses the default style. The generated code contains only HTML (HyperText Markup Language) code and no CSS (Cascading Style Sheets) code. That work generates only the HTML tags of the elements and the containers themselves, without the layout the Web page should have, so the generated page does not match the input picture.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide the following technical solution: a method for automatically generating front-end code based on a Web picture, comprising the following steps:
identifying elements in the Web picture to obtain element information;
generating a container according to the element information, and obtaining page layout information, wherein the page layout information comprises the layout of CSS codes;
performing style reasoning on the element information to obtain a style of the CSS code;
and combining the layout of the CSS codes with the styles of the CSS codes to obtain the CSS codes.
Further, identifying an element in the Web picture, and obtaining element information specifically includes: and inputting the Web picture into an element identification network, and outputting the element information by the element identification network.
Further, training the element recognition network specifically includes:
in the process of training a YOLOX recognition network, element categories are marked on the screenshot of the Web page, and the marked screenshot of the Web page is used as a sample;
and training the YOLOX recognition network through the sample to obtain the element recognition network.
Further, generating a container according to the element information, and obtaining page layout information, specifically including:
generating a container by using a container generation algorithm for the element information, and obtaining position information of the container and nesting relation between the container and the element, wherein the element information comprises element category and element position information;
and generating the layout of the CSS code by combining the element information, the position information of the container and the nesting relation between the container and the element.
Further, generating a container for the element information by using a container generation algorithm specifically includes:
sorting the elements identified from the Web picture according to a first direction to obtain a first sorting result;
in the first sequencing result, if the distance between two adjacent elements is greater than a first threshold, classifying a previous element in the two adjacent elements into a previous container, and classifying a subsequent element in the two adjacent elements into a subsequent container to obtain containers distributed in the first direction;
sorting the elements in the containers distributed in the first direction according to a second direction to obtain a second sorting result;
in the second sorting result, if the distance between two adjacent elements is greater than a second threshold, a previous element in the two adjacent elements is classified into a previous container, and a next element in the two adjacent elements is classified into a next container, so that containers distributed in the second direction are obtained.
Further, HTML code is generated by combining the element information, the position information of the container, and the nesting relation between the container and the elements.
Further, the color of the element edge pixels is taken as the background color of the element and container.
Further, performing style inference on the element information to obtain a style of the CSS code specifically includes:
inputting a picture of each of the elements into a first neural network, the first neural network outputting a value of font size in the element;
inputting the picture of each element into a second neural network, wherein the second neural network outputs the value of the border fillet in the element.
Further, the first neural network and the second neural network are obtained by training the ResNet34, wherein the loss function of the first neural network and the second neural network is a MAE loss function.
The application also provides a device for automatically generating front-end code based on a Web picture, comprising:
the element identification module is used for identifying elements in the Web picture and acquiring element information;
the page layout module is used for generating a container according to the element information and obtaining page layout information, wherein the page layout information comprises the layout of CSS codes;
the style reasoning module is used for carrying out style reasoning on the element information to obtain the style of the CSS code;
and the code generation module is used for combining the layout of the CSS code with the style of the CSS code to obtain the CSS code.
The embodiment of the invention also provides computer equipment comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements any of the above methods for automatically generating front-end code based on a Web picture when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing any foregoing method for automatically generating a front-end code based on a Web picture is stored in the computer-readable storage medium.
Compared with the prior art, at least one of the technical solutions adopted in the embodiments of this specification can achieve at least the following beneficial effects. Element recognition is performed on the input Web picture by a trained neural network to obtain information on various types of elements; the layout generation part generates containers for the Web page according to the element information obtained by the element recognition part and then generates a reasonable layout for the page; and the style inference part infers the style of each element independently with a neural network to obtain the style information of the elements. The method and the device thus realize the generation of various elements, a reasonable layout of the Web page, and the generation of style information, automatically generate the front-end code, and make up for the deficiencies of the prior art.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic flow chart of a method for automatically generating a front-end code based on a Web picture according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an automatic front-end code generation structure based on a Web picture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a computer device provided by an embodiment of the invention;
fig. 4 is a schematic structural diagram of an automatic front-end code generating device based on a Web picture according to an embodiment of the present invention.
Reference numbers in the figures: the device comprises a memory 402, a processor 404, the device 200, an element identification module 201, a page layout module 202, a style inference module 203, and a code generation module 204.
Detailed Description
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments and the attached drawings.
As shown in fig. 1, an embodiment of the present invention provides a method for automatically generating a front-end code based on a Web picture, including: step S101: identifying elements in the Web picture to obtain element information; step S102: generating a container according to the element information, and obtaining page layout information, wherein the page layout information comprises the layout of CSS codes; step S103: performing style reasoning on the element information to obtain a style of the CSS code; step S104: and combining the layout of the CSS codes with the styles of the CSS codes to obtain the CSS codes.
In this embodiment, element recognition is performed on the input Web picture by a trained neural network to obtain information on various kinds of elements; in the layout generation part, a container generation algorithm designed for Web pages generates containers according to the element information obtained by the element recognition part, and then a reasonable layout is generated for the page; in the style inference part, the style of each element is inferred independently with a neural network to obtain the style information of the elements. The generation of various elements, a reasonable layout of the Web page, and the generation of style information are thus realized, the front-end code is generated automatically, and the deficiencies of the prior art are overcome.
As shown in fig. 1, the first embodiment of the present invention specifically includes the following steps:
s101: identifying elements in the Web picture to obtain element information;
specifically, the Web picture is a Web page screenshot, the Web picture is input to an element identification network, and as shown in fig. 2, the element identification network performs element identification on the picture, so as to output element information in the Web page screenshot. The element recognition network is a neural network model. The HTML elements needing to be identified are determined to be ten types: links, text, paragraphs, pictures, text fields, input boxes, selection boxes, radio boxes, check boxes, buttons. In this embodiment, higher recognition accuracy can be achieved by using YOLOX as an element recognition network. The YOLOX is a high-performance target detection framework and has the advantages of high speed, high detection precision, convenience in deployment and the like. After the network is trained, the trained network can identify elements in the input Web page image. The element information output by the element identification network includes the category of the element and the position information of the element, wherein the position information of the element is specifically the upper left corner coordinate and the lower right corner coordinate of the element.
In specific implementation, the specific process of obtaining the element identification network includes:
s110: in the process of training a YOLOX recognition network, element categories are marked on the screenshot of the Web page, and the marked screenshot of the Web page is used as a sample;
s120: and training the YOLOX recognition network through the sample to obtain the element recognition network.
Specifically, the data set used to train the YOLOX recognition network is in the VOC data set format. When the application is implemented, screenshots of a number of Web pages are captured automatically and labeled with labelImg, a graphical image annotation tool for labeling data sets. Ten categories of elements are labeled: links, text, paragraphs, pictures, text fields, input boxes, selection boxes, radio boxes, check boxes, and buttons. During labeling, the elements in the Web page screenshots are enclosed with bounding boxes and the labeled information is recorded as xml files. A data set of 1000 labeled samples is produced in total for training the YOLOX recognition network, and the YOLOX recognition network is trained on this data set to obtain the element recognition network. After training, the accuracy with which the network recognizes the elements in an input screenshot is greatly improved, and almost all elements in a page can be recognized.
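The labelImg tool writes one VOC-format xml file per screenshot. A minimal sketch of reading such an annotation with the Python standard library is shown below; the function name and the returned structure are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_path: str):
    """Read one labelImg/VOC xml file and return (class_name, box) pairs."""
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.findall("object"):
        name = obj.findtext("name")                 # e.g. "button", "input_box"
        box = obj.find("bndbox")
        xmin = int(float(box.findtext("xmin")))
        ymin = int(float(box.findtext("ymin")))
        xmax = int(float(box.findtext("xmax")))
        ymax = int(float(box.findtext("ymax")))
        objects.append((name, (xmin, ymin, xmax, ymax)))
    return objects
```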
S102: generating a container according to the element information, and obtaining page layout information, wherein the page layout information comprises the layout of CSS codes;
in specific implementation, the container is generated according to the element information, and the specific process of obtaining the page layout information includes:
step S210: generating a container for the element information by using a container generation algorithm, and obtaining position information of the container and a nesting relation between the container and the element, wherein the element information comprises element category and element position information;
step S220: and generating the layout of the CSS code by combining the element information, the position information of the container and the nesting relation between the container and the element.
In specific implementation, combining the element information, the position information of the container, and the nesting relationship between the container and the element, the method further includes: HTML code is generated.
Specifically, a container generation algorithm generates containers for the identified elements from the category and position information output by the element identification network. A container, i.e. a collection of elements, provides an efficient and flexible way to organize and manage those elements. Containers are generated by the container generation algorithm, and the position information of the containers and the nesting relations between containers and elements are obtained. The layout part of the CSS code and the HTML code are then generated from the identified element category information, the element position information, the container position information, and the nesting relations between containers and elements. The layout part of the CSS code includes the width and height of the page, line spacing, the positions of content blocks, and so on. In the embodiment of the application, screenshots of 20 Web pages are captured at random as a test set for evaluating the container generation algorithm. In addition, the text of each element is recognized with OCR (Optical Character Recognition) to obtain the text content of the element, so that the text content of the element is consistent with the input Web picture.
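As an illustration of the text recognition step, a minimal sketch using the pytesseract OCR library is given below; the embodiment only refers to OCR in general, so the specific library, and the assumption that the Tesseract engine is installed, are choices made here for illustration.

```python
from PIL import Image
import pytesseract  # assumes the Tesseract OCR engine is installed locally

def element_text(screenshot_path: str, box) -> str:
    """Crop one detected element out of the page screenshot and run OCR on it
    so the generated element carries the same text as the input picture."""
    x1, y1, x2, y2 = box
    page = Image.open(screenshot_path)
    crop = page.crop((x1, y1, x2, y2))
    return pytesseract.image_to_string(crop).strip()
```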
In specific implementation, the specific generation process of the container generation algorithm includes:
step S211: sorting the elements identified from the Web picture according to a first direction to obtain a first sorting result;
step S212: in the first sequencing result, if the distance between two adjacent elements is greater than a first threshold, classifying a previous element in the two adjacent elements into a previous container, and classifying a subsequent element in the two adjacent elements into a subsequent container to obtain containers distributed according to the first direction;
specifically, the first direction is a longitudinal direction. And sequencing the elements identified from the Web picture according to the up-down sequence to obtain the first sequencing result. In the first sorting result, if the distance between two elements exceeds the first threshold, the upper element of the two elements is classified as the previous container, and the lower element is classified as the next container. Longitudinally distributed containers can thus be obtained.
Step S213: sorting the elements in the containers distributed in the first direction according to a second direction to obtain a second sorting result;
step S214: in the second sorting result, if the distance between two adjacent elements is greater than a second threshold, a previous element in the two adjacent elements is classified into a previous container, and a next element in the two adjacent elements is classified into a next container, so that containers distributed in the second direction are obtained.
Specifically, the second direction is the horizontal direction. The elements inside each container distributed in the first direction, that is, inside each vertically distributed container, are sorted horizontally; if the distance between a left element and a right element exceeds the second threshold, the left element is classified into the left container and the right element into the right container, and horizontally distributed containers are thus obtained.
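A minimal sketch of the two-pass container generation algorithm described in steps S211 to S214 is given below. The element representation and the numeric thresholds are assumptions for illustration; in practice the thresholds would be tuned against the 20-page test set.

```python
def group_by_gap(elements, axis: str, threshold: float):
    """Sort elements along one axis and start a new container whenever the gap
    between the end of one element and the start of the next exceeds the
    threshold. Each element is (category, (x1, y1, x2, y2))."""
    lo, hi = (1, 3) if axis == "vertical" else (0, 2)   # y or x coordinates
    elements = sorted(elements, key=lambda e: e[1][lo])
    containers, current = [], []
    for elem in elements:
        if current and elem[1][lo] - current[-1][1][hi] > threshold:
            containers.append(current)        # close the previous container
            current = []
        current.append(elem)
    if current:
        containers.append(current)
    return containers

def generate_containers(elements, v_threshold=40, h_threshold=60):
    """First pass groups elements into vertically distributed containers
    (rows); the second pass splits each row into horizontally distributed
    containers (columns), which yields the container/element nesting."""
    layout = []
    for row in group_by_gap(elements, "vertical", v_threshold):
        layout.append(group_by_gap(row, "horizontal", h_threshold))
    return layout   # list of rows, each a list of column containers
```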
S103: performing style reasoning on the element information to obtain a style of the CSS code;
specifically, as shown in fig. 2, based on the identified element information, the style of each element is individually subjected to style inference through a neural network, resulting in style information of the element. The style inference part needs to make inference styles including: background color, font size, frame fillet. Regarding the background color, the color of the element edge pixel is taken as the background color, thereby obtaining the background color of the element and the container.
In a specific implementation, the style inference is performed on the element information, and the process of obtaining the style of the CSS code includes:
step S310: inputting a picture of each of the elements into a first neural network, the first neural network outputting a value of a font size in the element;
step S320: inputting the picture of each element into a second neural network, wherein the second neural network outputs the value of the border fillet in the element.
Specifically, the style inference on the element information is performed with neural networks, i.e. style inference networks. The first neural network is a font-size inference network, and the second neural network is a border-fillet inference network. In the embodiment of the application, elements with different styles are cropped from different Web pages, and the HTML and CSS code of those pages is inspected to obtain the numerical value of the corresponding style of each element; two data sets are thus produced, used respectively as training samples for the border-fillet inference network and the font-size inference network. Each data set comprises 1000 pictures, and each picture is accompanied by a text file recording the style value. The value of the element font size and the value of the border fillet are then output by the first neural network and the second neural network, respectively.
In a specific implementation, the first neural network and the second neural network are obtained by training ResNet34, a deep residual network, and the loss functions of the first and second neural networks are MAE (Mean Absolute Error) loss functions.
In selecting the neural networks used for style inference, it is considered that the input of style inference is a picture (a picture containing one element) and the output is a real number (the font size or the border-fillet value). The image classification networks AlexNet and ResNet34 are therefore adapted: the final softmax layer is replaced with a fully connected layer that outputs a single value, and the loss function is changed to the MAE loss, so that the networks perform a regression task. After the style inference networks are trained on the self-made data sets, the experimental results show that ResNet34 performs better, so ResNet34 is finally used. The style of each element is inferred independently with this neural network method to obtain the style information of the elements, so the generated page styles are more diverse.
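The following is a minimal PyTorch sketch of the regression setup described above, i.e. ResNet34 with its classification head replaced by a single-output fully connected layer and trained with the MAE (L1) loss. The optimizer, learning rate, input size, and data handling are assumptions, not part of the embodiment.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_style_regressor() -> nn.Module:
    """ResNet34 backbone whose final classification layer is replaced by a
    fully connected layer outputting one real value (font size or fillet)."""
    model = models.resnet34(weights=None)      # torchvision >= 0.13 API
    model.fc = nn.Linear(model.fc.in_features, 1)
    return model

def train_step(model, optimizer, images, targets) -> float:
    """One training step with the MAE (L1) loss used for the regression task."""
    criterion = nn.L1Loss()                    # MAE loss
    optimizer.zero_grad()
    preds = model(images).squeeze(1)           # shape: (batch,)
    loss = criterion(preds, targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch (shapes only; real training iterates over the labeled crops):
# model = build_style_regressor()
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# loss = train_step(model, optimizer, torch.randn(8, 3, 224, 224), torch.rand(8) * 40)
```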
S104: and combining the layout of the CSS codes with the style of the CSS codes to obtain the CSS codes.
Specifically, as shown in fig. 2, the HTML code and CSS code that meet the requirements of the page are obtained from the layout of the CSS code produced by the layout generation part and the style of the CSS code produced by the style inference part, and the page is finally generated. The method thus automatically generates the HTML and CSS code from the input Web page screenshot, covering the three aspects of elements, layout, and style.
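As an illustration of this combination step, a minimal sketch is given below; the per-selector dictionaries and the class naming scheme are assumptions about how the layout and style parts might hand over their results.

```python
def build_css(layout_rules: dict, style_rules: dict) -> str:
    """Merge per-selector layout properties (width, height, margins, ...) with
    per-selector style properties (background-color, font-size, border-radius,
    ...) into a single CSS stylesheet string."""
    css_blocks = []
    for selector in sorted(set(layout_rules) | set(style_rules)):
        props = {**layout_rules.get(selector, {}), **style_rules.get(selector, {})}
        body = "\n".join(f"  {name}: {value};" for name, value in props.items())
        css_blocks.append(f"{selector} {{\n{body}\n}}")
    return "\n\n".join(css_blocks)

# Usage sketch:
# print(build_css(
#     {".container-1": {"width": "960px", "margin": "0 auto"}},
#     {".container-1": {"background-color": "#f5f5f5"},
#      ".button-1": {"font-size": "14px", "border-radius": "4px"}},
# ))
```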
In this embodiment, as shown in fig. 3, a computer device is provided, comprising a memory 402, a processor 404, and a computer program stored in the memory 402 and executable on the processor 404; when executing the computer program, the processor 404 implements any of the above-mentioned methods for automatically generating front-end code based on a Web picture.
In particular, the computer device may be a computer terminal, a server or a similar computing device.
In the present embodiment, a computer-readable storage medium is provided, which stores a computer program for executing any one of the above-described methods for automatically generating a front-end code based on a Web picture.
In particular, computer-readable storage media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, a computer-readable storage medium does not include transitory computer-readable media such as modulated data signals and carrier waves.
Based on the same inventive concept, the embodiment of the present invention further provides an apparatus for automatically generating a front-end code based on a Web picture, as described in the following embodiments. The principle of solving the problems of the automatic front-end code generating device based on the Web picture is similar to that of the automatic front-end code generating method based on the Web picture, so the implementation of the automatic front-end code generating device based on the Web picture can refer to the implementation of the automatic front-end code generating method based on the Web picture, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
As shown in fig. 4, an embodiment of the present invention further provides an apparatus 200 for automatically generating a front-end code based on a Web picture, including: an element identification module 201, configured to identify an element in the Web picture, and obtain element information; a page layout module 202, configured to generate a container according to the element information, and obtain page layout information, where the page layout information includes a layout of CSS codes; the style inference module 203 is used for performing style inference on the element information to obtain a style of the CSS code; and the code generating module 204 is configured to combine the layout of the CSS code with the style of the CSS code to obtain the CSS code.
In this embodiment, the element identification module 201 inputs the Web picture into an element identification network, and the element identification network outputs the element information. In the page layout module 202, a container is generated for the element information by using a container generation algorithm, and position information of the container and a nesting relation between the container and the element are obtained; and generating the layout of the HTML codes and the CSS codes by combining the element information, the position information of the container and the nesting relation of the container and the elements. In the style inference module 203, the color of the edge pixel of an element is used as the background color of the element and the container, and the picture of each element is input into a first neural network, and the first neural network outputs the value of the font size in the element; and inputting the picture of each element into a second neural network, and outputting the value of a frame fillet in the element by the second neural network to obtain the pattern of the CSS code. The code generation module 204 combines the layout of the CSS code with the style of the CSS code to obtain the CSS code.
The above description is only exemplary of the invention and should not be taken as limiting its scope; the invention is intended to cover all modifications and equivalents of the embodiments described herein. In addition, the above technical features and technical solutions can be freely combined and used.

Claims (12)

1. A method for automatically generating a front-end code based on a Web picture is characterized by comprising the following steps:
identifying elements in the Web picture to obtain element information;
generating a container according to the element information, and obtaining page layout information, wherein the page layout information comprises the layout of CSS codes;
performing style reasoning on the element information to obtain a style of the CSS code;
and combining the layout of the CSS codes with the style of the CSS codes to obtain the CSS codes.
2. The method for automatically generating the front-end code based on the Web picture as claimed in claim 1, wherein identifying the element in the Web picture and obtaining the element information comprises:
and inputting the Web picture into an element identification network, and outputting the element information by the element identification network.
3. The method for automatically generating the front-end code based on the Web picture as claimed in claim 2, further comprising:
in the process of training a YOLOX recognition network, element categories are marked on the screenshot of the Web page, and the marked screenshot of the Web page is used as a sample;
and training the YOLOX recognition network through the sample to obtain the element recognition network.
4. The method for automatically generating the front-end code based on the Web picture according to any one of claims 1 to 3, wherein generating a container according to the element information and obtaining page layout information comprises:
generating a container by using a container generation algorithm for the element information, and obtaining position information of the container and nesting relation between the container and the element, wherein the element information comprises element category and element position information;
and generating the layout of the CSS code by combining the element information, the position information of the container and the nesting relation between the container and the element.
5. The method of claim 4, wherein generating a container for the element information by using a container generation algorithm comprises:
sorting the elements identified from the Web picture according to a first direction to obtain a first sorting result;
in the first sequencing result, if the distance between two adjacent elements is greater than a first threshold, classifying a previous element in the two adjacent elements into a previous container, and classifying a subsequent element in the two adjacent elements into a subsequent container to obtain containers distributed according to the first direction;
sorting the elements in the containers distributed in the first direction according to a second direction to obtain a second sorting result;
in the second sorting result, if the distance between two adjacent elements is greater than a second threshold, a previous element in the two adjacent elements is classified into a previous container, and a next element in the two adjacent elements is classified into a next container, so that containers distributed in the second direction are obtained.
6. The method for automatically generating the front-end code based on the Web picture as claimed in claim 4, further comprising: and generating HTML codes by combining the element information, the position information of the container and the nesting relation of the container and the elements.
7. The method for automatically generating the front-end code based on the Web picture according to any one of claims 1 to 3, wherein performing style inference on the element information to obtain the style of the CSS code comprises:
taking the color of the element edge pixels as the background color of the element and the container.
8. The method for automatically generating the front-end code based on the Web picture according to any one of claims 1 to 3, wherein performing style inference on the element information to obtain the style of the CSS code comprises:
inputting a picture of each of the elements into a first neural network, the first neural network outputting a value of font size in the element;
inputting the picture of each element into a second neural network, wherein the second neural network outputs the value of the border fillet in the element.
9. The method as claimed in claim 8, wherein the first neural network and the second neural network are obtained by training a ResNet34, and wherein the loss functions of the first neural network and the second neural network are MAE loss functions.
10. An automatic front-end code generation device based on Web pictures is characterized by comprising:
the element identification module is used for identifying elements in the Web picture and acquiring element information;
the page layout module is used for generating a container according to the element information and obtaining page layout information, wherein the page layout information comprises the layout of CSS codes;
the style reasoning module is used for carrying out style reasoning on the element information to obtain the style of the CSS code;
and the code generation module is used for combining the layout of the CSS code with the style of the CSS code to obtain the CSS code.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for automatically generating a front-end code based on a Web picture according to any one of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method for automatically generating a front-end code based on a Web picture according to any one of claims 1 to 9.
CN202210983859.3A 2022-08-17 2022-08-17 Method and device for automatically generating front-end code based on Web picture Pending CN115373658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210983859.3A CN115373658A (en) 2022-08-17 2022-08-17 Method and device for automatically generating front-end code based on Web picture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210983859.3A CN115373658A (en) 2022-08-17 2022-08-17 Method and device for automatically generating front-end code based on Web picture

Publications (1)

Publication Number Publication Date
CN115373658A (en) 2022-11-22

Family

ID=84065310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210983859.3A Pending CN115373658A (en) 2022-08-17 2022-08-17 Method and device for automatically generating front-end code based on Web picture

Country Status (1)

Country Link
CN (1) CN115373658A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7382095B1 (en) 2022-12-16 2023-11-16 株式会社ユニオンプレイス Code generation method using artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination