WO2023093850A1 - Component identification method and apparatus, electronic device, and storage medium - Google Patents

Component identification method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023093850A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
node
block
component
model
Prior art date
Application number
PCT/CN2022/134361
Other languages
English (en)
French (fr)
Inventor
夏姝
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2023093850A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/13 - Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20112 - Image segmentation details
    • G06T2207/20132 - Image cropping

Definitions

  • The present invention is based on the Chinese patent application with application number 202111422206.X and a filing date of November 26, 2021, and claims the priority of that Chinese patent application.
  • The entire content of that Chinese patent application is hereby incorporated by reference.
  • The present application relates to the field of computer technology, and relates to a component identification method and apparatus, an electronic device, and a storage medium.
  • The embodiments of the present application provide a component identification method and apparatus, an electronic device, and a storage medium, so as to at least solve the problem of low component identification efficiency in the related art.
  • An embodiment of the present application provides a component identification method, the method includes:
  • inputting a first image into a set recognition model to obtain at least one UI block in the first image output by the set recognition model; the first image is determined based on a first visual draft; the set recognition model is used to identify at least one UI block in an input image; each UI block includes at least an image area that can be rendered with UI components;
  • the determining, among the nodes of the Document Object Model (DOM) corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block includes: traversing the nodes of the DOM corresponding to the first visual draft; and when an area overlap ratio is greater than a first set threshold, determining the corresponding node as the first node corresponding to a first UI block;
  • the area overlap ratio represents the overlap ratio between a second image and the first UI block; the second image is obtained by cropping the first visual draft according to the node information of the node.
  • before the first image is input into the set recognition model, the method further includes: determining at least two first rectangular areas in the first visual draft through edge detection, and cropping the first visual draft based on at least one first rectangular area to obtain the corresponding first image;
  • the cropping of the first visual draft based on at least one first rectangular area includes: merging first rectangular areas adjacent in a set direction to obtain a second rectangular area whose length in the set direction is less than a second set threshold, and cropping the first visual draft based on the second rectangular area.
  • the inputting of the image corresponding to each first node into the set classification model includes:
  • inputting, into the set classification model, a second image obtained by cropping the first visual draft according to the node information of the first node;
  • and/or inputting, into the set classification model, a third image obtained by cropping the first visual draft according to the UI block corresponding to the first node.
  • before the first image is input into the set recognition model, the method further includes:
  • training the set recognition model and the set classification model based on a second visual draft corresponding to at least one first annotation;
  • the first annotation includes first position information of an image area that can be rendered with a UI component and a corresponding first label; the first position information is used to describe the position of the image area in the second visual draft; the first label is used to describe the type of the UI component.
  • the training of the set recognition model and the set classification model based on the second visual draft corresponding to at least one first annotation includes: cropping the second visual draft based on each first annotation to obtain corresponding fourth images, each carrying a first label; and replacing the first labels of the second visual draft with a second label to obtain a fifth image;
  • the second label indicates that the corresponding image area is a UI block;
  • the set classification model is trained based on the fourth images, and the set recognition model is trained based on the fifth image.
  • before the set classification model is trained based on the fourth images, the method further includes: when the number of fourth images corresponding to at least one first label is less than a third set threshold, generating a page image containing the corresponding UI component type, and obtaining a corresponding fourth image based on the generated page image;
  • the generating of a page image containing the corresponding UI component type includes: rendering component code of the corresponding UI component type on a set interface;
  • and/or obtaining a third visual draft through a visual-draft-to-code tool, thereby obtaining a page image of the corresponding UI component type.
  • the embodiment of the present application also provides a component identification device, including:
  • the first processing module is configured to input a first image into the set recognition model and obtain at least one UI block in the first image output by the set recognition model; the first image is determined based on a first visual draft; the set recognition model is configured to identify at least one UI block in an input image; each UI block includes at least an image area that can be rendered with UI components;
  • the second processing module is configured to, among the nodes of the DOM corresponding to the first visual draft, determine a first node corresponding to each UI block in the at least one UI block;
  • the third processing module is configured to input the image corresponding to each first node into the set classification model and obtain the UI component type corresponding to the first node output by the set classification model; the set classification model is configured to determine the UI component type corresponding to an input image.
  • the embodiment of the present application also provides an electronic device, including: a processor and a memory for storing a computer program that can run on the processor,
  • the processor is configured to execute the steps of the above component identification method when running the computer program.
  • the embodiment of the present application also provides a storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above component identification method are implemented.
  • a first image is input into the set recognition model, and at least one UI block in the first image output by the set recognition model is obtained; the first image is determined based on a first visual draft; the set recognition model is used to identify at least one UI block in an input image; each UI block includes at least an image area that can be rendered with UI components; among the nodes of the Document Object Model (DOM) corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block is determined; the image corresponding to each first node is input into the set classification model, and the UI component type corresponding to the first node output by the set classification model is obtained; the set classification model is used to determine the UI component type corresponding to an input image.
  • FIG. 1 is a schematic flowchart of an implementation of the component identification method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of identifying a UI block in an image provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an implementation of determining a node provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a UI block recognition result provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of an implementation of page segmentation provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of an implementation of component identification provided by an application embodiment of the present application.
  • FIG. 7 is a schematic diagram of a component identification page provided by an application embodiment of the present application.
  • FIG. 8 is an implementation flowchart of model training provided by an application embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a component identification apparatus provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The process of generating UI code from a visual draft includes inputting structured visual draft data and then generating the UI code through steps such as layer-information parsing, layout, component replacement, semanticization, and code generation.
  • Component replacement is a key step in generating UI code from a visual draft.
  • A first image is input into the set recognition model, and at least one UI block in the first image output by the set recognition model is obtained; the first image is determined based on a first visual draft; the set recognition model is used to identify at least one UI block in an input image; each UI block includes at least an image area that can be rendered with UI components; among the nodes of the DOM corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block is determined; the image corresponding to each first node is input into the set classification model, and the UI component type corresponding to the first node output by the set classification model is obtained; the set classification model is used to determine the UI component type corresponding to an input image.
  • Fig. 1 is a schematic diagram of the implementation flow of the component identification method provided by the embodiment of the present application.
  • The embodiment of the present application provides a component identification method applied to an electronic device, where electronic devices include but are not limited to servers and terminals.
  • the component identification method includes:
  • Step 101: Input a first image into a set recognition model, and obtain at least one UI block in the first image output by the set recognition model.
  • the first image is determined based on the first visual draft; the set recognition model is used to identify at least one UI block in the input image; each UI block includes at least an image that can be rendered by UI components area.
  • The component identification device inputs the first image into the set recognition model, and the set recognition model identifies at least one UI block in the input first image; each UI block includes at least an image area that can be rendered with UI components.
  • The set recognition model is obtained by training on visual drafts annotated with UI blocks.
  • The first image is determined based on the first visual draft and may be part or all of the image of the first visual draft.
  • Each UI block is an image area and can be regarded as an image.
  • Step 102: Among the nodes of the DOM corresponding to the first visual draft, determine a first node corresponding to each UI block in the at least one UI block.
  • Based on each UI block in the at least one UI block determined by the set recognition model, the component identification device determines, among the nodes corresponding to the first visual draft, one first node corresponding to each UI block. When a node and a UI block satisfy a set condition, that node is determined as the first node of the UI block.
  • The nodes of the DOM are the nodes of the structured data (that is, Schema data) obtained by parsing the visual draft.
  • The Schema data is a tree structure composed of all elements in the visual draft and can be stored in JavaScript Object Notation (JSON) format; each node includes node information such as the width, height, and position of the corresponding image.
  • The image of the first visual draft can be obtained through the node information of the root node in the DOM corresponding to the visual draft.
  • The node information of the root node includes a Uniform Resource Locator (URL) address of the complete page preview.
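For illustration, a minimal sketch of what such parsed Schema data might look like. The patent only specifies that each node carries width, height, and position information and that the root node carries a full-page preview URL, so every field name below is an assumption:

```python
# Hedged sketch of parsed visual-draft Schema data. Field names are
# assumptions; the patent only mandates width/height/position per node
# and a full-page preview URL on the root node.
schema = {
    "id": "root",
    "url": "https://example.com/preview/full_page.png",  # hypothetical preview URL
    "x": 0, "y": 0, "width": 750, "height": 4000,
    "children": [
        {"id": "node-1", "x": 0, "y": 0, "width": 750, "height": 200, "children": []},
        {"id": "node-2", "x": 0, "y": 200, "width": 750, "height": 320, "children": []},
    ],
}
```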
  • Step 103: Input the image corresponding to each first node into the set classification model, and obtain the UI component type corresponding to the first node output by the set classification model.
  • the set classification model is used to determine the type of UI component corresponding to the input image.
  • The component identification device inputs the image corresponding to each first node into the set classification model; the set classification model classifies the UI component type of the input image and determines the UI component type corresponding to the image of the first node, thereby obtaining the UI component type corresponding to the first node.
  • The set classification model is obtained by training on images annotated with UI component types.
  • the determined classification results can be mounted in the JSON file.
  • In the solution provided by the embodiment of the present application, the first node corresponding to each image area of the first visual draft that can be rendered with a UI component is determined, and the UI component type corresponding to the first node is obtained through the set classification model. In this way, there is no need to manually annotate the UI component types of the nodes in the DOM corresponding to the visual draft, which improves the efficiency of component identification.
  • Moreover, after the component identification device determines the UI component type corresponding to the first node, when the visual draft is converted into UI code, the UI code is generated based on the code of the corresponding type of UI component, which simplifies the programming work from visual draft to UI code. Using the code of the corresponding component enables automated component replacement, lowering the barrier to UI code development.
  • Furthermore, the component identification device determines the image areas of the first visual draft that can be rendered with UI components through the set recognition model, determines the corresponding first nodes, and inputs only the images of these first nodes into the set classification model. The set classification model therefore does not need to process the images of all the nodes of the first visual draft, which reduces the computation of the model and saves the computing resources required for component identification.
  • In addition, when the set recognition model identifies a UI block, the area of the identified UI block is not required to be exactly the same as the image area rendered with the UI component. In this way, the accuracy requirement on the set recognition model is relaxed, which improves its robustness.
  • As shown in FIG. 2, the first image includes a plurality of UI blocks (21-23); UI block 23 is not exactly the same as the image area 231 rendered with the UI component, but this does not prevent the set classification model from determining the UI component type corresponding to the node.
  • In an embodiment, the inputting of the image corresponding to each first node into the set classification model includes:
  • inputting, into the set classification model, a second image obtained by cropping the first visual draft according to the node information of the first node;
  • and/or,
  • inputting, into the set classification model, a third image obtained by cropping the first visual draft according to the UI block corresponding to the first node.
  • The image corresponding to each first node that the component identification device inputs into the set classification model may be the second image obtained by cropping the first visual draft according to the node information of the first node, may be the third image obtained by cropping the first visual draft according to the UI block corresponding to the first node, or may be both the second image and the third image corresponding to the first node. Since the edges of an identified UI block are not necessarily precise, classifying the UI component type with the second image cropped from the node information improves the accuracy of the classification result; a cropping sketch is given below.
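As an illustration of the cropping step, here is a minimal sketch assuming the node layout of the Schema sketch above (x, y, width, height in pixels of the full-page preview) and using PIL, an illustrative choice the patent does not prescribe:

```python
from PIL import Image

def crop_by_node(draft_path: str, node: dict) -> Image.Image:
    """Crop the visual-draft image to a node's rectangle.

    Assumes node info carries x, y, width, height in pixels of the
    full-page preview (field names follow the Schema sketch above).
    """
    draft = Image.open(draft_path)
    left, top = node["x"], node["y"]
    return draft.crop((left, top, left + node["width"], top + node["height"]))
```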
  • When a node and a UI block satisfy a set condition, that node is determined as the first node of the UI block.
  • In an embodiment, the set condition is that the area overlap ratio between the image corresponding to the node and the UI block is greater than a set threshold.
  • The determining, among the nodes of the DOM corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block includes: traversing the nodes of the DOM corresponding to the first visual draft; and when the area overlap ratio is greater than a first set threshold, determining the corresponding node as the first node corresponding to the first UI block;
  • the area overlap ratio represents the overlap ratio between the second image and the first UI block; the second image is obtained by cropping the first visual draft according to the node information of the node.
  • For each UI block, when determining the corresponding first node among the nodes of the DOM corresponding to the first visual draft, the component identification device traverses the nodes in the DOM corresponding to the first visual draft and calculates the area overlap ratio between the second image corresponding to each node and the UI block; if the area overlap ratio is greater than the first set threshold, the corresponding node is determined as the first node corresponding to the first UI block. After the first node corresponding to the first UI block is determined, the traversal of the remaining nodes in the DOM of the first visual draft may be stopped, and the determination of the first node corresponding to the next first UI block may begin.
  • The area coverage represents the image overlap between the UI block and the second image corresponding to the node; a higher area coverage indicates a higher overlap ratio between the two images, as in the sketch below.
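The patent does not give a formula for the area coverage. A reading consistent with the thresholds used below (0 means disjoint, 0.7 means a strong match) is the intersection area divided by the UI block's area, as in this sketch:

```python
def area_coverage(node_rect, ui_block_rect):
    """Overlap of a node's second image with a UI block.

    Rectangles are (x, y, width, height) tuples. Returns the intersection
    area divided by the UI block's area: an assumed formula, since the
    patent only says that higher coverage means higher overlap.
    """
    nx, ny, nw, nh = node_rect
    bx, by, bw, bh = ui_block_rect
    ix = max(0, min(nx + nw, bx + bw) - max(nx, bx))  # intersection width
    iy = max(0, min(ny + nh, by + bh) - max(ny, by))  # intersection height
    return (ix * iy) / (bw * bh)
```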
  • The component identification device may traverse the nodes in the DOM corresponding to the first visual draft in preorder, inorder, or postorder.
  • With reference to the schematic flowchart of determining a node shown in FIG. 3, preorder traversal with a first set threshold of 0.7 is taken as an example.
  • For each first UI block, a preorder traversal is performed on the structured data (that is, the Schema data) of the visual draft.
  • First, the component identification device takes the root node as the current node and calculates the area coverage of the second image corresponding to the current node with the first UI block. If the area coverage is not greater than 0, the two do not intersect, so neither this node nor its child nodes can correspond to the first UI block; the device checks whether the node has untraversed sibling nodes, and if so, takes a sibling node as the current node and calculates its area coverage.
  • If the area coverage is greater than 0 but not greater than 0.7, the device checks whether the node has untraversed child nodes. If so, it takes the first untraversed child node as the current node and calculates its area coverage; otherwise, it checks for untraversed sibling nodes as above.
  • If the area coverage is greater than 0.7, the node is considered to correspond to the first UI block, and this node is determined and marked as the first node corresponding to the first UI block; a {'smart_ui': 'ui'} field is added to the JSON information of the node as the mark. A sketch of this decision procedure follows.
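Putting the decision procedure above together, a recursive sketch of the preorder search (node layout follows the Schema sketch; area_coverage is the function sketched earlier; 0.7 is the example threshold from the text):

```python
def find_first_node(node, ui_block_rect, threshold=0.7):
    """Preorder search for the first node matching a UI block.

    Coverage <= 0 prunes the node and its subtree; coverage in (0, threshold]
    descends into the children; coverage > threshold marks the node as the
    UI block's first node and stops the traversal.
    """
    cov = area_coverage(
        (node["x"], node["y"], node["width"], node["height"]), ui_block_rect
    )
    if cov <= 0:
        return None                 # disjoint: skip this node and its children
    if cov > threshold:
        node["smart_ui"] = "ui"     # the {'smart_ui': 'ui'} mark
        return node
    for child in node.get("children", []):
        hit = find_first_node(child, ui_block_rect, threshold)
        if hit is not None:
            return hit
    return None
```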
  • As noted above, when the set recognition model identifies a UI block, the identified UI block area is not required to be exactly the same as the image area rendered with the UI component; that is, as the schematic diagram of UI block recognition results in FIG. 4 shows,
  • the edges of the identified UI blocks are not completely accurate.
  • In this embodiment, by determining the first node corresponding to the UI block, the image corresponding to the first node can be further obtained, which improves the accuracy of the classification results of the set classification model.
  • In an embodiment, before the first image is input into the set recognition model, the method further includes: determining at least two first rectangular areas in the first visual draft through edge detection, and cropping the first visual draft based on at least one first rectangular area to obtain the corresponding first image.
  • The component identification device determines image edge features through image algorithms such as edge detection, identifies at least two first rectangular areas of the first visual draft based on the edge detection, and divides the at least two first rectangular areas, each time cropping the first visual draft based on at least one first rectangular area to obtain a corresponding first image.
  • The component identification device identifies blocky elements (that is, first rectangular areas) in the image based on edge detection.
  • The element detection module of UI2CODE, an open-source visual-draft-to-code algorithm, can be used; it does not break up image areas whose nodes have complete outlines.
  • In this way, without breaking up images whose nodes have complete outlines, the first visual draft is divided into at least two images whose aspect ratios fall within the range in which the set recognition model performs best, thereby improving the accuracy of identifying UI blocks in long visual drafts and the recognition effect of the set recognition model on images.
  • In an embodiment, when the first visual draft is cropped based on at least one first rectangular area, the method includes: merging first rectangular areas adjacent in a set direction to obtain a second rectangular area whose length in the set direction is less than a second set threshold, and cropping the first visual draft based on the second rectangular area.
  • The set direction may be the vertical direction, that is, the y-axis direction.
  • The component identification device merges at least two first rectangular areas whose summed lengths in the set direction are less than the second set threshold to obtain a second rectangular area, and crops the first visual draft based on the second rectangular area.
  • The component identification device may successively judge whether the summed length of two adjacent first rectangular areas in the set direction is less than the second set threshold; if so, it merges the two first rectangular areas and continues to judge the summed length of the merged rectangular area and the next adjacent first rectangular area in the set direction, until the summed length of two rectangular areas in the set direction is greater than or equal to the second set threshold, and then crops the first visual draft based on the merged rectangular area.
  • With reference to the schematic flowchart of page segmentation shown in FIG. 5, the segmentation of the visual draft includes the following steps:
  • a) Identify blocky elements (that is, first rectangular areas) in the image based on edge detection. The element detection module of the open-source UI2CODE algorithm may be used with its parameter threshold set to a suitable value, yielding multiple first rectangular areas. Since the element detection module segments based on image edge information, it does not break up images whose nodes have complete outlines.
  • b) Sort the identified first rectangular areas by their y-axis coordinate, that is, by their distance from the top of the visual draft. To avoid losing information from undetected areas, filler rectangular areas may be inserted between the first rectangular areas; for example, elements 61 and 62 in FIG. 5 were not detected, and rectangular areas 63 and 64 are inserted so that the corresponding information is not lost.
  • c) Group the first rectangular areas. Merge first rectangular areas from top to bottom: if the merged height is less than 400, continue merging downward; if the merged height exceeds 600, the currently merged second rectangular area forms a new group, and grouping restarts from the next first rectangular area. The height here is the length in the set vertical direction (see the sketch after this list).
  • d) Crop the first visual draft based on the second rectangular areas to obtain first images of suitable height.
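A sketch of the grouping in step c), assuming rectangles as (x, y, width, height) tuples already sorted by y. The text leaves the band between the two height thresholds underspecified; this sketch closes a group as soon as the merged height is no longer small enough to keep merging:

```python
def merge(a, b):
    """Bounding box of two (x, y, w, h) rectangles."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)

def group_rects(rects, keep_merging_below=400):
    """Merge y-sorted first rectangles top-down into second rectangles.

    Mirrors step c): keep merging downward while the running height is
    below the threshold; once it is reached, close the current group and
    restart from the next rectangle. The threshold follows the example
    height in the text (merge while < 400).
    """
    groups, current = [], None
    for r in rects:
        current = r if current is None else merge(current, r)
        if current[3] >= keep_merging_below:   # tall enough: close the group
            groups.append(current)
            current = None
    if current is not None:
        groups.append(current)
    return groups
```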
  • Since the set recognition model is obtained by training on visual drafts annotated with UI blocks, and the number of training samples cannot be unlimited for reasons such as training cost, the recognition effect of the set recognition model is poor for some images, for example images whose aspect ratio exceeds that of the training samples.
  • In this embodiment, without breaking up the image areas that can be rendered with UI components, the first visual draft is divided into at least two images whose aspect ratios fall within the range in which the set recognition model performs best, thereby improving the accuracy of identifying UI blocks in long visual drafts and the recognition effect of the set recognition model on images.
  • In an embodiment, before the first image is input into the set recognition model, the method further includes:
  • training the set recognition model and the set classification model based on a second visual draft corresponding to at least one first annotation;
  • the first annotation includes first position information of an image area that can be rendered with a UI component and a corresponding first label; the first position information is used to describe the position of the image area in the second visual draft; the first label is used to describe the type of the UI component.
  • Based on the first position information of each first annotation corresponding to the second visual draft, the component identification device determines the position of an image area that can be rendered with a UI component in the second visual draft; based on the first label of each first annotation, it determines which type of UI component the image area is rendered with. Based on the image of the second visual draft with at least one first annotation, a corresponding recognition sample dataset and a classification sample dataset are determined, and the corresponding models are trained on these datasets.
  • The second visual draft is a visual draft sample, which may be a real visual draft or a visual draft generated as needed.
  • The training of the set recognition model and the set classification model based on the second visual draft corresponding to at least one first annotation includes: cropping the second visual draft based on each first annotation to obtain corresponding fourth images, each carrying a first label; and replacing the first labels of the second visual draft with a second label to obtain a fifth image;
  • the second label indicates that the corresponding image area is a UI block;
  • the set recognition model is trained based on the fifth image.
  • The component identification device crops the second visual draft to obtain the corresponding fourth images; each fourth image corresponds to a first label, from which the corresponding UI component type can be determined, and the set classification model is trained based on the fourth images.
  • The component identification device replaces the first labels of the second visual draft with the second label to obtain the fifth image; based on the second label of the fifth image, the corresponding image areas are determined to be UI blocks, and the set recognition model is trained based on the fifth image.
  • By reusing the images of the second visual draft in this way, the component identification device obtains two kinds of training samples, used to train the two kinds of models respectively, which reduces the cost of obtaining model training samples. A sketch of this reuse follows.
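A sketch of this sample reuse, assuming Labelme-style rectangle annotations (the annotation tool named later in this document); anything beyond Labelme's documented "shapes"/"points"/"label" fields is an assumption:

```python
import json
from PIL import Image

def split_samples(draft_png: str, labelme_json: str):
    """Reuse one annotated visual draft for both training datasets.

    Produces (a) classification samples: crops paired with their UI
    component type (the first labels), and (b) detection boxes in which
    every first label is replaced by the single second label "ui".
    """
    with open(labelme_json) as f:
        ann = json.load(f)
    draft = Image.open(draft_png)
    cls_samples, det_boxes = [], []
    for shape in ann["shapes"]:            # Labelme rectangle shapes
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        box = (min(xs), min(ys), max(xs), max(ys))
        cls_samples.append((draft.crop(box), shape["label"]))   # fourth image
        det_boxes.append({"bbox": list(box), "label": "ui"})    # second label
    return cls_samples, det_boxes
```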
  • In an embodiment, before the set classification model is trained based on the fourth images, the method further includes:
  • when the number of fourth images corresponding to at least one first label is less than a third set threshold, the component identification device generates a page image containing the corresponding UI component type and obtains a corresponding fourth image based on the generated page image.
  • When the number of fourth images of a UI component type is less than the third set threshold, the component identification device generates a page image corresponding to that UI component type, then uses a tool such as Puppeteer to perturb the component node attributes in the page, including text changes and element position offsets, and takes a screenshot of the processed page to obtain a fourth image.
  • the generating a page image containing a corresponding UI component type includes:
  • the third visual draft is obtained through the visual draft generation code tool, and the page image corresponding to the UI component type is obtained.
  • the component identification device can generate the web page image corresponding to the UI component type in at least one of the following ways: render the component code of the corresponding UI component type on the page to obtain the corresponding web page image; web page image for your manuscript.
  • the component identification device can generate the webpage image by calling the component code in the code library.
  • the component identification device overcomes the problem of unbalanced distribution of samples of different component types in the data set by generating training samples of the model. In this way, the classification model is trained based on the data set, which improves the accuracy of the classification model in classifying images.
  • Fig. 6 shows a schematic diagram of the implementation flow of component identification provided by the application embodiment of the present application.
  • the implementation process of component identification includes the following steps:
  • Step 701: Input structured visual draft data (Schema).
  • Common visual drafts are in Sketch or PSD format.
  • The input of this application embodiment is the structured data description obtained by parsing the visual draft, that is, Schema data.
  • The Schema data is a tree structure composed of all elements in the visual draft, stored in JSON format, where each node includes node information such as width, height, and position.
  • Step 702: Take the image of the root node as the visual draft image.
  • The root node of the Schema data contains the URL address of the full-page preview; the full-page preview is downloaded as the original visual draft for subsequent processing, as in the sketch below.
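A minimal sketch of step 702, using the url field from the Schema sketch earlier (that field name is an assumption) and the requests library as an illustrative choice:

```python
import requests

# Download the full-page preview referenced by the root node and use it
# as the original visual-draft image for subsequent processing.
response = requests.get(schema["url"], timeout=30)
response.raise_for_status()
with open("visual_draft.png", "wb") as f:
    f.write(response.content)
```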
  • Step 703: Page segmentation.
  • The component identification device divides a long visual draft into multiple images of suitable height through page segmentation algorithms such as edge detection.
  • Page segmentation divides the long visual draft into images of suitable height, using image algorithms such as edge detection to segment based on image edge features without breaking up the image areas that can be rendered with UI components; this solves the problem of recognizing long visual drafts.
  • Object detection is one of the basic tasks in the field of computer vision, comprising the two subtasks of object localization and classification: given an input image, the category and position of the target objects in the image are found.
  • Step 704: UI block recognition.
  • At least one UI block in the image is identified using the UI block recognition model (that is, the set recognition model).
  • UI block recognition uses the UI block recognition model to identify areas in the image that may be components (that is, UI blocks).
  • The UI block recognition model is an object detection network model, fine-tuned from the Mask-RCNN pre-trained model (a deep-learning object detection model) on an object detection dataset with UI block labels.
  • To produce the dataset, the Labelme tool is used to annotate the collected visual drafts, marking the position and classification of each component in the visual drafts.
  • The resulting annotations are in JSON format, recording the type and coordinates of each component.
  • Labelme is a data annotation tool that can be used to label common vision tasks such as classification, detection, and segmentation, and supports export in VOC and COCO formats.
  • The component identification device performs model training and optimization based on the object detection dataset and uses the trained UI block recognition model for UI block recognition; a fine-tuning sketch is given below.
  • The UI block recognition model is used for recognition, yielding the coordinates and the category "ui" of each UI block, as shown in FIG. 7.
  • The UI block recognition model also outputs a confidence for each identified UI block, for example "ui 0.95", that is, the confidence that the image area is a UI block is 0.95.
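As a hedged illustration of the fine-tuning setup, here is a torchvision sketch (torchvision >= 0.13 API assumed; the patent names Mask-RCNN but no framework) with two classes, background and "ui":

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Start from a pre-trained Mask R-CNN and swap in heads for 2 classes
# (background + "ui"); training then follows a standard torchvision
# detection loop over the UI-block dataset (omitted here).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes=2)

in_ch = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_ch, 256, num_classes=2)
```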
  • Step 705: Node mapping.
  • The component identification device maps the identified blocks to nodes in the visual draft by calculating the area coverage between each UI block of the image and each node of the DOM corresponding to the visual draft, and marks those nodes as UI blocks; the smaller nodes corresponding to the UI blocks in the visual draft are determined based on the area coverage.
  • The visual draft data here is the visual draft input in step 701.
  • Since the edges of the blocks recognized by the UI block recognition model are not precise, the blocks need to be mapped to nodes in the visual draft.
  • Node mapping is shown in FIG. 3 and is not repeated here. If the area coverage is greater than the first set threshold, the node is considered to correspond to the first UI block, and this node is determined and marked as the first node corresponding to the first UI block; a {'smart_ui': 'ui'} field is added to the JSON information of the node as the mark.
  • Step 706: Component classification.
  • The component identification device crops the UI block image out of the original visual draft image according to the coordinates and the width and height information of the node corresponding to the UI block, and feeds it into the UI type classification model (that is, the set classification model) for classification to obtain the type of the component.
  • Image classification is one of the basic tasks in the field of computer vision: an image is input, and the classification result of the image is output.
  • Step 707: Output the visual draft structure data with component marks.
  • The component marks are the component-type marks of the DOM nodes corresponding to the visual draft.
  • That is, the nodes in the visual draft are classified into UI components.
  • The component identification device obtains the nodes that may be blocks through the above steps and then needs to determine the UI component types corresponding to these nodes.
  • Classification is performed by training a MobileNet classification model.
  • Samples for the classification-model dataset come from real visual drafts and/or generated visual drafts.
  • For real visual drafts, the annotated object detection dataset is used: the visual draft pages are cropped according to the coordinates in the annotation information to obtain a classification dataset.
  • Each sample in the classified data set has a corresponding UI component type label.
  • the data set is expanded by generating samples.
  • After the component identification device trains the UI type classification model, it crops images from the original visual draft according to the marked UI block nodes and node information such as coordinates, width, and height, and feeds them into the UI type classification model for inference to obtain the UI component types; an inference sketch is given below.
  • the determined classification results can be mounted in the JSON file.
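An inference sketch for this step, assuming a torchvision MobileNet (the patent names MobileNet but no framework); the class list and weights file are hypothetical:

```python
import torch
import torchvision
from torchvision import transforms

UI_TYPES = ["button", "search_bar", "tab_bar", "card"]      # hypothetical classes

model = torchvision.models.mobilenet_v2(num_classes=len(UI_TYPES))
model.load_state_dict(torch.load("ui_classifier.pt"))       # hypothetical weights
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def classify_node_image(pil_image):
    """Return the predicted UI component type for a cropped node image."""
    with torch.no_grad():
        logits = model(preprocess(pil_image).unsqueeze(0))
    return UI_TYPES[int(logits.argmax(dim=1))]

# e.g. classify_node_image(crop_by_node("visual_draft.png", node))
```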
  • In some embodiments, after step 701, steps 702 and 703 may be skipped and step 704 performed directly. It is also possible to create corresponding object detection datasets for visual drafts of different lengths and train multiple UI block recognition models, each recognizing input images of a different length. That is, after the structured visual draft data is input in step 701, the page need not be segmented; instead, step 704 is executed directly, using the UI block recognition model that handles the corresponding length.
  • The component identification device annotates real visual drafts, the annotation information including the positions of components and the UI component types, and obtains an object detection dataset labelled with UI component types (the first labels).
  • By replacing the component-type labels with the single UI block label, an object detection dataset with UI block labels is obtained.
  • The object detection model is trained on this dataset to obtain the UI block recognition model (that is, the set recognition model).
  • Based on the object detection dataset labelled with UI component types, the component identification device obtains a classification dataset labelled with UI component types by cropping and transforming the object detection dataset; it judges whether the sample counts of the under-represented UI component types in the classification dataset satisfy the set condition, and generates samples for the UI component types with few samples. The classification model is then trained on the cropped and generated samples of the classification dataset to obtain the UI component type classification model (that is, the set classification model).
  • Two methods can be used to generate samples: one is to render the component code in a page; the other is to use the web page that restores the visual draft, obtained through a visual-draft-to-code tool.
  • Puppeteer is a Node library that controls Chrome or Chromium through the DevTools Protocol; based on the APIs it provides, it can simulate manual operations such as opening web pages and clicking, and can be used for screenshots, automated testing, and the like.
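The patent names Puppeteer, a Node library; to keep this document's sketches in one language, here is an analogous perturb-and-screenshot sketch using Playwright's Python API instead. The URL and the jitter script are hypothetical:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://localhost:3000/button-demo")   # page rendering one component
    # Perturb component node attributes: change text, offset positions.
    page.evaluate("""() => {
        const el = document.querySelector('.btn');
        if (el) { el.textContent = 'Buy now'; el.style.marginLeft = '7px'; }
    }""")
    page.screenshot(path="generated_sample.png")     # becomes a fourth image
    browser.close()
```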
  • Without page segmentation, the trained model can only recognize visual drafts whose aspect ratio falls within a certain range, and its recognition effect on other visual drafts is poor.
  • The component identification device divides component identification into two stages: first, UI blocks (that is, areas that may be components) are identified through object detection; then the UI blocks are classified to obtain the corresponding UI component types. This greatly reduces the amount of computation. For example, suppose a visual draft has 200 nodes and 10 components to identify: if every node in the visual draft were traversed to obtain a classification result, 200 node images would need to be processed; with the scheme of this application, after object detection, only the 10 images of the identified block nodes need to be classified.
  • The training of a deep learning network requires a large number of samples, and the quality of the samples largely determines the upper limit of the model's effect. Because different types of components are used with different frequencies, the distribution of samples across component types in the dataset is unbalanced, and the recognition and classification of component types with few samples is poor. Moreover, simulating a complete visual draft image similar to a real visual draft is relatively costly, whereas generating samples for a single UI component is more convenient and cheaper. In addition, because the images rendered by some UI components have similar outline characteristics and high visual similarity, a UI block recognition model trained on the images of such UI components can also detect the UI blocks of components that appear less frequently, which reduces the number of samples required for the object detection dataset. The specific UI component type is then judged by the classification model.
  • In this way, two models are used to process the images, and by reusing training samples, only a small number of samples is needed to train the models to complete component identification, which reduces the cost of obtaining training samples.
  • The problem of unbalanced sample distribution across component types in the dataset is also overcome; training the classification model on such a dataset improves the accuracy of the classification model in classifying images and improves the effect of recognizing the UI component types of the visual draft.
  • the embodiment of the present application also provides a component identification device, as shown in Figure 9, the device includes:
  • The first processing module 1001 is configured to input a first image into the set recognition model and obtain at least one UI block in the first image output by the set recognition model; the first image is determined based on a first visual draft; the set recognition model is configured to identify at least one UI block in an input image; each UI block includes at least an image area that can be rendered with UI components;
  • the second processing module 1002 is configured to, among the nodes of the DOM corresponding to the first visual draft, determine a first node corresponding to each UI block in the at least one UI block;
  • The third processing module 1003 is configured to input the image corresponding to each first node into the set classification model and obtain the UI component type corresponding to the first node output by the set classification model; the set classification model is configured to determine the UI component type corresponding to an input image.
  • the second processing module 1002 is further configured to:
  • the area overlapping ratio represents the overlapping ratio between the second image and the first UI block; the second image is obtained by cutting the first visual draft according to the node information of the nodes.
  • the device further includes a cropping module configured to:
  • the clipping module is further configured to:
  • the inputting of the image corresponding to each first node into the set classification model includes:
  • inputting, into the set classification model, a second image obtained by cropping the first visual draft according to the node information of the first node;
  • and/or inputting, into the set classification model, a third image obtained by cropping the first visual draft according to the UI block corresponding to the first node.
  • the device further includes a training module configured to:
  • the set recognition model and the set classification model are obtained through training;
  • the first annotation includes the first position information of an image area that can be rendered with a UI component and a corresponding first label; the first position information is configured to describe the position of the image area in the second visual draft; the first label is configured to describe the type of the UI component.
  • the training module is further configured to:
  • the second tag indicates that the corresponding image area is a UI block
  • the set recognition model is obtained based on the fifth image training.
  • the device further includes a generating module configured to:
  • the generating a page image containing a corresponding UI component type includes:
  • the third visual draft is obtained through the visual draft generation code tool, and the page image corresponding to the UI component type is obtained.
  • In practical applications, the first processing module 1001, the second processing module 1002, the third processing module 1003, the cropping module, the training module, and the generating module may be implemented by a processor in the component identification device,
  • such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Microcontroller Unit (MCU), or a Field-Programmable Gate Array (FPGA).
  • When the component identification device provided in the above embodiment performs component identification, the division into the above program modules is merely illustrative; in practical applications, the above processing may be allocated to different program modules as needed, that is, the internal structure of the device may be divided into different program modules to complete all or part of the processing described above.
  • the component identification device and the component identification method embodiments provided by the above embodiments belong to the same idea, and the specific implementation process thereof is detailed in the method embodiments, which will not be repeated here.
  • FIG. 10 is a schematic diagram of the hardware composition structure of the electronic device in the embodiment of the present application. As shown in FIG. 10, the electronic device includes:
  • a communication interface 1, capable of exchanging information with other devices such as network devices;
  • a processor 2, connected to the communication interface 1 to implement information interaction with other devices, and configured to execute the methods provided by one or more of the above technical solutions when running a computer program; the computer program is stored in the memory 3.
  • bus system 4 is used to realize connection and communication between these components.
  • In addition to a data bus, the bus system 4 also includes a power bus, a control bus, and a status signal bus.
  • For clarity, the various buses are labelled as the bus system 4 in FIG. 10.
  • the memory 3 in the embodiment of the present invention is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program used to operate on an electronic device.
  • The memory 3 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memories.
  • The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory.
  • The volatile memory may be a Random Access Memory (RAM), which serves as an external cache.
  • By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM).
  • The memory 3 described in the embodiments of the present invention is intended to include, but not be limited to, these and any other suitable types of memory.
  • Processor 2 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 2 or instructions in the form of software.
  • the aforementioned processor 2 may be a general-purpose processor, DSP, or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like.
  • the processor 2 may implement or execute the various methods, steps and logic block diagrams disclosed in the embodiments of the present invention.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the methods disclosed in the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in the storage medium, and the storage medium is located in the memory 3, and the processor 2 reads the program in the memory 3, and completes the steps of the foregoing method in combination with its hardware.
  • The embodiment of the present invention also provides a storage medium, that is, a computer storage medium, specifically a computer-readable storage medium, for example including the memory 3 storing a computer program; the above computer program can be executed by the processor 2 to complete the steps of the foregoing method.
  • the computer-readable storage medium can be memories such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disk, or CD-ROM.
  • the disclosed device, electronic equipment and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • The mutual coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed to multiple network units; Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • Each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may stand alone, or two or more units may be integrated into one unit;
  • the above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • If the above integrated units of the present application are implemented in the form of software function modules and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes: various media capable of storing program codes such as removable storage devices, ROM, RAM, magnetic disks or optical disks.
  • The term "connection" should be understood in a broad sense; for example, it may be an electrical connection or internal communication between two elements, and it may be a direct connection or an indirect connection through an intermediary. Those of ordinary skill in the art can understand the specific meanings of the above terms according to the specific situation.
  • The terms "first", "second", and so on are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that objects distinguished by "first/second/third" may be interchanged where appropriate, so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application discloses a component identification method and apparatus, an electronic device, and a storage medium. The component identification method includes: inputting a first image into a set recognition model to obtain at least one UI block in the first image output by the set recognition model, where the first image is determined based on a first visual draft, the set recognition model is used to identify at least one UI block in an input image, and each UI block includes at least an image area that can be rendered with UI components; determining, among the nodes of the Document Object Model (DOM) corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block; and inputting the image corresponding to each first node into a set classification model to obtain the UI component type corresponding to the first node output by the set classification model, where the set classification model is used to determine the UI component type corresponding to an input image.

Description

Component identification method and apparatus, electronic device, and storage medium
Cross-Reference to Related Applications
The present invention is based on, and claims priority to, the Chinese patent application with application number 202111422206.X filed on November 26, 2021, the entire content of which is hereby incorporated into the present invention by reference.
Technical Field
The present application relates to the field of computer technology, and relates to a component identification method and apparatus, an electronic device, and a storage medium.
Background
In front-end business development, in order to generate the user interface (UI) code corresponding to a visual draft from component code, the UI component types corresponding to the nodes in the corresponding Document Object Model (DOM) must be identified and annotated manually, which is inefficient.
Summary
In view of this, embodiments of the present application provide a component identification method and apparatus, an electronic device, and a storage medium, so as to at least solve the problem of low component identification efficiency in the related art.
The technical solutions of the embodiments of the present application are implemented as follows:
An embodiment of the present application provides a component identification method, the method including:
inputting a first image into a set recognition model to obtain at least one UI block in the first image output by the set recognition model, where the first image is determined based on a first visual draft, the set recognition model is used to identify at least one UI block in an input image, and each UI block includes at least an image area that can be rendered with UI components;
determining, among nodes of the Document Object Model (DOM) corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block; and
inputting the image corresponding to each first node into a set classification model to obtain the UI component type corresponding to the first node output by the set classification model, where the set classification model is used to determine the UI component type corresponding to an input image.
In the above solution, the determining, among the nodes of the Document Object Model (DOM) corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block includes:
traversing the nodes of the DOM corresponding to the first visual draft; and
when an area overlap ratio is greater than a first set threshold, determining the node corresponding to the area overlap ratio as the first node corresponding to a first UI block, where
the area overlap ratio represents the overlap ratio between a second image and the first UI block, and the second image is obtained by cropping the first visual draft according to the node information of the node.
In the above solution, before the inputting a first image into a set recognition model, the method further includes:
determining at least two first rectangular areas in the first visual draft through edge detection; and
cropping the first visual draft based on at least one first rectangular area to obtain the corresponding first image.
In the above solution, the cropping the first visual draft based on at least one first rectangular area includes:
merging first rectangular areas adjacent in a set direction to obtain a second rectangular area, where the length of the second rectangular area in the set direction is less than a second set threshold; and
cropping the first visual draft based on the second rectangular area.
In the above solution, the inputting the image corresponding to each first node into a set classification model includes:
inputting, into the set classification model, a second image obtained by cropping the first visual draft according to the node information of the first node;
and/or,
inputting, into the set classification model, a third image obtained by cropping the first visual draft according to the UI block corresponding to the first node.
In the above solution, before the inputting a first image into a set recognition model, the method further includes:
training the set recognition model and the set classification model based on a second visual draft corresponding to at least one first annotation, where
the first annotation includes first position information of an image area that can be rendered with a UI component and a corresponding first label; the first position information is used to describe the position of the image area in the second visual draft; and the first label is used to describe the type of the UI component.
In the above solution, the training the set recognition model and the set classification model based on the second visual draft corresponding to at least one first annotation includes:
cropping the second visual draft based on each first annotation to obtain a corresponding fourth image, where the fourth image corresponds to a first label;
replacing the first labels of the second visual draft with a second label to obtain a fifth image, where the second label indicates that the corresponding image area is a UI block;
training the set classification model based on the fourth images; and
training the set recognition model based on the fifth image.
In the above solution, before the training the set classification model based on the fourth images, the method further includes:
when the number of fourth images corresponding to at least one first label is less than a third set threshold, generating a page image containing the corresponding UI component type, and obtaining a corresponding fourth image based on the generated page image.
In the above solution, the generating a page image containing the corresponding UI component type includes:
rendering component code of the corresponding UI component type on a set interface to obtain a page image of the corresponding UI component type;
and/or,
obtaining a third visual draft through a visual-draft-to-code tool to obtain a page image of the corresponding UI component type.
An embodiment of the present application further provides a component identification apparatus, including:
a first processing module configured to input a first image into a set recognition model to obtain at least one UI block in the first image output by the set recognition model, where the first image is determined based on a first visual draft, the set recognition model is configured to identify at least one UI block in an input image, and each UI block includes at least an image area that can be rendered with UI components;
a second processing module configured to determine, among nodes of the DOM corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block; and
a third processing module configured to input the image corresponding to each first node into a set classification model to obtain the UI component type corresponding to the first node output by the set classification model, where the set classification model is configured to determine the UI component type corresponding to an input image.
An embodiment of the present application further provides an electronic device, including a processor and a memory for storing a computer program capable of running on the processor,
where the processor is configured to execute the steps of the above component identification method when running the computer program.
An embodiment of the present application further provides a storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above component identification method.
In the component identification method and apparatus, electronic device, and storage medium provided by the embodiments of the present application, a first image is input into a set recognition model, and at least one UI block in the first image output by the set recognition model is obtained; the first image is determined based on a first visual draft; the set recognition model is used to identify at least one UI block in an input image; each UI block includes at least an image area that can be rendered with UI components; among the nodes of the Document Object Model (DOM) corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block is determined; the image corresponding to each first node is input into a set classification model, and the UI component type corresponding to the first node output by the set classification model is obtained; the set classification model is used to determine the UI component type corresponding to an input image. In the above solution, the image areas of the first visual draft that can be rendered with UI components are determined through the set recognition model, the first nodes of these image areas in the DOM corresponding to the first visual draft are determined, and the UI component types corresponding to the first nodes are obtained through the set classification model. In this way, there is no need to manually annotate the UI component types of the nodes in the DOM corresponding to the visual draft, which improves the efficiency of component identification.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an implementation of the component identification method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of identifying a UI block in an image provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an implementation of determining a node provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a UI block recognition result provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of an implementation of page segmentation provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of an implementation of component identification provided by an application embodiment of the present application;
FIG. 7 is a schematic diagram of a component identification page provided by an application embodiment of the present application;
FIG. 8 is an implementation flowchart of model training provided by an application embodiment of the present application;
FIG. 9 is a schematic structural diagram of a component identification apparatus provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
With the emergence of low-code and no-code platforms, the conversion from a visual draft to application front-end code can be achieved with little or even no coding. The process of generating UI code from a visual draft includes inputting structured visual draft data and obtaining the generated UI code through steps such as layer-information parsing, layout, component replacement, semanticization, and code generation. Component replacement is a key step in generating UI code from a visual draft: commonly used visual modules are developed as components, the parts of the visual draft that match the set visual modules are determined during code generation, and those parts are replaced with the code of the corresponding components. The premise of component replacement is to identify the parts of the visual draft that can be replaced with components.
At present, in front-end business development, in order to generate the UI code corresponding to a visual draft from component code, the UI component types corresponding to the nodes in the corresponding DOM must be identified and annotated manually, which is inefficient.
Based on this, in the various embodiments of the present application, a first image is input into a set recognition model, and at least one UI block in the first image output by the set recognition model is obtained; the first image is determined based on a first visual draft; the set recognition model is used to identify at least one UI block in an input image; each UI block includes at least an image area that can be rendered with UI components; among the nodes of the DOM corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block is determined; the image corresponding to each first node is input into a set classification model, and the UI component type corresponding to the first node output by the set classification model is obtained; the set classification model is used to determine the UI component type corresponding to an input image. In the above solution, the image areas of the first visual draft that can be rendered with UI components are determined through the set recognition model, the first nodes of these image areas in the DOM corresponding to the first visual draft are determined, and the UI component types corresponding to the first nodes are obtained through the set classification model. In this way, there is no need to manually annotate the UI component types of the nodes in the DOM corresponding to the visual draft, which improves the efficiency of component identification.
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
FIG. 1 is a schematic flowchart of an implementation of the component identification method provided by an embodiment of the present application. The embodiment of the present application provides a component identification method applied to an electronic device, where electronic devices include but are not limited to servers and terminals. As shown in FIG. 1, the component identification method includes:
Step 101: Input a first image into a set recognition model, and obtain at least one UI block in the first image output by the set recognition model.
The first image is determined based on a first visual draft; the set recognition model is used to identify at least one UI block in an input image; each UI block includes at least an image area that can be rendered with UI components.
The component identification device inputs the first image into the set recognition model, and the set recognition model identifies at least one UI block in the input first image; each UI block includes at least an image area that can be rendered with UI components. The set recognition model is obtained by training on visual drafts annotated with UI blocks. The first image is determined based on the first visual draft and may be part or all of the image of the first visual draft. Each UI block is an image area and can be regarded as an image.
Step 102: Among the nodes of the DOM corresponding to the first visual draft, determine a first node corresponding to each UI block in the at least one UI block.
Based on each UI block in the at least one UI block determined by the set recognition model, the component identification device determines, among the nodes corresponding to the first visual draft, one first node corresponding to each UI block. When a node and a UI block satisfy a set condition, that node is determined as the first node of the UI block.
The nodes of the DOM are the nodes of the structured data (that is, Schema data) obtained by parsing the visual draft; the Schema data is a tree structure composed of all elements in the visual draft and can be stored in JavaScript Object Notation (JSON) format, where each node includes node information such as the width, height, and position of the corresponding image.
The image of the first visual draft can be obtained through the node information of the root node in the DOM corresponding to the visual draft; the node information of the root node includes the Uniform Resource Locator (URL) address of the complete page preview.
Step 103: Input the image corresponding to each first node into the set classification model, and obtain the UI component type corresponding to the first node output by the set classification model.
The set classification model is used to determine the UI component type corresponding to an input image.
The component identification device inputs the image corresponding to each first node into the set classification model; the set classification model classifies the UI component type of the input image and determines the UI component type corresponding to the image of the first node, thereby obtaining the UI component type corresponding to the first node. The set classification model is obtained by training on images annotated with UI component types. The determined classification results can be mounted in the JSON file.
In the solution provided by the embodiment of the present application, the first nodes corresponding to the image areas of the first visual draft that can be rendered with UI components are determined, and the UI component types corresponding to the first nodes are obtained through the set classification model. In this way, there is no need to manually annotate the UI component types of the nodes in the DOM corresponding to the visual draft, which improves the efficiency of component identification.
At the same time, after the component identification device determines the UI component type corresponding to the first node, when the visual draft is converted into UI code, the UI code is generated based on the code of the corresponding type of UI component, which simplifies the programming work from visual draft to UI code; using the code of the corresponding component enables automated component replacement, lowering the barrier to UI code development.
Furthermore, the component identification device determines the image areas of the first visual draft that can be rendered with UI components through the set recognition model, determines the corresponding first nodes, and inputs the images of these first nodes into the set classification model; the set classification model therefore does not need to process the images of all the nodes of the first visual draft, which reduces the computation of the model and saves the computing resources required for component identification.
In addition, in the embodiment of the present application, when the set recognition model identifies a UI block, the area of the identified UI block is not required to be exactly the same as the image area rendered with the UI component; this relaxes the accuracy requirement and improves the robustness of the set recognition model. As shown in FIG. 2, the first image includes a plurality of UI blocks (21-23); UI block 23 is not exactly the same as the image area 231 rendered with the UI component, but this does not prevent the set classification model from determining the UI component type corresponding to the node.
In an embodiment, the inputting the image corresponding to each first node into the setting classification model includes:
inputting, into the setting classification model, a second image obtained by cropping the first visual draft according to the node information of the first node;
and/or,
inputting, into the setting classification model, a third image obtained by cropping the first visual draft according to the UI block corresponding to the first node.
The image corresponding to each first node that the component identification apparatus inputs into the setting classification model may be the second image obtained by cropping the first visual draft according to the node information of the first node, may be the third image obtained by cropping the first visual draft based on the UI block corresponding to the first node, or may be both the second image and the third image corresponding to the first node. Since the edges of an identified UI block are not necessarily precise, classifying the UI component type with the second image cropped using the node information can improve the precision of the classification result.
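As a minimal sketch of this cropping step, assuming node information carries an absolute position and size in page-pixel coordinates (the key names are illustrative, not specified by the text), the second image can be cut from the full-page preview with Pillow:

```python
from PIL import Image

def crop_node_image(draft_path: str, node: dict) -> Image.Image:
    """Crop the second image for one first node.

    `node` is assumed to look like {"x": .., "y": .., "width": .., "height": ..};
    real schema key names may differ.
    """
    draft = Image.open(draft_path)
    left, top = node["x"], node["y"]
    right, bottom = left + node["width"], top + node["height"]
    return draft.crop((left, top, right, bottom))
```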
In a case where a node and a certain UI block satisfy a set condition, the node is determined as the first node of the UI block. In an embodiment, the set condition is that the area overlap ratio between the image corresponding to the node and the UI block is greater than a set threshold.
The determining, among the nodes of the Document Object Model DOM corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block includes:
traversing the nodes of the Document Object Model DOM corresponding to the first visual draft;
in a case where the area overlap ratio is greater than a first set threshold, determining the corresponding node as the first node corresponding to the first UI block; wherein,
the area overlap ratio represents the overlap ratio between a second image and the first UI block; the second image is obtained by cropping the first visual draft according to the node information of the node.
For each UI block, when determining a corresponding first node among the nodes of the DOM corresponding to the first visual draft, the component identification apparatus traverses the nodes in the DOM corresponding to the first visual draft and calculates the area overlap ratio between the second image corresponding to a node and this UI block. In a case where the area overlap ratio is greater than the first set threshold, the corresponding node is determined as the first node corresponding to the first UI block. After the first node corresponding to the first UI block is determined, the traversal of the remaining nodes in the DOM of the first visual draft can be stopped, and the determination of the first node corresponding to the next first UI block can be started anew. The area overlap ratio represents the degree of image overlap between the UI block and the second image corresponding to the node; the higher the area overlap ratio, the higher the proportion of overlap between the two images.
The component identification apparatus may traverse the nodes in the DOM corresponding to the first visual draft in pre-order, in-order, or post-order. With reference to the schematic flowchart of determining a node shown in FIG. 3, pre-order traversal with a first set threshold of 0.7 is taken as an example. For each first UI block, the structured data (i.e., Schema data) of the visual draft is traversed in pre-order.
First, the component identification apparatus takes the root node as the current node and calculates the area overlap ratio between the second image corresponding to the current node and the first UI block.
If the area overlap ratio is not greater than 0, the two do not intersect, and neither this node nor its child nodes can correspond to this first UI block. It is judged whether this node has untraversed sibling nodes; if so, a sibling node is taken as the current node and the area overlap ratio is calculated.
If the area overlap ratio is greater than 0 and less than or equal to 0.7, it is judged whether this node has untraversed child nodes. If so, the first untraversed child node is taken as the current node and the area overlap ratio is calculated; if not, it is judged whether this node has untraversed sibling nodes, and if so, a sibling node is taken as the current node and the area overlap ratio is calculated.
If the area overlap ratio is greater than 0.7, the node is considered to correspond to the first UI block, and this node is determined and marked as the first node corresponding to the first UI block. A {'smart_ui': 'ui'} field is added to the node's JSON information as a mark.
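The following Python sketch mirrors the pre-order search of FIG. 3. The overlap formula is not fixed by the text, so intersection area divided by the UI block's area is assumed here, and the node key names are illustrative:

```python
def overlap_ratio(node_box, block_box):
    """Intersection area divided by the UI block's area.

    Boxes are (left, top, right, bottom). The exact formula is not fixed
    by the text; this is one plausible choice.
    """
    l = max(node_box[0], block_box[0])
    t = max(node_box[1], block_box[1])
    r = min(node_box[2], block_box[2])
    b = min(node_box[3], block_box[3])
    if r <= l or b <= t:
        return 0.0
    inter = (r - l) * (b - t)
    block_area = (block_box[2] - block_box[0]) * (block_box[3] - block_box[1])
    return inter / block_area

def find_first_node(node, block_box, threshold=0.7):
    """Pre-order search mirroring FIG. 3: prune subtrees that do not
    intersect the block; descend while the overlap is positive but at or
    below the threshold; mark and return the first node above it."""
    box = (node["x"], node["y"],
           node["x"] + node["width"], node["y"] + node["height"])
    ratio = overlap_ratio(box, block_box)
    if ratio <= 0:
        return None                 # this node and its children cannot match
    if ratio > threshold:
        node["smart_ui"] = "ui"     # mark the matched node
        return node
    for child in node.get("children", []):
        hit = find_first_node(child, block_box, threshold)
        if hit is not None:
            return hit
    return None
```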
As mentioned above, when the setting recognition model identifies UI blocks, the area of an identified UI block is not required to be exactly the same as the image area rendered with UI components. That is, as shown in the schematic diagram of UI block identification results in FIG. 4, the edges of the identified UI blocks are not completely precise. In this embodiment, by determining the first node corresponding to a UI block, the image corresponding to the first node can further be obtained, which improves the precision of the classification results of the setting classification model.
In an embodiment, before the inputting the first image into the setting recognition model, the method further includes:
determining at least two first rectangular areas in the first visual draft through edge detection;
cropping the first visual draft based on at least one first rectangular area to obtain a corresponding first image.
The component identification apparatus determines image edge features through image algorithms such as edge detection, identifies at least two first rectangular areas of the first visual draft based on edge detection, divides the at least two first rectangular areas, and crops the first visual draft based on at least one first rectangular area each time to obtain a corresponding first image.
The component identification apparatus identifies block elements (i.e., first rectangular areas) in the image based on edge detection. The element detection module of UI2CODE, an open-source visual-draft-to-code algorithm, can be used, which does not break the images of nodes with complete outlines.
In this way, without breaking the images of nodes with complete outlines, the first visual draft is divided into at least two images whose aspect ratios fall within the range where the setting recognition model performs best, which improves the precision of identifying UI blocks in long visual drafts and improves the recognition effect of the setting recognition model on images.
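The text points to UI2CODE's element detection module for this step; purely as a stand-in illustration, a generic Canny-plus-contours pass in OpenCV sketches the idea of finding block elements from image edges (the thresholds are illustrative):

```python
import cv2

def detect_block_rects(image_path: str, min_area: int = 1000):
    """Find candidate block elements (first rectangular areas) by edges.

    A generic Canny + contour sketch standing in for UI2CODE's element
    detection module; thresholds and the minimum area are illustrative.
    """
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    rects = [cv2.boundingRect(c) for c in contours]        # (x, y, w, h)
    rects = [r for r in rects if r[2] * r[3] >= min_area]  # drop tiny noise
    rects.sort(key=lambda r: r[1])                         # sort top to bottom
    return rects
```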
In an embodiment, when cropping the first visual draft based on at least one first rectangular area, the method includes:
merging first rectangular areas adjacent in a set direction to obtain a second rectangular area, where the length of the second rectangular area in the set direction is less than a second set threshold;
cropping the first visual draft based on the second rectangular area.
The set direction may be the vertical direction, that is, the y-axis direction.
The component identification apparatus merges at least two first rectangular areas whose total length in the set direction is less than the second set threshold to obtain a second rectangular area, and crops the first visual draft based on the second rectangular area.
The component identification apparatus may judge in turn, in the set direction, whether the total length of two adjacent first rectangular areas is less than the second set threshold. If so, the two first rectangular areas are merged, and the total length of the merged rectangular area and the next adjacent first rectangular area in the set direction continues to be judged, until the total length of the two rectangular areas in the set direction is greater than or equal to the second set threshold; the first visual draft is then cropped based on the merged rectangular area.
The segmentation of a visual draft is described with reference to the schematic flowchart of page segmentation shown in FIG. 5 and includes the following steps (a code sketch of the grouping follows the list):
a) Identify block elements (i.e., first rectangular areas) in the image based on edge detection. The element detection module of the open-source visual-draft-to-code algorithm UI2CODE can be used, with the parameter threshold of the element detection module set to a suitable size; multiple first rectangular areas are obtained after detection. Since the element detection module segments based on image edge information, it does not break the images of nodes with complete outlines.
b) Sort the identified first rectangular areas by y-axis coordinate, that is, by distance from the top of the visual draft.
To avoid losing the information of undetected areas, rectangular areas can be inserted between the first rectangular areas as filling. For example, elements 61 and 62 in FIG. 5 are not detected; rectangular areas 63 and 64 are inserted to ensure that the corresponding information is not lost.
c) Group the first rectangular areas. Merge the first rectangular areas from top to bottom: if the merged height is less than 400, continue merging downward; if the merged height is greater than 600, the currently merged second rectangular area forms a new group. Then take the next first rectangular area and start grouping anew. The height here is the length in the set vertical direction.
d) Crop the first visual draft based on the second rectangular areas to obtain first images of suitable height.
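A Python sketch of the grouping in step c), using the ~400/600 px thresholds from the example above; the text leaves the handling of heights between the two thresholds open, so one reading (close a band once it reaches the target range) is assumed:

```python
def group_rects(rects, target_min=400):
    """Group y-sorted rectangles (x, y, w, h) into bands of suitable height.

    Mirrors step c) of FIG. 5: keep merging downward while a band is
    shorter than ~400 px and close it once it reaches the 400-600 px
    target range; the explicit >600 case in the text is subsumed here.
    """
    groups, current = [], []
    for rect in rects:
        current.append(rect)
        top = min(r[1] for r in current)
        bottom = max(r[1] + r[3] for r in current)
        if bottom - top < target_min:
            continue                 # band still short: keep merging downward
        groups.append(current)       # band in the target range: close it
        current = []
    if current:                      # trailing rectangles form the last band
        groups.append(current)
    return groups
```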
Since the setting recognition model is trained with visual drafts annotated with UI blocks, and considering training cost and other factors, the number of training samples cannot be unlimited. For some images, for example images whose aspect ratio exceeds that of the training samples, the recognition effect of the setting recognition model is poor. In this embodiment, without breaking the image areas that can be rendered with UI components, the first visual draft is divided into at least two images whose aspect ratios fall within the range where the setting recognition model performs best, which improves the precision of identifying UI blocks in long visual drafts and improves the recognition effect of the setting recognition model on images.
Every time a new task scenario is handled, model training needs to be carried out for that task scenario. In the component identification scenario, the setting recognition model and the setting classification model need to be trained before use. In an embodiment, before the inputting the first image into the setting recognition model, the method further includes:
training the setting recognition model and the setting classification model based on a second visual draft corresponding to at least one first annotation; wherein,
the first annotation includes first position information of an image area that can be rendered with UI components and a corresponding first label; the first position information is used to describe the position of the image area in the second visual draft; the first label is used to describe the type of the UI component.
Based on the first position information of each first annotation corresponding to the second visual draft, the component identification apparatus determines the position in the second visual draft of an image area that can be rendered with UI components. Based on the first label of each first annotation corresponding to the second visual draft, it determines with which type of UI component this image area is rendered. Based on the image of the second visual draft corresponding to at least one first annotation, the corresponding recognition sample dataset and classification sample dataset are determined, and the corresponding models are trained based on the corresponding datasets. The second visual draft is a visual draft sample, which may be a real visual draft or a visual draft generated as needed.
In an embodiment, the training the setting recognition model and the setting classification model based on the second visual draft corresponding to at least one first annotation includes:
cropping the second visual draft based on each first annotation to obtain a corresponding fourth image; the fourth image corresponds to a first label;
replacing the first label of the second visual draft with a second label to obtain a fifth image; the second label indicates that the corresponding image area is a UI block;
training the setting classification model based on the fourth image;
training the setting recognition model based on the fifth image.
The component identification apparatus crops the second visual draft based on the first position information of each first annotation to obtain a corresponding fourth image. The fourth image corresponds to the first label of that first annotation; based on the first label, the UI component type of the corresponding fourth image can be determined, and the setting classification model is trained based on the fourth images.
The component identification apparatus replaces the first labels of the second visual draft with second labels to obtain a fifth image. Based on the second labels of the fifth image, the corresponding image areas can be determined to be UI blocks, and the setting recognition model is trained based on the fifth images.
In this embodiment, by reusing the image of the second visual draft, the component identification apparatus can obtain two kinds of training samples, which are respectively used to train the corresponding types of models, thereby reducing the cost of obtaining model training samples.
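A minimal sketch of this reuse, assuming each first annotation carries a bounding box and a component-type label under hypothetical key names:

```python
from PIL import Image

def build_datasets(draft_path: str, annotations: list):
    """Derive both training sets from one annotated second visual draft.

    Each annotation is assumed to look like
    {"bbox": (left, top, right, bottom), "label": "button"}; the key names
    are illustrative. Returns classification samples (fourth images, which
    keep the component-type label) and detection annotations (fifth image:
    the same boxes with every label unified to 'ui').
    """
    draft = Image.open(draft_path)
    classification_samples = [
        (draft.crop(a["bbox"]), a["label"]) for a in annotations
    ]
    detection_annotations = [
        {"bbox": a["bbox"], "label": "ui"} for a in annotations
    ]
    return classification_samples, detection_annotations
```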
In an embodiment, before the training the setting classification model based on the fourth images, the method further includes:
in a case where the number of fourth images corresponding to at least one kind of first label is less than a third set threshold, the component identification apparatus generates a page image containing the corresponding UI component type, and obtains the corresponding fourth image based on the generated page image.
In a case where the number of fourth images of a UI component type is less than the third set threshold, the component identification apparatus generates a page image of the corresponding UI component type, then uses a tool such as Puppeteer to jitter the attributes of the component nodes in the page, including text changes and element position offsets, and takes a screenshot of the processed page to obtain fourth images.
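As an illustrative sketch of this jitter-and-screenshot step: the text names Puppeteer, but to keep all examples in one language, Playwright's Python API is used here as a stand-in; the selector and jitter ranges are assumptions:

```python
import random
from playwright.sync_api import sync_playwright

def jitter_and_screenshot(url: str, out_path: str):
    """Jitter component attributes on a rendered page and screenshot it.

    The source drives this step with Puppeteer; Playwright is used here
    only so that the sketch stays in Python. The '.component' selector and
    offset ranges are illustrative.
    """
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        # Randomly shift every element tagged as a component and vary text.
        page.evaluate(
            """([dx, dy]) => {
                document.querySelectorAll('.component').forEach(el => {
                    el.style.transform = `translate(${dx}px, ${dy}px)`;
                    if (el.innerText) el.innerText += ' ' + dx;
                });
            }""",
            [random.randint(-8, 8), random.randint(-4, 4)],
        )
        page.screenshot(path=out_path, full_page=True)
        browser.close()
```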
In an embodiment, the generating a page image containing the corresponding UI component type includes:
rendering component code of the corresponding UI component type on a setting interface to obtain a page image of the corresponding UI component type;
and/or,
obtaining a third visual draft through a visual-draft-to-code tool to obtain a page image of the corresponding UI component type.
The component identification apparatus may generate web page images of the corresponding UI component type in at least one of the following ways: rendering the component code of the corresponding UI component type in a page to obtain the corresponding web page image; obtaining a web page image restoring a visual draft through a visual-draft-to-code tool.
The component identification apparatus may generate web page images by calling component code in a code library.
By generating training samples for the model, the component identification apparatus overcomes the problem of unbalanced sample distribution across component types in the dataset. In this way, training the classification model based on the dataset improves the precision of the classification model in classifying images.
The present application is described in further detail below with reference to application embodiments.
FIG. 6 shows a schematic flowchart of an implementation of component identification provided by an application embodiment of the present application.
The implementation flow of component identification includes the following steps:
Step 701: Input the structured visual draft data (Schema).
Common visual drafts are in sketch or psd format. The input of this application embodiment is the structured data description obtained by parsing the visual draft, i.e., the Schema data. The Schema data is a tree structure composed of all elements in the visual draft, stored in JSON format, where each node includes node information such as width, height, and position.
Step 702: Take the image of the root node as the visual draft image.
The root node of the Schema data contains the URL address of the complete page preview image; the complete page preview image is downloaded as the original visual draft image for subsequent processing.
Step 703: Page segmentation.
The component identification apparatus divides the long visual draft into multiple images of suitable height through page segmentation algorithms such as edge detection.
For visual drafts with a large aspect ratio, a general object detection model cannot achieve a good recognition effect, so the long visual draft needs to be cut into images of suitable height. The flow of page segmentation is shown in FIG. 5.
Page segmentation divides the long visual draft into images of suitable height, and uses image algorithms such as edge detection to segment based on image edge features without breaking the image areas that can be rendered with UI components, which solves the recognition problem of long visual drafts.
Object detection is one of the basic tasks in the field of computer vision and includes the two subtasks of object localization and classification. Given an input image, the categories and positions of the target objects in the image can be found.
Step 704: UI block identification.
The UI block recognition model (i.e., the setting recognition model) is used to identify at least one UI block in the image.
UI block identification uses the UI block recognition model to identify the areas in the image that may be components (i.e., UI blocks). The UI block recognition model is an object detection network model, trained on an object detection dataset with UI block labels from a Mask-RCNN pretrained model, a deep-learning object detection model.
The dataset is produced with the Labelme tool: the collected visual drafts are annotated with the position and classification of each component. The annotation results are in JSON format, recording the types and coordinates of the components, and are backed up. The types are then uniformly replaced with "ui" by a script, and the results are exported as an object detection dataset in COCO format. Labelme is a data annotation tool that can be used to annotate common visual tasks such as classification, detection, and segmentation, and supports export in formats such as VOC and COCO.
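A sketch of the label-unification script, relying on Labelme's standard per-image JSON layout with a "shapes" list (the directory layout is illustrative):

```python
import json
from pathlib import Path

def unify_labelme_labels(src_dir: str, dst_dir: str):
    """Replace every component-type label in Labelme JSON files with 'ui'.

    Labelme stores one JSON per image with a 'shapes' list whose items
    carry a 'label' field; only that field is touched here.
    """
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for src in Path(src_dir).glob("*.json"):
        data = json.loads(src.read_text(encoding="utf-8"))
        for shape in data.get("shapes", []):
            shape["label"] = "ui"   # unify all types into one UI-block label
        (Path(dst_dir) / src.name).write_text(
            json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
        )
```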
The component identification apparatus performs model training and optimization based on the object detection dataset and uses the trained UI block recognition model for UI block identification. The images segmented in step 703 are recognized by the UI block recognition model to obtain the coordinates of the UI blocks and the category "ui", as shown in the schematic page diagram of component identification in FIG. 7. The UI block recognition model also outputs a confidence for each identified UI block; for example, "ui 0.95" means that the confidence that the image area is a UI block is 0.95.
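An inference sketch for this step: the text names only Mask-RCNN, so detectron2 is assumed here as one common implementation, with the fine-tuned single-class ("ui") checkpoint supplied by the caller:

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

def detect_ui_blocks(image_path: str, weights_path: str):
    """Run a trained Mask-RCNN detector over one segmented page image.

    detectron2 is an assumption (the source does not name a framework);
    `weights_path` points at the fine-tuned checkpoint.
    """
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(
        "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = weights_path
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1        # single class: "ui"
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
    cfg.MODEL.DEVICE = "cpu"                   # drop when a GPU is available
    predictor = DefaultPredictor(cfg)

    instances = predictor(cv2.imread(image_path))["instances"]
    boxes = instances.pred_boxes.tensor.tolist()   # (l, t, r, b) per block
    scores = instances.scores.tolist()             # e.g. the 0.95 in "ui 0.95"
    return list(zip(boxes, scores))
```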
Step 705: Node mapping.
By calculating the area overlap ratio between the UI blocks of the image and the nodes of the DOM corresponding to the visual draft, the component identification apparatus maps the identified blocks to nodes in the visual draft and marks those nodes as UI blocks. According to the area overlap ratio, a smaller node corresponding to the UI block is determined in the visual draft. The visual draft data here is the visual draft input in step 701.
As shown in FIG. 4, the edges of the blocks recognized by the UI block recognition model are not precise, so the blocks need to be mapped to nodes in the visual draft.
For the implementation flow of node mapping, see FIG. 3; it is not repeated here. In a case where the area overlap ratio is greater than the first set threshold, the node is considered to correspond to the first UI block, and the node is determined and marked as the first node corresponding to the first UI block. A {'smart_ui': 'ui'} field is added to the node's JSON information as a mark.
Step 706: Component classification.
According to the coordinate, width, and height information of the node corresponding to a UI block, the component identification apparatus crops the UI block image from the original visual draft image and sends it to the UI type classification model (i.e., the setting classification model) for classification to obtain the component type.
Image classification is one of the basic tasks in the field of computer vision: given an input image, the classification result of the image is output.
Step 707: Output the visual draft structure data with component marks.
A component mark is the component type mark of a DOM node corresponding to the visual draft.
In another application embodiment, UI component classification can also be performed on each node in the visual draft.
Through the above steps, the component identification apparatus obtains the nodes that may be blocks; the UI component types corresponding to these nodes still need to be determined, and classification is performed by training a MobileNet classification model. The sample sources of the classification model dataset include real visual drafts and/or generated visual drafts. For real visual drafts, the annotated object detection dataset is used: the visual draft pages are cropped according to the coordinates in the annotation information to obtain a classification dataset, where each sample carries a corresponding UI component type label. For UI component types with too little data, the dataset is expanded by generating samples.
After training the UI type classification model, the component identification apparatus crops images from the original visual draft image according to the marked UI block nodes and node information such as coordinates, width, and height, and sends them to the UI type classification model for inference to obtain the types of the UI components.
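An inference sketch for this step, assuming torchvision's MobileNetV2 with its final layer re-trained for the UI component types; the type list and preprocessing are illustrative and would have to match whatever was used in training:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Illustrative component types; the real list is fixed at training time.
UI_TYPES = ["button", "search_bar", "tab_bar", "banner", "list_item"]

def classify_node_image(crop: Image.Image, weights_path: str) -> str:
    """Infer the UI component type of one cropped first-node image.

    torchvision's MobileNetV2 stands in for the MobileNet named in the
    text; normalization is omitted for brevity.
    """
    model = models.mobilenet_v2()
    model.classifier[1] = torch.nn.Linear(model.last_channel, len(UI_TYPES))
    model.load_state_dict(torch.load(weights_path, map_location="cpu"))
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    with torch.no_grad():
        logits = model(preprocess(crop.convert("RGB")).unsqueeze(0))
    return UI_TYPES[int(logits.argmax(dim=1))]
```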
The determined classification result can be mounted in a JSON file.
In addition, in an application embodiment, after step 701, steps 702 and 703 can be skipped and step 704 executed directly. Corresponding object detection datasets can also be produced for visual drafts of different lengths to train multiple UI block recognition models, each of which recognizes input images of a corresponding length. That is, after the structured visual draft data is input in step 701, the page does not need to be segmented; instead, step 704 is executed directly, using the UI block recognition model for the corresponding length for image recognition.
Here, with reference to the flowchart of model training shown in FIG. 8, how to generate the dataset for model training and how to train the models are further described.
The component identification apparatus annotates real visual drafts; the annotation information includes component positions and UI component types, yielding an object detection dataset with UI component type labels (first labels). By changing the UI component type labels to a unified UI block label (second label) 'ui', an object detection dataset with UI block labels is obtained. An object detection model is trained based on the object detection dataset to obtain the UI block recognition model (i.e., the setting recognition model).
Based on the obtained object detection dataset with UI component type labels, the component identification apparatus obtains a classification dataset with UI component type labels by cropping and converting the object detection dataset, judges whether the number of samples of the UI component types with less data in the classification dataset satisfies a set condition, and generates samples for those UI component types. Based on the cropped and generated samples of the classification dataset, a classification model is trained to obtain the UI component type classification model (i.e., the setting classification model).
The flow of generating samples is as follows:
1) Render a page with components.
Two methods can be used: one is to render component code in a page, and the other is to use a web page restoring a visual draft obtained through a visual-draft-to-code tool;
2) Use the Puppeteer tool to jitter the attributes of the component nodes in the page, including text changes and element position offsets, and take screenshots to obtain samples.
Puppeteer is a Node library that controls Chrome or Chromium through the DevTools Protocol. Based on the series of APIs it provides, it can simulate manual operations such as opening web pages and clicking, and can be used for screenshots, automated testing, and so on.
In contrast, identifying the positions and types of UI components in an image directly with a deep-learning object detection model requires preparing and annotating a large number of samples of each type. Since components appear in real visual drafts with different probabilities, the numbers of samples of different categories are very unbalanced, making training samples difficult to prepare. Meanwhile, the trained model can only recognize visual drafts with aspect ratios within a certain range, and its recognition effect is poor.
In addition, in the related art, it is also possible to screenshot every node of the visual draft and classify the screenshots with a trained classification model; since every node needs to be traversed for classification, the model computation is huge and the computing cost is high.
In this application embodiment, the component identification apparatus divides component identification into two stages: UI blocks, i.e., areas that may be components, are first identified through object detection; the UI blocks are then classified to obtain the corresponding UI component types, which greatly reduces the computation. For example, if a visual draft has 200 nodes and 10 components to be identified, traversing every node in the visual draft for classification requires processing 200 node images; with the solution of the present application, only the 10 images identified as containing block nodes need to be classified after object detection.
Training a deep learning network requires a large number of samples, and sample quality largely determines the upper limit of model performance. Since different types of components are used with different frequencies, the sample distribution of different component types in the dataset is unbalanced, and the recognition and classification effects for component types with few samples are poor. Moreover, simulating complete visual draft image samples similar to real visual drafts is costly, while expanding the samples of individual UI components is more convenient and cheaper. In addition, since the images rendered from some UI components have similar outline characteristics and high visual similarity, the UI block recognition model trained on the images corresponding to these UI components can also detect the UI blocks of UI components that appear less frequently, which reduces the number of samples required for the object detection dataset; the classification model then specifically judges the UI component type. In this application embodiment, by processing images with two models and reusing training samples, component identification can be completed by training the models with only a small number of samples, which reduces the cost of obtaining training samples. At the same time, by generating training samples, the problem of unbalanced sample distribution across component types in the dataset is overcome; in this way, training the classification model based on the dataset improves the precision of the classification model in classifying images and improves the effect of UI component type identification for visual drafts.
To implement the method of the embodiments of the present application, an embodiment of the present application further provides a component identification apparatus. As shown in FIG. 9, the apparatus includes:
a first processing module 1001 configured to input a first image into a setting recognition model to obtain at least one UI block in the first image output by the setting recognition model; the first image is determined based on a first visual draft; the setting recognition model is configured to identify at least one UI block in the input image; each UI block includes at least an image area that can be rendered with UI components;
a second processing module 1002 configured to determine, among the nodes of the DOM corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block;
a third processing module 1003 configured to input the image corresponding to each first node into a setting classification model to obtain the UI component type corresponding to the first node output by the setting classification model; the setting classification model is configured to determine the UI component type corresponding to the input image.
In an embodiment, the second processing module 1002 is further configured to:
traverse the nodes of the Document Object Model DOM corresponding to the first visual draft;
in a case where the area overlap ratio is greater than a first set threshold, determine the node corresponding to the area overlap ratio as the first node corresponding to the first UI block; wherein,
the area overlap ratio represents the overlap ratio between a second image and the first UI block; the second image is obtained by cropping the first visual draft according to the node information of the node.
In an embodiment, the apparatus further includes a cropping module configured to:
determine at least two first rectangular areas in the first visual draft through edge detection;
crop the first visual draft based on at least one first rectangular area to obtain a corresponding first image.
In an embodiment, the cropping module is further configured to:
merge first rectangular areas adjacent in a set direction to obtain a second rectangular area, where the length of the second rectangular area in the set direction is less than a second set threshold;
crop the first visual draft based on the second rectangular area.
In an embodiment, the inputting the image corresponding to each first node into the setting classification model includes:
inputting, into the setting classification model, a second image obtained by cropping the first visual draft according to the node information of the first node;
and/or,
inputting, into the setting classification model, a third image obtained by cropping the first visual draft according to the UI block corresponding to the first node.
In an embodiment, the apparatus further includes a training module configured to:
train the setting recognition model and the setting classification model based on a second visual draft corresponding to at least one first annotation; wherein,
the first annotation includes first position information of an image area that can be rendered with UI components and a corresponding first label; the first position information is configured to describe the position of the image area in the second visual draft; the first label is configured to describe the type of the UI component.
In an embodiment, the training module is further configured to:
crop the second visual draft based on each first annotation to obtain a corresponding fourth image; the fourth image corresponds to a first label;
replace the first label of the second visual draft with a second label to obtain a fifth image; the second label indicates that the corresponding image area is a UI block;
train the setting classification model based on the fourth image;
train the setting recognition model based on the fifth image.
In an embodiment, the apparatus further includes a generation module configured to:
in a case where the number of fourth images corresponding to at least one kind of first label is less than a third set threshold, generate a page image containing the corresponding UI component type, and obtain the corresponding fourth image based on the generated page image.
In an embodiment, the generating a page image containing the corresponding UI component type includes:
rendering component code of the corresponding UI component type on a setting interface to obtain a page image of the corresponding UI component type;
and/or,
obtaining a third visual draft through a visual-draft-to-code tool to obtain a page image of the corresponding UI component type.
In practical applications, the first processing module 1001, the second processing module 1002, the third processing module 1003, the cropping module, the training module, and the generation module may be implemented by a processor in the component identification apparatus, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Microcontroller Unit (MCU), or a Field-Programmable Gate Array (FPGA).
It should be noted that when the component identification apparatus provided in the above embodiments performs component identification, the division of the above program modules is only used as an example. In practical applications, the above processing can be allocated to different program modules as needed; that is, the internal structure of the apparatus is divided into different program modules to complete all or part of the processing described above. In addition, the component identification apparatus provided in the above embodiments belongs to the same concept as the embodiments of the component identification method; for its specific implementation process, see the method embodiments, which are not repeated here.
Based on the hardware implementation of the above program modules, and in order to implement the component identification method of the embodiments of the present application, an embodiment of the present application further provides an electronic device. FIG. 10 is a schematic diagram of the hardware composition of the electronic device according to an embodiment of the present application. As shown in FIG. 10, the electronic device includes:
a communication interface 1 capable of exchanging information with other devices such as network devices;
a processor 2 connected to the communication interface 1 to exchange information with other devices, and configured to execute the method provided by one or more of the above technical solutions when running a computer program, where the computer program is stored in a memory 3.
Of course, in practical applications, the components in the electronic device are coupled together through a bus system 4. It can be understood that the bus system 4 is used to realize connection and communication between these components. In addition to a data bus, the bus system 4 also includes a power bus, a control bus, and a status signal bus. However, for the sake of clarity, the various buses are all labeled as the bus system 4 in FIG. 10.
The memory 3 in the embodiment of the present invention is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on the electronic device.
It can be understood that the memory 3 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memories. The non-volatile memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 3 described in the embodiments of the present invention is intended to include, but is not limited to, these and any other suitable types of memory.
The methods disclosed in the above embodiments of the present invention may be applied to the processor 2 or implemented by the processor 2. The processor 2 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above methods can be completed by an integrated logic circuit of hardware in the processor 2 or by instructions in the form of software. The above processor 2 may be a general-purpose processor, a DSP, or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. The processor 2 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium; the storage medium is located in the memory 3, and the processor 2 reads the program in the memory 3 and completes the steps of the foregoing methods in combination with its hardware.
When the processor 2 executes the program, the corresponding flows in the methods of the embodiments of the present invention are implemented, which are not repeated here for brevity.
In an exemplary embodiment, an embodiment of the present invention further provides a storage medium, i.e., a computer storage medium, specifically a computer-readable storage medium, for example, including the memory 3 storing a computer program, and the above computer program can be executed by the processor 2 to complete the steps of the foregoing methods. The computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disc, or CD-ROM.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device, and method may be implemented in other ways. The device embodiments described above are only illustrative. For example, the division of the units is only a logical function division; in actual implementation, there may be other division methods, such as: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may serve as a single unit separately, or two or more units may be integrated into one unit; the above integrated units may be implemented in the form of hardware or in the form of hardware plus software functional units.
Those of ordinary skill in the art can understand that all or part of the steps for implementing the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium; when the program is executed, the steps including the above method embodiments are executed. The aforementioned storage medium includes: a removable storage device, ROM, RAM, a magnetic disk, an optical disc, or other media that can store program code.
Alternatively, if the above integrated units of the present application are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, a network device, or the like) execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a removable storage device, ROM, RAM, a magnetic disk, an optical disc, or other media that can store program code.
It should be noted that the technical solutions described in the embodiments of the present application can be combined arbitrarily without conflict. Unless otherwise specified and limited, the term "connected" should be understood in a broad sense; for example, it may be an electrical connection or internal communication between two elements, a direct connection, or an indirect connection through an intermediary. For those of ordinary skill in the art, the specific meanings of the above terms can be understood according to specific situations.
In addition, in the examples of the present application, "first", "second", and the like are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the objects distinguished by "first/second/third" are interchangeable where appropriate, so that the embodiments of the present application described herein can be implemented in an order other than those illustrated or described herein.
The term "and/or" in this document is only an association relationship describing associated objects, indicating that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" in this document means any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present application, which should all be covered within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
The specific technical features in each embodiment described in the detailed description can be combined in various ways without contradiction; for example, different implementations can be formed through combinations of different specific technical features. In order to avoid unnecessary repetition, the various possible combinations of the specific technical features in the present application are not described separately.

Claims (12)

  1. A component identification method, the method comprising:
    inputting a first image into a setting recognition model to obtain at least one user interface (UI) block in the first image output by the setting recognition model; wherein the first image is determined based on a first visual draft; the setting recognition model is used to identify at least one UI block in an input image; and each UI block includes at least an image area that can be rendered with UI components;
    determining, among nodes of a Document Object Model DOM corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block; and
    inputting an image corresponding to each first node into a setting classification model to obtain a UI component type corresponding to the first node output by the setting classification model; wherein the setting classification model is used to determine a UI component type corresponding to an input image.
  2. The method according to claim 1, wherein the determining, among the nodes of the Document Object Model DOM corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block comprises:
    traversing the nodes of the Document Object Model DOM corresponding to the first visual draft; and
    in a case where an area overlap ratio is greater than a first set threshold, determining the node corresponding to the area overlap ratio as the first node corresponding to a first UI block; wherein
    the area overlap ratio represents an overlap ratio between a second image and the first UI block; and the second image is obtained by cropping the first visual draft according to node information of the node.
  3. The method according to claim 1, wherein before the inputting the first image into the setting recognition model, the method further comprises:
    determining at least two first rectangular areas in the first visual draft through edge detection; and
    cropping the first visual draft based on at least one first rectangular area to obtain the corresponding first image.
  4. The method according to claim 3, wherein the cropping the first visual draft based on at least one first rectangular area comprises:
    merging first rectangular areas adjacent in a set direction to obtain a second rectangular area, wherein a length of the second rectangular area in the set direction is less than a second set threshold; and
    cropping the first visual draft based on the second rectangular area.
  5. The method according to claim 1, wherein the inputting the image corresponding to each first node into the setting classification model comprises:
    inputting, into the setting classification model, a second image obtained by cropping the first visual draft according to node information of the first node;
    and/or,
    inputting, into the setting classification model, a third image obtained by cropping the first visual draft according to the UI block corresponding to the first node.
  6. The method according to claim 1, wherein before the inputting the first image into the setting recognition model, the method further comprises:
    training the setting recognition model and the setting classification model based on a second visual draft corresponding to at least one first annotation; wherein
    the first annotation includes first position information of an image area that can be rendered with UI components and a corresponding first label; the first position information is used to describe a position of the image area in the second visual draft; and the first label is used to describe a type of the UI component.
  7. The method according to claim 6, wherein the training the setting recognition model and the setting classification model based on the second visual draft corresponding to at least one first annotation comprises:
    cropping the second visual draft based on each first annotation to obtain a corresponding fourth image, the fourth image corresponding to a first label;
    replacing the first label of the second visual draft with a second label to obtain a fifth image, the second label indicating that the corresponding image area is a UI block;
    training the setting classification model based on the fourth image; and
    training the setting recognition model based on the fifth image.
  8. The method according to claim 7, wherein before the training the setting classification model based on the fourth image, the method further comprises:
    in a case where the number of fourth images corresponding to at least one kind of first label is less than a third set threshold, generating a page image containing the corresponding UI component type, and obtaining the corresponding fourth image based on the generated page image.
  9. The method according to claim 8, wherein the generating a page image containing the corresponding UI component type comprises:
    rendering component code of the corresponding UI component type on a setting interface to obtain a page image of the corresponding UI component type;
    and/or,
    obtaining a third visual draft through a visual-draft-to-code tool to obtain a page image of the corresponding UI component type.
  10. A component identification apparatus, comprising:
    a first processing module configured to input a first image into a setting recognition model to obtain at least one UI block in the first image output by the setting recognition model; wherein the first image is determined based on a first visual draft; the setting recognition model is configured to identify at least one UI block in an input image; and each UI block includes at least an image area that can be rendered with UI components;
    a second processing module configured to determine, among nodes of a DOM corresponding to the first visual draft, a first node corresponding to each UI block in the at least one UI block; and
    a third processing module configured to input an image corresponding to each first node into a setting classification model to obtain a UI component type corresponding to the first node output by the setting classification model; wherein the setting classification model is configured to determine a UI component type corresponding to an input image.
  11. An electronic device, comprising: a processor and a memory for storing a computer program capable of running on the processor,
    wherein the processor is configured to execute the steps of the method according to any one of claims 1 to 9 when running the computer program.
  12. A storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 9 are implemented.
PCT/CN2022/134361 2021-11-26 2022-11-25 Component identification method and apparatus, electronic device and storage medium WO2023093850A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111422206.X 2021-11-26
CN202111422206.XA CN114037828A (zh) 2021-11-26 2021-11-26 Component identification method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2023093850A1 true WO2023093850A1 (zh) 2023-06-01

Family

ID=80138918

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134361 WO2023093850A1 (zh) 2021-11-26 2022-11-25 Component identification method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114037828A (zh)
WO (1) WO2023093850A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037828A (zh) * 2021-11-26 2022-02-11 北京沃东天骏信息技术有限公司 组件识别方法、装置、电子设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000067039A (ja) 1998-06-09 2000-03-03 Matsushita Electric Ind Co Ltd Document processing device
US20120109950A1 (en) * 2004-04-29 2012-05-03 Microsoft Corporation Method and system for calculating importance of a block within a display page
CN111061975A (zh) 2019-12-13 2020-04-24 腾讯科技(深圳)有限公司 Method and device for processing irrelevant content in a page
CN111428444A (zh) 2020-03-27 2020-07-17 新华智云科技有限公司 Automatic webpage information extraction method
CN113296769A (zh) 2020-11-18 2021-08-24 阿里巴巴集团控股有限公司 Data processing method, visual draft processing method, system and electronic device
CN114037828A (zh) 2021-11-26 2022-02-11 北京沃东天骏信息技术有限公司 Component identification method and apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN114037828A (zh) 2022-02-11

Similar Documents

Publication Publication Date Title
CN113609820B (zh) Method, apparatus and device for generating a Word file based on an extensible markup language file
CN111259873B (zh) Table data extraction method and device
CN110348294A (zh) Method, device and computer equipment for locating charts in a PDF document
CN114005123A (zh) System and method for digital reconstruction of printed-text layout
CN113469294B (zh) Icon detection method and system for an RPA robot
US20220335240A1 (en) Inferring Structure Information from Table Images
CN115424282A (zh) Unstructured text table recognition method and system
WO2023093850A1 (zh) Component identification method and apparatus, electronic device and storage medium
CN115268719B (zh) Method, medium and electronic device for locating a target element on an interface
CN114138244A (zh) Method and device for automatically generating model class files, storage medium and electronic device
CN116245177A (zh) Automated construction method and system for a geographic environment knowledge graph, and readable storage medium
CN116610304B (zh) Page code generation method, apparatus, device and storage medium
CN113936187A (zh) Text image synthesis method, apparatus, storage medium and electronic device
CN113553055A (zh) Machine-learning-based automatic generation method for visual chart code
CN117951009A (zh) Test script generation method, apparatus, computing device and storage medium
CN110826488B (zh) Image recognition method, device and storage device for electronic documents
CN117612195A (zh) Graph-model generation method and device based on main wiring diagram recognition technology
CN115186240A (zh) Social network user alignment method, device and medium based on association information
WO2022221079A2 (en) Inferring structure information from table images
CN113298822B (zh) Point cloud data selection method and selection device, equipment, and storage medium
CN115373658A (zh) Method and device for automatic front-end code generation based on Web images
CN111125483B (zh) Webpage data extraction template generation method, device, computer device and storage medium
CN111428724B (zh) Method, device and storage medium for handwritten score tallying on test papers
CN106649628B (zh) Interaction enhancement method and system for visual areas of web pages
Liang SmartGenerator4UI: A Web Interface Element Recognition and HTML Generation System Based on Deep Learning and Image Processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897932

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 11202403510P

Country of ref document: SG