CN112651942A - Layout detection method and device

Info

Publication number
CN112651942A
Authority
CN
China
Prior art keywords
image
current image
component
bounding box information
Prior art date
Legal status
Granted
Application number
CN202011576492.0A
Other languages
Chinese (zh)
Other versions
CN112651942B (en)
Inventor
梅新岩
刘娟
金道勋
吴龙海
陈洁
Current Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics China R&D Center
Samsung Electronics Co Ltd
Application filed by Samsung Electronics China R&D Center and Samsung Electronics Co Ltd
Priority to CN202011576492.0A
Publication of CN112651942A
PCT application PCT/KR2021/016801 filed (published as WO2022145723A1)
Application granted
Publication of CN112651942B
Legal status: Active

Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06V 10/774: Image or video recognition or understanding using machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06T 7/10: Image analysis; segmentation; edge detection
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/20081: Indexing scheme for image analysis; training; learning
    • G06T 2207/20084: Indexing scheme for image analysis; artificial neural networks [ANN]

Abstract

The layout detection method and apparatus provided by the embodiments of the present disclosure work as follows: in response to acquiring a current image that includes at least one component, the current image is identified with an instance segmentation model to obtain the bounding box information and category name of each component; a marker image corresponding to the current image is then generated based on the bounding box information of each component; and finally the bounding box information, the category names, and the marker image are input into a long short-term memory (LSTM) network detection model, which outputs the layout result of the components in the current image, including the sorting and grouping result among the components. Because the layout result of the components in the image is extracted and the grouping result is obtained from the interdependencies among the components, the global information among all parts of the current image is taken into account and the accuracy of the layout result is improved.

Description

Layout detection method and device
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a layout detection method and apparatus.
Background
With the continuous progress of science and technology, software testing plays an important role in product development and deployment: testers find errors, and developers then redesign and rework to fix them, which is time-consuming and labor-intensive.
With the spread of AI in smart devices, AI applies human-like skills to inanimate objects and has begun to affect test automation in various ways. In performance testing of a smart device, reinforcement learning needs to record the state corresponding to each action, namely the current page image. The layout in the current page is relatively stable, but its content is continuously refreshed as third-party apps update, so comparing the similarities and differences of different page states requires extracting the layout information of the images. Traditional image segmentation methods, such as Faster R-CNN and Mask R-CNN, can segment the individual parts of an image, but they ignore the global information between the parts, such as the layout alignment.
Disclosure of Invention
Embodiments of the present disclosure provide a layout detection method and apparatus, an electronic device, and a computer-readable medium.
In a first aspect, an embodiment of the present disclosure provides a layout detection method, including: in response to acquiring a current image including at least one component, identifying the current image with an instance segmentation model to obtain bounding box information and a category name for each component; generating a marker image corresponding to the current image based on the bounding box information of each component; and inputting the bounding box information, the category names, and the marker image into a long short-term memory (LSTM) network detection model and outputting a layout result of the components in the current image, where the layout result includes the sorting and grouping result among the components in the current image.
In some embodiments, inputting the bounding box information, the category names, and the marker image into the LSTM detection model and outputting the layout result of the components in the current image comprises: inputting the bounding box information and category name of each component into an encoding model in the LSTM detection model and outputting character feature information corresponding to each component; inputting the marker image into an image processing model in the LSTM detection model and outputting image feature information corresponding to the marker image; and inputting the character feature information and the image feature information into a decoding model in the LSTM detection model and outputting the layout result of the components in the current image.
In some embodiments, the instance segmentation model is obtained based on the following steps: acquiring a sample image set, where the sample image set comprises at least one sample image and each sample image includes at least one component; determining and labeling the category name and bounding box information of each component in each sample image; and taking each sample image as input and the category name and bounding box information of each of its components as the expected output, training to obtain the instance segmentation model.
In some embodiments, the method further comprises: performing a state similarity search over the stored images based on the layout result of the components in the current image to obtain a plurality of similar images; and comparing the current image with each similar image to determine whether they are in the same state.
In some embodiments, comparing the current image with each similar image and determining whether they are in the same state comprises: calculating the intersection-over-union (IoU) ratio between the layout result of the components in the current image and the layout result of the components in each similar image; comparing the pixels of the current image with the pixels of each similar image to obtain a comparison result between them; and comparing the IoU and the comparison result with a preset threshold to determine whether the current image and each similar image are in the same state.
In some embodiments, the method further comprises: deleting the current image in response to determining that it is in the same state as a similar image; and storing the current image in response to determining that it is not in the same state as any similar image.
In some embodiments, the method further comprises: in response to obtaining the layout result of the components in the current image, adjusting, for the components that share the same layout, the bounding box information of each such component to obtain adjusted bounding box information corresponding to each component.
In some embodiments, the method further comprises: taking the current image, the category name of each component, and the adjusted bounding box information as a new sample image set; and training the instance segmentation model based on the new sample image set.
In some embodiments, the method further comprises: in response to acquiring the adjusted bounding box information corresponding to each component, inputting the adjusted bounding box information and the layout result into UI2Code to obtain repeated layout code.
In a second aspect, an embodiment of the present disclosure provides a layout detection apparatus, including: the identification module is configured to respond to the acquisition of a current image comprising at least one component, identify the current image by using an example segmentation model, and obtain bounding box information and a category name of each component; the generating module is configured to generate a mark image corresponding to the current image based on the bounding box information of each component; and the output module is configured to input the bounding box information, the category name and the label image of each component into the long-short term memory network detection model and output a layout result of the components in the current image, wherein the layout result comprises a sorting grouping result among the components in the current image.
In some embodiments, the output module comprises: an encoding unit configured to input the bounding box information and category name of each component into the encoding model in the LSTM detection model and output the character feature information corresponding to each component; an image processing unit configured to input the marker image into the image processing model in the LSTM detection model and output the image feature information corresponding to the marker image; and a decoding unit configured to input the character feature information and the image feature information into the decoding model in the LSTM detection model and output the layout result of the components in the current image.
In some embodiments, the instance segmentation model is obtained based on the following steps: acquiring a sample image set, where the sample image set comprises at least one sample image and each sample image includes at least one component; determining and labeling the category name and bounding box information of each component in each sample image; and taking each sample image as input and the category name and bounding box information of each of its components as the expected output, training to obtain the instance segmentation model.
In some embodiments, the apparatus further comprises: a search module configured to perform a state similarity search over the stored images based on the layout result of the components in the current image to obtain a plurality of similar images; and a judging module configured to compare the current image with each similar image and determine whether they are in the same state.
In some embodiments, the judging module comprises: a calculation unit configured to calculate the IoU between the layout result of the components in the current image and the layout result of the components in each similar image; a comparison unit configured to compare the pixels of the current image with the pixels of each similar image to obtain a comparison result between them; and a judging unit configured to compare the IoU and the comparison result with a preset threshold and determine whether the current image and each similar image are in the same state.
In some embodiments, the apparatus further comprises: a deletion module configured to delete the current image in response to determining that it is in the same state as a similar image; and a storage module configured to store the current image in response to determining that it is not in the same state as any similar image.
In some embodiments, the apparatus further comprises: an adjusting module configured to, in response to obtaining the layout result of the components in the current image, adjust, for the components with the same layout, the bounding box information of each such component to obtain adjusted bounding box information corresponding to each component.
In some embodiments, the apparatus further comprises: an update module configured to take the current image, the category name of each component, and the adjusted bounding box information as a new sample image set; and a training module configured to train the instance segmentation model based on the new sample image set.
In some embodiments, the apparatus further comprises: a code generation module configured to, in response to acquiring the adjusted bounding box information corresponding to each component, input the adjusted bounding box information and the layout result into UI2Code to obtain repeated layout code.
In a third aspect, the present application provides an electronic device comprising one or more processors and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.
With the layout detection method and apparatus provided by the embodiments of the present disclosure, in response to acquiring a current image including at least one component, the current image is identified with an instance segmentation model to obtain the bounding box information and category name of each component; a marker image corresponding to the current image is then generated based on the bounding box information of each component; and finally the bounding box information, category names, and marker image are input into the LSTM detection model, which outputs the layout result of the components in the current image, including the sorting and grouping result among the components. The layout result of the components in the image is thus extracted, the grouping result can be obtained from the interdependencies among the components, the global information among all parts of the current image is taken into account, and the accuracy of the layout result is improved.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates an exemplary system architecture diagram in which some embodiments of the present disclosure may be applied;
FIG. 2 illustrates a flow diagram of one embodiment of a layout detection method of the present disclosure;
FIG. 3 illustrates a schematic diagram of one application scenario of the layout detection method of the present disclosure;
FIG. 4 illustrates a schematic diagram of one embodiment of an output layout result of the present disclosure;
FIG. 5 shows a schematic diagram of yet another embodiment of a layout detection method of the present disclosure;
FIG. 6 shows a schematic diagram of yet another embodiment of a layout detection method of the present disclosure;
FIG. 7 shows a schematic structural diagram of one embodiment of a layout detection apparatus of the present disclosure;
FIG. 8 illustrates a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the relevant invention and do not limit it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which a layout detection method or a layout detection apparatus of an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 104, 105, a network 106, and servers 101, 102, 103. The network 106 serves as a medium for providing communication links between the terminal devices 104, 105 and the servers 101, 102, 103. Network 106 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the servers 101, 102, 103 via the network 106 via the terminal devices 104, 105 to receive or transmit information or the like. The terminal devices 104, 105 may have installed thereon various applications such as image processing applications, data processing applications, instant messaging tools, social platform software, search-type applications, shopping-type applications, and the like.
The terminal devices 104, 105 may be hardware or software. When a terminal device is hardware, it may be any of various electronic devices that have a display screen and support communication with the server, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When a terminal device is software, it can be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module; this is not specifically limited here.
The terminal devices 104 and 105 may obtain a current image through the network 106 or by reading it locally, where the current image includes at least one component. The terminal devices 104 and 105 invoke the instance segmentation model to identify the current image and obtain the bounding box information and category name of each component in it. The terminal devices 104 and 105 then mark the current image based on the obtained bounding box information of each component to produce a marker image containing the bounding box of each component. Finally, the terminal devices 104 and 105 input the bounding box information, category names, and marker image into the LSTM detection model, which processes the input and outputs the layout result of the components in the current image; the layout result may include the grouping result of the component category names, that is, which components in the same layout are regarded as the same group.
The servers 101, 102, 103 may be servers that provide various services, such as background servers that receive requests sent by terminal devices with which communication connections are established. The background server can receive and analyze the request sent by the terminal device, and generate a processing result.
The server may be hardware or software. When the server is hardware, it may be any of various electronic devices that provide services to the terminal device. When the server is software, it may be implemented as multiple pieces of software or software modules for providing various services to the terminal device, or as a single piece of software or software module; this is not specifically limited here.
It should be noted that the layout detection method provided by the embodiments of the present disclosure may be executed by the terminal devices 104 and 105. Accordingly, the layout detection means may be provided in the terminal devices 104, 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a layout detection method according to the present disclosure is shown. The layout detection method can comprise the following steps:
Step 210, in response to acquiring a current image including at least one component, identifying the current image with the instance segmentation model to obtain the bounding box information and category name of each component.
In this step, the executing body of the layout detection method (for example, the terminal device shown in fig. 1) may obtain the current image from a server over the network, capture the currently displayed screen (for example, by taking a screenshot), or read the current image from local storage. The current image includes at least one component. A component is a simple encapsulation of some of the data and methods in the image; the current image may be composed of several different components, each with its own attributes and role.
After acquiring the current image, the executing body can invoke the instance segmentation model to process it, identify each component in it, and obtain the bounding box information and category name of each component. The category names of the components may be output in order, i.e., as a sequence of the category names of the components in the current image. The bounding box information of a component characterizes its size and is displayed as a bounding box, generally given by the coordinates of the box; the category name of a component characterizes its type and may be a custom type name, such as: menu, scrolllow, toolbar, userinfo, tipbox, notibox, keyboard, textlist, textbox, bgimg, banner, other, viewbox, button, popdialog, popmenu, Item, firstscreen, progressbar, and the like.
The instance segmentation model may be a Mask R-CNN model, which segments each instance in an image to obtain its bounding box and type, and is trained on images together with the bounding box information and category names of the components in those images. It may preprocess the input current image, feed the preprocessed image into a pre-trained neural network to obtain the corresponding feature map, and then generate a predetermined number of candidate points for each position in the feature map to obtain a number of candidate regions of interest. The candidates are sent to an RPN for binary classification and bounding box (BB) regression, which filters out part of them; a RoIAlign operation is then applied to the remaining regions of interest (i.e., the pixels of the original image are first aligned with the feature map, and the feature map is then aligned with the fixed-size features). Finally, the candidate regions of interest are classified (N-way classification), BB regression is applied, and masks are generated (an FCN operation inside each region of interest), yielding the bounding box information and category name of each component in the current image.
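As a concrete illustration, the sketch below runs this kind of instance segmentation with torchvision's Mask R-CNN implementation. The abbreviated class list, the score threshold, and the weight file are assumptions for the example; the patent does not prescribe a particular implementation.

```python
import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

# Category list abbreviated from the custom names above; index 0 is background.
COMPONENT_CLASSES = ["__background__", "menu", "toolbar", "textbox",
                     "button", "banner", "item"]

model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    num_classes=len(COMPONENT_CLASSES))
# model.load_state_dict(torch.load("ui_components.pth"))  # hypothetical weights
model.eval()

def detect_components(image_path, score_threshold=0.5):
    """Return a (bounding box, category name) pair per detected component."""
    image = F.to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        pred = model([image])[0]  # dict with boxes, labels, scores, masks
    return [(box.tolist(), COMPONENT_CLASSES[label])
            for box, label, score in zip(pred["boxes"], pred["labels"],
                                         pred["scores"])
            if score >= score_threshold]
```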
Step 220, generating a marker image corresponding to the current image based on the bounding box information of each component.
In this step, after identifying the current image with the instance segmentation model and obtaining the bounding box information and category name of each component, the executing body replaces each component in the current image with its bounding box according to the bounding box information, i.e., every component in the current image is replaced with its corresponding bounding box, producing a marker image made up of the bounding boxes.
As an example, since bounding box information is displayed as a bounding box, the executing body obtains the rectangular frame of each component in the current image, replaces each component with its corresponding rectangular frame, and takes the image consisting of the rectangular frames as the marker image corresponding to the current image.
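A minimal sketch of this step, assuming the boxes come from the instance segmentation above; the canvas color and line width are arbitrary choices for illustration.

```python
from PIL import Image, ImageDraw

def make_marker_image(image_size, boxes):
    """Draw each component's bounding rectangle on a blank canvas so that
    only the layout geometry of the current image survives."""
    marker = Image.new("L", image_size, color=255)  # white background
    draw = ImageDraw.Draw(marker)
    for x1, y1, x2, y2 in boxes:
        draw.rectangle([x1, y1, x2, y2], outline=0, width=3)  # black frame
    return marker
```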
Step 230, inputting the bounding box information, the category names, and the marker image into the long short-term memory network detection model and outputting the layout result of the components in the current image.
In this step, after acquiring the bounding box information and category name of each component and the marker image corresponding to the current image, the executing body inputs them into the LSTM detection model, which processes the input and outputs the layout result of the components in the current image. The layout result may include the sorting and grouping result among the components: because the components in the current image have association and dependency relationships with one another, the LSTM detection model can sort and group them according to those relationships. The executing body may sort and group the category names in the category-name sequence to obtain the grouped category names corresponding to the components, and may mark the category names of the same group in the same way, for example by putting them into the same bracket, with different groups distinguished by different brackets. Components in the same group share the same layout and may have different alignments, such as left, vertically centered, right, top, horizontally centered, or bottom alignment.
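As an illustrative example (the concrete grouping is an assumption, not taken from the patent): for the input category-name sequence menu, item, item, item, textbox, button, the output layout result might be (menu) (item, item, item) (textbox, button), marking the three item components as one left-aligned group and the textbox and button as another.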
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the layout detection method of this embodiment. In the scenario of fig. 3, the television 301 captures the image currently displayed on its screen to obtain the current image. The television 301 identifies the current image with the instance segmentation model to obtain the bounding box information and category name of each component in it. The television 301 then generates the marker image corresponding to the current image from the bounding box information of each component and inputs the bounding box information, category names, and marker image into the LSTM detection model to obtain the output layout result 302 of the components in the current image. From the layout result 302, the television 301 retrieves a component list that is the same as or similar to the layout result 302 from a local focus manager. When the television 301 receives speech uttered by a user, it parses the received speech to determine whether it contains an instruction such as "Next" or "Previous". When it determines that the speech contains such an instruction, the television 301 first selects from the component list whose layout matches the layout result 302 of the current image, following the preset rule that items in the same group are preferred and other groups come next, and presents the selection to the user on the screen. If the images in the component list identical to the layout result 302 have all been selected, it switches to the similar list for selection.
In the layout detection method provided by the above embodiment of the present disclosure, in response to acquiring a current image including at least one component, the current image is identified with the instance segmentation model to obtain the bounding box information and category name of each component; a marker image corresponding to the current image is generated based on the bounding box information of each component; and the bounding box information, category names, and marker image are input into the LSTM detection model, which outputs the layout result of the components in the current image. The layout result of each component in the image is thereby extracted, the grouping result of the components can be obtained from their interdependencies, the global information among the parts of the current image is taken into account, and the accuracy of the layout result is improved.
With further reference to fig. 4, fig. 4 shows how step 230 in fig. 2 (inputting the bounding box information, the category names, and the marker image into the LSTM detection model and outputting the layout result of the components in the current image) can be implemented based on the following steps:
Step 410, inputting the bounding box information and category name of each component into the encoding model in the LSTM detection model and outputting the character feature information corresponding to each component.
In this step, after identifying the current image with the instance segmentation model to obtain the bounding box information and category name of each component, the executing body applies an embedding to the bounding box information and category names to obtain bounding box and category vectors, inputs the resulting vectors into the encoding model in the long short-term memory (LSTM) network detection model, and outputs the character feature information corresponding to each component.
Step 420, inputting the marker image into the image processing model in the LSTM detection model and outputting the image feature information corresponding to the marker image.
In this step, after obtaining the marker image corresponding to the current image, the executing body inputs it into the image processing model in the LSTM detection model and, through CNN convolution operations, fully connected operations, and the like, outputs the image feature information corresponding to the marker image.
Step 430, inputting the character feature information and the image feature information into the decoding model in the LSTM detection model and outputting the layout result of the components in the current image.
In this step, after obtaining the character feature information and the image feature information, the executing body fuses them to obtain fused feature information, inputs the fused feature information into the decoding model in the LSTM detection model, applies a softmax classification, and outputs a layout result of the same length as the input category-name sequence, namely the layout result of the components in the current image.
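The sketch below assembles steps 410-430 into one PyTorch module under illustrative assumptions: all layer sizes, the CNN branch, and the number of group labels are invented for the example, since the patent fixes only the overall encoder/image/decoder structure.

```python
import torch
import torch.nn as nn

class LayoutDetector(nn.Module):
    def __init__(self, num_classes, num_groups, embed_dim=64, hidden=256):
        super().__init__()
        self.class_embed = nn.Embedding(num_classes, embed_dim)
        self.box_proj = nn.Linear(4, embed_dim)        # (x1, y1, x2, y2)
        self.encoder = nn.LSTM(2 * embed_dim, hidden, batch_first=True)
        self.cnn = nn.Sequential(                      # marker-image branch
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, hidden))
        self.decoder = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_groups)

    def forward(self, class_ids, boxes, marker_image):
        # class_ids: (B, T); boxes: (B, T, 4); marker_image: (B, 1, H, W)
        tokens = torch.cat([self.class_embed(class_ids),
                            self.box_proj(boxes)], dim=-1)
        char_feats, _ = self.encoder(tokens)           # character features
        img_feat = self.cnn(marker_image)              # image features
        img_feat = img_feat.unsqueeze(1).expand(-1, tokens.size(1), -1)
        fused = torch.cat([char_feats, img_feat], dim=-1)
        out, _ = self.decoder(fused)
        # One group logit vector per component; softmax happens in the loss
        # (or via out.softmax(-1) at inference), matching the input length.
        return self.classifier(out)
```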
In this implementation, detecting the layout result with a long short-term memory network addresses the interdependence among the features: the grouping result of each component is determined from the interdependencies among the features, the relationships between the local parts are taken into account, and the accuracy of the grouping result is improved.
In some optional implementations of this embodiment, the instance segmentation model is obtained based on the following steps:
in a first step, a sample image set is obtained.
Specifically, the training of the instance segmentation model takes place on a server, which may read the sample image set locally or obtain it from the executing body. The sample image set includes at least one sample image, and each sample image includes at least one component.
In the second step, the category name and bounding box information of each component in each sample image are determined and labeled.
Specifically, the server can label the category name of each component in each sample image according to the customized category names, and label the bounding box of each component to record its bounding box information, thereby obtaining the category name and bounding box information corresponding to each component in each sample image.
In the third step, with each sample image as input and the category name and bounding box information of each of its components as the expected output, the instance segmentation model is obtained by training.
Specifically, the server may take a Mask R-CNN network, use each sample image as input and the category name and bounding box information of each component in the input sample image as the expected output, train the Mask R-CNN network, and take the trained network as the instance segmentation model. The trained instance segmentation model can identify an input image and output the category name and bounding box information of each component in it.
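A condensed training sketch under the assumption of a torchvision-style Mask R-CNN and a data loader yielding (image, target) pairs, where each target dict carries the labeled boxes, class ids, and masks; the optimizer settings are illustrative only.

```python
import torch
import torchvision

def train_instance_segmentation(data_loader, num_classes, epochs=10, lr=5e-3):
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        num_classes=num_classes)
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, targets in data_loader:
            # targets: list of dicts with "boxes", "labels", "masks".
            # In training mode the model returns a dict of losses
            # (classifier, box regression, mask, and RPN losses).
            loss_dict = model(list(images), list(targets))
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```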
In this implementation, obtaining the instance segmentation model by training improves both the accuracy and the efficiency of recognizing the category name and bounding box information of each component in an image.
With further reference to fig. 5, fig. 5 illustrates a flow 500 of yet another embodiment of a layout detection method. The layout detection method can also comprise the following steps:
step 510, in response to acquiring a current image including at least one component, identifying the current image by using an example segmentation model, and acquiring bounding box information and a category name of each component.
In this step, step 510 is the same as step 210 in the embodiment shown in fig. 2, and is not described herein again.
And step 520, generating a marked image corresponding to the current image based on the bounding box information of each component.
In this step, step 520 is the same as step 220 in the embodiment shown in fig. 2, and is not described herein again.
And step 530, inputting the bounding box information, the category name and the label image of each component into the long-term and short-term memory network detection model, and outputting the layout result of the components in the current image.
In this step, step 530 is the same as step 230 in the embodiment shown in fig. 2, and is not described herein again.
Step 540, performing a state similarity search over the stored images based on the layout result of the components in the current image to obtain a plurality of similar images.
In this step, after the executing body obtains the layout result of the components in the current image, i.e., the grouping result of the components, it performs a state search over the images stored in the local database: according to the layout result of the components in the current image, it searches the stored images for those whose component grouping or layout is similar, thereby obtaining a plurality of similar images whose layout-result states resemble that of the current image.
Step 550, comparing the current image with each similar image and determining whether they are in the same state.
In this step, after acquiring the similar images, the executing body compares the current image with each similar image, obtaining a comparison result for each pair, which may include a similarity value and the like. The executing body then judges from the comparison results whether the current image and each similar image are in the same state.
As an example, the executing body obtains three similar images: image 1, image 2, and image 3. It compares the current image with image 1 to obtain a first similarity value, with image 2 to obtain a second similarity value, and with image 3 to obtain a third similarity value. It then judges from the first, second, and third similarity values whether the current image is in the same state as each similar image.
In this embodiment, searching the stored images by the layout result of the components in the current image realizes state comparison between images, and basing the comparison on the layout result improves the accuracy of the comparison.
In some optional implementations of this embodiment, step 550 (comparing the current image with each similar image and determining whether they are in the same state) may be implemented based on the following steps:
In the first step, the intersection-over-union (IoU) ratio between the layout result of the components in the current image and the layout result of the components in each similar image is calculated.
Specifically, after acquiring the similar images, the executing body obtains the layout result of each similar image. It then determines the bounding box information of the components in the current image and in each similar image, and calculates the IoU between them. Intersection-over-Union (IoU), a concept from object detection, is the overlap ratio between a generated candidate box and the ground-truth box, i.e., the ratio of the area of their intersection to the area of their union; in the ideal case the two boxes coincide completely and the ratio is 1. The executing body may compute the IoU between the bounding boxes of the components in the current image and those of the components in each similar image, obtaining the IoU between the layout result of the components in the current image and the layout result of the components in each similar image.
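A minimal IoU helper for two boxes in (x1, y1, x2, y2) form; identical boxes give 1.0 and disjoint boxes 0.0. How per-component IoUs are aggregated over a whole layout is not spelled out in the text, so the averaging below is an assumption.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def layout_iou(boxes_a, boxes_b):
    """Assumed aggregation: mean IoU over component boxes paired by index."""
    pairs = list(zip(boxes_a, boxes_b))
    return sum(iou(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
```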
In the second step, the pixels of the current image are compared with the pixels of each similar image to obtain a comparison result between them.
Specifically, after acquiring the similar images, the executing body obtains the pixels of each similar image, compares the pixels of the current image with the pixels of each similar image, and may calculate the difference value between them, taking that difference value as the comparison result between the pixels of the current image and the pixels of the similar image.
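One way to realize the pixel comparison is a normalized mean absolute difference mapped so that 1.0 means identical images; the exact metric is not fixed by the text, so this choice is an assumption.

```python
import numpy as np

def pixel_similarity(img_a, img_b):
    """img_a, img_b: HxWxC uint8 arrays of the same size."""
    diff = np.abs(img_a.astype(np.float32) - img_b.astype(np.float32))
    return 1.0 - diff.mean() / 255.0
```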
In the third step, the IoU and the comparison result are compared with a preset threshold to determine whether the current image and each similar image are in the same state.
Specifically, after acquiring the IoU between the layout result of the components in the current image and the layout result of the components in each similar image, and the comparison result between the pixels of the current image and the pixels of each similar image, the executing body compares the IoU and the comparison result corresponding to each similar image with a preset threshold.
The executing body may add the IoU and the comparison result for each similar image, compare the sum with the preset threshold, and judge whether the current image and each similar image are in the same state: if the sum is not less than the preset threshold, the similar image corresponding to the sum is determined to be in the same state as the current image; if the sum is less than the preset threshold, the similar image is determined to be in a different state from the current image.
Alternatively, the executing body may compute, for each similar image, the average of the IoU and the comparison result, compare the average with the preset threshold, and judge whether the current image and each similar image are in the same state: if the average is not less than the preset threshold, the similar image corresponding to the average is determined to be in the same state as the current image; if the average is less than the preset threshold, the similar image is determined to be in a different state from the current image.
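The decision in both variants, with threshold values that are illustrative assumptions (the text only says "preset threshold"):

```python
def same_state_sum(iou_score, pixel_score, threshold=1.6):
    # Variant 1: add the two scores and compare the sum with the threshold.
    return iou_score + pixel_score >= threshold

def same_state_mean(iou_score, pixel_score, threshold=0.8):
    # Variant 2: compare the average of the two scores with the threshold.
    return (iou_score + pixel_score) / 2 >= threshold
```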
In this implementation, whether the current image and a similar image are in the same state is judged from both the IoU between them and the pixel comparison result, and basing the decision on multiple criteria improves the accuracy of the state comparison.
Referring further to fig. 5, the layout detection method may further include the following steps:
in response to determining that the current image is in the same state as the similar image, step 560, the current image is deleted.
In this step, the execution subject compares the cross-over ratio and the comparison result with a preset threshold value, determines that the current image is in the same state as the similar image, and indicates that the image of the layout result, such as the current image, is already stored in the local database, deletes the current image, and does not store the image.
Step 570, in response to determining that the current image is not in the same state as the similar image, storing the current image.
In this step, the execution subject compares the cross-over ratio and the comparison result with a preset threshold value, determines that the current image is not in the same state as the similar image, and stores the current image into the local database if the image indicating the layout result of the current image is not stored in the local database.
In the embodiment, the current image is deleted or stored through the state comparison result, so that different processing operations on the current image are realized, and the diversity of the current image processing and the applicability of the layout result are improved.
With further reference to fig. 6, fig. 6 illustrates a flow 600 of yet another embodiment of a layout detection method. The layout detection method can also comprise the following steps:
Step 610, in response to acquiring a current image including at least one component, identifying the current image with the instance segmentation model to obtain the bounding box information and category name of each component.
In this step, step 610 is the same as step 210 in the embodiment shown in fig. 2 and is not described here again.
Step 620, generating a marker image corresponding to the current image based on the bounding box information of each component.
In this step, step 620 is the same as step 220 in the embodiment shown in fig. 2 and is not described here again.
Step 630, inputting the bounding box information, the category names, and the marker image into the LSTM detection model and outputting the layout result of the components in the current image.
In this step, step 630 is the same as step 230 in the embodiment shown in fig. 2 and is not described here again.
Step 640, in response to obtaining the layout result of the components in the current image, adjusting, for the components with the same layout, the bounding box information of each such component to obtain adjusted bounding box information corresponding to each component.
In this step, the executing body obtains the layout result of the components in the current image and determines each layout in it, that is, each grouping result. Each group contains components with the same layout. For the components of each group, the executing body adjusts their bounding box information: the bounding boxes are aligned and their widths are set to the same value, yielding the adjusted bounding box information corresponding to each component.
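A sketch of the adjustment for one group of same-layout components; the concrete alignment rule (common left edge, mean width) is an illustrative assumption, and other alignments (right, centered) follow the same pattern.

```python
def align_group(boxes):
    """boxes: list of [x1, y1, x2, y2] for the components of one group."""
    left = min(b[0] for b in boxes)                       # common left edge
    width = sum(b[2] - b[0] for b in boxes) / len(boxes)  # uniform width
    return [[left, b[1], left + width, b[3]] for b in boxes]
```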
In this embodiment, adjusting the bounding box information of the components within the same layout yields groups of components that are aligned and of uniform width, which improves the accuracy of the layout result and enables more accurate state comparison between images based on the adjusted layout result.
Referring further to fig. 6, the layout detection method may further include the following steps:
step 650, the current image, the class name of each component and the adjusted bounding box information are used as a new sample image set.
In this step, after the execution subject obtains the adjusted bounding box information of each component in the current image, the class name of each component, and the adjusted bounding box information are used as a new sample image set.
Step 660, training the instance segmentation model based on the new sample image set.
In this step, after taking the current image, the category name of each component, and the adjusted bounding box information as a new sample image set, the executing body may further train the instance segmentation model on the new sample image set to obtain an updated instance segmentation model. The updated instance segmentation model can identify a new image and output the category-name sequence of its components together with bounding box information that is group-aligned and of uniform width.
In this embodiment, training the instance segmentation model on the new sample image set enables the updated model to output bounding box information for components that are aligned and of uniform width, improving the accuracy of the output bounding box information.
In some optional implementations of this embodiment, the layout detection method may further include the following step: in response to acquiring the adjusted bounding box information corresponding to each component, inputting the adjusted bounding box information and the layout result into UI2Code to obtain repeated layout code.
Specifically, after acquiring the adjusted bounding box information corresponding to each component, the executing body inputs the adjusted bounding box information and the layout result into UI2Code. UI2Code re-detects, classifies, and extracts the customized components from the input, and then determines the structural layout of the components by applying a component connection method, thereby obtaining the repeated layout areas among the components. Repeated layout code is then generated from the obtained repeated layout areas.
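The patent does not describe UI2Code's interface or output format, so the following is only a toy illustration of the final step: once a repeated layout area is known, one templated item plus a repeat count can stand in for per-component code. The markup dialect and attribute names are invented.

```python
def emit_repeated_layout(class_name, boxes):
    """Toy sketch: emit a loop-style snippet for one repeated group."""
    x1, y1, x2, y2 = boxes[0]  # geometry of the template item
    item = f'<{class_name} width="{x2 - x1:.0f}" height="{y2 - y1:.0f}" />'
    return (f'<repeat count="{len(boxes)}">\n'
            f'    {item}\n'
            f'</repeat>')
```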
In this implementation, inputting the adjusted bounding box information and the layout result into UI2Code improves the accuracy of the repeated layout code generated by UI2Code and improves the reusability of the code.
With further reference to fig. 7, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a layout detection apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 7, the apparatus 700 of the present embodiment includes: an identification module 710, a generation module 720, and an output module 730.
The identifying module 710 is configured to, in response to acquiring a current image including at least one component, identify the current image by using an instance segmentation model, and obtain bounding box information and a category name of each component;
a generating module 720 configured to generate a marker image corresponding to the current image based on the bounding box information of each component;
and the output module 730 is configured to input the bounding box information, the category name and the label image of each component into the long-short term memory network detection model, and output the layout result of the components in the current image, wherein the layout result comprises the sorting grouping result among the components in the current image.
In some optional implementations of this embodiment, the output module includes: the encoding unit is configured to input the bounding box information and the category name of each component into an encoding model in the long-short term memory network detection model and output character characteristic information corresponding to each component; the image processing unit is configured to input the marked image into an image processing model in the long-term and short-term memory network detection model and output image characteristic information corresponding to the marked image; and the decoding unit is configured to input the character characteristic information and the image characteristic information into a decoding model in the long-short term memory network detection model and output a layout result of the components in the current image.
In some optional implementations of this embodiment, the instance segmentation model is obtained based on the following steps: acquiring a sample image set, wherein the sample image set comprises at least one sample image, and each sample image comprises at least one component; determining and labeling the category name and bounding box information of each component in each sample image; and taking each sample image as input, taking the class name and bounding box information of each component in each sample image as expected output, and training to obtain the example segmentation model.
In some optional implementations of this embodiment, the apparatus further includes: the searching module is configured to perform state similarity searching on the stored images based on the layout result of the components in the current image to obtain a plurality of similar images; and the judging module is configured to compare the current image with each similar image and judge whether the current image and each similar image are in the same state.
In some optional implementation manners of this embodiment, the determining module includes: a calculation unit configured to calculate an intersection ratio between the layout result of the components in the current image and the layout result of the components in each similar image; the comparison unit is configured to compare the pixels of the current image with the pixels of each similar image to obtain a comparison result between the pixels of the current image and the pixels of each similar image; and the judging unit is configured to compare the intersection ratio and the comparison result with a preset threshold value and judge whether the current image and each similar image are in the same state or not.
In some optional implementations of this embodiment, the apparatus further includes: a deletion module configured to delete the current image in response to determining that the current image is in the same state as the similar image; a storage module configured to store the current image in response to determining that the current image is not in the same state as the similar image.
In some optional implementations of this embodiment, the apparatus further includes: and the adjusting module is configured to respond to the acquired layout result of the components in the current image, and adjust the bounding box information of each component aiming at each component with the same layout to obtain adjusted bounding box information corresponding to each component.
In some optional implementations of this embodiment, the apparatus further includes: an update module configured to take the current image, the category name of each component, and the adjusted bounding box information as a new sample image set; and a training module configured to train the instance segmentation model based on the new sample image set.
In some optional implementations of this embodiment, the apparatus further includes: a code generation module configured to, in response to obtaining the adjusted bounding box information corresponding to each component, input the adjusted bounding box information corresponding to each component and the layout result into UI2Code, to obtain repeated-layout code.
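UI2Code's interface is not detailed in this embodiment, so the stand-in below (emit_repeated_layout, a name of our own, emitting HTML as an assumed target) only illustrates the idea: components sharing a group id become one item template declared with a repeat count rather than several near-identical blocks.

    from collections import Counter

    def emit_repeated_layout(adjusted_boxes, group_ids):
        # adjusted_boxes[i] and group_ids[i] describe component i; each
        # group is emitted once, sized from its first member's adjusted box.
        lines = []
        for gid, count in Counter(group_ids).items():
            x1, y1, x2, y2 = adjusted_boxes[group_ids.index(gid)]
            lines.append(
                f'<div class="group-{gid}" data-repeat="{count}" '
                f'style="width:{x2 - x1}px;height:{y2 - y1}px"></div>'
            )
        return "\n".join(lines)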
The layout detection apparatus provided by the above embodiment of the present disclosure, in response to obtaining a current image including at least one component, identifies the current image using the instance segmentation model to obtain the bounding box information and category name of each component; then generates a marker image corresponding to the current image based on the bounding box information of each component; and finally inputs the bounding box information, the category name and the marker image of each component into the long short-term memory network detection model to output the layout result of the components in the current image, the layout result including the ordering and grouping result among the components in the current image. Because the layout result is extracted for all components in the image, the grouping result can be obtained from the interdependencies among the components; global information among all parts of the current image is thereby taken into account, which improves the accuracy of the layout result.
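As a concrete illustration of the marker-image step in this summary, one simple encoding (our assumption; the embodiment does not mandate it) is a single-channel canvas with each component's bounding box filled in:

    import numpy as np

    def make_marker_image(image_shape, boxes):
        # image_shape: (height, width) of the current image;
        # boxes: one (x1, y1, x2, y2) bounding box per component.
        height, width = image_shape
        marker = np.zeros((height, width), dtype=np.uint8)
        for x1, y1, x2, y2 in boxes:
            marker[int(y1):int(y2), int(x1):int(x2)] = 255
        return marker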
Those skilled in the art will appreciate that the above-described apparatus may also include other well-known structures, such as a processor and a memory, which are not shown in fig. 7 so as not to unnecessarily obscure the embodiments of the present disclosure.
Referring now to fig. 8, a schematic diagram of an electronic device (e.g., the terminal device in fig. 1) 800 suitable for implementing embodiments of the present disclosure is shown.
As shown in fig. 8, the electronic device 800 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 802 or a program loaded from a storage apparatus 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the electronic device 800. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following apparatuses may be connected to the I/O interface 805: an input apparatus 806 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; an output apparatus 807 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; a storage apparatus 808 including, for example, a magnetic tape, a hard disk, and the like; and a communication apparatus 809. The communication apparatus 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various apparatuses, it should be understood that not all of the illustrated apparatuses are required to be implemented or provided; more or fewer apparatuses may alternatively be implemented or provided. Each block shown in fig. 8 may represent one apparatus or may represent multiple apparatuses as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 809, or installed from the storage apparatus 808, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: an electrical wire, an optical cable, RF (radio frequency), or the like, or any suitable combination of the foregoing.
The computer-readable medium may be included in the electronic device, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a current image including at least one component, and identify the current image using an instance segmentation model to obtain bounding box information and a category name of each component; generate a marker image corresponding to the current image based on the bounding box information of each component; and input the bounding box information, the category name and the marker image of each component into a long short-term memory network detection model, and output a layout result of the components in the current image, wherein the layout result includes an ordering and grouping result among the components in the current image.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages or any combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or by hardware. The described modules may also be provided in a processor, which may be described as: a processor including an identification module, a generation module, and an output module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves; for example, the identification module may also be described as "a module that, in response to obtaining a current image including at least one component, identifies the current image using an instance segmentation model to obtain bounding box information and a category name of each component".
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure.

Claims (20)

1. A layout detection method, the method comprising:
in response to acquisition of a current image comprising at least one component, identifying the current image by using an instance segmentation model to obtain bounding box information and a category name of each component;
generating a marker image corresponding to the current image based on the bounding box information of each component;
inputting the bounding box information, the category name and the marker image of each component into a long short-term memory network detection model, and outputting a layout result of the components in the current image, wherein the layout result comprises an ordering and grouping result among the components in the current image.
2. The method of claim 1, wherein the inputting the bounding box information, the category name and the marker image of each component into a long short-term memory network detection model and outputting a layout result of the components in the current image comprises:
inputting the bounding box information and the category name of each component into an encoding model in the long short-term memory network detection model, and outputting text feature information corresponding to each component;
inputting the marker image into an image processing model in the long short-term memory network detection model, and outputting image feature information corresponding to the marker image;
and inputting the text feature information and the image feature information into a decoding model in the long short-term memory network detection model, and outputting the layout result of the components in the current image.
3. The method according to claim 1 or 2, wherein the instance segmentation model is obtained based on:
acquiring a sample image set, wherein the sample image set comprises at least one sample image, and each sample image comprises at least one component;
determining and labeling the category name and bounding box information of each component in each sample image;
and taking each sample image as input, taking the category name and bounding box information of each component in each sample image as expected output, and training to obtain the instance segmentation model.
4. The method of claim 1, wherein the method further comprises:
performing a state similarity search on stored images based on the layout result of the components in the current image, to obtain a plurality of similar images;
and comparing the current image with each similar image, and determining whether the current image and each similar image are in the same state.
5. The method of claim 4, wherein said comparing the current image with each similar image and determining whether the current image and each similar image are in the same state comprises:
calculating the intersection-over-union (IoU) ratio between the layout result of the components in the current image and the layout result of the components in each similar image;
comparing the pixels of the current image with the pixels of each similar image to obtain a comparison result between the pixels of the current image and the pixels of each similar image;
and comparing the IoU ratio and the comparison result with preset thresholds, and determining whether the current image and each similar image are in the same state.
6. The method of claim 4 or 5, wherein the method further comprises:
deleting the current image in response to determining that the current image is in the same state as the similar image;
in response to determining that the current image is not in the same state as the similar image, storing the current image.
7. The method of claim 1, wherein the method further comprises:
and in response to obtaining the layout result of the components in the current image, for each component having the same layout, adjusting the bounding box information of the component to obtain adjusted bounding box information corresponding to each component.
8. The method of claim 7, wherein the method further comprises:
taking the current image, the category name of each component and the adjusted bounding box information as a new sample image set;
and training the instance segmentation model based on the new sample image set.
9. The method of claim 7, wherein the method further comprises:
and in response to obtaining the adjusted bounding box information corresponding to each component, inputting the adjusted bounding box information corresponding to each component and the layout result into UI2Code to obtain repeated-layout code.
10. A layout detection apparatus comprising:
an identification module configured to, in response to acquisition of a current image comprising at least one component, identify the current image by using an instance segmentation model to obtain bounding box information and a category name of each component;
a generating module configured to generate a marker image corresponding to the current image based on bounding box information of each component;
and an output module configured to input the bounding box information, the category name and the marker image of each component into a long short-term memory network detection model, and output a layout result of the components in the current image, wherein the layout result comprises an ordering and grouping result among the components in the current image.
11. The apparatus of claim 10, wherein the output module comprises:
an encoding unit configured to input the bounding box information and the category name of each component into an encoding model in the long short-term memory network detection model and output text feature information corresponding to each component;
an image processing unit configured to input the marker image into an image processing model in the long short-term memory network detection model and output image feature information corresponding to the marker image;
and a decoding unit configured to input the text feature information and the image feature information into a decoding model in the long short-term memory network detection model and output the layout result of the components in the current image.
12. The apparatus of claim 10 or 11, wherein the instance segmentation model is obtained based on:
acquiring a sample image set, wherein the sample image set comprises at least one sample image, and each sample image comprises at least one component;
determining and labeling the category name and bounding box information of each component in each sample image;
and taking each sample image as input, taking the category name and bounding box information of each component in each sample image as expected output, and training to obtain the instance segmentation model.
13. The apparatus of claim 10, wherein the apparatus further comprises:
a search module configured to perform a state similarity search on stored images based on the layout result of the components in the current image to obtain a plurality of similar images;
and a determination module configured to compare the current image with each similar image and determine whether the current image and each similar image are in the same state.
14. The apparatus of claim 13, wherein the determination module comprises:
a calculation unit configured to calculate the intersection-over-union (IoU) ratio between the layout result of the components in the current image and the layout result of the components in each similar image;
a comparison unit configured to compare the pixels of the current image with the pixels of each similar image to obtain a comparison result between the pixels of the current image and the pixels of each similar image;
and a determination unit configured to compare the IoU ratio and the comparison result with preset thresholds, and determine whether the current image and each similar image are in the same state.
15. The apparatus of claim 13 or 14, wherein the apparatus further comprises:
a deletion module configured to delete the current image in response to determining that the current image is in the same state as the similar image;
a storage module configured to store the current image in response to determining that the current image is not in the same state as the similar image.
16. The apparatus of claim 10, wherein the apparatus further comprises:
an adjustment module configured to, in response to obtaining the layout result of the components in the current image, perform an adjustment operation on the bounding box information of each component having the same layout, to obtain adjusted bounding box information corresponding to each component.
17. The apparatus of claim 16, wherein the apparatus further comprises:
an update module configured to take the current image, the category name of each component and the adjusted bounding box information as a new sample image set;
and a training module configured to train the instance segmentation model based on the new sample image set.
18. The apparatus of claim 16, wherein the apparatus further comprises:
and a code generation module configured to, in response to obtaining the adjusted bounding box information corresponding to each component, input the adjusted bounding box information corresponding to each component and the layout result into UI2Code to obtain repeated-layout code.
19. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
20. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-9.
CN202011576492.0A 2020-12-28 2020-12-28 Layout detection method and device Active CN112651942B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011576492.0A CN112651942B (en) 2020-12-28 2020-12-28 Layout detection method and device
PCT/KR2021/016801 WO2022145723A1 (en) 2020-12-28 2021-11-16 Method and apparatus for detecting layout

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011576492.0A CN112651942B (en) 2020-12-28 2020-12-28 Layout detection method and device

Publications (2)

Publication Number Publication Date
CN112651942A 2021-04-13
CN112651942B 2023-04-07

Family

ID=75363358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011576492.0A Active CN112651942B (en) 2020-12-28 2020-12-28 Layout detection method and device

Country Status (2)

Country Link
CN (1) CN112651942B (en)
WO (1) WO2022145723A1 (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL273417B1 (en) * 2014-04-29 2024-03-01 Wix Com Ltd System and method for the creation and use of visually-diverse high-quality dynamic layouts
US20180349110A1 (en) * 2017-05-31 2018-12-06 Wipro Limited Method and layout identification system for facilitating identification of a layout of a user interface
US10095925B1 (en) * 2017-12-18 2018-10-09 Capital One Services, Llc Recognizing text in image data
US11080559B2 (en) * 2018-09-18 2021-08-03 Focal Systems, Inc. Product onboarding machine
WO2020240538A1 (en) * 2019-05-28 2020-12-03 Wix.Com Ltd. System and method for integrating user feedback into website building system services

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106910195A (en) * 2017-01-22 2017-06-30 北京奇艺世纪科技有限公司 A kind of web page layout monitoring method and device
CN110084257A (en) * 2018-01-26 2019-08-02 北京京东尚科信息技术有限公司 Method and apparatus for detecting target
CN109801276A (en) * 2019-01-14 2019-05-24 沈阳联氪云影科技有限公司 A kind of method and device calculating ambition ratio

Also Published As

Publication number Publication date
WO2022145723A1 (en) 2022-07-07
CN112651942B (en) 2023-04-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant