CN111460355A - Page parsing method and device - Google Patents

Page parsing method and device

Info

Publication number
CN111460355A
CN111460355A (application CN202010304984.8A; granted publication CN111460355B)
Authority
CN
China
Prior art keywords
controls
target
control
information
target picture
Prior art date
Legal status
Granted
Application number
CN202010304984.8A
Other languages
Chinese (zh)
Other versions
CN111460355B (en)
Inventor
王若
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010304984.8A
Publication of CN111460355A
Application granted
Publication of CN111460355B
Legal status: Active

Classifications

    • G – PHYSICS
    • G06 – COMPUTING; CALCULATING OR COUNTING
    • G06F – ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 – Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 – Details of database functions independent of the retrieved data types
    • G06F 16/95 – Retrieval from the web
    • G06F 16/958 – Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • Y – GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 – TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D – CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 – Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

An embodiment of the present specification provides a page parsing method and device. The method includes: when a target page is to be parsed, acquiring a target picture that contains the content of the target page; performing control parsing on the target picture to determine attribute information of a plurality of controls included in the target picture, the attribute information comprising coordinates, categories, and semantic information; performing layout generation based on the target picture to determine layout information of the plurality of controls; and obtaining a parsing result for the target page based on the attribute information and the layout information of the plurality of controls.

Description

Page parsing method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for page parsing.
Background
At present, pages need to be parsed in various service scenarios so that the service operations corresponding to those scenarios can be executed according to the parsing result. For example, in a UI automation test scenario, a test page needs to be parsed to determine whether it contains an expected control (also referred to as an element, such as text, a button, or a pop-up window).
A page generally contains a large amount of content, and the control information of its controls can be complex: a page may include controls of several different categories, there may be many controls of the same category, and controls of the same category may be distributed across different areas of the page. Parsing such a page is therefore difficult, and an effective parsing result often cannot be obtained. In view of this, an efficient solution for parsing pages is desirable.
Disclosure of Invention
The embodiments of this specification provide a page parsing method and device, which address the current difficulty of parsing pages effectively.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
in a first aspect, a method for page parsing is provided, including:
acquiring a target picture, wherein the target picture comprises content in a target page to be analyzed;
carrying out control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
performing layout generation based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
In a second aspect, a page parsing apparatus is provided, including:
the acquisition unit is used for acquiring a target picture, and the target picture comprises the content in a target page to be analyzed;
the control analyzing unit is used for carrying out control analysis on the target picture and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
the layout generating unit is used for generating a layout based on the target picture and determining the layout information of the plurality of controls;
and the determining unit is used for obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
In a third aspect, an electronic device is provided, which includes:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a target picture, wherein the target picture comprises content in a target page to be analyzed;
carrying out control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
performing layout generation based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
In a fourth aspect, a computer-readable storage medium is presented, the computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of:
acquiring a target picture, wherein the target picture comprises content in a target page to be analyzed;
carrying out control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
performing layout generation based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
The technical schemes adopted by one or more embodiments of this specification can achieve the following technical effects:
When the target page is parsed, control parsing is performed on it to obtain the attribute information of the plurality of controls it contains, and layout generation is additionally performed to obtain the layout information of those controls. Since the attribute information may include the controls' coordinates, categories, and semantic information, combining the parsed attribute information with the generated layout information yields a more effective parsing result for the target page.
Drawings
To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of this specification; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a page parsing method according to an embodiment of the present specification;
FIG. 2 is a schematic flow diagram of generating first-level layout information based on a target picture according to an embodiment of the present specification;
FIG. 3 is a schematic diagram of a target picture according to one embodiment of the present description;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present description;
FIG. 5 is a schematic structural diagram of a page parsing device according to an embodiment of the present specification.
Detailed Description
To help those skilled in the art better understand the technical solutions in the embodiments of this specification, these solutions are described clearly and completely below with reference to the drawings in one or more embodiments. Obviously, the described embodiments are only a part of the embodiments of this specification, not all of them. All other embodiments obtained by a person skilled in the art without creative effort based on the embodiments in this specification fall within the protection scope of this document.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a page parsing method according to an embodiment of the present specification. The method is as follows.
S102: and acquiring a target picture, wherein the target picture comprises the content in the target page to be analyzed.
In S102, when the target page is parsed, a target picture corresponding to it may be acquired so that the page can be parsed from the picture. The target page may be a page in an app, a page in a browser, or the like, and the target picture contains the content of that page.
In this embodiment, the target picture can be obtained by taking a screenshot of the target page; for example, when the login page of an app on a smartphone is parsed, the target picture can be captured as a screenshot. In other implementations, the target picture may also be obtained by photographing the target page, among other approaches not enumerated here.
S104: and carrying out control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information.
A target page usually includes multiple controls, such as buttons, icons/pictures, text, progress bars, toggle switches, "more" buttons, edit boxes, pop-up windows, check boxes, back buttons, and close buttons, and parsing the page usually requires obtaining information about these controls. Accordingly, in S104, after the target picture is acquired, control parsing may be performed on it to obtain the attribute information of the multiple controls included in the target picture (target page), where the attribute information may include the controls' coordinates, categories, and semantic information.
The coordinates of a control are its position coordinates in the target picture. The category of a control is, for example, text, button, icon/picture, or progress bar. The semantic information of a control can be the character string it contains, or a description of what the control specifically is: if the control's category is text, its semantic information can be the character string of that text; if the control is the icon of application A, its semantic information can be the name of application A.
In this embodiment, control parsing of the target picture specifically includes the following steps.
Firstly, control detection is performed on the target picture based on a preset detection model to obtain the control information of the plurality of controls it contains.
The preset detection model may be the existing Yolo3-tiny model, a general-purpose model that can detect various categories of controls. Note, however, that by default the Yolo3-tiny model may be configured to detect hundreds or even thousands of different categories, whereas the categories of controls in a target picture (target page) generally number no more than a hundred or so. To use the Yolo3-tiny model for detecting the controls of the target picture, its head must therefore be modified. The head determines the number of control categories the model outputs; by modifying it, the number of categories the model outputs, and thus the categories it can detect, can be adjusted. The specific modification depends on the categories of controls that need to be detected and is not limited here.
For example, assuming that the Yolo3-tiny model can detect 1000 types of controls by default, and the types of the controls included in the target picture are 11 (these 11 types may be custom types), the header portion of the Yolo3-tiny model may be modified, so that the Yolo3-tiny model can detect the controls of these 11 types.
Of course, in other application scenarios, if the category of the control is divided into finer granularity or coarser granularity, the head portion of the Yolo3-tiny model may also be correspondingly adjusted, so that the Yolo3-tiny model may flexibly detect different types and numbers of controls in different application scenarios.
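The head-size arithmetic behind this modification can be sketched. In YOLO-family detectors, each detection head's final convolution emits num_anchors × (num_classes + 5) channels: four box coordinates, one objectness score, and one score per class. Retargeting the model to 11 custom control categories means recomputing that filter count (the class counts below are illustrative, not taken from the patent):

```python
def yolo_head_filters(num_classes: int, num_anchors_per_scale: int = 3) -> int:
    # Each anchor predicts 4 box coordinates, 1 objectness score,
    # and one probability per control category.
    return num_anchors_per_scale * (num_classes + 5)

# A COCO-style 80-class head vs. a head for 11 custom control categories:
print(yolo_head_filters(80))  # 255
print(yolo_head_filters(11))  # 48
```

This is why the head must be edited rather than merely retrained: the output tensor's channel count changes with the category count.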
It should be further noted that the Yolo3-tiny model is generally intended to run on a PC; to use it on a client device (such as a smartphone), the model must be converted so that it can run on the client.
It should be understood that in other implementations, other general models (different types of controls can be detected) may also be used for the control parsing of the target picture, and are not illustrated here.
When control detection is performed on the target picture with the preset detection model, the target picture serves as the model's input, and the model's output is the control information of the multiple controls in the picture, where the control information may include each control's category and coordinates. Optionally, the output may further include a confidence for each detection, characterizing how reliable each detected control is.
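As a minimal sketch of how such detection output might be post-processed into control records, dropping low-confidence detections (the tuple layout and 0.5 threshold are illustrative assumptions, not the patent's format):

```python
def parse_detections(raw_detections, min_confidence=0.5):
    """Convert raw detections into control records, dropping low-confidence
    ones. Each raw detection is assumed to be a tuple
    (category, x1, y1, x2, y2, confidence) -- a hypothetical format."""
    controls = []
    for category, x1, y1, x2, y2, conf in raw_detections:
        if conf >= min_confidence:
            controls.append({
                "category": category,
                "coords": (x1, y1, x2, y2),
                "confidence": conf,
            })
    return controls

detections = [("button", 10, 20, 110, 60, 0.92),
              ("icon", 30, 200, 80, 250, 0.31)]
print(parse_detections(detections))  # only the "button" record survives
```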
Secondly, the second controls, i.e., those among the plurality of controls whose category is icon/picture, are classified based on a preset classification model to obtain their semantic information.
After the categories and coordinates of the controls in the target picture are obtained, the controls of the icon/picture category can be subdivided to obtain semantic information reflecting their service scenarios, which enriches the controls' dimensions and conveys more scenario information. For example, if the target picture includes two icon controls, the icon of application A and the icon of application B, the two need to be subdivided so that the semantic information of one icon is the name of application A and that of the other is the name of application B.
In this embodiment, the classification of icon/picture controls (represented below as second controls for ease of distinction) may be implemented with a preset classification model.
The preset classification model can be the existing MobileNet V1 model, which can be trained on icons from different service scenarios and on background images representing UI anomalies. When a second control is classified with this model, the control serves as the model's input and the model outputs the control's semantic information.
As with the detection model, the MobileNet V1 model is generally intended to run on a PC; to use it on a client (e.g., a smartphone), it must be converted so that it can run there. For the conversion method, refer to the description above for the Yolo3-tiny model; it is not repeated here.
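Classification operates on the icon's image region, which can be cropped out of the target picture using the coordinates produced by the detection step. A minimal pure-Python sketch (the list-of-rows image format and (x1, y1, x2, y2) coordinate convention are illustrative assumptions):

```python
def crop_control(image, coords):
    """Crop a control's region out of an image given its detected
    coordinates. `image` is a row-major list of pixel rows and
    `coords` is (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = coords
    return [row[x1:x2] for row in image[y1:y2]]

# A toy 4x4 "image"; the icon classifier would receive such a crop.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
print(crop_control(image, (1, 1, 3, 3)))  # [[5, 6], [9, 10]]
```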
Finally, OCR recognition is performed on the target picture to obtain the semantic information of the third controls, i.e., the controls of the remaining categories.
The step above yields the semantic information of the second controls (icon/picture category); for the controls of the other categories (represented below as third controls for ease of distinction), semantic information must also be obtained during control parsing.
Since the third controls are not of the icon/picture category, their semantic information can be obtained through OCR. Specifically, OCR recognition may be performed on the target picture to obtain, for each third control, a recognition result comprising a character-string description and a confidence. If the confidence is not less than a set threshold, the recognized character string is used as the semantic information of that third control.
Through the three steps above, the coordinates, categories, and semantic information of the controls in the target picture are obtained; fusing them yields the attribute information of the controls.
Optionally, the OCR result may also include the coordinates of the third controls. Because coordinates obtained by OCR tend to be more accurate, they may replace the coordinates of the third controls obtained by control detection with the preset detection model.
Specifically, for any third control, it is determined whether the confidence of its OCR result is not less than a set confidence threshold. If so, the OCR result is considered reliable and the control's detected coordinates are replaced with the OCR coordinates; if not, the OCR result is considered unreliable and no replacement is performed. The confidence threshold may be determined according to the actual situation and is not limited here.
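The confidence-gated replacement can be sketched as follows (the dictionary field names and the 0.8 threshold are assumptions for illustration, not values from the patent):

```python
def merge_ocr_result(control, ocr_result, confidence_threshold=0.8):
    """Overwrite a detected control's coordinates and semantic string with
    the OCR result when the OCR confidence clears the threshold; otherwise
    keep the detector's coordinates unchanged."""
    if ocr_result["confidence"] >= confidence_threshold:
        control = dict(control)  # avoid mutating the caller's record
        control["coords"] = ocr_result["coords"]
        control["semantic"] = ocr_result["text"]
    return control

control = {"category": "text", "coords": (10, 10, 100, 40)}
ocr = {"text": "Log in", "coords": (12, 11, 98, 39), "confidence": 0.95}
print(merge_ocr_result(control, ocr))  # coordinates replaced by OCR's
```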
S106: and performing layout generation based on the target picture, and determining layout information of the plurality of controls.
In S106, on the basis of the attribute information of the multiple controls parsed from the target picture, layout generation may further be performed on the target picture to obtain the layout information of those controls, so that a parsing result for the target page can be obtained from both the attribute information and the layout information.
The layout information of the controls describes how they are laid out in the target picture (target page). It comprises first-level and second-level layout information: the first-level layout information represents the controls' row-based layout, for example in which row region of the target picture a control lies, while the second-level layout information represents their column-based layout, for example whether a control has child controls, a child control being one that can be merged with it.
In this embodiment, layout generation proceeds as follows: first, the target picture is segmented with a preset image-morphology algorithm to obtain the first-level layout information of the controls; second, column-based layout analysis is performed on the controls on the basis of the first-level layout information to obtain their second-level layout information; finally, the second-level layout information is added to the first-level layout information to obtain the controls' layout information.
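One hypothetical in-memory representation of the two-level layout just described, with first-level row regions and second-level child relations (all field and control names are illustrative assumptions):

```python
# First-level entries are row regions (pixel span plus the controls they
# contain); the "columns" map carries second-level, column-based relations.
layout = [
    {
        "row_span": (0, 120),            # pixel rows covered by this region
        "controls": ["back_button", "title_text"],
        "columns": {"title_text": {"children": []}},
    },
    {
        "row_span": (121, 300),
        "controls": ["username_edit"],
        "columns": {"username_edit": {"children": ["hint_text"]}},
    },
]

def controls_in_row(layout, y):
    """Return the controls of the row region containing pixel row y."""
    for region in layout:
        lo, hi = region["row_span"]
        if lo <= y <= hi:
            return region["controls"]
    return []

print(controls_in_row(layout, 150))  # ['username_edit']
```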
Each step of the above-described layout generation will be described in detail below.
When a preset image morphology algorithm is adopted to segment a target picture to obtain the first-level layout information of a plurality of controls:
Firstly, the target picture is preprocessed to obtain a binarized picture with the longitudinal noise removed.
A binarized picture is one whose pixel values are either 0 (black) or 1 (white). Since the target picture is usually an RGB picture, it is first converted into a grayscale picture; a binarization operation is then applied to the grayscale picture to obtain the binarized picture.
Because the binarized picture usually contains longitudinal noise that would interfere with segmentation and reduce its accuracy, the binarized picture must also be denoised longitudinally. Specifically, a morphological structuring element is constructed and used to apply erosion and then dilation to the binarized picture; these operations remove isolated noise points in the longitudinal direction while preserving the key elements, leaving a cleaner binarized picture with the longitudinal noise removed.
The ratio of the width of the structuring element to the width of the target picture should not exceed a preset ratio, which may be 1/30 or may be set according to the actual situation; it is not limited here.
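A minimal pure-Python sketch of this preprocessing, assuming a list-of-rows grayscale image: binarization followed by a vertical morphological opening (erosion, then dilation) with a height×1 structuring element. A production implementation would typically use OpenCV's erode/dilate instead:

```python
def binarize(gray, threshold=128):
    """Threshold a grayscale image (list of pixel rows) to 0/1."""
    return [[0 if p < threshold else 1 for p in row] for row in gray]

def vertical_open(binary, height=3):
    """Opening (erosion then dilation) with a vertical height x 1
    structuring element, removing isolated black noise pixels while
    preserving tall black structures such as row boundaries."""
    rows, cols = len(binary), len(binary[0])
    half = height // 2

    def erode(img):
        # A black pixel survives only if its full vertical window is black.
        out = [[1] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                window = [img[k][c] for k in range(max(0, r - half),
                                                  min(rows, r + half + 1))]
                if len(window) == height and all(p == 0 for p in window):
                    out[r][c] = 0
        return out

    def dilate(img):
        # Grow surviving black pixels back vertically.
        out = [[1] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                window = [img[k][c] for k in range(max(0, r - half),
                                                  min(rows, r + half + 1))]
                if any(p == 0 for p in window):
                    out[r][c] = 0
        return out

    return dilate(erode(binary))
```

On a picture with a tall dark column and one isolated dark pixel, the opening removes the isolated pixel and keeps the column, which is exactly the "remove longitudinal noise, keep key elements" behavior described above.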
Secondly, at least one row boundary is determined from the obtained binarized picture.
A row boundary in the target picture is usually dark in color; after binarization it becomes black, its pixels taking the value 0 (white being 1). Row boundaries can therefore be found by traversing the pixels of the binarized picture row by row and determining a number of candidate boundaries: a row is a candidate boundary when the number of effective pixels in it, i.e., pixels with value 0, is not less than a set threshold. The threshold can be set according to the actual situation and is not limited here; preferably, it is not less than 90% of the number of pixels in one row.
For example, if a row of pixels in the target picture is 1080 pixels wide and the threshold is 90% of that number, then during row scanning a row is treated as a candidate boundary when 90% or more of its pixels have value 0.
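The row scan can be sketched directly from this rule (the 90% ratio follows the preferred threshold above):

```python
def find_candidate_boundaries(binary, ratio=0.9):
    """Scan a binarized picture row by row; a row whose share of black
    (0) pixels reaches `ratio` is treated as a candidate row boundary."""
    candidates = []
    for y, row in enumerate(binary):
        black = sum(1 for p in row if p == 0)
        if black >= ratio * len(row):
            candidates.append(y)
    return candidates

binary = [[1, 1, 1, 1],
          [0, 0, 0, 0],   # fully black -> candidate boundary
          [1, 0, 1, 1],
          [0, 0, 0, 1]]   # 75% black -> below the 90% ratio
print(find_candidate_boundaries(binary))  # [1]
```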
Since some candidate boundaries may not be true row boundaries, it must then be determined which of them actually are. Because the row boundaries in the target picture have a certain height, the row boundaries can be determined from the candidates based on that height.
Specifically, for any two positionally adjacent candidate boundaries (the first and second candidate boundaries below), it is determined whether the distance between them is not less than a set height threshold, which can be understood as the minimum height between two genuinely adjacent row boundaries. The threshold may be determined according to the actual situation; preferably, it is 5% of the height of the target picture.
If the distance between the first and second candidate boundaries is not less than the set height threshold, the distance condition between row boundaries is satisfied and the two candidates are determined to be two row boundaries. Otherwise, the two candidates are considered to belong to the same row boundary and are merged.
After the above operations are performed on a plurality of candidate boundaries, at least one line boundary may be finally obtained.
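The merging rule above, keeping a candidate only when it lies at least the height threshold below the previously kept boundary, can be sketched as:

```python
def merge_boundaries(candidates, min_gap):
    """Collapse candidate boundaries closer together than `min_gap`
    into one row boundary, keeping the first of each cluster."""
    boundaries = []
    for y in candidates:
        if not boundaries or y - boundaries[-1] >= min_gap:
            boundaries.append(y)
    return boundaries

# Candidates at rows 10 and 12 belong to the same (thick) boundary line.
print(merge_boundaries([10, 12, 80, 200], min_gap=30))  # [10, 80, 200]
```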
Finally, the first-level layout information of the controls is determined from the obtained row boundaries and the attribute information of the controls parsed in S104.
Specifically, the row boundaries yield a number of row regions, any one of which consists of two positionally adjacent row boundaries. The coordinates of the controls can then be read from the attribute information parsed in S104 and, combined with the region coordinates of the row regions, the controls are divided among the row regions, giving the first-level layout information of the controls.
The first-level layout information may include: the pixels each row region occupies (for example, a row region lies between certain rows of pixels in the target picture), which controls each row region contains, the attribute information of those controls, and so on.
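Putting the two previous steps together, assigning controls to the row regions formed by adjacent boundaries might look like this (the (name, (x1, y1, x2, y2)) control format is an illustrative assumption; a control is placed by its vertical center):

```python
def assign_controls_to_rows(boundaries, controls):
    """Pair adjacent row boundaries into row regions and place each
    control into the region containing its vertical center."""
    regions = [{"row_span": (lo, hi), "controls": []}
               for lo, hi in zip(boundaries, boundaries[1:])]
    for name, (x1, y1, x2, y2) in controls:
        cy = (y1 + y2) // 2
        for region in regions:
            lo, hi = region["row_span"]
            if lo <= cy < hi:
                region["controls"].append(name)
                break
    return regions

regions = assign_controls_to_rows(
    [0, 100, 220],
    [("title", (10, 20, 200, 60)), ("login_button", (40, 130, 180, 180))])
print(regions)  # 'title' in region (0, 100), 'login_button' in (100, 220)
```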
Optionally, after the first-level layout information of the controls is determined, the semantic information corresponding to it may be further determined. Specifically, for any of the row regions, the following operations may be performed: determine the categories of the controls included in the row region, and determine the region's semantic information from those categories.
Before the semantic information of a row region is determined from the control categories, a lookup table may be established in advance, storing correspondences between control categories and their semantic information; the correspondences may be defined by the specific service party. The semantic information of a row region is then found by looking up the region's control categories in the table and taking the result as the region's semantic information.
For example, depending on the service scenario, a "user name input" label may be attached to the combination of an edit-box control and background prompt text; that is, a correspondence is established between that combination of categories and the semantic information "user name input". When the layout semantics are generated, a row region containing an edit box and background prompt text is then determined to have the semantic information "user name input".
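The lookup table could be as simple as a dictionary keyed by the set of control categories in a row region (the category names and labels below are illustrative, following the example above):

```python
# Hypothetical service-party-defined mapping from a row region's set of
# control categories to its semantic label.
SEMANTIC_TABLE = {
    frozenset({"edit_box", "hint_text"}): "user name input",
    frozenset({"edit_box", "eye_icon"}): "password input",
}

def row_semantics(control_categories):
    """Look up a row region's semantic label from its control categories;
    regions with no matching entry carry no semantics (None)."""
    return SEMANTIC_TABLE.get(frozenset(control_categories))

print(row_semantics(["edit_box", "hint_text"]))  # user name input
print(row_semantics(["icon"]))                   # None
```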
After the layout semantic information is obtained by the method, the layout semantic information can be used as the attribute of the first-level layout information, so that more semantic attributes can be added to the layout.
It should be noted that, in practical applications, some row areas may have no semantic information. For example, if all the controls included in a certain row area are icons/pictures, the row area has no semantic information, and in this case it is not necessary to determine corresponding semantic information for it.
To facilitate understanding of the overall process of determining the primary layout information for the plurality of controls described above, reference may be made to fig. 2. Fig. 2 is a flow diagram of generating primary layout information based on a target picture according to an embodiment of the present description, which may include the following steps.
S201: The target picture is converted into a grayscale picture.
S202: A binarization operation is performed on the grayscale picture to obtain a binarized picture.
S203: a morphological structuring element is constructed.
The ratio of the width of the morphological structuring element to the width of the target picture is not greater than a preset ratio.
S204: Longitudinal denoising processing is performed on the binarized picture based on the morphological structuring element, to obtain a binarized picture with the longitudinal noise removed.
S205: The pixel points in the binarized picture are traversed row by row to determine a plurality of candidate boundaries.
The number of effective pixel points in the row where any candidate boundary is located is not less than a set threshold.
S206: it is determined whether a distance between the first candidate boundary and the second candidate boundary is not less than a set height threshold.
The first candidate boundary and the second candidate boundary are adjacent candidate boundaries at any two positions in the plurality of candidate boundaries.
If the judgment result is that the distance between the first candidate boundary and the second candidate boundary is not less than the set height threshold, executing S207; otherwise, S208 is performed.
S207: the first candidate boundary and the second candidate boundary are determined as two row boundaries.
S208: the first candidate boundary and the second candidate boundary are merged into one row boundary.
After performing S207 or S208, S209 may be performed.
S209: based on the at least one row boundary, a plurality of row regions is obtained.
Wherein, any line region is composed of two line boundaries with adjacent positions.
S210: The plurality of controls are divided into the plurality of line regions based on the region coordinates of the plurality of line regions and the coordinates of the plurality of controls included in the attribute information of the plurality of controls, so as to obtain the primary layout information.
S211: The semantic information corresponding to the primary layout information is determined based on the primary layout information and the categories of the controls included in the plurality of row areas.
For the specific implementation of each step in S201 to S211, reference may be made to the specific implementation of the corresponding step in S106, and the description is not repeated here.
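As a rough sketch of steps S205 to S210, the row-boundary search and the division of controls into row regions might look as follows; the threshold values, the representation of controls as dicts with a `box` field, and the center-based assignment rule are all illustrative assumptions:

```python
import numpy as np

def find_row_boundaries(binary, valid_threshold=50, height_threshold=8):
    """S205-S208: traverse the binarized picture row by row; a row whose
    count of effective (foreground) pixels is not less than valid_threshold
    is a candidate boundary, and adjacent candidates closer than
    height_threshold are merged into one row boundary. Both thresholds
    are illustrative assumptions."""
    candidates = [y for y in range(binary.shape[0])
                  if np.count_nonzero(binary[y]) >= valid_threshold]
    boundaries = []
    for y in candidates:
        if boundaries and y - boundaries[-1] < height_threshold:
            boundaries[-1] = (boundaries[-1] + y) // 2  # merge (S208)
        else:
            boundaries.append(y)                        # keep both (S207)
    return boundaries

def assign_controls_to_rows(boundaries, controls):
    """S209-S210: adjacent boundary pairs delimit row regions; each control
    (a dict with a 'box' of (x1, y1, x2, y2) coordinates, an assumed
    representation) is placed in the region containing its vertical center."""
    regions = list(zip(boundaries, boundaries[1:]))
    layout = {i: [] for i in range(len(regions))}
    for ctrl in controls:
        cy = (ctrl["box"][1] + ctrl["box"][3]) / 2
        for i, (top, bottom) in enumerate(regions):
            if top <= cy < bottom:
                layout[i].append(ctrl)
                break
    return regions, layout
```

The grayscale conversion, binarization, and morphological denoising of S201–S204 would precede this, for example with an image-processing library such as OpenCV.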
In this embodiment, after the primary layout information of the plurality of controls is determined by the above-mentioned method, the column-based layout analysis may be performed on the plurality of controls based on the primary layout information, so as to obtain the secondary layout information of the plurality of controls.
In one implementation, when determining the secondary layout information of the multiple controls, for any line region of multiple line regions corresponding to the primary layout information (i.e., multiple line regions obtained based on at least one line boundary in the target picture), the following operations may be performed:
first, a plurality of target controls included in a row region is determined.
In this embodiment, a target control may be understood as a control that may have sub-controls. Considering that sub-controls tend to exist when the area occupied by a control is large and the distances between the control and other controls are approximately the same, the plurality of target controls at least need to satisfy the following two conditions: the area of the occupied region is not less than a set area; and the longitudinal spacing or the lateral spacing between any two adjacent target controls is the same (or approximately the same). The area occupied by a target control may be taken as the area of the maximum circumscribed rectangular frame corresponding to the target control in the target picture, and the set area can be determined according to actual conditions.
Optionally, if the plurality of target controls do not exist in the row area, it may be determined that the controls in the row area do not have corresponding secondary layout information, and at this time, a subsequent step of determining the secondary layout information may not be performed.
And secondly, traversing other controls in the row area according to a preset rule, and determining a plurality of sub-controls corresponding to the target controls.
In this embodiment, one target control may correspond to one sub-control, and the similarity of the position relationship between different target controls and the corresponding sub-controls is not less than the set similarity. When determining the child controls corresponding to the multiple target controls, the specific implementation manner is as follows:
for any target control, the other controls in the row area are traversed from near to far in a clockwise direction, with the target control as the center. Then, for any two target controls, it is judged whether two first controls exist among the other controls such that, relative to the same position of the two target controls, the ratio of the intersection to the union (the intersection-over-union ratio) of the areas they occupy is not less than a set ratio, where the set ratio may preferably be 0.7 to 1. The intersection-over-union ratio of the areas occupied by the two first controls describes the position similarity between the two first controls and the two target controls: the higher the intersection-over-union ratio (whose value ranges from 0 to 1), the more similar the position relationships of the two first controls relative to their respective target controls.
If two first controls exist, determining the two first controls as two sub-controls corresponding to the two target controls; if there are no such two first controls, it may be determined that the two target controls do not have child controls.
That is, for each target control, the surrounding controls can be traversed clockwise, from near to far, with the target control as the center, to determine whether the same control exists at the same position of any two target controls. To make this judgment, it may be determined whether the intersection-over-union ratio between the coordinates of the maximum circumscribed rectangular frames of some two of the other controls, taken relative to some specified position of the two target controls (which may be the upper-left corner of each target control, the center of each target control, or the like), is not less than the set ratio. If so, the two controls are considered to be the same control located at the same position of the two target controls and can be regarded as their child controls; otherwise, they are not considered the same control and cannot be regarded as child controls of the two target controls.
For ease of understanding, reference may be made to fig. 3. Fig. 3 is a schematic diagram of a target picture according to an embodiment of the present description.
In the target picture shown in fig. 3, the row area located in the middle (other row areas are not shown) includes 4 icon controls, A, B, C and D respectively, and there are 4 text controls, 1, 2, 3 and 4 respectively, below the 4 icon controls.
Based on the row area shown in fig. 3, when determining a plurality of target controls in the row area, since the area of the area occupied by the 4 icon controls is large and the lateral distances between the 4 icon controls are approximately equal, the 4 icon controls can be determined as the plurality of target controls.
Then, when determining the child controls corresponding to the 4 target controls, the 4 icon controls may each be used as a center, and the other 4 text controls may be traversed from near to far in a clockwise manner. Specifically, for icon control A, traversing clockwise around icon control A finds text control 1 in the nearby 6 o'clock direction; for icon control B, traversing clockwise around icon control B likewise finds text control 2 in the nearby 6 o'clock direction. At this time, it can be judged whether the intersection-over-union ratio of the areas occupied by the border of text control 1 and the border of text control 2, relative to the same position of icon control A and icon control B, is not less than the set ratio.
When judging whether the intersection ratio is not less than the set ratio, the icon control A and the text control 1 can be overlapped with the icon control B and the text control 2, and whether the intersection ratio of the areas occupied by the text control 1 and the text control 2 is not less than the set ratio is judged.
Specifically, the center of icon control A and the center of icon control B may be used as the same position, and icon control B is horizontally translated leftward until the center of icon control A overlaps the center of icon control B. During the translation, text control 2 also translates leftward along with icon control B, and the relative position relationship between text control 2 and icon control B is unchanged. After icon control A and text control 1 are overlapped with icon control B and text control 2, it is judged whether the intersection-over-union ratio of the areas occupied by the borders of text control 1 and text control 2 is not less than the set ratio.
As can be seen from fig. 3, the intersection-over-union ratio of the areas occupied by the respective borders of text control 1 and text control 2 is 1, which is not less than the set ratio; therefore it can be determined that text control 1 is a sub-control of icon control A, and text control 2 is a sub-control of icon control B.
Based on the same method, it can also be determined that the text control 3 is a child control of the icon control C, and the text control 4 is a child control of the icon control D.
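The translate-and-compare check in this example can be sketched by expressing each candidate child box relative to its target control's center and then computing the intersection-over-union ratio; the box representation and the 0.7 set ratio are illustrative assumptions:

```python
def iou(box_a, box_b):
    """Intersection-over-union ratio of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def relative_box(child, target):
    """Express a candidate child box relative to its target control's center,
    mirroring the horizontal translation that overlaps the two targets."""
    cx = (target[0] + target[2]) / 2
    cy = (target[1] + target[3]) / 2
    return (child[0] - cx, child[1] - cy, child[2] - cx, child[3] - cy)

def is_shared_child(child_a, target_a, child_b, target_b, set_ratio=0.7):
    """True when the two candidates occupy (nearly) the same position
    relative to their respective target controls (IoU >= set_ratio)."""
    return iou(relative_box(child_a, target_a),
               relative_box(child_b, target_b)) >= set_ratio
```

With icon controls A and B side by side and a text control directly below each, the relative boxes coincide and the ratio is 1, so both text controls are accepted as sub-controls.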
Finally, after determining a plurality of sub-controls corresponding to the plurality of target controls, the plurality of target controls and the plurality of sub-controls can be correspondingly combined to obtain secondary layout information of the plurality of target controls.
Specifically, after obtaining a plurality of sub-controls corresponding to a plurality of target controls, for any target control, the sub-controls corresponding to the target control may be merged with the target control, so as to obtain the secondary layout information of the target control.
In a second implementation manner, when determining the secondary layout information of the multiple controls, for any row area in the multiple row areas corresponding to the primary layout information, the following operations may be performed:
first, a recognition model obtained by training in advance is acquired.
The identification model can be obtained by learning and training on different controls and the control response regions corresponding to those controls, where a control response region can be understood as the hot region or effective response region of a control; for example, the control response region of an application icon can be understood as the region which, when clicked, opens the application corresponding to that icon.
Secondly, the control included in the row area is identified based on the identification model, and a control response area included in the row area is determined.
And finally, obtaining the secondary layout information of at least one control in the row area based on the identified control response area.
It should be noted that, in practical applications, when determining the secondary layout information of the multiple controls in the target picture, either the first implementation manner or the second implementation manner may be adopted; since the accuracy of the first implementation manner is higher, it may preferably be adopted.
It should be further noted that the layout information of the controls included in the target picture usually has two levels. If a tree structure is used to represent the layout information of the controls in the target picture, the depth of the tree is 2, where a root node represents the primary layout information of the controls and a child node represents the secondary layout information of the corresponding root-node control. Of course, under some special conditions, the depth of the tree may be greater than 2; that is, a child control of a control may itself have a corresponding child control. In this case, the three-level layout information corresponding to the child control's own child controls may be determined by the same method used to determine the two-level layout information. This embodiment is explained by taking layout information that includes two levels as an example.
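For illustration only, such a layout tree could be represented with a small recursive structure; the field names are assumptions, not from this description:

```python
from dataclasses import dataclass, field

@dataclass
class LayoutNode:
    """A node of the layout tree: a root node carries a control from the
    primary (row-based) layout, its children the secondary (column-based)
    layout; deeper nesting covers the special case where a child control
    has child controls of its own (three-level layout information)."""
    control_id: str
    children: list = field(default_factory=list)

    def depth(self):
        """Tree depth: 2 in the usual case of two-level layout information."""
        return 1 + max((c.depth() for c in self.children), default=0)

# Icon control A from fig. 3 together with its sub-control, text control 1.
node = LayoutNode("icon_A", [LayoutNode("text_1")])
```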
S108: and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
In S108, after obtaining the attribute information and the layout information of the multiple controls in the target page, the layout information and the attribute information of the multiple controls may be fused, so as to obtain an analysis result of the target page.
According to the technical scheme provided by one or more embodiments of the present specification, when the target page is analyzed, on the basis that the control analysis is performed on the target page to obtain the attribute information of the plurality of controls included in the target page, the plurality of controls are also subjected to layout generation to obtain the layout information of the plurality of controls, wherein the attribute information of the plurality of controls may include the coordinates, the categories, and the semantic information of the plurality of controls, and therefore, in combination with the analyzed attribute information of the plurality of controls and the generated layout information, a more effective analysis result for the target page may be obtained.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. Referring to fig. 4, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (peripheral component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
And the memory is used for storing programs. In particular, the program may include program code comprising computer operating instructions. The memory may include both memory and non-volatile storage and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the page parsing apparatus at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
acquiring a target picture, wherein the target picture comprises content in a target page to be analyzed;
carrying out control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
performing layout generation based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
The method executed by the page parsing apparatus according to the embodiment shown in fig. 4 of this specification can be applied to a processor, or can be implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present specification may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present specification may be embodied directly in a hardware decoding processor, or in a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
The electronic device may further execute the method shown in fig. 1 and implement the functions of the page parsing apparatus in the embodiment shown in fig. 1, which are not described herein again in this specification.
Of course, besides the software implementation, the electronic device of the embodiment of the present disclosure does not exclude other implementations, such as a logic device or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or a logic device.
Embodiments of the present specification also propose a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, are capable of causing the portable electronic device to perform the method of the embodiment shown in fig. 1, and in particular to perform the following:
acquiring a target picture, wherein the target picture comprises content in a target page to be analyzed;
carrying out control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
performing layout generation based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
Fig. 5 is a schematic structural diagram of a page parsing apparatus 50 according to an embodiment of the present disclosure. Referring to fig. 5, in a software implementation, the page parsing apparatus 50 may include: an obtaining unit 51, a control parsing unit 52, a layout generating unit 53 and a determining unit 54, wherein:
the acquiring unit 51 acquires a target picture, wherein the target picture comprises content in a target page to be analyzed;
the control analyzing unit 52 is configured to perform control analysis on the target picture, and determine attribute information of multiple controls included in the target picture, where the attribute information includes coordinates, categories, and semantic information;
a layout generating unit 53 that performs layout generation based on the target picture and determines layout information of the plurality of controls;
the determining unit 54 obtains an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
Optionally, the layout information of the multiple controls includes primary layout information and secondary layout information, the primary layout information represents row-based layout information of the multiple controls, and the secondary layout information represents column-based layout information of the multiple controls.
Optionally, the layout generating unit 53 performs layout generation based on the target picture, and determines the layout information of the plurality of controls, including:
segmenting the target picture based on a preset image morphology algorithm to obtain the primary layout information;
performing column-based layout analysis on the plurality of controls based on the primary layout information to obtain secondary layout information;
and adding the secondary layout information into the primary layout information to obtain the layout information of the plurality of controls.
Optionally, the layout generating unit 53 divides the target picture by using a preset image morphology algorithm to obtain the first-level layout information, including:
preprocessing the target picture to obtain a binarized picture without longitudinal noise;
determining at least one line boundary based on the binarized picture;
determining the primary layout information based on the at least one row boundary and the attribute information of the plurality of controls.
Optionally, the layout generating unit 53 performs preprocessing on the target picture to obtain a binarized picture without longitudinal noise, where the method includes:
converting the target picture into a gray picture;
carrying out binarization operation on the gray level picture to obtain a binarization picture;
constructing a morphological structuring element, wherein the ratio of the width of the morphological structuring element to the width of the target picture is not more than a preset ratio;
and carrying out longitudinal denoising processing on the binary image based on the morphological structural elements to obtain the binary image with longitudinal noise removed.
Optionally, the layout generating unit 53, based on the binarized picture, determines at least one line boundary, including:
traversing the pixel points in the binary image according to rows, and determining a plurality of candidate boundaries, wherein the number of effective pixel points in the row where any one candidate boundary is located is not less than a set threshold;
for any two adjacent first candidate boundaries and second candidate boundaries in the plurality of candidate boundaries, performing the following operations:
determining whether a distance between the first candidate boundary and the second candidate boundary is not less than a set height threshold;
if so, determining the first candidate boundary and the second candidate boundary as two line boundaries;
if not, combining the first candidate boundary and the second candidate boundary into a line boundary.
Optionally, the layout generating unit 53 determines the primary layout information based on the at least one row boundary and the attribute information of the plurality of controls, including:
obtaining a plurality of line areas based on the at least one line boundary, wherein any line area is composed of two line boundaries with adjacent positions;
dividing the plurality of controls into the plurality of line regions based on the region coordinates of the plurality of line regions and the coordinates of the plurality of controls included in the attribute information of the plurality of controls, so as to obtain the first-level layout information.
Optionally, after determining the primary layout information, the layout generating unit 53 performs the following operations on any line region of the plurality of line regions corresponding to the primary layout information:
determining a category of a control included in the row area;
determining semantic information of the row region based on a category of a control included in the row region.
Optionally, the layout generating unit 53, based on the primary layout information, performs column-based layout analysis on the multiple controls to obtain the secondary layout information, including:
executing the following operations on any line region in the plurality of line regions corresponding to the primary layout information:
determining a plurality of target controls included in the row area, wherein the area of the area occupied by the target controls is not smaller than a set area, and the longitudinal distance or the transverse distance between any two adjacent target controls in the target controls is the same;
traversing other controls in the row area according to a preset rule, and determining a plurality of sub-controls corresponding to the target controls;
and correspondingly combining the target controls and the sub-controls to obtain the secondary layout information of the target controls.
Optionally, the layout generating unit 53 traverses other controls in the row area according to a preset rule, and determines a plurality of sub-controls corresponding to the plurality of target controls, including:
traversing other controls in the row area from near to far in a clockwise direction by taking the target control as a center for any target control;
for any two target controls, judging whether two first controls exist in the other controls, wherein the ratio of the intersection and the union of the areas occupied by the two first controls is not less than a set ratio relative to the same position of the two target controls;
and if so, determining the two first controls as two sub-controls corresponding to the two target controls.
Optionally, the layout generating unit 53, based on the primary layout information, performs column-based layout analysis on the multiple controls to obtain the secondary layout information, including:
executing the following operations on any line region in the plurality of line regions corresponding to the primary layout information:
acquiring an identification model obtained by pre-training, wherein the identification model is obtained by learning and training different controls and control response areas corresponding to the different controls;
identifying controls included in the row area based on the identification model, and determining a control response area included in the row area;
and obtaining the secondary layout information of at least one control in the row area based on the control response area.
Optionally, the control parsing unit 52 performs control parsing on the target picture, and determines attribute information of multiple controls included in the target picture, including:
performing control detection on the target picture based on a preset detection model to obtain control information of a plurality of controls included in the target picture, wherein the control information includes categories and coordinates of the plurality of controls;
classifying a second control of which the category is an icon/picture in the plurality of controls based on a preset classification model to obtain semantic information of the second control;
and performing OCR recognition on the target picture to obtain semantic information of third controls of other categories in the plurality of controls.
Optionally, the control parsing unit 52 further obtains the coordinates of the third control and the confidence of the recognition result of the third control after performing OCR recognition on the target picture;
and if the confidence coefficient of the recognition result of the third control is not less than a set confidence coefficient threshold value, replacing the coordinate of the third control detected based on the detection model with the coordinate of the third control recognized based on the OCR.
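The coordinate-replacement rule in this optional step can be sketched as follows; the dict-based control record and the 0.9 confidence threshold are illustrative assumptions:

```python
def refine_coordinates(control, ocr_box, ocr_confidence, threshold=0.9):
    """If the OCR recognition result for a third control is confident enough,
    replace the detection-model coordinates with the OCR-recognized ones;
    otherwise keep the detected coordinates. The threshold is an assumption."""
    if ocr_confidence >= threshold:
        return dict(control, box=ocr_box)
    return control
```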
The page parsing apparatus 50 provided in this embodiment of this specification can also execute the method in fig. 1, and implement the functions of the page parsing apparatus 50 in the embodiment shown in fig. 1, which are not described herein again in this embodiment of this specification.
In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of protection of this document. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present specification shall be included in the scope of protection of this document.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a progressive manner; identical or similar parts among the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is brief; for relevant points, reference may be made to the corresponding description of the method embodiment.

Claims (16)

1. A page parsing method, comprising the following steps:
acquiring a target picture, wherein the target picture comprises content in a target page to be analyzed;
carrying out control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
performing layout generation based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
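Outside the claim language proper, the parse result of claim 1 pairs per-control attribute information (coordinates, category, semantic information) with page-level layout information. A minimal, non-limiting sketch of such a data structure, with all names and fields chosen for illustration rather than taken from the patent:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Control:
    # Attribute information per claim 1: coordinates, category, semantics.
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in picture coordinates
    category: str                    # e.g. "text", "icon", "button"
    semantics: str                   # recognized meaning, e.g. OCR text

@dataclass
class ParseResult:
    controls: List[Control]          # per-control attribute information
    layout: dict                     # row/column layout information

# Example: a single button control placed in one row region.
ok = Control(box=(10, 20, 90, 44), category="button", semantics="OK")
result = ParseResult(controls=[ok], layout={"rows": [[0]]})
```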
2. The method of claim 1, wherein the layout information of the plurality of controls comprises primary layout information and secondary layout information, the primary layout information representing row-based layout information of the plurality of controls, and the secondary layout information representing column-based layout information of the plurality of controls.
3. The method of claim 2, performing layout generation based on the target picture, determining layout information for the plurality of controls, comprising:
segmenting the target picture based on a preset image morphology algorithm to obtain the primary layout information;
performing column-based layout analysis on the plurality of controls based on the primary layout information to obtain secondary layout information;
and adding the secondary layout information into the primary layout information to obtain the layout information of the plurality of controls.
4. The method of claim 3, wherein the step of segmenting the target picture by using a preset image morphology algorithm to obtain the primary layout information comprises:
preprocessing the target picture to obtain a binarized picture without longitudinal noise;
determining at least one row boundary based on the binarized picture;
determining the primary layout information based on the at least one row boundary and the attribute information of the plurality of controls.
5. The method as claimed in claim 4, wherein the pre-processing the target picture to obtain the binarized picture without longitudinal noise comprises:
converting the target picture into a gray picture;
carrying out binarization operation on the gray level picture to obtain a binarization picture;
constructing a morphological structuring element, wherein the ratio of the width of the morphological structuring element to the width of the target picture is not more than a preset ratio;
and carrying out longitudinal denoising processing on the binary image based on the morphological structural elements to obtain the binary image with longitudinal noise removed.
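The preprocessing of claim 5 (grayscale conversion, binarization, and longitudinal denoising with a width-bounded structuring element) could be sketched as follows. This pure-NumPy illustration is not the patent's implementation: the binarization threshold, the width ratio, and the use of a horizontal run-length opening are all assumptions.

```python
import numpy as np

def preprocess(gray: np.ndarray, max_ratio: float = 0.05) -> np.ndarray:
    """Claim-5 style preprocessing sketch; thresholds are assumptions."""
    # Binarize: foreground = dark pixels (text/controls on light background).
    binary = (gray < 128).astype(np.uint8)
    # Morphological structuring element: a 1 x k horizontal bar whose width
    # is at most max_ratio of the target picture's width.
    k = max(1, int(binary.shape[1] * max_ratio))
    # Horizontal opening: keep a pixel only if it lies in a run of at least
    # k consecutive foreground pixels in its row; this removes thin
    # longitudinal (vertical) noise such as dividing lines or scrollbars.
    out = np.zeros_like(binary)
    for r in range(binary.shape[0]):
        row = binary[r]
        run = 0
        for c in range(len(row) + 1):
            if c < len(row) and row[c]:
                run += 1
            else:
                if run >= k:
                    out[r, c - run:c] = 1
                run = 0
    return out
```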
6. The method of claim 4, wherein determining at least one row boundary based on the binarized picture comprises:
traversing the pixel points in the binarized picture row by row, and determining a plurality of candidate boundaries, wherein the number of effective pixel points in the row where any candidate boundary is located is not less than a set threshold;
for any two adjacent candidate boundaries among the plurality of candidate boundaries, namely a first candidate boundary and a second candidate boundary, performing the following operations:
determining whether a distance between the first candidate boundary and the second candidate boundary is not less than a set height threshold;
if so, determining the first candidate boundary and the second candidate boundary as two line boundaries;
if not, combining the first candidate boundary and the second candidate boundary into a line boundary.
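The row-boundary search of claim 6 (collect candidate rows whose effective-pixel count meets a threshold, then merge adjacent candidates closer than a height threshold) might look like the sketch below; the threshold values and the midpoint merge rule are assumptions, not the patent's choices.

```python
import numpy as np

def find_row_boundaries(binary: np.ndarray, min_pixels: int = 3,
                        min_height: int = 4) -> list:
    """Claim-6 style row-boundary sketch; thresholds are assumptions."""
    # Candidate boundaries: rows whose number of effective (foreground)
    # pixels is not less than the set threshold.
    candidates = [r for r in range(binary.shape[0])
                  if int(binary[r].sum()) >= min_pixels]
    boundaries = []
    for r in candidates:
        if boundaries and r - boundaries[-1] < min_height:
            # Distance below the set height threshold: combine the two
            # candidates into one boundary (here, their midpoint).
            boundaries[-1] = (boundaries[-1] + r) // 2
        else:
            # Distance at or above the threshold: keep as a new boundary.
            boundaries.append(r)
    return boundaries
```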
7. The method of claim 4, wherein determining the primary layout information based on the at least one row boundary and the attribute information of the plurality of controls comprises:
obtaining a plurality of row regions based on the at least one row boundary, wherein any row region is bounded by two adjacent row boundaries;
and dividing the plurality of controls into the plurality of row regions based on the region coordinates of the plurality of row regions and the coordinates of the plurality of controls included in the attribute information of the plurality of controls, so as to obtain the primary layout information.
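Claim 7's division of controls into row regions by coordinates admits a simple sketch; assigning each control by the vertical center of its bounding box is an illustrative assumption rather than the patent's stated rule.

```python
def assign_controls_to_rows(boundaries, controls):
    """Claim-7 sketch: divide (x1, y1, x2, y2) control boxes into row
    regions; each region is bounded by two adjacent row boundaries."""
    regions = list(zip(boundaries, boundaries[1:]))
    layout = [[] for _ in regions]
    for box in controls:
        cy = (box[1] + box[3]) / 2        # vertical center of the control
        for i, (top, bottom) in enumerate(regions):
            if top <= cy < bottom:
                layout[i].append(box)
                break
    return layout
```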
8. The method of claim 3, after determining the level one layout information, the method further comprising:
executing the following operations on any line region in the plurality of line regions corresponding to the primary layout information:
determining a category of a control included in the row area;
determining semantic information of the row region based on a category of a control included in the row region.
9. The method of claim 3, wherein performing a column-based layout analysis on the plurality of controls based on the primary layout information to obtain the secondary layout information comprises:
executing the following operations on any line region in the plurality of line regions corresponding to the primary layout information:
determining a plurality of target controls included in the row area, wherein the area occupied by each target control is not smaller than a set area, and the longitudinal or transverse spacing between any two adjacent target controls among the plurality of target controls is the same;
traversing other controls in the row area according to a preset rule, and determining a plurality of sub-controls corresponding to the target controls;
and correspondingly combining the target controls and the sub-controls to obtain the secondary layout information of the target controls.
10. The method of claim 9, traversing other controls in the row area according to a preset rule, and determining a plurality of child controls corresponding to the plurality of target controls, comprising:
for any target control, traversing the other controls in the row area in a clockwise direction from near to far, with the target control as the center;
for any two target controls, judging whether there exist, among the other controls, two first controls located at the same position relative to the two target controls, wherein the ratio of the intersection to the union of the areas occupied by the two first controls is not less than a set ratio;
and if so, determining the two first controls as two sub-controls corresponding to the two target controls.
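The matching test of claim 10 hinges on an intersection-over-union comparison of controls expressed at the same position relative to their respective target controls. A hedged sketch follows; the 0.8 ratio and the box coordinates are purely illustrative.

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def relative_to(box, target):
    # Express a candidate box relative to its target control's origin,
    # per claim 10's "same position relative to the two target controls".
    dx, dy = target[0], target[1]
    return (box[0] - dx, box[1] - dy, box[2] - dx, box[3] - dy)

def are_matching_subcontrols(c1, t1, c2, t2, min_iou=0.8):
    # Two candidates are matching sub-controls when, placed at the same
    # position relative to their targets, their IoU meets the set ratio.
    return iou(relative_to(c1, t1), relative_to(c2, t2)) >= min_iou
```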
11. The method of claim 3, wherein performing a column-based layout analysis on the plurality of controls based on the primary layout information to obtain the secondary layout information comprises:
executing the following operations on any line region in the plurality of line regions corresponding to the primary layout information:
acquiring an identification model obtained by pre-training, wherein the identification model is obtained by learning and training different controls and control response areas corresponding to the different controls;
identifying controls included in the row area based on the identification model, and determining a control response area included in the row area;
and obtaining the secondary layout information of at least one control in the row area based on the control response area.
12. The method of claim 1, wherein performing control parsing on the target picture and determining attribute information of a plurality of controls included in the target picture comprises:
performing control detection on the target picture based on a preset detection model to obtain control information of a plurality of controls included in the target picture, wherein the control information includes categories and coordinates of the plurality of controls;
classifying a second control of which the category is an icon/picture in the plurality of controls based on a preset classification model to obtain semantic information of the second control;
and performing OCR recognition on the target picture to obtain semantic information of third controls of other categories in the plurality of controls.
13. The method of claim 12, the method further comprising:
after OCR recognition is performed on the target picture, obtaining coordinates of the third control and a confidence of the recognition result of the third control;
and if the confidence of the recognition result of the third control is not less than a set confidence threshold, replacing the coordinates of the third control detected by the detection model with the coordinates of the third control recognized by OCR.
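The reconciliation of claim 13 reduces to choosing between the detector's coordinates and OCR's coordinates based on the OCR confidence; a sketch with an assumed threshold value:

```python
def reconcile_coordinates(detected_box, ocr_box, ocr_confidence,
                          conf_threshold=0.9):
    """Claim-13 sketch: prefer OCR coordinates when OCR confidence is
    not less than the set threshold; otherwise keep the detector's box.
    The 0.9 threshold is an assumption, not the patent's value."""
    return ocr_box if ocr_confidence >= conf_threshold else detected_box
```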
14. A page parsing apparatus, comprising:
the acquisition unit is used for acquiring a target picture, and the target picture comprises the content in a target page to be analyzed;
the control analyzing unit is used for carrying out control analysis on the target picture and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
the layout generating unit is used for generating a layout based on the target picture and determining the layout information of the plurality of controls;
and the determining unit is used for obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
15. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a target picture, wherein the target picture comprises content in a target page to be analyzed;
carrying out control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
performing layout generation based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
16. A computer-readable storage medium storing one or more programs which, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the following method:
acquiring a target picture, wherein the target picture comprises content in a target page to be analyzed;
carrying out control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
performing layout generation based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
CN202010304984.8A 2020-04-17 2020-04-17 Page analysis method and device Active CN111460355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010304984.8A CN111460355B (en) 2020-04-17 2020-04-17 Page analysis method and device


Publications (2)

Publication Number Publication Date
CN111460355A (en) 2020-07-28
CN111460355B (en) 2023-06-09

Family

ID=71684575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010304984.8A Active CN111460355B (en) 2020-04-17 2020-04-17 Page analysis method and device

Country Status (1)

Country Link
CN (1) CN111460355B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732259A (en) * 2021-01-11 2021-04-30 赞同科技股份有限公司 Front-end interactive page conversion method, device and medium based on artificial intelligence
CN113256555A (en) * 2021-03-26 2021-08-13 东北大学 Appearance abnormal GUI control detection method based on machine learning
CN115509665A (en) * 2022-09-29 2022-12-23 上海弘玑信息技术有限公司 Method, device, medium and equipment for recording control in window
CN115982443A (en) * 2023-03-17 2023-04-18 杭州实在智能科技有限公司 Screen page structure analysis and path storage method and system based on visual analysis

Citations (3)

Publication number Priority date Publication date Assignee Title
CN109035256A (en) * 2018-06-28 2018-12-18 百度在线网络技术(北京)有限公司 User interface image cutting method, device, server and storage medium
CN109117228A (en) * 2018-08-01 2019-01-01 浙江口碑网络技术有限公司 The generation method and device of graphical interfaces
CN109189682A (en) * 2018-08-27 2019-01-11 广州云测信息技术有限公司 A kind of script method for recording and device



Similar Documents

Publication Publication Date Title
US11468225B2 (en) Determining functional and descriptive elements of application images for intelligent screen automation
CN111460355B (en) Page analysis method and device
CN111080628B (en) Image tampering detection method, apparatus, computer device and storage medium
US10445569B1 (en) Combination of heterogeneous recognizer for image-based character recognition
WO2018028583A1 (en) Subtitle extraction method and device, and storage medium
US8917935B2 (en) Detecting text using stroke width based text detection
JP5775225B2 (en) Text detection using multi-layer connected components with histograms
US20180046876A1 (en) Image-processing apparatus, image-processing method, and computer program product
US7277584B2 (en) Form recognition system, form recognition method, program and storage medium
JP2019220014A (en) Image analyzing apparatus, image analyzing method and program
CN109978044B (en) Training data generation method and device, and model training method and device
CN111709338B (en) Method and device for table detection and training method of detection model
CN113496115B (en) File content comparison method and device
JP6303671B2 (en) Image processing apparatus and image processing program
US9773472B2 (en) Text extraction from graphical user interface content
CN116185812A (en) Automatic testing method, device and medium for software system functions
CN112818984B (en) Title generation method, device, electronic equipment and storage medium
CN111680691B (en) Text detection method, text detection device, electronic equipment and computer readable storage medium
JP7365835B2 (en) Structure recognition system, structure recognition device, structure recognition method, and program
CN114648751A (en) Method, device, terminal and storage medium for processing video subtitles
Evangelou et al. PU learning-based recognition of structural elements in architectural floor plans
CN113139629A (en) Font identification method and device, electronic equipment and storage medium
CN111899181A (en) Method and device for removing shadow in image
CN111753836A (en) Character recognition method and device, computer readable medium and electronic equipment
US20220406083A1 (en) Image processing apparatus, control method thereof, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant