CN111460355B - Page analysis method and device - Google Patents

Page analysis method and device

Info

Publication number
CN111460355B
CN111460355B (application CN202010304984.8A)
Authority
CN
China
Prior art keywords
controls
target
layout information
information
control
Prior art date
Legal status
Active
Application number
CN202010304984.8A
Other languages
Chinese (zh)
Other versions
CN111460355A (en)
Inventor
王若
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010304984.8A
Publication of CN111460355A
Application granted
Publication of CN111460355B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of this specification provide a page parsing method and device. The method includes: when a target page is to be parsed, acquiring a target picture, wherein the target picture contains the content of the target page; performing control parsing on the target picture to determine attribute information of a plurality of controls included in the target picture, wherein the attribute information includes coordinates, categories, and semantic information; performing layout generation based on the target picture to determine layout information of the plurality of controls; and obtaining a parsing result for the target page based on the attribute information and the layout information of the plurality of controls.

Description

Page analysis method and device
Technical Field
The present document relates to the field of computer technologies, and in particular, to a page parsing method and apparatus.
Background
At present, many service scenarios require a page to be parsed so that a service operation corresponding to the scenario can be executed according to the parsing result. For example, in a UI automation test scenario, a test page needs to be parsed to determine whether it includes an expected control (also referred to as an element, such as text, a button, or a pop-up window).
In general, a page contains a large amount of content, and the control information of its controls is correspondingly complex: a page may include controls of several different categories, there may be multiple controls of the same category, and controls of the same category may be distributed across different areas of the page. Parsing such a page is therefore difficult, and an effective parsing result often cannot be obtained. In view of this, there is a need for a solution that can parse pages effectively.
Disclosure of Invention
The embodiments of this specification provide a page parsing method and device, which are intended to solve the problem that pages currently cannot be parsed effectively.
In order to solve the above technical problems, the embodiments of the present specification are implemented as follows:
in a first aspect, a method for parsing a page is provided, including:
acquiring a target picture, wherein the target picture comprises contents in a target page to be analyzed;
performing control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
generating a layout based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
In a second aspect, a page parsing apparatus is provided, including:
an acquisition unit, configured to acquire a target picture, wherein the target picture comprises content of a target page to be analyzed;
the control analysis unit is used for carrying out control analysis on the target picture and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
the layout generation unit is used for generating a layout based on the target picture and determining layout information of the plurality of controls;
and the determining unit is used for obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
In a third aspect, an electronic device is presented, the electronic device comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a target picture, wherein the target picture comprises contents in a target page to be analyzed;
performing control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
generating a layout based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
In a fourth aspect, a computer-readable storage medium is provided, storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the following method:
acquiring a target picture, wherein the target picture comprises contents in a target page to be analyzed;
performing control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
generating a layout based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
At least one of the technical solutions adopted by one or more embodiments of the present disclosure can achieve the following technical effects:
According to the technical solutions provided by one or more embodiments of the present disclosure, when the target page is parsed, layout generation is performed for the plurality of controls in addition to parsing the target page for the attribute information of those controls, where the attribute information may include the controls' coordinates, categories, and semantic information. By combining the parsed attribute information of the plurality of controls with the generated layout information, a more effective parsing result for the target page can be obtained.
Drawings
In order to more clearly illustrate the embodiments of this specification or the technical solutions in the prior art, the drawings required for the embodiments are briefly described below. Obviously, the drawings in the following description show only some of the embodiments described in this specification; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow diagram of a page parsing method according to one embodiment of the present disclosure;
FIG. 2 is a flow diagram of generating primary layout information based on a target picture according to one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a target picture according to one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the structure of an electronic device according to one embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a page parsing apparatus according to an embodiment of the present disclosure.
Detailed Description
In order that those skilled in the art will better understand the technical solutions in the embodiments of this specification, those solutions are described below clearly and completely with reference to the drawings in one or more embodiments. The described embodiments are plainly only some, not all, of the embodiments of this specification; all other embodiments obtained by a person of ordinary skill in the art without undue burden are intended to fall within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
FIG. 1 is a flow diagram of a page parsing method according to one embodiment of the present disclosure. The method is as follows.
S102: and obtaining a target picture, wherein the target picture comprises contents in a target page to be analyzed.
In S102, when the target page is to be parsed, a target picture corresponding to the target page may be acquired, so that the target page can be parsed based on the target picture. The target page may be a page in an APP, a page in a browser, or the like, and the target picture includes the content of the target page.
In this embodiment, the target picture may be acquired by taking a screenshot of the target page; for example, when the login page of an APP on a smartphone is parsed, the target picture may be obtained by capturing that page's screen. Of course, in other implementations the target picture may instead be obtained by photographing the target page; further examples are omitted here.
S104: and analyzing the target picture to determine attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information.
The target page typically includes a plurality of controls, which may be buttons, icons/pictures, text, progress bars, toggle switches, "more" buttons, edit boxes, pop-up windows, check boxes, back buttons, close buttons, and the like. Related information about these controls is often needed when parsing the target page. In view of this, in S104, after the target picture is acquired, control parsing may be performed on it to obtain attribute information of the plurality of controls included in the target picture (target page), where the attribute information may include the controls' coordinates, categories, and semantic information.
The coordinates of a control are its position coordinates in the target picture. The category of a control is its type, such as text, button, icon/picture, or progress bar. The semantic information of a control is the character string it contains, or a description of what the control specifically is: for example, if the category of a control is text, its semantic information may be the character string in that text; if the control is the icon of application A, its semantic information may be the name of application A.
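The patent does not prescribe a concrete data structure for this attribute information. As a minimal sketch (the field names, coordinate convention, and example values below are all hypothetical), the parsed attributes of a single control could be held as:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ControlAttributes:
    # Bounding-box position of the control in the target picture,
    # as (x_min, y_min, x_max, y_max) in pixels.
    bbox: Tuple[int, int, int, int]
    # Category of the control, e.g. "text", "button", "icon", "progress_bar".
    category: str
    # Semantic information: the string a text control contains,
    # or the application name an icon represents.
    semantic: str

# A hypothetical login button parsed out of a screenshot.
login_button = ControlAttributes(bbox=(40, 880, 680, 960),
                                 category="button",
                                 semantic="Log in")
```

The parsing result for the whole page would then be a list of such records plus the layout information described in S106.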
In this embodiment, control parsing of the target picture specifically includes the following steps.
Firstly, control detection may be performed on the target picture based on a preset detection model to obtain control information of the plurality of controls included in the target picture.
The preset detection model may be an existing YOLOv3-tiny model. YOLOv3-tiny is a general-purpose model that can detect many different categories of objects. It should be noted, however, that such a model may by default be configured to detect hundreds or even thousands of categories, while the categories of controls included in the target picture (target page) in this embodiment number far fewer. To make the model suitable for detecting the target picture, the head of the YOLOv3-tiny model needs to be modified. The head determines the number of categories the model outputs; by modifying it, the number of output categories, and hence the categories of controls the model can detect, can be adjusted.
For example, assuming the YOLOv3-tiny model detects 1000 categories of controls by default while the controls included in the target picture fall into 11 categories (which may be custom categories), the head of the model may be modified so that it detects controls of those 11 categories.
Of course, in other application scenarios, if the control categories are divided at a finer or coarser granularity, the head of the YOLOv3-tiny model can be adjusted accordingly, so that the model can flexibly detect different categories of controls in different scenarios.
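The patent does not give the head dimensions. As an illustration of why the head must be resized, the arithmetic below follows the standard YOLOv3 head design (three anchors per detection scale, each predicting four box offsets, one objectness score, and one score per class); this is background knowledge, not a statement from the patent:

```python
def yolo_head_channels(num_classes: int, anchors_per_scale: int = 3) -> int:
    # Each anchor predicts 4 box offsets + 1 objectness score
    # + one confidence per class, so a YOLOv3-style detection head
    # needs anchors_per_scale * (5 + num_classes) output channels.
    return anchors_per_scale * (5 + num_classes)

# A COCO-style 80-class head vs. a head retrained for 11 custom
# control categories, as in the example above.
coco_channels = yolo_head_channels(80)    # 3 * (5 + 80) = 255
custom_channels = yolo_head_channels(11)  # 3 * (5 + 11) = 48
```

Shrinking the class count therefore shrinks the final convolution's channel dimension, which is the concrete modification the "head" adjustment refers to.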
It should also be noted that the YOLOv3-tiny model is generally run on a PC; to use it on a client device (such as a smartphone), it needs to be converted so that it can run there. Specifically, the conversion from a PB model to a TensorFlow Lite model may be accomplished using the TOCO model conversion tool provided by TensorFlow, and adaptation to client operating systems such as Android, iOS, and Linux is implemented through the C++ API.
It should be appreciated that in other implementations, other general-purpose models capable of detecting different categories of controls may be used to parse the target picture; further examples are omitted here.
When control detection is performed on the target picture based on the preset detection model, the target picture serves as the model's input, and the model's output is the control information of the plurality of controls included in the target picture, where the control information may include each control's category and coordinates. Optionally, the output may further include a confidence for each detection result, characterizing how trustworthy that result is.
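The patent does not specify how the per-control confidences are used at this stage; a common post-processing step (assumed here, with a hypothetical tuple layout and threshold) is to discard low-confidence detections before further parsing:

```python
def filter_detections(detections, min_confidence=0.5):
    # Keep only detections whose confidence meets the threshold.
    # Each detection is (category, (x_min, y_min, x_max, y_max), confidence);
    # both the tuple layout and the 0.5 threshold are illustrative choices.
    return [d for d in detections if d[2] >= min_confidence]

raw = [("button", (40, 880, 680, 960), 0.93),
       ("icon",   (20, 30, 80, 90),    0.31),
       ("text",   (40, 100, 500, 140), 0.88)]
kept = filter_detections(raw)  # drops the low-confidence icon
```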
Secondly, the controls whose category is icon/picture among the plurality of controls may be classified based on a preset classification model to obtain their semantic information.
After the categories and coordinates of the plurality of controls in the target picture are obtained, the controls whose category is icon/picture may be further subdivided in order to extend the control dimensions and reflect more business-scenario information, yielding semantic information that reflects the business scenario of each icon/picture. For example, if the target picture includes two icon controls, the icon of application A and the icon of application B, the two need to be subdivided so that the semantic information of one icon is the name of application A and that of the other is the name of application B.
In this embodiment, when classifying the icon/picture controls (for convenience of distinction, hereinafter referred to as second controls), the classification may be implemented based on a preset classification model.
The preset classification model may be an existing MobileNetV1 model, which can be trained on icons from different service scenarios and on background images representing UI anomalies. When a second control is classified based on this model, the control serves as the model's input, and the model's output is the semantic information of that control.
It should be noted that the MobileNetV1 model is likewise generally run on a PC; to use it on a client device (such as a smartphone), it needs to be converted so that it can run there. For the specific conversion method, see the description of converting the YOLOv3-tiny model above; it is not repeated here.
Finally, OCR (optical character recognition) may be performed on the target picture to obtain the semantic information of the third controls, i.e., the controls of the remaining categories.
The previous step obtained the semantic information of the second controls (icon/picture category). For the controls of other categories (for convenience of distinction, hereinafter referred to as third controls), their semantic information also needs to be obtained during control parsing.
In this embodiment, since the third controls are of non-icon/picture categories, their semantic information can be obtained through OCR. Specifically, OCR recognition is performed on the target picture to obtain, for each third control, an OCR result that may include the control's character string and a confidence for the recognition result. If the confidence is not less than a set threshold, the recognized character string may be used as the semantic information of that third control.
Through the above three steps, the coordinates, categories, and semantic information of the plurality of controls included in the target picture can be obtained; fusing them yields the attribute information of the plurality of controls.
Optionally, the OCR result may further include the coordinates of a third control. Because coordinates obtained by OCR tend to be more accurate, they may replace the coordinates of that control obtained through control detection by the preset detection model.
Specifically, for any third control, it may be determined whether the confidence of the OCR recognition result for that control is not less than a set confidence threshold. If so, the OCR result may be considered reliable, and the coordinates of the third control obtained by control detection may be replaced with those obtained by OCR recognition; if not, the OCR result may be considered unreliable, and no replacement is performed. The confidence threshold may be set according to the actual situation and is not specifically limited here.
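This replacement rule can be sketched as follows. The dict fields and the 0.9 threshold are hypothetical (the patent leaves the threshold to the implementer); only the decision logic mirrors the text above:

```python
def merge_ocr_coords(control, ocr_result, conf_threshold=0.9):
    # When the OCR result is confident enough, replace the detector's
    # coordinates (and fill in the semantic string) with the OCR values;
    # otherwise keep the control unchanged.
    if ocr_result["confidence"] >= conf_threshold:
        control = dict(control,
                       bbox=ocr_result["bbox"],
                       semantic=ocr_result["text"])
    return control

det = {"category": "text", "bbox": (40, 100, 500, 140), "semantic": ""}
ocr = {"bbox": (42, 102, 498, 138), "text": "User name", "confidence": 0.95}
merged = merge_ocr_coords(det, ocr)   # OCR box and text are adopted
```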
S106: and generating a layout based on the target picture, and determining layout information of the plurality of controls.
In S106, in addition to parsing the attribute information of the plurality of controls from the target picture, layout generation may be performed based on the target picture to obtain the layout information of the plurality of controls, so that the parsing result for the target page can be obtained from the attribute information together with the layout information.
The layout information of the plurality of controls describes their layout in the target picture (target page). It may specifically include first-level layout information and second-level layout information: the first-level layout information characterizes the row-based layout of the controls, such as which row region of the target picture a control is located in, while the second-level layout information characterizes the column-based layout, such as whether a control has sub-controls, where a sub-control is a control that can be combined with it.
In this embodiment, layout generation based on the target picture proceeds as follows: firstly, a preset image morphology algorithm is used to segment the target picture into rows, yielding the first-level layout information of the plurality of controls; secondly, column-based layout analysis is performed on the controls based on the first-level layout information, yielding their second-level layout information; finally, the second-level layout information is added to the first-level layout information to obtain the layout information of the plurality of controls.
Each step of the layout generation will be described in detail below.
When a preset image morphology algorithm is used to segment the target picture into rows and obtain the first-level layout information of the plurality of controls:
Firstly, the target picture may be preprocessed to obtain a binarized picture with longitudinal noise removed.
A binarized picture is one in which every pixel value is either 0 (black) or 1 (white). Since the target picture is typically an RGB picture, it must first be converted into a grayscale picture; a binarization operation is then performed on the grayscale picture to obtain the binarized picture.
It should be noted that the binarized picture generally contains longitudinal noise, which would interfere with the row segmentation of the target picture and reduce the accuracy of the segmentation result. Therefore, a morphological structuring element can be constructed and used to perform erosion and then dilation on the binarized picture. These operations remove isolated longitudinal noise points while retaining the key elements, leaving the binarized picture cleaner; in this way the longitudinal denoising of the binarized picture is achieved, and a binarized picture with longitudinal noise removed is obtained.
It should be noted that the ratio of the width of the constructed morphological structuring element to the width of the target picture should not exceed a preset ratio, which may be 1/30, or may be set according to the actual situation and is not specifically limited here.
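Erosion followed by dilation (a morphological opening) with a 1-pixel-tall horizontal structuring element is equivalent to keeping only horizontal black runs at least as long as the element; shorter runs, i.e. narrow longitudinal noise streaks, disappear, while full-width row boundary lines survive. The sketch below uses that equivalence directly; the element width would be chosen at most 1/30 of the picture width per the constraint above, and the concrete width here is illustrative:

```python
def horizontal_opening(binary, width):
    # Morphological opening of the black (0) pixels with a 1 x `width`
    # horizontal structuring element: a black pixel survives only if it
    # belongs to a horizontal black run of length >= width.
    out = []
    for row in binary:
        cleaned = [1] * len(row)
        run = 0
        for i, px in enumerate(row + [1]):   # sentinel flushes the last run
            if px == 0:
                run += 1
            else:
                if run >= width:             # long run: keep it whole
                    for j in range(i - run, i):
                        cleaned[j] = 0
                run = 0                      # short run: dropped as noise
        out.append(cleaned)
    return out

noisy = [[1, 0, 1, 1, 0, 0, 0, 0, 1]]       # lone black pixel is noise
clean = horizontal_opening(noisy, 3)        # [[1, 1, 1, 1, 0, 0, 0, 0, 1]]
```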
Next, at least one row boundary is determined based on the obtained binarized picture.
In general, row boundaries in the target picture are darker in color, so after the binarization operation a row boundary becomes black, with its pixels taking the value 0 (white pixels take the value 1). When determining the row boundaries, the pixels of the binarized picture may therefore be traversed row by row to determine a plurality of candidate boundaries, where for any candidate boundary the number of effective pixels in its row is not less than a set threshold. An effective pixel is a pixel whose value is 0, so the condition means that the number of 0-valued pixels in the row is not less than the set threshold. The threshold may be set according to the actual situation and is not specifically limited; preferably, it is not less than 90% of the number of pixels in a row.
For example, assuming a row of pixels in the target picture is 1080 pixels wide and the threshold is set to 90% of the pixels in a row, then during the row scan, if at least 90% of the pixels in a row have the value 0, that row may be taken as a candidate boundary.
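The row scan described above reduces to counting effective (0-valued) pixels per row, for instance:

```python
def candidate_boundaries(binary, ratio=0.9):
    # A pixel row is a candidate boundary when the count of effective
    # (black, value-0) pixels reaches `ratio` of the row width.
    found = []
    for y, row in enumerate(binary):
        effective = sum(1 for px in row if px == 0)
        if effective >= ratio * len(row):
            found.append(y)
    return found

img = [[1] * 10,
       [0] * 10,        # full-width boundary line
       [1] * 10,
       [0] * 9 + [1]]   # 90% black: still qualifies as a candidate
cands = candidate_boundaries(img)  # [1, 3]
```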
After the plurality of candidate boundaries are obtained, considering that some of them may not be true row boundaries, it is further necessary to determine which candidates are row boundaries. Since the row boundaries in the target picture have a certain height, the row boundaries can be determined from the candidates on this basis.
Specifically, for any two positionally adjacent candidate boundaries (hereinafter the first candidate boundary and the second candidate boundary), it may be determined whether the distance between them is not less than a set height threshold. The height threshold corresponds to the minimum distance between two genuinely adjacent row boundaries and may be determined according to the actual situation; preferably, it may be 5% of the height of the target picture.
If the distance between the first candidate boundary and the second candidate boundary is not less than the set height threshold, the distance condition between row boundaries is satisfied, and the two candidates may be determined to be two row boundaries. Conversely, if the distance is less than the threshold, the condition is not satisfied; the two candidates may then be considered to belong to the same row boundary and be merged.
After performing the above operations on the plurality of candidate boundaries, at least one row boundary may ultimately be obtained.
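The pairwise comparison and merging above amount to clustering sorted candidates whose gaps fall below the height threshold. A sketch (merging a cluster to its centre row is an assumption; the patent only says the candidates are combined):

```python
def merge_candidates(candidates, min_gap):
    # `candidates` is a non-empty, ascending list of candidate row indices.
    # Candidates closer together than `min_gap` belong to the same physical
    # boundary and are merged (here, to the cluster's centre row);
    # candidates at least `min_gap` apart become separate row boundaries.
    boundaries = []
    group = [candidates[0]]
    for y in candidates[1:]:
        if y - group[-1] < min_gap:
            group.append(y)
        else:
            boundaries.append(group[0] + (group[-1] - group[0]) // 2)
            group = [y]
    boundaries.append(group[0] + (group[-1] - group[0]) // 2)
    return boundaries

# Rows 10/12 and 100/103 each collapse to one boundary; 300 stands alone.
rows = merge_candidates([10, 12, 100, 103, 300], min_gap=20)
```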
Finally, the first-level layout information of the plurality of controls is determined based on the obtained at least one row boundary and the attribute information of the plurality of controls parsed in S104.
Specifically, after the at least one row boundary is obtained, a plurality of row regions may be derived from it, where any row region is bounded by two positionally adjacent row boundaries. Then, based on the attribute information parsed in S104, the coordinates of the plurality of controls may be determined, and the controls may be divided among the row regions according to the regions' coordinates, yielding the first-level layout information of the plurality of controls.
The first-level layout information of the plurality of controls may include: the pixel rows occupied by each row region (for example, between which rows of pixels in the target picture a region lies), and the controls each row region contains together with their attribute information.
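Dividing the controls among the row regions can be done by testing each control's position against the region spans. In this sketch a control is assigned by the vertical centre of its bounding box, which is one reasonable convention the patent does not mandate; the dict fields are hypothetical:

```python
def assign_to_rows(boundaries, controls):
    # Row regions span adjacent row boundaries; a control is placed in
    # the region containing the vertical centre of its bounding box.
    regions = list(zip(boundaries, boundaries[1:]))
    layout = {region: [] for region in regions}
    for ctrl in controls:
        x0, y0, x1, y1 = ctrl["bbox"]
        cy = (y0 + y1) / 2
        for top, bottom in regions:
            if top <= cy < bottom:
                layout[(top, bottom)].append(ctrl["semantic"])
                break
    return layout

controls = [{"bbox": (10, 20, 200, 60), "semantic": "title"},
            {"bbox": (10, 120, 200, 160), "semantic": "Log in"}]
layout = assign_to_rows([0, 100, 200], controls)
```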
Optionally, after the first-level layout information of the plurality of controls is determined, the semantic information corresponding to it may be further determined. Specifically, for any of the plurality of row regions, the following operations may be performed: determine the categories of the controls included in the row region; then determine the semantic information of the row region based on those categories.
Before determining the semantic information of a row region based on control categories, a lookup table may be established in advance, storing correspondences between control categories and their semantic information; these correspondences may be defined by the specific business party. In this way, when determining the semantic information of a row region, the corresponding semantic information can be looked up in the table based on the control categories, and the found semantic information is used as the semantic information of the row region.
For example, according to a service scenario, a combination of an edit box and background hint text may be labeled "user name input"; that is, a correspondence is established between the category combination (edit box plus background hint text) and the semantic information "user name input". Then, when layout semantic information is generated, the semantic information of a row region containing an edit box and background hint text can be determined to be "user name input".
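Such a business-defined lookup table can be as simple as a mapping from the set of categories in a row region to a label. The category names and entries below are hypothetical placeholders for whatever the business party defines:

```python
# Hypothetical business-defined lookup table: the set of control
# categories found in a row region maps to that region's semantic label.
SEMANTIC_TABLE = {
    frozenset({"edit_box", "hint_text"}): "user name input",
    frozenset({"edit_box", "password_dots"}): "password input",
}

def row_semantics(categories):
    # Returns None when the row region has no defined semantics,
    # e.g. a region containing only icons/pictures.
    return SEMANTIC_TABLE.get(frozenset(categories))

label = row_semantics(["hint_text", "edit_box"])  # "user name input"
```

Using a frozenset key makes the lookup order-independent, matching the idea that only which categories co-occur in the row matters.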
After the semantic information of the layout is obtained in this way, it can be attached to the first-level layout information as an attribute, enriching the layout with semantic attributes.
It should be noted that, in practical application, some row areas may not have semantic information, for example, if the controls included in a certain row area are icons/pictures, the row area has no semantic information, and at this time, it may not be necessary to determine the corresponding semantic information.
To facilitate an understanding of the overall process of determining the primary layout information for a plurality of controls described above, reference may be made to FIG. 2. Fig. 2 is a schematic flow chart of generating first-level layout information based on a target picture according to an embodiment of the present specification, which may include the following steps.
S201: and converting the target picture into a gray picture.
S202: and carrying out binarization operation on the gray level picture to obtain a binarized picture.
S203: constructing morphological structural elements.
The ratio of the width of the morphological structuring element to the width of the target picture is not greater than a preset ratio.
S204: and carrying out longitudinal denoising treatment on the binarized picture based on the morphological structural element to obtain the binarized picture with longitudinal noise removed.
S205: and traversing pixel points in the binarized picture according to rows to determine a plurality of candidate boundaries.
The number of the effective pixel points of the row where any candidate boundary is located is not smaller than a set threshold value.
S206: it is determined whether a distance between the first candidate boundary and the second candidate boundary is not less than a set height threshold.
The first candidate boundary and the second candidate boundary are any two adjacent candidate boundaries in the plurality of candidate boundaries.
If the determination result is that the distance between the first candidate boundary and the second candidate boundary is not less than the set height threshold, S207 is executed; otherwise, S208 is performed.
S207: the first candidate boundary and the second candidate boundary are determined as two row boundaries.
S208: the first candidate boundary and the second candidate boundary are merged into one line boundary.
After S207 or S208 is performed, S209 may be performed.
S209: a plurality of row regions is obtained based on the at least one row boundary.
Wherein, any row area is formed by two adjacent row boundaries.
S210: dividing the plurality of controls into a plurality of row areas based on the area coordinates of the plurality of row areas and the coordinates of the plurality of controls included in the attribute information of the plurality of controls, and obtaining first-level layout information.
S211: and determining semantic information corresponding to the primary layout information based on the primary layout information and the categories of the controls included in the plurality of row areas.
The specific implementation of each step in S201 to S211 may be referred to the specific implementation of the corresponding step in S106, and the description will not be repeated here.
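Steps S205 through S208 can be sketched as follows, assuming the binarized picture with longitudinal noise removed (the output of S201–S204, e.g. via an image library's grayscale conversion, thresholding, and morphological opening) is available as a 2-D 0/1 array. The threshold values are illustrative, not taken from the specification:

```python
import numpy as np

def detect_row_boundaries(binary, pixel_threshold=3, height_threshold=5):
    """Sketch of S205-S208: derive row boundaries from a denoised binarized picture.

    binary: 2-D 0/1 array; what counts as an "effective" pixel depends on
            the binarization convention chosen in S202.
    """
    # S205: a row qualifies as a candidate boundary when the number of
    # effective pixel points in it is not smaller than the set threshold.
    row_counts = binary.sum(axis=1)
    candidates = [y for y, c in enumerate(row_counts) if c >= pixel_threshold]
    # S206-S208: keep two adjacent candidates as separate row boundaries only
    # when their distance is not less than the set height threshold;
    # otherwise merge them into one boundary.
    boundaries = []
    for y in candidates:
        if boundaries and y - boundaries[-1] < height_threshold:
            # S208: merge (the specification does not fix which row survives;
            # here the lower of the two is kept).
            boundaries[-1] = y
        else:
            boundaries.append(y)  # S207: keep as a distinct boundary
    return boundaries
```

Consecutive row areas for S209 then follow by pairing position-adjacent boundaries.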
In this embodiment, after determining the first-level layout information of the plurality of controls by the method described above, column-based layout analysis may be performed on the plurality of controls based on the first-level layout information, to obtain the second-level layout information of the plurality of controls.
In one implementation, when determining the secondary layout information of the plurality of controls, for any one of a plurality of row areas (i.e., a plurality of row areas obtained based on at least one row boundary in the target picture) corresponding to the primary layout information, the following operations may be performed:
First, a plurality of target controls included in a row area are determined.
In this embodiment, a target control may be understood as a control that may have child controls. Considering that child controls tend to exist when a control occupies a large area and the spacing between it and other such controls is approximately the same, the plurality of target controls must satisfy at least the following two conditions: the area of the region each occupies is not smaller than a set area; and the longitudinal or lateral spacing between any two adjacently positioned target controls is the same (or approximately the same). The area occupied by a target control may be taken as the area of the maximum circumscribed rectangular frame corresponding to the target control in the target picture, and the set area may be determined according to the actual situation.
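These two conditions can be sketched as follows, assuming a hypothetical `bbox` field `(x1, y1, x2, y2)` for each control and illustrative threshold values; for brevity the sketch checks lateral spacing only:

```python
def find_target_controls(controls, min_area=2000, spacing_tolerance=5):
    """Return the controls in a row area that qualify as target controls:
    bounding-rectangle area not smaller than the set area, and approximately
    equal lateral gaps between adjacently positioned survivors."""
    def area(c):
        x1, y1, x2, y2 = c["bbox"]
        return (x2 - x1) * (y2 - y1)

    # Condition 1: occupied area not smaller than the set area.
    large = sorted((c for c in controls if area(c) >= min_area),
                   key=lambda c: c["bbox"][0])
    # Condition 2: the lateral gaps between neighbours are (approximately) equal.
    gaps = [large[i + 1]["bbox"][0] - large[i]["bbox"][2]
            for i in range(len(large) - 1)]
    if gaps and max(gaps) - min(gaps) <= spacing_tolerance:
        return large
    return []  # no plurality of target controls in this row area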
Alternatively, if no such plurality of target controls exists in the row area, it may be determined that the controls in the row area have no corresponding secondary layout information, and the subsequent step of determining the secondary layout information need not be performed.
Second, the other controls in the row area are traversed according to a preset rule to determine a plurality of child controls corresponding to the plurality of target controls.
In this embodiment, one target control may correspond to one child control, and the similarity of the positional relationships between different target controls and their corresponding child controls is not less than a set similarity. The child controls corresponding to the plurality of target controls may be determined as follows:
For any target control, the other controls in the row area are traversed clockwise, from near to far, with the target control as the center. For any two target controls, it is determined whether two first controls exist among the other controls such that the ratio of the intersection to the union (the intersection ratio) of the areas they occupy, relative to the same position of the two target controls, is not smaller than a set ratio; the set ratio may preferably be 0.7 to 1. The intersection ratio of the areas occupied by the two first controls describes how similar their positions are relative to the two target controls: the higher the intersection ratio (its value ranges from 0 to 1), the more similar the positional relationship of the two first controls relative to the two target controls.
If such two first controls exist, they are determined as the two child controls corresponding to the two target controls; if they do not exist, it may be determined that the two target controls have no child controls.
That is, for each target control, the surrounding controls are traversed clockwise, from near to far, with the target control as the center, and it is determined whether the same control exists at the same position relative to any two target controls. To make this determination, the coordinates of the maximum circumscribed rectangular frames of two candidate controls are compared relative to a designated position of the two target controls (which may be the upper-left corner or the center of each target control): if the intersection ratio between the two frames is not smaller than the set ratio, the two candidates can be regarded as the same control at the same position of the two target controls, and hence as child controls of the two target controls; otherwise, they cannot.
For ease of understanding, reference may be made to fig. 3. Fig. 3 is a schematic diagram of a target picture according to an embodiment of the present specification.
The target picture shown in fig. 3 contains a middle row area (other row areas are not shown) with 4 icon controls, labeled A, B, C and D respectively, and 4 text controls below them, labeled 1, 2, 3 and 4 respectively.
Based on the row area shown in fig. 3, when determining the plurality of target controls in the row area, since the areas occupied by the 4 icon controls are large and the lateral distances between them are approximately equal, the 4 icon controls can be determined as the plurality of target controls.
Then, when determining the child controls corresponding to the 4 target controls, the other 4 text controls are traversed from near to far and clockwise, with each of the 4 icon controls in turn as the center. Specifically, for icon control A, rotating clockwise about its center finds text control 1 a short distance away in the 6 o'clock direction; for icon control B, rotating clockwise about its center likewise finds text control 2 a short distance away in the 6 o'clock direction. At this point, it can be determined whether the intersection ratio of the frame of text control 1 and the frame of text control 2, relative to the same positions of icon control A and icon control B, is not smaller than the set ratio.
To make this judgment, icon control B and text control 2 can be overlaid onto icon control A and text control 1, and it is then judged whether the intersection ratio of the areas occupied by text control 1 and text control 2 is not smaller than the set ratio.
Specifically, taking the center of icon control A and the center of icon control B as the same position, icon control B is translated horizontally to the left until its center coincides with the center of icon control A; text control 2 translates leftward together with icon control B, so the relative positional relationship between text control 2 and icon control B is unchanged. After icon control A and text control 1 are overlaid with icon control B and text control 2, it is judged whether the intersection ratio of the areas occupied by the frames of text control 1 and text control 2 is not smaller than the set ratio.
As can be seen from fig. 3, the intersection ratio of the areas occupied by the frames of the text control 1 and the text control 2 is 1 and is not smaller than the set ratio, so that the text control 1 can be determined to be a child control of the icon control a, and the text control 2 can be determined to be a child control of the icon control B.
Based on the same method, it can also be determined that the text control 3 is a child control of the icon control C, and the text control 4 is a child control of the icon control D.
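The translate-and-compare test illustrated with fig. 3 can be sketched as follows. The `set_ratio` default of 0.7 follows the preferred range given above, and the anchor choice (the top-left corner of each target control) is one of the designated positions the specification allows; all boxes are `(x1, y1, x2, y2)` tuples:

```python
def boxes_iou(box_a, box_b):
    """Intersection ratio (intersection over union) of two axis-aligned boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def is_shared_child(target_a, target_b, cand_a, cand_b, set_ratio=0.7):
    """Translate candidate B by the offset that aligns target B's anchor with
    target A's, then require the intersection ratio of the two candidate
    frames to be not smaller than the set ratio."""
    dx = target_a[0] - target_b[0]
    dy = target_a[1] - target_b[1]
    shifted = (cand_b[0] + dx, cand_b[1] + dy, cand_b[2] + dx, cand_b[3] + dy)
    return boxes_iou(cand_a, shifted) >= set_ratio
```

With icon controls A and B side by side and text controls 1 and 2 directly beneath them, the shifted frame of text control 2 coincides with that of text control 1, giving an intersection ratio of 1.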
Finally, after the plurality of child controls corresponding to the plurality of target controls are determined, the target controls and their child controls can be combined correspondingly to obtain the secondary layout information of the plurality of target controls.
Specifically, after the plurality of child controls corresponding to the plurality of target controls are obtained, for any target control, the child control corresponding to that target control can be combined with it to obtain the secondary layout information of the target control.
In a second implementation manner, when determining the secondary layout information of the plurality of controls, for any one of the plurality of row areas corresponding to the primary layout information, the following operations may be performed:
First, a recognition model trained in advance is acquired.
The recognition model can be obtained by learning and training on different controls and the control response areas corresponding to those controls. A control response area may be understood as the hot zone, or effective response area, of a control; for example, the control response area of an application icon is the region that, when clicked, opens the application corresponding to that icon.
Second, the controls included in the row area are identified based on the recognition model, and the control response areas included in the row area are determined.
Finally, the secondary layout information of at least one control in the row area is obtained based on the identified control response areas.
In practical application, either the first or the second implementation may be adopted when determining the secondary layout information of the plurality of controls in the target picture; because the first implementation is more accurate, it may preferably be adopted first.
It should be further noted that the layout information of the controls in the target picture is usually two-level: if a tree structure represents the layout information of the controls, the depth of the tree is 2, where a root node is a control and represents the first-level layout information, and a child node is a child control and represents the second-level layout information of the corresponding root-node control. Of course, in some special cases the depth of the tree may be greater than 2, that is, a child control of a control may itself have child controls; in that case, the three-level layout information corresponding to the child control's children may be determined by the same method used for the two-level layout information. This embodiment only takes layout information comprising two levels as an example.
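The tree structure described here can be sketched with a small recursive node type (field names hypothetical):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LayoutNode:
    """One node of the layout tree: root nodes carry first-level (row-based)
    layout information, their children carry second-level (column-based)
    layout information, and deeper children cover the special cases."""
    control: dict
    children: List["LayoutNode"] = field(default_factory=list)

def tree_depth(node: LayoutNode) -> int:
    """Depth of the layout tree; usually 2, greater in the special cases."""
    return 1 + max((tree_depth(c) for c in node.children), default=0)
```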
S108: and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
In S108, after obtaining the attribute information and the layout information of the plurality of controls in the target page, the layout information and the attribute information of the plurality of controls may be fused, so as to obtain an analysis result of the target page.
According to the technical scheme provided by one or more embodiments of the present disclosure, when the target page is analyzed, layout generation is performed on the plurality of controls in addition to parsing the target page for the attribute information of the controls it contains, yielding layout information for those controls. Since the attribute information of the controls may include their coordinates, categories, and semantic information, combining the parsed attribute information with the generated layout information yields a more effective analysis result for the target page.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Fig. 4 is a schematic structural view of an electronic device according to an embodiment of the present specification. Referring to fig. 4, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include the hardware required for other services.
The processor, the network interface, and the memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, among others. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, only one bi-directional arrow is shown in fig. 4, but this does not mean that there is only one bus or one type of bus.
The memory is used for storing a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include an internal memory and a non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the page parsing apparatus at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
acquiring a target picture, wherein the target picture comprises contents in a target page to be analyzed;
performing control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
generating a layout based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
The method performed by the page parsing apparatus disclosed in the embodiment shown in fig. 4 of the present specification may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capabilities. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and so on; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in the embodiments of this specification may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the embodiments of this specification may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as a random-access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may further execute the method of fig. 1 and implement the functions of the page parsing device in the embodiment shown in fig. 1, which is not described herein.
Of course, in addition to the software implementation, the electronic device of the embodiments of the present disclosure does not exclude other implementations, such as a logic device or a combination of software and hardware, that is, the execution subject of the following processing flow is not limited to each logic unit, but may also be hardware or a logic device.
The present description also proposes a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment of fig. 1, and in particular to perform the operations of:
acquiring a target picture, wherein the target picture comprises contents in a target page to be analyzed;
performing control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
Generating a layout based on the target picture, and determining layout information of the plurality of controls;
and obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls.
Fig. 5 is a schematic diagram of the structure of the page analyzing apparatus 50 according to an embodiment of the present specification. Referring to fig. 5, in a software implementation, the page parsing device 50 may include: an acquisition unit 51, a control parsing unit 52, a layout generating unit 53, and a determining unit 54, wherein:
an obtaining unit 51, configured to obtain a target picture, where the target picture includes content in a target page to be parsed;
the control parsing unit 52 parses the target picture to determine attribute information of a plurality of controls included in the target picture, where the attribute information includes coordinates, category and semantic information;
a layout generation unit 53 that performs layout generation based on the target picture, and determines layout information of the plurality of controls;
and a determining unit 54, configured to obtain a parsing result of the target page based on the attribute information and the layout information of the plurality of controls.
Optionally, the layout information of the plurality of controls includes primary layout information and secondary layout information, the primary layout information characterizing row-based layout information of the plurality of controls, and the secondary layout information characterizing column-based layout information of the plurality of controls.
Optionally, the layout generating unit 53 performs layout generation based on the target picture, and determines layout information of the plurality of controls, including:
dividing the target image into lines based on a preset image morphology algorithm to obtain the first-level layout information;
performing column-based layout analysis on the plurality of controls based on the primary layout information to obtain the secondary layout information;
and adding the secondary layout information to the primary layout information to obtain the layout information of the plurality of controls.
Optionally, the layout generating unit 53 performs line segmentation on the target image by using a preset image morphology algorithm to obtain the first-level layout information, including:
preprocessing the target picture to obtain a binarized picture with longitudinal noise removed;
determining at least one row boundary based on the binarized picture;
the first level layout information is determined based on the at least one row boundary and attribute information of the plurality of controls.
Optionally, the layout generating unit 53 performs preprocessing on the target picture to obtain a binarized picture with longitudinal noise removed, including:
Converting the target picture into a gray picture;
performing binarization operation on the gray level picture to obtain a binarized picture;
constructing morphological structural elements, wherein the ratio of the width of the morphological structural elements to the width of the target picture is not larger than a preset ratio;
and carrying out longitudinal denoising treatment on the binarized picture based on the morphological structural element to obtain the binarized picture with longitudinal noise removed.
Optionally, the layout generating unit 53 determines at least one row boundary based on the binarized picture, including:
traversing the pixel points in the binarized picture according to the rows, determining a plurality of candidate boundaries, wherein the number of effective pixel points of the row where any candidate boundary is located is not smaller than a set threshold value;
for a first candidate boundary and a second candidate boundary adjacent to any two positions in the plurality of candidate boundaries, performing the following operations:
determining whether a distance between the first candidate boundary and the second candidate boundary is not less than a set height threshold;
if yes, determining the first candidate boundary and the second candidate boundary as two line boundaries;
if not, merging the first candidate boundary and the second candidate boundary into a line boundary.
Optionally, the layout generating unit 53 determines the first-level layout information based on the at least one row boundary and attribute information of the plurality of controls, including:
based on the at least one line boundary, a plurality of line areas are obtained, and any line area is formed by two line boundaries adjacent in position;
dividing the plurality of controls into the plurality of row areas based on the area coordinates of the plurality of row areas and the coordinates of the plurality of controls included in the attribute information of the plurality of controls, and obtaining the first-level layout information.
Optionally, after determining the primary layout information, the layout generating unit 53 performs the following operation on any one of a plurality of row areas corresponding to the primary layout information:
determining a category of controls included in the row region;
semantic information of the row region is determined based on the category of the control included in the row region.
Optionally, the layout generating unit 53 performs column-based layout analysis on the plurality of controls based on the primary layout information, to obtain the secondary layout information, including:
the following operations are executed for any one of a plurality of row areas corresponding to the primary layout information:
Determining a plurality of target controls included in the row region, wherein the area of the region occupied by the plurality of target controls is not smaller than a set area, and the longitudinal spacing or the transverse spacing between any two adjacent target controls in the plurality of target controls is the same;
traversing other controls in the row area according to a preset rule, and determining a plurality of sub-controls corresponding to the plurality of target controls;
and correspondingly combining the plurality of target controls and the plurality of sub-controls to obtain the secondary layout information of the plurality of target controls.
Optionally, the layout generating unit 53 traverses other controls in the row area according to a preset rule, and determines a plurality of sub-controls corresponding to the plurality of target controls, including:
aiming at any target control, traversing other controls in the row area from near to far in a clockwise direction by taking the target control as a center;
judging whether two first controls exist in the other controls or not according to any two target controls, wherein the ratio of the intersection and the union of the areas occupied by the two first controls is not smaller than a set ratio relative to the same position of the two target controls;
and if so, determining the two first controls as two sub-controls corresponding to the two target controls.
Optionally, the layout generating unit 53 performs column-based layout analysis on the plurality of controls based on the primary layout information, to obtain the secondary layout information, including:
the following operations are executed for any one of a plurality of row areas corresponding to the primary layout information:
acquiring an identification model obtained by training in advance, wherein the identification model is obtained by learning and training different controls and control response areas corresponding to the different controls;
identifying controls included in the row area based on the identification model, and determining a control response area included in the row area;
and obtaining the secondary layout information of at least one control in the row area based on the control response area.
Optionally, the control parsing unit 52 parses the target picture to determine attribute information of a plurality of controls included in the target picture, including:
performing control detection on the target picture based on a preset detection model to obtain control information of a plurality of controls included in the target picture, wherein the control information comprises categories and coordinates of the plurality of controls;
classifying a second control with the category of an icon/picture in the plurality of controls based on a preset classification model to obtain semantic information of the second control;
And performing OCR (optical character recognition) on the target picture to obtain semantic information of a third control in other categories in the plurality of controls.
Optionally, the control parsing unit 52 further obtains coordinates of the third control and confidence level of the recognition result of the third control after performing OCR recognition on the target picture;
and if the confidence coefficient of the recognition result of the third control is not smaller than a set confidence coefficient threshold value, replacing the coordinate of the third control detected based on the detection model with the coordinate of the third control recognized based on the OCR.
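This optional refinement can be sketched as follows, assuming the detected controls and the OCR results have already been matched one-to-one (the field names and the confidence threshold are illustrative):

```python
def refine_coordinates(detected, ocr_results, conf_threshold=0.9):
    """When OCR recognizes a text control with confidence not smaller than the
    set confidence threshold, its coordinates replace the coordinates that the
    detection model produced for that control; otherwise the detector's
    coordinates are kept."""
    for ctrl, ocr in zip(detected, ocr_results):
        if ocr["confidence"] >= conf_threshold:
            ctrl["bbox"] = ocr["bbox"]
    return detected
```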
The page parsing apparatus 50 provided in the embodiment of the present disclosure may further execute the method of fig. 1, and implement the functions of the embodiment of the page parsing apparatus 50 shown in fig. 1, which is not described herein again.
In summary, the foregoing description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of one or more embodiments of the present disclosure should be included in the protection scope of this document.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, reference may be made to the corresponding parts of the description of the method embodiments.

Claims (15)

1. A page analysis method, comprising:
acquiring a target picture, wherein the target picture comprises contents in a target page to be analyzed;
performing control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
generating a layout based on the target picture, and determining layout information of the plurality of controls;
based on the attribute information and layout information of the plurality of controls, obtaining an analysis result of the target page;
wherein the layout information of the plurality of controls comprises primary layout information and secondary layout information; the primary layout information characterizes row-based layout information of the plurality of controls and at least comprises the pixels occupied by any one of a plurality of row areas, the controls included in any one of the row areas, and the attribute information of those controls; the secondary layout information characterizes column-based layout information of the plurality of controls and at least comprises whether a target control in any one of the plurality of row areas has sub-controls, wherein a sub-control can be merged with its corresponding target control.
2. The method of claim 1, wherein determining layout information for the plurality of controls based on layout generation of the target picture comprises:
performing row segmentation on the target picture based on a preset image morphology algorithm to obtain the primary layout information;
performing column-based layout analysis on the plurality of controls based on the primary layout information to obtain the secondary layout information;
and adding the secondary layout information to the primary layout information to obtain the layout information of the plurality of controls.
3. The method of claim 2, wherein performing row segmentation on the target picture based on the preset image morphology algorithm to obtain the primary layout information comprises:
preprocessing the target picture to obtain a binarized picture with longitudinal noise removed;
determining at least one row boundary based on the binarized picture;
and determining the primary layout information based on the at least one row boundary and the attribute information of the plurality of controls.
4. The method of claim 3, wherein preprocessing the target picture to obtain the binarized picture with longitudinal noise removed comprises:
converting the target picture into a grayscale picture;
performing a binarization operation on the grayscale picture to obtain a binarized picture;
constructing a morphological structuring element, wherein the ratio of the width of the morphological structuring element to the width of the target picture is not larger than a preset ratio;
and performing longitudinal denoising on the binarized picture based on the morphological structuring element to obtain the binarized picture with longitudinal noise removed.
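The preprocessing of claim 4 can be illustrated with a minimal sketch. The code below assumes a grayscale picture represented as a 2D list of pixel values, and approximates the morphological denoising with a horizontal opening (a 1 × k structuring element) so that thin vertical strokes shorter than the element's width are erased; the function names, the threshold of 128, and the run-length implementation are illustrative assumptions, not taken from the patent.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel to 1 (dark foreground) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def horizontal_opening(binary, kernel_width):
    """Morphological opening of each row with a 1 x kernel_width structuring
    element: foreground runs shorter than kernel_width are erased, while
    runs at least kernel_width long are kept at their original extent."""
    out = []
    for row in binary:
        keep = [0] * len(row)
        run_start = None
        for x, px in enumerate(row + [0]):        # sentinel 0 closes the last run
            if px and run_start is None:
                run_start = x                      # a foreground run begins
            elif not px and run_start is not None:
                if x - run_start >= kernel_width:  # run long enough to survive
                    for i in range(run_start, x):
                        keep[i] = 1
                run_start = None
        out.append(keep)
    return out
```

With the structuring-element width chosen as a fraction of the picture width (the preset ratio of claim 4), long horizontal separators survive the opening while narrow vertical noise is removed.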
5. The method of claim 3, wherein determining at least one row boundary based on the binarized picture comprises:
traversing the pixel points in the binarized picture row by row to determine a plurality of candidate boundaries, wherein the number of effective pixel points in the row of any candidate boundary is not smaller than a set threshold;
performing the following operations for a first candidate boundary and a second candidate boundary that are positionally adjacent among the plurality of candidate boundaries:
determining whether the distance between the first candidate boundary and the second candidate boundary is not smaller than a set height threshold;
if yes, determining the first candidate boundary and the second candidate boundary to be two row boundaries;
if not, merging the first candidate boundary and the second candidate boundary into one row boundary.
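A minimal sketch of the row-boundary detection in claim 5, assuming the binarized picture is a 2D list of 0/1 values. The merge-by-midpoint rule and the parameter names are illustrative assumptions, since the claim does not fix how two close candidates are combined into one boundary.

```python
def find_row_boundaries(binary, pixel_threshold, height_threshold):
    """Scan the binarized picture row by row; rows whose count of effective
    (foreground) pixels meets pixel_threshold become candidate boundaries,
    and candidates closer together than height_threshold are merged."""
    candidates = [y for y, row in enumerate(binary)
                  if sum(row) >= pixel_threshold]
    boundaries = []
    for y in candidates:
        if boundaries and y - boundaries[-1] < height_threshold:
            boundaries[-1] = (boundaries[-1] + y) // 2   # merge into one boundary
        else:
            boundaries.append(y)                          # keep as a new boundary
    return boundaries
```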
6. The method of claim 3, wherein determining the primary layout information based on the at least one row boundary and the attribute information of the plurality of controls comprises:
obtaining a plurality of row areas based on the at least one row boundary, wherein any row area is formed by two positionally adjacent row boundaries;
dividing the plurality of controls among the plurality of row areas based on the area coordinates of the plurality of row areas and the coordinates of the plurality of controls included in the attribute information of the plurality of controls, to obtain the primary layout information.
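The division of controls into row areas in claim 6 can be sketched as follows; the `(name, x, y, w, h)` control representation and the use of the vertical center to decide membership are illustrative assumptions, not specified by the claim.

```python
def assign_controls_to_rows(boundaries, controls):
    """Build row areas from adjacent boundary pairs and bucket each control
    into the row area containing its vertical center.
    Each control is a (name, x, y, w, h) tuple with hypothetical fields."""
    rows = list(zip(boundaries, boundaries[1:]))   # (top, bottom) pairs
    layout = [[] for _ in rows]
    for ctrl in controls:
        name, x, y, w, h = ctrl
        cy = y + h / 2                             # vertical center of the control
        for i, (top, bottom) in enumerate(rows):
            if top <= cy < bottom:
                layout[i].append(ctrl)
                break
    return layout
```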
7. The method of claim 2, wherein after determining the primary layout information, the method further comprises:
performing the following operations for any one of a plurality of row areas corresponding to the primary layout information:
determining the categories of the controls included in the row area;
determining semantic information of the row area based on the categories of the controls included in the row area.
8. The method of claim 2, wherein performing column-based layout analysis on the plurality of controls based on the primary layout information to obtain the secondary layout information comprises:
performing the following operations for any one of a plurality of row areas corresponding to the primary layout information:
determining a plurality of target controls included in the row area, wherein the area occupied by each of the plurality of target controls is not smaller than a set area, and the longitudinal spacing or transverse spacing between any two adjacent target controls among the plurality of target controls is the same;
traversing the other controls in the row area according to a preset rule to determine a plurality of sub-controls corresponding to the plurality of target controls;
and correspondingly combining the plurality of target controls with the plurality of sub-controls to obtain the secondary layout information of the plurality of target controls.
9. The method of claim 8, wherein traversing the other controls in the row area according to the preset rule to determine the plurality of sub-controls corresponding to the plurality of target controls comprises:
for any target control, traversing the other controls in the row area from near to far in a clockwise direction with the target control as a center;
for any two target controls, judging whether there exist two first controls among the other controls such that, at the same positions relative to the two target controls, the ratio of the intersection to the union of the areas occupied by the two first controls is not smaller than a set ratio;
and if so, determining the two first controls to be two sub-controls corresponding to the two target controls.
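The sub-control matching of claim 9 can be sketched with an intersection-over-union test on target-relative boxes: two candidate controls are paired as sub-controls when they sit at (nearly) the same position relative to their respective target controls. The exhaustive pairwise search and the default ratio of 0.5 are illustrative assumptions; the claim's near-to-far clockwise traversal order is omitted for brevity.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def match_subcontrols(targets, others, ratio=0.5):
    """For each pair of target controls, find a pair of candidate controls
    whose boxes, expressed relative to their respective targets, overlap
    with IoU >= ratio, i.e. they occupy the same position relative to the
    two targets; such candidates are paired up as sub-controls."""
    pairs = []
    for i in range(len(targets)):
        for j in range(i + 1, len(targets)):
            t1, t2 = targets[i], targets[j]
            for a, c1 in enumerate(others):
                for b, c2 in enumerate(others):
                    if a == b:
                        continue
                    r1 = (c1[0] - t1[0], c1[1] - t1[1], c1[2], c1[3])
                    r2 = (c2[0] - t2[0], c2[1] - t2[1], c2[2], c2[3])
                    if iou(r1, r2) >= ratio:
                        pairs.append((c1, c2))
    return pairs
```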
10. The method of claim 2, wherein performing column-based layout analysis on the plurality of controls based on the primary layout information to obtain the secondary layout information comprises:
performing the following operations for any one of a plurality of row areas corresponding to the primary layout information:
acquiring a pre-trained recognition model, wherein the recognition model is obtained by learning and training on different controls and the control response areas corresponding to the different controls;
recognizing the controls included in the row area based on the recognition model, and determining the control response areas included in the row area;
and obtaining the secondary layout information of at least one control in the row area based on the control response areas.
11. The method of claim 1, wherein performing control analysis on the target picture and determining the attribute information of the plurality of controls included in the target picture comprises:
performing control detection on the target picture based on a preset detection model to obtain control information of the plurality of controls included in the target picture, wherein the control information comprises the categories and coordinates of the plurality of controls;
classifying, based on a preset classification model, a second control whose category is icon/picture among the plurality of controls to obtain semantic information of the second control;
and performing optical character recognition (OCR) on the target picture to obtain semantic information of third controls of other categories among the plurality of controls.
12. The method of claim 11, further comprising:
obtaining, after the OCR is performed on the target picture, the coordinates of the third control and the confidence of the recognition result of the third control;
and if the confidence of the recognition result of the third control is not smaller than a set confidence threshold, replacing the coordinates of the third control detected based on the detection model with the coordinates of the third control recognized based on the OCR.
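The coordinate replacement of claim 12 amounts to preferring the OCR-recognized box whenever its confidence clears the threshold; the dictionary-based sketch below is illustrative (the control ids, box format, and the 0.8 default are assumptions, not from the patent).

```python
def merge_coordinates(detected, ocr_results, conf_threshold=0.8):
    """For each text control, replace the detector's box with the
    OCR-recognized box when the OCR confidence clears the threshold.
    detected maps control id -> box; ocr_results maps id -> (box, confidence)."""
    merged = {}
    for cid, box in detected.items():
        ocr = ocr_results.get(cid)
        if ocr and ocr[1] >= conf_threshold:
            merged[cid] = ocr[0]      # trust the higher-confidence OCR box
        else:
            merged[cid] = box         # keep the detection model's box
    return merged
```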
13. A page analysis apparatus, comprising:
an acquisition unit, configured to acquire a target picture, wherein the target picture comprises content of a target page to be analyzed;
a control analysis unit, configured to perform control analysis on the target picture and determine attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
a layout generation unit, configured to perform layout generation based on the target picture and determine layout information of the plurality of controls;
a determining unit, configured to obtain an analysis result of the target page based on the attribute information and the layout information of the plurality of controls;
wherein the layout information of the plurality of controls comprises primary layout information and secondary layout information; the primary layout information characterizes row-based layout information of the plurality of controls and at least comprises the pixels occupied by any one of a plurality of row areas, the controls included in any one of the row areas, and the attribute information of those controls; the secondary layout information characterizes column-based layout information of the plurality of controls and at least comprises whether a target control in any one of the plurality of row areas has sub-controls, wherein a sub-control can be merged with its corresponding target control.
14. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a target picture, wherein the target picture comprises contents in a target page to be analyzed;
performing control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
generating a layout based on the target picture, and determining layout information of the plurality of controls;
obtaining an analysis result of the target page based on the attribute information and the layout information of the plurality of controls;
wherein the layout information of the plurality of controls comprises primary layout information and secondary layout information; the primary layout information characterizes row-based layout information of the plurality of controls and at least comprises the pixels occupied by any one of a plurality of row areas, the controls included in any one of the row areas, and the attribute information of those controls; the secondary layout information characterizes column-based layout information of the plurality of controls and at least comprises whether a target control in any one of the plurality of row areas has sub-controls, wherein a sub-control can be merged with its corresponding target control.
15. A computer readable storage medium storing one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of:
acquiring a target picture, wherein the target picture comprises contents in a target page to be analyzed;
performing control analysis on the target picture, and determining attribute information of a plurality of controls included in the target picture, wherein the attribute information comprises coordinates, categories and semantic information;
generating a layout based on the target picture, and determining layout information of the plurality of controls;
based on the attribute information and layout information of the plurality of controls, obtaining an analysis result of the target page;
wherein the layout information of the plurality of controls comprises primary layout information and secondary layout information; the primary layout information characterizes row-based layout information of the plurality of controls and at least comprises the pixels occupied by any one of a plurality of row areas, the controls included in any one of the row areas, and the attribute information of those controls; the secondary layout information characterizes column-based layout information of the plurality of controls and at least comprises whether a target control in any one of the plurality of row areas has sub-controls, wherein a sub-control can be merged with its corresponding target control.
CN202010304984.8A 2020-04-17 2020-04-17 Page analysis method and device Active CN111460355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010304984.8A CN111460355B (en) 2020-04-17 2020-04-17 Page analysis method and device


Publications (2)

Publication Number Publication Date
CN111460355A CN111460355A (en) 2020-07-28
CN111460355B true CN111460355B (en) 2023-06-09

Family

ID=71684575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010304984.8A Active CN111460355B (en) 2020-04-17 2020-04-17 Page analysis method and device

Country Status (1)

Country Link
CN (1) CN111460355B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112732259A (en) * 2021-01-11 2021-04-30 赞同科技股份有限公司 Front-end interactive page conversion method, device and medium based on artificial intelligence
CN113256555A (en) * 2021-03-26 2021-08-13 东北大学 Appearance abnormal GUI control detection method based on machine learning
CN115509665B (en) * 2022-09-29 2023-07-07 上海弘玑信息技术有限公司 Method, device, medium and equipment for recording control in window
CN115982443B (en) * 2023-03-17 2023-07-18 杭州实在智能科技有限公司 Screen page structure analysis and path storage method and system based on visual analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035256A (en) * 2018-06-28 2018-12-18 百度在线网络技术(北京)有限公司 User interface image cutting method, device, server and storage medium
CN109117228A (en) * 2018-08-01 2019-01-01 浙江口碑网络技术有限公司 The generation method and device of graphical interfaces
CN109189682A (en) * 2018-08-27 2019-01-11 广州云测信息技术有限公司 A kind of script method for recording and device



Similar Documents

Publication Publication Date Title
CN111460355B (en) Page analysis method and device
US11468225B2 (en) Determining functional and descriptive elements of application images for intelligent screen automation
CN110502985B (en) Form identification method and device and form identification equipment
US8917935B2 (en) Detecting text using stroke width based text detection
US10049291B2 (en) Image-processing apparatus, image-processing method, and computer program product
US7627176B2 (en) Apparatus, method, and computer program for analyzing document layout
US7277584B2 (en) Form recognition system, form recognition method, program and storage medium
CN111291661B (en) Method and equipment for identifying text content of icon in screen
CN112597918A (en) Text detection method and device, electronic equipment and storage medium
CN109978044B (en) Training data generation method and device, and model training method and device
JP2013137761A (en) Determination for transparent painting-out based on reference background color
CN114565927A (en) Table identification method and device, electronic equipment and storage medium
CN111709338B (en) Method and device for table detection and training method of detection model
EP1439486A1 (en) Segmenting an image via a graph
JP6303671B2 (en) Image processing apparatus and image processing program
JP3215163B2 (en) Ruled line identification method and area identification method
Bhaskar et al. Implementing optical character recognition on the android operating system for business cards
CN116185812A (en) Automatic testing method, device and medium for software system functions
JP7365835B2 (en) Structure recognition system, structure recognition device, structure recognition method, and program
CN111753836A (en) Character recognition method and device, computer readable medium and electronic equipment
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN110543624B (en) Method and device for identifying check boxes in PDF document and electronic equipment
US20220406083A1 (en) Image processing apparatus, control method thereof, and storage medium
CN117496539A (en) Shape word determining method and device, electronic equipment and computer readable storage medium
Lyu et al. Deep Learning based Japanese Early Books Understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant