CN111611503A - Page processing method and device, electronic equipment and storage medium


Info

Publication number
CN111611503A
Authority
CN
China
Prior art keywords
page
skeleton
candidate
region
skeleton region
Legal status
Granted
Application number
CN202010462451.2A
Other languages
Chinese (zh)
Other versions
CN111611503B (en)
Inventor
王亚楠
尹飞
葛鹏
薛大伟
刘兰英
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010462451.2A
Publication of CN111611503A
Application granted
Publication of CN111611503B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/954 Navigation, e.g. using categorised browsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The application discloses a page processing method and device, and relates to the technical field of page processing. A specific implementation includes: acquiring a plurality of candidate pages and determining display interfaces of the candidate pages; extracting a skeleton structure diagram of each display interface and identifying skeleton regions in the skeleton structure diagram; clustering the skeleton regions to generate at least one cluster set; and selecting at least one skeleton region from each of the at least one cluster set and taking the candidate pages corresponding to the selected skeleton regions as target pages. Because the skeleton of each page is identified and the skeleton regions are clustered, the detailed content of a page does not interfere with target page selection. In addition, the unit of screening is reduced from whole pages to skeleton regions, so that a target page is selected for each type of skeleton region; the selected target pages therefore cover the various page features comprehensively, and the recall rate for different page types is improved.

Description

Page processing method and device, electronic equipment and storage medium
Technical Field
The embodiments of the application relate to the technical field of computers, in particular to web page processing, and more particularly to a page processing method and device, electronic equipment, and a storage medium.
Background
With the development of internet technology, more and more users rely on the internet for work and leisure. Internet content is often presented to users in the form of pages, such as landing pages.
In testing projects related to landing pages, display problems can be caused by data flow, customer operations, rendering, and similar factors, yet a commercial platform may host landing pages on the order of millions. At that scale, ensuring the overall quality of the pages at a controllable cost is a problem that landing page testing on a commercial platform must solve.
Disclosure of Invention
A page processing method, a page processing device, an electronic device and a storage medium are provided.
According to a first aspect, a page processing method is provided, including: acquiring a plurality of candidate pages and determining display interfaces of the candidate pages; extracting a skeleton structure diagram of a display interface and identifying a skeleton region in the skeleton structure diagram; clustering the skeleton regions to generate at least one cluster set; and selecting at least one skeleton region from each of the at least one cluster set and taking the candidate page corresponding to the at least one skeleton region as a target page.
According to a second aspect, a page processing apparatus is provided, including: an acquisition unit configured to acquire a plurality of candidate pages and determine display interfaces of the candidate pages; an extraction unit configured to extract a skeleton structure diagram of the display interface and identify a skeleton region in the skeleton structure diagram; a clustering unit configured to cluster the skeleton regions to generate at least one cluster set; and a selection unit configured to select at least one skeleton region from each of the at least one cluster set and take the candidate page corresponding to the at least one skeleton region as a target page.
According to a third aspect, an electronic device is provided, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the page processing method of any embodiment.
According to a fourth aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the page processing method of any embodiment.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;
FIG. 2a is a flow diagram of one embodiment of a method for processing a page according to the present application;
FIG. 2b is a schematic illustration of a skeleton region of a method of processing a page according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method of processing a page according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method of processing a page according to the present application;
FIG. 5 is a schematic block diagram of one embodiment of a device for processing pages in accordance with the present application;
FIG. 6 is a block diagram of an electronic device for implementing a page processing method according to an embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that the embodiments of the present application and the features in them may be combined with each other as long as they do not conflict. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which embodiments of the page processing method or the page processing apparatus of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as video applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
Here, the terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may provide various services, for example as a background server supporting the terminal devices 101, 102, and 103. The background server may analyze and otherwise process received data, such as data of multiple candidate pages, and feed the processing result (for example, a target page) back to the terminal devices.
It should be noted that the page processing method provided in the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, and 103; accordingly, the page processing apparatus may be disposed in the server 105 or in the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2a, a flow 200 of one embodiment of a method of processing a page according to the present application is shown. The page processing method comprises the following steps:
Step 201, acquiring a plurality of candidate pages and determining display interfaces of the plurality of candidate pages.
In this embodiment, an execution subject (for example, the server or the terminal device shown in fig. 1) on which the page processing method operates may obtain a plurality of candidate pages, and determine a display interface of each of the plurality of candidate pages.
In practice, the execution subject may acquire the candidate pages in various ways. For example, it may directly retrieve a plurality of pre-stored candidate pages from the local electronic device or from other electronic devices. Alternatively, it may select a plurality of candidate pages from a specified candidate page set in real time.
The execution subject may determine the display interfaces of the candidate pages in various ways. For example, it may directly retrieve pre-stored display interfaces of the candidate pages. Alternatively, it may take a screenshot of the entire visible candidate page and use the screenshot as the display interface.
Step 202, extracting a skeleton structure diagram of the display interface, and identifying a skeleton region in the skeleton structure diagram.
In this embodiment, the execution subject may extract a skeleton structure diagram of the display interface and identify skeleton regions in it. The skeleton structure diagram is a structural diagram formed from the areas of the page where content such as text and pictures is located. That is, it indicates which regions of the page consist of text, which consist of pictures, and how these regions are arranged and combined.
Different areas of a page can display different content; correspondingly, the skeleton structure diagram of the page contains a separate skeleton region for each such area. In practice, each skeleton region is an equivalence class in the skeleton structure diagram. The execution subject may identify the skeleton regions in various ways. For example, it may divide the skeleton structure diagram into at least one equivalence class and use these equivalence classes as the identified skeleton regions. As another example, based on the gray levels of the pixels in the skeleton structure diagram, it may treat adjacent pixels with similar gray levels (for example, whose gray-level difference is smaller than a preset threshold) as pixels of the same skeleton region. In addition, it may perform edge detection on the skeleton structure diagram to obtain the different skeleton regions enclosed by different edges.
As shown in FIG. 2b, the left side shows the display interface of a candidate page, and the right side shows the upper and lower skeleton regions in the skeleton structure diagram of that display interface. The skeleton regions include a text element and an image element.
Step 203, clustering the skeleton regions to generate at least one cluster set.
In this embodiment, the execution subject may cluster the skeleton regions, and the clustering result is at least one cluster set. Specifically, the execution subject may cluster in various ways, for example with an image clustering algorithm or the K-means clustering algorithm. In practice, the execution subject may also cluster as follows: encode the skeleton structure diagrams of the candidate pages to obtain the corresponding encoding features, and cluster these encoding features to generate a plurality of first-level cluster sets. The execution subject may further extract a plurality of elements from each candidate page in a first-level cluster set and obtain the element features of those elements. It may then perform second-level clustering on the pages in each first-level cluster set according to the element features of the elements in each page, generating the plurality of cluster sets. The elements are the text and images in the skeleton structure diagram; that is, there are two element types: text elements and image elements.
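As a rough illustration of what a skeleton structure diagram can look like, the sketch below paints element bounding boxes onto a coarse grid and keeps only whether each cell is text, image, or background. It assumes the element boxes (the hypothetical `Element` type and its grid-cell coordinates) are already available, for example from page layout information; the patent does not say how the diagram is produced, and the label values are arbitrary choices.

```python
from dataclasses import dataclass

# Hypothetical cell labels for the skeleton grid.
BACKGROUND, TEXT, IMAGE = 0, 1, 2


@dataclass
class Element:
    kind: str  # "text" or "image"
    x: int     # left edge, in grid cells
    y: int     # top edge, in grid cells
    w: int     # width, in grid cells
    h: int     # height, in grid cells


def skeleton_diagram(elements, width, height):
    """Return a height x width grid marking where text and image elements sit."""
    grid = [[BACKGROUND] * width for _ in range(height)]
    for el in elements:
        label = TEXT if el.kind == "text" else IMAGE
        # Clip each box to the grid so oversized elements do not raise IndexError.
        for row in range(el.y, min(el.y + el.h, height)):
            for col in range(el.x, min(el.x + el.w, width)):
                grid[row][col] = label
    return grid


# A page with an image banner on top and a line of text underneath.
page = [Element("image", 0, 0, 4, 2), Element("text", 0, 3, 4, 1)]
for row in skeleton_diagram(page, 4, 4):
    print(row)
```

The detailed content (actual words and pixels) is discarded on purpose, which is what lets later steps compare pages by layout alone.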
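The gray-level approach described above can be sketched as breadth-first region growing: adjacent (4-connected) pixels whose gray levels differ by less than a preset threshold are placed in the same skeleton region. This is a minimal sketch under assumed inputs (a 2D list of gray values and a hypothetical `label_regions` helper), not the patent's implementation. Because similarity is checked between neighbouring pixels, a gradual gradient can chain into a single region.

```python
from collections import deque


def label_regions(gray, threshold):
    """Group 4-connected pixels whose gray levels differ by less than `threshold`.

    Returns a label grid with the same shape as `gray` and the number of regions.
    """
    h, w = len(gray), len(gray[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            # Breadth-first region growing from an unlabeled seed pixel.
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(gray[ny][nx] - gray[y][x]) < threshold):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label


gray = [[10, 12, 200],
        [11, 13, 205],
        [90, 90, 90]]
labels, n = label_regions(gray, threshold=20)
print(n)       # 3
print(labels)  # [[0, 0, 1], [0, 0, 1], [2, 2, 2]]
```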
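The two-level clustering above can be sketched as follows, assuming each page already has a numeric encoding vector for its skeleton structure diagram and a hashable summary of its element features. Plain Lloyd's k-means, seeded deterministically with the first k points, stands in for the first level; the second level is simplified to grouping pages with identical element features inside each first-level cluster. All names and input shapes are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    k = min(k, len(points))
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def two_level_cluster(pages, k):
    """pages: list of (page_id, encoding_vector, element_features) tuples.

    Level 1: k-means on the skeleton encodings.
    Level 2 (simplified): split each level-1 cluster by identical element features.
    """
    first = kmeans([vec for _, vec, _ in pages], k)
    clusters = {}
    for (page_id, _, feats), c in zip(pages, first):
        clusters.setdefault((c, feats), []).append(page_id)
    return list(clusters.values())


pages = [
    ("a", [0.0, 0.0], ("text", "image")),
    ("b", [0.1, 0.0], ("text", "image")),
    ("c", [0.0, 0.1], ("text",)),
    ("d", [10.0, 10.0], ("image",)),
    ("e", [10.1, 10.0], ("image",)),
]
print(two_level_cluster(pages, k=2))  # [['a', 'b'], ['c'], ['d', 'e']]
```

A production version would more likely use a library clusterer and a learned encoder; the point here is only the two-stage structure.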
Step 204, selecting at least one skeleton region from each cluster set of the at least one cluster set, and taking a candidate page corresponding to the at least one skeleton region as a target page.
In this embodiment, the execution subject may select at least one skeleton region from each generated cluster set and use the candidate page corresponding to each selected skeleton region as a target page. The target pages may be used in page tests to verify the quality of the candidate pages. Specifically, the candidate page corresponding to a skeleton region is the candidate page for which the display interface containing that skeleton region was determined.
In practice, the execution subject may select the at least one skeleton region in various ways; for example, it may randomly select a skeleton region from each cluster set.
With the method provided by this embodiment of the application, identifying the skeleton of each page and clustering the skeleton regions prevents the detailed content of pages from interfering with target page selection. In addition, the unit of screening is reduced from whole pages to skeleton regions, so that a target page is selected for each type of skeleton region; the selected target pages therefore cover the various page features comprehensively, and the recall rate for different page types is improved.
In some optional implementations of this embodiment, selecting at least one skeleton region from each of the at least one cluster set in step 204 may include: for each cluster set, in response to determining, based on the features of the skeleton regions, that a first candidate page of the cluster set includes the skeleton regions of a second candidate page, selecting the skeleton regions of the first candidate page as skeleton regions in the at least one skeleton region.
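Random selection per cluster set might look like the sketch below. The input layout (a list of `(page_id, region_id)` pairs per cluster set) and the `per_cluster` and `seed` parameters are assumptions for illustration.

```python
import random


def pick_random_regions(clusters, per_cluster=1, seed=None):
    """Sample up to `per_cluster` skeleton regions from each cluster set.

    clusters: list of cluster sets, each a list of (page_id, region_id) pairs.
    Returns the ids of the candidate pages that own the sampled regions.
    """
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    targets = set()
    for members in clusters:
        for page_id, _region_id in rng.sample(members, min(per_cluster, len(members))):
            targets.add(page_id)
    return targets


clusters = [[("p1", "r1"), ("p2", "r1")], [("p3", "r2")]]
print(pick_random_regions(clusters, seed=0))  # one of p1/p2, plus p3
```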
In these optional implementations, when selecting skeleton regions from a cluster set, if it is determined that one candidate page of the cluster set includes the skeleton regions of another candidate page, that is, the skeleton regions of the one page are the same as, or a superset of, those of the other page, the skeleton regions of the one page may be selected as at least part of the at least one skeleton region. In practice, the execution subject may treat the one candidate page as the first candidate page and the other as the second candidate page.
The number of skeleton regions in the one candidate page may thus be equal to or greater than that in the other. For example, one candidate page of a cluster set contains skeleton regions No. 1, No. 2, and No. 3, and another contains skeleton regions No. 1 and No. 3.
In practice, the execution subject may determine which skeleton regions a candidate page includes based on the features of the skeleton regions. For example, the features of a skeleton region may include its size and element type. If a skeleton region of one candidate page matches a skeleton region of another candidate page in size and element type, the two may be considered the same skeleton region; that is, both candidate pages include that skeleton region.
In these implementations, the skeleton regions of a candidate page can be selected when that page fully includes the skeleton regions of another candidate page. Thus, when the number of selected pages is limited, as many skeleton regions as possible are selected, improving the richness of the skeleton regions in the final target pages.
In some optional implementations of this embodiment, selecting at least one skeleton region from each of the multiple cluster sets in step 204 may include: for each of the multiple cluster sets, in response to determining, based on the features of the skeleton regions, that at least two candidate pages in the cluster set have the same skeleton regions arranged in different orders, selecting the skeleton regions of the at least two candidate pages as skeleton regions in the at least one skeleton region.
In these optional implementations, if at least two candidate pages in a cluster set have the same skeleton regions in different arrangement orders, the skeleton regions of those candidate pages are selected as at least part of the at least one skeleton region. Here, the skeleton regions being the same may mean that the skeleton regions are identical.
Specifically, the arrangement order of the skeleton regions may be their default order in the skeleton structure diagram; for example, sequence numbers increase from top to bottom and from left to right.
For example, three candidate pages contain skeleton regions in the orders No. 1, No. 2, No. 3, No. 1, and No. 1, No. 3, No. 2, respectively. The execution subject may select the skeleton regions of each of the three candidate pages as skeleton regions in the at least one skeleton region.
These implementations can select skeleton regions in different arrangement orders and thereby capture the characteristics of pages whose skeleton regions are arranged differently, so that page testing is more comprehensive.
In some optional application scenarios of the foregoing implementations, the features of a skeleton region may be obtained as follows: for the skeleton region, extracting its elements; and determining the element features of those elements as the features of the skeleton region, where the element features include at least one of: element type, element size, number of elements, and element order.
In these optional application scenarios, the execution subject may use the features of the elements in a skeleton region as the features of that skeleton region, refining the region features so that they are more comprehensive and accurate.
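The containment rule above can be sketched by keying each skeleton region on its size and element type, as the text suggests, and keeping only pages whose region set is not strictly contained in another page's set within the same cluster. The dictionary fields (`width`, `height`, `type`) and helper names are assumptions, and using sets means repeated identical regions on one page are counted once.

```python
def region_key(region):
    # Assumed identity rule: same size and same element type => same skeleton region.
    return (region["width"], region["height"], region["type"])


def select_covering_pages(cluster):
    """cluster: dict mapping page_id -> list of skeleton-region dicts.

    Keeps pages whose region set is not strictly contained in another page's
    set; among pages with identical sets, only the first one is kept.
    """
    keys = {pid: frozenset(region_key(r) for r in regions) for pid, regions in cluster.items()}
    selected = []
    for pid, own in keys.items():
        covered = any(
            other != pid and (own < theirs or (own == theirs and other in selected))
            for other, theirs in keys.items()
        )
        if not covered:
            selected.append(pid)
    return selected


def r(w, h, kind):
    return {"width": w, "height": h, "type": kind}


cluster = {
    "p1": [r(100, 50, "text"), r(100, 200, "image"), r(100, 30, "text")],  # regions No. 1, 2, 3
    "p2": [r(100, 50, "text"), r(100, 30, "text")],                       # regions No. 1, 3
    "p3": [r(80, 80, "image")],
}
print(select_covering_pages(cluster))  # ['p1', 'p3']
```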
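A minimal sketch of the arrangement-order rule, assuming each region carries hypothetical `top` and `left` coordinates: regions are sorted top to bottom and then left to right (the default order described above), and a page is kept whenever its ordered sequence of region keys has not been seen yet. Pages that share the same regions in different orders are therefore all kept, while exact duplicates collapse to one.

```python
def region_key(region):
    # Assumed identity rule: same size and same element type => same skeleton region.
    return (region["width"], region["height"], region["type"])


def select_distinct_orders(cluster):
    """cluster: dict mapping page_id -> list of region dicts with "top"/"left".

    Keeps the first page for every distinct ordered sequence of regions.
    """
    seen = set()
    selected = []
    for pid, regions in cluster.items():
        # Default order: top to bottom, then left to right.
        ordered = sorted(regions, key=lambda reg: (reg["top"], reg["left"]))
        sequence = tuple(region_key(reg) for reg in ordered)
        if sequence not in seen:
            seen.add(sequence)
            selected.append(pid)
    return selected


def region(kind, top, left=0, w=100, h=50):
    return {"type": kind, "top": top, "left": left, "width": w, "height": h}


cluster = {
    "A": [region("text", 0), region("image", 100)],
    "B": [region("image", 0), region("text", 100)],  # same regions, swapped order
    "C": [region("image", 100), region("text", 0)],  # same order as A
}
print(select_distinct_orders(cluster))  # ['A', 'B']
```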
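The four element features listed above could be computed from a region's elements roughly as follows. The element dictionary fields and the use of a `Counter` to summarize element types are assumptions for illustration.

```python
from collections import Counter


def region_features(elements):
    """Element features of one skeleton region.

    elements: list of dicts with "type", "width", "height", "top", "left".
    """
    # Element order follows the default reading order: top to bottom, left to right.
    ordered = sorted(elements, key=lambda e: (e["top"], e["left"]))
    return {
        "element_types": Counter(e["type"] for e in ordered),
        "element_sizes": [(e["width"], e["height"]) for e in ordered],
        "element_count": len(ordered),
        "element_order": [e["type"] for e in ordered],
    }


feats = region_features([
    {"type": "text", "width": 200, "height": 20, "top": 0, "left": 120},
    {"type": "image", "width": 100, "height": 80, "top": 0, "left": 0},
])
print(feats["element_order"], feats["element_count"])  # ['image', 'text'] 2
```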
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the page processing method according to this embodiment. In the application scenario of FIG. 3, the execution subject 301 acquires a plurality of candidate pages 302, for example 50,000 candidate pages, and determines the display interfaces 303 of the candidate pages 302. The execution subject 301 extracts the skeleton structure diagram 304 of the display interfaces 303 and identifies the skeleton regions 305 in the skeleton structure diagram 304. The execution subject 301 clusters the skeleton regions 305 to generate at least one cluster set 306. The execution subject 301 selects at least one skeleton region from each of the at least one cluster set 306 and takes the candidate pages corresponding to the selected skeleton regions as target pages 307; the number of target pages may be, for example, 500.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method of processing a page is shown. The process 400 includes the following steps:
Step 401, acquiring a candidate page set and determining the attribute states of the page attributes of the candidate pages in the candidate page set.
In this embodiment, an execution subject (for example, the server or the terminal device shown in fig. 1) on which the page processing method operates may obtain a candidate page set composed of candidate pages, and determine an attribute state of a page attribute of a candidate page in the candidate page set.
In practice, the page attributes may include at least one of: account attributes, page type, and page characteristics. Specifically, the account is the account logged in on a page, i.e., the page is triggered by a user logged into that account. The account attributes may include at least one of the consumption level, activity, and grade of the account. The page type may be a home page, a detail page, a list page, and so on. The page characteristics may include a page quality score, page popularity (search and/or click popularity), page update frequency, and so on.
An attribute state may be an attribute value, for example "detail page" for the page type, or a specific consumption amount for an account attribute. When the attribute value is numeric, the attribute states may also correspond to at least two numerical ranges into which the attribute values are divided. For example, consumption above 10,000 yuan corresponds to the state "high consumption", consumption from 1,000 to 10,000 yuan to "medium consumption", and consumption below 1,000 yuan to "low consumption".
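The consumption example maps directly to a small bucketing function. Treating exactly 1,000 and exactly 10,000 yuan as medium consumption is an assumption, since the text does not say which range the boundaries fall into.

```python
def consumption_state(amount_yuan):
    """Bucket an account's consumption (in yuan) into the three example states."""
    if amount_yuan > 10_000:
        return "high consumption"
    if amount_yuan >= 1_000:  # assumption: 1,000 and 10,000 both count as medium
        return "medium consumption"
    return "low consumption"


print([consumption_state(x) for x in (500, 1_000, 10_000, 25_000)])
# ['low consumption', 'medium consumption', 'medium consumption', 'high consumption']
```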
Step 402, for each attribute state of at least two determined attribute states of the page attribute, selecting a candidate page corresponding to the attribute state from the candidate page set.
In this embodiment, for each of the at least one page attribute, the execution subject may select from the candidate page set the candidate pages corresponding to each of at least two determined attribute states of that attribute. A candidate page corresponds to an attribute state when its page attribute takes that state.
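One way to sketch step 402, including the variant that covers every determined state of every page attribute, is stratified sampling: group pages by the state of each attribute and draw up to `per_state` pages from every group. The input layout and parameter names are hypothetical; the seeded `random.Random` only makes runs repeatable.

```python
import random
from collections import defaultdict


def stratified_candidates(pages, attributes, per_state=1, seed=None):
    """Pick up to `per_state` pages for every state of every listed attribute.

    pages: dict mapping page_id -> {attribute_name: attribute_state}.
    Returns the union of the picks, i.e. the plurality of candidate pages.
    """
    rng = random.Random(seed)
    chosen = set()
    for attr in attributes:
        by_state = defaultdict(list)
        for pid, attrs in pages.items():
            if attr in attrs:
                by_state[attrs[attr]].append(pid)
        # Sort states and members so a fixed seed gives a repeatable result.
        for state in sorted(by_state):
            members = sorted(by_state[state])
            chosen.update(rng.sample(members, min(per_state, len(members))))
    return chosen


pages = {
    "p1": {"page_type": "home", "consumption": "high"},
    "p2": {"page_type": "detail", "consumption": "low"},
    "p3": {"page_type": "list", "consumption": "high"},
    "p4": {"page_type": "detail", "consumption": "medium"},
}
print(sorted(stratified_candidates(pages, ["page_type", "consumption"], seed=1)))
```

Every attribute state present in the set contributes at least one page, which is the property the text relies on for recall across page types.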
Step 403, using the selected candidate page as a plurality of candidate pages, and determining display interfaces of the plurality of candidate pages.
In this embodiment, the execution subject may use all the selected candidate pages as the plurality of candidate pages and determine their display interfaces.
Step 404, extracting a skeleton structure diagram of the display interface, and identifying a skeleton region in the skeleton structure diagram.
In this embodiment, the execution subject may extract a skeleton structure diagram of the display interface. The execution subject may also recognize a skeleton region in the skeleton structure diagram. The skeleton structure diagram refers to a structure diagram formed in an area where contents such as characters and pictures in a page are located.
Step 405, clustering the skeleton region to generate at least one cluster set.
In this embodiment, the execution subject may cluster the skeleton regions, and the clustering result is a plurality of cluster sets. Specifically, the execution subject may cluster in various ways, for example with an image clustering algorithm or the k-means clustering algorithm.
Step 406, selecting at least one skeleton region from each cluster set of the at least one cluster set, and taking a candidate page corresponding to the at least one skeleton region as a target page.
In this embodiment, the execution subject may select at least one skeleton region from each of the generated multiple cluster sets, and use a candidate page corresponding to the selected skeleton region as a target page. The target page may be used to perform page tests to verify the quality of the candidate page. Specifically, the candidate page corresponding to the skeleton region may refer to a candidate page to which a display interface where the skeleton region is located belongs.
By selecting candidate pages corresponding to at least two determined attribute states, this embodiment ensures that pages in various attribute states are selected, improving the recall rate for different page types.
In some optional implementations of this embodiment, step 402 may include: for each determined attribute state of each page attribute, selecting a candidate page corresponding to that attribute state from the candidate page set.
In these optional implementations, the execution subject may select, for each page attribute of the candidate pages, a candidate page corresponding to each determined attribute state, ensuring that pages in all attribute states are selected and further improving the recall rate for different page types.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of a device for processing a page, which corresponds to the embodiment of the method shown in fig. 2a, and which may include the same or corresponding features or effects as the embodiment of the method shown in fig. 2a, in addition to the features described below. The device can be applied to various electronic equipment.
As shown in fig. 5, the page processing apparatus 500 of the present embodiment includes: the device comprises an acquisition unit 501, an extraction unit 502, a clustering unit 503 and a selection unit 504. The acquiring unit 501 is configured to acquire a plurality of candidate pages and determine display interfaces of the plurality of candidate pages; an extracting unit 502 configured to extract a skeleton structure diagram of the display interface, and identify a skeleton region in the skeleton structure diagram; a clustering unit 503 configured to cluster the skeleton regions to generate at least one cluster set; the selecting unit 504 is configured to select at least one skeleton region from each of the at least one cluster set, and take a candidate page corresponding to the at least one skeleton region as a target page.
In this embodiment, specific processing of the obtaining unit 501, the extracting unit 502, the clustering unit 503, and the selecting unit 504 of the page processing apparatus 500 and technical effects thereof can refer to related descriptions of step 201, step 202, step 203, and step 204 in the corresponding embodiment of fig. 2a, respectively, and are not described herein again.
In some optional implementations of the embodiment, the obtaining unit is further configured to perform obtaining the plurality of candidate pages as follows: acquiring a candidate page set, and determining the attribute state of the page attribute of a candidate page in the candidate page set; for each attribute state in at least two determined attribute states of the page attributes, selecting a candidate page corresponding to the attribute state from a candidate page set; and taking the selected candidate pages as a plurality of candidate pages.
In some optional implementation manners of this embodiment, the obtaining unit is further configured to execute, for each attribute state of the at least two determined attribute states of the page attribute, selecting, from the candidate page set, a candidate page corresponding to the attribute state as follows: and selecting a candidate page corresponding to each attribute state from the candidate page set for each determined attribute state of each page attribute.
In some optional implementations of the embodiment, the selecting unit is further configured to perform selecting the at least one skeleton region from each of the at least one cluster set as follows: for each cluster set of the at least one cluster set, in response to determining that a first candidate page of the cluster set includes a skeleton region of a second candidate page based on features of the skeleton region, selecting the skeleton region of the first candidate page as a skeleton region in the at least one skeleton region.
In some optional implementations of this embodiment, the selection unit is further configured to select the at least one skeleton region from each of the at least one cluster set as follows: for each cluster set, in response to determining, based on the features of the skeleton regions, that at least two candidate pages in the cluster set contain the same skeleton regions arranged in different orders, selecting the skeleton regions of those candidate pages as part of the at least one skeleton region.
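The reordering rule keeps pages that share the same skeleton regions but arrange them differently, since a different arrangement is itself a distinct layout worth keeping. A minimal sketch, again assuming hypothetical string signatures per region and treating the order of a page's region list as its arrangement order:

```python
from typing import Dict, List, Sequence, Tuple


def select_reordered(cluster: Dict[str, Sequence[str]]) -> List[str]:
    """Return one page per distinct arrangement, for region sets seen in >= 2 arrangements."""
    # Outer key: the regions regardless of order; inner key: the concrete arrangement.
    groups: Dict[Tuple[str, ...], Dict[Tuple[str, ...], str]] = {}
    for pid, sigs in cluster.items():
        orders = groups.setdefault(tuple(sorted(sigs)), {})
        orders.setdefault(tuple(sigs), pid)  # first page seen represents this arrangement
    selected: List[str] = []
    for orders in groups.values():
        if len(orders) >= 2:  # same regions, different arrangement orders
            selected.extend(orders.values())
    return selected


cluster = {
    "p1": ["banner", "grid", "footer"],
    "p2": ["grid", "banner", "footer"],
    "p3": ["banner", "grid", "footer"],  # same arrangement as p1, so not selected again
    "p4": ["banner", "list"],
}
print(select_reordered(cluster))  # ['p1', 'p2']
```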
In some optional implementations of this embodiment, the features of a skeleton region are obtained as follows: extracting the elements of the skeleton region; and determining the element features of those elements as the features of the skeleton region, where the element features include at least one of: element type, element size, number of elements, and element order.
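The feature definition above (element type, element size, number of elements and element order) can be sketched as a small data model. The `Element` fields, the pixel units and the top-to-bottom, then left-to-right reading order used to define "element order" are assumptions for illustration, not details from the source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    """One placeholder block inside a skeleton region (hypothetical model)."""
    kind: str    # element type, e.g. "text", "image", "button"
    x: int       # position in pixels (assumed unit)
    y: int
    width: int   # element size in pixels
    height: int


@dataclass(frozen=True)
class RegionFeature:
    """Hashable region feature, usable as a clustering or comparison key."""
    kinds: Tuple[str, ...]              # element types, in element order
    sizes: Tuple[Tuple[int, int], ...]  # (width, height) per element, same order
    count: int                          # number of elements


def region_feature(elements: List[Element]) -> RegionFeature:
    """Derive a region's feature from its elements' type, size, count and order."""
    # Assumed element order: top to bottom, then left to right.
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    return RegionFeature(
        kinds=tuple(e.kind for e in ordered),
        sizes=tuple((e.width, e.height) for e in ordered),
        count=len(ordered),
    )


feature = region_feature([
    Element("image", x=0, y=100, width=300, height=200),
    Element("text", x=0, y=0, width=300, height=40),
])
print(feature.kinds, feature.count)  # ('text', 'image') 2
```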
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for the page processing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices and other similar computing devices. The components shown here, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic device includes one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers or a multiprocessor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the page processing method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions that cause a computer to perform the page processing method provided by the present application.
As a non-transitory computer-readable storage medium, the memory 602 can store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the page processing method in the embodiments of the present application (for example, the acquisition unit 501, the extraction unit 502, the clustering unit 503 and the selection unit 504 shown in fig. 5). By running the non-transitory software programs, instructions and modules stored in the memory 602, the processor 601 executes the various functional applications and data processing of the server, that is, implements the page processing method of the above method embodiments.
The memory 602 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for page processing, and the like. In addition, the memory 602 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device or other non-transitory solid-state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and such remote memory may be connected to the electronic device for page processing over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The electronic device for the page processing method may further include an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or in other ways; fig. 6 takes connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for page processing; examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick or other input device. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors) and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as a processor including an acquisition unit, an extraction unit, a clustering unit and a selection unit. In some cases, the names of these units do not limit the units themselves; for example, the clustering unit may also be described as "a unit that clusters skeleton regions to generate at least one cluster set".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a plurality of candidate pages and determine the display interfaces of the candidate pages; extract a skeleton structure diagram of each display interface and identify skeleton regions in the skeleton structure diagram; cluster the skeleton regions to generate at least one cluster set; and select at least one skeleton region from each of the at least one cluster set, taking the candidate pages corresponding to the at least one skeleton region as target pages.
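The four program steps above can be tied together in a compact end-to-end sketch. It assumes the display interfaces have already been rendered and reduced to skeleton regions (screenshot capture and skeleton extraction are outside its scope), clusters regions simply by identical element-type signature, and selects regions greedily so that few target pages cover every cluster set. All three choices, and the `process_pages` name, are assumptions for illustration rather than the method as filed.

```python
from typing import Dict, List, Tuple

# (page id, element types of one skeleton region): a hypothetical representation.
Region = Tuple[str, Tuple[str, ...]]


def process_pages(skeletons: Dict[str, List[Tuple[str, ...]]]) -> List[str]:
    """Return target pages whose skeleton regions cover every cluster set."""
    # Steps 1-2 (assumed done upstream): each candidate page's display interface
    # has been reduced to a list of skeleton regions.
    regions: List[Region] = [(pid, kinds) for pid, regs in skeletons.items() for kinds in regs]

    # Step 3: cluster the skeleton regions; identical signatures form one cluster set.
    clusters: Dict[Tuple[str, ...], List[Region]] = {}
    for region in regions:
        clusters.setdefault(region[1], []).append(region)

    # Step 4: select one region per cluster set, preferring a page already chosen,
    # and take the pages owning the selected regions as target pages.
    targets: List[str] = []
    for members in clusters.values():
        chosen = next((r for r in members if r[0] in targets), members[0])
        if chosen[0] not in targets:
            targets.append(chosen[0])
    return targets


skeletons = {
    "home": [("image",), ("text", "text")],
    "list": [("text", "text"), ("button",)],
    "detail": [("image",)],
}
print(process_pages(skeletons))  # ['home', 'list']
```

The "detail" page is dropped because its only skeleton region falls in a cluster set already represented by "home". The greedy preference depends on input order and does not guarantee the minimum number of target pages; finding that minimum is a set-cover problem.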
The above description covers only preferred embodiments of the present application and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention disclosed in the present application is not limited to technical solutions formed by the particular combination of the above features, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (14)

1. A method of processing a page, the method comprising:
obtaining a plurality of candidate pages and determining display interfaces of the candidate pages;
extracting a skeleton structure diagram of the display interface, and identifying a skeleton region in the skeleton structure diagram;
clustering the skeleton regions to generate at least one cluster set; and
selecting at least one skeleton region from each cluster set of the at least one cluster set, and taking the candidate pages corresponding to the at least one skeleton region as target pages.
2. The method of claim 1, wherein the obtaining a plurality of candidate pages comprises:
acquiring a candidate page set, and determining the attribute state of the page attribute of a candidate page in the candidate page set;
for each of at least two determined attribute states of the page attribute, selecting a candidate page corresponding to that attribute state from the candidate page set; and
taking the selected candidate pages as the plurality of candidate pages.
3. The method according to claim 2, wherein said selecting, for each of the at least two determined attribute states of the page attribute, a candidate page from the candidate page set corresponding to that attribute state comprises:
for each page attribute, selecting from the candidate page set a candidate page corresponding to each determined attribute state of that page attribute.
4. The method of claim 1, wherein said selecting at least one skeleton region from each of said at least one cluster set comprises:
for each cluster set of the at least one cluster set, in response to determining, based on characteristics of the skeleton regions, that a first candidate page of the cluster set includes the skeleton region of a second candidate page, selecting the skeleton region of the first candidate page as a skeleton region in the at least one skeleton region.
5. The method of claim 1, wherein said selecting at least one skeleton region from each of said at least one cluster set comprises:
for each cluster set of the at least one cluster set, in response to determining, based on characteristics of the skeleton regions, that at least two candidate pages in the cluster set contain the same skeleton regions arranged in different orders, selecting the skeleton regions of the at least two candidate pages as skeleton regions in the at least one skeleton region.
6. The method of claim 4 or 5, wherein the characteristics of the skeleton region are obtained by:
for a skeleton region, extracting elements of the skeleton region;
determining element characteristics of the elements in the skeleton region as the characteristics of the skeleton region, wherein the element characteristics comprise at least one of the following: element type, element size, number of elements, and element order.
7. An apparatus for processing a page, the apparatus comprising:
an acquisition unit configured to acquire a plurality of candidate pages and determine display interfaces of the candidate pages;
an extraction unit configured to extract a skeleton structure diagram of the display interfaces and identify skeleton regions in the skeleton structure diagram;
a clustering unit configured to cluster the skeleton regions to generate at least one cluster set; and
a selection unit configured to select at least one skeleton region from each cluster set of the at least one cluster set, and take the candidate pages corresponding to the at least one skeleton region as target pages.
8. The apparatus of claim 7, wherein the acquisition unit is further configured to acquire the plurality of candidate pages as follows:
acquiring a candidate page set, and determining the attribute state of the page attribute of a candidate page in the candidate page set;
for each of at least two determined attribute states of the page attribute, selecting a candidate page corresponding to that attribute state from the candidate page set; and
taking the selected candidate pages as the plurality of candidate pages.
9. The apparatus according to claim 8, wherein the acquisition unit is further configured to perform the selecting, for each of the at least two determined attribute states of the page attribute, of a candidate page corresponding to that attribute state from the candidate page set as follows:
for each page attribute, selecting from the candidate page set a candidate page corresponding to each determined attribute state of that page attribute.
10. The apparatus of claim 7, wherein the selection unit is further configured to select at least one skeleton region from each of the at least one cluster set as follows:
for each cluster set of the at least one cluster set, in response to determining, based on characteristics of the skeleton regions, that a first candidate page of the cluster set includes the skeleton region of a second candidate page, selecting the skeleton region of the first candidate page as a skeleton region in the at least one skeleton region.
11. The apparatus of claim 7, wherein the selection unit is further configured to select at least one skeleton region from each of the at least one cluster set as follows:
for each cluster set of the at least one cluster set, in response to determining, based on characteristics of the skeleton regions, that at least two candidate pages in the cluster set contain the same skeleton regions arranged in different orders, selecting the skeleton regions of the at least two candidate pages as skeleton regions in the at least one skeleton region.
12. The apparatus of claim 10 or 11, wherein the characteristics of the skeleton region are obtained by:
for a skeleton region, extracting elements of the skeleton region;
determining element characteristics of the elements in the skeleton region as the characteristics of the skeleton region, wherein the element characteristics comprise at least one of the following: element type, element size, number of elements, and element order.
13. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs which,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN202010462451.2A 2020-05-27 2020-05-27 Page processing method and device, electronic equipment and storage medium Active CN111611503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010462451.2A CN111611503B (en) 2020-05-27 2020-05-27 Page processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010462451.2A CN111611503B (en) 2020-05-27 2020-05-27 Page processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111611503A true CN111611503A (en) 2020-09-01
CN111611503B CN111611503B (en) 2023-07-14

Family

ID=72200738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010462451.2A Active CN111611503B (en) 2020-05-27 2020-05-27 Page processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111611503B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080010292A1 (en) * 2006-07-05 2008-01-10 Krishna Leela Poola Techniques for clustering structurally similar webpages based on page features
WO2012094718A1 (en) * 2011-01-14 2012-07-19 Andre Douen Systems, methods and articles for managing presentation of information
US20120330952A1 (en) * 2011-06-23 2012-12-27 Microsoft Corporation Scalable metadata extraction for video search
US20160275081A1 (en) * 2013-03-20 2016-09-22 Nokia Technologies Oy Method and apparatus for personalized resource recommendations
CN106708952A (en) * 2016-11-25 2017-05-24 北京神州绿盟信息安全科技股份有限公司 Web page clustering method and device
US20180101423A1 (en) * 2016-10-11 2018-04-12 Oracle International Corporation Cluster-based processing of unstructured log messages
US20180210864A1 (en) * 2017-01-25 2018-07-26 International Business Machines Corporation Web page design snapshot generator
WO2018149250A1 (en) * 2017-02-15 2018-08-23 宗刚 Chinese character skeleton code input method and system having suggestion screen interface
CN109902248A (en) * 2019-02-25 2019-06-18 百度在线网络技术(北京)有限公司 Page display method, device, computer equipment and readable storage medium storing program for executing
CN110058838A (en) * 2019-04-28 2019-07-26 腾讯科技(深圳)有限公司 Sound control method, device, computer readable storage medium and computer equipment
US20190266780A1 (en) * 2018-02-23 2019-08-29 Canon Kabushiki Kaisha 3d skeleton reconstruction from images using volumic probability data
CN110187878A (en) * 2019-05-29 2019-08-30 北京三快在线科技有限公司 A kind of page generation method and device
CN111026946A (en) * 2019-12-12 2020-04-17 杭州昕华信息科技有限公司 Page information extraction method, device, medium and equipment

Non-Patent Citations (1)

Title
Zhao Juanjuan et al.: "Clustering algorithm based on Web page structure and dominant color tone", Computer Engineering (《计算机工程》), no. 03 *

Also Published As

Publication number Publication date
CN111611503B (en) 2023-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant