CN113722631A - Page synthesis method and device - Google Patents

Info

Publication number
CN113722631A
Authority
CN
China
Prior art keywords
page
same
user
synthesis
accessed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010430928.9A
Other languages
Chinese (zh)
Other versions
CN113722631B (en)
Inventor
郑辉
唐蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Hebei Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Hebei Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Hebei Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202010430928.9A priority Critical patent/CN113722631B/en
Publication of CN113722631A publication Critical patent/CN113722631A/en
Application granted granted Critical
Publication of CN113722631B publication Critical patent/CN113722631B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The invention discloses a page synthesis method and device. The method comprises: acquiring a URI (Uniform Resource Identifier) request message generated for a page element when a web page is accessed; generating, from the URI request message, a page synthesis data group containing preset page synthesis fields; performing page element clustering on the page synthesis data groups using a preset page synthesis clustering algorithm to determine the page elements that belong to the same page accessed by the same user at the same time; and performing page synthesis on those page elements to obtain a synthesized page. With this scheme, page synthesis is performed on element-level user internet access logs to obtain page-level user internet access logs, which provides a basis for later deriving page-level quality indicators of user internet access and supports a more realistic and effective evaluation of the user's actual experience of browsing web pages.

Description

Page synthesis method and device
Technical Field
The invention relates to the technical field of internet, in particular to a page synthesis method and device.
Background
In recent years, as technology has advanced, traffic on the live network has grown continuously. Among all service types, web browsing has consistently accounted for the largest share of traffic, and with today's fast pace of life, users place ever higher demands on the network quality of web browsing services. At present, there are two main methods of measuring the network quality of browsing services: one simulates user internet access behavior by means of dial testing, and the other obtains user access records by means of DPI (deep packet inspection) probes.
Measuring browsing service quality by simulating user behavior with dial testing cannot fully reflect the real quality of users' internet access. Measuring it from user access records collected by DPI probes yields quality indicators aggregated at the element URI, Host, and ICP levels, which still differ from quality indicators at the page level.
Disclosure of Invention
In view of the above, embodiments of the present invention are proposed to provide a page composition method and apparatus that overcome the above problems or at least partially solve the above problems.
According to an aspect of an embodiment of the present invention, there is provided a page composition method, including:
acquiring a URI (Uniform resource identifier) request message generated for a page element when a page of a webpage is accessed;
generating a page synthesis data group containing a preset page synthesis field according to the URI request message;
performing page element clustering processing on the page synthesis data group by using a preset page synthesis clustering algorithm, and determining page elements belonging to the same page accessed by the same user at the same time;
and page synthesis is carried out on page elements belonging to the same page accessed by the same user at the same time, so as to obtain a synthesized page.
According to another aspect of the embodiments of the present invention, there is provided a page composition apparatus including:
the acquisition module is suitable for acquiring a URI (Uniform resource identifier) request message generated by a page element when a webpage is accessed;
the generating module is suitable for generating a page synthesis data group containing a preset page synthesis field according to the URI request message;
the clustering module is suitable for performing page element clustering processing on the page synthesis data group by using a preset page synthesis clustering algorithm and determining page elements belonging to the same page accessed by the same user at the same time;
and the page synthesis module is suitable for performing page synthesis on page elements belonging to the same page accessed by the same user at the same time to obtain a synthesized page.
According to still another aspect of an embodiment of the present invention, there is provided a computing device including: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the page synthesis method.
According to another aspect of the embodiments of the present invention, there is provided a computer storage medium, in which at least one executable instruction is stored, and the executable instruction causes a processor to perform operations corresponding to the page composition method.
According to the scheme provided by the embodiment of the invention, a URI (Uniform resource identifier) request message generated for a page element when a page is accessed is obtained; generating a page synthesis data group containing a preset page synthesis field according to the URI request message; performing page element clustering processing on the page synthesis data group by using a preset page synthesis clustering algorithm, and determining page elements belonging to the same page accessed by the same user at the same time; and page synthesis is carried out on page elements belonging to the same page accessed by the same user at the same time, so as to obtain a synthesized page. Based on the scheme provided by the invention, the page synthesis is carried out based on the user internet log of the page element level to obtain the user internet log of the page level, so that a basis is provided for obtaining the user internet quality index of the page level of the page at the later stage, and the analysis of more truly and effectively evaluating the real feeling of the user for browsing the page can be supported.
The foregoing is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the embodiments are more readily apparent, specific embodiments of the present invention are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the embodiments of the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a page composition method according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a page composition method according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a page composition apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computing device provided in an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 shows a flowchart of a page composition method provided by an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
Step S101, a URI request message generated for a page element when a web page is accessed is acquired.
In most cases, a web page is composed of multiple different page elements. When any user accesses any web page, access requests for these page elements are triggered, and a URI request message is generated for the access request for each page element. A page element may be PNG, JPG, Video, HTML, JS, CSS, or the like. The URI request message may contain many items of information, such as a service tag, stream start time, user IP address, user terminal information, the URI of the resource accessed by the user, the domain name of the resource accessed by the user, the source address of the resource (refer_uri), uplink and downlink traffic, and delay.
Step S102, generating a page synthesis data group containing a preset page synthesis field according to the URI request message.
The URI request message obtained in step S101 contains many items of information. Some of them do little to help determine whether page elements belong to the same page accessed by the same user at the same time, and may even distort that determination or consume additional computing resources. Therefore, once the URI request message is obtained, a page synthesis data group containing the preset page synthesis fields is generated from it, where the preset page synthesis fields are: user terminal information, user IP address, source address, and stream start time. The page synthesis data group holds only the information extracted from the URI request message for these preset fields, so it contains considerably less information than the URI request message itself.
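The field extraction in step S102 can be illustrated with a minimal Python sketch. It assumes that each URI request message has already been parsed into a dictionary; the key names (`user_ip`, `user_agent`, `refer_uri`, `start_time`), the sample message, and the choice of seconds since the epoch for the stream start time are illustrative assumptions, not part of the described method.

```python
from typing import Mapping, NamedTuple


class PageSynthesisRecord(NamedTuple):
    """The four preset page synthesis fields kept from one URI request message."""
    user_ip: str        # r(h), user IP address
    user_agent: str     # r(k), user terminal information
    refer_uri: str      # r(u), source address
    start_time: float   # r(t), stream start time (assumed: seconds since the epoch)


def to_page_synthesis_record(message: Mapping[str, object]) -> PageSynthesisRecord:
    """Project a full URI request message onto the preset fields only."""
    return PageSynthesisRecord(
        user_ip=str(message["user_ip"]),
        user_agent=str(message["user_agent"]),
        refer_uri=str(message.get("refer_uri") or ""),
        start_time=float(message["start_time"]),
    )


# Hypothetical message; every field outside the preset four is dropped.
message = {
    "service_tag": "browse",
    "start_time": 1554190322.0,
    "user_ip": "211.143.53.158",
    "user_agent": "Chrome/63.0.3239.132",
    "uri": "http://example.com/static/app.js",
    "host": "example.com",
    "refer_uri": "http://example.com/index.html",
    "delay_ms": 35,
}
record = to_page_synthesis_record(message)
```

Keeping the data group as a fixed four-field tuple makes the later clustering steps independent of whatever other fields the capture tool happens to record.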
Step S103, performing page element clustering on the page synthesis data groups using a preset page synthesis clustering algorithm, and determining the page elements belonging to the same page accessed by the same user at the same time.
The page synthesis data groups generated in step S102 cover page elements of multiple pages accessed by multiple users, whereas page synthesis must combine only the page elements of the same page accessed by the same user at the same time. The page elements of these pages therefore need to be partitioned, which is done by page element clustering.
Step S104, page synthesis is performed on the page elements belonging to the same page accessed by the same user at the same time, so as to obtain a synthesized page.
After the page elements belonging to the same page accessed by the same user at the same time have been determined in step S103, page synthesis can be performed on them to obtain a synthesized page.
According to the method provided by the embodiment of the invention, the URI request message generated for the page element when the page is accessed is obtained; generating a page synthesis data group containing a preset page synthesis field according to the URI request message; performing page element clustering processing on the page synthesis data group by using a preset page synthesis clustering algorithm, and determining page elements belonging to the same page accessed by the same user at the same time; and page synthesis is carried out on page elements belonging to the same page accessed by the same user at the same time, so as to obtain a synthesized page. Based on the scheme provided by the invention, the page synthesis is carried out based on the user internet log of the page element level to obtain the user internet log of the page level, so that a basis is provided for obtaining the user internet quality index of the page level of the page at the later stage, and the analysis of more truly and effectively evaluating the real feeling of the user for browsing the page can be supported.
Fig. 2 is a flowchart illustrating a page composition method according to another embodiment of the present invention. As shown in fig. 2, the method comprises the steps of:
Step S201, a URI request message generated for a page element when a web page is accessed is acquired.
In most cases, a web page is composed of multiple different page elements. When a web page is accessed, GET requests for multiple page elements are issued in addition to the GET request for the main page. All URI request messages corresponding to these GET requests can be captured by a packet capture tool (e.g., a DPI probe), including the URI request messages for obtaining page elements (such as PNG, JPG, Video, HTML, JS, CSS, etc.). The resources of the page elements are generally not hosted under a single domain name but are distributed across different domain names. However, from the interaction principle of web browsing and the header information of the URI request messages, it can be seen that, except for the GET request for the main page HTML, the different GET requests share the same source address refer_uri, namely the address of the main page that initiated them. The refer_uri field can therefore be used to locate which page elements belong to the same page, and page synthesis can be performed on that basis.
The URI request message may contain many items of information, such as a service tag, stream start time, user IP address, user terminal information, the URI of the resource accessed by the user, the domain name of the resource accessed by the user, the source address of the resource (refer_uri), uplink and downlink traffic, and delay.
For example, when the user enters the web address https://m.baidu.com/? in a browser, the browser sends a GET request for the HTML of the main page and also GET requests for all elements on the page. If a packet capture tool (e.g., a DPI probe) is running at the same time, the URI request messages generated for the page elements are obtained, such as the URI request message corresponding to a JS file request, the URI request message corresponding to a GIF file request, and so on. Table 1 shows some of the fields of the obtained URI request messages.
Table 1:
[Table 1 appears as images in the original publication (Figure BDA0002500560080000051, Figure BDA0002500560080000061) and is not reproduced here.]
Optionally, after the URI request messages are obtained, they may be transmitted to a collection server, which synthesizes XDR tickets conforming to the operator's unified specification. When a user accesses a page, the browser sends GET requests for multiple page elements, generating multiple TCP streams. The XDR synthesis rule is to record the TCP streams one GET at a time; that is, the GET request for each page element has one record in the XDR ticket. The collection server transmits the generated element-level XDR tickets to a Hadoop big data processing platform, where they are subsequently processed.
Step S202, according to the URI request message, generating a page synthesis data group containing a preset page synthesis field.
In most cases, the page elements used for page synthesis need to be page elements of the same page accessed by the same user at the same time. The URI request message obtained in step S201 contains a lot of information, and some of it does not help to determine whether page elements belong to the same page accessed by the same user at the same time, and may even distort that determination or consume additional computing resources. Here, the preset page synthesis fields include: user terminal information (usergent), user IP address (userip), source address (refer_uri), and stream start time. Therefore, once the URI request message is obtained, a page synthesis data group containing the preset page synthesis fields can be generated from it. The page synthesis data group holds only the information extracted from the URI request message for these preset fields, so it contains considerably less information than the URI request message itself.
Specifically, let r(h) denote the user IP address field, r(k) the user terminal information field, r(u) the source address field, and r(t) the stream start time field. A page synthesis data group can then be represented as $\langle r_x(h), r_x(k), r_x(u), r_x(t) \rangle$, and the set of data groups as $S = \{\langle r_1(h), r_1(k), r_1(u), r_1(t) \rangle, \ldots, \langle r_x(h), r_x(k), r_x(u), r_x(t) \rangle, \ldots\}$, where the size of S is x.
By definition, h denotes the user IP address field in a training sample, k the user terminal information field, u the source address refer_uri field, and t the stream start time field. Each sample record is $s = \langle r_x(h), r_x(k), r_x(u), r_x(t) \rangle$ ($s \in S$), composed of the user IP address, the user terminal information, the source address refer_uri, and the stream start time acquired by the packet capture tool, which are defined as r(h), r(k), r(u), and r(t), respectively. The universe of all user IP addresses is denoted $H = \{h_1, h_2, \ldots, h_l\}$, the universe of all user terminal information is denoted $K = \{k_1, k_2, \ldots, k_m\}$, the universe of all source addresses refer_uri is denoted $U = \{u_1, u_2, \ldots, u_n\}$, and the universe of all stream start times is denoted $T = \{t_1, t_2, \ldots, t_u\}$.
For the example in step S201, the universe of user IP addresses is H = {211.143.53.158, 211.143.53.159}. Only one user terminal model appears in the sample records, so K = {Chrome/63.0.3239.132}. The universe of all resource refer_uri values is U = {https://www.iqiyi.com/v_19rsbcpyxs., http://www.iqiyi.com/edu/zsff.html?, http://sports.iqiyi.com/sports/channel.html?}, and the universe of all stream start times is T = {2019/4/215: 32:02, 2019/4/215: 32:03, 2019/4/215: 32:04, 2019/4/215: 32:05}.
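As a small illustration of these definitions, the universes H, K, U, and T are simply the sets of distinct values seen in each field. The sketch below assumes each sample record is a 4-tuple in the order (user IP, user terminal information, refer_uri, stream start time); the URLs and the numeric start times are hypothetical placeholders rather than the values of the example above.

```python
def field_universes(records):
    """Return (H, K, U, T): the sets of distinct values of each field."""
    H = {r[0] for r in records}  # user IP addresses
    K = {r[1] for r in records}  # user terminal information
    U = {r[2] for r in records}  # source addresses (refer_uri)
    T = {r[3] for r in records}  # stream start times
    return H, K, U, T


# Hypothetical sample records: (user IP, terminal info, refer_uri, start time in s).
records = [
    ("211.143.53.158", "Chrome/63.0.3239.132", "http://example.com/a.html", 2.0),
    ("211.143.53.159", "Chrome/63.0.3239.132", "http://example.com/a.html", 3.0),
    ("211.143.53.158", "Chrome/63.0.3239.132", "http://example.com/b.html", 4.0),
]
H, K, U, T = field_universes(records)
```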
Step S203, validity filtering is performed on the page synthesis data groups according to the source address to obtain valid page synthesis data groups.
According to the interaction principle of the web browsing service, the source address refer_uri field r(u) is the most critical field for page synthesis, so the value of the element $r_x(u)$ must be valid. Specifically, the page synthesis data groups are filtered for validity according to the source address: data groups in which $r_x(u)$ is null or abnormal are filtered out, leaving the valid page synthesis data groups $\langle r_x(h), r_x(k), r_x(u), r_x(t) \rangle$. Table 2 shows the records of the valid page synthesis data groups after validity filtering.
Table 2:
[Table 2 appears as images in the original publication (Figure BDA0002500560080000071, Figure BDA0002500560080000081) and is not reproduced here.]
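The validity filtering of step S203 might be sketched as follows. The text does not define what counts as an "abnormal" refer_uri, so this sketch assumes, purely for illustration, that a value is valid when it is non-empty and parses as an http or https URL with a host. Records are assumed to be 4-tuples with refer_uri as the third field, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def has_valid_refer_uri(refer_uri):
    """Return True when refer_uri is non-empty and parses as an http(s) URL with a host.

    This definition of validity is an assumption made for this sketch.
    """
    if not refer_uri or not refer_uri.strip():
        return False
    try:
        parts = urlsplit(refer_uri.strip())
    except ValueError:  # raised for some malformed URLs, e.g. a broken IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def filter_valid(records):
    """Keep only the records whose refer_uri (third field) passes the check."""
    return [r for r in records if has_valid_refer_uri(r[2])]
```

A stricter or looser rule can be substituted in `has_valid_refer_uri` without touching the rest of the pipeline.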
Step S204, calculating the Euclidean distance between any two page synthesis data groups according to the user terminal information, the user IP address, and the source address, and if the Euclidean distance is equal to a first preset threshold, dividing the corresponding page synthesis data groups into the same class.
When the same user accesses the same page, the user IP address, the user terminal information, and the source address refer_uri are identical, so the Euclidean distance $d_{xy}$ between any two page elements of the same page should be 0, whereas the Euclidean distance between two sample records that differ in any of the user IP address, the user terminal information, or the source address refer_uri is not 0. The first clustering pass can therefore be zero-distance clustering on the user IP address, the user terminal information, and the source address refer_uri (that is, on the triple $\langle r(h), r(k), r(u) \rangle$). Specifically, the Euclidean distance between any two page synthesis data groups is calculated according to the user terminal information, the user IP address, and the source address, and compared with the first preset threshold (which is 0). If the Euclidean distance is equal to the first preset threshold, the corresponding page synthesis data groups are divided into the same class; if it is not, the corresponding page synthesis data groups are determined not to belong to the same user and/or the same page.
The Euclidean distance between two points $a(x_1, y_1)$ and $b(x_2, y_2)$ in two-dimensional space is calculated as:
$$d_{ab} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
The Euclidean distance between two points $a(x_1, y_1, z_1)$ and $b(x_2, y_2, z_2)$ in three-dimensional space is calculated as:
$$d_{ab} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$
The Euclidean distance between two points $a(x_{11}, x_{12}, \ldots, x_{1n})$ and $b(x_{21}, x_{22}, \ldots, x_{2n})$ in n-dimensional space is calculated as:
$$d_{ab} = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$
In this embodiment, the following formula (1) can be used to calculate the Euclidean distance between any two page synthesis data groups $\langle r_x(h), r_x(k), r_x(u) \rangle$ and $\langle r_y(h), r_y(k), r_y(u) \rangle$:
$$d_{xy} = \sqrt{(r_x(h) - r_y(h))^2 + (r_x(k) - r_y(k))^2 + (r_x(u) - r_y(u))^2} \qquad (1)$$
To implement the division of the page synthesis data groups, two temporary fields, "page ID" and "page element ID", can be added to indicate the page to which a page element belongs and to label the page element itself. Table 3 shows the records of the valid page synthesis data groups after zero-distance clustering.
Table 3:
[Table 3 appears as images in the original publication (Figure BDA0002500560080000093, Figure BDA0002500560080000101) and is not reproduced here.]
As shown in the table above, the page synthesis data groups with sequence numbers 1, 2, and 3 belong to the same page, whose page ID is 1.
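The zero-distance clustering of step S204 can be sketched in Python as follows. The three fields are strings, and the text does not say how their differences are turned into numbers, so this sketch assumes each field difference is 0 when the two values are equal and 1 otherwise. Under that assumption the distance is 0 exactly when the triples are identical, which lets the clustering be done by grouping on the triple instead of comparing every pair. Records are assumed to be 4-tuples (user IP, user terminal information, refer_uri, stream start time), and the function names are hypothetical.

```python
import math


def zero_distance(a, b):
    """Formula (1), with each field difference taken as 0 (equal) or 1 (different).

    The 0/1 encoding is an assumption of this sketch.
    """
    return math.sqrt(sum((x != y) ** 2 for x, y in zip(a[:3], b[:3])))


def cluster_zero_distance(records):
    """Pair every record with a temporary page ID; records at distance 0 share an ID."""
    page_ids = {}
    labelled = []
    for record in records:
        key = record[:3]  # (user IP, terminal info, refer_uri)
        page_id = page_ids.setdefault(key, len(page_ids) + 1)
        labelled.append((page_id, record))
    return labelled
```

Grouping by key takes a single pass over the records, whereas computing the distance for every pair grows quadratically with the number of records.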
Step S205, for the page synthesis data groups divided into the same class, sorting them by stream start time; calculating, according to the user terminal information, the user IP address, the source address, and the stream start time, the Euclidean distance between the page synthesis data group with the minimum stream start time and the page synthesis data group with the maximum stream start time; and, if the Euclidean distance is less than or equal to a second preset threshold, determining the page elements corresponding to these page synthesis data groups to be page elements belonging to the same page accessed by the same user at the same time.
In practical applications, a user may well access content of interest two or more times, so the access requests for page elements from different visits need to be distinguished in time. Specifically, after the zero-distance clustering in step S204 has divided the page synthesis data groups corresponding to the same page of the same user into one class, the page synthesis data groups in each class are sorted by stream start time, in either ascending or descending order. Ascending order is used as the example here. For convenience of the following description, the page synthesis data groups sorted in ascending order are numbered; for example, as shown in Table 3, page 1 has three page element IDs: 001, 002, and 003.
The GET requests for the elements of one page view are close together in time. The second preset threshold is a time threshold, usually set to 20 s; those skilled in the art may set other values as practice requires, for example 25 s, but the value should not be set too large.
In this step, after sorting, the page synthesis data group with the minimum stream start time and the page synthesis data group with the maximum stream start time are selected from the sorted page synthesis data groups, and the Euclidean distance between them is calculated with formula (2) according to the user terminal information, the user IP address, the source address, and the stream start time. For example, the Euclidean distance is calculated between the page synthesis data group with page element ID 001 and the page synthesis data group with page element ID 003; if it is less than or equal to the second preset threshold, the page elements with page element IDs 001 through 003 are all determined to be page elements belonging to the same page accessed by the same user at the same time. Because only the distance between the data groups with the minimum and maximum stream start times is calculated after sorting, the amount of computation is reduced and page synthesis becomes more efficient.
$$d_{xy} = \sqrt{(r_x(h) - r_y(h))^2 + (r_x(k) - r_y(k))^2 + (r_x(u) - r_y(u))^2 + (r_x(t) - r_y(t))^2} \qquad (2)$$
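It may help to see why the second preset threshold can be stated in seconds. The short derivation below assumes that formula (2) is the four-field Euclidean distance and that stream start times are expressed in seconds. Within a class produced by zero-distance clustering, the user IP address, user terminal information, and source address are identical, so their differences vanish and the distance reduces to the absolute difference of the stream start times:

```latex
d_{xy} = \sqrt{(r_x(h) - r_y(h))^2 + (r_x(k) - r_y(k))^2 + (r_x(u) - r_y(u))^2 + (r_x(t) - r_y(t))^2}
       = \sqrt{0 + 0 + 0 + (r_x(t) - r_y(t))^2}
       = \lvert r_x(t) - r_y(t) \rvert
```

Under these assumptions, comparing the distance with a 20 s threshold is the same as checking that two requests started no more than 20 s apart.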
Of course, the Euclidean distance between the page synthesis data group with the minimum stream start time (referred to below as the first page synthesis data group) and the page synthesis data group with the maximum stream start time may be greater than the second preset threshold. This indicates that the page element corresponding to the maximum stream start time is not the last page element of the same visit, and the search for that last page element must continue. In this case, the page synthesis data groups not yet selected are taken in reverse order; for each one, the Euclidean distance to the first page synthesis data group is calculated and compared with the second preset threshold. If the distance is less than or equal to the second preset threshold, the page elements from the first page synthesis data group through the selected page synthesis data group (including both) are determined to be page elements belonging to the same page accessed by the same user at the same time. If the distance is greater than the second preset threshold, the page element corresponding to the selected page synthesis data group is not the last page element of the same visit, and the search continues in the same manner, which is not described again here.
After some page elements have been determined to be page elements belonging to the same page accessed by the same user at the same time, other page elements may remain unclustered. Clustering then continues as described above, with the first page synthesis data group now being the one with the smallest stream start time among the remaining page synthesis data groups. If the Euclidean distance between two page elements exceeds 20 s, they are split into page elements of two different visits, which greatly improves the accuracy of page synthesis.
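The time-based splitting of step S205 can be sketched as follows for a single class produced by zero-distance clustering. The sketch assumes that, within such a class, the distance between two records equals the absolute difference of their stream start times in seconds, so it works directly on a list of start times. The function name and the sample values are hypothetical.

```python
def split_by_time(start_times, threshold=20.0):
    """Split one zero-distance class into visits.

    start_times: stream start times (in seconds) of the records in one class.
    Returns a list of visits, each a list of start times in ascending order.
    """
    remaining = sorted(start_times)
    visits = []
    while remaining:
        first = remaining[0]  # record with the minimum stream start time
        # Probe from the maximum start time backwards until a record lies
        # within the threshold of the first record; index 0 always qualifies.
        last = len(remaining) - 1
        while remaining[last] - first > threshold:
            last -= 1
        visits.append(remaining[: last + 1])
        remaining = remaining[last + 1:]  # continue with the unclustered rest
    return visits


visits = split_by_time([0, 5, 12, 100, 110, 300])
```

With the default 20 s threshold, the sample above is split into three visits: [0, 5, 12], [100, 110], and [300]. When the first probe already succeeds, a single comparison settles the whole class, which is the saving in computation that sorting is meant to provide.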
After the above processing, the page elements with serial numbers 1, 2, and 3 in Table 3 are attributed to the same page accessed by the same user at the same time, with temporary number 1; the page elements with serial numbers 4, 7, and 10 are attributed to the same page accessed by the same user at the same time, with temporary number 2; the page elements with serial numbers 5, 6, 8, 9, and 11 are attributed to the same page accessed by the same user at the same time, with temporary number 3; and the page elements with serial numbers 12 and 13 are attributed to the same page accessed by the same user at the same time, with temporary number 4.
Through the clustering described above, page elements are obtained whose user IP address, user terminal information, and source address refer_uri are identical and whose stream start times span a time interval Δt of no more than 20 s; together they constitute the record of one visit by one user at one time. Page synthesis based on this clustering algorithm avoids mixing up accesses by multiple users to the same page and greatly improves the accuracy of page synthesis.
Step S206, page composition is carried out on page elements belonging to the same page accessed by the same user at the same time, and a composite page is obtained.
After determining page elements belonging to the same page accessed by the same user at the same time according to step S205, page composition may be performed on the page elements to obtain a composite page. The synthesized page can be used for outputting page-level quality index data, so that the real feeling of a user when the user browses the webpage is truly and effectively evaluated.
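To illustrate how a synthesized page might feed page-level quality indicators, the following sketch aggregates the element-level records of one synthesized page. The text does not specify which indicators are computed or how. The keys (`start_time`, `delay`, `uplink_bytes`, `downlink_bytes`), the units, and the definition of page time as the span from the earliest start to the latest finish are all assumptions made for this example.

```python
def page_level_metrics(elements):
    """Aggregate the element-level records of one synthesized page.

    Each element is a dict with hypothetical keys:
    start_time (s), delay (s), uplink_bytes, downlink_bytes.
    """
    if not elements:
        raise ValueError("a synthesized page needs at least one element")
    first_start = min(e["start_time"] for e in elements)
    last_end = max(e["start_time"] + e["delay"] for e in elements)
    return {
        "element_count": len(elements),
        # Assumed definition: time from the first request to the last response.
        "page_time": last_end - first_start,
        "uplink_bytes": sum(e["uplink_bytes"] for e in elements),
        "downlink_bytes": sum(e["downlink_bytes"] for e in elements),
    }
```

Because the indicators are computed over the whole page, a single slow element affects the page time only if it is also the last one to finish.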
With the method provided by this embodiment of the invention, page synthesis is performed on element-level user internet logs to obtain page-level user internet logs. This provides a basis for later deriving page-level internet quality indices and reduces the influence of individual poor page elements on the evaluation of the whole page. Judging the quality of a user's browsing service by page-level quality indices comes closer to what the user actually perceives than the existing method does, so the user's real experience when browsing pages can be evaluated more effectively.
Fig. 3 is a schematic structural diagram illustrating a page composition apparatus according to an embodiment of the present invention. As shown in Fig. 3, the apparatus includes an acquisition module 301, a generation module 302, a clustering module 303 and a page synthesis module 304.
The acquisition module 301 is adapted to acquire a URI request message generated for a page element when a web page is accessed;
a generating module 302, adapted to generate a page composition data set including a preset page composition field according to the URI request packet;
the clustering module 303 is adapted to perform page element clustering processing on the page synthesis data group by using a preset page synthesis clustering algorithm, and determine page elements belonging to the same page accessed by the same user at the same time;
the page composition module 304 is adapted to perform page composition on page elements belonging to the same page accessed by the same user at the same time to obtain a composite page.
Optionally, the preset page composition field includes: user terminal information, user IP address, source address, stream start time.
Optionally, the clustering module is further adapted to: calculate the Euclidean distance between any two page composition data groups according to the user terminal information, the user IP address and the source address, and, if the Euclidean distance is equal to a first preset threshold value, classify the corresponding page composition data groups into the same class;
and, for the page composition data groups classified into the same class, calculate the Euclidean distance between the page composition data group with the minimum stream start time and the page composition data group with the maximum stream start time according to the user terminal information, the user IP address, the source address and the stream start time, and, if the Euclidean distance is less than or equal to a second preset threshold value, determine the page elements corresponding to those page composition data groups to be page elements belonging to the same page accessed by the same user at the same time.
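The two distance computations can be sketched as follows. The text does not state how the non-numeric fields are turned into coordinates, so the encoding here is an assumption: each identifying field contributes 0 when the two values match and 1 when they differ, which makes a first preset threshold of 0 equivalent to "all three fields are identical". The function names and dictionary keys are hypothetical.

```python
import math

IDENTITY_FIELDS = ("terminal", "user_ip", "refer_uri")


def identity_distance(a, b):
    """Euclidean distance over terminal info, IP address and source address.

    Assumed encoding: a field contributes 0.0 if equal and 1.0 if different,
    so the distance is 0.0 exactly when all three fields match.
    """
    return math.sqrt(sum(
        (0.0 if a[f] == b[f] else 1.0) ** 2 for f in IDENTITY_FIELDS
    ))


def visit_distance(a, b):
    """Euclidean distance that also includes the stream start time (seconds)."""
    dt = a["start"] - b["start"]
    return math.sqrt(identity_distance(a, b) ** 2 + dt ** 2)
```

Under this encoding, two data groups of the same class have an identity distance of 0, so their full distance reduces to the absolute difference of their stream start times. That is consistent with the description comparing a Euclidean distance against a threshold expressed in seconds (20 s).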
Optionally, the clustering module is further adapted to: for the page composition data groups classified into the same class, sort the page composition data groups of that class by stream start time.
Optionally, the apparatus further comprises: a filtering module, adapted to perform validity filtering on the page composition data groups according to the source address to obtain valid page composition data groups.
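A possible form of the validity filtering is sketched below. This section does not define what makes a source address valid, so the rule used here is an assumption: a data group is kept only if its source address is a non-empty HTTP or HTTPS URI with a host part. The function names and the `refer_uri` key are hypothetical.

```python
from urllib.parse import urlsplit


def is_valid_source(refer_uri):
    """Return True if the source address looks like an HTTP(S) URI with a host.

    Assumed rule; the validity criterion is not defined in this section.
    """
    if not refer_uri:
        return False
    try:
        parts = urlsplit(refer_uri.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def filter_valid(groups):
    """Keep only the data groups (dicts) whose source address is valid."""
    return [g for g in groups if is_valid_source(g.get("refer_uri"))]
```

Filtering before clustering keeps records without a usable source address out of the distance computation, since the source address is one of the fields that defines a class.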
With the device provided by this embodiment of the invention, the URI request message generated for a page element when a page is accessed is obtained; a page composition data group containing the preset page composition fields is generated from the URI request message; page element clustering is performed on the page composition data groups using the preset page composition clustering algorithm to determine the page elements belonging to the same page accessed by the same user at the same time; and page composition is performed on those page elements to obtain a composite page. In this scheme, page synthesis is performed on element-level user internet logs to obtain page-level user internet logs, which provides a basis for later deriving page-level internet quality indices and supports a more realistic and effective evaluation of the user's experience when browsing pages.
The embodiment of the invention provides a nonvolatile computer storage medium, wherein at least one executable instruction is stored in the computer storage medium, and the computer executable instruction can execute the page synthesis method in any method embodiment.
Fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.
As shown in Fig. 4, the computing device may include a processor, a communications interface, a memory, and a communications bus.
The processor, the communications interface, and the memory communicate with each other via the communications bus. The communications interface is used for communicating with network elements of other devices, such as clients or other servers. The processor is used for executing a program, and in particular can execute the relevant steps of the above page composition method embodiment for the computing device.
In particular, the program may include program code comprising computer operating instructions.
The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory is used for storing the program. The memory may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The program may specifically be configured to cause the processor to execute the page composition method in any of the above-described method embodiments. For specific implementation of each step in the program, reference may be made to corresponding steps and corresponding descriptions in units in the above-described page synthesis embodiment, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of embodiments of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best modes of embodiments of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that is, the claimed embodiments of the invention require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components according to embodiments of the present invention. Embodiments of the invention may also be implemented as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing embodiments of the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (10)

1. A page composition method, comprising:
acquiring a URI (Uniform resource identifier) request message generated for a page element when a page of a webpage is accessed;
generating a page synthesis data group containing a preset page synthesis field according to the URI request message;
performing page element clustering processing on the page synthesis data group by using a preset page synthesis clustering algorithm, and determining page elements belonging to the same page accessed by the same user at the same time;
and page synthesis is carried out on page elements belonging to the same page accessed by the same user at the same time, so as to obtain a synthesized page.
2. The method of claim 1, wherein the preset page composition field contains: user terminal information, user IP address, source address, stream start time.
3. The method of claim 2, wherein the performing the page element clustering process on the page composition data group by using a preset page composition clustering algorithm, and the determining the page elements belonging to the same page accessed by the same user at the same time further comprises:
calculating the Euclidean distance between any two page synthetic data sets according to the user terminal information, the user IP address and the source address, and if the Euclidean distance is equal to a first preset threshold value, dividing the corresponding page synthetic data sets into the same class;
and aiming at the page synthetic data groups divided into the same class, calculating the Euclidean distance between the page synthetic data group corresponding to the minimum stream starting time and the page synthetic data group corresponding to the maximum stream starting time according to the user terminal information, the user IP address, the source address and the stream starting time, and if the Euclidean distance is less than or equal to a second preset threshold value, determining the page elements corresponding to the corresponding page synthetic data groups as the page elements belonging to the same page accessed by the same user at the same time.
4. The method of claim 3, wherein after the corresponding page composition data groups are classified into the same class if the euclidean distance is equal to the first preset threshold, the method further comprises:
and aiming at the page synthesis data groups divided into the same class, sequencing the page synthesis data groups of the same class according to the stream starting time.
5. The method of any of claims 2-4, wherein prior to page element clustering processing of the page composition data set using a preset page composition clustering algorithm, the method further comprises:
and carrying out effectiveness filtering on the page synthesis data set according to the source address to obtain an effective page synthesis data set.
6. A page composition apparatus, comprising:
the acquisition module is suitable for acquiring a URI (Uniform resource identifier) request message generated by a page element when a webpage is accessed;
the generating module is suitable for generating a page synthesis data group containing a preset page synthesis field according to the URI request message;
the clustering module is suitable for performing page element clustering processing on the page synthesis data group by using a preset page synthesis clustering algorithm and determining page elements belonging to the same page accessed by the same user at the same time;
and the page synthesis module is suitable for performing page synthesis on page elements belonging to the same page accessed by the same user at the same time to obtain a synthesized page.
7. The apparatus of claim 6, wherein the preset page composition field comprises: user terminal information, user IP address, source address, stream start time.
8. The apparatus of claim 7, wherein the clustering module is further adapted to: calculating the Euclidean distance between any two page synthetic data sets according to the user terminal information, the user IP address and the source address, and if the Euclidean distance is equal to a first preset threshold value, dividing the corresponding page synthetic data sets into the same class;
and aiming at the page synthetic data groups divided into the same class, calculating the Euclidean distance between the page synthetic data group corresponding to the minimum stream starting time and the page synthetic data group corresponding to the maximum stream starting time according to the user terminal information, the user IP address, the source address and the stream starting time, and if the Euclidean distance is less than or equal to a second preset threshold value, determining the page elements corresponding to the corresponding page synthetic data groups as the page elements belonging to the same page accessed by the same user at the same time.
9. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the page synthesis method according to any one of claims 1-5.
10. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the page composition method of any one of claims 1-5.
CN202010430928.9A 2020-05-20 2020-05-20 Page synthesis method and device Active CN113722631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010430928.9A CN113722631B (en) 2020-05-20 2020-05-20 Page synthesis method and device

Publications (2)

Publication Number Publication Date
CN113722631A (en) 2021-11-30
CN113722631B (en) 2023-11-21

Family

ID=78671252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010430928.9A Active CN113722631B (en) 2020-05-20 2020-05-20 Page synthesis method and device

Country Status (1)

Country Link
CN (1) CN113722631B (en)

Also Published As

Publication number Publication date
CN113722631B (en) 2023-11-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant