Disclosure of Invention
The invention aims to provide a method and a device for generating a super webpage template, and a page data transmission method, which can reduce the amount of non-image WEB page data delivered, speed up resource delivery and reduce the response time of page browsing.
In a first aspect, an embodiment of the present invention provides a method for generating a super webpage template, including:
collecting a plurality of resource files within a preset range;
performing similarity calculation on every two resource files in the plurality of resource files; when the data of the two resource files are the same, removing one of them; when the data are similar, that is, one of the two resource files contains all or most of the content of the other, removing the resource file whose content is contained;
and merging the reserved resource files to generate the super webpage template.
Preferably, the resource file is a WEB page data resource file; the preset range includes: a preset WEB site, a preset path of the WEB site, or a preset resource keyword.
Preferably, the performing similarity calculation on every two resource files in the plurality of resource files includes:
grouping the collected resource files by intervals of the data volume they contain, wherein the resource files whose data volume falls within the same interval form one group;
and carrying out similarity calculation on every two resource files in the plurality of resource files in each group.
Preferably, the merging the reserved resource files includes:
merging the reserved resource files of the group with the maximum interval value to generate a temporary webpage template;
and respectively carrying out similarity calculation on the temporary webpage template and the resource files reserved by other groups, rejecting the resource files when the data of the resource files in one group is the same as or similar to the data of the temporary webpage template, otherwise combining the resource files into the temporary webpage template, and continuously carrying out similarity calculation on the temporary webpage template and the resource files reserved by the next group in the same way, wherein the finally generated temporary webpage template is a super webpage template.
Preferably, collecting a plurality of resource files within a preset range includes: filtering out small resource files by means of a preset lower threshold on resource file size.
Preferably, the middle section of each reserved resource file is intercepted and reserved using different strategies under different conditions, and the head and tail data are removed.
Preferably, the generated super webpage template is split line by line;
and comparing the split lines of resource data pairwise, in order from front to back; when two lines are the same, removing one of them; when they are similar, that is, one line contains all or most of the content of the other, removing the line whose content is contained, thereby generating the simplified super webpage template.
Preferably, the method further comprises: splitting each line of resource data with a large data volume in the generated simplified super webpage template into a plurality of blocks of data;
and performing similarity calculation between each block of data and the other lines of resource data, and removing any block of data that is the same as or similar to the other lines of resource data.
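The two optional refinement steps above (line-level deduplication, then block-level removal inside long lines) can be sketched in Python as follows. This is a sketch under stated assumptions, not the claimed implementation: the threshold, the "long line" limit and the block size are hypothetical values the text does not fix, and Python's `difflib.SequenceMatcher` stands in for a delta library as the measure of how much of one line appears in another.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.95  # hypothetical similarity threshold
LONG_LINE = 40    # hypothetical: lines longer than this are split into blocks
BLOCK_SIZE = 16   # hypothetical block length in characters

def covered(part: str, whole: str, threshold: float = THRESHOLD) -> bool:
    """True if at least `threshold` of `part` appears, in order, inside `whole`."""
    if not part:
        return True
    matcher = SequenceMatcher(None, part, whole, autojunk=False)
    shared = sum(block.size for block in matcher.get_matching_blocks())
    return shared / len(part) >= threshold

def dedupe_lines(template: str) -> list[str]:
    """Split the template line by line and drop lines contained in another line."""
    kept: list[str] = []
    for line in template.splitlines():
        for other in list(kept):
            if len(line) <= len(other) and covered(line, other):
                break                    # the new line adds nothing: drop it
            if len(line) > len(other) and covered(other, line):
                kept.remove(other)       # the new line contains an earlier one
        else:
            kept.append(line)
    return kept

def drop_redundant_blocks(lines: list[str]) -> list[str]:
    """Split long lines into blocks; drop blocks already present in retained content."""
    result: list[str] = []
    for line in lines:
        if len(line) <= LONG_LINE:
            result.append(line)
            continue
        earlier = list(result)           # compare only against content retained so far
        for start in range(0, len(line), BLOCK_SIZE):
            block = line[start:start + BLOCK_SIZE]
            if not any(covered(block, other) for other in earlier):
                result.append(block)
    return result
```

Blocks of a long line are compared only against content that has already been retained, so two long lines sharing a block cannot delete each other's copy. That ordering is a design assumption of this sketch.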
In a second aspect, an embodiment of the present invention further provides a page data transmission method, including:
acquiring current non-image WEB resource data of a page requested by a client;
performing a delta operation on the non-image WEB resource data using a pre-established super webpage template corresponding to the page, to obtain delta data, wherein the super webpage template is generated by the above method for generating a super webpage template;
sending the delta data to a client.
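The transmission method can be illustrated with a round trip: the server encodes the page against the template, and the client, which is assumed to already hold the same template, rebuilds the page. This is a hedged sketch only; `difflib` opcodes stand in for a real delta library, and the tuple instruction format (`"offset"`, `"copy"`) and function names are hypothetical.

```python
from difflib import SequenceMatcher

def server_delta(page: str, template: str) -> list[tuple]:
    """Encode `page` as references into `template` plus literal text."""
    delta: list[tuple] = []
    matcher = SequenceMatcher(None, template, page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("offset", i1, i2 - i1))   # the client already has this data
        elif tag in ("replace", "insert"):
            delta.append(("copy", page[j1:j2]))     # data missing from the template
        # "delete": template data this page does not use, so nothing is sent
    return delta

def client_apply(delta: list[tuple], template: str) -> str:
    """Rebuild the page from the delta and the client's copy of the template."""
    parts = []
    for op in delta:
        if op[0] == "offset":
            _, addr, length = op
            parts.append(template[addr:addr + length])
        else:
            parts.append(op[1])
    return "".join(parts)
```

Only the `"copy"` payloads carry page data, which is where the reduction in delivered volume comes from.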
In a third aspect, an embodiment of the present invention further provides a device for generating a super webpage template, where the device includes:
the acquisition module is used for acquiring a plurality of resource files within a preset range;
the similarity identification and elimination module is used for performing similarity calculation on every two resource files in the plurality of resource files, identifying two resource files with the same or similar data, eliminating one of the two resource files when the data are the same, and, when the data are similar, that is, one of the two resource files contains all or most of the content of the other, eliminating the resource file whose content is contained;
and the template generating module is used for combining the reserved resource files to generate the super webpage template.
Preferably, the resource file is a WEB page data resource file; the preset range includes: a preset WEB site, a preset path of the WEB site, or a preset resource keyword.
Preferably, the similarity identification and elimination module further includes:
the interval grouping submodule is used for carrying out interval grouping on the plurality of acquired resource files according to the data volume contained in the resource files, and the plurality of resource files with the data volume in a certain interval are divided into a group;
and the similarity identification submodule is used for carrying out similarity calculation and identification on every two resource files in the plurality of resource files in each group.
Preferably, the template generating module includes:
the merging submodule is used for merging the reserved resource files of the group with the maximum interval value to generate a temporary webpage template;
and the similarity operation and generation sub-module is used for performing similarity operation on the temporary webpage template and each group of other reserved resource files, rejecting the resource file when the data of the resource file in one group is the same as or similar to the data of the temporary webpage template, otherwise merging the resource file into the temporary webpage template, and continuing performing similarity operation on the resource file in the next group in the same way, wherein the finally generated temporary webpage template is a super webpage template.
Preferably, the acquisition module is further used for filtering out small resource files by means of a preset lower threshold on resource file size.
Preferably, the device further includes: a middle section intercepting and reserving module, used for intercepting and reserving the middle section of each reserved resource file according to different strategies under different conditions, and removing the head and tail data.
Preferably, the device further includes:
a line resource splitting and generating module, used for splitting the generated super webpage template line by line, comparing the split lines of resource data pairwise in order from front to back, removing one of two lines when they are the same, removing the line whose content is contained when they are similar (that is, when one line contains all or most of the content of the other), and finally generating the simplified super webpage template.
Preferably, the device further includes:
a block data splitting and removing module, used for splitting each line of resource data with a large data volume in the generated simplified super webpage template into a plurality of blocks of data, performing similarity calculation between each block of data and the other lines of resource data, and removing any block of data that is the same as or similar to the other lines of resource data.
The method and device for generating a super webpage template and the page data transmission method provided by the embodiments of the invention collect a plurality of resource files within a preset range and perform similarity calculation on every two of the collected files. When two files have the same data, one of them is removed; when their data are similar, that is, one file contains all or most of the content of the other, the file whose content is contained is removed. Data that differ from, or are dissimilar to, all other data are retained, and the remaining data are finally used as the super webpage template. When the server delivers non-image WEB resource data, it performs a delta operation between the non-image WEB resource data of the relevant page and the super webpage template to obtain the data that differ from the template, and sends only this differing data to the client. This reduces the volume of non-image WEB resource data delivered, speeds up resource delivery and shortens the response time of page browsing. Because the client spends less time receiving the non-image WEB resource data, the server's response speed is effectively increased and the client experience is improved.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention and the accompanying drawings, and the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
The technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings and the specific embodiments of the present invention.
FIG. 1 is a flowchart of a method for generating a super webpage template according to the present invention. Referring to fig. 1, a method for generating a super webpage template provided by the embodiment of the present invention includes:
S101: Collecting a plurality of resource files within a preset range.
The resource file is a page data resource file; the preset range includes: a preset WEB site, a preset path of the WEB site, or a preset resource keyword.
In a specific implementation, the pages of a specific WEB site reference Javascript and CSS resources as external links, so that these resources can be reused by pages on the same path or on different paths and provide the same or similar interactive functions and the same or similar layouts on different pages. As a result, the Javascript and CSS resources of different pages of the site may share identical or similar content. Taking Javascript resources as an example, different resources may contain some identical function modules. In other words, resources of the same kind within a spatial extent (the specific site) are similar to one another; that is, the resource files within the preset range are aggregated in the spatial dimension. Although the site updates its Javascript and CSS resources over time, a new resource still contains part of the old one as long as it has not been substantially modified. For example, when some Javascript resources are updated, certain core function modules they contain remain unchanged, so Javascript resources are also similar along the time dimension; that is, the resource files within the preset range are also aggregated in the time dimension.
In addition, although different sites focus on different services, they may select the same front-end Javascript or CSS library to meet their front-end interaction or page layout requirements. For example, the jQuery family of Javascript/CSS libraries is widely used by many sites, so the non-image WEB resources of pages on these sites should contain similar data. This shows that, across a given batch of sites, a collection of non-image WEB resources whose names contain the same keyword includes resources with identical or similar content. Although such similar resources come from different domain names, their common feature is that their names carry the same keyword.
The preset range is therefore either a site or certain paths within a site, within which Javascript and CSS resources are similar across time and space, or a collection of non-image WEB resources whose names share the same resource keyword, drawn from multiple different sites that use the same front-end Javascript or CSS library.
The resource files collected within the preset range are non-image WEB resource files. Non-image WEB resource files generally comprise Javascript resources and CSS resources, which are separate files in practice, so each kind can be collected separately. For collection, a collection range is preset, and then the collection times and the pages to be collected are chosen at random from the defined range, with collection repeated several times. For example, a plurality of pages in a WEB site are collected at regular intervals. To give the finally generated super webpage template strong continuity, the collection process can be organized into collection periods, whose length can be set according to actual conditions.
In addition, it should be noted that collecting a plurality of resource files within a preset range specifically includes filtering out small resource files by means of a preset lower threshold on resource file size.
In a specific implementation, different resource files have different sizes (i.e., byte sizes). Because the resource files are collected directly from the preset range, their sizes may span a wide range and there may be many small files, producing an excessive long tail of small resources. When each resource file is used as the basic unit of the similarity operation (that is, similarity calculation is performed on every two of the resource files), the resulting super webpage template becomes large, which is not beneficial to practical application; and because long-tail resources are so small, adding them to the super webpage template brings little benefit. Therefore, to keep the size of the super webpage template within a certain range, the resource files are filtered by size. Before screening, a threshold on resource size is preset, typically as a lower limit, and only resource files larger than this lower threshold are used in the subsequent operations. Setting the lower threshold effectively removes small long-tail resources and reduces the size of the super webpage template. When this filtering strategy is applied, the same limit also applies when the template is used: an undersized resource file sent from the server to the client is not delta-encoded against the super webpage template, but is compressed directly with a compression algorithm and then sent to the client.
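The size filter and its serving-time counterpart can be sketched as below. This is a minimal illustration, assuming a hypothetical threshold of 1024 bytes and using `zlib` only as an example of "a compression algorithm"; the function names are made up for this sketch.

```python
import zlib

# Hypothetical lower threshold on resource size, in bytes (the text leaves the value open).
MIN_RESOURCE_SIZE = 1024

def select_for_template(resources: dict[str, bytes],
                        min_size: int = MIN_RESOURCE_SIZE) -> dict[str, bytes]:
    """Keep only resources larger than the lower threshold, dropping the long tail."""
    return {name: data for name, data in resources.items() if len(data) > min_size}

def prepare_for_client(data: bytes, min_size: int = MIN_RESOURCE_SIZE) -> tuple[str, bytes]:
    """Apply the same size limit when serving a resource.

    Resources at or below the threshold skip the template and are compressed
    directly; larger ones are passed on unchanged for delta encoding, which
    this sketch does not perform.
    """
    if len(data) <= min_size:
        return ("compressed", zlib.compress(data))
    return ("delta", data)
```

Using the same threshold in both functions keeps the template and the serving path consistent: any resource too small to contribute to the template is also never delta-encoded against it.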
It should be noted that filtering the resource files is optional: in practice, the collected resource files may be filtered first, or all of them may be processed directly in S102.
S102: Performing similarity calculation on every two resource files in the plurality of resource files; when the data of two resource files are the same, removing one of them; when the data are similar, that is, one of the two resource files contains all or most of the content of the other, removing the resource file whose content is contained. For example, if one of the two resource files contains more than half of the content of the other, the file whose content is more than half contained may be selected for removal. Viewed from another aspect, comparing the contents of two resource files may involve at least one of the following dimensions: data volume, categories of valuable information, the volume of information that cannot easily be obtained by downloading again, and the like.
In a specific implementation, the similarity calculation on resource files is generally performed with an existing delta algorithm, for example an algorithm library such as xdelta3 or open-vcdiff. In a typical delta algorithm, for a given source file A, each run of continuous data m in A is searched for a matching data block m' in the content of a template B. If a match is found, an offset instruction is generated (its data amount is small and fixed: it contains only an offset address and a data length). If no matching data can be found in B and m consists of a single repeated character, such as "CCCCCCCC", a repeat instruction is generated (its data amount is also small and fixed, only a limited number of fixed bytes more than an offset instruction). Otherwise, a copy instruction is generated, whose data amount may be large because it depends on the length of the data block m itself. Finally, the instructions are organized in order to form the delta data.
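A toy, greedy encoder makes the three instruction types concrete. It is a hypothetical illustration, not the algorithm used by xdelta3 or open-vcdiff; in the VCDIFF format (RFC 3284) that both libraries implement, the corresponding instructions are named COPY (the "offset" instruction here), RUN ("repeat") and ADD ("copy"). `BLOCK` and `MIN_RUN` are assumed parameters.

```python
from dataclasses import dataclass

@dataclass
class Offset:        # matched data in the template: address and length only
    addr: int
    length: int

@dataclass
class Repeat:        # a run of one repeated byte, e.g. b"CCCCCCCC"
    byte: int
    length: int

@dataclass
class Copy:          # literal data carried in the delta itself
    data: bytes

BLOCK = 4            # hypothetical minimum match length
MIN_RUN = 4          # hypothetical minimum run length for a repeat instruction

def encode(source: bytes, template: bytes) -> list:
    # Index each BLOCK-byte window of the template (first occurrence wins).
    index: dict[bytes, int] = {}
    for i in range(len(template) - BLOCK + 1):
        index.setdefault(template[i:i + BLOCK], i)
    out: list = []
    literal = bytearray()
    pos = 0

    def flush() -> None:
        if literal:
            out.append(Copy(bytes(literal)))
            literal.clear()

    while pos < len(source):
        addr = index.get(source[pos:pos + BLOCK])
        if addr is not None:
            length = BLOCK           # extend the match while the bytes agree
            while (pos + length < len(source) and addr + length < len(template)
                   and source[pos + length] == template[addr + length]):
                length += 1
            flush()
            out.append(Offset(addr, length))
            pos += length
            continue
        run = 1
        while pos + run < len(source) and source[pos + run] == source[pos]:
            run += 1
        if run >= MIN_RUN:
            flush()
            out.append(Repeat(source[pos], run))
            pos += run
        else:
            literal.append(source[pos])  # unmatched, non-repeating byte
            pos += 1
    flush()
    return out

def decode(instructions: list, template: bytes) -> bytes:
    out = bytearray()
    for ins in instructions:
        if isinstance(ins, Offset):
            out += template[ins.addr:ins.addr + ins.length]
        elif isinstance(ins, Repeat):
            out += bytes([ins.byte]) * ins.length
        else:
            out += ins.data
    return bytes(out)
```

Encoding a file against an identical template yields a single offset instruction, which matches the observation that the delta of identical data contains only offset instructions.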
Generally, for a given homogeneous resource set within a preset range, similarity calculation is performed on every two of its resource files; when two files have the same data, one is removed, and when the data are similar, that is, one file contains all or most of the content of the other, the file whose content is contained is removed. The reserved resource files are then merged into one large data file, which is the finally generated super webpage template. The homogeneous resource set within the preset range is called the operation-set resource, and its file data consist of all the data in all the collected resource files; the similarity calculation on the resource files operates on this operation-set file data.
Removing identical and similar data eliminates the repeated data across all the resource data (i.e., the resource files). For example, suppose the file data of an operation-set resource comprise eight resource files a, b, c, d, a', b', e and f, where a and a' are identical or similar resource files, and b and b' are identical or similar resource files. The delta operations performed during the similarity calculation identify these two pairs. When identical and similar data are removed, the contained file of each pair is eliminated: if a contains all or most of a', a' is removed; if b contains all or most of b', b' is removed. The super webpage template obtained from the final operation therefore contains a, b, c, d, e and f.
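The worked example can be reproduced with a short sketch. Assumptions: file contents are strings, Python's `difflib` measures how much of the smaller file appears in the larger one (standing in for a delta library), the 95% threshold is the example value used later in the text, and the toy contents are made up so that only a/a' and b/b' overlap.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.95  # example threshold from the text

def contained_fraction(small: str, large: str) -> float:
    """Fraction of `small` that also appears, in order, inside `large`."""
    if not small:
        return 1.0
    blocks = SequenceMatcher(None, small, large, autojunk=False).get_matching_blocks()
    return sum(b.size for b in blocks) / len(small)

def dedupe(files: dict[str, str], threshold: float = THRESHOLD) -> list[str]:
    """Keep files in order; whenever two are same/similar, drop the contained one."""
    kept: list[str] = []
    for name in files:
        drop_new = False
        for other in list(kept):
            small, large = sorted((name, other), key=lambda n: len(files[n]))
            if contained_fraction(files[small], files[large]) >= threshold:
                if small == name:
                    drop_new = True      # the newcomer is contained: remove it
                    break
                kept.remove(other)       # the newcomer contains an earlier file
        if not drop_new:
            kept.append(name)
    return kept

files = {"a": "ABCDEFGHIJ", "b": "KLMNOPQRST", "c": "UVWXYZ", "d": "abcdefgh",
         "a'": "ABCDEFGH", "b'": "KLMNOPQRS", "e": "ijklmnop", "f": "qrstuvwx"}
print(dedupe(files))  # ['a', 'b', 'c', 'd', 'e', 'f']
```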
In addition, it should be noted that, when similarity calculation is performed on two resource files, each resource file is used as a basic calculation unit and a delta operation is performed between the two files. If the data in the two files are identical (in which case the result of the delta operation contains only offset instructions), the two files have identical content. If part of the data in the two files is the same and the shared part exceeds a preset threshold of one of the files, for example 95% of that file, the two files are considered similar resources. If the data in the two files are completely different, or only a small part is the same and that shared part is below the preset threshold in both files, for example below 95%, the two files are considered different resource files.
It should be noted that the preset threshold can be set according to actual needs. Generally, to tighten the culling condition (stricter culling), the preset threshold is increased; to relax the culling condition, the preset threshold is lowered.
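The three-way classification and the effect of the tunable threshold can be sketched as follows. This is a hedged illustration: `difflib` stands in for the delta algorithm, and "above the threshold of one of the resource files" is interpreted, as an assumption, as the shared part measured against the smaller file.

```python
from difflib import SequenceMatcher

def classify(x: bytes, y: bytes, threshold: float = 0.95) -> str:
    """Label a pair of resources "same", "similar" or "different"."""
    if x == y:
        return "same"
    if not x or not y:
        return "different"
    matcher = SequenceMatcher(None, x, y, autojunk=False)
    shared = sum(block.size for block in matcher.get_matching_blocks())
    # The shared part is measured against the smaller file, where its share is largest.
    return "similar" if shared / min(len(x), len(y)) >= threshold else "different"
```

With two 100-byte files sharing 97 bytes, the pair is "similar" at the 0.95 threshold but "different" once the threshold is raised to 0.99, which is the tightening behaviour described above.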
In the similarity calculation, a delta algorithm is used, and the resulting delta data is the result of the delta calculation. According to this result, each basic calculation unit is assigned a resource role: winner, loser and/or neutral ("and/or" because the same unit may play different roles in different delta calculations).
Herein, the two resources (basic calculation units) undergoing a similarity calculation are referred to as a first resource and a second resource. If their contents are the same, they are identical data. If the first resource contains all the content of the second resource, or the content of the second resource contained in the first resource exceeds a preset threshold, they are similar data, and the first resource is the winner while the second resource is the loser.
If the first resource is neither identical nor similar to the other resources, that is, the data in the first and second resources are completely different, or they are only partly identical and the identical part is below the preset threshold in both resources, the first resource is a neutral resource.
In general, a delta algorithm performs the delta operation on source file data based on template data, and the size of the delta data divided by the size of the source file data, expressed as a percentage, is called the delta rate. In the present invention, if the first resource is used as the template data and the second resource as the source file data, the delta rate is the percentage of the second resource occupied by the differing part (the data remaining after the delta operation) when the two resources are partly the same. Conversely, if the second resource is the template data and the first resource the source file data, the delta rate is the percentage of the first resource occupied by the differing part. Whether two resources (or two pieces of data) are similar is judged by the magnitude of the delta rate, and the preset threshold should be expressed consistently with this delta rate.
It should be noted that, of the first and second resources undergoing the delta operation, the one with the larger data volume is used as the template data and the smaller one as the source data. If the two have equal data volumes, either one can be used as the template data and the other as the source data.
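The delta rate, the larger-is-template rule and the winner/loser/neutral roles can be combined in one small sketch. Assumptions: `difflib` stands in for the delta algorithm, the delta size is approximated by the number of unmatched source bytes (instruction overhead is ignored), and a maximum delta rate of 5% is taken to correspond to the 95% example threshold.

```python
from difflib import SequenceMatcher

def delta_rate(source: bytes, template: bytes) -> float:
    """Approximate delta rate, in percent: unmatched source bytes / source size.

    Simplification: the fixed per-instruction overhead of a real delta
    format is ignored, so only data that would need copy instructions counts.
    """
    if not source:
        return 0.0
    matcher = SequenceMatcher(None, source, template, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (len(source) - matched) / len(source)

def assign_roles(first: bytes, second: bytes, max_rate: float = 5.0) -> tuple[str, str]:
    """Return the roles (first, second) as "winner", "loser" or "neutral".

    The larger resource is the template data; with equal sizes, `first` is used.
    """
    first_is_template = len(first) >= len(second)
    template, source = (first, second) if first_is_template else (second, first)
    if delta_rate(source, template) <= max_rate:
        return ("winner", "loser") if first_is_template else ("loser", "winner")
    return ("neutral", "neutral")
```

Expressing the threshold as a maximum delta rate, rather than a minimum shared fraction, is what "the preset threshold should be consistent with this delta rate" amounts to in this sketch.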
S103: Merging the reserved resource files to generate the super webpage template.
In this method for generating a super webpage template, a plurality of resource files within a preset range are collected and similarity calculation is performed on every two of the collected files. When two files have the same data, one of them is removed; when their data are similar, that is, one file contains all or most of the content of the other, the file whose content is contained is removed. Data that differ from, or are dissimilar to, all other data are retained, and the remaining data are finally used as the super webpage template. When the server delivers non-image WEB resource data, it performs a delta operation between the non-image WEB resource data of the relevant page and the super webpage template to obtain the data that differ from the template, and sends this differing data to the client. This reduces the volume of non-image WEB resource data delivered, speeds up resource delivery and shortens the response time of page browsing. Because the client spends less time receiving the non-image WEB resource data, the server's response speed is effectively increased and the client experience is improved.
In the above embodiment, to make the resulting super webpage template more effective in use, finer-grained merging of similar data may be applied to the reserved resource files when generating the super webpage template, as described in detail below.
Referring to fig. 2, an embodiment of the present invention further provides a method for generating a super webpage template, where the method performs optimization processing on steps S102 and S103.
Specifically, performing similarity calculation on every two resource files in the plurality of resource files includes:
S201: Grouping the collected resource files into intervals according to the data volume they contain, wherein the resource files whose data volume falls within the same interval form one group.
In a specific implementation, identical or similar resource files are generally also similar in size, and a small resource rarely contains most of the content of a much larger one. The resource files can therefore be grouped according to preset resource file size intervals, based on the data volume each file contains. Grouping effectively reduces the number of subsequent pairwise delta operations and speeds up generation of the super webpage template. For example, the resource files may be grouped into the following size intervals (in bytes): (204800, 1024000000), (102400, 204800), (51200, 102400), (10240, 51200), (1024, 10240), (0, 1024). That is, resource files of 0 to 1024 bytes form one group, resource files of 1024 to 10240 bytes form another group, …, and resource files of 204800 to 1024000000 bytes form a group. Typically, each group contains at least one resource file, although some groups may contain none. In this example, no lower resource size threshold is applied during grouping. If one is set, the lower bound of the smallest group should equal that threshold. For example, with a preset resource size threshold of 1024 bytes, the groups may be: (204800, 1024000000), (102400, 204800), (51200, 102400), (10240, 51200), (1024, 10240).
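The interval grouping can be sketched with the example intervals. The text does not say which interval endpoint is inclusive, so half-open intervals (`low <= size < high`) are an assumption here, as is the function name.

```python
# Example size intervals from the text, in bytes, largest first.
# Assumption: each interval is half-open, low <= size < high.
INTERVALS = [(204800, 1024000000), (102400, 204800), (51200, 102400),
             (10240, 51200), (1024, 10240), (0, 1024)]

def group_by_size(sizes: dict[str, int],
                  intervals: list[tuple[int, int]] = INTERVALS) -> list[list[str]]:
    """Group resource names by size interval; empty groups are discarded."""
    groups: list[list[str]] = [[] for _ in intervals]
    for name, size in sizes.items():
        for index, (low, high) in enumerate(intervals):
            if low <= size < high:
                groups[index].append(name)
                break                    # sizes outside every interval are ignored
    return [group for group in groups if group]
```

Passing `INTERVALS[:-1]` drops the (0, 1024) interval, which corresponds to the example with a 1024-byte lower threshold. Discarding empty groups mirrors the handling of groups without resource files in S202.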
S202: Carrying out similarity calculation on every two resource files within each group.
In a specific implementation, pairwise delta operations are performed among the resource files within each group. If a group contains only one resource file, that file is reserved directly; if a group contains no resource file, the group is discarded.
In addition, the pairwise delta operations among the resource files of each group can be performed in the following ways:
1. In the order in which the resource files are organized, read two resource files at a time from front to back and perform a delta operation on them.
For example: in a group containing six resource files A, B, C, D, E and F, the delta operations may be performed in the following order:
①, performing delta operation on A and B;
②, if A and B are the same or similar resource files, eliminating the one with the smaller data size, and performing a delta operation between the remaining resource file and C;
③, if A and B are neither the same nor similar, reserving both resource files, and then performing a delta operation between each of them and C, and so on;
until all resource files are finally subjected to pairwise difference operation with other reserved resource files.
2. In the order in which the resource files are organized, read the files from front to back two at a time and perform a delta operation on each pair, then perform delta operations between the reserved resource files and the other reserved resource files.
For example: in a group containing six resource files A, B, C, D, E and F, the delta operations may be performed in the following order:
①, performing delta operations on A and B, on C and D, and on E and F;
②, if each of the three delta operations eliminates the identical or similar resource with the smaller data volume, performing pairwise delta operations among the three remaining resource files, again eliminating the identical or similar resource with the smaller data volume; if none of the three delta operations eliminates any resource file, performing pairwise delta operations between the reserved resource files and the other reserved resource files, until, in every pair found to be identical or similar, the resource with the smaller data volume has been eliminated.
In both ways, when the data are similar, that is, one of the two resource files contains all or most of the content of the other, the resource file whose content is contained is removed.
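The two in-group orderings can be sketched side by side. Assumptions: file contents are strings, `similar` is a hypothetical helper that uses `difflib` as a stand-in for the delta operation with the 95% example threshold, and the smaller file of each same/similar pair is the one eliminated.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar(x: str, y: str, threshold: float = 0.95) -> bool:
    """Same or similar: the smaller file is (almost) entirely contained in the larger."""
    small, large = sorted((x, y), key=len)
    if not small:
        return True
    blocks = SequenceMatcher(None, small, large, autojunk=False).get_matching_blocks()
    return sum(b.size for b in blocks) / len(small) >= threshold

def sequential(files: list[str]) -> list[str]:
    """Way 1: take files front to back; compare each with every file kept so far."""
    kept: list[str] = []
    for f in files:
        for k in list(kept):
            if similar(f, k):
                if len(f) <= len(k):
                    break            # newcomer is the loser
                kept.remove(k)       # newcomer is the winner; drop the smaller file
        else:
            kept.append(f)
    return kept

def paired(files: list[str]) -> list[str]:
    """Way 2: compare neighbours (A,B), (C,D), (E,F) first, then the survivors pairwise."""
    survivors: list[str] = []
    for i in range(0, len(files), 2):
        pair = files[i:i + 2]
        if len(pair) == 2 and similar(pair[0], pair[1]):
            survivors.append(max(pair, key=len))   # keep the larger file
        else:
            survivors.extend(pair)
    changed = True
    while changed:                   # repeat until no similar pair remains
        changed = False
        for x, y in combinations(survivors, 2):
            if similar(x, y):
                survivors.remove(min((x, y), key=len))
                changed = True
                break
    return survivors
```

Both orderings end with no same or similar pair left; they differ only in the order in which comparisons are made, which can affect how many delta operations are needed.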
S203: Merge the retained resource files of the group with the largest interval value to generate a temporary webpage template.
S204: Perform similarity calculation between the temporary webpage template and the retained resource files of each other group in turn. If the data of a resource file in a group is the same as or similar to the data of the temporary webpage template, reject that resource file; otherwise, merge it into the temporary webpage template. Continue in the same way with the retained resource files of the next group. The temporary webpage template finally obtained is the super webpage template.
In a specific implementation, of the two resources participating in a delta operation, the larger one is generally used as the template data and the smaller one as the source data; if the two resources are the same or similar, the source data is discarded. Accordingly, for the inter-group delta operations, among the groups of data files obtained after the intra-group pairwise delta operations, the group with the largest size interval is taken as the basic group, and its retained resource files are merged to generate a temporary webpage template; that is, the resource files in the group with the largest size interval serve as the template data. Note that only one temporary webpage template is generated from the basic group, and it contains all of the mutually different resource files in the basic group (the group with the largest interval value).
The other groups are taken as comparison groups, and the resource files in all comparison groups serve as source data. Pairwise delta operations are first performed on the resource files within each comparison group, and the loser resource files are removed by the elimination method described above; inter-group delta operations are then performed between the resource files retained in each comparison group and the temporary webpage template. During these delta operations, if the data of a resource file in a comparison group is the same as or similar to the data in the temporary webpage template, that resource file is removed; otherwise, it is merged into the temporary webpage template of the basic group to generate a new temporary webpage template.
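Steps S203 and S204, together with the interval grouping that precedes them, can be sketched as follows. The sketch makes several assumptions the text does not fix: size intervals have a fixed width (`interval` is a hypothetical parameter), the temporary webpage template is a plain concatenation of byte strings, each group has already been deduplicated internally, and comparison groups are visited from the largest interval value downward.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical predicate: True if file `f` is the same as or similar to data in `template`.
TemplateSimilarity = Callable[[bytes, bytes], bool]


def group_by_interval(files: List[bytes], interval: int) -> Dict[int, List[bytes]]:
    """Put files whose size lies in the same [k*interval, (k+1)*interval) range into group k."""
    groups: Dict[int, List[bytes]] = defaultdict(list)
    for f in files:
        groups[len(f) // interval].append(f)
    return dict(groups)


def build_super_template(groups: Dict[int, List[bytes]],
                         is_same_or_similar: TemplateSimilarity) -> bytes:
    """S203/S204: `groups` maps interval index -> files retained in that group."""
    if not groups:
        return b""
    order = sorted(groups, reverse=True)      # largest interval value first
    template = b"".join(groups[order[0]])     # S203: merge the basic group
    for key in order[1:]:                     # S204: each comparison group in turn
        for f in groups[key]:
            if not is_same_or_similar(template, f):
                template += f                 # merge a file not yet covered
    return template
```

Because the template only grows, a file rejected early can never be needed later, which matches the one-pass description in S204.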
In a preferred embodiment, each retained resource file is truncated according to a strategy that varies with the situation: its middle section is kept and its head and tail data are removed.
In the specific implementation, the middle-section truncation strategy rests on the assumption that most key data is concentrated in the middle section of a resource file; removing the head and tail data of each resource file therefore effectively reduces the size of the super webpage template. Specifically, based on the super webpage template output in S204, the middle section of each resource file is truncated and kept according to a strategy that depends on the situation of the resource file in the super webpage template (that is, on its resource role). For example, for a neutral, the truncation strategy may be relatively aggressive: only a small middle section is kept and more data at the beginning and end is removed. For a winner, which contains much data similar to other resource files (namely, the resource files that were removed), the truncation strategy can be relatively conservative: a larger middle section is kept. Different resource roles therefore correspond to different percentages of the resource file being kept; that is, different data ranges can be set for middle-section truncation according to the resource role.
For example, when the data ranges for middle-section truncation are preset, the configuration parameters may include:
FILE_DEFEATER:[(1024000,0.4),(512000,0.5),(409600,0.6),(307200,0.7),(204800,0.8),(102400,0.9),],
FILE_NEUTRALIER:[(1024000,0.1),(512000,0.1),(307200,0.15),(204800,0.3),(102400,0.6),]
In each value pair, the first parameter is the lower limit of the resource file size in bytes, and the second parameter is the fraction of the file that is retained, taken from its middle section. For example, when middle-section truncation is applied to a winner's resource file whose size is between 204800 and 307200 bytes, the first 10% and the last 10% of the file are removed, and only the middle 80% is retained. In the example configuration, middle-section truncation is thus more conservative (more data is retained) for a winner than for a neutral: a winner's resource file contains most of the content similar to the other, removed resource files, so more of its data must be retained, whereas a neutral has no obvious similarity to other resource files, so less of its data is retained.
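The middle-section truncation can be expressed directly from the example configuration. This sketch assumes, as the worked example suggests, that `FILE_DEFEATER` applies to winners and `FILE_NEUTRALIER` to neutrals, that each table is scanned from the largest lower limit down, and that files smaller than the smallest lower limit are kept whole; the text does not specify that last case.

```python
from typing import List, Tuple

# Value pairs from the example configuration: (lower size limit in bytes, retained fraction).
FILE_DEFEATER: List[Tuple[int, float]] = [
    (1024000, 0.4), (512000, 0.5), (409600, 0.6),
    (307200, 0.7), (204800, 0.8), (102400, 0.9),
]
FILE_NEUTRALIER: List[Tuple[int, float]] = [
    (1024000, 0.1), (512000, 0.1), (307200, 0.15),
    (204800, 0.3), (102400, 0.6),
]


def keep_ratio(size: int, table: List[Tuple[int, float]]) -> float:
    """Return the fraction for the first lower limit <= size (tables are in descending order).

    Files below the smallest limit are kept whole: an assumption, not stated in the text.
    """
    for lower_limit, ratio in table:
        if size >= lower_limit:
            return ratio
    return 1.0


def truncate_middle(data: bytes, table: List[Tuple[int, float]]) -> bytes:
    """Keep the middle `ratio` of the file, removing equal shares of head and tail."""
    ratio = keep_ratio(len(data), table)
    kept = round(len(data) * ratio)   # round() avoids off-by-one from float error
    head = (len(data) - kept) // 2
    return data[head:head + kept]
```

For a 250000-byte winner, `keep_ratio` returns 0.8, so 25000 bytes are removed from each end, matching the 10%/80%/10% split described above.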
In a preferred embodiment, to make the super webpage template more effective, similarity calculation is further performed on the line resource data of the super webpage template: when two lines are the same, one of them is removed; when they are similar, that is, when one line contains all or most of the content of the other, the line whose content is contained is removed. A simplified super webpage template is finally generated. The method specifically includes the following steps:
splitting the generated super webpage template line by line;
and comparing every two split lines of resource data in sequence from front to back, removing one of the two lines when their data are the same, and, when their data are similar, that is, when one of the two lines contains all or most of the content of the other, removing the line whose content is contained, so as to generate the simplified super webpage template.
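The two steps above can be sketched in Python as follows. The sketch is hypothetical: `is_same_or_similar` is a caller-supplied predicate, and when two lines are similar it assumes that the shorter line is the one whose content is contained in the other. Surviving lines keep their original order so that the template remains usable as text.

```python
from typing import Callable, List

# Hypothetical predicate: True if two lines are the same or similar.
Similarity = Callable[[str, str], bool]


def simplify_template(template: str, is_same_or_similar: Similarity) -> str:
    """Split the template into lines and compare every pair from front to back.

    For an identical pair the later line is dropped; for a similar pair the shorter
    (contained) line is dropped. Surviving lines keep their original order.
    """
    lines: List[str] = template.splitlines()
    removed = [False] * len(lines)
    for i in range(len(lines)):
        if removed[i]:
            continue
        for j in range(i + 1, len(lines)):
            if removed[j] or not is_same_or_similar(lines[i], lines[j]):
                continue
            if len(lines[j]) > len(lines[i]):
                removed[i] = True   # line i is the contained (loser) line
                break
            removed[j] = True       # line j is identical to, or contained in, line i
    return "\n".join(line for line, gone in zip(lines, removed) if not gone)
```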
In the organization of JavaScript resource content, each function module is generally written as a single line, and spaces within a line are minimized to reduce the resource size; each line of data therefore usually contains at least one function block. CSS resources share this property of organizing data at line granularity. Because related JavaScript or CSS resources follow principles of logical organization and data reuse, the same or similar line data may appear in different but similar resources (for example, two JavaScript resources may each contain a line holding the same function module).
When two lines of resource data are compared, a delta operation is typically used.
During the comparison, if two lines of data are identical (in which case the result of the delta operation should contain only an offset instruction), the two lines are said to have identical content. If part of the data in the two lines is the same and the shared part reaches a preset threshold of at least one of the lines, for example 95% or more of that line, the two lines are considered similar. If the data in the two lines are completely different, or only partly the same with the shared part below the preset threshold in both lines (for example, less than 95% of each line), the two lines are considered different.
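The three-way classification above can be approximated with Python's standard `difflib.SequenceMatcher`. This is a swapped-in technique, not the delta operation of the text: the total size of the matching blocks stands in for "the same portion", and the 0.95 default mirrors the example threshold.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.95  # example value from the text; configurable in practice


def classify(line_a: str, line_b: str, threshold: float = THRESHOLD) -> str:
    """Return 'identical', 'similar' or 'different' for two lines of resource data."""
    if line_a == line_b:
        return "identical"
    matcher = SequenceMatcher(None, line_a, line_b, autojunk=False)
    shared = sum(block.size for block in matcher.get_matching_blocks())
    # Similar if the shared part reaches the threshold of at least one of the two lines.
    for line in (line_a, line_b):
        if line and shared / len(line) >= threshold:
            return "similar"
    return "different"
```

`autojunk=False` is set so that frequently repeated characters in minified code are not silently ignored by the matcher's junk heuristic.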
The preset threshold can be set according to actual needs; the principle for setting it is similar to that for file-granularity similarity calculation and is not repeated here.
After the super webpage template file has been split into multiple line data files, pairwise delta operations are performed on them from front to back in the order in which the lines are organized in the file. These pairwise delta operations proceed in the same way as those performed on the resource files within each group described above, and are not described again. Whether two line data files are the same or similar is judged from the result of the delta operation; if they are, the loser is removed, and the resource data remaining after all losers have been removed forms the simplified super webpage template.
In another preferred embodiment, to further improve the effectiveness of the super webpage template, a block-granularity delta operation is also applied to the simplified super webpage template. The method specifically includes:
dividing each line of resource data with a large data volume in the generated simplified super webpage template into a plurality of data blocks;
and performing similarity calculation between each data block and the other lines of resource data, and removing any data block that is the same as or similar to another line of resource data.
Specifically, lines with a large data volume are further refined: a long line is divided into multiple data blocks of a specified number of bytes, where the block size can be set according to actual needs. Some of these blocks may be the same as or similar to other lines of resource data, and such blocks are removed. Performing block-granularity similarity operations on the simplified super webpage template removes similar data effectively and further reduces the data volume of the super webpage template.
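Block-granularity removal can be sketched as below. The parameters `long_line` and `block_size` are hypothetical stand-ins for "large data volume" and "the specified number of bytes" (the sketch counts characters), and substring containment stands in for "the same as or similar to". Each long line is compared with the already-processed earlier lines and the unprocessed later lines, so that a block shared by two long lines is not removed from both of them.

```python
from typing import List


def split_blocks(line: str, block_size: int) -> List[str]:
    """Cut a line into consecutive blocks of `block_size` characters (the last may be shorter)."""
    return [line[i:i + block_size] for i in range(0, len(line), block_size)]


def remove_redundant_blocks(lines: List[str], long_line: int, block_size: int) -> List[str]:
    """Drop blocks of long lines that already occur in some other line."""
    result: List[str] = []
    for idx, line in enumerate(lines):
        if len(line) <= long_line:
            result.append(line)      # short lines are left untouched
            continue
        # Earlier lines as already processed, later lines as still original.
        others = result + lines[idx + 1:]
        kept = [blk for blk in split_blocks(line, block_size)
                if not any(blk in other for other in others)]
        result.append("".join(kept))
    return result
```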
The preferred embodiments above can be combined freely to generate an optimized super webpage template.
An embodiment of the present invention further provides a device for generating a super webpage template. Referring to Fig. 3, the device for generating a super webpage template provided by the embodiment of the present invention includes:
an acquisition module, configured to acquire a plurality of resource files within a preset range;
a similarity identification and elimination module, configured to perform similarity calculation on every two of the plurality of resource files and identify pairs of resource files whose data are the same or similar, eliminating one of the two resource files when their data are the same, and, when their data are similar, that is, when one of the two resource files contains all or most of the content of the other, eliminating the resource file whose content is contained;
and a template generation module, configured to merge the retained resource files to generate the super webpage template.
The resource file is a WEB page data resource file; the preset range includes a preset WEB site, a preset path of a WEB site, or a preset resource keyword.
In a preferred embodiment, the similarity identification and elimination module further includes:
an interval grouping submodule, configured to group the acquired resource files into intervals according to the data volume they contain, the resource files whose data volume falls within the same interval forming one group;
and a similarity identification submodule, configured to perform similarity calculation and identification on every two resource files within each group.
In a preferred embodiment, the template generation module includes:
a merging submodule, configured to merge the retained resource files of the group with the largest interval value to generate a temporary webpage template;
and a similarity operation and generation submodule, configured to perform similarity operations between the temporary webpage template and the retained resource files of each other group, rejecting a resource file in a group when its data is the same as or similar to the data of the temporary webpage template and otherwise merging it into the temporary webpage template, and continuing in the same way with the resource files of the next group, the temporary webpage template finally obtained being the super webpage template.
In a preferred embodiment, acquiring the plurality of resource files within the preset range includes filtering out small resource files by means of a preset lower threshold on resource file size.
In a preferred embodiment, the device further includes a middle-section truncation module, configured to truncate the retained resource files according to different strategies under different conditions, keeping the middle section and removing the head and tail data.
In a preferred embodiment, the device further includes:
a line resource splitting and generation module, configured to split the generated super webpage template line by line and compare every two split lines of resource data in sequence from front to back, removing one of the two lines when their data are the same, and, when their data are similar, that is, when one line contains all or most of the content of the other, removing the line whose content is contained, so as to finally generate the simplified super webpage template.
In a preferred embodiment, the device further includes:
a block data splitting and removal module, configured to split each line of resource data with a large data volume in the generated simplified super webpage template into a plurality of data blocks, perform similarity calculation between each data block and the other lines of resource data, and remove any data block that is the same as or similar to another line of resource data.
For the specific functions and interactions of the functional modules in this embodiment, refer to the descriptions of the embodiments corresponding to Figs. 1 and 2; they are not repeated here.
The device for generating a super webpage template provided by the embodiment of the invention has the following beneficial effects. The acquisition module collects resource files within a preset range; the similarity identification and elimination module performs pairwise similarity operations on the collected resource files and identifies pairs with the same or similar data. When the data are the same, one of the two resource files is eliminated; when they are similar, that is, when one of the two resource files contains all or most of the content of the other, the resource file whose content is contained is eliminated. Data that differ from, or are dissimilar to, the other data are retained, and the remaining data finally form the super webpage template. When the server delivers the non-image WEB resource data of a related page, it performs a delta operation between that data and the super webpage template to obtain the non-image WEB resource data that differ from the data in the super webpage template, and sends only this differing data to the client. The client therefore needs less time to receive the non-image WEB resource data, which speeds up the server's response and improves the client experience.
In addition, an embodiment of the present invention further provides a page data transmission method. As shown in Fig. 4, the method includes:
S301: acquiring the current non-image WEB resource data of a page requested by a client;
S302: performing a delta operation on the non-image WEB resource data using a pre-established super webpage template corresponding to the page to obtain delta data, where the super webpage template is generated by the super webpage template generation method of the above embodiments or any combination thereof;
S303: sending the delta data to the client.
In the page data transmission method provided by the embodiment of the invention, the super webpage template is generated by the super webpage template generation method provided by the above embodiments. When the server receives a request from a client, it obtains the current non-image WEB resource data of the WEB page corresponding to the request, performs a delta operation on this data using the pre-established super webpage template corresponding to the WEB page, and obtains the delta data, that is, the difference between the non-image WEB resource data of the WEB page and the super webpage template. The server sends the delta data to the client, which already holds all the data of the super webpage template, and the client combines the delta data with the super webpage template to display the required WEB page to the user. Because the server sends only the delta data to the client, the volume of non-image WEB resource data sent to the client is greatly reduced, resources are delivered faster, the traffic consumed in sending the data is reduced, and the response time of page browsing is shortened.
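To illustrate S302 and the client-side reconstruction, the sketch below encodes the page as copy and add instructions against the super webpage template and rebuilds it on the client. The instruction format is a hypothetical, VCDIFF-like simplification, and Python's `difflib.SequenceMatcher` is swapped in for whatever delta algorithm a real server would use.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# Hypothetical instruction format: ("copy", offset, length) from the template, or ("add", text).
Instr = Union[Tuple[str, int, int], Tuple[str, str]]


def make_delta(template: str, page: str) -> List[Instr]:
    """Server side (S302): describe `page` in terms of the super webpage template."""
    delta: List[Instr] = []
    matcher = SequenceMatcher(None, template, page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))   # reuse data the client already holds
        elif tag in ("replace", "insert"):
            delta.append(("add", page[j1:j2]))    # send only the differing data
        # "delete": template data absent from the page needs no instruction.
    return delta


def apply_delta(template: str, delta: List[Instr]) -> str:
    """Client side: rebuild the page from the locally held template and the delta."""
    parts: List[str] = []
    for instr in delta:
        if instr[0] == "copy":
            _, offset, length = instr
            parts.append(template[offset:offset + length])
        else:
            parts.append(instr[1])
    return "".join(parts)
```

Only the `add` payloads and the small copy instructions travel over the network, which is where the reduction in transmitted non-image data comes from.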
The computer program product of the method and system provided by the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the methods described in the foregoing method embodiments. For the specific implementation, refer to the method embodiments; details are not repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding process in the foregoing method embodiment for the specific working process of the device described above, which is not repeated here.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.