CN111783006A - Page generation method and device, electronic equipment and computer readable medium - Google Patents

Page generation method and device, electronic equipment and computer readable medium

Info

Publication number
CN111783006A
Authority
CN
China
Prior art keywords
page
data
generating
file
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010713440.7A
Other languages
Chinese (zh)
Inventor
黄富华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN202010713440.7A priority Critical patent/CN111783006A/en
Publication of CN111783006A publication Critical patent/CN111783006A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1475 Passive attacks, e.g. eavesdropping or listening without modification of the traffic monitored

Abstract

The disclosure relates to a page generation method and device, electronic equipment and a computer readable medium, and belongs to the technical field of computers. The method comprises the following steps: acquiring page data required by generating a current page; randomly generating a data name of the page data, and generating a hypertext file of the page according to the page data; generating a corresponding page style file according to the data name of the page data; and introducing the page style file into a hypertext file of the page, and generating the page according to the hypertext file of the page after the page style file is introduced. According to the method and the device, the data name of the page data is randomly generated, so that a web crawler can be prevented from acquiring the data from the page, and the safety of the page is improved at a lower cost.

Description

Page generation method and device, electronic equipment and computer readable medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a page generation method, a page generation apparatus, an electronic device, and a computer-readable medium.
Background
With the development of the internet, websites have become carriers of large amounts of information. Many high-traffic, information-rich websites are targeted by web crawlers that automatically capture their data, which may adversely affect those websites.
Existing anti-crawler schemes mainly display important web page data as pictures or add verification codes to prevent web crawlers from acquiring the data, but these methods are costly and involve complicated steps.
In view of the above, there is a need in the art for a method for effectively preventing web crawlers from acquiring web page data at a low cost.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a page generation method, a page generation apparatus, an electronic device, and a computer-readable medium, which effectively prevent a web crawler from acquiring data in a web page at least to a certain extent at a low cost.
According to a first aspect of the present disclosure, there is provided a page generation method, including:
acquiring page data required by generating a current page;
randomly generating a data name of the page data, and generating a hypertext file of the page according to the page data;
generating a corresponding page style file according to the data name of the page data;
and introducing the page style file into a hypertext file of the page, and generating the page according to the hypertext file of the page after the page style file is introduced.
In an exemplary embodiment of the present disclosure, the method further comprises:
randomly inserting interference tags into the hypertext document, wherein the interference tags are tags which are irrelevant to the generation of the page.
In an exemplary embodiment of the present disclosure, the hypertext document includes a document header, and the randomly inserting an interference tag in the hypertext document includes:
and randomly inserting an interference tag in a file header of the hypertext file.
In an exemplary embodiment of the present disclosure, the randomly generating the data name of the page data includes:
determining key data in the page data according to the data content of the page data;
and randomly generating the data name of the key data in the page data.
In an exemplary embodiment of the present disclosure, the randomly generating the data name of the page data includes:
and randomly generating a plurality of character strings with unique identification codes, and respectively using the character strings as data names of various groups of data in the page data.
In an exemplary embodiment of the present disclosure, the randomly generating the data name of the page data includes:
randomly generating a plurality of random character strings, and respectively combining a fixed character string with the plurality of random character strings to obtain the original data name of each group of data in the page data;
and respectively converting the original data name into a plurality of hash character strings through a hash algorithm, and respectively using the plurality of hash character strings as the data names of each group of data in the page data.
In an exemplary embodiment of the present disclosure, the method further comprises:
acquiring a length threshold of the data name;
and limiting the character string length of the data name according to the length threshold of the data name.
In an exemplary embodiment of the present disclosure, the introducing the page style file into the hypertext document of the page includes:
acquiring a file address of the page style file;
and introducing the page style file into the hypertext file of the page according to the file address of the page style file.
According to a second aspect of the present disclosure, there is provided a page generation apparatus, including:
the page data acquisition module is used for acquiring page data required by generating a current page;
the data name generation module is used for randomly generating a data name of the page data and generating a hypertext file of the page according to the page data;
the page file generation module is used for generating a corresponding page style file according to the data name of the page data;
and the page generation module is used for introducing the page style file into the hypertext file of the page and generating the page according to the hypertext file of the page after the page style file is introduced.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method for generating a page of any one of the above via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method of generating a page of any of the above.
The exemplary embodiments of the present disclosure may have the following advantageous effects:
In the page generation method according to the exemplary embodiment of the disclosure, the data name of the page data is randomly generated in the process of generating the page, so that a different randomly generated character string can be used as the name or identifier of the data each time the page is accessed. This effectively prevents a web crawler from capturing the data in the page according to its data name. The method not only prevents crawlers with a high success rate and efficiency, improving the security of the website to a certain extent, but can also greatly reduce the cost of designing anti-crawler measures for the page.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 is a diagram illustrating an exemplary system architecture of a page generation method and apparatus to which embodiments of the present invention may be applied;
FIG. 2 shows a flowchart diagram of a method of generating a page of an example embodiment of the present disclosure;
FIG. 3 illustrates a flow diagram for randomly generating a data name through a hashing algorithm according to an example embodiment of the present disclosure;
FIG. 4 illustrates a flow diagram for length-defining data names according to an example embodiment of the present disclosure;
FIG. 5 illustrates a flowchart of importing a page style file according to an example embodiment of the present disclosure;
FIG. 6 shows a block diagram of a page generation apparatus of an example embodiment of the present disclosure;
FIG. 7 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which a page generation method and apparatus according to an embodiment of the present invention may be applied.
As shown in fig. 1, the system architecture 100 may include a plurality of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wireless communication links and the like.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The terminal devices 101, 102, 103 may be various electronic devices having a processor, including, but not limited to, smart phones, tablet computers, portable computers, and the like. The server 105 may be a server that provides various services. For example, the terminal devices 101, 102, and 103 may acquire, by the processor, page data required to generate a current page, and upload the page data to the server 105. The server 105 may randomly generate a data name of the page data and generate a hypertext file of the page according to the page data; generate a corresponding page style file according to the data name of the page data; and introduce the page style file into the hypertext file of the page and generate the page according to the hypertext file of the page.
A web crawler (also called web spider, web robot) is a program or script that automatically captures web page information according to certain rules.
In some related embodiments, the web crawler may be prevented from crawling the data information in the page by:
1. Identifying the accessing party: judging whether the current access comes from a user or a machine crawler, and then returning different content according to the identification result.
2. Displaying some important data in the web page as pictures, which makes it difficult for a web crawler to crawl and use the data.
3. Judging whether the same IP (Internet Protocol) address sends multiple requests within a certain time period, and setting a threshold to control the access frequency of that address.
4. Adding a verification code.
However, the above method has some problems as follows:
1. For the method of judging whether the access comes from a user or a crawler: many crawlers access web pages by opening a headless browser and then crawl the data. For such access, the website cannot determine whether it is being accessed by a user or a machine, because the crawler actually opens the web page before crawling the data.
2. Displaying the text data of a web page as pictures is costly. If the web page content includes text and data whose style needs to be modified, a new picture must be regenerated for every style change.
3. Controlling the access frequency of an IP address can only identify the address as a web crawler and limit its access frequency once its request volume within a time period reaches the threshold; a crawler whose request volume stays below the threshold cannot be identified.
4. Adding a verification code requires a verification interface every time a user enters the web page, which is costly and adds operation steps for the user, causing inconvenience.
It can be seen that the methods for preventing web crawlers in the above embodiments are costly, or cannot fully prevent web crawlers, among other problems.
Therefore, in view of the above problems, the present exemplary embodiment first provides a page generation method capable of effectively preventing web crawlers while reducing cost. Referring to fig. 2, the page generation method may include the following steps:
S210, acquiring page data required for generating the current page.
S220, randomly generating a data name of the page data, and generating a hypertext file of the page according to the page data.
S230, generating a corresponding page style file according to the data name of the page data.
S240, introducing the page style file into the hypertext file of the page, and generating the page according to the hypertext file of the page after the page style file is introduced.
A web crawler generally obtains the HTML (HyperText Markup Language) file of a web page, parses the data in the HTML file, finds the data name of the data (such as a data ID or a data class name), and then obtains the data corresponding to that data name. Therefore, in the present exemplary embodiment, the data name of the page data may be randomly generated while the page is generated, so that the data name of the page data is different every time the page is opened. Thus, even if the crawler acquires the HTML file of the web page, it cannot locate the corresponding page data by a fixed data name.
For example, in the HTML file of a game ranking list website, the class name of the ranking list is "rank-list" and the class name of the ranking list data is "rank-item". A crawler looks up the elements with the class names "rank-list" and "rank-item" and parses their content, thereby obtaining the ranking list information. According to the method in the present exemplary embodiment, after the page acquires the ranking list data, the HTML structure of the page is rendered and the class names of the data are randomly generated. The class names in the resulting HTML structure are no longer the fixed "rank-list" and "rank-item", but irregular, randomly generated character strings such as "abc" or "def", which prevents the web crawler from acquiring the corresponding data by fixed data names.
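The ranking list example can be illustrated with a minimal Python sketch. It is only one possible reading of steps S210 to S240, not the implementation of the disclosure: the function names `random_class_name` and `render_rank_page`, the name length of 8, and the style declarations are all assumptions made for illustration.

```python
import html
import secrets
import string


def random_class_name(length: int = 8) -> str:
    """Return a random name that starts with a letter, so it is a valid CSS class."""
    first = secrets.choice(string.ascii_lowercase)
    rest = "".join(
        secrets.choice(string.ascii_lowercase + string.digits)
        for _ in range(length - 1)
    )
    return first + rest


def render_rank_page(items: list[str]) -> tuple[str, str]:
    """Render the HTML and the matching CSS with class names generated per call."""
    # These two names stand in for the fixed "rank-list" and "rank-item".
    list_cls = random_class_name()
    item_cls = random_class_name()
    rows = "\n".join(
        f'    <li class="{item_cls}">{html.escape(item)}</li>' for item in items
    )
    page = (
        "<!DOCTYPE html>\n<html>\n<head><title>Ranking</title></head>\n<body>\n"
        f'  <ul class="{list_cls}">\n{rows}\n  </ul>\n</body>\n</html>'
    )
    # Illustrative declarations only; the real styles are not given in the text.
    css = (
        f".{list_cls} {{ list-style: none; }}\n"
        f".{item_cls} {{ font-weight: bold; }}\n"
    )
    return page, css
```

Calling `render_rank_page(["Alice", "Bob"])` twice yields two pages with the same visible content but different class names, so a selector recorded from one response would not match the next.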
In the page generation method according to the exemplary embodiment, the data name of the page data is randomly generated in the process of generating the page, so that a different randomly generated character string can be used as the name or identifier of the data each time the page is accessed. This effectively prevents a web crawler from capturing the data in the page according to its data name. The method not only prevents crawlers with a high success rate and efficiency, improving the security of the website to a certain extent, but can also greatly reduce the cost of designing anti-crawler measures for the page.
The above steps of the present exemplary embodiment will be described in more detail with reference to fig. 3 to 6.
In step S210, page data necessary for generating the current page is acquired.
In this exemplary embodiment, before generating the page, the server needs to request the page data from the back end, and performs the subsequent steps according to the page data returned by the back end.
In step S220, a data name of the page data is randomly generated, and a hypertext document of the page is generated from the page data.
In this exemplary embodiment, the data names of all the page data may be randomly generated, or the data names of the key data in the page data may be randomly generated after determining the key data in the page data according to the data content of the page data. The key data refers to important data in the page, for example, the ranking list data in the ranking list page.
A hypertext file of a page refers to the HTML file of the page. HTML is a markup language comprising a series of tags that unify the format of documents on a network and link discrete network resources into a logical whole. HTML markup can describe text, graphics, animations, sounds, tables, links, and so on. The information to be expressed is written into an HTML file according to certain rules; a browser parses the HTML file and renders it as the web page that the user sees. In essence, therefore, a web page is a hypertext markup language document.
In this exemplary embodiment, the data name of each group of data or the data name of the key data in the page data may be obtained by randomly generating a plurality of character strings having unique identification codes.
In this exemplary embodiment, a plurality of character strings having unique identifiers may be obtained by generating UUIDs (Universally Unique Identifiers). A UUID gives every data name unique identifying information, so that each piece of data can be assigned a UUID that does not conflict with that of other data. In this case, duplication of data names need not be considered. A time-based UUID is calculated from the current time, a counter, and a hardware identifier (typically the MAC address of a network card). The probability of generating duplicate UUIDs and causing errors is very low, so this problem can be ignored.
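A short Python sketch of UUID-based data names follows. Note the substitutions: it uses the random-based `uuid.uuid4()` from the standard library rather than the time-based variant described above, and it adds an "n" prefix, which is an assumption made here because a CSS class name should not begin with a digit. The function names are hypothetical.

```python
import uuid


def uuid_data_name() -> str:
    """Return a data name built from a random (version 4) UUID."""
    # The "n" prefix is an illustrative choice: the 32 hex digits of a UUID
    # may start with 0-9, which is not valid at the start of a CSS class name.
    return "n" + uuid.uuid4().hex


def uuid_names_for(groups: list[str]) -> dict[str, str]:
    """Map each logical group of page data to its own UUID-based data name."""
    return {group: uuid_data_name() for group in groups}
```

For instance, `uuid_names_for(["rank-list", "rank-item"])` returns a fresh mapping on every call, so no bookkeeping is needed to keep the names distinct.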
In this exemplary embodiment, the data name of each group of data in the page data or the data name of the key data may also be generated by a hash function. As shown in fig. 3, the method for randomly generating a data name through a hash function may specifically include the following steps:
S310, randomly generating a plurality of random character strings, and combining a fixed character string with each of the random character strings to obtain the original data name of each group of data in the page data.
In this exemplary embodiment, a random character string or a random number may be combined with a fixed character string to obtain the original data name of each group of data.
S320, converting the original data names into a plurality of hash character strings through a hash algorithm, and using the hash character strings as the data names of the respective groups of data in the page data.
A hash algorithm uses a hash function to transform an input of arbitrary length into an output of fixed length; this output is the hash value.
In this exemplary embodiment, the original data name of each group of data may be converted into a hash value, which may be a character string, by a hash algorithm, and the obtained hash value is used as the data name of each group of data in the page data.
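Steps S310 and S320 might look like the following Python sketch. The text does not name a hash algorithm, so SHA-256 is an assumption, as are the 8-byte random part, the "-" separator, the "h" prefix (added so the result can serve as a CSS class name), and the function name `hashed_data_name`.

```python
import hashlib
import secrets


def hashed_data_name(fixed: str) -> str:
    """Combine a fixed string with a random string and hash the result."""
    # S310: random string combined with the fixed string gives the original name.
    random_part = secrets.token_hex(8)
    original_name = f"{fixed}-{random_part}"
    # S320: the hash value of the original name becomes the data name.
    digest = hashlib.sha256(original_name.encode("utf-8")).hexdigest()
    # Hypothetical prefix so the name never starts with a digit.
    return "h" + digest
```

Because the random part changes on every call, `hashed_data_name("rank-item")` returns a different 65-character name each time, and the fixed string itself does not appear in the output.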
In addition to the above two methods, there are many other methods for randomly generating data names. The present exemplary embodiment is not particularly limited in this respect, as long as the character strings generated each time differ from one another as far as possible.
After the data name of the page data is randomly generated by the above methods, the length of the data name can be limited by the following steps, as shown in fig. 4:
S410, acquiring a length threshold of the data name.
S420, limiting the character string length of the data name according to the length threshold of the data name.
In the present exemplary embodiment, a randomly generated character string may be long, which would make the HTML file large, so the length of the character string used as the data name needs to be limited. A preset length threshold of the data name is acquired, a part of the corresponding length is intercepted from the character string according to the length threshold, and the intercepted character string is used as the data name of each group of data in the page data.
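The length limit of steps S410 and S420 can be sketched in Python as a simple truncation. The collision check is an addition that the text does not describe: shortening names raises the chance that two of them become identical, so this hypothetical `limit_names` helper refuses such a result instead of silently returning it.

```python
def limit_names(names: dict[str, str], threshold: int) -> dict[str, str]:
    """Truncate every data name to at most `threshold` characters."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    # S420: keep only the leading part of each name.
    limited = {key: value[:threshold] for key, value in names.items()}
    # Extra safeguard (an assumption, not part of the described steps).
    if len(set(limited.values())) != len(limited):
        raise ValueError("truncated names collide; regenerate or raise the threshold")
    return limited
```

With a threshold of 6, the names "habcdef123456" and "h9876543210ff" become "habcde" and "h98765".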
In step S230, a corresponding page style file is generated according to the data name of the page data.
The page style file refers to a CSS style file. CSS (Cascading Style Sheets) is a computer language used to express the style of HTML documents. It provides style descriptions for the HTML markup language, defines how elements are displayed (fonts, colors, positions, and so on), and can be used to describe the formatting and display of information on web pages. CSS styles may be stored directly in the HTML web page or in a separate style sheet file. When stored externally, they are placed in an external style sheet file with the file extension .css.
In the present exemplary embodiment, after dynamically generating a random data name and obtaining an HTML file of a page, a CSS style file corresponding to the data name needs to be generated and introduced.
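One way to generate the CSS style file from the random data names is to keep the style rules keyed by stable logical names on the server and substitute the generated names when the file is written. The Python sketch below assumes this design; the `STYLE_RULES` table, its declarations, and the function name `build_css` are hypothetical.

```python
# Style declarations keyed by logical name; these never reach the browser.
STYLE_RULES = {
    "rank-list": "list-style: none; padding: 0;",
    "rank-item": "font-weight: bold;",
}


def build_css(names: dict[str, str]) -> str:
    """Emit one CSS rule per logical name, using the generated data name as selector."""
    lines = []
    for logical, declarations in STYLE_RULES.items():
        lines.append(f".{names[logical]} {{ {declarations} }}")
    return "\n".join(lines) + "\n"
```

Passing the mapping `{"rank-list": "abc", "rank-item": "def"}` produces the rules `.abc { list-style: none; padding: 0; }` and `.def { font-weight: bold; }`, which match the class names written into the HTML for the same request.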
In step S240, the page style file is imported into the hypertext file of the page, and the page is generated according to the hypertext file of the page after the page style file is imported.
In this example embodiment, the page may include a static page and a server-rendered page.
If the page is rendered by the server, after the random data name is generated, the corresponding CSS style file is generated according to the data name, and then the CSS style file is introduced into the HTML file.
If the page is a static page, as shown in fig. 5, introducing the page style file into the hypertext file of the page may specifically include the following steps:
S510, acquiring a file address of the page style file.
S520, introducing the page style file into the hypertext file of the page according to the file address of the page style file.
For a static page, a new service needs to be established on the server side to receive the generated random data names and then generate the corresponding CSS style file according to those data names. The static page first acquires the file address of the CSS style file, and then introduces the CSS style file into the HTML file according to that file address.
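Steps S510 and S520 can be sketched in Python as two small helpers: one that stores the generated CSS and returns its address, and one that references that address from the HTML head. The "/styles/" URL prefix, the output directory, and both function names are assumptions; a real service would also need to serve the stored file over HTTP.

```python
import html
from pathlib import Path


def publish_css(css_text: str, out_dir: Path, file_stem: str) -> str:
    """Write the CSS file and return the address the page should reference."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{file_stem}.css"
    path.write_text(css_text, encoding="utf-8")
    # Hypothetical URL prefix under which out_dir is assumed to be served.
    return f"/styles/{path.name}"


def link_stylesheet(html_text: str, href: str) -> str:
    """S520: introduce the style file by inserting a link tag before </head>."""
    if "</head>" not in html_text:
        raise ValueError("HTML has no closing head tag")
    tag = f'<link rel="stylesheet" href="{html.escape(href, quote=True)}">'
    return html_text.replace("</head>", f"{tag}</head>", 1)
```

A caller would first obtain the address with `publish_css(...)` and then pass it to `link_stylesheet(...)`, mirroring the order of S510 and S520.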
In addition to the above steps, the method for generating a page provided in this exemplary embodiment may further include:
randomly inserting interference tags into the hypertext file, wherein the interference tags are tags that are irrelevant to the generation of the page.
A complete HTML file includes at least html, head, title, and body tags. These tags appear in pairs, and content is added between each opening tag and its closing tag. After a crawler acquires the HTML file of a web page, it needs to know the layout of the HTML. Therefore, some interference tags can be dynamically inserted. The interference tags do not affect the display of the page but disturb its HTML structure, so that the layout structure of the HTML changes randomly on every access and the crawler is prevented from acquiring data by relying on an unchanging HTML structure.
Further, an interference tag may be randomly inserted in a header of the hypertext document.
In general, when a crawler acquires tags in an HTML structure, the crawler acquires the tags from top to bottom, that is, the head tags are acquired first, and then other tags are acquired. Therefore, interference tags can be preferentially and randomly inserted into the head of the hypertext file, and a crawler can be effectively prevented from acquiring the HTML structure of the page.
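A possible form of interference tag insertion is sketched below in Python. The text does not say which tags are used, so the choice of `meta` elements with random names (which browsers do not display) is an assumption, as are the function name `insert_decoy_tags`, the `max_tags` parameter, and the simplification of inserting only at the start and end of the head.

```python
import random
import secrets


def _decoy_tag() -> str:
    # A meta tag with a random, unrecognised name has no visible effect.
    return f'<meta name="x-{secrets.token_hex(4)}" content="{secrets.token_hex(4)}">'


def insert_decoy_tags(html_text: str, max_tags: int = 4) -> str:
    """Insert a random number of decoy tags at the start and end of the head."""
    start = html_text.index("<head>") + len("<head>")
    end = html_text.index("</head>")
    before = "".join(_decoy_tag() for _ in range(random.randint(1, max_tags)))
    after = "".join(_decoy_tag() for _ in range(random.randint(0, max_tags)))
    return html_text[:start] + before + html_text[start:end] + after + html_text[end:]
```

Each call changes how many elements the head contains and where the original ones sit among them, while the body of the page is left untouched.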
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Furthermore, the disclosure also provides a device for generating the page. Referring to fig. 6, the page generation means may include a page data acquisition module 610, a data name generation module 620, a page file generation module 630, and a page generation module 640. Wherein:
the page data obtaining module 610 may be configured to obtain page data required for generating a current page;
the data name generating module 620 may be configured to randomly generate a data name of the page data, and generate a hypertext document of the page according to the page data;
the page file generating module 630 may be configured to generate a corresponding page style file according to a data name of the page data;
the page generation module 640 may be configured to introduce the page style file into the hypertext file of the page and generate the page according to the hypertext file of the page after the page style file is introduced.
In some exemplary embodiments of the present disclosure, a device for generating a page provided by the present disclosure may further include an interference tag insertion module, which may be configured to randomly insert an interference tag in a hypertext document, where the interference tag is a tag unrelated to the generation of the page.
In some exemplary embodiments of the present disclosure, the interference tag inserting module may include a header tag inserting unit that may be used to randomly insert the interference tag in a file header of the hypertext file.
In some exemplary embodiments of the present disclosure, the data name generation module 620 may include a key data determination unit and a data name generation unit. Wherein:
the key data determining unit may be configured to determine key data in the page data according to data content of the page data;
the data name generating unit may be configured to randomly generate data names of key data in the page data.
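The division of labor between the two units can be sketched in Python as follows. The disclosure does not state here how key data is recognized from its content, so the numeric-content rule below is purely a hypothetical stand-in, as are the function names and the sample fields.

```python
import re
import secrets


def looks_like_key_data(text):
    # Hypothetical rule: treat purely numeric content (e.g. a price) as key data.
    return re.fullmatch(r"\d+(\.\d+)?", text) is not None


def name_key_data(page_data, is_key=looks_like_key_data):
    """Give random names to key data only; other data keeps its plain name."""
    names = {}
    for field, text in page_data.items():
        if is_key(text):
            names[field] = "k" + secrets.token_hex(4)
        else:
            names[field] = field
    return names
```

For example, `name_key_data({"title": "Hello", "price": "42.00"})` leaves `title` unchanged and replaces only the name for `price`.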
In some exemplary embodiments of the present disclosure, the data name generating module 620 may further include a unique identification string generating unit, which may be configured to randomly generate a plurality of character strings with unique identification codes, and use the plurality of character strings as data names of respective sets of data in the page data, respectively.
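One way to obtain such character strings is sketched below in Python, under the assumption that a version 4 UUID is an acceptable form of "unique identification code"; the disclosure does not name a specific scheme, and the function name and the `u` prefix are hypothetical.

```python
import uuid


def unique_data_names(groups):
    """Map each group of page data to a random, practically unique name."""
    # The "u" prefix keeps the name a valid CSS identifier even when the
    # hexadecimal UUID begins with a digit.
    return {group: "u" + uuid.uuid4().hex for group in groups}
```

Each name is 33 characters long (the prefix plus 32 hexadecimal digits), which is one reason a length limit on data names may be useful in practice.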
In some exemplary embodiments of the present disclosure, the data name generating module 620 may further include an original name determining unit and a data name converting unit. Wherein:
the original name determining unit may be configured to randomly generate a plurality of random character strings, and combine a fixed character string with each of the random character strings to obtain the original data name of each group of data in the page data;
the data name conversion unit may be configured to convert the original data names into a plurality of hash strings through a hash algorithm, and use the hash strings as the data names of the respective groups of data in the page data.
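The two units above can be sketched in Python as follows. SHA-256 is used only as an example of a hash algorithm, since the disclosure does not specify one here; the fixed string `page-data`, the `h` prefix, and the function name are likewise hypothetical.

```python
import hashlib
import secrets


def hashed_data_names(groups, fixed="page-data"):
    """Derive a hashed data name for each group of page data."""
    names = {}
    for group in groups:
        # Original data name: fixed character string + random character string.
        original = fixed + secrets.token_hex(8)
        # Convert the original name into a hash string (SHA-256 as an example).
        digest = hashlib.sha256(original.encode("utf-8")).hexdigest()
        # Prefix with a letter so the result is a valid CSS identifier.
        names[group] = "h" + digest
    return names
```

Hashing hides the fixed part of the original name, so the resulting data names do not share a recognizable prefix or suffix.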
In some exemplary embodiments of the present disclosure, the data name generating module 620 may further include a length threshold acquiring unit and a data name limiting unit. Wherein:
the length threshold acquiring unit may be configured to acquire a length threshold of the data name;
the data name limiting unit may be configured to limit the string length of the data name according to the length threshold of the data name.
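A minimal Python sketch of applying the length threshold is shown below. It assumes that limiting the length means truncating each name to at most the threshold, which the disclosure does not state explicitly; the uniqueness check is an added precaution, and the function name is hypothetical.

```python
def limit_name_lengths(names, threshold):
    """Truncate every data name to at most `threshold` characters."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    limited = {group: name[:threshold] for group, name in names.items()}
    # Truncation can make two names identical, which would merge their styles.
    if len(set(limited.values())) != len(limited):
        raise ValueError("truncated names collide; regenerate the names")
    return limited
```

A shorter threshold keeps the generated files compact but raises the chance of a collision, so a caller would typically regenerate the names when the check fails.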
In some exemplary embodiments of the present disclosure, the page generation module 640 may include a file address obtaining unit and a page file importing unit. Wherein:
the file address obtaining unit may be configured to obtain a file address of the page style file;
the page file importing unit may be configured to import the page style file into the hypertext file of the page according to a file address of the page style file.
The specific details of each module/unit in the page generating device have been described in detail in the corresponding method embodiment section, and are not described herein again.
FIG. 7 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
It should be noted that the computer system 700 of the electronic device shown in FIG. 7 is only an example, and should not impose any limitation on the functions or the scope of use of the embodiments of the present invention.
As shown in FIG. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for system operation. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When executed by the Central Processing Unit (CPU) 701, the computer program performs the various functions defined in the system of the present application.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more of the modules described above may be embodied in a single module. Conversely, the features and functions of one module described above may be further divided so as to be embodied by a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A method for generating a page, comprising:
acquiring page data required by generating a current page;
randomly generating a data name of the page data, and generating a hypertext file of the page according to the page data;
generating a corresponding page style file according to the data name of the page data;
and introducing the page style file into a hypertext file of the page, and generating the page according to the hypertext file of the page after the page style file is introduced.
2. The method for generating a page according to claim 1, wherein the method further comprises:
randomly inserting interference tags into the hypertext document, wherein the interference tags are tags which are irrelevant to the generation of the page.
3. The method for generating pages according to claim 2, wherein said hypertext document comprises a document header, and said randomly inserting an interference tag in said hypertext document comprises:
and randomly inserting an interference tag in a file header of the hypertext file.
4. The method for generating a page according to claim 1, wherein the randomly generating a data name of the page data includes:
determining key data in the page data according to the data content of the page data;
and randomly generating the data name of the key data in the page data.
5. The method for generating a page according to claim 1, wherein the randomly generating a data name of the page data includes:
and randomly generating a plurality of character strings with unique identification codes, and respectively using the character strings as data names of various groups of data in the page data.
6. The method for generating a page according to claim 1, wherein the randomly generating a data name of the page data includes:
randomly generating a plurality of random character strings, and respectively combining a fixed character string with the plurality of random character strings to obtain the original data name of each group of data in the page data;
and respectively converting the original data name into a plurality of hash character strings through a hash algorithm, and respectively using the plurality of hash character strings as the data names of each group of data in the page data.
7. The method for generating a page according to claim 5 or 6, characterized in that it further comprises:
acquiring a length threshold of the data name;
and limiting the character string length of the data name according to the length threshold of the data name.
8. The method for generating a page as claimed in claim 1, wherein said page comprises a static page, and said introducing said page style file into a hypertext document of said page comprises:
acquiring a file address of the page style file;
and introducing the page style file into the hypertext file of the page according to the file address of the page style file.
9. An apparatus for generating a page, comprising:
the page data acquisition module is used for acquiring page data required by generating a current page;
the data name generation module is used for randomly generating a data name of the page data and generating a hypertext file of the page according to the page data;
the page file generation module is used for generating a corresponding page style file according to the data name of the page data;
and the page generation module is used for introducing the page style file into the hypertext file of the page and generating the page according to the hypertext file of the page after the page style file is introduced.
10. An electronic device, comprising:
a processor; and
memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a method of generating a page as claimed in any one of claims 1 to 8.
11. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method of generating a page according to any one of claims 1 to 8.
CN202010713440.7A 2020-07-22 2020-07-22 Page generation method and device, electronic equipment and computer readable medium Pending CN111783006A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010713440.7A CN111783006A (en) 2020-07-22 2020-07-22 Page generation method and device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN111783006A (en) 2020-10-16

Family

ID=72763825

Country Status (1)

Country Link
CN (1) CN111783006A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022111591A1 (en) * 2020-11-26 2022-06-02 北京有竹居网络技术有限公司 Page generation method and apparatus, storage medium, and electronic device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2340298A1 (en) * 2000-03-09 2001-09-09 Ethereal Minds, Inc. System and method for dynamically managing web content using a browser-independent framework
US20050091583A1 (en) * 2000-10-30 2005-04-28 Microsoft Corporation String template pages for generating HTML document
CN106960158A (en) * 2017-03-22 2017-07-18 福建中金在线信息科技有限公司 A kind of method and apparatus for preventing blog from being retrieved by web crawlers
CN107341160A (en) * 2016-05-03 2017-11-10 北京京东尚科信息技术有限公司 A kind of method and device for intercepting reptile
CN110442815A (en) * 2019-06-24 2019-11-12 北京奇艺世纪科技有限公司 Page generation method, system, device and computer readable storage medium
CN111212033A (en) * 2019-12-16 2020-05-29 北京淇瑀信息科技有限公司 Page display method and device based on combined web crawler defense technology and electronic equipment
CN111339548A (en) * 2018-12-18 2020-06-26 北京京东尚科信息技术有限公司 Anti-crawler data processing method, browser, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination