CN114254219A - Data acquisition method and device, computer storage medium and electronic equipment - Google Patents

Publication number
CN114254219A
Authority
CN
China
Prior art keywords
browser
random
data
characteristic value
acquired
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111558466.XA
Other languages
Chinese (zh)
Inventor
陈祖德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jindi Technology Co Ltd
Original Assignee
Beijing Jindi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jindi Technology Co Ltd
Priority to CN202111558466.XA
Publication of CN114254219A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9562 Bookmark management

Abstract

An embodiment of the invention provides a data acquisition method and device, a computer storage medium, and electronic equipment. The data acquisition method comprises the following steps: starting a browser; acquiring a random characteristic value; modifying the characteristic value of the browser according to the acquired random characteristic value; controlling the browser to access a webpage corresponding to the data to be acquired; triggering a tag on the webpage to request access to a detail page; and acquiring the detail page data. With this method, the server regards the access requests as coming from different users, so that the data can be acquired effectively.

Description

Data acquisition method and device, computer storage medium and electronic equipment
Technical Field
The invention relates to the technical field of data processing, in particular to a data acquisition method and device, a computer storage medium and electronic equipment.
Background
With the arrival of the big data era and the continuous development of computer technology, enterprises have a very high demand for data. The amount of information on the network keeps growing, and with the rapid development of intelligent and automated technology, many ways of quickly acquiring webpage data have emerged. However, as illegitimate acquisition methods become more common, data often cannot be acquired simply by sending network requests, so how to acquire data effectively has become a significant problem.
Disclosure of Invention
Embodiments of the present invention provide a data acquisition method and apparatus, a computer storage medium, and an electronic device, so as to overcome or alleviate the above technical problems in the prior art.
According to an aspect of the present invention, there is provided a data acquisition method, the method including:
starting a browser;
acquiring a random characteristic value;
modifying the characteristic value of the browser according to the acquired random characteristic value;
accessing a webpage corresponding to the data to be acquired through the browser;
triggering a tag on the webpage to request access to a detail page; and
acquiring the data to be acquired in the detail page.
Optionally, the method further comprises:
acquiring a browser-specific object; and
generating a random characteristic value according to the browser-specific object, and storing the random characteristic value.
Optionally, the random feature value is in an assignment format of the browser-specific object.
Optionally, the method further comprises configuring the runtime environment, wherein configuring the runtime environment comprises installing a browser, a first script, and a browser control tool.
Optionally, the browser is started through the browser control tool, wherein the browser is Chrome and the browser control tool is the Python-based Pyppeteer.
Optionally, triggering the tag on the webpage to request access to the detail page comprises: controlling the first script to simulate and generate user behavior characteristics, wherein the user behavior characteristics include one or more of:
mouse click, mouse slide, and mouse scroll bar slide.
Optionally, the method further comprises: repeating the steps from acquiring the random characteristic value through acquiring the detail page data at intervals of a preset duration or a random duration.
According to another aspect of the present invention, there is provided a data acquisition apparatus including:
a starting unit for starting a browser;
a random characteristic value acquisition unit for acquiring a random characteristic value;
a browser characteristic value modifying unit for modifying the characteristic value of the browser according to the acquired random characteristic value;
an access unit for accessing, through the browser, a webpage corresponding to the data to be acquired;
a triggering unit for triggering a tag on the webpage to request access to a detail page; and
a data acquisition unit for acquiring the data to be acquired in the detail page.
According to yet another aspect of the invention, there is provided a computer storage medium having stored thereon a computer-executable program that is executed to implement the method of any one of the embodiments of the invention.
According to yet another aspect of the invention, there is provided an electronic device comprising a memory for storing thereon a computer-executable program and a processor for executing the computer-executable program to implement a method according to any of the embodiments of the invention.
According to the embodiments of the invention, the characteristic value of the browser is modified according to the acquired random characteristic value, so that the server regards the access requests as coming from different users and does not mistake them for illegitimate access, and data can therefore be acquired effectively.
Furthermore, user behavior characteristics are generated by simulation to trigger the tags on the webpage, so that the server regards the activity as a user's normal webpage access requests and browsing operations, and data can be acquired effectively.
Drawings
Fig. 1 is a schematic flow chart of a data acquisition method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a data acquisition apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
According to the embodiment of the invention, different browser characteristics and behavior operations of the user on the browser are simulated and added into the network request, so that correct data are obtained.
Fig. 1 is a schematic flow chart of a data acquisition method according to an embodiment of the present invention; as shown in fig. 1, specifically, the data acquisition method 10 includes:
Step 11: start the browser.
In this embodiment, a browser control tool is used to open the browser.
In this embodiment, the browser may be the Chrome browser. Chrome is a web browser developed by Google, based on the WebKit rendering engine derived from KHTML, whose source code is released under several free licenses such as the BSD license. It is fast, stable, and secure. In some operating environments, Chrome does not need to be installed and can be run directly after downloading and decompressing a packaged version. It will be appreciated that in other operating environments, such as a Windows system, an installer version may also be downloaded and run after installation. It should be noted that the above description is only exemplary, and the present embodiment is not limited thereto.
In this embodiment, the browser control tool is the Python-based Pyppeteer. Puppeteer is a tool developed by Google on Node.js that can control operations of the Chrome browser through JavaScript. Pyppeteer is a browser control tool written in Python and based on Puppeteer. It will be appreciated that in other embodiments Selenium may be used to control the browser, and the programming language need not be Python; any suitable language, such as Java or Node.js, may be used.
In some embodiments, step 11 further includes configuring the operating environment if it has not been configured. Configuring the operating environment includes installing the browser, the first script, and the browser control tool. The first script may be a JavaScript script.
Step 12: acquire a random characteristic value.
In some embodiments, random characteristic values are pre-stored in a storage device. The storage device pre-stores a plurality of random characteristic values, one or more of which can be acquired either in a predetermined sequence or at random.
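The two acquisition modes described above (predetermined sequence or random) can be sketched as follows. This is only an illustration: the in-memory list `STORED_VALUES` stands in for the storage device, and its contents and the helper names are hypothetical rather than taken from the patent.

```python
import itertools
import random

# Hypothetical pre-stored pool standing in for the storage device.
STORED_VALUES = [
    {"languages": ["zh-CN", "zh"]},
    {"languages": ["zh-CN", "zh", "zh-TW", "en-US"]},
    {"languages": ["en-US", "en"]},
]


def sequential_source(values):
    """Return an iterator that yields stored values in order, wrapping around."""
    return itertools.cycle(values)


def random_pick(values, rng=random):
    """Return one stored value chosen uniformly at random."""
    return rng.choice(values)


source = sequential_source(STORED_VALUES)
first = next(source)    # first stored value
second = next(source)   # second stored value
picked = random_pick(STORED_VALUES, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the random mode reproducible during testing; in normal use the default module-level generator would be enough.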
In some embodiments, before the random feature value is acquired, the method further includes: acquiring a browser-specific object; and generating a random feature value according to the browser-specific object and storing it.
In this embodiment, the browser-specific object may include, but is not limited to, properties of the browser's window.navigator object. navigator is a property of the Window object and points to an object containing information about the browser. The value of window.navigator.userAgent is a special header string from which a server can identify information such as the operating system and version, CPU type, and browser and version used by the client. The value of window.navigator.language represents the browser language, and the value of window.navigator.plugins represents the plug-ins installed in the browser. It will be appreciated that other browser-specific objects, such as a window.navigator display object, may also be included in some embodiments.
It is understood that step 11 and step 12 are not restricted to a particular order; they may be performed simultaneously or in reverse order.
In some embodiments, random feature values may be generated based on one or a combination of the browser-specific objects described above. In some embodiments, multiple random feature values may be generated for each browser-specific object. For example, a feature value set may be preset for each browser-specific object, and one feature value may be randomly selected from each set as the random feature value. Each random feature value can use the assignment format of the corresponding browser-specific object, so that the acquired random feature value can be assigned to the browser feature value directly.
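A minimal sketch of the preset-set approach described above: one candidate set per browser-specific object, with one value drawn at random from each. The dictionary `FEATURE_SETS`, its keys, and the candidate strings are illustrative assumptions, not values given in the patent.

```python
import random

# Hypothetical candidate sets, keyed by browser-specific object name.
FEATURE_SETS = {
    "userAgent": [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
    ],
    "languages": [
        ["zh-CN", "zh"],
        ["zh-CN", "zh", "zh-TW", "en-US"],
    ],
}


def generate_random_features(feature_sets, rng=random):
    """Pick one candidate per browser-specific object."""
    return {name: rng.choice(candidates) for name, candidates in feature_sets.items()}


features = generate_random_features(FEATURE_SETS, random.Random(1))
```

The result is a plain mapping from object name to chosen value, which could then be stored for later acquisition or applied to the browser directly.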
Because the server identifies the software and hardware environment of the user equipment through the browser feature values, assigning different random feature values to the browser simulates different operating environments at random. The server then takes the requests to be from different users accessing the webpage, which prevents the access from being mistaken by the server for illegitimate operation.
Step 13: modify the feature value of the browser according to the acquired random feature value.
Specifically, the feature value of the browser is set to the acquired random feature value; for example, the browser language is set to the randomly generated window.navigator.language value.
For example, the obtained random feature values corresponding to the language object include languages ["zh-CN", "zh", "zh-TW", "en-US"] and languages ["zh-CN", "zh"]; the feature values of the language object of the corresponding browser include ['zh-CN', 'zh', 'hy', 'ps', 'qu', 'bg', 'sn'] or ['zh-CN', 'th', 'is', 'az'].
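One way to picture the assignment step is to render the chosen values as a small JavaScript snippet for the browser control tool to run. The sketch below only builds that string. It uses `Object.defineProperty` rather than a plain assignment, on the assumption that these `navigator` properties are read-only in the page; the patent does not specify the injection mechanism, and the function name `to_override_script` is hypothetical.

```python
import json


def to_override_script(features):
    """Build JavaScript that makes window.navigator report the given values."""
    lines = []
    for name, value in features.items():
        literal = json.dumps(value)  # JSON output is usable as a JavaScript literal
        lines.append(
            f"Object.defineProperty(navigator, {json.dumps(name)}, "
            f"{{ get: () => {literal} }});"
        )
    return "\n".join(lines)


script = to_override_script({"languages": ["zh-CN", "zh"]})
```

With Pyppeteer, a string like this could plausibly be passed to `page.evaluateOnNewDocument` so that it runs before the page's own scripts, though whether the embodiment does so is not stated.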
Step 14: access, through the browser, the webpage corresponding to the data to be collected.
Specifically, the webpage corresponding to the data to be collected is a common page type: a list page, that is, an intermediate page that sits between the navigation page and the detail pages and gathers the tags of all related content. For example, if the website of the data to be collected is an internet news site, the corresponding webpage is the news home page, on which a number of news tags are listed; clicking a news tag displays the detail page corresponding to that tag.
More specifically, controlling the browser to access the webpage means sending an access request to a server through the browser to request the webpage. The access request encapsulates the web address (URL) of the webpage and the browser feature values described above. The server returns the webpage in response to the access request, and the webpage is decoded into a format supported by the browser.
Step 15: trigger a tag on the webpage to request access to a detail page.
The list page contains tags corresponding to the detail pages, and a request to access a detail page is sent by triggering its tag. In this embodiment, to further improve the effect, the browser control tool controls the first script to simulate and generate user behavior characteristics that trigger the tag on the webpage to request access to the detail page.
In this embodiment, Pyppeteer controls JavaScript to simulate and generate the user behavior characteristics. In particular, the user behavior characteristics may include, but are not limited to, a mouse click, a mouse slide, or a mouse scroll bar slide. For a mouse click, the number of clicks may be random, one or more, for example 5 to 20; for a mouse slide, the mouse leaves its current position and slides to a random position; for a scroll bar slide, the scroll bar slides up and down by a random length. The simulated user behavior characteristics may include one or a combination of the user behaviors described above. For example, a tag on the webpage is triggered by a simulated mouse click, so that the server regards it as a user operating on the webpage. Simulating random mouse sliding, or sliding the scroll bar up and down on the webpage or the detail page, makes the server regard it as a user browsing the page.
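The randomized behaviors described above can be sketched as a plan of actions produced before anything is sent to the browser. The action dictionaries, the function name `plan_user_behavior`, and the viewport size are assumptions for illustration; only the 5 to 20 click range comes from the text.

```python
import random


def plan_user_behavior(width, height, rng=random):
    """Return a list of simulated actions for a viewport of the given size."""
    actions = []

    # Mouse slide: leave the current position for a random point in the viewport.
    x = rng.randint(0, width - 1)
    y = rng.randint(0, height - 1)
    actions.append({"kind": "move", "x": x, "y": y})

    # Scroll bar slide: random direction and random length.
    actions.append({
        "kind": "scroll",
        "direction": rng.choice(["up", "down"]),
        "length": rng.randint(1, height),
    })

    # Mouse click: a random number of clicks, here 5 to 20 as in the example.
    for _ in range(rng.randint(5, 20)):
        actions.append({"kind": "click", "x": x, "y": y})

    return actions


plan = plan_user_behavior(1280, 720, random.Random(42))
```

Separating the plan from its execution keeps the randomization logic testable without a running browser; a driver would then translate each action into the control tool's mouse calls.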
Step 16: acquire the detail page data.
Specifically, after the user behavior characteristics are generated by simulation and the tag on the webpage is triggered, the detail page corresponding to the tag is downloaded from the server and returned to the requesting end, so that the data to be acquired on the detail page can be obtained.
In the above embodiment, the webpage includes a plurality of tags, and after step 16 the method further includes: determining whether any tags remain untriggered; if so, repeating step 15 and step 16; if not, ending the process. In this manner, the data of all detail pages linked from the webpage can be acquired by repeating steps 15 and 16. It can be understood that if there are two or more webpages corresponding to the data to be collected, step 14 and the subsequent steps are repeated until all the webpages have been collected.
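The repetition of steps 14 to 16 over pages and tags can be sketched as a nested loop. The browser operations are passed in as callables so the control flow can be exercised without a browser; all names here (`collect_all`, `open_page`, `list_tags`, `trigger_tag`, `read_detail`, and the toy `SITE` data) are hypothetical.

```python
def collect_all(pages, open_page, list_tags, trigger_tag, read_detail):
    """Visit every list page and collect the data behind each of its tags."""
    results = []
    for url in pages:                            # step 14, once per list page
        page = open_page(url)
        remaining = list(list_tags(page))
        while remaining:                         # any tags not yet triggered?
            tag = remaining.pop(0)
            detail = trigger_tag(page, tag)      # step 15
            results.append(read_detail(detail))  # step 16
    return results


# Toy stand-in for a site: each list page maps to the tags it contains.
SITE = {"list-1": ["a", "b"], "list-2": ["c"]}

data = collect_all(
    pages=SITE,
    open_page=lambda url: url,
    list_tags=lambda page: SITE[page],
    trigger_tag=lambda page, tag: f"{page}/{tag}",
    read_detail=lambda detail: detail.upper(),
)
```

In a real run the callables would wrap the browser control tool; the loop itself would stay the same.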
In this embodiment, the browser feature value is modified to a random feature value and user behavior characteristics are generated by simulation, which avoids the server treating the access as illegitimate collection and refusing it, and so improves the data acquisition effect.
In some embodiments, to further improve the effect, the random feature value may be re-acquired at intervals of a predetermined or random duration and injected into the browser, so that the server regards the requests as access by different users; that is, the browser feature value is re-randomized more frequently to improve the acquisition effect.
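The predetermined-or-random interval can be sketched with two small helpers. The function names and the default bounds of 30 and 300 seconds are illustrative assumptions; the patent gives no concrete durations.

```python
import random


def next_refresh_delay(fixed_seconds=None, low=30.0, high=300.0, rng=random):
    """Return the number of seconds to wait before re-acquiring a feature value."""
    if fixed_seconds is not None:
        return float(fixed_seconds)   # predetermined interval
    return rng.uniform(low, high)     # random interval within [low, high]


def due_for_refresh(last_refresh, now, delay):
    """Return True once at least `delay` seconds have passed since the last refresh."""
    return now - last_refresh >= delay


fixed_delay = next_refresh_delay(60)
random_delay = next_refresh_delay(rng=random.Random(3))
```

A collection loop could call `due_for_refresh` with timestamps from `time.monotonic()` before each page visit and, when it returns True, repeat steps 12 and 13.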
It can be understood that for websites that have no acquisition rejection mechanism, or whose mechanism is not very strict, the types of random features acquired and the frequency of acquiring random feature values may be dynamically reduced, and the simulated user behavior characteristics may be reduced or omitted (for example, simulating only mouse clicks, with correspondingly fewer clicks), so as to improve the efficiency of data acquisition.
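This trade-off between thoroughness and efficiency could be expressed as a set of configuration profiles. The profile names, keys, and numbers below are all hypothetical; the patent describes the idea of reducing features and behaviors but gives no concrete settings.

```python
# Hypothetical profiles; none of these names or numbers come from the patent.
PROFILES = {
    "strict": {
        "feature_types": ["userAgent", "language", "plugins"],
        "refresh_seconds": 60,
        "behaviors": ["click", "move", "scroll"],
        "max_clicks": 20,
    },
    "lenient": {
        "feature_types": ["userAgent"],
        "refresh_seconds": 600,
        "behaviors": ["click"],
        "max_clicks": 2,
    },
    "open": {
        "feature_types": ["userAgent"],
        "refresh_seconds": None,   # no periodic re-acquisition
        "behaviors": [],           # simulated behaviors omitted
        "max_clicks": 0,
    },
}


def choose_profile(has_rejection_mechanism, is_strict):
    """Select a profile from what is known about the target website."""
    if not has_rejection_mechanism:
        return PROFILES["open"]
    return PROFILES["strict" if is_strict else "lenient"]


profile = choose_profile(has_rejection_mechanism=True, is_strict=False)
```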
Fig. 2 is a schematic structural diagram of a web crawler apparatus 2 based on browser feature randomization according to an embodiment of the present invention; as shown in fig. 2, it includes:
The starting unit 22 is used for starting the browser.
In this embodiment, a browser control tool is used to open the browser.
In this embodiment, the browser may be the Chrome browser. Chrome is a web browser developed by Google, based on the WebKit rendering engine derived from KHTML, whose source code is released under several free licenses such as the BSD license. It is fast, stable, and secure. In some operating environments, Chrome does not need to be installed and can be run directly after downloading and decompressing a packaged version. It will be appreciated that in other operating environments, such as a Windows system, an installer version may also be downloaded and run after installation. It should be noted that the above description is only exemplary, and the present embodiment is not limited thereto.
In this embodiment, the browser control tool is the Python-based Pyppeteer. Puppeteer is a tool developed by Google on Node.js that can control operations of the Chrome browser through JavaScript. Pyppeteer is a browser control tool written in Python and based on Puppeteer. It will be appreciated that in other embodiments Selenium may be used to control the browser, and the programming language need not be Python; any suitable language, such as Java or Node.js, may be used.
In some embodiments, an operating environment configuration unit 21 is also included. The operating environment configuration unit 21 is used to configure the operating environment, which includes installing the browser, the first script, and the browser control tool. The first script may be a JavaScript script.
The random feature value acquisition unit 23 is used to acquire random feature values.
In some embodiments, random feature values are pre-stored in a storage device. The storage device pre-stores a plurality of random feature values, one or more of which can be acquired either in a predetermined sequence or at random.
In some embodiments, before the random feature value is acquired, the following is also performed: acquiring a browser-specific object; and generating a random feature value according to the browser-specific object and storing it.
In this embodiment, the browser-specific object may include, but is not limited to, properties of the browser's window.navigator object. navigator is a property of the Window object and points to an object containing information about the browser. The value of window.navigator.userAgent is a special header string from which a server can identify information such as the operating system and version, CPU type, and browser and version used by the client. The value of window.navigator.language represents the browser language, and the value of window.navigator.plugins represents the plug-ins installed in the browser. It will be appreciated that other browser-specific objects, such as a window.navigator display object, may also be included in some embodiments.
In some embodiments, the random feature value may be generated based on one or a combination of the browser-specific objects described above. In some embodiments, multiple random feature values may be preset for each browser-specific object. For example, a feature value set may be preset for each browser-specific object, and one feature value may be randomly selected from each set as the random feature value. Each random feature value can use the assignment format of the corresponding browser-specific object, so that the acquired random feature value can be assigned to the browser feature value directly.
Because the server identifies the software and hardware environment of the user equipment through the browser feature values, modifying the browser feature values to random feature values simulates different operating environments at random. The server then takes the requests to be from different users accessing the webpage, which prevents the access from being mistaken by the server for illegitimate operation.
The browser feature value modification unit 24 is configured to modify the browser feature value to the acquired random feature value.
Specifically, the feature value of the browser is set to a random feature value; for example, the browser language is set to the randomly generated window.navigator.language value.
The access unit 25 is configured to access, through the browser, the webpage corresponding to the data to be collected.
Specifically, the webpage corresponding to the data to be collected is a common page type: a list page, that is, an intermediate page that sits between the navigation page and the detail pages and gathers the tags of all related content. For example, if the website corresponding to the data to be collected is an internet news site, the webpage corresponding to the data to be collected is the news home page, on which a number of news tags are arranged; clicking a news tag displays a set of news titles or a news detail page.
More specifically, controlling the browser to access the webpage means sending a request to a server through the browser, where the request encapsulates the web address (URL) of the webpage and the browser feature values. The server returns the webpage in response to the request, and the webpage is decoded into a format supported by the browser.
The triggering unit 26 is used to trigger a tag on the webpage to request access to a detail page.
The webpage contains tags corresponding to a plurality of detail pages, and a request to access a detail page is sent by triggering its tag. In this embodiment, to further improve the effect, the browser control tool controls the first script to simulate and generate user behavior characteristics that trigger the tag on the webpage to request access to the detail page.
In this embodiment, Pyppeteer controls JavaScript to simulate and generate the user behavior characteristics. In particular, the user behavior characteristics may include, but are not limited to, mouse clicks, mouse slides, and mouse scroll bar slides. For a mouse click, the number of clicks is random, one or more, for example 5 to 20; for a mouse slide, the mouse leaves its current position and slides to a random position; for a scroll bar slide, the scroll bar slides up and down by a random length. The simulated user behavior characteristics may include one or a combination of the user behaviors described above.
The data acquisition unit 27 is used to acquire the data to be acquired in the detail page.
Specifically, after the user behavior characteristics are generated by simulation and the tag on the webpage is triggered, the detail page corresponding to the tag is downloaded from the server and returned to the requesting end, so that the data to be collected in the detail page can be obtained.
In the above embodiment, the webpage includes a plurality of tags, and the data acquisition apparatus 2 further includes a determining unit configured to determine whether any tags remain untriggered; if so, user behavior characteristics are again generated by simulation to trigger the tag and obtain the data of the corresponding detail page; if not, the process ends. In this manner, the data of all detail pages linked from the webpage can be acquired by repeatedly simulating user behavior characteristics and acquiring detail page data. It can be understood that if there are two or more webpages corresponding to the data to be collected, accessing the webpage, simulating user behavior characteristics to trigger the tags, and obtaining the data of the corresponding detail pages are repeated until the data for all the webpages has been collected.
In this embodiment, the browser feature value is modified to a random feature value and user behavior characteristics are generated by simulation, which avoids the server treating the access as illegitimate collection and refusing it, and so improves the data acquisition effect.
In some embodiments, to further improve the effect, the random feature value may be re-acquired at intervals of a predetermined or random duration and injected into the browser, so that the server regards the requests as access by different users; that is, the browser feature value is re-randomized more frequently to improve the effect.
It can be understood that for websites that have no acquisition rejection mechanism, or whose mechanism is not very strict, the types of browser features acquired and the frequency of acquiring browser feature values may be dynamically reduced, and the simulated user behavior characteristics may be reduced or omitted (for example, simulating only mouse clicks, with correspondingly fewer clicks), so as to improve the efficiency of data acquisition.
Fig. 3 is a schematic structural diagram of an electronic device 3 according to an embodiment of the present invention; as shown in fig. 3, the electronic device 3 comprises a memory 31 for storing a computer-executable program and a processor 32 for running the computer-executable program to implement the method according to any of the embodiments of the present invention.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program performs the above-described functions defined in the method of the present invention when executed by a Central Processing Unit (CPU). It should be noted that the computer readable medium of the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Python, Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or by hardware. The described units may also be located in a processor. In some cases the names of the units do not limit the units themselves; for example, the triggering unit may also be described as "a unit that simulates the generation of user behavior characteristics".
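As a rough illustration of units implemented in software, the sketch below models each unit as a plain Python function and composes them in the order of the method. Every name here (`FakeBrowser`, `run_device`, the property `exampleProperty`, the URL, and the tag string) is a hypothetical placeholder rather than anything specified in this document, and the fake browser only records calls instead of driving a real browser.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser; it records calls instead of loading pages."""
    characteristics: Dict[str, str] = field(default_factory=dict)
    calls: List[str] = field(default_factory=list)


def starting_unit() -> FakeBrowser:
    """Start the (fake) browser."""
    browser = FakeBrowser()
    browser.calls.append("start")
    return browser


def random_value_acquisition_unit() -> Dict[str, str]:
    """Return a characteristic value; fixed here so the sketch stays deterministic."""
    return {"exampleProperty": "example-value"}


def modifying_unit(browser: FakeBrowser, values: Dict[str, str]) -> None:
    """Apply the acquired values to the browser's characteristic values."""
    browser.characteristics.update(values)
    browser.calls.append("modify")


def access_unit(browser: FakeBrowser, url: str) -> None:
    """Access the web page corresponding to the data to be acquired."""
    browser.calls.append(f"access {url}")


def triggering_unit(browser: FakeBrowser, tag: str) -> None:
    """Trigger a tag of the web page to request the detail page."""
    browser.calls.append(f"trigger {tag}")


def data_acquisition_unit(browser: FakeBrowser) -> str:
    """Acquire the data from the detail page (placeholder text here)."""
    browser.calls.append("acquire")
    return "detail-page data (placeholder)"


def run_device(url: str, tag: str) -> Tuple[FakeBrowser, str]:
    """Compose the units in the order the method describes."""
    browser = starting_unit()
    modifying_unit(browser, random_value_acquisition_unit())
    access_unit(browser, url)
    triggering_unit(browser, tag)
    return browser, data_acquisition_unit(browser)
```

Calling `run_device("https://example.com/list", "a.detail")` returns the fake browser together with the placeholder data, and `browser.calls` then lists the steps in execution order. Renaming any function would not change what it does, which mirrors the point that a unit's name does not limit the unit.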
The foregoing description is only an illustration of the preferred embodiments of the invention and of the technical principles employed. Those skilled in the art will appreciate that the scope of the present invention is not limited to technical solutions formed by the specific combination of the above features, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the scope of the invention as defined by the appended claims; for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present invention.

Claims (10)

1. A method of data acquisition, comprising:
starting a browser;
acquiring a random characteristic value;
modifying the characteristic value of the browser according to the acquired random characteristic value;
accessing a webpage corresponding to the data to be acquired through the browser;
triggering a tag of the web page to request access to a detail page; and
acquiring the data to be acquired in the detail page.
2. The method of claim 1, further comprising:
acquiring a specific object of the browser; and
generating a random characteristic value according to the specific object of the browser, and storing the random characteristic value.
3. The method of claim 1, wherein the random characteristic value is in an assignment format of a specific object of the browser.
4. The method of claim 1, further comprising configuring a runtime environment, the configuring the runtime environment comprising installing a browser, a first script, and a browser control tool.
5. The method of claim 4, wherein starting the browser comprises starting the browser via the browser control tool.
6. The method of claim 4, wherein triggering the tag of the web page to request access to a detail page comprises: controlling the first script to simulate user behavior characteristics, the user behavior characteristics including one or more of:
a mouse click, a mouse slide, or a mouse scroll-bar slide.
7. The method of claim 1, further comprising: repeating, at intervals of a preset duration or a random duration, the steps from acquiring the random characteristic value through acquiring the data to be acquired in the detail page.
8. A data acquisition apparatus, comprising:
a starting unit configured to start a browser;
a random characteristic value acquisition unit configured to acquire a random characteristic value;
a browser characteristic value modifying unit configured to modify the characteristic value of the browser according to the acquired random characteristic value;
an access unit configured to control the browser to access a web page corresponding to the data to be acquired;
a triggering unit configured to trigger a tag of the web page to request access to a detail page; and
a data acquisition unit configured to acquire the data to be acquired in the detail page.
9. A computer storage medium having a computer-executable program stored thereon, the computer-executable program being executed to implement the method of any one of claims 1-7.
10. An electronic device, comprising a memory for storing a computer-executable program and a processor for executing the computer-executable program to perform the method of any of claims 1-7.
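The value generation of claims 2 and 3 and the repeat interval of claim 7 can be sketched as follows. This is a minimal illustration under stated assumptions: the property names and candidate values in `CANDIDATES`, the function names, and the 30 to 120 second range are all hypothetical and do not come from the claims, and the sketch only builds assignment-style strings without applying them to any real browser.

```python
import json
import random
from typing import Dict, List, Optional

# Hypothetical candidate values; the claims do not name concrete properties.
CANDIDATES: Dict[str, list] = {
    "navigator.platform": ["Win32", "MacIntel", "Linux x86_64"],
    "navigator.language": ["en-US", "zh-CN", "de-DE"],
    "screen.width": [1366, 1440, 1920],
}


def generate_random_values(rng: random.Random) -> Dict[str, object]:
    """Pick one random value per property of the (assumed) browser object."""
    return {prop: rng.choice(options) for prop, options in CANDIDATES.items()}


def to_assignments(values: Dict[str, object]) -> List[str]:
    """Render each value as an assignment-style string, one per property."""
    return [f"{prop} = {json.dumps(value)};" for prop, value in values.items()]


def next_delay(
    rng: random.Random,
    preset_seconds: Optional[float] = None,
    low: float = 30.0,
    high: float = 120.0,
) -> float:
    """Return the preset duration if one is given, else a random duration."""
    if preset_seconds is not None:
        return preset_seconds
    return rng.uniform(low, high)
```

For instance, `to_assignments(generate_random_values(random.Random()))` yields one string per property, and `next_delay` supplies the waiting time before the steps are repeated. Passing a `random.Random` instance explicitly keeps the sketch reproducible when a seed is supplied.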
CN202111558466.XA 2021-12-17 2021-12-17 Data acquisition method and device, computer storage medium and electronic equipment Pending CN114254219A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111558466.XA CN114254219A (en) 2021-12-17 2021-12-17 Data acquisition method and device, computer storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114254219A true CN114254219A (en) 2022-03-29

Family

ID=80795834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111558466.XA Pending CN114254219A (en) 2021-12-17 2021-12-17 Data acquisition method and device, computer storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114254219A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130185645A1 (en) * 2012-01-18 2013-07-18 International Business Machines Corporation Determining repeat website users via browser uniqueness tracking
CN106993009A (en) * 2016-01-20 2017-07-28 青岛海信移动通信技术股份有限公司 A kind of method and apparatus for loading webpage in a browser
CN109902220A (en) * 2019-02-27 2019-06-18 腾讯科技(深圳)有限公司 Webpage information acquisition methods, device and computer readable storage medium
CN110909229A (en) * 2019-11-27 2020-03-24 佛山科学技术学院 Webpage data acquisition and storage system based on simulated browser access
CN113191844A (en) * 2021-04-29 2021-07-30 北京奇保信安科技有限公司 Product recommendation method and device based on anonymous user online operation and electronic equipment

Similar Documents

Publication Publication Date Title
CN110708346B (en) Information processing system and method
CN106708899B (en) Automatic point burying method and device
US10880227B2 (en) Apparatus, hybrid apparatus, and method for network resource access
EP3143497B1 (en) Interactive viewer of intermediate representations of client side code
CN109522500B (en) Webpage display method, device, terminal and storage medium
CN108416021B (en) Browser webpage content processing method and device, electronic equipment and readable medium
CN110365724B (en) Task processing method and device and electronic equipment
CN112637361B (en) Page proxy method, device, electronic equipment and storage medium
CN110442286B (en) Page display method and device and electronic equipment
CN110598135A (en) Network request processing method and device, computer readable medium and electronic equipment
US20150207691A1 (en) Preloading content based on network connection behavior
US20210133270A1 (en) Referencing multiple uniform resource locators with cognitive hyperlinks
CN112685671A (en) Page display method, device, equipment and storage medium
CN110795181A (en) Application program interface display method and device based on skip protocol and electronic equipment
CN114528269A (en) Method, electronic device and computer program product for processing data
CN112087370A (en) Method, system, electronic device and computer-readable storage medium for issuing GitHub Issues
US10997357B2 (en) User interface navigation management
US20150205767A1 (en) Link appearance formatting based on target content
US10931771B2 (en) Method and apparatus for pushing information
CN116569165A (en) Page display method and device, storage medium and electronic equipment
US11392663B2 (en) Response based on browser engine
CN112749351B (en) Link address determination method, device, computer readable storage medium and equipment
US20120216132A1 (en) Embedding User Selected Content In A Web Browser Display
CN108037914B (en) Method and device for developing android native system by combining js
CN114254219A (en) Data acquisition method and device, computer storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220329