CN111125598A - Intelligent data query method, device, equipment and storage medium - Google Patents

Intelligent data query method, device, equipment and storage medium

Info

Publication number
CN111125598A
Authority
CN
China
Prior art keywords
webpage
event
information
content
tree structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911321635.0A
Other languages
Chinese (zh)
Inventor
谢伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN201911321635.0A
Publication of CN111125598A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation, credit approval, mortgages, home banking or on-line banking
    • G06Q40/025 Credit processing or loan processing, e.g. risk analysis for mortgages

Abstract

The invention discloses an intelligent data query method, and belongs to the technical field of traversal query. The method comprises the following steps: executing a preset event operation script to log in a system and downloading a webpage of the system, wherein the webpage is an HTML webpage with a DOM tree structure; adopting a DOM (document object model) analysis mode to perform structured analysis on the webpage to obtain a DOM tree structure of the webpage; sequentially extracting information by traversing each node in the DOM tree structure; and sequentially writing the information into a preset document template according to the extraction sequence to generate a file. The RPA technology is adopted to replace a repetitive manual processing process, so that the dependence on manual operation is reduced, the risk possibly brought by the manual operation is effectively reduced, and the risk of information leakage possibly brought by the manual operation is avoided.

Description

Intelligent data query method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of traversal query, in particular to a method, a device, equipment and a storage medium for intelligently querying data.
Background
Credit investigation information is highly significant in credit business: if a credit record contains adverse entries, the loan amount may be reduced or the loan application rejected. Querying the credit investigation report is therefore a necessary step in a financial institution's credit approval. At present, the credit investigation platform offers an interface-based query mode to financial institutions with strong assets, but many financial institutions do not have access to it. Their staff can only log in to the credit investigation platform website manually with a user name and password and enter the information of the person to be checked. After the result is returned, the key credit investigation fields are extracted by hand and copied one by one to the loan platform for credit approval.
This manual process is inefficient. Information is often copied incompletely, so data items are easily lost. If a lost data item is not discovered in time, the risk control model becomes inaccurate, which ultimately creates credit risk and causes losses to the financial institution. Moreover, credit investigation information is important personal private data, and querying it by manually logging in to the credit investigation platform carries a risk of information leakage if the operation is handled improperly.
Disclosure of Invention
The invention aims to solve the technical problems that information leakage, data loss and the like are easy to occur when information is manually inquired in the prior art, and provides a data intelligent inquiry method, a device, equipment and a storage medium.
The invention solves the technical problems through the following technical scheme:
an intelligent data query method comprises the following steps:
executing a preset event operation script to log in a system and downloading a webpage of the system, wherein the webpage is an HTML webpage with a DOM tree structure;
adopting a DOM (document object model) analysis mode to perform structured analysis on the webpage to obtain a DOM tree structure of the webpage;
sequentially extracting information by traversing each node in the DOM tree structure;
and sequentially writing the information into a preset document template according to the extraction sequence to generate a file.
Preferably, the method further comprises the step of automatically generating the event operation script:
acquiring an operation event and an operation sequence corresponding to the operation event in the process of manually logging in a system and downloading a webpage of the system in an event monitoring mode;
and automatically generating the operation script according to the acquired operation event and the operation sequence.
Preferably, the automatically generating the event operation script according to the acquired operation event and the operation sequence includes the following steps:
calling a preset implementation method code corresponding to the type of the operation event according to the acquired operation event;
writing the called codes into corresponding fields of a preset script file;
when the operation event comprises input content, writing the input content into a corresponding field of the code.
Preferably, when the input content is from a picture or a character identified in real time, a code for calling a third party identification system is added to a corresponding field of the script file, and a result returned by the third party identification system is automatically backfilled into a code of a corresponding implementation method in the script file.
Preferably, the structured parsing obtains the DOM tree structure of the web page from the tags in the web page and the nesting and hierarchical relationships among those tags.
Preferably, the step of writing the information into a preset document template in sequence according to the extraction sequence to generate a file includes the following steps:
extracting the content in the element node, judging whether the element node has a next-level element node or not, if so, acquiring the content of a text node corresponding to the element node and the sub-content of the text node corresponding to the next-level element node, wherein the sub-content is the information;
writing the sub-content into an editing area corresponding to the content in the document template;
and generating a file according to the document template written with the sub-content.
The invention also discloses a data intelligent query device, which comprises:
the webpage downloading module is used for executing a preset event operation script to log in a system and downloading a webpage of the system, wherein the webpage is an HTML webpage with a DOM tree structure;
the webpage analysis module is used for carrying out structural analysis on the webpage by adopting a DOM analysis mode to obtain a DOM tree structure of the webpage;
the information extraction module is used for sequentially extracting information by traversing each node in the DOM tree structure;
and the file generation module is used for sequentially writing the information into a preset document template according to the extraction sequence to generate a file.
Preferably, the apparatus further comprises:
the script generation module is used for acquiring an operation event and an operation sequence corresponding to the operation event in the process of manually logging in the system and downloading the webpage of the system in an event monitoring mode; and automatically generating the operation script according to the acquired operation event and the operation sequence.
The invention also discloses computer equipment which comprises a memory and a processor, wherein the memory is stored with a computer program, and the computer program realizes the steps of the intelligent data query method when being executed by the processor.
The invention also discloses a computer readable storage medium, in which a computer program is stored, and the computer program can be executed by at least one processor to implement the steps of the aforementioned data intelligent query method.
The positive progress effects of the invention are as follows: by adopting the RPA technology to replace a repetitive manual processing process, the dependence on manual operation is reduced, so that the risk possibly brought by the manual operation is effectively reduced, and meanwhile, the risk of information leakage possibly brought by the manual operation is avoided.
Drawings
FIG. 1 is a flow chart of a first embodiment of the intelligent query method for data according to the present invention;
FIG. 2 is a flow chart showing the generation of files in the first embodiment of the intelligent query method for data of the present invention;
FIG. 3 illustrates a DOM tree structure diagram;
FIG. 4 is a flow chart of a second embodiment of the intelligent query method for data according to the present invention;
FIG. 5 is a block diagram showing a first embodiment of the data query apparatus according to the present invention;
FIG. 6 is a block diagram showing a second embodiment of the data query device according to the present invention;
FIG. 7 shows a hardware architecture diagram of an embodiment of the computer device of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Firstly, the invention provides an intelligent data query method.
In one embodiment, as shown in fig. 1, the intelligent data query method includes the following steps:
Step 10: Execute a preset event operation script to log in to a system and download a webpage of the system, wherein the webpage is an HTML webpage with a DOM tree structure.
Taking the credit investigation system as an example, the event operation script records codes simulating the operation process of manually logging in the credit investigation system and downloading the webpage of the credit investigation system, and the codes can be executed to automatically log in the credit investigation system and download the required webpage. The operation process may specifically include logging in a platform, querying data to be downloaded, downloading a webpage of a query result, and the like, and the whole operation process may involve operations of a mouse and a keyboard, which are all recorded in an event operation script in the form of codes, and the desired webpage may be automatically downloaded by running the codes.
Credit investigation information is important personal private data. If it is queried by manually logging in to the credit investigation platform, improper operation can easily lead to information leakage. Executing the event operation script logs in and downloads the information to be queried automatically, which greatly reduces the risk of data leakage and simplifies the operation flow.
Step 20: Perform structured parsing on the webpage in DOM parsing mode to obtain the DOM tree structure of the webpage.
Common structured parsing methods for HTML webpages include DOM parsing, head-to-tail string interception, regular expressions and the like. Here DOM parsing is used: the DOM tree structure of the webpage is parsed from the tags in the webpage and the nesting and hierarchical relationships among them. The webpage is an HTML webpage with a DOM tree structure, in which each node is represented by a tag. Such an HTML document starts with the <html> tag and ends with the </html> tag. The content between the <head></head> tags describes the header information of the page, such as its title, author, abstract, keywords, copyright and auto-refresh settings. The content between the <body></body> tags is the body content of the page. The <title> tag defines the title of the page; it is a paired tag located between the <head></head> tags. Since head, body and so on are all tags, the DOM tree structure of the webpage can be obtained by analyzing the tags.
After the webpage is parsed into a DOM tree structure, the webpage itself corresponds to the document node, each tag corresponds to an element node, and the text content under each tag corresponds to a text node; the document node, element nodes and text nodes are related hierarchically. There are three main relationships between nodes: parent, child and sibling. These relationships are relative rather than fixed. In the DOM tree, the topmost node is called the root node, and the root node is the document node. Parent and child are relative terms: every node except the root node has a parent node. Siblings are nodes that share the same parent node, and a parent node may have any number of child nodes.
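The node types and parent/child relationships described above can be illustrated with a small sketch. It is not the patent's implementation: the `Node` and `TreeBuilder` names are hypothetical, only Python's standard `html.parser` module is used, and a short list of void elements (such as `br`) is assumed so that tags without a closing partner do not capture later content.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial, assumed list).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class Node:
    """A minimal DOM-like node: kind is 'document', 'element' or 'text'."""
    def __init__(self, kind, name=None, text=None, parent=None):
        self.kind = kind
        self.name = name        # tag name, for element nodes
        self.text = text        # text content, for text nodes
        self.parent = parent
        self.children = []

class TreeBuilder(HTMLParser):
    """Builds the tree from the nesting of start and end tags."""
    def __init__(self):
        super().__init__()
        self.root = Node("document")   # the root node is the document node
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node("element", name=tag, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node        # descend: later nodes become its children

    def handle_endtag(self, tag):
        # Climb to the matching open element, then step up to its parent.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(
                Node("text", text=data.strip(), parent=self.current))

def parse_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

root = parse_html(
    "<html><head><title>Report</title></head><body><p>abc</p></body></html>")
```

Here `root` is the document node, its single child is the `html` element node, and `head` and `body` are siblings because they share that parent.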
Step 30: Extract information in sequence by traversing each node in the DOM tree structure.
The DOM tree structure obtained by performing the structural analysis on the web page may be regarded as a document object, and thus, a method of traversing the document object may be specifically adopted to traverse each node in the DOM tree structure to extract information, where the information is content in each text node.
The traversal starts from the root node of the DOM tree and accesses all descendant nodes under the root node in turn.
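A minimal sketch of such a traversal follows. It assumes a depth-first, pre-order walk (the text only says descendants are visited in turn) and uses the standard `xml.dom.minidom` parser, which requires well-formed markup; a real HTML page would need a more tolerant parser. The function name `extract_text` and the sample page are illustrative.

```python
from xml.dom import minidom

def extract_text(node, out=None):
    """Depth-first, pre-order walk that collects non-empty text node contents."""
    if out is None:
        out = []
    if node.nodeType == node.TEXT_NODE:
        text = node.data.strip()
        if text:
            out.append(text)
    for child in node.childNodes:
        extract_text(child, out)
    return out

# Illustrative, well-formed page; the traversal starts from the document node.
doc = minidom.parseString(
    "<html><head><title>Credit report</title></head>"
    "<body><p>name</p><p>abc</p></body></html>"
)
info = extract_text(doc)
```

The list `info` holds the text contents in the order the nodes were visited, which is the extraction order used when writing to the document template.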
Step 40: Write the information into a preset document template in the order of extraction to generate a file.
The information is acquired in traversal order, and each piece of information is written into the document template as soon as it is acquired, until the traversal is complete. This specifically comprises the following steps (as shown in FIG. 2):
Step 41: Extract the content in the element node and judge whether the element node has a next-level element node; if so, acquire the content of the text node corresponding to the element node and the sub-content of the text node corresponding to the next-level element node, wherein the sub-content is the information.
In the DOM tree structure shown in FIG. 3, suppose the traversed element node is the <title> tag and its next-level element node is determined to be the <a> tag. The content of the text node corresponding to the <title> tag is then obtained, say "name", together with the sub-content of the text node corresponding to the <a> tag, say "abc". The first piece of information obtained is therefore "abc".
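The <title>/<a> example can be sketched as follows. This is one possible reading of Step 41, not the patent's code: the helper names `own_text` and `content_pairs` are hypothetical, the markup is a made-up well-formed snippet mirroring the example, and `xml.dom.minidom` stands in for the DOM parser.

```python
from xml.dom import minidom

def own_text(element):
    """Join the text nodes that are direct children of the element."""
    return "".join(
        c.data for c in element.childNodes if c.nodeType == c.TEXT_NODE
    ).strip()

def content_pairs(node, pairs=None):
    """Collect (content, sub_content) pairs in traversal order.

    A pair is produced when an element node with text has a next-level
    element node that also has text.
    """
    if pairs is None:
        pairs = []
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        for sub in child.childNodes:
            if sub.nodeType == sub.ELEMENT_NODE and own_text(child) and own_text(sub):
                pairs.append((own_text(child), own_text(sub)))
        content_pairs(child, pairs)   # continue with deeper levels
    return pairs

doc = minidom.parseString(
    "<html><head><title>name<a>abc</a></title></head></html>")
pairs = content_pairs(doc)
```

For this snippet the only pair is the content "name" with the sub-content "abc", matching the example in the text.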
Step 42: Write the sub-content into the editing area corresponding to the content in the document template.
The correspondence between content and editing areas in the document template is preset. An editing area can be understood as the target position where content is to be filled in, and target positions are located by means of a configuration file that holds the position information of each editing area. The configuration file may contain the row and column coordinates to be filled, for example designating the first row, second column of the document template as a target position. The target position may be named after the content to be filled in; for example, the position that receives a name is itself called "name". Through the configuration file, content is thus mapped to editing areas in the document template, and each sub-content can be written into its corresponding area. Assuming the acquired information is "abc", representing a name, "abc" is written into the name editing area of the document template.
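One way to picture the configuration file is a mapping from content name to a row and column in a grid-shaped template. The sketch below is an assumption-laden illustration: the JSON layout, the "id number" entry, the list-of-rows template and the CSV output are all hypothetical, since the patent does not specify the formats.

```python
import csv
import io
import json

# Hypothetical configuration: content name -> 1-based (row, column) target position.
CONFIG_JSON = '{"name": {"row": 1, "col": 2}, "id number": {"row": 2, "col": 2}}'

def fill_template(template, config, pairs):
    """Write each sub-content into the editing area configured for its content."""
    filled = [row[:] for row in template]    # leave the template itself untouched
    for content, sub_content in pairs:
        target = config.get(content)
        if target is None:
            continue                         # no editing area configured for this content
        filled[target["row"] - 1][target["col"] - 1] = sub_content
    return filled

def to_csv(rows):
    """Serialize the filled template; CSV is just an example file format."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

template = [["name", ""], ["id number", ""]]
config = json.loads(CONFIG_JSON)
filled = fill_template(template, config, [("name", "abc")])
document = to_csv(filled)
```

With this configuration, "abc" lands in the first row, second column, which is the editing area named "name".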
Step 43: Generate a file from the document template into which the sub-content has been written.
When traversing of the DOM tree is completed, all information is written into the document template, and the document template with the information written is stored as a file at the moment, namely, the generation of the file is completed.
In the second embodiment, based on the first embodiment, as shown in fig. 4, the intelligent data query method includes the following steps:
Step 01: Acquire, by event monitoring, the operation events and the corresponding operation sequence that occur while a user manually logs in to a system and downloads a webpage of the system.
The monitoring uses the Pyhook interface to create event monitoring on the Windows operating system. Pyhook is a Python-based hook library mainly used for monitoring mouse and keyboard events on a computer. Mouse hooks receive mouse events and call back on the events of interest; as required, they can be set to receive only left-button press events, only mouse movement events, all mouse events, and so on. Keyboard hooks receive keyboard events and work in the same way as mouse hooks; only the returned information differs.
By monitoring the operations the user performs, such as logging in to the system, downloading information and saving files, the monitor returns information such as mouse coordinates, mouse buttons clicked, keys pressed, menu handles and drop-down bar handles, together with the order in which the events were executed.
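The recording side can be sketched independently of any hook library. In the illustration below, `EventRecorder` and its callback names are hypothetical; wiring these callbacks to actual mouse and keyboard hooks is deliberately left out, so the sketch only shows how events and their operation sequence might be stored.

```python
import itertools

class EventRecorder:
    """Collects operation events together with the order in which they occurred."""
    def __init__(self):
        self._counter = itertools.count(1)   # operation sequence numbers: 1, 2, 3, ...
        self.events = []

    def on_mouse(self, button, x, y):
        # Would be called by a mouse hook callback.
        self.events.append(
            {"seq": next(self._counter), "type": f"mouse_{button}", "pos": (x, y)})

    def on_key(self, key):
        # Would be called by a keyboard hook callback.
        self.events.append(
            {"seq": next(self._counter), "type": "keyboard", "key": key})

recorder = EventRecorder()
recorder.on_mouse("left", 120, 48)   # simulated click on an input box
recorder.on_key("a")                 # simulated key press
```

Each recorded event carries its sequence number, so the later script generation step can replay the operations in the original order.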
Step 02: Automatically generate the operation script according to the acquired operation events and operation sequence.
Step 02 specifically comprises the following steps:
Step 021: According to the acquired operation event, call the preset implementation method code corresponding to the type of the operation event.
Automatic script generation requires implementation method code to be set in advance for each type of operation event. For example, a right mouse button event corresponds to the code MouseRightClick(), and a keyboard input event corresponds to KeyboardInput($para).
Step 022: Write the called code into the corresponding field of a preset script file.
When the user clicks the mouse or presses a key, Pyhook captures the operation event, and the method code corresponding to the event is written into the corresponding field of the script file. For example, when the user enters a user name and password on a webpage, Pyhook captures the operation event, JavaScript is used to obtain the DOM data of the user name and password input boxes on the webpage and from it the tag names of those input boxes, and the implementation method code for the operation event is then written, together with the input box tag names, into the corresponding field of the preset script file.
Step 023: When the operation event includes input content, write the input content into the corresponding field of the code.
When the user types content on the keyboard, the content is passed through $para into the corresponding field of the code for the keyboard input operation event.
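Steps 021 to 023 can be sketched as a lookup from event type to preset method code, followed by substitution of the input content. `MouseRightClick()` and `KeyboardInput($para)` come from the text; `MouseLeftClick()`, the event dictionary layout and the "target: code" line format are assumptions made for illustration.

```python
# Preset implementation method code per operation event type.
# MouseRightClick and KeyboardInput appear in the text; MouseLeftClick is assumed.
METHOD_TEMPLATES = {
    "mouse_right": "MouseRightClick()",
    "mouse_left": "MouseLeftClick()",
    "keyboard_input": "KeyboardInput($para)",
}

def generate_script(events):
    """events: dicts with 'seq' and 'type', plus optional 'target' and 'input'."""
    lines = []
    for event in sorted(events, key=lambda e: e["seq"]):   # keep the recorded order
        code = METHOD_TEMPLATES[event["type"]]             # step 021: look up method code
        if "input" in event:                               # step 023: fill in input content
            code = code.replace("$para", repr(event["input"]))
        target = event.get("target")                       # e.g. an input box tag name
        lines.append(f"{target}: {code}" if target else code)   # step 022: write the line
    return "\n".join(lines)

events = [
    {"seq": 2, "type": "keyboard_input", "target": "password", "input": "s3cret"},
    {"seq": 1, "type": "keyboard_input", "target": "username", "input": "alice"},
    {"seq": 3, "type": "mouse_left", "target": "login"},
]
script = generate_script(events)
```

Sorting by the recorded sequence number means the generated script replays the operations in the order the user performed them, regardless of how the events were stored.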
Further, when the input content is derived from a picture or from characters recognized in real time (mainly the case of entering a verification code), code for calling a third-party recognition system is added to the corresponding field of the script file, and the result (the verification code) returned by the third-party recognition system is automatically backfilled into the corresponding implementation method code in the script file. In other words, the verification code recognized by the third-party system is written into the implementation method corresponding to keyboard input.
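The backfilling of a verification code might look like the sketch below. The third-party recognition system is not identified in the text, so it is represented by a plain callable; `fake_recognizer`, the returned code and the script line format are all stand-ins for illustration only.

```python
def backfill_verification_code(script_line, image_bytes, recognize):
    """Replace the $para placeholder with the code returned by the recognizer."""
    code = recognize(image_bytes)        # call out to the third-party system
    if not code:
        raise ValueError("recognizer returned no verification code")
    return script_line.replace("$para", repr(code))

def fake_recognizer(image_bytes):
    # Stand-in for a third-party recognition service; always returns the same code.
    return "7K3D"

line = backfill_verification_code(
    "captcha: KeyboardInput($para)", b"...", fake_recognizer)
```

Passing the recognizer in as a parameter keeps the script generator independent of whichever recognition service is actually used.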
Steps 10 to 40 are the same as those in the first embodiment, and are not described herein again.
Secondly, the invention provides an intelligent data query device; the device 20 can be divided into one or more modules.
For example, FIG. 5 shows a block diagram of a first embodiment of the intelligent data query device 20, in which the device 20 is divided into a web page download module 201, a web page parsing module 202, an information extraction module 203 and a file generation module 204. The specific functions of modules 201 to 204 are described below.
The web page downloading module 201 is configured to execute a preset event operation script to log in a system and download a web page of the system, where the web page is an HTML web page with a DOM tree structure.
Taking the credit investigation system as an example, the event operation script records codes simulating the operation process of manually logging in the credit investigation system and downloading the webpage of the credit investigation system, and the codes can be executed to automatically log in the credit investigation system and download the required webpage. The operation process may specifically include logging in a platform, querying data to be downloaded, downloading a webpage of a query result, and the like, and the whole operation process may involve operations of a mouse and a keyboard, which are all recorded in an event operation script in the form of codes, and the desired webpage may be automatically downloaded by running the codes.
Credit investigation information is important personal private data. If it is queried by manually logging in to the credit investigation platform, improper operation can easily lead to information leakage. Executing the event operation script logs in and downloads the information to be queried automatically, which greatly reduces the risk of data leakage and simplifies the operation flow.
The webpage parsing module 202 is configured to perform structural parsing on the webpage by using a DOM parsing manner, so as to obtain a DOM tree structure of the webpage.
The DOM tree structure of the webpage can be obtained by analyzing the tags. After the webpage is parsed into a DOM tree structure, the webpage itself corresponds to the document node, each tag corresponds to an element node, and the text content under each tag corresponds to a text node; the document node, element nodes and text nodes are related hierarchically. There are three main relationships between nodes: parent, child and sibling. These relationships are relative rather than fixed. In the DOM tree, the topmost node is called the root node, and the root node is the document node. Parent and child are relative terms: every node except the root node has a parent node. Siblings are nodes that share the same parent node, and a parent node may have any number of child nodes.
The information extraction module 203 is configured to sequentially extract information by traversing each node in the DOM tree structure.
The DOM tree structure obtained by performing the structural analysis on the web page can be regarded as a document object, so that each node in the DOM tree structure can be traversed by specifically adopting a document object method to extract information, wherein the information is the content in each text node.
The traversal starts from the root node of the DOM tree and accesses all descendant nodes under the root node in turn.
The file generating module 204 is configured to sequentially write the information into a preset document template according to the extraction sequence to generate a file.
The information is acquired in traversal order, and each piece of information is written into the document template as soon as it is acquired, until the traversal is complete. The file generation process is as follows. First, the content in the element node is extracted and it is judged whether the element node has a next-level element node; if so, the content of the text node corresponding to the element node and the sub-content of the text node corresponding to the next-level element node are acquired, the sub-content being the information. Then the sub-content is written into the editing area corresponding to the content in the document template. Finally, a file is generated from the document template into which the sub-content has been written.
As another example, FIG. 6 shows a block diagram of a second embodiment of the intelligent data query device 20. In this embodiment, the device 20 is divided into a web page download module 201, a web page parsing module 202, an information extraction module 203, a file generation module 204, and a script generation module 205.
Modules 201 to 204 are the same as in the first embodiment and are not described again here.
The script generating module 205 is configured to obtain an operation event and an operation sequence corresponding to the operation event in a process of manually logging in a system and downloading a web page of the system in an event monitoring manner; and automatically generating the operation script according to the acquired operation event and the operation sequence.
The monitoring uses the Pyhook interface to create event monitoring on the Windows operating system. Pyhook is a Python-based hook library mainly used for monitoring mouse and keyboard events on a computer. Mouse hooks receive mouse events and call back on the events of interest; as required, they can be set to receive only left-button press events, only mouse movement events, all mouse events, and so on. Keyboard hooks receive keyboard events and work in the same way as mouse hooks; only the returned information differs.
By monitoring the operations the user performs, such as logging in to the system, downloading information and saving files, the monitor returns information such as mouse coordinates, mouse buttons clicked, keys pressed, menu handles and drop-down bar handles, together with the order in which the events were executed.
The automatic generation process of the operation script is as follows:
firstly, according to the acquired operation event, calling a preset implementation method code corresponding to the type of the operation event.
Automatic script generation requires implementation method code to be set in advance for each type of operation event. For example, a right mouse button event corresponds to the code MouseRightClick(), and a keyboard input event corresponds to KeyboardInput($para).
Secondly, writing the called codes into corresponding fields of a preset script file.
When the user clicks the mouse or presses a key, Pyhook captures the operation event, and the method code corresponding to the event is written into the corresponding field of the script file. For example, when the user enters a user name and password on a webpage, Pyhook captures the operation event, JavaScript is used to obtain the DOM data of the user name and password input boxes on the webpage and from it the tag names of those input boxes, and the implementation method code for the operation event is then written, together with the input box tag names, into the corresponding field of the preset script file.
Thirdly, when the operation event contains input content, writing the input content into a corresponding field of the code.
When the user types content on the keyboard, the content is passed through $para into the corresponding field of the code for the keyboard input operation event.
Further, when the input content is derived from a picture or from characters recognized in real time (mainly the case of entering a verification code), code for calling a third-party recognition system is added to the corresponding field of the script file, and the result (the verification code) returned by the third-party recognition system is automatically backfilled into the corresponding implementation method code in the script file. In other words, the verification code recognized by the third-party system is written into the implementation method corresponding to keyboard input.
The invention further provides computer equipment.
FIG. 7 is a schematic diagram of the hardware architecture of an embodiment of the computer device according to the present invention. In the present embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing in accordance with preset or stored instructions. For example, it may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of a plurality of servers). As shown, the computer device 2 includes, but is not limited to, at least a memory 21, a processor 22, and a network interface 23 communicatively coupled to each other via a system bus. Wherein:
the memory 21 includes at least one type of computer-readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device 2. Of course, the memory 21 may also comprise both an internal storage unit of the computer device 2 and an external storage device thereof. In this embodiment, the memory 21 is generally used for storing an operating system installed in the computer device 2 and various types of application software, such as a computer program for implementing the intelligent data query method. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may, in some embodiments, be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally configured to control the overall operation of the computer device 2, for example performing control and processing related to data interaction or communication of the computer device 2. In this embodiment, the processor 22 is configured to run program code stored in the memory 21 or to process data, for example to run the computer program for implementing the intelligent data query method.
The network interface 23 may comprise a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 2 and other computer devices. For example, the network interface 23 is used to connect the computer device 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, and the like.
It is noted that Fig. 7 only shows the computer device 2 with components 21-23, but it is to be understood that not all of the shown components are required, and that more or fewer components may be implemented instead.
In this embodiment, the computer program stored in the memory 21 for implementing the intelligent data query method may be executed by one or more processors (in this embodiment, the processor 22) to perform the following steps:
step 01: acquiring, by means of event monitoring, operation events and the operation sequence corresponding to the operation events during the process of manually logging in to a system and downloading a webpage of the system;
step 02: automatically generating the event operation script according to the acquired operation events and the operation sequence;
step 10: executing a preset event operation script to log in a system and downloading a webpage of the system, wherein the webpage is an HTML webpage with a DOM tree structure;
step 20: adopting a DOM (document object model) analysis mode to perform structured analysis on the webpage to obtain a DOM tree structure of the webpage;
step 30: sequentially extracting information by traversing each node in the DOM tree structure;
step 40: and sequentially writing the information into a preset document template according to the extraction sequence to generate a file.
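Steps 20 to 40 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of the invention: the `Node`, `DomBuilder` and `extract` names are hypothetical, the HTML string stands in for a downloaded webpage, the input is assumed to be well-formed HTML without unclosed void elements such as `<br>`, and the standard-library `string.Template` stands in for the preset document template.

```python
from html.parser import HTMLParser
from string import Template


class Node:
    """An element node holding its tag, its own text and its child elements."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.text = ""
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simple element tree from well-formed HTML (step 20)."""
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # A start tag opens a new child under the current element.
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # An end tag returns to the parent element.
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def extract(node, out):
    """Depth-first traversal collecting non-empty text in document order (step 30)."""
    if node.text:
        out.append(node.text)
    for child in node.children:
        extract(child, out)
    return out


html = "<html><body><h1>Report</h1><ul><li>A</li><li>B</li></ul></body></html>"
builder = DomBuilder()
builder.feed(html)
builder.close()
items = extract(builder.root, [])

# Step 40: write the extracted information into a template in extraction order.
template = Template("Title: $title\nItems: $items")
document = template.substitute(title=items[0], items=", ".join(items[1:]))
```

The depth-first traversal visits a parent element before its children, so the extracted information keeps the order in which it appears on the webpage.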
In addition, the present invention relates to a computer-readable storage medium, which is a non-volatile readable storage medium in which a computer program is stored; the computer program can be executed by at least one processor to implement the operations of the above intelligent data query method or device.
The computer-readable storage medium includes, among others, a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the computer-readable storage medium may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. In other embodiments, the computer-readable storage medium may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the computer device. Of course, the computer-readable storage medium may also include both internal and external storage devices of the computer device. In this embodiment, the computer-readable storage medium is generally used for storing an operating system and various types of application software installed in the computer device, such as the aforementioned computer program for implementing the intelligent data query method. Further, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (10)

1. An intelligent data query method is characterized by comprising the following steps:
executing a preset event operation script to log in a system and downloading a webpage of the system, wherein the webpage is an HTML webpage with a DOM tree structure;
adopting a DOM (document object model) analysis mode to perform structured analysis on the webpage to obtain a DOM tree structure of the webpage;
sequentially extracting information by traversing each node in the DOM tree structure;
and sequentially writing the information into a preset document template according to the extraction sequence to generate a file.
2. The intelligent data query method according to claim 1, further comprising the step of automatically generating the event operation script:
acquiring, by means of event monitoring, operation events and the operation sequence corresponding to the operation events during the process of manually logging in to a system and downloading a webpage of the system;
and automatically generating the event operation script according to the acquired operation events and the operation sequence.
3. The intelligent data query method according to claim 2, wherein the automatically generating the event operation script according to the acquired operation events and the operation sequence comprises the following steps:
calling a preset implementation method code corresponding to the type of the operation event according to the acquired operation event;
writing the called codes into corresponding fields of a preset script file;
when the operation event comprises input content, writing the input content into a corresponding field of the code.
4. The intelligent data query method according to claim 3, wherein when the input content is derived from a picture or a text recognized in real time, a code for calling a third-party recognition system is added to a corresponding field of the script file, and a result returned by the third-party recognition system is automatically backfilled into a code of a corresponding implementation method in the script file.
5. The intelligent data query method of claim 1, wherein the structured parsing parses out the DOM tree structure of the webpage by obtaining the tags in the webpage and the nesting relationships and hierarchical relationships among the respective tags.
6. The intelligent data query method according to claim 5, wherein the step of sequentially writing the information into a preset document template in the extraction order to generate a file comprises the steps of:
extracting the content in the element node, judging whether the element node has a next-level element node or not, if so, acquiring the content of a text node corresponding to the element node and the sub-content of the text node corresponding to the next-level element node, wherein the sub-content is the information;
writing the sub-content into an editing area corresponding to the content in the document template;
and generating a file according to the document template written with the sub-content.
7. An intelligent data query device, comprising:
the webpage downloading module is used for executing a preset event operation script to log in a system and downloading a webpage of the system, wherein the webpage is an HTML webpage with a DOM tree structure;
the webpage analysis module is used for carrying out structural analysis on the webpage by adopting a DOM analysis mode to obtain a DOM tree structure of the webpage;
the information extraction module is used for sequentially extracting information by traversing each node in the DOM tree structure;
and the file generation module is used for sequentially writing the information into a preset document template according to the extraction sequence to generate a file.
8. The intelligent data query device according to claim 7, further comprising:
the script generation module is used for acquiring, by means of event monitoring, operation events and the operation sequence corresponding to the operation events during the process of manually logging in to the system and downloading the webpage of the system; and automatically generating the event operation script according to the acquired operation events and the operation sequence.
9. A computer device comprising a memory and a processor, characterized in that the memory has stored thereon a computer program which, when executed by the processor, carries out the steps of the intelligent data query method as claimed in any one of claims 1 to 6.
10. A computer-readable storage medium, in which a computer program is stored, the computer program being executable by at least one processor to implement the steps of the intelligent data query method as claimed in any one of claims 1 to 6.
CN201911321635.0A 2019-12-20 2019-12-20 Intelligent data query method, device, equipment and storage medium Pending CN111125598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911321635.0A CN111125598A (en) 2019-12-20 2019-12-20 Intelligent data query method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911321635.0A CN111125598A (en) 2019-12-20 2019-12-20 Intelligent data query method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111125598A true CN111125598A (en) 2020-05-08

Family

ID=70500449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911321635.0A Pending CN111125598A (en) 2019-12-20 2019-12-20 Intelligent data query method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111125598A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112099972A (en) * 2020-09-08 2020-12-18 中国平安人寿保险股份有限公司 Office file processing method, device, equipment and storage medium based on RPA robot
CN113420201A (en) * 2021-06-09 2021-09-21 湖南大学 Cross-domain element positioning and tree generating method for browser RPA system
WO2021232603A1 (en) * 2020-05-19 2021-11-25 深圳市商汤科技有限公司 Data processing method and apparatus, processor, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination