CN107729006B - Page information intelligent acquisition tool and method based on pc end - Google Patents


Info

Publication number
CN107729006B
Authority
CN
China
Prior art keywords
data
page
node
extracted
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711034890.8A
Other languages
Chinese (zh)
Other versions
CN107729006A (en)
Inventor
张�林
高树
王立钧
徐新皎
郑跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Weishi Technology Co ltd
Original Assignee
Nanjing Weishi Technology Co ltd
Application filed by Nanjing Weishi Technology Co ltd
Priority to CN201711034890.8A
Publication of CN107729006A
Application granted
Publication of CN107729006B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/34 Graphical or visual programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/38 Creation or generation of source code for implementing user interfaces

Abstract

The invention provides a PC-side page information intelligent acquisition tool and method. The tool comprises a page extractable data display module, a page data extraction module, a page data processing module and a page data display operation module. The page extractable data display module marks the extractable information while the user browses a page; the page data extraction module extracts and classifies the page data; the page data processing module merges the extracted page data; and the page data display operation module displays the extracted data and their relationships on a canvas as nodes and connecting lines. The invention helps the user extract data from multiple pages and intelligently merges the extracted data according to their relationships, reducing the time the user spends analyzing the extracted data and lowering the workload. The displayed data can also be customized and edited by the user, which makes analysis more convenient.

Description

Page information intelligent acquisition tool and method based on pc end
Technical Field
The invention relates to the field of web and Internet technology, and in particular to a PC-side page information intelligent acquisition tool and method.
Background
Existing page data extraction tools extract page data from the web page elements selected by the user, using a node parsing algorithm together with the configuration parameters required by the corresponding extraction action. Although the data is extracted, it is typically just listed item by item and presented to the user for viewing. This has the following drawbacks:
1) The page does not clearly identify which data can be extracted and which cannot, which confuses the user.
2) The extracted information is scattered and isolated, with no correlation between items.
3) When displayed, the extracted information cannot be modified, added to or deleted, and relationships, groups and remarks cannot be added, which hinders the user's browsing.
4) Operations such as taking screenshots, saving, exporting and uploading attachments cannot be performed on the extracted data, so the user cannot conveniently back up the data or continue browsing next time.
5) The extracted data cannot be analyzed again; that is, it cannot serve as an information source for secondary analysis, so the user cannot drill down on a problem or analyze the data in depth.
Disclosure of Invention
Purpose of the invention: to overcome the shortcomings of the prior art, the invention provides a PC-side page information intelligent acquisition tool and method. Extraction is quick and simple, the tool helps the user locate problems quickly, and working efficiency is greatly improved.
The technical scheme is as follows:
A PC-side page information intelligent acquisition tool comprises a page extractable data display module, a page data extraction module, a page data processing module and a page data display operation module.
The page extractable data display module marks and displays the extractable information while the user browses the page, according to the data type information preset on the page.
The page data extraction module extracts the page data and classifies it according to the identification data of the different data types, yielding classified page data.
The page data processing module determines the relationships between the data by comparing the identification data of the page data extracted by the page data extraction module, and merges the extracted page data according to the preset data type information.
The page data display operation module displays the extracted data and their relationships on a canvas as nodes and connecting lines.
The tool further comprises a subsequent analysis module, which performs follow-up analysis on the contents of multiple nodes: it lists the analyzable data, performs secondary analysis on the selected data using analysis items freely chosen by the user, and displays the analysis result.
The preset data types comprise five types: task data, case data, card data, relationship and object data, and activity track data.
The marked display uses highlighting.
The relationship between the data is determined as follows: if the data belong to the same node, they are merged into that node; if two pieces of data satisfy the relationship structure, a relationship connection is established between them.
An intelligent page information acquisition method comprises the following steps:
(1) presetting data type information on a page, wherein the data types comprise five types: task data, case data, card data, relationship and object data, and activity track data;
(2) while the user browses the page, marking and displaying the extractable page data according to the data type information preset on the page, so that the user can select the page data to be extracted;
(3) after the user has selected the page data to be extracted, extracting the page data and classifying it according to the identification data of the different data types, to obtain classified page data;
(4) determining the relationships between the data by comparing the identification data of the page data extracted in step (3), and merging that page data according to the data type information preset in step (1);
(5) displaying the merged result of step (4) on a canvas as nodes and connecting lines.
In step (5), the nodes on the canvas can be freely dragged to arrange the layout, and the nodes, the content on the nodes, the relationships, and the content on the relationship lines can be custom edited.
The custom editing specifically comprises:
manually adding a node: selecting the node type and importance level and adding the node to the relationship graph; the default display icon of the node is determined by its type and the color of the node border by its importance level, and if the node is a person node, an identity card photo is used as the node icon;
manually adding a relationship: selecting a relationship line and a relationship type, and entering the related content;
modifying a relationship: editing the content of the relationship, and dragging the relationship line to adjust its position;
deleting a relationship;
creating a group: creating a group from one or more nodes that are not yet in a group;
editing a group: editing the designated group, including modifying the group name and remarks;
unbinding nodes: unbinding one or more selected nodes directly from their group;
cancelling a group: dissolving the designated group.
Advantageous effects: the invention helps the user extract data from multiple pages (all extractable data is highlighted and visible at a glance, so the user can easily identify it) and intelligently merges or connects the extracted data according to their relationships, reducing the time the user spends analyzing the extracted data and lowering the workload. The displayed data can be customized and edited by the user, which makes analysis more convenient. The accompanying tool module helps the user handle the task to which the data belongs as a whole (saving, adding attachments, adding remarks, saving as, exporting, navigation, highlight mode and the like). In addition, the extracted data can undergo diverse secondary analysis (follow-up operations), so that useful data is extracted and irrelevant data is removed. The tool helps the user locate problems quickly and greatly improves working efficiency.
Drawings
FIG. 1 is a schematic structural diagram of the present invention.
FIG. 2 is a schematic view of a tool module of the present invention.
FIG. 3 is a schematic diagram of a display operation module according to the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The invention relates to PC-side intelligent acquisition of page information. It combines a back-end RESTful interface, a front end running in a node.js environment, and the JavaScript graphics library D3.js, connecting otherwise separate front-end and back-end frameworks into a system with separated front and back ends. The system comprises:
The page extractable data display module: marks and highlights the extractable information while the user browses the page, so that the user can see it and select from it.
The invention predefines a specific information format on the page that covers the common data types required by the user. The data types comprise five types: task data, case data, card data, relationship and object data, and activity track data. Task data is task information; case data is case-related information; card data is information such as an identity card number, a telephone number or a person's name; activity track data represents activity information, such as being at a certain place at a certain time; relationship and object data represents the relationships among the data, so that relationships can be established by matching the data. Data of these types is marked when page extraction is performed; in the present invention, the marking is done by highlighting.
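As a rough illustration of how extractable elements might be detected before they are highlighted, the following Python sketch scans a page for tags that carry `ibox_`-prefixed attributes, as in the case data markup shown in this description. It is only a sketch under stated assumptions: the patent's front end runs in the browser, and the names `ExtractableFinder` and `find_extractable` and the sample page are hypothetical, not part of the invention.

```python
from html.parser import HTMLParser


class ExtractableFinder(HTMLParser):
    """Collect start tags that carry at least one ibox_* attribute."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, {attribute: value}) pairs

    def handle_starttag(self, tag, attrs):
        # Keep only the attributes that follow the predefined ibox_ format.
        ibox = {name: value for name, value in attrs if name.startswith("ibox_")}
        if ibox:
            self.found.append((tag, ibox))


def find_extractable(html):
    """Return every element of the page that a marker could highlight."""
    finder = ExtractableFinder()
    finder.feed(html)
    return finder.found


# Hypothetical page: one plain paragraph and one case data span.
page = (
    '<p>plain text</p>'
    '<span ibox_case_id="1" ibox_case_code="AJBH111111" '
    'ibox_case_name="demo" ibox_case_remark=""></span>'
)
marked = find_extractable(page)
print(marked)
```

Only the `span` is reported, because the paragraph carries no `ibox_` attribute; a real marker would then apply a highlight style to each reported element.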
The page data extraction module: after the user has selected from the extractable page data, the module extracts the page data and classifies it according to the page identifications of the different data types, yielding classified page data. The module provides both a single extraction function and a one-key extraction function, which respectively extract a single item of data and extract all of the user-selected data in one operation.
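The classification step and the two extraction modes can be sketched as follows. This is a minimal illustration, assuming each selected element has already been reduced to a dictionary of its attributes. The prefix table and its priority order are assumptions made for the example; only the task, case and relationship types are covered, because the markup for the other types is not reproduced in this text. The function names are hypothetical.

```python
from collections import defaultdict

# Assumed prefix table, checked in order; an element that carries both
# relation and case attributes is treated as relationship data here.
TYPE_PREFIXES = [
    ("relation", "ibox_relation_"),
    ("case", "ibox_case_"),
    ("task", "ibox_task"),
]


def classify(attrs):
    """Return the first data type whose prefix matches any attribute name."""
    for data_type, prefix in TYPE_PREFIXES:
        if any(name.startswith(prefix) for name in attrs):
            return data_type
    return None


def extract_single(attrs):
    """Single extraction: classify one user-selected element."""
    return classify(attrs), attrs


def extract_all(selected):
    """One-key extraction: classify every selected element, bucketed by type."""
    buckets = defaultdict(list)
    for attrs in selected:
        data_type = classify(attrs)
        if data_type is not None:
            buckets[data_type].append(attrs)
    return dict(buckets)


selected = [
    {"ibox_case_code": "AJBH111111", "ibox_case_name": "demo"},
    {"ibox_relation_type_src": "15014", "ibox_case_code": "AJBH111111"},
    {"title": "not extractable"},
]
single = extract_single(selected[0])
classified = extract_all(selected)
print(single[0], sorted(classified))
```

Elements that match no prefix are simply skipped by the one-key path, which mirrors the idea that only marked data can be extracted.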
The page identifications are as follows:
task data:
[Figure BDA0001450243650000041, image not reproduced]
case data:
<span ibox_case_id="" ibox_case_code="" ibox_case_name="" ibox_case_remark=""></span>
card data:
[Figure BDA0001450243650000042, image not reproduced]
relationship and object data:
<tr>
<td>
<span class="iboxExtract" srctype="15014" srcvalue="AJBH111111" ibox_case_code=xxx ibox_case_name=xxx ibox_case_remark=xxxx ibox_relation_type_src="15014" ibox_relation_value_src="AJBH111111" desttype="11097" destvalue="653221198010111752" ibox_relation_type_dest="11097" ibox_relation_value_dest="653221198010111752" relationtypes="8"></span>
</td>
<td>xxxxx</td>
<td>xxxxxxxx</td>
</tr>
activity track data:
[Figure BDA0001450243650000051, image not reproduced]
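To make the relationship and object markup above concrete, the sketch below reads a `span` of class `iboxExtract` into a small record holding the source, destination and relation type. It is an illustrative assumption about how such markup could be parsed, not the patent's implementation; the `Relation` and `RelationParser` names are hypothetical, and the sample row reuses the attribute values from the markup above.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Relation:
    src_type: str
    src_value: str
    dest_type: str
    dest_value: str
    relation_type: str


class RelationParser(HTMLParser):
    """Turn every <span class="iboxExtract"> into a Relation record."""

    def __init__(self):
        super().__init__()
        self.relations = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "span" and "iboxExtract" in (a.get("class") or "").split():
            self.relations.append(Relation(
                a.get("srctype", ""), a.get("srcvalue", ""),
                a.get("desttype", ""), a.get("destvalue", ""),
                a.get("relationtypes", ""),
            ))


row = (
    '<tr><td><span class="iboxExtract" srctype="15014" srcvalue="AJBH111111" '
    'desttype="11097" destvalue="653221198010111752" relationtypes="8"></span>'
    '</td><td>xxxxx</td></tr>'
)
parser = RelationParser()
parser.feed(row)
relation = parser.relations[0]
print(relation)
```

Each record names two endpoints by type and value, which is the information a later merge step needs to decide whether two pieces of data refer to the same node.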
The page data processing module: merges the extracted data. Following a specific data structure, it compares the identification data of two pieces of data to determine whether they are related and how: if the data belong to the same node, they are merged into that node; if two pieces of data satisfy the relationship structure, a relationship connection is established between them. After processing by the page data processing module, the page data is no longer scattered and isolated.
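The merge rule described above can be sketched as a small function. The sketch assumes that a node is identified by the pair (type, value) and that each relationship record names its two endpoints the same way; that keying, the record layout and the function name `merge` are assumptions for illustration, since the text does not spell out the data structure.

```python
def merge(records, relations):
    """Merge records that identify the same node, then connect related nodes.

    records:   dicts with "type", "value" and optional "attrs"
    relations: dicts with src/dest type and value plus "relation_type"
    """
    nodes = {}
    for rec in records:
        key = (rec["type"], rec["value"])
        # Same identification data means same node: fold the attributes together.
        nodes.setdefault(key, {}).update(rec.get("attrs", {}))

    edges = set()
    for rel in relations:
        src = (rel["src_type"], rel["src_value"])
        dest = (rel["dest_type"], rel["dest_value"])
        # Endpoints seen only in a relationship still become nodes.
        nodes.setdefault(src, {})
        nodes.setdefault(dest, {})
        edges.add((src, dest, rel["relation_type"]))
    return nodes, edges


records = [
    {"type": "11097", "value": "653221198010111752", "attrs": {"name": "demo"}},
    {"type": "11097", "value": "653221198010111752", "attrs": {"phone": "demo"}},
]
relations = [{
    "src_type": "15014", "src_value": "AJBH111111",
    "dest_type": "11097", "dest_value": "653221198010111752",
    "relation_type": "8",
}]
nodes, edges = merge(records, relations)
print(len(nodes), len(edges))
```

The two records extracted from different pages collapse into one node carrying both attributes, and the relationship record adds one edge between that node and the case node.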
The page data display operation module: a visualization module built on the D3.js graphics functions that displays the extracted data and their relationships on a canvas as nodes and connecting lines, with the specific relationship labeled on each connecting line.
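One plausible hand-off between the merged data and a D3.js front end is a JSON document with a `nodes` array and a `links` array whose `source` and `target` fields refer to node ids, a shape commonly used with D3's force layout. The patent does not state the exchange format, so the field names and the function `to_node_link` below are assumptions for illustration.

```python
import json


def to_node_link(nodes, edges):
    """Convert merged nodes and edges into a nodes/links structure."""
    def node_id(key):
        return f"{key[0]}:{key[1]}"

    return {
        "nodes": [
            {"id": node_id(key), "type": key[0], "label": key[1], "attrs": attrs}
            for key, attrs in sorted(nodes.items())
        ],
        "links": [
            {"source": node_id(src), "target": node_id(dest), "relation": rel}
            for src, dest, rel in sorted(edges)
        ],
    }


nodes = {
    ("15014", "AJBH111111"): {},
    ("11097", "653221198010111752"): {"name": "demo"},
}
edges = {(("15014", "AJBH111111"), ("11097", "653221198010111752"), "8")}
graph = to_node_link(nodes, edges)
payload = json.dumps(graph, ensure_ascii=False)
print(payload)
```

The `relation` field on each link is what a front end would render as the label on the connecting line.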
The nodes on the canvas can be freely dragged to arrange the layout. The content on the nodes and on the relationship lines can be custom edited, and a grouping function allows nodes to be grouped and annotated with remarks. Specifically:
manually adding a node: selecting the node type and importance level and adding the node to the relationship graph; the default display icon of the node is determined by its type and the color of the node border by its importance level, and if the node is a person node, an identity card photo is used as the node icon;
manually adding a relationship: selecting a relationship line and a relationship type, and entering the related content;
modifying a relationship: the content of the relationship can be edited by selecting edit, the position of the relationship can be changed by dragging the relationship line, and clicking the save button submits the modification;
deleting a relationship: the relationship can be deleted by selecting delete, and clicking the save button submits the deletion;
creating a group: one or more nodes that are not yet in a group are grouped through the right-click menu, and the new group is automatically given a default name ("new group" plus a sequence number);
editing a group: editing the designated group, including modifying the group name and remarks;
unbinding nodes: unbinding one or more selected nodes directly from their group;
cancelling a group: dissolving the designated group.
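The grouping operations listed above can be modeled with a small in-memory structure. The sketch below is only one way to express them: the class name `Canvas`, the method names, and the exact default name format `New group N` are assumptions, and the drag, icon and save-button behavior of the real canvas is out of scope.

```python
class Canvas:
    """Minimal model of node groups on the canvas."""

    def __init__(self):
        self.groups = {}   # group name -> {"remark": str, "members": set}
        self._counter = 0  # used to build default group names

    def new_group(self, node_ids):
        """Create a group from nodes that are not yet in any group."""
        grouped = set().union(*(g["members"] for g in self.groups.values()))
        already = grouped & set(node_ids)
        if already:
            raise ValueError(f"already grouped: {sorted(already)}")
        self._counter += 1
        name = f"New group {self._counter}"
        self.groups[name] = {"remark": "", "members": set(node_ids)}
        return name

    def edit_group(self, name, new_name=None, remark=None):
        """Rename a group and/or change its remark."""
        group = self.groups.pop(name)
        if remark is not None:
            group["remark"] = remark
        self.groups[new_name or name] = group

    def unbind(self, node_ids):
        """Remove the given nodes from whichever group holds them."""
        for group in self.groups.values():
            group["members"] -= set(node_ids)

    def cancel_group(self, name):
        """Dissolve a group; its nodes stay on the canvas."""
        del self.groups[name]


canvas = Canvas()
first_name = canvas.new_group(["a", "b"])
canvas.edit_group(first_name, new_name="Team A", remark="review")
canvas.unbind(["a"])
print(canvas.groups)
```

Rejecting nodes that already belong to a group reflects the rule that only nodes outside a group can be placed in a new one.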
A subsequent analysis module: performs follow-up analysis on the contents of multiple nodes. The user selects the attributes to analyze, which include name, identity card, telephone, vehicle, address and the like. The module lists the analyzable data, the user freely selects the analysis items, and secondary analysis is performed on the selected data. The secondary analysis can jump to the result page of the corresponding analysis module, where the analysis result is displayed. Extraction and module analysis are thus chained together, so that the extracted data can be processed in diverse ways.
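The first part of this module, listing the analyzable data for a set of selected nodes, might look like the sketch below. The attribute keys mirror the attributes named in the text, but their spelling as dictionary keys, the node layout and the function name `list_analyzable` are assumptions; the secondary analysis itself is left to whichever analysis module the user picks.

```python
# Attribute keys assumed for the attributes named in the description.
ANALYZABLE = ("name", "id_card", "phone", "vehicle", "address")


def list_analyzable(nodes):
    """Group the analyzable attribute values found on the selected nodes."""
    found = {}
    for node in nodes:
        for attr in ANALYZABLE:
            value = node.get(attr)
            if value:
                found.setdefault(attr, []).append(value)
    return found


selected_nodes = [
    {"name": "demo-1", "phone": "demo-phone", "color": "red"},
    {"name": "demo-2", "address": "demo-address"},
]
analyzable = list_analyzable(selected_nodes)
chosen = {item: analyzable[item] for item in ("name",)}  # the user's selection
print(analyzable, chosen)
```

Attributes outside the analyzable set, such as `color` here, are not offered for analysis.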
The invention also provides a tool module comprising:
manually creating a task: the user manually adds a task, enters the task name and saves it;
creating a task by data extraction: after the user extracts service module data associated with a task identifier, a task is created automatically from that data;
modifying a task: when an existing task is selected and opened, its remarks can be modified and saved;
deleting a task: an existing task is selected and deleted, together with all of its associations;
refreshing: loads the latest data from the database;
adding attributes to extracted text: attributes can be added to extracted text in a user-defined manner;
uploading pictures: supports user-defined uploading of node pictures and relationship pictures;
uploading attachments: supports user-defined uploading of task attachments;
saving a task as: the user designates a task and performs a save-as operation, which copies the original task together with all of its nodes, relationships and other content and saves the copy under a new task name;
navigation: provides a navigation button and displays a navigation map;
exporting the navigation picture: provides an export button that exports the picture in the current navigation frame.
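The save-as operation in the list above amounts to a deep copy of a task under a new name. The sketch below assumes tasks are kept in a dictionary keyed by task name and that each task holds its nodes and relationships as nested data; that storage layout and the function name `save_task_as` are hypothetical.

```python
import copy


def save_task_as(tasks, source_name, new_name):
    """Copy a task with all of its nodes and relationships under a new name."""
    if new_name in tasks:
        raise ValueError(f"task already exists: {new_name}")
    # A deep copy keeps later edits to the copy from changing the original.
    tasks[new_name] = copy.deepcopy(tasks[source_name])
    return tasks[new_name]


tasks = {
    "task-1": {
        "remark": "",
        "nodes": [{"id": "15014:AJBH111111"}],
        "relations": [],
    }
}
duplicate = save_task_as(tasks, "task-1", "task-1-copy")
duplicate["nodes"].append({"id": "11097:653221198010111752"})
print(len(tasks["task-1"]["nodes"]), len(tasks["task-1-copy"]["nodes"]))
```

A shallow copy would share the `nodes` list between both tasks, so an edit made in the copy would silently appear in the original as well.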
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (7)

1. A PC-side page information intelligent acquisition tool, characterized in that: it comprises a page extractable data display module, a page data extraction module, a page data processing module and a page data display operation module;
the page extractable data display module marks and displays the extractable information while the user browses the page, according to the data type information preset on the page; the preset data types comprise five types, namely task data, case data, card data, relationship and object data, and activity track data, wherein, in a page, the task data uses the ibox_task prefix, the case data uses the ibox_case prefix, the card data uses the ibox_relationship prefix, the relationship and object data use the ibox_pro_type prefix, and the activity track data uses the ibox_pro_value prefix;
the page data extraction module extracts the page data and classifies it according to the identification data of the different data types, to obtain classified page data; the page data extraction module implements a single extraction function and a one-key extraction function;
the page data processing module determines the relationships between the data by comparing the identification data of the page data extracted by the page data extraction module, and merges the extracted page data according to the preset data type information;
and the page data display operation module displays the extracted data and their relationships on a canvas as nodes and connecting lines.
2. The intelligent page information acquisition tool according to claim 1, characterized in that: it further comprises a subsequent analysis module, which performs follow-up analysis on the contents of multiple nodes, lists the analyzable data, performs secondary analysis on the selected data using analysis items freely chosen by the user, and displays the analysis result.
3. The intelligent page information acquisition tool according to claim 1, characterized in that: the marked display uses highlighting.
4. The intelligent page information acquisition tool according to claim 1, characterized in that: the relationship between the data is determined as follows: if the data belong to the same node, they are merged into that node, and if two pieces of data satisfy the relationship structure, a relationship is established between them.
5. An intelligent page information acquisition method using the intelligent page information acquisition tool according to claim 1, characterized in that the method comprises the following steps:
(1) presetting data type information on a page, wherein the data types comprise five types: task data, case data, card data, relationship and object data, and activity track data;
(2) while the user browses the page, marking and displaying the extractable page data according to the data type information preset on the page, so that the user can select the page data to be extracted;
(3) after the user has selected the page data to be extracted, extracting the page data and classifying it according to the identification data of the different data types, to obtain classified page data;
(4) determining the relationships between the data by comparing the identification data of the page data extracted in step (3), and merging that page data according to the data type information preset in step (1);
(5) displaying the merged result of step (4) on a canvas as nodes and connecting lines.
6. The intelligent page information acquisition method according to claim 5, characterized in that: in step (5), the nodes on the canvas can be freely dragged to arrange the layout, and the nodes, the content on the nodes, the relationships, and the content on the relationship lines can be custom edited.
7. The intelligent page information acquisition method according to claim 6, characterized in that: the custom editing specifically comprises:
manually adding a node: selecting the node type and importance level and adding the node to the relationship graph; the default display icon of the node is determined by its type and the color of the node border by its importance level, and if the node is a person node, an identity card photo is used as the node icon;
manually adding a relationship: selecting a relationship line and a relationship type, and entering the related content;
modifying a relationship: editing the content of the relationship, and dragging the relationship line to adjust its position;
deleting a relationship;
creating a group: creating a group from one or more nodes that are not yet in a group;
editing a group: editing the designated group, including modifying the group name and remarks;
unbinding nodes: unbinding one or more selected nodes directly from their group;
cancelling a group: dissolving the designated group.
CN201711034890.8A 2017-10-30 2017-10-30 Page information intelligent acquisition tool and method based on pc end Active CN107729006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711034890.8A CN107729006B (en) 2017-10-30 2017-10-30 Page information intelligent acquisition tool and method based on pc end

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711034890.8A CN107729006B (en) 2017-10-30 2017-10-30 Page information intelligent acquisition tool and method based on pc end

Publications (2)

Publication Number Publication Date
CN107729006A (en) 2018-02-23
CN107729006B (en) 2021-06-04

Family

ID=61202384

Country Status (1)

Country Link
CN (1) CN107729006B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284598B (en) * 2018-07-23 2020-11-10 深圳点猫科技有限公司 Method for generating electronic identity card on education cloud platform page and electronic equipment
CN108958877A (en) * 2018-08-15 2018-12-07 北京无线电计量测试研究所 A kind of drafting system and method for real-time update acquisition image data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160380875A1 (en) * 2012-10-05 2016-12-29 Google Inc. Identifying referral pages based on recorded url requests
CN106484408A (en) * 2016-09-29 2017-03-08 电子科技大学 A kind of node relationships figure display methods based on HTML5 and system
CN106789286A (en) * 2016-12-28 2017-05-31 曙光信息产业(北京)有限公司 The display methods and device of a kind of network topological diagram

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of a Web Information Extraction System; 丁乔毅; China Master's Theses Full-text Database; 2014-05-15 (No. 05); main text pp. 8-14 *

Also Published As

Publication number Publication date
CN107729006A (en) 2018-02-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant