CN107729006A - Page information intelligent acquisition tool and method based on the PC end - Google Patents
- Publication number
- CN107729006A (application number CN201711034890.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- page
- node
- relation
- extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/34—Graphical or visual programming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/38—Creation or generation of source code for implementing user interfaces
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention provides a page information intelligent acquisition tool and method based on the PC end. The tool comprises an extractable-data display module, a page data extraction module, a page data processing module, and a page data display and operation module. The extractable-data display module marks and displays extractable information while the user browses a page. The page data extraction module extracts data from the page and classifies it. The page data processing module merges the extracted page data. The page data display and operation module shows the extracted data and its relations on a canvas as nodes and connecting lines. The invention helps the user extract data from multiple pages and automatically merges the extracted data according to defined relations, which reduces the time the user spends analyzing extracted data and reduces workload. The extracted data is also displayed and can be edited by the user, which makes analysis easier.
Description
Technical field
The present invention relates to the field of web and Internet technology, and in particular to a page information intelligent acquisition tool and method based on the PC end.
Background
Existing page data extraction tools take the web page elements selected by the user and use a node parsing algorithm, together with the configuration parameters of the required extraction action, to pull the data out of the page and thereby extract web page information. Although all of the data is extracted, it is generally listed one item at a time and then shown to the user to browse. This has the following drawbacks:
1) The page does not clearly mark which data can be extracted and which cannot, which confuses the user.
2) The extracted information consists of scattered single items, with no association between one item and another.
3) When displayed, the extracted information cannot be modified, added to, deleted, linked by relations, grouped, annotated with remarks, and so on, which hinders browsing.
4) The extracted data cannot be captured as a screenshot, saved, exported, or supplemented with uploaded attachments, which hinders resuming browsing later and making backups.
5) The extracted data cannot be analyzed again, that is, used as an information source for a secondary analysis, which hinders narrowing down a problem and analyzing the data in depth.
Summary of the invention
Object of the invention: to overcome the deficiencies of the prior art, the present invention provides a page information intelligent acquisition tool and method based on the PC end. Extraction is fast and simple, helps the user locate problems quickly, and improves working efficiency.
Technical solution:
A page information intelligent acquisition tool based on the PC end comprises an extractable-data display module, a page data extraction module, a page data processing module, and a page data display and operation module.
The extractable-data display module marks and displays extractable information while the user browses a page, according to data type information preset on the page.
The page data extraction module extracts data from the page and classifies it according to the mark data of the different data types, obtaining classified page data.
The page data processing module judges the relations between data from the mark data of the page data extracted by the page data extraction module, and merges the extracted page data according to the preset data type information.
The page data display and operation module displays the extracted data and its relations on a canvas as nodes and connecting lines.
The tool further comprises a subsequent analysis module, which performs subsequent analysis operations on the content of multiple nodes: it lists the data that can be analyzed, runs a secondary analysis on the selected data using the analysis items freely ticked by the user, and displays the analysis result.
The preset data types comprise five classes of data: task data, case data, card data, relation and object data, and event trace data.
The marked display uses highlighting.
The relations between data are handled as follows: data belonging to the same node is merged under that node, and two data items that satisfy a relation structure are linked by a relation.
A page information intelligent acquisition method comprises the steps of:
(1) presetting data type information on the page, the data types comprising five classes of data: task data, case data, card data, relation and object data, and event trace data;
(2) while the user browses the page, marking and displaying the extractable page data according to the data type information preset on the page, the user then screening the page data to be extracted;
(3) after the user has screened the page data to be extracted, extracting data from the page and classifying it according to the mark data of the different data types in that page data, obtaining classified page data;
(4) judging the relations between data by comparing the mark data of the page data extracted in step (3), and merging the page data extracted in step (3) according to the data type information preset in step (1);
(5) displaying the merge result of step (4) on a canvas as nodes and connecting lines.
In step (5), the nodes on the canvas can be dragged freely to arrange the layout, and the nodes, the content on the nodes, the relations, and the content on the relation lines can be custom-edited.
The custom editing specifically comprises:
Manually add node: select the node type and priority level, then add the node to the relation graph; the default icon of the node is determined by its type, the border color of the node is chosen according to its priority level, and for a person node the ID card photo can be chosen as the node icon;
Manually add relation: select the relation line and relation type, and enter the related content;
Modify relation: edit the content of the relation; drag the relation line to edit its position;
Delete relation;
Create group: create a new group from one or more nodes that are outside any group;
Edit group: edit a specified group, including changing the group name and remarks;
Unbind node: unbind one or more nodes; the selected nodes are removed directly from their group;
Cancel group: cancel the specified group.
Beneficial effects: the present invention helps the user extract data from multiple pages (all extractable data is highlighted and can be recognized at a glance) and automatically merges or links the extracted data according to defined relations, which reduces the time the user spends analyzing extracted data and reduces workload. The extracted data is also displayed and can be edited by the user, which makes analysis easier. Together with the provided tool module, the user can handle the task to which the data belongs as a whole (save, add attachments, add remarks, save as, export, navigation, highlight mode, and so on). The extracted data can also be put through varied secondary analysis (subsequent operations) to pick out useful data and discard what is irrelevant. This helps the user locate problems quickly and improves working efficiency.
Brief description of the drawings
Fig. 1 is a structural diagram of the present invention.
Fig. 2 is a schematic diagram of the tool module of the present invention.
Fig. 3 is a schematic diagram of the display and operation module of the present invention.
Detailed description
The present invention is further described below with reference to the accompanying drawings.
The PC-based page information intelligent acquisition tool of the present invention combines a back-end RESTful interface, a Node.js runtime environment on the front end, and the D3.js JavaScript library for generating graphics. This links the otherwise separate front-end and back-end frameworks into a system with front-end and back-end separation, comprising:
Extractable-data display module: marks and highlights extractable information while the user browses the page and shows it to the user, which makes screening easier.
The present invention predefines a specific information format on the page that covers the common data types the user needs. The data types comprise five classes of data: task data, case data, card data, relation and object data, and event trace data. Task data is task information. Case data is case-related information. Card data is information such as ID card, phone number, and name. Event trace data represents activity information, such as being at a given place at a given time. Relation and object data represents the relations among data items, so that items can be matched to one another and relations established. Data of these types is displayed when the page is extracted. In the present invention the display uses highlighting.
Page data extraction module: after the user has screened the extractable data on the page, this module extracts and classifies the page data according to the page identifiers of the different data types, obtaining classified page data. The module also provides a "single extraction" function and a "one-click extraction" function, which extract a single data item and all of the data selected by the user, respectively.
The page identifiers are as follows:
Task data:
Case data:
<span ibox_case_id="" ibox_case_code="" ibox_case_name="" ibox_case_remark=""></span>
Card data:
Relation and object data:
<tr>
<td>
<span class="iboxExtract" srctype="15014" srcvalue="AJBH111111" ibox_case_code=xxx ibox_case_name=xxx ibox_case_remark=xxxx ibox_relation_type_src="15014" ibox_relation_value_src="AJBH111111" desttype="11097" destvalue="653221198010111752" ibox_relation_type_dest="11097" ibox_relation_value_dest="653221198010111752" relationtypes="8"></span>
</td>
<td>xxxxx</td>
<td>xxxxxxxx</td>
</tr>
Event trace data:
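The source gives a concrete sample only for the case identifier and the relation and object identifier. As a minimal sketch of how the relation and object identifier shown above could be read back into a structured record, the following JavaScript parses the tag text with a regular expression. The function names `parseAttributes` and `toRelation` and the shape of the returned object are assumptions made for illustration; only the attribute names come from the sample.

```javascript
// Parse the attributes of one identifier tag into a plain object.
// Accepts both quoted values ("15014") and unquoted ones (xxx), as in the sample.
function parseAttributes(tag) {
  const attrs = {};
  const pattern = /([A-Za-z_][\w-]*)\s*=\s*(?:"([^"]*)"|([^\s">]+))/g;
  let match;
  while ((match = pattern.exec(tag)) !== null) {
    attrs[match[1]] = match[2] !== undefined ? match[2] : match[3];
  }
  return attrs;
}

// Build a relation record from the ibox_relation_* attributes.
// The record layout (source / target / relationType) is hypothetical.
function toRelation(attrs) {
  if (!attrs.ibox_relation_type_src || !attrs.ibox_relation_type_dest) {
    return null; // not a relation identifier
  }
  return {
    source: { type: attrs.ibox_relation_type_src, value: attrs.ibox_relation_value_src },
    target: { type: attrs.ibox_relation_type_dest, value: attrs.ibox_relation_value_dest },
    relationType: attrs.relationtypes,
  };
}

const sampleTag =
  '<span class="iboxExtract" srctype="15014" srcvalue="AJBH111111" ' +
  'ibox_case_code=xxx ibox_relation_type_src="15014" ' +
  'ibox_relation_value_src="AJBH111111" ibox_relation_type_dest="11097" ' +
  'ibox_relation_value_dest="653221198010111752" relationtypes="8"></span>';

const sampleAttrs = parseAttributes(sampleTag);
const sampleRelation = toRelation(sampleAttrs);
```

In a browser the same attributes could be read from the DOM instead of from text; the regular expression is used here only so the sketch runs without a page.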
Page data processing module: merges the extracted data. Following the specific data structure, it compares the mark data of two data items to judge whether they are related and, if so, what the relation is. Data belonging to the same node is merged under that node, and two data items that satisfy a relation structure are linked by a relation. After processing by this module the page data is no longer scattered into isolated single items.
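The merge rule described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the record shape (`type`, `value`, `attrs`), the relation shape (`source`, `target`, `relationType`), and the use of `type:value` as the node key are all hypothetical choices made for the example.

```javascript
// Merge extracted records into nodes and links.
// Records with the same type and value are treated as the same node and
// their attributes are combined; each relation becomes one link.
function mergeExtracted(records, relations) {
  const nodes = new Map();
  for (const r of records) {
    const key = `${r.type}:${r.value}`;
    const node = nodes.get(key) || { id: key, type: r.type, value: r.value, attrs: {} };
    Object.assign(node.attrs, r.attrs);
    nodes.set(key, node);
  }

  const links = new Map();
  for (const rel of relations) {
    const source = `${rel.source.type}:${rel.source.value}`;
    const target = `${rel.target.type}:${rel.target.value}`;
    // Make sure both ends of a relation exist as nodes.
    for (const [id, end] of [[source, rel.source], [target, rel.target]]) {
      if (!nodes.has(id)) {
        nodes.set(id, { id, type: end.type, value: end.value, attrs: {} });
      }
    }
    // Keying by endpoints and type drops duplicate relations.
    links.set(`${source}->${target}:${rel.relationType}`, {
      source,
      target,
      relationType: rel.relationType,
    });
  }
  return { nodes: [...nodes.values()], links: [...links.values()] };
}

const merged = mergeExtracted(
  [
    { type: "15014", value: "AJBH111111", attrs: { code: "xxx" } },
    { type: "15014", value: "AJBH111111", attrs: { remark: "xxxx" } },
  ],
  [
    {
      source: { type: "15014", value: "AJBH111111" },
      target: { type: "11097", value: "653221198010111752" },
      relationType: "8",
    },
  ]
);
```

The `{ nodes, links }` result, with links that refer to nodes by `id`, is the general shape that node-link diagrams in D3.js are usually driven from, which is why it was chosen here.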
Page data display and operation module: a visualization module built on the graphics functions of D3.js. It displays the extracted data and its relations on a canvas, using nodes and connecting lines to show the specific relations between them.
The nodes on the canvas can be dragged freely to arrange the layout. The content on a node can be custom-edited, as can the content on a relation line. A grouping function is also provided for grouping nodes and adding remarks. The operations specifically comprise:
Manually add node: select the node type and priority level, then add the node to the relation graph. The default icon of the node is determined by its type, the border color of the node is chosen according to its priority level, and for a person node the ID card photo can be chosen as the node icon;
Manually add relation: select the relation line and relation type, and enter the related content;
Modify relation: choose Edit to edit the content of the relation; drag the relation line to change its position; click the Save button to submit the change;
Delete relation: choose Delete to delete the relation; click the Save button to submit the deletion;
Create group: right-click one or more nodes that are outside any group and choose "New group"; the new group is automatically given a default name ("New group" plus a number);
Edit group: edit a specified group, including changing the group name and remarks;
Unbind node: unbind one or more nodes; the selected nodes are removed directly from their group;
Cancel group: cancel the specified group.
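The grouping operations listed above (create, edit, unbind, cancel) can be modeled with a small in-memory store. The sketch below is hypothetical: the store layout and the function names are assumptions, and the default name follows the "New group" plus number rule described in the text.

```javascript
// In-memory store for node groups (hypothetical layout).
function createGroupStore() {
  return { groups: new Map(), nodeToGroup: new Map(), counter: 0 };
}

// Create a group from nodes that are not yet in any group.
function createGroup(store, nodeIds) {
  const free = nodeIds.filter((id) => !store.nodeToGroup.has(id));
  if (free.length === 0) return null;
  store.counter += 1;
  const group = {
    id: store.counter,
    name: `New group ${store.counter}`, // default name: "New group" + number
    remark: "",
    members: new Set(free),
  };
  store.groups.set(group.id, group);
  for (const id of free) store.nodeToGroup.set(id, group.id);
  return group;
}

// Edit the name and/or remark of an existing group.
function editGroup(store, groupId, changes) {
  const group = store.groups.get(groupId);
  if (!group) return false;
  if (changes.name !== undefined) group.name = changes.name;
  if (changes.remark !== undefined) group.remark = changes.remark;
  return true;
}

// Remove the given nodes from whatever group they belong to.
function unbindNodes(store, nodeIds) {
  for (const id of nodeIds) {
    const groupId = store.nodeToGroup.get(id);
    if (groupId === undefined) continue;
    store.groups.get(groupId).members.delete(id);
    store.nodeToGroup.delete(id);
  }
}

// Cancel a group and release all of its members.
function cancelGroup(store, groupId) {
  const group = store.groups.get(groupId);
  if (!group) return false;
  for (const id of group.members) store.nodeToGroup.delete(id);
  store.groups.delete(groupId);
  return true;
}
```

Keeping a reverse map from node to group makes the "outside any group" check and the unbind operation a single lookup each.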
Subsequent analysis module: performs subsequent analysis operations on the content of multiple nodes. Attributes can be selected, including name, ID card, phone, vehicle, address, and so on. The module lists the data that can be analyzed; the user freely ticks analysis items and runs a secondary analysis on the ticked data. The secondary analysis jumps to the result page of the corresponding analysis module and shows the analysis result. This chains the extraction and module analysis functions together and diversifies the processing of extracted data.
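As a rough sketch of the "list what can be analyzed, then analyze what the user ticks" flow, the following collects attribute values from the selected nodes. The attribute keys (`name`, `idCard`, `phone`, `vehicle`, `address`), the node shape, and both function names are assumptions for illustration; the source names the attributes only in prose.

```javascript
// Attributes offered for secondary analysis (keys are hypothetical).
const ANALYSIS_ATTRIBUTES = ["name", "idCard", "phone", "vehicle", "address"];

// List, per attribute, the distinct values found on the selected nodes.
function listAnalyzable(nodes) {
  const result = {};
  for (const attr of ANALYSIS_ATTRIBUTES) {
    const values = new Set();
    for (const node of nodes) {
      const value = node.attrs && node.attrs[attr];
      if (value) values.add(value);
    }
    if (values.size > 0) result[attr] = [...values];
  }
  return result;
}

// Keep only the analysis items the user ticked.
function pickForAnalysis(analyzable, chosen) {
  return chosen
    .filter((attr) => attr in analyzable)
    .map((attr) => ({ attribute: attr, values: analyzable[attr] }));
}

const analyzable = listAnalyzable([
  { attrs: { name: "A", phone: "111" } },
  { attrs: { name: "B", phone: "111" } },
  { attrs: {} },
]);
const picked = pickForAnalysis(analyzable, ["phone", "vehicle"]);
```

Each picked item would then be handed to the matching analysis module, which the source describes only as jumping to that module's result page.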
The present invention also provides a tool module, comprising:
Manually create task: the user adds a task manually and enters a task name to save it;
Create task from extracted data: after the user extracts business-module data that carries a task identifier, a task is created automatically from that data;
Modify task: when an existing task is selected and opened, its remarks can be modified and saved;
Delete task: select an existing task and delete it; the task and everything associated with it are deleted;
Refresh: loads the latest data from the database;
Add attribute by text extraction: provides a function for adding attributes through text extraction, so that custom attributes can be added;
Upload picture: supports custom upload of node pictures and relation pictures;
Upload attachment: supports custom upload of task attachments;
Save task as: the user specifies a task and saves it under another name; the original task and all of its nodes, relations, and other content are copied together and saved under the new task name;
Navigation: provides a navigation button that shows the navigation view;
Export navigation picture: provides an export button that exports the picture in the current navigation frame.
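The "save as" operation described in the list above amounts to a deep copy of a task under a new name. A minimal sketch follows, assuming a hypothetical task object with `name`, `remark`, `nodes`, and `relations` fields that contains only JSON-serializable data.

```javascript
// Copy a task together with its nodes and relations under a new name.
// A JSON round trip is used as a simple deep copy; it assumes the task
// holds only JSON-serializable values (no functions, Dates, or Maps).
function saveTaskAs(task, newName) {
  const copy = JSON.parse(JSON.stringify(task));
  copy.name = newName;
  return copy;
}

const originalTask = {
  name: "Task 1",
  remark: "",
  nodes: [{ id: "15014:AJBH111111", attrs: { code: "xxx" } }],
  relations: [],
};
const copiedTask = saveTaskAs(originalTask, "Task 1 copy");
copiedTask.nodes[0].attrs.code = "changed"; // does not touch the original
```

Because the copy shares no objects with the original, later edits to either task leave the other unchanged.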
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art can make improvements and modifications without departing from the principles of the invention, and such improvements and modifications should also be regarded as falling within the scope of protection of the invention.
Claims (8)
- 1. A page information intelligent acquisition tool based on the PC end, characterized in that it comprises an extractable-data display module, a page data extraction module, a page data processing module, and a page data display and operation module; the extractable-data display module marks and displays extractable information while the user browses a page, according to data type information preset on the page; the page data extraction module extracts data from the page and classifies it according to the mark data of the different data types, obtaining classified page data; the page data processing module judges the relations between data from the mark data of the page data extracted by the page data extraction module, and merges the extracted page data according to the preset data type information; the page data display and operation module displays the extracted data and its relations on a canvas as nodes and connecting lines.
- 2. The page information intelligent acquisition tool according to claim 1, characterized in that it further comprises a subsequent analysis module, which performs subsequent analysis operations on the content of multiple nodes, lists the data that can be analyzed, runs a secondary analysis on the selected data using the analysis items freely ticked by the user, and displays the analysis result.
- 3. The page information intelligent acquisition tool according to claim 1, characterized in that the preset data types comprise five classes of data: task data, case data, card data, relation and object data, and event trace data.
- 4. The page information intelligent acquisition tool according to claim 1, characterized in that the marked display uses highlighting.
- 5. The page information intelligent acquisition tool according to claim 1, characterized in that the relations between data are handled as follows: data belonging to the same node is merged under that node, and two data items that satisfy a relation structure are linked by a relation.
- 6. A page information intelligent acquisition method using the page information intelligent acquisition tool according to claim 1, characterized in that it comprises the steps of: (1) presetting data type information on the page, the data types comprising five classes of data: task data, case data, card data, relation and object data, and event trace data; (2) while the user browses the page, marking and displaying the extractable page data according to the data type information preset on the page, the user then screening the page data to be extracted; (3) after the user has screened the page data to be extracted, extracting data from the page and classifying it according to the mark data of the different data types in that page data, obtaining classified page data; (4) judging the relations between data by comparing the mark data of the page data extracted in step (3), and merging the page data extracted in step (3) according to the data type information preset in step (1); (5) displaying the merge result of step (4) on a canvas as nodes and connecting lines.
- 7. The page information intelligent acquisition method according to claim 6, characterized in that in step (5) the nodes on the canvas can be dragged freely to arrange the layout, and the nodes, the content on the nodes, the relations, and the content on the relation lines can be custom-edited.
- 8. The page information intelligent acquisition method according to claim 7, characterized in that the custom editing specifically comprises: manually add node: select the node type and priority level, then add the node to the relation graph, the default icon of the node being determined by its type, the border color of the node being chosen according to its priority level, and, for a person node, the ID card photo being chosen as the node icon; manually add relation: select the relation line and relation type, and enter the related content; modify relation: edit the content of the relation, and drag the relation line to edit its position; delete relation; create group: create a new group from one or more nodes that are outside any group; edit group: edit a specified group, including changing the group name and remarks; unbind node: unbind one or more nodes, the selected nodes being removed directly from their group; cancel group: cancel the specified group.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711034890.8A CN107729006B (en) | 2017-10-30 | 2017-10-30 | Page information intelligent acquisition tool and method based on pc end |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711034890.8A CN107729006B (en) | 2017-10-30 | 2017-10-30 | Page information intelligent acquisition tool and method based on pc end |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107729006A (en) | 2018-02-23 |
CN107729006B (en) | 2021-06-04 |
Family
ID=61202384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711034890.8A (Active, CN107729006B) | Page information intelligent acquisition tool and method based on pc end | 2017-10-30 | 2017-10-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107729006B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108958877A (en) * | 2018-08-15 | 2018-12-07 | 北京无线电计量测试研究所 | A kind of drafting system and method for real-time update acquisition image data |
CN109284598A (en) * | 2018-07-23 | 2019-01-29 | 深圳点猫科技有限公司 | A kind of method and electronic equipment generating electronic identity card in the education cloud platform page |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160380875A1 (en) * | 2012-10-05 | 2016-12-29 | Google Inc. | Identifying referral pages based on recorded url requests |
CN106484408A (en) * | 2016-09-29 | 2017-03-08 | 电子科技大学 | A kind of node relationships figure display methods based on HTML5 and system |
CN106789286A (en) * | 2016-12-28 | 2017-05-31 | 曙光信息产业(北京)有限公司 | The display methods and device of a kind of network topological diagram |
Non-Patent Citations (1)
Title |
---|
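丁乔毅 (Ding Qiaoyi): "Design and Implementation of a Web Information Extraction System" (Web信息抽取系统的设计与实现), China Master's Theses Full-text Database (中国优秀硕士学位论文全文数据库) * |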
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109284598A (en) * | 2018-07-23 | 2019-01-29 | 深圳点猫科技有限公司 | A kind of method and electronic equipment generating electronic identity card in the education cloud platform page |
CN109284598B (en) * | 2018-07-23 | 2020-11-10 | 深圳点猫科技有限公司 | Method for generating electronic identity card on education cloud platform page and electronic equipment |
CN108958877A (en) * | 2018-08-15 | 2018-12-07 | 北京无线电计量测试研究所 | A kind of drafting system and method for real-time update acquisition image data |
Also Published As
Publication number | Publication date |
---|---|
CN107729006B (en) | 2021-06-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||