CN107729006A - PC-based intelligent page information acquisition tool and method


Info

Publication number
CN107729006A
Authority
CN
China
Prior art keywords
data
page
node
relation
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711034890.8A
Other languages
Chinese (zh)
Other versions
CN107729006B (en)
Inventor
张�林
高树
王立钧
徐新皎
郑跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Weishi Technology Co Ltd
Original Assignee
Nanjing Weishi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-10-30
Filing date
2017-10-30
Publication date
2018-02-23
Application filed by Nanjing Weishi Technology Co Ltd
Priority to CN201711034890.8A
Publication of CN107729006A
Application granted
Publication of CN107729006B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/34 Graphical or visual programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/38 Creation or generation of source code for implementing user interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a PC-based intelligent page information acquisition tool and method. The tool comprises an extractable-data display module, a page data extraction module, a page data processing module, and a page data display and operation module. The extractable-data display module marks and displays extractable information while the user browses a page; the page data extraction module extracts data from the page and classifies it; the page data processing module merges the extracted page data; and the page data display and operation module shows the extracted data and their relations on a canvas as nodes and connecting lines. The invention helps the user extract data from multiple pages and automatically merges the extracted data according to defined relations, which reduces the time the user spends analyzing the data after extraction and reduces workload. The extracted data are also displayed and can be custom edited by the user, which makes analysis easier.

Description

PC-based intelligent page information acquisition tool and method
Technical field
The present invention relates to the field of web and Internet technology, and in particular to a PC-based intelligent page information acquisition tool and method.
Background
Existing page data extraction tools take the page elements selected by the user and, using a node parsing algorithm together with configuration parameters for the required extraction actions, pull the data out of the page, thereby achieving web information extraction. Although all of the data is extracted, it is typically listed one record at a time and then presented to the user to browse. These tools have the following shortcomings:
1) The page does not clearly indicate which data can be extracted and which cannot, which tends to confuse the user.
2) The extracted information consists of scattered individual records, with no association between one item and another.
3) The extracted information cannot be further modified when it is displayed; operations that would help the user browse, such as adding, deleting, establishing relations, adding groups, and adding remarks, are not available.
4) Operations on the extracted data such as taking screenshots, saving, exporting, and uploading attachments, which would let the user continue browsing later and keep backups, are not available.
5) The extracted data cannot be analyzed further; that is, it cannot be used as an information source for secondary analysis, which would help the user narrow down a problem and analyze the data in depth.
Summary of the invention
Object of the invention: To overcome the deficiencies of the prior art, the present invention provides a PC-based intelligent page information acquisition tool and method. Extraction is quick and simple, helps the user locate problems quickly, and greatly improves work efficiency.
Technical solution:
A PC-based intelligent page information acquisition tool comprises an extractable-data display module, a page data extraction module, a page data processing module, and a page data display and operation module.
The extractable-data display module marks and displays extractable information while the user browses a page, according to data type information set in advance on the page.
The page data extraction module extracts data from the page and classifies it according to the identification data of the different data types, yielding classified page data.
The page data processing module determines the relations between data items from the identification data of the page data extracted by the page data extraction module, and merges the extracted page data according to the preset data type information.
The page data display and operation module shows the extracted data and their relations on a canvas as nodes and connecting lines.
The tool further comprises a subsequent analysis module, which operates on the contents of multiple nodes: it lists the data that can be analyzed, performs secondary analysis on the selected data using the analysis items the user checks, and displays the analysis result.
The preset data types comprise five classes of data: task data, case data, card data, relation and object data, and event trace data.
The marked display uses a highlighted prompt.
The relations between data are handled as follows: data belonging to the same node are merged under that node, and if two records satisfy a relation structure, a relation link is established between them.
A page information intelligent acquisition method comprises the steps of:
(1) setting data type information on the page in advance, the data types comprising five classes of data: task data, case data, card data, relation and object data, and event trace data;
(2) while the user browses the page, marking and displaying the extractable page data according to the data type information preset on the page, the user then screening the page data to be extracted;
(3) after the user has screened the page data to be extracted, extracting data from the page and classifying it according to the identification data of the different data types in that page data, yielding classified page data;
(4) determining the relations between data by comparing the identification data of the page data extracted in step (3), and merging the page data extracted in step (3) according to the data type information preset in step (1);
(5) showing the merge result of step (4) on a canvas as nodes and connecting lines.
In step (5), the nodes on the canvas can be freely dragged for layout, and the nodes and their contents, as well as the relations and the contents on the relation lines, can be custom edited.
The custom editing specifically includes:
Manually adding a node: select the node type and importance level and add the node to the relation graph; the node's default icon is determined by its type and its border color by its importance level; for a person node, an ID card photo is selected as the node icon;
Manually adding a relation: select the relation line and relation type, and enter the related content;
Modifying a relation: edit the content of the relation, and drag the relation line to edit its position;
Deleting a relation;
Creating a group: create a new group from one or more nodes that are not in any group;
Editing a group: edit a specified group, including changing the group name and remarks;
Unbinding nodes: unbind one or more nodes; the selected nodes are removed directly from their group;
Cancelling a group: cancel the specified group.
Beneficial effects: The present invention helps the user extract data from multiple pages (all extractable data is highlighted and visible at a glance, making it easy for the user to identify) and automatically merges the extracted data or establishes links between them according to defined relations, which reduces the time the user spends analyzing the data after extraction and reduces workload. The extracted data are also displayed and can be custom edited by the user, which facilitates analysis. Together with the provided utility tool module, the user can handle the task that the data belongs to as a whole (save, add attachments, add remarks, save as, export, navigate, highlight mode, and so on). In addition, diverse secondary analysis (subsequent operations) can be performed on the extracted data to keep the useful data and discard what is irrelevant. The invention helps the user locate problems quickly and greatly improves work efficiency.
Brief description of the drawings
Fig. 1 is a structural diagram of the present invention.
Fig. 2 is a schematic diagram of the tool module of the present invention.
Fig. 3 is a schematic diagram of the display and operation module of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings.
The PC-based intelligent page information acquisition tool of the present invention combines back-end RESTful interfaces, a Node.js environment running on the front end, and D3.js, a JavaScript library for generating graphics, so that the separate front-end and back-end frameworks are linked into a system with front-end/back-end separation. The system comprises:
Extractable-data display module: marks and highlights extractable information while the user browses a page and shows it to the user, making it easy for the user to screen.
The present invention predefines a specific information format on the page that covers the data types users commonly need. There are five classes of data: task data, case data, card data, relation and object data, and event trace data. Task data is task information; case data is case-related information; card data is information such as ID card number, telephone number, and name; event trace data represents activity information, such as being at a given place at a given time; relation and object data represents the relations between data items, so that items can be matched to one another and relations established. Data of these types is displayed when the page is extracted; in the present invention it is shown with a highlighted prompt.
Page data extraction module: after the user has screened the extractable data on the page, extracts the page data and classifies it according to the page identifiers of the different data types, yielding classified page data. The module also provides a "single extraction" function and a "one-key extraction" function, for extracting a single data item and for extracting all user-selected data in one step, respectively.
The page identifiers are as follows:
Task data:
Case data:
<span ibox_case_id="" ibox_case_code="" ibox_case_name="" ibox_case_remark=""></span>
Card data:
Relation and object data:
<tr>
<td>
<span class="iboxExtract" srctype="15014" srcvalue="AJBH111111" ibox_case_code=xxx ibox_case_name=xxx ibox_case_remark=xxxx ibox_relation_type_src="15014" ibox_relation_value_src="AJBH111111" desttype="11097" destvalue="653221198010111752" ibox_relation_type_dest="11097" ibox_relation_value_dest="653221198010111752" relationtypes="8"></span>
</td>
<td>xxxxx</td>
<td>xxxxxxxx</td>
</tr>
Event trace data:
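As a rough sketch of how an extractor might read the relation identifier shown above, the following JavaScript parses the attributes of an `iboxExtract` span from an HTML string and classifies the record. The function names `parseAttributes` and `classify`, the regular-expression approach, and the classification rule (a record is a relation if it has `ibox_relation_*` attributes, otherwise a case if it has `ibox_case_*` attributes) are assumptions made for illustration. The patent does not say how identifiers are parsed, and a tool running in the browser would more likely read them through the DOM.

```javascript
// Hypothetical helper: read the attributes of the first <span ...> tag in an HTML string.
// A regular expression is used only to keep the sketch free of DOM dependencies.
function parseAttributes(html) {
  const tag = /<span\b([^>]*)>/i.exec(html);
  if (!tag) return {};
  const attrs = {};
  // Matches name="value", name='value', or unquoted name=value.
  const re = /([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
  let m;
  while ((m = re.exec(tag[1])) !== null) {
    attrs[m[1].toLowerCase()] = m[2] ?? m[3] ?? m[4] ?? "";
  }
  return attrs;
}

// Assumed classification rule; only the two identifier families visible in the
// text are handled, everything else is reported as "unknown".
function classify(attrs) {
  const names = Object.keys(attrs);
  if (names.some((n) => n.startsWith("ibox_relation_"))) return "relation";
  if (names.some((n) => n.startsWith("ibox_case_"))) return "case";
  return "unknown";
}

const sample =
  '<span class="iboxExtract" srctype="15014" srcvalue="AJBH111111" ' +
  'ibox_case_code=xxx ibox_relation_type_src="15014" ' +
  'ibox_relation_value_src="AJBH111111" ibox_relation_type_dest="11097" ' +
  'ibox_relation_value_dest="653221198010111752" relationtypes="8"></span>';

const record = parseAttributes(sample);
console.log(classify(record), record.ibox_relation_value_src); // prints: relation AJBH111111
```

Identifiers for task, card, and event trace data are not handled because their markup is not shown in the text.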
Page data processing module: merges the extracted data. Following a specific data structure, it compares the identification data in two records to determine whether a relation exists between them and, if so, which relation: data belonging to the same node are merged under that node, and if two records satisfy a relation structure, a relation link is established between them. After processing by the page data merging module, the page data is no longer a set of scattered, isolated records.
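The merge rule can be illustrated with a minimal sketch. It assumes that each extracted record carries the `srctype`/`srcvalue` and `desttype`/`destvalue` fields seen in the relation identifier, and that two records refer to the same node when their type and value match. The function name `mergeRecords`, the `type:value` key format, and the `sources` counter are hypothetical; the patent does not describe the actual data structure.

```javascript
// Hypothetical merge step: records that refer to the same (type, value) pair
// collapse into one node; each record with both ends present yields one link.
function mergeRecords(records) {
  const nodes = new Map(); // "type:value" -> node
  const links = new Map(); // "src|dest|relation" -> link, so duplicates are dropped

  const upsert = (type, value) => {
    const id = `${type}:${value}`;
    if (!nodes.has(id)) nodes.set(id, { id, type, value, sources: 0 });
    nodes.get(id).sources += 1; // how many endpoint mentions were merged into this node
    return id;
  };

  for (const r of records) {
    const src = upsert(r.srctype, r.srcvalue);
    if (r.desttype === undefined || r.destvalue === undefined) continue;
    const dest = upsert(r.desttype, r.destvalue);
    const key = `${src}|${dest}|${r.relationtypes}`;
    if (!links.has(key)) {
      links.set(key, { source: src, target: dest, relation: r.relationtypes });
    }
  }
  return { nodes: [...nodes.values()], links: [...links.values()] };
}

const graph = mergeRecords([
  { srctype: "15014", srcvalue: "AJBH111111", desttype: "11097",
    destvalue: "653221198010111752", relationtypes: "8" },
  { srctype: "15014", srcvalue: "AJBH111111" }, // same case seen on another page
]);
console.log(graph.nodes.length, graph.links.length); // prints: 2 1
```

The `{ nodes, links }` shape with `source` and `target` given as node ids was chosen because it is a form D3's force layout can consume, for example through `d3.forceLink(links).id(d => d.id)`.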
Page data display and operation module: a visualization module built on D3.js graphics functions that shows the extracted data and their relations on a canvas, using nodes and connecting lines to represent the specific relations between them.
Nodes on the canvas can be freely dragged for layout. The content on a node and the content on a relation line can both be custom edited, and a grouping function is provided for grouping nodes and adding remarks. Specifically:
Manually adding a node: select the node type and importance level and add the node to the relation graph. The node's default icon is determined by its type and its border color by its importance level; for a person node, an ID card photo can be selected as the node icon;
Manually adding a relation: select the relation line and relation type, and enter the related content;
Modifying a relation: choose Edit to edit the content of the relation; drag the relation line to change its position; click the Save button to submit the modification;
Deleting a relation: choose Delete to delete the relation; click the Save button to submit the deletion;
Creating a group: for one or more nodes that are not in any group, use the right-click menu item "New group"; a newly created group is automatically given a default name ("New group" followed by a number);
Editing a group: edit a specified group, including changing the group name and remarks;
Unbinding nodes: unbind one or more nodes; the selected nodes are removed directly from their group;
Cancelling a group: cancel the specified group.
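The grouping operations in the list above (create, edit, unbind, cancel) can be sketched as a small in-memory model. The class name `GroupManager`, the `g1`, `g2`, ... group ids, and the English default name `New group N` are assumptions for illustration. The text does not say whether a group that loses all its nodes is removed, so this sketch simply leaves it in place.

```javascript
// Hypothetical in-memory model of canvas groups; names and storage are illustrative.
class GroupManager {
  constructor() {
    this.groups = new Map(); // groupId -> { name, remark, members: Set of node ids }
    this.counter = 0; // used to number default group names
  }

  // Returns the id of the group containing nodeId, or null if the node is ungrouped.
  groupOf(nodeId) {
    for (const [groupId, group] of this.groups) {
      if (group.members.has(nodeId)) return groupId;
    }
    return null;
  }

  // Create a group from one or more ungrouped nodes, with a default name.
  createGroup(nodeIds) {
    for (const id of nodeIds) {
      if (this.groupOf(id) !== null) throw new Error(`node ${id} is already in a group`);
    }
    this.counter += 1;
    const groupId = `g${this.counter}`;
    this.groups.set(groupId, {
      name: `New group ${this.counter}`,
      remark: "",
      members: new Set(nodeIds),
    });
    return groupId;
  }

  // Edit the name and/or remark of an existing group.
  editGroup(groupId, { name, remark } = {}) {
    const group = this.groups.get(groupId);
    if (!group) throw new Error(`unknown group ${groupId}`);
    if (name !== undefined) group.name = name;
    if (remark !== undefined) group.remark = remark;
  }

  // Unbind nodes: remove each selected node from whichever group holds it.
  unbind(nodeIds) {
    for (const id of nodeIds) {
      const groupId = this.groupOf(id);
      if (groupId !== null) this.groups.get(groupId).members.delete(id);
    }
  }

  // Cancel a group: the group disappears and its nodes become ungrouped.
  cancelGroup(groupId) {
    this.groups.delete(groupId);
  }
}

const manager = new GroupManager();
const gid = manager.createGroup(["n1", "n2"]);
manager.editGroup(gid, { remark: "persons linked to the case" });
manager.unbind(["n1"]);
console.log(manager.groups.get(gid).name, manager.groupOf("n1"), manager.groupOf("n2"));
// prints: New group 1 null g1
```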
Subsequent analysis module: operates on the contents of multiple nodes. An attribute selection function is provided; the attributes include name, ID card number, telephone number, vehicle, address, and so on. The module lists the data that can be analyzed, lets the user freely check analysis items, and performs secondary analysis on the checked data. The secondary analysis jumps to the results page of the corresponding analysis module, where the analysis result is displayed. This chains extraction and module analysis together and makes the processing of extracted data more diverse.
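One way to picture the "list analyzable data, then analyze the checked items" flow is the sketch below. The attribute keys (`name`, `idCard`, `phone`, `vehicle`, `address`), the function names, and the shape of the returned request are all assumptions; the patent only names the attribute categories and does not describe the analysis modules themselves.

```javascript
// Assumed attribute keys, mirroring the categories named in the text.
const ANALYZABLE = ["name", "idCard", "phone", "vehicle", "address"];

// List, per attribute, the distinct non-empty values found on the selected nodes.
function listAnalyzable(nodes) {
  const result = {};
  for (const attr of ANALYZABLE) {
    const values = [
      ...new Set(nodes.map((n) => n[attr]).filter((v) => v !== undefined && v !== "")),
    ];
    if (values.length > 0) result[attr] = values;
  }
  return result;
}

// Keep only the items the user checked that actually have data to analyze.
function buildAnalysisRequest(nodes, checked) {
  const available = listAnalyzable(nodes);
  return checked
    .filter((attr) => Object.prototype.hasOwnProperty.call(available, attr))
    .map((attr) => ({ attribute: attr, values: available[attr] }));
}

const selected = [
  { name: "Person A", phone: "555-0100" },
  { name: "Person B", phone: "555-0100", vehicle: "" },
];
console.log(Object.keys(listAnalyzable(selected))); // prints: [ 'name', 'phone' ]
console.log(buildAnalysisRequest(selected, ["phone", "address"]).length); // prints: 1
```

In a real system the resulting request would presumably be handed to the analysis module whose results page is then opened; that hand-off is outside this sketch.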
The present invention also provides a tool module, including:
Manually creating a task: the user adds a task manually and enters a task name to save it;
Creating a task by data extraction: after the user extracts business module data associated with a task identifier, a task is created automatically from that data;
Modifying a task: after selecting and opening an existing task, the user can modify and save the task's remarks;
Deleting a task: select an existing task and delete it; everything associated with the task is deleted as well;
Refresh: loads the latest data from the database;
Adding attributes by text extraction: provides a text extraction function for adding attributes, so that attributes can be custom added;
Uploading pictures: supports custom uploading of node pictures and relation pictures;
Uploading attachments: supports custom uploading of task attachments;
Task save as: the user specifies a task and performs a save-as operation on it; the original task and all its nodes, relations, and other content are copied together and saved under a new task name;
Navigation: a navigation button is provided to display the navigation map;
Exporting the navigation picture: an export button is provided to export the picture in the current navigation frame.
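Of the tools listed above, "task save as" is the one with a clear data-copy rule: the original task and all its nodes and relations are copied and stored under a new name. A minimal sketch follows, assuming a task is a plain JSON-serializable object with `name`, `remark`, `nodes`, and `links` fields. Those field names and the function `saveTaskAs` are hypothetical, and persistence through the back-end interface is left out.

```javascript
// Hypothetical "save as": deep-copy the task so that later edits to the copy
// (nodes, relations, remarks) never touch the original task.
function saveTaskAs(task, newName) {
  if (!newName || newName === task.name) {
    throw new Error("a new, different task name is required");
  }
  // JSON round-trip is enough for plain data; it would drop functions or Dates.
  const copy = JSON.parse(JSON.stringify(task));
  copy.name = newName;
  return copy;
}

const original = { name: "Task 1", remark: "", nodes: [{ id: "n1" }], links: [] };
const copy = saveTaskAs(original, "Task 1 (copy)");
copy.nodes.push({ id: "n2" });
console.log(original.nodes.length, copy.nodes.length); // prints: 1 2
```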
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.

Claims (8)

  1. A PC-based intelligent page information acquisition tool, characterized in that it comprises an extractable-data display module, a page data extraction module, a page data processing module, and a page data display and operation module;
    the extractable-data display module marks and displays extractable information while the user browses a page, according to data type information set in advance on the page;
    the page data extraction module extracts data from the page and classifies it according to the identification data of the different data types, yielding classified page data;
    the page data processing module determines the relations between data items from the identification data of the page data extracted by the page data extraction module, and merges the extracted page data according to the preset data type information;
    the page data display and operation module shows the extracted data and their relations on a canvas as nodes and connecting lines.
  2. The page information intelligent acquisition tool according to claim 1, characterized in that it further comprises a subsequent analysis module, which operates on the contents of multiple nodes, lists the data that can be analyzed, performs secondary analysis on the selected data using the analysis items freely checked by the user, and displays the analysis result.
  3. The page information intelligent acquisition tool according to claim 1, characterized in that the preset data types comprise five classes of data: task data, case data, card data, relation and object data, and event trace data.
  4. The page information intelligent acquisition tool according to claim 1, characterized in that the marked display uses a highlighted prompt.
  5. The page information intelligent acquisition tool according to claim 1, characterized in that the relations between data are handled as follows: data belonging to the same node are merged under that node, and if two records satisfy a relation structure, a relation link is established between them.
  6. A page information intelligent acquisition method using the page information intelligent acquisition tool according to claim 1, characterized in that it comprises the steps of:
    (1) setting data type information on the page in advance, the data types comprising five classes of data: task data, case data, card data, relation and object data, and event trace data;
    (2) while the user browses the page, marking and displaying the extractable page data according to the data type information preset on the page, the user then screening the page data to be extracted;
    (3) after the user has screened the page data to be extracted, extracting data from the page and classifying it according to the identification data of the different data types in that page data, yielding classified page data;
    (4) determining the relations between data by comparing the identification data of the page data extracted in step (3), and merging the page data extracted in step (3) according to the data type information preset in step (1);
    (5) showing the merge result of step (4) on a canvas as nodes and connecting lines.
  7. The page information intelligent acquisition method according to claim 6, characterized in that in step (5) the nodes on the canvas can be freely dragged for layout, and the nodes and their contents, as well as the relations and the contents on the relation lines, can be custom edited.
  8. The page information intelligent acquisition method according to claim 7, characterized in that the custom editing specifically includes:
    manually adding a node: select the node type and importance level and add the node to the relation graph; the node's default icon is determined by its type and its border color by its importance level; for a person node, an ID card photo is selected as the node icon;
    manually adding a relation: select the relation line and relation type, and enter the related content;
    modifying a relation: edit the content of the relation, and drag the relation line to edit its position;
    deleting a relation;
    creating a group: create a new group from one or more nodes that are not in any group;
    editing a group: edit a specified group, including changing the group name and remarks;
    unbinding nodes: unbind one or more nodes; the selected nodes are removed directly from their group;
    cancelling a group: cancel the specified group.
CN201711034890.8A 2017-10-30 2017-10-30 Page information intelligent acquisition tool and method based on pc end Active CN107729006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711034890.8A CN107729006B (en) 2017-10-30 2017-10-30 Page information intelligent acquisition tool and method based on pc end

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711034890.8A CN107729006B (en) 2017-10-30 2017-10-30 Page information intelligent acquisition tool and method based on pc end

Publications (2)

Publication Number Publication Date
CN107729006A (en) 2018-02-23
CN107729006B (en) 2021-06-04

Family

ID=61202384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711034890.8A Active CN107729006B (en) 2017-10-30 2017-10-30 Page information intelligent acquisition tool and method based on pc end

Country Status (1)

Country Link
CN (1) CN107729006B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160380875A1 (en) * 2012-10-05 2016-12-29 Google Inc. Identifying referral pages based on recorded url requests
CN106484408A (en) * 2016-09-29 2017-03-08 电子科技大学 A kind of node relationships figure display methods based on HTML5 and system
CN106789286A (en) * 2016-12-28 2017-05-31 曙光信息产业(北京)有限公司 The display methods and device of a kind of network topological diagram

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
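丁乔毅 (Ding Qiaoyi): "Design and Implementation of a Web Information Extraction System" (Web信息抽取系统的设计与实现), China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》) *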

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284598A (en) * 2018-07-23 2019-01-29 深圳点猫科技有限公司 A kind of method and electronic equipment generating electronic identity card in the education cloud platform page
CN109284598B (en) * 2018-07-23 2020-11-10 深圳点猫科技有限公司 Method for generating electronic identity card on education cloud platform page and electronic equipment
CN108958877A (en) * 2018-08-15 2018-12-07 北京无线电计量测试研究所 A kind of drafting system and method for real-time update acquisition image data

Also Published As

Publication number Publication date
CN107729006B (en) 2021-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant