CN108416034A - Information acquisition system based on financial heterogeneous big data and control method thereof - Google Patents

Information acquisition system based on financial heterogeneous big data and control method thereof

Info

Publication number
CN108416034A
Authority
CN
China
Prior art keywords
information
data
module
rule
financial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810201458.1A
Other languages
Chinese (zh)
Other versions
CN108416034B (en)
Inventor
孙善辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201810201458.1A
Publication of CN108416034A
Application granted
Publication of CN108416034B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information acquisition system based on financial heterogeneous big data and a control method thereof. The system comprises an internet information source, a Linux back-end server system, a Web client programming system, and a client terminal, connected in sequence. The Linux back-end server system comprises a heterogeneous information collection and preprocessing module, an extraction rule generation module, and an information extraction evaluation module. The heterogeneous information collection and preprocessing module comprises a crawler URL parser, a PDF parser, a search engine retriever, an HTML parser, and a data storage device. The invention can fetch heterogeneous documents on financial products in real time and extract from them the data that interests the user, which keeps the financial data it provides timely and solves the problem that heterogeneous information in the traditional financial field is inconvenient to collect.

Description

Information acquisition system based on financial heterogeneous big data and control method thereof
Technical field
The present invention relates to the technical field of information acquisition systems, and in particular to an information acquisition system based on financial heterogeneous big data.
Background
With the development of information technology, more and more financial activity takes place on the internet. The financial field has always published a large amount of information over the internet. The volume of information is huge, the network's information sources are not fixed, and the way text is expressed varies markedly; at present, financial information on the internet is still published mainly in semi-structured form. Compared with structured data, such heterogeneous information is easy to publish and collect, but it is noisy, highly redundant, and inconvenient to read and understand, so effective information extraction becomes essential.
Summary of the invention
The purpose of the present invention is to solve the problems that existing information collection in the financial field is noisy, highly redundant, and inconvenient to read and understand, and to that end it proposes an information acquisition system based on financial heterogeneous big data and a control method thereof.
To achieve the above purpose, the present invention adopts the following technical solution:
An information acquisition system based on financial heterogeneous big data comprises an internet information source, a Linux back-end server system, a Web client programming system, and a client terminal, characterized in that the internet information source, the Linux back-end server system, the Web client programming system, and the client terminal are connected in sequence. The Linux back-end server system comprises a heterogeneous information collection and preprocessing module, an extraction rule generation module, and an information extraction evaluation module, connected in sequence. The heterogeneous information collection and preprocessing module comprises a crawler URL parser, a PDF parser, a search engine retriever, an HTML parser, and a data storage device, connected in sequence. The extraction rule generation module comprises a rule classification unit and a rule synthesis unit, which are connected to each other. The rule synthesis unit comprises a matcher, a comparator, and a generalizer; the matcher, the determiner, and the generalizer are connected in sequence. The information extraction evaluation module comprises a first database, a second database, and a first data comparator; the first database and the second database are both connected to the first data comparator.
Preferably, the crawler URL parser comprises a controller module, a parsing module, and a resource library module. The parsing module comprises a web page capture unit, a web page information feature extraction unit, a web page information classification modeling unit, a data storage unit, a computer analysis unit, and a computer display unit. The web page capture unit, the web page information feature extraction unit, and the web page information classification modeling unit are connected in sequence; the web page information classification modeling unit and the data storage unit are both connected to the computer analysis unit; and the computer analysis unit is connected to the computer display unit. The computer analysis unit comprises a data extractor, a data receiver, and a second data comparator.
Preferably, the generalizer uses a rule generalization method based on a heuristic function, with the Laplacian error estimate as the heuristic function.
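A minimal sketch of how a Laplacian error estimate might rank candidate rules during generalization. The patent does not state the formula; the form used here, (e + 1) / (n + 1) for a rule that makes n extractions with e errors on the training set, is the one commonly associated with WHISK-style rule learners and is an assumption, as are the function names and the toy counts.

```python
def laplacian_error(extractions: int, errors: int) -> float:
    """Assumed Laplacian expected error of a rule: (e + 1) / (n + 1)."""
    if extractions < 0 or errors < 0 or errors > extractions:
        raise ValueError("need 0 <= errors <= extractions")
    return (errors + 1) / (extractions + 1)


def pick_best_rule(candidates):
    """Return the name of the candidate with the lowest Laplacian error.

    candidates maps a rule name to (extractions, errors) on the training set.
    """
    return min(candidates, key=lambda name: laplacian_error(*candidates[name]))


# Hypothetical candidates produced while generalizing one rule.
candidates = {
    "narrow_rule": (4, 0),    # few extractions, no errors
    "broad_rule": (20, 1),    # many extractions, one error
    "sloppy_rule": (20, 8),   # many extractions, many errors
}
best = pick_best_rule(candidates)
print(best)
```

Under this estimate a rule with wide coverage and a single error can outrank a rule that is error-free but covers very little, which is the usual reason for preferring it over the raw error rate.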
Preferably, the first database holds three parameters: precision, recall, and F-measure. The second database stores three preset reference values corresponding to precision, recall, and F-measure respectively.
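The comparison of precision, recall, and F-measure against preset reference values can be sketched as follows. This is an illustration only: the field names, the toy extraction results, and the reference values are assumptions, and the balanced F-measure (F1) is assumed because the patent does not specify a weighting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f_measure: float


def score(extracted: set, gold: set) -> Metrics:
    """Compute precision, recall, and balanced F-measure for one extraction run."""
    true_pos = len(extracted & gold)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return Metrics(precision, recall, f_measure)


def meets_reference(measured: Metrics, reference: Metrics) -> bool:
    """Play the role of the data comparator: every metric must reach its reference."""
    return (measured.precision >= reference.precision
            and measured.recall >= reference.recall
            and measured.f_measure >= reference.f_measure)


# Hypothetical annotated answers and system output, as (field, value) pairs.
gold = {("issuer", "Bank A"), ("rate", "4.2%"), ("term", "90 days"), ("currency", "CNY")}
extracted = {("issuer", "Bank A"), ("rate", "4.2%"), ("term", "60 days")}

measured = score(extracted, gold)               # the "first database" values
reference = Metrics(0.6, 0.5, 0.55)             # the "second database" values
ok = meets_reference(measured, reference)
print(measured, ok)
```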
Preferably, the system operates according to the following steps:
Step 1: First, the system uses the crawler URL parser to search the internet information source for newly published financial products. When it encounters a PDF document that cannot be processed, the crawler URL parser retrieves a Web page through the search engine retriever as a substitute. The heterogeneous information collection and preprocessing module provides parsers for PDF documents and Web information, which parse the heterogeneous documents, extract text information from them, and store it as data for subsequent processing.
Step 2: Next, in the extraction rule generation module, the system generates a rule set from annotated training samples; the rule set is clustered and synthesized, and the result is imported into the final rule base.
Step 3: Finally, through the information extraction evaluation module, the system applies the rule base to extract information from unseen data. The system runs iteratively: the heterogeneous information collection and preprocessing module continuously supplies text information to the subsequent modules. When an extraction task cannot meet the preset requirements, the document is recorded and the system prepares to enter the next round of heterogeneous information processing.
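The control flow of the steps above can be sketched roughly as follows. Every component here is a toy stand-in with a hypothetical name: the real parsers, search engine retriever, and rule format are not specified in the patent, and the rule base of step 2 is simply passed in ready-made.

```python
def collect_text(doc, parse_pdf, find_web_substitute, parse_html):
    """Step 1: return plain text for one document, or None if no parser succeeds."""
    if doc["type"] == "pdf":
        try:
            return parse_pdf(doc["body"])
        except ValueError:
            # Fallback: ask the search engine retriever for a Web page instead.
            page = find_web_substitute(doc["title"])
            return None if page is None else parse_html(page)
    return parse_html(doc["body"])


def run_pipeline(docs, rule_base, parse_pdf, find_web_substitute, parse_html,
                 extract, acceptable):
    """Step 3: extract from each document and record those that miss the requirement."""
    results, recorded = [], []
    for doc in docs:
        text = collect_text(doc, parse_pdf, find_web_substitute, parse_html)
        record = extract(rule_base, text) if text is not None else None
        if record is not None and acceptable(record):
            results.append(record)
        else:
            recorded.append(doc["title"])  # kept for the next processing round
    return results, recorded


# Toy stand-ins for the real components (all hypothetical).
def parse_pdf(body):
    if not body:
        raise ValueError("unparseable PDF")
    return body


def parse_html(page):
    return page.replace("<p>", "").replace("</p>", "")


def extract(rule_base, text):
    record = {}
    for field, marker in rule_base.items():
        tail = text.split(marker, 1)[1].split() if marker in text else []
        if tail:
            record[field] = tail[0]
    return record


web_pages = {"B": "<p>rate: 3.9%</p>"}
docs = [
    {"title": "A", "type": "pdf", "body": "rate: 4.2%"},
    {"title": "B", "type": "pdf", "body": ""},   # unparseable, Web substitute exists
    {"title": "C", "type": "pdf", "body": ""},   # unparseable, no substitute
    {"title": "D", "type": "html", "body": "<p>no figures here</p>"},
]
results, recorded = run_pipeline(
    docs, {"rate": "rate: "}, parse_pdf, web_pages.get, parse_html,
    extract, lambda record: "rate" in record,
)
print(results, recorded)
```

Passing the components in as plain callables keeps the sketch independent of any particular PDF or HTML library.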
Compared with the prior art, the present invention provides an information acquisition system based on financial heterogeneous big data and a control method thereof, with the following beneficial effects:
1. In this information acquisition system and its control method, the Linux back-end server system collects heterogeneous information on financial products from the internet information source and extracts structured data from it. The structured data is supplied to the Web client programming system, which can analyze and study the data and provide it to the client terminal.
2. In the heterogeneous information collection and preprocessing module, the crawler URL parser searches the internet information source for newly published financial announcements and parses them into PDF document form, and the PDF parser then converts them into plain text data in a processable form. When a document cannot be processed, the crawler URL parser obtains web page data through the search engine retriever, and the HTML parser parses it into plain text data. Because the module provides parsers for both PDF documents and Web information, it can parse many kinds of heterogeneous documents, extract structured text information from them, and store it in the data storage device for subsequent processing.
3. In the extraction rule generation module, the rule classification unit groups the rules in different documents that target the same entity, yielding a rule subset for each target. A heuristic learning method is then applied to each subset: the rule synthesis unit synthesizes the rules that belong to separate documents into a rule normal form, so that information extraction can proceed smoothly on future documents of unknown structure and expression. Specifically, the matcher is applied to the annotated corpus, and the rule subset is matched against the training samples. The rule subsystem tries to cover the entity of the annotated sample with the existing generalized rules. If the target can be covered, the determiner checks whether any training set remains: if none remains, the system completes rule generation for the rule subset and finally forms the rule base; if some remains, the system repeats the matching of the rule subset against the training samples. If the generalized rules cannot cover the entity of the annotated sample, the rule that generates that entity is added to the rule subset, and the generalizer generalizes this rule together with the existing rules. This method obtains a general rule representation from the annotated corpus, improving on the traditional approach in which domain experts must write the extraction rules.
4. In the information extraction evaluation module, the first data comparator compares the three parameters in the first database (precision, recall, and F-measure) with the three preset reference values in the second database, in order to evaluate the information extraction effect.
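The rule-generation loop described in item 3 can be sketched as a small covering algorithm. The rule representation (a tuple of context tokens with "*" as a wildcard and "<X>" marking the slot to extract) and the position-wise merge used for generalization are assumptions made for illustration; the patent does not define either, and the heuristic scoring of candidate generalizations is omitted here.

```python
WILDCARD = "*"


def matches(rule, tokens):
    """Matcher: a rule covers a sample if every position is equal or a wildcard."""
    return len(rule) == len(tokens) and all(
        r == WILDCARD or r == t for r, t in zip(rule, tokens))


def generalize(rule_a, rule_b):
    """Generalizer: keep tokens the two rules share and wildcard the rest."""
    return tuple(a if a == b else WILDCARD for a, b in zip(rule_a, rule_b))


def build_rule_base(samples):
    """samples is a list of (target_entity, context_tokens) from annotated text."""
    # Rule classification: group samples by the entity they target.
    by_target = {}
    for target, tokens in samples:
        by_target.setdefault(target, []).append(tuple(tokens))

    rule_base = {}
    for target, contexts in by_target.items():
        rules = []
        for tokens in contexts:
            if any(matches(rule, tokens) for rule in rules):
                continue  # already covered, move on to the next sample
            for i, rule in enumerate(rules):
                if len(rule) == len(tokens):
                    merged = generalize(rule, tokens)
                    if any(tok != WILDCARD for tok in merged):
                        rules[i] = merged  # generalize against an existing rule
                        break
            else:
                rules.append(tokens)  # nothing to merge with: keep the specific rule
        rule_base[target] = rules
    return rule_base


samples = [
    ("rate", ["annual", "rate", "is", "<X>"]),
    ("rate", ["expected", "rate", "is", "<X>"]),
    ("rate", ["monthly", "rate", "is", "<X>"]),
    ("term", ["term", "of", "<X>", "days"]),
]
rule_base = build_rule_base(samples)
print(rule_base)
```

In this toy run the second "rate" sample forces a generalization, after which the third sample is already covered and adds nothing new.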
Brief description of the drawings
Fig. 1 is a system diagram of the information acquisition system based on financial heterogeneous big data proposed by the present invention;
Fig. 2 is a system diagram of the Linux back-end server system of the proposed information acquisition system;
Fig. 3 is a system diagram of the heterogeneous information collection and preprocessing module of the proposed information acquisition system;
Fig. 4 is a system diagram of the extraction rule generation module of the proposed information acquisition system;
Fig. 5 is a system diagram of the information extraction evaluation module of the proposed information acquisition system;
Fig. 6 is a system diagram of the crawler URL parser of the proposed information acquisition system;
Fig. 7 is a system diagram of the parsing module of the proposed information acquisition system;
Fig. 8 is a system diagram of the computer analysis unit of the proposed information acquisition system;
Fig. 9 is a system diagram of the rule synthesis unit of the proposed information acquisition system;
Fig. 10 is a system diagram of the heterogeneous information processing procedure of the proposed information acquisition system;
Fig. 11 is a system diagram of the rule generation algorithm of the proposed information acquisition system.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are obviously only some of the embodiments of the present invention, not all of them.
Referring to Figs. 1-11, an information acquisition system based on financial heterogeneous big data and a control method thereof comprise an internet information source, a Linux back-end server system, a Web client programming system, and a client terminal, characterized in that the internet information source, the Linux back-end server system, the Web client programming system, and the client terminal are connected in sequence. The Linux back-end server system comprises a heterogeneous information collection and preprocessing module, an extraction rule generation module, and an information extraction evaluation module, connected in sequence. The heterogeneous information collection and preprocessing module comprises a crawler URL parser, a PDF parser, a search engine retriever, an HTML parser, and a data storage device, connected in sequence. The extraction rule generation module comprises a rule classification unit and a rule synthesis unit, which are connected to each other. The rule synthesis unit comprises a matcher, a comparator, and a generalizer; the matcher, the determiner, and the generalizer are connected in sequence. The information extraction evaluation module comprises a first database, a second database, and a first data comparator; the first database and the second database are both connected to the first data comparator.
The crawler URL parser comprises a controller module, a parsing module, and a resource library module. The parsing module comprises a web page capture unit, a web page information feature extraction unit, a web page information classification modeling unit, a data storage unit, a computer analysis unit, and a computer display unit. The web page capture unit, the web page information feature extraction unit, and the web page information classification modeling unit are connected in sequence; the web page information classification modeling unit and the data storage unit are both connected to the computer analysis unit; and the computer analysis unit is connected to the computer display unit. The computer analysis unit comprises a data extractor, a data receiver, and a second data comparator.
The generalizer uses a rule generalization method based on a heuristic function, with the Laplacian error estimate as the heuristic function.
The first database holds three parameters: precision, recall, and F-measure. The second database stores three preset reference values corresponding to precision, recall, and F-measure respectively.
The system operates according to the following steps:
Step 1: First, the system uses the crawler URL parser to search the internet information source for newly published financial products. When it encounters a PDF document that cannot be processed, the crawler URL parser retrieves a Web page through the search engine retriever as a substitute. The heterogeneous information collection and preprocessing module provides parsers for PDF documents and Web information, which parse the heterogeneous documents, extract text information from them, and store it as data for subsequent processing.
Step 2: Next, in the extraction rule generation module, the system generates a rule set from annotated training samples; the rule set is clustered and synthesized, and the result is imported into the final rule base.
Step 3: Finally, through the information extraction evaluation module, the system applies the rule base to extract information from unseen data. The system runs iteratively: the heterogeneous information collection and preprocessing module continuously supplies text information to the subsequent modules. When an extraction task cannot meet the preset requirements, the document is recorded and the system prepares to enter the next round of heterogeneous information processing.
In use, the Linux back-end server system collects heterogeneous information on financial products from the internet information source and extracts structured data from it. Specifically, the crawler URL parser searches the internet information source for newly published financial announcements and parses them into PDF document form, and the PDF parser then converts them into plain text data in a processable form. When a document cannot be processed, the crawler URL parser obtains web page data through the search engine retriever, and the HTML parser parses it into plain text data. Because the heterogeneous information collection and preprocessing module provides parsers for both PDF documents and Web information, it can parse many kinds of heterogeneous documents, extract structured text information from them, and store it in the data storage device for subsequent processing.
In the extraction rule generation module, the rule classification unit groups the rules in different documents that target the same entity, yielding a rule subset for each target. A heuristic learning method is applied to each subset: the rule synthesis unit synthesizes the rules that belong to separate documents into a rule normal form, so that information extraction can proceed smoothly on future documents of unknown structure and expression. Specifically, the matcher is applied to the annotated corpus, and the rule subset is matched against the training samples. The rule subsystem tries to cover the entity of the annotated sample with the existing generalized rules. If the target can be covered, the determiner checks whether any training set remains: if none remains, the system completes rule generation for the rule subset and finally forms the rule base; if some remains, the system repeats the matching of the rule subset against the training samples. If the generalized rules cannot cover the entity of the annotated sample, the rule that generates that entity is added to the rule subset, and the generalizer generalizes this rule together with the existing rules. This method obtains a general rule representation from the annotated corpus, improving on the traditional approach in which domain experts must write the extraction rules.
In the information extraction evaluation module, the first data comparator compares the three parameters in the first database (precision, recall, and F-measure) with the three preset reference values in the second database, in order to evaluate the information extraction effect.
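The HTML parsing step described above, which turns web page data into plain text, might look like the following sketch built on Python's standard `html.parser` module. The class and function names and the sample page are hypothetical, and a production parser would need to handle far messier markup.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and ignore the contents of script and style elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Notice</h1><p>Expected yield: 4.2% &amp; term: 90 days</p>"
    "<script>track()</script></body></html>"
)
plain = html_to_text(page)
print(plain)
```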
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (5)

1. An information acquisition system based on financial heterogeneous big data, comprising an internet information source, a Linux back-end server system, a Web client programming system, and a client terminal, characterized in that the internet information source, the Linux back-end server system, the Web client programming system, and the client terminal are connected in sequence; the Linux back-end server system comprises a heterogeneous information collection and preprocessing module, an extraction rule generation module, and an information extraction evaluation module, which are connected in sequence; the heterogeneous information collection and preprocessing module comprises a crawler URL parser, a PDF parser, a search engine retriever, an HTML parser, and a data storage device, which are connected in sequence; the extraction rule generation module comprises a rule classification unit and a rule synthesis unit, the rule classification unit being connected to the rule synthesis unit; the rule synthesis unit comprises a matcher, a comparator, and a generalizer, the matcher, the determiner, and the generalizer being connected in sequence; and the information extraction evaluation module comprises a first database, a second database, and a first data comparator, the first database and the second database both being connected to the first data comparator.
2. The information acquisition system based on financial heterogeneous big data according to claim 1, characterized in that the crawler URL parser comprises a controller module, a parsing module, and a resource library module; the parsing module comprises a web page capture unit, a web page information feature extraction unit, a web page information classification modeling unit, a data storage unit, a computer analysis unit, and a computer display unit; the web page capture unit, the web page information feature extraction unit, and the web page information classification modeling unit are connected in sequence; the web page information classification modeling unit and the data storage unit are both connected to the computer analysis unit; the computer analysis unit is connected to the computer display unit; and the computer analysis unit comprises a data extractor, a data receiver, and a second data comparator.
3. The information acquisition system based on financial heterogeneous big data according to claim 1, characterized in that the generalizer uses a rule generalization method based on a heuristic function, with the Laplacian error estimate as the heuristic function.
4. The information acquisition system based on financial heterogeneous big data according to claim 1, characterized in that the first database holds three parameters, namely precision, recall, and F-measure, and the second database stores three preset reference values corresponding to precision, recall, and F-measure respectively.
5. A control method of the information acquisition system based on financial heterogeneous big data according to claim 1, characterized in that it operates according to the following steps:
Step 1: First, the system uses the crawler URL parser to search the internet information source for newly published financial products; when it encounters a PDF document that cannot be processed, the crawler URL parser retrieves a Web page through the search engine retriever as a substitute; the heterogeneous information collection and preprocessing module provides parsers for PDF documents and Web information, which parse the heterogeneous documents, extract text information from them, and store it as data for subsequent processing;
Step 2: Next, in the extraction rule generation module, the system generates a rule set from annotated training samples; the rule set is clustered and synthesized, and the result is imported into the final rule base;
Step 3: Finally, through the information extraction evaluation module, the system applies the rule base to extract information from unseen data; the system runs iteratively, with the heterogeneous information collection and preprocessing module continuously supplying text information to the subsequent modules; when an extraction task cannot meet the preset requirements, the document is recorded and the system prepares to enter the next round of heterogeneous information processing.
CN201810201458.1A 2018-03-12 2018-03-12 Information acquisition system based on financial heterogeneous big data and control method thereof Active CN108416034B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810201458.1A CN108416034B (en) 2018-03-12 2018-03-12 Information acquisition system based on financial heterogeneous big data and control method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810201458.1A CN108416034B (en) 2018-03-12 2018-03-12 Information acquisition system based on financial heterogeneous big data and control method thereof

Publications (2)

Publication Number Publication Date
CN108416034A (en) 2018-08-17
CN108416034B (en) 2021-11-16

Family

ID=63131071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810201458.1A Active CN108416034B (en) 2018-03-12 2018-03-12 Information acquisition system based on financial heterogeneous big data and control method thereof

Country Status (1)

Country Link
CN (1) CN108416034B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635252A (en) * 2018-10-25 2019-04-16 北京中关村科金技术有限公司 Method, apparatus and system for parsing key information of insurance products based on PDF format
CN110889632A (en) * 2019-11-27 2020-03-17 国网能源研究院有限公司 Data monitoring and analyzing system of company image improving system
CN111209322A (en) * 2019-12-26 2020-05-29 上海大智慧财汇数据科技有限公司 Financial information acquisition and processing system and method
CN112035837A (en) * 2020-07-31 2020-12-04 中国人民解放军战略支援部队信息工程大学 Malicious PDF document detection system and method based on mimicry defense
CN113253659A (en) * 2021-06-04 2021-08-13 厦门致上信息科技有限公司 Financial big data automatic acquisition and intelligent analysis system

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080201323A1 (en) * 2004-11-22 2008-08-21 Aol Llc Method and apparatus for a ranking engine
CN101582075A (en) * 2009-06-24 2009-11-18 大连海事大学 Web information extraction system
CN102609512A (en) * 2012-02-07 2012-07-25 北京中机科海科技发展有限公司 System and method for heterogeneous information mining and visual analysis
CN102708096A (en) * 2012-05-29 2012-10-03 代松 Network intelligence public sentiment monitoring system based on semantics and work method thereof
CN102750316A (en) * 2012-04-25 2012-10-24 北京航空航天大学 Concept relation label drawing method based on semantic co-occurrence model
CN103049575A (en) * 2013-01-05 2013-04-17 华中科技大学 Topic-adaptive academic conference searching system
CN103324761A (en) * 2013-07-11 2013-09-25 广州市尊网商通资讯科技有限公司 Product database forming method based on Internet data and system
CN104794211A (en) * 2015-04-24 2015-07-22 清华大学 Method and system for extracting sentiment inducements and analyzing inducement elements based on microblog text
CN104881488A (en) * 2015-06-05 2015-09-02 焦点科技股份有限公司 Relational table-based extraction method of configurable information
CN104933095A (en) * 2015-05-22 2015-09-23 中国电子科技集团公司第十研究所 Heterogeneous information universality correlation analysis system and analysis method thereof
CN106202044A (en) * 2016-07-07 2016-12-07 武汉理工大学 An entity relation extraction method based on deep neural networks
CN106294885A (en) * 2016-10-09 2017-01-04 华东师范大学 A data collection and annotation method for heterogeneous webpages
CN106354843A (en) * 2016-08-31 2017-01-25 虎扑(上海)文化传播股份有限公司 Web crawler system and method
CN106649260A (en) * 2016-10-19 2017-05-10 中国计量大学 Product feature structure tree construction method based on comment text mining

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080201323A1 (en) * 2004-11-22 2008-08-21 Aol Llc Method and apparatus for a ranking engine
CN101582075A (en) * 2009-06-24 2009-11-18 大连海事大学 Web information extraction system
CN102609512A (en) * 2012-02-07 2012-07-25 北京中机科海科技发展有限公司 System and method for heterogeneous information mining and visual analysis
CN102750316A (en) * 2012-04-25 2012-10-24 北京航空航天大学 Concept relation label drawing method based on semantic co-occurrence model
CN102708096A (en) * 2012-05-29 2012-10-03 代松 Network intelligence public sentiment monitoring system based on semantics and work method thereof
CN103049575A (en) * 2013-01-05 2013-04-17 华中科技大学 Topic-adaptive academic conference searching system
CN103324761A (en) * 2013-07-11 2013-09-25 广州市尊网商通资讯科技有限公司 Product database forming method based on Internet data and system
CN104794211A (en) * 2015-04-24 2015-07-22 清华大学 Method and system for extracting sentiment inducements and analyzing inducement elements based on microblog text
CN104933095A (en) * 2015-05-22 2015-09-23 中国电子科技集团公司第十研究所 Heterogeneous information universality correlation analysis system and analysis method thereof
CN104881488A (en) * 2015-06-05 2015-09-02 焦点科技股份有限公司 Relational table-based extraction method of configurable information
CN106202044A (en) * 2016-07-07 2016-12-07 武汉理工大学 An entity relation extraction method based on deep neural networks
CN106354843A (en) * 2016-08-31 2017-01-25 虎扑(上海)文化传播股份有限公司 Web crawler system and method
CN106294885A (en) * 2016-10-09 2017-01-04 华东师范大学 A data collection and annotation method for heterogeneous webpages
CN106649260A (en) * 2016-10-19 2017-05-10 中国计量大学 Product feature structure tree construction method based on comment text mining

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
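Duan Qingling et al.: "Automatic acquisition and classification system for agricultural network information based on Web data", Transactions of the Chinese Society of Agricultural Engineering *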

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635252A (en) * 2018-10-25 2019-04-16 北京中关村科金技术有限公司 Method, apparatus and system for parsing key information of insurance products based on PDF format
CN110889632A (en) * 2019-11-27 2020-03-17 国网能源研究院有限公司 Data monitoring and analyzing system of company image improving system
CN110889632B (en) * 2019-11-27 2023-10-13 国网能源研究院有限公司 Data monitoring and analyzing system of company image lifting system
CN111209322A (en) * 2019-12-26 2020-05-29 上海大智慧财汇数据科技有限公司 Financial information acquisition and processing system and method
CN111209322B (en) * 2019-12-26 2023-12-15 上海大智慧财汇数据科技有限公司 Financial information acquisition processing system and method
CN112035837A (en) * 2020-07-31 2020-12-04 中国人民解放军战略支援部队信息工程大学 Malicious PDF document detection system and method based on mimicry defense
CN112035837B (en) * 2020-07-31 2023-06-20 中国人民解放军战略支援部队信息工程大学 Malicious PDF document detection system and method based on mimicry defense
CN113253659A (en) * 2021-06-04 2021-08-13 厦门致上信息科技有限公司 Financial big data automatic acquisition and intelligent analysis system

Also Published As

Publication number Publication date
CN108416034B (en) 2021-11-16

Similar Documents

Publication Publication Date Title
CN108416034A (en) Information acquisition system based on financial heterogeneous big data and control method thereof
CN114238573B (en) Text countercheck sample-based information pushing method and device
CN102073726B (en) Structured data import method and device for search engine system
CN109886294A (en) Knowledge fusion method, apparatus, computer equipment and storage medium
CN104573028A (en) Intelligent question-answer implementing method and system
US20080208836A1 (en) Regression framework for learning ranking functions using relative preferences
CN103226578A (en) Method for identifying websites and finely classifying web pages in medical field
CN109145260A (en) A text information extraction method
CN102270212A (en) User interest feature extraction method based on hidden semi-Markov model
CN108229170B (en) Software analysis method and apparatus using big data and neural network
CN115098650B (en) Comment information analysis method based on historical data model and related device
CN105069103A (en) Method and system for APP search engine to utilize client comment
CN105335487A (en) Agricultural specialist information retrieval system and method on basis of agricultural technology information ontology library
CN104899324A (en) Sample training system based on IDC (internet data center) harmful information monitoring system
CN111666766A (en) Data processing method, device and equipment
CN103310013A (en) Subject-oriented web page collection system
CN102609539B (en) Search method and search system
CN103530312A (en) User identification method and system using multifaceted footprints
Wang et al. Multi-modal transformer using two-level visual features for fake news detection
CN117743564B (en) Automatic extraction and recommendation method and system for technological policy information
CN114722188A (en) Advertisement generation method, device and equipment based on operation data and storage medium
CN114900346A (en) Network security testing method and system based on knowledge graph
CN113918794A (en) Enterprise network public opinion benefit analysis method, system, electronic equipment and storage medium
Chakraborty et al. Clustering of web sessions by FOGSAA
CN105447148A (en) Cookie identifier association method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant