CN108416034A - Information acquisition system based on financial heterogeneous big data and control method thereof - Google Patents
Information acquisition system based on financial heterogeneous big data and control method thereof
- Publication number
- CN108416034A (application number CN201810201458.1A)
- Authority
- CN
- China
- Prior art keywords
- information
- data
- module
- rule
- financial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06F16/986—Document structures and storage, e.g. HTML extensions
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an information acquisition system based on financial heterogeneous big data and a control method thereof. The system comprises an internet information source, a Linux back-end server system, a Web client program system, and a client terminal, which are connected in sequence. The Linux back-end server system comprises a heterogeneous information collection and preprocessing module, an extraction rule generation module, and an information extraction evaluation module. The heterogeneous information collection and preprocessing module comprises a crawler URL parser, a PDF parser, a search engine retriever, an HTML parser, and a data store. The invention can fetch heterogeneous documents about financial products in real time and extract from them the data a user is interested in, which keeps the provided financial data timely and addresses the inconvenience of collecting heterogeneous information in the traditional financial field.
Description
Technical field
The present invention relates to the technical field of information acquisition systems, and in particular to an information acquisition system based on financial heterogeneous big data.
Background art
With the development of information technology, more and more financial activity takes place on the internet. The financial field has always published a large amount of information over the internet. Because the volume of information is huge, network information sources are inherently unstable, and textual expression varies markedly, financial information on the internet is still published mainly in semi-structured form. Compared with structured data, such heterogeneous information is easy to publish and collect, but it is noisy, highly redundant, and inconvenient to read and understand, so effective information extraction becomes essential.
Summary of the invention
The purpose of the present invention is to solve the problems that information collected in the existing financial field is noisy, highly redundant, and inconvenient to read and understand, by providing an information acquisition system based on financial heterogeneous big data and a control method thereof.
To achieve the above purpose, the present invention adopts the following technical solution:
An information acquisition system based on financial heterogeneous big data comprises an internet information source, a Linux back-end server system, a Web client program system, and a client terminal, characterized in that the internet information source, the Linux back-end server system, the Web client program system, and the client terminal are connected in sequence. The Linux back-end server system comprises a heterogeneous information collection and preprocessing module, an extraction rule generation module, and an information extraction evaluation module, which are connected in sequence. The heterogeneous information collection and preprocessing module comprises a crawler URL parser, a PDF parser, a search engine retriever, an HTML parser, and a data store, which are connected in sequence. The extraction rule generation module comprises a rule classification unit and a rule synthesis unit, which are connected to each other. The rule synthesis unit comprises a matcher, a comparator, and a generalizer; the matcher, the decision unit, and the generalizer are connected in sequence. The information extraction evaluation module comprises a first database, a second database, and a first data comparator; the first database and the second database are both connected to the first data comparator.
Preferably, the crawler URL parser comprises a controller module, a parsing module, and a resource library module. The parsing module comprises a web page capture unit, a web page information feature extraction unit, a web page information classification modeling unit, a data storage unit, a computer analysis unit, and a computer display unit. The web page capture unit, the web page information feature extraction unit, and the web page information classification modeling unit are connected in sequence. The web page information classification modeling unit and the data storage unit are both connected to the computer analysis unit, and the computer analysis unit is connected to the computer display unit. The computer analysis unit comprises a data extractor, a data receiver, and a second data comparator.
Preferably, the generalizer uses a rule generalization method based on a heuristic function, with the Laplacian error estimate as the heuristic function.
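The heuristic named above can be illustrated with a short sketch. The patent does not state which form of the Laplacian error estimate is intended; the form below, (e + 1) / (n + 1), is the one commonly associated with rule learners such as WHISK and is an assumption here, as are the function names `laplacian_error` and `pick_best_rule`.

```python
from typing import List, Tuple


def laplacian_error(errors: int, extractions: int) -> float:
    """Laplacian expected error of a rule (assumed form): (e + 1) / (n + 1).

    `extractions` is the number of extractions the rule makes on the
    training data and `errors` is how many of those are wrong.
    """
    if errors < 0 or extractions < 0 or errors > extractions:
        raise ValueError("need 0 <= errors <= extractions")
    return (errors + 1) / (extractions + 1)


def pick_best_rule(candidates: List[Tuple[str, int, int]]) -> str:
    """Return the name of the candidate with the lowest estimated error.

    Each candidate is a hypothetical (name, errors, extractions) triple.
    """
    best = min(candidates, key=lambda c: laplacian_error(c[1], c[2]))
    return best[0]
```

With this estimate, two error-free rules are not equally good: a rule that made nine correct extractions scores 0.1, while a rule that made only one scores 0.5, so a generalizer guided by it would tend to prefer rules with broader verified coverage.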
Preferably, the first database holds three parameters: precision, recall, and F-measure. The second database stores three preset reference values corresponding to precision, recall, and F-measure respectively.
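As a minimal sketch of how the three parameters and their reference values might be used, the following computes precision, recall, and the balanced F-measure for one extraction run and compares them with preset baselines. The balanced (F1) form, the `Scores` type, and the rule that all three values must reach their baselines are assumptions; the patent only says that the values are compared.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f_measure: float


def score(extracted: Set[str], gold: Set[str]) -> Scores:
    """Score extracted items against the annotated (gold) items."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    # Balanced F-measure (F1); assumed, since the source does not give a weighting.
    f_measure = 2 * precision * recall / total if total else 0.0
    return Scores(precision, recall, f_measure)


def meets_baseline(measured: Scores, baseline: Scores) -> bool:
    """Assumed acceptance rule: every metric must reach its reference value."""
    return (
        measured.precision >= baseline.precision
        and measured.recall >= baseline.recall
        and measured.f_measure >= baseline.f_measure
    )
```

In this reading, `measured` plays the role of the first database and `baseline` the role of the second; a run that fails `meets_baseline` would be the case where an extraction task does not meet the preset requirements.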
Preferably, the system operates according to the following steps:
Step 1: The system uses the crawler URL parser to search the internet information source for newly published financial products. When it encounters a PDF document that cannot be processed, the crawler URL parser retrieves a Web page through the search engine retriever as a substitute. The heterogeneous information collection and preprocessing module provides parsers for PDF documents and Web information, which parse the heterogeneous documents, extract text from them, and store it as data for subsequent processing.
Step 2: In the extraction rule generation module, the system generates a rule set from annotated training samples. The rule set is clustered and synthesized, and the result is imported into the final rule base.
Step 3: Through the information extraction evaluation module, the system applies the rule base to extract information from unseen data. The system runs iteratively: the heterogeneous information collection and preprocessing module continuously supplies text to the downstream modules, and when an extraction task fails to meet the preset requirements, the document is recorded and queued for the next round of heterogeneous information processing.
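The fallback in the first step (try the PDF, and substitute a retrieved Web page when the PDF cannot be processed) can be sketched as follows. Only the HTML-to-text part uses a real library, Python's standard `html.parser`. The PDF parser and the search engine retriever are passed in as hypothetical callables (`parse_pdf`, `search_web_page`), and signalling an unprocessable PDF with `ValueError` is an assumption of this sketch.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML page to plain text, one text node per line."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.chunks)


def collect_text(
    pdf_bytes: bytes,
    title: str,
    parse_pdf: Callable[[bytes], str],                  # hypothetical PDF parser
    search_web_page: Callable[[str], Optional[str]],    # hypothetical retriever
) -> Optional[str]:
    """Return plain text for one announcement, or None if nothing usable is found."""
    try:
        return parse_pdf(pdf_bytes)
    except ValueError:
        # PDF could not be processed: fall back to a Web page found by search.
        html = search_web_page(title)
        return html_to_text(html) if html is not None else None
```

Passing the parser and retriever in as callables keeps the sketch independent of any particular PDF library or search service; a document for which `collect_text` returns `None` would be a natural candidate for the record-and-retry handling described in the third step.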
Compared with the prior art, the present invention provides an information acquisition system based on financial heterogeneous big data and a control method thereof, with the following advantageous effects:
1. In this information acquisition system and its control method, the Linux back-end server system collects heterogeneous information about financial products from the internet information source and extracts structured data from it. The structured data is supplied to the Web client program system, which can analyze and study the data and provide it to the client terminal.
2. In the heterogeneous information collection and preprocessing module of this system, the crawler URL parser searches the internet information source for newly published financial announcements and resolves them into PDF document form, and the PDF parser then parses them into plain text in a processable form. When a document cannot be processed, the crawler URL parser obtains web page data through the search engine retriever, and the HTML parser parses it into plain text. Providing parsers for both PDF documents and Web information in this module makes it possible to parse many kinds of heterogeneous documents, extract structured text from them, and store it in the data store for subsequent processing.
3. In the extraction rule generation module of this system, the rule classification unit groups the rules from different documents that target the same entity, yielding a rule subset for each target. A heuristic learning method is applied to each subset: the rule synthesis unit synthesizes rules that belong to separate documents into a normal form, so that information can later be extracted from documents of unknown structure and expression. Specifically, the matcher is applied to the annotated corpus and the rule subset is matched against the training samples. The rule subsystem tries to cover the entity of the current annotated sample using the existing generalized rules. If the target can be covered, the decision unit checks whether any training samples remain: if none remain, the system has finished generating rules for the subset and forms the final rule base; if some remain, the system continues matching the rule subset against the training samples. If the generalized rules cannot cover the entity of the annotated sample, a rule generated from that sample is added to the rule subset, and the generalizer uses it to generalize the existing rules. The method learns a general rule representation from the annotated corpus, improving on the traditional approach in which domain experts formulate extraction rules.
4. In the information extraction evaluation module of this system, the first data comparator compares the three parameters in the first database (precision, recall, and F-measure) with the three preset reference values in the second database, thereby evaluating the effect of the information extraction.
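The rule generation loop described in the third effect can be sketched as a small covering algorithm. The rule representation is an assumption made only for illustration: a rule is a tuple of context tokens in which `None` is a wildcard, and two rules are generalized by keeping the tokens they share. The names `covers`, `generalize`, and `build_rule_base` are hypothetical, and the heuristic scoring of candidate generalizations is left out for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

Rule = Tuple[Optional[str], ...]  # None acts as a wildcard position


def covers(rule: Rule, context: Sequence[str]) -> bool:
    """Matcher: a rule covers a context if all non-wildcard tokens agree."""
    return len(rule) == len(context) and all(
        r is None or r == c for r, c in zip(rule, context)
    )


def generalize(a: Rule, b: Rule) -> Rule:
    """Generalizer: keep tokens both rules share, wildcard the rest."""
    return tuple(x if x == y else None for x, y in zip(a, b))


def build_rule_base(
    samples: Iterable[Tuple[str, Sequence[str]]]
) -> Dict[str, List[Rule]]:
    """Build one rule subset per target entity from annotated samples."""
    # Rule classification: group sample contexts by the entity they target.
    by_target: Dict[str, List[Rule]] = defaultdict(list)
    for target, context in samples:
        by_target[target].append(tuple(context))

    rule_base: Dict[str, List[Rule]] = {}
    for target, contexts in by_target.items():
        rules: List[Rule] = []
        for context in contexts:  # loop runs while training samples remain
            if any(covers(rule, context) for rule in rules):
                continue  # already covered by an existing generalized rule
            # Not covered: derive a rule from this sample and try to merge it.
            for i, rule in enumerate(rules):
                if len(rule) == len(context):
                    merged = generalize(rule, context)
                    if any(token is not None for token in merged):
                        rules[i] = merged
                        break
            else:
                rules.append(context)  # nothing to merge with, keep as is
        rule_base[target] = rules
    return rule_base
```

For example, the contexts `("expected", "annual", "yield")` and `("expected", "annualized", "yield")` for the same target collapse into the single rule `("expected", None, "yield")`, which then also covers later samples with either wording.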
Brief description of the drawings
Fig. 1 is a system diagram of the information acquisition system based on financial heterogeneous big data proposed by the present invention;
Fig. 2 is a system diagram of the Linux back-end server system of the proposed information acquisition system;
Fig. 3 is a system diagram of the heterogeneous information collection and preprocessing module of the proposed information acquisition system;
Fig. 4 is a system diagram of the extraction rule generation module of the proposed information acquisition system;
Fig. 5 is a system diagram of the information extraction evaluation module of the proposed information acquisition system;
Fig. 6 is a system diagram of the crawler URL parser of the proposed information acquisition system;
Fig. 7 is a system diagram of the parsing module of the proposed information acquisition system;
Fig. 8 is a system diagram of the computer analysis unit of the proposed information acquisition system;
Fig. 9 is a system diagram of the rule synthesis unit of the proposed information acquisition system;
Fig. 10 is a system diagram of the heterogeneous information processing procedure of the proposed information acquisition system;
Fig. 11 is a system diagram of the rule generation algorithm of the proposed information acquisition system.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.
Referring to Figs. 1-11, an information acquisition system based on financial heterogeneous big data and its control method comprise an internet information source, a Linux back-end server system, a Web client program system, and a client terminal, characterized in that the internet information source, the Linux back-end server system, the Web client program system, and the client terminal are connected in sequence. The Linux back-end server system comprises a heterogeneous information collection and preprocessing module, an extraction rule generation module, and an information extraction evaluation module, which are connected in sequence. The heterogeneous information collection and preprocessing module comprises a crawler URL parser, a PDF parser, a search engine retriever, an HTML parser, and a data store, which are connected in sequence. The extraction rule generation module comprises a rule classification unit and a rule synthesis unit, which are connected to each other. The rule synthesis unit comprises a matcher, a comparator, and a generalizer; the matcher, the decision unit, and the generalizer are connected in sequence. The information extraction evaluation module comprises a first database, a second database, and a first data comparator; the first database and the second database are both connected to the first data comparator.
The crawler URL parser comprises a controller module, a parsing module, and a resource library module. The parsing module comprises a web page capture unit, a web page information feature extraction unit, a web page information classification modeling unit, a data storage unit, a computer analysis unit, and a computer display unit. The web page capture unit, the web page information feature extraction unit, and the web page information classification modeling unit are connected in sequence. The web page information classification modeling unit and the data storage unit are both connected to the computer analysis unit, and the computer analysis unit is connected to the computer display unit. The computer analysis unit comprises a data extractor, a data receiver, and a second data comparator.
The generalizer uses a rule generalization method based on a heuristic function, with the Laplacian error estimate as the heuristic function.
The first database holds three parameters: precision, recall, and F-measure. The second database stores three preset reference values corresponding to precision, recall, and F-measure respectively.
The system operates according to the following steps:
Step 1: The system uses the crawler URL parser to search the internet information source for newly published financial products. When it encounters a PDF document that cannot be processed, the crawler URL parser retrieves a Web page through the search engine retriever as a substitute. The heterogeneous information collection and preprocessing module provides parsers for PDF documents and Web information, which parse the heterogeneous documents, extract text from them, and store it as data for subsequent processing.
Step 2: In the extraction rule generation module, the system generates a rule set from annotated training samples. The rule set is clustered and synthesized, and the result is imported into the final rule base.
Step 3: Through the information extraction evaluation module, the system applies the rule base to extract information from unseen data. The system runs iteratively: the heterogeneous information collection and preprocessing module continuously supplies text to the downstream modules, and when an extraction task fails to meet the preset requirements, the document is recorded and queued for the next round of heterogeneous information processing.
In use, the Linux back-end server system collects heterogeneous information about financial products from the internet information source and extracts structured data from it. Specifically, the crawler URL parser searches the internet information source for newly published financial announcements and resolves them into PDF document form, and the PDF parser then parses them into plain text in a processable form. When a document cannot be processed, the crawler URL parser obtains web page data through the search engine retriever, and the HTML parser parses it into plain text. Providing parsers for both PDF documents and Web information in the heterogeneous information collection and preprocessing module makes it possible to parse many kinds of heterogeneous documents, extract structured text from them, and store it in the data store for subsequent processing.
In the extraction rule generation module, the rule classification unit groups the rules from different documents that target the same entity, yielding a rule subset for each target. A heuristic learning method is applied to each subset: the rule synthesis unit synthesizes rules that belong to separate documents into a normal form, so that information can later be extracted from documents of unknown structure and expression. Specifically, the matcher is applied to the annotated corpus and the rule subset is matched against the training samples. The rule subsystem tries to cover the entity of the current annotated sample using the existing generalized rules. If the target can be covered, the decision unit checks whether any training samples remain: if none remain, the system has finished generating rules for the subset and forms the final rule base; if some remain, the system continues matching the rule subset against the training samples. If the generalized rules cannot cover the entity of the annotated sample, a rule generated from that sample is added to the rule subset, and the generalizer uses it to generalize the existing rules. The method learns a general rule representation from the annotated corpus, improving on the traditional approach in which domain experts formulate extraction rules.
In the information extraction evaluation module, the first data comparator compares the three parameters in the first database (precision, recall, and F-measure) with the three preset reference values in the second database, thereby evaluating the effect of the information extraction.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.
Claims (5)
1. An information acquisition system based on financial heterogeneous big data, comprising an internet information source, a Linux back-end server system, a Web client program system, and a client terminal, characterized in that: the internet information source, the Linux back-end server system, the Web client program system, and the client terminal are connected in sequence; the Linux back-end server system comprises a heterogeneous information collection and preprocessing module, an extraction rule generation module, and an information extraction evaluation module, which are connected in sequence; the heterogeneous information collection and preprocessing module comprises a crawler URL parser, a PDF parser, a search engine retriever, an HTML parser, and a data store, which are connected in sequence; the extraction rule generation module comprises a rule classification unit and a rule synthesis unit, which are connected to each other; the rule synthesis unit comprises a matcher, a comparator, and a generalizer, and the matcher, the decision unit, and the generalizer are connected in sequence; the information extraction evaluation module comprises a first database, a second database, and a first data comparator, and the first database and the second database are both connected to the first data comparator.
2. The information acquisition system based on financial heterogeneous big data according to claim 1, characterized in that: the crawler URL parser comprises a controller module, a parsing module, and a resource library module; the parsing module comprises a web page capture unit, a web page information feature extraction unit, a web page information classification modeling unit, a data storage unit, a computer analysis unit, and a computer display unit; the web page capture unit, the web page information feature extraction unit, and the web page information classification modeling unit are connected in sequence; the web page information classification modeling unit and the data storage unit are both connected to the computer analysis unit, and the computer analysis unit is connected to the computer display unit; the computer analysis unit comprises a data extractor, a data receiver, and a second data comparator.
3. The information acquisition system based on financial heterogeneous big data according to claim 1, characterized in that the generalizer uses a rule generalization method based on a heuristic function, with the Laplacian error estimate as the heuristic function.
4. The information acquisition system based on financial heterogeneous big data according to claim 1, characterized in that the first database holds three parameters, namely precision, recall, and F-measure, and the second database stores three preset reference values corresponding to precision, recall, and F-measure respectively.
5. A control method for the information acquisition system based on financial heterogeneous big data according to claim 1, characterized in that it operates according to the following steps:
Step 1: the system uses the crawler URL parser to search the internet information source for newly published financial products; when it encounters a PDF document that cannot be processed, the crawler URL parser retrieves a Web page through the search engine retriever as a substitute; the heterogeneous information collection and preprocessing module provides parsers for PDF documents and Web information, which parse the heterogeneous documents, extract text from them, and store it as data for subsequent processing;
Step 2: in the extraction rule generation module, the system generates a rule set from annotated training samples; the rule set is clustered and synthesized, and the result is imported into the final rule base;
Step 3: through the information extraction evaluation module, the system applies the rule base to extract information from unseen data; the system runs iteratively, the heterogeneous information collection and preprocessing module continuously supplies text to the downstream modules, and when an extraction task fails to meet the preset requirements, the document is recorded and queued for the next round of heterogeneous information processing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810201458.1A CN108416034B (en) | 2018-03-12 | 2018-03-12 | Information acquisition system based on financial heterogeneous big data and control method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108416034A (en) | 2018-08-17 |
CN108416034B (en) | 2021-11-16 |
Family
ID=63131071
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810201458.1A Active CN108416034B (en) | 2018-03-12 | 2018-03-12 | Information acquisition system based on financial heterogeneous big data and control method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108416034B (en) |
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080201323A1 (en) * | 2004-11-22 | 2008-08-21 | Aol Llc | Method and apparatus for a ranking engine |
CN101582075A (en) * | 2009-06-24 | 2009-11-18 | 大连海事大学 | Web information extraction system |
CN102609512A (en) * | 2012-02-07 | 2012-07-25 | 北京中机科海科技发展有限公司 | System and method for heterogeneous information mining and visual analysis |
CN102750316A (en) * | 2012-04-25 | 2012-10-24 | 北京航空航天大学 | Concept relation label drawing method based on semantic co-occurrence model |
CN102708096A (en) * | 2012-05-29 | 2012-10-03 | 代松 | Network intelligence public sentiment monitoring system based on semantics and work method thereof |
CN103049575A (en) * | 2013-01-05 | 2013-04-17 | 华中科技大学 | Topic-adaptive academic conference searching system |
CN103324761A (en) * | 2013-07-11 | 2013-09-25 | 广州市尊网商通资讯科技有限公司 | Product database forming method based on Internet data and system |
CN104794211A (en) * | 2015-04-24 | 2015-07-22 | 清华大学 | Method and system for extracting sentiment inducements and analyzing inducement elements based on microblog text |
CN104933095A (en) * | 2015-05-22 | 2015-09-23 | 中国电子科技集团公司第十研究所 | Heterogeneous information universality correlation analysis system and analysis method thereof |
CN104881488A (en) * | 2015-06-05 | 2015-09-02 | 焦点科技股份有限公司 | Relational table-based extraction method of configurable information |
CN106202044A (en) * | 2016-07-07 | 2016-12-07 | 武汉理工大学 | A kind of entity relation extraction method based on deep neural network |
CN106354843A (en) * | 2016-08-31 | 2017-01-25 | 虎扑(上海)文化传播股份有限公司 | Web crawler system and method |
CN106294885A (en) * | 2016-10-09 | 2017-01-04 | 华东师范大学 | Data collection and annotation method for heterogeneous web pages |
CN106649260A (en) * | 2016-10-19 | 2017-05-10 | 中国计量大学 | Product feature structure tree construction method based on comment text mining |
Non-Patent Citations (1)
Title |
---|
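Duan Qingling et al.: "Automatic acquisition and classification system for agricultural network information based on Web data", Transactions of the Chinese Society of Agricultural Engineering (《农业工程学报》) * |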
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109635252A (en) * | 2018-10-25 | 2019-04-16 | 北京中关村科金技术有限公司 | A kind of insurance products key message analytic method, apparatus and system based on PDF format |
CN110889632A (en) * | 2019-11-27 | 2020-03-17 | 国网能源研究院有限公司 | Data monitoring and analyzing system of company image improving system |
CN110889632B (en) * | 2019-11-27 | 2023-10-13 | 国网能源研究院有限公司 | Data monitoring and analyzing system of company image lifting system |
CN111209322A (en) * | 2019-12-26 | 2020-05-29 | 上海大智慧财汇数据科技有限公司 | Financial information acquisition and processing system and method |
CN111209322B (en) * | 2019-12-26 | 2023-12-15 | 上海大智慧财汇数据科技有限公司 | Financial information acquisition processing system and method |
CN112035837A (en) * | 2020-07-31 | 2020-12-04 | 中国人民解放军战略支援部队信息工程大学 | Malicious PDF document detection system and method based on mimicry defense |
CN112035837B (en) * | 2020-07-31 | 2023-06-20 | 中国人民解放军战略支援部队信息工程大学 | Malicious PDF document detection system and method based on mimicry defense |
CN113253659A (en) * | 2021-06-04 | 2021-08-13 | 厦门致上信息科技有限公司 | Financial big data automatic acquisition and intelligent analysis system |
Also Published As
Publication number | Publication date |
---|---|
CN108416034B (en) | 2021-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108416034A (en) | Information acquisition system based on financial heterogeneous big data and control method thereof | |
CN114238573B (en) | Text countercheck sample-based information pushing method and device | |
CN102073726B (en) | Structured data import method and device for search engine system | |
CN109886294A (en) | Knowledge fusion method, apparatus, computer equipment and storage medium | |
CN104573028A (en) | Intelligent question-answer implementing method and system | |
US20080208836A1 (en) | Regression framework for learning ranking functions using relative preferences | |
CN103226578A (en) | Method for identifying websites and finely classifying web pages in medical field | |
CN109145260A (en) | A kind of text information extraction method | |
CN102270212A (en) | User interest feature extraction method based on hidden semi-Markov model | |
CN108229170B (en) | Software analysis method and apparatus using big data and neural network | |
CN115098650B (en) | Comment information analysis method based on historical data model and related device | |
CN105069103A (en) | Method and system for APP search engine to utilize client comment | |
CN105335487A (en) | Agricultural specialist information retrieval system and method on basis of agricultural technology information ontology library | |
CN104899324A (en) | Sample training system based on IDC (internet data center) harmful information monitoring system | |
CN111666766A (en) | Data processing method, device and equipment | |
CN103310013A (en) | Subject-oriented web page collection system | |
CN102609539B (en) | Search method and search system | |
CN103530312A (en) | User identification method and system using multifaceted footprints | |
Wang et al. | Multi-modal transformer using two-level visual features for fake news detection | |
CN117743564B (en) | Automatic extraction and recommendation method and system for technological policy information | |
CN114722188A (en) | Advertisement generation method, device and equipment based on operation data and storage medium | |
CN114900346A (en) | Network security testing method and system based on knowledge graph | |
CN113918794A (en) | Enterprise network public opinion benefit analysis method, system, electronic equipment and storage medium | |
Chakraborty et al. | Clustering of web sessions by FOGSAA | |
CN105447148A (en) | Cookie identifier association method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||