CN105447202A - Internet information collecting system - Google Patents

Internet information collecting system

Info

Publication number
CN105447202A
CN105447202A CN201511032832.2A CN201511032832A CN105447202A CN 105447202 A CN105447202 A CN 105447202A CN 201511032832 A CN201511032832 A CN 201511032832A CN 105447202 A CN105447202 A CN 105447202A
Authority
CN
China
Prior art keywords
information
unit
internet
module
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511032832.2A
Other languages
Chinese (zh)
Inventor
方净
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NINGBO PUBINFO INDUSTRY Co Ltd
Original Assignee
NINGBO PUBINFO INDUSTRY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NINGBO PUBINFO INDUSTRY Co Ltd
Priority to CN201511032832.2A
Publication of CN105447202A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of the internet, and in particular to an internet information collecting system. An information source recognition unit identifies, according to keywords input by a user, an information source associated with the keywords and obtains a path of the information source; an information collecting unit obtains information associated with the information source according to the path; a filtering and analyzing unit identifies and analyzes the collected information and filters out information that is not associated with the keywords; a semantic analyzing unit performs semantic parsing on the stored information; and a data analyzing unit obtains the semantically parsed information, analyzes it and obtains an analysis result. The advantage of the system is that, by identifying information sources before any information is obtained, the sources that meet the user's requirements are screened first, so the needed information can be obtained systematically and comprehensively through those sources, providing data reference and decision support for users who need it.

Description

An internet information acquisition system
Technical field
The present invention relates to the field of the internet, and in particular to an internet information acquisition system.
Background art
The spread of the internet has brought an enormous amount of information to all industries, and big data has emerged along with it. Big data (also called megadata or massive data) refers to high-volume, high-growth-rate and diversified information assets that require new processing modes in order to deliver stronger decision-making power, insight and process optimization capability.
The internet contains a very large number of websites of all sizes, and the amount of information accumulated on them is huge. This information includes a great deal of data on business opportunities, treatment and similar topics, the overwhelming majority of which is spread across forums, personal spaces, blogs and other interactive discussion spaces. The data in these interactive spaces is valuable and has considerable reference value. Enterprises, public institutions, government agencies and similar bodies also need to follow the internet public opinion expressed in these spaces, so that clients can be given timely directional analysis of internet public opinion and so that data support is available for public crisis public relations, spin and similar work. At present, however, there is no reasonably systematic and comprehensive information system that provides data reference and decision support for industry awareness.
Summary of the invention
In view of the above problems, an internet information acquisition system is provided that can obtain internet information relatively systematically and comprehensively.
The specific technical solution is as follows:
An internet information acquisition system, comprising:
an information source recognition unit, configured to identify, according to a keyword input by a user, an information source associated with the keyword and to obtain a path of the information source;
an information acquisition unit, connected to the information source recognition unit and configured to obtain information associated with the information source according to the path;
a filter analysis unit, connected to the information acquisition unit and configured to identify and analyze the collected information and to filter out information that is not associated with the keyword;
a semantic analysis unit, connected to the filter analysis unit and configured to perform semantic parsing on the stored information;
a data analysis unit, connected to the semantic analysis unit and configured to obtain the semantically parsed information, analyze it and obtain an analysis result.
Preferably, in the above internet information acquisition system, the filter analysis unit comprises:
a first identification module, configured to identify the collected information and to classify it into preset categories according to the identification result;
a filtering module, connected to the first identification module and configured to filter out information that is not associated with the keyword.
Preferably, the above internet information acquisition system comprises:
a memory management unit, connected to the filter analysis unit and configured to classify and store the filtered information and to manage the information.
Preferably, in the above internet information acquisition system, the memory management unit comprises:
a plurality of memory modules, each memory module being configured to store information of one type;
an information classification module, connected to the memory modules and configured to classify the information according to preset conditions and to store the identified information in the corresponding memory module.
Preferably, in the above internet information acquisition system, the memory management unit comprises:
an information integration module, configured to screen out repeated information from the collected information;
an information searching module, connected to the information integration module and configured to retrieve the screened information according to information input by the user.
Preferably, in the above internet information acquisition system, the semantic analysis unit comprises:
a second identification module, configured to identify the content of the stored information and to divide the identified information into language information and emotion information;
a language semantic analysis module, connected to the second identification module and configured to perform semantic parsing on the screened language information to obtain first parsed semantics;
an emotion semantic analysis module, connected to the second identification module and configured to perform semantic parsing on the screened emotion information to obtain second parsed semantics.
Preferably, the above internet information acquisition system comprises:
a policing services unit, connected to the data analysis unit and configured to supervise the obtained analysis result.
Preferably, the above internet information acquisition system comprises:
a report generation unit, connected to the data analysis unit and configured to form an analysis report in a preset format according to the analysis result.
The beneficial effect of the invention is that, by identifying information sources, the sources that meet the user's requirements are screened before any information is obtained, so that the needed information can be obtained relatively systematically and comprehensively through those sources, which in turn provides data reference and decision support for the users who need it.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure of a preferred embodiment of the internet information acquisition system of the present invention;
Figs. 2-5 are schematic diagrams of partial structures of a preferred embodiment of the internet information acquisition system of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative work fall within the scope of protection of the present invention.
It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with one another.
The invention is further described below with reference to the drawings and specific embodiments, which are not to be taken as limiting the invention.
As shown in Fig. 1, an internet information acquisition system comprises:
an information source recognition unit 1, configured to identify, according to a keyword input by a user, an information source associated with the keyword and to obtain a path of the information source;
an information acquisition unit 2, connected to the information source recognition unit 1 and configured to obtain information associated with the information source according to the path;
a filter analysis unit 3, connected to the information acquisition unit 2 and configured to identify and analyze the collected information and to filter out information that is not associated with the keyword;
a semantic analysis unit 4, connected to the filter analysis unit 3 and configured to perform semantic parsing on the stored information;
a data analysis unit 5, connected to the semantic analysis unit 4 and configured to obtain the semantically parsed information, analyze it and obtain an analysis result.
By identifying information sources from the keyword entered by the user and obtaining the information associated with those sources, the system acts as a public opinion processing system with internet data crawling, basic semantic analysis and data analysis capabilities. It provides an internet data crawling capability; a data analysis capability; a data classification capability, which allows the data to be analyzed more accurately; and a data mining capability, which allows deeper analysis of internet data.
The system is aimed at clients that need to follow internet public opinion, such as enterprises, public institutions and government agencies. It provides them with timely directional analysis of internet public opinion and with data support for public crisis public relations, spin and similar work.
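The chain of five units described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the application does not say how sources are recognized or how results are analyzed, so the in-memory `FAKE_WEB` mapping, the substring matching and the per-source count used as the "analysis result" are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Item:
    source: str  # path of the information source the item came from
    text: str    # collected text


# Hypothetical stand-in for the internet: source path -> posts found there.
FAKE_WEB = {
    "forum/food": ["The new milk brand tastes great", "Buy cheap watches now"],
    "blog/cars": ["My car broke down again"],
}


def recognize_sources(keyword, web):
    """Unit 1: return the paths of sources that mention the keyword."""
    kw = keyword.lower()
    return [path for path, posts in web.items()
            if any(kw in post.lower() for post in posts)]


def collect(paths, web):
    """Unit 2: fetch every item found under the recognized paths."""
    return [Item(path, post) for path in paths for post in web[path]]


def filter_items(items, keyword):
    """Unit 3: drop items that are not associated with the keyword."""
    kw = keyword.lower()
    return [item for item in items if kw in item.text.lower()]


def parse_semantics(items):
    """Unit 4: placeholder parsing, here plain lowercase tokenization."""
    return [(item, item.text.lower().split()) for item in items]


def analyze(parsed):
    """Unit 5: placeholder analysis, here a count of kept items per source."""
    counts = {}
    for item, _tokens in parsed:
        counts[item.source] = counts.get(item.source, 0) + 1
    return counts


def run_pipeline(keyword, web=FAKE_WEB):
    paths = recognize_sources(keyword, web)
    items = collect(paths, web)
    kept = filter_items(items, keyword)
    return analyze(parse_semantics(kept))
```

Calling `run_pipeline("milk")` visits only `forum/food`, discards the unrelated watch post, and reports one kept item for that source. The point of the ordering is that source recognition runs before collection, so unrelated sources are never fetched at all.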
In a preferred embodiment of the present invention, as shown in Fig. 2, the filter analysis unit 3 comprises:
a first identification module 301, configured to identify the collected information and to classify it into preset categories according to the identification result;
a filtering module 302, connected to the first identification module 301 and configured to filter out information that is not associated with the keyword.
With the first identification module 301 and the filtering module 302, the filter analysis unit 3 identifies the collected information and classifies it according to, for example, whether it is an advertisement; if the collected information is advertising, it is filtered out.
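The two-step behaviour of the filter analysis unit (classify first, then filter) might look like the following sketch. The advertisement markers and the substring test for keyword association are assumptions made for illustration; the application does not specify how advertisements or keyword association are detected.

```python
# Hypothetical phrases that mark a text as advertising.
AD_MARKERS = ("buy now", "discount", "click here")


def identify(text):
    """First identification module: assign one of two preset categories."""
    lowered = text.lower()
    if any(marker in lowered for marker in AD_MARKERS):
        return "advertisement"
    return "content"


def filter_collected(texts, keyword):
    """Filtering module: keep non-advertising texts that mention the keyword."""
    kw = keyword.lower()
    kept = []
    for text in texts:
        if identify(text) == "advertisement":
            continue  # advertising is filtered out regardless of keyword
        if kw not in text.lower():
            continue  # no association with the keyword
        kept.append(text)
    return kept
```

For example, given `["Milk discount, buy now!", "This milk went sour", "Nice weather"]` and the keyword `"milk"`, only `"This milk went sour"` survives: the first text is classified as advertising and the third does not mention the keyword.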
As shown in Fig. 3, in a preferred embodiment of the present invention, the system comprises:
a memory management unit, connected to the filter analysis unit 3 and configured to classify and store the filtered information and to manage the information. This unit makes it easier for the user to manage the collected information.
On the basis of the above technical solution, the memory management unit further comprises:
a plurality of memory modules 501, each memory module 501 being configured to store information of one type;
an information classification module 502, connected to the memory modules 501 and configured to classify the information according to preset conditions and to store the identified information in the corresponding memory module 501.
The information can be classified by whether it concerns, for example, food, a brand, a complaint or a suggestion, and each class is stored in its own independent memory module 501 so that the filtered information can be analyzed.
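The idea of one store per information type can be sketched as below. The category names follow the examples in the text (food, brand, complaint, suggestion); the trigger words, the first-match rule and the `"other"` fallback are assumptions introduced for the sketch, since the preset conditions themselves are not given.

```python
from collections import defaultdict

# Hypothetical preset conditions: category -> words that trigger it.
# Categories are checked in this order and the first match wins.
CATEGORY_KEYWORDS = {
    "food": ("milk", "bread", "restaurant"),
    "brand": ("brand", "logo"),
    "complaint": ("complain", "refund", "terrible"),
    "suggestion": ("suggest", "should", "recommend"),
}


def classify(text):
    """Information classification module: pick a category for the text."""
    lowered = text.lower()
    for category, words in CATEGORY_KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return "other"


class MemoryManager:
    """Keeps one independent list per information type."""

    def __init__(self):
        self.modules = defaultdict(list)

    def store(self, text):
        category = classify(text)
        self.modules[category].append(text)
        return category
```

After `manager = MemoryManager()`, calling `manager.store("I want a refund for this terrible service")` files the text under `"complaint"`, and later analysis can read `manager.modules["complaint"]` without touching the other categories.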
In a preferred embodiment of the present invention, as shown in Fig. 4, the memory management unit comprises:
an information integration module 503, configured to screen out repeated information from the collected information;
an information searching module 504, connected to the information integration module 503 and configured to retrieve the screened information according to information input by the user.
The memory management unit thus also uses the information integration module 503 to consolidate the information and screen out repeats, so that the user can retrieve information through the information searching module 504.
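A minimal sketch of screening out repeats and then searching the remainder follows. It assumes that "repeated" means identical after case and whitespace normalization, and that retrieval means every query term appears in the text; neither definition comes from the application, and near-duplicate detection is out of scope here.

```python
import hashlib


def _fingerprint(text):
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def integrate(texts):
    """Information integration module: keep the first copy of each text."""
    seen = set()
    unique = []
    for text in texts:
        fingerprint = _fingerprint(text)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(text)
    return unique


def search(texts, query):
    """Information searching module: texts containing every query term."""
    terms = query.lower().split()
    return [text for text in texts
            if all(term in text.lower() for term in terms)]
```

With `["Milk is sour", "milk  is SOUR", "Bread is fresh"]` as input, `integrate` keeps the first and third texts, and `search(..., "bread fresh")` on the result returns only `"Bread is fresh"`. Deduplicating before search keeps repeated posts from crowding the results.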
In a preferred embodiment of the present invention, as shown in Fig. 5, the semantic analysis unit 4 comprises:
a second identification module 401, configured to identify the content of the stored information and to divide the identified information into language information and emotion information;
a language semantic analysis module 402, connected to the second identification module 401 and configured to perform semantic parsing on the screened language information to obtain first parsed semantics;
an emotion semantic analysis module 403, connected to the second identification module 401 and configured to perform semantic parsing on the screened emotion information to obtain second parsed semantics.
The integrated information is semantically analyzed by the semantic analysis unit 4: the second identification module 401 identifies the stored information and divides it into language information and emotion information, from which the first parsed semantics and the second parsed semantics are obtained. The user can then mine the data according to the parsed semantics to obtain the commercially valuable information the user needs.
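One way to picture the split into language information and emotion information is a tiny lexicon-based sketch. Everything specific here is an assumption: the word lists, the choice of topic words as the "first parsed semantics" and a polarity score as the "second parsed semantics" are illustrative only, and the application does not describe the actual parsing method.

```python
# Hypothetical lexicons; a real system would use far richer resources.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "terrible", "hate"}
STOPWORDS = {"the", "is", "a", "i", "this"}


def split_information(text):
    """Second identification module: separate emotion words from the rest."""
    tokens = [token.strip(".,!?").lower() for token in text.split()]
    tokens = [token for token in tokens if token]
    emotion = [t for t in tokens if t in POSITIVE or t in NEGATIVE]
    language = [t for t in tokens if t not in POSITIVE and t not in NEGATIVE]
    return language, emotion


def parse_language(language_tokens):
    """Language semantic analysis: keep topic words (first parsed semantics)."""
    return [t for t in language_tokens if t not in STOPWORDS]


def parse_emotion(emotion_tokens):
    """Emotion semantic analysis: polarity score (second parsed semantics)."""
    return sum(1 if t in POSITIVE else -1 for t in emotion_tokens)
```

For the sentence `"I love this milk, the taste is great!"`, the split yields the emotion words `love` and `great`; the topic words are `milk` and `taste`, and the polarity score is 2. Keeping the two results separate lets later analysis ask what a post is about and how its author feels as independent questions.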
In a preferred embodiment of the present invention, the system comprises:
a policing services unit, connected to the data analysis unit 5 and configured to supervise the obtained analysis result.
In a preferred embodiment of the present invention, the system comprises:
a report generation unit, connected to the data analysis unit 5 and configured to form an analysis report in a preset format according to the analysis result. This lets the user obtain, at a glance, a business analysis report associated with the keyword and use it to make business decisions.
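Forming a report in a preset format can be sketched with a fixed text template. The template layout and the shape of the analysis result (a mapping from category to count) are assumptions for illustration; the application does not define the report format.

```python
from string import Template

# Hypothetical preset format for the analysis report.
REPORT_TEMPLATE = Template(
    "Analysis report for keyword: $keyword\n"
    "Items analyzed: $total\n"
    "$rows"
)


def generate_report(keyword, analysis_result):
    """Fill the preset template from a category -> count mapping."""
    rows = "\n".join(
        f"- {name}: {count}"
        for name, count in sorted(analysis_result.items())
    )
    return REPORT_TEMPLATE.substitute(
        keyword=keyword,
        total=sum(analysis_result.values()),
        rows=rows,
    )
```

Calling `generate_report("milk", {"complaint": 2, "brand": 1})` produces a header naming the keyword, a total of 3, and one line per category in alphabetical order. Keeping the format in a single template means the layout can change without touching the analysis code.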
The above are only preferred embodiments of the present invention and do not thereby limit its embodiments or scope of protection. Those skilled in the art should recognize that all solutions obtained by equivalent substitutions and obvious variations made using the description and drawings of the present invention fall within the scope of protection of the present invention.

Claims (8)

1. An internet information acquisition system, characterized by comprising:
an information source recognition unit, configured to identify, according to a keyword input by a user, an information source associated with the keyword and to obtain a path of the information source;
an information acquisition unit, connected to the information source recognition unit and configured to obtain information associated with the information source according to the path;
a filter analysis unit, connected to the information acquisition unit and configured to identify and analyze the collected information and to filter out information that is not associated with the keyword;
a semantic analysis unit, connected to the filter analysis unit and configured to perform semantic parsing on the stored information;
a data analysis unit, connected to the semantic analysis unit and configured to obtain the semantically parsed information, analyze the information and obtain an analysis result.
2. The internet information acquisition system as claimed in claim 1, characterized in that the filter analysis unit comprises:
a first identification module, configured to identify the collected information and to classify it into preset categories according to the identification result;
a filtering module, connected to the first identification module and configured to filter out information that is not associated with the keyword.
3. The internet information acquisition system as claimed in claim 1, characterized by comprising:
a memory management unit, connected to the filter analysis unit and configured to classify and store the filtered information and to manage the information.
4. The internet information acquisition system as claimed in claim 3, characterized in that the memory management unit comprises:
a plurality of memory modules, each memory module being configured to store information of one type;
an information classification module, connected to the memory modules and configured to classify the information according to preset conditions and to store the identified information in the corresponding memory module.
5. The internet information acquisition system as claimed in claim 3, characterized in that the memory management unit comprises:
an information integration module, configured to screen out repeated information from the collected information;
an information searching module, connected to the information integration module and configured to retrieve the screened information according to information input by the user.
6. The internet information acquisition system as claimed in claim 1, characterized in that the semantic analysis unit comprises:
a second identification module, configured to identify the content of the stored information and to divide the identified information into language information and emotion information;
a language semantic analysis module, connected to the second identification module and configured to perform semantic parsing on the screened language information to obtain first parsed semantics;
an emotion semantic analysis module, connected to the second identification module and configured to perform semantic parsing on the screened emotion information to obtain second parsed semantics.
7. The internet information acquisition system as claimed in claim 1, characterized by comprising:
a policing services unit, connected to the data analysis unit and configured to supervise the obtained analysis result.
8. The internet information acquisition system as claimed in claim 1, characterized by comprising:
a report generation unit, connected to the data analysis unit and configured to form an analysis report in a preset format according to the analysis result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511032832.2A CN105447202A (en) 2015-12-31 2015-12-31 Internet information collecting system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511032832.2A CN105447202A (en) 2015-12-31 2015-12-31 Internet information collecting system

Publications (1)

Publication Number Publication Date
CN105447202A true CN105447202A (en) 2016-03-30

Family

ID=55557378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511032832.2A Pending CN105447202A (en) 2015-12-31 2015-12-31 Internet information collecting system

Country Status (1)

Country Link
CN (1) CN105447202A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108153865A (en) * 2017-12-22 2018-06-12 中山市小榄企业服务有限公司 A kind of network application acquisition system of internet

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030220908A1 (en) * 2002-05-21 2003-11-27 Bridgewell Inc. Automatic knowledge management system
CN102346772A (en) * 2011-09-23 2012-02-08 王楠 Directional acquisition system based on OWL (ontology web language) semantic analysis
CN103176985A (en) * 2011-12-20 2013-06-26 中国科学院计算机网络信息中心 Timely and high-efficiency crawling method for internet information
CN103473369A (en) * 2013-09-27 2013-12-25 清华大学 Semantic-based information acquisition method and semantic-based information acquisition system
CN103544255A (en) * 2013-10-15 2014-01-29 常州大学 Text semantic relativity based network public opinion information analysis method
CN103744877A (en) * 2013-12-20 2014-04-23 潘大庆 Public opinion monitoring application system deployed in internet and application method
CN103778200A (en) * 2014-01-09 2014-05-07 中国科学院计算技术研究所 Method for extracting information source of message and system thereof
CN104009970A (en) * 2013-09-17 2014-08-27 宁波公众信息产业有限公司 Network information acquisition method
CN104182389A (en) * 2014-07-21 2014-12-03 安徽华贞信息科技有限公司 Semantic-based big data analysis business intelligence service system
CN104933093A (en) * 2015-05-19 2015-09-23 武汉泰迪智慧科技有限公司 Regional public opinion monitoring and decision-making auxiliary system and method based on big data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160330