CN106779080A - Method for automatically constructing a person information knowledge base - Google Patents
Method for automatically constructing a person information knowledge base
- Publication number
- CN106779080A (application number CN201710026230.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- people information
- information
- personage
- people
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for automatically constructing a person information knowledge base, comprising: building a person information ontology and then populating it; continuously crawling and parsing externally updated, reliable data sources, and using the person feature data and classification data in the ontology to identify information related to person individuals; processing the extracted data to obtain person information and using it to refine the ontology; and providing reliable knowledge services from the continuously refined ontology. The invention mainly draws on existing resources, combining computer technology with ontology concepts to build and refine a person knowledge base and thereby provide knowledge services about people. This addresses problems encountered when retrieving information about public figures: poorly organized person information, repeated news items about a person, and ambiguity caused by identical names.
Description
Technical field
The present invention relates to the technical field of information retrieval, and in particular to a method for automatically constructing a person information knowledge base.
Background art
As the Internet keeps growing, the challenge in online information retrieval is no longer whether the desired content can be found, but whether the desired person can be identified quickly and accurately. People are the main actors in social activity, and it is often necessary to learn about a particular person, for example a writer. Internet sources, however, are numerous and disorganized. In the era of self-media, the explosively growing volume of information is largely subjective, lacks impartiality, and offers no guarantee of accuracy, all of which interferes with obtaining accurate information about a person. At present, information about newsmakers is collected mainly from official websites related to the person and from news reports; there is no knowledge base of person information. Searching for news about a person with a search engine returns a large amount of noise, and the relevant items must be picked out manually from the many matches, which is inefficient. For example, a Baidu search for news about "Zhang Wei" reports about 138,000 related items, and the first 3 pages alone (60 records in total) involve 23 different people: not only the mathematician "Zhang Wei", the economic analyst "Zhang Wei", and the reporter "Zhang Wei", but also similarly named people such as "Zhang Weili", "Zhang Weihua", and "Big Zhang Wei". It is therefore necessary to build an automated person information base that provides rich and accurate information about people.
A well-developed person information knowledge base is currently lacking; what is known about a person comes mainly from encyclopedia knowledge bases, news reports, and self-media platforms. Each time person information is retrieved, considerable time is spent collecting related data and material, and the collected content must then be reviewed manually to remove items that are duplicated, of low credibility, or ambiguous.
The better person information knowledge bases built so far are mainly online encyclopedias such as Baidu Baike, Wikipedia, and Hudong Baike. These treat each person as a knowledge entry, and Internet users voluntarily write and refine the person's information. Because this approach depends on voluntary public participation, the accuracy, reliability, and comprehensiveness of entries cannot be guaranteed. Often, once public attention to a person declines, there is no longer any incentive to maintain the entry and its information stops being updated.
At present, the major news websites describe person information relatively accurately, but their coverage of a person is not continuous: it follows shifts in public news attention, so the content about a person changes over time and is generally not comprehensive. Many news websites also cross-reference one another's content about a person, which produces duplicates. In addition, news is time-sensitive, so news websites seldom provide continuous coverage of a person.
Summary of the invention
In view of the shortcomings described above, the present invention provides a method for automatically constructing a person information knowledge base.
To achieve the above object, the present invention provides a method for automatically constructing a person information knowledge base, comprising:
Step 1, building the person information ontology: collect data; analyze the characteristics of person categories, person attributes, and person relationships, and the way person individuals are presented; and, drawing on the structure, storage, and representation of ontologies, build the person information ontology framework;
Step 2, populating the person information ontology: extract person-related data as needed from encyclopedias, news, and existing person resource bases, parse the data and then instantiate the ontology, creating the person individuals in the ontology and forming the initial person information ontology;
Step 3, crawling and parsing external resources: continuously crawl and parse externally updated, reliable data sources, and use the person feature data and classification data in the ontology to identify information related to person individuals;
Step 4, iteratively updating the person information ontology: process the extracted data to obtain person information, and use it to refine the ontology;
Step 5, providing person information knowledge services: provide reliable knowledge services using the continuously refined person information ontology.
As a further improvement of the invention, step 2 comprises:
Step 21, crawling raw data from Internet pages; the raw data contains web page tags and advertisements, and is parsed to obtain the initial data;
Step 22, instantiating the ontology according to the parsed initial data, the basic structure of the ontology, and the structure of the web pages, thereby creating the person individuals in the ontology;
Step 23, maintaining and adjusting the ontology according to the instantiation results, and using the construction data to form the basic person information, the person information extraction rule base, the person feature data, and the classification data.
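The parsing in step 21 can be illustrated with a minimal sketch. The patent does not specify a parser; the example below assumes Python's standard `html.parser` module, and the class name `TextExtractor`, the function `extract_text`, and the advertisement class names `ad` and `advert` are hypothetical. It also assumes reasonably well-formed markup, since unbalanced tags inside a skipped region would confuse the depth counter.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping scripts, styles, and ad containers.

    The ad class names are an assumption made for illustration only.
    """

    def __init__(self, ad_classes=frozenset({"ad", "advert"})):
        super().__init__()
        self.ad_classes = ad_classes
        self.skip_depth = 0  # > 0 while inside a subtree that is being skipped
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or tag in {"script", "style"} or classes & self.ad_classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(raw_html):
    """Return the text content of a page with tags and ad blocks removed."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)
```

A real crawler would likely need site-specific rules for locating advertisements; matching on a CSS class is only one plausible heuristic.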
As a further improvement of the invention, in step 23: the basic person information is used to provide knowledge services and is continuously updated and refined later; the person information extraction rule base is obtained by machine learning on the acquired person information and is continuously refined through further learning, and is used subsequently to extract basic person information from unstructured text; the person feature data is used for disambiguation calculations that distinguish multiple people who share a name; the classification data is used to classify subsequent unclassified data, assisting both the disambiguation of same-named people and the classification of newly created person individuals.
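The patent does not state how the disambiguation calculation is performed. As one possible illustration, the sketch below compares the features found around a mention with the stored feature set of each same-named candidate using Jaccard similarity. The function names, the threshold value, and the feature sets are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(mention_features, candidates, threshold=0.2):
    """Pick the same-named candidate whose features best match the mention.

    candidates maps a (hypothetical) person id to that person's feature set.
    Returns None when no candidate reaches the threshold, which a caller
    could treat as a signal to create a new person individual.
    """
    best_id, best_score = None, 0.0
    for person_id, features in candidates.items():
        score = jaccard(mention_features, features)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id if best_score >= threshold else None
```

For the "Zhang Wei" example from the background section, a mention surrounded by words such as "professor" and "prime" would be matched to the mathematician rather than the reporter, provided the stored feature sets reflect those occupations.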
As a further improvement of the invention, step 3 comprises:
Step 31, crawling web data on a schedule or when triggered, and parsing it to extract the data it contains;
Step 32, using the initial person information ontology to filter the periodically parsed and extracted data and perform person recognition, identifying the data relevant to the person information ontology.
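The "scheduled or triggered" crawling of step 31 can be sketched as a simple selection function. The patent gives no scheduling details, so the data layout (a last-crawl time and an interval per source), the function name `due_sources`, and the source names in the usage note are assumptions.

```python
from datetime import datetime, timedelta


def due_sources(sources, now, triggered=frozenset()):
    """Return the names of sources that should be crawled now.

    sources maps a source name to (last_crawl_time, crawl_interval).
    A source is due when its interval has elapsed (scheduled crawl) or
    when it appears in `triggered` (trigger-based crawl).
    """
    return sorted(
        name
        for name, (last_crawl, interval) in sources.items()
        if name in triggered or now - last_crawl >= interval
    )
```

For example, a news source crawled an hour ago with a 30-minute interval is due, while an encyclopedia source on a weekly interval is crawled early only if an external trigger names it.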
As a further improvement of the invention, step 4 comprises:
Step 41, classifying, filtering, and deduplicating the extracted data;
Step 42, evaluating the obtained information against the existing information and preset data to obtain the final information;
Step 43, according to the evaluation result, adding the obtained information to the person information ontology, or replacing information of lower confidence with information that has a higher confidence evaluation.
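Step 43 describes two outcomes: new information is added, or higher-confidence information replaces lower-confidence information. A minimal sketch of that rule follows; the storage layout (a dictionary keyed by person and attribute) and the numeric confidence scale are assumptions, since the patent does not define how confidence is computed.

```python
def merge_fact(facts, person_id, attribute, value, confidence):
    """Add a fact, or replace the stored one if the new confidence is higher.

    facts maps (person_id, attribute) to (value, confidence).
    Returns True when the store was changed.
    """
    key = (person_id, attribute)
    current = facts.get(key)
    if current is None or confidence > current[1]:
        facts[key] = (value, confidence)
        return True
    return False
```

With this rule, a value taken from a low-confidence source is overwritten when a more trusted source reports the same attribute, while a less trusted report leaves the stored value untouched.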
As a further improvement of the invention, in step 5 the reliable knowledge services comprise: a basic person information service, a person relationship expansion service, a similar-person expansion service, a knowledge service for simple rule-based queries, and a dynamic timeline of the person.
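The person relationship expansion service is not specified further in the patent. One plausible reading is that, starting from a given person, the service returns people reachable through stored relationships up to some depth. The sketch below implements that reading with a breadth-first search; the graph layout and the function name are assumptions.

```python
from collections import deque


def expand_relations(graph, start, max_depth=2):
    """Return people reachable from `start` within `max_depth` relation hops.

    graph maps a person to the set of directly related people.
    The result maps each reachable person to the hop distance from `start`.
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_depth:
            continue  # do not expand beyond the requested depth
        for other in graph.get(person, ()):
            if other not in distances:
                distances[other] = distances[person] + 1
                queue.append(other)
    del distances[start]
    return distances
```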
Compared with the prior art, the beneficial effects of the invention are as follows:
The disclosed method for automatically constructing a person information knowledge base mainly draws on existing resources, combining computer technology with ontology concepts to build and refine a person knowledge base and thereby provide knowledge services about people. This addresses problems encountered when retrieving information about public figures: poorly organized person information, repeated news items about a person, and ambiguity caused by identical names.
Brief description of the drawings
Fig. 1 is a flow chart of the method for automatically constructing a person information knowledge base according to an embodiment of the invention;
Fig. 2 is a diagram of the person information ontology framework according to an embodiment of the invention.
Detailed description
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
The invention is described in further detail below with reference to the drawings:
As shown in Fig. 1, the invention provides a method for automatically constructing a person information knowledge base, comprising:
Step 1, building the person information ontology: collect data; analyze the characteristics of person categories, person attributes, and person relationships, and the way person individuals are presented; and, drawing on the structure, storage, and representation of ontologies, build the person information ontology framework. The framework is shown in Fig. 2.
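Fig. 2 is not reproduced in this text, so the actual framework is unknown. As a hedged illustration of what an ontology covering person categories, attributes, and relationships might look like in code, the sketch below uses plain data classes. Every class and field name is hypothetical and is not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PersonIndividual:
    """One person instance in the ontology (field names are illustrative)."""

    person_id: str
    name: str
    categories: set[str] = field(default_factory=set)
    attributes: dict[str, str] = field(default_factory=dict)
    # relation name -> ids of related person individuals
    relations: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class PersonOntology:
    """Container for person individuals, keyed by a unique id."""

    individuals: dict[str, PersonIndividual] = field(default_factory=dict)

    def add(self, person: PersonIndividual) -> None:
        self.individuals[person.person_id] = person

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        self.individuals[source_id].relations.setdefault(relation, set()).add(target_id)

    def by_name(self, name: str) -> list[PersonIndividual]:
        """Return every individual with this name; several may share it."""
        return [p for p in self.individuals.values() if p.name == name]
```

Keying individuals by an id rather than by name is what allows several people named "Zhang Wei" to coexist. A production system would more likely use an ontology language and a triple store, which this sketch does not attempt to model.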
Step 2, populating the person information ontology: extract person-related data as needed from encyclopedias, news, and existing person resource bases, parse the data and then instantiate the ontology, creating the person individuals in the ontology and forming the initial person information ontology; wherein:
A, raw data is crawled from relatively regular Internet pages; the raw data contains web page tags, advertisements, and similar content, and must be parsed to obtain the initial data;
B, the ontology is instantiated according to the parsed data, the basic structure of the ontology, and the structure of the web pages, creating the person individuals in the ontology;
C, the ontology is maintained and adjusted according to the instantiation results, and the construction data is used to form the basic person information, the person information extraction rule base, the person feature data, the classification data, and so on. The basic person information is used to provide knowledge services and is continuously updated and refined later. The person information extraction rule base is obtained by machine learning on the acquired person information and is continuously refined through further learning; it is used subsequently to extract basic person information from unstructured text. The person feature data is used for disambiguation calculations that distinguish multiple people who share a name. The classification data is used to classify subsequent unclassified data, assisting both the disambiguation of same-named people and the classification of newly created person individuals.
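To show how an extraction rule base might be applied to unstructured text, the sketch below uses two hand-written regular expressions. This is a simplification: the patent says the rule base is learned by machine learning, whereas these rules, the attribute names, and the English sentence patterns are invented for illustration.

```python
import re

# Hypothetical, hand-written rules standing in for a learned rule base.
RULES = {
    "birth_year": re.compile(r"born in (\d{4})"),
    "occupation": re.compile(r"is an? ([a-z ]+?)(?: at| in|[.,])"),
}


def extract_attributes(text):
    """Apply every rule to the text and return the attributes that matched."""
    result = {}
    for attribute, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            result[attribute] = match.group(1)
    return result
```

In a learned rule base, patterns like these would be induced from the already instantiated person individuals and revised as new data arrives.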
Step 3, crawling and parsing external resources: continuously crawl and parse externally updated, reliable data sources, such as news and verified microblog accounts; use the person feature data and classification data in the person information ontology to identify information related to person individuals; wherein:
A, web data is crawled on a schedule or when triggered, and parsed to extract the data it contains;
B, the initial person information ontology is used to filter the periodically parsed and extracted data and perform person recognition, identifying the data relevant to the person information ontology.
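The filtering and person recognition in item B can be sketched as a name lookup over the crawled documents. The example assumes romanized names and relies on regular-expression word boundaries so that "Zhang Wei" does not match inside "Zhang Weili"; Chinese text has no such boundaries and would need a word segmenter or named-entity recognizer instead. The function names are hypothetical.

```python
import re


def recognize_persons(text, known_names):
    """Return the known names that occur in the text as whole words."""
    return sorted(
        name
        for name in known_names
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    )


def filter_relevant(documents, known_names):
    """Keep only documents that mention a known person, with the names found."""
    relevant = []
    for document in documents:
        names = recognize_persons(document, known_names)
        if names:
            relevant.append((document, names))
    return relevant
```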
Step 4, iteratively updating the person information ontology: process the extracted data to obtain person information, and use it to refine the person information ontology; wherein:
A, the extracted data is classified, filtered, and deduplicated;
B, the obtained information is evaluated against the existing information and preset data to obtain the final information;
C, according to the evaluation result, the obtained information is added to the person information ontology, or information of lower confidence is replaced with information that has a higher confidence evaluation.
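The deduplication in item A can be illustrated with exact matching on normalized text. This is an assumption: the patent does not say how duplicates are detected, and cross-referenced news items often differ slightly, in which case a near-duplicate technique would be needed instead of the hash comparison shown here.

```python
import hashlib
import re


def normalize(text):
    """Collapse whitespace and lowercase so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(records):
    """Return records with exact duplicates (after normalization) removed.

    The first occurrence of each record is kept and order is preserved.
    """
    seen = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(normalize(record).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique
```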
Step 5, providing person information knowledge services: provide reliable knowledge services using the continuously refined person information ontology; the reliable knowledge services comprise: a basic person information service, a person relationship expansion service, a similar-person expansion service, a knowledge service for simple rule-based queries, and a dynamic timeline of the person.
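The dynamic timeline service could, under a simple reading, be a chronological list of the dated events stored for one person. The sketch below assumes events are kept as (person id, date, description) tuples, which is an illustrative layout and not one described in the patent.

```python
from datetime import date


def build_timeline(events, person_id):
    """Return (date, description) pairs for one person, oldest first.

    events is an iterable of (person_id, date, description) tuples.
    """
    own_events = [(day, text) for pid, day, text in events if pid == person_id]
    return sorted(own_events)
```

Because the ontology is refined continuously, calling the function again after new events are stored yields an updated timeline, which is one way to interpret "dynamic".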
The disclosed method for automatically constructing a person information knowledge base mainly draws on existing resources, combining computer technology with ontology concepts to build and refine a person knowledge base and thereby provide knowledge services about people. This addresses problems encountered when retrieving information about public figures: poorly organized person information, repeated news items about a person, and ambiguity caused by identical names.
The above are only preferred embodiments of the invention and are not intended to limit it; a person skilled in the art may make various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (6)
1. A method for automatically constructing a person information knowledge base, characterized by comprising:
Step 1, building the person information ontology: collect data; analyze the characteristics of person categories, person attributes, and person relationships, and the way person individuals are presented; and, drawing on the structure, storage, and representation of ontologies, build the person information ontology framework;
Step 2, populating the person information ontology: extract person-related data as needed from encyclopedias, news, and existing person resource bases, parse the data and then instantiate the ontology, creating the person individuals in the ontology and forming the initial person information ontology;
Step 3, crawling and parsing external resources: continuously crawl and parse externally updated, reliable data sources, and use the person feature data and classification data in the person information ontology to identify information related to person individuals;
Step 4, iteratively updating the person information ontology: process the extracted data to obtain person information, and use it to refine the person information ontology;
Step 5, providing person information knowledge services: provide reliable knowledge services using the continuously refined person information ontology.
2. The method for automatically constructing a person information knowledge base according to claim 1, characterized in that step 2 comprises:
Step 21, crawling raw data from Internet pages; the raw data contains web page tags and advertisements, and is parsed to obtain the initial data;
Step 22, instantiating the ontology according to the parsed initial data, the basic structure of the ontology, and the structure of the web pages, thereby creating the person individuals in the ontology;
Step 23, maintaining and adjusting the ontology according to the instantiation results, and using the construction data to form the basic person information, the person information extraction rule base, the person feature data, and the classification data.
3. The method for automatically constructing a person information knowledge base according to claim 2, characterized in that, in step 23, the basic person information is used to provide knowledge services and is continuously updated and refined later; the person information extraction rule base is obtained by machine learning on the acquired person information and is continuously refined through further learning, and is used subsequently to extract basic person information from unstructured text; the person feature data is used for disambiguation calculations that distinguish multiple people who share a name; and the classification data is used to classify subsequent unclassified data, assisting both the disambiguation of same-named people and the classification of newly created person individuals.
4. The method for automatically constructing a person information knowledge base according to claim 1, characterized in that step 3 comprises:
Step 31, crawling web data on a schedule or when triggered, and parsing it to extract the data it contains;
Step 32, using the initial person information ontology to filter the periodically parsed and extracted data and perform person recognition, identifying the data relevant to the person information ontology.
5. The method for automatically constructing a person information knowledge base according to claim 1, characterized in that step 4 comprises:
Step 41, classifying, filtering, and deduplicating the extracted data;
Step 42, evaluating the obtained information against the existing information and preset data to obtain the final information;
Step 43, according to the evaluation result, adding the obtained information to the person information ontology, or replacing information of lower confidence with information that has a higher confidence evaluation.
6. The method for automatically constructing a person information knowledge base according to claim 1, characterized in that, in step 5, the reliable knowledge services comprise: a basic person information service, a person relationship expansion service, a similar-person expansion service, a knowledge service for simple rule-based queries, and a dynamic timeline of the person.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710026230.9A CN106779080A (en) | 2017-01-13 | 2017-01-13 | Method for automatically constructing a person information knowledge base |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710026230.9A CN106779080A (en) | 2017-01-13 | 2017-01-13 | Method for automatically constructing a person information knowledge base |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106779080A (en) | 2017-05-31 |
Family
ID=58946771
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710026230.9A Pending CN106779080A (en) | Method for automatically constructing a person information knowledge base | 2017-01-13 | 2017-01-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106779080A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108255806A (en) * | 2017-12-22 | 2018-07-06 | 北京奇艺世纪科技有限公司 | A kind of name recognition methods and device |
CN109241307A (en) * | 2018-08-29 | 2019-01-18 | 山东浪潮商用系统有限公司 | A kind of performers and clerks' contents management method and system |
CN110717091A (en) * | 2019-09-16 | 2020-01-21 | 苏宁云计算有限公司 | Entry data expansion method and device based on face recognition |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3466630B2 (en) * | 1991-04-12 | 2003-11-17 | 株式会社東芝 | Information communication device |
CN101286148A (en) * | 2007-04-12 | 2008-10-15 | 上海思阔雅软件有限公司 | Computer Chinese characters knowledge base collection system based on text fragment |
CN102467531A (en) * | 2010-11-12 | 2012-05-23 | 中国科学院计算机网络信息中心 | Continuously searching method of scientific research information based on character attributes |
-
2017
- 2017-01-13 CN CN201710026230.9A patent/CN106779080A/en active Pending
Non-Patent Citations (1)
Title |
---|
刁丽娟: ""通用本体学习方法及其应用的关键技术研究"", 《中国博士学位论文全文数据库 信息科技辑》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108255806A (en) * | 2017-12-22 | 2018-07-06 | 北京奇艺世纪科技有限公司 | A kind of name recognition methods and device |
CN108255806B (en) * | 2017-12-22 | 2021-12-17 | 北京奇艺世纪科技有限公司 | Name recognition method and device |
CN109241307A (en) * | 2018-08-29 | 2019-01-18 | 山东浪潮商用系统有限公司 | A kind of performers and clerks' contents management method and system |
CN110717091A (en) * | 2019-09-16 | 2020-01-21 | 苏宁云计算有限公司 | Entry data expansion method and device based on face recognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109325165B (en) | Network public opinion analysis method, device and storage medium | |
CN104899508B (en) | A kind of multistage detection method for phishing site and system | |
CN107437038B (en) | Webpage tampering detection method and device | |
CN109145216A (en) | Network public-opinion monitoring method, device and storage medium | |
CN106250513A (en) | A kind of event personalization sorting technique based on event modeling and system | |
CN107544988B (en) | Method and device for acquiring public opinion data | |
Kumar et al. | Analysis of various machine learning algorithms for enhanced opinion mining using twitter data streams | |
US10387805B2 (en) | System and method for ranking news feeds | |
CN103324666A (en) | Topic tracing method and device based on micro-blog data | |
Kim et al. | Event diffusion patterns in social media | |
CN106250402B (en) | Website classification method and device | |
CN104424308A (en) | Web page classification standard acquisition method and device and web page classification method and device | |
CN106844786A (en) | A kind of public sentiment region focus based on text similarity finds method | |
CN101515272A (en) | Method and device for extracting webpage content | |
CN104899335A (en) | Method for performing sentiment classification on network public sentiment of information | |
CN109918648B (en) | Rumor depth detection method based on dynamic sliding window feature score | |
CN114915468B (en) | Intelligent analysis and detection method for network crime based on knowledge graph | |
CN112149422B (en) | Dynamic enterprise news monitoring method based on natural language | |
Chumwatana | Using sentiment analysis technique for analyzing Thai customer satisfaction from social media | |
CN115238688B (en) | Method, device, equipment and storage medium for analyzing association relation of electronic information data | |
CN106779080A (en) | A kind of people information knowledge base method for auto constructing | |
WO2015062377A1 (en) | Device and method for detecting similar text, and application | |
CN106933878B (en) | Information processing method and device | |
CN106202312B (en) | A kind of interest point search method and system for mobile Internet | |
CN106126495B (en) | One kind being based on large-scale corpus prompter method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170531 |