CN106779080A - Method for automatically constructing a person information knowledge base - Google Patents

Method for automatically constructing a person information knowledge base

Info

Publication number
CN106779080A
Authority
CN
China
Prior art keywords
data
people information
information
personage
people
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710026230.9A
Other languages
Chinese (zh)
Inventor
刘永坚
白立华
李文忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUHAN UNIVERSITY OF TECHNOLOGY COMMUNICATION ENGINEERING Co Ltd
Original Assignee
WUHAN UNIVERSITY OF TECHNOLOGY COMMUNICATION ENGINEERING Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-01-13
Filing date
2017-01-13
Publication date
2017-05-31
Application filed by WUHAN UNIVERSITY OF TECHNOLOGY COMMUNICATION ENGINEERING Co Ltd
Priority to CN201710026230.9A
Publication of CN106779080A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for automatically constructing a person information knowledge base, comprising: building a person information ontology; refining the person information ontology; continuously crawling and parsing dynamically updated, reliable external data sources, and using the person feature data and classification data in the person information ontology to identify information related to person individuals; processing the extracted data and using the resulting person information to further refine the person information ontology; and providing reliable knowledge services based on the continuously refined person information ontology. The invention mainly uses existing resources, combining computer technology with ontology concepts to build and refine a person knowledge base that provides person information knowledge services. It thereby addresses problems encountered when retrieving information about public figures, such as poorly organized person information, duplicated news about a person, and ambiguity caused by identical names.

Description

Method for automatically constructing a person information knowledge base
Technical field
The present invention relates to the technical field of information retrieval, and in particular to a method for automatically constructing a person information knowledge base.
Background art
As the internet grows ever larger, the problem in online information retrieval is no longer whether the desired content can be found, but whether information about the desired person can be located quickly and accurately.
People are the main actors in social activities, and it is often necessary to learn about a person, for example the information related to a given figure. Internet information sources, however, are numerous and disorganized. Much of the explosively growing information of the self-media era is subjective and lacks impartiality, and its accuracy cannot be guaranteed, which interferes with obtaining accurate information about a person. At present, information about newsmakers is collected mainly from official websites associated with the person and from related news reports. Because there is no knowledge base of person information, news about a person is retrieved through search engines, and the results contain many irrelevant items; the relevant news has to be screened manually from numerous matching results, which is inefficient. For example, a Baidu search for news about "Zhang Wei" returns about 138,000 related items. The first 3 pages, 60 records in total, already contain 23 different people: not only the mathematician "Zhang Wei", the economic analyst "Zhang Wei", and the reporter "Zhang Wei", but also people with similar names such as "Zhang Weili", "Zhang Weihua", and "big Zhang Wei". It is therefore necessary to build an automated person information base that provides rich and accurate information about people.
At present there is no complete knowledge base of person information; what is known about a person comes mainly from encyclopedic knowledge bases, news reports, and some self-media platforms. Every retrieval of person information requires a great deal of time to collect data and information about the person, and the collected content must then be reviewed manually to remove content that is duplicated, unreliable, or ambiguous.
The better person information knowledge bases built so far are mainly online encyclopedias, such as Baidu Baike, Wikipedia, and Hudong Baike. These encyclopedias treat each person as a knowledge entry, and internet users voluntarily take part in writing and refining the person information. Because this approach depends on voluntary public participation, the accuracy, reliability, and comprehensiveness of the entries cannot be guaranteed. Often, once public attention to a person declines, there is no longer any incentive to maintain the entry and updates stop.
Major news websites currently describe person information fairly accurately, but their coverage of a person is not continuous: it shifts with whatever society is paying attention to, so the person-related content changes as well and is generally not comprehensive. Many news websites also cross-reference one another's content about people, so duplicated content appears. In addition, news is time-sensitive, so news websites seldom provide continuous coverage of a person.
Summary of the invention
In view of the above shortcomings, the present invention provides a method for automatically constructing a person information knowledge base.
To achieve the above object, the present invention provides a method for automatically constructing a person information knowledge base, comprising:
Step 1, build the person information ontology: collect data; analyze the characteristics of person classification, person attributes, and person relations, and the characteristics of how person individuals are presented; and build the person information ontology framework taking into account the structure, storage, and representation of the ontology;
Step 2, refine the person information ontology: extract person-related data as needed from encyclopedias, news, and existing person resource bases; parse the data and then instantiate the ontology, creating the person individuals in the ontology and forming the initial person information ontology;
Step 3, crawl and parse external resources: continuously crawl and parse dynamically updated, reliable external data sources, and use the person feature data and classification data in the person information ontology to identify information related to person individuals;
Step 4, further refine the person information ontology: process the extracted data and use the resulting person information to refine the person information ontology;
Step 5, provide person information knowledge services: provide reliable knowledge services using the continuously refined person information ontology.
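The ontology framework of Step 1 can be pictured as a small data model. The following is a minimal sketch, assuming an in-memory representation; the class names, field names, and the `same_name_as` relation are hypothetical and are not taken from the patent, which does not specify a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class PersonIndividual:
    """One person individual in the ontology (all field names are hypothetical)."""
    person_id: str
    name: str
    categories: set = field(default_factory=set)    # person classification
    attributes: dict = field(default_factory=dict)  # person attributes
    relations: dict = field(default_factory=dict)   # relation name -> set of person ids


@dataclass
class PersonOntology:
    """Container for person individuals, keyed by a unique id rather than by name."""
    individuals: dict = field(default_factory=dict)

    def add(self, person):
        self.individuals[person.person_id] = person

    def relate(self, source_id, relation, target_id):
        # Record a directed relation between two existing individuals.
        if target_id not in self.individuals:
            raise KeyError(target_id)
        self.individuals[source_id].relations.setdefault(relation, set()).add(target_id)

    def find_by_name(self, name):
        # Several individuals may share one name, so a list is returned.
        return [p for p in self.individuals.values() if p.name == name]


# Two different people who share the name "Zhang Wei".
ontology = PersonOntology()
ontology.add(PersonIndividual("p1", "Zhang Wei", {"mathematician"}))
ontology.add(PersonIndividual("p2", "Zhang Wei", {"reporter"}))
ontology.relate("p1", "same_name_as", "p2")
```

Keying individuals by an id instead of by name is the design choice that lets people with identical names coexist, which is the ambiguity problem the background section describes.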
As a further improvement of the present invention, step 2 comprises:
Step 21, capture raw data from internet pages; the raw data contains web page tags and advertisements, and is parsed to obtain preliminary data;
Step 22, instantiate the ontology according to the parsed preliminary data, the basic structure of the ontology, and the structure of the web page, creating the person individuals in the ontology;
Step 23, maintain and adjust the ontology according to the instantiation results, and use the data from construction to form basic person information, a person information extraction rule base, person feature data, and classification data.
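Step 21 amounts to stripping markup and advertisements from a captured page. A minimal sketch using Python's standard `html.parser` follows. It assumes, purely for illustration, that advertisements sit in elements whose `class` attribute contains `ad`; the patent does not say how advertisements are recognized, and the sample page is invented.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag
SKIP_TAGS = {"script", "style"}


class PrimaryDataExtractor(HTMLParser):
    """Collects visible text, skipping scripts, styles, and assumed ad containers."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or tag in SKIP_TAGS or "ad" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def parse_raw_page(html):
    """Return the preliminary data (plain text) extracted from raw HTML."""
    parser = PrimaryDataExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


raw = ('<html><body><div class="ad">Buy now</div><h1>Zhang Wei</h1>'
       '<p>Mathematician at Example University.</p>'
       '<script>var x = 1;</script></body></html>')
primary = parse_raw_page(raw)
```

The depth counter assumes reasonably well-formed markup; pages with unclosed tags would need a more forgiving parser.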
As a further improvement of the present invention, in step 23: the basic person information is used to provide knowledge services and is continuously updated and refined later; the person information extraction rule base is refined continuously by machine learning on the person information obtained, and is used subsequently to extract basic person information from unstructured text; the person feature data is used to run disambiguation calculations over multiple people with the same name, so as to tell them apart; and the classification data is used to classify subsequent unclassified data, assisting same-name disambiguation and the classification of newly created person individuals.
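The patent does not state which disambiguation calculation is applied to the person feature data. As one possible illustration, the sketch below scores a mention against each same-name candidate with Jaccard similarity over feature sets; the similarity measure, the threshold value, and the identifiers are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(mention_features, candidates, threshold=0.2):
    """Return the id of the best-matching same-name candidate, or None.

    candidates maps a person id to that person's feature set.
    None signals that the mention may belong to a new individual.
    """
    best_id, best_score = None, 0.0
    for person_id, features in candidates.items():
        score = jaccard(mention_features, features)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id if best_score >= threshold else None


candidates = {
    "zhang_wei_math": {"mathematician", "number theory", "professor"},
    "zhang_wei_econ": {"economist", "analyst", "stock market"},
}
match = disambiguate({"professor", "number theory", "lecture"}, candidates)
```

Returning `None` below the threshold gives the caller a hook for creating a new person individual, which the classification data could then help categorize.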
As a further improvement of the present invention, step 3 comprises:
Step 31, crawl web page data on a schedule or on a trigger, and parse it to extract the data it contains;
Step 32, using the initial person information ontology, filter the data extracted by the scheduled parsing and perform person recognition, identifying the data relevant to the person information ontology.
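The filtering in Step 32 can be sketched as a lookup of names already present in the ontology. This is a simplified stand-in for person recognition, assuming English text with word boundaries; the function names and sample sentences are hypothetical. The boundary check keeps "Zhang Wei" from matching "Zhang Weili", the kind of near-match the background section complains about.

```python
import re


def recognize_people(text, known_names):
    """Return the known names that occur in text as whole names."""
    found = set()
    for name in known_names:
        # Reject matches that are part of a longer word such as "Zhang Weili".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text):
            found.add(name)
    return found


def filter_relevant(items, known_names):
    """Keep only the crawled items that mention at least one known person."""
    relevant = []
    for item in items:
        people = recognize_people(item, known_names)
        if people:
            relevant.append((item, people))
    return relevant


items = [
    "Zhang Wei gave a lecture on number theory.",
    "Zhang Weili attended the ceremony.",
    "The weather will be sunny tomorrow.",
]
relevant = filter_relevant(items, {"Zhang Wei"})
```

Chinese text has no spaces between words, so a real system would need word segmentation or a named-entity recognizer in place of the regular expression.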
As a further improvement of the present invention, step 4 comprises:
Step 41, classify, filter, and deduplicate the extracted data;
Step 42, evaluate the obtained information against the existing information and preset data to obtain the final information;
Step 43, according to the evaluation results, add the obtained information to the person information ontology, or replace lower-confidence information with information whose confidence evaluation is higher.
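Steps 41 to 43 can be sketched as deduplication followed by a confidence-based merge. The `Fact` record, the numeric confidence scores, and the dictionary used as the ontology store are assumptions made for illustration; the patent does not define how confidence is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    person_id: str
    attribute: str
    value: str
    confidence: float  # assumed to come from the evaluation in Step 42


def deduplicate(facts):
    """Collapse repeated (person, attribute, value) triples, keeping the most confident."""
    best = {}
    for fact in facts:
        key = (fact.person_id, fact.attribute, fact.value)
        if key not in best or fact.confidence > best[key].confidence:
            best[key] = fact
    return list(best.values())


def merge_into_ontology(store, facts):
    """Add new facts, or replace a stored fact when the new one is more confident."""
    for fact in deduplicate(facts):
        key = (fact.person_id, fact.attribute)
        current = store.get(key)
        if current is None or fact.confidence > current.confidence:
            store[key] = fact
    return store


store = {("p1", "employer"): Fact("p1", "employer", "University A", 0.6)}
new_facts = [
    Fact("p1", "employer", "University B", 0.9),
    Fact("p1", "employer", "University B", 0.9),  # duplicate from a second site
    Fact("p1", "employer", "University C", 0.3),  # less confident than the stored fact
    Fact("p1", "birthplace", "Wuhan", 0.7),       # new attribute, added
]
merge_into_ontology(store, new_facts)
```

This sketch keeps a single value per attribute; attributes that legitimately hold several values would need a list-valued store.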
As a further improvement of the present invention, in step 5 the reliable knowledge services include: a basic person information service, a person relation expansion service, a similar person expansion service, a knowledge service for simple rule-based queries, and a dynamic person timeline.
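Of these services, the dynamic person timeline is the simplest to illustrate: select one person's events and order them by date. The event record layout and the sample events below are invented for the sketch.

```python
from datetime import date


def build_timeline(events, person_id):
    """Return the events of one person in chronological order."""
    own = [event for event in events if event["person_id"] == person_id]
    return sorted(own, key=lambda event: event["date"])


events = [
    {"person_id": "p1", "date": date(2016, 5, 2), "title": "Gives public lecture"},
    {"person_id": "p2", "date": date(2015, 1, 1), "title": "Publishes report"},
    {"person_id": "p1", "date": date(2014, 9, 10), "title": "Joins university"},
]
timeline = build_timeline(events, "p1")
```

Because events are attached to a person id rather than a name, the timeline stays correct even when several people share the same name.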
Compared with the prior art, the beneficial effects of the present invention are:
The method for automatically constructing a person information knowledge base disclosed by the present invention mainly uses existing resources, combining computer technology with ontology concepts to build and refine a person knowledge base that provides person information knowledge services. It thereby addresses problems encountered when retrieving information about public figures, such as poorly organized person information, duplicated news about a person, and ambiguity caused by identical names.
Brief description of the drawings
Fig. 1 is a flow chart of the method for automatically constructing a person information knowledge base disclosed in an embodiment of the present invention;
Fig. 2 is a diagram of the person information ontology framework disclosed in an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The present invention is described in further detail below with reference to the accompanying drawings:
As shown in Fig. 1, the present invention provides a method for automatically constructing a person information knowledge base, comprising:
Step 1, build the person information ontology: data is collected; the characteristics of person classification, person attributes, and person relations, and the characteristics of how person individuals are presented, are analyzed; and the person information ontology framework is built taking into account the structure, storage, and representation of the ontology. The person information ontology framework is shown in Fig. 2.
Step 2, refine the person information ontology: person-related data is extracted as needed from encyclopedias, news, and existing person resource bases; the data is parsed and the ontology is then instantiated, creating the person individuals in the ontology and forming the initial person information ontology; wherein:
A. Raw data is captured from internet pages with a relatively regular structure; the raw data contains content such as web page tags and advertisements and must be parsed to obtain preliminary data;
B. The ontology is instantiated according to the parsed data, the basic structure of the ontology, and the structure of the web page, creating the person individuals in the ontology;
C. The ontology is maintained and adjusted according to the instantiation results, and the data from construction is used to form basic person information, a person information extraction rule base, person feature data, classification data, and so on. The basic person information is used to provide knowledge services and is continuously updated and refined later. The person information extraction rule base is refined continuously by machine learning on the person information obtained, and is used subsequently to extract basic person information from unstructured text. The person feature data is used to run disambiguation calculations over multiple people with the same name, so as to tell them apart. The classification data is used to classify subsequent unclassified data, assisting same-name disambiguation and the classification of newly created person individuals.
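The extraction rule base in item C is described as being learned from the person information already obtained. The sketch below replaces the learned rules with two hand-written regular expressions, only to show what applying a rule base to unstructured text might look like; the rule names, the patterns, and the sample sentence are all hypothetical.

```python
import re

# Hand-written stand-ins for learned extraction rules (attribute -> pattern).
RULES = {
    "birth_year": re.compile(r"\bborn in (\d{4})\b"),
    "occupation": re.compile(r"\bis an? ([a-z ]+?)(?: at | who |[.,])"),
}


def extract_basic_info(text):
    """Apply every rule to the text and collect the first match of each."""
    info = {}
    for attribute, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            info[attribute] = match.group(1)
    return info


sample = "Zhang Wei is a mathematician at Example University. He was born in 1970."
info = extract_basic_info(sample)
```

Storing the rules as data rather than as code is what would allow a learning component to add or adjust them over time, as the description suggests.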
Step 3, crawl and parse external resources: dynamically updated, reliable external data sources, such as news and verified microblog accounts, are continuously crawled and parsed; the person feature data and classification data in the person information ontology are used to identify information related to person individuals; wherein:
A. Web page data is crawled on a schedule or on a trigger, and parsed to extract the data it contains;
B. Using the initial person information ontology, the data extracted by the scheduled parsing is filtered and person recognition is performed, identifying the data relevant to the person information ontology.
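The choice between scheduled and triggered crawling in item A can be sketched as a function that decides which sources are due. The source names, the one-hour interval, and the record layout are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def due_sources(sources, now, triggered=frozenset()):
    """Return the sources to crawl now: overdue on schedule, or explicitly triggered."""
    due = []
    for name, state in sources.items():
        overdue = now - state["last_crawl"] >= state["interval"]
        if overdue or name in triggered:
            due.append(name)
    return due


now = datetime(2017, 1, 13, 12, 0)
sources = {
    "news_site": {"last_crawl": now - timedelta(hours=2), "interval": timedelta(hours=1)},
    "verified_microblog": {"last_crawl": now - timedelta(minutes=5), "interval": timedelta(hours=1)},
}
scheduled = due_sources(sources, now)
with_trigger = due_sources(sources, now, triggered={"verified_microblog"})
```

A trigger could be, for example, a notification that a tracked source has published something new; the patent does not specify what events act as triggers.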
Step 4, further refine the person information ontology: the extracted data is processed and the resulting person information is used to refine the person information ontology; wherein:
A. The extracted data is classified, filtered, and deduplicated;
B. The obtained information is evaluated against the existing information and preset data to obtain the final information;
C. According to the evaluation results, the obtained information is added to the person information ontology, or information with a higher confidence evaluation replaces information with lower confidence.
Step 5, provide person information knowledge services: reliable knowledge services are provided using the continuously refined person information ontology. The reliable knowledge services include: a basic person information service, a person relation expansion service, a similar person expansion service, a knowledge service for simple rule-based queries, and a dynamic person timeline.
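The person relation expansion service can be read as a bounded traversal of the relation graph. The sketch below uses breadth-first search with a depth limit; the adjacency-list layout, the relation labels, and the default depth of 2 are assumptions rather than details from the patent.

```python
from collections import deque


def expand_relations(graph, start, max_depth=2):
    """Return {person_id: distance} for everyone within max_depth hops of start.

    graph maps a person id to a list of (relation, neighbor_id) pairs.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for _relation, neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


graph = {
    "A": [("colleague", "B")],
    "B": [("advisor", "C")],
    "C": [("spouse", "D")],
}
reachable = expand_relations(graph, "A")
```

The depth limit keeps the expansion local; without it, a well-connected person would pull in most of the knowledge base.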
The method for automatically constructing a person information knowledge base disclosed by the present invention mainly uses existing resources, combining computer technology with ontology concepts to build and refine a person knowledge base that provides person information knowledge services. It thereby addresses problems encountered when retrieving information about public figures, such as poorly organized person information, duplicated news about a person, and ambiguity caused by identical names.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1. A method for automatically constructing a person information knowledge base, characterized by comprising:
Step 1, build the person information ontology: collect data; analyze the characteristics of person classification, person attributes, and person relations, and the characteristics of how person individuals are presented; and build the person information ontology framework taking into account the structure, storage, and representation of the ontology;
Step 2, refine the person information ontology: extract person-related data as needed from encyclopedias, news, and existing person resource bases; parse the data and then instantiate the ontology, creating the person individuals in the ontology and forming the initial person information ontology;
Step 3, crawl and parse external resources: continuously crawl and parse dynamically updated, reliable external data sources, and use the person feature data and classification data in the person information ontology to identify information related to person individuals;
Step 4, further refine the person information ontology: process the extracted data and use the resulting person information to refine the person information ontology;
Step 5, provide person information knowledge services: provide reliable knowledge services using the continuously refined person information ontology.
2. The method for automatically constructing a person information knowledge base according to claim 1, characterized in that step 2 comprises:
Step 21, capture raw data from internet pages; the raw data contains web page tags and advertisements, and is parsed to obtain preliminary data;
Step 22, instantiate the ontology according to the parsed preliminary data, the basic structure of the ontology, and the structure of the web page, creating the person individuals in the ontology;
Step 23, maintain and adjust the ontology according to the instantiation results, and use the data from construction to form basic person information, a person information extraction rule base, person feature data, and classification data.
3. The method for automatically constructing a person information knowledge base according to claim 2, characterized in that, in step 23: the basic person information is used to provide knowledge services and is continuously updated and refined later; the person information extraction rule base is refined continuously by machine learning on the person information obtained, and is used subsequently to extract basic person information from unstructured text; the person feature data is used to run disambiguation calculations over multiple people with the same name, so as to tell them apart; and the classification data is used to classify subsequent unclassified data, assisting same-name disambiguation and the classification of newly created person individuals.
4. The method for automatically constructing a person information knowledge base according to claim 1, characterized in that step 3 comprises:
Step 31, crawl web page data on a schedule or on a trigger, and parse it to extract the data it contains;
Step 32, using the initial person information ontology, filter the data extracted by the scheduled parsing and perform person recognition, identifying the data relevant to the person information ontology.
5. The method for automatically constructing a person information knowledge base according to claim 1, characterized in that step 4 comprises:
Step 41, classify, filter, and deduplicate the extracted data;
Step 42, evaluate the obtained information against the existing information and preset data to obtain the final information;
Step 43, according to the evaluation results, add the obtained information to the person information ontology, or replace lower-confidence information with information whose confidence evaluation is higher.
6. The method for automatically constructing a person information knowledge base according to claim 1, characterized in that, in step 5, the reliable knowledge services include: a basic person information service, a person relation expansion service, a similar person expansion service, a knowledge service for simple rule-based queries, and a dynamic person timeline.
CN201710026230.9A 2017-01-13 2017-01-13 Method for automatically constructing a person information knowledge base Pending CN106779080A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710026230.9A CN106779080A (en) 2017-01-13 2017-01-13 Method for automatically constructing a person information knowledge base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710026230.9A CN106779080A (en) 2017-01-13 2017-01-13 Method for automatically constructing a person information knowledge base

Publications (1)

Publication Number Publication Date
CN106779080A (en) 2017-05-31

Family

ID=58946771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710026230.9A Pending CN106779080A (en) 2017-01-13 2017-01-13 Method for automatically constructing a person information knowledge base

Country Status (1)

Country Link
CN (1) CN106779080A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108255806A (en) * 2017-12-22 2018-07-06 北京奇艺世纪科技有限公司 A kind of name recognition methods and device
CN109241307A (en) * 2018-08-29 2019-01-18 山东浪潮商用系统有限公司 A kind of performers and clerks' contents management method and system
CN110717091A (en) * 2019-09-16 2020-01-21 苏宁云计算有限公司 Entry data expansion method and device based on face recognition

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3466630B2 (en) * 1991-04-12 2003-11-17 株式会社東芝 Information communication device
CN101286148A (en) * 2007-04-12 2008-10-15 上海思阔雅软件有限公司 Computer Chinese characters knowledge base collection system based on text fragment
CN102467531A (en) * 2010-11-12 2012-05-23 中国科学院计算机网络信息中心 Continuously searching method of scientific research information based on character attributes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刁丽娟: ""通用本体学习方法及其应用的关键技术研究"", 《中国博士学位论文全文数据库 信息科技辑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108255806A (en) * 2017-12-22 2018-07-06 北京奇艺世纪科技有限公司 A kind of name recognition methods and device
CN108255806B (en) * 2017-12-22 2021-12-17 北京奇艺世纪科技有限公司 Name recognition method and device
CN109241307A (en) * 2018-08-29 2019-01-18 山东浪潮商用系统有限公司 A kind of performers and clerks' contents management method and system
CN110717091A (en) * 2019-09-16 2020-01-21 苏宁云计算有限公司 Entry data expansion method and device based on face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20170531