CN110390022A - An automated method for constructing a professional knowledge graph - Google Patents

An automated method for constructing a professional knowledge graph

Info

Publication number
CN110390022A
Authority
CN
China
Prior art keywords
professional knowledge
semantic
text
label
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910542202.1A
Other languages
Chinese (zh)
Inventor
刘家祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central Mdt Infotech Ltd Of United States Of Xiamen
Original Assignee
Central Mdt Infotech Ltd Of United States Of Xiamen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central Mdt Infotech Ltd Of United States Of Xiamen
Priority to CN201910542202.1A
Publication of CN110390022A
Legal status: Pending

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/35: Clustering; Classification
              • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
                • G06F16/367: Ontology
            • G06F16/90: Details of database functions independent of the retrieved data types
              • G06F16/95: Retrieval from the web
                • G06F16/951: Indexing; Web crawling techniques
          • G06F40/00: Handling natural language data
            • G06F40/20: Natural language analysis
              • G06F40/237: Lexical tools
                • G06F40/242: Dictionaries
              • G06F40/279: Recognition of textual entities
            • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

An automated method for constructing a professional knowledge graph, comprising the following steps: obtaining professional knowledge texts; segmenting the texts into words and removing stop words from the segmented text; converting each text into several word sets; obtaining part-of-speech tags by part-of-speech tagging, and obtaining dependency labels and a dependency tree by dependency parsing; performing noun phrase detection, verb phrase detection, and candidate relation detection based on the part-of-speech tags and dependency labels; semantically annotating the words in the candidate relations to obtain candidate semantic relation patterns; clustering the candidate semantic relation patterns to obtain a final set of semantic relation patterns; obtaining professional knowledge data using a semantic dictionary and the semantic relation patterns; and writing the data to an OWL file and importing it into Protégé. Compared with manual construction by domain experts, the optimized and extended method builds the professional knowledge graph efficiently and at lower cost, saves time and labor, and yields a graph of high accuracy.

Description

An automated method for constructing a professional knowledge graph
Technical field
The present invention relates to the technical field of knowledge graph construction, and in particular to an automated method for constructing a professional knowledge graph.
Background
Knowledge services have recently become a hot topic in professional digital publishing, and substantial national funding in this area has accelerated their adoption by publishers. However, the knowledge service systems built in China so far are, in general, still traditional document-level services: they offer conventional full-text search, and resource linking is limited to links between documents and their references. To achieve true knowledge retrieval, building the various knowledge systems that underpin it becomes the key. Some publishing houses and commercial presses that lead in building knowledge systems have accumulated a certain amount of domain thesauri, but for knowledge retrieval the ideal goal is to build domain ontologies and knowledge graphs.
However, in a specialized field the knowledge graph has to be built manually by experts who know the field well. This requires a large investment of labor and time, so construction is inefficient and costly, and accuracy is hard to guarantee.
Summary of the invention
(1) Object of the invention
To solve the technical problems described in the background, the present invention proposes an automated method for constructing a professional knowledge graph. The optimized and extended method builds the professional knowledge graph efficiently and at lower cost, saves time and labor, and yields a graph of high accuracy.
(2) Technical solution
To solve the above problems, the invention proposes an automated method for constructing a professional knowledge graph, comprising the following steps:
S1. Obtain professional knowledge texts using web crawler technology;
S2. Segment the obtained professional knowledge texts into words using the jieba tool;
S3. Remove stop words from the segmented text using a stopwords tool;
S4. Perform n-gram processing to convert each text into several word sets;
S5. Obtain part-of-speech tags by part-of-speech tagging, and obtain dependency labels and a dependency tree by dependency parsing;
S6. Perform noun phrase detection, verb phrase detection, and candidate relation detection based on the part-of-speech tags and dependency labels;
S7. Using a semantic dictionary, semantically annotate the words in the candidate relations to obtain candidate semantic relation patterns;
S8. Cluster the candidate semantic relation patterns to obtain a final set of semantic relation patterns;
S9. Obtain professional knowledge data using the semantic dictionary and the semantic relation patterns;
S10. Write the data to an OWL file using the Jena tool, and import the OWL file into Protégé;
S11. Generate the final professional knowledge graph.
Preferably, S2 comprises the following steps: splitting each target text into multiple sentences; and segmenting each sentence into words to obtain a word sequence.
Preferably, S4 further comprises the following steps: computing the similarity between every pair of texts using the shingling algorithm, and placing all texts whose similarity exceeds a threshold into the same text cluster; and classifying each text cluster to obtain the category to which the cluster belongs.
Preferably, S5 comprises the following steps: performing part-of-speech tagging on each word set to obtain its part-of-speech tags; and performing dependency parsing on each word set to obtain a dependency label for every pair of words that stand in a grammatical dependency relation. The dependency labels of all words together form the dependency tree.
Preferably, in S6, a noun phrase is a phrase made up of several consecutive words that includes a noun, and a verb phrase is a phrase whose words stand in a verb-object relation in the dependency tree.
Preferably, in S6, candidate relation detection is applied after the noun phrases have been obtained and determines, for every pair of noun phrases, whether a relation exists between them.
Preferably, in S7, semantic annotation looks up each word in the semantic dictionary to obtain its semantic type, using the word's part of speech to decide between candidate types; once every word in a lexical relation pattern has been semantically annotated, the corresponding semantic relation pattern is obtained.
Preferably, in S7, for a word with multiple semantic types, all semantic relation patterns extracted from the entire text collection are counted and a matching pattern is sought among them; if no semantic relation pattern matches, the n-ary semantic relation pattern is converted into multiple binary patterns and matching is attempted again.
The above technical solution of the invention has the following beneficial technical effects:
The optimized and extended method builds the professional knowledge graph efficiently and at lower cost, saves time and labor, and yields a graph of high accuracy.
Brief description of the drawings
Fig. 1 is a flowchart of the automated professional knowledge graph construction method proposed by the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawing. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.
As shown in Fig. 1, the automated method for constructing a professional knowledge graph proposed by the invention comprises the following steps:
S1. Obtain professional knowledge texts using web crawler technology;
S2. Segment the obtained professional knowledge texts into words using the jieba tool;
S3. Remove stop words from the segmented text using a stopwords tool;
S4. Perform n-gram processing to convert each text into several word sets;
S5. Obtain part-of-speech tags by part-of-speech tagging, and obtain dependency labels and a dependency tree by dependency parsing;
S6. Perform noun phrase detection, verb phrase detection, and candidate relation detection based on the part-of-speech tags and dependency labels;
S7. Using a semantic dictionary, semantically annotate the words in the candidate relations to obtain candidate semantic relation patterns;
S8. Cluster the candidate semantic relation patterns to obtain a final set of semantic relation patterns;
S9. Obtain professional knowledge data using the semantic dictionary and the semantic relation patterns;
S10. Write the data to an OWL file using the Jena tool, and import the OWL file into Protégé;
S11. Generate the final professional knowledge graph.
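As an illustration of steps S3 and S4, the following minimal sketch removes stop words from an already segmented token list and then builds n-grams. The stop-word list, the function names `remove_stop_words` and `ngrams`, and the choice of n = 2 are assumptions made for this example; the patent does not specify them.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical stop-word list; a real pipeline would load a much larger one.
STOP_WORDS: Set[str] = {"的", "了", "是", "在"}


def remove_stop_words(tokens: Iterable[str],
                      stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop every token that appears in the stop-word set (step S3)."""
    return [t for t in tokens if t not in stop_words]


def ngrams(tokens: List[str], n: int = 2) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of the token list (step S4)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["知识", "图谱", "是", "语义", "网络"]
content = remove_stop_words(tokens)
bigrams = ngrams(content, 2)
```

Each text then corresponds to a collection of such n-grams, which is one plausible reading of "several word sets" in S4.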
In an optional embodiment, S2 comprises the following steps: splitting each target text into multiple sentences; and segmenting each sentence into words to obtain a word sequence.
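The two sub-steps of S2 can be sketched as follows. Sentence splitting is done with a regular expression over common Chinese and Western end-of-sentence punctuation, which is an assumption of this example. Word segmentation is passed in as a callable: the patent names jieba, so `jieba.lcut` could be supplied there, but to keep the sketch runnable with the standard library alone it defaults to a hypothetical character-level placeholder, `char_segment`.

```python
import re
from typing import Callable, List

# A sentence is a run of non-terminal characters plus an optional terminator.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")


def split_sentences(text: str) -> List[str]:
    """Split a text into sentences, keeping the end punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def char_segment(sentence: str) -> List[str]:
    """Placeholder segmenter: one token per character, skipping
    whitespace and punctuation. Not a real word segmenter."""
    return [ch for ch in sentence
            if not ch.isspace() and ch not in "。！？!?，,"]


def segment_text(text: str,
                 segment: Callable[[str], List[str]] = char_segment
                 ) -> List[List[str]]:
    """Return one word sequence per sentence of the text."""
    return [segment(s) for s in split_sentences(text)]


sentences = split_sentences("知识图谱是语义网络。它由实体和关系组成！")
word_sequences = segment_text("知识图谱。实体！")
```

Keeping the segmenter pluggable means the sentence-level structure stays the same whichever tokenizer is used.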
In an optional embodiment, S4 further comprises the following steps: computing the similarity between every pair of texts using the shingling algorithm, and placing all texts whose similarity exceeds a threshold into the same text cluster; and classifying each text cluster to obtain the category to which the cluster belongs.
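A minimal sketch of the shingling step, assuming word-level shingles of length k and Jaccard similarity between shingle sets. The patent does not say how texts above the threshold are merged; this example assumes transitive (single-link) grouping via union-find, and the names `shingles`, `jaccard`, and `cluster_texts` are illustrative.

```python
from typing import Dict, List, Set, Tuple


def shingles(tokens: List[str], k: int = 2) -> Set[Tuple[str, ...]]:
    """Set of all contiguous k-token shingles of a document."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_texts(docs: List[List[str]], threshold: float = 0.5,
                  k: int = 2) -> List[List[int]]:
    """Group document indices; documents whose similarity exceeds the
    threshold (directly or through a chain) end up in the same cluster."""
    sets = [shingles(d, k) for d in docs]
    parent = list(range(len(docs)))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(sets[i], sets[j]) > threshold:
                parent[find(j)] = find(i)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


docs = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["x", "y", "z"]]
similarity = jaccard(shingles(docs[0]), shingles(docs[1]))
clusters = cluster_texts(docs, threshold=0.4)
```

The pairwise comparison is quadratic in the number of texts; for large corpora, MinHash-style approximations are commonly used instead, but that goes beyond what the patent describes.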
In an optional embodiment, S5 comprises the following steps: performing part-of-speech tagging on each word set to obtain its part-of-speech tags; and performing dependency parsing on each word set to obtain a dependency label for every pair of words that stand in a grammatical dependency relation. The dependency labels of all words together form the dependency tree.
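The output of S5 can be represented with a small data structure such as the one below. The tags and labels are hard-coded here; in practice they would come from a part-of-speech tagger and a dependency parser, which the patent does not name. The tag set ("n", "v") and the labels "SBV" (subject-verb), "VOB" (verb-object), and "HED" (root) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    index: int
    word: str
    pos: str      # part-of-speech tag, e.g. "n" (noun) or "v" (verb)
    head: int     # index of the head token, -1 for the root
    deprel: str   # dependency label linking this token to its head


def build_tree(tokens: List[Token]) -> Dict[int, List[int]]:
    """Map each head index to the indices of its dependents."""
    children: Dict[int, List[int]] = {}
    for tok in tokens:
        children.setdefault(tok.head, []).append(tok.index)
    return children


def root_of(tokens: List[Token]) -> int:
    """Return the index of the single root token of the dependency tree."""
    roots = [t.index for t in tokens if t.head == -1]
    if len(roots) != 1:
        raise ValueError("a dependency tree must have exactly one root")
    return roots[0]


# "公司 发布 产品" (the company releases a product), annotated by hand.
sentence = [
    Token(0, "公司", "n", 1, "SBV"),
    Token(1, "发布", "v", -1, "HED"),
    Token(2, "产品", "n", 1, "VOB"),
]
tree = build_tree(sentence)
root = root_of(sentence)
```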
In an optional embodiment, in S6, a noun phrase is a phrase made up of several consecutive words that includes a noun, and a verb phrase is a phrase whose words stand in a verb-object relation in the dependency tree.
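One way to realize these two definitions is sketched below. Noun phrases are taken to be maximal runs of adjective or noun tokens that contain at least one noun, and verb phrases are (verb, object) pairs joined by a verb-object arc. The tag names ("a", "n", "v"), the label "VOB", and the decision to accept single-noun runs are assumptions of this example, not details given in the patent.

```python
from typing import List, Tuple

# (word, part-of-speech tag, head index, dependency label)
Tok = Tuple[str, str, int, str]

NOUN_TAGS = {"n"}            # assumed tag for nouns
PHRASE_TAGS = {"a", "n"}     # assumed tags allowed inside a noun phrase


def noun_phrases(tokens: List[Tok]) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of maximal runs of
    adjective/noun tokens that contain at least one noun."""
    spans, start = [], None
    sentinel = [("", "", -1, "")]  # closes a run that reaches the end
    for i, (_, pos, _, _) in enumerate(tokens + sentinel):
        if pos in PHRASE_TAGS:
            if start is None:
                start = i
        elif start is not None:
            if any(tokens[j][1] in NOUN_TAGS for j in range(start, i)):
                spans.append((start, i))
            start = None
    return spans


def verb_phrases(tokens: List[Tok]) -> List[Tuple[int, int]]:
    """Return (verb index, object index) pairs linked by a verb-object arc."""
    return [(head, i) for i, (_, _, head, rel) in enumerate(tokens)
            if rel == "VOB" and head >= 0 and tokens[head][1] == "v"]


# "新 公司 发布 智能 产品" (the new company releases a smart product)
sentence: List[Tok] = [
    ("新", "a", 1, "ATT"),
    ("公司", "n", 2, "SBV"),
    ("发布", "v", -1, "HED"),
    ("智能", "a", 4, "ATT"),
    ("产品", "n", 2, "VOB"),
]
nps = noun_phrases(sentence)
vps = verb_phrases(sentence)
```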
In an optional embodiment, in S6, candidate relation detection is applied after the noun phrases have been obtained and determines, for every pair of noun phrases, whether a relation exists between them.
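The patent does not state the criterion used to decide whether two noun phrases are related. The sketch below assumes one simple criterion: two noun phrases form a candidate relation when one is the subject and the other the object of the same verb in the dependency tree. The labels "SBV" and "VOB" and the helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Tok = Tuple[str, str, int, str]   # (word, pos, head index, dependency label)
Span = Tuple[int, int]            # (start, end) with end exclusive


def phrase_head(tokens: List[Tok], span: Span) -> int:
    """Index of the token in the span whose head lies outside the span."""
    start, end = span
    for i in range(start, end):
        if not (start <= tokens[i][2] < end):
            return i
    return end - 1  # fallback for malformed input


def candidate_relations(tokens: List[Tok],
                        spans: List[Span]) -> List[Tuple[Span, int, Span]]:
    """Return (subject span, verb index, object span) for every pair of
    noun phrases attached to the same verb as subject and object."""
    found = []
    for a, b in combinations(spans, 2):
        for subj, obj in ((a, b), (b, a)):
            s, o = phrase_head(tokens, subj), phrase_head(tokens, obj)
            verb = tokens[s][2]
            if (tokens[s][3] == "SBV" and tokens[o][3] == "VOB"
                    and verb == tokens[o][2] and verb >= 0
                    and tokens[verb][1] == "v"):
                found.append((subj, verb, obj))
    return found


sentence: List[Tok] = [
    ("新", "a", 1, "ATT"),
    ("公司", "n", 2, "SBV"),
    ("发布", "v", -1, "HED"),
    ("智能", "a", 4, "ATT"),
    ("产品", "n", 2, "VOB"),
]
relations = candidate_relations(sentence, [(0, 2), (3, 5)])
```

The single result links "新 公司" and "智能 产品" through the verb "发布", which is the kind of lexical relation that S7 then annotates semantically.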
In an optional embodiment, in S7, semantic annotation looks up each word in the semantic dictionary to obtain its semantic type, using the word's part of speech to decide between candidate types; once every word in a lexical relation pattern has been semantically annotated, the corresponding semantic relation pattern is obtained.
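The dictionary lookup can be sketched as below, assuming a semantic dictionary keyed first by word and then by part-of-speech tag. The dictionary entries, the semantic type names, and the function names are invented for illustration; the patent does not describe the dictionary's format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical semantic dictionary: word -> {part-of-speech tag -> semantic type}.
SEMANTIC_DICT: Dict[str, Dict[str, str]] = {
    "阿司匹林": {"n": "Drug"},
    "治疗": {"v": "Treats", "n": "Therapy"},
    "头痛": {"n": "Symptom"},
}


def semantic_type(word: str, pos: str,
                  dictionary: Dict[str, Dict[str, str]] = SEMANTIC_DICT
                  ) -> Optional[str]:
    """Look the word up and use its part of speech to pick its semantic type."""
    return dictionary.get(word, {}).get(pos)


def to_semantic_pattern(lexical_pattern: List[Tuple[str, str]],
                        dictionary: Dict[str, Dict[str, str]] = SEMANTIC_DICT
                        ) -> Optional[Tuple[str, ...]]:
    """Replace every (word, pos) pair by its semantic type.
    Returns None if any word has no entry for its part of speech."""
    types = [semantic_type(w, p, dictionary) for w, p in lexical_pattern]
    return None if None in types else tuple(types)


pattern = to_semantic_pattern([("阿司匹林", "n"), ("治疗", "v"), ("头痛", "n")])
unknown = to_semantic_pattern([("阿司匹林", "n"), ("缓解", "v"), ("头痛", "n")])
```

Here the part of speech resolves "治疗" to the relation type `Treats` instead of the noun reading `Therapy`.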
In an optional embodiment, in S7, for a word with multiple semantic types, all semantic relation patterns extracted from the entire text collection are counted and a matching pattern is sought among them; if no semantic relation pattern matches, the n-ary semantic relation pattern is converted into multiple binary patterns and matching is attempted again.
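A possible reading of this disambiguation step is sketched below. Each slot of an ambiguous pattern carries a list of candidate semantic types; the reading with the highest corpus count wins. If no full reading has been seen, every pattern is decomposed into binary patterns and the readings are scored by the counts of their pairs. Decomposing into all order-preserving pairs and summing the pair counts are assumptions of this example, since the patent does not define the conversion or the matching rule.

```python
from collections import Counter
from itertools import combinations, product
from typing import List, Optional, Sequence, Tuple

Pattern = Tuple[str, ...]


def to_binary(pattern: Pattern) -> List[Tuple[str, str]]:
    """Decompose an n-ary pattern into all order-preserving pairs."""
    return list(combinations(pattern, 2))


def resolve(options: Sequence[Sequence[str]],
            counts: Counter) -> Optional[Pattern]:
    """Pick the reading of an ambiguous pattern best supported by the
    pattern counts collected over the whole text collection."""
    readings = list(product(*options))
    if not readings:
        return None
    # First try to match complete patterns.
    best = max(readings, key=lambda r: counts[r])
    if counts[best] > 0:
        return best
    # Otherwise fall back to binary patterns derived from the corpus.
    binary_counts: Counter = Counter()
    for pat, c in counts.items():
        for pair in to_binary(pat):
            binary_counts[pair] += c

    def score(reading: Pattern) -> int:
        return sum(binary_counts[p] for p in to_binary(reading))

    best = max(readings, key=score)
    return best if score(best) > 0 else None


corpus_counts = Counter({
    ("Drug", "Treats", "Symptom"): 5,
    ("Drug", "Causes", "Symptom"): 2,
})
direct = resolve([["Drug"], ["Treats", "Therapy"], ["Symptom"]], corpus_counts)
fallback = resolve([["Drug"], ["Treats", "Therapy"], ["Disease"]], corpus_counts)
```

In the second call no complete pattern matches, but the binary pattern `("Drug", "Treats")` has been observed, so the `Treats` reading is selected.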
In the present invention, professional knowledge texts are first obtained using web crawler technology. The obtained texts are then segmented into words with the jieba tool, and stop words are removed from the segmented text with a stopwords tool. Next, n-gram processing converts each text into several word sets. Part-of-speech tags are then obtained by part-of-speech tagging, and dependency labels and a dependency tree by dependency parsing; noun phrase detection, verb phrase detection, and candidate relation detection are performed based on the part-of-speech tags and dependency labels, and the words in the candidate relations are semantically annotated with the help of a semantic dictionary to obtain candidate semantic relation patterns. The candidate semantic relation patterns are then clustered to obtain a final set of semantic relation patterns, and professional knowledge data are obtained using the semantic dictionary and the semantic relation patterns. Finally, the data are written to an OWL file using the Jena tool, and the OWL file is imported into Protégé to generate the final professional knowledge graph.
The optimized and extended method builds the professional knowledge graph efficiently and at lower cost, saves time and labor, and yields a graph of high accuracy.
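The final export step can be illustrated as follows. The patent uses Jena, a Java library; to stay self-contained, this sketch instead builds a small RDF/XML document with Python's standard `xml.etree.ElementTree`, which is a substitution and not the patent's tooling. The namespace `http://example.org/kg#`, the function name `triples_to_owl`, and the modelling choice (entities as named individuals, relations as object properties) are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/kg#"  # hypothetical ontology namespace


def triples_to_owl(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (subject, relation, object) triples as RDF/XML, with
    relations as OWL object properties and subjects as named individuals."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("owl", OWL)
    ET.register_namespace("kg", BASE)
    triples = list(triples)
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology",
                  {f"{{{RDF}}}about": BASE.rstrip("#")})
    # Declare each distinct relation once.
    for rel in sorted({r for _, r, _ in triples}):
        ET.SubElement(root, f"{{{OWL}}}ObjectProperty",
                      {f"{{{RDF}}}about": BASE + rel})
    # One individual element per triple, carrying the relation as a child.
    for subj, rel, obj in triples:
        ind = ET.SubElement(root, f"{{{OWL}}}NamedIndividual",
                            {f"{{{RDF}}}about": BASE + subj})
        ET.SubElement(ind, f"{{{BASE}}}{rel}",
                      {f"{{{RDF}}}resource": BASE + obj})
    return ET.tostring(root, encoding="unicode")


owl_xml = triples_to_owl([("Aspirin", "treats", "Headache")])
```

The returned string can be saved with an `.owl` extension and opened in Protégé. Relation and entity names must be valid XML names for this simple serialization to work.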
It should be understood that the above specific embodiments of the invention serve only to illustrate or explain the principles of the invention and do not limit it. Therefore, any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or within equivalents of such scope and boundaries.

Claims (8)

1. An automated method for constructing a professional knowledge graph, characterized by comprising the following steps:
S1. Obtain professional knowledge texts using web crawler technology;
S2. Segment the obtained professional knowledge texts into words using the jieba tool;
S3. Remove stop words from the segmented text using a stopwords tool;
S4. Perform n-gram processing to convert each text into several word sets;
S5. Obtain part-of-speech tags by part-of-speech tagging, and obtain dependency labels and a dependency tree by dependency parsing;
S6. Perform noun phrase detection, verb phrase detection, and candidate relation detection based on the part-of-speech tags and dependency labels;
S7. Using a semantic dictionary, semantically annotate the words in the candidate relations to obtain candidate semantic relation patterns;
S8. Cluster the candidate semantic relation patterns to obtain a final set of semantic relation patterns;
S9. Obtain professional knowledge data using the semantic dictionary and the semantic relation patterns;
S10. Write the data to an OWL file using the Jena tool, and import the OWL file into Protégé;
S11. Generate the final professional knowledge graph.
2. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that S2 comprises the following steps:
splitting each target text into multiple sentences;
segmenting each sentence into words to obtain a word sequence.
3. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that S4 further comprises the following steps:
computing the similarity between every pair of texts using the shingling algorithm, and placing all texts whose similarity exceeds a threshold into the same text cluster;
classifying each text cluster to obtain the category to which the cluster belongs.
4. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that S5 comprises the following steps:
performing part-of-speech tagging on each word set to obtain its part-of-speech tags;
performing dependency parsing on each word set to obtain a dependency label for every pair of words that stand in a grammatical dependency relation;
the dependency labels of all words together forming the dependency tree.
5. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S6, a noun phrase is a phrase made up of several consecutive words that includes a noun, and a verb phrase is a phrase whose words stand in a verb-object relation in the dependency tree.
6. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S6, candidate relation detection is applied after the noun phrases have been obtained and determines, for every pair of noun phrases, whether a relation exists between them.
7. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S7, semantic annotation looks up each word in the semantic dictionary to obtain its semantic type, using the word's part of speech to decide between candidate types;
once every word in a lexical relation pattern has been semantically annotated, the corresponding semantic relation pattern is obtained.
8. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S7, for a word with multiple semantic types, all semantic relation patterns extracted from the entire text collection are counted and a matching pattern is sought among them;
if no semantic relation pattern matches, the n-ary semantic relation pattern is converted into multiple binary patterns and matching is attempted again.
CN201910542202.1A 2019-06-21 2019-06-21 A kind of professional knowledge map construction method of automation Pending CN110390022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910542202.1A CN110390022A (en) 2019-06-21 2019-06-21 A kind of professional knowledge map construction method of automation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910542202.1A CN110390022A (en) 2019-06-21 2019-06-21 A kind of professional knowledge map construction method of automation

Publications (1)

Publication Number Publication Date
CN110390022A true CN110390022A (en) 2019-10-29

Family

ID=68285661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910542202.1A Pending CN110390022A (en) 2019-06-21 2019-06-21 A kind of professional knowledge map construction method of automation

Country Status (1)

Country Link
CN (1) CN110390022A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888991A (en) * 2019-11-28 2020-03-17 哈尔滨工程大学 Sectional semantic annotation method in weak annotation environment
CN110910168A (en) * 2019-11-05 2020-03-24 北京洪泰文旅科技股份有限公司 Method and equipment for acquiring guests in text and travel industry
CN111488741A (en) * 2020-04-14 2020-08-04 税友软件集团股份有限公司 Tax knowledge data semantic annotation method and related device
CN111737400A (en) * 2020-06-15 2020-10-02 上海理想信息产业(集团)有限公司 Knowledge reasoning-based big data service tag expansion method and system
CN112149427A (en) * 2020-10-12 2020-12-29 腾讯科技(深圳)有限公司 Method for constructing verb phrase implication map and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184569A (en) * 2011-06-11 2011-09-14 福州大学 Individual plant wood modeling method driven by domain ontology
CN107633044A (en) * 2017-09-14 2018-01-26 国家计算机网络与信息安全管理中心 A kind of public sentiment knowledge mapping construction method based on focus incident
CN109522418A (en) * 2018-11-08 2019-03-26 杭州费尔斯通科技有限公司 A kind of automanual knowledge mapping construction method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910168A (en) * 2019-11-05 2020-03-24 北京洪泰文旅科技股份有限公司 Method and equipment for acquiring guests in text and travel industry
CN110888991A (en) * 2019-11-28 2020-03-17 哈尔滨工程大学 Sectional semantic annotation method in weak annotation environment
CN110888991B (en) * 2019-11-28 2023-12-01 哈尔滨工程大学 Sectional type semantic annotation method under weak annotation environment
CN111488741A (en) * 2020-04-14 2020-08-04 税友软件集团股份有限公司 Tax knowledge data semantic annotation method and related device
CN111737400A (en) * 2020-06-15 2020-10-02 上海理想信息产业(集团)有限公司 Knowledge reasoning-based big data service tag expansion method and system
CN111737400B (en) * 2020-06-15 2023-06-20 上海理想信息产业(集团)有限公司 Knowledge reasoning-based big data service label expansion method and system
CN112149427A (en) * 2020-10-12 2020-12-29 腾讯科技(深圳)有限公司 Method for constructing verb phrase implication map and related equipment
CN112149427B (en) * 2020-10-12 2024-02-02 腾讯科技(深圳)有限公司 Verb phrase implication map construction method and related equipment

Similar Documents

Publication Publication Date Title
CN110390022A (en) A kind of professional knowledge map construction method of automation
Alzahrani et al. Fuzzy semantic-based string similarity for extrinsic plagiarism detection
CN109241538A (en) Based on the interdependent Chinese entity relation extraction method of keyword and verb
CN105824933A (en) Automatic question answering system based on main statement position and implementation method thereof
WO2008107305A2 (en) Search-based word segmentation method and device for language without word boundary tag
CN109522418A (en) A kind of automanual knowledge mapping construction method
CN106446018B (en) Query information processing method and device based on artificial intelligence
Pourvali et al. Automated text summarization base on lexicales chain and graph using of wordnet and wikipedia knowledge base
Falk et al. Classifying French verbs using French and English lexical resources
CN109376352A (en) A kind of patent text modeling method based on word2vec and semantic similarity
CN109614620B (en) HowNet-based graph model word sense disambiguation method and system
CN102662936A (en) Chinese-English unknown words translating method blending Web excavation, multi-feature and supervised learning
Chen et al. A boundary assembling method for Chinese entity-mention recognition
CN105912522A (en) Automatic extraction method and extractor of English corpora based on constituent analyses
CN111191464A (en) Semantic similarity calculation method based on combined distance
Bougouin et al. Keyphrase annotation with graph co-ranking
Kessler et al. Extraction of terminology in the field of construction
CN103020311B (en) A kind of processing method of user search word and system
Watrin et al. An N-gram frequency database reference to handle MWE extraction in NLP applications
CN110705295B (en) Entity name disambiguation method based on keyword extraction
Guisado-Gámez et al. Massive query expansion by exploiting graph knowledge bases for image retrieval
CN101576876B (en) System and method for automatically splitting English generalized phrase
Pourvali A new graph based text segmentation using Wikipedia for automatic text summarization
Maheswari et al. Rule based morphological variation removable stemming algorithm
Souza et al. Extraction of keywords from texts: an exploratory study using Noun Phrases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191029