CN110390022A - An automated method for constructing a professional knowledge graph - Google Patents
An automated method for constructing a professional knowledge graph
- Publication number
- CN110390022A (application number CN201910542202.1A)
- Authority
- CN
- China
- Prior art keywords
- professional knowledge
- semantic
- text
- label
- knowledge graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Machine Translation (AREA)
Abstract
An automated method for constructing a professional knowledge graph, comprising the following steps: obtaining professional knowledge text; segmenting the text into words and removing stop words from the segmented text; converting each text into several word sets; obtaining part-of-speech tags through part-of-speech tagging, and obtaining dependency labels and a dependency tree through dependency parsing; performing noun phrase detection, verb phrase detection, and candidate relation detection based on the part-of-speech tags and dependency labels; semantically annotating the words in the candidate relations to obtain candidate semantic relation patterns; clustering the candidate semantic relation patterns to obtain a final set of semantic relation patterns; obtaining professional knowledge data using a semantic dictionary and the semantic relation patterns; and writing the data to an OWL file and importing it into Protégé. The method is stated to be optimized and extensible: it builds a professional knowledge graph efficiently and at low cost, saves time and labor, and produces a graph of high accuracy.
Description
Technical field
The invention relates to the technical field of knowledge graph construction, and in particular to an automated method for constructing a professional knowledge graph.
Background
Knowledge services have recently been a hot topic in professional digital publishing, and substantial national funding in this area has accelerated their adoption by publishers. However, the knowledge service systems built in China so far generally remain traditional document-level services: they offer conventional full-text search, and their resource linking is limited to links between documents and their references. To achieve true knowledge retrieval, building the various knowledge systems that underpin it becomes the key. Some publishers that lead in knowledge system construction have accumulated domain thesauri, but for knowledge retrieval a fully built domain ontology and knowledge graph is the ideal target.
However, in a specialized field the knowledge graph has to be built manually by experts who know the field well. This consumes a great deal of labor and time, so construction is inefficient and costly, and accuracy is hard to guarantee.
Summary of the invention
(1) Object of the invention
To solve the technical problems described in the background, the invention proposes an automated method for constructing a professional knowledge graph. The method is stated to be optimized and extensible: it builds a professional knowledge graph efficiently and at low cost, saves time and labor, and produces a graph of high accuracy.
(2) Technical solution
To solve the above problems, the invention proposes an automated method for constructing a professional knowledge graph, comprising the following steps:
S1, obtain professional knowledge text using web crawler technology;
S2, segment the obtained professional knowledge text into words using the jieba tool;
S3, remove stop words from the segmented text using a stopwords tool;
S4, perform n-gram processing to convert each text into several word sets;
S5, obtain part-of-speech tags through part-of-speech tagging, and obtain dependency labels and a dependency tree through dependency parsing;
S6, perform noun phrase detection, verb phrase detection, and candidate relation detection based on the part-of-speech tags and dependency labels;
S7, semantically annotate the words in the candidate relations using a semantic dictionary to obtain candidate semantic relation patterns;
S8, cluster the candidate semantic relation patterns to obtain a final set of semantic relation patterns;
S9, obtain professional knowledge data using the semantic dictionary and the semantic relation patterns;
S10, write the data to an OWL file using the Jena tool, and import the OWL file into Protégé;
S11, generate the final professional knowledge graph.
Preferably, S2 comprises the following specific steps: splitting each target text into multiple sentences; and segmenting each sentence to obtain a sequence of words.
Preferably, S4 further comprises the following steps: computing the similarity between any two texts using the Shingling algorithm, and placing all texts whose similarity exceeds a threshold into the same text cluster; and classifying each text cluster to obtain the category to which it belongs.
Preferably, S5 comprises the following specific steps: performing part-of-speech tagging on each word set to obtain its part-of-speech tags; performing dependency parsing on each word set to obtain a dependency label for every two words that stand in a grammatical dependency relation; the dependency labels of all words constitute the dependency tree.
Preferably, in S6, a noun phrase is a phrase composed of multiple consecutive words that includes a noun; a verb phrase is a phrase with a verb-object relation on the dependency tree.
Preferably, in S6, candidate relation detection is mainly used, once the noun phrases have been obtained, to determine whether a relation exists between every two noun phrases.
Preferably, in S7, semantic annotation obtains the corresponding semantic type by looking up the word in the semantic dictionary, taking the word's part of speech into account; after each word in a lexical relation pattern has been semantically annotated, the corresponding semantic relation pattern is obtained.
Preferably, in S7, for a word with multiple semantic types, all semantic relation patterns extracted from the entire text collection are counted and a matching pattern is sought among them; if no semantic relation pattern matches, the n-ary semantic relation pattern is converted into multiple binary patterns and matching is performed again.
The above technical solution of the invention has the following beneficial technical effects:
The method is stated to be optimized and extensible: it builds a professional knowledge graph efficiently and at low cost, saves time and labor, and produces a graph of high accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of the automated method for constructing a professional knowledge graph proposed by the invention.
Detailed description of embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawing. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.
As shown in Fig. 1, the automated method for constructing a professional knowledge graph proposed by the invention comprises the following steps:
S1, obtain professional knowledge text using web crawler technology;
S2, segment the obtained professional knowledge text into words using the jieba tool;
S3, remove stop words from the segmented text using a stopwords tool;
S4, perform n-gram processing to convert each text into several word sets;
S5, obtain part-of-speech tags through part-of-speech tagging, and obtain dependency labels and a dependency tree through dependency parsing;
S6, perform noun phrase detection, verb phrase detection, and candidate relation detection based on the part-of-speech tags and dependency labels;
S7, semantically annotate the words in the candidate relations using a semantic dictionary to obtain candidate semantic relation patterns;
S8, cluster the candidate semantic relation patterns to obtain a final set of semantic relation patterns;
S9, obtain professional knowledge data using the semantic dictionary and the semantic relation patterns;
S10, write the data to an OWL file using the Jena tool, and import the OWL file into Protégé;
S11, generate the final professional knowledge graph.
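Steps S3 and S4 can be illustrated with a minimal sketch. The stop word list, the function names, and the choice of n are assumptions made for illustration; the patent does not specify them.

```python
from typing import Iterable

# Hypothetical stop word list; a real pipeline would load a domain-specific one.
STOP_WORDS = {"the", "of", "a", "is"}


def remove_stop_words(tokens: Iterable[str], stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Drop tokens that appear in the stop word list (step S3)."""
    return [t for t in tokens if t not in stop_words]


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-token windows of a token sequence (step S4)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = remove_stop_words(["the", "knowledge", "graph", "of", "a", "publisher"])
print(tokens)             # ['knowledge', 'graph', 'publisher']
print(ngrams(tokens, 2))  # [('knowledge', 'graph'), ('graph', 'publisher')]
```

A sequence shorter than n simply yields no n-grams, so short texts need no special handling.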
In an optional embodiment, S2 comprises the following specific steps: splitting each target text into multiple sentences; and segmenting each sentence to obtain a sequence of words.
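The two sub-steps of S2 might look like the sketch below. The punctuation set and the function names are assumptions. The tokenizer is passed in as a parameter so that the sketch runs with the standard library alone; for Chinese text, a word segmenter such as `jieba.lcut` could be supplied in place of the default whitespace split.

```python
import re
from typing import Callable

# A sentence is a run of non-terminal characters plus an optional terminator.
# The set of terminators (Chinese and Western) is an assumption.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")


def split_sentences(text: str) -> list[str]:
    """Split a text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def segment(text: str, tokenize: Callable[[str], list[str]] = str.split) -> list[list[str]]:
    """Return one token sequence per sentence of the text."""
    return [tokenize(sentence) for sentence in split_sentences(text)]


print(split_sentences("知识服务是热点。图谱是目标！"))
print(segment("What is a knowledge graph? It links entities!"))
```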
In an optional embodiment, S4 further comprises the following steps: computing the similarity between any two texts using the Shingling algorithm, and placing all texts whose similarity exceeds a threshold into the same text cluster; and classifying each text cluster to obtain the category to which it belongs.
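A minimal sketch of shingle-based similarity and threshold clustering follows. Jaccard similarity over k-token shingles is the usual measure paired with shingling, but the patent does not name the measure, the shingle size, the threshold, or how pairwise similarities become clusters; the single-link grouping below is one possible reading, and all names are hypothetical.

```python
def shingles(tokens: list[str], k: int = 2) -> set[tuple[str, ...]]:
    """Set of all contiguous k-token shingles of a token sequence."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_texts(texts: list[list[str]], threshold: float = 0.5, k: int = 2) -> list[list[int]]:
    """Group text indices so that texts linked by similarity > threshold share a cluster.

    Uses union-find (single-link grouping), which is an assumption of this sketch.
    """
    sets = [shingles(t, k) for t in texts]
    parent = list(range(len(texts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if jaccard(sets[i], sets[j]) > threshold:
                parent[find(j)] = find(i)

    clusters: dict[int, list[int]] = {}
    for i in range(len(texts)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


docs = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["x", "y", "z"]]
print(cluster_texts(docs, threshold=0.4))  # [[0, 1], [2]]
```

The first two documents share two of four distinct shingles (similarity 0.5), so they fall into one cluster at a threshold of 0.4, while the third stays alone.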
In an optional embodiment, S5 comprises the following specific steps: performing part-of-speech tagging on each word set to obtain its part-of-speech tags; performing dependency parsing on each word set to obtain a dependency label for every two words that stand in a grammatical dependency relation; the dependency labels of all words constitute the dependency tree.
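One way to hold the output of S5 is a list of tokens, each carrying its part-of-speech tag, the index of its head word, and the dependency label on that arc. The sketch below assumes this representation and a hand-annotated example sentence; the tag and label names are illustrative and would in practice come from whichever tagger and parser is used.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int   # 1-based position in the sentence
    word: str
    pos: str     # part-of-speech tag (tag set is an assumption)
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency label on the arc from the head to this token


def build_tree(tokens: list[Token]) -> dict[int, list[int]]:
    """Map each head index to the indices of its dependents; key 0 holds the root."""
    children: dict[int, list[int]] = {}
    for tok in tokens:
        children.setdefault(tok.head, []).append(tok.index)
    return children


# Hand-annotated example; a real pipeline would obtain these values from a parser.
sentence = [
    Token(1, "aspirin", "NOUN", 2, "nsubj"),
    Token(2, "relieves", "VERB", 0, "root"),
    Token(3, "headache", "NOUN", 2, "obj"),
]
tree = build_tree(sentence)
print(tree)  # {2: [1, 3], 0: [2]}
```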
In an optional embodiment, in S6, a noun phrase is a phrase composed of multiple consecutive words that includes a noun; a verb phrase is a phrase with a verb-object relation on the dependency tree.
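The two definitions above can be sketched as follows. This is one possible reading, not the patent's stated procedure: a noun phrase is taken to be a maximal run of adjective or noun tokens containing at least one noun, and a verb phrase is read off any dependency arc labelled as a verb-object relation. The tag names and the `obj` label are assumptions.

```python
NOMINAL_TAGS = {"ADJ", "NOUN"}  # assumed tags allowed inside a noun phrase


def noun_phrases(tagged: list[tuple[str, str]]) -> list[list[str]]:
    """Return maximal runs of ADJ/NOUN tokens that contain at least one NOUN."""
    phrases, run = [], []
    for word, pos in tagged + [("", "END")]:  # sentinel flushes the last run
        if pos in NOMINAL_TAGS:
            run.append((word, pos))
            continue
        if any(p == "NOUN" for _, p in run):
            phrases.append([w for w, _ in run])
        run = []
    return phrases


def verb_phrases(arcs: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Return (verb, object) pairs from (head, label, dependent) arcs labelled "obj"."""
    return [(head, dep) for head, label, dep in arcs if label == "obj"]


tagged = [
    ("professional", "ADJ"), ("knowledge", "NOUN"), ("supports", "VERB"),
    ("semantic", "ADJ"), ("retrieval", "NOUN"),
]
print(noun_phrases(tagged))  # [['professional', 'knowledge'], ['semantic', 'retrieval']]
print(verb_phrases([("supports", "nsubj", "knowledge"), ("supports", "obj", "retrieval")]))
```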
In an optional embodiment, in S6, candidate relation detection is mainly used, once the noun phrases have been obtained, to determine whether a relation exists between every two noun phrases.
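The patent does not state the criterion used to decide whether two noun phrases are related. As a sketch under one plausible assumption, the code below checks every pair of noun phrases and reports a candidate relation when both depend on the same verb, one as subject and the other as object. The tuple layout and the `nsubj`/`obj` labels are hypothetical.

```python
from itertools import combinations

# Each noun phrase is summarized as (text, governing_verb, dependency_label).
NounPhrase = tuple[str, str, str]


def candidate_relations(phrases: list[NounPhrase]) -> list[tuple[str, str, str]]:
    """Return (subject, verb, object) triples for phrase pairs sharing a verb."""
    triples = []
    for a, b in combinations(phrases, 2):
        if a[1] != b[1]:
            continue  # governed by different verbs: no candidate relation
        labels = {a[2]: a[0], b[2]: b[0]}
        if "nsubj" in labels and "obj" in labels:
            triples.append((labels["nsubj"], a[1], labels["obj"]))
    return triples


phrases = [
    ("aspirin", "relieves", "nsubj"),
    ("headache", "relieves", "obj"),
    ("the study", "shows", "nsubj"),
]
print(candidate_relations(phrases))  # [('aspirin', 'relieves', 'headache')]
```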
In an optional embodiment, in S7, semantic annotation obtains the corresponding semantic type by looking up the word in the semantic dictionary, taking the word's part of speech into account; after each word in a lexical relation pattern has been semantically annotated, the corresponding semantic relation pattern is obtained.
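The lookup described above can be sketched with a nested dictionary keyed first by word and then by part of speech. The dictionary contents, the semantic type names, and the function names are invented for illustration; the patent does not describe the structure of its semantic dictionary.

```python
from typing import Optional

# Hypothetical semantic dictionary: word -> {part of speech -> semantic type}.
SEMANTIC_DICT = {
    "aspirin": {"NOUN": "Drug"},
    "headache": {"NOUN": "Symptom"},
    "relieves": {"VERB": "Treats"},
}


def semantic_type(word: str, pos: str, lexicon: dict = SEMANTIC_DICT) -> Optional[str]:
    """Look up a word's semantic type, using its part of speech to pick the entry."""
    return lexicon.get(word, {}).get(pos)


def to_semantic_pattern(lexical_pattern: list[tuple[str, str]]) -> tuple:
    """Replace each (word, pos) in a lexical relation pattern with its semantic type."""
    return tuple(semantic_type(word, pos) for word, pos in lexical_pattern)


pattern = to_semantic_pattern([("aspirin", "NOUN"), ("relieves", "VERB"), ("headache", "NOUN")])
print(pattern)  # ('Drug', 'Treats', 'Symptom')
```

Words missing from the dictionary, or present only under a different part of speech, map to `None` in this sketch, which leaves the decision about such patterns to a later step.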
In an optional embodiment, in S7, for a word with multiple semantic types, all semantic relation patterns extracted from the entire text collection are counted and a matching pattern is sought among them; if no semantic relation pattern matches, the n-ary semantic relation pattern is converted into multiple binary patterns and matching is performed again.
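The disambiguation and fallback described above might be sketched as follows. Several details are assumptions: a pattern is represented as a tuple of a relation followed by its argument types, "matching" means the pattern was observed at least once in the corpus counts, ties go to the most frequent pattern, and an n-ary pattern is split into one binary pattern per pair of arguments.

```python
from collections import Counter
from itertools import combinations
from typing import Optional

Pattern = tuple[str, ...]  # (relation, arg_type_1, arg_type_2, ...)


def to_binary(pattern: Pattern) -> list[Pattern]:
    """Split an n-ary pattern into one binary pattern per pair of arguments."""
    relation, *args = pattern
    return [(relation, a, b) for a, b in combinations(args, 2)]


def match(candidates: list[Pattern], counts: Counter) -> Optional[Pattern]:
    """Return the most frequent corpus pattern among the candidates, or None."""
    seen = [c for c in candidates if counts[c] > 0]
    return max(seen, key=lambda c: counts[c]) if seen else None


def match_with_fallback(candidates: list[Pattern], counts: Counter) -> list[Pattern]:
    """Try a direct match first; otherwise match the binary decompositions."""
    hit = match(candidates, counts)
    if hit is not None:
        return [hit]
    binary = [b for c in candidates for b in to_binary(c)]
    return [b for b in binary if counts[b] > 0]


# Hypothetical counts of patterns extracted from the whole text collection.
counts = Counter({("Treats", "Drug", "Symptom"): 5, ("Treats", "Drug", "Disease"): 2})

# A word typed as either Symptom or Disease yields two candidate patterns.
print(match_with_fallback(
    [("Treats", "Drug", "Symptom"), ("Treats", "Drug", "Disease")], counts))
# An unseen ternary pattern falls back to its binary parts.
print(match_with_fallback([("Treats", "Drug", "Symptom", "Dose")], counts))
```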
In the invention, professional knowledge text is first obtained using web crawler technology. The obtained text is then segmented into words using the jieba tool, and stop words are removed from the segmented text using a stopwords tool. Next, n-gram processing converts each text into several word sets. Part-of-speech tags are then obtained through part-of-speech tagging, and dependency labels and a dependency tree are obtained through dependency parsing. Noun phrase detection, verb phrase detection, and candidate relation detection are performed based on the part-of-speech tags and dependency labels, and the words in the candidate relations are semantically annotated using a semantic dictionary to obtain candidate semantic relation patterns. The candidate semantic relation patterns are then clustered to obtain a final set of semantic relation patterns, and professional knowledge data are obtained using the semantic dictionary and the semantic relation patterns. Finally, the data are written to an OWL file using the Jena tool, and the OWL file is imported into Protégé to generate the final professional knowledge graph.
The method is stated to be optimized and extensible: it builds a professional knowledge graph efficiently and at low cost, saves time and labor, and produces a graph of high accuracy.
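The final export step (S10) uses the Jena tool in the patent, which is a Java library. As a language-neutral illustration of what such an export produces, the sketch below serializes extracted triples as a small OWL ontology in Turtle syntax using only the Python standard library. The namespace, the function names, and the choice of Turtle rather than RDF/XML are assumptions and do not reproduce the patent's implementation.

```python
from urllib.parse import quote

BASE = "http://example.org/kg#"  # hypothetical namespace for the graph

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
)


def iri(name: str) -> str:
    """Build an absolute IRI in angle brackets, percent-encoding unsafe characters."""
    return f"<{BASE}{quote(name, safe='')}>"


def to_turtle(triples: list[tuple[str, str, str]]) -> str:
    """Serialize (subject, relation, object) triples as a small OWL ontology in Turtle."""
    lines = [PREFIXES, f"<{BASE[:-1]}> rdf:type owl:Ontology ."]
    for rel in sorted({r for _, r, _ in triples}):
        lines.append(f"{iri(rel)} rdf:type owl:ObjectProperty .")
    for ind in sorted({x for s, _, o in triples for x in (s, o)}):
        lines.append(f"{iri(ind)} rdf:type owl:NamedIndividual .")
    for s, r, o in triples:
        lines.append(f"{iri(s)} {iri(r)} {iri(o)} .")
    return "\n".join(lines) + "\n"


turtle = to_turtle([("aspirin", "treats", "headache")])
print(turtle)
```

Writing the returned string to a file gives a document that ontology editors reading Turtle should be able to open. Each relation is declared as an object property and each entity as a named individual, which is one simple modelling choice among several.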
It should be understood that the above specific embodiments are used only to illustrate or explain the principles of the invention and do not limit it. Therefore, any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the invention shall fall within its scope of protection. In addition, the appended claims are intended to cover all variations and modifications that fall within the scope and boundaries of the claims, or within equivalents of such scope and boundaries.
Claims (8)
1. An automated method for constructing a professional knowledge graph, characterized by comprising the following steps:
S1, obtain professional knowledge text using web crawler technology;
S2, segment the obtained professional knowledge text into words using the jieba tool;
S3, remove stop words from the segmented text using a stopwords tool;
S4, perform n-gram processing to convert each text into several word sets;
S5, obtain part-of-speech tags through part-of-speech tagging, and obtain dependency labels and a dependency tree through dependency parsing;
S6, perform noun phrase detection, verb phrase detection, and candidate relation detection based on the part-of-speech tags and dependency labels;
S7, semantically annotate the words in the candidate relations using a semantic dictionary to obtain candidate semantic relation patterns;
S8, cluster the candidate semantic relation patterns to obtain a final set of semantic relation patterns;
S9, obtain professional knowledge data using the semantic dictionary and the semantic relation patterns;
S10, write the data to an OWL file using the Jena tool, and import the OWL file into Protégé;
S11, generate the final professional knowledge graph.
2. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that S2 comprises the following specific steps:
splitting each target text into multiple sentences;
segmenting each sentence to obtain a sequence of words.
3. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that S4 further comprises the following steps:
computing the similarity between any two texts using the Shingling algorithm, and placing all texts whose similarity exceeds a threshold into the same text cluster;
classifying each text cluster to obtain the category to which it belongs.
4. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that S5 comprises the following specific steps:
performing part-of-speech tagging on each word set to obtain its part-of-speech tags;
performing dependency parsing on each word set to obtain a dependency label for every two words that stand in a grammatical dependency relation;
the dependency labels of all words constitute the dependency tree.
5. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S6, a noun phrase is a phrase composed of multiple consecutive words that includes a noun; a verb phrase is a phrase with a verb-object relation on the dependency tree.
6. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S6, candidate relation detection is mainly used, once the noun phrases have been obtained, to determine whether a relation exists between every two noun phrases.
7. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S7, semantic annotation obtains the corresponding semantic type by looking up the word in the semantic dictionary, taking the word's part of speech into account;
after each word in a lexical relation pattern has been semantically annotated, the corresponding semantic relation pattern is obtained.
8. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S7, for a word with multiple semantic types, all semantic relation patterns extracted from the entire text collection are counted and a matching pattern is sought among them;
if no semantic relation pattern matches, the n-ary semantic relation pattern is converted into multiple binary patterns and matching is performed again.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910542202.1A CN110390022A (en) | 2019-06-21 | 2019-06-21 | An automated method for constructing a professional knowledge graph |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910542202.1A CN110390022A (en) | 2019-06-21 | 2019-06-21 | An automated method for constructing a professional knowledge graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110390022A (en) | 2019-10-29 |
Family
ID=68285661
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910542202.1A Pending CN110390022A (en) | An automated method for constructing a professional knowledge graph | 2019-06-21 | 2019-06-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110390022A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110888991A (en) * | 2019-11-28 | 2020-03-17 | 哈尔滨工程大学 | Sectional semantic annotation method in weak annotation environment |
CN110910168A (en) * | 2019-11-05 | 2020-03-24 | 北京洪泰文旅科技股份有限公司 | Method and equipment for acquiring guests in text and travel industry |
CN111488741A (en) * | 2020-04-14 | 2020-08-04 | 税友软件集团股份有限公司 | Tax knowledge data semantic annotation method and related device |
CN111737400A (en) * | 2020-06-15 | 2020-10-02 | 上海理想信息产业(集团)有限公司 | Knowledge reasoning-based big data service tag expansion method and system |
CN112149427A (en) * | 2020-10-12 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Method for constructing verb phrase implication map and related equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102184569A (en) * | 2011-06-11 | 2011-09-14 | 福州大学 | Individual plant wood modeling method driven by domain ontology |
CN107633044A (en) * | 2017-09-14 | 2018-01-26 | 国家计算机网络与信息安全管理中心 | A kind of public sentiment knowledge mapping construction method based on focus incident |
CN109522418A (en) * | 2018-11-08 | 2019-03-26 | 杭州费尔斯通科技有限公司 | A kind of automanual knowledge mapping construction method |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110910168A (en) * | 2019-11-05 | 2020-03-24 | 北京洪泰文旅科技股份有限公司 | Method and equipment for acquiring guests in text and travel industry |
CN110888991A (en) * | 2019-11-28 | 2020-03-17 | 哈尔滨工程大学 | Sectional semantic annotation method in weak annotation environment |
CN110888991B (en) * | 2019-11-28 | 2023-12-01 | 哈尔滨工程大学 | Sectional type semantic annotation method under weak annotation environment |
CN111488741A (en) * | 2020-04-14 | 2020-08-04 | 税友软件集团股份有限公司 | Tax knowledge data semantic annotation method and related device |
CN111737400A (en) * | 2020-06-15 | 2020-10-02 | 上海理想信息产业(集团)有限公司 | Knowledge reasoning-based big data service tag expansion method and system |
CN111737400B (en) * | 2020-06-15 | 2023-06-20 | 上海理想信息产业(集团)有限公司 | Knowledge reasoning-based big data service label expansion method and system |
CN112149427A (en) * | 2020-10-12 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Method for constructing verb phrase implication map and related equipment |
CN112149427B (en) * | 2020-10-12 | 2024-02-02 | 腾讯科技(深圳)有限公司 | Verb phrase implication map construction method and related equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390022A (en) | An automated method for constructing a professional knowledge graph | |
Alzahrani et al. | Fuzzy semantic-based string similarity for extrinsic plagiarism detection | |
CN109241538A (en) | Based on the interdependent Chinese entity relation extraction method of keyword and verb | |
CN105824933A (en) | Automatic question answering system based on main statement position and implementation method thereof | |
WO2008107305A2 (en) | Search-based word segmentation method and device for language without word boundary tag | |
CN109522418A (en) | A kind of automanual knowledge mapping construction method | |
CN106446018B (en) | Query information processing method and device based on artificial intelligence | |
Pourvali et al. | Automated text summarization base on lexicales chain and graph using of wordnet and wikipedia knowledge base | |
Falk et al. | Classifying French verbs using French and English lexical resources | |
CN109376352A (en) | A kind of patent text modeling method based on word2vec and semantic similarity | |
CN109614620B (en) | HowNet-based graph model word sense disambiguation method and system | |
CN102662936A (en) | Chinese-English unknown words translating method blending Web excavation, multi-feature and supervised learning | |
Chen et al. | A boundary assembling method for Chinese entity-mention recognition | |
CN105912522A (en) | Automatic extraction method and extractor of English corpora based on constituent analyses | |
CN111191464A (en) | Semantic similarity calculation method based on combined distance | |
Bougouin et al. | Keyphrase annotation with graph co-ranking | |
Kessler et al. | Extraction of terminology in the field of construction | |
CN103020311B (en) | A kind of processing method of user search word and system | |
Watrin et al. | An N-gram frequency database reference to handle MWE extraction in NLP applications | |
CN110705295B (en) | Entity name disambiguation method based on keyword extraction | |
Guisado-Gámez et al. | Massive query expansion by exploiting graph knowledge bases for image retrieval | |
CN101576876B (en) | System and method for automatically splitting English generalized phrase | |
Pourvali | A new graph based text segmentation using Wikipedia for automatic text summarization | |
Maheswari et al. | Rule based morphological variation removable stemming algorithm | |
Souza et al. | Extraction of keywords from texts: an exploratory study using Noun Phrases |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20191029 |