CN110390022A

CN110390022A - Automated method for constructing a professional knowledge graph

Info

Publication number: CN110390022A
Application number: CN201910542202.1A
Authority: CN
Inventors: 刘家祥
Original assignee: Central Mdt Infotech Ltd Of United States Of Xiamen
Current assignee: Central Mdt Infotech Ltd Of United States Of Xiamen
Priority date: 2019-06-21
Filing date: 2019-06-21
Publication date: 2019-10-29

Abstract

An automated method for constructing a professional knowledge graph, comprising the following steps: obtaining professional knowledge text; segmenting the text into words and removing stop words from the segmented text; converting each text into several word sets; obtaining part-of-speech labels through part-of-speech tagging, and obtaining dependency labels and a dependency tree through dependency analysis; performing noun phrase detection, verb phrase detection and candidate relation detection based on the part-of-speech labels and dependency labels; semantically tagging the words in the candidate relations to obtain candidate semantic relation patterns; clustering the candidate semantic relation patterns to obtain a final set of semantic relation patterns; obtaining professional knowledge data using a semantic dictionary and the semantic relation patterns; and writing the data to an OWL file and importing it into Protégé. The method has been optimized and extended; it constructs professional knowledge graphs efficiently and at relatively low cost, saves time and labor, and yields graphs of high accuracy.

Description

Automated method for constructing a professional knowledge graph

Technical field

The present invention relates to the technical field of knowledge graph construction, and in particular to an automated method for constructing a professional knowledge graph.

Background art

Knowledge services have recently been a focus of the professional digital publishing field, and substantial national funding in this area has accelerated their adoption by publishers. However, the knowledge service systems built domestically so far are, for the most part, still traditional document-level knowledge services: they offer conventional full-text search, and resource association is limited to links between documents and their references. To achieve true knowledge retrieval, the key is to build the various kinds of knowledge systems that serve as its foundation. At present, some social and commercial publishers that lead in building knowledge systems have accumulated a certain amount of domain thesauri; for knowledge retrieval, however, the ultimate goal is to build domain ontologies and knowledge graphs.

In specialized fields, however, a knowledge graph must be constructed manually by experts who know the field well. This requires a large investment of labor and time, so construction efficiency is low, cost is high, and accuracy is difficult to guarantee.

Summary of the invention

(1) Object of the invention

To solve the technical problems described in the background art, the present invention proposes an automated method for constructing a professional knowledge graph. The method has been optimized and extended; it constructs professional knowledge graphs efficiently and at relatively low cost, saves time and labor, and yields graphs of high accuracy.

(2) Technical solution

To solve the above problems, the present invention proposes an automated method for constructing a professional knowledge graph, comprising the following steps:

S1, obtain professional knowledge text using web crawler technology;

S2, segment the obtained professional knowledge text into words using the jieba tool;

S3, remove stop words from the segmented text using a stopwords tool;

S4, perform n-gram processing to convert each text into several word sets;

S5, obtain part-of-speech labels through part-of-speech tagging, and obtain dependency labels and a dependency tree through dependency analysis;

S6, perform noun phrase detection, verb phrase detection and candidate relation detection based on the part-of-speech labels and dependency labels;

S7, in combination with a semantic dictionary, semantically tag the words in the candidate relations to obtain candidate semantic relation patterns;

S8, cluster the candidate semantic relation patterns to obtain a final set of semantic relation patterns;

S9, obtain professional knowledge data using the semantic dictionary and the semantic relation patterns;

S10, write the data to an OWL file using the Jena tool, and import the OWL file into Protégé;

S11, generate the final professional knowledge graph.
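As an illustration of steps S3 and S4, the following minimal Python sketch removes stop words from a token list and then forms n-grams. The stop-word list, the function names and the sample tokens are hypothetical and are not specified by the invention; a practical system would load a full stop-word list for the target language.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical stop-word list used only for this illustration.
STOP_WORDS: Set[str] = {"the", "of", "a", "的", "了", "是"}


def remove_stop_words(tokens: Iterable[str],
                      stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop every token that appears in the stop-word list (step S3)."""
    return [t for t in tokens if t not in stop_words]


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous windows of n tokens, in order (step S4)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = remove_stop_words(
    ["the", "heart", "of", "a", "patient", "pumps", "blood"])
bigrams = ngrams(tokens, 2)
print(tokens)
print(bigrams)
```

Each tuple produced by `ngrams` plays the role of one "word set" in the sense of S4; the choice of `n` is left open by the description.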

Preferably, S2 comprises the following specific steps: splitting each target text into multiple sentences; and segmenting each sentence into words to obtain a word sequence.
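The two sub-steps of S2 can be sketched as follows. This is a minimal illustration under stated assumptions: the set of sentence-ending punctuation marks and the function names are hypothetical, and a whitespace tokenizer stands in for the word segmenter so that the sketch runs without third-party packages. With jieba installed, its `lcut` function could be passed as the `tokenize` argument instead.

```python
import re
from typing import Callable, List

# Assumed set of sentence-ending punctuation (Chinese and Western).
_SENTENCE_END = re.compile(r"[。！？!?.]+")


def split_sentences(text: str) -> List[str]:
    """Split a text into sentences, dropping the ending punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def segment(text: str,
            tokenize: Callable[[str], List[str]] = str.split) -> List[List[str]]:
    """Return one word sequence per sentence.

    `tokenize` is a stand-in for the word segmenter; the default simply
    splits on whitespace.
    """
    return [tokenize(sentence) for sentence in split_sentences(text)]


sequences = segment("Aspirin treats fever. Fever is a symptom!")
print(sequences)
```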

Preferably, S4 further comprises the following steps: computing the similarity between any two texts using the Shingling algorithm, and placing all texts whose similarity exceeds a threshold into the same text cluster; and classifying each text cluster to obtain the category to which it belongs.
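One possible reading of this clustering step is sketched below. It assumes that similarity is the Jaccard coefficient over word-level shingles, which is the measure commonly paired with shingling, and that clusters are formed transitively with a union-find structure; the description fixes neither choice, and the shingle size, threshold and sample texts are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(tokens: List[str], k: int = 2) -> Set[Shingle]:
    """Return the set of contiguous k-token shingles of a text."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity of two shingle sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_texts(texts: List[List[str]], threshold: float,
                  k: int = 2) -> List[List[int]]:
    """Group text indices whose pairwise similarity exceeds the threshold."""
    sets = [shingles(t, k) for t in texts]
    parent = list(range(len(texts)))

    def find(i: int) -> int:
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if jaccard(sets[i], sets[j]) > threshold:
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


docs = [
    ["aspirin", "reduces", "fever", "quickly"],
    ["aspirin", "reduces", "fever", "slowly"],
    ["graphs", "store", "entities", "and", "relations"],
]
clusters = cluster_texts(docs, threshold=0.4)
print(clusters)
```

The pairwise comparison is quadratic in the number of texts; for a large crawl, a MinHash-style approximation would be a natural substitute, although the description does not mention one.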

Preferably, S5 comprises the following specific steps: performing part-of-speech tagging on each word set to obtain its part-of-speech labels; and performing dependency analysis on each word set to obtain a dependency label for every two words that have a grammatical dependency; the dependency labels of all the words form the dependency tree.
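To make the output of S5 concrete, the sketch below assembles a dependency tree from part-of-speech labels and labelled dependency arcs. The tagger and parser themselves are outside the scope of the sketch: the tags (`n`, `v`), the arc labels (`ROOT`, `SBV`, `VOB`) and the sample sentence are hand-written, hypothetical values, and the `Node` class is an illustrative data structure rather than one named in the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    index: int
    word: str
    pos: str                       # part-of-speech label
    label: Optional[str] = None    # dependency label on the arc from the head
    children: List["Node"] = field(default_factory=list)


def build_tree(words: List[str], pos_tags: List[str],
               arcs: List[Tuple[int, int, str]]) -> Node:
    """Build a tree from (head_index, dependent_index, label) arcs.

    A head index of -1 marks the root word of the sentence.
    """
    nodes = [Node(i, w, p) for i, (w, p) in enumerate(zip(words, pos_tags))]
    root = None
    for head, dep, label in arcs:
        nodes[dep].label = label
        if head < 0:
            root = nodes[dep]
        else:
            nodes[head].children.append(nodes[dep])
    if root is None:
        raise ValueError("arcs contain no root")
    return root


# Hand-annotated example; a real pipeline would obtain these from a
# part-of-speech tagger and a dependency parser.
tree = build_tree(
    words=["aspirin", "treats", "fever"],
    pos_tags=["n", "v", "n"],
    arcs=[(-1, 1, "ROOT"), (1, 0, "SBV"), (1, 2, "VOB")],
)
print(tree.word, [(c.word, c.label) for c in tree.children])
```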

Preferably, in S6, a noun phrase is a phrase that consists of multiple consecutive words and contains a noun; a verb phrase is a phrase that has a verb-object relationship on the dependency tree.
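The two definitions above can be sketched as follows. The sketch assumes that a noun phrase is a maximal run of adjective or noun tokens that contains at least one noun, and that verb-object arcs carry the label `VOB`; the tag names, the label name and the set of tags allowed inside a noun phrase are hypothetical, since the description does not enumerate them.

```python
from typing import List, Tuple

NOUN_TAGS = {"n"}           # hypothetical tag for nouns
PHRASE_TAGS = {"a", "n"}    # hypothetical tags allowed inside a noun phrase


def noun_phrases(pos_tags: List[str], min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of consecutive-word runs
    that contain at least one noun and have at least `min_len` words."""
    spans = []
    start = None
    # A sentinel tag closes a run that reaches the end of the sentence.
    for i, tag in enumerate(list(pos_tags) + ["<end>"]):
        if tag in PHRASE_TAGS:
            if start is None:
                start = i
        elif start is not None:
            run = pos_tags[start:i]
            if len(run) >= min_len and any(t in NOUN_TAGS for t in run):
                spans.append((start, i))
            start = None
    return spans


def verb_phrases(words: List[str], arcs: List[Tuple[int, int, str]],
                 object_label: str = "VOB") -> List[Tuple[str, str]]:
    """Return (verb, object) pairs for arcs labelled as verb-object."""
    return [(words[head], words[dep])
            for head, dep, label in arcs if label == object_label]


words = ["chronic", "heart", "failure", "causes", "severe", "fatigue"]
pos_tags = ["a", "n", "n", "v", "a", "n"]
arcs = [(3, 2, "SBV"), (3, 5, "VOB")]

np_spans = noun_phrases(pos_tags)
vps = verb_phrases(words, arcs)
print([" ".join(words[s:e]) for s, e in np_spans], vps)
```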

Preferably, in S6, candidate relation detection is mainly used, after the noun phrases have been obtained, to determine whether a relation exists between every two noun phrases.
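The description does not state the criterion used to decide whether two noun phrases are related. The sketch below adopts one simple, assumed criterion purely for illustration: two noun phrases in the same sentence form a candidate relation when at least one verb occurs between them, and the intervening verbs are taken as the relation words. The tag name `v`, the function name and the sample sentence are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def candidate_relations(words: List[str], pos_tags: List[str],
                        np_spans: List[Tuple[int, int]],
                        verb_tag: str = "v") -> List[Tuple[str, str, str]]:
    """Return (noun phrase, verbs, noun phrase) triples for every pair of
    noun phrases that has at least one verb between them."""
    candidates = []
    for (s1, e1), (s2, e2) in combinations(sorted(np_spans), 2):
        verbs = [words[i] for i in range(e1, s2) if pos_tags[i] == verb_tag]
        if verbs:
            candidates.append((" ".join(words[s1:e1]),
                               " ".join(verbs),
                               " ".join(words[s2:e2])))
    return candidates


words = ["chronic", "heart", "failure", "causes", "severe", "fatigue"]
pos_tags = ["a", "n", "n", "v", "a", "n"]
np_spans = [(0, 3), (4, 6)]   # spans of the two noun phrases, end exclusive

relations = candidate_relations(words, pos_tags, np_spans)
print(relations)
```

A criterion based on the dependency tree, for example requiring both noun phrases to attach to the same verb, would be an equally plausible reading.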

Preferably, in S7, semantic tagging obtains the semantic type of a word by looking the word up in the semantic dictionary and making a judgment in combination with the word's part of speech; after every word in a lexical relation pattern has been semantically tagged, the corresponding semantic relation pattern is obtained.
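A minimal sketch of this lookup is given below. It assumes the semantic dictionary can be keyed on the pair (word, part of speech), which is one way of combining the lookup with the part-of-speech judgment; the dictionary entries, the type names and the `UNKNOWN` placeholder are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical semantic dictionary: (word, part of speech) -> semantic type.
SEMANTIC_DICT: Dict[Tuple[str, str], str] = {
    ("aspirin", "n"): "Drug",
    ("fever", "n"): "Symptom",
    ("treats", "v"): "Treats",
}


def semantic_type(word: str, pos: str,
                  dictionary: Dict[Tuple[str, str], str] = SEMANTIC_DICT
                  ) -> Optional[str]:
    """Look up the semantic type of a word given its part of speech."""
    return dictionary.get((word, pos))


def to_semantic_pattern(lexical_pattern: List[Tuple[str, str]]) -> Tuple[str, ...]:
    """Replace each (word, pos) in a lexical relation pattern by its
    semantic type; words missing from the dictionary become "UNKNOWN"."""
    return tuple(semantic_type(w, p) or "UNKNOWN" for w, p in lexical_pattern)


pattern = to_semantic_pattern([("aspirin", "n"), ("treats", "v"), ("fever", "n")])
print(pattern)
```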

Preferably, in S7, for a word that has multiple semantic types, all semantic relation patterns extracted over the entire text collection are counted, and the matching pattern is found among them; if no semantic relation pattern matches, the n-ary semantic relation pattern is converted into multiple binary patterns, which are then matched.
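The disambiguation procedure above can be sketched as follows, under several assumptions that the description leaves open: a pattern is represented as a tuple of semantic types, "matching" means choosing the candidate pattern with the highest corpus count, the binary patterns of an n-ary pattern are all of its two-element sub-patterns, and binary matches are scored by summing their counts. The counts and type names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from typing import List, Optional, Tuple

Pattern = Tuple[str, ...]


def resolve(type_options: List[List[str]], pattern_counts: Counter,
            binary_counts: Counter) -> Optional[Pattern]:
    """Pick one semantic type per word.

    `type_options[i]` lists the candidate semantic types of the i-th word.
    Candidates are first matched against whole-pattern counts; if none has
    been observed, each candidate is decomposed into binary patterns and
    scored by the summed counts of those binary patterns.
    """
    candidates = list(product(*type_options))
    best = max(candidates, key=lambda p: pattern_counts[p])
    if pattern_counts[best] > 0:
        return best

    def binary_score(p: Pattern) -> int:
        return sum(binary_counts[pair] for pair in combinations(p, 2))

    best = max(candidates, key=binary_score)
    return best if binary_score(best) > 0 else None


# Hypothetical counts collected over the whole text collection.
pattern_counts = Counter({("Drug", "Symptom"): 5})
binary_counts = Counter({("Drug", "Symptom"): 5,
                         ("Drug", "Dose"): 3,
                         ("Symptom", "Dose"): 1})

direct = resolve([["Drug", "Plant"], ["Symptom"]], pattern_counts, binary_counts)
fallback = resolve([["Drug", "Plant"], ["Symptom"], ["Dose"]],
                   pattern_counts, binary_counts)
print(direct, fallback)
```

In the second call no ternary pattern has been counted, so the choice falls back to the binary patterns, where the `Drug` reading scores 9 against 1 for the `Plant` reading.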

The above technical solution of the present invention has the following beneficial technical effects:

The method has been optimized and extended; it constructs professional knowledge graphs efficiently and at relatively low cost, saves time and labor, and yields graphs of high accuracy.

Brief description of the drawings

Fig. 1 is a flowchart of the automated method for constructing a professional knowledge graph proposed by the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawing. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.

As shown in Fig. 1, the automated method for constructing a professional knowledge graph proposed by the present invention comprises the following steps:

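S1, obtain professional knowledge text using web crawler technology;

S2, segment the obtained professional knowledge text into words using the jieba tool;

S3, remove stop words from the segmented text using a stopwords tool;

S4, perform n-gram processing to convert each text into several word sets;

S5, obtain part-of-speech labels through part-of-speech tagging, and obtain dependency labels and a dependency tree through dependency analysis;

S6, perform noun phrase detection, verb phrase detection and candidate relation detection based on the part-of-speech labels and dependency labels;

S7, in combination with a semantic dictionary, semantically tag the words in the candidate relations to obtain candidate semantic relation patterns;

S8, cluster the candidate semantic relation patterns to obtain a final set of semantic relation patterns;

S9, obtain professional knowledge data using the semantic dictionary and the semantic relation patterns;

S10, write the data to an OWL file using the Jena tool, and import the OWL file into Protégé;

S11, generate the final professional knowledge graph.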

In an alternative embodiment, S2 comprises the following specific steps: splitting each target text into multiple sentences; and segmenting each sentence into words to obtain a word sequence.

In an alternative embodiment, S4 further comprises the following steps: computing the similarity between any two texts using the Shingling algorithm, and placing all texts whose similarity exceeds a threshold into the same text cluster; and classifying each text cluster to obtain the category to which it belongs.

In an alternative embodiment, S5 comprises the following specific steps: performing part-of-speech tagging on each word set to obtain its part-of-speech labels; and performing dependency analysis on each word set to obtain a dependency label for every two words that have a grammatical dependency; the dependency labels of all the words form the dependency tree.

In an alternative embodiment, in S6, a noun phrase is a phrase that consists of multiple consecutive words and contains a noun; a verb phrase is a phrase that has a verb-object relationship on the dependency tree.

In an alternative embodiment, in S6, candidate relation detection is mainly used, after the noun phrases have been obtained, to determine whether a relation exists between every two noun phrases.

In an alternative embodiment, in S7, semantic tagging obtains the semantic type of a word by looking the word up in the semantic dictionary and making a judgment in combination with the word's part of speech; after every word in a lexical relation pattern has been semantically tagged, the corresponding semantic relation pattern is obtained.

In an alternative embodiment, in S7, for a word that has multiple semantic types, all semantic relation patterns extracted over the entire text collection are counted, and the matching pattern is found among them; if no semantic relation pattern matches, the n-ary semantic relation pattern is converted into multiple binary patterns, which are then matched.

In the present invention, professional knowledge text is first obtained using web crawler technology. The obtained text is then segmented into words using the jieba tool, and stop words are removed from the segmented text using a stopwords tool. Next, n-gram processing is performed to convert each text into several word sets. Part-of-speech labels are then obtained through part-of-speech tagging, and dependency labels and a dependency tree are obtained through dependency analysis. Noun phrase detection, verb phrase detection and candidate relation detection are performed based on the part-of-speech labels and dependency labels, and, in combination with a semantic dictionary, the words in the candidate relations are semantically tagged to obtain candidate semantic relation patterns. The candidate semantic relation patterns are then clustered to obtain a final set of semantic relation patterns, and professional knowledge data is obtained using the semantic dictionary and the semantic relation patterns. Finally, the data is written to an OWL file using the Jena tool, and the OWL file is imported into Protégé to generate the final professional knowledge graph.
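The final export step can be illustrated with the sketch below, which serializes extracted (subject, relation, object) triples as a small OWL document in RDF/XML. The invention performs this step with the Jena tool; the sketch substitutes Python's standard `xml.etree.ElementTree` module so that all examples stay in one language. The base IRI, the function name and the sample triple are hypothetical, and names are assumed to be valid XML local names (no spaces).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/kg#"   # hypothetical base IRI for the graph


def triples_to_owl(triples: List[Tuple[str, str, str]]) -> str:
    """Serialize (subject, relation, object) triples as RDF/XML.

    Every relation is declared as an owl:ObjectProperty and every subject
    and object as an owl:NamedIndividual.
    """
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("owl", OWL)
    ET.register_namespace("kg", BASE)

    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology",
                  {f"{{{RDF}}}about": BASE.rstrip("#")})

    for prop in sorted({p for _, p, _ in triples}):
        ET.SubElement(root, f"{{{OWL}}}ObjectProperty",
                      {f"{{{RDF}}}about": BASE + prop})

    individuals = {}
    names = {s for s, _, _ in triples} | {o for _, _, o in triples}
    for name in sorted(names):
        individuals[name] = ET.SubElement(
            root, f"{{{OWL}}}NamedIndividual",
            {f"{{{RDF}}}about": BASE + name})

    # Attach each relation to its subject as a property element.
    for s, p, o in triples:
        ET.SubElement(individuals[s], f"{{{BASE}}}{p}",
                      {f"{{{RDF}}}resource": BASE + o})

    return ET.tostring(root, encoding="unicode")


xml_text = triples_to_owl([("aspirin", "treats", "fever")])
print(xml_text)
```

Writing `xml_text` to a file with the `.owl` extension gives a document of the kind that S10 hands to Protégé; whether a particular Protégé version accepts it unchanged has not been verified here.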

It should be understood that the above specific embodiments of the present invention are used only to illustrate or explain the principles of the invention and do not limit it. Therefore, any modification, equivalent replacement, improvement or the like made without departing from the spirit and scope of the present invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all variations and modifications that fall within the scope and boundaries of the claims, or within equivalents of such scope and boundaries.

Claims

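1. An automated method for constructing a professional knowledge graph, characterized by comprising the following steps:

S1, obtain professional knowledge text using web crawler technology;

S2, segment the obtained professional knowledge text into words using the jieba tool;

S3, remove stop words from the segmented text using a stopwords tool;

S4, perform n-gram processing to convert each text into several word sets;

S5, obtain part-of-speech labels through part-of-speech tagging, and obtain dependency labels and a dependency tree through dependency analysis;

S6, perform noun phrase detection, verb phrase detection and candidate relation detection based on the part-of-speech labels and dependency labels;

S7, in combination with a semantic dictionary, semantically tag the words in the candidate relations to obtain candidate semantic relation patterns;

S8, cluster the candidate semantic relation patterns to obtain a final set of semantic relation patterns;

S9, obtain professional knowledge data using the semantic dictionary and the semantic relation patterns;

S10, write the data to an OWL file using the Jena tool, and import the OWL file into Protégé;

S11, generate the final professional knowledge graph.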

2. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that S2 comprises the following specific steps:

splitting each target text into multiple sentences;

segmenting each sentence into words to obtain a word sequence.

3. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that S4 further comprises the following steps:

computing the similarity between any two texts using the Shingling algorithm, and placing all texts whose similarity exceeds a threshold into the same text cluster;

classifying each text cluster to obtain the category to which it belongs.

4. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that S5 comprises the following specific steps:

performing part-of-speech tagging on each word set to obtain its part-of-speech labels;

performing dependency analysis on each word set to obtain a dependency label for every two words that have a grammatical dependency;

the dependency labels of all the words form the dependency tree.

5. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S6, a noun phrase is a phrase that consists of multiple consecutive words and contains a noun; a verb phrase is a phrase that has a verb-object relationship on the dependency tree.

6. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S6, candidate relation detection is mainly used, after the noun phrases have been obtained, to determine whether a relation exists between every two noun phrases.

7. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S7, semantic tagging obtains the semantic type of a word by looking the word up in the semantic dictionary and making a judgment in combination with the word's part of speech;

after every word in a lexical relation pattern has been semantically tagged, the corresponding semantic relation pattern is obtained.

8. The automated method for constructing a professional knowledge graph according to claim 1, characterized in that, in S7, for a word that has multiple semantic types, all semantic relation patterns extracted over the entire text collection are counted, and the matching pattern is found among them;

if no semantic relation pattern matches, the n-ary semantic relation pattern is converted into multiple binary patterns, which are then matched.