CN102662923A - Ontology instance learning method based on machine learning - Google Patents

Ontology instance learning method based on machine learning

Info

Publication number
CN102662923A
Authority
CN
China
Prior art keywords
speech
text
maximum entropy
ontology
instances
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012101218391A
Other languages
Chinese (zh)
Inventor
张萌
王文俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN2012101218391A
Publication of CN102662923A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of natural language processing and ontology learning, and relates to an ontology instance learning method based on machine learning. The method comprises the following steps: annotating a corpus after document preprocessing; selecting features, including word features, part-of-speech features, and combined word and part-of-speech features, and converting the corpus and the texts to be recognized into feature vectors; training a maximum entropy model, using the annotated corpus to estimate the model parameters and obtain a maximum entropy classifier; and extracting instances with the maximum entropy classifier. The method can learn ontology instances quickly and effectively from a large number of texts.

Description

An ontology instance learning method based on machine learning
Technical field
The present invention relates to the technical fields of natural language processing and ontology learning. It draws on the methods and experience of machine-learning-based text processing in natural language processing and, in accordance with the characteristics of the ontology model, carries out the learning of ontology instances.
Background
At present, most information on the traditional Web is unstructured and poorly organized, and contains a large amount of useless, redundant information. The explosive growth of Internet information makes processing information and acquiring knowledge even more difficult. An ontology is an explicit, formal specification of a shared conceptualization; in the Semantic Web it plays the role of enabling good communication and mutual understanding between service layers. Ontologies provide the knowledge base and rule base on which the Semantic Web is built, and semantic search and intelligent processing can be carried out on that basis. Much current research focuses on how to construct the concepts and relations of an ontology. For an ontology whose initial construction is complete, particularly when it is to be used in a system with strong practical requirements, how to extract ontology instances from large amounts of unstructured data is also a problem worth considering. On the one hand, a complete ontology should contain instances of its concepts and relations; on the other hand, continually acquiring new instances helps refine the ontology model and lets the ontology evolve in a better direction.
At present, research related to the generation of ontology instances falls mainly into two categories. In the first, generating ontology instances is the main goal; these methods are mostly built around pattern matching. In the second, the learning of ontology instances and attributes is usually achieved by applying ontology-based information extraction techniques. Here, ontology instances are a by-product of the research: for example, in an ontology-based extraction system, researchers exploit the characteristics of the ontology during information extraction to improve extraction efficiency and precision, and this process ultimately produces a large number of ontology instances. Many ontology-based extraction systems adopt the GATE framework and introduce an already constructed ontology to recognize named entities.
With the rapid development of the Internet, the volume of information grows ever larger, which raises the problem of how to learn ontology instances and attributes automatically from large amounts of unstructured data. Most existing methods are rule-based or rely on pattern matching. Such methods are easy to understand and simple and fast to implement, but they are inflexible and require too much manual involvement.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention proposes a method for learning ontology instances quickly and effectively from large amounts of text data, forming structured information, expanding the set of ontology instances, and completing the transformation from unstructured data to machine-understandable structured information. The technical solution of the invention is as follows:
An ontology instance learning method based on machine learning, used to identify the words in a text that belong to ontology instances and to classify them, comprising the following steps:
(1) Document preprocessing: extract the body text as the input of the subsequent steps;
(2) Text preprocessing: perform word segmentation and sentence splitting on the extracted text, forming a text set annotated with parts of speech;
(3) Corpus annotation: manually annotate the part-of-speech-tagged text set, adding a type label after each word that belongs to an ontology instance, to form the annotated text, i.e. the corpus;
(4) Feature selection: select features including word features, part-of-speech features, and combined word and part-of-speech features, and convert the corpus and the text to be recognized into feature vectors;
(5) Maximum entropy model training: build a maximum entropy model and train its parameters on the annotated corpus to obtain a maximum entropy classifier;
(6) Instance extraction with the maximum entropy classifier: according to the selected features, convert the preprocessed text into a form the classifier accepts; use the trained maximum entropy classifier to recognize and classify instances word by word; for each recognized ontology instance, select the class with the highest probability as the final result for the concept class it belongs to, thereby realizing instance extraction.
The method of the present invention can learn ontology instances from large amounts of text quickly and effectively. Because it is based on machine learning, it acquires knowledge automatically from the training data, avoiding a great deal of manual linguistic study of natural-language text. It can be switched between domains relatively easily and can ultimately serve ontology learning in multiple domains. In addition, performance can be improved by expanding the training corpus, which fits the rapid growth of the Web, makes full use of network data resources, and provides a solid data foundation for research and applications in the ontology field.
Description of drawings
Fig. 1 is the overall flow chart of the present invention.
Fig. 2 is the flow chart of model training. In the figure, H_i denotes a classifier; the subclasses beneath the same classifier belong to the same parent class.
Fig. 3 is the flow chart of ontology instance learning based on maximum entropy.
Embodiment
The present invention introduces machine learning into the learning process. An ontology model often has many concept types and levels; machine learning methods can handle subtle distinctions and fuzzy concepts, and can therefore extract ontology instances and attributes from text effectively.
Maximum entropy is a common model in machine learning. Its main idea is to choose, among the distributions that satisfy the constraints, the one with maximum entropy. Classifiers built on this model are widely used in natural language processing, for example in named entity recognition and part-of-speech tagging. The principle of entity classification with a maximum entropy model is as follows: the contextual information of each entity is expressed as $(x_1, x_2, \ldots, x_m)$, and the classes the entity may belong to are expressed as $(y_1, y_2, \ldots, y_p)$. Then $p(y \mid x)$ denotes the probability that the entity is assigned to class $y$ given $x$. $p(y \mid x)$ should satisfy the following conditions:
$$p(y \mid x) = Z_\lambda(x) \exp\left(\sum_i \lambda_i f_i(x, y)\right)$$
$$Z_\lambda(x) = \frac{1}{\sum_y \exp\left[\sum_i \lambda_i f_i(x, y)\right]}$$
Here $f_i$ denotes a feature and $\lambda_i$ is the parameter of that feature, representing how much the feature contributes to the model. $Z_\lambda(x)$ is the normalization factor. During training, the model uses the features of the training data to estimate the parameter values. Given a new entity, the model outputs the probability that the entity belongs to each class. Depending on the situation, the researcher can take the class with the highest probability as the final result, or keep the top-ranked results as candidates.
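The conditional probability defined above can be sketched in a few lines of Python. This is only an illustration of the formula, not the implementation used in the patent: the function name `maxent_probs`, the two binary features, their weights, and the class labels are all assumptions made for the example.

```python
import math

def maxent_probs(weights, features, x, labels):
    """Return {y: p(y | x)} for a log-linear (maximum entropy) model."""
    # Score of each candidate class: sum_i lambda_i * f_i(x, y)
    scores = {
        y: sum(w * f(x, y) for w, f in zip(weights, features))
        for y in labels
    }
    # Subtracting the largest score leaves the probabilities unchanged
    # and avoids overflow in exp().
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())  # normalizer, plays the role of 1 / Z_lambda(x)
    return {y: e / total for y, e in exps.items()}

# Hypothetical binary features and weights, for illustration only.
features = [
    lambda x, y: 1.0 if x["suffix"] == "river" and y == "natural" else 0.0,
    lambda x, y: 1.0 if x["suffix"] == "county" and y == "human" else 0.0,
]
weights = [1.5, 2.0]
labels = ["human", "natural"]

probs = maxent_probs(weights, features, {"suffix": "river"}, labels)
best = max(probs, key=probs.get)  # class with the highest probability
```

Taking the maximum over `probs` corresponds to choosing the class with the highest probability as the final result; sorting `probs` by value instead would give the ranked candidate list.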
The present invention uses a classifier based on maximum entropy and can learn ontology instances from large amounts of text automatically and effectively. With reference to Fig. 1 and taking HTML documents as the starting point, the invention is described in more detail below. It mainly comprises the following steps:
1. Document preprocessing: parse the HTML document, remove the HTML tags, and extract the body text as the input of the subsequent steps.
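A minimal sketch of the tag-stripping part of this step, using only Python's standard `html.parser` module. The class name `BodyTextExtractor` and the decision to skip `script` and `style` content are assumptions for illustration; the patent does not say how the body text is located, and real pages would likely need extra rules to drop navigation and other non-body text.

```python
from html.parser import HTMLParser

class BodyTextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_text(html):
    """Return the text content of an HTML string, one text node per line."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```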
2. Text preprocessing: perform word segmentation and sentence splitting on the extracted text. Word segmentation is a very basic step in Chinese natural language processing. This method works at the word level and uses the ICTCLAS platform developed by the Institute of Computing Technology, Chinese Academy of Sciences, for segmentation. Sentence boundaries must also be detected during preprocessing. Given the characteristics of Chinese text, a simple rule-based method suffices: detect common sentence-final punctuation such as "。", "？" and "！". The result is a text set annotated with parts of speech.
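The rule-based sentence boundary detection can be sketched with a single regular expression. The function name `split_sentences` is hypothetical, and the set of end marks is assumed to be the three full-width marks named in the text; word segmentation and part-of-speech tagging by the external segmenter are not shown.

```python
import re

# A sentence is a run of non-terminal characters followed by
# any sentence-final marks that close it.
SENTENCE = re.compile(r"[^。？！]+[。？！]*")

def split_sentences(text):
    """Split Chinese text into sentences at 。？！, keeping the end marks."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]
```

Text after the last end mark is returned as a final, unterminated sentence rather than dropped.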
3. Ontology instance learning based on maximum entropy. In this step, a maximum entropy classifier identifies the entities in each sentence and attaches a class label to each, namely the ontology concept class it belongs to. Learning instances of ontology concepts is similar to named entity recognition: here, instances belonging to a concept of the place-name ontology are extracted from the text. For example, in "Beijing is the capital of China", both "Beijing" and "China" are instances of the class "geographic entity". The implementation mainly requires the following steps:
(1) Annotation of the training corpus. The maximum entropy model is a supervised learning method, so training requires an annotated corpus. The annotation standard is determined by the target ontology model: instances in the corpus are marked according to the concept classes in the ontology. The corpus likewise comes from the Web. Document preprocessing is carried out first, and the extracted text is then annotated manually according to the specified standard, i.e. a type label is added after each ontology instance. The resulting annotated text is the required corpus.
(2) Feature selection. Features express the characteristics of different types of instances and are the key indicators for classification and recognition. For processing, the corpus and the text to be recognized must be converted into feature vectors. One advantage of the maximum entropy model is that, in use, attention need only be paid to the choice of features; this in turn requires that features be chosen with great care. The features selected in the present invention are as follows:
A. Word features: the current word and, with a window size of 2, the first and second words to its left and to its right.
B. Part-of-speech features: the part of speech of the current word and, with a window size of 2, the parts of speech of the first and second words to its left and to its right.
C. Combined features: pairwise combinations of the words and parts of speech of the current word and of the first and second words to its left and right.
D. Other additional features: features that reflect the characteristics of a particular corpus, such as suffix-word features.
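Features A to C can be sketched as a function over a part-of-speech-tagged sentence. The feature names, the `<PAD>` placeholder for positions outside the sentence, and the reading of the combined feature as one word/part-of-speech pair per window position are assumptions; the text does not specify which pairs are combined.

```python
def token_features(tokens, i, window=2):
    """Build features for the word at position i.

    tokens: list of (word, part_of_speech) pairs for one sentence.
    Returns a dict mapping feature name to feature value.
    """
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word, pos = tokens[j]
        else:
            word, pos = "<PAD>", "<PAD>"  # position falls outside the sentence
        feats[f"w[{offset}]"] = word               # A: word feature
        feats[f"p[{offset}]"] = pos                # B: part-of-speech feature
        feats[f"w|p[{offset}]"] = f"{word}|{pos}"  # C: combined feature
    return feats

# Illustrative tagged sentence; the tag names are placeholders.
sentence = [("北京", "ns"), ("是", "v"), ("中国", "ns"), ("的", "u"), ("首都", "n")]
first_word_features = token_features(sentence, 0)
```

Each distinct (feature name, value) pair would then become one dimension of the feature vector passed to the classifier.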
(3) Maximum entropy model training. In this step, the corpus annotated in step (1) is used to train the parameters of the maximum entropy model, finally yielding the classifier. Because an ontology model has many concepts and fine-grained classes, training makes full use of the subclass-parent relations between ontology concepts: one classifier is trained for the subclasses under the same parent class. This keeps any single classifier from bearing too heavy a classification burden and also makes good use of the hierarchical structure of the ontology model. The training process is shown in Fig. 2.
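One way to prepare the data for per-parent classifiers is to split each annotated example into one training pair per level of the hierarchy. The sketch below assumes that a label is a hyphen-separated path from the root concept to the leaf class, which is inferred from the tag format in the corpus sample and not stated in the text; the function name `training_sets` is hypothetical.

```python
from collections import defaultdict

def training_sets(examples):
    """Group training examples by parent concept.

    examples: list of (features, label_path), where label_path is assumed to be
    a hyphen-separated path from root to leaf, e.g. "DLST-RWDLST-XZQY-GJ".
    Returns {parent: [(features, child_label), ...]}; each list is the
    training data for the classifier that separates that parent's subclasses.
    """
    sets = defaultdict(list)
    for feats, path in examples:
        parts = path.split("-")
        # Each consecutive (parent, child) pair is one classification decision.
        for parent, child in zip(parts, parts[1:]):
            sets[parent].append((feats, child))
    return dict(sets)

examples = [
    ({"w[0]": "China"}, "DLST-RWDLST-XZQY-GJ"),
    ({"w[0]": "Changjiang"}, "DLST-ZRDLST-SX-HL"),
]
per_parent = training_sets(examples)
```

A separate maximum entropy classifier would then be trained on each value of `per_parent`, so that no single classifier has to separate all leaf classes at once.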
(4) Instance extraction with the maximum entropy classifier. According to the selected features, the preprocessed text is converted into a form the classifier accepts, and the maximum entropy classifier trained in step (3) recognizes and classifies instances word by word. The classifier outputs a set of probabilities that the current word belongs to each candidate class. For a word that does not belong to any ontology instance, the probabilities should be zero. For each recognized ontology instance, the class with the highest probability is selected as the final result for the concept class it belongs to.
4. Mapping ontology instances to concepts. In the previous step, the classifier was used to extract the instances in the text. In this step, these instances are mapped to the corresponding concepts and saved in OWL form. Fig. 3 is the flow chart of ontology instance learning based on maximum entropy.
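As a rough illustration of saving extracted instances in OWL form, the sketch below writes each (instance, concept) pair as an OWL named individual in Turtle syntax. The choice of Turtle, the base IRI `http://example.org/placename#`, the function name, and the restriction to ASCII identifiers are all assumptions; the text only says the result is stored as OWL.

```python
import re

_SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def instances_to_turtle(instances, base="http://example.org/placename#"):
    """Serialize (instance_name, concept_name) pairs as OWL individuals in Turtle."""
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        f"@prefix : <{base}> .",
        "",
    ]
    for name, concept in instances:
        # Keep the sketch simple: only accept names that are plainly
        # valid as Turtle local names.
        for token in (name, concept):
            if not _SAFE_NAME.fullmatch(token):
                raise ValueError(f"unsupported identifier: {token!r}")
        lines.append(f":{name} rdf:type owl:NamedIndividual , :{concept} .")
    return "\n".join(lines)

turtle_text = instances_to_turtle([("Beijing", "GeographicEntity")])
```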
Using the method of the present invention, the ontology instances corresponding to a target ontology can be learned from a given text collection. Here, a Chinese place-name ontology model is taken as an example to show how the method learns ontology instances. The corpus comes from the pages on the administrative divisions of China in the Chinese edition of Wikipedia: 500 articles in total, of which 400 serve as the training corpus and 100 as the test corpus. Given the characteristics of the geographic information on these pages, the class "geographic entity" is chosen as the focus of recognition and classification. The concept "geographic entity" contains classes such as human geography entities and physical geography entities, each of which in turn contains many different classes; these are the focus of classification. Moreover, most place-name relations hold between geographic entities, so this class is taken as the focus for assessing the machine learning method. The training corpus contains instances of 50 types of geographic entity; statistics for some of the types are as follows:
Table 1 Statistics of human geography entities (table image not reproduced in this text)
Table 2 Statistics of physical geography entities (table image not reproduced in this text)
1. First, all 500 documents are preprocessed.
2. 400 of them are selected and annotated as the corpus. A sample of the annotated corpus follows:
[Shizhu County] DLST-RWDLST-XZQY-SJXZQY-EJXZQYZZX (formerly known as "stone Zhu County"; called "Shizhu" in 1959 [1]) lies in the central-eastern part of [China] DLST-RWDLST-XZQY-GJ [Chongqing] DLST-RWDLST-XZQY-YJXZQY-YJXZQYCSX; it borders [the Changjiang River] DLST-ZRDLST-SX-HL to the west and adjoins [Hubei Province] DLST-RWDLST-XZQY-YJXZQY-YJXZQYPTX to the east. It is 321 kilometers from Chongqing and is the closest to Chongqing of the four ethnic minority autonomous counties under Chongqing's jurisdiction. It administers 12 towns and 20 townships. [yellow water] DLST-ZRDLST-SX-HL [National forest park] DLST-RWDLST-JNLYD-GY [Wan Shouzhai] DLST-RWDLST-JNLYD-FJMSQ, quasi-"Danxia" landform scenery [foot bath small stream] DLST-RWDLST-JNLYD-FJMSQ [western Tuo Yuntijie] DLST-RWDLST-JNLYD-FJMSQ
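The annotation format in the sample, a bracketed instance followed by a hyphen-separated type label, can be read back with a small parser. The regular expression and the function name `parse_annotations` are assumptions based only on this sample; the actual annotation tooling is not described in the text.

```python
import re

# "[instance]" followed by a label made of upper-case codes joined by hyphens.
ANNOTATION = re.compile(r"\[([^\[\]]+)\]\s*([A-Z]+(?:-[A-Z]+)+)")

def parse_annotations(text):
    """Return a list of (instance_text, [label_code, ...]) found in annotated text."""
    return [
        (m.group(1).strip(), m.group(2).split("-"))
        for m in ANNOTATION.finditer(text)
    ]

sample = "[China] DLST-RWDLST-XZQY-GJ [Changjiang] DLST-ZRDLST-SX-HL in the west"
parsed = parse_annotations(sample)
```

Splitting the label on hyphens exposes the codes one by one, which is convenient if the label is treated as a path through the concept hierarchy.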
3. According to the characteristics of place names, suffix-word features are optionally added on top of the basic features:
(1) Human geography entity suffix feature: whether the current word contains a word from the suffix dictionary ("province", "city", "county", etc.). If it does, the feature value is set to 1; otherwise it is 0.
(2) Natural entity suffix feature: whether the current word contains a word from the suffix dictionary ("mountain", "river", "lake", "sea", etc.). If it does, the feature value is set to 1; otherwise it is 0.
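The two suffix features can be sketched as follows. The Chinese characters in the dictionaries are a back-translation of the suffixes listed in the text and the dictionaries are deliberately incomplete; the sketch also tests only the end of the word, which is one reading of "suffix", whereas the text says "contains".

```python
# Assumed, incomplete suffix dictionaries (province, city, county / mountain, river, lake, sea).
HUMAN_SUFFIXES = ("省", "市", "县")
NATURAL_SUFFIXES = ("山", "河", "湖", "海")

def suffix_features(word):
    """Return the two binary suffix features for a word."""
    return {
        # str.endswith accepts a tuple and is true if any suffix matches.
        "human_suffix": 1 if word.endswith(HUMAN_SUFFIXES) else 0,
        "natural_suffix": 1 if word.endswith(NATURAL_SUFFIXES) else 0,
    }
```

With an end-of-word test, a name such as 湖北省 fires only the human geography feature even though it contains 湖; a substring test would fire both.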
4. Model training stage. Initially a single classifier is used, distinguishing only two classes, namely the two top-level classes of the ontology concepts: human geography entity and physical geography entity.
5. Model testing. The remaining 100 documents undergo word segmentation and sentence splitting, are converted according to the features into the input format required by the machine learning algorithm, and are fed to the classifier for classification. The following table shows the classification performance:
Table 4 Classification results for the two classes (table image not reproduced in this text)
The present invention achieves satisfactory results on this corpus and, compared with rule-based methods, is easier to transfer to new domains. As the corpus grows, precision and recall can also be improved effectively. The method can handle Web-scale, open-domain applications.
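Precision and recall, the two measures named above, can be computed from the gold-standard and predicted instance sets as sketched below. The function name and the toy data are illustrative assumptions and do not reproduce any result from the patent.

```python
def precision_recall(gold, predicted):
    """Compute precision and recall over sets of (instance, class) pairs."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # pairs that are both extracted and right
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy data for illustration only.
gold = {("Beijing", "human"), ("Changjiang", "natural"), ("China", "human")}
predicted = {("Beijing", "human"), ("Changjiang", "human")}
precision, recall = precision_recall(gold, predicted)
```

An instance counts as correct here only if both its text and its class match the gold annotation.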

Claims (1)

1. An ontology instance learning method based on machine learning, used to identify the words in a text that belong to ontology instances and to classify them, comprising the following steps:
(1) Document preprocessing: extract the body text as the input of the subsequent steps;
(2) Text preprocessing: perform word segmentation and sentence splitting on the extracted text, forming a text set annotated with parts of speech;
(3) Corpus annotation: manually annotate the part-of-speech-tagged text set, adding a type label after each word that belongs to an ontology instance, to form the annotated text, i.e. the corpus;
(4) Feature selection: select features including word features, part-of-speech features, and combined word and part-of-speech features, and convert the corpus and the text to be recognized into feature vectors;
(5) Maximum entropy model training: build a maximum entropy model and train its parameters on the annotated corpus to obtain a maximum entropy classifier;
(6) Instance extraction with the maximum entropy classifier: according to the selected features, convert the preprocessed text into a form the classifier accepts; use the trained maximum entropy classifier to recognize and classify instances word by word; for each recognized ontology instance, select the class with the highest probability as the final result for the concept class it belongs to, thereby realizing instance extraction.
CN2012101218391A 2012-04-23 2012-04-23 Ontology instance learning method based on machine learning Pending CN102662923A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012101218391A CN102662923A (en) 2012-04-23 2012-04-23 Ontology instance learning method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012101218391A CN102662923A (en) 2012-04-23 2012-04-23 Ontology instance learning method based on machine learning

Publications (1)

Publication Number Publication Date
CN102662923A true CN102662923A (en) 2012-09-12

Family

ID=46772418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012101218391A Pending CN102662923A (en) 2012-04-23 2012-04-23 Ontology instance learning method based on machine learning

Country Status (1)

Country Link
CN (1) CN102662923A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0816611A (en) * 1994-06-27 1996-01-19 Sharp Corp Data retrieving device using natural language
US6212532B1 (en) * 1998-10-22 2001-04-03 International Business Machines Corporation Text categorization toolkit
CN101310274A (en) * 2005-11-14 2008-11-19 马克森斯公司 A knowledge correlation search engine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
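Li Ru (李茹) et al.: "Chinese question classification based on the Chinese FrameNet", Computer Engineering and Applications (《计算机工程与应用》) *
Zhao Wenjuan (赵文娟): "Web resource annotation based on a Chinese frame ontology", China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》) *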

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530282A (en) * 2013-10-23 2014-01-22 北京紫冬锐意语音科技有限公司 Corpus tagging method and equipment
CN103530282B (en) * 2013-10-23 2016-07-13 北京紫冬锐意语音科技有限公司 Corpus labeling method and equipment
CN103617245A (en) * 2013-11-27 2014-03-05 苏州大学 Bilingual sentiment classification method and device
CN103617290B (en) * 2013-12-13 2017-02-15 江苏名通信息科技有限公司 Chinese machine-reading system
CN103617290A (en) * 2013-12-13 2014-03-05 江苏名通信息科技有限公司 Chinese machine-reading system
CN103678281A (en) * 2013-12-31 2014-03-26 北京百度网讯科技有限公司 Method and device for automatically labeling text
CN103678281B (en) * 2013-12-31 2016-10-19 北京百度网讯科技有限公司 The method and apparatus that text is carried out automatic marking
CN105830060B (en) * 2014-02-06 2020-12-11 富士施乐株式会社 Information processing apparatus, information processing program, storage medium, and information processing method
CN105830060A (en) * 2014-02-06 2016-08-03 富士施乐株式会社 Information processing device, information processing program, storage medium, and information processing method
CN104346327A (en) * 2014-10-23 2015-02-11 苏州大学 Method and device for determining emotion complexity of texts
CN104391902A (en) * 2014-11-12 2015-03-04 清华大学 Maximum entropy topic model-based online document classification method and device
CN105718256A (en) * 2014-12-18 2016-06-29 通用汽车环球科技运作有限责任公司 Methodology and apparatus for consistency check by comparison of ontology models
CN105701084A (en) * 2015-12-28 2016-06-22 广东顺德中山大学卡内基梅隆大学国际联合研究院 Characteristic extraction method of text classification on the basis of mutual information
CN108604222A (en) * 2015-12-28 2018-09-28 云脑科技有限公司 System and method for deployment customized machine learning service
CN108604222B (en) * 2015-12-28 2022-03-18 云脑科技有限公司 System and method for deploying customized machine learning services
CN105654144B (en) * 2016-02-29 2019-01-29 东南大学 A kind of social network ontologies construction method based on machine learning
CN105654144A (en) * 2016-02-29 2016-06-08 东南大学 Social network body constructing method based on machine learning
CN106570002A (en) * 2016-11-07 2017-04-19 网易(杭州)网络有限公司 Natural language processing method and device
CN106570002B (en) * 2016-11-07 2021-09-14 网易(杭州)网络有限公司 Natural language processing method and device
CN107679031A (en) * 2017-09-04 2018-02-09 昆明理工大学 Based on the advertisement blog article recognition methods for stacking the self-editing ink recorder of noise reduction
CN107679031B (en) * 2017-09-04 2021-01-05 昆明理工大学 Advertisement and blog identification method based on stacking noise reduction self-coding machine
CN107622126A (en) * 2017-09-28 2018-01-23 联想(北京)有限公司 The method and apparatus sorted out to the solid data in data acquisition system
CN110020120A (en) * 2017-10-10 2019-07-16 腾讯科技(北京)有限公司 Feature word treatment method, device and storage medium in content delivery system
CN110020120B (en) * 2017-10-10 2023-11-10 腾讯科技(北京)有限公司 Feature word processing method, device and storage medium in content delivery system
CN109145296A (en) * 2018-08-09 2019-01-04 新华智云科技有限公司 A kind of general word recognition method and device based on monitor model
CN109284374A (en) * 2018-09-07 2019-01-29 百度在线网络技术(北京)有限公司 For determining the method, apparatus, equipment and computer readable storage medium of entity class
CN113051875A (en) * 2021-03-22 2021-06-29 北京百度网讯科技有限公司 Training method of information conversion model, and text information conversion method and device
CN113051875B (en) * 2021-03-22 2024-02-02 北京百度网讯科技有限公司 Training method of information conversion model, and text information conversion method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120912