CN103823857A - Space information searching method based on natural language processing - Google Patents

Space information searching method based on natural language processing

Info

Publication number
CN103823857A
Authority
CN
China
Prior art keywords
weight
natural language
language processing
method based
utilizes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410059272.9A
Other languages
Chinese (zh)
Other versions
CN103823857B (en)
Inventor
吴朝晖
高啸
柳云超
陈华钧
郑国轴
杨建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201410059272.9A
Publication of CN103823857A
Application granted
Publication of CN103823857B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a spatial information search method based on natural language processing. The method comprises the following steps: (1) performing word segmentation on an indexing document and changing the weights of the words obtained by segmentation, to obtain a weighted indexing document; (2) receiving a query statement input by a user, performing word segmentation on the query statement and changing the weights of the words obtained by segmentation, to obtain a weighted query statement; and (3) searching for the weighted query statement in the weighted indexing document. The method uses natural language processing tools and applies word segmentation and named entity recognition to the field of spatial information search, which improves the search results.

Description

Spatial information search method based on natural language processing
Technical field
The present invention relates to retrieval technology and natural language processing technology, and in particular to a spatial information search method based on natural language processing.
Background
Natural language processing is an important direction within artificial intelligence. It studies the theories and methods that allow people and computers to communicate in natural language, and it integrates computer science, mathematics and linguistics. In the 1990s the field of natural language understanding and processing changed substantially: systems were required to process real, large-scale text and to extract useful information from natural language text. These requirements drove the development of large-scale corpora and of information-rich large-scale dictionaries, which in turn made low-level tasks such as word segmentation and part-of-speech tagging much easier.
A search engine is a system that, following a certain strategy, uses specific computer programs to collect information from the Internet, organizes and processes that information, provides a retrieval service to users, and presents the information relevant to a user's search. Search engines include full-text indexes, directory indexes, meta search engines, vertical search engines, aggregated search engines, portal search engines, free link lists, and so on.
The work of a modern search engine can be divided into three stages: collection, preprocessing and query. For retrieval in a vertical domain the collection stage is comparatively simple; usually the metadata only needs to be put into a uniform format. The preprocessing stage, also called the index construction stage, is the most complex stage of a search engine, and most ranking algorithms are applied here. The search engine first cleans the data to be indexed, performing operations such as word segmentation and stop-word removal. It then carries out the most important step, building the inverted index. An inverted index maps each word to the frequency and positions at which the word occurs in the documents; it works like a dictionary built over all the data, so that the relevant documents can be located quickly from a word. The query stage is when the search engine is actually in use, and all or most of the user interaction takes place in this stage. The search engine cleans the user's input with the same word segmentation and stop-word removal, substitutes the resulting terms into the inverted index and the scoring formula, and returns the results after ranking.
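The inverted index described above can be illustrated with a minimal Python sketch. It is not taken from the patent: whitespace tokenization stands in for the word segmenter, and the stop-word list, the function name `build_inverted_index` and the sample documents are hypothetical.

```python
from collections import defaultdict

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "of", "in"}

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} after stop-word removal."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Whitespace splitting stands in for real word segmentation.
        for pos, term in enumerate(text.lower().split()):
            if term in STOP_WORDS:
                continue
            index[term].setdefault(doc_id, []).append(pos)
    return index

docs = {
    "d1": "map of the west lake in hangzhou",
    "d2": "hangzhou road network map",
}
index = build_inverted_index(docs)
# The frequency of a term in a document is the length of its position list.
```

Looking up `index["hangzhou"]` returns the documents that contain the term together with the positions where it occurs, which is the information the scoring stage needs.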
Natural language processing and retrieval meet at many points, all of which are widely used in academia and industry, including word segmentation, keyword extraction and semantic retrieval.
Summary of the invention
The invention provides a spatial information retrieval optimization method based on natural language processing, whose purpose is to use natural language processing algorithms to improve the effect of spatial information retrieval.
A spatial information search method based on natural language processing, comprising:
Step 1, performing word segmentation on an index file and changing the weight of each word after segmentation, to obtain a weight-changed index file;
Step 2, receiving a query statement input by a user, performing word segmentation on the query statement and changing the weight of each word after segmentation, to obtain a weight-changed query statement;
Step 3, retrieving the weight-changed query statement in the weight-changed index file.
Here, an index file is a text stored in advance on the retrieval platform, and a query statement is the text the user enters when searching. During retrieval, the query statement entered by the user is matched against the index files, and the matching texts are output as the retrieval result. Changing the weight of each word in the index file and the query statement increases the weight of the words that represent spatial information, which improves retrieval accuracy.
In step 1, an overall linear model is used to segment the index file, and in step 2 an overall linear model is used to segment the query statement.
The overall linear model models the target sequence on the basis of the observation sequence and solves the sequence labeling problem. It combines considerations from both discriminative and generative models, takes the transition probabilities between context labels into account, and performs global parameter optimization and decoding in sequence form.
The overall linear model is built as follows:
Step 1-1, annotating the corpus so that each single character in the annotated corpus has a corresponding label;
Step 1-2, training a model with a preset feature template and the annotated corpus, to obtain the overall linear model.
In terms of rule-based machine learning, the invention uses a large number of word segmentation samples for geospatial data; these samples contain natural language sentences about spatial information that have already been segmented. The sample sentences come on the one hand from open-source sample libraries and on the other hand from sentences about spatial geographic information that were annotated manually. Together they form the corpus. Annotating the corpus facilitates the subsequent word segmentation.
In step 1-2, the model is trained as follows:
Step 1-21, applying the feature template to the annotated corpus to generate a feature list for each single character;
Step 1-22, extracting the features in each feature list and building a model from the features and their weights, where the initial value of each weight is 0;
Step 1-23, using the model to predict every single character in the annotated corpus, and handling the prediction result for each character as follows:
If the prediction is correct, the next character is predicted;
If the prediction is wrong, the feature weights are updated with an online update algorithm to obtain a new model, and the character is predicted again with the new model, until the prediction is correct or the number of weight updates exceeds a preset value.
A feature represents the part of speech of a word; the feature template includes the part of speech of the word and the part of speech of the previous word. Many prediction methods are possible. For example, the Viterbi algorithm can be used, and the error between the predicted value and the actual value of a character is compared with a threshold to decide whether the character was predicted correctly.
In step 1 and step 2, word segmentation is performed as follows:
Step a, inputting the text into the overall linear model, which applies the feature template to the text and computes the feature list of the text according to the weights;
Step b, using a dynamic programming algorithm to obtain all possible label combinations from the feature list, and using backtracking to find the optimal label combination;
Step c, dividing the text into words according to the optimal label combination;
where the text in steps a to c is the index file of step 1 or the query statement of step 2.
Because each single character has a corresponding label, the optimal label combination represents the most likely word boundaries in the text, and the text is divided into words (segmented) according to it.
The dynamic programming algorithm is the Viterbi algorithm.
The Viterbi algorithm takes the whole context into account when searching for the best result, and therefore yields a better segmentation.
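Step b can be sketched as a small Viterbi decoder over the four word-position labels B, M, E and S. This is a simplified illustration under stated assumptions, not the patent's implementation: the per-character scores (`emissions`) are assumed to come from the weighted features, and label transitions are modelled as hard legality constraints (the hypothetical `ALLOWED` set) rather than learned transition weights.

```python
import math

LABELS = ["B", "M", "E", "S"]
# Label pairs that may follow each other inside a legal B/M/E/S sequence.
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}

def viterbi(emissions):
    """emissions: one {label: score} dict per character. Returns the best label path."""
    n = len(emissions)
    if n == 0:
        return []
    best = [{} for _ in range(n)]  # best[i][label]: best score of a path ending in label at i
    back = [{} for _ in range(n)]  # back[i][label]: previous label on that best path
    for lab in LABELS:
        # A sentence can only start with B or S.
        best[0][lab] = emissions[0].get(lab, 0.0) if lab in ("B", "S") else -math.inf
    for i in range(1, n):
        for lab in LABELS:
            candidates = [(best[i - 1][p], p) for p in LABELS if (p, lab) in ALLOWED]
            score, prev = max(candidates)
            best[i][lab] = score + emissions[i].get(lab, 0.0)
            back[i][lab] = prev
    # A sentence can only end with E or S; backtrack from the better of the two.
    last = max(("E", "S"), key=lambda lab: best[n - 1][lab])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

Because the decoder scores whole label sequences, it can reject a locally attractive label (for example an M that is not preceded by B or M) in favour of the best legal combination.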
In step 1 and step 2, keyword extraction is used to change the word weights, so that the weight of keywords is increased.
Here, a keyword is a word that contains spatial information.
Keyword extraction is performed with the TextRank algorithm.
The TextRank algorithm uses a graph-based ranking model similar to Google's PageRank and extracts keywords well.
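A minimal sketch of TextRank-style keyword extraction follows. It assumes an undirected, unweighted co-occurrence graph over already segmented words; the function name `textrank_keywords` and the parameters `window`, `damping`, `iters` and `top_k` are illustrative choices, not values given in the patent.

```python
from collections import defaultdict

def textrank_keywords(words, window=2, damping=0.85, iters=50, top_k=3):
    """Rank words by PageRank-style scores on a co-occurrence graph."""
    graph = defaultdict(set)
    # Link every pair of distinct words that co-occur within the window.
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    scores = {w: 1.0 for w in graph}
    for _ in range(iters):
        # Each word receives an equal share of every neighbour's score.
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

keywords = textrank_keywords(
    ["hangzhou", "lake", "hangzhou", "road", "hangzhou", "bridge"], window=1
)
```

The words returned first are the ones most strongly connected to the rest of the text; in the method above, their weights would then be increased.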
In step 1 and step 2, a named entity recognition method is used to change the weight of each word after segmentation, increasing the weight of the spatial information nouns in the text; the text is the index file in step 1 and the query statement in step 2.
Named entity recognition identifies the nouns in the text that represent spatial information, so that the retrieval results concentrate on the spatial information domain, which improves retrieval efficiency.
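The weight change can be sketched as follows. The patent does not specify the recognizer or the size of the increase, so this example makes two loud assumptions: a small hypothetical gazetteer (`SPATIAL_TERMS`) stands in for a trained named entity recognizer, and the base weight of a word is its raw count multiplied by a hypothetical `boost` factor for spatial terms.

```python
from collections import Counter

# Hypothetical gazetteer standing in for a trained named entity recognizer.
SPATIAL_TERMS = {"hangzhou", "west lake", "zhejiang"}

def weighted_terms(words, boost=2.0):
    """Return term -> weight, multiplying the count of spatial terms by `boost`."""
    counts = Counter(words)
    return {
        w: c * boost if w in SPATIAL_TERMS else float(c)
        for w, c in counts.items()
    }

query_weights = weighted_terms(["hotels", "near", "west lake", "hangzhou", "hotels"])
```

The same function would be applied to the segmented index file and to the segmented query statement, so that spatial terms dominate both weighted representations.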
The method of the invention uses natural language processing tools and applies word segmentation and named entity recognition to the field of spatial information retrieval, optimizing the retrieval effect.
Brief description of the drawings
Fig. 1 is a schematic diagram of word segmentation with the Viterbi algorithm in one embodiment of the invention;
Fig. 2 is a schematic diagram of the Chinese word segmentation result in the current embodiment of the invention;
Fig. 3 is a flowchart of the method of the invention.
Detailed description
Specific embodiments of the invention are described below with reference to the drawings. Note that the embodiments described here are for illustration only and do not limit the invention.
As shown in Fig. 3, the steps of the embodiment are as follows:
Step 1, performing word segmentation on an index file and changing the weight of each word after segmentation, to obtain a weight-changed index file;
Step 2, receiving a query statement input by a user, performing word segmentation on the query statement and changing the weight of each word after segmentation, to obtain a weight-changed query statement;
The segmentation of the index file in step 1 and of the query statement in step 2 are both performed with the overall linear model.
The overall linear model is built as follows:
Step 1-1, annotating the corpus so that each single character in the annotated corpus has a corresponding label;
Step 1-2, training a model with a preset feature template and the annotated corpus, to obtain the overall linear model. The model is trained as follows:
Step 1-21, applying the feature template to the annotated corpus to generate a feature list for each single character. Taking a Chinese character as an example,
Step 1-22, extracting the features in each feature list and building a model from the features and their weights, where the initial value of each weight is 0;
Step 1-23, using the model to predict each single character in the annotated corpus:
If the prediction is correct, the next character is predicted;
If the prediction is wrong, the feature weights are updated with an online update algorithm to obtain a new model, and step 1-23 is repeated until the prediction is correct or the number of weight updates exceeds a preset value.
In this embodiment, the Viterbi algorithm is used to predict each character, and whether the prediction is accurate is judged from the error between the predicted value and the sample value of the character. If the prediction is wrong, that is, the predicted label differs from the actual label, the parameters are at fault for this character and need to be updated; the specific update algorithm is the Online Passive-Aggressive algorithm.
The algorithm ends when the error of the loop iteration falls below the set threshold, or when the set number of iterations is exceeded.
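The training loop of steps 1-21 to 1-23 can be sketched with a passive-aggressive update. This is a deliberately simplified illustration, not the patent's procedure: each character is classified independently instead of being decoded with the Viterbi algorithm, the feature template in `features` (current, previous and next character) is hypothetical, and each error triggers a single update per pass rather than repeated re-prediction.

```python
from collections import defaultdict

LABELS = ["B", "M", "E", "S"]

def features(chars, i):
    """Hypothetical feature template: current, previous and next character."""
    prev_c = chars[i - 1] if i > 0 else "<s>"
    next_c = chars[i + 1] if i + 1 < len(chars) else "</s>"
    return [f"c0={chars[i]}", f"c-1={prev_c}", f"c+1={next_c}"]

def score(weights, feats, label):
    return sum(weights[(f, label)] for f in feats)

def predict(weights, feats):
    return max(LABELS, key=lambda lab: score(weights, feats, lab))

def train_pa(corpus, epochs=5):
    """corpus: list of (chars, gold_labels) pairs. Returns the learned weights."""
    weights = defaultdict(float)  # every weight starts at 0
    for _ in range(epochs):
        mistakes = 0
        for chars, gold in corpus:
            for i, y in enumerate(gold):
                feats = features(chars, i)
                y_hat = predict(weights, feats)
                if y_hat == y:
                    continue  # passive: a correct prediction changes nothing
                mistakes += 1
                # Aggressive: smallest update that gives the gold label a margin of 1.
                loss = 1.0 - (score(weights, feats, y) - score(weights, feats, y_hat))
                tau = loss / (2 * len(feats))  # squared norm of the feature difference
                for f in feats:
                    weights[(f, y)] += tau
                    weights[(f, y_hat)] -= tau
        if mistakes == 0:  # stop early once a full pass makes no errors
            break
    return weights
```

The `epochs` limit plays the role of the preset iteration count, and the early exit corresponds to the error falling below the threshold.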
After training, the overall linear model obtained can be used for prediction. Several prediction methods exist; a common one is dynamic programming. As shown in Fig. 2, a dynamic programming algorithm infers the label of the current state from the label of the previous state, and backtracking is finally used to find and return the optimal path.
The index file in step 1 and the query statement in step 2 are segmented as follows:
Step a, inputting the text into the overall linear model, which applies the feature template to the text and computes the feature list of the text according to the weights.
Step b, using a dynamic programming algorithm to obtain all possible label combinations from the feature list, and using backtracking to find the optimal label combination.
In the current embodiment, the dynamic programming algorithm is the Viterbi algorithm. Fig. 1 is a schematic diagram of using the Viterbi algorithm to select the optimal label combination, that is, of the label-based segmentation method. Taking Chinese word segmentation as an example, Fig. 2 shows an annotated sentence in which each single character (including punctuation) has a corresponding label. The annotated corpus contains only four possible labels: S marks a single-character word, B the beginning of a word, M the middle of a word, and E the end of a word. In this example the sentence is segmented as:
| modernization | battleship | upper |, | or not exist | technology | simple | | post.
In the sentence, the character " " forms a word by itself, so it is labeled S. "Modernization" (现代化) consists of three characters: "现" corresponds to B, the beginning of the word; "代" corresponds to M, the middle of the word, since the word has not yet ended; and "化" corresponds to E, marking the end of the word.
Step c, dividing the text into words according to the optimal label combination.
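Step c amounts to cutting the character sequence wherever a label closes a word. The sketch below shows one way to do this; the function name `labels_to_words` is illustrative, and the sample string is an assumed back-translation of the first words of the example sentence, not text quoted from the patent.

```python
def labels_to_words(chars, labels):
    """Cut a character sequence into words using B/M/E/S labels."""
    words, current = [], ""
    for ch, lab in zip(chars, labels):
        current += ch
        if lab in ("E", "S"):  # a word closes on E (end) or S (single character)
            words.append(current)
            current = ""
    if current:  # tolerate a label sequence that stops in the middle of a word
        words.append(current)
    return words

# Assumed back-translation of "modernization | battleship | upper".
segmented = labels_to_words("现代化战舰上", ["B", "M", "E", "B", "E", "S"])
```

With the labels shown, the six characters are grouped into a three-character word, a two-character word and a single-character word.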
After segmentation, the weight of each word is changed so that the subsequent retrieval can use the word weights, which improves retrieval efficiency and accuracy. One way to change the word weights is keyword extraction with the TextRank algorithm. In this embodiment, named entity recognition is used to change the weights: the words in the segmented text that represent spatial information are given more weight, which makes the retrieval more targeted at the professional domain.
Step 3, retrieving the weight-changed query statement in the weight-changed index file.
After the index file and the query statement have been weighted, an index file that is more similar to the query statement obtains a higher score during retrieval and is therefore ranked nearer the top of the search results. The similarity is computed as follows:
$$\mathrm{sim}(d, q) = \cos(\vec{d}, \vec{q}) = \frac{\vec{d} \cdot \vec{q}}{|\vec{d}| \times |\vec{q}|}$$
where d denotes the index file and q denotes the query statement. The similarity between the two is computed with the cosine formula, and the weight information is already contained in the vectors of d and q. By increasing the weight of keywords, index files with high similarity obtain higher scores and are ranked nearer the top of the retrieval results, which improves retrieval accuracy.
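The cosine similarity above can be computed directly on sparse weighted term vectors. The sketch below assumes that d and q are represented as dictionaries mapping each term to its weight; this representation and the sample weights are illustrative and not prescribed by the patent.

```python
import math

def cosine_similarity(d, q):
    """d, q: dicts mapping term -> weight. Returns the cosine of the angle between them."""
    dot = sum(w * q[t] for t, w in d.items() if t in q)
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # an empty vector is treated as matching nothing
    return dot / (norm_d * norm_q)

query = {"hangzhou": 2.0, "map": 1.0}
doc_spatial = {"hangzhou": 2.0, "map": 1.0}
doc_generic = {"map": 1.0, "road": 1.0}
ranked = sorted(
    [("doc_spatial", doc_spatial), ("doc_generic", doc_generic)],
    key=lambda item: cosine_similarity(item[1], query),
    reverse=True,
)
```

Because the spatial term carries a larger weight in both vectors, the document that shares it with the query is ranked ahead of the document that only shares a generic term.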
The invention combines word segmentation and named entity recognition, applies natural language processing to retrieval in the field of spatial geographic information, and effectively improves the retrieval of spatial geographic information.

Claims (9)

1. A spatial information search method based on natural language processing, characterized by comprising:
Step 1, performing word segmentation on an index file and changing the weight of each word after segmentation, to obtain a weight-changed index file;
Step 2, receiving a query statement input by a user, performing word segmentation on the query statement and changing the weight of each word after segmentation, to obtain a weight-changed query statement;
Step 3, retrieving the weight-changed query statement in the weight-changed index file.
2. The spatial information search method based on natural language processing as claimed in claim 1, characterized in that in step 1 an overall linear model is used to segment the index file, and in step 2 an overall linear model is used to segment the query statement.
3. The spatial information search method based on natural language processing as claimed in claim 2, characterized in that the overall linear model is built as follows:
Step 1-1, annotating the corpus so that each single character in the annotated corpus has a corresponding label;
Step 1-2, training a model with a preset feature template and the annotated corpus, to obtain the overall linear model.
4. The spatial information search method based on natural language processing as claimed in claim 3, characterized in that in step 1-2 the model is trained as follows:
Step 1-21, applying the feature template to the annotated corpus to generate a feature list for each single character;
Step 1-22, extracting the features in each feature list and building a model from the features and their weights, where the initial value of each weight is 0;
Step 1-23, using the model to predict every single character in the annotated corpus, and handling the prediction result for each character as follows:
If the prediction is correct, the next character is predicted;
If the prediction is wrong, the feature weights are updated with an online update algorithm to obtain a new model, and the character is predicted again with the new model, until the prediction is correct or the number of weight updates exceeds a preset value.
5. The spatial information search method based on natural language processing as claimed in claim 4, characterized in that in step 1 and step 2 word segmentation is performed as follows:
Step a, inputting the text into the overall linear model, which applies the feature template to the text and computes the feature list of the text according to the weights;
Step b, using a dynamic programming algorithm to obtain all possible label combinations from the feature list, and using backtracking to find the optimal label combination;
Step c, dividing the text into words according to the optimal label combination;
where the text in steps a to c is the index file of step 1 or the query statement of step 2.
6. The spatial information search method based on natural language processing as claimed in claim 5, characterized in that in step b the dynamic programming algorithm is the Viterbi algorithm.
7. The spatial information search method based on natural language processing as claimed in claim 1, characterized in that in step 1 and step 2 keyword extraction is used to change the word weights, so that the weight of keywords is increased.
8. The spatial information search method based on natural language processing as claimed in claim 7, characterized in that keyword extraction is performed with the TextRank algorithm.
9. The spatial information search method based on natural language processing as claimed in claim 1, characterized in that in step 1 and step 2 a named entity recognition method is used to change the weight of each word after segmentation, increasing the weight of the spatial information nouns in the text, where the text is the index file in step 1 and the query statement in step 2.
CN201410059272.9A 2014-02-21 2014-02-21 Space information searching method based on natural language processing Active CN103823857B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410059272.9A CN103823857B (en) 2014-02-21 2014-02-21 Space information searching method based on natural language processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410059272.9A CN103823857B (en) 2014-02-21 2014-02-21 Space information searching method based on natural language processing

Publications (2)

Publication Number Publication Date
CN103823857A (en) 2014-05-28
CN103823857B (en) 2017-02-01

Family

ID=50758921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410059272.9A Active CN103823857B (en) 2014-02-21 2014-02-21 Space information searching method based on natural language processing

Country Status (1)

Country Link
CN (1) CN103823857B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530415A (en) * 2013-10-29 2014-01-22 谭永 Natural language search method and system compatible with keyword search
CN103544309B (en) * 2013-11-04 2017-03-15 北京中搜网络技术股份有限公司 A kind of retrieval string method for splitting of Chinese vertical search

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008166A (en) * 2014-05-30 2014-08-27 华东师范大学 Dialogue short text clustering method based on form and semantic similarity
CN104008166B (en) * 2014-05-30 2017-05-24 华东师范大学 Dialogue short text clustering method based on form and semantic similarity
CN104268144A (en) * 2014-08-12 2015-01-07 华东师范大学 Electronic medical record query statement constructing method
CN106970922A (en) * 2016-01-14 2017-07-21 北大方正集团有限公司 Index establishing method, search method and directory system based on multi-field keyword
CN107992514B (en) * 2016-10-26 2022-04-05 谷歌有限责任公司 Structured information card search and retrieval
US11238058B2 (en) 2016-10-26 2022-02-01 Google Llc Search and retrieval of structured information cards
CN107992514A (en) * 2016-10-26 2018-05-04 谷歌有限责任公司 The search and retrieval of structured message card
CN106372063A (en) * 2016-11-01 2017-02-01 上海智臻智能网络科技股份有限公司 Information processing method and device and terminal
CN108897861A (en) * 2018-07-01 2018-11-27 东莞市华睿电子科技有限公司 A kind of information search method
CN110705249A (en) * 2019-09-03 2020-01-17 东南大学 NLP library combined use method based on overlapping degree calculation
CN110705249B (en) * 2019-09-03 2023-04-11 东南大学 NLP library combined use method based on overlapping degree calculation
CN111259145A (en) * 2020-01-16 2020-06-09 广西计算中心有限责任公司 Text retrieval classification method, system and storage medium based on intelligence data
WO2021254227A1 (en) * 2020-06-18 2021-12-23 International Business Machines Corporation Targeted partial re-enrichment of a corpus based on nlp model enhancements
US11537660B2 (en) 2020-06-18 2022-12-27 International Business Machines Corporation Targeted partial re-enrichment of a corpus based on NLP model enhancements
GB2611682A (en) * 2020-06-18 2023-04-12 Ibm Targeted partial re-enrichment of a corpus based on NLP model enhancements
AU2021294112B2 (en) * 2020-06-18 2023-05-11 International Business Machines Corporation Targeted partial re-enrichment of a corpus based on NLP model enhancements
CN112183087A (en) * 2020-09-27 2021-01-05 武汉华工安鼎信息技术有限责任公司 System and method for sensitive text recognition
CN112183087B (en) * 2020-09-27 2024-05-28 武汉华工安鼎信息技术有限责任公司 System and method for identifying sensitive text
TWI779599B (en) * 2021-02-09 2022-10-01 鼎新電腦股份有限公司 Application programming interface service search system and application programming interface service search method

Also Published As

Publication number Publication date
CN103823857B (en) 2017-02-01

Similar Documents

Publication Publication Date Title
CN103823857B (en) Space information searching method based on natural language processing
CN107861939B (en) Domain entity disambiguation method fusing word vector and topic model
CN104699763B (en) The text similarity gauging system of multiple features fusion
CN108121700B (en) Keyword extraction method and device and electronic equipment
CN109543181B (en) Named entity model and system based on combination of active learning and deep learning
CN105095204B (en) The acquisition methods and device of synonym
CN108959258B (en) Specific field integrated entity linking method based on representation learning
CN110851596A (en) Text classification method and device and computer readable storage medium
CN106777957B (en) The new method of biomedical more ginseng event extractions on unbalanced dataset
CN110879834B (en) Viewpoint retrieval system based on cyclic convolution network and viewpoint retrieval method thereof
US20190317986A1 (en) Annotated text data expanding method, annotated text data expanding computer-readable storage medium, annotated text data expanding device, and text classification model training method
CN103324700A (en) Noumenon concept attribute learning method based on Web information
CN104699797A (en) Webpage data structured analytic method and device
CN112328800A (en) System and method for automatically generating programming specification question answers
Sasidhar et al. A survey on named entity recognition in Indian languages with particular reference to Telugu
Devi et al. Entity extraction for malayalam social media text using structured skip-gram based embedding features from unlabeled data
CN111159332A (en) Text multi-intention identification method based on bert
CN111881256B (en) Text entity relation extraction method and device and computer readable storage medium equipment
CN104317882A (en) Decision-based Chinese word segmentation and fusion method
CN110008473B (en) Medical text named entity identification and labeling method based on iteration method
JP2007156545A (en) Symbol string conversion method, word translation method, its device, its program and recording medium
Wang et al. Semi-supervised chinese open entity relation extraction
US20190095525A1 (en) Extraction of expression for natural language processing
CN110377690B (en) Information acquisition method and system based on remote relationship extraction
Wang et al. A sentence segmentation method for ancient Chinese texts based on NNLM

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant