CN104391963A - Method for constructing correlation networks of keywords of natural language texts - Google Patents

Method for constructing correlation networks of keywords of natural language texts

Info

Publication number
CN104391963A
CN104391963A (application CN201410719639.5A)
Authority
CN
China
Prior art keywords
word
words
natural language
related network
construction method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410719639.5A
Other languages
Chinese (zh)
Inventor
郭光�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING ZHONGKE CHUANGYI TECHNOLOGY Co Ltd
Original Assignee
BEIJING ZHONGKE CHUANGYI TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING ZHONGKE CHUANGYI TECHNOLOGY Co Ltd filed Critical BEIJING ZHONGKE CHUANGYI TECHNOLOGY Co Ltd
Priority to CN201410719639.5A priority Critical patent/CN104391963A/en
Publication of CN104391963A publication Critical patent/CN104391963A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/374 Thesaurus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for constructing a correlation network of keywords in natural language texts. The method comprises: building a keyword dictionary and segmenting a target corpus into words according to the dictionary; counting, on the basis of an N-gram statistical language model, the frequencies with which words co-occur with their preceding and following words; training the language model with a neural network, using the counted frequencies as training data, to obtain word vectors; computing the similarity between the word vectors of each pair of words as a measure of their semantic relatedness, thereby generating a semantic association degree between each pair of words; and generating the keyword correlation network of the texts according to the strength of the semantic association between words. Because word-vector similarity is used as the measure of semantic relatedness between two words, the method effectively improves the precision of text correlation networks in related applications.

Description

A method for constructing a keyword correlation network for natural language texts
Technical field
The invention belongs to the field of natural language processing and in particular relates to a method for constructing a keyword correlation network for natural language texts.
Background art
When evaluating massive science-and-technology project data or summarizing expert information, computer processing is all but indispensable. Within natural language processing, Chinese is considerably harder to process than Western languages of the Romance family because of its intrinsic linguistic features. A prerequisite for a computer to process natural language is text quantification. Text is quantified by extracting the feature words of its content, i.e. extracting industry or domain keywords from text materials such as scientific literature and project declarations and evaluations, and then building a correlation network between texts through keyword matching and the like.
For Chinese, a prerequisite of keyword extraction is word segmentation. After segmentation yields a vocabulary, the most common word representation today expresses each word as a very long vector whose dimension equals the vocabulary size; all elements are 0 except a single dimension whose value is 1, and that dimension identifies the word, assigning every word in the text a numeric code. Stored sparsely, this representation is compact and practical, but any two words are completely isolated from each other: the vectors cannot express any relation between words. Synonyms written with different characters, such as the two different Chinese words that both mean "microphone", therefore cannot be recognized as having the same meaning. As a result, highly related keywords sometimes go unrecognized, and the constructed correlation network has low precision.
Summary of the invention
The technical problem to be solved by the invention is to provide a method for constructing a keyword correlation network for natural language texts that addresses the problems described above.
To this end, the invention provides a method for constructing a keyword correlation network for natural language texts, comprising the steps of:
Step A: build a keyword dictionary, and segment the target corpus into words according to the dictionary, obtaining a plurality of words;
Step B: on the basis of an N-gram statistical language model, count the frequencies with which each obtained word co-occurs with its preceding and following words;
Step C: using the counted frequencies as training data, train the language model with a neural network and obtain word vectors;
Step D: compute the similarity between the word vectors of each pair of words as a measure of their semantic relatedness, generating a semantic association degree between the two words;
Step E: generate the text keyword correlation network according to the strength of the semantic association degree between words.
In Step A, building the keyword dictionary comprises: crawling keyword information from the target corpus with a web crawler and collecting the obtained keywords into a dictionary.
In Step A, segmenting the target corpus according to the dictionary comprises: segmenting based on string matching, combined with segmentation based on semantic understanding and/or segmentation based on statistics of adjacent word co-occurrence frequencies.
In Step C, the obtained word vectors are low-dimensional real-valued vectors of dimension no greater than 100.
In Step B, counting the co-occurrence frequencies on the basis of the N-gram statistical language model comprises: grouping the segmented words into adjacent 1-tuples, 2-tuples, ..., N-tuples, and counting the probability with which each word occurs given the N-1 words that precede it.
In Step C, training the language model with a neural network comprises:
training the language model with a three-layer neural network, in which the word vectors of the preceding N-1 words are concatenated end to end into an (N-1)m-dimensional vector that forms the first layer of the network, m being the dimension of the word vectors;
computing the second layer as d + Hx, with tanh as the activation function, d being a bias term;
outputting |V| nodes y_i in the third layer, where y_i is the unnormalized log-probability that the next word is word i, and then normalizing the output y into probabilities with the softmax activation function; y is computed as
y = b + Wx + U tanh(d + Hx)
where U is the parameter matrix from the second layer to the third layer and b is another bias term;
and optimizing the language model by stochastic gradient descent.
In Step D, computing the similarity of the word vectors of two words comprises computing the cosine distance between the two word vectors.
In summary, the invention provides a method for constructing a keyword correlation network for natural language texts. After segmenting Chinese natural language text, the frequencies of front and rear word co-occurrence are counted on the basis of an N-gram statistical language model; with the counted frequencies as training data, a neural network is used to train the language model and obtain word vectors; the similarity of two word vectors measures the semantic association degree between the two words, from which the correlation network is built. In other words, the semantic information of Chinese is captured through probability statistics, the language model is trained with a neural network, and the result is quantified as word-vector information. A correlation network built in this way incorporates semantic information; compared with approaches that merely assign codes to different words without considering semantics, the network provided by the invention is clearly more precise.
Brief description of the drawings
To describe the embodiments of the invention more clearly, the drawings required for their description are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for constructing a keyword correlation network for natural language texts provided by an embodiment of the invention.
Detailed description of the embodiments
To help those skilled in the art better understand the invention, it is described in further detail below with reference to the drawings and specific embodiments.
The embodiments of the invention provide a method for constructing a keyword correlation network for natural language texts. As shown in Fig. 1, the method comprises the following steps.
Step S110: build a keyword dictionary, and segment the target corpus into words according to the dictionary, obtaining a plurality of words.
Keyword information is crawled from the target corpus with a web crawler, the obtained keywords are collected into a dictionary, and the corpus is segmented according to the dictionary.
The segmentation operation is based on string matching; preferably it is combined with segmentation based on semantic understanding and/or segmentation based on statistics of adjacent word co-occurrence frequencies, and the vocabulary is obtained by segmenting with these methods together. A single segmentation method alone may not be accurate enough, so reasonably combining the three approaches, string matching, understanding-based segmentation, and statistics-based segmentation, improves segmentation accuracy.
Preferably, an n-th order Markov model (n-gram model) can be used to segment the text to be segmented, yielding a first text in which words are separated by spaces; the n-gram model serves to resolve segmentation ambiguity. When the first text contains a target word string, i.e. a word string not yet stored in the dictionary, that string is added to the dictionary, producing an updated dictionary. According to the updated dictionary, the first text is then segmented with forward maximum matching and backward maximum matching, yielding a second text and a third text respectively. From the second and third texts, the one whose word-length statistics (such as the expectation and variance of word length) are more satisfactory is chosen as the segmentation result.
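The forward maximum matching mentioned above can be sketched as follows. This is an illustrative Python sketch, not the patented implementation; the toy lexicon and the English example string are hypothetical stand-ins for a crawled Chinese keyword dictionary.

```python
def forward_max_match(text, dictionary):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word starting there, falling back to a single character so
    the scan always advances. Backward matching scans from the end instead."""
    max_len = max(map(len, dictionary))
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

# Hypothetical toy dictionary; in the method it is crawled from the corpus.
lexicon = {"natural", "language", "naturallanguage", "text", "keyword"}
print(forward_max_match("naturallanguagetext", lexicon))
# → ['naturallanguage', 'text']
```

Because the matcher is greedy, forward and backward scans can disagree on ambiguous strings, which is why the method keeps both results and picks the statistically better one.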
More preferably, a CRF model is obtained by training on a segmented corpus; the CRF model is then used to segment unsegmented corpus; corpus that is segmented successfully and meets a preset condition is added to the segmented corpus; and these steps are repeated until the segmented corpus no longer grows, yielding the final CRF model.
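The bootstrap loop described above can be sketched generically. This is a hypothetical skeleton: the CRF itself is abstracted behind `train`/`label`/`accept` callables, and the numeric toy stand-ins below exist only to show the loop terminating when the corpus stops growing.

```python
def self_training(labeled, unlabeled, train, label, accept):
    """Train a model on the labeled (segmented) corpus, apply it to the
    unlabeled corpus, keep results that meet the acceptance condition,
    and repeat until the labeled corpus no longer grows."""
    while True:
        model = train(labeled)
        grew = False
        for item in list(unlabeled):
            result = label(model, item)
            if accept(model, result):
                labeled.append(result)
                unlabeled.remove(item)
                grew = True
        if not grew:
            return model

# Toy stand-ins (hypothetical): the "model" is the mean of the accepted
# data, and an item is accepted when it lies close to the current mean.
train = lambda data: sum(data) / len(data)
label = lambda model, item: item
accept = lambda model, result: abs(result - model) <= 1.5
model = self_training([4.0, 5.0], [5.5, 6.5, 20.0], train, label, accept)
print(model)
```

The outlier 20.0 is never accepted, so the loop stops once no nearby items remain, mirroring the "until the corpus no longer expands" condition.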
Step S111: on the basis of the N-gram statistical language model, count the frequencies with which each obtained word co-occurs with its preceding and following words.
The statistics require dividing the segmented word string into tuples: the segmented words are grouped into adjacent 1-tuples, 2-tuples, ..., N-tuples, and the probability with which each word occurs, given the N-1 words before it, is counted. Here N is a natural number, i.e. a positive integer.
The N-gram statistical language model is formalized as follows. Given a word string, its probability of being natural language is P(w1, w2, ..., wt), where w1 to wt are the successive words of the text, and the chain rule gives:
P(w1,w2,...,wt) = P(w1) × P(w2|w1) × P(w3|w1,w2) × ... × P(wt|w1,w2,...,wt-1)
where P(w1) is the probability that the first word w1 occurs, P(w2|w1) the probability of the second word given the first, and so on. The probability of a word w thus depends on all the words before it. Since the vocabulary of everyday natural language is very large, computing P(w1, w2, ..., wt) this way is prohibitively complex, so the natural language processing field uses the N-gram language model, which assumes the probability of each word depends only on the N-1 preceding words; P(wt|wt-n+1, ..., wt-1) is therefore used to approximate P(wt|w1, w2, ..., wt-1).
For example, with a 3-gram language model, suppose the whole corpus has been segmented into the word string w1, w2, ..., wn. All consecutive 1-tuples (<w1>, <w2>, <w3>, ..., <wn>), 2-tuples (<w1, w2>, <w2, w3>, ..., <wn-1, wn>) and 3-tuples (<w1, w2, w3>, <w2, w3, w4>, ..., <wn-2, wn-1, wn>) are extracted, and then the probability that each word wt occurs, given that the two preceding words wt-2 and wt-1 have occurred, is counted.
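The counting step above can be sketched as follows: a minimal Python sketch that estimates the conditional probabilities by relative frequency (the English toy sentence is an illustrative stand-in for a segmented Chinese corpus).

```python
from collections import Counter

def ngram_conditional_probs(tokens, n=3):
    """Count n-grams and their (n-1)-word contexts, then estimate
    P(w_t | w_{t-n+1}, ..., w_{t-1}) as count(ngram) / count(context)."""
    context_counts = Counter()
    ngram_counts = Counter()
    for i in range(len(tokens) - n + 1):
        context_counts[tuple(tokens[i:i + n - 1])] += 1
        ngram_counts[tuple(tokens[i:i + n])] += 1
    return {g: c / context_counts[g[:-1]] for g, c in ngram_counts.items()}

tokens = "the cat sat on the cat sat".split()
probs = ngram_conditional_probs(tokens, n=3)
print(probs[("the", "cat", "sat")])  # 1.0: "sat" always follows "the cat"
```

These relative frequencies are exactly the statistics the next step feeds to the neural network as training targets.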
Step S112: using the counted frequencies as training data, train the language model with a neural network and obtain word vectors.
The word vectors used in the embodiments of the invention are low-dimensional real-valued vectors of the form [0.792, -0.177, -0.107, 0.109, -0.542, ...]; the dimension generally does not exceed 100 and is typically an integer such as 50 or 100. Semantic similarity is obtained from the distances between such vectors, while the representational complexity of a high-dimensional vocabulary is greatly reduced.
The word vectors of the invention are obtained by training a language model with a feedforward or recurrent neural network. Let C(w) denote the word vector of word w. The input of the network is the word vectors of the preceding N-1 words wt-n+1, ..., wt-1; the output is a vector whose i-th element is the probability that the next word is wi. The statistical probabilities computed from the corpus N-tuples serve as the training targets; the layer weights of the network are adjusted iteratively, and when optimization finishes both the language model and the word vectors are obtained.
As one embodiment, a three-layer neural network is used to build the language model.
wt-n+1, ..., wt-1 are the preceding N-1 words, from which the next word wt is to be predicted. C(w) denotes the word vector of word w; a single set of word vectors is shared across the whole model, stored in a |V| × m matrix C, where |V| is the vocabulary size (the total number of distinct words in the corpus) and m is the word-vector dimension. Mapping w to C(w) amounts to taking one row of the matrix.
The first layer (input layer) concatenates the N-1 vectors C(wt-n+1), ..., C(wt-2), C(wt-1) end to end into an (N-1)m-dimensional vector x.
The second layer (hidden layer) is computed directly as d + Hx, as in an ordinary neural network, where d is a bias term; tanh is then applied as the activation function.
The third layer (output layer) has |V| nodes; each node yi is the unnormalized log-probability that the next word is word i. Finally the softmax activation function normalizes the output y into probabilities. y is computed as:
y=b+Wx+Utanh(d+Hx)
U in the formula (a |V| × h matrix) holds the hidden-to-output parameters; most of the model's computation is concentrated in the matrix multiplication between U and the hidden layer. The model is finally optimized by stochastic gradient descent. Note that whereas the input layer of an ordinary neural network is just an input value, the input layer of this model is itself a parameter (stored in C) and must also be optimized. When optimization finishes, the word vectors and the language model are produced simultaneously.
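A single forward pass of this three-layer model can be sketched in a few lines of numpy. The parameters below are randomly initialised for illustration only; in the method they would be fitted by stochastic gradient descent, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
V, m, n, h_dim = 10, 4, 3, 8   # vocab size, vector dim, n-gram order, hidden units

C = rng.normal(size=(V, m))                # word-vector lookup table (also trained)
H = rng.normal(size=(h_dim, (n - 1) * m))  # input-to-hidden weights
d = np.zeros(h_dim)                        # hidden bias
U = rng.normal(size=(V, h_dim))            # hidden-to-output weights
W = rng.normal(size=(V, (n - 1) * m))      # direct input-to-output weights
b = np.zeros(V)                            # output bias

def forward(context_ids):
    """Concatenate the N-1 context word vectors (layer 1), apply
    y = b + Wx + U tanh(d + Hx), then softmax-normalise into
    next-word probabilities."""
    x = np.concatenate([C[i] for i in context_ids])
    y = b + W @ x + U @ np.tanh(d + H @ x)
    e = np.exp(y - y.max())                # numerically stable softmax
    return e / e.sum()

p = forward([1, 2])                        # N-1 = 2 context word ids
print(p.shape, round(float(p.sum()), 6))   # (10,) 1.0
```

The output is a proper probability distribution over the |V| candidate next words, matching the role of the softmax layer described above.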
More preferably, the following neural network can represent the language model:
h = Σ_{i=1}^{t-1} H_i C(w_i)
y_j = C(w_j)^T h
where C(w) is the word vector of w and h is the hidden layer of the three-layer n-gram network, carrying semantic information. H_i is an m × m matrix that can be understood as the contribution the i-th word makes, after transformation by H_i, to the t-th word; the hidden layer h is thus a summary of the preceding t-1 words, i.e. one prediction of the next word.
y_j, the predicted log-probability that the next word is w_j, is obtained from the inner product of C(w_j) and h, which directly reflects the similarity of the two. If the norms of the word vectors are roughly equal, the magnitude of the inner product reflects the cosine between the two vectors.
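This compact variant can be sketched as follows: a minimal numpy sketch with randomly initialised, untrained parameters, shown only to make the shapes of h and y_j concrete.

```python
import numpy as np

rng = np.random.default_rng(1)
V, m, t = 8, 4, 3              # vocab size, vector dim, position to predict

C = rng.normal(size=(V, m))    # shared word vectors
H = rng.normal(size=(t - 1, m, m))  # one m x m matrix per context position

def predict_scores(context_ids):
    """h = sum_i H_i C(w_i) summarises the preceding words; the score
    y_j = C(w_j)^T h is the inner product with each candidate's vector."""
    h = sum(H[i] @ C[w] for i, w in enumerate(context_ids))
    return C @ h               # y_j for every word j in the vocabulary

y = predict_scores([2, 5])
print(y.shape)                 # (8,)
```

Because every candidate is scored by an inner product with h, words with similar vectors necessarily receive similar scores, which is exactly the property the text exploits.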
Preferably, a large vocabulary can also be split into several small vocabularies, each assigned its own neural network language model. The models share the same input dimension and are first trained independently; their output vectors are then merged for a second round of training, yielding a normalized neural network language model.
Step S113: compute the similarity between the word vectors of each pair of words as a measure of their semantic relatedness, generating a semantic association degree between the two words.
Using the vector space model, the distance between the vectors of two words is computed as the measure of their semantic relatedness, generating the semantic association degree between them, from which the whole keyword semantic network is constructed. Computing the similarity of two word vectors comprises computing the cosine distance between them.
Each word is expressed as a floating-point vector, i.e. a vector in a high-dimensional space; the angle between two vectors is used to compute their distance and hence their degree of similarity.
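The cosine measure can be sketched in a few lines. The three vectors below are hypothetical illustrations, not output of the trained model.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors: the semantic
    relatedness measure used in step S113."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-d vectors; real ones come from the trained language model.
mic_a = np.array([0.79, -0.18, -0.11])   # e.g. "microphone"
mic_b = np.array([0.81, -0.15, -0.09])   # e.g. a synonym of "microphone"
other = np.array([-0.50, 0.90, 0.30])    # an unrelated word
print(cosine_similarity(mic_a, mic_b) > cosine_similarity(mic_a, other))  # True
```

Unlike one-hot codes, nearby dense vectors give the synonym pair a high score, which is what allows the network to link differently written synonyms.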
Step S114: generate the text keyword correlation network according to the strength of the semantic association degree between words.
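One simple way to realise this step is to link every pair of keywords whose similarity exceeds a threshold. The sketch below assumes this thresholding scheme (the patent does not fix one) and uses hypothetical 2-d vectors.

```python
import numpy as np

def build_keyword_network(vectors, threshold):
    """Connect every pair of keywords whose word-vector cosine similarity
    reaches the threshold; each edge carries the similarity as its weight."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    words = list(vectors)
    edges = {}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sim = cos(vectors[a], vectors[b])
            if sim >= threshold:
                edges[(a, b)] = sim
    return edges

# Hypothetical 2-d word vectors for illustration only.
vectors = {
    "microphone": np.array([1.0, 0.1]),
    "mic":        np.array([0.9, 0.2]),
    "network":    np.array([-0.2, 1.0]),
}
net = build_keyword_network(vectors, threshold=0.8)
print(sorted(net))  # [('microphone', 'mic')]
```

The resulting weighted edge list is the keyword correlation network; the threshold controls how strong a semantic association must be before two keywords are linked.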
Science-and-technology projects, scientific achievements, and expert information are all described and expressed as text. To quantify, compare, and evaluate the content of large-scale achievement, expert, and literature databases, the computer must understand the semantics of the various texts before correlation can be computed accurately. For example, text similarity computation is needed when analyzing whether projects are similar or when performing fuzzy search, and pattern-matching analysis between the keywords describing an expert and project keywords is needed in expert-capability analysis.
Moreover, existing keyword networks are mostly built from manually constructed dictionaries and cannot recognize emerging words or words absent from the dictionary. The segmentation algorithms commonly used in Chinese information processing fail to recognize industry-specific keywords, and projects under evaluation, which often involve scientific and technological innovation, coin new technical terms and nouns. It is therefore necessary not only to recognize keywords but also to identify keyword associations more accurately on the basis of their semantic relatedness, i.e. to unify techniques such as natural language processing, information retrieval, and pattern recognition, form a corpus from existing information, and analyze the correlations between keywords by statistical means.
The construction method provided by the embodiments of the invention can therefore be used in applications such as the quantitative evaluation of science-and-technology projects and achievements and the assessment and selection of experts. Because a word-vector algorithm that supports distance computation is adopted, the semantic similarity of word vectors can be obtained, and the generated word semantic network better represents the degree of relatedness between words. At the same time, when applied to large-scale corpora, the word-vector dimension is low, generally no more than 100, so the complexity is greatly reduced compared with the sparse word-vector representation in common use.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. The invention is therefore not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A method for constructing a keyword correlation network for natural language texts, characterized by comprising the steps of:
Step A: building a keyword dictionary, and segmenting a target corpus into words according to the dictionary, obtaining a plurality of words;
Step B: counting, on the basis of an N-gram statistical language model, the frequencies with which each obtained word co-occurs with its preceding and following words;
Step C: using the counted frequencies as training data, training the language model with a neural network and obtaining word vectors;
Step D: computing the similarity between the word vectors of each pair of words as a measure of their semantic relatedness, generating a semantic association degree between the two words;
Step E: generating the text keyword correlation network according to the strength of the semantic association degree between words.
2. The method for constructing a keyword correlation network for natural language texts according to claim 1, characterized in that building the keyword dictionary in Step A comprises: crawling keyword information from the target corpus with a web crawler and collecting the obtained keywords into a dictionary.
3. The method for constructing a keyword correlation network for natural language texts according to claim 2, characterized in that segmenting the target corpus according to the dictionary in Step A comprises: segmenting based on string matching, combined with segmentation based on semantic understanding and/or segmentation based on statistics of adjacent word co-occurrence frequencies.
4. The method for constructing a keyword correlation network for natural language texts according to claim 1, characterized in that the word vectors obtained in Step C are low-dimensional real-valued vectors of dimension no greater than 100.
5. The method for constructing a keyword correlation network for natural language texts according to claim 1, characterized in that counting the frequencies in Step B on the basis of the N-gram statistical language model comprises: grouping the segmented words into adjacent 1-tuples, 2-tuples, ..., N-tuples, and counting the probability with which each word occurs given its preceding N-1 words.
6. The method for constructing a keyword correlation network for natural language texts according to claim 1, characterized in that training the language model with a neural network in Step C comprises:
training the language model with a three-layer neural network, in which the word vectors of the preceding N-1 words are concatenated end to end into an (N-1)m-dimensional vector forming the first layer of the network, m being the dimension of the word vectors;
computing the second layer as d + Hx, with tanh as the activation function, d being a bias term;
outputting |V| nodes y_i in the third layer, y_i being the unnormalized log-probability that the next word is word i, and then normalizing the output y into probabilities with the softmax activation function, y being computed as
y = b + Wx + U tanh(d + Hx)
where U is the parameter matrix from the second layer to the third layer and b is another bias term;
and optimizing the language model by stochastic gradient descent.
7. The method for constructing a keyword correlation network for natural language texts according to claim 1, characterized in that computing the similarity of the word vectors of two words in Step D comprises computing the cosine distance between the two word vectors.
CN201410719639.5A 2014-12-01 2014-12-01 Method for constructing correlation networks of keywords of natural language texts Pending CN104391963A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410719639.5A CN104391963A (en) 2014-12-01 2014-12-01 Method for constructing correlation networks of keywords of natural language texts

Publications (1)

Publication Number Publication Date
CN104391963A true CN104391963A (en) 2015-03-04

Family

ID=52609867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410719639.5A Pending CN104391963A (en) 2014-12-01 2014-12-01 Method for constructing correlation networks of keywords of natural language texts

Country Status (1)

Country Link
CN (1) CN104391963A (en)

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881400A (en) * 2015-05-19 2015-09-02 上海交通大学 Semantic dependency calculating method based on associative network
CN105183714A (en) * 2015-08-27 2015-12-23 北京时代焦点国际教育咨询有限责任公司 Sentence similarity calculation method and apparatus
CN105488207A (en) * 2015-12-10 2016-04-13 合一网络技术(北京)有限公司 Semantic coding method and apparatus for network resources
CN105677769A (en) * 2015-12-29 2016-06-15 广州神马移动信息科技有限公司 Keyword recommending method and system based on latent Dirichlet allocation (LDA) model
CN105787078A (en) * 2016-03-02 2016-07-20 合网络技术(北京)有限公司 Method and device for displaying multimedia headlines
WO2016180268A1 (en) * 2015-05-13 2016-11-17 阿里巴巴集团控股有限公司 Text aggregate method and device
CN106293114A (en) * 2015-06-02 2017-01-04 阿里巴巴集团控股有限公司 The method and device of prediction user's word to be entered
CN106372086A (en) * 2015-07-23 2017-02-01 华中师范大学 Word vector acquisition method and apparatus
CN106503231A (en) * 2016-10-31 2017-03-15 北京百度网讯科技有限公司 Searching method and device based on artificial intelligence
CN106610972A (en) * 2015-10-21 2017-05-03 阿里巴巴集团控股有限公司 Query rewriting method and apparatus
CN106815252A (en) * 2015-12-01 2017-06-09 阿里巴巴集团控股有限公司 A kind of searching method and equipment
CN106844327A (en) * 2015-12-07 2017-06-13 科大讯飞股份有限公司 Text code method and system
CN106874643A (en) * 2016-12-27 2017-06-20 中国科学院自动化研究所 Build the method and system that knowledge base realizes assisting in diagnosis and treatment automatically based on term vector
CN106997376A (en) * 2017-02-28 2017-08-01 浙江大学 The problem of one kind is based on multi-stage characteristics and answer sentence similarity calculating method
CN107122413A (en) * 2017-03-31 2017-09-01 北京奇艺世纪科技有限公司 A kind of keyword extracting method and device based on graph model
CN107122451A (en) * 2017-04-26 2017-09-01 北京科技大学 A kind of legal documents case by grader method for auto constructing
CN107291690A (en) * 2017-05-26 2017-10-24 北京搜狗科技发展有限公司 Punctuate adding method and device, the device added for punctuate
CN107291836A (en) * 2017-05-31 2017-10-24 北京大学 A kind of Chinese text summary acquisition methods based on semantic relevancy model
CN107341152A (en) * 2016-04-28 2017-11-10 阿里巴巴集团控股有限公司 A kind of method and device of parameter input
WO2017206492A1 (en) * 2016-05-31 2017-12-07 北京百度网讯科技有限公司 Binary feature dictionary construction method and apparatus
CN107729509A (en) * 2017-10-23 2018-02-23 中国电子科技集团公司第二十八研究所 Discourse similarity determination method based on implicit high-dimensional distributed feature representation
CN107818080A (en) * 2017-09-22 2018-03-20 新译信息科技(北京)有限公司 Term recognition method and device
CN107871158A (en) * 2016-09-26 2018-04-03 清华大学 Knowledge graph representation learning method and device incorporating sequential text information
CN107920773A (en) * 2016-01-18 2018-04-17 国立研究开发法人情报通信研究机构 Material evaluation method and material evaluation apparatus
CN108135520A (en) * 2015-10-23 2018-06-08 美国西门子医疗解决公司 Generating natural language representations of mental content from functional brain images
CN108280061A (en) * 2018-01-17 2018-07-13 北京百度网讯科技有限公司 Text processing method and device based on ambiguous entity words
CN108287916A (en) * 2018-02-11 2018-07-17 北京方正阿帕比技术有限公司 Resource recommendation method
WO2018157703A1 (en) * 2017-03-02 2018-09-07 腾讯科技(深圳)有限公司 Natural language semantic extraction method and device, and computer storage medium
CN108920455A (en) * 2018-06-13 2018-11-30 北京信息科技大学 Automatic evaluation method for automatically generated Chinese text
CN109543041A (en) * 2018-11-30 2019-03-29 安徽听见科技有限公司 Language model score generation method and device
CN109558586A (en) * 2018-11-02 2019-04-02 中国科学院自动化研究所 Self-evidence scoring method, device and storage medium for information statements
CN109614538A (en) * 2018-12-17 2019-04-12 广东工业大学 Extraction method, device and equipment for agricultural product price data
CN109614617A (en) * 2018-06-01 2019-04-12 安徽省泰岳祥升软件有限公司 Word vector generation method and device supporting polarity differentiation and polysemy
CN109871530A (en) * 2018-12-28 2019-06-11 广州索答信息科技有限公司 Automatic extraction method and storage medium for seed words in the menu domain
CN109918654A (en) * 2019-02-21 2019-06-21 北京一品智尚信息科技有限公司 Logo paraphrasing method, device and medium
CN110532547A (en) * 2019-07-31 2019-12-03 厦门快商通科技股份有限公司 Corpus construction method and apparatus, electronic device and medium
CN110765765A (en) * 2019-09-16 2020-02-07 平安科技(深圳)有限公司 Contract key clause extraction method, device and storage medium based on artificial intelligence
US10593422B2 (en) 2017-12-01 2020-03-17 International Business Machines Corporation Interaction network inference from vector representation of words
CN111192682A (en) * 2019-12-25 2020-05-22 上海联影智能医疗科技有限公司 Image exercise data processing method, system and storage medium
CN111199154A (en) * 2019-12-20 2020-05-26 重庆邮电大学 Fault-tolerant rough set-based polysemous word expression method, system and medium
CN111460169A (en) * 2020-03-27 2020-07-28 科大讯飞股份有限公司 Semantic expression generation method, device and equipment
CN111583910A (en) * 2019-01-30 2020-08-25 北京猎户星空科技有限公司 Model updating method and device, electronic equipment and storage medium
CN111581952A (en) * 2020-05-20 2020-08-25 长沙理工大学 Large-scale replaceable word bank construction method for natural language information hiding
CN111611809A (en) * 2020-05-26 2020-09-01 西藏大学 Chinese sentence similarity calculation method based on neural network
CN111694961A (en) * 2020-06-23 2020-09-22 上海观安信息技术股份有限公司 Keyword semantic classification method and system for sensitive data leakage detection
US10922486B2 (en) 2019-03-13 2021-02-16 International Business Machines Corporation Parse tree based vectorization for natural language processing
CN113377965A (en) * 2021-06-30 2021-09-10 中国农业银行股份有限公司 Method and related device for perceiving text keywords
US11182665B2 (en) 2016-09-21 2021-11-23 International Business Machines Corporation Recurrent neural network processing pooling operation
CN114154513A (en) * 2022-02-07 2022-03-08 杭州远传新业科技有限公司 Automatic domain semantic web construction method and system
CN115168563A (en) * 2022-09-05 2022-10-11 深圳市华付信息技术有限公司 Airport service guiding method, system and device based on intention recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477566A (en) * 2009-01-19 2009-07-08 腾讯科技(深圳)有限公司 Method and apparatus for placing candidate keyword advertisements
CN101894351A (en) * 2010-08-09 2010-11-24 北京邮电大学 Multi-agent-based personalized tourism multimedia information service system
US20120253792A1 (en) * 2011-03-30 2012-10-04 Nec Laboratories America, Inc. Sentiment Classification Based on Supervised Latent N-Gram Analysis
CN103678282A (en) * 2014-01-07 2014-03-26 苏州思必驰信息科技有限公司 Word segmentation method and device
CN103810999A (en) * 2014-02-27 2014-05-21 清华大学 Linguistic model training method and system based on distributed neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
罗灏 (Luo Hao): "Research on Semantic-Based Similarity Calculation of Science and Technology Projects", China Master's Theses Full-text Database (Information Science and Technology Series) *

Cited By (83)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016180268A1 (en) * 2015-05-13 2016-11-17 阿里巴巴集团控股有限公司 Text aggregation method and device
CN104881400A (en) * 2015-05-19 2015-09-02 上海交通大学 Semantic dependency calculation method based on associative network
CN104881400B (en) * 2015-05-19 2018-01-19 上海交通大学 Semantic dependency calculation method based on associative network
CN106293114A (en) * 2015-06-02 2017-01-04 阿里巴巴集团控股有限公司 Method and device for predicting a user's word to be entered
CN106293114B (en) * 2015-06-02 2019-03-29 阿里巴巴集团控股有限公司 Method and device for predicting a user's word to be entered
CN106372086A (en) * 2015-07-23 2017-02-01 华中师范大学 Word vector acquisition method and apparatus
CN106372086B (en) * 2015-07-23 2019-12-03 华中师范大学 Word vector acquisition method and apparatus
CN105183714A (en) * 2015-08-27 2015-12-23 北京时代焦点国际教育咨询有限责任公司 Sentence similarity calculation method and apparatus
CN106610972A (en) * 2015-10-21 2017-05-03 阿里巴巴集团控股有限公司 Query rewriting method and apparatus
CN108135520B (en) * 2015-10-23 2021-06-04 美国西门子医疗解决公司 Generating natural language representations of psychological content from functional brain images
US10856815B2 (en) 2015-10-23 2020-12-08 Siemens Medical Solutions Usa, Inc. Generating natural language representations of mental content from functional brain images
CN108135520A (en) * 2015-10-23 2018-06-08 美国西门子医疗解决公司 Generating natural language representations of mental content from functional brain images
CN106815252B (en) * 2015-12-01 2020-08-25 阿里巴巴集团控股有限公司 Searching method and device
CN106815252A (en) * 2015-12-01 2017-06-09 阿里巴巴集团控股有限公司 Search method and device
CN106844327A (en) * 2015-12-07 2017-06-13 科大讯飞股份有限公司 Text coding method and system
CN106844327B (en) * 2015-12-07 2020-11-17 科大讯飞股份有限公司 Text coding method and system
CN105488207A (en) * 2015-12-10 2016-04-13 合一网络技术(北京)有限公司 Semantic coding method and apparatus for network resources
CN105677769B (en) * 2015-12-29 2018-01-05 广州神马移动信息科技有限公司 Keyword recommendation method and system based on latent Dirichlet allocation (LDA) model
CN105677769A (en) * 2015-12-29 2016-06-15 广州神马移动信息科技有限公司 Keyword recommendation method and system based on latent Dirichlet allocation (LDA) model
CN107920773A (en) * 2016-01-18 2018-04-17 国立研究开发法人情报通信研究机构 Material evaluation method and material evaluating apparatus
CN107920773B (en) * 2016-01-18 2020-11-17 国立研究开发法人情报通信研究机构 Material evaluation method and material evaluation device
CN105787078B (en) * 2016-03-02 2020-02-14 合一网络技术(北京)有限公司 Multimedia title display method and device
CN105787078A (en) * 2016-03-02 2016-07-20 合网络技术(北京)有限公司 Method and device for displaying multimedia headlines
CN107341152B (en) * 2016-04-28 2020-05-08 创新先进技术有限公司 Parameter input method and device
CN107341152A (en) * 2016-04-28 2017-11-10 阿里巴巴集团控股有限公司 Parameter input method and device
US10831993B2 (en) 2016-05-31 2020-11-10 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for constructing binary feature dictionary
WO2017206492A1 (en) * 2016-05-31 2017-12-07 北京百度网讯科技有限公司 Binary feature dictionary construction method and apparatus
US11182665B2 (en) 2016-09-21 2021-11-23 International Business Machines Corporation Recurrent neural network processing pooling operation
CN107871158A (en) * 2016-09-26 2018-04-03 清华大学 Knowledge graph representation learning method and device incorporating sequential text information
CN106503231B (en) * 2016-10-31 2020-02-04 北京百度网讯科技有限公司 Search method and device based on artificial intelligence
CN106503231A (en) * 2016-10-31 2017-03-15 北京百度网讯科技有限公司 Searching method and device based on artificial intelligence
CN106874643A (en) * 2016-12-27 2017-06-20 中国科学院自动化研究所 Method and system for automatically building a knowledge base based on word vectors to assist diagnosis and treatment
CN106997376A (en) * 2017-02-28 2017-08-01 浙江大学 Question and answer sentence similarity calculation method based on multi-level features
CN106997376B (en) * 2017-02-28 2020-12-08 浙江大学 Question and answer sentence similarity calculation method based on multi-level features
WO2018157703A1 (en) * 2017-03-02 2018-09-07 腾讯科技(深圳)有限公司 Natural language semantic extraction method and device, and computer storage medium
US11113234B2 (en) 2017-03-02 2021-09-07 Tencent Technology (Shenzhen) Company Ltd Semantic extraction method and apparatus for natural language, and computer storage medium
CN107122413B (en) * 2017-03-31 2020-04-10 北京奇艺世纪科技有限公司 Keyword extraction method and device based on graph model
CN107122413A (en) * 2017-03-31 2017-09-01 北京奇艺世纪科技有限公司 Keyword extraction method and device based on graph model
CN107122451A (en) * 2017-04-26 2017-09-01 北京科技大学 Automatic construction method of legal document case classifier
CN107122451B (en) * 2017-04-26 2020-01-21 北京科技大学 Automatic construction method of legal document case classifier
CN107291690B (en) * 2017-05-26 2020-10-27 北京搜狗科技发展有限公司 Punctuation adding method and device and punctuation adding device
CN107291690A (en) * 2017-05-26 2017-10-24 北京搜狗科技发展有限公司 Punctuation adding method and device, and punctuation adding device
CN107291836A (en) * 2017-05-31 2017-10-24 北京大学 Chinese text summary acquisition method based on semantic relevancy model
CN107291836B (en) * 2017-05-31 2020-06-02 北京大学 Chinese text abstract obtaining method based on semantic relevancy model
CN107818080A (en) * 2017-09-22 2018-03-20 新译信息科技(北京)有限公司 Term recognition method and device
CN107729509A (en) * 2017-10-23 2018-02-23 中国电子科技集团公司第二十八研究所 Discourse similarity determination method based on implicit high-dimensional distributed feature representation
CN107729509B (en) * 2017-10-23 2020-07-07 中国电子科技集团公司第二十八研究所 Discourse similarity determination method based on implicit high-dimensional distributed feature representation
US10593422B2 (en) 2017-12-01 2020-03-17 International Business Machines Corporation Interaction network inference from vector representation of words
CN108280061B (en) * 2018-01-17 2021-10-26 北京百度网讯科技有限公司 Text processing method and device based on ambiguous entity words
CN108280061A (en) * 2018-01-17 2018-07-13 北京百度网讯科技有限公司 Text processing method and device based on ambiguous entity words
US11455542B2 (en) 2018-01-17 2022-09-27 Beijing Baidu Netcom Science And Technology Co., Ltd. Text processing method and device based on ambiguous entity words
CN108287916A (en) * 2018-02-11 2018-07-17 北京方正阿帕比技术有限公司 Resource recommendation method
CN109614617B (en) * 2018-06-01 2022-12-16 安徽省泰岳祥升软件有限公司 Word vector generation method and device supporting polarity differentiation and polysemy
CN109614617A (en) * 2018-06-01 2019-04-12 安徽省泰岳祥升软件有限公司 Word vector generation method and device supporting polarity differentiation and polysemy
CN108920455A (en) * 2018-06-13 2018-11-30 北京信息科技大学 Automatic evaluation method for automatically generated Chinese text
CN109558586B (en) * 2018-11-02 2023-04-18 中国科学院自动化研究所 Self-evidence scoring method, device and storage medium for information statements
CN109558586A (en) * 2018-11-02 2019-04-02 中国科学院自动化研究所 Self-evidence scoring method, device and storage medium for information statements
CN109543041A (en) * 2018-11-30 2019-03-29 安徽听见科技有限公司 Language model score generation method and device
CN109614538A (en) * 2018-12-17 2019-04-12 广东工业大学 Extraction method, device and equipment for agricultural product price data
CN109871530A (en) * 2018-12-28 2019-06-11 广州索答信息科技有限公司 Automatic extraction method and storage medium for seed words in the menu domain
CN109871530B (en) * 2018-12-28 2023-10-31 广州索答信息科技有限公司 Automatic extraction method and storage medium for seed words in the menu domain
CN111583910A (en) * 2019-01-30 2020-08-25 北京猎户星空科技有限公司 Model updating method and device, electronic equipment and storage medium
CN111583910B (en) * 2019-01-30 2023-09-26 北京猎户星空科技有限公司 Model updating method and device, electronic equipment and storage medium
CN109918654B (en) * 2019-02-21 2022-12-27 厦门一品威客网络科技股份有限公司 Logo paraphrasing method, device and medium
CN109918654A (en) * 2019-02-21 2019-06-21 北京一品智尚信息科技有限公司 Logo paraphrasing method, device and medium
US10922486B2 (en) 2019-03-13 2021-02-16 International Business Machines Corporation Parse tree based vectorization for natural language processing
CN110532547A (en) * 2019-07-31 2019-12-03 厦门快商通科技股份有限公司 Corpus construction method and apparatus, electronic device and medium
CN110765765B (en) * 2019-09-16 2023-10-20 平安科技(深圳)有限公司 Contract key term extraction method, device and storage medium based on artificial intelligence
CN110765765A (en) * 2019-09-16 2020-02-07 平安科技(深圳)有限公司 Contract key clause extraction method, device and storage medium based on artificial intelligence
CN111199154A (en) * 2019-12-20 2020-05-26 重庆邮电大学 Fault-tolerant rough set-based polysemous word expression method, system and medium
CN111199154B (en) * 2019-12-20 2022-12-27 重庆邮电大学 Fault-tolerant rough set-based polysemous word expression method, system and medium
CN111192682B (en) * 2019-12-25 2024-04-09 上海联影智能医疗科技有限公司 Image exercise data processing method, system and storage medium
CN111192682A (en) * 2019-12-25 2020-05-22 上海联影智能医疗科技有限公司 Image exercise data processing method, system and storage medium
CN111460169A (en) * 2020-03-27 2020-07-28 科大讯飞股份有限公司 Semantic expression generation method, device and equipment
CN111581952A (en) * 2020-05-20 2020-08-25 长沙理工大学 Large-scale replaceable word bank construction method for natural language information hiding
CN111581952B (en) * 2020-05-20 2023-10-03 长沙理工大学 Large-scale replaceable word library construction method for natural language information hiding
CN111611809B (en) * 2020-05-26 2023-04-18 西藏大学 Chinese sentence similarity calculation method based on neural network
CN111611809A (en) * 2020-05-26 2020-09-01 西藏大学 Chinese sentence similarity calculation method based on neural network
CN111694961A (en) * 2020-06-23 2020-09-22 上海观安信息技术股份有限公司 Keyword semantic classification method and system for sensitive data leakage detection
CN113377965A (en) * 2021-06-30 2021-09-10 中国农业银行股份有限公司 Method and related device for perceiving text keywords
CN113377965B (en) * 2021-06-30 2024-02-23 中国农业银行股份有限公司 Method and related device for sensing text keywords
CN114154513A (en) * 2022-02-07 2022-03-08 杭州远传新业科技有限公司 Automatic domain semantic web construction method and system
CN115168563A (en) * 2022-09-05 2022-10-11 深圳市华付信息技术有限公司 Airport service guiding method, system and device based on intention recognition

Similar Documents

Publication Publication Date Title
CN104391963A (en) Method for constructing correlation networks of keywords of natural language texts
CN104375989A (en) Natural language text keyword association network construction system
CN110704598B (en) Statement information extraction method, extraction device and readable storage medium
Spedicato Discrete Time Markov Chains with R.
Neelakantan et al. Efficient non-parametric estimation of multiple embeddings per word in vector space
CN104933183B (en) Query word improvement method fusing word vector model and naive Bayes
CN104199857B (en) Hierarchical tax document classification method based on multi-label classification
CN110458181A (en) Syntax dependency model, training method and analysis method based on width random forest
CN111241294A (en) Graph convolution network relation extraction method based on dependency analysis and key words
CN104834747A (en) Short text classification method based on convolutional neural network
Pilehvar et al. A robust approach to aligning heterogeneous lexical resources
CN110750640A (en) Text data classification method and device based on neural network model and storage medium
CN111581954B (en) Text event extraction method and device based on grammar dependency information
CN105930413A (en) Training method for similarity model parameters, search processing method and corresponding apparatuses
CN104008187A (en) Semi-structured text matching method based on the minimum edit distance
CN116796744A (en) Entity relation extraction method and system based on deep learning
CN106485370B (en) Information prediction method and apparatus
Bortnikova et al. Queries classification using machine learning for implementation in intelligent manufacturing
Avrachenkov et al. The fundamental matrix of singularly perturbed Markov chains
Tiwari et al. Next word prediction using deep learning
Han et al. Automatic business process structure discovery using ordered neurons LSTM: a preliminary study
Kumagai et al. Human-like natural language generation using monte carlo tree search
Omar Performance Evaluation Of Supervised Machine Learning Classifiers For Mapping Natural Language Text To Entity Relationship Models
JP2007072610A (en) Information processing method, apparatus and program
Zhang Comparing the Effect of Smoothing and N-gram Order: Finding the Best Way to Combine the Smoothing and Order of N-gram

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150304