CN107102985A - Improved multi-threaded keyword extraction technique for documents - Google Patents

Improved multi-threaded keyword extraction technique for documents

Info

Publication number
CN107102985A
CN107102985A (application CN201710268836.3A)
Authority
CN
China
Prior art keywords
word
text
vocabulary
threaded
above formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710268836.3A
Other languages
Chinese (zh)
Inventor
金平艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Yonglian Information Technology Co Ltd
Original Assignee
Sichuan Yonglian Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Yonglian Information Technology Co Ltd filed Critical Sichuan Yonglian Information Technology Co Ltd
Priority to CN201710268836.3A priority Critical patent/CN107102985A/en
Publication of CN107102985A publication Critical patent/CN107102985A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

An improved multi-threaded keyword extraction technique for documents. After a Chinese word-segmentation preprocessing step, a relatedness function is constructed from the difference of two co-occurrence degree functions and converted into a multi-threaded network model. An objective function is then constructed to extract conjunctions between themes, and a fork function merges these conjunctions into the multi-threaded network model, yielding a new model graph; finally the top-ranked vocabulary items are extracted as the text's keywords. The method is highly accurate and of practical value: it computes each vocabulary item's contribution to the text's theme, accounts for multi-threaded structure while still distinguishing different features, and extracts document features with a precise algorithm on the first keyword-extraction pass, laying better groundwork for subsequent document keyword extraction and providing a sound theoretical basis for later text-similarity and text-clustering work.

Description

Improved multi-threaded keyword extraction technique for documents
Technical field
The present invention relates to the field of Semantic Web technology, and in particular to an improved multi-threaded keyword extraction technique for documents.
Background technology
Keywords summarize an article's theme, usually as words or phrases, and are the most condensed expression of a text's subject; they let a reader grasp the gist of an article quickly, saving reading time. Document keywords thus help users rapidly locate the documents they need within large collections. However, apart from scientific papers, most documents carry no keywords, in particular the vast number of web pages on the Internet. Faced with massive text data, manual keyword extraction is laborious and subjective, and poor extraction also harms downstream applications. Traditional keyword extraction algorithms generally ignore document-structure features; the loss of this structural information reduces extraction accuracy and in particular fails to surface the vocabulary that truly reflects the text's content. Existing keyword extraction algorithms based on complex networks or graph models simply use word forms as network nodes when building the text network or graph. Although such algorithms preserve the text's structural information to the greatest extent, the absence of semantic annotation means the extracted keywords lack semantic interpretability and may be ambiguous. To improve the state of text retrieval, researchers are actively studying artificial-intelligence and natural-language-processing techniques, and many scholars have proposed machine-based automatic keyword extraction. Automatic keyword extraction is thus a foundation and core technology of automatic text processing and a key technique for the efficiency and accuracy of information retrieval, since keywords express the text's theme. To meet this demand, the present invention provides an improved multi-threaded keyword extraction technique for documents.
The content of the invention
To find words that are not merely high-frequency but contribute strongly to the theme of a multi-threaded document and use them as keywords, to extract topic words from documents automatically, and to remedy the limited precision of conventional keyword extraction methods, the invention provides an improved multi-threaded keyword extraction technique for documents.
In order to solve the above problems, the present invention is achieved by the following technical solutions:
Step 1: Segment the text using a Chinese word-segmentation algorithm;
Step 2: Remove stop words from the text vocabulary according to the stop-word list, obtaining the word set w;
Step 3: Construct the relatedness function RE(w_i, w_j), sort the word set w in descending order, and take the top n words to form a multi-threaded network model M;
Step 4: Construct the objective function and determine the conjunctions LINK(C) between different themes;
Step 5: Construct the fork function and merge the conjunctions into the multi-threaded network model; the resulting model graph is denoted M′.
The present invention has the following advantages:
1. The method achieves higher accuracy than the keyword sets obtained by the traditional term frequency-inverse document frequency method.
2. By mapping lexical semantic relations into the topic-network model graph, it accounts for multi-threaded structure while distinguishing features between themes, so the extracted keywords better match empirical expectations.
3. It provides a sound theoretical basis for subsequent text-similarity and text-clustering techniques.
4. The algorithm has greater practical value.
5. The method precisely computes each feature word's contribution to the text's theme.
6. On the first keyword-extraction pass, the method obtains more accurate document features with a precise algorithm, laying better groundwork for subsequent document keyword extraction.
Brief description of the drawings
Fig. 1: Flow chart of the improved multi-threaded document keyword extraction technique
Fig. 2: Illustration of the n-gram segmentation method
Fig. 3: Flow chart of the Chinese text preprocessing process
Fig. 4: Multi-threaded network model graph M formed from n words
Fig. 5: Multi-threaded network model graph M′
Embodiment
To find non-high-frequency words that contribute strongly to the theme of a multi-threaded document and use them as keywords, to extract topic words from documents automatically, and to address the limited precision of conventional keyword extraction methods, the invention is described in detail with reference to Figs. 1-5. The specific implementation steps are as follows:
Step 1: Segment the text using a Chinese word-segmentation algorithm. The specific segmentation process is as follows:
Step 1.1: Scan the character string to be segmented in full and look up matches against the《Dictionary for word segmentation》; any string found in the dictionary is marked as a word. If the dictionary contains no match, the single character is split off as a word on its own. Repeat until the character string is empty.
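Step 1.1 is a standard greedy dictionary scan. A minimal sketch in Python, assuming a longest-match-first strategy and a toy dictionary (the contents of the actual《Dictionary for word segmentation》are not specified in the source):

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Greedy longest-first dictionary matching, as in Step 1.1.

    Scans the string left to right; at each position it tries the
    longest candidate first and falls back to splitting off a single
    character when no dictionary entry matches.
    """
    words = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Illustrative toy dictionary; a real segmenter would load the full lexicon.
toy_dict = {"keyword", "extract", "ion"}
print(forward_max_match("keywordextraction", toy_dict, max_len=8))
```

In a Chinese setting the same loop runs over Hanzi characters; the fallback branch is exactly the source's "split off the single character as a word" rule.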
Step 1.2: Using probability statistics, expand the sentence to be segmented into a network structure generating the n possible segmentation combinations. Label the sequential nodes of this structure S, M1, M2, M3, M4, M5, E in turn; the structure is shown in Fig. 2.
Step 1.3: Using an information-theoretic method, assign weights to each edge of the network structure above. The specific calculation is as follows:
According to the《Dictionary for word segmentation》, the i-th path contains n_i words (matched dictionary words plus unmatched single characters); the word counts of the n paths thus form the set (n_1, n_2, ..., n_n).
Take min(·) = min(n_1, n_2, ..., n_n).
For the remaining (n − m) paths, solve for the weight of each adjacent pair on the path.
In the statistics corpus, compute the information content X(C_i) of each word, then the co-occurrence information content X(C_i, C_{i+1}) of adjacent words on a path, using the following formulas:
X(C_i) = |x(C_i)_1 − x(C_i)_2|
where x(C_i)_1 is the information content of word C_i in the text corpus and x(C_i)_2 is the information content of the texts containing C_i.
x(C_i)_1 = −p(C_i)_1 ln p(C_i)_1
where p(C_i)_1 is the probability of C_i in the text corpus and n is the number of corpus texts containing C_i.
x(C_i)_2 = −p(C_i)_2 ln p(C_i)_2
where p(C_i)_2 is the probability that a text contains C_i and N is the total number of texts in the statistics corpus.
Similarly, X(C_i, C_{i+1}) = |x(C_i, C_{i+1})_1 − x(C_i, C_{i+1})_2|
where x(C_i, C_{i+1})_1 is the co-occurrence information content of the pair (C_i, C_{i+1}) in the text corpus and x(C_i, C_{i+1})_2 is the information content of the texts in which the adjacent pair (C_i, C_{i+1}) co-occurs.
Similarly, x(C_i, C_{i+1})_1 = −p(C_i, C_{i+1})_1 ln p(C_i, C_{i+1})_1
where p(C_i, C_{i+1})_1 is the co-occurrence probability of (C_i, C_{i+1}) in the text corpus and m is the number of texts in which the pair co-occurs.
x(C_i, C_{i+1})_2 = −p(C_i, C_{i+1})_2 ln p(C_i, C_{i+1})_2
where p(C_i, C_{i+1})_2 is the probability that a text contains the adjacent pair (C_i, C_{i+1}).
In summary, the weight of each adjacent pair on a path is
w(C_i, C_{i+1}) = X(C_i) + X(C_{i+1}) − 2X(C_i, C_{i+1})
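The information-content and edge-weight formulas above translate directly into code. A minimal sketch, assuming the four probabilities have already been estimated from the statistics corpus (the function names are illustrative, not from the source):

```python
import math

def info_amount(p_corpus, p_docs):
    """X(C) = |x(C)_1 - x(C)_2|, with x = -p ln p for each probability.

    p_corpus: probability of the word in the corpus (the x(C)_1 term);
    p_docs:   probability that a text contains the word (the x(C)_2 term).
    """
    x1 = -p_corpus * math.log(p_corpus)
    x2 = -p_docs * math.log(p_docs)
    return abs(x1 - x2)

def edge_weight(p_i, pd_i, p_j, pd_j, p_ij, pd_ij):
    """w(C_i, C_{i+1}) = X(C_i) + X(C_{i+1}) - 2 X(C_i, C_{i+1})."""
    return (info_amount(p_i, pd_i)
            + info_amount(p_j, pd_j)
            - 2 * info_amount(p_ij, pd_ij))
```

Note that when the corpus probability and the containing-text probability coincide, X(C) vanishes, so the edge weight is driven by how differently the pair behaves from its members.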
Step 1.4: Find the path of maximum weight; it gives the segmentation result of the sentence. The specific calculation is as follows:
There are n paths of differing lengths; let the set of path lengths be (L_1, L_2, ..., L_n).
Suppose that taking the paths with the fewest words eliminates m paths, m < n, leaving (n − m) paths; let their path-length set be (L_{S_1}, L_{S_2}, ..., L_{S_{n−m}}).
The weight of each remaining path is then the sum of the weights of its edges (each computed one by one as in Step 1.3) divided by L_{S_j}, the length of path S_j.
The path of maximum weight is selected.
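A sketch of Step 1.4's path selection, assuming (as the normalisation by L_{S_j} in the description suggests) that a path's score is its summed edge weights divided by its length; the toy weight function below is illustrative only:

```python
def best_segmentation(paths, edge_weight_fn):
    """Pick the candidate segmentation of maximum weight (Step 1.4).

    `paths` is a list of candidate segmentations (lists of words).
    The per-path score is assumed to be the sum of adjacent-pair edge
    weights divided by the path length L_{S_j}.
    """
    def score(path):
        total = sum(edge_weight_fn(a, b) for a, b in zip(path, path[1:]))
        return total / len(path)
    return max(paths, key=score)

# Toy weight: rewards keeping longer words together.
toy_w = lambda a, b: len(a) * len(b)
candidates = [["ab", "cd"], ["a", "b", "cd"], ["a", "b", "c", "d"]]
print(best_segmentation(candidates, toy_w))
```

With a real `edge_weight_fn` built from the corpus statistics of Step 1.3, the winner is the segmentation whose adjacent words carry the most mutual information per node.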
Step 2: Remove stop words from the text vocabulary according to the stop-word list, obtaining the word set w. Specifically:
Stop words are words that occur frequently in the text but contribute little to marking its content. Removing them means comparing each candidate term against the stop-word list and deleting the term on a match.
Combining segmentation and stop-word removal, the Chinese text preprocessing flow is shown in Fig. 3.
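Step 2 reduces to a set-membership filter. A minimal sketch with a hypothetical stop-word list (the source does not specify which list is used):

```python
def remove_stop_words(words, stop_list):
    """Step 2: drop every token that appears in the stop-word list."""
    return [w for w in words if w not in stop_list]

stop = {"the", "of", "is"}
print(remove_stop_words(["extraction", "of", "the", "keyword", "is", "done"], stop))
```

Using a set (rather than a list) for `stop_list` keeps each membership test O(1), which matters when filtering long documents.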
Step 3: Construct the relatedness function RE(w_i, w_j), sort the word set w in descending order, and take the top n words to form the multi-threaded network model M. The specific calculation is as follows:
Relatedness function RE(w_i, w_j):
where ε is a correction factor and d(w_i, w_j) is the difference between the two measures for the pair (w_i, w_j):
d(w_i, w_j) = |R_1(w_i, w_j) − R_2(w_i, w_j)|
Here R_1(w_i, w_j) and R_2(w_i, w_j) are both inter-vocabulary relatedness values; g(w_i/w_j) is the co-occurrence degree of w_i relative to w_j and g(w_j/w_i) that of w_j relative to w_i; n(w_i, w_j) is the number of sentences in which the pair (w_i, w_j) co-occurs, and n(w_i), n(w_j) are the numbers of occurrences of w_i and w_j in the document.
The top n words are extracted as candidate keywords of the text, i.e., the top n by descending RE(w_i, w_j) value.
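The exact RE(w_i, w_j) formula does not survive in this text, but its ingredients do: sentence-level pair counts n(w_i, w_j), per-word counts, and the conditional co-occurrence degrees g. The sketch below counts those quantities and, purely as a stand-in for the missing formula, averages the two conditional degrees into a symmetric score:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_degrees(sentences):
    """Count the quantities RE(w_i, w_j) is built from:
    per-sentence word occurrences and sentence-level pair co-occurrences."""
    word_count = Counter()
    pair_count = Counter()
    for sent in sentences:
        seen = set(sent)                 # count each word once per sentence
        word_count.update(seen)
        for a, b in combinations(sorted(seen), 2):
            pair_count[(a, b)] += 1
    return word_count, pair_count

def relatedness(wi, wj, word_count, pair_count):
    """Symmetric stand-in for RE: averages g(w_i/w_j) = n(w_i, w_j)/n(w_j)
    and g(w_j/w_i) = n(w_i, w_j)/n(w_i). The patent's actual formula also
    involves a correction factor ε and the difference d(w_i, w_j)."""
    n_ij = pair_count[tuple(sorted((wi, wj)))]
    g_ij = n_ij / word_count[wj]
    g_ji = n_ij / word_count[wi]
    return (g_ij + g_ji) / 2
```

Ranking all pairs by this score and keeping the top-n vocabulary reproduces the shape of Step 3, even though the weighting details differ from the unrecoverable original.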
Step 4: Construct the objective function and determine the conjunctions LINK(C) between different themes. The specific calculation is as follows:
Objective function:
where ρ is a correction factor and T_j is the theme influence factor.
Here j indexes the j-th theme, g is the number of themes, and h, a variable whose value differs from theme to theme, is the number of vocabulary items in a theme; N is the number of keyword vocabulary items in the j-th theme; the formula also involves the number of times conjunction C occurs in theme j and the similarity between conjunction C and the theme's vocabulary, which can be computed by conventional methods. α and β are the respective influence coefficients, generally with β > α and α + β = 1; their optimal values can be found by experiment. The theme term expresses the degree of influence of theme Z_j on the document.
According to the objective-function value, select the top m conjunctions LINK(C) in descending order.
Step 5: Construct the fork function, merge the conjunctions into the multi-threaded network model, and denote the resulting model graph M′. The calculation is as follows:
Fork function:
where G(C_i′/w_j′) is the co-occurrence degree of C_i′ relative to w_j′ and G(w_j′/C_i′) that of w_j′ relative to C_i′; M_f is the common-parent-node density of the two vocabulary items' ontology concepts, S_f their common-parent-node depth, n_f the maximum node density of the tree containing the corresponding parent node in the sememe network, and d_f the degree of that tree.
Similarly, n(C_i′, w_j′) is the number of sentences in which conjunction C_i′ and word-set item w_j′ co-occur; N(w_j′) is the number of occurrences of w_j′ in the document and N(C_i′) that of C_i′, with N(C_i′) ≠ N(w_j′) and n(C_i′, w_j′) = n(w_j′, C_i′).
According to the fork-function value, take n − 1 vocabulary pairs in descending order, yielding the document's n keywords.
The pseudo-code for the improved multi-threaded document keyword extraction technique is as follows:
Input: a document
Output: the core keywords extracted from the document.
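The pseudo-code body itself does not survive in this text, so the following is only an illustrative end-to-end skeleton: whitespace splitting stands in for the Step 1 segmenter, plain frequency ranking stands in for Step 3's RE ordering, and Steps 4-5 (theme conjunctions and the fork function) are omitted:

```python
def extract_keywords(document, stop_list, n=10):
    """Illustrative pipeline skeleton for the five steps.

    document: list of sentence strings; stop_list: set of stop words;
    returns the top-n tokens, ties broken alphabetically.
    """
    words = []
    for sentence in document:
        words.extend(sentence.split())          # stand-in for Step 1 segmentation
    words = [w for w in words if w not in stop_list]  # Step 2: stop-word removal
    counts = {}
    for w in words:                             # stand-in for Step 3's RE ranking
        counts[w] = counts.get(w, 0) + 1
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:n]

doc = ["keyword extraction from text", "text keyword ranking"]
print(extract_keywords(doc, {"from"}, n=2))
```

Swapping the frequency ranking for the relatedness, objective, and fork functions of Steps 3-5 would recover the full method.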

Claims (3)

1. An improved multi-threaded keyword extraction technique for documents (the invention relates to the field of Semantic Web technology, and in particular to multi-threaded keyword extraction in documents), characterised by comprising the following steps:
Step 1: Segment the text using a Chinese word-segmentation algorithm; the segmentation process is as follows:
Step 1.1: Scan the character string to be segmented in full and look up matches against the segmentation dictionary; any string found in the dictionary is marked as a word; if the dictionary contains no match, the single character is split off as a word; repeat until the character string is empty.
Step 1.2: Using probability statistics, expand the sentence to be segmented into a network structure generating the possible segmentation combinations; label the sequential nodes of this structure in turn; the structure is shown in Fig. 2.
Step 1.3: Using an information-theoretic method, assign weights to each edge of the network structure; the specific calculation is as follows:
According to the segmentation dictionary, the i-th path contains n_i words (matched dictionary words plus unmatched single characters); the word counts of the n paths form the set (n_1, n_2, ..., n_n), and the minimum is taken.
For the remaining (n − m) paths, solve for the weight of each adjacent pair on the path.
In the statistics corpus, compute the information content of each word and the co-occurrence information content of adjacent words:
X(C_i) = |x(C_i)_1 − x(C_i)_2|, where x(C_i)_1 = −p(C_i)_1 ln p(C_i)_1 is the information content of C_i in the text corpus (p(C_i)_1 its corpus probability, n the number of corpus texts containing C_i), and x(C_i)_2 = −p(C_i)_2 ln p(C_i)_2 is the information content of the texts containing C_i (p(C_i)_2 the probability that a text contains C_i, N the total number of corpus texts).
Similarly X(C_i, C_{i+1}) = |x(C_i, C_{i+1})_1 − x(C_i, C_{i+1})_2|, with x(C_i, C_{i+1})_1 = −p(C_i, C_{i+1})_1 ln p(C_i, C_{i+1})_1 (p(C_i, C_{i+1})_1 the co-occurrence probability of the pair in the text corpus, m the number of texts in which the pair co-occurs) and x(C_i, C_{i+1})_2 = −p(C_i, C_{i+1})_2 ln p(C_i, C_{i+1})_2 (p(C_i, C_{i+1})_2 the probability that a text contains the adjacent pair).
In summary, the weight of each adjacent pair is w(C_i, C_{i+1}) = X(C_i) + X(C_{i+1}) − 2X(C_i, C_{i+1}).
Step 1.4: Find the path of maximum weight; it gives the segmentation result. There are n paths of differing lengths, with length set (L_1, L_2, ..., L_n); suppose that taking the paths with the fewest words eliminates m paths, m < n, leaving (n − m) paths with length set (L_{S_1}, ..., L_{S_{n−m}}); the weight of each remaining path is the sum of its edge weights (computed per Step 1.3) divided by L_{S_j}, the length of path S_j; select the path of maximum weight.
Step 2: Remove stop words from the text vocabulary according to the stop-word list, obtaining the word set. Specifically: stop words occur frequently in the text but contribute little to marking its content; removing them means comparing each candidate term against the stop-word list and deleting the term on a match. Combining segmentation and stop-word removal, the Chinese text preprocessing flow is shown in Fig. 3.
Step 3: Construct the relatedness function RE(w_i, w_j), sort the word set w in descending order, and take the top n words to form a multi-threaded network model M;
Step 4: Construct the objective function and determine the conjunctions LINK(C) between different themes;
Step 5: Construct the fork function and merge the conjunctions into the multi-threaded network model; the model graph is denoted M′; the calculation is as follows:
Fork function:
where G(C_i′/w_j′) is the co-occurrence degree of C_i′ relative to w_j′ and G(w_j′/C_i′) that of w_j′ relative to C_i′; M_f is the common-parent-node density of the two vocabulary items' ontology concepts, S_f their common-parent-node depth, n_f the maximum node density of the tree containing the corresponding parent node in the sememe network, and d_f the degree of that tree.
Similarly, n(C_i′, w_j′) is the number of sentences in which conjunction C_i′ and word-set item w_j′ co-occur; N(w_j′) and N(C_i′) are their occurrence counts in the document, with N(C_i′) ≠ N(w_j′) and n(C_i′, w_j′) = n(w_j′, C_i′).
According to the fork-function value, take n − 1 vocabulary pairs in descending order, yielding the document's n keywords.
2. The improved multi-threaded keyword extraction technique according to claim 1, characterised in that the specific calculation in Step 3 above is as follows:
Step 3: Construct the relatedness function RE(w_i, w_j), sort the word set w in descending order, and take the top n words to form a multi-threaded network model M; the calculation is as follows:
Relatedness function RE(w_i, w_j):
where ε is a correction factor and d(w_i, w_j) = |R_1(w_i, w_j) − R_2(w_i, w_j)| is the difference for the pair (w_i, w_j); R_1 and R_2 are both inter-vocabulary relatedness values; g(w_i/w_j) and g(w_j/w_i) are the co-occurrence degrees of w_i relative to w_j and of w_j relative to w_i; n(w_i, w_j) is the number of sentences in which the pair co-occurs, and n(w_i), n(w_j) are the words' occurrence counts in the document.
The top n words are extracted as the text's keywords, i.e., the top n by descending RE(w_i, w_j) value.
3. The improved multi-threaded keyword extraction technique according to claim 1, characterised in that the specific calculation in Step 4 above is as follows:
Step 4: Construct the objective function and determine the conjunctions between different themes; the calculation is as follows:
Objective function:
where ρ is a correction factor and T_j is the theme influence factor; j indexes the j-th theme, g is the number of themes, and h, a per-theme variable, is the number of vocabulary items in a theme; N is the number of keyword vocabulary items in the j-th theme; the formula also involves the number of times conjunction C occurs in theme j and the similarity between conjunction C and the theme's vocabulary, computable by conventional methods; α and β are the respective influence coefficients, generally with β > α and α + β = 1, their optimal values found by experiment; the theme term expresses theme Z_j's degree of influence on the document.
According to the objective-function value, select the top m conjunctions in descending order.
CN201710268836.3A 2017-04-23 2017-04-23 Multi-threaded keyword extraction techniques in improved document Pending CN107102985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710268836.3A CN107102985A (en) 2017-04-23 2017-04-23 Multi-threaded keyword extraction techniques in improved document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710268836.3A CN107102985A (en) 2017-04-23 2017-04-23 Multi-threaded keyword extraction techniques in improved document

Publications (1)

Publication Number Publication Date
CN107102985A true CN107102985A (en) 2017-08-29

Family

ID=59657043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710268836.3A Pending CN107102985A (en) 2017-04-23 2017-04-23 Multi-threaded keyword extraction techniques in improved document

Country Status (1)

Country Link
CN (1) CN107102985A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920660A (en) * 2018-07-04 2018-11-30 中国银行股份有限公司 Keyword weight acquisition methods, device, electronic equipment and readable storage medium storing program for executing
CN109522392A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Voice-based search method, server and computer readable storage medium
WO2019165678A1 (en) * 2018-03-02 2019-09-06 广东技术师范学院 Keyword extraction method for mooc
CN110263345A (en) * 2019-06-26 2019-09-20 北京百度网讯科技有限公司 Keyword extracting method, device and storage medium
CN110348133A (en) * 2019-07-15 2019-10-18 西南交通大学 A kind of bullet train three-dimensional objects structure technology effect figure building system and method
CN110717047A (en) * 2019-10-22 2020-01-21 湖南科技大学 Web service classification method based on graph convolution neural network
CN112597776A (en) * 2021-03-08 2021-04-02 中译语通科技股份有限公司 Keyword extraction method and system
CN115713085A (en) * 2022-10-31 2023-02-24 北京市农林科学院 Document theme content analysis method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243065A (en) * 2014-06-24 2016-01-13 中兴通讯股份有限公司 Material information output method and system
CN105843795A (en) * 2016-03-21 2016-08-10 华南理工大学 Topic model based document keyword extraction method and system
CN106570120A (en) * 2016-11-02 2017-04-19 四川用联信息技术有限公司 Process for realizing searching engine optimization through improved keyword optimization

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243065A (en) * 2014-06-24 2016-01-13 中兴通讯股份有限公司 Material information output method and system
CN105843795A (en) * 2016-03-21 2016-08-10 华南理工大学 Topic model based document keyword extraction method and system
CN106570120A (en) * 2016-11-02 2017-04-19 四川用联信息技术有限公司 Process for realizing searching engine optimization through improved keyword optimization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
宁建飞等: "融合Word2vec与TextRank的关键词抽取研究", 《现代图书情报技术》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019165678A1 (en) * 2018-03-02 2019-09-06 广东技术师范学院 Keyword extraction method for mooc
CN108920660B (en) * 2018-07-04 2020-11-20 中国银行股份有限公司 Keyword weight obtaining method and device, electronic equipment and readable storage medium
CN108920660A (en) * 2018-07-04 2018-11-30 中国银行股份有限公司 Keyword weight acquisition methods, device, electronic equipment and readable storage medium storing program for executing
CN109522392A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Voice-based search method, server and computer readable storage medium
CN110263345A (en) * 2019-06-26 2019-09-20 北京百度网讯科技有限公司 Keyword extracting method, device and storage medium
CN110263345B (en) * 2019-06-26 2023-09-05 北京百度网讯科技有限公司 Keyword extraction method, keyword extraction device and storage medium
CN110348133A (en) * 2019-07-15 2019-10-18 西南交通大学 A kind of bullet train three-dimensional objects structure technology effect figure building system and method
CN110348133B (en) * 2019-07-15 2022-08-19 西南交通大学 System and method for constructing high-speed train three-dimensional product structure technical effect diagram
CN110717047A (en) * 2019-10-22 2020-01-21 湖南科技大学 Web service classification method based on graph convolution neural network
CN110717047B (en) * 2019-10-22 2022-06-28 湖南科技大学 Web service classification method based on graph convolution neural network
CN112597776A (en) * 2021-03-08 2021-04-02 中译语通科技股份有限公司 Keyword extraction method and system
CN115713085A (en) * 2022-10-31 2023-02-24 北京市农林科学院 Document theme content analysis method and device
CN115713085B (en) * 2022-10-31 2023-11-07 北京市农林科学院 Method and device for analyzing literature topic content

Similar Documents

Publication Publication Date Title
CN107102985A (en) Multi-threaded keyword extraction techniques in improved document
CN106598940A (en) Text similarity solution algorithm based on global optimization of keyword quality
CN106970910B (en) Keyword extraction method and device based on graph model
CN108681574B (en) Text abstract-based non-fact question-answer selection method and system
Wen et al. Research on keyword extraction based on word2vec weighted textrank
CN108763213A (en) Theme feature text key word extracting method
CN106610951A (en) Improved text similarity solving algorithm based on semantic analysis
CN111680488B (en) Cross-language entity alignment method based on knowledge graph multi-view information
CN106776562A (en) A kind of keyword extracting method and extraction system
CN106611041A (en) New text similarity solution method
CN106570112A (en) Improved ant colony algorithm-based text clustering realization method
CN106598941A (en) Algorithm for globally optimizing quality of text keywords
CN110362678A (en) A kind of method and apparatus automatically extracting Chinese text keyword
CN106528621A (en) Improved density text clustering algorithm
CN111625622B (en) Domain ontology construction method and device, electronic equipment and storage medium
CN106610954A (en) Text feature word extraction method based on statistics
CN106610952A (en) Mixed text feature word extraction method
CN106610949A (en) Text feature extraction method based on semantic analysis
CN107092595A (en) New keyword extraction techniques
CN106570120A (en) Process for realizing searching engine optimization through improved keyword optimization
CN113268606A (en) Knowledge graph construction method and device
CN106610953A (en) Method for solving text similarity based on Gini index
CN107038155A (en) The extracting method of text feature is realized based on improved small-world network model
CN106528726A (en) Keyword optimization-based search engine optimization realization technology
CN106776678A (en) Search engine optimization technology is realized in new keyword optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170829