CN107102985A - Improved multi-threaded keyword extraction technique for documents - Google Patents
- Publication number
- CN107102985A (application CN201710268836.3A)
- Authority
- CN
- China
- Prior art keywords
- word
- text
- vocabulary
- threaded
- above formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
An improved multi-threaded keyword extraction technique for documents. After a Chinese word-segmentation preprocessing stage, a degree-of-correlation function is constructed from the difference of two co-occurrence-degree functions and used to build a multi-threaded network model. An objective function is then constructed to extract the conjunctions between themes, and a fork function merges these conjunctions into the multi-threaded network model, yielding a new model graph; finally, the top-ranked words are extracted as the text's keywords. The method is highly accurate and of practical value: it computes each word's contribution to the text's main ideas, accounts for the multi-threaded character of the document while distinguishing the features of different themes, and extracts document features with a precise algorithm on the first pass, laying groundwork for subsequent document keyword extraction and providing a sound theoretical basis for later text-similarity and text-clustering work.
Description
Technical field
The present invention relates to the field of Semantic Web technology, and in particular to an improved multi-threaded keyword extraction technique for documents.
Background technology
Keywords summarize an article's theme, usually in the form of words or phrases; they are the most condensed expression of a text's subject and let a reader grasp the gist of an article in a short time, saving the reader's time. Document keywords therefore help users rapidly locate the documents they need, or related ones, in large document collections. But apart from scientific papers, which do include keywords, most documents carry none, and in particular the vast number of web pages on the Internet do not. Faced with massive volumes of text, manual keyword extraction is laborious, time-consuming, and subjective, and poorly chosen keywords can harm downstream applications. Traditional keyword-extraction algorithms generally ignore document-structure features; the loss of this important structural information reduces extraction accuracy to some extent, and in particular such methods fail to extract the words that truly reflect the text's content. Existing algorithms based on complex networks or graph models simply use word forms as network nodes when building the text network or graph. Although this preserves the text's structural information to the greatest extent, the absence of semantic annotation leaves the extracted keywords without a semantic interpretation and can produce ambiguity. To improve this state of text retrieval, researchers have actively studied artificial-intelligence and natural-language-processing techniques, and many scholars have proposed methods for extracting keywords automatically by machine. Automatic keyword extraction is thus the foundation and core technology of automatic text processing, and a key technology for the efficiency and accuracy of information retrieval; keywords express the text's subject. To meet this demand, the present invention provides an improved multi-threaded keyword extraction technique for documents.
Content of the invention
To identify non-high-frequency words that contribute strongly to a multi-threaded document's themes and use them as keywords, to realize automatic extraction of theme words from documents, and to remedy the low precision of conventional keyword-extraction methods, the present invention provides an improved multi-threaded keyword extraction technique for documents.
To solve the above problems, the present invention is realized through the following technical scheme:
Step 1: Segment the text into words using a Chinese word-segmentation algorithm.
Step 2: Remove stop words from the text vocabulary according to a stop-word table, obtaining a word set w.
Step 3: Construct a degree-of-correlation function RE(wi, wj), sort the word set w in descending order, and take the top n words to form a multi-threaded network model M.
Step 4: Construct an objective function and determine the conjunctions LINK(C) between different themes.
Step 5: Construct a fork function and merge the conjunctions into the multi-threaded network model; denote the resulting model graph M'.
The present invention has the following advantages:
1. The method yields a more accurate text-keyword set than the traditional term frequency-inverse document frequency method.
2. By mapping lexical-semantic relations onto the theme network model graph, it both accounts for the multi-threaded character of the document and distinguishes the features of different themes, so the extracted keywords better match empirical expectations.
3. It provides a sound theoretical basis for subsequent text-similarity and text-clustering techniques.
4. The algorithm has greater practical value.
5. The method accurately computes each feature word's contribution to the text's main ideas.
6. On the first keyword-extraction pass, the method obtains more accurate document features with a precise algorithm, laying better groundwork for subsequent document keyword extraction.
Brief description of the drawings
Fig. 1: Flow chart of the improved multi-threaded keyword extraction technique
Fig. 2: Illustration of the n-gram segmentation method
Fig. 3: Flow chart of the Chinese-text preprocessing process
Fig. 4: Multi-threaded network model graph M formed by n words
Fig. 5: Multi-threaded network model graph M'
Embodiment
To identify non-high-frequency words that contribute strongly to a multi-threaded document's themes and use them as keywords, realizing automatic extraction of theme words from documents and remedying the low precision of conventional keyword-extraction methods, the present invention is described in detail with reference to Figs. 1-5. The specific implementation steps are as follows:
Step 1: Segment the text into words using a Chinese word-segmentation algorithm. The specific segmentation process is as follows:
Step 1.1: Following the segmentation dictionary, find the words in the sentence to be segmented that match dictionary entries: scan the character string to be segmented completely, looking up matches in the system dictionary; every word found in the dictionary is marked. If no match exists in the dictionary, the single character is split off as a word. Repeat until the character string is empty.
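Step 1.1 can be sketched as forward maximum matching over a dictionary; the tiny dictionary and the `max_word_len` cutoff below are illustrative assumptions, not taken from the patent:

```python
# Minimal sketch of the Step-1.1 dictionary scan (forward maximum matching).
# The dictionary here is a hypothetical stand-in for the patent's
# "Dictionary for word segmentation".
def dict_segment(sentence, dictionary, max_word_len=4):
    words = []
    i = 0
    while i < len(sentence):  # until the character string is empty
        # try the longest candidate first, fall back to a single character
        for j in range(min(len(sentence), i + max_word_len), i, -1):
            if sentence[i:j] in dictionary or j == i + 1:
                words.append(sentence[i:j])
                i = j
                break
    return words

DICT = {"自然", "语言", "自然语言", "处理"}
print(dict_segment("自然语言处理", DICT))  # → ['自然语言', '处理']
```

Characters absent from the dictionary are emitted as single-character words, matching the fallback described above.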
Step 1.2: Based on probability statistics, split the sentence to be segmented into a network (lattice) structure, producing the n possible sentence substructures. The sequential nodes of this structure are denoted in order S M1 M2 M3 M4 M5 E; its structure is shown in Fig. 2.
Step 1.3: Based on information theory, assign a weight to each edge of the above network structure. The specific computation is as follows:
From the segmentation dictionary, each path contains some matched dictionary words and unmatched single characters; let the i-th path contain ni words, so the word counts of the n paths form the set (n1, n2, ..., nn).
Compute min(.) = min(n1, n2, ..., nn) and eliminate the paths that do not attain it. For the remaining (n - m) paths, solve for the weight of every adjacent edge.
In a statistical corpus, compute the information content X(Ci) of each word, then the co-occurrence information content X(Ci, Ci+1) of adjacent words on a path, using the following formulas:
X(Ci) = |x(Ci)1 - x(Ci)2|
where x(Ci)1 is the information content of the word Ci in the text corpus and x(Ci)2 is the information content of the texts containing Ci.
x(Ci)1 = -p(Ci)1 ln p(Ci)1
where p(Ci)1 is the probability of Ci in the text corpus and n is the number of corpus texts containing Ci.
x(Ci)2 = -p(Ci)2 ln p(Ci)2
where p(Ci)2 is the probability of a text containing Ci, i.e. p(Ci)2 = n/N, and N is the total number of texts in the statistical corpus.
Similarly, X(Ci, Ci+1) = |x(Ci, Ci+1)1 - x(Ci, Ci+1)2|
where x(Ci, Ci+1)1 is the co-occurrence information content of the pair (Ci, Ci+1) in the text corpus and x(Ci, Ci+1)2 is the information content of the texts in which the adjacent words (Ci, Ci+1) co-occur.
Similarly, x(Ci, Ci+1)1 = -p(Ci, Ci+1)1 ln p(Ci, Ci+1)1
where p(Ci, Ci+1)1 is the co-occurrence probability of (Ci, Ci+1) in the text corpus and m is the number of texts in which (Ci, Ci+1) co-occur.
x(Ci, Ci+1)2 = -p(Ci, Ci+1)2 ln p(Ci, Ci+1)2
where p(Ci, Ci+1)2 is the probability of a text in which the adjacent words (Ci, Ci+1) co-occur.
Summing up, the weight of every adjacent edge is
w(Ci, Ci+1) = X(Ci) + X(Ci+1) - 2X(Ci, Ci+1)
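The Step-1.3 quantities can be sketched as follows; the helper names and the reading p(Ci)2 = n/N (document frequency over corpus size) are assumptions drawn from the definitions above:

```python
# Sketch of the Step-1.3 edge weights. Assumes:
#   p1 = relative frequency of the word (or pair) in the corpus,
#   p2 = fraction of corpus texts containing the word (or pair).
import math

def info(p):
    # x = -p ln p, with info(0) taken as 0
    return 0.0 if p == 0 else -p * math.log(p)

def X_word(freq_p, n_docs_with, N_docs):
    # X(Ci) = |x(Ci)1 - x(Ci)2|
    return abs(info(freq_p) - info(n_docs_with / N_docs))

def X_pair(cooc_p, m_docs_with_pair, N_docs):
    # X(Ci, Ci+1) = |x(Ci, Ci+1)1 - x(Ci, Ci+1)2|
    return abs(info(cooc_p) - info(m_docs_with_pair / N_docs))

def edge_weight(Xi, Xj, Xij):
    # w(Ci, Ci+1) = X(Ci) + X(Ci+1) - 2 X(Ci, Ci+1)
    return Xi + Xj - 2 * Xij
```

The `info(0) = 0` convention is a common completion of the -p ln p term, added here so that unseen words do not raise a math error.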
Step 1.4: Find the path of maximum weight; it is the segmentation result for the sentence to be segmented. The specific computation is as follows:
There are n paths, each of a different length; suppose the path-length set is (L1, L2, ..., Ln).
Suppose that taking the minimum word count over paths eliminates m paths, m < n, leaving (n - m) paths, whose path-length set is denoted (L'1, L'2, ..., L'(n-m)).
The weight of each remaining path Sj is then the sum of its edge weights:
W(Sj) = w1 + w2 + ... + w(L'Sj - 1)
where w1, w2, ..., w(L'Sj - 1) are the weights of the 1st through (L'Sj - 1)-th edges of the path, each computed as in Step 1.3, and L'Sj is the length of path Sj.
The path of maximum weight is the one maximizing W(Sj) over the remaining (n - m) paths.
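Under these definitions, Step 1.4 reduces to summing the edge weights of each candidate path and keeping the maximum; a sketch with hypothetical candidate data:

```python
# Step-1.4 sketch: among the remaining candidate segmentation paths, sum the
# edge weights of each path and keep the path of maximum total weight.
def best_path(paths):
    # paths: list of (word_list, edge_weight_list) pairs
    return max(paths, key=lambda p: sum(p[1]))[0]

# two hypothetical candidates for the same sentence
candidates = [(["AB", "CD"], [0.9]),
              (["A", "B", "CD"], [0.2, 0.3])]
print(best_path(candidates))  # → ['AB', 'CD']
```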
Step 2: Remove stop words from the text vocabulary according to a stop-word table, obtaining the word set w. Specifically:
Stop words are words that occur frequently in the text but contribute little to marking its content. Removing them consists of comparing each feature term with the entries of the stop-word table and deleting the term if it matches.
Combining segmentation and stop-word deletion, the Chinese-text preprocessing flow is shown in Fig. 3.
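Step 2 amounts to set-membership filtering; a sketch with a tiny hypothetical stop-word table:

```python
# Step-2 sketch: drop every token that appears in the stop-word table.
STOPWORDS = {"的", "了", "是", "在"}  # a tiny hypothetical deactivation table

def remove_stopwords(tokens, stopwords=STOPWORDS):
    return [t for t in tokens if t not in stopwords]

print(remove_stopwords(["改进", "的", "关键词", "提取"]))  # → ['改进', '关键词', '提取']
```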
Step 3: Construct the degree-of-correlation function RE(wi, wj), sort the above word set w in descending order, and take the top n words to form a multi-threaded network model M. The specific computation is as follows:
The degree-of-correlation function RE(wi, wj) is built, with a correction factor ε, from the difference d(wi, wj) between the two words:
d(wi, wj) = |R1(wi, wj) - R2(wi, wj)|
where R1(wi, wj) and R2(wi, wj) are both inter-word relevance degrees, g(wi/wj) is the co-occurrence degree of wi relative to wj, g(wj/wi) is the co-occurrence degree of wj relative to wi, n(wi, wj) is the number of times the pair (wi, wj) occurs within one sentence, n(wi) is the number of times wi occurs in the document, and n(wj) is the number of times wj occurs in the document.
Extract the top n words as the text's keywords, i.e. extract the n words with the largest RE(wi, wj) values.
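The closed form of RE(wi, wj) is not reproduced in this text, so the sketch below assumes, following the abstract, that RE grows as the difference of the two conditional co-occurrence degrees shrinks; the 1/(d + ε) form, the co-occurrence-degree reading g(wi/wj) = n(wi, wj)/n(wj), and all helper names are assumptions, not the patent's exact formula:

```python
# Hedged sketch of Step 3: rank words by an assumed degree-of-correlation
# built from the difference of two conditional co-occurrence degrees.
def cooccurrence_degree(n_ij, n_j):
    # assumed reading: g(wi / wj) = n(wi, wj) / n(wj)
    return n_ij / n_j if n_j else 0.0

def relevance(n_ij, n_i, n_j, eps=1e-6):
    # assumed form: small difference between the two degrees -> high RE
    d = abs(cooccurrence_degree(n_ij, n_j) - cooccurrence_degree(n_ij, n_i))
    return 1.0 / (d + eps)

def top_n_words(pair_counts, word_counts, n):
    # score each word by its best pairwise relevance, keep the top n
    scores = {}
    for (wi, wj), n_ij in pair_counts.items():
        re = relevance(n_ij, word_counts[wi], word_counts[wj])
        for w in (wi, wj):
            scores[w] = max(scores.get(w, 0.0), re)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```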
Step 4: Construct the objective function and determine the conjunctions LINK(C) between different themes. The specific computation is as follows:
In the objective function, ρ is a correction factor and Tj is the theme influence factor.
Here j indexes the j-th theme, the number of themes is g, and h is the number of words in a theme; h is a variable whose value differs from theme to theme. Nj is the number of keyword words in the j-th theme, one term counts the number of times the conjunction C occurs in theme j, and another is the similarity between C and the words of the theme, which can be computed by conventional methods. α and β are the respective influence coefficients of these two terms, generally with β > α and α + β = 1; optimal values of α and β can be found by experiment. Zj is the influence degree of theme j on the document.
According to the objective-function values, select the m largest conjunctions LINK(C).
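The objective function itself is likewise not reproduced here; the sketch below assumes a linear mix of the two described terms, the conjunction's occurrence count in the theme and its similarity to the theme's words, weighted by α and β with β > α and α + β = 1. The mix and all names are illustrative assumptions:

```python
# Hedged sketch of Step 4: rank candidate conjunctions by an assumed
# theme-influence score combining occurrence count and theme similarity.
def theme_influence(occurrences, similarity, alpha=0.4, beta=0.6):
    # per the text: beta > alpha and alpha + beta = 1
    assert beta > alpha and abs(alpha + beta - 1.0) < 1e-9
    return alpha * occurrences + beta * similarity

def select_conjunctions(candidates, m):
    # candidates: {conjunction: (occurrences_in_theme, similarity_to_theme)}
    ranked = sorted(candidates,
                    key=lambda c: theme_influence(*candidates[c]),
                    reverse=True)
    return ranked[:m]  # the m largest conjunctions LINK(C)
```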
Step 5: Construct the fork function and merge the conjunctions into the multi-threaded network model; denote the resulting model graph M'. The computation is as follows:
In the fork function, G(Ci'/wj') is the co-occurrence degree of Ci' relative to wj', and G(wj'/Ci') is the co-occurrence degree of wj' relative to Ci'. Mf is the common-father-node density of the two words' ontology concepts, Sf is the common-father-node depth of the two words' ontology concepts, nf is the maximum node-density value in the tree containing the corresponding father node in the sememe network structure, and df is the degree of that tree.
Similarly, n(Ci', wj') is the number of times the conjunction Ci' and the word wj' from the word set occur within one sentence, N(wj') is the number of times wj' occurs in the document, and N(Ci') is the number of times the conjunction Ci' occurs in the document; here N(Ci') ≠ N(wj') and n(Ci', wj') = n(wj', Ci').
According to the fork-function values, take the n - 1 largest word pairs, producing the document's n keywords.
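Following the "similarly" in the definitions above, the conditional co-occurrence degrees can be read as G(C/w) = n(C, w)/N(w) and G(w/C) = n(C, w)/N(C); how they combine with the ontology terms Mf, Sf, nf, df is not reproduced in this text, so the sketch below simply scores a pair by the product of the two degrees. Both the reading and the combination are assumptions:

```python
# Hedged sketch of Step 5: rank (conjunction, word) pairs by an assumed
# fork score built from the two conditional co-occurrence degrees.
def fork_score(n_cw, N_w, N_c):
    g1 = n_cw / N_w if N_w else 0.0  # assumed G(C / w) = n(C, w) / N(w)
    g2 = n_cw / N_c if N_c else 0.0  # assumed G(w / C) = n(C, w) / N(C)
    return g1 * g2                   # assumed combination of the two degrees

def top_pairs(pair_stats, k):
    # pair_stats: {(conjunction, word): (n_cw, N_w, N_c)}; keep the k best
    return sorted(pair_stats,
                  key=lambda p: fork_score(*pair_stats[p]),
                  reverse=True)[:k]
```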
For the improved multi-threaded keyword extraction technique, the pseudo-code computation process is as follows:
Input: a document.
Output: the core keywords extracted from the document.
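The pseudo-code itself is not reproduced in this text; the following is a hedged end-to-end sketch of Steps 1-3 with stub helpers standing in for the scoring functions described above. Every helper name is illustrative, not from the patent:

```python
# Hedged end-to-end sketch: segment (Step 1), remove stop words (Step 2),
# rank and keep the top-n words as the network-model vocabulary (Step 3).
# Steps 4-5 (conjunction merging via the fork function) are omitted here.
def extract_keywords(document, dictionary, stopwords, n):
    tokens = [t for t in segment(document, dictionary)  # Step 1
              if t not in stopwords]                    # Step 2
    ranked = sorted(set(tokens),
                    key=lambda w: relevance_to_text(w, tokens),
                    reverse=True)                       # Step 3
    return ranked[:n]  # top-n words form the network model M

def segment(document, dictionary):
    # stub: character-level fallback; a real system would use the
    # dictionary-lattice method of Steps 1.1-1.4
    return list(document)

def relevance_to_text(word, tokens):
    # stub: raw frequency stands in for the degree-of-correlation RE
    return tokens.count(word)
```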
Claims (3)
1. An improved multi-threaded keyword extraction technique for documents, relating to the field of Semantic Web technology, characterized in that it comprises the following steps:
Step 1: Segment the text into words using a Chinese word-segmentation algorithm, the specific segmentation process being as follows:
Step 1.1: Following the segmentation dictionary, find the words in the sentence to be segmented that match dictionary entries: scan the character string to be segmented completely, looking up matches in the system dictionary; every word found in the dictionary is marked; if no match exists in the dictionary, the single character is split off as a word; repeat until the character string is empty;
Step 1.2: Based on probability statistics, split the sentence to be segmented into a network structure, producing the n possible sentence substructures, whose sequential nodes are denoted in order; the structure is shown in Fig. 2;
Step 1.3: Based on information theory, assign a weight to each edge of the above network structure, the specific computation being as follows:
from the segmentation dictionary, each path contains matched dictionary words and unmatched single characters; the i-th path contains ni words, so the word counts of the n paths form the set (n1, n2, ..., nn);
for the remaining (n - m) paths left after elimination, solve for the weight of every adjacent edge;
in a statistical corpus, compute the information content of each word, then the co-occurrence information content of adjacent words on a path, using the following formulas:
X(Ci) = |x(Ci)1 - x(Ci)2|, where x(Ci)1 is the information content of the word Ci in the text corpus and x(Ci)2 is the information content of the texts containing Ci;
x(Ci)1 = -p(Ci)1 ln p(Ci)1, where p(Ci)1 is the probability of Ci in the text corpus and n is the number of corpus texts containing Ci;
x(Ci)2 = -p(Ci)2 ln p(Ci)2, where p(Ci)2 is the probability of a text containing Ci and N is the total number of texts in the statistical corpus;
similarly X(Ci, Ci+1) = |x(Ci, Ci+1)1 - x(Ci, Ci+1)2|, where x(Ci, Ci+1)1 is the co-occurrence information content of the pair (Ci, Ci+1) in the text corpus and x(Ci, Ci+1)2 is the information content of the texts in which the adjacent words (Ci, Ci+1) co-occur;
similarly x(Ci, Ci+1)1 = -p(Ci, Ci+1)1 ln p(Ci, Ci+1)1, where p(Ci, Ci+1)1 is the co-occurrence probability of (Ci, Ci+1) in the text corpus and m is the number of texts in which (Ci, Ci+1) co-occur;
x(Ci, Ci+1)2 = -p(Ci, Ci+1)2 ln p(Ci, Ci+1)2, where p(Ci, Ci+1)2 is the probability of a text in which the adjacent words (Ci, Ci+1) co-occur;
summing up, the weight of every adjacent edge is w(Ci, Ci+1) = X(Ci) + X(Ci+1) - 2X(Ci, Ci+1);
Step 1.4: Find the path of maximum weight; it is the segmentation result for the sentence, the specific computation being as follows:
there are n paths, each of a different length, with path-length set (L1, L2, ..., Ln); suppose that taking the minimum word count over paths eliminates m paths, m < n, leaving (n - m) paths with path-length set (L'1, L'2, ..., L'(n-m)); the weight of each remaining path Sj is the sum of the weights of its 1st through (L'Sj - 1)-th edges, each computed as in Step 1.3, where L'Sj is the length of path Sj; the segmentation result is the path of maximum weight;
Step 2: Remove stop words from the text vocabulary according to a stop-word table, obtaining the word set, described as follows:
stop words are words that occur frequently in the text but contribute little to marking its content; removing them consists of comparing each feature term with the entries of the stop-word table and deleting the term if it matches;
combining segmentation and stop-word deletion, the Chinese-text preprocessing flow is shown in Fig. 3;
Step 3: Construct the degree-of-correlation function RE(wi, wj), sort the above word set w in descending order, and take the top n words to form a multi-threaded network model;
Step 4: Construct the objective function and determine the conjunctions between different themes;
Step 5: Construct the fork function and merge the conjunctions into the multi-threaded network model, the resulting model graph being denoted M', the computation being as follows:
in the fork function, G(Ci'/wj') is the co-occurrence degree of Ci' relative to wj', and G(wj'/Ci') is the co-occurrence degree of wj' relative to Ci'; Mf is the common-father-node density of the two words' ontology concepts, Sf is the common-father-node depth of the two words' ontology concepts, nf is the maximum node-density value in the tree containing the corresponding father node in the sememe network structure, and df is the degree of that tree;
similarly, n(Ci', wj') is the number of times the conjunction Ci' and the word wj' from the word set occur within one sentence, N(wj') is the number of times wj' occurs in the document, and N(Ci') is the number of times the conjunction Ci' occurs in the document, with N(Ci') ≠ N(wj') and n(Ci', wj') = n(wj', Ci');
according to the fork-function values, take the n - 1 largest word pairs, producing the document's n keywords.
2. The improved multi-threaded keyword extraction technique for documents according to claim 1, characterized in that the specific computation in Step 3 above is as follows:
Step 3: Construct the degree-of-correlation function RE(wi, wj), sort the above word set w in descending order, and take the top n words to form a multi-threaded network model, the specific computation being as follows:
the degree-of-correlation function RE(wi, wj) is built, with a correction factor ε, from the difference d(wi, wj) = |R1(wi, wj) - R2(wi, wj)| between the two words, where R1(wi, wj) and R2(wi, wj) are both inter-word relevance degrees, g(wi/wj) is the co-occurrence degree of wi relative to wj, g(wj/wi) is the co-occurrence degree of wj relative to wi, n(wi, wj) is the number of times the pair (wi, wj) occurs within one sentence, n(wi) is the number of times wi occurs in the document, and n(wj) is the number of times wj occurs in the document;
extract the top n words as the text's keywords, i.e. extract the n words with the largest RE(wi, wj) values.
3. The improved multi-threaded keyword extraction technique for documents according to claim 1, characterized in that the specific computation in Step 4 above is as follows:
Step 4: Construct the objective function and determine the conjunctions between different themes, the specific computation being as follows:
in the objective function, ρ is a correction factor and Tj is the theme influence factor; j indexes the j-th theme, the number of themes is g, and h is the number of words in a theme, a variable whose value differs from theme to theme; Nj is the number of keyword words in the j-th theme, one term counts the number of times the conjunction C occurs in theme j, and another is the similarity between C and the words of the theme, computable by conventional methods; α and β are the respective influence coefficients of these two terms, generally with β > α and α + β = 1, and optimal values of α and β can be found by experiment; Zj is the influence degree of theme j on the document;
according to the objective-function values, select the m largest conjunctions LINK(C).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710268836.3A CN107102985A (en) | 2017-04-23 | 2017-04-23 | Multi-threaded keyword extraction techniques in improved document |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710268836.3A CN107102985A (en) | 2017-04-23 | 2017-04-23 | Multi-threaded keyword extraction techniques in improved document |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107102985A true CN107102985A (en) | 2017-08-29 |
Family
ID=59657043
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710268836.3A Pending CN107102985A (en) | 2017-04-23 | 2017-04-23 | Multi-threaded keyword extraction techniques in improved document |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107102985A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920660A (en) * | 2018-07-04 | 2018-11-30 | 中国银行股份有限公司 | Keyword weight acquisition methods, device, electronic equipment and readable storage medium storing program for executing |
CN109522392A (en) * | 2018-10-11 | 2019-03-26 | 平安科技(深圳)有限公司 | Voice-based search method, server and computer readable storage medium |
WO2019165678A1 (en) * | 2018-03-02 | 2019-09-06 | 广东技术师范学院 | Keyword extraction method for mooc |
CN110263345A (en) * | 2019-06-26 | 2019-09-20 | 北京百度网讯科技有限公司 | Keyword extracting method, device and storage medium |
CN110348133A (en) * | 2019-07-15 | 2019-10-18 | 西南交通大学 | A kind of bullet train three-dimensional objects structure technology effect figure building system and method |
CN110717047A (en) * | 2019-10-22 | 2020-01-21 | 湖南科技大学 | Web service classification method based on graph convolution neural network |
CN112597776A (en) * | 2021-03-08 | 2021-04-02 | 中译语通科技股份有限公司 | Keyword extraction method and system |
CN115713085A (en) * | 2022-10-31 | 2023-02-24 | 北京市农林科学院 | Document theme content analysis method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105243065A (en) * | 2014-06-24 | 2016-01-13 | 中兴通讯股份有限公司 | Material information output method and system |
CN105843795A (en) * | 2016-03-21 | 2016-08-10 | 华南理工大学 | Topic model based document keyword extraction method and system |
CN106570120A (en) * | 2016-11-02 | 2017-04-19 | 四川用联信息技术有限公司 | Process for realizing searching engine optimization through improved keyword optimization |
- 2017-04-23: application CN201710268836.3A filed; patent CN107102985A (en), status Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105243065A (en) * | 2014-06-24 | 2016-01-13 | 中兴通讯股份有限公司 | Material information output method and system |
CN105843795A (en) * | 2016-03-21 | 2016-08-10 | 华南理工大学 | Topic model based document keyword extraction method and system |
CN106570120A (en) * | 2016-11-02 | 2017-04-19 | 四川用联信息技术有限公司 | Process for realizing searching engine optimization through improved keyword optimization |
Non-Patent Citations (1)
Title |
---|
宁建飞等: "融合Word2vec与TextRank的关键词抽取研究", 《现代图书情报技术》 (Ning Jianfei et al., "Research on keyword extraction fusing Word2vec and TextRank", New Technology of Library and Information Service) *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019165678A1 (en) * | 2018-03-02 | 2019-09-06 | 广东技术师范学院 | Keyword extraction method for mooc |
CN108920660B (en) * | 2018-07-04 | 2020-11-20 | 中国银行股份有限公司 | Keyword weight obtaining method and device, electronic equipment and readable storage medium |
CN108920660A (en) * | 2018-07-04 | 2018-11-30 | 中国银行股份有限公司 | Keyword weight acquisition methods, device, electronic equipment and readable storage medium storing program for executing |
CN109522392A (en) * | 2018-10-11 | 2019-03-26 | 平安科技(深圳)有限公司 | Voice-based search method, server and computer readable storage medium |
CN110263345A (en) * | 2019-06-26 | 2019-09-20 | 北京百度网讯科技有限公司 | Keyword extracting method, device and storage medium |
CN110263345B (en) * | 2019-06-26 | 2023-09-05 | 北京百度网讯科技有限公司 | Keyword extraction method, keyword extraction device and storage medium |
CN110348133A (en) * | 2019-07-15 | 2019-10-18 | 西南交通大学 | A kind of bullet train three-dimensional objects structure technology effect figure building system and method |
CN110348133B (en) * | 2019-07-15 | 2022-08-19 | 西南交通大学 | System and method for constructing high-speed train three-dimensional product structure technical effect diagram |
CN110717047A (en) * | 2019-10-22 | 2020-01-21 | 湖南科技大学 | Web service classification method based on graph convolution neural network |
CN110717047B (en) * | 2019-10-22 | 2022-06-28 | 湖南科技大学 | Web service classification method based on graph convolution neural network |
CN112597776A (en) * | 2021-03-08 | 2021-04-02 | 中译语通科技股份有限公司 | Keyword extraction method and system |
CN115713085A (en) * | 2022-10-31 | 2023-02-24 | 北京市农林科学院 | Document theme content analysis method and device |
CN115713085B (en) * | 2022-10-31 | 2023-11-07 | 北京市农林科学院 | Method and device for analyzing literature topic content |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107102985A (en) | Multi-threaded keyword extraction techniques in improved document | |
CN106598940A (en) | Text similarity solution algorithm based on global optimization of keyword quality | |
CN106970910B (en) | Keyword extraction method and device based on graph model | |
CN108681574B (en) | Text abstract-based non-fact question-answer selection method and system | |
Wen et al. | Research on keyword extraction based on word2vec weighted textrank | |
CN108763213A (en) | Theme feature text key word extracting method | |
CN106610951A (en) | Improved text similarity solving algorithm based on semantic analysis | |
CN111680488B (en) | Cross-language entity alignment method based on knowledge graph multi-view information | |
CN106776562A (en) | A kind of keyword extracting method and extraction system | |
CN106611041A (en) | New text similarity solution method | |
CN106570112A (en) | Improved ant colony algorithm-based text clustering realization method | |
CN106598941A (en) | Algorithm for globally optimizing quality of text keywords | |
CN110362678A (en) | A kind of method and apparatus automatically extracting Chinese text keyword | |
CN106528621A (en) | Improved density text clustering algorithm | |
CN111625622B (en) | Domain ontology construction method and device, electronic equipment and storage medium | |
CN106610954A (en) | Text feature word extraction method based on statistics | |
CN106610952A (en) | Mixed text feature word extraction method | |
CN106610949A (en) | Text feature extraction method based on semantic analysis | |
CN107092595A (en) | New keyword extraction techniques | |
CN106570120A (en) | Process for realizing searching engine optimization through improved keyword optimization | |
CN113268606A (en) | Knowledge graph construction method and device | |
CN106610953A (en) | Method for solving text similarity based on Gini index | |
CN107038155A (en) | The extracting method of text feature is realized based on improved small-world network model | |
CN106528726A (en) | Keyword optimization-based search engine optimization realization technology | |
CN106776678A (en) | Search engine optimization technology is realized in new keyword optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20170829 |