CN116414971A - Keyword weight calculation method and keyword extraction method for multi-feature fusion - Google Patents

Info

Publication number
CN116414971A
Authority
CN
China
Prior art keywords
word
value
document
pseudo
quasi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310185632.9A
Other languages
Chinese (zh)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mabo Nanjing Intelligent Technology Co ltd
Original Assignee
Mabo Nanjing Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mabo Nanjing Intelligent Technology Co ltd
Priority to CN202310185632.9A
Publication of CN116414971A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-feature-fusion keyword weight calculation method. A document set is first collected for analysis, and each article in the set is segmented and tagged to form a candidate word list. Then, taking any candidate word as the target word, its normalized word frequency value, word head position value, word tail position value, word average span value, word head-tail span value, word length value, part-of-speech value, TFIDF value and average information entropy are obtained as multidimensional features. Finally, the fusion weight of the target word is calculated from a fusion weight formula. The invention also discloses a multi-feature-fusion keyword extraction method based on the TextRank algorithm, in which the edge weights are obtained from the fusion weight formula and keywords are extracted accordingly. The notable effects are that the word average span and the word head-tail span, which better express how keywords are distributed, are adopted, and that the fusion weights are solved by Lasso regression, which deletes unimportant features and retains important ones, so that the accuracy of keyword extraction can be improved.

Description

Keyword weight calculation method and keyword extraction method for multi-feature fusion
Technical Field
The invention relates to keyword extraction from documents, and in particular to an automatic TextRank keyword extraction method that fuses multiple word-importance features.
Background
With the rapid development of Internet big data, the volume of unstructured document data has become enormous and users are surrounded by large amounts of irrelevant information. Accurate keyword extraction makes it possible to classify document data effectively and so helps users search and query precisely.
Keyword extraction means extracting important words or phrases from a text as a summary of its main points. The technique is commonly used in text summarization, document classification, information retrieval and other applications. Traditional keyword extraction methods fall into three groups. TFIDF, based on word frequency statistics, depends only on frequency information and lacks other important information about a word, so its results are not ideal. LDA-based topic models usually need to be trained in advance and depend heavily on the topic distribution of the training document set. The graph-based TextRank algorithm ignores the importance of the words themselves and performs poorly. A Lasso-TextRank keyword extraction method with multi-feature fusion is therefore proposed to improve on this situation.
Disclosure of Invention
To solve the above technical problems, the invention provides a method for extracting document keywords from a large number of documents. The fusion method first constructs multidimensional features of the keywords and then solves a fusion weight expression over those features by Lasso regression. The fusion weight of each word is then obtained from the expression and used to improve the initial weights of the TextRank word nodes, and the keywords of the document are finally obtained by iterative updating. The scheme takes into account the importance of a word at both the single-document and multi-document levels, and can effectively avoid missed and wrong keyword extractions.
To solve the problem of calculating the fusion weight, the main technical scheme adopted is as follows:
a keyword weight calculation method for multi-feature fusion comprises the following steps:
step 1.1, collecting documents and selecting n of them as an analysis document set;
step 1.2, performing word segmentation and tagging on each article in the analysis document set, and arranging all candidate words of the same article in order of appearance to form a candidate word list;
taking any candidate word as the target word, and acquiring the multidimensional features of each target word;
the multidimensional features comprise the normalized word frequency value $w_{TF1}$, word head position value $w_{FP}$, word tail position value $w_{PL}$, word average span value $w_{MTS}$, word head-tail span value $w_{PFL}$, word length value $w_{TL}$, part-of-speech value $w_{PSO}$, TFIDF value $w_{TFIDF}$ and average information entropy $w_{IH}$;
step 1.3, calculating the fusion weight y of the target word based on the following formula (1);
Figure SMS_1
α is the weight coefficient of the normalized word frequency value $w_{TF1}$ of the target word;
β is the weight coefficient of the word head position value $w_{FP}$ of the target word;
γ is the weight coefficient of the word tail position value $w_{PL}$ of the target word;
δ is the weight coefficient of the word average span value $w_{MTS}$ of the target word;
ε is the weight coefficient of the word head-tail span value $w_{PFL}$ of the target word;
e is the weight coefficient of the word length value $w_{TL}$ of the target word;
θ is the weight coefficient of the part-of-speech value $w_{PSO}$ of the target word;
Figure SMS_2
is the weight coefficient of the TFIDF value $w_{TFIDF}$ of the target word;
μ is the weight coefficient of the average information entropy $w_{IH}$ of the target word;
$w_0$ is a constant term.
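Formula (1) appears in this text only as an image placeholder. Given the coefficient definitions above, and the fact that the coefficients are fitted by Lasso regression, a linear combination of the nine features is the natural reading. The following is a reconstruction under that assumption, not the patent's verbatim formula; $\varphi$ is a stand-in chosen here for the TFIDF coefficient, whose original symbol is also only an image.

```latex
y = \alpha\, w_{TF1} + \beta\, w_{FP} + \gamma\, w_{PL} + \delta\, w_{MTS}
  + \varepsilon\, w_{PFL} + e\, w_{TL} + \theta\, w_{PSO}
  + \varphi\, w_{TFIDF} + \mu\, w_{IH} + w_0
```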
To obtain the keywords of a document, the main technical scheme adopted is as follows:
a keyword extraction method for multi-feature fusion comprises the following steps:
step 2.1, obtaining the calculation formula of the fusion weight according to the above method;
step 2.2, dividing the content of the document to be processed into complete sentences in order, then performing word segmentation and part-of-speech tagging on each sentence, filtering out stop words, and retaining words with the specified parts of speech as candidate keywords;
step 2.3, constructing a candidate keyword graph G = (V, E), wherein V is the set of word nodes and consists of the candidate keywords obtained in step 2.2; constructing an edge E between any two word nodes according to their co-occurrence relation;
an edge exists between two word nodes only when the corresponding words co-occur in a window of length K, where K is the window size, i.e. at most K words co-occur;
updating the importance weight $WS(V_i)$ of the i-th word node $V_i$ according to the following formula (11):
Figure SMS_3
d is the adjustment coefficient;
$In(V_i)$ is the set of other nodes pointing to word node $V_i$;
$y_j$ is the fusion weight of the j-th word node $V_j$ pointing to word node $V_i$; the multidimensional features of word node $V_j$ are obtained first, and $y_j$ is then calculated according to formula (1);
$Out(V_j)$ is the set of other nodes pointed to from word node $V_j$;
$y_{jk}$ is the fusion weight of word node $V_j$ pointing to the k-th node it points to;
$WS(V_j)$ is the importance weight of the j-th word node $V_j$;
iteratively propagating the weight of each node according to formula (11) until convergence;
step 2.4, sorting the importance weights of all word nodes in descending order, and extracting the selected keywords from the candidate keywords in that order.
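Formula (11) is present in this text only as an image placeholder. Assuming it follows the standard weighted TextRank update, with the edge weights replaced by the fusion weights defined above, it would read roughly as follows. The exact placement of $y_j$ and $y_{jk}$ is an assumption drawn from the symbol definitions, not a confirmed transcription.

```latex
WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)}
          \frac{y_j}{\sum_{V_k \in Out(V_j)} y_{jk}} \, WS(V_j)
```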
Detailed Description
The invention is further illustrated by the following examples.
Example 1:
A keyword weight calculation method for multi-feature fusion is carried out according to the following steps:
step 1.1, collecting documents and selecting n of them as an analysis document set;
the documents need to be cleaned in advance, including directly filtering out abnormal documents in the set, such as documents whose text is garbled;
step 1.2, performing word segmentation and tagging on each article in the analysis document set, and arranging all candidate words of the same article in order of appearance to form a candidate word list;
taking any candidate word as the target word, and acquiring the multidimensional features of each target word;
the multidimensional features comprise the normalized word frequency value $w_{TF1}$, word head position value $w_{FP}$, word tail position value $w_{PL}$, word average span value $w_{MTS}$, word head-tail span value $w_{PFL}$, word length value $w_{TL}$, part-of-speech value $w_{PSO}$, TFIDF value $w_{TFIDF}$ and average information entropy $w_{IH}$;
The multidimensional features are calculated as follows:
(1) based on the same document, the word average span value $w_{MTS}$ of the target word is calculated according to the following formula (3):
Figure SMS_4
$S_g(w)$ is the length of the g-th span between consecutive occurrences of the target word in the corresponding document;
$h$ is the number of spans that can be calculated for the target word in the corresponding document, equal to the number of times the target word appears in the document minus one.
The g-th span is the number of other candidate words separating two adjacent occurrences of the same word. Keywords are mentioned often in a document, so their spans are generally shorter, while those of non-keywords are longer. If a target word appears only once, its word average span value defaults to the maximum span value of the other candidate words in the corresponding document.
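The span definition above can be sketched in a few lines. This is a minimal illustration that assumes formula (3), which is only an image in this text, is the plain mean of the h spans. The function name `word_average_spans`, the reading of "maximum span value" as the largest average span among the other words, and the fallback to the document length when no word repeats are all assumptions made here.

```python
from collections import defaultdict


def word_average_spans(words):
    """Map each candidate word to its average span within one document.

    `words` is the ordered candidate word list of a single document.
    """
    positions = defaultdict(list)
    for idx, word in enumerate(words):
        positions[word].append(idx)

    spans = {}
    for word, pos in positions.items():
        if len(pos) > 1:
            # Number of other candidate words between adjacent occurrences.
            gaps = [b - a - 1 for a, b in zip(pos, pos[1:])]
            spans[word] = sum(gaps) / len(gaps)  # h = len(pos) - 1 spans

    # Words seen once default to the largest span among the other words
    # (assumed fallback: document length if no word repeats at all).
    default = max(spans.values(), default=len(words))
    return {word: spans.get(word, default) for word in positions}


result = word_average_spans(["a", "b", "a", "c", "d", "a", "b"])
```

Here "a" occurs at positions 0, 2 and 5, giving spans of 1 and 2, while "c" and "d" occur once and inherit the largest average span in the document.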
(2) based on the same document, the word head-tail span value $w_{PFL}$ of the target word is calculated according to the following formula (4):
Figure SMS_5
$LP1(w)$ is the total number of other candidate words preceding the last occurrence of the target word in the corresponding document;
$FP(w)$ is the word head position value of the target word;
$SumPard(d)$ is the total number of occurrences of all candidate words in the corresponding document.
(3) based on the same document, the normalized word frequency value $w_{TF1}$ of the target word is calculated according to the following formula (5):
Figure SMS_6
Wherein:
$f(w)$ is the number of times the target word appears in the corresponding document;
$\min(f(d))$ is the minimum number of occurrences among the candidate words in the corresponding document;
$\max(f(d))$ is the maximum number of occurrences among the candidate words in the corresponding document;
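The three quantities defined for formula (5) suggest min-max normalization of the raw counts. The sketch below assumes that form, since the formula itself is only an image here; the function name `normalized_word_frequency` and the choice to return 0.0 when every word has the same count are assumptions of this illustration.

```python
from collections import Counter


def normalized_word_frequency(words):
    """Min-max normalize word counts within one document to [0, 1]."""
    counts = Counter(words)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    if highest == lowest:
        # Assumed convention: no spread in counts, so every value is 0.0.
        return {word: 0.0 for word in counts}
    return {
        word: (count - lowest) / (highest - lowest)
        for word, count in counts.items()
    }


tf1 = normalized_word_frequency(["a", "a", "a", "b", "b", "c"])
```

The same min-max pattern would apply to the word length value of formula (8), whose definitions also name a maximum and a minimum over the document.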
(4) based on the same document, the word head position value $w_{FP}$ of the target word is calculated according to the following formula (6):
Figure SMS_7
wherein:
$PF(w)$ is the number of candidate words preceding the first occurrence of the target word in the corresponding document;
(5) based on the same document, the word tail position value $w_{PL}$ of the target word is calculated according to the following formula (7):
Figure SMS_8
wherein:
$LP(w)$ is the number of candidate words following the last occurrence of the target word in the corresponding document;
(6) based on the same document, the word length value $w_{TL}$ of the target word is calculated according to the following formula (8):
Figure SMS_9
wherein:
$L(w)$ is the length of the target word;
$\max(L(d))$ is the maximum word length among the candidate words in the corresponding document;
$\min(L(d))$ is the minimum word length among the candidate words in the corresponding document;
(7) the probability that a word with the target word's part of speech is a keyword is recorded as the part-of-speech value $w_{PSO}$ of the target word; parts of speech include nouns, adjectives, verbs, prepositions, adverbs, auxiliary words and others; nouns, adjectives and verbs are more likely to be keywords, while prepositions, adverbs and auxiliary words are less likely; the probability for each part of speech is defined manually on the basis of experience and long-term statistics;
(8) the TFIDF value $w_{TFIDF}$ of the target word is calculated according to the following formula (9):
Figure SMS_10
Wherein:
$f(w)$ is the number of times the target word appears in the corresponding document;
$\sum_k f(w)$ is the total number of occurrences of all candidate words in the same document;
$n$ is the number of documents in the analysis document set;
$|\{j : w \in n_j\}|$ is the number of documents in the analysis document set that contain the target word;
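The four quantities defined for formula (9) match the textbook TF-IDF, so the sketch below assumes that form: term frequency times the logarithm of the inverse document frequency. The formula image may include smoothing terms that are not visible in this text. The function name `tfidf` and its arguments are hypothetical.

```python
import math
from collections import Counter


def tfidf(word, doc, corpus):
    """Textbook TF-IDF of `word` in `doc`, given a corpus of word lists."""
    if not doc:
        return 0.0
    tf = Counter(doc)[word] / len(doc)          # f(w) / sum_k f(w)
    containing = sum(1 for d in corpus if word in d)  # |{j : w in n_j}|
    if containing == 0:
        return 0.0
    return tf * math.log(len(corpus) / containing)    # n = len(corpus)


corpus = [["a", "b", "a"], ["b", "c"], ["c", "d"]]
score_a = tfidf("a", corpus[0], corpus)
score_b = tfidf("b", corpus[0], corpus)
```

In this toy corpus "a" scores higher than "b" in the first document, because it is both more frequent there and rarer across the set.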
(9) the average information entropy $w_{IH}$ is calculated according to the following formula (10):
Figure SMS_11
Wherein:
$f_{wd}$ is the frequency of the target word in the corresponding document;
$f_w$ is the frequency of the target word across all documents in the analysis document set;
$n$ is the number of documents in the analysis document set.
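Formula (10) is only an image in this text, so its exact form cannot be confirmed. One plausible reading of the three definitions is the Shannon entropy of the word's distribution over documents, $f_{wd}/f_w$, averaged over the $n$ documents. The sketch below implements that assumed reading only; the function name `average_information_entropy` is hypothetical.

```python
import math


def average_information_entropy(word, corpus):
    """Assumed form: -(1/n) * sum_d (f_wd / f_w) * log(f_wd / f_w)."""
    per_doc = [doc.count(word) for doc in corpus]  # f_wd for each document
    total = sum(per_doc)                           # f_w over the whole set
    if total == 0:
        return 0.0
    entropy = -sum(
        (count / total) * math.log(count / total)
        for count in per_doc
        if count > 0
    )
    return entropy / len(corpus)


docs = [["a", "b"], ["a", "c"], ["d"]]
spread = average_information_entropy("a", docs)
concentrated = average_information_entropy("d", docs)
```

Under this reading a word spread evenly over many documents gets a high value, while a word confined to a single document gets zero.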
Step 1.3, calculating the fusion weight y of the target word based on the following formula (1);
Figure SMS_12
α is the weight coefficient of the normalized word frequency value $w_{TF1}$ of the target word;
β is the weight coefficient of the word head position value $w_{FP}$ of the target word;
γ is the weight coefficient of the word tail position value $w_{PL}$ of the target word;
δ is the weight coefficient of the word average span value $w_{MTS}$ of the target word;
ε is the weight coefficient of the word head-tail span value $w_{PFL}$ of the target word;
e is the weight coefficient of the word length value $w_{TL}$ of the target word;
θ is the weight coefficient of the part-of-speech value $w_{PSO}$ of the target word;
Figure SMS_13
is the weight coefficient of the TFIDF value $w_{TFIDF}$ of the target word;
μ is the weight coefficient of the average information entropy $w_{IH}$ of the target word;
$w_0$ is a constant term.
In step 1.3, the weight coefficients α, β, γ, δ, ε, θ,
Figure SMS_14
μ and $w_0$ can be specified manually or calculated according to the following formula (2):
Figure SMS_15
Figure SMS_16
$n$ is the number of documents in the analysis document set;
$y_i$ is the manually labeled fusion weight of a target word;
Figure SMS_17
is the estimated keyword fusion weight;
λ is the adjustment coefficient.
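The fitting step can be illustrated with a small pure-Python Lasso solver. Formula (2) is only an image here, so the sketch assumes the usual Lasso objective: squared error between labeled and estimated fusion weights plus λ times the sum of absolute coefficient values, with the constant term left unpenalized. The names `soft_threshold` and `lasso_fit`, the cyclic coordinate descent solver and the toy data are all choices made for this illustration, not the patent's procedure.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; anything inside the band becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, lam=0.1, n_iter=500):
    """Minimise (1/(2m)) * sum((y_i - y_hat_i)^2) + lam * sum(|coef_j|)
    by cyclic coordinate descent; the intercept w_0 is not penalised."""
    m, p = len(X), len(X[0])
    coef = [0.0] * p
    intercept = sum(y) / m
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for row, target in zip(X, y):
                partial = intercept + sum(
                    coef[k] * row[k] for k in range(p) if k != j
                )
                rho += row[j] * (target - partial)
                norm += row[j] ** 2
            coef[j] = soft_threshold(rho / m, lam) / (norm / m) if norm > 0 else 0.0
        # Unpenalised intercept: mean of the residuals without it.
        intercept = sum(
            t - sum(c * v for c, v in zip(coef, row)) for row, t in zip(X, y)
        ) / m
    return coef, intercept


# Toy data: the label depends on the first feature only (y = 2*x0 + 1).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 0.0], [3.0, 1.0]]
y = [1.0, 3.0, 5.0, 7.0]
coef, intercept = lasso_fit(X, y, lam=0.1)
```

On the toy data the irrelevant second coefficient is driven exactly to zero while the first is shrunk slightly below 2, which is the feature-selection behavior the method relies on.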
Example 2:
A keyword extraction method for multi-feature fusion comprises the following steps:
step 2.1, obtaining the calculation formula of the fusion weight according to the method of Example 1;
step 2.2, dividing the content of the document to be processed into complete sentences in order, then performing word segmentation and part-of-speech tagging on each sentence, filtering out stop words, and retaining words with the specified parts of speech as candidate keywords;
step 2.3, constructing a candidate keyword graph G = (V, E), wherein V is the set of word nodes and consists of the candidate keywords obtained in step 2.2; constructing an edge E between any two word nodes according to their co-occurrence relation;
an edge exists between two word nodes only when the corresponding words co-occur in a window of length K, where K is the window size, i.e. at most K words co-occur;
updating the importance weight $WS(V_i)$ of the i-th word node $V_i$ according to the following formula (11):
Figure SMS_18
d is the adjustment coefficient;
$In(V_i)$ is the set of other nodes pointing to word node $V_i$;
$y_j$ is the fusion weight of the j-th word node $V_j$ pointing to word node $V_i$;
$Out(V_j)$ is the set of other nodes pointed to from word node $V_j$;
$y_{jk}$ is the fusion weight of word node $V_j$ pointing to the k-th node it points to;
$WS(V_j)$ is the importance weight of the j-th word node $V_j$;
iteratively propagating the weight of each node according to formula (11) until convergence;
step 2.4, sorting the importance weights of all word nodes in descending order, and extracting the selected keywords from the candidate keywords in that order.
The fusion weight $y_j$ of the j-th word node $V_j$ pointing to word node $V_i$ is calculated according to formula (1): the multidimensional features $w_{TF1}$, $w_{FP}$, $w_{PL}$, $w_{MTS}$, $w_{PFL}$, $w_{TL}$, $w_{PSO}$, $w_{TFIDF}$ and $w_{IH}$ of word node $V_j$ are obtained first, and $y_j$ is then calculated according to formula (1).
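Steps 2.3 and 2.4 can be sketched as a co-occurrence graph followed by a weighted TextRank iteration. Because formula (11) is only an image in this text, the sketch uses the generic weighted TextRank update and lets the caller supply the edge weight. The names `build_cooccurrence_graph` and `weighted_textrank`, the default d = 0.85, the toy fusion weights and the choice in the usage lines to weight each edge by the fusion weight of the node it points to are all assumptions of this illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(words, window=5):
    """Undirected graph linking distinct words that appear together
    inside any window of `window` consecutive candidate words."""
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    return neighbors


def weighted_textrank(neighbors, edge_weight, d=0.85, tol=1e-6, max_iter=200):
    """Iterate WS(i) = (1 - d) + d * sum_j [w(j,i) / sum_k w(j,k)] * WS(j)."""
    scores = {node: 1.0 for node in neighbors}
    for _ in range(max_iter):
        updated = {}
        for i in neighbors:
            total = 0.0
            for j in neighbors[i]:
                out_sum = sum(edge_weight(j, k) for k in neighbors[j])
                if out_sum > 0:
                    total += edge_weight(j, i) / out_sum * scores[j]
            updated[i] = (1 - d) + d * total
        change = max(abs(updated[n] - scores[n]) for n in neighbors)
        scores = updated
        if change < tol:  # converged
            break
    return scores


words = ["keyword", "extraction", "graph", "keyword", "weight", "graph", "fusion"]
# Hypothetical fusion weights, as if produced by formula (1).
fusion = {"keyword": 0.9, "extraction": 0.6, "graph": 0.5, "weight": 0.4, "fusion": 0.3}
graph = build_cooccurrence_graph(words, window=3)
scores = weighted_textrank(graph, lambda j, k: fusion[k])
ranking = sorted(scores, key=scores.get, reverse=True)  # step 2.4
```

Passing the edge weight as a function keeps the iteration independent of how $y_j$ and $y_{jk}$ are eventually derived from formula (1).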
Beneficial effects: the method adopts the word average span and the word head-tail span, features that better express how keywords are distributed, and evaluates the features comprehensively; solving the fusion weights by Lasso regression provides built-in feature selection, deleting unimportant features and retaining important ones, so that the accuracy of keyword extraction can be improved.
Example 3:
500 Chinese academic documents were collected from the Internet for keyword extraction. 400 of them were used to train the parameters to be estimated in the fusion weight formula (1) according to the method of Example 1, and the remaining 100 were used to test the keyword extraction performance of the proposed algorithm according to the method of Example 2. The test results of the proposed algorithm compared with the TextRank algorithm and the TFIDF-TextRank algorithm are recorded in Table 1, with window length K=5.
TABLE 1 Test comparison of the proposed algorithm with TextRank and TFIDF-TextRank
Algorithm model     Accuracy    Recall
TextRank            0.49        0.38
TFIDF-TextRank      0.55        0.49
Example 2 method    0.61        0.57
Accuracy is defined as: accuracy = number of correctly extracted keywords / total number of extracted keywords;
Recall is defined as: recall = number of correctly extracted keywords / number of actual keywords.
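The two definitions translate directly into code. Note that the quantity called accuracy here is what is usually called precision. The function name `precision_recall` and the toy keyword lists are hypothetical, and returning 0.0 for an empty set is a convention chosen for this sketch.

```python
def precision_recall(extracted, reference):
    """Return (accuracy as defined above, recall) for one document."""
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

With four extracted keywords of which two are correct, and three actual keywords, this gives an accuracy of 0.5 and a recall of 2/3.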
Finally, it should be noted that the above description covers only preferred embodiments of the present invention, and that those skilled in the art can make many similar changes without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. A keyword weight calculation method for multi-feature fusion, characterized by comprising the following steps:
step 1.1, collecting documents and selecting n of them as an analysis document set;
step 1.2, performing word segmentation and tagging on each article in the analysis document set, and arranging all candidate words of the same article in order of appearance to form a candidate word list;
taking any candidate word as the target word, and acquiring the multidimensional features of each target word;
the multidimensional features comprise the normalized word frequency value $w_{TF1}$, word head position value $w_{FP}$, word tail position value $w_{PL}$, word average span value $w_{MTS}$, word head-tail span value $w_{PFL}$, word length value $w_{TL}$, part-of-speech value $w_{PSO}$, TFIDF value $w_{TFIDF}$ and average information entropy $w_{IH}$;
step 1.3, calculating the fusion weight y of the target word based on the following formula (1);
Figure FDA0004103692650000011
wherein:
α is the weight coefficient of the normalized word frequency value $w_{TF1}$ of the target word;
β is the weight coefficient of the word head position value $w_{FP}$ of the target word;
γ is the weight coefficient of the word tail position value $w_{PL}$ of the target word;
δ is the weight coefficient of the word average span value $w_{MTS}$ of the target word;
ε is the weight coefficient of the word head-tail span value $w_{PFL}$ of the target word;
e is the weight coefficient of the word length value $w_{TL}$ of the target word;
θ is the weight coefficient of the part-of-speech value $w_{PSO}$ of the target word;
Figure FDA0004103692650000012
is the weight coefficient of the TFIDF value $w_{TFIDF}$ of the target word;
μ is the weight coefficient of the average information entropy $w_{IH}$ of the target word;
$w_0$ is a constant term.
2. The keyword weight calculation method for multi-feature fusion according to claim 1, wherein in step 1.3 each weight coefficient is calculated according to the following formula (2):
Figure FDA0004103692650000021
Figure FDA0004103692650000022
$n$ is the number of documents in the analysis document set;
$y_i$ is the manually labeled fusion weight of a target word;
Figure FDA0004103692650000023
is the estimated keyword fusion weight;
λ is the adjustment coefficient.
3. The keyword weight calculation method for multi-feature fusion according to claim 1, wherein, based on the same document, the word average span value $w_{MTS}$ of the target word is calculated according to the following formula (3):
Figure FDA0004103692650000024
wherein: $S_g(w)$ is the length of the g-th span between consecutive occurrences of the target word in the corresponding document;
$h$ is the number of spans that can be calculated for the target word in the corresponding document, equal to the number of times the target word appears in the document minus one.
4. The keyword weight calculation method for multi-feature fusion according to claim 1, wherein, based on the same document, the word head-tail span value $w_{PFL}$ of the target word is calculated according to the following formula (4):
Figure FDA0004103692650000025
wherein:
$LP1(w)$ is the total number of other candidate words preceding the last occurrence of the target word in the corresponding document;
$FP(w)$ is the word head position value of the target word;
$SumPard(d)$ is the total number of occurrences of all candidate words in the corresponding document.
5. The keyword weight calculation method for multi-feature fusion according to claim 1, wherein, based on the same document, the normalized word frequency value $w_{TF1}$ of the target word is calculated according to the following formula (5):
Figure FDA0004103692650000031
wherein:
$f(w)$ is the number of times the target word appears in the corresponding document;
$\min(f(d))$ is the minimum number of occurrences among the candidate words in the corresponding document;
$\max(f(d))$ is the maximum number of occurrences among the candidate words in the corresponding document;
based on the same document, the word head position value $w_{FP}$ of the target word is calculated according to the following formula (6):
Figure FDA0004103692650000032
wherein:
$PF(w)$ is the number of candidate words preceding the first occurrence of the target word in the corresponding document;
based on the same document, the word tail position value $w_{PL}$ of the target word is calculated according to the following formula (7):
Figure FDA0004103692650000033
wherein:
$LP(w)$ is the number of candidate words following the last occurrence of the target word in the corresponding document;
based on the same document, the word length value $w_{TL}$ of the target word is calculated according to the following formula (8):
Figure FDA0004103692650000034
wherein:
$L(w)$ is the length of the target word;
$\max(L(d))$ is the maximum word length among the candidate words in the corresponding document;
$\min(L(d))$ is the minimum word length among the candidate words in the corresponding document;
the probability that a word with the target word's part of speech is a keyword is recorded as the part-of-speech value $w_{PSO}$ of the target word.
6. The keyword weight calculation method for multi-feature fusion according to claim 1, wherein the TFIDF value $w_{TFIDF}$ of the target word is calculated according to the following formula (9):
Figure FDA0004103692650000041
wherein:
$f(w)$ is the number of times the target word appears in the corresponding document;
$\sum_k f(w)$ is the total number of occurrences of all candidate words in the same document;
$n$ is the number of documents in the analysis document set;
$|\{j : w \in n_j\}|$ is the number of documents in the analysis document set that contain the target word;
the average information entropy $w_{IH}$ is calculated according to the following formula (10):
Figure FDA0004103692650000042
wherein:
$f_{wd}$ is the frequency of the target word in the corresponding document;
$f_w$ is the frequency of the target word across all documents in the analysis document set;
$n$ is the number of documents in the analysis document set.
7. A keyword extraction method for multi-feature fusion, characterized by comprising the following steps:
step 2.1, obtaining the calculation formula of the fusion weight according to the method of any one of claims 1-6;
step 2.2, dividing the content of the document to be processed into complete sentences in order, then performing word segmentation and part-of-speech tagging on each sentence, filtering out stop words, and retaining words with the specified parts of speech as candidate keywords;
step 2.3, constructing a candidate keyword graph G = (V, E), wherein V is the set of word nodes and consists of the candidate keywords obtained in step 2.2; constructing an edge E between any two word nodes according to their co-occurrence relation;
an edge exists between two word nodes only when the corresponding words co-occur in a window of length K, where K is the window size, i.e. at most K words co-occur;
updating the importance weight $WS(V_i)$ of the i-th word node $V_i$ according to the following formula (11):
Figure FDA0004103692650000043
d is the adjustment coefficient;
$In(V_i)$ is the set of other nodes pointing to word node $V_i$;
$y_j$ is the fusion weight of the j-th word node $V_j$ pointing to word node $V_i$; the multidimensional features of word node $V_j$ are obtained first, and $y_j$ is then calculated according to formula (1);
$Out(V_j)$ is the set of other nodes pointed to from word node $V_j$;
$y_{jk}$ is the fusion weight of word node $V_j$ pointing to the k-th node it points to;
$WS(V_j)$ is the importance weight of the j-th word node $V_j$;
iteratively propagating the weight of each node according to formula (11) until convergence;
step 2.4, sorting the importance weights of all word nodes in descending order, and extracting the selected keywords from the candidate keywords in that order.
CN202310185632.9A 2023-03-01 2023-03-01 Keyword weight calculation method and keyword extraction method for multi-feature fusion Pending CN116414971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310185632.9A CN116414971A (en) 2023-03-01 2023-03-01 Keyword weight calculation method and keyword extraction method for multi-feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310185632.9A CN116414971A (en) 2023-03-01 2023-03-01 Keyword weight calculation method and keyword extraction method for multi-feature fusion

Publications (1)

Publication Number Publication Date
CN116414971A true CN116414971A (en) 2023-07-11

Family

ID=87054035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310185632.9A Pending CN116414971A (en) 2023-03-01 2023-03-01 Keyword weight calculation method and keyword extraction method for multi-feature fusion

Country Status (1)

Country Link
CN (1) CN116414971A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118013955A (en) * 2024-04-08 2024-05-10 中国标准化研究院 Standard information updating method based on association algorithm

Similar Documents

Publication Publication Date Title
CN110059311B (en) Judicial text data-oriented keyword extraction method and system
CN111177365B (en) Unsupervised automatic abstract extraction method based on graph model
CN108052593B (en) Topic keyword extraction method based on topic word vector and network structure
CN110413986B (en) Text clustering multi-document automatic summarization method and system for improving word vector model
CN109408642B (en) Domain entity attribute relation extraction method based on distance supervision
CN109858028B (en) Short text similarity calculation method based on probability model
CN109190117B (en) Short text semantic similarity calculation method based on word vector
CN107562717B (en) Text keyword extraction method based on combination of Word2Vec and Word co-occurrence
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
CN113011533A (en) Text classification method and device, computer equipment and storage medium
CN108132927B (en) Keyword extraction method for combining graph structure and node association
CN107577671B (en) Subject term extraction method based on multi-feature fusion
CN109522547B (en) Chinese synonym iteration extraction method based on pattern learning
WO2019024838A1 (en) Search item generation method and relevant apparatus
CN110543564B (en) Domain label acquisition method based on topic model
CN114065758A (en) Document keyword extraction method based on hypergraph random walk
CN110851714A (en) Text recommendation method and system based on heterogeneous topic model and word embedding model
CN113988053A (en) Hot word extraction method and device
CN114706972A (en) Unsupervised scientific and technical information abstract automatic generation method based on multi-sentence compression
CN116414971A (en) Keyword weight calculation method and keyword extraction method for multi-feature fusion
CN114265935A (en) Science and technology project establishment management auxiliary decision-making method and system based on text mining
Ao et al. News keywords extraction algorithm based on TextRank and classified TF-IDF
CN113343118A (en) Hot event discovery method under mixed new media
CN111191413B (en) Method, device and system for automatically marking event core content based on graph sequencing model
Meena et al. Feature priority based sentence filtering method for extractive automatic text summarization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination