CN117669561A - Unsupervised keyword extraction method, system, equipment and medium - Google Patents
- Publication number
- CN117669561A (application number CN202311628915.2A)
- Authority
- CN
- China
- Prior art keywords
- mask
- document
- cls
- vector
- original document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention relates to the field of keyword extraction technology, and in particular to an unsupervised keyword extraction method, system, device, and medium. An original document is preprocessed to obtain a plurality of candidate keywords; the original document is masked according to each candidate keyword to obtain the mask document corresponding to that keyword; the original document and each mask document are input into a pre-trained language characterization model to obtain an original document cls vector and a mask document cls vector corresponding to each candidate keyword; the first cosine similarity between the original document cls vector and each candidate keyword's mask document cls vector, and the second cosine similarity between each candidate keyword's mask document cls vector and the remaining mask document cls vectors, are calculated; the first and second cosine similarities are weighted to obtain a total similarity, according to which target candidate keywords are screened. The method can improve both the accuracy and the diversity of keyword extraction.
Description
Technical Field
The present invention relates to the field of keyword extraction technology, and in particular, to an unsupervised keyword extraction method, system, device, and medium.
Background
Keyword extraction methods currently fall into two categories: supervised and unsupervised. In practical engineering, document data is easy to obtain while labeled data is difficult to obtain, so unsupervised keyword extraction is more widely used. Traditional unsupervised methods rely only on low-level features such as word frequency, position, and part of speech, without using semantics; since semantics is the decisive factor in keyword extraction, the accuracy of traditional methods is low. Extraction techniques based on word embeddings from pre-trained models greatly improve accuracy over traditional methods, but they may repeatedly extract semantically similar words and therefore lack diversity. Moreover, most embedding-based methods compute similarity between a word embedding and a document embedding; a word is far shorter than a document and can hardly represent the whole document, so such calculations lose much information. Finally, when a pre-trained language model is used to obtain embeddings, typically only the last layer's output is used, so the information of the intermediate layers is discarded and lost.
Disclosure of Invention
The invention aims to solve the problems of low keyword extraction accuracy and lack of diversity in the prior art.
In order to achieve the above object, the present invention provides an unsupervised keyword extraction method, which is characterized in that the method includes:
preprocessing an original document to obtain a plurality of candidate keywords;
masking operation is carried out on the original document according to the candidate keywords respectively, so that a masking document corresponding to each candidate keyword is obtained;
inputting the original document and each mask document into a pre-trained language characterization model to obtain an original document cls vector and a mask document cls vector corresponding to each candidate keyword;
respectively calculating the first cosine similarity of the cls vector of the original document and the cls vector of the mask document corresponding to each candidate keyword and the second cosine similarity of the cls vector of the mask document corresponding to each candidate keyword and the cls vectors of the rest mask documents;
weighting the first cosine similarity and the second cosine similarity to obtain total similarity;
and screening target candidate keywords according to the total similarity.
Further, the preprocessing the original document to obtain a plurality of candidate keywords includes:
and performing word segmentation, part-of-speech tagging and stop word removal on the original document through a jieba tool.
Further, the pre-trained language characterization model is an ALBERT model.
Further, the original document cls vector is represented by the following formula:

cls = Σ_{i=1}^{n} h_i · cls_i

wherein h_i is a trainable parameter representing the weight of the i-th layer's output, cls_i represents the [cls] embedding of the i-th layer, and n is the number of model layers;

the cls vector of the mask document corresponding to the k-th candidate keyword is expressed by the analogous formula:

cls′_k = Σ_{i=1}^{n} h_i · cls′_{k,i}

the first cosine similarity is calculated using the following formula:

sim_i = (cls · cls′_i) / (‖cls‖ ‖cls′_i‖)

wherein sim_i represents the cosine similarity of the i-th mask document vector and the original document vector;

the second cosine similarity is calculated using the following formula:

sim′_{i,k} = (cls′_i · cls′_k) / (‖cls′_i‖ ‖cls′_k‖)

wherein sim′_{i,k} represents the cosine similarity of the i-th and k-th mask document vectors.
Further, the weighting the first cosine similarity and the second cosine similarity to obtain a total similarity includes:
summing cosine similarities of all mask document vectors;
setting a weighting coefficient of the first cosine similarity and the second cosine similarity;
the overall similarity is calculated according to the following formula:

SIM_i = λ · sim_i + (1 − λ) · Σ_{k≠i} sim′_{i,k}

wherein λ is a weighting coefficient with a value range of 0 to 1.
Further, the screening the target candidate keywords according to the total similarity includes:
and screening a predetermined number of target candidate keywords in ascending order of the total similarity.
The invention provides an unsupervised keyword extraction system, which is characterized by comprising:
the preprocessing module is used for preprocessing the original document to obtain a plurality of candidate keywords;
the mask operation module is used for performing mask operation on the original document according to the plurality of candidate keywords to obtain mask documents corresponding to each candidate keyword;
the vector acquisition module is used for inputting the original document and each mask document into a pre-trained language characterization model to obtain an original document cls vector and a mask document cls vector corresponding to each candidate keyword;
the computing module is used for respectively computing the first cosine similarity of the cls vector of the original document and the cls vector of the mask document corresponding to each candidate keyword and the second cosine similarity of the cls vector of the mask document corresponding to each candidate keyword and the cls vectors of the rest mask documents;
the weighting module is used for weighting the first cosine similarity and the second cosine similarity to obtain total similarity;
and the screening module is used for screening target candidate keywords according to the total similarity.
Another embodiment of the present invention also proposes a computer-readable storage medium including a stored computer program; wherein the computer program, when run, controls a device in which the computer-readable storage medium resides to perform the unsupervised keyword extraction method as described above.
Another embodiment of the present invention also proposes a terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the unsupervised keyword extraction method as described above when executing the computer program.
According to the unsupervised keyword extraction method, system, device, and medium disclosed by the embodiments of the invention, an original document is preprocessed to obtain a plurality of candidate keywords; the original document is masked according to each candidate keyword to obtain a corresponding mask document; the original document and each mask document are input into a pre-trained language characterization model to obtain the original document cls vector and the mask document cls vector corresponding to each candidate keyword; the first cosine similarity between the original document cls vector and each mask document cls vector, and the second cosine similarity between each mask document cls vector and the remaining mask document cls vectors, are calculated; the two similarities are weighted to obtain a total similarity, according to which target candidate keywords are screened. The accuracy and the diversity of keyword extraction can thereby be improved.
Drawings
FIG. 1 is a flowchart of an unsupervised keyword extraction method provided by an embodiment of the present invention;
FIG. 2 is a block diagram of an unsupervised keyword extraction system according to an embodiment of the present invention;
fig. 3 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
It should be noted that, the step numbers herein are only for convenience of explanation of the specific embodiments, and are not used as limiting the order of execution of the steps. The method provided in this embodiment may be executed by a relevant server, and the following description will take the server as an execution body as an example.
As shown in fig. 1, an unsupervised keyword extraction method according to a preferred embodiment of the present invention includes steps S1 to S6:
step S1, preprocessing an original document to obtain a plurality of candidate keywords;
In the embodiment of the invention, the jieba tool is used to segment the original document, tag parts of speech, and remove stop words, yielding the candidate words. Stop-word filtering uses a Chinese stop-word set, and words tagged n, nr, ns, nt, nw, nz, and vn — representing common nouns, person names, place names, organization names, work titles, other proper nouns, and verbal nouns, respectively — are extracted to form candidate word set A. The word segmentation, part-of-speech tagging, and stop-word removal performed on the original document in this embodiment are not limited to the jieba tool, which merely illustrates how the candidate keywords are obtained; other word segmentation tools may be chosen to perform the same preprocessing.
S2, masking operation is carried out on the original document according to the candidate keywords, and masking documents corresponding to the candidate keywords are obtained;
For the candidate word set A obtained in the previous step, the embodiment of the invention performs a masking (mask) operation for each of the n candidate words a_n in A, shielding one candidate keyword at a time and obtaining, in turn, the mask document corresponding to each candidate keyword. Through this step, the masked document keeps the same length as the original document, so the only difference in information content between the two is whether the candidate keyword is masked.
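A minimal sketch of the masking step (the helper name and placeholder character are assumptions; in practice the language model's own [MASK] token would be repeated once per masked character so the sequence length matches the original):

```python
def mask_documents(doc, candidates, mask_char="□"):
    # One masked copy of the document per candidate keyword. Replacing each
    # character of the keyword with a single placeholder keeps the masked
    # document exactly as long as the original, so the only difference in
    # information content is the masked-out candidate itself.
    return {w: doc.replace(w, mask_char * len(w)) for w in candidates}
```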
If the similarity between a mask document and the original document is high, the masked-out candidate keyword has little influence on the document, i.e., its keyword degree is low; if the similarity is low, the masked-out candidate keyword has a great influence on the document, i.e., its keyword degree is high. Therefore, after the mask documents are obtained, the embodiment of the present invention compares the similarity between the original document and each mask document, i.e., steps S3 and S4.
S3, inputting the original document and each mask document into a pre-trained language characterization model to obtain an original document cls vector and a mask document cls vector corresponding to each candidate keyword;
In this embodiment, an ALBERT model (a lightweight BERT variant) is selected to perform training and inference on the original document and the mask documents corresponding to each candidate keyword; the original document and each mask document are input into the ALBERT model to obtain final document vector representations, denoted cls and cls′ respectively,

wherein cls = (cls_1, cls_2, …, cls_n) and cls′ = (cls′_1, cls′_2, …, cls′_n).
The ALBERT model adopted in this embodiment makes full use of the intermediate layers when encoding a document: the output of every layer is used, and the final embedded representation of the document is obtained by a weighted sum over the per-layer outputs. Compared with the existing approach of taking only the last layer of the pre-trained model as output, no intermediate-layer information is lost, and the accuracy of training and inference is greatly improved.
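The layer-weighted fusion described above can be sketched with NumPy as follows; normalizing the trainable weights h_i with a softmax is an assumption, since the embodiment only states that they are trainable:

```python
import numpy as np

def weighted_cls(layer_cls, h):
    """Fuse per-layer [cls] embeddings into a single document vector.

    layer_cls: array of shape (n_layers, hidden), the [cls] output of each layer
    h:         array of shape (n_layers,), the trainable layer weights h_i
    """
    w = np.exp(h - h.max())          # numerically stable softmax (assumption)
    w = w / w.sum()
    return (w[:, None] * layer_cls).sum(axis=0)
```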
Step S4, respectively calculating the first cosine similarity of the cls vector of the original document and the cls vector of the mask document corresponding to each candidate keyword and the second cosine similarity of the cls vector of the mask document corresponding to each candidate keyword and the cls vector of the rest mask documents;
the original document cls vector is expressed by the following formula:
wherein h is i As a trainable parameter, representing the weight output by the ith layer; cls i Cls representing the i-th layer]Embedding the representation; n is the number of layers of the Albert model, and n=12 is preferred in this embodiment.
The cls vector of the mask document corresponding to each candidate keyword is expressed by the following formula:
the first cosine similarity is calculated using the following formula:
wherein sim is i Representing cosine similarity of the ith mask document vector and the original document vector;
the second cosine similarity is calculated using the following formula:
wherein,representing the cosine similarity of the ith and kth mask-document vectors.
The purpose of the second similarity is to make the selected keywords as diverse as possible, i.e., to favor candidates whose semantic difference from the other candidates is larger.
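The two similarity terms can be computed directly from the fused document vectors; this sketch assumes the vectors are plain NumPy arrays:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_terms(cls_orig, cls_masked):
    # first term: each mask document vector against the original document
    sim = [cosine(cls_orig, m) for m in cls_masked]
    # second term: each mask document vector against every other one
    sim_pairs = [[cosine(mi, mk) for mk in cls_masked] for mi in cls_masked]
    return sim, sim_pairs
```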
Step S5, weighting the first cosine similarity and the second cosine similarity to obtain total similarity;
Specifically, this embodiment sums the cosine similarities between each mask document vector and the remaining mask document vectors, sets a weighting coefficient between the first cosine similarity and the second cosine similarity, and calculates the total similarity according to the following formula:

SIM_i = λ · sim_i + (1 − λ) · Σ_{k≠i} sim′_{i,k}

wherein λ is a weighting coefficient with a value range of 0 to 1.
In this embodiment, the weighting coefficient λ is an adjustable parameter. When λ is less than 0.5, the similarity calculation pays more attention to the diversity of the extracted candidate keywords: the extracted vocabulary is more varied, but its relevance to the original document may be weakened. When λ is greater than 0.5, the relevance of the extracted candidate keywords to the original document is emphasized, but diversity may be impaired. λ is adjusted according to the actual situation.
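A sketch of the weighting step, following the sum form described above (the function name is an assumption; a 1/(n−1) normalization of the diversity term could be added to keep both terms on the same scale):

```python
def total_similarity(sim, sim_pairs, lam=0.5):
    # SIM_i = lam * sim_i + (1 - lam) * sum_{k != i} sim'_{i,k};
    # smaller lam emphasizes diversity, larger lam emphasizes relevance
    n = len(sim)
    total = []
    for i in range(n):
        diversity = sum(sim_pairs[i][k] for k in range(n) if k != i)
        total.append(lam * sim[i] + (1.0 - lam) * diversity)
    return total
```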
And S6, screening target candidate keywords according to the total similarity.
The smaller SIM_i is, the more important the keyword is; a predetermined number of target candidate keywords are screened in ascending order of the total similarity.
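The final screening can be sketched as (names assumed for illustration):

```python
def screen_keywords(candidates, total_sim, top_k=5):
    # smaller total similarity => masking the word changes the document more
    # and the word overlaps less with other candidates, i.e. a stronger keyword
    ranked = sorted(zip(candidates, total_sim), key=lambda p: p[1])
    return [w for w, _ in ranked[:top_k]]
```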
In summary, this embodiment processes the original document with the jieba tool to obtain candidate keywords, performs a masking operation for each candidate, and inputs the original document and each mask document into the ALBERT model, where the final vector representation uses the outputs of the intermediate layers to improve accuracy. The final similarity combines the cosine similarity between the original document vector and each candidate's mask document vector with the cosine similarity between that mask document vector and the remaining mask document vectors; screening keywords by this final similarity yields extracted keywords that are both diverse and important.
As shown in fig. 2, the embodiment of the present invention further provides an unsupervised keyword extraction system, configured to perform an unsupervised keyword extraction method as described above, where the system includes:
a preprocessing module 21, configured to preprocess an original document to obtain a plurality of candidate keywords;
the mask operation module 22 is configured to perform a mask operation on the original document according to the plurality of candidate keywords, so as to obtain a mask document corresponding to each candidate keyword;
the vector obtaining module 23 is configured to input the original document and each of the mask documents into a pre-trained language representation model, so as to obtain an original document cls vector and a mask document cls vector corresponding to each candidate keyword;
a calculating module 24, configured to calculate a first cosine similarity of the cls vector of the original document and the cls vector of the mask document corresponding to each candidate keyword, and a second cosine similarity of the cls vector of the mask document corresponding to each candidate keyword and the cls vectors of the remaining mask documents, respectively;
a weighting module 25, configured to weight the first cosine similarity and the second cosine similarity to obtain a total similarity;
and a screening module 26, configured to screen the target candidate keywords according to the total similarity.
The technical features and technical effects of the unsupervised keyword extraction system provided by the embodiment of the present invention are the same as those of the unsupervised keyword extraction method provided by the embodiment of the present invention, and are not repeated here. The modules in the above-described unsupervised keyword extraction system may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
The embodiment of the invention also provides a computer readable storage medium, which comprises a stored computer program; wherein the computer program, when run, controls a device in which the computer-readable storage medium resides to perform an unsupervised keyword extraction method as described above.
As shown in fig. 3, the embodiment of the present invention further provides a computer device, and fig. 3 is a block diagram of a preferred embodiment of the computer device provided by the present invention, where the computer device includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor implements an unsupervised keyword extraction method as described above when executing the computer program.
Preferably, the computer program may be divided into one or more modules/units (e.g. computer program 1, computer program 2, … …) stored in the memory and executed by the processor to complete the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments describe the execution of the computer program in the computer device.
The processor may be a central processing unit (Central Processing Unit, CPU), or may be other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc., or the general purpose processor may be a microprocessor, or any conventional processor, which is the control center of the terminal device, that connects the various parts of the terminal device using various interfaces and lines.
The memory mainly includes a program storage area, which may store an operating system, an application program required for at least one function, and the like, and a data storage area, which may store related data and the like. In addition, the memory may be a high-speed random access memory, or a non-volatile memory such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card, or other non-volatile solid-state memory devices.
It should be noted that the above-mentioned terminal device may include, but is not limited to, a processor, a memory, and those skilled in the art will understand that the structural block diagram of fig. 3 is merely an example of the terminal device, and does not constitute limitation of the terminal device, and may include more or less components than those illustrated, or may combine some components, or different components.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and substitutions without departing from the spirit of the present invention, and such modifications and substitutions should also be considered to fall within the scope of the present invention.
Claims (9)
1. An unsupervised keyword extraction method, comprising:
preprocessing an original document to obtain a plurality of candidate keywords;
masking operation is carried out on the original document according to the candidate keywords respectively, so that a masking document corresponding to each candidate keyword is obtained;
inputting the original document and each mask document into a pre-trained language characterization model to obtain an original document cls vector and a mask document cls vector corresponding to each candidate keyword;
respectively calculating the first cosine similarity of the cls vector of the original document and the cls vector of the mask document corresponding to each candidate keyword and the second cosine similarity of the cls vector of the mask document corresponding to each candidate keyword and the cls vectors of the rest mask documents;
weighting the first cosine similarity and the second cosine similarity to obtain total similarity;
and screening target candidate keywords according to the total similarity.
2. The method for extracting unsupervised keywords of claim 1, wherein the preprocessing the original document to obtain a plurality of candidate keywords comprises:
and performing word segmentation, part-of-speech tagging and stop word removal on the original document through a jieba tool.
3. The method for extracting an unsupervised keyword according to claim 1, wherein the pre-trained language characterization model is an ALBERT model.
4. The method for extracting unsupervised keywords according to claim 1, wherein the cls vector of the original document is represented by the following formula:

cls = Σ_{i=1}^{n} h_i · cls_i

wherein h_i is a trainable parameter representing the weight of the i-th layer's output, and cls_i represents the [cls] embedding of the i-th layer;

the cls vector of the mask document corresponding to the k-th candidate keyword is expressed by the analogous formula:

cls′_k = Σ_{i=1}^{n} h_i · cls′_{k,i}

the first cosine similarity is calculated using the following formula:

sim_i = (cls · cls′_i) / (‖cls‖ ‖cls′_i‖)

wherein sim_i represents the cosine similarity of the i-th mask document vector and the original document vector;

the second cosine similarity is calculated using the following formula:

sim′_{i,k} = (cls′_i · cls′_k) / (‖cls′_i‖ ‖cls′_k‖)

wherein sim′_{i,k} represents the cosine similarity of the i-th and k-th mask document vectors.
5. The method for extracting an unsupervised keyword according to claim 4, wherein weighting the first cosine similarity and the second cosine similarity to obtain a total similarity comprises:
summing cosine similarities of all mask document vectors;
setting a weighting coefficient of the first cosine similarity and the second cosine similarity;
the overall similarity is calculated according to the following formula:

SIM_i = λ · sim_i + (1 − λ) · Σ_{k≠i} sim′_{i,k}

wherein λ is a weighting coefficient with a value range of 0 to 1.
6. The method for extracting an unsupervised keyword according to claim 1, wherein the screening the target candidate keywords according to the total similarity comprises:
and screening a predetermined number of target candidate keywords in ascending order of the total similarity.
7. An unsupervised keyword extraction system, the system comprising:
the preprocessing module is used for preprocessing the original document to obtain a plurality of candidate keywords;
the mask operation module is used for performing mask operation on the original document according to the plurality of candidate keywords to obtain mask documents corresponding to each candidate keyword;
the vector acquisition module is used for inputting the original document and each mask document into a pre-trained language characterization model to obtain an original document cls vector and a mask document cls vector corresponding to each candidate keyword;
the computing module is used for respectively computing the first cosine similarity of the cls vector of the original document and the cls vector of the mask document corresponding to each candidate keyword and the second cosine similarity of the cls vector of the mask document corresponding to each candidate keyword and the cls vectors of the rest mask documents;
the weighting module is used for weighting the first cosine similarity and the second cosine similarity to obtain total similarity;
and the screening module is used for screening target candidate keywords according to the total similarity.
8. A computer device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the unsupervised keyword extraction method of any one of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, wherein the computer readable storage medium comprises a stored computer program; wherein the computer program, when run, controls a device in which the computer-readable storage medium is located to perform the unsupervised keyword extraction method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311628915.2A CN117669561A (en) | 2023-11-30 | 2023-11-30 | Unsupervised keyword extraction method, system, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311628915.2A CN117669561A (en) | 2023-11-30 | 2023-11-30 | Unsupervised keyword extraction method, system, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117669561A (en) | 2024-03-08 |
Family
ID=90070770
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311628915.2A (Pending) | Unsupervised keyword extraction method, system, equipment and medium | 2023-11-30 | 2023-11-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117669561A (en) |
- 2023-11-30: CN application CN202311628915.2A filed (publication CN117669561A), status Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||