CN112597776A - Keyword extraction method and system - Google Patents

Keyword extraction method and system

Info

Publication number
CN112597776A
CN112597776A CN202110251354.3A CN202110251354A CN112597776A CN 112597776 A CN112597776 A CN 112597776A CN 202110251354 A CN202110251354 A CN 202110251354A CN 112597776 A CN112597776 A CN 112597776A
Authority
CN
China
Prior art keywords
keyword
vocabulary
module
keyword extraction
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110251354.3A
Other languages
Chinese (zh)
Inventor
郑志军
程国艮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Tone Communication Technology Co ltd
Original Assignee
Global Tone Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Global Tone Communication Technology Co ltd
Priority to CN202110251354.3A
Publication of CN112597776A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a keyword extraction method and system. The method comprises: segmenting a news text into a sequence in which words are the minimum semantic units; performing entity recognition on the news text; combining at least two adjacent words in the sequence to obtain a combined word, judging whether the combined word is one of the recognized entity words, and, if it is, replacing the words that were combined with that entity word in the sequence; extracting a first candidate keyword vocabulary set from the sequence with a keyword extraction algorithm based on a word graph model, and extracting a second candidate keyword vocabulary set from the sequence with a keyword extraction algorithm based on statistical features; and obtaining the intersection of the first and second candidate keyword vocabulary sets. The keyword extraction method and system can extract keywords quickly and accurately.

Description

Keyword extraction method and system
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a keyword extraction method and system.
Background
With the popularization of networks, more and more people obtain information through the internet. Reading news has become part of daily life, but the network is flooded with text data, so how to help people browse news quickly and grasp its main idea has long been a research hotspot.
Keyword extraction is a common task in the field of natural language processing (NLP). It extracts the words most relevant to the meaning of an article, so that a user can quickly grasp the main idea of the text by reading its keywords; the development of this technology has reduced, to some extent, the time people spend browsing information. At present, common keyword extraction methods fall into two categories: unsupervised and supervised.
An unsupervised keyword extraction method extracts candidate words, scores each candidate, and outputs the highest-scoring candidates as keywords. Depending on the scoring strategy, such methods can be divided into keyword extraction based on a word graph model, keyword extraction based on statistical features, and keyword extraction based on a topic model. Extraction based on a word graph model constructs a word network graph of the document, analyzes the words in the graph, and finds the words or phrases that play an important role in it; these words or phrases are the keywords of the document. Extraction based on statistical features uses the statistical information of the words in the document. Extraction based on a topic model mainly uses the topic distribution properties of the topic model.
A supervised keyword extraction method treats keyword extraction as a classification task or a sequence labeling task. In the classification task, candidate words are extracted first, and each candidate is then put through binary classification to judge whether it is a keyword. In the sequence labeling task, an algorithm labels the minimum semantic units of the text (characters, words, and the like) and extracts the keywords by combining the labels.
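The sequence labeling formulation can be illustrated with a minimal sketch. It assumes a BIO tagging scheme (B = beginning of a keyword, I = inside, O = outside), which the text above does not name; the function `decode_bio` and the sample labels are hypothetical, and the labels would normally come from a trained tagger.

```python
def decode_bio(tokens, labels, joiner=" "):
    """Combine BIO labels into keyword phrases (assumed tagging scheme)."""
    phrases, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":
            # A new keyword starts; flush any keyword still being built.
            if current:
                phrases.append(joiner.join(current))
            current = [token]
        elif label == "I" and current:
            # Continue the keyword that is currently open.
            current.append(token)
        else:
            # "O" (or a stray "I") closes the open keyword, if any.
            if current:
                phrases.append(joiner.join(current))
            current = []
    if current:
        phrases.append(joiner.join(current))
    return phrases


tokens = ["central", "bank", "raises", "interest", "rates"]
labels = ["B", "I", "O", "B", "I"]  # hypothetical tagger output
keywords = decode_bio(tokens, labels)
```

For unsegmented Chinese text labeled per character, `joiner=""` would be the natural choice.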
In the course of realizing the invention, the inventors found that keyword extraction based on supervised learning requires costly manual labeling of corpora, which makes it difficult to apply on a large scale. Methods based on unsupervised learning need no manually labeled training set and are therefore faster, but they perform poorly because of word segmentation errors, the failure to combine different kinds of information effectively when screening keywords, and the illogical ordering of the extracted keywords.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention aims to provide a keyword extraction method and a keyword extraction system, which can quickly and accurately extract keywords.
In order to achieve the above object, the present invention provides a keyword extraction method, which includes: segmenting a news text into a sequence with words as the minimum semantic units; performing entity recognition on the news text and extracting each entity word; combining at least two adjacent words in the sequence to obtain a combined word, judging whether the combined word is one of the entity words, and, if it is, replacing the words that were combined with that entity word in the sequence; extracting candidate keyword words from the sequence by a keyword extraction algorithm based on a word graph model so as to obtain a first candidate keyword vocabulary set, and extracting candidate keyword words from the sequence by a keyword extraction algorithm based on statistical features so as to obtain a second candidate keyword vocabulary set; and obtaining the intersection of the first candidate keyword vocabulary set and the second candidate keyword vocabulary set.
In an embodiment of the present invention, the keyword extraction method further includes: sorting the candidate keyword words in the intersection according to the order in which they appear in the news text, so as to obtain a third candidate keyword vocabulary set.
In an embodiment of the present invention, the keyword extraction method further includes: sorting the candidate keyword words in the intersection according to a linguistic rule, so as to obtain a third candidate keyword vocabulary set.
In an embodiment of the present invention, the keyword extraction method further includes: calculating the mutual information of two adjacent words in the third candidate keyword vocabulary set, and combining two adjacent words whose mutual information value is larger than a preset threshold into one word, so as to obtain a final keyword vocabulary set.
Based on the same inventive concept, the invention also provides a keyword extraction system, which comprises: a word segmentation module, an entity recognition module, a first combination module, a first keyword extraction algorithm module, a second keyword extraction algorithm module and an intersection solving module. The word segmentation module is used for segmenting the news text into a sequence with words as the minimum semantic units; the entity recognition module is coupled with the word segmentation module and is used for performing entity recognition on the news text and extracting each entity word; the first combination module is coupled with the word segmentation module and the entity recognition module and is used for combining at least two adjacent words in the sequence to obtain a combined word, judging whether the combined word is one of the entity words, and, if it is, replacing the words that were combined with that entity word in the sequence; the first keyword extraction algorithm module is coupled with the first combination module and is used for extracting candidate keyword words from the sequence by a keyword extraction algorithm based on a word graph model so as to obtain a first candidate keyword vocabulary set; the second keyword extraction algorithm module is coupled with the first combination module and is used for extracting candidate keyword words from the sequence by a keyword extraction algorithm based on statistical features so as to obtain a second candidate keyword vocabulary set; and the intersection solving module is coupled with the first keyword extraction algorithm module and the second keyword extraction algorithm module and is used for obtaining the intersection of the first candidate keyword vocabulary set and the second candidate keyword vocabulary set.
In an embodiment of the present invention, the keyword extraction system further includes: a sorting module, coupled with the intersection solving module and used for sorting the candidate keyword words in the intersection according to the order in which they appear in the news text, so as to obtain a third candidate keyword vocabulary set.
In an embodiment of the present invention, the keyword extraction system further includes: a sorting module, coupled with the intersection solving module and used for sorting the candidate keyword words in the intersection according to a linguistic rule, so as to obtain a third candidate keyword vocabulary set.
In an embodiment of the present invention, the keyword extraction system further includes: a second combination module, coupled with the sorting module and used for calculating the mutual information of two adjacent words in the third candidate keyword vocabulary set and combining two adjacent words whose mutual information value is larger than a preset threshold into one word, so as to obtain a final keyword vocabulary set.
Based on the same inventive concept, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the keyword extraction method according to any of the above embodiments.
Based on the same inventive concept, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the keyword extraction method according to any one of the above embodiments.
Compared with the prior art, the keyword extraction method and system of the invention need no labeled corpus: wrongly segmented words are repaired by entity recognition, candidate keyword sets are screened out with a word graph model and with statistical features respectively, and the intersection of the two sets is taken, so that keywords can be extracted quickly and accurately. Preferably, the keywords in the intersection are also sorted, and words are combined by means of mutual information, which further improves the accuracy of keyword extraction.
Drawings
FIG. 1 is a block diagram of the steps of a keyword extraction method according to an embodiment of the present invention;
FIG. 2 is a block diagram of the steps of a keyword extraction method according to an embodiment of the present invention;
FIG. 3 is a block diagram of a keyword extraction system according to an embodiment of the present invention;
FIG. 4 is a block diagram of a keyword extraction system according to an embodiment of the present invention.
Detailed Description
The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.
Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.
In the course of realizing the invention, the inventors found that keyword extraction based on supervised learning requires costly manual labeling of corpora, which makes it difficult to apply on a large scale. Methods based on unsupervised learning need no manually labeled training set and are therefore faster, but they perform poorly because of word segmentation errors, the failure to combine different kinds of information effectively when screening keywords, and the illogical ordering of the extracted keywords.
In order to overcome the problems, the invention provides a keyword extraction method which can quickly and accurately extract keywords.
FIG. 1 shows a keyword extraction method according to an embodiment of the present invention. The method comprises steps S1 to S5.
In step S1, the news text is segmented into a sequence with words as the minimum semantic units. The inventors also found that conventional word segmentation methods tend to split entity words apart, which hinders accurate keyword extraction. In this embodiment, therefore, the text is segmented first, the segmented words are then combined and checked against the recognized entities, and where a combination is an entity the original words are replaced directly by that entity.
Thus, in step S2, entity recognition is performed on the news text and each entity word is extracted. In step S3, at least two adjacent words in the sequence are combined to obtain a combined word, and it is judged whether the combined word is one of the entity words; if it is, the words that were combined are replaced in the sequence with that entity word.
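Step S3 can be sketched as follows. This is only an illustration under stated assumptions: the set of entity words is taken as already produced by some entity recognizer in step S2, and the function name `merge_entities`, the `max_span` limit, and the longest-match-first strategy are hypothetical choices that the description does not specify.

```python
def merge_entities(tokens, entities, max_span=4, joiner=""):
    """Replace runs of adjacent tokens that spell an entity with that entity.

    tokens:   segmented word sequence (step S1)
    entities: entity words found by entity recognition (step S2)
    max_span: assumed upper bound on how many adjacent tokens are combined
    joiner:   "" suits Chinese text; " " suits space-delimited languages
    """
    merged, i = [], 0
    while i < len(tokens):
        # Try the longest combination first so longer entities take priority.
        for n in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = joiner.join(tokens[i:i + n])
            if candidate in entities:
                merged.append(candidate)  # the entity replaces its parts
                i += n
                break
        else:
            # No combination starting here is an entity; keep the word as is.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = ["Global", "Tone", "released", "a", "new", "system"]
entities = {"Global Tone"}
sequence = merge_entities(tokens, entities, joiner=" ")
```

The later extraction steps then operate on `sequence`, in which the wrongly split entity appears as a single word.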
In step S4, a keyword extraction algorithm based on a word graph model extracts candidate keyword words from the sequence to obtain a first candidate keyword vocabulary set, and a keyword extraction algorithm based on statistical features extracts candidate keyword words from the sequence to obtain a second candidate keyword vocabulary set.
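The description does not name the two algorithms used in step S4. As a rough sketch only, the example below assumes a TextRank-style ranking (PageRank over a word co-occurrence graph) for the word graph model and plain relative term frequency for the statistical features; the function names, the window size, the damping factor, and the cut-off `k` are all hypothetical, and stop-word filtering is omitted.

```python
from collections import Counter, defaultdict


def textrank_scores(tokens, window=3, damping=0.85, iterations=30):
    """Score words by PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the words that follow it inside the window.
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }
    return scores


def frequency_scores(tokens):
    """Score words by their relative frequency in the document."""
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}


def top_k(scores, k):
    """Return the k highest-scoring words as a candidate set."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


sequence = ["bank", "raises", "rates", "bank", "cuts", "rates"]
first_set = top_k(textrank_scores(sequence), k=3)    # word graph model
second_set = top_k(frequency_scores(sequence), k=3)  # statistical features
```

A real system would more likely use a statistic such as TF-IDF computed against a background corpus; term frequency is used here only to keep the sketch self-contained.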
In step S5, an intersection of the first candidate keyword vocabulary set and the second candidate keyword vocabulary set is obtained.
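Step S5 amounts to a set intersection. The sketch below assumes that each algorithm yields a word-to-score mapping and that the top `k` words of each form its candidate set; the function name, the scores, and the value of `k` are hypothetical.

```python
def candidate_intersection(graph_scores, stat_scores, k=10):
    """Intersect the top-k candidates of the two extraction algorithms."""
    def top_k(scores):
        return set(sorted(scores, key=scores.get, reverse=True)[:k])

    return top_k(graph_scores) & top_k(stat_scores)


# Hypothetical scores from the two algorithms of step S4.
graph_scores = {"bank": 0.9, "rates": 0.8, "raises": 0.3, "cuts": 0.2}
stat_scores = {"bank": 0.4, "cuts": 0.3, "rates": 0.2, "raises": 0.1}
shared = candidate_intersection(graph_scores, stat_scores, k=2)
```

Only words that both algorithms rank highly survive, which is what lets one algorithm filter out the other's false positives.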
The keyword extraction method of this embodiment therefore needs no labeled corpus. It repairs wrongly segmented words by means of entity recognition, screens out candidate keyword vocabulary sets with a word graph model and with a statistical feature algorithm respectively, and takes the intersection of the two sets as the keyword extraction result, which effectively improves the accuracy of keyword extraction while keeping it fast.
Preferably, the keyword extraction method further includes: step S6 and step S7.
In step S6, the candidate keyword words in the intersection are sorted to obtain a third candidate keyword vocabulary set. The keywords in the intersection generally have no logical order. In one embodiment, the candidates are therefore sorted in step S6 by the order in which they appear in the news text, which keeps the computation fast. In another embodiment, the candidates in the intersection may be sorted in step S6 according to a linguistic rule (e.g., an n-gram judgment method), which gives a more logical ordering and improves the accuracy of the combination performed in the subsequent step S7.
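The first sorting variant of step S6 can be sketched as below, assuming that "order of appearance" means the position of each candidate's first occurrence in the segmented sequence; the function name is hypothetical. The n-gram variant is not sketched because the description gives no details of the rule.

```python
def order_by_first_appearance(candidates, tokens):
    """Sort candidate keywords by where they first occur in the text."""
    first_index = {}
    for position, token in enumerate(tokens):
        # setdefault keeps the earliest position of each word.
        first_index.setdefault(token, position)
    present = [word for word in candidates if word in first_index]
    return sorted(present, key=first_index.__getitem__)


sequence = ["bank", "raises", "rates", "bank", "cuts", "rates"]
third_set = order_by_first_appearance({"rates", "cuts", "bank"}, sequence)
```

This takes a single pass over the text plus a sort of the (small) candidate set, which is consistent with the speed advantage mentioned above.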
In step S7, the mutual information of each pair of adjacent words in the third candidate keyword vocabulary set is calculated, and any two adjacent words whose mutual information value is greater than a preset threshold are combined into one word, so as to obtain the final keyword vocabulary set. Combining words by mutual information merges words that commonly occur together in news texts, which can further improve the accuracy of keyword extraction. For example, current-affairs and political news often contains specialized terms that are not entities but are split apart by word segmentation; this embodiment can combine the two parts of such a term into one word.
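A minimal sketch of step S7 follows. It assumes that "mutual information" means pointwise mutual information, PMI(x, y) = log(p(x, y) / (p(x) p(y))), estimated from unigram and bigram counts of the document itself; the description fixes neither the estimator nor the threshold, so the function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def pmi(x, y, tokens):
    """Pointwise mutual information of the adjacent pair (x, y) in tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    if bigrams[(x, y)] == 0:
        return float("-inf")  # the pair never occurs adjacently
    p_xy = bigrams[(x, y)] / (len(tokens) - 1)
    p_x = unigrams[x] / len(tokens)
    p_y = unigrams[y] / len(tokens)
    return math.log(p_xy / (p_x * p_y))


def merge_by_pmi(ordered, tokens, threshold, joiner=" "):
    """Combine neighbouring candidates whose PMI exceeds the threshold."""
    merged, i = [], 0
    while i < len(ordered):
        if i + 1 < len(ordered) and pmi(ordered[i], ordered[i + 1], tokens) > threshold:
            merged.append(joiner.join(ordered[i:i + 2]))
            i += 2  # both words are consumed by the merge
        else:
            merged.append(ordered[i])
            i += 1
    return merged


tokens = ["trade", "war", "hits", "markets", "as", "trade", "war", "widens"]
third_set = ["trade", "war", "markets"]  # sorted candidates from step S6
final_keywords = merge_by_pmi(third_set, tokens, threshold=1.0)
```

In practice the counts would more plausibly come from a large news corpus than from a single document, since a single text gives very sparse bigram statistics.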
Based on the same inventive concept, an embodiment further provides a keyword extraction system, which includes: the system comprises a word segmentation module 10, an entity recognition module 11, a first combination module 12, a first keyword extraction algorithm module 13, a second keyword extraction algorithm module 14 and an intersection solving module 15.
The word segmentation module 10 is configured to segment the news text into a sequence with words as the minimum semantic units.
The entity recognition module 11 is coupled to the word segmentation module 10, and is configured to perform entity recognition on the news text and extract each entity word.
The first combining module 12 is coupled to the word segmentation module 10 and the entity recognition module 11, and configured to combine at least two words adjacent to each other in the sequence to obtain a combined vocabulary, determine whether the combined vocabulary is a certain entity vocabulary, and if the combined vocabulary is the certain entity vocabulary, replace each word before the certain entity vocabulary is combined with the certain entity vocabulary in the sequence.
The first keyword extraction algorithm module 13 is coupled to the first combination module 12, and configured to extract candidate keyword words from the sequence by a keyword extraction algorithm based on a word graph model, so as to obtain a first candidate keyword vocabulary set.
The second keyword extraction algorithm module 14 is coupled to the first combination module 12, and configured to extract candidate keyword words from the sequence by a keyword extraction algorithm based on statistical features, so as to obtain a second candidate keyword vocabulary set.
The intersection finding module 15 is coupled to both the first keyword extraction algorithm module 13 and the second keyword extraction algorithm module 14, and is configured to find an intersection of the first candidate keyword vocabulary set and the second candidate keyword vocabulary set.
Preferably, the keyword extraction system of an embodiment further includes: a sorting module 16 and a second combining module 17.
The sorting module 16 is coupled to the intersection solving module 15, and configured to sort each candidate keyword vocabulary in the intersection according to a sequence of the candidate keyword vocabulary appearing in the news text, so as to obtain a third candidate keyword vocabulary set. In other embodiments, the sorting module 16 is configured to sort each candidate keyword vocabulary in the intersection according to a linguistic rule, so as to obtain a third candidate keyword vocabulary set.
The second combination module 17 is coupled to the sorting module 16, and configured to calculate the mutual information of two adjacent words in the third candidate keyword vocabulary set and to combine two adjacent words whose mutual information value is larger than a preset threshold into one word, so as to obtain a final keyword vocabulary set.
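The coupling between the modules can be pictured as a pipeline of pluggable functions. The sketch below is an assumption-laden illustration, not the system's actual interface: the class `KeywordExtractionSystem`, its field names, and the trivial stand-in functions are hypothetical; only the reference numerals in the comments come from the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class KeywordExtractionSystem:
    segment: Callable[[str], List[str]]                            # module 10
    recognize_entities: Callable[[str], Set[str]]                  # module 11
    combine_entities: Callable[[List[str], Set[str]], List[str]]   # module 12
    graph_extract: Callable[[List[str]], Set[str]]                 # module 13
    stat_extract: Callable[[List[str]], Set[str]]                  # module 14
    sort_candidates: Callable[[Set[str], List[str]], List[str]]    # module 16
    merge_candidates: Callable[[List[str], List[str]], List[str]]  # module 17

    def run(self, text: str) -> List[str]:
        sequence = self.combine_entities(
            self.segment(text), self.recognize_entities(text)
        )
        # Intersection solving module 15: keep words both algorithms chose.
        shared = self.graph_extract(sequence) & self.stat_extract(sequence)
        ordered = self.sort_candidates(shared, sequence)
        return self.merge_candidates(ordered, sequence)


# Trivial stand-ins, used only to show how the modules are wired together.
system = KeywordExtractionSystem(
    segment=str.split,
    recognize_entities=lambda text: set(),
    combine_entities=lambda tokens, entities: tokens,
    graph_extract=lambda seq: set(seq),
    stat_extract=lambda seq: {word for word in seq if len(word) > 4},
    sort_candidates=lambda shared, seq: sorted(shared, key=seq.index),
    merge_candidates=lambda ordered, seq: ordered,
)
result = system.run("bank raises interest rates")
```

Passing each module in as a callable mirrors the description's point that the sorting module has alternative embodiments: a different `sort_candidates` can be swapped in without touching the rest of the pipeline.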
Based on the same inventive concept, an embodiment further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the keyword extraction method according to any of the above embodiments when executing the program.
Based on the same inventive concept, an embodiment further provides a non-transitory computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the keyword extraction method according to any of the above embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (10)

1. A keyword extraction method is characterized by comprising the following steps:
segmenting a news text into words so as to segment the news text into a sequence with the words as minimum semantic units;
carrying out entity recognition on the news text and extracting each entity vocabulary;
combining at least two words adjacent to each other in the sequence to obtain a combined vocabulary, judging whether the combined vocabulary is a certain entity vocabulary, and if the combined vocabulary is the certain entity vocabulary, replacing each word before the certain entity vocabulary is combined with the certain entity vocabulary in the sequence;
extracting candidate keyword vocabularies from the sequence by a keyword extraction algorithm based on a word graph model so as to obtain a first candidate keyword vocabulary set, and extracting candidate keyword vocabularies from the sequence by a keyword extraction algorithm based on statistical characteristics so as to obtain a second candidate keyword vocabulary set; and
obtaining the intersection of the first candidate keyword vocabulary set and the second candidate keyword vocabulary set.
2. The keyword extraction method according to claim 1, further comprising:
sorting the candidate keyword vocabularies in the intersection according to the order in which they appear in the news text, so as to obtain a third candidate keyword vocabulary set.
3. The keyword extraction method according to claim 1, further comprising:
sorting the candidate keyword vocabularies in the intersection according to a linguistic rule, so as to obtain a third candidate keyword vocabulary set.
4. The keyword extraction method according to claim 2 or 3, characterized by further comprising:
calculating the mutual information of two adjacent words in the third candidate keyword vocabulary set, and combining two adjacent words whose mutual information value is larger than a preset threshold into one word, so as to obtain a final keyword vocabulary set.
5. A keyword extraction system, comprising:
a word segmentation module, used for segmenting the news text into a sequence with words as the minimum semantic units;
an entity recognition module, coupled with the word segmentation module and used for carrying out entity recognition on the news text and extracting each entity word;
a first combination module, coupled with the word segmentation module and the entity recognition module and used for combining at least two words adjacent to each other in the sequence to obtain a combined vocabulary, judging whether the combined vocabulary is a certain entity vocabulary, and if the combined vocabulary is the certain entity vocabulary, replacing the words that were combined with the certain entity vocabulary in the sequence;
a first keyword extraction algorithm module, coupled with the first combination module and used for extracting candidate keyword vocabularies from the sequence by a keyword extraction algorithm based on a word graph model, so as to obtain a first candidate keyword vocabulary set;
a second keyword extraction algorithm module, coupled with the first combination module and used for extracting candidate keyword vocabularies from the sequence by a keyword extraction algorithm based on statistical characteristics, so as to obtain a second candidate keyword vocabulary set; and
an intersection solving module, coupled with the first keyword extraction algorithm module and the second keyword extraction algorithm module and used for obtaining the intersection of the first candidate keyword vocabulary set and the second candidate keyword vocabulary set.
6. The keyword extraction system of claim 5, wherein the keyword extraction system further comprises:
a sorting module, coupled with the intersection solving module and used for sorting the candidate keyword vocabularies in the intersection according to the order in which they appear in the news text, so as to obtain a third candidate keyword vocabulary set.
7. The keyword extraction system of claim 5, wherein the keyword extraction system further comprises:
a sorting module, coupled with the intersection solving module and used for sorting the candidate keyword vocabularies in the intersection according to a linguistic rule, so as to obtain a third candidate keyword vocabulary set.
8. The keyword extraction system according to claim 6 or 7, wherein the keyword extraction system further comprises:
a second combination module, coupled with the sorting module and used for calculating the mutual information of two adjacent words in the third candidate keyword vocabulary set and combining two adjacent words whose mutual information value is larger than a preset threshold into one word, so as to obtain a final keyword vocabulary set.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 4 are implemented when the processor executes the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN202110251354.3A 2021-03-08 2021-03-08 Keyword extraction method and system Pending CN112597776A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110251354.3A CN112597776A (en) 2021-03-08 2021-03-08 Keyword extraction method and system

Publications (1)

Publication Number Publication Date
CN112597776A true CN112597776A (en) 2021-04-02

Family

ID=75210177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110251354.3A Pending CN112597776A (en) 2021-03-08 2021-03-08 Keyword extraction method and system

Country Status (1)

Country Link
CN (1) CN112597776A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170091318A1 (en) * 2015-09-29 2017-03-30 Kabushiki Kaisha Toshiba Apparatus and method for extracting keywords from a single document
CN107102985A (en) * 2017-04-23 2017-08-29 四川用联信息技术有限公司 Multi-threaded keyword extraction techniques in improved document
CN108009149A (en) * 2017-11-23 2018-05-08 东软集团股份有限公司 A kind of keyword extracting method, extraction element, medium and electronic equipment
CN109241525A (en) * 2018-08-20 2019-01-18 深圳追科技有限公司 Extracting method, the device and system of keyword
CN110019675A (en) * 2017-12-01 2019-07-16 北京搜狗科技发展有限公司 A kind of method and device of keyword extraction
CN112215008A (en) * 2020-10-23 2021-01-12 中国平安人寿保险股份有限公司 Entity recognition method and device based on semantic understanding, computer equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
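Yuan Jinsheng et al.: "Keyword extraction method for Chinese news web pages based on combined features", Computer Engineering and Applications *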

Similar Documents

Publication Publication Date Title
CN106649818B (en) Application search intention identification method and device, application search method and server
CN107729468B (en) Answer extraction method and system based on deep learning
CN111309912A (en) Text classification method and device, computer equipment and storage medium
CN104881458B (en) A kind of mask method and device of Web page subject
CN112163424A (en) Data labeling method, device, equipment and medium
CN112214984A (en) Content plagiarism identification method, device, equipment and storage medium
CN113821605A (en) Event extraction method
CN113282754A (en) Public opinion detection method, device, equipment and storage medium for news events
CN116304020A (en) Industrial text entity extraction method based on semantic source analysis and span characteristics
CN110413998B (en) Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof
CN113204956B (en) Multi-model training method, abstract segmentation method, text segmentation method and text segmentation device
CN111178080A (en) Named entity identification method and system based on structured information
CN112328469B (en) Function level defect positioning method based on embedding technology
CN107480126B (en) Intelligent identification method for engineering material category
CN110263345B (en) Keyword extraction method, keyword extraction device and storage medium
CN110750712A (en) Software security requirement recommendation method based on data driving
CN107729509B (en) Discourse similarity determination method based on recessive high-dimensional distributed feature representation
CN115827871A (en) Internet enterprise classification method, device and system
CN112115362B (en) Programming information recommendation method and device based on similar code recognition
CN114880496A (en) Multimedia information topic analysis method, device, equipment and storage medium
CN111400606B (en) Multi-label classification method based on global and local information extraction
CN115563278A (en) Question classification processing method and device for sentence text
CN112597776A (en) Keyword extraction method and system
CN113688633A (en) Outline determination method and device
CN111858860B (en) Search information processing method and system, server and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210402)