CN111611341B - Method and device for acquiring structural position weight of term document


Info

Publication number
CN111611341B
Authority
CN
China
Legal status
Active
Application number
CN202010274874.1A
Other languages
Chinese (zh)
Other versions
CN111611341A (en)
Inventor
邓吉秋
路馥毓
刘文毅
李晨菡
何美香
Current Assignee
Central South University
Original Assignee
Central South University
Application filed by Central South University
Priority to CN202010274874.1A
Publication of CN111611341A
Application granted
Publication of CN111611341B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a device for acquiring the document structure position weight of a term. The method comprises the following steps:
- acquiring a first weight for each of a plurality of preset position types of document structure positions, based on the document level of that position type;
- acquiring the number of terms at the document structure position of each position type;
- acquiring a second weight for each position type, based on its first weight and its number of terms;
- acquiring a third weight for each position type, equal to the sum of its first weight and its second weight;
- acquiring the document structure position weight of a preset specific term, based on the third weights of the position types at which the term occurs.

Description

Method and device for acquiring structural position weight of term document
Technical Field
The invention relates to the field of natural language processing, and in particular to a method and a device for acquiring the document structure position weight of a term.
Background
The most commonly used and effective text representation method is to build a term-document matrix. Each element of the matrix is the weight of the term in its row with respect to the document in its column, i.e., the importance of that term to that document. The importance of a word to a document shows up in two ways:
- the more often a term appears in a document, the more important it is to that document;
- the more often the term appears across the whole corpus, the less meaningful, and therefore less important, it is for any single document.
This is the idea behind the TF-IDF algorithm.
Keyword extraction based on TextRank is another class of methods, and it can be applied to a single document. The task of TextRank keyword extraction is to automatically extract a number of meaningful words or phrases from a given text. The TextRank algorithm ranks candidate keywords using the relations between local words (a co-occurrence window) and extracts the keywords directly from the text.
The same term may appear at different positions in a document, and its contribution to characterizing the document's theme may differ accordingly. For example, the term "study" may appear in the document title, in a section title, in a body paragraph, or in the references. The occurrence in the document title clearly does the most to characterize the document's content, while the occurrence in the references does much less.

The conventional methods do not capture this:
- A general term-document matrix represents a term's relation to the document theme purely by its occurrence frequency.
- TF-IDF treats terms that are frequent in a particular document but rare in other documents as subject terms, so it tends to filter out common words and keep important ones.
- TextRank ranks candidate keywords using the relations between local words (a co-occurrence window), and considers only co-occurrence between locally adjacent terms.

Neither conventional method accounts for how terms at different structural positions characterize a document differently, so the resulting characterization of the document theme is inaccurate.
Disclosure of Invention
(I) Technical problem to be solved
The prior art ignores how terms at different structural positions in a document characterize that document differently. To solve this problem, the invention provides a method and a device for acquiring the document structure position weight of a term.
(II) Technical solution
To achieve the above object, the present invention provides a method for obtaining the document structure position weight of a term, comprising the following steps:
A1, acquiring a first weight for each position type, based on a plurality of preset position types of document structure positions and the document level corresponding to each position type;
the first weight is $N = 2^{n-1}$, where n is the document level corresponding to the position type;
A2, acquiring the number of terms at the document structure position corresponding to the position type;
A3, acquiring a second weight for the position type, based on the first weight of the position type and the number of terms at the corresponding document structure position;
the second weight of the position type is the ratio of its first weight to the number of terms at the corresponding document structure position;
A4, acquiring a third weight for the position type, based on the first weight and the second weight of the position type;
the third weight of the position type is the sum of its first weight and its second weight;
A5, acquiring the document structure position weight of a preset specific term, based on the third weights of the position types and the preset specific term corresponding to those position types;
the document structure position weight of the preset specific term is the sum of the third weights of all position types corresponding to the specific term.
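Steps A1 to A5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's own implementation:
- the input is a list of hypothetical (term, position type) occurrence pairs plus a mapping from each position type to its document level;
- the "number of terms" at a position is the number of occurrence records there, mirroring COUNT(pos_id) in Example 2;
- a term contributes each position type's third weight only once.

```python
from collections import defaultdict


def structural_position_weights(occurrences, level_of):
    """Sketch of steps A1-A5.

    occurrences: iterable of (term, position_type) pairs, one per occurrence record
    level_of: dict mapping position_type -> document level n (higher n = higher level)
    Assumption: the term count of a position is its number of occurrence records.
    """
    occurrences = list(occurrences)

    # A2: number of term records at each position type
    count = defaultdict(int)
    for _, pos in occurrences:
        count[pos] += 1

    third = {}
    for pos, cnt in count.items():
        first = 2 ** (level_of[pos] - 1)  # A1: N = 2^(n-1)
        second = first / cnt              # A3: N divided by the term count
        third[pos] = first + second       # A4: first + second

    # A5: sum third weights over the distinct position types of each term
    positions_of = defaultdict(set)
    for term, pos in occurrences:
        positions_of[term].add(pos)
    return {t: sum(third[p] for p in ps) for t, ps in positions_of.items()}
```

Consider a hypothetical title at level 3 with 2 term records and a body paragraph at level 1 with 4 records. The title's third weight is 4 + 2 = 6 and the paragraph's is 1 + 0.25 = 1.25. A term found in both therefore gets 7.25. A term found twice in the paragraph only gets 1.25, because repeated occurrences at one position type are not summed.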
Preferably, the method further comprises:
A6, sorting the preset specific terms by their document structure position weights to obtain the specific terms in a first sequence;
in the first sequence, the structure position weights are ordered from high to low;
A7, taking the first M specific terms of the first sequence;
where M is a preset value.
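Steps A6 and A7 amount to a descending sort followed by truncation. The minimal sketch below assumes the weights come as a dict from term to weight. The patent does not specify how ties are broken, so the alphabetical tie-break here is an assumption added only to make the output deterministic.

```python
def top_m_terms(weights, m):
    """Sketch of A6/A7: order terms by weight from high to low and keep the first M.

    Assumption: ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))  # A6
    return [term for term, _ in ranked[:m]]                            # A7
```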
Preferably, the plurality of location types of the preset document structure location include: a first location type, a second location type, a third location type, a fourth location type, a fifth location type, a sixth location type, a seventh location type, an eighth location type, a ninth location type, a tenth location type, an eleventh location type, a twelfth location type, a thirteenth location type, a fourteenth location type, a fifteenth location type, a sixteenth location type.
Preferably, the specific term corresponding to the position type is a term that occurs in any one or more of the first through sixteenth position types.
A device for acquiring the document structure position weight of a term, wherein the device stores a first instruction;
the first instruction causes the device to execute the method for acquiring the document structure position weight of a term according to any one of the above.
(III) Beneficial effects
The invention takes into account that terms at different structural positions in a document characterize the document theme differently. This makes the computed term weights more effective, and it highlights how keyword terms at different document structure positions characterize the document.
Drawings
FIG. 1 is a flowchart of a method for obtaining a term document structure position weight according to the present invention;
FIG. 2 is a schematic diagram of the method for obtaining the document structure position weight of a term in the second embodiment of the present invention.
Detailed Description
The invention will be better explained by the following detailed description of the embodiments with reference to the drawings.
Example 1
As shown in FIG. 1, the method for obtaining the document structure position weight of a term provided in this embodiment comprises the following steps:
A1, acquiring a first weight for each position type, based on a plurality of preset position types of document structure positions and the document level corresponding to each position type.
The first weight is $N = 2^{n-1}$, where n is the preset document level corresponding to the position type.
In this embodiment, the first weight captures the weight difference between terms at different document levels. Terms at document structure positions of a high document level are more important for characterizing the document than terms at positions of a low document level, but there are far fewer of them. For example, a term may occur only rarely in the position whose type is the title, while term frequencies in body paragraphs can reach tens of thousands. Raw frequency therefore fails to reflect how terms at high-level positions characterize the document.
In practice, the number of terms at a lower document level grows geometrically relative to the level above it. This embodiment therefore uses $2^{n-1}$ as the first weight of the position type.
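For illustration, consider hypothetical levels 1 to 5; the actual level of each position type is defined in Table 2, which is not reproduced. Under $N = 2^{n-1}$ the first weights double from one level to the next, and each level's weight exceeds the combined weight of all lower levels by exactly one:

$$N_n = 2^{n-1}:\quad N_1 = 1,\ N_2 = 2,\ N_3 = 4,\ N_4 = 8,\ N_5 = 16, \qquad \sum_{k=1}^{n-1} N_k = 2^{n-1} - 1 = N_n - 1.$$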
The plurality of location types of the preset document structure location include: a first location type, a second location type, a third location type, a fourth location type, a fifth location type, a sixth location type, a seventh location type, an eighth location type, a ninth location type, a tenth location type, an eleventh location type, a twelfth location type, a thirteenth location type, a fourteenth location type, a fifteenth location type, a sixteenth location type.
In this embodiment, the sixteen location types are:
1. the document title;
2. the abstract keywords;
3. the abstract content;
4. a chapter title that appears as a table-of-contents entry;
5. a chapter title that is not a table-of-contents entry;
6. a numbered item title;
7. a non-chapter table-of-contents entry;
8. non-title content on the cover;
9. non-title content on the flyleaf;
10. unnumbered item content;
11. a figure;
12. a table;
13. a body paragraph;
14. an annex title;
15. annex content;
16. other content, including references, the back cover, and the like.
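The sixteen location types can be captured as an enumeration. This is a hypothetical sketch under stated assumptions:
- the member names are our own English labels;
- the values follow the ordinal numbering in the list above;
- whether these values coincide with the pos_id numbering of Table 2 (not reproduced) is an assumption.

The document level of each type is deliberately left out, because it is defined only in Table 2.

```python
from enum import IntEnum


class PositionType(IntEnum):
    """Location types of Example 1, numbered by their ordinal in the text.

    Assumption: these values are illustrative and may differ from pos_id in Table 2.
    """
    DOCUMENT_TITLE = 1
    ABSTRACT_KEYWORDS = 2
    ABSTRACT_CONTENT = 3
    TOC_CHAPTER_TITLE = 4        # chapter title that is a table-of-contents entry
    NON_TOC_CHAPTER_TITLE = 5    # chapter title that is not a table-of-contents entry
    NUMBERED_ITEM_TITLE = 6
    NON_CHAPTER_TOC_ENTRY = 7
    COVER_NON_TITLE = 8
    FLYLEAF_NON_TITLE = 9
    UNNUMBERED_ITEM_CONTENT = 10
    FIGURE = 11
    TABLE = 12
    BODY_PARAGRAPH = 13
    ANNEX_TITLE = 14
    ANNEX_CONTENT = 15
    OTHER = 16                   # references, back cover, and the like
```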
A2, acquiring the number of terms in the document structure position corresponding to the position type.
A3, acquiring a second weight corresponding to the position type based on the first weight corresponding to the position type and the number of terms in the document structure position corresponding to the position type.
The second weight corresponding to the position type is the ratio of the first weight corresponding to the position type to the number of terms in the document structure position corresponding to the position type.
For example, if the location type is a document title of a first location type, the second weight corresponding to the document title is a ratio of the first weight corresponding to the document title to the number of terms contained in the document title.
In this embodiment, the second weight fine-tunes the first weight. Document structure positions of different types on the same level hold different numbers of terms; for example, the abstract may contain fewer terms than the table of contents. Position types with fewer terms on the same document level therefore receive extra weight. A term at a given level should not be weighted higher than a term at a higher level. For this reason, the difference between this level's first weight and that of the next higher level (which, with $N = 2^{n-1}$, equals N itself) is divided equally among the terms at the position. This highlights terms at document structure positions that contain few terms.
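The following is a worked example of this fine adjustment, using hypothetical numbers. Suppose the abstract and the table of contents are both on level n = 4, so N = 8, and that they hold 16 and 64 term records respectively. The smaller position receives the larger second weight, and neither third weight reaches 16, the first weight of level 5.

```python
def third_weight(level, count):
    """Third weight of a position of the given document level holding `count` term records."""
    first = 2 ** (level - 1)  # A1: N = 2^(n-1)
    second = first / count    # A3: level weight shared equally among the position's terms
    return first + second     # A4


# Hypothetical: abstract and table of contents on the same level (4)
abstract = third_weight(4, 16)  # 8 + 0.5
toc = third_weight(4, 64)       # 8 + 0.125
print(abstract, toc)
```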
A4, acquiring a third weight corresponding to the position type based on the first weight and the second weight corresponding to the position type.
The third weight corresponding to the location type is the sum of the first weight and the second weight corresponding to the location type.
In this embodiment, the third weight is the sum of the first weight and the second weight. This ensures that the third weight of a term at a given document structure position does not exceed the third weight at a position of a higher document level. The differences in how positions of different document levels characterize the document are thereby preserved.
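A short derivation makes this explicit, assuming every position holds at least one term ($c \ge 1$). Take a position p of level n holding $c_p$ terms, and any position p' of level n + 1:

$$T_p = N_n + \frac{N_n}{c_p} \le 2N_n = 2^{n} = N_{n+1} < N_{n+1} + \frac{N_{n+1}}{c_{p'}} = T_{p'}$$

Equality in the first inequality holds only when p contains a single term. Since first weights increase with level, the same bound holds against every higher level as well.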
A5, acquiring the document structure position weight of the preset specific term based on a third weight corresponding to the position type and the preset specific term corresponding to the position type;
the document structure position weight of the preset specific term is the sum of third weights corresponding to the position types corresponding to the specific term.
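Step A5 can be written as a single formula; the symbols below are introduced here for illustration only:
- $P(t)$ is the set of distinct position types at which term t occurs;
- $n_p$ is the document level of position type p;
- $c_p$ is the number of terms at position type p.

$$W(t) = \sum_{p \in P(t)} T_p = \sum_{p \in P(t)} \left( 2^{n_p - 1} + \frac{2^{n_p - 1}}{c_p} \right)$$

For instance, a term found at both the first and the second position types gets $W(t) = T_1 + T_2$.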
In this embodiment, the subject terms of the document can also be extracted according to the document structure position weights of the preset specific terms. The subject terms are the preset number x of terms with the highest document structure position weights.
In this embodiment, the procedure is as follows:
1. The document is processed with the existing TF-IDF algorithm to obtain its important terms.
2. The document structure position weights of these terms are computed with the method of this embodiment.
3. The subject terms are extracted as a preset number of terms with the highest document structure position weights.
This embodiment further includes:
A6, sorting the preset specific terms by their document structure position weights to obtain the specific terms in a first sequence;
in the first sequence, the structural position weights are ordered from high to low.
A7, taking the first M specific terms of the first sequence;
where M is a preset value.
This embodiment takes into account that terms at different structural positions in a document characterize its theme differently. It highlights how keyword terms at the various document structure positions characterize the document.
In this embodiment, the document structure position weight is a combined representation of a term's weights at the different positions in the document. Several terms that appear at the same high-level position share the same third weight. If one of them also appears at a low-level position, it should count as more important for characterizing the document than the others. The low-level position weight should therefore be added on top of the high-level position weight, which makes accumulating the term's third weights over its different position types the most suitable combination. The alternatives both distort the result:
- averaging the third weights would pull the document structure position weight down;
- multiplying them would inflate it, so that it can exceed the weight of a position at a higher document level.
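The choice of summation can be checked numerically. The third weights below are hypothetical but consistent with the formulas above:
- 8.5 is a level-4 position with 16 terms;
- 1.25 is a level-1 position with 4 terms;
- 2.5 is a level-2 position with 4 terms;
- 4.5 is a level-3 position with 8 terms.

The sketch compares sum, mean, and product as ways of combining each term's third weights.

```python
from math import prod
from statistics import mean

# Hypothetical third weights of the positions each term occupies
THIRD_WEIGHTS = {
    "A": [8.5],        # only at a high-level position
    "B": [8.5, 1.25],  # the same high-level position plus a low-level one
    "C": [2.5, 4.5],   # two lower-level positions, never at the high level
}


def combine(how):
    """Aggregate each term's third weights with `how` (sum, mean or prod)."""
    return {term: how(ws) for term, ws in THIRD_WEIGHTS.items()}


for name, how in (("sum", sum), ("mean", mean), ("product", prod)):
    print(name, combine(how))
```

Under summation, B ranks above A (9.75 vs 8.5) and C stays below A (7.0). The mean drops B to 4.875, below A. The product lifts C to 11.25, above A, even though C never appears at the high-level position.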
In this embodiment, the specific term corresponding to a preset position type is a preset term that occurs in any one or more of the first through sixteenth position types.
In this embodiment, if the preset specific term corresponds to both the first location type and the second location type, the structural location weight of the preset specific term is the sum of the third weight corresponding to the first location type and the third weight of the second location type.
In this embodiment, different terms at the document structure position of the same position type receive equal weights, reflecting the equal standing of different terms in key parts of the document structure.
Example 2
As shown in fig. 2, a method for obtaining a term document structure position weight is provided in the second embodiment.
(1) Input description in this embodiment
The input is the term document structure position table words_list of a specific document. It is a database table containing all terms extracted from the document together with their document structure position information. Each numbered term in the table may have several records, at the same structure position or at different structure positions of the document. The field definitions are given in Table 1.
Table 1 Definition of the term document structure position table
(Table 1 is an image in the original publication and is not reproduced here.)
The values of pos_id in Table 1 depend on the document structure and the specific position within it; the numbering is given in Table 2.
Table 2 Definition of term document structure position levels
(Table 2 is an image in the original publication and is not reproduced here.)
Each document structure position in Table 2 corresponds to a document level. Terms at high-level positions characterize the subject of the document better than terms at low-level positions.
(2) Output description in this embodiment
The output is the term document structure position weight table words_weights. It is a database table that maps each term number to the term's document structure position weight; the field definitions are given in Table 3.
Table 3 Definition of the term document structure position weight table
Field name | Meaning | Field type | Description
word_id | Term number | INTEGER | Unique number of the term
pos_weights | Combined position weight | DECIMAL | Combined document structure position weight of the term
(3) The term document structure position weight calculation process specifically comprises the following steps:
(3-1) System initialization: define a database statement execution function sql_execution, whose input parameter is a text sql holding a database operation statement that conforms to the SQL-92 standard. The function calls the database system to execute sql. The result of execution is a change to a table, or to data in a table, in the database; the function returns no direct output. Then go to (3-2).
(3-2) Set sql to: SELECT pos_id, pos_level, COUNT(pos_id) AS cnt INTO temp1 FROM words_list GROUP BY pos_id, pos_level ORDER BY pos_id. Calling sql_execution summarizes the number of terms at each document structure position into the structure position weight table temp1. This table contains the position number pos_id, the position level pos_level, and the number of terms at the position cnt. Then go to (3-3).
(3-3) Set sql to: ALTER TABLE temp1 ADD level_weight DECIMAL, ADD average_weight DECIMAL. Calling sql_execution adds two fields to temp1, level_weight and average_weight, which record the first weight and the second weight respectively. Then go to (3-4).
(3-4) Set sql to: UPDATE temp1 SET level_weight = POWER(2, pos_level - 1), where POWER(2, pos_level - 1) is 2 raised to the power pos_level - 1. Calling sql_execution computes the first weight of each document structure position. Then go to (3-5).
(3-5) Set sql to: UPDATE temp1 SET average_weight = level_weight / cnt. Calling sql_execution computes the second weight of each document structure position. Then go to (3-6).
(3-6) Set sql to: SELECT DISTINCT words_list.word_id, words_list.pos_id, temp1.level_weight, temp1.average_weight INTO temp2 FROM words_list, temp1 WHERE words_list.pos_id = temp1.pos_id. Calling sql_execution creates the term position weight table temp2, which records the first and second weights of each term at each of its document structure positions. DISTINCT ensures that the same term at the same position is recorded only once. Then go to (3-7).
(3-7) Set sql to: ALTER TABLE temp2 ADD pos_weight DECIMAL. Calling sql_execution adds a field pos_weight to temp2 to record the third weight. Then go to (3-8).
(3-8) Set sql to: UPDATE temp2 SET pos_weight = level_weight + average_weight. Calling sql_execution computes the third weight of each term at each document structure position. Then go to (3-9).
(3-9) Set sql to: SELECT word_id, SUM(pos_weight) AS pos_weights INTO words_weights FROM temp2 GROUP BY word_id. Calling sql_execution sums the third weights of each term over its document structure positions, which gives the document structure position weight of the term. Then go to (3-10).
(3-10) outputting a term document structure position weight table words_weights.
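The procedure (3-2) to (3-10) can be run end to end with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions, not the embodiment's implementation. It assumes words_list has at least the columns word_id, pos_id and pos_level, since Table 1 is not reproduced. It also adapts the statements to SQLite in three ways:
- SQLite has no SELECT ... INTO, so CREATE TABLE ... AS is used instead;
- POWER may be unavailable, so the bit shift 1 << (pos_level - 1) computes the same power of two for levels of 1 or more;
- an explicit CAST avoids integer division in level_weight / cnt.

```python
import sqlite3


def compute_words_weights(conn):
    """Run steps (3-2)..(3-9) on words_list(word_id, pos_id, pos_level); return step (3-10)'s table.

    SQLite adaptation (assumption): CREATE TABLE ... AS replaces SELECT ... INTO,
    and a bit shift replaces POWER(2, pos_level - 1).
    """
    cur = conn.cursor()
    # (3-2) number of term records at each structure position
    cur.execute("""CREATE TABLE temp1 AS
                   SELECT pos_id, pos_level, COUNT(pos_id) AS cnt
                   FROM words_list GROUP BY pos_id, pos_level ORDER BY pos_id""")
    # (3-3) fields for the first and second weights (SQLite adds one column per statement)
    cur.execute("ALTER TABLE temp1 ADD COLUMN level_weight REAL")
    cur.execute("ALTER TABLE temp1 ADD COLUMN average_weight REAL")
    # (3-4) first weight 2^(pos_level - 1)
    cur.execute("UPDATE temp1 SET level_weight = 1 << (pos_level - 1)")
    # (3-5) second weight; CAST forces real division
    cur.execute("UPDATE temp1 SET average_weight = CAST(level_weight AS REAL) / cnt")
    # (3-6) one row per (term, position) with that position's weights
    cur.execute("""CREATE TABLE temp2 AS
                   SELECT DISTINCT w.word_id AS word_id, w.pos_id AS pos_id,
                          t.level_weight AS level_weight, t.average_weight AS average_weight
                   FROM words_list AS w JOIN temp1 AS t ON w.pos_id = t.pos_id""")
    # (3-7) field for the third weight
    cur.execute("ALTER TABLE temp2 ADD COLUMN pos_weight REAL")
    # (3-8) third weight = first + second
    cur.execute("UPDATE temp2 SET pos_weight = level_weight + average_weight")
    # (3-9) sum the third weights of each term
    cur.execute("""CREATE TABLE words_weights AS
                   SELECT word_id, SUM(pos_weight) AS pos_weights
                   FROM temp2 GROUP BY word_id""")
    conn.commit()
    # (3-10) output words_weights as {word_id: pos_weights}
    return dict(cur.execute("SELECT word_id, pos_weights FROM words_weights ORDER BY word_id"))
```

Consider hypothetical rows (1, 1, 3), (2, 1, 3), (1, 13, 1), (3, 13, 1), (3, 13, 1) and (2, 13, 1) in words_list. Position 1 then has a third weight of 4 + 2 = 6 and position 13 has 1 + 0.25 = 1.25. Terms 1 and 2 get 7.25 each and term 3 gets 1.25.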
In this embodiment, the weights are computed in four stages:
1. The number of terms at the document structure positions of each position type is counted by aggregation.
2. The first weight of each position is calculated from the document level of its position type.
3. The second weight is calculated from the number of terms at that position type, and the two are added to obtain the third weight.
4. The third weights of a term over its different position types are accumulated to obtain the term's document structure position weight.

The design has several consequences:
- When characterizing the document theme, raising the level weight of high-level document structure positions promotes and highlights keywords at those positions.
- Different terms at the document structure position of the same position type receive equal weights, reflecting the equal standing of different terms in key parts of the document structure.
- Accumulating the weights of the same term across its document structure positions accounts for how both its high-level and its low-level positions characterize the document theme. It also distinguishes between different terms at the same high-level position.

The method is applicable wherever term weights must highlight the differing contributions that terms at different document structure positions make to document characterization.
The technical principles of the present invention have been described above in connection with specific embodiments, which are provided for the purpose of explaining the principles of the present invention and are not to be construed as limiting the scope of the present invention in any way. Other embodiments of the invention will be apparent to those skilled in the art from consideration of this specification without undue burden.

Claims (5)

1. The method for acquiring the position weight of the term document structure is characterized by comprising the following steps:
a1, acquiring a first weight corresponding to a position type based on a plurality of position types of a preset document structure position and a document level corresponding to the position type of the document structure position;
the first weight is N, where $N = 2^{n-1}$ and n is the document level corresponding to the location type;
a2, acquiring the number of terms in the document structure position corresponding to the position type;
a3, acquiring a second weight corresponding to the position type based on the first weight corresponding to the position type and the number of terms in the document structure position corresponding to the position type;
the second weight corresponding to the position type is the ratio of the first weight corresponding to the position type to the number of terms in the document structure position corresponding to the position type;
a4, acquiring a third weight corresponding to the position type based on the first weight and the second weight corresponding to the position type;
the third weight corresponding to the position type is the sum of the first weight and the second weight corresponding to the position type;
a5, acquiring the document structure position weight of the preset specific term based on a third weight corresponding to the position type and the preset specific term corresponding to the position type;
the document structure position weight of the preset specific term is the sum of third weights corresponding to all position types corresponding to the specific term.
2. The method as recited in claim 1, further comprising:
a6, sorting the preset specific terms according to the document structure position weights of the preset specific terms to obtain specific terms in a first sequence;
the first sequence is as follows: the document structure position weight is in the order from high to low;
a7, acquiring the first M specific terms of the first sequence according to the specific terms of the first sequence;
wherein M is a preset value.
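Steps a6 and a7 can be sketched as follows, taking as input the per-term weights produced by the steps of claim 1. The function name `top_m_terms` and the sample weights are hypothetical. Tie-breaking is an assumption: the claims do not say how equal weights are ordered, so this sketch relies on Python's stable sort to keep tied terms in their input order.

```python
from typing import Dict, List


def top_m_terms(weights: Dict[str, float], m: int) -> List[str]:
    """Return the first M terms after sorting by structure position weight, high to low."""
    if m < 0:
        raise ValueError("M must be non-negative")
    # a6: build the first sequence, ordered by weight from high to low
    # (sorted() is stable even with reverse=True, so ties keep input order)
    first_sequence = sorted(weights, key=lambda term: weights[term], reverse=True)
    # a7: keep the first M terms of the first sequence
    return first_sequence[:m]


weights = {"granite": 10.0, "weathering": 8.5, "porosity": 2.5, "sample": 4.0}
print(top_m_terms(weights, 2))  # → ['granite', 'weathering']
```

If M exceeds the number of terms, slicing simply returns the whole first sequence.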
3. The method of claim 1, wherein the plurality of position types of the preset document structure position include: a first position type, a second position type, a third position type, a fourth position type, a fifth position type, a sixth position type, a seventh position type, an eighth position type, a ninth position type, a tenth position type, an eleventh position type, a twelfth position type, a thirteenth position type, a fourteenth position type, a fifteenth position type and a sixteenth position type.
4. The method according to claim 3, characterized in that the specific term corresponding to the position type is a term corresponding to any one or more of the first through sixteenth position types.
5. A device for acquiring the document structure position weight of a term, characterized in that the device stores a first instruction;
the first instruction causes the device to execute the method for acquiring the document structure position weight of a term according to any one of claims 1 to 4.
CN202010274874.1A 2020-04-09 2020-04-09 Method and device for acquiring structural position weight of term document Active CN111611341B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010274874.1A CN111611341B (en) 2020-04-09 2020-04-09 Method and device for acquiring structural position weight of term document

Publications (2)

Publication Number Publication Date
CN111611341A (en) 2020-09-01
CN111611341B (en) 2023-04-25

Family

ID=72198146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010274874.1A Active CN111611341B (en) 2020-04-09 2020-04-09 Method and device for acquiring structural position weight of term document

Country Status (1)

Country Link
CN (1) CN111611341B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101872363A (en) * 2010-06-24 2010-10-27 北京邮电大学 Method for extracting keywords
CN102314448A (en) * 2010-07-06 2012-01-11 株式会社理光 Equipment for acquiring one or more key elements from document and method
CN103559310A (en) * 2013-11-18 2014-02-05 广东利为网络科技有限公司 Method for extracting key word from article
CN105760474A (en) * 2016-02-14 2016-07-13 Tcl集团股份有限公司 Document collection feature word extracting method and system based on position information

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP4342575B2 (en) * 2007-06-25 2009-10-14 株式会社東芝 Device, method, and program for keyword presentation

Non-Patent Citations (1)

Title
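Li Hang et al. "TextRank keyword extraction method integrating multiple features" (融合多特征的TextRank关键词抽取方法). Journal of Intelligence (情报杂志), 2017, Vol. 36, No. 8, pp. 183-187. *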

Similar Documents

Publication Publication Date Title
Danielson et al. Perceptions of social change: 100 years of front-page content in The New York Times and The Los Angeles Times
CN107862070B (en) Online classroom discussion short text instant grouping method and system based on text clustering
US8180785B2 (en) Method and system for searching numerical terms
CN108363694B (en) Keyword extraction method and device
US8560485B2 (en) Generating a domain corpus and a dictionary for an automated ontology
US8200671B2 (en) Generating a dictionary and determining a co-occurrence context for an automated ontology
US10643031B2 (en) System and method of content based recommendation using hypernym expansion
US20180032608A1 (en) Flexible summarization of textual content
US20180300323A1 (en) Multi-Factor Document Analysis
JP4226862B2 (en) Document search device
CN104376115B (en) Method and device for determining fuzzy words based on global search
CN107291939A (en) Clustering matching method and system for hotel information
CN106997390A (en) Method for searching transaction information of equipment part or component commodities
CN111611341B (en) Method and device for acquiring structural position weight of term document
JP2011070291A (en) Device, system and method for extraction of topic word, and program
CN107609006B (en) Search optimization method based on local log research
CN107622058B (en) Method and device for manufacturing foreign language place name library, electronic navigation chip and server
CN111079425B (en) Geological document term grading method and device
CN111611342B (en) Method and device for obtaining lexical item and paragraph association weight
CN111090997B (en) Geological document feature lexical item ordering method and device based on hierarchical lexical items
CN111079426B (en) Method and device for obtaining field document lexical item hierarchical weight
JP4497337B2 (en) Concept search device and recording medium recording computer program
Willkomm et al. Efficient interval-focused similarity search under dynamic time warping
Bashir et al. A word stemming algorithm for Hausa language
CN109299260A (en) Data classification method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Deng Jiqiu

Inventor after: Lu Biyu

Inventor after: Liu Wenyi

Inventor after: Li Chenhan

Inventor after: He Meixiang

Inventor before: Deng Jiqiu

Inventor before: Lu Biyu

Inventor before: Li Chenhan

GR01 Patent grant